
Bioscience Evidence 2025, Vol.15, No.5, 219-227 http://bioscipublisher.com/index.php/be 223

some studies have also used machine learning methods to predict cold stress-related genes, improving the efficiency and accuracy of candidate gene screening (Tian et al., 2023).

5 Challenges and Limitations

5.1 Data complexity due to polyploidy and large genome size

The cotton genome is large; the allotetraploids G. hirsutum and G. barbadense in particular each have a genome of approximately 2.5 Gb (Yang et al., 2022; Manivannan and Amal, 2023; Kumar et al., 2024; Sheri et al., 2025). These genomes contain many repetitive and homologous genes, which complicates assembly, annotation and functional research. Polyploidy also makes the regulation of gene expression more complex and increases the uncertainty of variant detection and gene editing. In addition, the proportion of repetitive sequences and transposable elements in the genome is very high, which makes data analysis more difficult still.

5.2 Need for cotton-specific annotation resources

Although databases such as CottonGen and CottonFGD exist, resources for functional annotation, phenotypic association and multi-omics integration in cotton remain incomplete (Ashraf et al., 2018; Li et al., 2021). The genome versions published by different research groups differ in chromosome length and gene annotation, which can reduce the accuracy of data alignment and analysis (Ashraf et al., 2018). Higher-quality, standardized annotation resources are therefore needed, and a multi-species integration platform should be established to better support gene mining and molecular breeding.

5.3 Limitations in computational power and reproducibility

Cotton genomics research requires handling large-scale multi-omics data, including genomic, transcriptomic, epigenomic and variomic data. This places very high demands on computing power and algorithm efficiency (Yang et al., 2022; Manivannan and Amal, 2023).
Current pipelines often run into insufficient memory, limited storage and slow processing when handling extremely large datasets. Furthermore, some pipelines and databases lack unified standards and version management, so results are difficult to reproduce and compare (Ashraf et al., 2018; Yang et al., 2022). More efficient, scalable and standardized pipelines are therefore needed to improve research efficiency and reliability.

5.4 Data sharing and FAIR principles in cotton genomics

With the rapid increase of multi-omics data, data sharing and reproducibility have become a focus of international attention. The FAIR principles (findable, accessible, interoperable, reusable) are not yet sufficiently applied in cotton genomics (Ashraf et al., 2018; Li et al., 2021). Some data and results have not been made public in a timely manner, or lack a unified metadata standard, which hinders the circulation and reuse of data. Promoting data standardization, open sharing and cross-platform interoperability is key to strengthening international cooperation and research innovation.

6 Future Directions

6.1 Advances in long-read sequencing and pan-genomics for cotton

The development of long-read sequencing (LRS) has greatly advanced cotton genomics research. LRS generates very long sequence reads, making genome assembly more contiguous and accurate, and is especially suitable for plant genomes with a large number of repetitive sequences. With improvements in sequencing accuracy and throughput, LRS has become an important method for structural variation detection, whole-genome assembly and pan-genome research. It offers new ways to understand the germplasm diversity and functional gene variation of cotton (Amarasinghe et al., 2020; Logsdon et al., 2020; Amarasinghe et al., 2021; De Coster et al., 2021; Van Dijk et al., 2023).
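
The assembly contiguity that long reads improve is commonly summarized with the N50 statistic: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The following is a minimal Python sketch of that calculation, not a method taken from the studies cited here; the function name n50 and the toy contig lengths are illustrative assumptions only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assemblies with the same total size (32 units).
fragmented = [2, 2, 2, 3, 3, 4, 8, 8]   # many short contigs
contiguous = [10, 22]                   # few long contigs

print(n50(fragmented))  # 8
print(n50(contiguous))  # 22
```

Both toy assemblies contain the same number of bases, but the one built from fewer, longer contigs has the higher N50, which is the sense in which an assembly is described as more contiguous.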
Pan-genomics, which combines data from multiple genomes, can comprehensively capture the genetic diversity within cotton species and lay the foundation for molecular breeding and functional gene research (Mascher et al., 2021).

6.2 Cloud-based and user-friendly bioinformatics platforms

With the rapid growth of multi-omics data, cloud computing and one-stop analysis platforms are becoming increasingly important. Cloud platforms such as Majorbio Cloud and Galaxy integrate multi-omics analysis workflows, visualization tools and online learning modules, lowering the barrier to data analysis and facilitating
