
Cotton Genomics and Genetics 2025, Vol.16, No.3, 148-162
http://cropscipublisher.com/index.php/cgg

construction have advanced rapidly, providing a data basis for the implementation of GS. In particular, the high-quality reference genomes and pan-genome maps of cotton completed in recent years have helped to discover key genes that control important traits such as yield and quality. On this basis, the rise of artificial intelligence technology has given breeding new momentum. Machine learning and deep learning methods can automatically extract complex patterns from massive multi-omics data for trait prediction and decision support, which makes it possible for crop breeding to shift from experience-driven to data-driven. For example, some studies describe intelligent breeding systems that integrate genotypic, environmental and phenotypic big data; these systems can accurately predict offspring traits, screen for superior genes and improve breeding efficiency in the early stages of breeding (Yan and Wang, 2022). The combination of artificial intelligence and genomic technology is expected to lead the future "breeding 5.0" era and accelerate the development of new crop varieties that meet future needs (Wu et al., 2024).

This study focuses on "AI-assisted cotton genomic prediction breeding". It reviews the basic principles and application status of genomic selection in cotton breeding, as well as the practical application and research progress of AI algorithms such as machine learning and deep learning in cotton breeding. It introduces the background and technical needs of cotton breeding and explains the concept and methodological basis of genomic prediction breeding, including genotype data acquisition, genetic variation analysis and prediction model establishment. It focuses on reviewing the research progress of artificial intelligence methods (such as random forests, support vector machines and neural networks)
in the prediction of important cotton traits (yield, stress resistance and fiber quality). Through cases such as Australia's CSIRO breeding program, the US public breeding project and China's intelligent breeding practice, the actual application effect of AI in cotton breeding is analyzed, and suggestions for promoting intelligent cotton breeding are put forward. This study aims to provide a useful reference for researchers and breeders and to accelerate the development of new high-yield, high-quality and stress-resistant cotton varieties, which is of great significance for ensuring the supply of textile raw materials, improving the competitiveness of the cotton industry and supporting the sustainable development of agriculture.

2 Basic Principles of Genomic Prediction in Cotton

2.1 Concepts of genomic selection and phenotypic prediction

Genomic selection (GS) is a breeding method that uses genome-wide molecular markers to predict the genetic potential of individuals. Unlike traditional breeding, which relies on measured phenotypes, GS builds a prediction model by estimating marker effects in a training population and then directly predicts the breeding value of candidate individuals that have not been phenotyped, thereby accelerating the selection process (Viana et al., 2016). The core of GS is to capture the genetic control of quantitative traits using a large number of SNP markers across the genome. As long as marker coverage is dense enough, the accumulated effects of thousands of markers can predict complex traits accurately even when the effect of each single marker is small. This strategy of "pre-selecting phenotypes with genomes" is regarded as a key step in modern crop breeding because it can improve selection accuracy, shorten generation cycles and increase genetic gain. In cotton, GS is particularly suitable for breeding typical quantitative traits such as yield, fiber quality and stress resistance.
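
The idea of estimating marker effects in a training population and then scoring unphenotyped candidates can be illustrated with a minimal sketch. It is not the method of any study cited here: it assumes simulated SNP genotypes coded 0/1/2, uses ridge regression (an rrBLUP-style model) as the marker-effect estimator, and the names `fit_marker_effects`, `predict_gebv` and the shrinkage value `ridge_lambda=100.0` are hypothetical choices made for illustration only.

```python
import numpy as np


def fit_marker_effects(X_train, y_train, ridge_lambda):
    """Estimate per-marker effects by ridge regression (rrBLUP-style sketch)."""
    mu = y_train.mean()
    col_means = X_train.mean(axis=0)
    Z = X_train - col_means                      # centre each marker column
    n = Z.shape[0]
    # Dual form of ridge regression: solve an n x n system, then map the
    # solution back to one effect per marker.
    alpha = np.linalg.solve(Z @ Z.T + ridge_lambda * np.eye(n), y_train - mu)
    effects = Z.T @ alpha
    return mu, col_means, effects


def predict_gebv(X_new, mu, col_means, effects):
    """Genomic estimated breeding value = mean + sum of marker effects."""
    return mu + (X_new - col_means) @ effects


# Simulated data (assumption: independent markers, additive effects only).
rng = np.random.default_rng(0)
n_train, n_cand, n_markers = 400, 100, 300
allele_freq = rng.uniform(0.1, 0.9, n_markers)
X = rng.binomial(2, allele_freq, size=(n_train + n_cand, n_markers)).astype(float)
true_effects = rng.normal(0.0, 0.05, n_markers)   # many small marker effects
true_bv = X @ true_effects
# Noise with the same spread as the breeding values (heritability near 0.5).
y = true_bv + rng.normal(0.0, true_bv.std(), size=true_bv.shape)

# Training population: genotyped and phenotyped.
X_train, y_train = X[:n_train], y[:n_train]
# Candidates: genotyped only; their phenotypes are never shown to the model.
X_cand = X[n_train:]

# ridge_lambda is an illustrative value; in practice it would be estimated,
# for example by cross-validation or from variance components.
mu, col_means, effects = fit_marker_effects(X_train, y_train, ridge_lambda=100.0)
gebv = predict_gebv(X_cand, mu, col_means, effects)

# Accuracy here is the correlation with the simulated true breeding value,
# which is only known because the data are simulated.
accuracy = np.corrcoef(gebv, true_bv[n_train:])[0, 1]
top = np.argsort(gebv)[::-1][:10]                 # ten highest-ranked candidates
print(f"prediction accuracy (r): {accuracy:.2f}")
print("selected candidate indices:", top)
```

In real breeding data the true breeding values are unknown, so accuracy is usually reported as the correlation between predicted values and observed phenotypes in cross-validation.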
It is reported that the prediction accuracy of GS for cotton fiber length and strength can reach 0.65-0.76, outperforming traditional phenotypic selection. Phenotypic prediction is the goal of GS: predicting the phenotypic performance or breeding value of an individual from genotypic data. In addition to classic statistical methods such as GBLUP (genomic best linear unbiased prediction), machine learning algorithms have gradually been applied to GS models in recent years to improve prediction accuracy (Billings et al., 2022).

2.2 Acquisition and quality control of genotypic data

The premise for implementing genomic predictive breeding is high-quality genotypic data. The cotton genome is large (2n=4x=52) and highly repetitive, but the development of sequencing technology in recent years has made high-density genotyping possible. Commonly used genotype acquisition methods include SNP chips and resequencing. For example, Australia's CSIRO breeding project constructed a high-density chip containing 12 296 polymorphic SNP sites and genotyped 1 385 cotton materials. As high-throughput sequencing costs have fallen, whole-genome resequencing has become increasingly popular in cotton, and millions of marker variants can be detected
