Cotton Genomics and Genetics 2025, Vol.16, No.3, 148-162 http://cropscipublisher.com/index.php/cgg 152
Zhang et al. (2025) recently achieved a breakthrough in genome-wide association and genomic prediction analysis of Verticillium wilt resistance. They used multi-year phenotypic data from natural cotton populations evaluated at different test sites in Xinjiang. They identified 10 disease-resistance QTLs that were stable across multiple environments and built a genomic selection breeding model based on these loci. In a progeny population, they verified that the model predicted disease-resistance phenotypes well. This case shows that multi-environment GS models can identify robust combinations of favorable alleles and achieve effective trait improvement in complex environments. Integrating environmental data also involves quantifying factors such as climate and soil. For example, meteorological indicators for each test site can be obtained by remote sensing, or measured environmental parameters can be added to the model as covariates. Some researchers have also proposed exploiting the multimodal capability of deep learning: environmental and genomic data are fed into a neural network together, and the model learns the interaction between the two on its own. At present, a major bottleneck in integrating multi-environment data for crops such as cotton is model adaptability. A model that performs well in one region may lose accuracy when transferred to another region. Training data covering a wider range of environments, and environmental representations with clearer physical meaning, are therefore needed (Gapare et al., 2018). As countries build networks of cotton experimental stations and big-data platforms, richer multi-environment genotype-phenotype data will become available. This will create the conditions for developing robust cross-environment prediction models.
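The covariate approach mentioned above can be illustrated with a minimal Python sketch. It makes the following assumptions:

- The model is a ridge regression in the spirit of RR-BLUP.
- Markers coded 0/1/2 and one measured site covariate (a hypothetical mean temperature) are columns of the same design matrix.
- The data are simulated, and every function name is hypothetical.

Nothing here reproduces the models of Zhang et al. (2025) or Gapare et al. (2018). The model is trained on two sites and applied to a third, unseen site, which mimics the cross-region transfer problem. Accuracy is reported as the Pearson correlation between predicted and observed values at the new site.

```python
import math
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, lam):
    """Centred ridge regression: solves (Xc'Xc + lam*I) b = Xc'yc (intercept unpenalized)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) + (lam if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(Xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    return col_means, y_mean, solve(A, rhs)


def ridge_predict(model, X):
    """Predict phenotypes for new rows using the fitted centring and coefficients."""
    col_means, y_mean, coefs = model
    return [y_mean + sum((row[j] - col_means[j]) * coefs[j] for j in range(len(coefs)))
            for row in X]


def pearson(a, b):
    """Pearson correlation, the usual accuracy measure in GS validation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate_trials(n_lines=60, n_markers=20, site_temps=(18.0, 22.0, 26.0), seed=1):
    """Hypothetical multi-site trial: the same lines are grown at every site.

    Phenotype = additive marker effects + 0.5 * site temperature + noise.
    Returns (site_index, features, phenotype) tuples; features are the 0/1/2
    marker codes followed by the site's temperature covariate.
    """
    rng = random.Random(seed)
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_markers)]
    lines = [[rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n_lines)]
    records = []
    for site, temp in enumerate(site_temps):
        for markers in lines:
            genetic = sum(m * e for m, e in zip(markers, effects))
            y = genetic + 0.5 * temp + rng.gauss(0.0, 1.0)
            records.append((site, markers + [temp], y))
    return records


records = simulate_trials()
train = [r for r in records if r[0] != 2]  # sites 0 and 1 are used for training
test = [r for r in records if r[0] == 2]   # site 2 plays the "new region"
model = ridge_fit([r[1] for r in train], [r[2] for r in train], lam=1.0)
pred = ridge_predict(model, [r[1] for r in test])
obs = [r[2] for r in test]
accuracy = pearson(pred, obs)
mean_gap = abs(sum(pred) / len(pred) - sum(obs) / len(obs))
print(f"accuracy at new site: r = {accuracy:.2f}, mean gap = {mean_gap:.2f}")
```

The temperature covariate is constant within a site. It therefore mainly corrects the predicted site mean (`mean_gap`) rather than the ranking of lines within the site. Modeling genotype-by-environment interaction would need additional columns, for example marker × covariate products, which this sketch deliberately omits.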
Multi-environment genomic prediction is expected to improve the reliability of breeding selection. It should also help identify new cotton varieties that are both high-yielding and widely adapted, which matters in practice for coping with climate change and heterogeneous growing environments.
4 Research Progress on Prediction of Major Cotton Traits
4.1 Prediction models for yield traits
Increasing cotton yield has always been the primary goal of breeding, and genomic prediction offers a new way to accelerate the selection of high-yielding varieties. Cotton yield traits include seed cotton yield and its components (such as boll weight, boll number per plant, and lint percentage). These are quantitative traits controlled by many genes and readily affected by the environment. Early mapping studies identified many yield-related QTLs, but each individual locus has only a limited effect (Sun et al., 2022). Genomic selection instead predicts yield performance from genome-wide marker information. A study from Australia's CSIRO was the first to validate GS for yield in a large cotton breeding population. Li et al. (2022) genotyped 1,385 breeding lines, evaluated them in field trials over two seasons, and built several prediction models. A Bayesian model combining genomic markers with pedigree information reached a correlation of 0.64 between predicted and observed lint yield. It could also accurately identify high-yielding lines that had not yet been field tested. This suggests that GS can help discard low-yielding genotypes in early breeding generations and improve selection efficiency. Researchers in US public breeding programs have also evaluated the feasibility of GS for yield traits. Billings et al. (2022) reported that genomic prediction accuracy for cotton yield and related agronomic traits is currently somewhat lower than for fiber quality and some other traits. They found it comparable to conventional phenotypic selection, however, with room for further improvement as models are optimized and phenotyping becomes more accurate. They suggested implementing GS for quality traits first and, once experience has accumulated, gradually extending it to complex traits such as yield.
Beyond predicting yield directly, combining high-throughput phenotyping with GS is another promising direction. For example, canopy features of cotton fields extracted from remote-sensing images have been used with machine learning models to predict final yield successfully at the regional scale (Dhaliwal et al., 2022). In China, some researchers have used genome-wide association analysis to identify key loci affecting yield components and applied them in marker-assisted selection (MAS). The application of GS, however, is still in its infancy. As genetic variation maps of China's core cotton germplasm are completed and breeding big data accumulate, the role of genomic prediction in high-yield cotton breeding will gradually become apparent. Notably, AI technology can also help dissect the complex mechanisms of yield formation. For example, Zhao et al. (2023) integrated transcriptome data with machine learning and discovered multiple major regulatory genes