
Computational Molecular Biology 2024, Vol.14, No.2, 54-63
http://bioscipublisher.com/index.php/cmb

and limitations associated with GWPS, such as high dimensionality, multicollinearity, and genotype-environment interactions. Additionally, it highlights recent advancements and future directions in the field, including the integration of deep learning models and digital breeding technologies.

2 Overview of Genome-Wide Prediction Techniques

2.1 Genomic selection (GS)

Genomic selection (GS) has revolutionized the field of plant and animal breeding by enabling the rapid selection of superior genotypes and accelerating the breeding cycle. Unlike traditional marker-assisted selection, which focuses on identifying individual loci associated with traits, GS uses all marker data as predictors of performance, leading to more accurate predictions (Jannink et al., 2010; Crossa et al., 2017). This approach is particularly beneficial for complex traits controlled by many genes with small effects, which traditional methods struggle to address effectively (Meuwissen et al., 2016; Varshney et al., 2017). The integration of GS into breeding programs has shown tangible genetic gains, as evidenced by its application in maize breeding, where significant improvements have been observed (Crossa et al., 2017).

The success of GS hinges on its ability to incorporate all marker information into the prediction model, thereby avoiding biased marker effect estimates and capturing more of the variation due to small-effect quantitative trait loci (QTL). This comprehensive approach allows for the prediction of breeding values of lines in a population by analyzing their phenotypes and high-density marker scores. The accuracy of these predictions has been demonstrated in both simulation and empirical studies, with correlations between true breeding value and genomic estimated breeding value reaching levels as high as 0.85 for polygenic, low-heritability traits (Varshney et al., 2017).
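For reference, the quantities discussed here can be written in one common form. This is only a sketch under a ridge-regression (RR-BLUP-style) assumption, which the source does not specify. The symbols $y_i$ (phenotype of line $i$), $x_{ij}$ (marker score of line $i$ at marker $j$), $\beta_j$ (marker effect), $\mu$, $\varepsilon_i$, $p$ (number of markers), and $r$ are introduced for illustration and are not the article's notation.

```latex
\begin{align}
y_i &= \mu + \sum_{j=1}^{p} x_{ij}\,\beta_j + \varepsilon_i, \\
\widehat{\mathrm{GEBV}}_i &= \sum_{j=1}^{p} x_{ij}\,\hat{\beta}_j, \\
r &= \operatorname{cor}\bigl(\widehat{\mathrm{GEBV}},\, \mathrm{TBV}\bigr).
\end{align}
```

In this form all $p$ markers enter the model at once, rather than only those passing a significance threshold, and $r$ is the correlation between genomic estimated and true breeding values that is reported as prediction accuracy.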
This level of accuracy is sufficient to consider selecting for agronomic performance using marker information alone, substantially accelerating the breeding cycle and enhancing gains per unit time.

2.2 Genomic prediction models

Genomic prediction models are central to the implementation of GS: they estimate genome-wide marker effects in a training population and apply those estimates to predict performance in the target population. These models are designed to capture small QTL effects that are often ignored in traditional association analysis, thereby providing a more comprehensive understanding of the genetic architecture of complex traits (Desta and Ortiz, 2014). Various genomic prediction models have been proposed, each with its strengths and limitations. For instance, the Bayesian Lasso, weighted Bayesian shrinkage regression (wBSR), and random forest (RF) are among the models that have shown promise in terms of predictive accuracy and computational efficiency (Heslot et al., 2012).

The choice of genomic prediction model can significantly impact the accuracy of predictions and the genetic gain from selection. Comparative studies have shown that while many models achieve similar levels of accuracy, they differ in their susceptibility to overfitting, computation time, and the distribution of marker effect estimates (Heslot et al., 2012). Additionally, the integration of multi-trait and multi-environment models, high-throughput phenotyping, and deep learning approaches can further enhance the accuracy and efficiency of genomic predictions (Merrick et al., 2022). These advancements highlight the importance of continuous research and optimization of genomic prediction models to maximize their potential in breeding programs.

2.3 Machine learning and artificial intelligence applications

The application of machine learning (ML) and artificial intelligence (AI) in genomic prediction represents a significant advancement in the field of breeding.
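The training-population workflow described in Section 2.2 can be illustrated with a minimal sketch. It assumes simulated, independent biallelic markers and a ridge-regression model solved in closed form with NumPy; none of this is taken from the studies cited. The function names (`simulate`, `fit_ridge_marker_effects`, `predict_gebv`), the penalty `lam`, and all population sizes are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n_lines=500, n_markers=300, n_qtl=50, h2=0.5, seed=0):
    """Simulate marker scores (0/1/2), true breeding values, and phenotypes.

    Assumption: markers are independent and only n_qtl of them have an effect.
    """
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.5, size=(n_lines, n_markers)).astype(float)
    effects = np.zeros(n_markers)
    qtl = rng.choice(n_markers, size=n_qtl, replace=False)
    effects[qtl] = rng.normal(0.0, 1.0, size=n_qtl)
    tbv = X @ effects                              # true breeding values
    noise_var = tbv.var() * (1.0 - h2) / h2        # sets heritability to about h2
    y = tbv + rng.normal(0.0, np.sqrt(noise_var), size=n_lines)
    return X, y, tbv


def fit_ridge_marker_effects(X, y, lam):
    """Estimate all marker effects jointly by ridge regression (closed form)."""
    mu = y.mean()
    col_means = X.mean(axis=0)
    Xc = X - col_means                             # center marker scores
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - mu))
    return mu, col_means, beta


def predict_gebv(X, mu, col_means, beta):
    """Genomic estimated breeding values for lines with marker scores X."""
    return mu + (X - col_means) @ beta


X, y, tbv = simulate()
train, cand = slice(0, 400), slice(400, 500)       # training vs. candidate lines

# Marker effects are estimated in the training population only ...
mu, col_means, beta = fit_ridge_marker_effects(X[train], y[train], lam=100.0)

# ... and then applied to candidates using their marker scores alone.
gebv = predict_gebv(X[cand], mu, col_means, beta)

# Accuracy is the correlation between predicted and true breeding values.
accuracy = np.corrcoef(gebv, tbv[cand])[0, 1]
print(f"prediction accuracy r(GEBV, TBV) = {accuracy:.2f}")
```

In practice the penalty would be tuned (for example by cross-validation) or derived from variance components, and real marker data carry linkage disequilibrium that this simulation ignores. The same train-then-predict structure applies when the ridge step is swapped for another model, such as a Bayesian regression or a random forest.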
Machine learning methods, such as random forest and deep learning, have been shown to capture non-additive effects and improve the accuracy of genomic predictions. These methods can handle large datasets with complex interactions, making them well suited to genomic prediction tasks. For example, random forest has been found to be effective in capturing non-additive effects, which are often missed by traditional linear models (Heslot et al., 2012).

The integration of ML and AI into genomic prediction models offers several advantages, including the ability to analyze large and complex datasets, improve prediction accuracy, and reduce computation time. High-throughput
