Plant Gene and Trait 2024, Vol.15, No.1, 15-22
http://genbreedpublisher.com/index.php/pgt

Figure 1 Breeding strategies for improving disease resistance in crops (Adopted from Deng et al., 2020)

2 Principles and Methods of GWAS Research
2.1 The fundamentals of GWAS
A genome-wide association study (GWAS) is an important method for revealing the genetic basis of complex traits. Its core principle is to search for genetic variants, typically single nucleotide polymorphisms (SNPs), that are associated with target traits by comparing genotypes between individuals or populations with different phenotypic characteristics (Tam et al., 2019). A GWAS typically involves collecting large-scale genotypic and phenotypic data and analyzing these data with statistical methods to detect associations. Through GWAS, researchers can discover candidate genes associated with a target trait and gain insight into the genetic mechanisms of that trait.

The basic GWAS process consists of the following steps. Researchers first select a representative population that includes individuals exhibiting different traits. They then collect phenotypic and genomic data for these individuals, using high-throughput sequencing or gene chip technology to obtain genotype data that often cover millions of SNPs (Uffelmann et al., 2021). Once data preparation is complete, the association between genotype and phenotype is analyzed with statistical methods to determine which genetic variants are associated with the target trait. Finally, the results are validated and interpreted, and further functional studies may be conducted to reveal the underlying biological mechanisms.
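The association step of this process can be illustrated with a minimal sketch. It assumes genotypes are coded as allele counts (0, 1 or 2) and the trait is continuous, and it fits one ordinary least-squares regression per SNP. The function names, SNP identifiers and toy data are hypothetical and chosen only for illustration; this is not the pipeline of any cited study, and it omits quality control, covariates and significance testing.

```python
from statistics import mean


def snp_regression(genotypes, phenotypes):
    """Fit phenotype = intercept + slope * genotype by ordinary least squares.

    genotypes: allele counts (0, 1 or 2) for one SNP, one entry per individual.
    phenotypes: trait values in the same individual order.
    Returns (slope, r_squared).
    """
    g_bar, y_bar = mean(genotypes), mean(phenotypes)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    syy = sum((y - y_bar) ** 2 for y in phenotypes)
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, phenotypes))
    if sxx == 0 or syy == 0:
        # Monomorphic SNP or constant trait: no association can be estimated.
        return 0.0, 0.0
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def scan(genotype_matrix, phenotypes):
    """Run the single-SNP regression for every SNP and rank by r_squared."""
    results = {snp: snp_regression(g, phenotypes)
               for snp, g in genotype_matrix.items()}
    return sorted(results.items(), key=lambda item: item[1][1], reverse=True)


# Toy data (hypothetical): six individuals, three SNPs.
phenotypes = [1.0, 3.0, 5.0, 1.0, 3.0, 5.0]
genotype_matrix = {
    "snp_a": [0, 1, 2, 0, 1, 2],  # allele count tracks the trait
    "snp_b": [0, 0, 0, 1, 1, 1],  # unrelated to the trait
    "snp_c": [1, 1, 1, 1, 1, 1],  # monomorphic, carries no information
}

ranked = scan(genotype_matrix, phenotypes)
for snp, (slope, r2) in ranked:
    print(f"{snp}: slope={slope:.2f}, r^2={r2:.2f}")
```

In a real study the same loop would run over millions of SNPs, and each slope would be accompanied by a p-value rather than ranked on r^2 alone.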
2.2 Statistical methods of association study
Association analysis is the core step of a genome-wide association study and is used to determine the association between genotype and phenotype (Figure 2). Researchers typically use a variety of statistical methods to assess the relationship between genetic variants (such as SNPs) and target traits. Commonly used methods include the chi-square test, linear regression, logistic regression, and mixed models (Zeng et al., 2015). The chi-square test is a commonly used non-parametric test that assesses whether the difference between observed and expected frequencies is significant. Linear regression is suitable for analyzing the association between continuous traits and SNPs, and it evaluates the relationship between genotype and phenotype by fitting a linear model (Zeng et al., 2015). Logistic regression is often used for binary traits, such as the presence or absence of disease, to assess the effect of SNP genotypes on disease risk. Mixed models combine fixed effects and random effects and can account for factors such as population structure and kinship, which improves the accuracy of association analysis. Because large-scale association analysis tests a very large number of SNPs, multiple-comparison correction is required to control the false-positive rate.
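As a rough illustration of two of these ideas, the sketch below applies a Pearson chi-square test to a 2 x 3 table of genotype counts (cases versus controls) and then a Bonferroni correction across the tested SNPs. The function names, SNP identifiers and counts are hypothetical. Bonferroni is used here only as one simple example of multiple-comparison correction; the text above does not specify which correction is applied. The p-value relies on the fact that, with 2 degrees of freedom, the chi-square survival function is exactly exp(-x/2).

```python
import math


def chi_square_genotype_test(case_counts, control_counts):
    """Pearson chi-square test on a 2 x 3 table of genotype counts.

    Each argument holds counts for the three genotypes (AA, Aa, aa).
    Assumes every genotype is observed at least once overall, so that all
    expected counts are positive.
    Returns (statistic, p_value) with 2 degrees of freedom.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    # Closed-form survival function of the chi-square distribution, df = 2.
    return statistic, math.exp(-statistic / 2)


def bonferroni_significant(p_values, alpha=0.05):
    """Keep SNPs whose p-value is below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return {snp: p for snp, p in p_values.items() if p < threshold}


# Toy genotype counts (hypothetical): (AA, Aa, aa) in cases and in controls.
counts = {
    "snp_1": ((10, 20, 70), (40, 40, 20)),  # genotype frequencies differ
    "snp_2": ((30, 50, 20), (30, 50, 20)),  # identical frequencies
}

p_values = {snp: chi_square_genotype_test(cases, controls)[1]
            for snp, (cases, controls) in counts.items()}
significant = bonferroni_significant(p_values)
print(sorted(significant))
```

Logistic regression and mixed models are not sketched here because they need iterative fitting and a kinship matrix, which are normally handled by dedicated statistical software.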