
Bioscience Method 2024, Vol.15, No.1, 8-19 http://bioscipublisher.com/index.php/bm

Genetic data are usually acquired through high-throughput genotyping or sequencing technology. DNA is extracted from each study subject and then analyzed using gene chips or next-generation sequencing (NGS). Gene chips are a cost-effective method that can detect millions of known SNP sites simultaneously (Abdelraheem et al., 2021). NGS allows researchers not only to detect known SNP sites but also to discover new genetic variants, although it is more expensive. The acquired genetic data are then processed and quality-controlled using bioinformatics methods to ensure accuracy and usability.

Collecting phenotypic data involves the precise measurement and recording of individual traits. Standardized methods are needed to evaluate and record each individual's performance on the studied traits while controlling for environmental variables. For agricultural crops, phenotypic data can include traits such as yield, maturity, and disease resistance; in human genetics research, they may include disease status, biochemical indicators, or other health indicators. The quality of phenotypic data directly affects the accuracy of GWAS analysis, so reliability must be ensured through precise measurement and sufficient replication.

The representativeness and diversity of the sample also need to be considered during data collection. A sufficiently large sample with diverse genetic backgrounds increases the statistical power of GWAS, which is especially important when looking for rare variants or genes with small effects. In addition, collecting detailed environmental and lifestyle data may be critical for some studies, as these factors may interact with genetic factors to influence trait performance.
Collecting and processing the genetic and phenotypic data required for GWAS is a complex but critical process. High-quality data acquisition, including advanced sequencing technology, precise phenotypic measurement, and meticulous data processing and analysis, is the foundation for a successful GWAS and for realizing its potential in genetic research.

1.3 Statistical methods and computational tools for GWAS analysis

In genome-wide association studies (GWAS), a range of statistical methods and computational tools are used to analyze the association between genetic and phenotypic data, with the aim of identifying genetic variants associated with specific traits. These statistical methods mainly include association analysis, correction for population structure and kinship, and multivariate analysis.

Association analysis is one of the core statistical methods in GWAS. It identifies potential genetic factors by testing for association between genetic markers (such as SNPs) and specific traits. The most commonly used method is single-locus association analysis, in which each SNP is tested individually for statistical association with trait performance. This is usually done with regression models: linear regression for continuous traits and logistic regression for categorical traits such as disease status (Peng et al., 2022).

Because population structure and relatedness may lead to false positive associations, GWAS analysis also includes methods to correct for these potential confounding factors. Population structure refers to differences in genetic background within a sample set, while kinship refers to the genetic relatedness between samples. If not controlled, these factors may produce erroneous associations between genetic markers and traits.
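As a minimal sketch of the single-locus test described above, the following Python code regresses a continuous trait on allele dosage (0, 1, or 2 copies of an allele) for one SNP at a time. The function names, the simulated data, and the normal approximation used for the p-value are illustrative assumptions rather than part of any cited method. Only the continuous-trait (linear regression) case is shown, and no correction for population structure or kinship is applied.

```python
import math
import random


def single_snp_linear_test(genotypes, phenotypes):
    """Regress a continuous trait on allele dosage (0/1/2) for one SNP.

    Returns (slope, t_statistic, approx_p_value). The p-value uses a
    two-sided normal approximation to the t distribution, which is an
    assumption that is only reasonable for large samples.
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching inputs with at least 3 samples")

    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("monomorphic SNP: genotype has no variance")

    # Ordinary least squares estimates for y = intercept + slope * g.
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g

    # Residual variance and standard error of the slope.
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)

    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return slope, t_stat, p_value


def scan_snps(genotype_matrix, phenotypes):
    """Test every SNP independently (one row of dosages per SNP)."""
    return [single_snp_linear_test(row, phenotypes) for row in genotype_matrix]


if __name__ == "__main__":
    # Simulated data (hypothetical): SNP 0 affects the trait, SNP 1 does not.
    random.seed(0)
    n_samples = 500
    causal = [random.choice([0, 1, 2]) for _ in range(n_samples)]
    neutral = [random.choice([0, 1, 2]) for _ in range(n_samples)]
    trait = [0.5 * g + random.gauss(0, 1) for g in causal]

    for name, result in zip(["causal", "neutral"], scan_snps([causal, neutral], trait)):
        slope, t_stat, p_value = result
        print(f"{name}: slope={slope:.3f} t={t_stat:.2f} p~{p_value:.3g}")
```

In a genome-wide scan this test is repeated for every SNP, so the resulting p-values need a multiple-testing threshold before any SNP is reported as associated.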
The effects of population structure can be identified and corrected using methods such as principal component analysis (PCA), while mixed linear models (MLM) can improve the accuracy of GWAS by accounting for both population structure and kinship.

Multivariate analysis allows multiple traits or multiple genetic markers to be considered simultaneously in order to explore their interactions and joint effects. This approach can help reveal the genetic basis of complex traits, especially when the traits are biologically interconnected.

A variety of computational tools and software packages have been developed to handle the complex data and statistical analysis involved in GWAS. PLINK is one of the most widely used GWAS data analysis tools. It provides a series of functions, including data management, basic statistical analysis, association analysis, and control of
