
Bioscience Method 2024, Vol.15, No.1, 8-19 http://bioscipublisher.com/index.php/bm

Population structure refers to the natural clustering of individuals into genetically distinct subgroups within a sample; if these subgroups also differ in the trait of interest, it can produce false-positive associations. Differences in genetic background may also mask true gene-trait associations, especially when comparing across populations. To overcome this challenge, researchers must use more sophisticated statistical methods to correct for population structure, such as structural equation modeling or mixed linear models, but these increase the complexity and computational burden of the analysis.

GWAS is mainly used to identify associations between common genetic variants and traits, and its ability to detect rare variants is limited. Rare variants may have important effects on traits, but their low frequency in the population makes them difficult to detect through GWAS, which limits the ability of GWAS to reveal the full picture of genetic diversity.

In GWAS, thousands to millions of genetic markers are tested simultaneously for association with a specific trait, which requires correction for multiple comparisons to avoid false-positive results. Although multiple-testing corrections such as the Bonferroni correction reduce the false-positive rate, they also increase the risk of missing true associations (false negatives). Balancing this trade-off is an important consideration in GWAS design (Bardak et al., 2021).

Even when GWAS successfully identifies genetic markers associated with a trait, translating these statistical associations into biological meaning remains a challenge. Identifying candidate genes near the associated markers and determining their functions requires further experimental validation. In addition, traits are often controlled by multiple genes and influenced by environmental factors, which further complicates the interpretation of GWAS results.
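As a minimal illustration of the multiple-testing trade-off described above, the Python sketch below computes a Bonferroni threshold (the desired family-wise error rate α divided by the number of tests) and flags the markers that pass it. The function names, marker names, and p-values are hypothetical; one million tests is an assumed figure, chosen because it yields the commonly cited genome-wide threshold of about 5 × 10⁻⁸.

```python
from typing import List, Mapping


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def significant_markers(p_values: Mapping[str, float], n_tests: int,
                        alpha: float = 0.05) -> List[str]:
    """Return the markers whose p-value falls below the Bonferroni threshold.

    n_tests is the total number of markers tested genome-wide, which is usually
    much larger than the handful of p-values passed in here.
    """
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical p-values for three markers out of 1,000,000 tested
hits = {"snp_a": 3e-9, "snp_b": 1e-6, "snp_c": 0.02}
print(bonferroni_threshold(0.05, 1_000_000))
print(significant_markers(hits, n_tests=1_000_000))  # ['snp_a']
```

Here snp_b (p = 10⁻⁶) would be significant at a nominal 0.05 level but is not declared significant after correction, which is the false-negative risk the text refers to.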
With advances in sequencing technology, the amount of genetic data generated has increased dramatically, placing higher demands on data storage, processing, and analysis. Processing large-scale datasets requires expensive computing resources and specialized analytical skills, which may be beyond the reach of some research institutions or individual researchers. Despite these challenges and limitations, GWAS remains an indispensable tool in modern genetics and genomics research. Continued technological innovation, methodological improvement, and interdisciplinary collaboration are expected to overcome these obstacles and allow the potential of GWAS in genetic research to be exploited more effectively.

3.2 Detection of rare variants and small effect variants in GWAS

Detecting rare variants and variants of small effect presents specific challenges in genome-wide association studies (GWAS). These challenges arise primarily from the low minor allele frequencies (MAFs) of these variants and the subtle effects they have on traits. Because rare variants occur at low frequencies in the population, traditional single-variant analysis methods are often underpowered at the sample sizes typical of next-generation sequencing (NGS) studies. Additionally, as sample size increases, the multiple-testing burden of single rare variant analysis also increases, because more distinct rare variant sites are detected. Obtaining adequate power for single-variant rare variant analysis therefore often requires extremely large sample sizes, which are frequently impractical or economically unfeasible. For this reason, rare variants are typically analyzed with “aggregate” testing, in which the identified variants are tested collectively according to their physical overlap with predefined genomic regions.
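The aggregate strategy described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the procedure of any cited study or software package. It assumes genotypes coded as 0/1/2 copies of the minor allele, regions given as hypothetical 1-based inclusive intervals, a per-individual burden score equal to the number of rare alleles in a region, and a simple logistic score test with no covariates (real analyses would adjust for covariates such as ancestry). The MAF cutoff is raised to 0.2 in the example only because the toy sample is tiny.

```python
import math
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]   # (chromosome, start, end), 1-based inclusive
Variant = Tuple[str, int]       # (chromosome, position)


def group_variants(variants: List[Variant],
                   regions: Dict[str, Region]) -> Dict[str, List[int]]:
    """Map each region name to the indices of the variants lying inside it."""
    groups: Dict[str, List[int]] = {name: [] for name in regions}
    for idx, (chrom, pos) in enumerate(variants):
        for name, (r_chrom, start, end) in regions.items():
            if chrom == r_chrom and start <= pos <= end:
                groups[name].append(idx)
    return groups


def allele_freq(genotypes: List[List[int]], j: int) -> float:
    """Frequency of the counted (assumed minor) allele at variant j."""
    return sum(row[j] for row in genotypes) / (2 * len(genotypes))


def burden_scores(genotypes: List[List[int]], variant_idx: List[int],
                  maf_cutoff: float = 0.01) -> List[int]:
    """Per-individual count of rare alleles across the variants of one region."""
    rare = [j for j in variant_idx if 0 < allele_freq(genotypes, j) < maf_cutoff]
    return [sum(row[j] for j in rare) for row in genotypes]


def burden_score_test(burden: List[int], phenotype: List[int]) -> Tuple[float, float]:
    """Score test for logistic regression of a 0/1 phenotype on the burden score.

    No covariates are included; the statistic is compared with a chi-square
    distribution with 1 degree of freedom. Returns (statistic, p-value).
    """
    n = len(phenotype)
    y_bar = sum(phenotype) / n
    b_bar = sum(burden) / n
    score = sum((b - b_bar) * (y - y_bar) for b, y in zip(burden, phenotype))
    info = y_bar * (1 - y_bar) * sum((b - b_bar) ** 2 for b in burden)
    if info == 0:  # no variation in burden or phenotype: nothing to test
        return 0.0, 1.0
    stat = score ** 2 / info
    # Upper tail of chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


# Toy data (hypothetical): 4 variants, 2 regions, 8 individuals (4 cases, 4 controls)
variants = [("chr1", 100), ("chr1", 150), ("chr1", 900), ("chr2", 120)]
regions = {"GENE_A": ("chr1", 50, 200), "GENE_B": ("chr2", 100, 300)}
genotypes = [
    [1, 0, 2, 0], [0, 1, 2, 0], [1, 0, 1, 0], [0, 0, 1, 1],  # cases
    [0, 0, 2, 0], [0, 0, 1, 0], [0, 0, 2, 1], [0, 0, 1, 0],  # controls
]
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]

groups = group_variants(variants, regions)
print(groups)  # {'GENE_A': [0, 1], 'GENE_B': [3]}
burden = burden_scores(genotypes, groups["GENE_A"], maf_cutoff=0.2)
stat, p = burden_score_test(burden, phenotype)
print(f"GENE_A burden statistic = {stat:.2f}, p = {p:.3f}")
```

With only eight individuals the chi-square approximation is unreliable, so the p-value here only illustrates the mechanics. Collapsing a region into a single score helps most when the rare variants in it act in the same direction; variance-component tests such as SKAT are designed for regions where effects differ in direction or size.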
Aggregate testing requires a clear definition of the set of variants suitable for analysis, typically by defining genomic regions, such as genes, into which overlapping rare variants are grouped, and it is particularly suitable for large-scale untargeted scans such as whole-exome or whole-genome sequencing (Chen et al., 2022). Although many aggregate rare variant analysis methods have been developed, they fall mainly into two broad categories: burden tests and variance-component (or "kernel") tests. For example, the set-based Sequence Kernel Association Test (SKAT) and its extensions are widely used in aggregate rare variant analysis. To address the challenge of rare variants, researchers are also working to combine publicly available whole-genome sequencing
