
Bioscience Methods 2024, Vol.15
http://bioscipublisher.com/index.php/bm
© 2024 BioSciPublisher, an online publishing platform of Sophia Publishing Group. All Rights Reserved.

Sophia Publishing Group (SPG), founded in British Columbia, Canada, is a multilingual publisher. BioSciPublisher, operated by Sophia Publishing Group (SPG), is an international Open Access publishing platform that publishes scientific journals in the field of life science.

Publisher: Sophia Publishing Group
Edited by: Editorial Team of Bioscience Methods
Email: edit@bm.bioscipublisher.com
Website: http://bioscipublisher.com/index.php/bm
Address: 11388 Stevenston Hwy, PO Box 96016, Richmond, V7A 5J5, British Columbia, Canada

Bioscience Methods (ISSN 1925-1920) is an open access, peer-reviewed journal published online by BioSciPublisher. The journal publishes the latest research articles, letters and reviews in all areas of bioscience. Topics include (but are not limited to) technology reviews, technique know-how, lab tools, statistical software and modifications of known technologies, as well as case studies on technologies for gene discovery, function validation and genetic transformation.

All the articles published in Bioscience Methods are Open Access, and are distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. BioSciPublisher uses the CrossCheck service to identify academic plagiarism through the world’s leading plagiarism prevention tool, iParadigms, and to protect the original authors’ copyrights.

Latest Content

Rapid Detection of Rice Fragrance Allele badh2-E7 by Recombinant Polymerase Amplification (RPA)
Jihua Zhou, Anpeng Zhang, Wenke Xu, Can Cheng, Fuan Niu, Bin Sun, Liming Cao, Jianming Zhang, Huangwei Chu
Bioscience Methods, 2024, Vol. 15, No. 1

The Role and Challenges of Genome-wide Association Studies in Revealing Crop Genetic Diversity
Danyan Ding
Bioscience Methods, 2024, Vol. 15, No. 2

Unveiling the Mechanism of Proprioception in Primates: The Application of Task-Driven Neural Network Models
Natasha Liu
Bioscience Methods, 2024, Vol. 15, No. 3

New Methods for Predicting Drug Molecule Activity Using Deep Learning
Jie Zhang
Bioscience Methods, 2024, Vol. 15, No. 4

AI Based Drug Screening Process: From Data Mining to Candidate Drug Validation
Wei Wang
Bioscience Methods, 2024, Vol. 15, No. 5

Bioscience Method 2024, Vol.15, No.1, 1-8
http://bioscipublisher.com/index.php/bm

Research Article Open Access

Rapid Detection of Rice Fragrance Allele badh2-E7 by Recombinant Polymerase Amplification (RPA)

Jihua Zhou 1*, Anpeng Zhang 1*, Wenke Xu 2, Can Cheng 1, Fuan Niu 1, Bin Sun 1, Liming Cao 1, Jianming Zhang 1, Huangwei Chu 1
1 Institute of Crop Breeding and Cultivation, Shanghai Academy of Agricultural Sciences, Shanghai, 201403, China
2 Shanghai Chongming Sanxing Town Agricultural Comprehensive Technology Extension Service Center, Shanghai, 202152, China
* These authors contributed equally to this work
Corresponding authors: zhangjianming@saas.sh.cn; chuhuangwei@saas.sh.cn
Bioscience Method, 2024, Vol.15, No.1 doi: 10.5376/bm.2024.15.0001
Received: 23 Jan., 2024
Accepted: 26 Jan., 2024
Published: 31 Jan., 2024
Copyright © 2024 Zhou et al. This article was first published in Molecular Plant Breeding in Chinese, and is translated and published here in English with authorization, under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article:
Zhou J.H., Zhang A.P., Xu W.K., Cheng C., Niu F.A., Sun B., Cao L.M., Zhang J.M., and Chu H.W., 2024, Rapid detection of rice fragrance allele badh2-E7 by recombinant polymerase amplification (RPA), Bioscience Method, 15(1): 1-8 (doi: 10.5376/bm.2024.15.0001)

Abstract Fragrant rice is highly favored by consumers for its strong fragrance. The fragrance is mainly caused by loss-of-function mutations of the Betaine aldehyde dehydrogenase 2 (Badh2) gene in rice. Badh2-E7, which carries an 8 bp deletion and a 3 bp substitution in exon 7 of Badh2, is the main mutant allele used in fragrant rice breeding. In this study, a genotyping method named RPA-badh2-E7 was designed for the Badh2-E7 allele.
This method is rapid (amplification is completed in 5 min), sensitive (100-fold more sensitive than conventional PCR), tolerant of amplification conditions (25 °C~45 °C), and independent of a PCR thermocycler (a single thermostatic incubator is sufficient for amplification). It greatly improves the efficiency of molecular marker-assisted selection of rice fragrance genes and the breeding of fragrant rice varieties.

Keywords Rice; Fragrant gene; badh2-E7; RPA

Rice (Oryza sativa L.) is the most important grain crop in China. As society develops, consumer expectations for rice quality are rising (Zhang et al., 2020). Aroma is an important characteristic of rice quality. Aromatic rice is not only fragrant but also rich in nutrients such as vitamins and aromatic amino acids, and it is very popular with consumers (Peng et al., 2018). The molecular regulation of aroma traits and the breeding of new fragrant rice varieties are receiving increasing attention from scientists. As early as the 1980s, scientists studied the components, formation mechanisms, and genetic characteristics of rice aroma, and found that mutations in the Badh2 (Betaine aldehyde dehydrogenase 2) gene, which encodes betaine aldehyde dehydrogenase, were related to the formation of fragrant rice. Loss of Badh2 function leads to the accumulation of 2-acetyl-1-pyrroline (2-AP) (Yang et al., 2008; Fukuda et al., 2014), a substance directly related to the aroma of rice, resulting in the rich aroma of fragrant rice (Chen et al., 2008; Kovach et al., 2009). The Badh2 gene is located on chromosome 8 and consists of 15 exons and 14 introns.
To date, multiple allelic variants of Badh2 have been reported, with mutations occurring in different regions such as exons 1, 2, 4, 5, 7, 10, 12, 13, and 14 of Badh2, the splice site of intron 1, and the 5'UTR (Amarawathi et al., 2008; Shi et al., 2008; Shao et al., 2011; Shao et al., 2013; Ootsuka et al., 2014; Shi et al., 2014; He and Park, 2015; Cheng et al., 2018). The 8 bp deletion and 3 bp substitution in exon 7 of Badh2 is the main type of variation (Figure 1), and it has been applied in the breeding of many fragrant rice varieties (Sun et al., 2021).

Figure 1 Design of RPA amplification primer for rice fragrance gene badh2-E7

In the traditional breeding of new fragrant rice varieties, researchers often identify aroma by the hot water method (Hu et al., 2006; Yang et al., 2010), the KOH method (Sood and Siddiq, 1978), the chewing method (Berner and Hoff, 1986), or instrumental measurement (Li J.H., 2008, China Rice, (2): 8-12.). These methods are subjective, or suffer from low accuracy, poor repeatability, and low efficiency. Simple, accurate, and rapid identification of aroma in fragrant rice is therefore a major focus of current research (Yan et al., 2015). With the in-depth study of aroma genes, molecular markers have gradually become the main means of aroma detection. At present, the main method for detecting aroma genes is PCR.

Recombinase polymerase amplification (RPA) is an emerging and rapidly developing isothermal nucleic acid amplification technology. Compared with traditional PCR, RPA offers simple operation, short amplification time, and no need for specialized instruments (Piepenburg et al., 2006; Zhang et al., 2022; Banerjee et al., 2023). Recently, Banerjee et al. (2023) designed a detection method for the Badh2-E7 allele using RPA technology. However, the primers they designed spanned the 8 bp deletion, so the method could only indicate whether a rice line carried the Badh2-E7 allele and could not distinguish between lines homozygous and heterozygous for it. This study designed a co-dominant marker, RPA-badh2-E7, for the Badh2-E7 allele based on RPA technology.
This method has four major advantages: (1) it saves time: amplification can be completed in 5 minutes, whereas conventional PCR requires 1.5 hours; (2) it is highly sensitive, 100 times more sensitive than conventional PCR; (3) it does not require strict amplification conditions: amplification can be performed at 25 °C~45 °C; (4) it does not require a PCR instrument: a single constant-temperature incubator is sufficient. Using RPA-badh2-E7, 36 core restoring lines of hybrid japonica rice and 1 new hybrid japonica rice variety, Shenyou R1, from the Rice Center Resource Library of Shanghai Academy of Agricultural Sciences were genotyped. The method effectively distinguished homozygous Badh2, homozygous Badh2-E7, and heterozygous lines. RPA-badh2-E7 can effectively identify badh2-E7, which will greatly improve the efficiency of molecular marker-assisted selection of rice aroma genes and the breeding of fragrant rice varieties in the future.

1 Results and Analysis
1.1 Design and quality validation of RPA amplification primers
In order to improve the efficiency of molecular marker-assisted selection for the aroma gene Badh2 and accelerate the breeding of new aromatic rice varieties, this study compared sequences at the functional variation site in the 7th exon of Badh2 (Figure 1) and designed three pairs of primers according to the principles of RPA primer design (Figure 1; Table 1).

Table 1 Primer sequences for RPA amplification
Name           Sequence (5'-3')
E7-RPA-80F     CATTTACTGGGAGTTATGAAACTGGTAA
E7-RPA-80R     AGAAATTTGGAAACAAACCTTAACCATAG
E7-RPA-122F    GATATTCCTCTCAATACATGGTTTATGTTT
E7-RPA-122R    AGAAATTTGGAAACAAACCTTAACCATAG
E7-RPA-214F    AATGATATTCCTCTCAATACATGGTTTAT
E7-RPA-214R    AATTCTAAAAAGTAAAGGAGTTAAAAGAAAAG

In order to verify the accuracy of the designed RPA primers for detecting badh2-E7, the japonica rice restoration line Shen CR1 (variety right number: CNA20191001690), selected by our research group and carrying the badh2-E7 aroma allele, was used as the positive control, while the japonica rice restoration line Shen Hui 26, which lacks the badh2-E7 aroma allele, was used as the negative control. Leaf DNA of Shen CR1 and Shen Hui 26 was extracted using the CTAB method and amplified with the three pairs of designed RPA primers. Agarose gel electrophoresis showed that E7-RPA-80 produced a specific DNA fragment only in Shen Hui 26, which lacks Badh2-E7, and no amplification product in Shen CR1, which carries Badh2-E7 (Figure 2). E7-RPA-122 and E7-RPA-214 produced specific amplification fragments in both fragrant and non-fragrant rice. Compared with E7-RPA-214, E7-RPA-122 showed better polymorphism (Figure 2); therefore E7-RPA-122 was selected as the marker for testing the Badh2-E7 allele.

Figure 2 Quality verification of RPA amplification primers
Note: M: DL2000 Marker; P1: Shen CR1; P2: Shen Hui 26

1.2 Sensitivity analysis of RPA amplification technology for detecting badh2-E7
Leaf DNA extracted from the japonica rice restoration line Shen CR1, which carries the Badh2-E7 aroma allele, and from the japonica rice restoration line Shen Hui 26, which lacks it, was diluted to an initial concentration of 10 ng/μL and then serially diluted in 10-fold steps.
Amplification was then performed using conventional PCR and RPA, and the sensitivity of the two methods was compared. The lowest detectable DNA template concentration was 10⁻² ng/μL for PCR and 10⁻⁴ ng/μL for RPA. RPA amplification is therefore about 100 times more sensitive than PCR (Figure 3).
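The 100-fold figure follows directly from the two detection limits in the 10-fold dilution series. The short Python sketch below reproduces that arithmetic. The number of dilution steps and the per-lane band calls are hypothetical values chosen only to match the limits reported above (10⁻² ng/μL for PCR, 10⁻⁴ ng/μL for RPA); they are not read from Figure 3, and the function names are illustrative.

```python
def dilution_series(start_ng_per_ul, fold, steps):
    """Template concentrations of a serial dilution, highest first."""
    return [start_ng_per_ul / fold ** i for i in range(steps)]


def limit_of_detection(concentrations, band_visible):
    """Lowest concentration at which a band was still scored as visible."""
    detected = [c for c, seen in zip(concentrations, band_visible) if seen]
    if not detected:
        raise ValueError("no dilution produced a visible band")
    return min(detected)


# 10 ng/uL starting template, diluted 10-fold at each step.
# Seven steps is an assumption made for illustration.
series = dilution_series(10.0, 10.0, 7)

# Hypothetical band calls (True = band visible), chosen to match the
# detection limits reported in the text.
pcr_calls = [True, True, True, True, False, False, False]
rpa_calls = [True, True, True, True, True, True, False]

pcr_lod = limit_of_detection(series, pcr_calls)
rpa_lod = limit_of_detection(series, rpa_calls)
fold_gain = pcr_lod / rpa_lod

print(f"PCR limit: {pcr_lod:.0e} ng/uL")
print(f"RPA limit: {rpa_lod:.0e} ng/uL")
print(f"RPA is about {fold_gain:.0f} times more sensitive")
```

Because the series is 10-fold, the sensitivity gain is resolved only to the nearest power of ten, which is consistent with the "about 100 times" wording.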

Figure 3 Evaluation of the sensitivity of RPA-Badh2-E7
Note: M: DL2000 Marker; P1: Shen CR1 (Fragrant Rice); P2: Shen Hui 26 (Non-fragrant Rice). From left to right, the template concentration of the first P1 and P2 was 10 ng/μL, and each subsequent pair was diluted a further 10-fold.

1.3 Optimization of RPA amplification reaction temperature and time
In order to further improve the efficiency of Badh2-E7 genotype detection, we optimized the RPA amplification reaction conditions. First, amplification was compared at different temperatures, and specific DNA bands were amplified throughout the range of 25 °C~45 °C (Figure 4A). We therefore used 39 °C, the amplification temperature recommended in the product manual of the RPA isothermal amplification kit (TABAS03KIT, TwistDx), for subsequent experiments. Next, we compared different RPA reaction times and found that clear DNA bands could be obtained after 5 minutes of amplification at 39 °C (Figure 4B). Conventional PCR amplification often takes about 2 hours to complete (Cheng et al., 2018). This result indicates that, compared with conventional PCR, RPA significantly shortens the amplification time and does not require expensive instruments such as PCR machines, greatly improving the efficiency and convenience of badh2-E7 detection.

Figure 4 Optimization of the temperature and time of the RPA amplification reaction (A: Comparison of RPA amplification effect under different temperature conditions; B: Comparison of RPA amplification effect under different reaction times.)
Note: M: DL2000 Marker; P1: Shen CR1; P2: Shen Hui 26

1.4 Application of RPA-badh2-E7 method for detecting badh2-E7
Using the RPA-badh2-E7 method, 36 core restoring lines of hybrid japonica rice and 1 hybrid japonica rice, Shenyou R1 (Shanghai Shendao 2022002), from the Rice Center Resource Library of Shanghai Academy of Agricultural Sciences were genotyped. Five restoring lines carrying the aroma gene Badh2-E7 were identified: 'Shenfan 24', 'Shenfan 30', 'Shenfan 33', 'Shenfan 43', and 'ShenCR1' (Figure 5). The Badh2-E7 detection marker designed by Banerjee et al. (2023) using RPA technology cannot distinguish between homozygous and heterozygous Badh2-E7 genotypes. In this study, we tested the hybrid japonica rice 'Shenyou R1' (‘Shen23A’בShen CR1’) and found that two bands were amplified in this heterozygous plant, indicating that the RPA-badh2-E7 marker designed in this study is co-dominant (Figure 5).
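The practical meaning of a co-dominant marker is that each of the three genotypes gives a distinct banding pattern, so a gel lane can be translated into a genotype call by a simple lookup. The Python sketch below illustrates this logic only. The band labels "wild_type" and "deletion" are hypothetical names for the two amplicons, and the assumption that the badh2-E7 product is the shorter band is inferred from the 8 bp deletion rather than stated in the text.

```python
# Hypothetical band labels for the two amplicons seen on the gel:
#   "wild_type" = product from the functional Badh2 allele
#   "deletion"  = product from the badh2-E7 allele (assumed shorter,
#                 because of the 8 bp deletion in exon 7)
GENOTYPE_BY_BANDS = {
    frozenset({"wild_type"}): "homozygous Badh2",
    frozenset({"deletion"}): "homozygous badh2-E7",
    frozenset({"wild_type", "deletion"}): "heterozygous",
}


def call_genotype(bands):
    """Translate the set of bands observed in one lane into a genotype call."""
    return GENOTYPE_BY_BANDS.get(
        frozenset(bands), "no call (failed or unexpected amplification)"
    )


# Band patterns consistent with the results described above.
lanes = {
    "Shen Hui 26": {"wild_type"},
    "Shen CR1": {"deletion"},
    "Shenyou R1": {"wild_type", "deletion"},
}

for name, bands in lanes.items():
    print(f"{name}: {call_genotype(bands)}")
```

A dominant marker, by contrast, would map both "homozygous badh2-E7" and "heterozygous" to the same pattern, which is the limitation noted for the earlier RPA assay.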

Figure 5 Identification of rice fragrant gene Badh2-E7 by RPA amplification
Note: M: DL2000 Marker; 1: Shenfan1; 2: Shenfan2; 3: Shenfan3; 4: Shenfan4; 5: Shenfan6; 6: Shenfan7; 7: Shenfan9; 8: Shenfan10; 9: Shenfan11; 10: Shenfan12; 11: Shenfan13; 12: Shenfan14; 13: Shenfan16; 14: Shenfan17; 15: Shenfan18; 16: Shenfan19; 17: Shenfan21; 18: Shenfan22; 19: Shenfan23; 20: Shenfan24; 21: Shenfan25; 22: Shenhui26; 23: Shenfan27; 24: Shenfan28; 25: Shenfan29; 26: Shenfan30; 27: Shenfan31; 28: Shenfan32; 29: Shenfan33; 30: Shenfan34; 31: Shenfan35; 32: Shenfan36; 33: Shenfan37; 34: Shenfan38; 35: Shenfan43; 36: ShenCR1; 37: Shenyou R1

2 Discussion
The cultivation and breeding of fragrant rice has a long history. From ancient times to the present, many high-quality rice varieties with unique aromas have been cultivated in China and abroad, such as India's "Basmati type fragrant rice", Thailand's "jasmine fragrance type", Japan's "Gongxiang", the United States' "Jasmine85" and "Della", and China's "Northeast rice flower fragrance", "Yunnan crab valley", and "Guangxi Jingxi fragrant glutinous rice" (Jain et al., 2004; Pachauri et al., 2010; Zheng et al., 2012). However, these varieties are regionally restricted, or have low yields and weak resistance (Qi et al., 2020). In recent years, with rising living standards, the demand for high-quality fragrant rice has increased sharply. Therefore, the breeding of more and better fragrant rice varieties has attracted increasing attention. Accurate and rapid identification of aroma traits is an important step in the breeding of fragrant rice.
The chewing method (Berner and Hoff, 1986) and the KOH soaking method (Sood and Siddiq, 1978) are commonly used to identify aroma in traditional breeding, but aroma itself is strongly influenced by external environmental conditions, and subjective differences among appraisers lead to low accuracy. In recent years, with the development of molecular biology and sequencing technology, molecular marker-assisted selection has been widely applied in the genetic breeding of fragrant rice. At present, the main method for detecting aroma genes using molecular markers is conventional PCR. However, conventional PCR requires expensive instruments, has a relatively long amplification time (about 2 hours), and demands considerable technical skill from operators. These factors limit, to some extent, the application of this technology in breeding (Yang and Yu, 2019). RPA, as an emerging isothermal nucleic acid amplification technology, has higher sensitivity and specificity than traditional methods. It does not require special equipment and can be performed at lower temperatures (35 °C~40 °C) or even at room temperature (Zhang et al., 2022; Banerjee et al., 2023). It is a molecular detection method that is expected to replace PCR. Banerjee et al. (2023) designed a detection method for the Badh2-E7 allele using RPA technology, which can quickly and sensitively detect the presence of the Badh2-E7 allele but cannot distinguish between lines homozygous and heterozygous for it. Unlike the study by Banerjee et al. (2023), this study designed an RPA-badh2-E7 detection method based on the 8 bp deletion in the 7th exon of the aroma gene Badh2.
The amplification reaction can be completed in as little as 5 minutes, which is much shorter than the 30-minute RPA amplification time of Banerjee et al. (2023), and it eliminates the need for RPA enzyme inactivation at 65 °C for 10 minutes. This study also examined the temperature of RPA amplification, and the results showed that the reaction can be carried out at 25 °C~45 °C. The RPA-badh2-E7 method created in this study can effectively distinguish homozygous Badh2, homozygous Badh2-E7, and heterozygous lines. Compared with

conventional PCR, the method significantly shortens the amplification time and is 100 times more sensitive. Additionally, this method does not require an expensive PCR instrument, and amplification can be performed at temperatures ranging from 25 °C to 45 °C. Using RPA-badh2-E7, 36 core restoring lines of hybrid japonica rice and one new hybrid japonica rice variety, Shenyou R1, from the Rice Center Resource Library of Shanghai Academy of Agricultural Sciences were genotyped. The method effectively distinguished homozygous Badh2, homozygous Badh2-E7, and heterozygous genotypes, greatly improving the efficiency of molecular marker-assisted selection of rice aroma genes and the breeding of fragrant rice varieties.

3 Materials and Methods
3.1 Research materials
All rice samples used in this study are backbone parents or varieties of hybrid japonica rice selected by the Rice Center Heterosis Utilization Research Group of the Crop Breeding and Cultivation Research Institute of Shanghai Academy of Agricultural Sciences, including 36 restoring lines and 1 hybrid japonica rice variety. The RPA isothermal amplification kit (TABAS03KIT) was purchased from TwistDx Co., Ltd. The RPA amplification primers were synthesized by Biotechnology (Shanghai) Co., Ltd.

3.2 Extraction of rice DNA
DNA was extracted from rice leaves using a modified CTAB method (Murray and Thompson, 1980), and the extracted DNA was quantified using a NanoDrop™ 2000 micro spectrophotometer.
A leaf segment about 1.5 cm long was placed in a 2 mL centrifuge tube, and 750 μL of 1.5× CTAB solution (1.5% CTAB, 75 mmol/L Tris-HCl, 15 mmol/L EDTA, 1.05 mol/L NaCl, pH 8.0) and a steel ball 6 mm in diameter were added. The sample was ground in a rapid plant tissue grinder at 65 Hz for 90 seconds and then incubated in a 65 °C water bath for 45 minutes. After 500 μL of chloroform was added, the tube was shaken vigorously and centrifuged at 12 000 r/min for 8 minutes. Then 500 μL of supernatant was transferred to a new 1.5 mL centrifuge tube, mixed with an equal volume of anhydrous ethanol by inversion, and kept in a -20 °C freezer for 1 hour. After centrifugation at 12 000 r/min for 8 minutes, the supernatant was discarded, and the pellet was air dried or dried at 37 °C and then dissolved in 500 μL of ddH2O. The dissolved DNA was stored in a -20 °C freezer for later use.

3.3 RPA reaction system
The RPA amplification system contained 29.5 μL of primer-free rehydration buffer, 2.4 μL each of the 10 μM forward and reverse primers, 1 μL of DNA template, and 12.2 μL of sterile ddH2O (47.5 μL in total). The mixture was gently mixed with a pipette and transferred to a TwistAmp reaction tube containing freeze-dried enzyme powder. Then 2.5 μL of 280 mmol/L magnesium acetate was added, bringing the total volume to 50 μL, and the reaction was mixed well and amplified at constant temperature. Unless otherwise specified, amplification was carried out at 39 °C for 20 minutes.

Authors’ contributions
ZJH designed and carried out the experiments of this study; ZJH and ZAP completed the data analysis and wrote the first draft of the paper; XWK, CC, NFA, SB, and ZJM participated in experimental design and analysis of experimental results; CLM and CHW conceived and led the project, guiding experimental design, data analysis, paper writing, and revision. All authors read and approved the final manuscript.
Acknowledgements This study was jointly funded by the Shanghai Rice Industry Technology System Construction Project (Hu Nong Ke Chan Zi (2023) No. 3), the Shanghai Science and Technology Innovation Action Plan Agricultural Science and Technology Field Project (23N61900100), and the Shanghai Science and Technology Innovation Action Plan Natural Science Foundation Project (23ZR1455600).

References
Amarawathi Y., Singh R., Singh A.K., Singh V.P., Mohapatra T., Sharma T.R., and Singh N.K., 2008, Mapping of quantitative trait loci for basmati quality traits in rice (Oryza sativa L.), Molecular Breeding, 21: 49-65. https://doi.org/10.1007/s11032-007-9108-8
Banerjee A., Bharti S., Kumar J., Sar P., Priyamedha, Mandal N.P., Sarkar S., and Roy S., 2023, Recombinase polymerase amplification based rapid detection of aroma gene in rice, Rice Science, 30(2): 96-99. https://doi.org/10.1016/j.rsci.2022.10.001
Berner D.K., and Hoff B.J., 1986, Inheritance of scent in American long grain rice, Crop Science, 26(5): 876-878. https://doi.org/10.2135/cropsci1986.0011183X002600050008x
Chen S.H., Yang Y., Shi W.W., Ji Q., He F., Zhang Z.D., Cheng Z.K., Liu X.N., and Xu M.L., 2008, Badh2, encoding betaine aldehyde dehydrogenase, inhibits the biosynthesis of 2-acetyl-1-pyrroline, a major component in rice fragrance, Plant Cell, 20(7): 1850-1861. https://doi.org/10.1105/tpc.108.058917
Cheng C., Yang J., Zhou J.H., Niu F.A., Hu X.J., Tu R.J., Luo Z.Y., Wang X.Q., Cao L.M., and Chu H.W., 2018, Identification of fragrant parents of japonica hybrid rice (Oryza sativa L.) based on functional molecular marker, Fenzi Zhiwu Yuzhong (Molecular Plant Breeding), 16(17): 5653-5659.
Fukuda T., Takeda T., and Yoshida S., 2014, Comparison of volatiles in cooked rice with various amylose contents, Food Science and Technology Research, 20(6): 1251-1259. https://doi.org/10.3136/fstr.20.1251
He Q., and Park Y.J., 2015, Discovery of a novel fragrant allele and development of functional markers for fragrance in rice, Molecular Breeding, 35(11): 217-226. https://doi.org/10.1007/s11032-015-0412-4
Hu P.S., Tang S.Q., Gu H.H., and Wang X.Y., 2006, Genetic research and breeding application of fragrance in rice, Zhongguo Daomi (China Rice), (6): 1-5.
Jain S., Jain R.K., and McCouch S.R., 2004, Genetic analysis of Indian aromatic and quality rice (Oryza sativa L.) germplasm using panels of fluorescently-labeled microsatellite markers, Theor. Appl. Genet., 109(5): 965-977. https://doi.org/10.1007/s00122-004-1700-2
Kovach M.J., Calingacion M.N., Fitzgerald M.A., and McCouch S.R., 2009, The origin and evolution of fragrance in rice (Oryza sativa L.), Proc. Natl. Acad. Sci. USA, 106(34): 14444-14449. https://doi.org/10.1073/pnas.0904077106
Murray M.G., and Thompson W.F., 1980, Rapid isolation of high molecular weight plant DNA, Nucleic Acids Res., 8(19): 4321-4325. https://doi.org/10.1093/nar/8.19.4321
Ootsuka K., Takahashi I., Tanaka K., Itani T., Tabuchi H., Yoshihashi T., Tonouchi A., and Ishikawa R., 2014, Genetic polymorphisms in Japanese fragrant landraces and novel fragrant allele domesticated in Northern Japan, Breed Sci., 64(2): 115-124. https://doi.org/10.1270/jsbbs.64.115
Pachauri V., Singh M.K., Singh A.K., Singh S., Shakeel N.A., Singh V.P., and Singh N.K., 2010, Origin and genetic diversity of aromatic rice varieties, molecular breeding and chemical and genetic basis of rice aroma, Journal of Plant Biochemistry and Biotechnology, 19(2): 127-143. https://doi.org/10.1007/BF03263333
Peng B., Kong D.Y., Song X.H., Li H.L., He L.L., Gong A.D., Sun Y.F., Pang R.H., Liu L., Li J.T., Zhou Q.Y., Huang Y.Q., Duan B., Song S.Z., and Yuan H.Y., 2018, A method for detection of main metabolites in aromatic rice seeds, Agricultural Biotechnology, 7(1): 112-116.
Piepenburg O., Williams C.H., Stemple D.L., and Armes N.A., 2006, DNA detection using recombination proteins, PLoS Biol, 4(7): e204. https://doi.org/10.1371/journal.pbio.0040204
Qi Y.B., Zhang L.X., Wang L.Y., Song J., and Wang J.J., 2020, CRISPR/Cas9 targeted editing for the fragrant gene Badh2 in rice, Zhongguo Nongye Kexue (Scientia Agricultura Sinica), 53(8): 1501-1509.
Shao G.N., Tang A., Tang S.Q., Luo J., Jiao G.A., Wu J.L., and Hu P.S., 2011, A new deletion mutation of fragrant gene and the development of three molecular markers for fragrance in rice, Plant Breeding, 130(2): 172-176. https://doi.org/10.1111/j.1439-0523.2009.01764.x
Shao G.N., Tang S.Q., Chen M.L., Wei X.J., He J.W., Luo J., Jiao G.A., Hu Y.C., Xie L.H., and Hu P.S., 2013, Haplotype variation at Badh2, the gene determining fragrance in rice, Genomics, 101(2): 157-162. https://doi.org/10.1016/j.ygeno.2012.11.010
Shi W.W., Yang Y., Chen S.H., and Xu M.L., 2008, Discovery of a new fragrance allele and the development of functional markers for the breeding of fragrant rice varieties, Molecular Breeding, 22: 185-192. https://doi.org/10.1007/s11032-008-9165-7
Shi Y.Q., Zhao G.C., Xu X.L., and Li J.Y., 2014, Discovery of a new fragrance allele and development of functional markers for identifying diverse fragrant genotypes in rice, Molecular Breeding, 33(3): 701-708. https://doi.org/10.1007/s11032-013-9986-x
Sood B.G., and Siddiq E.A., 1978, A rapid technique for scent determination in rice, Indian Journal of Genetics and Plant Breeding, 38(2): 268-271.
Sun P.Y., Zhang W.H., Shu F., He Q., Zhang L., Peng Z.R., and Deng H.F., 2021, Analysis of mutation sites of OsBADH2 gene in fragrant rice and development of related functional marker, Shengwu Jishu Tongbao (Biotechnology Bulletin), 37(4): 1-7.

Yan Y., Zhu G.M., Zhang L.X., Wan C.Z., Cao L.M., Zhao Z.P., and Wu S.J., 2015, Development of molecular markers for fragrant gene and its application, Xibei Zhiwu Xuebao (Acta Botanica Boreali-Occidentalia Sinica), 35(2): 269-274.
Yang D.S., Lee K.S., Jeong O.Y., Kim K.J., and Kays S.J., 2008, Characterization of volatile aroma compounds in cooked black rice, J. Agric. Food Chem., 56(1): 235-240. https://doi.org/10.1021/jf072360c
Yang J.I., and Yu G.Y., 2019, A loop-mediated isothermal amplification assay for the plant-parasitic nematode Aphelenchoides besseyi in rice seedlings, Journal of Nematology, 51(1): 1-11. https://doi.org/10.21307/jofnem-2019-080
Yang Y., Xie Z.Z., Wang K., and Yan Y.M., 2010, Advance in genetic studies on aromatic rice, Shoudu Shifan Daxue Xuebao (Journal of Capital Normal University (Natural Sciences Edition)), 31(3): 24-29.
Zhang A.P., Gao Y., Li Y.Y., Ruan B.P., Yang S.L., Liu C.L., Zhang B., Jiang H.Z., Fang G.N., Ding S.L., Jahan N., Xie L.H., Dong G.J., Xu Z.J., Gao Z.Y., Guo L.B., and Qian Q., 2020, Genetic analysis for cooking and eating quality of super rice and fine mapping of a novel locus qGC10 for gel consistency, Front Plant Sci., 11: 342. https://doi.org/10.3389/fpls.2020.00342
Zhang A.P., Sun B., Zhang J.M., Cheng C., Zhou J.H., Niu F.A., Luo Z.Y., Yu L.Z., Yu C., Dai Y.T., Xie K.Z., Hu Q.Y., Qiu Y., Cao L.M., and Chu H.W., 2022, CRISPR/Cas12a coupled with recombinase polymerase amplification for sensitive and specific detection of Aphelenchoides besseyi, Front Bioeng Biotechnol., 10: 912959-912967. https://doi.org/10.3389/fbioe.2022.912959
Zheng J.T., Yang D.W., Dong L.F., You Q.R., Zheng Y., Tu S.H., and Zhou P., 2012, Inheritance and breeding actuality of new quasi-aromatic rice (Oryza sativa L.), Fujian Nongye Xuebao (Fujian Journal of Agricultural Sciences), 27(10): 1134-1138.

Bioscience Method 2024, Vol.15, No.1, 8-19
http://bioscipublisher.com/index.php/bm

Review and Progress Open Access

The Role and Challenges of Genome-wide Association Studies in Revealing Crop Genetic Diversity

Danyan Ding
Institute of Life Sciences, Zhejiang A&F University, Zhuji, 311800, China
Corresponding email: kendrading@hotmail.com
Bioscience Method, 2024, Vol.15, No.1 doi: 10.5376/bm.2024.15.0002
Received: 17 Dec., 2023
Accepted: 27 Jan., 2024
Published: 11 Feb., 2024
Copyright © 2024 Ding. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article:
Ding D.Y., 2024, The role and challenges of genome-wide association studies in revealing crop genetic diversity, Bioscience Method, 15(1): 8-19 (doi: 10.5376/bm.2024.15.0002)

Abstract Genome-wide association studies (GWAS) have achieved remarkable results in the study of crop genetic diversity, providing a powerful tool for crop improvement by identifying genetic markers and genes related to key agronomic traits. However, GWAS faces challenges such as the complexity of population structure, the difficulty of detecting rare and small-effect variants, and the complexity of interpreting results. This study discusses combining GWAS results with new technologies such as CRISPR/Cas9 gene editing. Integrating multi-omics data (such as transcriptomics and proteomics) with GWAS will improve the ability to dissect traits, deepen understanding of the complex mechanisms of trait formation, and accelerate crop trait improvement. This study also emphasizes the importance of protecting and rationally utilizing crop genetic resources, in the hope that GWAS will realize greater potential in crop genetic research and improvement and contribute to the sustainable development of agriculture.
Keywords Genome-wide association studies (GWAS); Crop improvement; Genetic diversity; Gene editing; Multi-omics integration

Genome-wide association studies (GWAS) are powerful genetic tools that allow scientists to identify genetic markers across the entire genome that are associated with specific traits. The approach rests on a basic assumption: that specific genetic variants differ in allele frequency between populations that differ in trait expression. By comparing genomic data from thousands of individuals, GWAS can reveal which genetic variants are associated with disease, physiological traits, or specific agricultural traits (such as yield and disease resistance). The importance of this approach lies in its ability to reveal the genetic basis of complex traits, that is, traits that may be influenced by multiple genes as well as by environmental factors.

Crop genetic diversity refers to the genetic variation within crop populations, including genetic differences between species, varieties and cultivars. This diversity is the result of biological evolution and the basis of agricultural production. Genetic diversity enables crops to adapt to environmental changes, resist pests and diseases, and improve the stability and sustainability of agricultural systems (Abdelraheem et al., 2021). In crop improvement, genetic diversity can be used to develop new varieties with high yield, high quality, stress resistance and other characteristics, to meet the growing demand for food and cope with the challenges posed by climate change.

The motivation for applying GWAS in crop genetic diversity research stems from the urgent need for crop trait improvement. With a growing population and limited resources, how to increase crop yields, improve crop quality, and enhance crop resistance to adversity has become a global challenge.
Although traditional breeding methods have achieved great success in the past few centuries, with the development of genetics and molecular biology, people have begun to seek more precise and efficient methods to explore the genetic potential of crops. GWAS provides a solution that not only rapidly identifies genes associated with key traits across a broad range of genetic backgrounds, but also reveals the genetic mechanisms underlying these traits. This is of great significance for guiding molecular-assisted breeding and achieving precise improvement of crop traits (Peng et al., 2022).

Specifically, the application of GWAS allows researchers to discover novel, advantageous genetic variants in a broad range of crop populations that may have been overlooked in traditional breeding. For example, in major food crops such as rice and wheat, GWAS has successfully identified multiple key genes or gene regions related to yield, disease resistance, and stress tolerance. These findings not only enrich our understanding of crop genetic diversity, but also provide new strategies for molecular-assisted selection and genetic improvement of crops. In addition, GWAS can help to discover valuable genetic resources in wild species and local varieties, which are crucial for enhancing crop adaptability and sustainable production.

Although GWAS has shown great potential in revealing crop genetic diversity, it also faces many challenges in application, including data complexity, the selection of analysis methods, the interpretation of results, and the effective use of genetic information. Future research therefore needs to innovate and improve at multiple levels to fully leverage the role of GWAS in crop genetic diversity research and to contribute to global agricultural production and crop improvement.

1 Overview of GWAS Technology
1.1 Basic principles and methods of GWAS
Genome-wide association studies (GWAS) are a method used to find genes or genomic regions that are associated with specific traits. The basic principle rests on a hypothesis: if a genetic variant (usually a single nucleotide polymorphism, SNP) is closely related to a specific trait, then individuals carrying this variant should share, to some degree, a common feature in that trait. GWAS identifies genetic variation associated with a trait by comparing the frequency of genetic markers in individuals or populations with different trait expressions.
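The frequency comparison described above can be illustrated with a minimal sketch. It assumes a case/control style design (for example, resistant versus susceptible plants) and uses a 2x2 allelic chi-square test with one degree of freedom; the function name and the allele counts are hypothetical and are not taken from any study cited here.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """2x2 allelic chi-square test (1 d.f.) on raw allele counts.

    Returns (chi2, p_value). All four counts are hypothetical inputs;
    every row and column total must be non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Illustrative counts: the alternative allele is more frequent
# in the resistant group (60/100) than in the susceptible group (30/100).
chi2, p_value = allelic_chi_square(60, 40, 30, 70)
print(f"chi2 = {chi2:.3f}, p = {p_value:.2e}")
```

A test of this kind ignores population structure and relatedness, so it only conveys the basic idea; the models discussed in Section 1.3 address those confounders.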
The GWAS method usually includes several key steps. The first is to collect a sufficiently large sample population with clear phenotypic differences in the trait of interest. A genome-wide scan is then conducted on these samples to record information on thousands of genetic markers (mainly SNPs). Finally, statistical methods are used to analyze the correlation between these genetic markers and the trait, in order to identify markers that are significantly associated with it (Hasan et al., 2021).

During the analysis, the statistical model used in GWAS can control for the potential confounding effects of population structure and genetic background, thereby improving the accuracy of the association signal. This step is critical because population structure (i.e., differences in genetic background) can lead to false-positive results. Once SNPs that are significantly associated with a trait are identified, researchers can further examine the genes near those SNPs to identify specific genes or genomic regions that may affect trait expression.

A major advantage of GWAS is that it does not rely on prior genetic knowledge and enables unbiased exploration across the entire genome. This means that GWAS can reveal previously unrecognized genes and genetic mechanisms that influence complex traits. However, the identified genetic markers usually require further experimental and functional studies to verify their actual impact on traits, including genetic engineering, gene editing, and phenotypic identification (Peng et al., 2022).

By integrating genetic and phenotypic data from large samples, GWAS provides a powerful method for understanding the genetic basis of complex traits. Despite challenges such as large data volumes, complex analyses, and difficult interpretation of results, GWAS has achieved remarkable results in multiple fields, especially human disease genetics, agriculture, and plant breeding.
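The final step of the workflow, testing each marker against the trait, can be sketched as a simple scan. The example below is a minimal illustration under stated assumptions: genotypes are coded as allele dosages (0/1/2), the trait is continuous, p-values use a normal approximation to the t distribution (reasonable only for large samples), and a Bonferroni threshold is used for significance. The Bonferroni choice, the function names and the toy data are all assumptions made here, not methods prescribed by the text.

```python
import math


def single_snp_regression(dosages, phenotypes):
    """Ordinary least squares of phenotype on allele dosage (0/1/2).

    Returns (slope, t_statistic, approximate_two_sided_p).
    The p-value uses a normal approximation to the t distribution.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("monomorphic SNP: filter it out before testing")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, phenotypes))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return slope, t_stat, p_value


def gwas_scan(genotype_table, phenotypes, alpha=0.05):
    """Test every SNP separately; flag hits at a Bonferroni threshold (assumed)."""
    threshold = alpha / len(genotype_table)
    results = []
    for snp_id, dosages in genotype_table.items():
        slope, _, p_value = single_snp_regression(dosages, phenotypes)
        results.append((snp_id, slope, p_value, p_value < threshold))
    return sorted(results, key=lambda row: row[2])


# Toy data: nine plants, one trait, two hypothetical SNPs.
phenotypes = [4.8, 5.1, 5.0, 6.1, 5.9, 6.2, 7.0, 7.2, 6.9]
genotype_table = {
    "snp_1": [0, 0, 0, 1, 1, 1, 2, 2, 2],  # tracks the trait closely
    "snp_2": [0, 1, 2, 0, 1, 2, 0, 1, 2],  # essentially unrelated
}
results = gwas_scan(genotype_table, phenotypes)
for snp_id, slope, p_value, significant in results:
    print(snp_id, round(slope, 3), f"{p_value:.3g}", significant)
```

With only nine samples the normal approximation is crude; the toy data are there to show the mechanics, not to model a realistic study size.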
1.2 GWAS data types and acquisition methods
The data required to conduct genome-wide association studies (GWAS) fall mainly into two categories: genetic data and phenotypic data. Genetic data describe an individual's genomic information, usually in the form of single nucleotide polymorphisms (SNPs), while phenotypic data describe an individual's performance on specific traits, such as height, yield, or disease resistance. Accurate collection and high-quality processing of these two types of data are critical to the success of GWAS.

Genetic data are usually acquired through high-throughput genotyping technology. The process involves extracting DNA from each study subject and then analyzing it using gene chips or next-generation sequencing (NGS). Gene chips are a cost-effective method that can detect millions of known SNP sites simultaneously (Abdelraheem et al., 2021). Next-generation sequencing allows researchers not only to detect known SNP sites but also to discover new genetic variants, although it is more expensive. The acquired genetic data are then processed and quality controlled through bioinformatics methods to ensure data accuracy and usability.

The collection of phenotypic data involves the precise measurement and recording of individual traits. This requires standardized methods to evaluate and record the performance of each individual on the studied traits while controlling environmental variables. For agricultural crops, phenotypic data can include traits such as yield, maturity, and disease resistance; in human genetics research, they may include disease status, biochemical indicators, or other health indicators. The quality of phenotypic data directly affects the accuracy of GWAS analysis, so data reliability must be ensured through precise measurements and sufficient replication.

During data collection, the representativeness and diversity of the sample also need to be considered. Selecting a sufficient number of samples with diverse genetic backgrounds helps increase the discovery power of GWAS, which is especially important when looking for rare variants or genes with small effects. In addition, collecting detailed environmental and lifestyle data may be critical for some studies, as these factors may interact with genetic factors to influence trait performance.
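The quality-control step for genetic data mentioned above is not specified in detail in the text. As a hedged illustration, the sketch below applies two filters that are commonly used in practice: a minimum call rate (the share of samples with a non-missing genotype) and a minimum minor allele frequency (MAF). The thresholds (0.9 and 0.05), the function names and the demo genotypes are assumptions chosen for illustration only.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing call (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 alternative-allele dosages, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(snp_genotypes, min_call_rate=0.9, min_maf=0.05):
    """Keep SNPs passing both thresholds (threshold values are assumptions)."""
    return {
        snp_id: genotypes
        for snp_id, genotypes in snp_genotypes.items()
        if call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    }


# Hypothetical genotypes for ten samples at three SNPs.
demo = {
    "snp_a": [0, 1, 2, 1, 0, 1, 2, 1, 0, 1],              # passes both filters
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],              # monomorphic, MAF = 0
    "snp_c": [0, 1, None, None, 2, 1, None, 0, 1, 2],     # call rate 0.7
}
kept = filter_snps(demo)
print(sorted(kept))
```

Real pipelines typically add further checks (for example on samples as well as markers), which are omitted here to keep the sketch short.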
The collection and processing of the genetic and phenotypic data required for GWAS is a complex but critical process. High-quality data acquisition, including advanced sequencing technology, precise phenotypic measurements, and meticulous data processing and analysis, is the foundation for a successful GWAS and for realizing its potential in genetic research.

1.3 Statistical methods and computational tools for GWAS analysis
In genome-wide association studies (GWAS), a range of statistical methods and computational tools are used to analyze the relationship between genetic and phenotypic data, with the aim of identifying genetic variants associated with specific traits. These statistical methods mainly include association analysis, correction for population structure and kinship, and multivariate analysis.

Association analysis is one of the core statistical methods in GWAS. It identifies potential genetic factors by testing the association between genetic markers (such as SNPs) and specific traits. The most commonly used method is single-locus association analysis, in which each SNP is tested individually for statistical association with trait performance. This is usually done with regression models: linear regression for continuous traits and logistic regression for categorical traits (such as disease status) (Peng et al., 2022).

Because population structure and relatedness may lead to false-positive associations, GWAS analysis also includes methods to correct for these potential confounding factors. Population structure refers to the differences in genetic background present in a sample set, while kinship refers to the blood relationship between samples. If not controlled, these factors may produce erroneous associations between genetic markers and traits.
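To make the confounding problem concrete, the sketch below simulates two subpopulations that differ both in allele frequencies and in trait mean, with no SNP actually affecting the trait. It then compares a naive single-SNP regression with one that includes the first principal component of the genotype matrix as a covariate, which is one common form of the PCA-based correction discussed in this section. It uses NumPy; the simulation settings, function names and the choice of a single component are illustrative assumptions. Kinship, which mixed linear models additionally account for, is not modelled.

```python
import numpy as np


def top_principal_components(genotypes, n_components=1):
    """PC scores of a (samples x SNPs) dosage matrix via SVD."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # leave monomorphic SNPs unscaled
    u, s, _ = np.linalg.svd(centered / std, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def snp_effect(dosage, phenotype, covariates=None):
    """OLS slope and t statistic for one SNP, optionally adjusting for covariates."""
    y = np.asarray(phenotype, dtype=float)
    n = y.shape[0]
    columns = [np.ones(n), np.asarray(dosage, dtype=float)]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float).reshape(n, -1))
    x = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    residuals = y - x @ beta
    sigma2 = residuals @ residuals / (n - x.shape[1])
    covariance = sigma2 * np.linalg.inv(x.T @ x)
    return beta[1], beta[1] / np.sqrt(covariance[1, 1])


# Simulated data (assumption): two subpopulations, no causal SNP.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 100, 200
pop_a = rng.binomial(2, 0.1, size=(n_per_pop, n_snps))
pop_b = rng.binomial(2, 0.9, size=(n_per_pop, n_snps))
genotypes = np.vstack([pop_a, pop_b])
phenotype = np.concatenate(
    [rng.normal(0.0, 1.0, n_per_pop), rng.normal(3.0, 1.0, n_per_pop)]
)

pcs = top_principal_components(genotypes, n_components=1)
_, t_naive = snp_effect(genotypes[:, 0], phenotype)
_, t_adjusted = snp_effect(genotypes[:, 0], phenotype, covariates=pcs)
print(f"naive t = {t_naive:.2f}, PC-adjusted t = {t_adjusted:.2f}")
```

In this setup the naive test statistic is inflated purely by the subpopulation difference, and adjusting for the first principal component should shrink it substantially. Dedicated software handles this at genome scale and with more careful models.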
The effects of population structure can be identified and corrected using methods such as principal component analysis (PCA), while mixed linear models (MLM) can improve the accuracy of GWAS by taking both population structure and kinship into account.

Multivariate analysis allows multiple traits or multiple genetic markers to be considered simultaneously, to explore interactions and joint effects between them. This approach can help reveal the genetic basis of complex traits, especially when traits are biologically interconnected.

To handle the complex data and statistical analysis of GWAS, a variety of computational tools and software packages have been developed. PLINK is one of the most widely used GWAS data analysis tools. It provides a range of functions, including data management, basic statistical analysis, association analysis, and control of population structure. GCTA (Genome-wide Complex Trait Analysis) is another popular tool, used specifically to estimate the contribution of genetic variation to trait variance and to perform population structure correction. In addition, software such as Admixture, Eigenstrat and Structure can be used to analyze population structure, and tools such as FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) are used for mixed linear model analysis (Ceballos et al., 2015).

The choice of statistical methods and computational tools for GWAS depends on the specific needs of the study, including the type of trait, the genetic background of the sample, and the goals of the study. Correct application of these methods and tools can effectively identify genetic variants associated with traits, providing strong support for understanding the genetic basis of traits.

2 The Role of GWAS in the Study of Crop Genetic Diversity
2.1 GWAS reveals genetic basis of crop traits
Genome-wide association studies (GWAS) play an extremely important role in the study of crop genetic diversity, providing an efficient and powerful method to reveal the genetic basis of crop traits. Through GWAS, scientists can identify genetic variation related to important agronomic traits, such as yield, stress resistance (including drought and salt-alkali resistance), disease resistance, and quality characteristics, across the entire crop genome. This process not only deepens our understanding of crop genetic diversity, but also provides powerful molecular tools for crop improvement, greatly promoting the development of precision breeding technology.

The application of GWAS allows scientists to discover new and beneficial genetic variation in a wide range of crop populations, including traditional varieties, landraces and wild relatives.
These genetic resources are valuable assets for crop improvement, and they can be used to develop new varieties that are adapted to different environmental conditions and have high yield and quality traits. For example, in rice and wheat, multiple key genes or gene regions related to yield and disease resistance have been successfully identified through GWAS. These findings not only enhance crop genetic diversity but also improve crop productivity and sustainability (Ceballos et al., 2015).

In addition, GWAS provides a new perspective on the genetic mechanisms of crop traits. By analyzing the association between traits and genetic variation, researchers can uncover the gene networks and regulatory pathways that control complex traits, thereby gaining a deeper understanding of the genetic and molecular basis of traits. This is particularly important for the study of crop stress responses, which involve the interaction of multiple genes and environmental factors (Abdelraheem et al., 2021). Through GWAS, researchers can identify key genetic factors that affect crop phenotypes under specific environmental conditions, providing guidance for breeding crops for environmental adaptability and stress resistance.

Although GWAS has shown great potential in studying crop genetic diversity, its application also faces challenges, including the need for a large number of samples to provide sufficient statistical power, the complexity introduced by population structure and diverse genetic backgrounds, and the need to identify, from massive amounts of genetic variation, the factors that actually influence traits. However, with the advancement of high-throughput sequencing technology, improvements in data analysis methods, and the development of bioinformatics tools, the application of GWAS in the study of crop genetic diversity will become more extensive and in-depth.
In the future, GWAS is expected to further promote crop molecular breeding and to support the precise improvement of crop traits and the sustainable development of agricultural production.

2.2 Application of GWAS in the study of crop genetic diversity
GWAS has made significant progress in the study of genetic diversity in many crops, especially in revealing genetic loci associated with important agronomic traits. Several successful case studies are introduced below to demonstrate the results of applying GWAS in crop genetic diversity research.

In maize (Zea mays L.), GWAS has achieved a major breakthrough and successfully identified many genetic loci and potential genes related to complex traits. These traits include responses to abiotic and biotic stresses, and their discovery holds promise for enhancing fitness and yield through effective breeding strategies. In addition, GWAS research has addressed how multi-omics methods, including genomics, transcriptomics, proteomics, metabolomics, epigenomics and phenomics, can deepen the understanding of complex traits in maize, thereby improving environmental stress tolerance and promoting maize production (Bhat et al., 2021).

Haplotype-based models are an important GWAS method that accurately captures allelic diversity by integrating high-density marker data, improving the ability to discover epistatic interactions and minimizing the need for multiple testing. This method has been developed and applied in major crops such as wheat, rice and soybeans. Compared with traditional single-site models, haplotype-based models are more efficient and reliable in identifying haplotypes associated with selected traits.

In China, for example, the National Medium-Term Gene Bank at the Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences (OCRI-CAAS) preserves more than 8,000 sesame germplasm accessions. Similarly, the Beijing National Long-term Gene Bank preserves approximately 4,500 sesame accessions (Figure 1). Based on these large collections, a strategy to build a core collection of sesame began in the early 2000s, using morphological descriptors and later molecular tools. Ultimately, OCRI established a sesame core germplasm bank containing 705 different accessions, including 405 local varieties, 95 varieties from China, and 205 accessions from 28 other countries.
The entire panel was sequenced on the Illumina HiSeq 2000 platform (http://www.ncgr.ac.cn/SesameHapMap), and a total of 5,407,981 SNPs were detected in the genome, with an average of 2 SNPs every 50 bp (Figure 1) (Muez et al., 2021). To explore the genetic basis of economically important agronomic traits and identify possible causative genes, these GWAS panels therefore need to be updated with more materials reflecting different agroecological contexts around the world.

Figure 1 Process of key steps in Sesame GWAS implementation (Muez et al., 2021)

As another example, Nouraei et al. (2024) used the 90K SNP array to conduct a genome-wide association analysis and revealed the genetic determinants of key traits related to wheat drought tolerance, namely plant height, root length, and root and shoot dry weight. Using a mixed linear model (MLM) approach to analyze 125 wheat accessions grown under well-watered and drought-stress conditions, the authors identified 53 SNPs that were significantly associated with the stress sensitivity index (SSI) and stress tolerance index (STI) of the target traits. Notably, chromosomes 2A and 3B carry 10 and 9 of these markers, respectively. On 17 chromosomes, 44 unique candidate genes were identified, mainly located in the distal ends of chromosomes 1A, 1B, 1D, 2A, 3A, 3B, 4A, 6A, 6B, 7A, 7B and 7D. These genes are involved in multiple functions related to plant growth, development, and stress response, providing a rich resource for future research. Clustering patterns emerged, especially 7 genes related to plant height SSI and 4
