Genomics and Applied Biology 2024, Vol.15, No.5, 235-244
http://bioscipublisher.com/index.php/gab

6 Methodologies for Chloroplast Genome Analysis
6.1 Techniques for chloroplast DNA extraction and sequencing
Chloroplast DNA (cpDNA) extraction and sequencing are critical steps in the characterization of chloroplast genomes. Various methods have been developed to optimize the yield and purity of cpDNA. Traditional methods often struggle to balance quality and yield, but recent advancements have improved these processes significantly. For example, a modified protocol based on sucrose gradients has been shown to efficiently isolate cpDNA from angiosperms, achieving 40%-50% purity, which is sufficient for subsequent genome assembly using Illumina sequencing technology (Shi et al., 2012). Additionally, the use of single molecule, real-time (SMRT) DNA sequencing technology, such as the circular consensus sequencing (CCS) strategy, allows for high-accuracy de novo assembly and SNP detection of chloroplast genomes without the need for a reference genome (Li et al., 2014; Freudenthal et al., 2019). Multiplex sequencing-by-synthesis (MSBS) using the Illumina Genome Analyzer is another effective method, enabling the simultaneous sequencing of multiple chloroplast genomes with high coverage and accuracy (Cronn et al., 2008).

6.2 Bioinformatics tools for genome assembly and annotation
The assembly and annotation of chloroplast genomes require specialized bioinformatics tools. A systematic comparison of various chloroplast genome assembly tools has revealed significant differences in their performance and computational requirements. These tools are capable of successfully assembling chloroplast genomes in more than 60% of known real data sets, leading to the assembly of novel chloroplast genomes (Freudenthal et al., 2019). Docker images for each tested tool have been created to ensure reproducibility and facilitate large-scale screening of genomic data for chloroplast genomes.
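As a rough illustration of the kind of post-assembly check that large-scale screening depends on, the sketch below parses FASTA-formatted text and reports whether an assembly came out as a single contig within an expected length window. It is a minimal sketch and not the procedure used in the cited benchmark: the function names `read_fasta` and `summarize_assembly` are hypothetical, and the default length window of 100-220 kb is an assumption chosen for illustration that should be adjusted to the lineage under study.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_name: uppercase sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first token of the header as the record name.
            name = header[0] if header else f"record_{len(chunks) + 1}"
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {n: "".join(parts) for n, parts in chunks.items()}


def summarize_assembly(
    records: Dict[str, str],
    min_len: int = 100_000,  # assumed lower bound, illustration only
    max_len: int = 220_000,  # assumed upper bound, illustration only
) -> Dict[str, object]:
    """Summarize an assembly and flag whether it looks like one complete contig."""
    lengths = [len(seq) for seq in records.values()]
    total = sum(lengths)
    gc = sum(seq.count("G") + seq.count("C") for seq in records.values())
    return {
        "n_contigs": len(lengths),
        "total_length": total,
        "gc_fraction": gc / total if total else 0.0,
        # Heuristic: exactly one contig whose length falls inside the window.
        "looks_complete": len(lengths) == 1 and min_len <= total <= max_len,
    }
```

In use, the lines of an assembler's output file would be passed to `read_fasta` and the resulting dictionary to `summarize_assembly`. A single-contig result within the window only suggests a complete assembly; it does not replace verification against read mappings or annotation.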
Additionally, standard practices for DNA extraction, sequencing library preparation, and bioinformatics analyses, including assembly, verification, annotation, and sequence comparisons, are recommended to ensure high-quality chloroplast genome sequencing reports (Heinze, 2021).

6.3 Challenges in chloroplast genome analysis
Despite advancements in chloroplast genome analysis, several challenges remain. One major issue is the conservative nature of chloroplast gene and genome evolution, which can limit phylogenetic resolution and statistical power in evolutionary and population genetic studies (Cronn et al., 2008). Moreover, mononucleotide repeats can interrupt contig assembly, and the difficulty grows with repeat length. Another challenge is the need for cost-effective, high-throughput cpDNA extraction methods, as conventional methods often fail to achieve the necessary balance between quality and yield (Shi et al., 2012). Furthermore, the analysis of large multi-gene data sets can be complicated by systematic biases and conflicting signals, making it difficult to resolve ancient divergences accurately (Fučíková et al., 2016). These challenges highlight the need for continued development and refinement of methodologies and tools in chloroplast genome analysis.

7 Challenges and Limitations
7.1 Technical challenges
The characterization of the chloroplast genome in Eucommia ulmoides faces several technical challenges. One significant issue is the complexity of the chloroplast genome itself, which comprises a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeat (IR) regions. This structure can complicate the assembly and annotation processes, as seen in studies where the chloroplast genome length was determined to be 163 586 bp with a typical quadripartite structure (Zhu et al., 2020; Zhong et al., 2022). Additionally, the presence of repetitive sequences and DNA repeat variations can lead to difficulties in achieving high-quality genome assemblies.
For instance, the high-quality haploid genome assembly of E. ulmoides required advanced technologies like PacBio and Hi-C to improve the scaffold N50 significantly and reduce the number of gaps (Li et al., 2020).

7.2 Data availability
Another major limitation is the scarcity of genomic data available for E. ulmoides. The lack of comprehensive genomic datasets hinders the ability to perform extensive comparative analyses and population genetics studies. For example, the study on the chloroplast genome of E. ulmoides highlighted the limited availability of complete