
Tree Genetics and Molecular Breeding 2024, Vol.14, No.2, 69-80
http://genbreedpublisher.com/index.php/tgmb

2 Overview of Poplar Genome Structure

2.1 Characteristics and complexity of the poplar genome

The poplar genome is notable for its modest size and extensive genetic diversity, making it an ideal model for tree molecular biology and biotechnology. The genome of Populus trichocarpa, for instance, has been sequenced to an approximately 6x depth, revealing a genome size of around 520 Mbp (Brunner et al., 2004). This genome size is relatively small compared to other tree species, facilitating easier manipulation and study. The poplar genome also exhibits high levels of structural variation (SV), including insertions, deletions, and copy number variations (CNVs), which play significant roles in genome evolution and adaptation (Pinosio et al., 2016). These SVs are often located in low-gene density regions and are associated with transposable elements, indicating a dynamic genome structure.

2.2 Major milestones in poplar genome sequencing

The sequencing of the poplar genome has been a collaborative international effort, spearheaded by the U.S. Department of Energy. The annotated whole genome sequence of Populus trichocarpa was released to the public in early 2004, marking a significant milestone in tree genomics (Tuskan et al., 2004). This project provided the first opportunity to compare the genome of a perennial tree with that of annual plants, offering insights into tree-specific genetic traits such as dormancy and long-term host-pest interactions. Additionally, the assembly of the Populus alba genome, which is highly collinear with P. trichocarpa, further expanded the genomic resources available for poplar research (Ma et al., 2018).
The development of comprehensive functional genomics resources, including cDNA libraries and microarray platforms, has also been crucial in advancing our understanding of poplar gene expression and regulation (Ralph et al., 2006; 2008).

2.3 Comparative genomics

Comparative genomics has revealed significant insights into the unique features of the poplar genome in relation to other tree species. For instance, the coding content of the poplar genome shows high similarity to that of Arabidopsis, an annual plant, suggesting that differences between these species are primarily due to gene regulation rather than gene content (Sterky et al., 2004). This similarity allows researchers to leverage the extensive functional genomic information available for Arabidopsis to study poplar. Furthermore, the expansion of certain gene families in poplar, such as those related to histone and auxin, highlights the evolutionary adaptations of poplar to its perennial lifestyle (Ma et al., 2018). Comparative analysis with other plant species has also identified unique multigene families in poplar, which may be related to its adaptation to environmental stresses (Park et al., 2005).

3 Gene Annotation Techniques in Poplar

3.1 Methods for annotating functional genes in the poplar genome

Gene annotation in poplar involves a combination of high-throughput sequencing, computational tools, and manual curation. One prominent method is the use of RNA-Seq data to identify and annotate carbohydrate-active enzymes (CAZymes) in the Populus trichocarpa genome. This approach allows for the identification of genes involved in the biosynthesis of cell wall polymers and other important metabolic processes by analyzing gene expression patterns across different tissues (Kumar et al., 2019). Another method involves the sequencing and annotation of large genomic sequences, such as a 95-kb stretch of Populus deltoides, which revealed disease resistance genes and transposable elements.
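
To make the expression-based step in section 3.1 more concrete, the following minimal Python sketch shows one way that expression patterns across tissues could be summarized for each annotated gene. It uses the tau index, a commonly used tissue-specificity measure, which is swapped in here for illustration and is not necessarily the measure used by Kumar et al. (2019). The function name, gene names, tissue names, and TPM values are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def tissue_specificity(expr: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Return (tissue with the highest expression, tau index) for one gene.

    tau = sum(1 - x_i / x_max) / (n - 1), where x_i is the expression in
    tissue i and n is the number of tissues. A tau of 0 means uniform
    expression; a tau of 1 means expression confined to a single tissue.
    """
    if len(expr) < 2:
        raise ValueError("need expression values for at least two tissues")
    top_tissue = max(expr, key=expr.get)
    x_max = expr[top_tissue]
    if x_max <= 0:
        # No detectable expression in any tissue: no top tissue to report.
        return None, 0.0
    tau = sum(1 - x / x_max for x in expr.values()) / (len(expr) - 1)
    return top_tissue, tau


# Toy TPM values invented for illustration only (assumption, not real data).
toy_expression = {
    "gene_A": {"xylem": 120.0, "leaf": 0.0, "root": 0.0},
    "gene_B": {"xylem": 10.0, "leaf": 10.0, "root": 10.0},
}

for gene, expr in toy_expression.items():
    tissue, tau = tissue_specificity(expr)
    print(gene, tissue, round(tau, 2))
```

In a sketch like this, a gene with a tau near 1 and peak expression in developing xylem would be a candidate for a role in cell wall polymer biosynthesis, whereas a tau near 0 suggests a broadly expressed gene. Any threshold on tau would be an analyst's choice, not something specified by the source.
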
Tools like ANNOVAR are also employed to annotate genetic variants by examining their functional consequences on genes and identifying variants in conserved regions (Wang et al., 2010).

3.2 Challenges in accurate gene annotation

Accurate gene annotation faces several challenges, including the high rate of misannotations due to the reliance on sequence homology and the propagation of errors through databases. Misannotations can be detected using methods that analyze genomic correlations to identify genes with unusually weak correlations in their assigned network positions (Hsiao et al., 2009). Another challenge is the integration of diverse types of functional genomic data, which requires sophisticated computational frameworks to combine data from different sources, such as
