Rice Genomics and Genetics 2025, Vol.16, No.3, 159-179 http://cropscipublisher.com/index.php/rgg 176

9 Challenges and Future Perspectives

9.1 Technical limitations: sequencing depth, assembly errors, SV annotation

Although rice pan-genome studies have made good progress, several technical hurdles remain. One key issue is the sequencing depth and quality needed to uncover rare variants. To detect structural variants (SVs) fully, especially those that occur at low frequency, deep long-read sequencing of many individuals is ideal, but this approach is expensive. Many projects balance cost by sequencing a small, diverse subset deeply and the rest at lower coverage. This reduces costs but may miss low-frequency SVs. In the 3K Rice Genome Project, for example, the lower coverage meant that only common variants were detected reliably, and many rare insertions or deletions were probably missed (Zhang et al., 2022). As studies expand to tens of thousands of genomes, maintaining accuracy will be difficult, and new algorithms that perform well with low-depth data will be essential.

Another challenge is assembly reliability. Even with improved long-read tools, misassemblies in repetitive regions and collapsed duplications remain common, and these errors can lead to false SV calls. Advances such as Hi-C scaffolding and trio binning are improving assembly quality, but manual curation is still needed to reduce errors (Shang et al., 2023).

Interpreting SVs is also far from simple. Very large numbers of SVs can now be identified, but determining which ones matter biologically is difficult. SVs in coding regions are easier to assess, whereas many lie in regulatory or other non-coding regions, where their effects are hard to predict. In addition, standard formats such as VCF struggle to represent complex SVs, which makes annotation and analysis more difficult.

Finally, updating and maintaining a pan-genome is technically demanding: as more genomes are added, updating the pan-genome incrementally, without rebuilding it from scratch, is non-trivial.
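To make the idea of an incremental update concrete, the sketch below works at the level of a gene presence/absence variation (PAV) table rather than a sequence graph, which is a deliberate simplification. It assumes that genes from every assembly have already been clustered into shared gene-family IDs (in practice the hard part, not shown here). The function names `add_accession` and `classify` and the `core_fraction` parameter are hypothetical. Adding an accession touches only that accession's genes, so earlier accessions are never reprocessed, but the core/dispensable labels must still be recomputed because the number of accessions changes.

```python
from typing import Dict, Set


def add_accession(pav: Dict[str, Set[str]], accessions: Set[str],
                  accession: str, genes: Set[str]) -> None:
    """Fold one new accession into an existing PAV map, in place.

    pav maps a gene-family ID to the set of accessions carrying it.
    Only the new accession's genes are visited; earlier accessions are
    never re-read, which is what makes the update incremental.
    """
    if accession in accessions:
        raise ValueError(f"{accession} is already in the pan-genome")
    accessions.add(accession)
    for gene in genes:
        pav.setdefault(gene, set()).add(accession)


def classify(pav: Dict[str, Set[str]], n_accessions: int,
             core_fraction: float = 1.0) -> Dict[str, str]:
    """Label each family 'core' if carried by at least core_fraction of
    all accessions, otherwise 'dispensable'. Labels are recomputed on
    every call because adding accessions changes the denominator."""
    return {
        fam: "core" if len(carriers) / n_accessions >= core_fraction
        else "dispensable"
        for fam, carriers in pav.items()
    }


if __name__ == "__main__":
    pav: Dict[str, Set[str]] = {}
    accessions: Set[str] = set()
    # Toy gene-family sets for three hypothetical accessions.
    add_accession(pav, accessions, "acc_A", {"fam1", "fam2", "fam3"})
    add_accession(pav, accessions, "acc_B", {"fam1", "fam2", "fam4"})
    print(classify(pav, len(accessions)))  # fam1 and fam2 are core
    add_accession(pav, accessions, "acc_C", {"fam1", "fam4"})
    print(classify(pav, len(accessions)))  # only fam1 remains core
```

In this toy run, `fam2` is core after the first two accessions but becomes dispensable once `acc_C`, which lacks it, is added. That reclassification is why core/dispensable labels cannot simply be frozen as a pan-genome grows.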
Efficient algorithms are needed to merge new assemblies into existing pan-genome graphs or alignment maps. In summary, building a complete and error-free rice pan-genome is still constrained by sequencing resources, assembly and algorithmic accuracy, and bioinformatic frameworks for SV interpretation (Qin et al., 2021). Overcoming these technical limitations will be crucial to realizing the full benefits of pan-genomics.

9.2 Integration with transcriptome, epigenome, and phenome data

The power of a rice pan-genome can be greatly amplified by integrating it with other layers of genomic and phenotypic data. One future direction is the development of pan-transcriptomes: catalogs of all transcripts expressed across different rice varieties and conditions. Different rice lines may express alternative splice variants, or even lineage-specific genes from the dispensable genome, under certain conditions. By mapping RNA-seq data from diverse varieties to a pan-genome reference, researchers can identify novel transcripts originating from sequences absent from the standard reference (Woldegiorgis et al., 2022). Some initial studies have begun such cross-variety transcriptome comparisons, revealing, for example, that certain stress-induced transcripts in wild rice have no counterpart in cultivated rice. Integrating transcriptomic data will help pinpoint which variable genes are actually transcribed (a first indication that they are functional) and under what conditions, helping to link the structural presence of a gene to a biochemical function.

Similarly, incorporating epigenomic data (such as DNA methylation and histone modification profiles) in a pan-genomic context is an emerging frontier. Transposable element (TE) activation and silencing are known to vary between rice strains; these epigenetic differences could influence gene expression and might correlate with structural variation, for example if TEs near variable genes are silenced in some strains and active in others (Li et al., 2025).
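The suggestion that transposable elements (TEs) near variable genes may be silenced in one strain and active in another lends itself to a simple comparison. The sketch below assumes that per-TE methylation levels (0 to 1) have already been estimated for two strains, that every TE is present in both strains (TEs absent from one strain would need separate handling), and that each TE is labelled by whether its nearest gene is core or dispensable. The `TERecord` fields, the `min_delta` threshold and its 0.5 default are hypothetical choices for illustration, not an established method. The function reports, per gene class, the fraction of TEs whose methylation differs strongly between the two strains.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TERecord:
    """One TE present in both strains (hypothetical record layout)."""
    te_id: str
    near_gene_class: str  # class of the nearest gene: "core" or "dispensable"
    meth_strain1: float   # mean methylation level in strain 1, from 0 to 1
    meth_strain2: float   # mean methylation level in strain 2, from 0 to 1


def differential_fraction(records: List[TERecord],
                          min_delta: float = 0.5) -> Dict[str, float]:
    """Per nearest-gene class, the fraction of TEs whose methylation
    differs by at least min_delta between the two strains."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for rec in records:
        cls = rec.near_gene_class
        totals[cls] = totals.get(cls, 0) + 1
        # A large difference suggests the TE is silenced (highly
        # methylated) in one strain but not in the other.
        if abs(rec.meth_strain1 - rec.meth_strain2) >= min_delta:
            hits[cls] = hits.get(cls, 0) + 1
    return {cls: hits.get(cls, 0) / n for cls, n in totals.items()}


if __name__ == "__main__":
    toy = [  # made-up values, for illustration only
        TERecord("te1", "dispensable", 0.9, 0.1),
        TERecord("te2", "dispensable", 0.85, 0.8),
        TERecord("te3", "core", 0.9, 0.88),
        TERecord("te4", "core", 0.1, 0.12),
    ]
    print(differential_fraction(toy))
```

A higher fraction for the dispensable class would be consistent with, though not proof of, the pattern described in the text; a real analysis would add biological replicates and a statistical test.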
A pan-epigenome approach would track how epigenetic marks differ between core and dispensable genomic regions across genetic backgrounds. This could provide insight into the regulation of newly introgressed DNA or into domestication-related chromatin changes.

On the phenotype side, bridging the gap between pan-genome genotypes and the phenome (the full set of phenotypes) is the ultimate goal. This will involve large panels of diverse lines grown in multiple environments with extensive trait measurements (phenomics), analyzed together with pan-genome variants. Approaches such as GWAS and machine learning can link complex combinations of structural variants to traits. For instance, if a subset of pan-genome presence/absence variants (PAVs) consistently correlates with drought tolerance, and both genomic and transcriptomic evidence under drought stress support the association, the case for a causal relationship is strengthened.
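As a minimal illustration of linking presence/absence variants (PAVs) to a trait, the sketch below computes Welch's t statistic comparing the trait values of accessions that carry each PAV with those of accessions that lack it. The PAV matrix layout (1 = present, 0 = absent, one entry per accession in the same order as the trait vector), the function names, and the `min_group` cutoff are assumptions for illustration. This is a single-marker scan, not a full GWAS: it ignores population structure and kinship, which are strong confounders in rice because of the indica/japonica split, and it applies no multiple-testing correction.

```python
import math
from statistics import mean, variance
from typing import Dict, List


def welch_t(present: List[float], absent: List[float]) -> float:
    """Welch's t statistic: mean trait difference (present - absent)
    divided by the unpooled standard error."""
    se = math.sqrt(variance(present) / len(present)
                   + variance(absent) / len(absent))
    if se == 0:
        return float("nan")  # both groups constant: statistic undefined
    return (mean(present) - mean(absent)) / se


def scan_pavs(pav_matrix: Dict[str, List[int]], trait: List[float],
              min_group: int = 2) -> Dict[str, float]:
    """Single-marker scan returning one Welch t statistic per PAV.

    pav_matrix maps a PAV ID to 0/1 calls (1 = present), one per
    accession, in the same order as trait. No correction for population
    structure, kinship, or multiple testing is applied.
    """
    need = max(min_group, 2)  # sample variance needs >= 2 values per group
    results: Dict[str, float] = {}
    for pav_id, calls in pav_matrix.items():
        if len(calls) != len(trait):
            raise ValueError(f"{pav_id}: {len(calls)} calls for {len(trait)} phenotypes")
        present = [y for c, y in zip(calls, trait) if c == 1]
        absent = [y for c, y in zip(calls, trait) if c == 0]
        if len(present) < need or len(absent) < need:
            continue  # too few accessions in one group to test
        results[pav_id] = welch_t(present, absent)
    return results


if __name__ == "__main__":
    trait = [10.0, 11.0, 12.0, 20.0, 21.0, 22.0]  # toy phenotype values
    pavs = {"pavA": [0, 0, 0, 1, 1, 1], "pavB": [1, 0, 0, 0, 0, 0]}
    print(scan_pavs(pavs, trait))  # pavB is skipped: only one carrier
```

Large absolute statistics only flag candidates; in the spirit of the drought-tolerance example, such candidates would then be checked against expression data and models that account for population structure.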