
Rice Genomics and Genetics 2025, Vol.16, No.3, 159-179 http://cropscipublisher.com/index.php/rgg

3.3 Assembly pipelines and reference graph models

There are two primary strategies for constructing a rice pan-genome from multiple genomes: the de novo assembly approach and the reference-guided (iterative) approach. In the de novo approach, an independent whole-genome assembly is generated for each selected accession (using the methods outlined above), and the assemblies are then aligned or merged to identify the union of genomic sequences. Alignment tools such as MUMmer4 or minimap2 can align each assembly to the reference, revealing insertions (segments present in a new assembly but not in the reference) and other structural differences. The union of all unique sequences across the assemblies forms the pan-genome. For example, using 12 high-quality rice genomes as a starting point, researchers constructed a non-redundant sequence collection that served as a pan-genome reference. Additional diverse accessions can then be mapped onto this pan-genome reference to identify further variants.

In contrast, the reference-guided iterative assembly approach starts with a reference genome and incrementally incorporates sequences from the short reads of other accessions. Zhao et al. (2018) followed this strategy by taking 66 rice accessions (53 cultivated and 13 wild) and iteratively assembling contigs from unmapped reads, thereby building a composite reference that captured novel sequences absent from the original reference. This “map-to-pan” strategy was effective with short reads, though it can miss complex rearrangements.

More recently, graph-based genome models have been introduced to represent a pan-genome. In a graph model, nodes represent genomic sequences (from any accession) and edges represent the connections between them, allowing multiple alternative allelic sequences to coexist in one representation.
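The graph model described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the data structure used by any particular tool: the class name `SequenceGraph`, the node IDs, the sequences, and the accession names `ref` and `acc_A` are all hypothetical. It shows how a single insertion appears as an extra node that one accession's path traverses and the reference path skips.

```python
class SequenceGraph:
    """Toy sequence graph: nodes hold sequence, edges link node IDs,
    and each accession is stored as a path (an ordered walk over nodes)."""

    def __init__(self):
        self.nodes = {}     # node ID -> DNA sequence
        self.edges = set()  # (from node ID, to node ID)
        self.paths = {}     # accession name -> list of node IDs

    def add_node(self, node_id, sequence):
        self.nodes[node_id] = sequence

    def add_path(self, name, node_ids):
        """Register an accession's walk and add the edges it implies."""
        node_ids = list(node_ids)
        for node_id in node_ids:
            if node_id not in self.nodes:
                raise KeyError(f"unknown node {node_id}")
        self.paths[name] = node_ids
        self.edges.update(zip(node_ids, node_ids[1:]))

    def spell(self, name):
        """Reconstruct the linear sequence of one accession."""
        return "".join(self.nodes[n] for n in self.paths[name])

    def private_nodes(self, reference):
        """For every other path, list the nodes the reference path lacks."""
        ref_nodes = set(self.paths[reference])
        return {
            name: [n for n in path if n not in ref_nodes]
            for name, path in self.paths.items()
            if name != reference
        }


# Hypothetical example: acc_A carries a 3-bp insertion (node 2)
# between two segments it shares with the reference.
graph = SequenceGraph()
graph.add_node(1, "ACGT")
graph.add_node(2, "TTG")   # insertion, absent from the reference
graph.add_node(3, "GGCA")
graph.add_path("ref", [1, 3])
graph.add_path("acc_A", [1, 2, 3])

print(graph.spell("ref"))
print(graph.spell("acc_A"))
print(graph.private_nodes("ref"))
```

Both alleles coexist in one structure: the edge from node 1 to node 3 encodes the reference allele, while the edges through node 2 encode the insertion. Production tools store the same idea in far richer formats and add indexing for read mapping.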
Rice researchers have developed graph-based pan-genomes in which all detected structural variants are embedded in a genome graph. This enables joint genotyping of variants across hundreds of accessions using tools such as vg or PanGenie. Whether built from aligned assemblies or genome graphs, the result is a pan-genome reference that replaces a single linear reference and provides a more comprehensive coordinate system for mapping reads and genetic data from diverse rice lines.

3.4 Inclusion of different rice subspecies (indica, japonica, aus, aromatic, wild)

When building a rice pan-genome, it is important to choose samples that represent the genetic diversity of rice. The two main subspecies, indica and japonica, must both be included because each carries genes the other lacks. For example, popular varieties such as IR64 (indica) and Nipponbare (japonica) can differ by millions of SNPs. Other groups, such as aus (a distinct indica-related type from South Asia) and aromatic types such as Basmati, also carry distinctive traits. One well-known example is the fragrance allele in Basmati, which is absent from most non-aromatic indica and japonica varieties. To capture this diversity, researchers often include accessions from all of these groups. This was the approach of the 3K Rice Genomes project, which sampled indica, aus, tropical and temperate japonica, and aromatic types. Such careful sampling helps ensure that the pan-genome reflects the full range of rice diversity.

In addition, wild rice species have been incorporated to build an extended pan-genome (sometimes termed a “super-pangenome” when it spans multiple species). O. rufipogon (the Asian wild progenitor) contributes alleles that were lost or are rare in cultivated rice, and including dozens of O. rufipogon genomes revealed thousands of wild-specific genes (Guo et al., 2025). For example, a recent study built a pan-genome from 129 wild O.
rufipogon accessions and 16 cultivars, uncovering ~13,728 genes present only in wild rice and absent from domesticated rice. Other wild relatives, such as O. nivara or the African O. barthii, have further expanded the gene pool. By integrating indica, japonica, aus, aromatic, African rice, and wild Oryza species, researchers ensure that the pan-genome represents the full spectrum of rice genomic variation. This inclusive approach has highlighted, for instance, the much higher abundance of disease resistance genes in wild rice than in cultivars (Guo et al., 2025), underlining the value of wild germplasm in enriching the pan-genome for crop improvement.

4 Structural Variations in the Rice Pan-genome

4.1 Types of structural variations: insertions, deletions, inversions, translocations, CNVs

Structural variations (SVs) are genomic differences that involve segments of DNA larger than about 50 base pairs. They contrast with single nucleotide polymorphisms (SNPs) and include a variety of mutation types. The major categories of SVs in rice and other organisms are:
