Rice Genomics and Genetics 2025, Vol.16, No.3, 159-179 http://cropscipublisher.com/index.php/rgg

degree of variation observed among different rice accessions. Many DNA sequences that are present in some rice varieties are absent or highly divergent in the reference genome. As a result, reference-biased analyses fail to detect structural variations or novel genes that do not align to the reference (Qin et al., 2021). For example, if a gene is missing from the reference but present in an indica landrace, the short reads derived from that gene would go unmapped, and the gene would be omitted from the analysis. This bias can skew SNP discovery, gene presence/absence calls, and trait mapping. Furthermore, single-reference approaches complicate the detection of large structural variants (SVs). Short-read data aligned to a reference often miss insertions, deletions, or rearrangements larger than the typical read length. Such SVs are a substantial source of genetic and phenotypic variation in rice populations (Vahedi et al., 2023). In summary, the single-reference paradigm inherently overlooks “hidden” variation outside the reference gene set. These limitations motivated the development of pan-genome strategies that integrate multiple genomes to capture rice’s genomic diversity more fully.

The concept of a pan-genome, the complete set of genes or sequences found across the members of a species, was first introduced in bacterial genomics in 2005. Pan-genomics has since expanded rapidly to plant and animal studies. A plant pan-genome is typically composed of a core genome (genes present in every individual of the species) and a dispensable (or variable) genome (genes present in some individuals but missing in others). Early pan-genome analyses in plants revealed that a considerable portion of any given species’ gene repertoire is dispensable.
This was a paradigm shift from the assumption that a single reference could adequately represent a species. In rice, pan-genomic research gained momentum in the last decade as sequencing costs dropped and computational methods improved (Liu et al., 2021). Initial efforts compared a few divergent cultivars and demonstrated that each new genome contributed novel genes absent from the reference. As more genomes were added, it became evident that the rice pan-genome is vast and still growing. Concurrently, pan-genome studies in other major crops (e.g. maize, soybean, brassicas) were yielding similar insights, underscoring that pan-genomics had “come of age” as a powerful framework in plant biology. The development of pan-genomics has thus been driven by the need to systematically catalog genetic variation at the species level. It represents a natural evolution of genomics from single-reference assemblies to comprehensive, population-scale genome resources.

This study provides a comprehensive overview of rice pan-genome research and analyzes how structural variations are distributed across different subspecies of rice. It first covers the concept and evolution of plant pan-genomes, emphasizing why they are necessary and how they have been applied in crop species. It then focuses on strategies for constructing rice pan-genomes, including the sequencing technologies and assembly approaches that enable integration of diverse rice genomes (indica, japonica, aus, aromatic and wild relatives). A major emphasis is placed on the structural variations (SVs) uncovered by rice pan-genomes (their types, detection, distribution, and hotspots) and on the functional implications of these SVs for agronomic traits and gene content. This study also presents comparative analyses that shed light on evolutionary divergence between rice subspecies. Four case studies illustrate key achievements in rice pan-genome research.
Finally, this study discusses practical applications in breeding as well as current challenges and future perspectives for pan-genomic approaches in rice improvement and sustainable agriculture.

2 The Concept and Evolution of Plant Pan-genomes

2.1 Definition and structure of pan-genomes

A pan-genome encompasses the complete set of genomic elements (genes, regulatory sequences, structural variants, etc.) present in a species’ population. It is typically defined in terms of two components: the core genome and the variable (or dispensable) genome. The core genome consists of genes found in all individuals of the species, representing functions that are presumably essential or ubiquitous. In contrast, the variable genome comprises genes or sequences that are present in some individuals and absent in others. These variable genes can include those gained or lost during evolution, often imparting specialized traits (for example, adaptation to specific stresses or environments) (Bayer et al., 2020). The pan-genome can be thought of as a union of all genes across all genomes of the species.
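
The core/variable partition described above can be illustrated with a minimal Python sketch. It assumes that gene content has already been reduced to one set of gene identifiers per accession. The accession names, gene identifiers, and the function name `partition_pan_genome` are hypothetical and chosen only for illustration; real pan-genome construction relies on assembly, annotation, and orthology clustering steps that are not shown here.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(
    gene_sets: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, variable) gene sets for the given accessions."""
    if not gene_sets:
        raise ValueError("at least one accession is required")
    all_sets = list(gene_sets.values())
    pan = set().union(*all_sets)        # union of all genes across accessions
    core = set.intersection(*all_sets)  # genes present in every accession
    variable = pan - core               # genes missing from at least one accession
    return pan, core, variable


# Hypothetical presence data: accession name -> gene identifiers (not real annotations).
accessions = {
    "japonica_ref": {"geneA", "geneB", "geneC"},
    "indica_landrace": {"geneA", "geneB", "geneD"},
    "aus_line": {"geneA", "geneC", "geneD", "geneE"},
}

pan, core, variable = partition_pan_genome(accessions)
print("pan:", sorted(pan))
print("core:", sorted(core))
print("variable:", sorted(variable))
```

In this toy data set, `geneD` and `geneE` are absent from the hypothetical reference accession, so an analysis restricted to `japonica_ref` would not see them. This is the kind of variation that the variable genome is meant to capture.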