Cotton Genomics and Genetics 2025, Vol.16, No.5, 210-221
http://cropscipublisher.com/index.php/cgg

Research Insight Open Access

Pan-Genome Analysis Reveals Genetic Diversity and Subgenome Dominance in Cotton

Zhen Li
Hainan Institute of Biotechnology, Haikou, 570206, Hainan, China
Corresponding email: zhen.li@hibio.org

doi: 10.5376/cgg.2025.16.0021
Received: 01 Jul., 2025 Accepted: 11 Aug., 2025 Published: 01 Sep., 2025

Copyright © 2025 Li. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Preferred citation for this article:
Li Z., 2025, Pan-genome analysis reveals genetic diversity and subgenome dominance in cotton, Cotton Genomics and Genetics, 16(5): 210-221 (doi: 10.5376/cgg.2025.16.0021)

Abstract The pan-genome concept has emerged as a powerful framework for understanding genome variability within a species, providing crucial insights into genetic diversity, adaptation, and evolution in plants. In this study, we review the landscape of cotton (Gossypium spp.) genomes through the lens of pan-genomics, with a particular focus on the role of polyploidy and subgenome dynamics. We explore the structural evolution of diploid and polyploid cotton genomes, the composition of core and dispensable genes, and the presence of lineage-specific genes and structural variants across cultivars and wild relatives. Our analysis highlights how pan-genome studies have uncovered key agronomically relevant genes absent in reference genomes and revealed extensive gene presence/absence variation (PAV), SNPs, InDels, and CNVs that contribute to trait diversity. We also examine expression bias and subgenome dominance in allopolyploid cotton, revealing regulatory asymmetries that influence fiber development, stress responses, and reproductive traits.
A focused case study on Gossypium hirsutum demonstrates the integration of genomic data from diverse accessions and the discovery of elite trait-associated genes. Finally, we discuss the implications of cotton pan-genomics for molecular breeding, biotechnology, and the development of high-yield, stress-tolerant varieties. This review underscores the transformative potential of pan-genome resources in shaping next-generation cotton improvement strategies.

Keywords Cotton pan-genome; Subgenome dominance; Genetic diversity; Polyploidy; Gossypium hirsutum

1 Introduction

Traditional reference genomes have played a significant role, and much fundamental research has started from them. Their coverage is nonetheless limited. A single assembly cannot capture the genetic differences among individuals, and some sequences are absent from the reference altogether, which is one of the reasons the concept of the "pan-genome" was later proposed. The idea is simple: rather than focusing on one representative genome, a pan-genome takes into account the genetic information of multiple individuals across an entire species. In addition to the "core genes" found in all accessions, it includes the "unique" or "rare" genes that occur in only some individuals. This integrative approach has had a major impact on plant research, particularly in identifying structural variations (such as chromosomal inversions, insertions, and deletions) and presence/absence variations (PAVs), and it has brought to light genes that had gone unnoticed before. Compared with the earlier practice of relying on a single reference genome for all analyses, the pan-genome offers a more comprehensive perspective.
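The distinction between core and dispensable genes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method from the studies cited: the accession names, gene IDs, and the function `classify_pan_genome` are hypothetical, and real pipelines derive presence/absence calls from genome assemblies or read mapping rather than from hand-written sets.

```python
def classify_pan_genome(pav):
    """Split genes into core, dispensable, and private sets.

    pav: dict mapping accession name -> set of gene IDs present in it
         (a toy presence/absence table; all names are hypothetical).
    """
    n_accessions = len(pav)

    # Count in how many accessions each gene is present.
    counts = {}
    for genes in pav.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    # Core genes are present in every accession.
    core = {g for g, c in counts.items() if c == n_accessions}
    # Dispensable genes are missing from at least one accession.
    dispensable = set(counts) - core
    # Private genes are dispensable genes found in exactly one accession.
    private = {g for g in dispensable if counts[g] == 1}
    return core, dispensable, private


# Toy data: three hypothetical accessions, with acc_A acting as the reference.
pav = {
    "acc_A": {"gene1", "gene2", "gene3"},
    "acc_B": {"gene1", "gene2", "gene4"},
    "acc_C": {"gene1", "gene3"},
}

core, dispensable, private = classify_pan_genome(pav)

# Genes in the pan-genome that a single-reference analysis would miss.
pan_genome = core | dispensable
missing_from_ref = pan_genome - pav["acc_A"]

print(sorted(core))              # ['gene1']
print(sorted(dispensable))       # ['gene2', 'gene3', 'gene4']
print(sorted(private))           # ['gene4']
print(sorted(missing_from_ref))  # ['gene4']
```

In this toy case, `gene4` exists only in `acc_B`, so an analysis anchored on `acc_A` alone would never see it. That is the kind of gene a pan-genome adds.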
Beyond a fuller catalogue of genes, the pan-genome helps researchers better understand genetic diversity, the evolution of species, and the genetic basis of important agronomic traits (Ma et al., 2021; Huang et al., 2024).

Among the many possible study systems, cotton stands out. Besides being a major global cash crop, it has a highly representative genetic background: the genus includes both diploid and allopolyploid groups, which provides excellent material for studying genome evolution. In polyploid species such as upland cotton (G. hirsutum) and sea island cotton (G. barbadense), the two subgenomes, usually called At and Dt, are not simply pieced together; during development there is a complex regulatory interplay between them. Moreover, under the long-term breeding selection