Cotton Genomics and Genetics 2025, Vol.16, No.5, 210-221
http://cropscipublisher.com/index.php/cgg

have high-quality, chromosome-level assembly data (Zhang et al., 2015). Comparing these assemblies revealed many differences, especially structural variations such as inversions and translocations, in which chromosome segments are reversed or moved to a new position. These variations may explain the differences in trait expression among cotton species (Wang et al., 2018). Pan-genome analysis and mapping further indicate that the cotton genome is not monolithic: some regions are "core" regions shared by all materials, while others are "variable modules" that differ by variety. This mosaic picture gives a clearer understanding of the genetic composition of modern cotton. More importantly, these genomic data are not only reference databases for research; they have also been used to identify key QTLs related to fiber traits and stress resistance, directly serving precision breeding and variety improvement. With these comparative assembly and map resources, breeding relies less on guesswork and more on evidence.

3 Insights from Pan-Genome Analyses in Cotton
3.1 Core, dispensable, and private genes: definitions and biological relevance
In pan-genome analysis, genes are classified into three categories to describe how they are distributed across germplasm materials: "core genes" carried by almost every material, "dispensable genes" present in some materials but not in others, and "private genes" that appear in only one or two materials. In upland cotton (G. hirsutum), the categories are defined by how often a gene is present: "soft-core genes" are found in 97% to 100% of germplasms.
"Shell genes" cover a much broader range, appearing in 1% to 97% of germplasms, while "cloud genes" appear in fewer than 1% of germplasms. The functions of these categories differ considerably. Generally speaking, core genes are the "essential modules" that maintain basic life activities, whereas the less common genes are closely related to phenotypic differences, stress resistance and environmental adaptation (Wang et al., 2022). In other words, much of the genetic difference observed among cotton populations is attributable to genes that are present in some materials and absent in others.

3.2 Discovery of lineage-specific genes and structural variants
The reference genome of upland cotton does not cover all of the species' genetic content. Through pan-genome analysis, researchers have identified over 30 000 previously unrecorded genes; for Sea Island cotton, the number is nearly 9 000. These genes are lineage-specific: they appear only in a particular branch of cotton and are absent from the others. Such genes are invisible to the traditional approach based on a single reference genome. The pan-genome can identify these "hidden members" precisely because it incorporates more samples. Beyond genes, the pan-genome also revealed large-scale structural variations (SVs), such as large insertions, deletions and inversions, numbering more than 290 000 (Li et al., 2024). Some of these are related to cotton domestication, population differentiation and trait improvement, and some are involved in reproductive isolation between lineages or in parallel selection of fiber traits. Although technical, these findings are crucial for understanding how cotton traits arise.
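The frequency-based categories from Section 3.1 and the idea of lineage-specific genes can be illustrated with a minimal Python sketch. It is not the pipeline used in the cited studies. The presence/absence matrix, the gene and lineage names, and the treatment of the 97% and 1% boundaries are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy data: 200 accessions split evenly into two lineages.
# 1 = gene present in that accession, 0 = gene absent.
lineages: List[str] = ["lineage_1"] * 100 + ["lineage_2"] * 100
pav: Dict[str, List[int]] = {
    "gene_a": [1] * 200,              # present in every accession
    "gene_b": [1] * 100 + [0] * 100,  # present in lineage_1 only
    "gene_c": [0] * 199 + [1],        # present in a single accession
}


def classify_genes(pav: Dict[str, List[int]]) -> Dict[str, str]:
    """Label each gene by the fraction of accessions that carry it.

    Thresholds follow the text: soft-core 97%-100%, shell 1%-97%,
    cloud below 1%. Assigning exactly 97% to soft-core and exactly 1%
    to shell is an assumption of this sketch.
    """
    labels: Dict[str, str] = {}
    for gene, calls in pav.items():
        freq = sum(calls) / len(calls)
        if freq >= 0.97:
            labels[gene] = "soft-core"
        elif freq >= 0.01:
            labels[gene] = "shell"
        else:
            labels[gene] = "cloud"
    return labels


def lineage_specific(pav: Dict[str, List[int]],
                     lineages: List[str]) -> Dict[str, str]:
    """Return genes whose carriers all belong to one lineage."""
    result: Dict[str, str] = {}
    for gene, calls in pav.items():
        carriers = {lineages[i] for i, call in enumerate(calls) if call}
        if len(carriers) == 1:
            result[gene] = next(iter(carriers))
    return result


labels = classify_genes(pav)
specific = lineage_specific(pav, lineages)
print(labels)
print(specific)
```

In this toy setup a gene can be both "shell" and lineage-specific (as with the hypothetical gene_b), which mirrors the point in the text that variable genes are where lineage-level differences tend to appear.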
3.3 Functional categorization of variable genes linked to agronomic traits
It was long assumed that yield and fiber quality must be controlled by the core genes that every cotton carries. As research has accumulated, this view has been changing. Many key trait-related variations are now identified through large-scale screening approaches such as genome-wide association studies (GWAS) combined with functional annotation (Jin et al., 2023). For instance, 124 trait-related presence/absence variations (PAVs; gene fragments that are present in some individuals and absent in others) have been found to be directly linked to important indicators such as fiber quality and yield. In addition, new QTLs have been discovered for traits such as flowering time and fiber strength, which previously received little attention (Joshi et al., 2023). More interestingly, many of these "useful variations" are not in the core part of the mainstream reference genome at all, but are hidden in so-called "non-essential" genes, that is, those variable genes or