CMB_2025v15n6

Computational Molecular Biology 2025, Vol.15, No.6, 273-281 http://bioscipublisher.com/index.php/cmb

assembly using only short-read sequences, hybrid assembly usually results in fewer contigs and higher N50 values, indicating a more complete and contiguous genome. Checking that the GC content matches the value expected for the species can further support the reliability of the assembly (Zhang et al., 2021b).

3.3 Application of BUSCO, QUAST and other tools for completeness assessment

Assessment of assembly completeness and quality relies mainly on tools such as BUSCO and QUAST, which provide biological and technical indicators, respectively. BUSCO estimates the completeness and redundancy of the genome by searching for near-universal single-copy orthologs, giving a direct view of how complete the gene content is. QUAST reports assembly statistics, including contig counts, N50 values, and misassemblies, which allows structural errors to be detected. Other tools such as CheckM2 use machine learning to predict completeness and contamination, especially for metagenome-assembled genomes. Combining these assessment methods helps ensure that the assembled genome is structurally accurate and biologically complete, which is crucial for subsequent comparative genomics and functional analysis of extremophilic microorganisms (Manni et al., 2021; Chklovski et al., 2022).

4 Genome Annotation and Functional Analysis

4.1 Coding sequence prediction and structural annotation (Prokka, RAST, etc.)

Understanding the genetic inventory of an extremophilic microorganism usually starts with identifying its coding regions. Researchers often turn directly to toolkits such as Prokka and RAST, which save time and make later standardization easier.
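The assembly statistics discussed in Section 3.3 (contig count, N50 and GC content) follow simple definitions, and a minimal sketch may help make them concrete. The function names below (`n50`, `gc_fraction`, `assembly_summary`) are hypothetical and chosen for illustration only; this is not QUAST's implementation, which applies its own filtering and reports many more metrics.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        return 0
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    return 0


def gc_fraction(sequences: List[str]) -> float:
    """Return the fraction of G and C bases over all bases in the sequences."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in sequences)
    return gc / total


def assembly_summary(contigs: Dict[str, str]) -> Dict[str, float]:
    """Summarise an assembly given as {contig_name: sequence} (assumed input shape)."""
    lengths = [len(seq) for seq in contigs.values()]
    return {
        "num_contigs": len(lengths),
        "total_length": sum(lengths),
        "n50": n50(lengths),
        "gc_fraction": gc_fraction(list(contigs.values())),
    }


if __name__ == "__main__":
    # Toy contigs, invented purely for illustration.
    toy = {"c1": "GC" * 5, "c2": "AT" * 4, "c3": "GGAATT"}
    print(assembly_summary(toy))
```

Comparing `gc_fraction` with the value expected for the species, and `num_contigs` and `n50` between a short-read-only assembly and a hybrid assembly, mirrors the checks described in the text; in practice these numbers would normally be taken from an established tool rather than recomputed by hand.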
The interfaces of these programs are not complicated, but behind the scenes they integrate multiple gene prediction models and databases and can generate a complete set of annotation results in a single run, including rRNAs, tRNAs, and hypothetical proteins. Prokka, for instance, has been used repeatedly on a wide range of bacterial genomes; its advantages are stability and speed (De Almeida et al., 2023). However powerful the tools, the automatically annotated "hypothetical proteins" should still be treated with caution: they usually represent unknown functions, which are precisely the most interesting part of extremophilic microorganisms.

4.2 Functional annotation and database comparisons (COG, KEGG, Pfam)

The gene list produced by automatic annotation does not say exactly what these genes do. That step still depends on comparisons against databases such as the commonly used COG, KEGG and Pfam, which focus on functional classification, metabolic networks, and domain recognition, respectively. Used together, they work like a jigsaw puzzle, filling the gaps in physiological processes with individual genes. KEGG pathway mapping, for example, helps identify the genes at key reaction nodes, while Pfam domain analysis can reveal the conserved modules in proteins (Sohail et al., 2025). Interestingly, such comparisons often uncover "module combinations" unique to extremophilic microorganisms, which tend to be functionally related to energy metabolism, nutrient utilization or environmental response.

4.3 Identification of special functional genes (salt tolerance, thermotolerance, heavy metal resistance, etc.)

Not every extremophilic microorganism carries "dramatic" stress resistance genes in its genome, but if it can survive in high-salt or high-temperature environments, the genes responsible for resistance functions are usually not far away.
Heat shock proteins, compatible solute synthases, metal transport pumps: these names may not look new, but when they appear in a particular strain as expanded families or in unusual combinations, they deserve a second look. Comparative analysis can also reveal in which bacteria these genes are "standard" and in which they were "introduced from outside", which is crucial for understanding adaptation strategies (Srivastava et al., 2017; Wang et al., 2025). Sometimes a metal-resistance protein matters for more than survival; it may also become a promising candidate for later biotechnological applications.

5 Comparative Genomic Analysis

5.1 Whole-genome alignment with related extremophilic bacteria

Sometimes, identifying the special features of an extremophilic microorganism does not require it to speak for itself: a genomic comparison with its relatives can reveal where it has retained its traditions and where it has embarked
