Computational Molecular Biology 2025, Vol.15, No.6, 273-281
http://bioscipublisher.com/index.php/cmb

2 Sample Collection and Sequencing Strategies

2.1 Sampling from extreme environments and bacterial isolation

These "durable" microorganisms cannot be found just anywhere. Most true extremophilic bacteria grow in places such as hot springs, salt lakes and acid mine sites, where conditions are harsh but the organisms thrive. Isolating bacteria from these environments, however, is not as simple as scooping up a ladle of water. Their original growth conditions have to be reproduced; otherwise, the cells "die" as soon as they reach the laboratory. It is often necessary to culture them carefully under simulated native conditions in order to select the truly tolerant isolates. Although this takes considerable effort, it is the only way to obtain pure, stable strains for subsequent genomic analysis. These steps also lay the foundation for understanding how the organisms withstand extreme stress (Verma et al., 2024).

2.2 DNA extraction and selection of sequencing platforms (Illumina, Nanopore, PacBio, etc.)

Extracting DNA from these microorganisms is easier said than done. If the cell wall is particularly tough, or if the sample contains unusual environmental impurities, conventional methods may not work. Once high-quality DNA has been extracted, the next step is to select a sequencing platform. Combining high-accuracy short reads from Illumina with long reads from ONT or PacBio, which provide long-range coverage, is currently the most common hybrid strategy. Using a single platform often yields mediocre results, especially for samples with uneven GC content or many repetitive sequences. Typically, a study first builds a preliminary assembly from long reads (ONT or PacBio) and then polishes it with Illumina reads.
This combination gives stable results and saves a considerable amount of budget (Goldstein et al., 2018; Neal-McKinney et al., 2021).

2.3 Data quality control and preprocessing approaches

Raw sequencing data cannot be used directly; quality control is the crucial next step. Adapters need to be trimmed, low-quality reads filtered out, and contaminating sequences removed. Each of these tasks must be done thoroughly, otherwise errors are likely to arise during the subsequent assembly. When multiple sequencing platforms are combined, the error rate and coverage distribution of each data set need to be examined carefully. Some long-read platforms are prone to insertion and deletion errors, and correcting them with high-accuracy short reads is a common practice. Most of these steps can now be automated, so that the whole preprocessing workflow, from raw data to assembly-ready reads, runs as a connected pipeline. This is particularly important when studying extremophilic microorganisms (De Maio et al., 2019; Olagoke et al., 2025).

3 Genome Assembly and Quality Assessment

3.1 Genome assembly strategies (de novo, hybrid, etc.)

Genome assembly of extremophilic microorganisms usually adopts a hybrid strategy that combines short-read sequencing (such as Illumina) with long-read platforms (such as Oxford Nanopore Technologies (ONT) or PacBio). Hybrid assembly tools, Unicycler in particular, combine the high accuracy of short reads with the long-range contiguity of long reads, and so produce more complete and contiguous genomes than de novo assembly based on a single technology. This approach can resolve the complex genomic regions, repetitive sequences and structural variations that are common in extremophilic microorganisms, making chromosome-level assembly achievable and improving the accuracy and completeness of the result.
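
As an illustration of the hybrid strategy described above, the following is a minimal sketch of how a Unicycler hybrid run might be launched from Python. It is not taken from the cited studies: the helper name `build_unicycler_command`, the file names and the thread count are hypothetical, and the sketch only builds the command line, using Unicycler's `-1`/`-2` (paired short reads), `-l` (long reads), `-o` (output directory) and `-t` (threads) options.

```python
import shlex


def build_unicycler_command(short_r1, short_r2, long_reads, out_dir, threads=8):
    """Build the argument list for a Unicycler hybrid assembly.

    Hypothetical helper for illustration: it only assembles the command
    and does not run it, so no Unicycler installation is required here.
    """
    return [
        "unicycler",
        "-1", str(short_r1),    # Illumina forward reads
        "-2", str(short_r2),    # Illumina reverse reads
        "-l", str(long_reads),  # ONT or PacBio long reads
        "-o", str(out_dir),     # output directory for the assembly
        "-t", str(threads),     # number of CPU threads
    ]


if __name__ == "__main__":
    # Placeholder file names (assumptions, not from the paper).
    cmd = build_unicycler_command(
        "isolate_R1.fastq.gz",
        "isolate_R2.fastq.gz",
        "isolate_ont.fastq.gz",
        "unicycler_out",
        threads=4,
    )
    # Print the shell-quoted command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps file names with spaces intact if the list is later passed to `subprocess.run`.
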
Studies comparing different strategies have shown that hybrid assembly outperforms short-read-only and long-read-only assembly in contiguity and genome completeness, while also balancing sequencing cost and DNA input requirements (Wick et al., 2017; Chen et al., 2020).

3.2 Evaluation of contigs/scaffolds and statistics on N50 and GC content

Assembly quality is usually evaluated with indicators such as the number of contigs or scaffolds, the N50 value, and the distribution of GC content. A higher N50 value indicates greater contiguity: the assembled sequences are longer and represent the genomic structure better. The genomes of extremely thermophilic bacteria often differ markedly in GC content, which poses challenges for assembly algorithms; evaluating the GC content therefore helps to verify the accuracy of the assembly and to detect potential biases. Compared with