Bt_2025v16n4

Bt Research 2025, Vol.16, No.4, 136-146 http://microbescipublisher.com/index.php/bt

Antonets et al. used SPAdes to assemble Illumina sequencing data and constructed a draft genome of a Bt strain of about 6.5 Mb, which contained chromosomal sequence and several virulence plasmids. For PacBio or Nanopore long-read data, assembly tools such as Canu, Flye or HGAP can be used; these produce longer contigs using overlap-graph algorithms and can often yield closed circular genomic sequences. In hybrid assembly scenarios, software such as Unicycler can process short-read and long-read data together, producing high-quality assemblies. Once a preliminary assembly is obtained, its completeness and accuracy need to be evaluated. Common evaluation tools include QUAST (which compares the assembly against a reference genome or read alignments and reports metrics such as N50 and missing regions) and CheckM (which assesses genome completeness based on the detection of single-copy core genes).

3.2 Functional gene annotation platform

After genome assembly is completed, the sequence needs to be functionally annotated to identify the coding genes and their functions. Bt genome annotation often relies on a comprehensive automatic annotation platform or software pipeline. For example, the Prokaryotic Genome Annotation Pipeline (PGAP) provided by NCBI can automatically annotate a submitted Bt genome, identify coding sequences (CDS) and tRNA/rRNA genes, and align the predicted proteins against libraries of known proteins to predict gene function. Another commonly used platform is the RAST annotation server, which provides an automatic annotation process suited to bacterial genomes and can quickly generate results such as functional category classification of genes and metabolic pathway reconstruction for the Bt genome. During the annotation process, the important functional genes of Bt should also be compared against specialized databases.
For example, predicted virulence-related protein sequences can be searched against the Bt toxin protein database of the Bacterial Pesticidal Protein Resource Center (BPPRC) to confirm whether they belong to a known Cry/Cyt/Vip protein family; potential virulence and drug-resistance genes can be queried in VFDB (the Virulence Factor Database) and CARD (the Comprehensive Antibiotic Resistance Database) to establish the virulence factor profile of the strain. For metabolites, antiSMASH and similar tools can be used to scan the Bt genome, automatically identifying, classifying and annotating secondary metabolite gene clusters such as glycan peptides, non-ribosomal peptides, proteotoxins, etc.

3.3 Automatic identification tool for gene families and characteristic genes

The Bt genome contains a number of important gene families and characteristic genes, and identifying them rapidly is very important for evaluating strain virulence. Traditional methods rely on manual BLAST comparison, but specialized automated mining tools have since been developed. A typical example is BtToxin_Digger, a tool developed by researchers in China. It has a built-in database of Bt insecticidal toxin genes and can mine assembled Bt genomes or raw sequencing data, identify toxin gene sequences such as cry, cyt, and vip, and predict their classification down to subclass (Liu et al., 2020). Liu et al. used BtToxin_Digger to automatically identify tens of thousands of toxin-protein-encoding genes from thousands of raw Bt genomic datasets, including many new candidate Cry toxins.

In addition to toxin genes, other functional gene families in the Bt genome can also be identified with the help of bioinformatics tools. For example, homologous gene analysis tools such as OrthoFinder can be used to cluster the genes of multiple Bt strains in order to discover shared or unique gene families. Through this kind of whole-genome comparison, Shikov et al. found that some classic virulence genes (such as extracellular hemolysin, intrinsic protein Inh, etc.) are conserved in almost all Bt strains, while certain regulatory factors or metabolic enzyme genes occur only in specific evolutionary branches (Shikov et al., 2020). Existing databases can also be used to scan for virulence factors. For example, VFDB contains a wide variety of bacterial virulence-related gene sequences, so comparing a Bt genome against VFDB yields a list of all the known virulence genes it carries in a single pass.

4 Gene Expression and Transcriptome Analysis Tools

4.1 RNA-Seq data analysis process and software

To study the expression and regulation of Bt genes, RNA sequencing (RNA-Seq) technology has been widely used in Bt transcriptome analysis. Typical Bt RNA-Seq data analysis procedures include: read quality control, alignment to the reference genome, transcript assembly and quantification, and differential expression
