
Molecular Pathogens, 2025, Vol.16, No.4, 147-158
http://microbescipublisher.com/index.php/mp

6 High-throughput Sequencing (HTS) and Metagenomics
6.1 The advantages of HTS in the identification of unknown viruses and data processing flow
High-throughput sequencing (HTS, also known as next-generation sequencing) is a technology that determines millions of nucleic acid sequences in parallel. Applied to nucleic acids extracted from plant samples, it can in principle detect every viral sequence present without prior knowledge of the virus species, which makes HTS a powerful tool for discovering new and unknown viruses. Metatranscriptome sequencing of total RNA from potato samples can reveal unexpected sequences in the data and thereby identify previously unreported viruses (Kenzhebekova et al., 2024). Unlike traditional PCR or serological methods, which detect only known viruses, HTS captures any viral sequence in a sample in an unbiased manner.
In recent years, a variety of new potato viruses and virus-like agents have been identified with HTS, both domestically and internationally. For example, researchers discovered a new type of potato virus Y in papaya samples from Jordan through HTS, and whole-genome sequencing confirmed it to be a new virus. In Yunnan, China, researchers have also used HTS to test new potato varieties for viruses and found them to be latently infected with two emerging viruses, tomato spotted wilt virus (TSWV) and tomato zonate spot virus (TZSV), which had not previously been reported on potato (Pacheco-Dorantes et al., 2025). These findings show that HTS can overcome the limitations of traditional detection and extend virome analysis to previously unknown viruses.
The general workflow for HTS-based virus identification comprises extraction of total RNA or small RNA from the sample, library preparation and sequencing, bioinformatic analysis, and sequence alignment and annotation. Among these, bioinformatic analysis is the key step.
The massive number of reads obtained from sequencing is assembled into longer contigs, which are then compared with known virus databases to identify sequences of viral origin.

6.2 Bioinformatics analysis methods and database construction
High-throughput sequencing generates an extremely large amount of data, which must be mined with bioinformatics tools. For potato metagenomic sequencing data, a commonly used analysis workflow includes data quality control, host sequence removal, and viral sequence assembly and annotation. Low-quality bases and adapter sequences are first removed from the sequencing reads with quality control software such as Trimmomatic (Lambert et al., 2018). The clean reads are then aligned to the potato reference genome and reads of plant origin are filtered out, so that the retained non-host reads may contain viral sequences. Next, de novo assembly software (such as SPAdes or Megahit) is used to assemble these reads into contigs. The resulting contigs are compared with a virus database; BLASTn or BLASTx can be used to search them against the NCBI viral genome database for similar sequences (Zhang et al., 2025). Generally speaking, if a contig shows high sequence similarity (e.g. >90% identity) to a known viral genome and covers most of that genome, the virus can be judged to be present in the sample. A contig without an obvious homolog may represent a new viral sequence. For such candidates, the open reading frames and conserved motifs can be predicted to determine the taxonomic position of the virus. To improve the sensitivity of virus identification, specialized software such as VirusDetect and VirFinder has been developed in recent years, some of which uses k-mer features and machine learning to identify viral sequences in complex data.
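The identity-and-coverage rule described in Section 6.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the cited studies. It assumes BLASTn results in the standard 12-column tabular format (-outfmt 6), a hypothetical subject_lengths dictionary that supplies the length of each reference genome, and an assumed coverage cutoff of 50% standing in for "most of its sequence". The function names and the example identifiers are illustrative only.

```python
from collections import defaultdict


def covered_length(intervals):
    """Total length covered by 1-based inclusive intervals, merging overlaps."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered


def classify_contigs(blast_lines, subject_lengths, contig_ids,
                     min_identity=90.0, min_coverage=0.5):
    """Apply an identity/coverage rule to BLAST tabular (-outfmt 6) hits.

    Returns (detected, unmatched):
      detected  - {reference id: fraction of the reference covered by
                   hits with identity >= min_identity}
      unmatched - contigs with no hit at all (candidate novel sequences)
    The thresholds are assumptions, not values fixed by the article.
    """
    spans = defaultdict(list)
    contigs_with_hits = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid, pident = fields[0], fields[1], float(fields[2])
        sstart, send = int(fields[8]), int(fields[9])
        contigs_with_hits.add(qseqid)
        if pident >= min_identity:
            # Reverse-strand hits have sstart > send, so normalise the order.
            spans[sseqid].append((min(sstart, send), max(sstart, send)))

    detected = {}
    for sseqid, intervals in spans.items():
        coverage = covered_length(intervals) / subject_lengths[sseqid]
        if coverage >= min_coverage:
            detected[sseqid] = coverage

    unmatched = [c for c in contig_ids if c not in contigs_with_hits]
    return detected, unmatched


if __name__ == "__main__":
    # Hypothetical hits: two contigs tile a reference, one is a distant match.
    example = [
        "contig1\tVirusA\t98.5\t500\t7\t0\t1\t500\t1\t500\t0.0\t900",
        "contig2\tVirusA\t95.0\t400\t20\t0\t1\t400\t800\t401\t0.0\t700",
        "contig3\tVirusB\t75.0\t200\t50\t0\t1\t200\t1\t200\t1e-20\t150",
    ]
    lengths = {"VirusA": 1000, "VirusB": 1000}
    print(classify_contigs(example, lengths,
                           ["contig1", "contig2", "contig3", "contig4"]))
```

Coverage is computed on the reference rather than on the contig because the rule in the text asks whether the contigs cover most of the known genome. Contigs that have only low-identity hits are neither reported as detected nor listed as unmatched; in practice they would be candidates for the BLASTx and open reading frame analysis mentioned above.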
6.3 Case analysis: using HTS to discover latent viruses in new potato varieties in Yunnan, China
Yunnan is located on a plateau with diverse ecological environments and is a region rich in potato germplasm resources. While promoting a new high-yield potato variety, a breeding unit found that the variety grew poorly in the field although no obvious symptoms were visible. To check for latent virus infection, researchers carried out high-throughput sequencing analysis on asymptomatic plant samples of this variety. Total RNA sequencing and bioinformatic analysis identified two important foreign virus sequences in the samples: one is Tomato spotted wilt virus (TSWV), a member of the genus Tospovirus, and the other is Tomato zonate spot virus (TZSV). These two viruses have mainly infected other Solanaceae crops in the past and are not common in potatoes in Yunnan, but because they are transmitted by thrips, potatoes may become infected in mixed-cropping environments (Figure 2) (Yang et al., 2023; Dong et al., 2024). Further qPCR tests confirmed that TSWV and TZSV nucleic acids were present in multiple samples of the new variety, yet the plants showed no typical symptoms, indicating latent infection. Without HTS, these hidden mixed infections are
