CMB_2025v15n6

Computational Molecular Biology 2025, Vol.15, No.6, 299-306 http://bioscipublisher.com/index.php/cmb
preference often reveal whether a gene "resembles a local". These methods flag possible horizontal transfer by looking for sequence patterns that are out of step with the rest of the host genome (Ravenhall et al., 2015). Some methods do not even require prior knowledge of codon boundaries; they identify foreign genes from nucleotide composition alone (Tsirigos & Rigoutsos, 2005). Compositional methods are usually fast and well suited to large-scale genomic data. However, ancient transfer events are sometimes hard to detect, because the transferred genes have gradually been "assimilated" to the host's composition during evolution.
3.3 Machine learning and network models
As data volumes grow, researchers increasingly incorporate machine learning into HGT analysis. The idea is not mysterious: genomic, environmental, and functional information is combined so that the model can learn which genes were likely transferred, from which donors, and to which recipients. In recent years, methods that combine knowledge graphs with graph neural networks have performed well in predicting HGT associated with the spread of resistance (Islam et al., 2025), and can be regarded as a new trend in this line of research. More "old-fashioned" but still practical pipelines also remain in use. HGTector, for example, identifies genes that do not resemble the host's own by examining the distribution of BLAST hits, and it is relatively insensitive to confounding factors such as gene loss and variation in evolutionary rate (Zhu et al., 2014). The value of these tools is twofold: they support the processing of large-scale data, and they make it easier to understand how genes flow within microbial communities.
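The BLAST-hit-distribution idea described above for HGTector can be illustrated with a minimal Python sketch. This is a simplified illustration and not HGTector's actual implementation: the three taxonomic groups (`self`, `close`, `distal`), the function names, the input tuple format, and the fixed cutoff `max_close_fraction` are all assumptions made for this example. For each gene, bit scores are summed per group, and genes whose hits outside the host group come almost entirely from distant taxa are flagged as candidates.

```python
from collections import defaultdict

# Hypothetical taxonomic grouping of BLAST subjects relative to the host genome.
SELF, CLOSE, DISTAL = "self", "close", "distal"


def group_weights(hits):
    """Sum bit scores per gene and per taxonomic group.

    `hits` is an iterable of (gene_id, group, bit_score) tuples, where
    `group` is one of SELF, CLOSE or DISTAL (an assumed input format).
    """
    weights = defaultdict(lambda: {SELF: 0.0, CLOSE: 0.0, DISTAL: 0.0})
    for gene_id, group, bit_score in hits:
        weights[gene_id][group] += bit_score
    return dict(weights)


def flag_candidates(weights, max_close_fraction=0.2):
    """Flag genes whose non-self hits are dominated by distal taxa.

    The cutoff is an illustrative assumption, not a published value.
    """
    candidates = []
    for gene_id, w in sorted(weights.items()):
        non_self = w[CLOSE] + w[DISTAL]
        if non_self == 0:
            continue  # no hits outside the host group, nothing to compare
        if w[CLOSE] / non_self <= max_close_fraction:
            candidates.append(gene_id)
    return candidates


# Toy data: geneB has strong distal hits but almost no close relatives.
hits = [
    ("geneA", SELF, 500.0), ("geneA", CLOSE, 450.0), ("geneA", DISTAL, 100.0),
    ("geneB", SELF, 480.0), ("geneB", CLOSE, 20.0), ("geneB", DISTAL, 400.0),
    ("geneC", SELF, 300.0),
]
weights = group_weights(hits)
candidates = flag_candidates(weights)
print(candidates)
```

With the toy data, only `geneB` is flagged. A fixed cutoff is used here purely for brevity; in practice, cutoffs would need to be derived from the data rather than chosen by hand.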
4 Data Generation and Preprocessing for Soil HGT Studies
4.1 Metagenomic sampling strategies for soil microbiomes
The first thing researchers usually acknowledge when sampling soil for metagenomics is that soil is highly heterogeneous. Microbial communities at different depths and locations can differ completely, so sampling is often stratified by layer and by region. The rhizosphere, where microorganisms interact particularly frequently, is a "key point" that is almost never overlooked during sampling (Brito, 2021). Some teams also design the timing of sampling, for example by sampling repeatedly in the period after manure application, because manure may introduce new plasmids and mobile genetic elements and cause a short-lived "small peak" of HGT (Macedo et al., 2022). These designs may seem complex, but their core purpose is simple: to capture as many recent gene exchanges in the soil as possible, along with the traces left by earlier ones.
4.2 Assembly and binning challenges in complex metagenomic datasets
Assembly and binning are almost unavoidable challenges when working with soil metagenomes. Soil microorganisms are both abundant and highly diverse. Combined with similar sequences among closely related species, subtle strain-level differences, and uneven sequencing coverage, this makes it much harder than it sounds to reconstruct complete genomes and assign them to the correct taxonomic units (Song et al., 2019). These uncertainties directly affect HGT detection, especially for recent transfer events in which few sequence differences have accumulated; such events are easily masked by assembly errors or fragmented sequences.
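One common way to gauge how fragmented an assembly is before interpreting HGT calls made on it is the N50 statistic. The sketch below is an illustration added here, not part of any pipeline named in the text; the function name `n50` and the example contig lengths are assumptions.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the largest length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: 1500 bases in total; the two longest contigs (500 + 400)
# already cover more than half, so the N50 is 400.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

A low N50 relative to typical gene-cluster lengths suggests that many contigs are too short to show a gene together with its flanking regions, which is one reason fragmented assemblies make recent transfers harder to confirm.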
To reduce these problems, tools such as MetaCHIP have been developed. MetaCHIP combines best-match and phylogenetic information, enabling the identification of potential HGTs at the community scale without relying on reference genomes (Figure 1).
4.3 Functional and taxonomic annotation pipelines supporting HGT discovery
After sequence assembly is complete, whether transferred genes can actually be singled out depends largely on the subsequent functional and taxonomic annotation. Researchers usually compare the sequences against curated databases to determine what functions they encode and which types of microorganisms they might belong to. This not only helps to identify fragments carrying resistance genes or mobile elements, but also links gene transfer to its role in the ecosystem. Automated pipelines such as nf-core/hgtseq standardize the processing of sequencing data from different sources, making subsequent HGT analyses easier to compare (Carpanzano et al., 2022). The clearer the annotations, the easier it is to understand how genes move within the community and how they affect its functions.
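As a minimal illustration of how annotations can be used to pick out fragments carrying resistance genes or mobile elements, the Python sketch below groups annotation labels by contig and reports contigs that carry both. The category labels, the tuple format, and the function names are hypothetical; a real pipeline would derive the categories from database matches rather than hand-written labels.

```python
from collections import defaultdict

# Hypothetical annotation categories used only for this example.
ARG = "resistance_gene"
MGE = "mobile_element"


def summarize_contigs(annotations):
    """Collect the set of annotation categories seen on each contig.

    `annotations` is an iterable of (contig_id, gene_id, category) tuples
    (an assumed input format).
    """
    by_contig = defaultdict(set)
    for contig_id, _gene_id, category in annotations:
        by_contig[contig_id].add(category)
    return dict(by_contig)


def contigs_with_arg_and_mge(by_contig):
    """Return contigs annotated with both a resistance gene and a mobile element."""
    return sorted(c for c, cats in by_contig.items() if {ARG, MGE} <= cats)


annotations = [
    ("contig_1", "g1", ARG),
    ("contig_1", "g2", MGE),
    ("contig_2", "g3", ARG),
    ("contig_3", "g4", "other"),
    ("contig_3", "g5", MGE),
]
by_contig = summarize_contigs(annotations)
flagged = contigs_with_arg_and_mge(by_contig)
print(flagged)
```

Co-occurrence on one contig is only a starting point for follow-up: it does not by itself show that a transfer occurred, and fragmented assemblies can split a resistance gene from its neighbouring mobile element.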
