CMB_2025v15n3

Computational Molecular Biology 2025, Vol.15, No.3, 112-121 http://bioscipublisher.com/index.php/cmb

visualizing features, the sequence fragments that the model relies on most can be identified. A common practice is to compute contribution scores of the input sequence to the output, for example with gradient-based methods or DeepLIFT. These scores directly highlight the bases that most influence the prediction and thereby indicate which signals the model relies on (Xiao et al., 2025). Transformers with attention mechanisms offer a further route: their attention weights reveal the regions the model attends to and can even suggest regulatory interactions between distal enhancers and promoters (Figure 2) (Liu et al., 2024). In practice, the key motifs identified by the model are often concentrated in open chromatin and overlap with expression quantitative trait loci (eQTLs), which gives them clear biological significance. With these interpretability tools, the basis of the model's predictions can be visualized, making the results more convincing.

Figure 2 Workflow of TF-EPI (Adopted from Liu et al., 2024)
Image caption: (A) Cell type-specific EPI detection network structure, which generally comprises four steps: tokenization, sequence embedding, feature extraction, and classification. (B) The process of de novo motif discovery. (C) Model extension for cross-cell-type EPI detection. A domain discriminator is used during training to determine whether the input data come from the source cell line or the target cell line (Adopted from Liu et al., 2024)

5.3 Result verification and experimental data comparison

It is now widely agreed that, however reliable a prediction appears, it must be backed by real data and experiments. Typically, the model's outputs are compared with experimental measurements that played no part in training to see whether they agree.
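The held-out comparison described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the cited studies: it assumes predictions and measurements (for example, log fold changes in expression) are keyed by a variant ID, that agreement is summarized with Pearson and Spearman correlation (a common choice), and that only variants absent from the training set are scored. The function names, variant IDs, and toy values are all hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence


def pearson_r(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured values."""
    if len(pred) != len(obs) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    if sd_p == 0 or sd_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_p * sd_o)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson_r(ranks(pred), ranks(obs))


def held_out_agreement(
    pred_by_variant: Dict[str, float],
    measured_by_variant: Dict[str, float],
    train_ids: Iterable[str],
) -> Dict[str, float]:
    """Score only variants that were measured, predicted, and NOT used in training."""
    excluded = set(train_ids)
    test_ids = sorted(
        v for v in measured_by_variant if v not in excluded and v in pred_by_variant
    )
    pred = [pred_by_variant[v] for v in test_ids]
    obs = [measured_by_variant[v] for v in test_ids]
    return {
        "n": len(test_ids),
        "pearson": pearson_r(pred, obs),
        "spearman": spearman_rho(pred, obs),
    }


if __name__ == "__main__":
    # Hypothetical toy values, for illustration only.
    predicted = {"v1": 0.5, "v2": -1.2, "v3": 0.1, "v4": 0.9, "v5": -0.3, "v6": 1.4}
    measured = {"v1": 0.2, "v2": -1.0, "v3": 0.3, "v4": 0.7, "v5": -0.4, "v6": 1.1}
    print(held_out_agreement(predicted, measured, train_ids={"v1"}))
```

Reporting both coefficients is a design choice: Pearson rewards agreement in effect size, while Spearman only asks whether the model ranks variants in the right order, which is often what matters when prioritizing mutations for follow-up experiments.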
In human studies, large-scale reporter gene assays have been used to test the model's predictions of how mutations affect expression (Avsec et al., 2021), and Enformer's predictions agreed closely with the experimental measurements. There are similar examples in plants: researchers perform site-directed mutagenesis on the promoter elements that
