weight of the model corresponds to. This lack of interpretability makes it difficult for biologists to fully trust model predictions. Furthermore, some of the patterns captured by the model may merely reflect correlations in the training data rather than true causal biological mechanisms, posing a risk of overfitting to data bias (Shahid, 2023). Therefore, enhancing the transparency and interpretability of model predictions, and verifying their consistency with known biological principles, will remain a significant challenge in the future.

Figure 2 Performance of the model trained on the S2648 (pink) dataset and of the model trained on the MegaTrain (blue) dataset, evaluated on the same subsets of the validation datasets (Adapted from Pak et al., 2023)

6.3 Computational cost and resource constraints
Training and deploying large pre-trained models require enormous computing resources and specialized hardware. Training a Transformer model on hundreds of millions of sequences not only takes weeks to months but also consumes a large amount of electrical energy. For ordinary research teams, there are resource bottlenecks in