Computational Molecular Biology 2025, Vol.15, No.2, 102-111 http://bioscipublisher.com/index.php/cmb

the target sequence (Sun, 2023). Owing to its mature algorithm and extensive databases, BLAST is useful for the initial screening of potential off-target sites. However, BLAST was not designed specifically for CRISPR: its results do not adequately account for the PAM sequence or for the position-dependent weight of mismatches, so it can easily produce false positives or false negatives. It is therefore better suited to early exploration or rough screening.

3.2 Application of sequence alignment tools such as Bowtie in off-target prediction
Bowtie is a fast short-sequence alignment tool that matches queries efficiently against large-scale data (Guo and Zhen, 2020). Compared with BLAST, Bowtie is faster and uses less memory, and it is therefore widely used in high-throughput off-target prediction research. Researchers usually supply sgRNA sequences as queries, search the genome for similar sites with Bowtie, and then filter the hits by the presence of a PAM sequence to identify potential off-target sites. Tools such as BWA are also used for alignment-based prediction; their advantage lies in the flexible setting of mismatch tolerance.

3.3 Advantages and limitations of sequence alignment method
The main advantages of sequence alignment methods are their simple principle, rapid calculation, and convenient operation, which make them suitable for the preliminary screening of potential off-target sites. Their limitations are equally clear: they cannot accurately distinguish how mismatches at different positions affect cleavage efficiency; they do not consider epigenetic factors such as chromatin accessibility and DNA methylation; and they are not driven by experimental data, which limits their prediction accuracy (Zhang and Jiang, 2022).
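The alignment-style screening described above (search for sites similar to the sgRNA, then keep those next to a PAM) can be illustrated with a minimal Python sketch. It is a brute-force scan, not the indexed algorithm that Bowtie or BWA actually use, and it rests on assumptions that are not stated in the text: a 20-nt protospacer, an SpCas9-style NGG PAM immediately 3' of the site, forward strand only, and Hamming distance as the mismatch measure. The function names and the toy sequences are hypothetical.

```python
# Minimal sketch of alignment-style off-target screening (brute force).
# Assumptions: NGG PAM directly 3' of the site, forward strand only,
# mismatches counted as Hamming distance. Names are illustrative.

def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def has_ngg_pam(pam: str) -> bool:
    """Return True if a 3-nt string matches NGG (N = any base)."""
    return len(pam) == 3 and pam[1:] == "GG"


def find_candidate_sites(genome: str, guide: str, max_mismatches: int = 3):
    """Return (position, site, mismatches) for each genome window that is
    followed by an NGG PAM and differs from the guide at no more than
    max_mismatches positions."""
    n = len(guide)
    hits = []
    for i in range(len(genome) - n - 3 + 1):
        site = genome[i:i + n]
        pam = genome[i + n:i + n + 3]
        if not has_ngg_pam(pam):
            continue  # PAM filter, as in the screening step described above
        mm = hamming(site, guide)
        if mm <= max_mismatches:
            hits.append((i, site, mm))
    return hits


# Toy usage with made-up sequences (not real genomic data).
guide = "GACGTTAGCCATGCAATCGA"
off = "TACGTTAGCCCTGCAATCGA"  # differs from the guide at positions 0 and 10
genome = "TTT" + guide + "AGG" + "CCCC" + off + "TGG" + "AAAA"
print(find_candidate_sites(genome, guide))
# [(3, 'GACGTTAGCCATGCAATCGA', 0), (30, 'TACGTTAGCCCTGCAATCGA', 2)]
```

Note that the sketch counts every mismatch equally regardless of its position, which is exactly the limitation of alignment-only screening discussed in this section.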
Therefore, sequence alignment methods are usually combined with other computational methods to build a more comprehensive prediction system.

4 Prediction Methods Based on Rules and Machine Learning
4.1 Rule-based off-target scoring algorithms (such as MIT algorithms, etc.)
The MIT algorithm is one of the earlier rule-based off-target prediction methods. Its core idea is to weight each mismatch according to how strongly its position affects cleavage efficiency. It is generally believed that mismatches closer to the PAM end reduce cleavage activity more strongly, whereas mismatches farther from the PAM are better tolerated (Chao and Fei, 2023). The method is concise and intuitive and incorporates empirical rules, so it was widely used in the early days. However, its weight parameters rely on limited experimental data and lack universality (Fan and Xu, 2021).

4.2 CFD off-target scoring model and optimization
The cutting frequency determination (CFD) model improves on the MIT algorithm by integrating more experimental data, in particular on the influence of both the type and the position of base mismatches. By statistically analyzing a large number of experimental results, the CFD model assigns a more reasonable probability value to each mismatch combination and thereby predicts off-target risk more accurately. CFD has been integrated into multiple CRISPR design platforms, such as CRISPOR and CRISPR-DO, and is widely used by researchers.

4.3 Application of machine learning models in off-target prediction
With the accumulation of data and the development of algorithms, researchers began to introduce machine learning methods for off-target prediction (Anuradha et al., 2024). Common models include support vector machines (SVM), random forests (RF), and logistic regression.
These methods learn patterns from large sets of known off-target and non-off-target sites and build classifiers that determine whether a new sequence is a potential off-target site (Figure 1) (Toufikuzzaman et al., 2024). Compared with traditional rule-based methods, machine learning models can consider multiple features simultaneously, such as the number and distribution of mismatches, GC content, and the PAM sequence, thereby improving prediction accuracy. However, machine learning methods depend on high-quality training data and may overfit (Choubisa, 2024).
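To make the classifier idea concrete, the following is a minimal sketch of a logistic regression trained on the kinds of features listed above (mismatch count, mismatch distribution, GC content, PAM). It is not the method of any cited study. Everything specific in it is an assumption made for illustration: the 10-nt PAM-proximal window (`SEED_LEN`), the NGG PAM check, the guide orientation (5' to 3' with the PAM after the 3' end), the learning rate and epoch count, and above all the six labelled examples, which are invented and carry no experimental meaning.

```python
import math

# Toy sketch only: hand-written logistic regression on invented examples.
SEED_LEN = 10  # assumed size of the PAM-proximal window (illustrative)


def featurize(guide, site, pam):
    """Encode one guide/site pair as four numeric features."""
    n = len(guide)
    mismatches = [i for i in range(n) if guide[i] != site[i]]
    # Assumption: the PAM follows the 3' end, so the last SEED_LEN
    # positions of the guide are the PAM-proximal ones.
    seed_mm = sum(1 for i in mismatches if i >= n - SEED_LEN)
    gc = sum(1 for base in site if base in "GC") / n
    return [
        len(mismatches) / n,              # overall mismatch fraction
        seed_mm / SEED_LEN,               # mismatch density near the PAM
        gc,                               # GC content of the site
        1.0 if pam[1:] == "GG" else 0.0,  # NGG PAM indicator
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Predicted probability that feature vector x is an off-target site."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights by per-example gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            error = predict_proba(weights, bias, x) - label
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


GUIDE = "GACGTTAGCCATGCAATCGA"
TOY_EXAMPLES = [  # (site, PAM, label); invented, NOT experimental data
    (GUIDE, "AGG", 1),
    ("GTCGTTAGCCATGCAATCGA", "TGG", 1),  # 1 PAM-distal mismatch
    ("AATGTTAGCCATGCAATCGA", "CGG", 1),  # 2 PAM-distal mismatches
    ("GACGTTAGCCATGCTTACGT", "AGG", 0),  # 4 PAM-proximal mismatches
    ("GACGTTAGCCATCCTTACGT", "TGG", 0),  # 5 PAM-proximal mismatches
    (GUIDE, "AAT", 0),                   # perfect match, no NGG PAM
]
X = [featurize(GUIDE, site, pam) for site, pam, _ in TOY_EXAMPLES]
y = [label for _, _, label in TOY_EXAMPLES]
weights, bias = train_logistic(X, y)
for (site, pam, label), x in zip(TOY_EXAMPLES, X):
    print(site, pam, label, round(predict_proba(weights, bias, x), 3))
```

With only six examples the model simply memorizes the training set, which illustrates the overfitting concern raised above: a realistic study would need far more labelled sites and an evaluation on held-out data.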