CMB_2025v15n2

Computational Molecular Biology 2025, Vol.15, No.2, 102-111 http://bioscipublisher.com/index.php/cmb

Representative tools such as BLAST and Bowtie search for loci similar to the target sequence through genome-wide alignment. As research progressed, rule-based and machine learning-based methods gradually emerged, such as MIT scoring and the CFD model. These methods consider not only the position and number of base mismatches but also incorporate statistical patterns and experimental data into the model. In recent years, with the development of artificial intelligence, deep learning frameworks have been introduced into off-target prediction. Representative models such as DeepCRISPR, R-CRISPR and DL-CRISPR can improve the accuracy and generality of prediction through automatic feature extraction and learning from large-scale training data (Sherkatghanad et al., 2023).

Beyond progress in the predictive models themselves, evaluating their performance and reliability is equally crucial. Common evaluation metrics include the ROC curve, AUC, accuracy, and recall. Experimental verification methods such as GUIDE-seq and high-throughput sequencing provide solid support for computational prediction. By combining computational and experimental methods, researchers have gradually established a relatively complete off-target assessment system.

2 Introduction to the CRISPR/Cas System and Off-Target Effect Mechanism

2.1 Overview of CRISPR/Cas system
The CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) system was initially discovered in bacteria and archaea as an acquired immune mechanism. Its core function is to resist invading viruses and plasmids by recognizing and cutting foreign DNA. As research deepened, scientists transformed it into a programmable gene editing tool.
Among the Cas proteins, Cas9 is the most widely used. Guided by a single-guide RNA (sgRNA), it cleaves the genome at specific positions, thereby achieving knockout, insertion or modification of the target gene (Yuan, 2024). CRISPR/Cas9 has demonstrated highly efficient editing in crops such as rice, wheat, and corn, as well as in human cells and model animals. It therefore holds broad application prospects in molecular breeding, disease model construction, and gene therapy. In addition, new CRISPR-derived systems (such as Cas12a/Cpf1 and Cas13) are constantly being discovered and utilized, providing researchers with more diverse means of gene manipulation.

2.2 Definition and generation mechanism of off-target effects
An off-target effect is cleavage or regulation by the CRISPR/Cas system at a non-target site, resulting in unintended alterations in the genome. Its generation mechanism mainly includes the following aspects:

Base mismatch tolerance: The Cas9-sgRNA complex does not require a perfect match to the target sequence. It can tolerate a small number of mismatches, especially in the PAM-distal region, which triggers non-specific cleavage.

PAM sequence diversity: Although SpCas9 most commonly recognizes the NGG PAM, it can also recognize similar sequences such as NAG and NGA, which increases the number of potential off-target sites.

Genomic complexity: In large-genome species, numerous fragments are partially similar to the target sequence, further increasing the off-target risk.

Chromatin state and accessibility: Open chromatin regions are more easily recognized by Cas9, and cleavage may occur there even in the presence of base mismatches.

2.3 The importance of off-target effect prediction research
Off-target effects not only affect the accuracy of experiments but may also pose serious safety hazards in medical and agricultural applications.
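To make the mismatch-tolerance and PAM-diversity mechanisms of Section 2.2 concrete, the following minimal Python sketch scans a DNA string for sites that differ from a 20-nt protospacer by at most a few bases and are followed by an NGG, NAG or NGA PAM. It is an illustration only, not a method taken from the tools discussed in this review. The function names, the example sequences, the mismatch threshold of 3, and the restriction to the forward strand are all assumptions made for brevity.

```python
# Illustrative sketch only: helper names, sequences and the default
# threshold are assumptions, not part of any published tool.

# Last two bases of the PAMs named in the text (NGG, NAG, NGA);
# the first PAM base is "N" and therefore unconstrained.
ACCEPTED_PAM_SUFFIXES = ("GG", "AG", "GA")


def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_candidate_sites(genome, protospacer, max_mismatches=3):
    """Return (start, site, pam, mismatches) for forward-strand candidates.

    A window qualifies when it is followed by an accepted 3-nt PAM and
    differs from the protospacer at no more than max_mismatches positions.
    The reverse strand is not searched in this sketch.
    """
    n = len(protospacer)
    hits = []
    for i in range(len(genome) - n - 3 + 1):
        site = genome[i:i + n]
        pam = genome[i + n:i + n + 3]
        if pam[1:] not in ACCEPTED_PAM_SUFFIXES:
            continue  # no usable PAM directly downstream of this window
        mismatches = count_mismatches(site, protospacer)
        if mismatches <= max_mismatches:
            hits.append((i, site, pam, mismatches))
    return hits


# Toy example: one perfect site with an NGG PAM, and one site carrying
# two PAM-distal mismatches followed by an NAG PAM.
guide = "GACGTTACCGGATCAATCGA"
off_target = "TCCGTTACCGGATCAATCGA"
toy_genome = "TTT" + guide + "TGG" + "CCCC" + off_target + "AAG" + "TT"

for start, site, pam, mismatches in find_candidate_sites(toy_genome, guide):
    print(start, site, pam, mismatches)
```

A real genome-wide search would also scan the reverse complement and use an indexed aligner rather than a linear scan. The sketch only shows why relaxing the mismatch threshold or the PAM set enlarges the list of candidate sites.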
In human cell gene therapy, off-target cleavage may activate oncogenes or inactivate tumor suppressor genes. In crop breeding, off-target effects may cause unexpected trait changes and affect product safety. To reduce these risks, efficient and accurate computational prediction methods are needed. Through computational prediction, researchers can identify potential off-target sites before experiments and optimize sgRNA design, thereby reducing non-specific effects at the source.

3 Sequence Alignment Type Prediction Methods

3.1 Principles and applications of BLAST algorithm
BLAST (Basic Local Alignment Search Tool) is one of the earliest tools used for off-target prediction. Its basic principle is to locally align the sgRNA sequence against the reference genome to search for fragments similar to
