CMB_2025v15n4

Computational Molecular Biology 2025, Vol.15, No.4, 183-192
http://bioscipublisher.com/index.php/cmb

3.3 Key issues in data preprocessing and quality control
Before spatial transcriptome data can be analyzed, the raw data must be cleaned; otherwise even the most sophisticated downstream model will be built on noise. The first step is to align the capture spots with the tissue image and remove spots that fall outside the tissue or show aberrant signals. Spots with too few detected genes or high noise levels should also be discarded. Normalization then brings the differing sequencing depths across spots onto a comparable scale. Quality control works much like a health check: it typically examines how many genes are detected at each spot to judge overall data quality. If a capture spot contains a mixture of cells, single-cell sequencing data are needed as a reference to deconvolve the spot and infer the proportion of each cell type. When data from different batches or tissue sections are merged, batch effects must also be corrected. Only when these basic steps are in place can the subsequent analysis be considered sound (Zeira et al., 2022; Hamel et al., 2023).

4 Computational Modeling Framework for Spatial Transcriptome Data
4.1 Model construction idea: from high-dimensional expression matrices to spatial structure reconstruction
In spatial transcriptomic modeling, the key is not only to analyze gene expression but also to link that expression to its coordinates in the tissue. In other words, each spatial spot carries a high-dimensional gene expression vector as well as its own location and neighbors. Researchers usually first perform dimensionality reduction, compressing redundant signals into a smaller set of core features, and then consider how the spots are spatially adjacent.
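
The two steps just described, dimensionality reduction followed by a spatial neighborhood structure, can be illustrated with a minimal sketch. It assumes PCA (via singular value decomposition) as the reduction method and a k-nearest-neighbor rule on the spot coordinates as the adjacency criterion; the source names neither, and the function names `pca_reduce` and `spatial_knn_edges`, the toy expression matrix, and the toy coordinates are all hypothetical.

```python
import numpy as np


def pca_reduce(expr, n_components):
    """Project a spots x genes matrix onto its top principal components."""
    centered = expr - expr.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def spatial_knn_edges(coords, k):
    """Return undirected edges linking each spot to its k nearest spots."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a spot is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    edges = set()
    for i, row in enumerate(nearest):
        for j in row:
            edges.add((min(i, int(j)), max(i, int(j))))
    return sorted(edges)


# Toy data (assumed for illustration): 6 spots, 20 genes, log-transformed counts.
rng = np.random.default_rng(0)
expr = np.log1p(rng.poisson(5, size=(6, 20)).astype(float))
# Two groups of three spots lying far apart along one axis.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0],
                   [10.0, 0.0], [11.0, 0.0], [12.5, 0.0]])

embedding = pca_reduce(expr, n_components=2)
edges = spatial_knn_edges(coords, k=1)
```

Here `embedding` holds the compressed features for each spot and `edges` holds the spatial adjacency. A downstream method would combine the two, for example by clustering the embedding while favoring spots that share an edge.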
These spots can be treated as nodes of a graph, with edges representing spatial proximity, and algorithms can then search for regions whose spots share similar expression patterns and lie close together in the tissue (Figure 2) (Dong and Zhang, 2021). In this way the internal structure of the tissue, and in particular the spatial pattern of the tumor microenvironment, can be reconstructed step by step (Lei et al., 2024). The whole process resembles restoring a shuffled biological "map".

4.2 Model algorithms based on graph theory and spatial statistics
Two broad approaches are common in the analysis of spatial transcriptome data: one relies on graph theory, the other on spatial statistics. The former treats the data as a network in which each spatial position is a node and nearby positions are connected by edges. Graph algorithms can then identify regions with aggregated or associated expression patterns, and some studies apply graph convolutional networks to this structure to extract more complex features (Hu et al., 2021). Spatial statistics instead emphasizes quantifying spatial dependence, for instance using Moran's I to measure the spatial autocorrelation of gene expression, or Gaussian processes to model how expression varies with position (Lin et al., 2022). Each approach has its strengths. Combining the structural view of graphs with the quantitative power of statistics often yields a more realistic picture of the tumor microenvironment.

4.3 Comprehensive modeling method integrating multi-omics information
The spatial transcriptome alone is not enough to understand the tumor microenvironment fully. Current work therefore favors a "jigsaw puzzle" style of modeling that combines several omics data types.
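
One such combination is the reference-based deconvolution introduced in Section 3.3, in which single-cell profiles are used to infer the cell-type proportions of a mixed spot. The sketch below treats this as a non-negative least-squares problem solved by projected gradient descent. That formulation is an assumption made for illustration, not a method the source specifies, and published tools generally use more elaborate statistical models. The function name `deconvolve_spot`, the signature matrix, and the proportions are hypothetical toy values.

```python
import numpy as np


def deconvolve_spot(signatures, spot_expr, n_iter=500):
    """Estimate cell-type proportions for one spot by non-negative least squares.

    signatures: genes x cell_types matrix of reference profiles (assumed here
                to be mean expression per cell type from single-cell data).
    spot_expr:  vector of expression values observed at the spot, one per gene.
    """
    n_types = signatures.shape[1]
    weights = np.full(n_types, 1.0 / n_types)
    # Step size 1/L, where L is the largest eigenvalue of S^T S.
    step = 1.0 / np.linalg.norm(signatures, 2) ** 2
    for _ in range(n_iter):
        gradient = signatures.T @ (signatures @ weights - spot_expr)
        # Gradient step, then projection back onto non-negative weights.
        weights = np.maximum(weights - step * gradient, 0.0)
    total = weights.sum()
    # Rescale so the weights can be read as proportions.
    return weights / total if total > 0 else weights


# Toy reference (assumed): 6 marker genes x 3 cell types.
signatures = np.array([[10.0, 1.0, 0.0],
                       [8.0, 0.0, 1.0],
                       [1.0, 9.0, 0.0],
                       [0.0, 10.0, 1.0],
                       [0.0, 1.0, 12.0],
                       [1.0, 0.0, 9.0]])
true_props = np.array([0.5, 0.3, 0.2])
spot_expr = signatures @ true_props  # noise-free mixture for illustration

estimated = deconvolve_spot(signatures, spot_expr)
```

On this noise-free toy mixture the estimate recovers the proportions used to build the spot; real data are noisy, so the fit would only be approximate. SciPy's `scipy.optimize.nnls` solves the same least-squares problem and could replace the hand-written loop.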
For instance, single-cell RNA sequencing results can be used to annotate the cell types in spatial data, or deconvolution methods can map high-resolution cell-type information back onto the spatial matrix. Other studies analyze spatial transcriptome data together with proteomic or imaging data, for example by comparing multiplex immunofluorescence protein maps with the spatial distribution of gene expression to verify key signals (Yang et al., 2025). Although this approach is complex, it compensates for the blind spots of any single data type. Multi-omics fusion can reveal tumor structure and function at different levels and also makes models more robust and more interpretable (Zhang et al., 2025).

5 Pattern Recognition and Functional Analysis of Spatial Transcriptome Data
5.1 Cell subpopulation identification and spatial clustering algorithm
In pattern recognition, researchers usually begin by identifying cell subpopulations and then examine how they are distributed in space. The most common approach is unsupervised clustering, which divides spatial
