
Computational Molecular Biology 2025, Vol.15, No.3, 122-130 http://bioscipublisher.com/index.php/cmb

research, including DNA sequence variation information obtained from whole genome sequencing (WGS), whole exome sequencing (WES), etc. For single-gene hereditary diseases, genomic variants are often the root cause of the disease. This database collects patients' confirmed pathogenic variants, candidate variant lists, and raw sequencing data (such as FASTQ/BAM files). When a variant is already recorded in an existing database such as ClinVar, we also record its pathogenicity interpretation (Krawitz and Haack, 2023). Genomic data serve as the foundation for diagnosis. For instance, patients with Duchenne muscular dystrophy (DMD) often carry large deletions or frameshift mutations in the DMD gene. Collecting such information supports both diagnosis and the design of gene therapy. We name variants according to the HGVS standard and link them to genomic coordinates (such as GRCh38) to facilitate cross-study comparisons (Denton et al., 2021).

Figure 1 The diagram emphasises the potential of studies which, following careful phenotyping at study conception, utilise integrated multi-omic analysis to consider multiple components in the journey from DNA to expression (Adopted from Kerr et al., 2020)

Transcriptomics data mainly come from RNA sequencing (RNA-seq). The transcriptome reflects the activity level of genes in cells and is of great value for understanding the molecular mechanisms of diseases and discovering diagnostic markers. For instance, in the muscle tissue of DMD patients, a large number of genes related to inflammation and fibrosis are abnormally expressed (Lembo et al., 2024). This database will include the transcriptional expression profiles of patient tissues and cells, which may be stored in standardized units such as FPKM and TPM. For key genes, we also store quantitative PCR verification data.
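To make the two expression units mentioned above concrete, the following is a minimal sketch of how FPKM and TPM are conventionally derived from raw fragment counts and gene lengths. The function names, the gene labels and the toy numbers are illustrative assumptions, not part of the database described here, and the sketch assumes one effective length per gene.

```python
# Minimal sketch of the conventional FPKM and TPM definitions.
# Function names, gene labels and numbers are hypothetical.

def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped fragments in this sample")
    # count / (length in kb) / (total in millions) == count * 1e9 / (length_bp * total)
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then scale to 1e6."""
    rates = {g: counts[g] / lengths_bp[g] for g in counts}
    scale = sum(rates.values())
    if scale == 0:
        raise ValueError("no mapped fragments in this sample")
    return {g: r / scale * 1e6 for g, r in rates.items()}


if __name__ == "__main__":
    # Toy sample with three hypothetical genes.
    counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
    lengths_bp = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 2000}
    print(fpkm(counts, lengths_bp))
    print(tpm(counts, lengths_bp))
```

Because TPM values sum to one million within every sample, they tend to be easier to compare across samples than FPKM values, which is one reason a database might store both.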
If a patient has been sampled more than once (such as before and after treatment), we classify the transcriptomes by time point or condition to support dynamic analysis.

Proteomics data refer to information on protein expression levels, post-translational modifications, and protein-protein interactions determined by methods such as mass spectrometry. Proteins carry out cellular functions, and their states are often more directly related to phenotypes. Many diagnostic markers for rare diseases are proteins. For instance, serum neurofilament light chain serves as a protein indicator in amyotrophic lateral sclerosis. Our database will integrate quantitative protein data drawn from the literature and from experiments, such as lists of proteins differentially expressed between patients with a given disease and controls. In addition, we plan to incorporate protein-protein interaction networks to show the position of pathogenic proteins in cellular pathways. For instance, the role of the DMD protein (dystrophin) in the muscle cell membrane complex and the series of downstream protein changes triggered by its absence can be revealed through integrated proteome data. Proteomic data can also bridge multi-omics associations: sometimes a gene mutation does not change the mRNA level but affects protein stability, which can be detected in the proteome (Lochmüller et al., 2018).

3.2 Ethical considerations for rare disease data and data sharing framework

Rare disease data usually involve patients' genetic and health information. Ethical, legal and social factors must be fully considered in data sharing and use. We have formulated a series of policies and measures in the database
