Computational Molecular Biology 2016, Vol.6, No.3, 1-6
role in different types of cancer, for instance. Breast cancer is often caused by an error in the production of RNA.
This regulatory role makes microRNA a very interesting candidate drug target. Millar (2006) explored some of the
similarities and differences between the miRNA systems of plants and animals and examined whether they are
fundamentally different or simply variations on a theme. This gives insight into miRNA and its biological
importance. Pant et al. (2009) did similar work, using a support vector machine to classify plant
and animal miRNAs. As the importance of miRNA became clear, RNAi (RNA interference) came to prominence.
Aagaard and Rossi (2007) studied the importance of RNAi for therapeutics and showed that it could be the
next biological source for treating diseases. Söllner and Mayer (2006) studied machine learning approaches for
prediction of linear B-cell epitopes on proteins. The approach combines several parameters previously associated
with antigenicity, and includes novel parameters based on frequencies of amino acids and amino acid
neighborhood propensities. Machine learning classifiers clearly outperform the reference classification systems on
the HIV epitope validation set.
Hallett and his co-workers (2006) studied the prediction of subcellular localization of viral proteins within a
mammalian host cell, using the PSLT predictor, which considers the combinatorial presence of domains and targeting signals
in human proteins to predict localization. This localization of proteins greatly helps to identify signature proteins
for HIV drug target sites. Song and Shi (2010) combined the composition of dipeptide categories with a K-Nearest Neighbor
classifier and tested the method on a known dataset of 317 apoptosis proteins; the total prediction accuracy was 88.3%. These
results indicate that the composition of dipeptide categories combined with K-Nearest Neighbor Classifier is very
useful for predicting the subcellular location of apoptosis proteins. Harrison and Langdale (2006) used both amino
acid and nucleotide data to generate a phylogeny by distance-based and likelihood methods, and the
results were further analyzed with a Bayesian algorithm. Using DNA data to generate the alignments, however, is
likely to yield alignments that sometimes do not reflect the actual mutational history. The protein sequence is
under selective constraint for protein function and protein structure, which are conserved over much longer
periods than individual codon choices; hence amino acid sequences are important for studying phylogeny.
Prosperi (2009) studied different machine learning and feature selection methods for classifying HIV
treatment success based on viral genotype, therapy, and derived input features. HIV-positive persons have low
CD4 counts and, in some cases, retinal damage and visual field defects; Kozak and Sample
(2007) applied a support vector machine and a relevance vector machine (RVM), which were sufficiently sensitive to
distinguish these eyes from normal eyes. Nanni and his team (2009) proposed a protein classification method combining
surface analysis and the primary structure of proteins. Emily et al. (2007) proposed a hybrid prediction method for
Gram-negative bacteria that combines a one-versus-one support vector machine (SVM) model and a structural
homology approach. The SVM model comprises a number of binary classifiers, in which biological features
derived from Gram-negative bacteria translocation pathways are incorporated and structural homology shows the
common amino acids of these bacteria.
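The dipeptide-category and K-Nearest Neighbor approach described above can be illustrated with a minimal Python sketch. It is an illustration only, not the implementation of Song and Shi (2010): the three-way grouping of amino acids, the function names, the toy sequences and their labels are all assumptions introduced here, and the Euclidean distance and k = 3 are arbitrary choices.

```python
from collections import Counter
from itertools import product
import math

# Hypothetical grouping of the 20 amino acids into categories (an assumption
# for illustration; the grouping used in the cited work is not given here).
CATEGORIES = {
    "hydrophobic": "AVLIMFWC",
    "polar": "GSTYNQPH",
    "charged": "DEKR",
}
AA_TO_CAT = {aa: i for i, members in enumerate(CATEGORIES.values()) for aa in members}
N_CAT = len(CATEGORIES)


def dipeptide_category_composition(seq):
    """Return the normalized frequency of each ordered pair of categories."""
    # Non-standard residues are skipped before pairs are formed.
    cats = [AA_TO_CAT[aa] for aa in seq.upper() if aa in AA_TO_CAT]
    counts = Counter(zip(cats, cats[1:]))
    total = max(len(cats) - 1, 1)  # avoid division by zero for short input
    return [counts[pair] / total for pair in product(range(N_CAT), repeat=2)]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to the query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy sequences with made-up labels, used only to exercise the code.
toy_data = [
    ("AVLIMFAVLL", "membrane"),
    ("AAVVLLIIMM", "membrane"),
    ("DEKRDEKRDE", "cytoplasm"),
    ("KKDDEERRKK", "cytoplasm"),
]
train = [(dipeptide_category_composition(s), label) for s, label in toy_data]
prediction = knn_predict(train, dipeptide_category_composition("LLIVAVMF"), k=3)
```

With three categories each sequence becomes a 9-dimensional vector, so proteins of different lengths can be compared directly. A real study would use a curated dataset and a validated grouping in place of the toy values.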
G-protein coupled receptors (GPCRs), the seven-transmembrane domain receptors, comprise the largest family of proteins
targeted by drug discovery. Together with structures of the prototypical GPCR rhodopsin, solved structures of
other liganded GPCRs promise to provide insights into the structural basis of the superfamily’s biochemical
functions and assist in the development of new therapeutic modalities and drugs. Neberg (2007) proposed
an evolutionary analysis of GPCRs by DNA extraction methods. Evolutionary data from both sequenced genomes and
targeted retrieved orthologs are increasingly used as a source of structural information. Recent success in
sequencing and functionally expressing GPCRs from fossils opens the possibility of studying signaling pathways
even in extinct species.
Steffen et al. (2008) predicted the outcome of a therapy attempt for a patient who carries an HIV with a set of
observed genetic properties; such predictions need to be made for hundreds of possible combinations of drugs,
which use similar biochemical mechanisms. In a paired t-test, distribution matching was significantly better than
the reference methods. As the significance of machine learning techniques like the support vector machine increases