Computational Molecular Biology 2016, Vol.6, No.3, 1-6
researchers use kernel methods along with SVM. Sebastien et al. (2008) investigated how an SVM can predict
HIV-1 coreceptor usage when it is equipped with an appropriate string kernel. Data mining and machine learning
models and algorithms are also helpful in disease diagnosis, treatment and the targeting of potential drugs, an
area in which Andreeva (2008) worked. For biomedical scientists, the availability of machine learning models
and algorithms that can be used in medical applications is a significant advance.
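To make the string-kernel idea concrete, the following is a minimal sketch of a k-spectrum kernel, one common kind of string kernel. It is an illustration only and is not claimed to be the kernel used by Sebastien et al. (2008); the function names and the default k = 2 are assumptions chosen for this example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2):
    """k-spectrum kernel: dot product of the two k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for k-mers that do not occur in b.
    return sum(n * cb[kmer] for kmer, n in ca.items())


def normalized_spectrum_kernel(a, b, k=2):
    """Cosine-normalized kernel; returns 0.0 if either sequence is shorter than k."""
    denom = sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy strings, not real envelope sequences.
    print(spectrum_kernel("ABAB", "BABA"))
    print(normalized_spectrum_kernel("ABAB", "BABA"))
```

A matrix of such kernel values over all pairs of sequences could then be passed to an SVM implementation that accepts a precomputed kernel, so that sequences are compared directly without first being converted to fixed-length feature vectors.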
Prosperi et al. (2009) studied the associations of whole HIV-1 envelope genetic features and clinical markers
with viral tropism. Bootstrapped hierarchical clustering was used to assess mutational covariation. Several
machine learning methods (logistic regression, SVM, decision trees and rule-based reasoning), together with
feature selection, were applied to the classification of X4 variants and compared using several performance
measures (accuracy, ROC curves and F-measure). The logistic regression model achieved 92.7% accuracy. Rao (2009)
applied machine learning approaches, including SVM, k-nearest neighbour (k-NN), artificial neural network
(ANN) and logistic regression (LR), to the classification of HIV-1 protease inhibitors from molecular
structure. SVM showed better generalization ability and can be used as an alternative fast filter in the virtual
screening of large chemical databases.
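The performance measures named above (accuracy, F-measure and the area under the ROC curve) can be sketched as follows for a binary X4-versus-R5 task. This is a minimal illustration and does not reproduce any of the cited studies; the label encoding (1 = X4, 0 = R5), the function names and the toy data are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f_measure(y_true, y_pred):
    """F1 score: harmonic mean of precision and recall."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive sample is scored above a random negative one (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and classifier scores for six sequences.
    y_true = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy(y_true, y_pred), f_measure(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and F-measure depend on the decision threshold (0.5 in the toy example), whereas the ROC area summarizes ranking quality over all thresholds, which is one reason to report these measures together when comparing classifiers.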
Ozyilmaz (2009) studied features of the HIV-1 genome using statistical data from R5X4, R5 and X4 viruses, which
were analyzed using signal processing methods and ANNs. The results indicate that R5X4 viruses were successfully
classified, with high sensitivity and specificity values in the training and testing ROC analysis, by the radial
basis function (RBF) network, which gave the best performance among the ANN structures. Blair et al. (2009)
demonstrated a synergistic combination of NMR spectroscopy, de novo structure prediction and X-ray
crystallography in an effective overall strategy for rapidly determining the structure of the coat protein
C-terminal domain from the Sulfolobus islandicus rod-shaped virus (SIRV). This approach takes advantage of the
most accessible aspects of each structural technique and may be widely applicable for structure determination.
Singh and Mars (2010) proposed a support vector machine classification model to predict the degree of CD4 count
change in HIV-1 positive patients from three parameters: genotype, viral load and time. The model achieved an
accuracy of 83%. In 2011 they showed mathematically that a change in CD4 count can be forecast using machine
learning without genome data; the neural network used predicted virological response in HIV-positive patients
with 73% accuracy. These analyses suggest that SVM is relatively well suited to analyzing biological data, which
are high-dimensional.
Maurizio et al. (2012) proposed machine learning approaches to establish data-driven engines able to indicate the
most effective treatment for any patient and virus combination. Because biological data are voluminous and
difficult to manage, they need to be mined before further analysis; data mining and machine learning algorithms
have been developed to address this problem.
In the present era, scientists working on HIV/AIDS are trying to develop a vaccine to eradicate HIV, and
machine learning techniques appear promising for this goal. Machine learning has been used to design an HIV
vaccine based on a cocktail of epitopes. The dataset was so enormous that a novel approach was adopted. The task
at hand is to look at the genotype of a controller and compare it with the epitopes (an epitope is a short chain
of amino acids) in the virus they carry. Machine learning was able to whittle all the data down to a list of the
first six epitopes that have the desired dormant mutation property. The vaccine consists of a cocktail of such
epitopes. However, identifying suitable epitopes for a successful vaccine formulation is tricky. If the vaccine
passes clinical trials, it could reach the general population by 2017, according to Mike Szczys.
Machine learning methods are also used to identify and model associations between antibody features (IgG
subclass and antigen specificity) and effector function activity. These antibody features are qualitatively and
quantitatively useful in classification and regression, providing a new objective approach to discovering and
assessing immune correlations (Choi et al., 2015).