Genomics and Applied Biology 2017, Vol.8, No.5, 30-48
citation. Its title is “The Sequence Alignment/Map format and SAMtools”, and it is also a reference of high
centrality. SAM is a generic text format for storing read alignments against reference sequences. It supports both
short and long reads (up to 128 Mbp) generated by different sequencing platforms. In addition, the most recently
published high-centrality reference, from 2011, is “Molecular Evolutionary Genetics Analysis Using Maximum
Likelihood, Evolutionary Distance, and Maximum Parsimony Methods”. The comparative analysis of molecular
sequence data plays a vital role in reconstructing the evolutionary history of species and inferring the impact of
natural selection on the evolution of genes and species. This paper releases MEGA5, the latest version of the
MEGA (Molecular Evolutionary Genetics Analysis) software, which introduces maximum likelihood methods for
inferring evolutionary trees, selecting the best-fit substitution model (nucleotide or amino acid), inferring
ancestral states and sequences (along with their probabilities), and estimating evolutionary rates. In computer
simulation analyses, the maximum likelihood algorithms in MEGA5 compared favorably with other software
packages in inferring phylogenetic trees and substitution parameters. This version supports Windows, Mac OS X,
and Linux, and is available free of charge.
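To make the description of SAM as a text format concrete, the following is a minimal sketch of how one alignment line can be split into its eleven mandatory tab-separated fields. The function name `parse_sam_line` and the example record are hypothetical and serve only as an illustration; real files should be read with an established library rather than with this sketch.

```python
# Names of the eleven mandatory, tab-separated columns of a SAM alignment line.
SAM_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]

# Columns that hold integers rather than strings.
INT_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam_line(line):
    """Split one SAM alignment line into a dict (hypothetical helper)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 fields")
    record = dict(zip(SAM_FIELDS, parts[:len(SAM_FIELDS)]))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    # Anything after the eleventh column is an optional TAG:TYPE:VALUE field.
    record["TAGS"] = parts[len(SAM_FIELDS):]
    return record


# Made-up record: an 8-base read aligned at position 100 of "chr1".
example = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0"
rec = parse_sam_line(example)
print(rec["RNAME"], rec["POS"], rec["CIGAR"])
```

Header lines, which begin with `@`, are not handled here and would need to be skipped before calling the function.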
Table 13 The top 10 most cited references (1985-2016)
Rank  Frequency  Author       Year  Source                           Volume  Page   Cluster
1     851        Lander ES    2001  Nature                           V409    P860   0a
2     797        Altschul SF  1997  Nucleic Acids Research           V25     P3389  0a
3     659        Li H         2009  Bioinformatics                   V25     P2078  1b, 3c
4     629        Venter JC    2001  Science                          V291    P1304  0a
5     523        Li H         2009  Bioinformatics                   V25     P1754  1b, 3c
6     455        Tamura K     2011  Molecular Biology and Evolution  V28     P2731  2b, 5c
7     435        Ashburner M  2000  Nature Genetics                  V25     P25    0a
8     432        Wang Z       2009  Nature Reviews Genetics          V10     P57    0b, 2c
9     412        Kaul S       2000  Nature                           V408    P796   0a
10    380        Berman HM    2000  Nucleic Acids Research           V28     P235   0a
Note: a refers to the cluster in the reference co-citation network (1985-2009); b refers to the cluster in the reference co-citation
network (2010-2014); c refers to the cluster in the reference co-citation network (2015-2016)
The references with strong citation bursts in 1985-2009 and 2010-2014 are shown below (Figure 14; Figure 15).
There was no citation burst in 2015 or 2016. A citation burst is a sudden increase in the citation frequency of a
reference at a point in time or over a period, so it has two dimensions: burst strength and burst time. The
reference with the highest burst strength is “Altschul SF-1997”, with a value of 74.7129. Its title is “Gapped
BLAST and PSI-BLAST: a new generation of protein database search programs”. It is also a highly cited
reference. This indicates that the paper received great attention in genome research, especially from 2003 to 2005,
played an important role in the field, and was a research hot spot during that period. In addition, the reference
with the most recent burst is “Finn RD-2010”, whose burst lasted from 2011 to 2012; it represents the research
frontier of recent years. Its title is “The Pfam protein families database”. Pfam is a database of protein families
and motifs that is widely used in proteomics research. It is a large collection of protein families, built on hidden
Markov models and multiple sequence alignments. The paper introduces the latest release, Pfam 24.0, which
adopts HMMER3, the latest version of the HMMER hidden Markov model package. HMMER3 runs about 100
times faster than HMMER2 and is more sensitive owing to its use of the forward algorithm. Pfam 24.0 contains
11,912 protein families. Pfam is available through websites in the UK, the USA, and Sweden.
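The two dimensions of a citation burst described above, strength and time, can be illustrated with a deliberately simplified sketch. It is not the burst-detection algorithm used to produce Figures 14 and 15: the threshold rule, the function name `find_burst`, the `factor` parameter, and the yearly counts are all hypothetical and chosen only for illustration. The sketch assumes one count per consecutive year.

```python
from statistics import mean


def find_burst(counts_by_year, factor=2.0):
    """Return (start_year, end_year, strength) for the strongest burst, or None.

    Simplified heuristic (an assumption, not a published algorithm): a burst is
    a run of consecutive years whose citation count exceeds `factor` times the
    mean yearly count; strength is the run's mean count divided by that mean.
    """
    years = sorted(counts_by_year)
    baseline = mean(counts_by_year[y] for y in years)
    if baseline == 0:
        return None
    threshold = factor * baseline

    best = None
    run = []
    # The trailing None acts as a sentinel that closes a run ending in the last year.
    for year in years + [None]:
        if year is not None and counts_by_year[year] > threshold:
            run.append(year)
            continue
        if run:
            strength = mean(counts_by_year[y] for y in run) / baseline
            if best is None or strength > best[2]:
                best = (run[0], run[-1], strength)
            run = []
    return best


# Hypothetical yearly citation counts for a single reference.
counts = {2000: 2, 2001: 3, 2002: 4, 2003: 30, 2004: 35, 2005: 28, 2006: 5, 2007: 4}
burst = find_burst(counts)
print(burst)
```

In this made-up series the years 2003 to 2005 stand out against the rest, so the sketch reports them as the burst time together with a strength ratio above the chosen factor.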
2 Discussion
Bibliometric and visualization analysis of the field of genomics from 1985 to 2016 shows that China has produced
many papers in genomics, but that there is relatively little cooperation among researchers and among research
institutions. Therefore, while strengthening genomics research and encouraging domestic research institutions to
increase their research investment in this discipline, we should also encourage the