                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2011_09 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2011_09 of 21-Sep-2011 of UniProtKB/TrEMBL contains 16886838 sequence entries,
                    comprising 5477504111 amino acids.
                    
                    425347 sequences have been added since release 2011_08, the sequence data of
                    686 existing entries has been updated and the annotations of
                    6682763 entries have been revised. The added sequences represent an increase
                    of 3% in the number of entries.
                    
                    Number of fragments: 2696730
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           12724     0.08%
                    2: Evidence at transcript level       534559     3.17%
                    3: Inferred from homology            3847758    22.79%
                    4: Predicted                        12491797    73.97%
                    5: Uncertain                               0     0.00%
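
The percentages in the protein existence table follow directly from the entry counts. The sketch below recomputes them; the variable and function names are illustrative and not part of any UniProt tooling, and rounding to two decimals is assumed from the values shown.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 12724,
    "2: Evidence at transcript level": 534559,
    "3: Inferred from homology": 3847758,
    "4: Predicted": 12491797,
    "5: Uncertain": 0,
}

# The PE levels partition the release, so their sum is the total entry count.
total_entries = sum(pe_counts.values())


def share_percent(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


pe_percent = {level: share_percent(n, total_entries) for level, n in pe_counts.items()}

for level, n in pe_counts.items():
    print(f"{level:<34}{n:>10}{pe_percent[level]:>9.2f}%")
```

The sum of the five counts equals the 16886838 entries quoted for the release, which is a quick consistency check on the table.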
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 381813
                    
                    The first twenty species represent 1362849 sequences:   8.1 % of the
                    total number of entries.
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x:  18355
                                             2x:  66305
                                             3x:  33656
                                             4x:  20198
                                             5x:  12212
                                             6x:   8698
                                             7x:   6336
                                             8x:   4936
                                             9x:   3893
                                            10x:   7816
                                        11- 20x:  19266
                                        21- 50x:   6892
                                        51-100x:   2450
                                          >100x:   5599
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     397301  Human immunodeficiency virus 1
                    2      95253  Oryza sativa subsp. japonica (Rice)
                    3      94154  Homo sapiens (Human)
                    4      64897  Hepatitis C virus
                    5      58913  Mus musculus (Mouse)
                    6      58173  uncultured bacterium
                    7      54029  Vitis vinifera (Grape)
                    8      52556  Danio rerio (Zebrafish) (Brachydanio rerio)
                    9      50479  Trichomonas vaginalis
                    10      45406  Hepatitis B virus (HBV)
                    11      44947  Arabidopsis thaliana (Mouse-ear cress)
                    12      44063  Populus trichocarpa (Western balsam poplar) 
                    13      42027  Zea mays (Maize)
                    14      42026  Callithrix jacchus (White-tufted-ear marmoset)
                    15      39841  Paramecium tetraurelia
                    16      39381  Oryza sativa subsp. indica (Rice)
                    17      37695  Macaca mulatta (Rhesus macaque)
                    18      34798  Physcomitrella patens subsp. patens (Moss)
                    19      33645  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    20      33265  Selaginella moellendorffii (Spikemoss)
                    21      33023  Drosophila melanogaster (Fruit fly)
                    22      32604  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
                    23      32578  Rattus norvegicus (Rat)
                    24      31830  Caenorhabditis remanei (Caenorhabditis vulgaris)
                    25      31558  Monodelphis domestica (Gray short-tailed opossum)
                    26      31300  Ricinus communis (Castor bean)
                    27      30821  Trypanosoma cruzi
                    28      30525  Daphnia pulex (Water flea)
                    29      29163  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    30      29024  Oikopleura dioica (Tunicate)
                    31      28794  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    32      28090  Tetraodon nigroviridis (Green puffer)
                    33      27602  Bos taurus (Bovine)
                    34      27038  Canis familiaris (Dog) (Canis lupus familiaris)
                    35      26911  Ornithorhynchus anatinus (Duckbill platypus)
                    36      24810  Nematostella vectensis (Starlet sea anemone)
                    37      24699  Sus scrofa (Pig)
                    38      24672  Gallus gallus (Chicken)
                    39      23632  Equus caballus (Horse)
                    40      23612  Ralstonia solanacearum (Pseudomonas solanacearum)
                    41      23549  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    42      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
                    43      22810  Escherichia coli
                    44      21602  Caenorhabditis elegans
                    45      21533  Hordeum vulgare var. distichum (Two-rowed barley)
                    46      21238  Caenorhabditis briggsae
                    47      21090  Ixodes scapularis (Black-legged tick) (Deer tick)
                    48      20985  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
                    49      20434  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
                    50      19175  Toxoplasma gondii
                    51      18897  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    52      18771  mine drainage metagenome
                    53      18588  Drosophila simulans (Fruit fly)
                    54      17843  Ailuropoda melanoleuca (Giant panda)
                    55      17840  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    56      17603  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
                    57      17031  Drosophila yakuba (Fruit fly)
                    58      16988  Tribolium castaneum (Red flour beetle)
                    59      16742  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    60      16706  Drosophila persimilis (Fruit fly)
                    61      16425  Ectocarpus siliculosus (Brown alga)
                    62      16295  Loa loa (Eye worm)
                    63      16252  Trichinella spiralis (Trichina worm)
                    64      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
                    65      16233  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    66      16179  Drosophila sechellia (Fruit fly)
                    67      15979  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    68      15767  Phaeosphaeria nodorum (strain SN15 / FGSC 10173) (Glume blotch fungus) 
                    69      15715  Naegleria gruberi (Amoeba)
                    70      15632  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    71      15592  Anopheles gambiae (African malaria mosquito)
                    72      15418  Drosophila willistoni (Fruit fly)
                    73      15247  Tetrahymena thermophila (strain SB210)
                    74      15138  Drosophila ananassae (Fruit fly)
                    75      15029  Harpegnathos saltator (Jerdon's jumping ant)
                    76      14921  Drosophila erecta (Fruit fly)
                    77      14892  Hepatitis C virus subtype 1a
                    78      14821  Chlamydomonas reinhardtii (Chlamydomonas smithii)
                    79      14791  Camponotus floridanus (Florida carpenter ant)
                    80      14782  Drosophila mojavensis (Fruit fly)
                    81      14695  Drosophila virilis (Fruit fly)
                    82      14671  Plasmodium chabaudi
                    83      14651  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    84      14417  Volvox carteri (Green alga)
                    85      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
                    86      14322  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
                    87      14241  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    88      14121  Hepatitis C virus subtype 1b
                    89      13964  Acromyrmex echinatior (Panamanian leafcutter ant) 
                    90      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
                    91      13508  Schistosoma mansoni (Blood fluke)
                    92      13505  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    93      13484  Plasmodium falciparum
                    94      13329  Aspergillus flavus 
                    95      13278  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    96      13168  Magnaporthe oryzae (strain 70-15 / FGSC 8958) (Rice blast fungus) 
                    97      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
                    98      12983  Albugo laibachii Nc14
                    99      12950  Stigmatella aurantiaca (strain DW4/3-1)
                    100      12937  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    101      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
                    102      12683  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    103      12651  Glycine max (Soybean) (Glycine hispida)
                    104      12553  Xenopus laevis (African clawed frog)
                    105      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
                    106      12444  Polysphondylium pallidum (Cellular slime mold)
                    107      12352  Dictyostelium purpureum (Slime mold)
                    108      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
                    109      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    110      11994  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
                    111      11717  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
                    112      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
                    113      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
                    114      11647  Anopheles darlingi (Mosquito)
                    115      11645  Plasmodium berghei (strain Anka)
                    116      11588  Aspergillus oryzae (strain ATCC 42149 / RIB 40)
                    117      11563  Trichoplax adhaerens (Trichoplax reptans)
                    118      11510  Aureococcus anophagefferens
                    119      11498  Brugia malayi (Filarial nematode worm)
                    120      11394  Helicobacter pylori (Campylobacter pylori)
                    121      11283  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
                    122      11211  Ktedonobacter racemifer DSM 44963
                    123      10966  Streptomyces clavuligerus ATCC 27064
                    124      10928  Schistosoma japonicum (Blood fluke)
                    125      10842  Pediculus humanus subsp. corporis (Body louse)
                    126      10821  Chaetomium globosum  
                    127      10571  Metarhizium robertsii (strain ARSEF 23) (Metarhizium anisopliae)
                    128      10550  Podospora anserina (strain S / DSM 980 / FGSC 10383) (Pleurage anserina)
                    129      10423  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    130      10382  Pseudomonas syringae pv. glycinea str. race 4
                    131      10378  Neurospora tetrasperma (strain FGSC 2508 / P0657)
                    132      10378  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    133      10357  Phaeodactylum tricornutum (strain CCAP 1055/1)
                    134      10356  Aspergillus nidulans FGSC A4
                    135      10276  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
                    136      10205  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    137      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
                    138      10176  Rabies virus
                    139      10143  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
                    140      10113  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
                    141      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
                    142      10088  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    143      10053  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    144      10015  Streptomyces bingchenggensis (strain BCW-1)
                    145       9835  Chlorella variabilis
                    146       9822  Metarhizium acridum (strain CQMa 102)
                    147       9706  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    148       9663  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
                    149       9534  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    150       9511  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    151       9486  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    152       9484  Streptomyces violaceusniger Tu 4113
                    153       9443  Ajellomyces capsulata (strain H88) (Darling's disease fungus) 
                    154       9433  Salmo salar (Atlantic salmon)
                    155       9239  Monosiga brevicollis (Choanoflagellate)
                    156       9214  Candida albicans (Yeast)
                    157       9204  Sordaria macrospora
                    158       9202  Amycolatopsis mediterranei (strain U-32)
                    159       9177  Streptomyces himastatinicus ATCC 53653
                    160       9164  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    161       9157  Emericella nidulans (Aspergillus nidulans)
                    162       9146  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    163       9136  Pseudomonas syringae pv. pisi str. 1704B
                    164       9113  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    165       9065  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    166       9019  Neurospora crassa 
                    167       9011  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
                    168       8993  Dictyostelium discoideum (Slime mold)
                    169       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    170       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    171       8940  Burkholderia sp. TJI49
                    172       8900  Catenulispora acidiphila 
                    173       8860  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
                    174       8813  Trypanosoma brucei
                    175       8797  Aspergillus clavatus 
                    176       8777  Pseudomonas syringae pv. japonica str. M301072PT
                    177       8757  Rhodococcus sp. (strain RHA1)
                    178       8705  Trichophyton rubrum (strain ATCC MYA-4607 / CBS 118892) (Athlete's foot fungus)
                    179       8699  Paracoccidioides brasiliensis (strain Pb18)
                    180       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    181       8676  Trichophyton equinum (strain ATCC MYA-4606 / CBS 127.97) (Horse ringworm fungus)
                    182       8662  Arthroderma otae (strain CBS 113480) (Microsporum canis)
                    183       8610  Batrachochytrium dendrobatidis (strain JAM81 / FGSC 10211) (Frog chytrid fungus)
                    184       8599  Entamoeba dispar (strain ATCC PRA-260 / SAW760)
                    185       8520  Trichophyton tonsurans (strain CBS 112818) (Scalp ringworm fungus)
                    186       8437  Plesiocystis pacifica SIR-1
                    187       8394  Streptomyces sp. AA4
                    188       8374  Capsaspora owczarzaki (strain ATCC 30864)
                    189       8311  Bradyrhizobium japonicum
                    190       8308  Grosmannia clavigera (strain kw1407 / UAMH 11150) (Blue stain fungus) 
                    191       8306  uncultured archaeon
                    192       8302  Entamoeba histolytica
                    193       8274  Leishmania major
                    194       8249  Microscilla marina ATCC 23134
                    195       8202  Leishmania infantum
                    196       8202  Streptomyces sviceus ATCC 29083
                    197       8201  Microcoleus chthonoplastes PCC 7420
                    198       8185  Leishmania braziliensis
                    199       8176  Pseudomonas aeruginosa
                    200       8163  Frankia sp. EUN1f
                    201       8154  Burkholderia xenovorans (strain LB400)
                    202       8044  Leishmania mexicana MHOM/GT/2001/U1103
                    203       7961  Leishmania donovani BPK282A1
                    204       7957  Trichophyton verrucosum (strain HKI 0517)
                    205       7955  Ostreococcus tauri
                    206       7943  Rhodococcus opacus (strain B4)
                    207       7917  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    208       7906  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    209       7866  Streptomyces ghanaensis ATCC 14672
                    210       7854  Acaryochloris marina (strain MBIC 11017)
                    211       7824  Paracoccidioides brasiliensis (strain Pb03)
                    212       7823  Burkholderia sp. Ch1-1
                    213       7812  Pseudomonas putida (Arthrobacter siderocapsulatus)
                    214       7808  Plasmodium yoelii yoelii
                    215       7781  Paenibacillus mucilaginosus KNP414
                    216       7708  Uncinocarpus reesii (strain UAMH 1704)
                    217       7706  Streptomyces viridochromogenes DSM 40736
                    218       7571  Clostridium hathewayi DSM 13479
                    219       7563  Burkholderia pseudomallei MSHR346
                    220       7528  Streptomyces sp. C
                    221       7523  Streptomyces lividans TK24
                    222       7519  Solibacter usitatus (strain Ellin6076)
                    223       7501  Bacillus thuringiensis
                    224       7490  Pseudomonas syringae pv. mori str. 301020
                    225       7486  Tuber melanosporum (strain Mel28) (Perigord black truffle)
                    226       7475  Burkholderia pseudomallei 1710a
                    227       7471  Streptomyces coelicolor
                    228       7465  Burkholderia pseudomallei Pakistan 9
                    229       7459  Burkholderia sp. H160
                    230       7448  Streptomyces venezuelae 
                    231       7443  Kitasatospora setae  
                    232       7385  Ostreococcus lucimarinus (strain CCE9901)
                    233       7383  Lyngbya majuscula 3L
                    234       7367  Burkholderia pseudomallei 576
                    235       7351  Burkholderia gladioli BSR3
                    236       7349  Burkholderia pseudomallei 305
                    237       7274  Clostridium bolteae ATCC BAA-613
                    238       7267  Myxococcus fulvus
                    239       7241  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    240       7231  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    241       7227  Streptomyces avermitilis
                    242       7186  Bacillus cereus
                    243       7177  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    244       7150  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    245       7145  Giardia intestinalis (strain ATCC 50803 / WB clone C6) (Giardia lamblia)
                    246       7140  Burkholderia pseudomallei 1106b
                    247       7130  Burkholderia phymatum (strain DSM 17167 / STM815)
                    248       7124  Burkholderia ambifaria MEX-5
                    249       7120  Medicago truncatula (Barrel medic) (Medicago tribuloides)
                    250       7120  Pseudomonas syringae Cit 7
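
The top-twenty total quoted at the start of section 2 (1362849 sequences, 8.1 % of all entries) can be reproduced from the first twenty rows of the table above. This is a minimal sketch; the names are illustrative, and the total entry count is taken from section 1.

```python
# Frequencies of the twenty most represented species (rows 1-20 of table 2.2).
top20_frequencies = [
    397301, 95253, 94154, 64897, 58913,
    58173, 54029, 52556, 50479, 45406,
    44947, 44063, 42027, 42026, 39841,
    39381, 37695, 34798, 33645, 33265,
]

total_entries = 16886838  # entries in release 2011_09 (section 1)

top20_total = sum(top20_frequencies)
top20_share = 100.0 * top20_total / total_entries

print(f"{top20_total} sequences: {top20_share:.1f} % of the total number of entries")
```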
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          292668 (  2%)
                    Bacteria       10800881 ( 64%)
                    Eukaryota       4565443 ( 27%)
                    Viruses         1187976 (  7%)
                    Other             39869 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  94190 (  2%)           (  1%)
                    Other Mammalia        447668 ( 10%)           (  3%)
                    Other Vertebrata      417688 (  9%)           (  2%)
                    Viridiplantae         907805 ( 20%)           (  5%)
                    Fungi                 947232 ( 21%)           (  6%)
                    Insecta               685780 ( 15%)           (  4%)
                    Nematoda              136989 (  3%)           (  1%)
                    Other                 928091 ( 20%)           (  5%)
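
Each row of the Eukaryota table carries two percentages with different denominators: the Eukaryota total and the complete database. The sketch below recomputes both, assuming rounding to the nearest whole percent as the printed values suggest; the names are illustrative.

```python
# Sequence counts per Eukaryota category, copied from the table above.
eukaryota_counts = {
    "Human": 94190,
    "Other Mammalia": 447668,
    "Other Vertebrata": 417688,
    "Viridiplantae": 907805,
    "Fungi": 947232,
    "Insecta": 685780,
    "Nematoda": 136989,
    "Other": 928091,
}

database_entries = 16886838  # entries in release 2011_09 (section 1)
eukaryota_total = sum(eukaryota_counts.values())


def whole_percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100.0 * part / whole)


# Map each category to (% of Eukaryota, % of the complete database).
shares = {
    name: (whole_percent(n, eukaryota_total), whole_percent(n, database_entries))
    for name, n in eukaryota_counts.items()
}

for name, (of_eukaryota, of_database) in shares.items():
    print(f"{name:<18}{eukaryota_counts[name]:>9} ({of_eukaryota:>3}%)   ({of_database:>3}%)")
```

The category counts add up to the Eukaryota figure in the kingdom table (4565443).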
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From  To   Number             From   To   Number
                      1-  50   372040             1001-1100   100557
                     51- 100  1351216             1101-1200    70737
                    101- 150  1552443             1201-1300    49211
                    151- 200  1502371             1301-1400    32332
                    201- 250  1512755             1401-1500    25873
                    251- 300  1464705             1501-1600    18465
                    301- 350  1333989             1601-1700    13939
                    351- 400  1025306             1701-1800    10825
                    401- 450   873104             1801-1900     8864
                    451- 500   727886             1901-2000     7527
                    501- 550   490565             2001-2100     6040
                    551- 600   379212             2101-2200     6049
                    601- 650   276217             2201-2300     4730
                    651- 700   215595             2301-2400     3851
                    701- 750   185227             2401-2500     3246
                    751- 800   166002             >2500        28204
                    801- 850   124023
                    851- 900   112311
                    901- 950    76993
                    951-1000    57698
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is   324 amino acids.
                    
                    The shortest sequence is Q16047_HUMAN:     4 amino acids.
                    The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
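
The average length above is consistent with dividing the total number of amino acids by the total number of entries, both quoted in section 1. A minimal check, with illustrative variable names:

```python
total_amino_acids = 5477504111  # amino acids in release 2011_09 (section 1)
total_entries = 16886838        # sequence entries in release 2011_09 (section 1)

# Mean sequence length over all entries, in amino acids.
average_length = total_amino_acids / total_entries

print(f"average sequence length: {round(average_length)} amino acids")
```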
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------

                    References (RL)                    20445113                1.21
                       Submitted to EMBL/GenBank/DDBJ  11704442  10401063      0.69
                       Journal                          8309585   7703192      0.49
                       Submitted to other databases      417152    410427      0.02
                       Thesis                              8169      8111     <0.01
                       Book citation                       5706      5657     <0.01
                       Unpublished observations              58        58     <0.01
                       Patent                                 1         1     <0.01
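
The "Average per entry" column in the table above matches the total number of lines divided by the total number of entries in the release (16886838), not by the number of entries that carry the line. The sketch below reproduces the column under that reading. The helper name is hypothetical, and the "<0.01" rule (used when the value rounds to 0.00) is inferred from the rows shown rather than taken from UniProt documentation.

```python
TOTAL_ENTRIES = 16886838  # entries in release 2011_09 (section 1)


def average_per_entry(line_count, total_entries=TOTAL_ENTRIES):
    """Format the mean number of lines per entry as the tables print it.

    Values that round to 0.00 are shown as "<0.01" (an inferred convention).
    """
    text = f"{line_count / total_entries:.2f}"
    return "<0.01" if text == "0.00" else text


reference_lines = {
    "References (RL)": 20445113,
    "Submitted to EMBL/GenBank/DDBJ": 11704442,
    "Journal": 8309585,
    "Submitted to other databases": 417152,
    "Thesis": 8169,
}

for line_type, count in reference_lines.items():
    print(f"{line_type:<33}{count:>10}  {average_per_entry(count):>6}")
```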
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 409545
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Comments (CC)                      16926535                1.00
                       CATALYTIC ACTIVITY               1705556   1577553      0.10     4
                       CAUTION                          4638541   4638533      0.27     2
                       COFACTOR                          555969    529368      0.03     8
                       DOMAIN                             43759     41336     <0.01     9
                       FUNCTION                         1879800   1739431      0.11     3
                       INTERACTION                          923       923     <0.01    11
                       MISCELLANEOUS                      39849     39787     <0.01    10
                       PATHWAY                           877456    811855      0.05     6
                       SIMILARITY                       5001568   4325867      0.30     1
                       SUBCELLULAR LOCATION             1420817   1412102      0.08     5
                       SUBUNIT                           762297    757711      0.05     7
                    
                    Total number of comment topics: 11
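
The Rank column in the comments table appears to order the topics by total number of lines, largest first. That ordering is an inference from the values shown, not a documented rule. A short sketch that derives the ranks this way, with illustrative names:

```python
# Total number of comment lines per topic, copied from the table above.
comment_totals = {
    "CATALYTIC ACTIVITY": 1705556,
    "CAUTION": 4638541,
    "COFACTOR": 555969,
    "DOMAIN": 43759,
    "FUNCTION": 1879800,
    "INTERACTION": 923,
    "MISCELLANEOUS": 39849,
    "PATHWAY": 877456,
    "SIMILARITY": 5001568,
    "SUBCELLULAR LOCATION": 1420817,
    "SUBUNIT": 762297,
}

# Sort topics by descending line count and number them from 1.
ordered_topics = sorted(comment_totals, key=comment_totals.get, reverse=True)
ranks = {topic: position for position, topic in enumerate(ordered_topics, start=1)}

for topic in ordered_topics:
    print(f"{ranks[topic]:>2}  {topic:<22}{comment_totals[topic]:>9}")
```

The per-topic totals also sum to the 16926535 comment lines reported on the "Comments (CC)" row.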
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Features (FT)                       5377854                0.32
                       CHAIN                             544018    430223      0.03     2
                       NON_TER                          4456479   2695168      0.26     1
                       SIGNAL                            376760    375619      0.02     3
                       TRANSIT                              597       597     <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             190767252               11.30                                                    
                    AGD                                 2529      2529     <0.01    77   Organism-specific databases                
                    ANU-2DPAGE                            56        56     <0.01    94   2D gel databases                           
                    Allergome                           2101      1511     <0.01    81   Protein family/group databases             
                    ArachnoServer                         66        66     <0.01    93   Organism-specific databases                
                    ArrayExpress                       90685     90674      0.01    49   Gene expression databases                  
                    BRENDA                              2761      2729     <0.01    75   Enzyme and pathway databases               
                    Bgee                              109747    109550      0.01    46   Gene expression databases                  
                    BioCyc                            671030    656692      0.04    27   Enzyme and pathway databases               
                    CAZy                               74431     69923     <0.01    53   Protein family/group databases             
                    CGD                                 6742      6742     <0.01    72   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     5         5     <0.01    98   2D gel databases                           
                    CTD                               243965    243010      0.01    39   Organism-specific databases                
                    CYGD                                   2         2     <0.01   100   Organism-specific databases                
                    DIP                                 2735      2730     <0.01    76   Protein-protein interaction databases      
                    EMBL                            19007122  16705765      1.13     3   Sequence databases                         
                    Ensembl                           442034    427760      0.03    31   Genome annotation databases                
                    EnsemblBacteria                   567538    532656      0.03    29   Genome annotation databases                
                    EnsemblFungi                      108159    108064      0.01    47   Genome annotation databases                
                    EnsemblMetazoa                    317756    295061      0.02    34   Genome annotation databases                
                    EnsemblPlants                     259422    232235      0.02    37   Genome annotation databases                
                    EnsemblProtists                    72633     71474     <0.01    54   Genome annotation databases                
                    EuPathDB                          182663    182662      0.01    44   Organism-specific databases                
                    FlyBase                           195641    194093      0.01    41   Organism-specific databases                
                    GO                              31532186  10379718      1.87     2   Ontologies                                 
                    Gene3D                           7435830   5958392      0.44     6   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01   102   Organism-specific databases                
                    GeneID                           6187252   6068827      0.37     9   Genome annotation databases                
                    GeneTree                         1151996   1151651      0.07    22   Phylogenomic databases                     
                    Genevestigator                     96323     96318      0.01    48   Gene expression databases                  
                    GenoList                           14745     14473     <0.01    68   Organism-specific databases                
                    GenomeReviews                    4257612   4159275      0.25    12   Genome annotation databases                
                    Gramene                            68625     68625     <0.01    56   Organism-specific databases                
                    H-InvDB                              594       483     <0.01    85   Organism-specific databases                
                    HAMAP                            1341044   1325718      0.08    21   Family and domain databases                
                    HGNC                               79248     77409     <0.01    50   Organism-specific databases                
                    HOGENOM                          2193826   2193784      0.13    20   Phylogenomic databases                     
                    HOVERGEN                          314914    314903      0.02    35   Phylogenomic databases                     
                    HSSP                              252620    252383      0.01    38   3D structure databases                     
                    IPI                               319459    319429      0.02    33   Sequence databases                         
                    InParanoid                        192753    192686      0.01    43   Phylogenomic databases                     
                    IntAct                             15990     15990     <0.01    65   Protein-protein interaction databases      
                    InterPro                        36276727  12961457      2.15     1   Family and domain databases                
                    KEGG                             5171278   5069311      0.31    11   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    73   Organism-specific databases                
                    Leproma                              936       935     <0.01    84   Organism-specific databases                
                    MEROPS                             70252     68771     <0.01    55   Protein family/group databases             
                    MGI                                35431     35182     <0.01    60   Organism-specific databases                
                    MINT                                8941      8941     <0.01    70   Protein-protein interaction databases      
                    NMPDR                             920072    920068      0.05    25   Genome annotation databases                
                    NextBio                            45590     45587     <0.01    58   Other                                      
                    OMA                              3315067   3315067      0.20    15   Phylogenomic databases                     
                    OrthoDB                           579721    579557      0.03    28   Phylogenomic databases                     
                    PANTHER                          2281321   2196287      0.14    19   Family and domain databases                
                    PDB                                15285      8940     <0.01    66   3D structure databases                     
                    PDBsum                             14861      8732     <0.01    67   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    90   2D gel databases                           
                    PIR                               174596    141756      0.01    45   Sequence databases                         
                    PIRSF                            1083815   1083522      0.06    24   Family and domain databases                
                    PMAP-CutDB                           238       238     <0.01    87   Other                                      
                    PMMA-2DPAGE                            2         2     <0.01    99   2D gel databases                           
                    PRIDE                             218493    218491      0.01    40   Proteomic databases                        
                    PRINTS                           2791353   2478033      0.17    16   Family and domain databases                
                    PROSITE                          8516857   5674673      0.50     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    97   Enzyme and pathway databases               
                    PeptideAtlas                         147       147     <0.01    88   Proteomic databases                        
                    PeroxiBase                          2529      2521     <0.01    78   Protein family/group databases             
                    Pfam                            16339863  12149174      0.97     4   Family and domain databases                
                    PharmGKB                              83        83     <0.01    92   Organism-specific databases                
                    PhosphoSite                         1611      1611     <0.01    82   PTM databases                              
                    PhylomeDB                         371114    371082      0.02    32   Phylogenomic databases                     
                    ProDom                            307289    289387      0.02    36   Family and domain databases                
                    ProMEX                               316       316     <0.01    86   Proteomic databases                        
                    ProtClustDB                      2730310   2730299      0.16    17   Phylogenomic databases                     
                    ProteinModelPortal               5687742   5686508      0.34    10   3D structure databases                     
                    PseudoCAP                           4342      4339     <0.01    74   Organism-specific databases                
                    REBASE                             20554     19898     <0.01    63   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   90        89     <0.01    91   2D gel databases                           
                    RGD                                23839     23579     <0.01    61   Organism-specific databases                
                    Reactome                             141       120     <0.01    89   Enzyme and pathway databases               
                    RefSeq                           6210650   6074904      0.37     8   Sequence databases                         
                    SGD                                   11        11     <0.01    96   Organism-specific databases                
                    SMART                            3579052   2748446      0.21    13   Family and domain databases                
                    SMR                               870001    870001      0.05    26   3D structure databases                     
                    STRING                           2608057   2607874      0.15    18   Protein-protein interaction databases      
                    SUPFAM                           7064264   5838310      0.42     7   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    95   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01   101   2D gel databases                           
                    TAIR                               16798     16718     <0.01    64   Organism-specific databases                
                    TCDB                                2430      2421     <0.01    79   Protein family/group databases             
                    TIGR                              195005    187945      0.01    42   Genome annotation databases                
                    TIGRFAMs                         3469453   3163063      0.21    14   Family and domain databases                
                    TubercuList                         2113      2108     <0.01    80   Organism-specific databases                
                    UCSC                               56357     56356     <0.01    57   Genome annotation databases                
                    UniGene                           481648    451122      0.03    30   Sequence databases                         
                    VectorBase                         75602     75093     <0.01    51   Genome annotation databases                
                    World-2DPAGE                         943       938     <0.01    83   2D gel databases                           
                    WormBase                           41466     41307     <0.01    59   Organism-specific databases                
                    Xenbase                            13200     13170     <0.01    69   Organism-specific databases                
                    ZFIN                               21536     21531     <0.01    62   Organism-specific databases                
                    dictyBase                           7573      7573     <0.01    71   Organism-specific databases                
                    eggNOG                           1144892   1144892      0.07    23   Phylogenomic databases                     
                    euHCVdb                            75268     75265     <0.01    52   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 129
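
The "Average per entry" column above appears to be the total number of lines divided by the total number of entries in the release (16886838, as stated in the introduction), rounded to two decimals, with values that round to zero printed as "<0.01". This rule is inferred from the listed values, not documented in this file. A minimal Python sketch under that assumption (the function name `average_per_entry` is hypothetical):

```python
# Total sequence entries in release 2011_09, taken from the introduction.
TOTAL_ENTRIES = 16886838


def average_per_entry(total_number: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format total/entries the way the table seems to (assumed rule)."""
    avg = round(total_number / total_entries, 2)
    # Values that round to 0.00 are shown as "<0.01" in the table.
    return "<0.01" if avg < 0.01 else f"{avg:.2f}"


# "Total number" values copied from a few rows of the table.
rows = {
    "Cross-references (DR)": 190767252,
    "InterPro": 36276727,
    "GO": 31532186,
    "EMBL": 19007122,
    "CAZy": 74431,
}

for name, total in rows.items():
    print(f"{name:<24}{average_per_entry(total):>8}")
```

For the rows sampled here, the sketch gives the same values as the table (11.30, 2.15, 1.87, 1.13 and <0.01).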
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.62   Gln (Q) 3.88   Leu (L) 9.86   Ser (S) 6.71
                    Arg (R) 5.46   Glu (E) 6.14   Lys (K) 5.25   Thr (T) 5.61
                    Asn (N) 4.12   Gly (G) 7.12   Met (M) 2.48   Trp (W) 1.31
                    Asp (D) 5.30   His (H) 2.20   Phe (F) 4.02   Tyr (Y) 3.04
                    Cys (C) 1.27   Ile (I) 5.99   Pro (P) 4.74   Val (V) 6.75
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.04
                    
                    
                    
                    [Figure: amino acid composition chart]
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
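
The ordering in 5.2 can be checked against the percentages in 5.1 by sorting the twenty standard residues in descending order of frequency. The Python sketch below does this. The dictionary is copied from the table in 5.1, and the variable names are illustrative only.

```python
# Composition in percent, copied from section 5.1 (standard residues only).
composition = {
    "Ala": 8.62, "Gln": 3.88, "Leu": 9.86, "Ser": 6.71,
    "Arg": 5.46, "Glu": 6.14, "Lys": 5.25, "Thr": 5.61,
    "Asn": 4.12, "Gly": 7.12, "Met": 2.48, "Trp": 1.31,
    "Asp": 5.30, "His": 2.20, "Phe": 4.02, "Tyr": 3.04,
    "Cys": 1.27, "Ile": 5.99, "Pro": 4.74, "Val": 6.75,
}

# Sort residue names from most to least frequent.
by_frequency = sorted(composition, key=composition.get, reverse=True)

print(", ".join(by_frequency))
```

All twenty percentages are distinct in this release, so the sort has no ties to break and reproduces the list given in 5.2.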
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 551877
                    Total number of entries encoded on a Plasmid: 241097
                    Total number of entries encoded on a Plastid: 13758
                    Total number of entries encoded on a Plastid; Apicoplast: 367
                    Total number of entries encoded on a Plastid; Chloroplast: 144374
                    Total number of entries encoded on a Plastid; Cyanelle: 8
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 456