                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2011_06 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2011_06 of 31-May-2011 of UniProtKB/TrEMBL contains 15400876 sequence entries,
                    comprising 4982458690 amino acids.
                    
                    362727 sequences have been added since release 2011_05, the sequence data of
                    88 existing entries has been updated and the annotations of
                    5388503 entries have been revised. This represents an increase of 3%.
                    
                    Number of fragments: 2485652
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           20317     0.13%
                    2: Evidence at transcript level       521886     3.39%
                    3: Inferred from homology            3183929    20.67%
                    4: Predicted                        11674744    75.81%
                    5: Uncertain                               0     0.00%
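The percentage column above can be recomputed from the entry counts and the release total given in the introduction. The following is a minimal Python sketch of that arithmetic; the names PE_COUNTS, TOTAL_ENTRIES and pe_percentages are illustrative and do not come from the release notes.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
PE_COUNTS = {
    "1: Evidence at protein level": 20317,
    "2: Evidence at transcript level": 521886,
    "3: Inferred from homology": 3183929,
    "4: Predicted": 11674744,
    "5: Uncertain": 0,
}

# Total number of sequence entries in release 2011_06 (from the introduction).
TOTAL_ENTRIES = 15400876


def pe_percentages(counts, total):
    """Return each PE level's share of the total, formatted to two decimals."""
    return {level: f"{100 * n / total:.2f}%" for level, n in counts.items()}


if __name__ == "__main__":
    for level, share in pe_percentages(PE_COUNTS, TOTAL_ENTRIES).items():
        print(f"{level:<34}{share:>8}")
```

In this release the five PE counts add up exactly to the total number of entries, which is why the percentages sum to 100%.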
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 361441
                    
                    The first twenty species represent 1304556 sequences:   8.5 % of the
                    total number of entries.
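The figure for the first twenty species follows directly from the frequencies listed in section 2.2. As a hedged illustration, the Python sketch below sums the twenty largest frequencies and expresses them as a share of all entries; the variable names are assumptions made for this example only.

```python
# Frequencies of the twenty most represented species (ranks 1-20 in section 2.2).
TOP20_FREQUENCIES = [
    384546, 95306, 86521, 59510, 56142,
    55184, 51503, 50947, 50471, 44072,
    43879, 42018, 39841, 39366, 38531,
    34795, 33645, 33218, 32625, 32436,
]

# Total number of sequence entries in release 2011_06 (from the introduction).
TOTAL_ENTRIES = 15400876

top20_sequences = sum(TOP20_FREQUENCIES)
top20_share = 100 * top20_sequences / TOTAL_ENTRIES

if __name__ == "__main__":
    print(f"{top20_sequences} sequences: {top20_share:.1f} % of all entries")
```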
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 17222
                                             2x: 63448
                                             3x: 32086
                                             4x: 19124
                                             5x: 11747
                                             6x:  8226
                                             7x:  6050
                                             8x:  4667
                                             9x:  3761
                                            10x:  7423
                                        11- 20x: 18591
                                        21- 50x:  6555
                                        51-100x:  2330
                                          >100x:  5210
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     384546  Human immunodeficiency virus 1
                    2      95306  Oryza sativa subsp. japonica (Rice)
                    3      86521  Homo sapiens (Human)
                    4      59510  Hepatitis C virus
                    5      56142  Mus musculus (Mouse)
                    6      55184  uncultured bacterium
                    7      51503  Danio rerio (Zebrafish) (Brachydanio rerio)
                    8      50947  Vitis vinifera (Grape)
                    9      50471  Trichomonas vaginalis
                    10      44072  Populus trichocarpa (Western balsam poplar) 
                    11      43879  Hepatitis B virus (HBV)
                    12      42018  Zea mays (Maize)
                    13      39841  Paramecium tetraurelia
                    14      39366  Oryza sativa subsp. indica (Rice)
                    15      38531  Arabidopsis thaliana (Mouse-ear cress)
                    16      34795  Physcomitrella patens subsp. patens (Moss)
                    17      33645  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    18      33218  Selaginella moellendorffii (Spikemoss)
                    19      32625  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
                    20      32436  Rattus norvegicus (Rat)
                    21      32185  Drosophila melanogaster (Fruit fly)
                    22      31830  Caenorhabditis remanei (Caenorhabditis vulgaris)
                    23      31298  Ricinus communis (Castor bean)
                    24      30817  Trypanosoma cruzi
                    25      30506  Daphnia pulex (Water flea)
                    26      29162  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    27      29024  Oikopleura dioica (Tunicate)
                    28      28089  Tetraodon nigroviridis (Green puffer)
                    29      27610  Bos taurus (Bovine)
                    30      27019  Canis familiaris (Dog) (Canis lupus familiaris)
                    31      25274  Ralstonia solanacearum (Pseudomonas solanacearum)
                    32      24811  Nematostella vectensis (Starlet sea anemone)
                    33      24637  Gallus gallus (Chicken)
                    34      24564  Sus scrofa (Pig)
                    35      23115  Perkinsus marinus ATCC 50983
                    36      22601  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    37      22589  Escherichia coli
                    38      21492  Caenorhabditis elegans
                    39      21475  Hordeum vulgare var. distichum (Two-rowed barley)
                    40      21087  Ixodes scapularis (Black-legged tick) (Deer tick)
                    41      20435  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
                    42      18889  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    43      18771  mine drainage metagenome
                    44      18067  Drosophila simulans (Fruit fly)
                    45      17933  Caenorhabditis briggsae
                    46      17843  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    47      17843  Ailuropoda melanoleuca (Giant panda)
                    48      17604  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
                    49      16975  Tribolium castaneum (Red flour beetle)
                    50      16928  Drosophila yakuba (Fruit fly)
                    51      16735  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    52      16706  Drosophila persimilis (Fruit fly)
                    53      16425  Ectocarpus siliculosus (Brown alga)
                    54      16295  Loa loa (Eye worm)
                    55      16243  Trichinella spiralis (Trichina worm)
                    56      16238  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    57      16179  Drosophila sechellia (Fruit fly)
                    58      15980  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    59      15845  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    60      15715  Naegleria gruberi (Amoeba)
                    61      15642  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    62      15417  Drosophila willistoni (Fruit fly)
                    63      15247  Tetrahymena thermophila SB210
                    64      15136  Drosophila ananassae (Fruit fly)
                    65      15029  Harpegnathos saltator
                    66      14955  Anopheles gambiae (African malaria mosquito)
                    67      14921  Drosophila erecta (Fruit fly)
                    68      14817  Chlamydomonas reinhardtii (Chlamydomonas smithii)
                    69      14791  Camponotus floridanus
                    70      14774  Drosophila mojavensis (Fruit fly)
                    71      14695  Drosophila virilis (Fruit fly)
                    72      14671  Plasmodium chabaudi
                    73      14651  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    74      14635  Toxoplasma gondii
                    75      14634  Volvox carteri f. nagariensis
                    76      14322  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
                    77      14251  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    78      13656  Hepatitis C virus subtype 1b
                    79      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
                    80      13515  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    81      13510  Schistosoma mansoni (Blood fluke)
                    82      13469  Plasmodium falciparum
                    83      13339  Aspergillus flavus 
                    84      13281  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    85      13178  Magnaporthe oryzae (strain 70-15 / FGSC 8958) (Rice blast fungus) 
                    86      13124  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
                    87      12983  Albugo laibachii Nc14
                    88      12950  Stigmatella aurantiaca (strain DW4/3-1)
                    89      12947  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    90      12693  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    91      12627  Glycine max (Soybean) (Glycine hispida)
                    92      12528  Xenopus laevis (African clawed frog)
                    93      12523  Leptosphaeria maculans (Blackleg fungus) (Phoma lingam)
                    94      12444  Polysphondylium pallidum (Cellular slime mold)
                    95      12352  Dictyostelium purpureum (Slime mold)
                    96      12005  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    97      12003  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
                    98      11828  Hepatitis C virus subtype 1a
                    99      11716  Thalassiosira pseudonana (Marine diatom)
                    100      11703  Salpingoeca sp. ATCC 50818
                    101      11695  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
                    102      11647  Anopheles darlingi (Mosquito)
                    103      11645  Plasmodium berghei (strain Anka)
                    104      11600  Aspergillus oryzae (strain ATCC 42149 / RIB 40)
                    105      11563  Trichoplax adhaerens (Trichoplax reptans)
                    106      11510  Aureococcus anophagefferens
                    107      11497  Brugia malayi (Filarial nematode worm)
                    108      11356  Helicobacter pylori (Campylobacter pylori)
                    109      11287  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
                    110      11211  Ktedonobacter racemifer DSM 44963
                    111      10966  Streptomyces clavuligerus ATCC 27064
                    112      10919  Schistosoma japonicum (Blood fluke)
                    113      10851  Chaetomium globosum (Soil fungus)
                    114      10842  Pediculus humanus subsp. corporis (Body louse)
                    115      10781  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
                    116      10661  Podospora anserina
                    117      10580  Metarhizium robertsii (strain ARSEF 23) (Metarhizium anisopliae)
                    118      10400  Neurospora crassa
                    119      10388  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    120      10366  Aspergillus nidulans FGSC A4
                    121      10357  Phaeodactylum tricornutum (strain CCAP 1055/1)
                    122      10276  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
                    123      10214  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    124      10206  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
                    125      10168  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    126      10139  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
                    127      10113  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
                    128      10096  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    129      10089  Ajellomyces dermatitidis ATCC 18188
                    130      10064  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    131      10042  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    132      10015  Streptomyces bingchenggensis (strain BCW-1)
                    133       9908  Rabies virus
                    134       9835  Chlorella variabilis
                    135       9830  Metarhizium acridum (strain CQMa 102)
                    136       9718  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    137       9669  Cryptococcus neoformans (Filobasidiella neoformans)
                    138       9663  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
                    139       9545  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    140       9520  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    141       9497  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    142       9484  Streptomyces violaceusniger Tu 4113
                    143       9482  Trypanosoma brucei
                    144       9446  Ajellomyces capsulatus H88
                    145       9383  Salmo salar (Atlantic salmon)
                    146       9238  Monosiga brevicollis (Choanoflagellate)
                    147       9216  Candida albicans (Yeast)
                    148       9202  Amycolatopsis mediterranei (strain U-32)
                    149       9177  Streptomyces himastatinicus ATCC 53653
                    150       9173  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    151       9163  Emericella nidulans (Aspergillus nidulans)
                    152       9155  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    153       9114  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    154       9075  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    155       9023  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
                    156       8982  Dictyostelium discoideum (Slime mold)
                    157       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    158       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    159       8926  Burkholderia sp. TJI49
                    160       8900  Catenulispora acidiphila 
                    161       8870  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
                    162       8809  Aspergillus clavatus 
                    163       8757  Rhodococcus sp. (strain RHA1)
                    164       8710  Paracoccidioides brasiliensis (strain Pb18)
                    165       8705  Trichophyton rubrum CBS 118892
                    166       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    167       8676  Trichophyton equinum CBS 127.97
                    168       8672  Arthroderma otae (strain CBS 113480) (Microsporum canis)
                    169       8599  Entamoeba dispar SAW760
                    170       8520  Trichophyton tonsurans CBS 112818
                    171       8437  Plesiocystis pacifica SIR-1
                    172       8394  Streptomyces sp. AA4
                    173       8374  Capsaspora owczarzaki ATCC 30864
                    174       8311  Grosmannia clavigera kw1407
                    175       8302  Entamoeba histolytica
                    176       8290  Bradyrhizobium japonicum
                    177       8274  Leishmania major
                    178       8249  Microscilla marina ATCC 23134
                    179       8202  Streptomyces sviceus ATCC 29083
                    180       8201  Microcoleus chthonoplastes PCC 7420
                    181       8190  Leishmania infantum
                    182       8163  Frankia sp. EUN1f
                    183       8154  Burkholderia xenovorans (strain LB400)
                    184       8137  Pseudomonas aeruginosa
                    185       8044  Leishmania mexicana MHOM/GT/2001/U1103
                    186       7997  Leishmania braziliensis
                    187       7978  Toxoplasma gondii ME49
                    188       7967  Trichophyton verrucosum (strain HKI 0517)
                    189       7961  Leishmania donovani BPK282A1
                    190       7955  Ostreococcus tauri
                    191       7943  Rhodococcus opacus (strain B4)
                    192       7917  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    193       7917  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    194       7866  Streptomyces ghanaensis ATCC 14672
                    195       7856  Acaryochloris marina (strain MBIC 11017)
                    196       7835  Paracoccidioides brasiliensis (strain Pb03)
                    197       7823  Burkholderia sp. Ch1-1
                    198       7808  Plasmodium yoelii yoelii
                    199       7776  uncultured archaeon
                    200       7719  Uncinocarpus reesii (strain UAMH 1704)
                    201       7706  Streptomyces viridochromogenes DSM 40736
                    202       7571  Clostridium hathewayi DSM 13479
                    203       7563  Burkholderia pseudomallei MSHR346
                    204       7528  Streptomyces sp. C
                    205       7523  Streptomyces lividans TK24
                    206       7519  Solibacter usitatus (strain Ellin6076)
                    207       7492  Tuber melanosporum (strain Mel28) (Perigord black truffle)
                    208       7475  Burkholderia pseudomallei 1710a
                    209       7474  Streptomyces coelicolor
                    210       7465  Burkholderia pseudomallei Pakistan 9
                    211       7459  Burkholderia sp. H160
                    212       7451  Streptomyces venezuelae ATCC 10712
                    213       7443  Kitasatospora setae  
                    214       7385  Ostreococcus lucimarinus (strain CCE9901)
                    215       7367  Burkholderia pseudomallei 576
                    216       7351  Burkholderia gladioli BSR3
                    217       7349  Burkholderia pseudomallei 305
                    218       7274  Clostridium bolteae ATCC BAA-613
                    219       7241  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    220       7231  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    221       7227  Streptomyces avermitilis
                    222       7177  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    223       7162  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    224       7145  Giardia intestinalis (strain ATCC 50803 / WB clone C6) (Giardia lamblia)
                    225       7140  Burkholderia pseudomallei 1106b
                    226       7130  Burkholderia phymatum (strain DSM 17167 / STM815)
                    227       7124  Burkholderia ambifaria MEX-5
                    228       7111  Neospora caninum Liverpool
                    229       7098  Medicago truncatula (Barrel medic) (Medicago tribuloides)
                    230       7079  Frankia sp. (strain EuI1c)
                    231       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    232       7016  Myxococcus xanthus (strain DK 1622)
                    233       7005  Mucilaginibacter paludis DSM 18603
                    234       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    235       6974  Rhodopirellula baltica
                    236       6959  Frankia sp. (strain EAN1pec)
                    237       6932  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    238       6931  Streptomyces sp. Mg1
                    239       6923  Burkholderia ambifaria IOP40-10
                    240       6903  Streptococcus pneumoniae
                    241       6903  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    242       6902  Saccharopolyspora erythraea (strain NRRL 23338)
                    243       6892  Streptomyces roseosporus NRRL 15998
                    244       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    245       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    246       6866  Streptomyces pristinaespiralis ATCC 25486
                    247       6865  Burkholderia sp. (strain CCGE1002)
                    248       6859  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    249       6833  Rhizobium loti (Mesorhizobium loti)
                    250       6817  Clostridium asparagiforme DSM 15981
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          264769 (  2%)
                    Bacteria        9765542 ( 63%)
                    Eukaryota       4195105 ( 27%)
                    Viruses         1137127 (  7%)
                    Other             38332 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  86557 (  2%)           (  1%)
                    Other Mammalia        286666 (  7%)           (  2%)
                    Other Vertebrata      386730 (  9%)           (  3%)
                    Viridiplantae         885020 ( 21%)           (  6%)
                    Fungi                 879122 ( 21%)           (  6%)
                    Insecta               645438 ( 15%)           (  4%)
                    Nematoda              136670 (  3%)           (  1%)
                    Other                 888902 ( 21%)           (  6%)
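The whole-number percentages in the two tables above can be reproduced by dividing each count by the relevant total and rounding. The sketch below is one way to do this in Python. The function name share_label and the rule that a share rounding to zero is printed as "<1%" are assumptions inferred from the table, not a documented UniProt procedure.

```python
# Totals taken from this section and from the introduction.
TOTAL_ENTRIES = 15400876   # all UniProtKB/TrEMBL entries in this release
EUKARYOTA = 4195105        # entries assigned to Eukaryota

# Counts for the categories within Eukaryota, copied from the table above.
WITHIN_EUKARYOTA = {
    "Human": 86557,
    "Other Mammalia": 286666,
    "Other Vertebrata": 386730,
    "Viridiplantae": 885020,
    "Fungi": 879122,
    "Insecta": 645438,
    "Nematoda": 136670,
    "Other": 888902,
}


def share_label(count, total):
    """Format count/total as a whole-number percentage.

    Assumption: a non-zero share that rounds to 0 is shown as "<1%".
    """
    if count == 0:
        return "0%"
    pct = round(100 * count / total)
    return "<1%" if pct == 0 else f"{pct}%"


if __name__ == "__main__":
    for name, count in WITHIN_EUKARYOTA.items():
        print(f"{name:<18}{count:>8}  "
              f"{share_label(count, EUKARYOTA):>4} of Eukaryota  "
              f"{share_label(count, TOTAL_ENTRIES):>4} of the database")
```

The eight categories within Eukaryota add up exactly to the Eukaryota total, so the first percentage column covers the whole kingdom.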
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To   Number     From   To   Number
                       1-  50   334528     1001-1100    91056
                      51- 100  1229359     1101-1200    64348
                     101- 150  1416184     1201-1300    44698
                     151- 200  1368552     1301-1400    29434
                     201- 250  1377402     1401-1500    23608
                     251- 300  1332305     1501-1600    16965
                     301- 350  1212015     1601-1700    12735
                     351- 400   934225     1701-1800     9848
                     401- 450   793523     1801-1900     8015
                     451- 500   663538     1901-2000     6756
                     501- 550   447644     2001-2100     5471
                     551- 600   345851     2101-2200     5549
                     601- 650   251308     2201-2300     4361
                     651- 700   195804     2301-2400     3511
                     701- 750   168911     2401-2500     2987
                     751- 800   151651         >2500    26023
                     801- 850   112478
                     851- 900   102211
                     901- 950    69811
                     951-1000    52559
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 323 amino acids.

                    The shortest sequence is Q16047_HUMAN: 4 amino acids.
                    The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
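The average length can be related to the totals given in the introduction. The Python sketch below assumes the average is taken over all entries, fragments included, and that the published value is the whole-number part of the quotient. Both points are assumptions made for this example and are not stated in the release notes.

```python
# Totals from the introduction of this release.
TOTAL_AMINO_ACIDS = 4982458690
TOTAL_ENTRIES = 15400876

# Exact mean length as a floating-point number.
mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

# Whole-number part of the mean (integer division discards the remainder).
mean_length_whole = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES

if __name__ == "__main__":
    print(f"mean length: {mean_length:.2f} amino acids")
    print(f"whole-number part: {mean_length_whole} amino acids")
```

Under these assumptions the whole-number part is 323, matching the value quoted above.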
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------

                    References (RL)                    18718056                1.22
                       Submitted to EMBL/GenBank/DDBJ  11108928   9828131      0.72
                       Journal                          7303408   6667023      0.47
                       Submitted to other databases      166487    165719      0.01
                       Thesis                              7819      7761     <0.01
                       Book citation                       5666      5615     <0.01
                       Other                             125748    123687      0.01
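The "Average per entry" column in this and the following tables is consistent with dividing the total number of lines by the total number of entries in the release, not by the number of entries that carry such a line. The Python sketch below illustrates that reading. The function name average_per_entry and the "<0.01" formatting rule are assumptions inferred from the printed values.

```python
# Total number of sequence entries in release 2011_06 (from the introduction).
TOTAL_ENTRIES = 15400876


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Average number of lines per database entry, as a two-decimal string.

    Assumption: values that would print as 0.00 are shown as "<0.01".
    """
    text = f"{total_lines / total_entries:.2f}"
    return "<0.01" if text == "0.00" else text


if __name__ == "__main__":
    # (label, total number of lines) pairs copied from the reference table above.
    rows = [
        ("References (RL)", 18718056),
        ("Submitted to EMBL/GenBank/DDBJ", 11108928),
        ("Journal", 7303408),
        ("Thesis", 7819),
    ]
    for label, total_lines in rows:
        print(f"{label:<32}{average_per_entry(total_lines):>6}")
```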
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 319698
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Comments (CC)                      14361940                0.93
                       CATALYTIC ACTIVITY               1442369   1333617      0.09     4
                       CAUTION                          3882536   3882536      0.25     2
                       COFACTOR                          468556    447782      0.03     8
                       DOMAIN                             30332     28362     <0.01    10
                       FUNCTION                         1708722   1563128      0.11     3
                       INTERACTION                         5154      5154     <0.01    11
                       MISCELLANEOUS                      31572     31568     <0.01     9
                       PATHWAY                           721545    666765      0.05     6
                       SIMILARITY                       4254604   3678877      0.28     1
                       SUBCELLULAR LOCATION             1179978   1172988      0.08     5
                       SUBUNIT                           636572    634376      0.04     7
                    
                    Total number of comment topics: 11
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Features (FT)                       5003566                0.32
                       CHAIN                             513666    404391      0.03     2
                       NON_TER                          4150101   2484105      0.27     1
                       SIGNAL                            339200    339135      0.02     3
                       TRANSIT                              599       599     <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             173563373               11.27                                                    
                    AGD                                 2540      2540     <0.01    77   Organism-specific databases                
                    ANU-2DPAGE                            56        56     <0.01    94   2D gel databases                           
                    Allergome                           2037      1468     <0.01    81   Protein family/group databases             
                    ArachnoServer                         66        66     <0.01    93   Organism-specific databases                
                    ArrayExpress                       91699     91688      0.01    49   Gene expression databases                  
                    BRENDA                              2772      2740     <0.01    75   Enzyme and pathway databases               
                    Bgee                              141333    141157      0.01    46   Gene expression databases                  
                    BioCyc                           1623183   1588627      0.11    21   Enzyme and pathway databases               
                    CAZy                               74408     69908     <0.01    52   Protein family/group databases             
                    CGD                                 6748      6748     <0.01    72   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     5         5     <0.01    97   2D gel databases                           
                    CTD                               237141    236191      0.02    39   Organism-specific databases                
                    DIP                                 2744      2739     <0.01    76   Protein-protein interaction databases      
                    EMBL                            17213120  15353132      1.12     3   Sequence databases                         
                    Ensembl                           283673    257490      0.02    34   Genome annotation databases                
                    EnsemblBacteria                   565589    535434      0.04    29   Genome annotation databases                
                    EnsemblFungi                      106159    106070      0.01    47   Genome annotation databases                
                    EnsemblMetazoa                    320350    297364      0.02    32   Genome annotation databases                
                    EnsemblPlants                     260086    232828      0.02    37   Genome annotation databases                
                    EnsemblProtists                    72640     71481     <0.01    54   Genome annotation databases                
                    EuPathDB                          179061    179060      0.01    43   Organism-specific databases                
                    FlyBase                           195669    194119      0.01    40   Organism-specific databases                
                    GO                              30101572   9585875      1.95     2   Ontologies                                 
                    Gene3D                           6441583   5161110      0.42     6   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01   100   Organism-specific databases                
                    GeneID                           5777097   5660635      0.38     9   Genome annotation databases                
                    GeneTree                         1157546   1157205      0.08    23   Phylogenomic databases                     
                    Genevestigator                     97989     97984      0.01    48   Gene expression databases                  
                    GenoList                           14750     14477     <0.01    66   Organism-specific databases                
                    GenomeReviews                    3982486   3895312      0.26    12   Genome annotation databases                
                    Gramene                            68721     68721     <0.01    56   Organism-specific databases                
                    H-InvDB                              596       485     <0.01    85   Organism-specific databases                
                    HAMAP                            1145568   1132356      0.07    24   Family and domain databases                
                    HGNC                               73030     71276     <0.01    53   Organism-specific databases                
                    HOGENOM                          2193323   2193281      0.14    18   Phylogenomic databases                     
                    HOVERGEN                          316676    316675      0.02    33   Phylogenomic databases                     
                    HSSP                              252984    252714      0.02    38   3D structure databases                     
                    IPI                               264956    264955      0.02    36   Sequence databases                         
                    InParanoid                        194104    194037      0.01    42   Phylogenomic databases                     
                    IntAct                             16058     16058     <0.01    65   Protein-protein interaction databases      
                    InterPro                        31808312  11421944      2.07     1   Family and domain databases                
                    KEGG                             4932975   4836211      0.32    10   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    73   Organism-specific databases                
                    Leproma                              936       935     <0.01    84   Organism-specific databases                
                    MEROPS                             70539     69056     <0.01    55   Protein family/group databases             
                    MGI                                49292     49241     <0.01    58   Organism-specific databases                
                    MINT                                8950      8950     <0.01    70   Protein-protein interaction databases      
                    NMPDR                             919121    919117      0.06    26   Genome annotation databases                
                    NextBio                            46565     46562     <0.01    59   Other                                      
                    OMA                              2421375   2421373      0.16    16   Phylogenomic databases                     
                    OrthoDB                           578592    578424      0.04    28   Phylogenomic databases                     
                    PANTHER                          1961392   1891040      0.13    20   Family and domain databases                
                    PDB                                14734      8684     <0.01    67   3D structure databases                     
                    PDBsum                             14375      8489     <0.01    68   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    89   2D gel databases                           
                    PIR                               174799    141963      0.01    44   Sequence databases                         
                    PIRSF                             864055    864055      0.06    27   Family and domain databases                
                    PMAP-CutDB                           252       252     <0.01    87   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    98   2D gel databases                           
                    PRIDE                             143214    143206      0.01    45   Proteomic databases                        
                    PRINTS                           2410198   2144517      0.16    17   Family and domain databases                
                    PROSITE                          7356741   4922711      0.48     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    96   Enzyme and pathway databases               
                    PeptideAtlas                         147       147     <0.01    88   Proteomic databases                        
                    PeroxiBase                          2495      2486     <0.01    78   Protein family/group databases             
                    Pfam                            14504686  10799454      0.94     4   Family and domain databases                
                    PharmGKB                              83        83     <0.01    92   Organism-specific databases                
                    PhosphoSite                         1565      1565     <0.01    82   PTM databases                              
                    PhylomeDB                         371361    371329      0.02    31   Phylogenomic databases                     
                    ProDom                            272463    255633      0.02    35   Family and domain databases                
                    ProMEX                               324       324     <0.01    86   Proteomic databases                        
                    ProtClustDB                      2726576   2726565      0.18    15   Phylogenomic databases                     
                    ProteinModelPortal               4860266   4854766      0.32    11   3D structure databases                     
                    PseudoCAP                           4342      4339     <0.01    74   Organism-specific databases                
                    REBASE                             18197     17601     <0.01    63   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   92        91     <0.01    91   2D gel databases                           
                    RGD                                22012     21852     <0.01    61   Organism-specific databases                
                    Reactome                              93        90     <0.01    90   Enzyme and pathway databases               
                    RefSeq                           5812939   5682249      0.38     8   Sequence databases                         
                    SMART                            3056747   2364485      0.20    13   Family and domain databases                
                    SMR                              2151073   2151073      0.14    19   3D structure databases                     
                    STRING                           1202303   1202167      0.08    22   Protein-protein interaction databases      
                    SUPFAM                           6181670   5107671      0.40     7   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    95   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01    99   2D gel databases                           
                    TAIR                               17049     16968     <0.01    64   Organism-specific databases                
                    TCDB                                2439      2429     <0.01    79   Protein family/group databases             
                    TIGR                              194967    187919      0.01    41   Genome annotation databases                
                    TIGRFAMs                         2990526   2733493      0.19    14   Family and domain databases                
                    TubercuList                         2119      2114     <0.01    80   Organism-specific databases                
                    UCSC                               49922     49922     <0.01    57   Genome annotation databases                
                    UniGene                           471605    442580      0.03    30   Sequence databases                         
                    VectorBase                         78957     78445      0.01    50   Genome annotation databases                
                    World-2DPAGE                         944       939     <0.01    83   2D gel databases                           
                    WormBase                           41321     41191     <0.01    60   Organism-specific databases                
                    Xenbase                            13206     13174     <0.01    69   Organism-specific databases                
                    ZFIN                               21559     21554     <0.01    62   Organism-specific databases                
                    dictyBase                           7744      7744     <0.01    71   Organism-specific databases                
                    eggNOG                           1144488   1144488      0.07    25   Phylogenomic databases                     
                    euHCVdb                            75268     75265     <0.01    51   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 129
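
The rows of the cross-reference table above can be read programmatically. The following is a minimal sketch, not part of the release notes. It assumes the columns are, in order: database name, total number of cross-references, number of entries carrying at least one such cross-reference, average per entry, rank, and category. That reading is inferred from the numbers: the total divided by the 15400876 entries reported in the introduction reproduces the printed average. The names `ROW`, `parse_row` and `average_per_entry` are hypothetical helpers introduced for illustration.

```python
import re

# Total number of entries in this release, taken from the introduction.
TOTAL_ENTRIES = 15400876

# Assumed row layout: name, total cross-references, entries carrying at least
# one such cross-reference, average per entry, rank, category.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<total>\d+)\s+(?P<entries>\d+)\s+"
    r"(?P<average><?\d+\.\d+)\s+(?P<rank>\d+)\s+(?P<category>\S.*?)\s*$"
)


def parse_row(line):
    """Return a dict for one database row, or None if the line is not a row."""
    match = ROW.match(line)
    if match is None:
        return None
    row = match.groupdict()
    for key in ("total", "entries", "rank"):
        row[key] = int(row[key])
    return row


def average_per_entry(row, total_entries=TOTAL_ENTRIES):
    """Recompute the printed average: cross-references divided by all entries."""
    return round(row["total"] / total_entries, 2)


# Two rows copied from the table above.
embl = parse_row("EMBL    17213120  15353132      1.12     3   Sequence databases   ")
go = parse_row("GO      30101572   9585875      1.95     2   Ontologies   ")

print(embl["name"], average_per_entry(embl))
print(go["name"], average_per_entry(go))
```

The summary line "Cross-references (DR)" has a different shape (two numbers, no rank or category), so `parse_row` returns `None` for it and it has to be handled separately.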
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.63   Gln (Q) 3.87   Leu (L) 9.84   Ser (S) 6.71
                    Arg (R) 5.46   Glu (E) 6.13   Lys (K) 5.25   Thr (T) 5.61
                    Asn (N) 4.13   Gly (G) 7.12   Met (M) 2.48   Trp (W) 1.31
                    Asp (D) 5.30   His (H) 2.19   Phe (F) 4.03   Tyr (Y) 3.04
                    Cys (C) 1.27   Ile (I) 6.00   Pro (P) 4.73   Val (V) 6.75
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
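
The ordering in 5.2 follows directly from the percentages in 5.1. As an illustrative sketch (the dictionary name `COMPOSITION` is an assumption, and the ambiguous codes Asx, Glx and Xaa are left out), sorting the twenty standard amino acids by descending percentage reproduces the list above:

```python
# Percent composition of the twenty standard amino acids, copied from 5.1.
COMPOSITION = {
    "Ala": 8.63, "Gln": 3.87, "Leu": 9.84, "Ser": 6.71,
    "Arg": 5.46, "Glu": 6.13, "Lys": 5.25, "Thr": 5.61,
    "Asn": 4.13, "Gly": 7.12, "Met": 2.48, "Trp": 1.31,
    "Asp": 5.30, "His": 2.19, "Phe": 4.03, "Tyr": 3.04,
    "Cys": 1.27, "Ile": 6.00, "Pro": 4.73, "Val": 6.75,
}

# Sort the three-letter codes from most to least frequent.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)

print(", ".join(ranking))
```

No two residues share the same percentage in this release, so the order does not depend on how ties are broken.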
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 518650
                    Total number of entries encoded on a Plasmid: 204370
                    Total number of entries encoded on a Plastid: 12595
                    Total number of entries encoded on a Plastid; Apicoplast: 368
                    Total number of entries encoded on a Plastid; Chloroplast: 137037
                    Total number of entries encoded on a Plastid; Cyanelle: 8
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 448