                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2011_07 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    UniProtKB/TrEMBL release 2011_07 of 28-Jun-2011 contains 16014672 sequence entries,
                    comprising 5170073575 amino acids.
                    
                    Since release 2011_06, 625203 sequences have been added, the sequence data of
                    1449 existing entries has been updated, and the annotations of
                    3624041 entries have been revised. The added sequences represent an increase of 4%.
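The 4% figure can be checked from the counts above. The notes do not say how many entries were removed or which denominator was used, so this Python sketch assumes no deletions, which makes the previous release size the current total minus the additions; with either denominator the result rounds to 4%.

```python
# Figures quoted in the release notes for 2011_07.
total_entries = 16014672
added_since_previous = 625203

# Assumption: no entries were deleted, so the previous release size is
# the current size minus the additions.
previous_entries = total_entries - added_since_previous

growth_vs_previous = 100 * added_since_previous / previous_entries
share_of_current = 100 * added_since_previous / total_entries

print(f"relative to previous release: {growth_vs_previous:.2f}%")
print(f"relative to current release:  {share_of_current:.2f}%")
```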
                    
                    Number of fragments: 2573872
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           20440     0.13%
                    2: Evidence at transcript level       526588     3.29%
                    3: Inferred from homology            3186638    19.90%
                    4: Predicted                        12281006    76.69%
                    5: Uncertain                               0     0.00%
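Each UniProtKB entry carries exactly one PE line, so the five counts should add up to the release total, and the percentage column follows from dividing by it. A minimal Python sketch of that check (the dictionary layout is illustrative only):

```python
# Protein existence (PE) counts copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 20440,
    "2: Evidence at transcript level": 526588,
    "3: Inferred from homology": 3186638,
    "4: Predicted": 12281006,
    "5: Uncertain": 0,
}
total_entries = 16014672

# One PE line per entry, so the levels partition the release.
assert sum(pe_counts.values()) == total_entries

# Percentage of all entries at each level, rounded to two decimals.
percentages = {level: round(100 * n / total_entries, 2) for level, n in pe_counts.items()}
for level, pct in percentages.items():
    print(f"{level:<32} {pct:6.2f}%")
```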
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 373195
                    
                    The twenty most represented species account for 1321948 sequences, 8.3% of the
                    total number of entries.
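The twenty-species figure can be reproduced by summing ranks 1 to 20 of the table in section 2.2 and dividing by the release total. A short Python sketch, with the counts copied from that table:

```python
# Entry counts of the twenty most represented species (section 2.2, ranks 1-20).
top_twenty = [
    389280, 95278, 89587, 60204, 56435, 56060, 51510, 50950, 50471, 45282,
    44833, 44072, 42023, 39841, 39364, 34795, 33645, 33270, 32625, 32423,
]
total_entries = 16014672

covered = sum(top_twenty)
share = 100 * covered / total_entries
print(covered, f"{share:.1f}%")
```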
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 178822
                                             2x:  65492
                                             3x:  33157
                                             4x:  19735
                                             5x:  11988
                                             6x:   8372
                                             7x:   6172
                                             8x:   4762
                                             9x:   3829
                                            10x:   7520
                                        11- 20x:  18904
                                        21- 50x:   6685
                                        51-100x:   2368
                                          >100x:   5389
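Every species falls into exactly one occurrence bin, so the bins should add up to the 373195 species reported at the start of section 2. Assuming the other bin counts are exact, the size of the 1x bin (species represented by a single entry) can be derived from them in Python:

```python
# Number of species per occurrence bin (section 2.1), excluding the 1x bin.
bins = {
    "2x": 65492, "3x": 33157, "4x": 19735, "5x": 11988, "6x": 8372,
    "7x": 6172, "8x": 4762, "9x": 3829, "10x": 7520, "11-20x": 18904,
    "21-50x": 6685, "51-100x": 2368, ">100x": 5389,
}
total_species = 373195

# Each species is counted in exactly one bin, so the 1x bin is the remainder.
singletons = total_species - sum(bins.values())
print(singletons)
```

The sketch prints 178822.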
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Rank      Entries  Species
                    ------  ---------  --------------------------------------------
                    1     389280  Human immunodeficiency virus 1
                    2      95278  Oryza sativa subsp. japonica (Rice)
                    3      89587  Homo sapiens (Human)
                    4      60204  Hepatitis C virus
                    5      56435  uncultured bacterium
                    6      56060  Mus musculus (Mouse)
                    7      51510  Danio rerio (Zebrafish) (Brachydanio rerio)
                    8      50950  Vitis vinifera (Grape)
                    9      50471  Trichomonas vaginalis
                    10      45282  Arabidopsis thaliana (Mouse-ear cress)
                    11      44833  Hepatitis B virus (HBV)
                    12      44072  Populus trichocarpa (Western balsam poplar) 
                    13      42023  Zea mays (Maize)
                    14      39841  Paramecium tetraurelia
                    15      39364  Oryza sativa subsp. indica (Rice)
                    16      34795  Physcomitrella patens subsp. patens (Moss)
                    17      33645  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    18      33270  Selaginella moellendorffii (Spikemoss)
                    19      32625  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
                    20      32423  Rattus norvegicus (Rat)
                    21      32349  Drosophila melanogaster (Fruit fly)
                    22      31830  Caenorhabditis remanei (Caenorhabditis vulgaris)
                    23      31298  Ricinus communis (Castor bean)
                    24      30817  Trypanosoma cruzi
                    25      30523  Daphnia pulex (Water flea)
                    26      29162  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    27      29024  Oikopleura dioica (Tunicate)
                    28      28089  Tetraodon nigroviridis (Green puffer)
                    29      27602  Bos taurus (Bovine)
                    30      27022  Canis familiaris (Dog) (Canis lupus familiaris)
                    31      24811  Nematostella vectensis (Starlet sea anemone)
                    32      24669  Gallus gallus (Chicken)
                    33      24622  Sus scrofa (Pig)
                    34      23595  Ralstonia solanacearum (Pseudomonas solanacearum)
                    35      23242  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    36      23115  Perkinsus marinus ATCC 50983
                    37      22627  Escherichia coli
                    38      21639  Caenorhabditis elegans
                    39      21509  Hordeum vulgare var. distichum (Two-rowed barley)
                    40      21087  Ixodes scapularis (Black-legged tick) (Deer tick)
                    41      20435  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
                    42      19147  Toxoplasma gondii
                    43      18889  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    44      18771  mine drainage metagenome
                    45      18069  Drosophila simulans (Fruit fly)
                    46      17933  Caenorhabditis briggsae
                    47      17843  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    48      17843  Ailuropoda melanoleuca (Giant panda)
                    49      17604  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
                    50      16976  Tribolium castaneum (Red flour beetle)
                    51      16928  Drosophila yakuba (Fruit fly)
                    52      16735  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    53      16706  Drosophila persimilis (Fruit fly)
                    54      16425  Ectocarpus siliculosus (Brown alga)
                    55      16295  Loa loa (Eye worm)
                    56      16243  Trichinella spiralis (Trichina worm)
                    57      16238  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    58      16237  Melampsora larici-populina 98AG31
                    59      16179  Drosophila sechellia (Fruit fly)
                    60      15979  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    61      15774  Phaeosphaeria nodorum (strain SN15 / FGSC 10173) (Glume blotch fungus) 
                    62      15715  Naegleria gruberi (Amoeba)
                    63      15641  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    64      15418  Drosophila willistoni (Fruit fly)
                    65      15247  Tetrahymena thermophila SB210
                    66      15138  Drosophila ananassae (Fruit fly)
                    67      15029  Harpegnathos saltator
                    68      14956  Anopheles gambiae (African malaria mosquito)
                    69      14921  Drosophila erecta (Fruit fly)
                    70      14828  Hepatitis C virus subtype 1a
                    71      14819  Chlamydomonas reinhardtii (Chlamydomonas smithii)
                    72      14791  Camponotus floridanus
                    73      14774  Drosophila mojavensis (Fruit fly)
                    74      14696  Drosophila virilis (Fruit fly)
                    75      14671  Plasmodium chabaudi
                    76      14651  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    77      14634  Volvox carteri f. nagariensis
                    78      14322  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
                    79      14250  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    80      14064  Hepatitis C virus subtype 1b
                    81      13963  Acromyrmex echinatior (Panamanian leafcutter ant)
                    82      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
                    83      13514  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    84      13510  Schistosoma mansoni (Blood fluke)
                    85      13470  Plasmodium falciparum
                    86      13338  Aspergillus flavus 
                    87      13281  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    88      13177  Magnaporthe oryzae (strain 70-15 / FGSC 8958) (Rice blast fungus) 
                    89      13123  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
                    90      12983  Albugo laibachii Nc14
                    91      12950  Stigmatella aurantiaca (strain DW4/3-1)
                    92      12946  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    93      12692  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    94      12633  Glycine max (Soybean) (Glycine hispida)
                    95      12526  Xenopus laevis (African clawed frog)
                    96      12522  Leptosphaeria maculans (Blackleg fungus) (Phoma lingam)
                    97      12444  Polysphondylium pallidum (Cellular slime mold)
                    98      12352  Dictyostelium purpureum (Slime mold)
                    99      12206  Dictyostelium fasciculatum
                    100      12004  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    101      12002  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
                    102      11717  Thalassiosira pseudonana (Marine diatom)
                    103      11703  Salpingoeca sp. ATCC 50818
                    104      11694  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
                    105      11647  Anopheles darlingi (Mosquito)
                    106      11645  Plasmodium berghei (strain Anka)
                    107      11599  Aspergillus oryzae (strain ATCC 42149 / RIB 40)
                    108      11563  Trichoplax adhaerens (Trichoplax reptans)
                    109      11510  Aureococcus anophagefferens
                    110      11497  Brugia malayi (Filarial nematode worm)
                    111      11357  Helicobacter pylori (Campylobacter pylori)
                    112      11287  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
                    113      11211  Ktedonobacter racemifer DSM 44963
                    114      10966  Streptomyces clavuligerus ATCC 27064
                    115      10919  Schistosoma japonicum (Blood fluke)
                    116      10842  Pediculus humanus subsp. corporis (Body louse)
                    117      10828  Chaetomium globosum  
                    118      10780  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
                    119      10579  Metarhizium robertsii (strain ARSEF 23) (Metarhizium anisopliae)
                    120      10558  Podospora anserina (strain S / DSM 980 / FGSC 10383) (Pleurage anserina)
                    121      10387  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    122      10382  Pseudomonas syringae pv. glycinea str. race 4
                    123      10365  Aspergillus nidulans FGSC A4
                    124      10357  Phaeodactylum tricornutum (strain CCAP 1055/1)
                    125      10276  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
                    126      10213  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    127      10206  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
                    128      10169  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    129      10139  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
                    130      10113  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
                    131      10096  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    132      10089  Ajellomyces dermatitidis ATCC 18188
                    133      10063  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    134      10034  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    135      10015  Streptomyces bingchenggensis (strain BCW-1)
                    136       9913  Rabies virus
                    137       9835  Chlorella variabilis
                    138       9830  Metarhizium acridum (strain CQMa 102)
                    139       9717  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    140       9663  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
                    141       9544  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    142       9519  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    143       9496  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    144       9484  Streptomyces violaceusniger Tu 4113
                    145       9446  Ajellomyces capsulata (strain H88) (Darling's disease fungus) 
                    146       9421  Salmo salar (Atlantic salmon)
                    147       9238  Monosiga brevicollis (Choanoflagellate)
                    148       9212  Candida albicans (Yeast)
                    149       9202  Amycolatopsis mediterranei (strain U-32)
                    150       9177  Streptomyces himastatinicus ATCC 53653
                    151       9173  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    152       9162  Emericella nidulans (Aspergillus nidulans)
                    153       9154  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    154       9136  Pseudomonas syringae pv. pisi str. 1704B
                    155       9114  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    156       9075  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    157       9026  Neurospora crassa 
                    158       9022  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
                    159       8976  Dictyostelium discoideum (Slime mold)
                    160       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    161       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    162       8940  Burkholderia sp. TJI49
                    163       8900  Catenulispora acidiphila 
                    164       8870  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
                    165       8812  Trypanosoma brucei
                    166       8808  Aspergillus clavatus 
                    167       8777  Pseudomonas syringae pv. japonica str. M301072PT
                    168       8757  Rhodococcus sp. (strain RHA1)
                    169       8709  Paracoccidioides brasiliensis (strain Pb18)
                    170       8705  Trichophyton rubrum CBS 118892
                    171       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    172       8676  Trichophyton equinum CBS 127.97
                    173       8671  Arthroderma otae (strain CBS 113480) (Microsporum canis)
                    174       8610  Batrachochytrium dendrobatidis JAM81
                    175       8599  Entamoeba dispar SAW760
                    176       8520  Trichophyton tonsurans CBS 112818
                    177       8437  Plesiocystis pacifica SIR-1
                    178       8394  Streptomyces sp. AA4
                    179       8374  Capsaspora owczarzaki ATCC 30864
                    180       8311  Grosmannia clavigera (strain kw1407 / UAMH 11150) (Blue stain fungus) 
                    181       8302  Entamoeba histolytica
                    182       8296  Bradyrhizobium japonicum
                    183       8274  Leishmania major
                    184       8249  Microscilla marina ATCC 23134
                    185       8202  Streptomyces sviceus ATCC 29083
                    186       8201  Microcoleus chthonoplastes PCC 7420
                    187       8190  Leishmania infantum
                    188       8163  Frankia sp. EUN1f
                    189       8154  Pseudomonas aeruginosa
                    190       8154  Burkholderia xenovorans (strain LB400)
                    191       8093  uncultured archaeon
                    192       8044  Leishmania mexicana MHOM/GT/2001/U1103
                    193       7997  Leishmania braziliensis
                    194       7966  Trichophyton verrucosum (strain HKI 0517)
                    195       7961  Leishmania donovani BPK282A1
                    196       7955  Ostreococcus tauri
                    197       7943  Rhodococcus opacus (strain B4)
                    198       7917  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    199       7916  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    200       7866  Streptomyces ghanaensis ATCC 14672
                    201       7856  Acaryochloris marina (strain MBIC 11017)
                    202       7834  Paracoccidioides brasiliensis (strain Pb03)
                    203       7823  Burkholderia sp. Ch1-1
                    204       7808  Plasmodium yoelii yoelii
                    205       7718  Uncinocarpus reesii (strain UAMH 1704)
                    206       7706  Streptomyces viridochromogenes DSM 40736
                    207       7571  Clostridium hathewayi DSM 13479
                    208       7563  Burkholderia pseudomallei MSHR346
                    209       7528  Streptomyces sp. C
                    210       7523  Streptomyces lividans TK24
                    211       7519  Solibacter usitatus (strain Ellin6076)
                    212       7492  Tuber melanosporum (strain Mel28) (Perigord black truffle)
                    213       7490  Pseudomonas syringae pv. mori str. 301020
                    214       7475  Burkholderia pseudomallei 1710a
                    215       7474  Streptomyces coelicolor
                    216       7465  Burkholderia pseudomallei Pakistan 9
                    217       7459  Burkholderia sp. H160
                    218       7451  Streptomyces venezuelae ATCC 10712
                    219       7443  Kitasatospora setae  
                    220       7385  Ostreococcus lucimarinus (strain CCE9901)
                    221       7383  Lyngbya majuscula 3L
                    222       7367  Burkholderia pseudomallei 576
                    223       7351  Burkholderia gladioli BSR3
                    224       7349  Burkholderia pseudomallei 305
                    225       7274  Clostridium bolteae ATCC BAA-613
                    226       7241  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    227       7231  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    228       7227  Streptomyces avermitilis
                    229       7177  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    230       7162  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    231       7145  Giardia intestinalis (strain ATCC 50803 / WB clone C6) (Giardia lamblia)
                    232       7140  Burkholderia pseudomallei 1106b
                    233       7130  Burkholderia phymatum (strain DSM 17167 / STM815)
                    234       7124  Burkholderia ambifaria MEX-5
                    235       7120  Pseudomonas syringae Cit 7
                    236       7111  Neospora caninum Liverpool
                    237       7102  Medicago truncatula (Barrel medic) (Medicago tribuloides)
                    238       7079  Frankia sp. (strain EuI1c)
                    239       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    240       7017  Myxococcus xanthus (strain DK 1622)
                    241       7005  Mucilaginibacter paludis DSM 18603
                    242       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    243       6974  Rhodopirellula baltica
                    244       6959  Frankia sp. (strain EAN1pec)
                    245       6940  Pseudomonas syringae pv. oryzae str. 1_6
                    246       6936  Streptococcus pneumoniae
                    247       6932  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    248       6931  Streptomyces sp. Mg1
                    249       6923  Burkholderia ambifaria IOP40-10
                    250       6903  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          276489 (  2%)
                    Bacteria       10249158 ( 64%)
                    Eukaryota       4295771 ( 27%)
                    Viruses         1154907 (  7%)
                    Other             38346 ( <1%)
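The percentage column can be reproduced by dividing each count by the release total and rounding to a whole percent. The five groups add up to 16014671, one entry short of the 16014672 reported in section 1, so the sketch divides by the release total. Printing values that round to 0 as "<1" is inferred from the table rather than documented.

```python
kingdoms = {
    "Archaea": 276489,
    "Bacteria": 10249158,
    "Eukaryota": 4295771,
    "Viruses": 1154907,
    "Other": 38346,
}
total_entries = 16014672

def whole_percent(count: int, total: int) -> str:
    """Round to a whole percent; values that round to 0 print as '<1' (assumed convention)."""
    pct = round(100 * count / total)
    return "<1" if pct == 0 else str(pct)

shares = {name: whole_percent(n, total_entries) for name, n in kingdoms.items()}
print(shares)
```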
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  89623 (  2%)           (  1%)
                    Other Mammalia        287956 (  7%)           (  2%)
                    Other Vertebrata      391139 (  9%)           (  2%)
                    Viridiplantae         898404 ( 21%)           (  6%)
                    Fungi                 909522 ( 21%)           (  6%)
                    Insecta               678308 ( 16%)           (  4%)
                    Nematoda              136890 (  3%)           (  1%)
                    Other                 903929 ( 21%)           (  6%)
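Both percentage columns follow from the counts: one divides by the Eukaryota total from the previous table, the other by the release total. This Python sketch also confirms that the eight categories add up to that Eukaryota total; whole-percent rounding is inferred from the printed values.

```python
eukaryota = {
    "Human": 89623,
    "Other Mammalia": 287956,
    "Other Vertebrata": 391139,
    "Viridiplantae": 898404,
    "Fungi": 909522,
    "Insecta": 678308,
    "Nematoda": 136890,
    "Other": 903929,
}
eukaryota_total = 4295771   # Eukaryota row of section 2.3
database_total = 16014672   # release total from section 1

# The eight categories partition Eukaryota.
assert sum(eukaryota.values()) == eukaryota_total

results = {}
for name, n in eukaryota.items():
    of_euk = round(100 * n / eukaryota_total)
    of_db = round(100 * n / database_total)
    results[name] = (of_euk, of_db)
    print(f"{name:<18} {n:>8} ({of_euk:3d}%) ({of_db:3d}%)")
```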
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by length in amino acids (excluding fragments)
                    
                       Length    Number        Length    Number
                       1-  50    351954     1001-1100     94575
                      51- 100   1281350     1101-1200     66639
                     101- 150   1474620     1201-1300     46265
                     151- 200   1425606     1301-1400     30418
                     201- 250   1434663     1401-1500     24337
                     251- 300   1387723     1501-1600     17449
                     301- 350   1262106     1601-1700     13138
                     351- 400    971394     1701-1800     10149
                     401- 450    825889     1801-1900      8266
                     451- 500    689230     1901-2000      6967
                     501- 550    464073     2001-2100      5629
                     551- 600    358548     2101-2200      5699
                     601- 650    260922     2201-2300      4472
                     651- 700    203475     2301-2400      3596
                     701- 750    175049     2401-2500      3041
                     751- 800    156908         >2500     26594
                     801- 850    117022
                     851- 900    106195
                     901- 950     72478
                     951-1000     54361
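Because the table excludes fragments, its bins should add up to the number of entries minus the number of fragments given in section 1. A Python sketch of that check, with the counts copied from the table:

```python
# Non-fragment sequences per length bin (amino acids), section 3.
length_bins = [
    351954, 1281350, 1474620, 1425606, 1434663, 1387723, 1262106, 971394,
    825889, 689230, 464073, 358548, 260922, 203475, 175049, 156908,
    117022, 106195, 72478, 54361,               # 1-1000, steps of 50
    94575, 66639, 46265, 30418, 24337, 17449, 13138, 10149,
    8266, 6967, 5629, 5699, 4472, 3596, 3041,   # 1001-2500, steps of 100
    26594,                                      # >2500
]
total_entries = 16014672
fragments = 2573872

complete = total_entries - fragments
assert sum(length_bins) == complete
print(complete)
```

Both sides come to 13440800 complete sequences.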
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 322 amino acids.
                    
                    The shortest sequence is Q16047_HUMAN:     4 amino acids.
                    The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
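The average follows from the totals in section 1, assuming it is taken over all entries, fragments included (the notes do not say). The exact quotient is about 322.8, so the printed value of 322 appears to be truncated rather than rounded; that reading is an inference, not documented behavior.

```python
total_residues = 5170073575
total_entries = 16014672

average = total_residues / total_entries          # exact mean length
print(f"{average:.2f}", total_residues // total_entries)  # mean and its integer part
```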
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following tables give, for selected UniProtKB/TrEMBL line types and subtypes,
                    the total number of lines, the number of entries with at least one such line,
                    and the average number of lines per entry.
                    
                                                           Total  Number of    Average
                    Line type / subtype                   number    entries  per entry
                    ---------------------------------  ---------  ---------  ---------

                    References (RL)                     19468522                  1.22
                      Submitted to EMBL/GenBank/DDBJ    11643163   10309897       0.73
                      Journal                            7516078    6867907       0.47
                      Submitted to other databases        169904     168690       0.01
                      Thesis                                7918       7860      <0.01
                      Book citation                         5678       5627      <0.01
                      Other                               125781     123720       0.01
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 321753
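The "Average per entry" column is the line total divided by the number of entries in the whole release, not by the number of entries that contain the line (the GO cross-reference row, 1.95, only fits the former). The Python sketch below reproduces the column for the References table; showing values that round to 0.00 as "<0.01" is inferred from the tables, and the helper name `per_entry` is hypothetical.

```python
total_entries = 16014672

# Total number of RL lines per reference subtype (section 4).
references = {
    "Submitted to EMBL/GenBank/DDBJ": 11643163,
    "Journal": 7516078,
    "Submitted to other databases": 169904,
    "Thesis": 7918,
    "Book citation": 5678,
    "Other": 125781,
}

def per_entry(lines: int, entries: int = total_entries) -> str:
    """Average lines per entry to two decimals; '<0.01' when it rounds to 0.00."""
    text = f"{lines / entries:.2f}"
    return "<0.01" if text == "0.00" else text

assert sum(references.values()) == 19468522   # the References (RL) total
averages = {name: per_entry(n) for name, n in references.items()}
print(per_entry(19468522), averages)
```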
                    
                    
                                                           Total  Number of    Average
                    Line type / subtype                   number    entries  per entry  Rank
                    ---------------------------------  ---------  ---------  ---------  ----

                    Comments (CC)                       14906982                  0.93
                      CATALYTIC ACTIVITY                 1466694    1352222       0.09     4
                      CAUTION                            4178438    4178432       0.26     2
                      COFACTOR                            482140     458110       0.03     8
                      DOMAIN                               37364      35267      <0.01     9
                      FUNCTION                           1753135    1605497       0.11     3
                      INTERACTION                           5147       5147      <0.01    11
                      MISCELLANEOUS                        36169      36111      <0.01    10
                      PATHWAY                             735212     678945       0.05     6
                      SIMILARITY                         4331049    3740514       0.27     1
                      SUBCELLULAR LOCATION               1220775    1212231       0.08     5
                      SUBUNIT                             660859     658421       0.04     7
                    
                    Total number of comment topics: 11
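The Rank column appears to order comment topics by total number of lines, highest first, while the rows themselves are alphabetical. This Python sketch reproduces the ranks from the totals under that assumption and checks that the topic totals add up to the Comments (CC) line total.

```python
comment_topics = {
    "CATALYTIC ACTIVITY": 1466694,
    "CAUTION": 4178438,
    "COFACTOR": 482140,
    "DOMAIN": 37364,
    "FUNCTION": 1753135,
    "INTERACTION": 5147,
    "MISCELLANEOUS": 36169,
    "PATHWAY": 735212,
    "SIMILARITY": 4331049,
    "SUBCELLULAR LOCATION": 1220775,
    "SUBUNIT": 660859,
}

# Rank 1 = topic with the most lines.
ranked = sorted(comment_topics, key=comment_topics.get, reverse=True)
ranks = {topic: i for i, topic in enumerate(ranked, start=1)}

assert sum(comment_topics.values()) == 14906982  # Comments (CC) total
print(ranks)
```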
                    
                    
                                                           Total  Number of    Average
                    Line type / subtype                   number    entries  per entry  Rank
                    ---------------------------------  ---------  ---------  ---------  ----

                    Features (FT)                        5159417                  0.32
                      CHAIN                               522853     413143       0.03     2
                      NON_TER                            4288013    2572325       0.27     1
                      SIGNAL                              347954     347886       0.02     3
                      TRANSIT                                597        597      <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                    Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             176463887               11.02                                                    
                    AGD                                 2533      2533     <0.01    77   Organism-specific databases                
                    ANU-2DPAGE                            56        56     <0.01    94   2D gel databases                           
                    Allergome                           2089      1503     <0.01    81   Protein family/group databases             
                    ArachnoServer                         66        66     <0.01    93   Organism-specific databases                
                    ArrayExpress                       91561     91551      0.01    49   Gene expression databases                  
                    BRENDA                              2778      2746     <0.01    75   Enzyme and pathway databases               
                    Bgee                              140608    140325      0.01    46   Gene expression databases                  
                    BioCyc                           1623133   1588570      0.10    21   Enzyme and pathway databases               
                    CAZy                               74466     69958     <0.01    53   Protein family/group databases             
                    CGD                                 6745      6745     <0.01    72   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     5         5     <0.01    98   2D gel databases                           
                    CTD                               236892    235942      0.01    39   Organism-specific databases                
                    CYGD                                   2         2     <0.01   100   Organism-specific databases                
                    DIP                                 2741      2736     <0.01    76   Protein-protein interaction databases      
                    EMBL                            17909347  15966935      1.12     3   Sequence databases                         
                    Ensembl                           286566    260223      0.02    35   Genome annotation databases                
                    EnsemblBacteria                   565513    531670      0.04    29   Genome annotation databases                
                    EnsemblFungi                      108257    108162      0.01    47   Genome annotation databases                
                    EnsemblMetazoa                    320248    297266      0.02    32   Genome annotation databases                
                    EnsemblPlants                     259858    232618      0.02    37   Genome annotation databases                
                    EnsemblProtists                    72635     71476     <0.01    54   Genome annotation databases                
                    EuPathDB                          182662    182661      0.01    44   Organism-specific databases                
                    FlyBase                           195663    194113      0.01    41   Organism-specific databases                
                    GO                              31157065  10060315      1.95     2   Ontologies                                 
                    Gene3D                           6440312   5160121      0.40     6   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01   102   Organism-specific databases                
                    GeneID                           5944530   5823628      0.37     9   Genome annotation databases                
                    GeneTree                         1154500   1154154      0.07    23   Phylogenomic databases                     
                    Genevestigator                     97737     97732      0.01    48   Gene expression databases                  
                    GenoList                           14747     14475     <0.01    67   Organism-specific databases                
                    GenomeReviews                    3984921   3894010      0.25    12   Genome annotation databases                
                    Gramene                            68676     68676     <0.01    56   Organism-specific databases                
                    H-InvDB                              596       485     <0.01    85   Organism-specific databases                
                    HAMAP                            1144583   1131375      0.07    25   Family and domain databases                
                    HGNC                               75285     73542     <0.01    51   Organism-specific databases                
                    HOGENOM                          2194490   2194448      0.14    18   Phylogenomic databases                     
                    HOVERGEN                          316248    316237      0.02    33   Phylogenomic databases                     
                    HSSP                              253028    252760      0.02    38   3D structure databases                     
                    IPI                               310308    309960      0.02    34   Sequence databases                         
                    InParanoid                        193882    193815      0.01    43   Phylogenomic databases                     
                    IntAct                             16067     16067     <0.01    65   Protein-protein interaction databases      
                    InterPro                        31800274  11419830      1.99     1   Family and domain databases                
                    KEGG                             4943225   4843448      0.31    11   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    73   Organism-specific databases                
                    Leproma                              936       935     <0.01    84   Organism-specific databases                
                    MEROPS                             70577     69093     <0.01    55   Protein family/group databases             
                    MGI                                32197     32133     <0.01    60   Organism-specific databases                
                    MINT                                8941      8941     <0.01    70   Protein-protein interaction databases      
                    NMPDR                             920687    920683      0.06    26   Genome annotation databases                
                    NextBio                            46404     46401     <0.01    58   Other                                      
                    OMA                              2422851   2422849      0.15    16   Phylogenomic databases                     
                    OrthoDB                           582645    582477      0.04    28   Phylogenomic databases                     
                    PANTHER                          1961125   1890779      0.12    20   Family and domain databases                
                    PDB                                14859      8735     <0.01    66   3D structure databases                     
                    PDBsum                             14583      8586     <0.01    68   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    89   2D gel databases                           
                    PIR                               174865    142031      0.01    45   Sequence databases                         
                    PIRSF                             863772    863772      0.05    27   Family and domain databases                
                    PMAP-CutDB                           251       251     <0.01    87   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    99   2D gel databases                           
                    PRIDE                             214206    213977      0.01    40   Proteomic databases                        
                    PRINTS                           2409502   2143892      0.15    17   Family and domain databases                
                    PROSITE                          7354481   4921430      0.46     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    97   Enzyme and pathway databases               
                    PeptideAtlas                         147       147     <0.01    88   Proteomic databases                        
                    PeroxiBase                          2506      2497     <0.01    78   Protein family/group databases             
                    Pfam                            14501527  10797337      0.91     4   Family and domain databases                
                    PharmGKB                              83        83     <0.01    92   Organism-specific databases                
                    PhosphoSite                         1561      1561     <0.01    82   PTM databases                              
                    PhylomeDB                         373981    373949      0.02    31   Phylogenomic databases                     
                    ProDom                            272283    255453      0.02    36   Family and domain databases                
                    ProMEX                               323       323     <0.01    86   Proteomic databases                        
                    ProtClustDB                      2729541   2729530      0.17    15   Phylogenomic databases                     
                    ProteinModelPortal               5565778   5560726      0.35    10   3D structure databases                     
                    PseudoCAP                           4342      4339     <0.01    74   Organism-specific databases                
                    REBASE                             19136     18513     <0.01    63   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   92        91     <0.01    91   2D gel databases                           
                    RGD                                22588     22428     <0.01    61   Organism-specific databases                
                    Reactome                              93        90     <0.01    90   Enzyme and pathway databases               
                    RefSeq                           5968778   5833676      0.37     8   Sequence databases                         
                    SGD                                   12        12     <0.01    96   Organism-specific databases                
                    SMART                            3055907   2363997      0.19    13   Family and domain databases                
                    SMR                              2151693   2151693      0.13    19   3D structure databases                     
                    STRING                           1203361   1203193      0.08    22   Protein-protein interaction databases      
                    SUPFAM                           6180404   5106744      0.39     7   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    95   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01   101   2D gel databases                           
                    TAIR                               16958     16877     <0.01    64   Organism-specific databases                
                    TCDB                                2429      2420     <0.01    79   Protein family/group databases             
                    TIGR                              194996    187937      0.01    42   Genome annotation databases                
                    TIGRFAMs                         2989033   2732133      0.19    14   Family and domain databases                
                    TubercuList                         2119      2114     <0.01    80   Organism-specific databases                
                    UCSC                               49795     49795     <0.01    57   Genome annotation databases                
                    UniGene                           474176    444961      0.03    30   Sequence databases                         
                    VectorBase                         78956     78444     <0.01    50   Genome annotation databases                
                    World-2DPAGE                         943       938     <0.01    83   2D gel databases                           
                    WormBase                           41505     41373     <0.01    59   Organism-specific databases                
                    Xenbase                            13197     13165     <0.01    69   Organism-specific databases                
                    ZFIN                               21555     21550     <0.01    62   Organism-specific databases                
                    dictyBase                           7671      7671     <0.01    71   Organism-specific databases                
                    eggNOG                           1148080   1148080      0.07    24   Phylogenomic databases                     
                    euHCVdb                            75268     75265     <0.01    52   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 129
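The fourth numeric column of the cross-reference table appears to be the average number of cross-references per entry. Dividing the DR total, 176463887, by the 16014672 entries reported for this release gives 11.02, and the same division reproduces the values for individual databases. Below is a minimal Python sketch of that check, assuming this interpretation of the column. The names TOTAL_ENTRIES, XREFS and per_entry are hypothetical, and only a few rows are copied from the table:

```python
# Hypothetical check of the cross-reference table, assuming the fourth
# numeric column equals (number of cross-references) / (total entries).
TOTAL_ENTRIES = 16_014_672  # UniProtKB/TrEMBL release 2011_07 entry count

# (database, number of cross-references) copied from the table
XREFS = {
    "Cross-references (DR)": 176_463_887,
    "InterPro": 31_800_274,
    "GO": 31_157_065,
    "EMBL": 17_909_347,
    "ArrayExpress": 91_561,
    "VectorBase": 78_956,
}


def per_entry(n_xrefs: int, total: int = TOTAL_ENTRIES) -> str:
    """Average cross-references per entry, formatted like the table."""
    text = f"{n_xrefs / total:.2f}"
    # Values that round to zero are printed as "<0.01" in the table
    return "<0.01" if text == "0.00" else text


for db, n in XREFS.items():
    print(f"{db:<24}{n:>12}{per_entry(n):>8}")
```

With this rounding rule, ArrayExpress (91561 cross-references, about 0.0057 per entry) is shown as 0.01, and VectorBase (78956, about 0.0049 per entry) falls to "<0.01". This matches the table.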
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.64   Gln (Q) 3.87   Leu (L) 9.85   Ser (S) 6.70
                    Arg (R) 5.46   Glu (E) 6.12   Lys (K) 5.24   Thr (T) 5.62
                    Asn (N) 4.13   Gly (G) 7.12   Met (M) 2.48   Trp (W) 1.31
                    Asp (D) 5.30   His (H) 2.19   Phe (F) 4.03   Tyr (Y) 3.04
                    Cys (C) 1.26   Ile (I) 6.01   Pro (P) 4.72   Val (V) 6.75
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    [Figure: amino acid composition chart. Legend: gray = aliphatic, red = acidic,
                    green = small hydroxy, blue = basic, black = aromatic, white = amide, yellow = sulfur]
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
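The ordering in 5.2 follows directly from the percentages in 5.1. The short Python sketch below reproduces it by sorting the twenty standard residues on their composition value. COMPOSITION and rank_by_frequency are illustrative names, and the ambiguous codes B, Z and X are left out:

```python
# Composition in percent for the 20 standard residues, copied from 5.1
COMPOSITION = {
    "Ala": 8.64, "Arg": 5.46, "Asn": 4.13, "Asp": 5.30, "Cys": 1.26,
    "Gln": 3.87, "Glu": 6.12, "Gly": 7.12, "His": 2.19, "Ile": 6.01,
    "Leu": 9.85, "Lys": 5.24, "Met": 2.48, "Phe": 4.03, "Pro": 4.72,
    "Ser": 6.70, "Thr": 5.62, "Trp": 1.31, "Tyr": 3.04, "Val": 6.75,
}


def rank_by_frequency(composition: dict) -> list:
    """Return residue names ordered from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


ranking = rank_by_frequency(COMPOSITION)
print(", ".join(ranking))
```

Sorting on the percentage yields the same sequence as the 5.2 list, from Leu (9.85 %) down to Cys (1.26 %).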
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 538972
                    Total number of entries encoded on a Plasmid: 209211
                    Total number of entries encoded on a Plastid: 13181
                    Total number of entries encoded on a Plastid; Apicoplast: 364
                    Total number of entries encoded on a Plastid; Chloroplast: 140002
                    Total number of entries encoded on a Plastid; Cyanelle: 8
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 448