                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2011_05 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2011_05 of UniProtKB/TrEMBL (03-May-2011) contains 15062837 sequence entries,
                    comprising 4869006741 amino acids.
                    
                    550246 sequences have been added since release 2011_04, the sequence data of
                    173 existing entries has been updated, and the annotations of
                    1268231 entries have been revised. This represents an increase of 4% in the
                    number of entries.
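
The 4% figure appears to be the number of added sequences relative to the size of the previous release. A minimal sketch of that arithmetic, assuming no entries were removed between releases (the variable names are illustrative):

```python
# Figures quoted above for release 2011_05.
total_entries = 15_062_837   # entries in release 2011_05
added_entries = 550_246      # sequences added since release 2011_04

# Assumption: no entries were removed, so the previous release size is
# the current total minus the additions.
previous_entries = total_entries - added_entries

growth_percent = 100 * added_entries / previous_entries
print(f"Growth since 2011_04: {growth_percent:.1f}%")
```

Rounded to a whole percent, the result agrees with the 4% stated above.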
                    
                    Number of fragments: 2459055
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           17836     0.12%
                    2: Evidence at transcript level       503763     3.34%
                    3: Inferred from homology            3014351    20.01%
                    4: Predicted                        11526887    76.53%
                    5: Uncertain                               0     0.00%
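
The percentages in the protein existence table can be reproduced from the entry counts. The sketch below assumes each entry carries exactly one PE level, so the counts should sum to the release total; the dictionary layout is illustrative.

```python
total_entries = 15_062_837  # entries in release 2011_05

pe_counts = {
    "1: Evidence at protein level": 17_836,
    "2: Evidence at transcript level": 503_763,
    "3: Inferred from homology": 3_014_351,
    "4: Predicted": 11_526_887,
    "5: Uncertain": 0,
}

# Check that the five PE levels account for every entry.
pe_sum = sum(pe_counts.values())

# Percentage of the database at each PE level, to two decimals.
pe_percent = {
    level: round(100 * count / total_entries, 2)
    for level, count in pe_counts.items()
}

for level, pct in pe_percent.items():
    print(f"{level:<34}{pe_counts[level]:>10}{pct:>9.2f}%")
```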
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 358626
                    
                    The first twenty species represent 1299495 sequences, or 8.6% of the
                    total number of entries.
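
As a cross-check, the 1299495 figure is the sum of the first twenty frequencies in the table of the most represented species (section 2.2). A short sketch, with the counts copied from that table:

```python
total_entries = 15_062_837  # entries in release 2011_05

# Frequencies of ranks 1 to 20 from the table of the most represented species.
top20_counts = [
    381463, 95343, 85567, 59288, 56027,
    54673, 51441, 50945, 50471, 44065,
    43552, 42015, 39841, 39369, 38766,
    34795, 33649, 33195, 32625, 32405,
]

top20_total = sum(top20_counts)
top20_share = 100 * top20_total / total_entries
print(f"{top20_total} sequences, {top20_share:.1f}% of all entries")
```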
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented 1x:17156
                                        2x:62958
                                        3x:31792
                                        4x:18927
                                        5x:11539
                                        6x: 8120
                                        7x: 5960
                                        8x: 4591
                                        9x: 3731
                                       10x: 7337
                                   11- 20x:18277
                                   21- 50x: 6434
                                   51-100x: 2310
                                     >100x: 5087
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     381463  Human immunodeficiency virus 1
                    2      95343  Oryza sativa subsp. japonica (Rice)
                    3      85567  Homo sapiens (Human)
                    4      59288  Hepatitis C virus
                    5      56027  Mus musculus (Mouse)
                    6      54673  uncultured bacterium
                    7      51441  Danio rerio (Zebrafish) (Brachydanio rerio)
                    8      50945  Vitis vinifera (Grape)
                    9      50471  Trichomonas vaginalis
                    10      44065  Populus trichocarpa (Western balsam poplar) 
                    11      43552  Hepatitis B virus (HBV)
                    12      42015  Zea mays (Maize)
                    13      39841  Paramecium tetraurelia
                    14      39369  Oryza sativa subsp. indica (Rice)
                    15      38766  Arabidopsis thaliana (Mouse-ear cress)
                    16      34795  Physcomitrella patens subsp. patens (Moss)
                    17      33649  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    18      33195  Selaginella moellendorffii (Spikemoss)
                    19      32625  Arabidopsis lyrata subsp. lyrata
                    20      32405  Rattus norvegicus (Rat)
                    21      32138  Drosophila melanogaster (Fruit fly)
                    22      31830  Caenorhabditis remanei (Caenorhabditis vulgaris)
                    23      31296  Ricinus communis (Castor bean)
                    24      30818  Trypanosoma cruzi
                    25      30506  Daphnia pulex (Water flea)
                    26      29116  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    27      29024  Oikopleura dioica (Tunicate)
                    28      28089  Tetraodon nigroviridis (Green puffer)
                    29      27503  Bos taurus (Bovine)
                    30      26977  Canis familiaris (Dog) (Canis lupus familiaris)
                    31      25274  Ralstonia solanacearum (Pseudomonas solanacearum)
                    32      24811  Nematostella vectensis (Starlet sea anemone)
                    33      24620  Gallus gallus (Chicken)
                    34      24194  Sus scrofa (Pig)
                    35      23115  Perkinsus marinus ATCC 50983
                    36      22601  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    37      22487  Escherichia coli
                    38      21491  Caenorhabditis elegans
                    39      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
                    40      20435  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
                    41      18889  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    42      18771  mine drainage metagenome
                    43      18064  Drosophila simulans (Fruit fly)
                    44      17933  Caenorhabditis briggsae
                    45      17844  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    46      17800  Ailuropoda melanoleuca (Giant panda)
                    47      17605  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
                    48      16975  Tribolium castaneum (Red flour beetle)
                    49      16928  Drosophila yakuba (Fruit fly)
                    50      16735  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    51      16706  Drosophila persimilis (Fruit fly)
                    52      16425  Ectocarpus siliculosus (Brown alga)
                    53      16295  Loa loa (Eye worm)
                    54      16243  Trichinella spiralis (Trichina worm)
                    55      16240  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    56      16179  Drosophila sechellia (Fruit fly)
                    57      15981  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    58      15849  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    59      15715  Naegleria gruberi (Amoeba)
                    60      15646  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    61      15417  Drosophila willistoni (Fruit fly)
                    62      15247  Tetrahymena thermophila SB210
                    63      15136  Drosophila ananassae (Fruit fly)
                    64      15029  Harpegnathos saltator
                    65      14920  Drosophila erecta (Fruit fly)
                    66      14818  Chlamydomonas reinhardtii (Chlamydomonas smithii)
                    67      14791  Camponotus floridanus
                    68      14774  Drosophila mojavensis (Fruit fly)
                    69      14757  Anopheles gambiae (African malaria mosquito)
                    70      14695  Drosophila virilis (Fruit fly)
                    71      14671  Plasmodium chabaudi
                    72      14651  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    73      14634  Volvox carteri f. nagariensis
                    74      14628  Toxoplasma gondii
                    75      14322  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
                    76      14254  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    77      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
                    78      13630  Hepatitis C virus subtype 1b
                    79      13519  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    80      13510  Schistosoma mansoni (Blood fluke)
                    81      13446  Plasmodium falciparum
                    82      13343  Aspergillus flavus 
                    83      13282  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    84      13182  Magnaporthe oryzae (strain 70-15 / FGSC 8958) (Rice blast fungus) 
                    85      13125  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
                    86      12983  Albugo laibachii Nc14
                    87      12950  Stigmatella aurantiaca (strain DW4/3-1)
                    88      12950  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    89      12697  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    90      12624  Glycine max (Soybean) (Glycine hispida)
                    91      12539  Xenopus laevis (African clawed frog)
                    92      12527  Leptosphaeria maculans (Blackleg fungus) (Phoma lingam)
                    93      12444  Polysphondylium pallidum (Cellular slime mold)
                    94      12352  Dictyostelium purpureum (Slime mold)
                    95      12010  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    96      12008  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
                    97      11827  Hepatitis C virus subtype 1a
                    98      11715  Thalassiosira pseudonana (Marine diatom)
                    99      11698  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
                    100      11647  Anopheles darlingi (Mosquito)
                    101      11645  Plasmodium berghei (strain Anka)
                    102      11605  Aspergillus oryzae (strain ATCC 42149 / RIB 40)
                    103      11563  Trichoplax adhaerens (Trichoplax reptans)
                    104      11510  Aureococcus anophagefferens
                    105      11497  Brugia malayi (Filarial nematode worm)
                    106      11352  Helicobacter pylori (Campylobacter pylori)
                    107      11280  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
                    108      11211  Ktedonobacter racemifer DSM 44963
                    109      10966  Streptomyces clavuligerus ATCC 27064
                    110      10919  Schistosoma japonicum (Blood fluke)
                    111      10851  Chaetomium globosum (Soil fungus)
                    112      10842  Pediculus humanus subsp. corporis (Body louse)
                    113      10785  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
                    114      10665  Podospora anserina
                    115      10581  Metarhizium anisopliae ARSEF 23
                    116      10403  Neurospora crassa
                    117      10392  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    118      10370  Aspergillus nidulans FGSC A4
                    119      10357  Phaeodactylum tricornutum (strain CCAP 1055/1)
                    120      10302  Rabies virus
                    121      10276  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
                    122      10216  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    123      10212  Coccidioides posadasii str. Silveira
                    124      10139  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
                    125      10139  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    126      10113  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
                    127      10099  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    128      10068  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    129      10047  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    130      10015  Streptomyces bingchenggensis (strain BCW-1)
                    131       9830  Metarhizium acridum CQMa 102
                    132       9755  Chlorella variabilis
                    133       9722  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    134       9669  Cryptococcus neoformans (Filobasidiella neoformans)
                    135       9663  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
                    136       9548  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    137       9523  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    138       9501  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    139       9484  Streptomyces violaceusniger Tu 4113
                    140       9482  Trypanosoma brucei
                    141       9446  Ajellomyces capsulatus H88
                    142       9386  Salmo salar (Atlantic salmon)
                    143       9239  Monosiga brevicollis (Choanoflagellate)
                    144       9223  Candida albicans (Yeast)
                    145       9202  Amycolatopsis mediterranei (strain U-32)
                    146       9177  Streptomyces himastatinicus ATCC 53653
                    147       9175  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    148       9166  Emericella nidulans (Aspergillus nidulans)
                    149       9158  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    150       9114  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    151       9079  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    152       9027  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
                    153       8982  Dictyostelium discoideum (Slime mold)
                    154       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    155       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    156       8926  Burkholderia sp. TJI49
                    157       8900  Catenulispora acidiphila 
                    158       8875  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
                    159       8813  Aspergillus clavatus 
                    160       8757  Rhodococcus sp. (strain RHA1)
                    161       8714  Paracoccidioides brasiliensis (strain Pb18)
                    162       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    163       8676  Arthroderma otae (strain CBS 113480) (Microsporum canis)
                    164       8599  Entamoeba dispar SAW760
                    165       8437  Plesiocystis pacifica SIR-1
                    166       8394  Streptomyces sp. AA4
                    167       8374  Capsaspora owczarzaki ATCC 30864
                    168       8311  Grosmannia clavigera kw1407
                    169       8302  Entamoeba histolytica
                    170       8274  Leishmania major
                    171       8249  Microscilla marina ATCC 23134
                    172       8202  Streptomyces sviceus ATCC 29083
                    173       8201  Microcoleus chthonoplastes PCC 7420
                    174       8190  Leishmania infantum
                    175       8163  Frankia sp. EUN1f
                    176       8154  Burkholderia xenovorans (strain LB400)
                    177       8134  Pseudomonas aeruginosa
                    178       8044  Leishmania mexicana MHOM/GT/2001/U1103
                    179       7997  Leishmania braziliensis
                    180       7978  Toxoplasma gondii ME49
                    181       7969  Trichophyton verrucosum (strain HKI 0517)
                    182       7961  Leishmania donovani BPK282A1
                    183       7955  Ostreococcus tauri
                    184       7943  Rhodococcus opacus (strain B4)
                    185       7918  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    186       7917  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    187       7866  Streptomyces ghanaensis ATCC 14672
                    188       7856  Acaryochloris marina (strain MBIC 11017)
                    189       7839  Paracoccidioides brasiliensis (strain Pb03)
                    190       7823  Burkholderia sp. Ch1-1
                    191       7808  Plasmodium yoelii yoelii
                    192       7721  Uncinocarpus reesii (strain UAMH 1704)
                    193       7706  Streptomyces viridochromogenes DSM 40736
                    194       7688  uncultured archaeon
                    195       7607  Bradyrhizobium japonicum USDA 110
                    196       7571  Clostridium hathewayi DSM 13479
                    197       7563  Burkholderia pseudomallei MSHR346
                    198       7528  Streptomyces sp. C
                    199       7523  Streptomyces lividans TK24
                    200       7519  Solibacter usitatus (strain Ellin6076)
                    201       7503  Tuber melanosporum (Perigord truffle)
                    202       7475  Burkholderia pseudomallei 1710a
                    203       7474  Streptomyces coelicolor
                    204       7465  Burkholderia pseudomallei Pakistan 9
                    205       7459  Burkholderia sp. H160
                    206       7443  Kitasatospora setae  
                    207       7385  Ostreococcus lucimarinus (strain CCE9901)
                    208       7379  Streptomyces cf. griseus XylebKG-1
                    209       7367  Burkholderia pseudomallei 576
                    210       7349  Burkholderia pseudomallei 305
                    211       7274  Clostridium bolteae ATCC BAA-613
                    212       7241  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    213       7231  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    214       7227  Streptomyces avermitilis
                    215       7177  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    216       7165  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    217       7146  Giardia intestinalis (strain ATCC 50803 / WB clone C6) (Giardia lamblia)
                    218       7140  Burkholderia pseudomallei 1106b
                    219       7130  Burkholderia phymatum (strain DSM 17167 / STM815)
                    220       7124  Burkholderia ambifaria MEX-5
                    221       7111  Neospora caninum Liverpool
                    222       7098  Medicago truncatula (Barrel medic) (Medicago tribuloides)
                    223       7079  Frankia sp. (strain EuI1c)
                    224       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    225       7016  Myxococcus xanthus (strain DK 1622)
                    226       7005  Mucilaginibacter paludis DSM 18603
                    227       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    228       6975  Rhodopirellula baltica
                    229       6959  Frankia sp. (strain EAN1pec)
                    230       6932  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    231       6931  Streptomyces sp. Mg1
                    232       6923  Burkholderia ambifaria IOP40-10
                    233       6903  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    234       6902  Saccharopolyspora erythraea (strain NRRL 23338)
                    235       6892  Streptomyces roseosporus NRRL 15998
                    236       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    237       6877  Streptococcus pneumoniae
                    238       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    239       6866  Streptomyces pristinaespiralis ATCC 25486
                    240       6865  Burkholderia sp. (strain CCGE1002)
                    241       6859  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    242       6832  Rhizobium loti (Mesorhizobium loti)
                    243       6817  Clostridium asparagiforme DSM 15981
                    244       6798  Achromobacter xylosoxidans (strain A8)
                    245       6771  Burkholderia pseudomallei (strain 1106a)
                    246       6769  Sinorhizobium meliloti AK83
                    247       6740  Burkholderia pseudomallei (strain 668)
                    248       6736  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    249       6725  Burkholderia graminis C4D1M
                    250       6713  Rhizobium leguminosarum bv. viciae (strain 3841)
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          259720 (  2%)
                    Bacteria        9535785 ( 63%)
                    Eukaryota       4103423 ( 27%)
                    Viruses         1126105 (  7%)
                    Other             37803 ( <1%)
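
The whole-percent shares in the kingdom table follow from the counts and the release total of 15062837 entries. The sketch below is one way to reproduce them; the rule that a non-zero share rounding to zero is printed as "<1%" is an assumption inferred from the table, and `format_share` is a hypothetical helper.

```python
TOTAL_ENTRIES = 15_062_837  # all entries in release 2011_05

kingdom_counts = {
    "Archaea": 259_720,
    "Bacteria": 9_535_785,
    "Eukaryota": 4_103_423,
    "Viruses": 1_126_105,
    "Other": 37_803,
}

def format_share(count: int, total: int = TOTAL_ENTRIES) -> str:
    """Whole-percent share; non-zero shares that round to 0 print as '<1%'."""
    rounded = round(100 * count / total)
    if rounded == 0 and count > 0:
        return "<1%"
    return f"{rounded}%"

kingdom_shares = {name: format_share(n) for name, n in kingdom_counts.items()}
for name, share in kingdom_shares.items():
    print(f"{name:<12}{kingdom_counts[name]:>10} ({share:>4})")
```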
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  85603 (  2%)           (  1%)
                    Other Mammalia        283590 (  7%)           (  2%)
                    Other Vertebrata      381870 (  9%)           (  3%)
                    Viridiplantae         861166 ( 21%)           (  6%)
                    Fungi                 837322 ( 20%)           (  6%)
                    Insecta               642237 ( 16%)           (  4%)
                    Nematoda              136538 (  3%)           (  1%)
                    Other                 875097 ( 21%)           (  6%)
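
Each row of the Eukaryota table carries two percentages with different denominators: the Eukaryota total and the complete database. A minimal sketch of both calculations, using illustrative names:

```python
TOTAL_ENTRIES = 15_062_837  # all entries in release 2011_05

eukaryota_counts = {
    "Human": 85_603,
    "Other Mammalia": 283_590,
    "Other Vertebrata": 381_870,
    "Viridiplantae": 861_166,
    "Fungi": 837_322,
    "Insecta": 642_237,
    "Nematoda": 136_538,
    "Other": 875_097,
}

# The categories partition Eukaryota, so their sum is the Eukaryota total.
eukaryota_total = sum(eukaryota_counts.values())

def shares(count: int) -> tuple[int, int]:
    """Return (% of Eukaryota, % of the complete database) as whole percents."""
    return (
        round(100 * count / eukaryota_total),
        round(100 * count / TOTAL_ENTRIES),
    )

for name, count in eukaryota_counts.items():
    of_euk, of_db = shares(count)
    print(f"{name:<18}{count:>8} ({of_euk:>3}%) ({of_db:>3}%)")
```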
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To  Number             From   To   Number
                       1-  50  325155             1001-1100    88748
                      51- 100 1200468             1101-1200    62776
                     101- 150 1382918             1201-1300    43560
                     151- 200 1336825             1301-1400    28622
                     201- 250 1344701             1401-1500    23003
                     251- 300 1300876             1501-1600    16509
                     301- 350 1182936             1601-1700    12425
                     351- 400  911904             1701-1800     9628
                     401- 450  773886             1801-1900     7841
                     451- 500  647106             1901-2000     6629
                     501- 550  436356             2001-2100     5341
                     551- 600  337273             2101-2200     5409
                     601- 650  245065             2201-2300     4274
                     651- 700  190862             2301-2400     3430
                     701- 750  164707             2401-2500     2935
                     751- 800  147861             >2500        25453
                     801- 850  109594
                     851- 900   99469
                     901- 950   68027
                     951-1000   51210
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 323 amino acids.
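
The average length is consistent with dividing the total number of amino acids by the number of entries, both quoted in the introduction. A short sketch of that calculation (it assumes the average covers all entries, fragments included):

```python
total_amino_acids = 4_869_006_741  # amino acids in release 2011_05
total_entries = 15_062_837         # sequence entries in release 2011_05

average_length = total_amino_acids / total_entries
print(f"Average sequence length: {average_length:.0f} amino acids")
```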
                    
                    The shortest sequence is Q16047_HUMAN: 4 amino acids.
                    The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------

                    References (RL)                    18311576                1.22
                       Submitted to EMBL/GenBank/DDBJ  10825296   9552827      0.72
                       Journal                          7158576   6528598      0.48
                       Submitted to other databases      165116    164375      0.01
                       Thesis                              7680      7622     <0.01
                       Book citation                       5662      5611     <0.01
                       Other                             149246    147181      0.01
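
The "Average per entry" column matches the total number of lines divided by the total number of entries in the release (15062837), not by the number of entries that contain such a line. The sketch below reproduces the column under that reading; printing "<0.01" for values that round to 0.00 is an assumption inferred from the table, and `per_entry` is a hypothetical helper.

```python
TOTAL_ENTRIES = 15_062_837  # all entries in release 2011_05

def per_entry(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Average lines per entry, two decimals; tiny non-zero values print as '<0.01'."""
    text = f"{total_lines / total_entries:.2f}"
    if text == "0.00" and total_lines > 0:
        return "<0.01"
    return text

reference_lines = {
    "References (RL)": 18_311_576,
    "Submitted to EMBL/GenBank/DDBJ": 10_825_296,
    "Journal": 7_158_576,
    "Submitted to other databases": 165_116,
    "Thesis": 7_680,
    "Book citation": 5_662,
    "Other": 149_246,
}

for name, total in reference_lines.items():
    print(f"{name:<32}{total:>10}  {per_entry(total):>6}")
```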
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 317690
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Comments (CC)                      13789049                0.92
                       CATALYTIC ACTIVITY               1381381   1276723      0.09     4
                       CAUTION                          3766570   3766570      0.25     2
                       COFACTOR                          442198    423378      0.03     8
                       DOMAIN                             29762     27788     <0.01    10
                       FUNCTION                         1636986   1499351      0.11     3
                       INTERACTION                         2436      2436     <0.01    11
                       MISCELLANEOUS                      30852     30848     <0.01     9
                       PATHWAY                           685301    633667      0.05     6
                       SIMILARITY                       4083433   3527889      0.27     1
                       SUBCELLULAR LOCATION             1128019   1122605      0.07     5
                       SUBUNIT                           602111    599850      0.04     7
                    
                    Total number of comment topics: 11
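
The Rank column in the comments table is consistent with ordering the topics by their total number of lines, largest first. A minimal sketch of that ordering, with the totals copied from the table (the variable names are illustrative):

```python
comment_totals = {
    "CATALYTIC ACTIVITY": 1_381_381,
    "CAUTION": 3_766_570,
    "COFACTOR": 442_198,
    "DOMAIN": 29_762,
    "FUNCTION": 1_636_986,
    "INTERACTION": 2_436,
    "MISCELLANEOUS": 30_852,
    "PATHWAY": 685_301,
    "SIMILARITY": 4_083_433,
    "SUBCELLULAR LOCATION": 1_128_019,
    "SUBUNIT": 602_111,
}

# Sort topics by total line count, descending, and number them from 1.
ordered = sorted(comment_totals, key=comment_totals.get, reverse=True)
ranks = {topic: position for position, topic in enumerate(ordered, start=1)}

for topic in ordered:
    print(f"{ranks[topic]:>2}  {topic:<22}{comment_totals[topic]:>9}")
```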
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Features (FT)                       4950617                0.33
                       CHAIN                             508270    399762      0.03     2
                       NON_TER                          4107088   2457509      0.27     1
                       SIGNAL                            334660    334601      0.02     3
                       TRANSIT                              599       599     <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             164959154               10.95
                       AGD                                 2552      2552     <0.01    77   Organism-specific databases
                       ANU-2DPAGE                            56        56     <0.01    94   2D gel databases
                       Allergome                           2013      1451     <0.01    81   Protein family/group databases
                       ArachnoServer                         66        66     <0.01    93   Organism-specific databases
                       ArrayExpress                       91908     91897      0.01    49   Gene expression databases
                       BRENDA                              2776      2743     <0.01    75   Enzyme and pathway databases
                       Bgee                              141641    141465      0.01    45   Gene expression databases
                       BioCyc                           1623218   1588640      0.11    21   Enzyme and pathway databases
                       CAZy                               74603     70097     <0.01    52   Protein family/group databases
                       CGD                                 6755      6755     <0.01    72   Organism-specific databases
                       COMPLUYEAST-2DPAGE                     5         5     <0.01    97   2D gel databases
                       CTD                               235487    234537      0.02    39   Organism-specific databases
                       DIP                                 2745      2740     <0.01    76   Protein-protein interaction databases
                       EMBL                            16854976  15015154      1.12     3   Sequence databases
                       Ensembl                           300329    269786      0.02    34   Genome annotation databases
                       EnsemblBacteria                   565608    535452      0.04    29   Genome annotation databases
                       EnsemblFungi                      106198    106109      0.01    47   Genome annotation databases
                       EnsemblMetazoa                    320370    297380      0.02    32   Genome annotation databases
                       EnsemblPlants                     260394    233093      0.02    35   Genome annotation databases
                       EnsemblProtists                    72641     71482     <0.01    54   Genome annotation databases
                       EuPathDB                          148522    148522      0.01    44   Organism-specific databases
                       FlyBase                           194993    193457      0.01    40   Organism-specific databases
                       GO                              27791096   8764281      1.85     2   Ontologies
                       Gene3D                           5745278   4559031      0.38     7   Family and domain databases
                       GeneDB_Spombe                          1         1     <0.01   100   Organism-specific databases
                       GeneID                           5728435   5612232      0.38     8   Genome annotation databases
                       GeneTree                         1157922   1157581      0.08    23   Phylogenomic databases
                       Genevestigator                     98298     98293      0.01    48   Gene expression databases
                       GenoList                           14751     14478     <0.01    66   Organism-specific databases
                       GenomeReviews                    3986144   3898974      0.26    12   Genome annotation databases
                    Gramene                            68747     68747     <0.01    56   Organism-specific databases                
                    H-InvDB                              597       486     <0.01    85   Organism-specific databases                
                    HAMAP                            1028280   1016375      0.07    25   Family and domain databases                
                    HGNC                               72878     71122     <0.01    53   Organism-specific databases                
                    HOGENOM                          2196172   2196130      0.15    18   Phylogenomic databases                     
                    HOVERGEN                          316982    316981      0.02    33   Phylogenomic databases                     
                    HSSP                              253047    252787      0.02    37   3D structure databases                     
                    IPI                               248768    248703      0.02    38   Sequence databases                         
                    InParanoid                        194306    194239      0.01    42   Phylogenomic databases                     
                    IntAct                             15458     15458     <0.01    65   Protein-protein interaction databases      
                    InterPro                        29745263  10863791      1.97     1   Family and domain databases                
                    KEGG                             4815387   4719190      0.32    10   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    73   Organism-specific databases                
                    Leproma                              936       935     <0.01    84   Organism-specific databases                
                    MEROPS                             70579     69096     <0.01    55   Protein family/group databases             
                    MGI                                49198     49150     <0.01    58   Organism-specific databases                
                    MINT                                8961      8961     <0.01    70   Protein-protein interaction databases      
                    NMPDR                             919255    919251      0.06    26   Genome annotation databases                
                    NextBio                            46641     46638     <0.01    59   Other                                      
                    OMA                              2424406   2424404      0.16    16   Phylogenomic databases                     
                    OrthoDB                           578769    578601      0.04    28   Phylogenomic databases                     
                    PANTHER                          1819200   1755387      0.12    20   Family and domain databases                
                    PDB                                14484      8542     <0.01    67   3D structure databases                     
                    PDBsum                             14226      8403     <0.01    68   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    89   2D gel databases                           
                    PIR                               174611    141779      0.01    43   Sequence databases                         
                    PIRSF                             810091    810091      0.05    27   Family and domain databases                
                    PMAP-CutDB                           251       251     <0.01    87   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    98   2D gel databases                           
                    PRIDE                             139585    139578      0.01    46   Proteomic databases                        
                    PRINTS                           2297034   2041220      0.15    17   Family and domain databases                
                    PROSITE                          6709619   4429598      0.45     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    96   Enzyme and pathway databases               
                    PeptideAtlas                         147       147     <0.01    88   Proteomic databases                        
                    PeroxiBase                          2495      2486     <0.01    78   Protein family/group databases             
                    Pfam                            13802122  10280182      0.92     4   Family and domain databases                
                    PharmGKB                              83        83     <0.01    92   Organism-specific databases                
                    PhosphoSite                         1570      1570     <0.01    82   PTM databases                              
                    PhylomeDB                         371401    371369      0.02    31   Phylogenomic databases                     
                    ProDom                            257019    240502      0.02    36   Family and domain databases                
                    ProMEX                               324       324     <0.01    86   Proteomic databases                        
                    ProtClustDB                      2728954   2728943      0.18    15   Phylogenomic databases                     
                    ProteinModelPortal               4634603   4634522      0.31    11   3D structure databases                     
                    PseudoCAP                           4342      4339     <0.01    74   Organism-specific databases                
                    REBASE                             17401     16820     <0.01    63   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   93        92     <0.01    91   2D gel databases                           
                    RGD                                21601     21498     <0.01    61   Organism-specific databases                
                    Reactome                              94        91     <0.01    90   Enzyme and pathway databases               
                    RefSeq                           5746430   5616729      0.38     6   Sequence databases                         
                    SMART                            2904198   2247689      0.19    13   Family and domain databases                
                    SMR                              2153664   2153664      0.14    19   3D structure databases                     
                    STRING                           1202514   1202377      0.08    22   Protein-protein interaction databases      
                    SUPFAM                           5608115   4598410      0.37     9   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    95   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01    99   2D gel databases                           
                    TAIR                               17179     17098     <0.01    64   Organism-specific databases                
                    TCDB                                2423      2416     <0.01    79   Protein family/group databases             
                    TIGR                              194976    187928      0.01    41   Genome annotation databases                
                    TIGRFAMs                         2818636   2570589      0.19    14   Family and domain databases                
                    TubercuList                         2119      2114     <0.01    80   Organism-specific databases                
                    UCSC                               49438     49436     <0.01    57   Genome annotation databases                
                    UniGene                           464682    436381      0.03    30   Sequence databases                         
                    VectorBase                         78957     78445      0.01    50   Genome annotation databases                
                    World-2DPAGE                         944       939     <0.01    83   2D gel databases                           
                    WormBase                           41322     41192     <0.01    60   Organism-specific databases                
                    Xenbase                            13230     13194     <0.01    69   Organism-specific databases                
                    ZFIN                               21572     21567     <0.01    62   Organism-specific databases                
                    dictyBase                           7829      7829     <0.01    71   Organism-specific databases                
                    eggNOG                           1144608   1144608      0.08    24   Phylogenomic databases                     
                    euHCVdb                            75268     75265     <0.01    51   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 129
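
The rows of the table above can be read programmatically. The sketch below is only an illustration: it assumes the columns are database name, total cross-references, number of entries, average per entry, rank and category, and it takes the total entry count (15062837) from the introduction. The names `XrefRow` and `parse_row` are hypothetical helpers, not part of any UniProt tool.

```python
from typing import NamedTuple

# Total number of entries in this release, as stated in the introduction.
TOTAL_ENTRIES = 15062837


class XrefRow(NamedTuple):
    database: str
    total: int      # assumed: total number of cross-references
    entries: int    # assumed: number of entries carrying the cross-reference
    average: str    # kept as text because small values are printed as "<0.01"
    rank: int
    category: str


def parse_row(line: str) -> XrefRow:
    """Split one table row on whitespace; the category may contain spaces."""
    name, total, entries, average, rank, category = line.split(None, 5)
    return XrefRow(name, int(total), int(entries), average, int(rank),
                   category.strip())


row = parse_row("EMBL    16854976  15015154      1.12     3   Sequence databases   ")

# Under the assumption above, the average column is total / number of entries
# in the release, rounded to two decimals.
recomputed = round(row.total / TOTAL_ENTRIES, 2)
print(row.database, row.average, recomputed)
```

The summary row "Cross-references (DR)" has a different shape (a name containing a space and two empty columns), so it would need separate handling.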
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.62   Gln (Q) 3.86   Leu (L) 9.84   Ser (S) 6.70
                    Arg (R) 5.46   Glu (E) 6.14   Lys (K) 5.26   Thr (T) 5.61
                    Asn (N) 4.14   Gly (G) 7.11   Met (M) 2.48   Trp (W) 1.31
                    Asp (D) 5.29   His (H) 2.19   Phe (F) 4.03   Tyr (Y) 3.05
                    Cys (C) 1.27   Ile (I) 6.01   Pro (P) 4.73   Val (V) 6.74
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    
                    Legend for the amino acid composition chart: gray = aliphatic, red = acidic,
                    green = small hydroxy, blue = basic, black = aromatic, white = amide,
                    yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
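
The ordering above follows directly from the percentages in section 5.1. As a minimal sketch, the values can be sorted in descending order to reproduce it; the names `COMPOSITION` and `rank_by_frequency` are illustrative only.

```python
# Percentages copied from section 5.1 (complete database).
COMPOSITION = {
    "Ala": 8.62, "Gln": 3.86, "Leu": 9.84, "Ser": 6.70,
    "Arg": 5.46, "Glu": 6.14, "Lys": 5.26, "Thr": 5.61,
    "Asn": 4.14, "Gly": 7.11, "Met": 2.48, "Trp": 1.31,
    "Asp": 5.29, "His": 2.19, "Phe": 4.03, "Tyr": 3.05,
    "Cys": 1.27, "Ile": 6.01, "Pro": 4.73, "Val": 6.74,
}


def rank_by_frequency(composition):
    """Return residue names sorted from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


ranking = rank_by_frequency(COMPOSITION)
print(", ".join(ranking))
```

No two residues share the same percentage in this release, so the ordering is unambiguous.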
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 511861
                    Total number of entries encoded on a Plasmid: 201362
                    Total number of entries encoded on a Plastid: 12264
                    Total number of entries encoded on a Plastid; Apicoplast: 368
                    Total number of entries encoded on a Plastid; Chloroplast: 134774
                    Total number of entries encoded on a Plastid; Cyanelle: 8
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 448
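
As a small worked example, the counts above can be expressed as shares of the whole release. The sketch assumes that the five plastid lines are mutually exclusive (that is, the plain "Plastid" line counts only entries without a more specific subtype); the statistics themselves do not state this. The total entry count comes from the introduction.

```python
# Total number of entries in this release, as stated in the introduction.
TOTAL_ENTRIES = 15062837

# Counts copied from the list above.
ORGANELLE_COUNTS = {
    "Mitochondrion": 511861,
    "Plasmid": 201362,
    "Plastid": 12264,
    "Plastid; Apicoplast": 368,
    "Plastid; Chloroplast": 134774,
    "Plastid; Cyanelle": 8,
    "Plastid; Non-photosynthetic plastid": 448,
}

# Assumption: the plastid categories do not overlap, so they can be summed.
plastid_total = sum(
    count for name, count in ORGANELLE_COUNTS.items()
    if name.startswith("Plastid")
)

# Share of each category as a percentage of all entries.
shares = {
    name: 100 * count / TOTAL_ENTRIES
    for name, count in ORGANELLE_COUNTS.items()
}

print(f"All plastid entries (if disjoint): {plastid_total}")
for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
```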