                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2010_11 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2010_11 of 02-Nov-2010 of UniProtKB/TrEMBL contains 12347303 sequence entries,
                    comprising 3974018240 amino acids.
                    
                    266952 sequences have been added since release 2010_10, the sequence data of
                    247 existing entries has been updated and the annotations of
                    3861581 entries have been revised. The added sequences represent an
                    increase of about 2% in the number of entries.
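As a rough check on the 2% figure, the growth can be recomputed from the entry counts quoted above. This is a minimal sketch: it assumes (the release notes do not say so) that no entries were removed between the two releases, so the previous total is taken as the current total minus the additions. The variable names are illustrative.

```python
# Entry counts quoted in the introduction.
total_entries = 12347303   # release 2010_11
added_entries = 266952     # added since release 2010_10

# Assumption: no entries were deleted between the two releases.
previous_entries = total_entries - added_entries

growth_pct = 100 * added_entries / previous_entries
print(f"{growth_pct:.1f}%")  # 2.2%, which the report rounds to 2%
```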
                    
                    Number of fragments: 2075005
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           16937     0.14%
                    2: Evidence at transcript level       477372     3.87%
                    3: Inferred from homology            2509910    20.33%
                    4: Predicted                         9343084    75.67%
                    5: Uncertain                               0     0.00%
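The percentages in the protein existence table can be reproduced from the entry counts alone. The sketch below uses the counts as printed; the dictionary name and the output layout are illustrative, not part of the release.

```python
# Protein existence (PE) counts as printed in the table above.
pe_counts = {
    "1: Evidence at protein level": 16937,
    "2: Evidence at transcript level": 477372,
    "3: Inferred from homology": 2509910,
    "4: Predicted": 9343084,
    "5: Uncertain": 0,
}

# The five PE levels partition the database, so their sum is the entry total.
total = sum(pe_counts.values())

for label, count in pe_counts.items():
    print(f"{label:<34}{count:>9}{100 * count / total:>9.2f}%")
```

The sum of the five counts equals the 12347303 entries stated in the introduction, so every entry carries exactly one PE level.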
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 289557
                    
                    The first twenty species represent 1226190 sequences:   9.9 % of the
                    total number of entries.
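The figure for the first twenty species can be checked against the frequencies listed for ranks 1 to 20 in table 2.2. A minimal sketch, with illustrative variable names:

```python
# Frequencies of ranks 1-20, copied from table 2.2.
top20 = [
    352110, 95401, 75886, 58088, 50817, 50404, 49348, 48863, 44035, 41943,
    41772, 41693, 39843, 39337, 34796, 33639, 33195, 32625, 31268, 31127,
]
total_entries = 12347303  # entry count from the introduction

top20_sum = sum(top20)
share_pct = 100 * top20_sum / total_entries
print(top20_sum, f"{share_pct:.1f}%")  # 1226190 9.9%
```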
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 129549
                                             2x:  51775
                                             3x:  27558
                                             4x:  16463
                                             5x:  10068
                                             6x:   7269
                                             7x:   5058
                                             8x:   4018
                                             9x:   3244
                                            10x:   6213
                                        11- 20x:  16199
                                        21- 50x:   5749
                                        51-100x:   2107
                                          >100x:   4287
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     352110  Human immunodeficiency virus 1
                    2      95401  Oryza sativa subsp. japonica (Rice)
                    3      75886  Homo sapiens (Human)
                    4      58088  Hepatitis C virus
                    5      50817  Vitis vinifera (Grape)
                    6      50404  Trichomonas vaginalis
                    7      49348  Mus musculus (Mouse)
                    8      48863  uncultured bacterium
                    9      44035  Populus trichocarpa (Western balsam poplar) 
                    10      41943  Zea mays (Maize)
                    11      41772  Hepatitis B virus (HBV)
                    12      41693  Arabidopsis thaliana (Mouse-ear cress)
                    13      39843  Paramecium tetraurelia
                    14      39337  Oryza sativa subsp. indica (Rice)
                    15      34796  Physcomitrella patens subsp. patens (Moss)
                    16      33639  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    17      33195  Selaginella moellendorffii (Spikemoss)
                    18      32625  Arabidopsis lyrata subsp. lyrata
                    19      31268  Ricinus communis (Castor bean)
                    20      31127  Drosophila melanogaster (Fruit fly)
                    21      29117  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    22      28089  Tetraodon nigroviridis (Green puffer)
                    23      26787  Danio rerio (Zebrafish) (Brachydanio rerio)
                    24      25201  Ralstonia solanacearum (Pseudomonas solanacearum)
                    25      24812  Nematostella vectensis (Starlet sea anemone)
                    26      23495  Rattus norvegicus (Rat)
                    27      23115  Perkinsus marinus ATCC 50983
                    28      21304  Caenorhabditis elegans
                    29      21084  Ixodes scapularis (Black-legged tick) (Deer tick)
                    30      20673  Trypanosoma cruzi
                    31      18870  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    32      18079  Caenorhabditis briggsae
                    33      18071  Escherichia coli
                    34      17931  Drosophila simulans (Fruit fly)
                    35      17855  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    36      17801  Ailuropoda melanoleuca (Giant panda)
                    37      17607  Phytophthora infestans T30-4
                    38      17200  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    39      16972  Tribolium castaneum (Red flour beetle)
                    40      16889  Drosophila yakuba (Fruit fly)
                    41      16743  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    42      16709  Drosophila persimilis (Fruit fly)
                    43      16366  Ectocarpus siliculosus (Brown alga)
                    44      16251  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    45      16202  Bos taurus (Bovine)
                    46      16182  Drosophila sechellia (Fruit fly)
                    47      15945  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    48      15872  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    49      15718  Naegleria gruberi (Amoeba)
                    50      15669  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    51      15421  Drosophila willistoni (Fruit fly)
                    52      15250  Tetrahymena thermophila SB210
                    53      15142  Drosophila ananassae (Fruit fly)
                    54      14925  Drosophila erecta (Fruit fly)
                    55      14814  Chlamydomonas reinhardtii
                    56      14778  Drosophila mojavensis (Fruit fly)
                    57      14763  Anopheles gambiae (African malaria mosquito)
                    58      14696  Drosophila virilis (Fruit fly)
                    59      14673  Plasmodium chabaudi
                    60      14655  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    61      14634  Volvox carteri f. nagariensis
                    62      14267  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    63      13621  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    64      13496  Schistosoma mansoni (Blood fluke)
                    65      13366  Aspergillus flavus 
                    66      13293  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    67      13128  Schizophyllum commune H4-8
                    68      12971  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    69      12960  Gallus gallus (Chicken)
                    70      12832  Giardia lamblia (Giardia intestinalis)
                    71      12721  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    72      12710  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
                    73      12502  Glycine max (Soybean) (Glycine hispida)
                    74      12482  Xenopus laevis (African clawed frog)
                    75      12447  Polysphondylium pallidum (Cellular slime mold)
                    76      12026  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    77      12008  Hepatitis C virus subtype 1b
                    78      11855  Aspergillus oryzae
                    79      11800  Plasmodium berghei
                    80      11721  Plasmodium falciparum
                    81      11569  Trichoplax adhaerens
                    82      11498  Brugia malayi (Filarial nematode worm)
                    83      11211  Ktedonobacter racemifer DSM 44963
                    84      11183  Picea sitchensis (Sitka spruce)
                    85      10936  Helicobacter pylori (Campylobacter pylori)
                    86      10936  Sordaria macrospora
                    87      10916  Schistosoma japonicum (Blood fluke)
                    88      10864  Chaetomium globosum (Soil fungus)
                    89      10832  Pediculus humanus subsp. corporis (Body louse)
                    90      10674  Podospora anserina
                    91      10414  Neurospora crassa
                    92      10401  Aspergillus nidulans FGSC A4
                    93      10392  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    94      10331  Phaeodactylum tricornutum CCAP 1055/1
                    95      10279  Micromonas pusilla CCMP1545
                    96      10228  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    97      10120  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    98      10115  Micromonas sp. RCC299
                    99      10095  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    100      10061  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    101      10015  Streptomyces bingchenggensis (strain BCW-1)
                    102       9882  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    103       9749  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    104       9664  Trypanosoma brucei gambiense DAL972
                    105       9651  Cryptococcus neoformans (Filobasidiella neoformans)
                    106       9568  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    107       9559  Aspergillus fumigatus (Sartorya fumigata)
                    108       9535  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    109       9531  Rabies virus
                    110       9520  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    111       9484  Trypanosoma brucei
                    112       9484  Streptomyces violaceusniger Tu 4113
                    113       9371  Salmo salar (Atlantic salmon)
                    114       9272  Plasmodium vivax
                    115       9243  Monosiga brevicollis (Choanoflagellate)
                    116       9242  Candida albicans (Yeast)
                    117       9202  Amycolatopsis mediterranei (strain U-32)
                    118       9193  Emericella nidulans (Aspergillus nidulans)
                    119       9190  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    120       9177  Streptomyces hygroscopicus ATCC 53653
                    121       9169  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    122       9114  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    123       9092  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    124       9001  Dictyostelium discoideum (Slime mold)
                    125       8977  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    126       8964  Thalassiosira pseudonana (Marine diatom)
                    127       8945  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    128       8901  Catenulispora acidiphila 
                    129       8867  Aspergillus clavatus
                    130       8764  Rhodococcus sp. (strain RHA1)
                    131       8743  Toxoplasma gondii
                    132       8715  Paracoccidioides brasiliensis (strain Pb18)
                    133       8699  Nannizzia otae (strain CBS 113480) (Microsporum canis) (Arthroderma otae)
                    134       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    135       8603  Entamoeba dispar SAW760
                    136       8523  Stigmatella aurantiaca DW4/3-1
                    137       8437  Plesiocystis pacifica SIR-1
                    138       8394  Streptomyces sp. AA4
                    139       8299  Entamoeba histolytica
                    140       8249  Microscilla marina ATCC 23134
                    141       8233  Leishmania major
                    142       8220  Bradyrhizobium japonicum
                    143       8202  Streptomyces sviceus ATCC 29083
                    144       8201  Microcoleus chthonoplastes PCC 7420
                    145       8164  Frankia sp. EUN1f
                    146       8154  Burkholderia xenovorans (strain LB400)
                    147       8056  Pseudomonas aeruginosa
                    148       8024  Leishmania infantum
                    149       7989  Trichophyton verrucosum (strain HKI 0517)
                    150       7978  Toxoplasma gondii ME49
                    151       7956  Ostreococcus tauri
                    152       7943  Rhodococcus opacus (strain B4)
                    153       7940  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    154       7916  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    155       7888  Leishmania braziliensis
                    156       7867  Streptomyces ghanaensis ATCC 14672
                    157       7856  Acaryochloris marina (strain MBIC 11017)
                    158       7855  Paracoccidioides brasiliensis (strain Pb03)
                    159       7836  Toxoplasma gondii VEG
                    160       7823  Burkholderia sp. Ch1-1
                    161       7811  Plasmodium yoelii yoelii
                    162       7743  Uncinocarpus reesii (strain UAMH 1704)
                    163       7708  Streptomyces viridochromogenes DSM 40736
                    164       7571  Clostridium hathewayi DSM 13479
                    165       7563  Burkholderia pseudomallei MSHR346
                    166       7528  Streptomyces sp. C
                    167       7523  Streptomyces lividans TK24
                    168       7519  Solibacter usitatus (strain Ellin6076)
                    169       7501  Tuber melanosporum (Perigord truffle)
                    170       7481  Streptomyces coelicolor
                    171       7475  Burkholderia pseudomallei 1710a
                    172       7465  Burkholderia pseudomallei Pakistan 9
                    173       7459  Burkholderia sp. H160
                    174       7392  Ostreococcus lucimarinus (strain CCE9901)
                    175       7379  Streptomyces sp. ACT-1
                    176       7367  Burkholderia pseudomallei 576
                    177       7349  Burkholderia pseudomallei 305
                    178       7337  Streptomyces clavuligerus ATCC 27064
                    179       7310  Frankia sp. EuI1c
                    180       7274  Clostridium bolteae ATCC BAA-613
                    181       7243  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    182       7232  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    183       7228  Streptomyces avermitilis
                    184       7205  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    185       7198  Medicago truncatula (Barrel medic)
                    186       7179  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    187       7140  Burkholderia pseudomallei 1106b
                    188       7131  Burkholderia phymatum (strain DSM 17167 / STM815)
                    189       7124  Burkholderia ambifaria MEX-5
                    190       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    191       7016  Myxococcus xanthus (strain DK 1622)
                    192       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    193       6976  Rhodopirellula baltica
                    194       6959  Frankia sp. (strain EAN1pec)
                    195       6943  Streptomyces sp. Mg1
                    196       6932  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    197       6923  Burkholderia ambifaria IOP40-10
                    198       6904  Saccharopolyspora erythraea (strain NRRL 23338)
                    199       6903  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    200       6892  Streptomyces roseosporus NRRL 15998
                    201       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    202       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    203       6866  Burkholderia sp. (strain CCGE1002)
                    204       6866  Streptomyces pristinaespiralis ATCC 25486
                    205       6859  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    206       6818  Rhizobium loti (Mesorhizobium loti)
                    207       6817  Clostridium asparagiforme DSM 15981
                    208       6772  Burkholderia pseudomallei (strain 1106a)
                    209       6769  Sinorhizobium meliloti AK83
                    210       6740  Burkholderia pseudomallei (strain 668)
                    211       6736  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    212       6731  Sus scrofa (Pig)
                    213       6725  Burkholderia graminis C4D1M
                    214       6714  Rhizobium leguminosarum bv. viciae (strain 3841)
                    215       6712  Rhodococcus erythropolis SK121
                    216       6712  Hepatitis C virus subtype 1a
                    217       6705  Chthoniobacter flavus Ellin428
                    218       6702  Streptomyces flavogriseus ATCC 33331
                    219       6692  Bacillus thuringiensis IBL 200
                    220       6690  delta proteobacterium NaphS2
                    221       6683  Haliangium ochraceum (strain DSM 14365 / JCM 11303 / SMP-2)
                    222       6682  Sinorhizobium meliloti BL225C
                    223       6679  Mesorhizobium opportunistum WSM2075
                    224       6662  Burkholderia pseudomallei S13
                    225       6661  Streptococcus pneumoniae
                    226       6657  Burkholderia cepacia (strain J2315 / LMG 16656) (Burkholderia cenocepacia 
                    227       6655  Bacillus thuringiensis IBL 4222
                    228       6644  Beggiatoa sp. PS
                    229       6627  Burkholderia cenocepacia (strain MC0-3)
                    230       6627  uncultured archaeon
                    231       6614  Burkholderia multivorans CGD2
                    232       6613  Burkholderia pseudomallei Pasteur 52237
                    233       6606  Burkholderia multivorans CGD2M
                    234       6583  Bacillus thuringiensis serovar sotto str. T04001
                    235       6559  Cyanothece sp. PCC 7822
                    236       6527  Burkholderia multivorans CGD1
                    237       6521  Streptomyces sp. ACTE
                    238       6504  Frankia alni (strain ACN14a)
                    239       6498  bacterium Ellin514
                    240       6497  Burkholderia cenocepacia (strain HI2424)
                    241       6488  Bacillus thuringiensis serovar monterrey BGSC 4AJ1
                    242       6463  Planctomyces maris DSM 8797
                    243       6453  Mycobacterium parascrofulaceum ATCC BAA-614
                    244       6427  Agrobacterium radiobacter (strain K84 / ATCC BAA-868)
                    245       6417  Methylobacterium sp. (strain 4-46)
                    246       6413  Cyanothece sp. CCY0110
                    247       6388  Ustilago maydis (Smut fungus)
                    248       6388  Bradyrhizobium sp. (strain ORS278)
                    249       6377  Clostridium carboxidivorans P7
                    250       6372  Stackebrandtia nassauensis 
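For readers who want to work with table 2.2 programmatically, the rows can be split into rank, frequency and species name. The following is a sketch only: `parse_species_row` and the regular expression are hypothetical helpers written for this plain-text layout, not part of any UniProt tooling, and they assume each row is "rank, whitespace, frequency, whitespace, species name".

```python
import re

# One data row: rank, frequency, then the species name (which may contain spaces).
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(.*\S)\s*$")

def parse_species_row(line):
    """Return (rank, frequency, species) for a data row, or None otherwise."""
    match = ROW.match(line)
    if match is None:
        return None  # header, separator, or blank line
    rank, frequency, species = match.groups()
    return int(rank), int(frequency), species

print(parse_species_row("  3      75886  Homo sapiens (Human)"))
print(parse_species_row("------  ---------  --------------"))
```

Species names are kept verbatim, including names that appear truncated in the table.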
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          230741 (  2%)
                    Bacteria        7713395 ( 62%)
                    Eukaryota       3371461 ( 27%)
                    Viruses         1014578 (  8%)
                    Other             17127 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  75921 (  2%)           (  1%)
                    Other Mammalia        207737 (  6%)           (  2%)
                    Other Vertebrata      317673 (  9%)           (  3%)
                    Viridiplantae         830969 ( 25%)           (  7%)
                    Fungi                 666090 ( 20%)           (  5%)
                    Insecta               505589 ( 15%)           (  4%)
                    Nematoda               62070 (  2%)           (  1%)
                    Other                 705412 ( 21%)           (  6%)
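Both percentage columns of the Eukaryota breakdown follow from the category counts: one is relative to the Eukaryota total, the other to the complete database. A minimal sketch with illustrative names:

```python
# Category counts within Eukaryota, as printed above.
eukaryota = {
    "Human": 75921,
    "Other Mammalia": 207737,
    "Other Vertebrata": 317673,
    "Viridiplantae": 830969,
    "Fungi": 666090,
    "Insecta": 505589,
    "Nematoda": 62070,
    "Other": 705412,
}
database_total = 12347303                 # all UniProtKB/TrEMBL entries
eukaryota_total = sum(eukaryota.values())  # matches the Eukaryota row in 2.3

for name, count in eukaryota.items():
    of_eukaryota = 100 * count / eukaryota_total
    of_database = 100 * count / database_total
    print(f"{name:<18}{count:>8} ({of_eukaryota:3.0f}%) ({of_database:3.0f}%)")
```

The category counts add up to 3371461, the Eukaryota figure given in the kingdom table.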
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)

                    From   To  Number            From   To  Number
                       1-  50  265085            1001-1100   72373
                      51- 100  979635            1101-1200   51202
                     101- 150 1125192            1201-1300   35164
                     151- 200 1088756            1301-1400   23139
                     201- 250 1093394            1401-1500   18579
                     251- 300 1059667            1501-1600   13312
                     301- 350  962763            1601-1700    9973
                     351- 400  747378            1701-1800    7826
                     401- 450  629021            1801-1900    6322
                     451- 500  526741            1901-2000    5314
                     501- 550  358918            2001-2100    4306
                     551- 600  276170            2101-2200    4435
                     601- 650  199895            2201-2300    3521
                     651- 700  155715            2301-2400    2774
                     701- 750  134304            2401-2500    2405
                     751- 800  121144                >2500   20917
                     801- 850   88831
                     851- 900   80905
                     901- 950   55152
                     951-1000   42070
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 321 amino acids.

                    The shortest sequence is Q16047_HUMAN: 4 amino acids.
                    The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
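The average length can be recomputed from the totals given in the introduction. This sketch assumes the average is taken over all entries, fragments included; under that assumption the reported 321 appears to match a truncated rather than a rounded mean, which is an inference from the numbers and not something the release notes state.

```python
total_residues = 3974018240  # amino acids, from the introduction
total_entries = 12347303     # sequence entries, from the introduction

mean_length = total_residues / total_entries
print(f"{mean_length:.2f}")             # 321.85
print(total_residues // total_entries)  # 321 (integer part, as reported)
```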
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                          Total Number of    Average
                    Line type / subtype                  number   entries  per entry
                    ---------------------------------  -------- ---------  ---------

                    References (RL)                    15124900                 1.22
                      Submitted to EMBL/GenBank/DDBJ    8956322   7881721       0.73
                      Journal                           6013857   5443142       0.49
                      Submitted to other databases        46223     46200      <0.01
                      Thesis                               7511      7453      <0.01
                      Book citation                        5233      5182      <0.01
                      Other                               95754     95431       0.01
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 301208
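The "Average per entry" column in the reference table above is consistent with dividing each line total by the number of entries in the database (12347303). The sketch below reproduces the printed values; `per_entry` is a hypothetical helper, and the rule that values rounding to 0.00 are shown as "<0.01" is an assumption inferred from the table.

```python
TOTAL_ENTRIES = 12347303  # entries in release 2010_11

def per_entry(line_count, total_entries=TOTAL_ENTRIES):
    """Format the average number of lines per database entry."""
    average = line_count / total_entries
    # Assumption: averages that would print as 0.00 are shown as "<0.01".
    if average < 0.005:
        return "<0.01"
    return f"{average:.2f}"

print(per_entry(15124900))  # References (RL): 1.22
print(per_entry(8956322))   # Submitted to EMBL/GenBank/DDBJ: 0.73
print(per_entry(46223))     # Submitted to other databases: <0.01
```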
                    
                    
                                                          Total Number of    Average
                    Line type / subtype                  number   entries  per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Comments (CC)                      10376527                 0.84
                      CATALYTIC ACTIVITY                1044062    971052       0.08     4
                      CAUTION                           2485340   2485340       0.20     2
                      COFACTOR                           332195    319421       0.03     8
                      DOMAIN                              22711     21072      <0.01    10
                      FUNCTION                          1251627   1154625       0.10     3
                      INTERACTION                          2070      2070      <0.01    11
                      MISCELLANEOUS                       25235     25235      <0.01     9
                      PATHWAY                            493588    456039       0.04     6
                      SIMILARITY                        3404148   2963497       0.28     1
                      SUBCELLULAR LOCATION               862517    859137       0.07     5
                      SUBUNIT                            453034    450486       0.04     7
                    
                    Total number of comment topics: 11
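The Rank column of the comment table matches an ordering of the topics by total number of lines, largest first. A minimal sketch of that ordering follows; the names are illustrative, and how the release breaks ties is not stated (Python's sort is stable, so tied topics would keep their input order here).

```python
# Total number of lines per comment topic, from the table above.
comment_totals = {
    "CATALYTIC ACTIVITY": 1044062,
    "CAUTION": 2485340,
    "COFACTOR": 332195,
    "DOMAIN": 22711,
    "FUNCTION": 1251627,
    "INTERACTION": 2070,
    "MISCELLANEOUS": 25235,
    "PATHWAY": 493588,
    "SIMILARITY": 3404148,
    "SUBCELLULAR LOCATION": 862517,
    "SUBUNIT": 453034,
}

# Rank 1 is the topic with the most lines.
ordered = sorted(comment_totals, key=comment_totals.get, reverse=True)
ranks = {topic: position for position, topic in enumerate(ordered, start=1)}

for topic in ordered:
    print(f"{ranks[topic]:>2}  {topic:<22}{comment_totals[topic]:>9}")
```

The topic totals also add up to the 10376527 comment lines reported in the first row of the table.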
                    
                    
                                                          Total Number of    Average
                    Line type / subtype                  number   entries  per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Features (FT)                       4232805                 0.34
                      CHAIN                              445216    348936       0.04     2
                      NON_TER                           3492673   2073516       0.28     1
                      SIGNAL                             294327    293804       0.02     3
                      TRANSIT                               589       589      <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                          Total Number of    Average
                    Line type / subtype                  number   entries  per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             145080325               11.75                                                    
                    AGD                                 2575      2575     <0.01    76   Organism-specific databases                
                    ANU-2DPAGE                            56        56     <0.01    92   2D gel databases                           
                    ArachnoServer                        278       278     <0.01    85   Organism-specific databases                
                    ArrayExpress                       93994     93981      0.01    48   Gene expression databases                  
                    BRENDA                              2870      2806     <0.01    74   Enzyme and pathway databases               
                    Bgee                              129530    129428      0.01    44   Gene expression databases                  
                    BioCyc                           1624285   1589691      0.13    21   Enzyme and pathway databases               
                    CAZy                               74766     70255      0.01    49   Protein family/group databases             
                    CGD                                 6785      6785     <0.01    71   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     5         5     <0.01    96   2D gel databases                           
                    CTD                               169388    168468      0.01    42   Organism-specific databases                
                    CYGD                                   5         5     <0.01    97   Organism-specific databases                
                    DIP                                 2738      2733     <0.01    75   Protein-protein interaction databases      
                    EMBL                            13807880  12331178      1.12     3   Sequence databases                         
                    Ensembl                           335029    200001      0.03    31   Genome annotation databases                
                    EnsemblBacteria                   501748    471857      0.04    27   Genome annotation databases                
                    EnsemblFungi                       98120     98014      0.01    47   Genome annotation databases                
                    EnsemblMetazoa                    296680    251972      0.02    33   Genome annotation databases                
                    EnsemblPlants                     208159    193138      0.02    37   Genome annotation databases                
                    EnsemblProtists                    24295     24116     <0.01    59   Genome annotation databases                
                    EuPathDB                          151351    151351      0.01    43   Organism-specific databases                
                    FlyBase                           195309    193778      0.02    39   Organism-specific databases                
                    GO                              24502005   7757980      1.98     2   Ontologies                                 
                    Gene3D                           4715982   3850175      0.38     9   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01   100   Organism-specific databases                
                    GeneID                           5234016   5120774      0.42     7   Genome annotation databases                
                    Genevestigator                    101499    101489      0.01    46   Gene expression databases                  
                    GenoList                           14756     14483     <0.01    64   Organism-specific databases                
                    GenomeReviews                    3458963   3373885      0.28    12   Genome annotation databases                
                    Gramene                            68887     68887      0.01    51   Organism-specific databases                
                    H-InvDB                              601       490     <0.01    83   Organism-specific databases                
                    HAMAP                             761916    753186      0.06    25   Family and domain databases                
                    HGNC                               61416     59730     <0.01    53   Organism-specific databases                
                    HOGENOM                          2202815   2202686      0.18    17   Phylogenomic databases                     
                    HOVERGEN                          320337    318626      0.03    32   Phylogenomic databases                     
                    HSSP                              254306    254032      0.02    34   3D structure databases                     
                    IPI                               227393    227393      0.02    36   Sequence databases                         
                    InParanoid                        196440    196348      0.02    38   Phylogenomic databases                     
                    IntAct                             15669     15669     <0.01    63   Protein-protein interaction databases      
                    InterPro                        25814948   9504806      2.09     1   Family and domain databases                
                    KEGG                             4439445   4349500      0.36    10   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    72   Organism-specific databases                
                    Leproma                              937       936     <0.01    82   Organism-specific databases                
                    MEROPS                             66656     65212      0.01    52   Protein family/group databases             
                    MGI                                42913     42893     <0.01    57   Organism-specific databases                
                    MINT                                9153      9153     <0.01    68   Protein-protein interaction databases      
                    NMPDR                             921779    921769      0.07    24   Genome annotation databases                
                    NextBio                            47816     47813     <0.01    55   Other                                      
                    OMA                              2431274   2431272      0.20    15   Phylogenomic databases                     
                    OrthoDB                           429756    429755      0.03    28   Phylogenomic databases                     
                    PANTHER                          1993703   1879802      0.16    20   Family and domain databases                
                    PDB                                13102      7849     <0.01    65   3D structure databases                     
                    PDBsum                             12845      7676     <0.01    67   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    89   2D gel databases                           
                    PIR                               175760    142909      0.01    41   Sequence databases                         
                    PIRSF                             664876    664876      0.05    26   Family and domain databases                
                    PMAP-CutDB                           254       254     <0.01    86   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    98   2D gel databases                           
                    PRIDE                             104163    104162      0.01    45   Proteomic databases                        
                    PRINTS                           2006807   1774574      0.16    19   Family and domain databases                
                    PROSITE                          6026412   4029804      0.49     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    95   Enzyme and pathway databases               
                    PeptideAtlas                         147       147     <0.01    88   Proteomic databases                        
                    PeroxiBase                          2464      2456     <0.01    77   Protein family/group databases             
                    Pfam                            12084906   8996963      0.98     4   Family and domain databases                
                    PharmGKB                              85        85     <0.01    91   Organism-specific databases                
                    PhosphoSite                         1800      1800     <0.01    80   PTM databases                              
                    PhylomeDB                         373418    373386      0.03    30   Phylogenomic databases                     
                    ProDom                            239337    224511      0.02    35   Family and domain databases                
                    ProMEX                               436       436     <0.01    84   Proteomic databases                        
                    ProtClustDB                      2623851   2623835      0.21    13   Phylogenomic databases                     
                    ProteinModelPortal               4072884   4072501      0.33    11   3D structure databases                     
                    PseudoCAP                           4346      4343     <0.01    73   Organism-specific databases                
                    REBASE                              8285      7923     <0.01    70   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   95        94     <0.01    90   2D gel databases                           
                    RGD                                17491     17397     <0.01    62   Organism-specific databases                
                    Reactome                              56        53     <0.01    93   Enzyme and pathway databases               
                    RefSeq                           5247478   5122747      0.42     6   Sequence databases                         
                    SGD                                  247       247     <0.01    87   Organism-specific databases                
                    SMART                            2553274   1977037      0.21    14   Family and domain databases                
                    SMR                              2107325   2107065      0.17    18   3D structure databases                     
                    STRING                           1205428   1205286      0.10    22   Protein-protein interaction databases      
                    SUPFAM                           5012594   4153778      0.41     8   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    94   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01    99   2D gel databases                           
                    TAIR                               18788     18708     <0.01    61   Organism-specific databases                
                    TCDB                                2342      2333     <0.01    78   Protein family/group databases             
                    TIGR                              195123    188076      0.02    40   Genome annotation databases                
                    TIGRFAMs                         2403993   2192728      0.19    16   Family and domain databases                
                    TubercuList                         2256      2251     <0.01    79   Organism-specific databases                
                    UCSC                               50627     50627     <0.01    54   Genome annotation databases                
                    UniGene                           427728    395869      0.03    29   Sequence databases                         
                    VectorBase                         47569     47101     <0.01    56   Genome annotation databases                
                    World-2DPAGE                         947       942     <0.01    81   2D gel databases                           
                    WormBase                           41221     41094     <0.01    58   Organism-specific databases                
                    Xenbase                            13093     13071     <0.01    66   Organism-specific databases                
                    ZFIN                               21644     21639     <0.01    60   Organism-specific databases                
                    dictyBase                           8473      8473     <0.01    69   Organism-specific databases                
                    eggNOG                           1147498   1147498      0.09    23   Phylogenomic databases                     
                    euHCVdb                            72340     72337      0.01    50   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 126
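
The third numeric column of the table above appears to be the number of cross-references divided by the total number of entries in the release (12347303, from the introduction). The column headers are not reproduced in this part of the document, so that reading is an assumption. The minimal sketch below checks it against four rows copied from the table; the names `TOTAL_ENTRIES`, `rows` and `average_per_entry` are illustrative only.

```python
# Total number of sequence entries in release 2010_11 (from the introduction).
TOTAL_ENTRIES = 12347303

# (database, cross-references, entries carrying at least one cross-reference)
# The meaning of the two count columns is an assumption; values are copied from the table.
rows = [
    ("InterPro", 25814948, 9504806),
    ("GO", 24502005, 7757980),
    ("EMBL", 13807880, 12331178),
    ("Pfam", 12084906, 8996963),
]


def average_per_entry(cross_references, total=TOTAL_ENTRIES):
    """Average number of cross-references per database entry, to two decimals."""
    return round(cross_references / total, 2)


averages = {name: average_per_entry(xrefs) for name, xrefs, _ in rows}

for name, value in averages.items():
    print(f"{name:<10} {value:.2f}")
```

For these four rows the recomputed values agree with the table (2.09, 1.98, 1.12 and 0.98), which supports the assumed reading without confirming it for every row.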
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.61   Gln (Q) 3.85   Leu (L) 9.83   Ser (S) 6.69
                    Arg (R) 5.47   Glu (E) 6.13   Lys (K) 5.27   Thr (T) 5.61
                    Asn (N) 4.15   Gly (G) 7.12   Met (M) 2.47   Trp (W) 1.31
                    Asp (D) 5.29   His (H) 2.19   Phe (F) 4.03   Tyr (Y) 3.06
                    Cys (C) 1.27   Ile (I) 6.02   Pro (P) 4.73   Val (V) 6.74
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    
                    [Figure: amino acid composition chart, not reproduced here]
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
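
The ordering in 5.2 follows from sorting the percentages in 5.1 in descending order. A small sketch of that derivation, with the values copied from 5.1 and the variable names chosen only for illustration:

```python
# Percent composition for the complete database, copied from section 5.1.
composition = {
    "Ala": 8.61, "Gln": 3.85, "Leu": 9.83, "Ser": 6.69,
    "Arg": 5.47, "Glu": 6.13, "Lys": 5.27, "Thr": 5.61,
    "Asn": 4.15, "Gly": 7.12, "Met": 2.47, "Trp": 1.31,
    "Asp": 5.29, "His": 2.19, "Phe": 4.03, "Tyr": 3.06,
    "Cys": 1.27, "Ile": 6.02, "Pro": 4.73, "Val": 6.74,
}

# Sort residue names by their percentage, most frequent first.
ranked = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranked))

# The twenty rounded percentages do not add up to exactly 100; the remainder
# is rounding plus the ambiguous residues (B, Z, X) listed separately in 5.1.
total_percent = sum(composition.values())
print(f"sum of the twenty standard residues: {total_percent:.2f}")
```

No two residues share the same percentage in this release, so the sort order is unambiguous.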
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 400651
                    Total number of entries encoded on a Plasmid: 179257
                    Total number of entries encoded on a Plastid: 10275
                    Total number of entries encoded on a Plastid; Apicoplast: 335
                    Total number of entries encoded on a Plastid; Chloroplast: 124447
                    Total number of entries encoded on a Plastid; Cyanelle: 7
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 441