                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2010_12 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2010_12 of 30-Nov-2010 of UniProtKB/TrEMBL contains 12769092 sequence entries,
                    comprising 4109015043 amino acids.
                    
                    434825 sequences have been added since release 2010_11, the sequence data of
                    1167 existing entries has been updated, and the annotations of
                    6829295 entries have been revised. This represents an increase of 4% in the
                    number of entries.
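As a rough cross-check of the growth figure, the sketch below assumes that the 4% refers to the sequences added relative to the size of release 2010_11. It approximates the previous release size by subtracting the additions from the current total, which ignores any entries deleted or merged between releases (this report does not list them). The variable names are illustrative only.

```python
# Figures quoted in the introduction above.
current_entries = 12769092   # entries in release 2010_12
added_entries = 434825       # sequences added since release 2010_11

# Assumption: no entries were removed between releases, so the previous
# release size is approximated as the current size minus the additions.
previous_entries = current_entries - added_entries

# Relative growth in percent, with respect to the approximated previous size.
increase_pct = 100 * added_entries / previous_entries

print(f"Approximate size of release 2010_11: {previous_entries}")
print(f"Increase: {increase_pct:.2f}% (rounded: {round(increase_pct)}%)")
```

Under these assumptions the unrounded value is somewhat below 4%, so the reported figure is consistent with rounding to the nearest whole percent.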
                    
                    Number of fragments: 2145202
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           18540     0.15%
                    2: Evidence at transcript level       478326     3.75%
                    3: Inferred from homology            2619194    20.51%
                    4: Predicted                         9653032    75.60%
                    5: Uncertain                               0     0.00%
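The percentages in the protein existence table can be reproduced from the entry counts. The following is a minimal sketch; the dictionary and the helper `pe_share` are hypothetical names introduced here, and the only assumption is that each percentage is the count divided by the sum of all five counts.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 18540,
    "2: Evidence at transcript level": 478326,
    "3: Inferred from homology": 2619194,
    "4: Predicted": 9653032,
    "5: Uncertain": 0,
}

# The five levels partition the release, so their sum is the entry total.
total_entries = sum(pe_counts.values())


def pe_share(count, total=total_entries):
    """Return the share of entries as a percentage string with two decimals."""
    return f"{100 * count / total:.2f}%"


for label, count in pe_counts.items():
    print(f"{label:<34}{count:>10}{pe_share(count):>10}")
```

The counts add up to the 12769092 entries stated in the introduction, so every entry carries exactly one PE level.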
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 301002
                    
                    The first twenty species represent 1233154 sequences, or 9.7% of the
                    total number of entries.
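The figure for the first twenty species can be recomputed from the frequencies listed in the table of the most represented species (section 2.2). A minimal sketch, with illustrative variable names:

```python
# Frequencies of the twenty most represented species, in rank order,
# copied from the table in section 2.2.
top20_frequencies = [
    357213, 95380, 76767, 58226, 50834,
    50404, 49422, 49339, 44033, 41935,
    41928, 41227, 39841, 39344, 34796,
    33638, 33195, 32625, 31741, 31266,
]

TOTAL_ENTRIES = 12769092  # total entries in release 2010_12

top20_total = sum(top20_frequencies)
top20_share = 100 * top20_total / TOTAL_ENTRIES

print(f"Top 20 species: {top20_total} sequences, "
      f"{top20_share:.1f}% of all entries")
```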
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented       1x: 136315
                                              2x:  53302
                                              3x:  28097
                                              4x:  17011
                                              5x:  10494
                                              6x:   7459
                                              7x:   5230
                                              8x:   4166
                                              9x:   3428
                                             10x:   6452
                                         11- 20x:  16634
                                         21- 50x:   5852
                                         51-100x:   2152
                                           >100x:   4410
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     357213  Human immunodeficiency virus 1
                    2      95380  Oryza sativa subsp. japonica (Rice)
                    3      76767  Homo sapiens (Human)
                    4      58226  Hepatitis C virus
                    5      50834  Vitis vinifera (Grape)
                    6      50404  Trichomonas vaginalis
                    7      49422  uncultured bacterium
                    8      49339  Mus musculus (Mouse)
                    9      44033  Populus trichocarpa (Western balsam poplar) 
                    10      41935  Zea mays (Maize)
                    11      41928  Hepatitis B virus (HBV)
                    12      41227  Arabidopsis thaliana (Mouse-ear cress)
                    13      39841  Paramecium tetraurelia
                    14      39344  Oryza sativa subsp. indica (Rice)
                    15      34796  Physcomitrella patens subsp. patens (Moss)
                    16      33638  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    17      33195  Selaginella moellendorffii (Spikemoss)
                    18      32625  Arabidopsis lyrata subsp. lyrata
                    19      31741  Drosophila melanogaster (Fruit fly)
                    20      31266  Ricinus communis (Castor bean)
                    21      29116  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    22      28089  Tetraodon nigroviridis (Green puffer)
                    23      26790  Danio rerio (Zebrafish) (Brachydanio rerio)
                    24      25201  Ralstonia solanacearum (Pseudomonas solanacearum)
                    25      24811  Nematostella vectensis (Starlet sea anemone)
                    26      23490  Rattus norvegicus (Rat)
                    27      23115  Perkinsus marinus ATCC 50983
                    28      21838  Escherichia coli
                    29      21317  Caenorhabditis elegans
                    30      21084  Ixodes scapularis (Black-legged tick) (Deer tick)
                    31      20667  Trypanosoma cruzi
                    32      19078  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    33      18881  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    34      18075  Caenorhabditis briggsae
                    35      17930  Drosophila simulans (Fruit fly)
                    36      17853  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    37      17801  Ailuropoda melanoleuca (Giant panda)
                    38      17606  Phytophthora infestans T30-4
                    39      16973  Tribolium castaneum (Red flour beetle)
                    40      16902  Drosophila yakuba (Fruit fly)
                    41      16739  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    42      16708  Drosophila persimilis (Fruit fly)
                    43      16366  Ectocarpus siliculosus (Brown alga)
                    44      16277  Loa loa (Eye worm)
                    45      16250  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    46      16205  Bos taurus (Bovine)
                    47      16181  Drosophila sechellia (Fruit fly)
                    48      15944  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    49      15871  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    50      15717  Naegleria gruberi (Amoeba)
                    51      15664  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    52      15420  Drosophila willistoni (Fruit fly)
                    53      15249  Tetrahymena thermophila SB210
                    54      15143  Canis familiaris (Dog) (Canis lupus familiaris)
                    55      15141  Drosophila ananassae (Fruit fly)
                    56      15029  Harpegnathos saltator
                    57      14924  Drosophila erecta (Fruit fly)
                    58      14816  Chlamydomonas reinhardtii
                    59      14791  Camponotus floridanus
                    60      14777  Drosophila mojavensis (Fruit fly)
                    61      14758  Anopheles gambiae (African malaria mosquito)
                    62      14698  Drosophila virilis (Fruit fly)
                    63      14672  Plasmodium chabaudi
                    64      14654  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    65      14634  Volvox carteri f. nagariensis
                    66      14264  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    67      13788  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    68      13560  Moniliophthora perniciosa FA553
                    69      13505  Schistosoma mansoni (Blood fluke)
                    70      13363  Aspergillus flavus 
                    71      13292  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    72      13128  Schizophyllum commune H4-8
                    73      12969  Gallus gallus (Chicken)
                    74      12967  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    75      12718  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    76      12708  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
                    77      12529  Glycine max (Soybean) (Glycine hispida)
                    78      12476  Xenopus laevis (African clawed frog)
                    79      12446  Polysphondylium pallidum (Cellular slime mold)
                    80      12299  Hepatitis C virus subtype 1b
                    81      12024  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    82      11852  Aspergillus oryzae
                    83      11809  Plasmodium falciparum
                    84      11800  Plasmodium berghei
                    85      11567  Trichoplax adhaerens
                    86      11497  Brugia malayi (Filarial nematode worm)
                    87      11211  Ktedonobacter racemifer DSM 44963
                    88      11181  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
                    89      11082  Helicobacter pylori (Campylobacter pylori)
                    90      10966  Streptomyces clavuligerus ATCC 27064
                    91      10916  Schistosoma japonicum (Blood fluke)
                    92      10862  Chaetomium globosum (Soil fungus)
                    93      10832  Pediculus humanus subsp. corporis (Body louse)
                    94      10809  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
                    95      10677  Podospora anserina
                    96      10409  Neurospora crassa
                    97      10394  Aspergillus nidulans FGSC A4
                    98      10390  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    99      10331  Phaeodactylum tricornutum CCAP 1055/1
                    100      10278  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
                    101      10224  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    102      10117  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    103      10115  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
                    104      10092  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    105      10072  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    106      10015  Streptomyces bingchenggensis (strain BCW-1)
                    107       9953  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    108       9755  Chlorella variabilis
                    109       9746  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    110       9676  Cryptococcus neoformans (Filobasidiella neoformans)
                    111       9663  Trypanosoma brucei gambiense DAL972
                    112       9567  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    113       9556  Aspergillus fumigatus (Sartorya fumigata)
                    114       9554  Rabies virus
                    115       9535  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    116       9519  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    117       9484  Streptomyces violaceusniger Tu 4113
                    118       9482  Trypanosoma brucei
                    119       9380  Salmo salar (Atlantic salmon)
                    120       9269  Plasmodium vivax
                    121       9242  Monosiga brevicollis (Choanoflagellate)
                    122       9235  Candida albicans (Yeast)
                    123       9202  Amycolatopsis mediterranei (strain U-32)
                    124       9189  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    125       9187  Emericella nidulans (Aspergillus nidulans)
                    126       9177  Streptomyces hygroscopicus ATCC 53653
                    127       9168  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    128       9114  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    129       9092  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    130       8999  Dictyostelium discoideum (Slime mold)
                    131       8974  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    132       8961  Thalassiosira pseudonana (Marine diatom)
                    133       8945  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    134       8901  Catenulispora acidiphila 
                    135       8865  Aspergillus clavatus
                    136       8764  Rhodococcus sp. (strain RHA1)
                    137       8744  Toxoplasma gondii
                    138       8714  Paracoccidioides brasiliensis (strain Pb18)
                    139       8698  Nannizzia otae (strain CBS 113480) (Microsporum canis) (Arthroderma otae)
                    140       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    141       8602  Entamoeba dispar SAW760
                    142       8523  Stigmatella aurantiaca DW4/3-1
                    143       8437  Plesiocystis pacifica SIR-1
                    144       8394  Streptomyces sp. AA4
                    145       8300  Entamoeba histolytica
                    146       8249  Microscilla marina ATCC 23134
                    147       8230  Leishmania major
                    148       8216  Bradyrhizobium japonicum
                    149       8202  Streptomyces sviceus ATCC 29083
                    150       8201  Microcoleus chthonoplastes PCC 7420
                    151       8164  Frankia sp. EUN1f
                    152       8154  Burkholderia xenovorans (strain LB400)
                    153       8065  Pseudomonas aeruginosa
                    154       8021  Leishmania infantum
                    155       7989  Trichophyton verrucosum (strain HKI 0517)
                    156       7978  Toxoplasma gondii ME49
                    157       7954  Ostreococcus tauri
                    158       7943  Rhodococcus opacus (strain B4)
                    159       7940  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    160       7916  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    161       7885  Leishmania braziliensis
                    162       7867  Streptomyces ghanaensis ATCC 14672
                    163       7856  Acaryochloris marina (strain MBIC 11017)
                    164       7854  Paracoccidioides brasiliensis (strain Pb03)
                    165       7836  Toxoplasma gondii VEG
                    166       7823  Burkholderia sp. Ch1-1
                    167       7811  Plasmodium yoelii yoelii
                    168       7742  Uncinocarpus reesii (strain UAMH 1704)
                    169       7708  Streptomyces viridochromogenes DSM 40736
                    170       7571  Clostridium hathewayi DSM 13479
                    171       7563  Burkholderia pseudomallei MSHR346
                    172       7528  Streptomyces sp. C
                    173       7523  Streptomyces lividans TK24
                    174       7519  Solibacter usitatus (strain Ellin6076)
                    175       7501  Tuber melanosporum (Perigord truffle)
                    176       7481  Streptomyces coelicolor
                    177       7475  Burkholderia pseudomallei 1710a
                    178       7465  Burkholderia pseudomallei Pakistan 9
                    179       7459  Burkholderia sp. H160
                    180       7389  Ostreococcus lucimarinus (strain CCE9901)
                    181       7379  Streptomyces sp. ACT-1
                    182       7367  Burkholderia pseudomallei 576
                    183       7349  Burkholderia pseudomallei 305
                    184       7310  Frankia sp. EuI1c
                    185       7274  Clostridium bolteae ATCC BAA-613
                    186       7243  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    187       7232  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    188       7228  Streptomyces avermitilis
                    189       7204  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    190       7179  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    191       7148  Giardia intestinalis (strain ATCC 50803 / WB clone C6) (Giardia lamblia)
                    192       7140  Burkholderia pseudomallei 1106b
                    193       7131  Burkholderia phymatum (strain DSM 17167 / STM815)
                    194       7124  Burkholderia ambifaria MEX-5
                    195       7090  Medicago truncatula (Barrel medic)
                    196       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    197       7016  Myxococcus xanthus (strain DK 1622)
                    198       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    199       6976  Rhodopirellula baltica
                    200       6959  Frankia sp. (strain EAN1pec)
                    201       6943  Streptomyces sp. Mg1
                    202       6932  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    203       6923  Burkholderia ambifaria IOP40-10
                    204       6904  Saccharopolyspora erythraea (strain NRRL 23338)
                    205       6903  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    206       6895  Hepatitis C virus subtype 1a
                    207       6892  Streptomyces roseosporus NRRL 15998
                    208       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    209       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    210       6866  Burkholderia sp. (strain CCGE1002)
                    211       6866  Streptomyces pristinaespiralis ATCC 25486
                    212       6859  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    213       6818  Rhizobium loti (Mesorhizobium loti)
                    214       6817  Clostridium asparagiforme DSM 15981
                    215       6771  Burkholderia pseudomallei (strain 1106a)
                    216       6769  Sinorhizobium meliloti AK83
                    217       6762  Sus scrofa (Pig)
                    218       6740  Burkholderia pseudomallei (strain 668)
                    219       6736  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    220       6732  uncultured archaeon
                    221       6725  Burkholderia graminis C4D1M
                    222       6714  Rhizobium leguminosarum bv. viciae (strain 3841)
                    223       6712  Rhodococcus erythropolis SK121
                    224       6705  Chthoniobacter flavus Ellin428
                    225       6702  Streptomyces flavogriseus ATCC 33331
                    226       6692  Bacillus thuringiensis IBL 200
                    227       6690  delta proteobacterium NaphS2
                    228       6687  Streptococcus pneumoniae
                    229       6683  Haliangium ochraceum (strain DSM 14365 / JCM 11303 / SMP-2)
                    230       6682  Sinorhizobium meliloti BL225C
                    231       6679  Mesorhizobium opportunistum WSM2075
                    232       6662  Burkholderia pseudomallei S13
                    233       6657  Burkholderia cepacia (strain J2315 / LMG 16656) (Burkholderia cenocepacia 
                    234       6655  Bacillus thuringiensis IBL 4222
                    235       6644  Beggiatoa sp. PS
                    236       6627  Burkholderia cenocepacia (strain MC0-3)
                    237       6614  Burkholderia multivorans CGD2
                    238       6613  Burkholderia pseudomallei Pasteur 52237
                    239       6606  Burkholderia multivorans CGD2M
                    240       6583  Bacillus thuringiensis serovar sotto str. T04001
                    241       6559  Cyanothece sp. (strain PCC 7822)
                    242       6527  Burkholderia multivorans CGD1
                    243       6521  Streptomyces sp. ACTE
                    244       6504  Frankia alni (strain ACN14a)
                    245       6498  bacterium Ellin514
                    246       6496  Burkholderia cenocepacia (strain HI2424)
                    247       6488  Bacillus thuringiensis serovar monterrey BGSC 4AJ1
                    248       6463  Planctomyces maris DSM 8797
                    249       6453  Mycobacterium parascrofulaceum ATCC BAA-614
                    250       6427  Agrobacterium radiobacter (strain K84 / ATCC BAA-868)
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          238039 (  2%)
                    Bacteria        7986846 ( 63%)
                    Eukaryota       3490152 ( 27%)
                    Viruses         1036738 (  8%)
                    Other             17316 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  76802 (  2%)           (  1%)
                    Other Mammalia        222097 (  6%)           (  2%)
                    Other Vertebrata      322391 (  9%)           (  3%)
                    Viridiplantae         844764 ( 24%)           (  7%)
                    Fungi                 680560 ( 19%)           (  5%)
                    Insecta               550557 ( 16%)           (  4%)
                    Nematoda               78569 (  2%)           (  1%)
                    Other                 714412 ( 20%)           (  6%)
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To   Number             From   To   Number
                       1-  50   274450             1001-1100    75038
                      51- 100  1012733             1101-1200    52926
                     101- 150  1164831             1201-1300    36483
                     151- 200  1126457             1301-1400    24015
                     201- 250  1130998             1401-1500    19260
                     251- 300  1095269             1501-1600    13760
                     301- 350   995363             1601-1700    10359
                     351- 400   771849             1701-1800     8084
                     401- 450   650838             1801-1900     6563
                     451- 500   544638             1901-2000     5523
                     501- 550   370579             2001-2100     4465
                     551- 600   285349             2101-2200     4592
                     601- 650   206774             2201-2300     3642
                     651- 700   161258             2301-2400     2874
                     701- 750   139066             2401-2500     2474
                     751- 800   125550                 >2500    21733
                     801- 850    91818
                     851- 900    83703
                     901- 950    57113
                     951-1000    43463
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 321 amino acids.

                    The shortest sequence is Q16047_HUMAN: 4 amino acids.
                    The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
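The average length can be compared with the totals given in the introduction. The report does not state how the average is rounded, or whether it is computed from exactly these two totals, so the sketch below is only a plausibility check under that assumption; the variable names are illustrative.

```python
# Totals quoted in the introduction of this report.
total_amino_acids = 4109015043
total_entries = 12769092

# Mean sequence length as a plain quotient of the two totals.
mean_length = total_amino_acids / total_entries

# Assumption: the reported value drops the fractional part (truncation).
truncated_length = int(mean_length)

print(f"Mean length: {mean_length:.2f} amino acids")
print(f"Truncated:   {truncated_length} amino acids")
```

The quotient lies between 321 and 322, so the reported value of 321 matches the quotient with its fractional part dropped rather than rounded to the nearest integer.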
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
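Judging from the figures in this section, the "Average per entry" column appears to be the total number of lines divided by the total number of entries in the release (12769092), not by the number of entries that carry the line type. This is inferred from the numbers rather than stated in the report. A minimal sketch with a hypothetical helper `average_per_entry`:

```python
TOTAL_ENTRIES = 12769092  # total entries in release 2010_12

# Total number of lines per line type, copied from the tables in this section.
line_totals = {
    "References (RL)": 15850934,
    "Comments (CC)": 11159298,
    "Features (FT)": 4364807,
    "Cross-references (DR)": 148603820,
}


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Average number of lines per entry, taken over all entries."""
    return total_lines / total_entries


for line_type, total_lines in line_totals.items():
    print(f"{line_type:<24}{average_per_entry(total_lines):>8.2f}")
```

The same division reproduces the subtype rows as well, for example 9354554 / 12769092 for references submitted to EMBL/GenBank/DDBJ.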
                    
                                                           Total  Number of    Average
                    Line type / subtype                   number    entries  per entry
                    ---------------------------------  ---------  ---------  ---------
                    References (RL)                     15850934                  1.24
                      Submitted to EMBL/GenBank/DDBJ     9354554    8185821       0.73
                      Journal                            6325146    5619724       0.50
                      Submitted to other databases         59101      59078      <0.01
                      Thesis                                7573       7515      <0.01
                      Book citation                         5234       5183      <0.01
                      Other                                99326      96277       0.01
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 303580
                    
                    
                                                           Total  Number of    Average
                    Line type / subtype                   number    entries  per entry  Rank
                    ---------------------------------  ---------  ---------  ---------  ----
                    Comments (CC)                       11159298                  0.87
                      CATALYTIC ACTIVITY                 1166887    1080309       0.09     4
                      CAUTION                            2586538    2586538       0.20     2
                      COFACTOR                            382240     367621       0.03     8
                      DOMAIN                               25206      23497      <0.01    10
                      FUNCTION                           1367677    1265499       0.11     3
                      INTERACTION                           3967       3967      <0.01    11
                      MISCELLANEOUS                        27199      27195      <0.01     9
                      PATHWAY                             559899     515164       0.04     6
                      SIMILARITY                         3579682    3105619       0.28     1
                      SUBCELLULAR LOCATION                946586     942423       0.07     5
                      SUBUNIT                             513417     511696       0.04     7
                    
                    Total number of comment topics: 11
                    
                    
                                                           Total  Number of    Average
                    Line type / subtype                   number    entries  per entry  Rank
                    ---------------------------------  ---------  ---------  ---------  ----
                    Features (FT)                        4364807                  0.34
                      CHAIN                               452885     354562       0.04     2
                      NON_TER                            3609836    2143715       0.28     1
                      SIGNAL                              301496     300691       0.02     3
                      TRANSIT                                590        590      <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                    Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             148603820               11.64                                                    
                    AGD                                 2572      2572     <0.01    76   Organism-specific databases                
                    ANU-2DPAGE                            56        56     <0.01    93   2D gel databases                           
                    ArachnoServer                         66        66     <0.01    91   Organism-specific databases                
                    ArrayExpress                       93466     93453      0.01    48   Gene expression databases                  
                    BRENDA                              2853      2790     <0.01    74   Enzyme and pathway databases               
                    Bgee                              128899    128799      0.01    44   Gene expression databases                  
                    BioCyc                           1624262   1589668      0.13    21   Enzyme and pathway databases               
                    CAZy                               74749     70238      0.01    49   Protein family/group databases             
                    CGD                                 6775      6775     <0.01    71   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     5         5     <0.01    96   2D gel databases                           
                    CTD                               169133    168214      0.01    42   Organism-specific databases                
                    CYGD                                   5         5     <0.01    97   Organism-specific databases                
                    DIP                                 2767      2761     <0.01    75   Protein-protein interaction databases      
                    EMBL                            14263150  12740241      1.12     3   Sequence databases                         
                    Ensembl                           347911    212748      0.03    31   Genome annotation databases                
                    EnsemblBacteria                   501737    471848      0.04    27   Genome annotation databases                
                    EnsemblFungi                       98090     97984      0.01    47   Genome annotation databases                
                    EnsemblMetazoa                    295291    251380      0.02    33   Genome annotation databases                
                    EnsemblPlants                     207682    192701      0.02    37   Genome annotation databases                
                    EnsemblProtists                    24288     24109     <0.01    59   Genome annotation databases                
                    EuPathDB                          151330    151330      0.01    43   Organism-specific databases                
                    FlyBase                           194733    193206      0.02    40   Organism-specific databases                
                    GO                              25325668   7868366      1.98     2   Ontologies                                 
                    Gene3D                           5011053   4066577      0.39     9   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01   100   Organism-specific databases                
                    GeneID                           5297308   5183968      0.41     7   Genome annotation databases                
                    Genevestigator                    101017    101007      0.01    46   Gene expression databases                  
                    GenoList                           14756     14483     <0.01    64   Organism-specific databases                
                    GenomeReviews                    3581858   3496096      0.28    12   Genome annotation databases                
                    Gramene                            68860     68860      0.01    51   Organism-specific databases                
                    H-InvDB                              601       490     <0.01    83   Organism-specific databases                
                    HAMAP                             814307    805077      0.06    25   Family and domain databases                
                    HGNC                               61490     59803     <0.01    53   Organism-specific databases                
                    HOGENOM                          2202363   2202235      0.17    17   Phylogenomic databases                     
                    HOVERGEN                          320195    318478      0.03    32   Phylogenomic databases                     
                    HSSP                              254075    253801      0.02    34   3D structure databases                     
                    IPI                               228169    228168      0.02    36   Sequence databases                         
                    InParanoid                        195676    195585      0.02    38   Phylogenomic databases                     
                    IntAct                             15641     15640     <0.01    63   Protein-protein interaction databases      
                    InterPro                        26535780   9715222      2.08     1   Family and domain databases                
                    KEGG                             4477015   4387040      0.35    10   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    72   Organism-specific databases                
                    Leproma                              937       936     <0.01    82   Organism-specific databases                
                    MEROPS                             66629     65185      0.01    52   Protein family/group databases             
                    MGI                                42990     42945     <0.01    57   Organism-specific databases                
                    MINT                                9033      9033     <0.01    69   Protein-protein interaction databases      
                    NMPDR                             921313    921303      0.07    24   Genome annotation databases                
                    NextBio                            47220     47217     <0.01    56   Other                                      
                    OMA                              2430728   2430726      0.19    16   Phylogenomic databases                     
                    OrthoDB                           429520    429519      0.03    28   Phylogenomic databases                     
                    PANTHER                          2036646   1919992      0.16    20   Family and domain databases                
                    PDB                                13078      7838     <0.01    67   3D structure databases                     
                    PDBsum                             12814      7660     <0.01    68   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    88   2D gel databases                           
                    PIR                               175580    142733      0.01    41   Sequence databases                         
                    PIRSF                             703433    703433      0.06    26   Family and domain databases                
                    PMAP-CutDB                           254       254     <0.01    85   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    98   2D gel databases                           
                    PRIDE                             103672    103671      0.01    45   Proteomic databases                        
                    PRINTS                           2053265   1814835      0.16    19   Family and domain databases                
                    PROSITE                          6168304   4120604      0.48     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    95   Enzyme and pathway databases               
                    PeptideAtlas                         147       147     <0.01    87   Proteomic databases                        
                    PeroxiBase                          2466      2458     <0.01    77   Protein family/group databases             
                    Pfam                            12358866   9196857      0.97     4   Family and domain databases                
                    PharmGKB                              85        85     <0.01    90   Organism-specific databases                
                    PhosphoSite                         1797      1797     <0.01    80   PTM databases                              
                    PhylomeDB                         372947    372915      0.03    30   Phylogenomic databases                     
                    ProDom                            242969    227623      0.02    35   Family and domain databases                
                    ProMEX                               428       428     <0.01    84   Proteomic databases                        
                    ProtClustDB                      2623831   2623815      0.21    13   Phylogenomic databases                     
                    ProteinModelPortal               4153522   4153169      0.33    11   3D structure databases                     
                    PseudoCAP                           4344      4341     <0.01    73   Organism-specific databases                
                    REBASE                             14105     13593     <0.01    65   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   95        94     <0.01    89   2D gel databases                           
                    RGD                                17486     17388     <0.01    62   Organism-specific databases                
                    Reactome                              58        55     <0.01    92   Enzyme and pathway databases               
                    RefSeq                           5314155   5186897      0.42     6   Sequence databases                         
                    SGD                                  246       246     <0.01    86   Organism-specific databases                
                    SMART                            2611086   2020730      0.20    14   Family and domain databases                
                    SMR                              2104631   2104568      0.16    18   3D structure databases                     
                    STRING                           1204994   1204852      0.09    22   Protein-protein interaction databases      
                    SUPFAM                           5138928   4252031      0.40     8   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    94   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01    99   2D gel databases                           
                    TAIR                               18526     18446     <0.01    61   Organism-specific databases                
                    TCDB                                2343      2334     <0.01    78   Protein family/group databases             
                    TIGR                              195120    188073      0.02    39   Genome annotation databases                
                    TIGRFAMs                         2470355   2253220      0.19    15   Family and domain databases                
                    TubercuList                         2250      2245     <0.01    79   Organism-specific databases                
                    UCSC                               50056     50056     <0.01    54   Genome annotation databases                
                    UniGene                           426772    394957      0.03    29   Sequence databases                         
                    VectorBase                         47559     47091     <0.01    55   Genome annotation databases                
                    World-2DPAGE                         947       942     <0.01    81   2D gel databases                           
                    WormBase                           41327     41199     <0.01    58   Organism-specific databases                
                    Xenbase                            13172     13148     <0.01    66   Organism-specific databases                
                    ZFIN                               21619     21614     <0.01    60   Organism-specific databases                
                    dictyBase                           8471      8471     <0.01    70   Organism-specific databases                
                    eggNOG                           1147027   1147027      0.09    23   Phylogenomic databases                     
                    euHCVdb                            74732     74729      0.01    50   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 126
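
The fourth numeric column of the table above appears to be the first column
divided by the total number of entries in this release (12769092, as stated in
the introduction). The sketch below parses a few rows copied from the table and
recomputes that value. The field names (links, entries, ratio, rank, category)
are labels chosen for this example, not the table's own headers, and the
interpretation of the ratio is an inference from the arithmetic.

```python
import re

# Total entry count for this release, as stated in the introduction.
TOTAL_ENTRIES = 12769092

# Field names are illustrative labels, not the table's own column headers.
# The category field is optional so that rows without one still parse.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<links>\d+)\s+(?P<entries>\d+)\s+"
    r"(?P<ratio><?\d+\.\d+)\s+(?P<rank>\d+)\s*(?P<category>.*?)\s*$"
)


def parse_row(line):
    """Split one fixed-width table row into typed fields, or return None."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "name": m["name"],
        "links": int(m["links"]),
        "entries": int(m["entries"]),
        "ratio": m["ratio"],  # kept as text because small values print as "<0.01"
        "rank": int(m["rank"]),
        "category": m["category"],
    }


# Rows copied verbatim from the table above.
sample = [
    "EMBL                            14263150  12740241      1.12     3   Sequence databases",
    "GO                              25325668   7868366      1.98     2   Ontologies",
    "InterPro                        26535780   9715222      2.08     1   Family and domain databases",
]

rows = [parse_row(line) for line in sample]
for row in rows:
    # Assumed definition: cross-references per entry in the whole release.
    per_entry = row["links"] / TOTAL_ENTRIES
    print(f'{row["name"]:<10}{per_entry:.2f}  (table: {row["ratio"]})')
```

For these three rows the recomputed values agree with the printed column after
rounding to two decimals.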
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.60   Gln (Q) 3.85   Leu (L) 9.83   Ser (S) 6.69
                    Arg (R) 5.46   Glu (E) 6.14   Lys (K) 5.28   Thr (T) 5.61
                    Asn (N) 4.16   Gly (G) 7.12   Met (M) 2.47   Trp (W) 1.31
                    Asp (D) 5.29   His (H) 2.19   Phe (F) 4.04   Tyr (Y) 3.06
                    Cys (C) 1.27   Ile (I) 6.03   Pro (P) 4.72   Val (V) 6.73
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
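
The ordering in 5.2 follows directly from the percentages in 5.1. A minimal
sketch that reproduces it by sorting the twenty standard residues in descending
order of frequency (the dictionary is transcribed from 5.1; the variable names
are illustrative):

```python
# Percent composition of the twenty standard amino acids, copied from 5.1.
composition = {
    "Ala": 8.60, "Gln": 3.85, "Leu": 9.83, "Ser": 6.69,
    "Arg": 5.46, "Glu": 6.14, "Lys": 5.28, "Thr": 5.61,
    "Asn": 4.16, "Gly": 7.12, "Met": 2.47, "Trp": 1.31,
    "Asp": 5.29, "His": 2.19, "Phe": 4.04, "Tyr": 3.06,
    "Cys": 1.27, "Ile": 6.03, "Pro": 4.72, "Val": 6.73,
}

# Sort residue names from most to least frequent; there are no ties here.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```

The twenty values sum to 99.85; the remainder is the Xaa (X) share of 0.05
plus rounding.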
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 418191
                    Total number of entries encoded on a Plasmid: 180871
                    Total number of entries encoded on a Plastid: 10658
                    Total number of entries encoded on a Plastid; Apicoplast: 365
                    Total number of entries encoded on a Plastid; Chloroplast: 126709
                    Total number of entries encoded on a Plastid; Cyanelle: 8
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 444