                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2011_02 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2011_02 of 08-Feb-2011 of UniProtKB/TrEMBL contains 13499622 sequence entries,
                    comprising 4340005282 amino acids.
                    
                    443495 sequences have been added since release 2011_01, the sequence data of
                    675 existing entries has been updated and the annotations of
                    5357924 entries have been revised. This represents an increase of 3% in the number of entries.
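
The 3% figure can be reproduced from the counts above. The following Python sketch assumes that the size of release 2011_01 equals the current entry count minus the added sequences, which ignores any entries deleted between the two releases. The variable names are illustrative only.

```python
# Figures quoted in the introduction of this release.
total_entries = 13_499_622   # entries in release 2011_02
added_entries = 443_495      # sequences added since release 2011_01

# Assumption: previous release size = current size - additions
# (entries deleted between releases are not accounted for).
previous_entries = total_entries - added_entries

growth_percent = 100 * added_entries / previous_entries
print(f"previous release (estimated): {previous_entries} entries")
print(f"growth: {growth_percent:.1f}%")
```

Under this assumption the growth comes out slightly above 3%, consistent with the rounded figure quoted in the text.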
                    
                    Number of fragments: 2240487
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           17463     0.13%
                    2: Evidence at transcript level       485355     3.60%
                    3: Inferred from homology            1284259     9.51%
                    4: Predicted                        11712545    86.76%
                    5: Uncertain                               0     0.00%
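
The percentages in the protein existence table follow directly from the entry counts. A minimal Python sketch, with the counts copied from the table above and all names chosen for illustration:

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 17_463,
    "2: Evidence at transcript level": 485_355,
    "3: Inferred from homology": 1_284_259,
    "4: Predicted": 11_712_545,
    "5: Uncertain": 0,
}

# The five levels together should account for every entry in the release.
total_entries = sum(pe_counts.values())

# Share of each level, formatted to two decimals as in the table.
pe_percent = {
    level: f"{100 * count / total_entries:.2f}%"
    for level, count in pe_counts.items()
}

for level, share in pe_percent.items():
    print(f"{level:<34}{pe_counts[level]:>10}{share:>10}")
```

The sum of the five counts equals the 13499622 entries stated in the introduction, so every entry carries exactly one PE level.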
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 329829
                    
                    The first twenty species represent 1240905 sequences, or 9.2% of the
                    total number of entries.
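
As a cross-check, this share can be recomputed from the twenty largest frequencies listed in section 2.2 and the total entry count given in the introduction. A minimal Python sketch (names are illustrative):

```python
# Frequencies of the twenty most represented species, from section 2.2.
top_twenty = [
    361465, 95333, 76784, 58476, 51145,
    50935, 50470, 49317, 44037, 43056,
    41945, 40768, 39837, 39332, 34791,
    33637, 33195, 32625, 31927, 31830,
]

total_entries = 13_499_622  # from the introduction

top_twenty_sum = sum(top_twenty)
share = 100 * top_twenty_sum / total_entries

print(f"top twenty species: {top_twenty_sum} sequences ({share:.1f}% of entries)")
```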
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 155222
                                             2x:  57759
                                             3x:  29828
                                             4x:  17910
                                             5x:  11198
                                             6x:   7798
                                             7x:   5462
                                             8x:   4345
                                             9x:   3549
                                            10x:   6668
                                        11- 20x:  17200
                                        21- 50x:   6041
                                        51-100x:   2200
                                          >100x:   4649
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     361465  Human immunodeficiency virus 1
                    2      95333  Oryza sativa subsp. japonica (Rice)
                    3      76784  Homo sapiens (Human)
                    4      58476  Hepatitis C virus
                    5      51145  uncultured bacterium
                    6      50935  Vitis vinifera (Grape)
                    7      50470  Trichomonas vaginalis
                    8      49317  Mus musculus (Mouse)
                    9      44037  Populus trichocarpa (Western balsam poplar) 
                    10      43056  Hepatitis B virus (HBV)
                    11      41945  Zea mays (Maize)
                    12      40768  Arabidopsis thaliana (Mouse-ear cress)
                    13      39837  Paramecium tetraurelia
                    14      39332  Oryza sativa subsp. indica (Rice)
                    15      34791  Physcomitrella patens subsp. patens (Moss)
                    16      33637  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    17      33195  Selaginella moellendorffii (Spikemoss)
                    18      32625  Arabidopsis lyrata subsp. lyrata
                    19      31927  Drosophila melanogaster (Fruit fly)
                    20      31830  Caenorhabditis remanei (Caenorhabditis vulgaris)
                    21      31263  Ricinus communis (Castor bean)
                    22      29115  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    23      29022  Oikopleura dioica (Tunicate)
                    24      28088  Tetraodon nigroviridis (Green puffer)
                    25      26793  Danio rerio (Zebrafish) (Brachydanio rerio)
                    26      25260  Ralstonia solanacearum (Pseudomonas solanacearum)
                    27      24812  Nematostella vectensis (Starlet sea anemone)
                    28      23465  Rattus norvegicus (Rat)
                    29      23115  Perkinsus marinus ATCC 50983
                    30      22021  Escherichia coli
                    31      21398  Caenorhabditis elegans
                    32      21382  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    33      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
                    34      20734  Trypanosoma cruzi
                    35      20437  Puccinia graminis f. sp. tritici CRL 75-36-700-3
                    36      18883  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    37      18043  Drosophila simulans (Fruit fly)
                    38      17935  Caenorhabditis briggsae
                    39      17849  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    40      17796  Ailuropoda melanoleuca (Giant panda)
                    41      17604  Phytophthora infestans T30-4
                    42      16972  Tribolium castaneum (Red flour beetle)
                    43      16929  Drosophila yakuba (Fruit fly)
                    44      16736  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    45      16707  Drosophila persimilis (Fruit fly)
                    46      16386  Ectocarpus siliculosus (Brown alga)
                    47      16277  Loa loa (Eye worm)
                    48      16254  Bos taurus (Bovine)
                    49      16249  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    50      16180  Drosophila sechellia (Fruit fly)
                    51      15983  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    52      15866  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    53      15715  Naegleria gruberi (Amoeba)
                    54      15659  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    55      15418  Drosophila willistoni (Fruit fly)
                    56      15248  Tetrahymena thermophila SB210
                    57      15171  Canis familiaris (Dog) (Canis lupus familiaris)
                    58      15137  Drosophila ananassae (Fruit fly)
                    59      15029  Harpegnathos saltator
                    60      14922  Drosophila erecta (Fruit fly)
                    61      14818  Chlamydomonas reinhardtii (Chlamydomonas smithii)
                    62      14791  Camponotus floridanus
                    63      14775  Drosophila mojavensis (Fruit fly)
                    64      14760  Anopheles gambiae (African malaria mosquito)
                    65      14696  Drosophila virilis (Fruit fly)
                    66      14671  Plasmodium chabaudi
                    67      14652  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    68      14634  Volvox carteri f. nagariensis
                    69      14626  Toxoplasma gondii
                    70      14262  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    71      13782  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    72      13560  Moniliophthora perniciosa FA553
                    73      13504  Schistosoma mansoni (Blood fluke)
                    74      13359  Aspergillus flavus 
                    75      13289  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    76      13192  Magnaporthe oryzae (strain 70-15 / FGSC 8958) (Rice blast fungus) 
                    77      13128  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
                    78      13042  Gallus gallus (Chicken)
                    79      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    80      12950  Stigmatella aurantiaca (strain DW4/3-1)
                    81      12712  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    82      12557  Glycine max (Soybean) (Glycine hispida)
                    83      12532  Leptosphaeria maculans (Blackleg fungus) (Phoma lingam)
                    84      12478  Xenopus laevis (African clawed frog)
                    85      12444  Polysphondylium pallidum (Cellular slime mold)
                    86      12300  Hepatitis C virus subtype 1b
                    87      12068  Plasmodium falciparum
                    88      12021  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    89      12019  Glomerella graminicola M1.001
                    90      11850  Aspergillus oryzae
                    91      11705  Pyrenophora teres f. teres 0-1
                    92      11646  Plasmodium berghei (strain Anka)
                    93      11645  Anopheles darlingi (Mosquito)
                    94      11564  Trichoplax adhaerens (Trichoplax reptans)
                    95      11498  Brugia malayi (Filarial nematode worm)
                    96      11272  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
                    97      11221  Helicobacter pylori (Campylobacter pylori)
                    98      11211  Ktedonobacter racemifer DSM 44963
                    99      10966  Streptomyces clavuligerus ATCC 27064
                    100      10916  Schistosoma japonicum (Blood fluke)
                    101      10859  Chaetomium globosum (Soil fungus)
                    102      10832  Pediculus humanus subsp. corporis (Body louse)
                    103      10803  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
                    104      10674  Podospora anserina
                    105      10406  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    106      10404  Neurospora crassa
                    107      10388  Aspergillus nidulans FGSC A4
                    108      10358  Phaeodactylum tricornutum (strain CCAP 1055/1)
                    109      10277  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
                    110      10223  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    111      10169  Rabies virus
                    112      10114  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
                    113      10112  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    114      10087  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    115      10056  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    116      10015  Streptomyces bingchenggensis (strain BCW-1)
                    117      10007  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    118       9755  Chlorella variabilis
                    119       9741  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    120       9674  Cryptococcus neoformans (Filobasidiella neoformans)
                    121       9662  Trypanosoma brucei gambiense DAL972
                    122       9562  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    123       9551  Aspergillus fumigatus (Sartorya fumigata)
                    124       9530  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    125       9514  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    126       9484  Streptomyces violaceusniger Tu 4113
                    127       9481  Trypanosoma brucei
                    128       9380  Salmo salar (Atlantic salmon)
                    129       9240  Monosiga brevicollis (Choanoflagellate)
                    130       9232  Candida albicans (Yeast)
                    131       9202  Amycolatopsis mediterranei (strain U-32)
                    132       9185  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    133       9182  Emericella nidulans (Aspergillus nidulans)
                    134       9177  Streptomyces hygroscopicus ATCC 53653
                    135       9165  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    136       9114  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    137       9089  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    138       8991  Dictyostelium discoideum (Slime mold)
                    139       8972  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    140       8958  Thalassiosira pseudonana (Marine diatom)
                    141       8945  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    142       8907  Arthroderma gypseum CBS 118893
                    143       8901  Catenulispora acidiphila 
                    144       8860  Aspergillus clavatus
                    145       8758  Rhodococcus sp. (strain RHA1)
                    146       8726  Paracoccidioides brasiliensis (strain Pb18)
                    147       8694  Nannizzia otae (strain CBS 113480) (Microsporum canis) (Arthroderma otae)
                    148       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    149       8601  Entamoeba dispar SAW760
                    150       8437  Plesiocystis pacifica SIR-1
                    151       8394  Streptomyces sp. AA4
                    152       8299  Entamoeba histolytica
                    153       8249  Microscilla marina ATCC 23134
                    154       8228  Leishmania major
                    155       8221  Bradyrhizobium japonicum
                    156       8202  Streptomyces sviceus ATCC 29083
                    157       8201  Microcoleus chthonoplastes PCC 7420
                    158       8164  Frankia sp. EUN1f
                    159       8154  Burkholderia xenovorans (strain LB400)
                    160       8089  Pseudomonas aeruginosa
                    161       8019  Leishmania infantum
                    162       7987  Trichophyton verrucosum (strain HKI 0517)
                    163       7978  Toxoplasma gondii ME49
                    164       7955  Ostreococcus tauri
                    165       7943  Rhodococcus opacus (strain B4)
                    166       7938  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    167       7917  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    168       7883  Leishmania braziliensis
                    169       7867  Streptomyces ghanaensis ATCC 14672
                    170       7856  Acaryochloris marina (strain MBIC 11017)
                    171       7850  Paracoccidioides brasiliensis (strain Pb03)
                    172       7823  Burkholderia sp. Ch1-1
                    173       7809  Plasmodium yoelii yoelii
                    174       7736  Uncinocarpus reesii (strain UAMH 1704)
                    175       7708  Streptomyces viridochromogenes DSM 40736
                    176       7571  Clostridium hathewayi DSM 13479
                    177       7563  Burkholderia pseudomallei MSHR346
                    178       7528  Streptomyces sp. C
                    179       7523  Streptomyces lividans TK24
                    180       7519  Solibacter usitatus (strain Ellin6076)
                    181       7514  uncultured archaeon
                    182       7500  Tuber melanosporum (Perigord truffle)
                    183       7477  Streptomyces coelicolor
                    184       7475  Burkholderia pseudomallei 1710a
                    185       7465  Burkholderia pseudomallei Pakistan 9
                    186       7459  Burkholderia sp. H160
                    187       7443  Kitasatospora setae KM-6054
                    188       7386  Ostreococcus lucimarinus (strain CCE9901)
                    189       7379  Streptomyces sp. ACT-1
                    190       7367  Burkholderia pseudomallei 576
                    191       7349  Burkholderia pseudomallei 305
                    192       7274  Clostridium bolteae ATCC BAA-613
                    193       7243  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    194       7231  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    195       7228  Streptomyces avermitilis
                    196       7200  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    197       7179  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    198       7146  Giardia intestinalis (strain ATCC 50803 / WB clone C6) (Giardia lamblia)
                    199       7140  Burkholderia pseudomallei 1106b
                    200       7131  Burkholderia phymatum (strain DSM 17167 / STM815)
                    201       7124  Burkholderia ambifaria MEX-5
                    202       7094  Medicago truncatula (Barrel medic) (Medicago tribuloides)
                    203       7079  Frankia sp. (strain EuI1c)
                    204       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    205       7016  Myxococcus xanthus (strain DK 1622)
                    206       7005  Mucilaginibacter paludis DSM 18603
                    207       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    208       6976  Rhodopirellula baltica
                    209       6959  Frankia sp. (strain EAN1pec)
                    210       6943  Streptomyces sp. Mg1
                    211       6932  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    212       6923  Burkholderia ambifaria IOP40-10
                    213       6903  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    214       6902  Saccharopolyspora erythraea (strain NRRL 23338)
                    215       6897  Hepatitis C virus subtype 1a
                    216       6892  Streptomyces roseosporus NRRL 15998
                    217       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    218       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    219       6866  Burkholderia sp. (strain CCGE1002)
                    220       6866  Streptomyces pristinaespiralis ATCC 25486
                    221       6859  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    222       6817  Clostridium asparagiforme DSM 15981
                    223       6816  Rhizobium loti (Mesorhizobium loti)
                    224       6807  Sus scrofa (Pig)
                    225       6798  Achromobacter xylosoxidans (strain A8)
                    226       6771  Burkholderia pseudomallei (strain 1106a)
                    227       6769  Sinorhizobium meliloti AK83
                    228       6740  Burkholderia pseudomallei (strain 668)
                    229       6736  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    230       6725  Burkholderia graminis C4D1M
                    231       6713  Rhizobium leguminosarum bv. viciae (strain 3841)
                    232       6712  Rhodococcus erythropolis SK121
                    233       6705  Chthoniobacter flavus Ellin428
                    234       6702  Streptomyces flavogriseus ATCC 33331
                    235       6692  Bacillus thuringiensis IBL 200
                    236       6690  delta proteobacterium NaphS2
                    237       6682  Sinorhizobium meliloti BL225C
                    238       6680  Haliangium ochraceum (strain DSM 14365 / JCM 11303 / SMP-2)
                    239       6679  Mesorhizobium opportunistum WSM2075
                    240       6674  Streptococcus pneumoniae
                    241       6662  Burkholderia pseudomallei S13
                    242       6657  Burkholderia cepacia (strain J2315 / LMG 16656) (Burkholderia cenocepacia 
                    243       6655  Bacillus thuringiensis IBL 4222
                    244       6644  Beggiatoa sp. PS
                    245       6627  Burkholderia cenocepacia (strain MC0-3)
                    246       6614  Burkholderia multivorans CGD2
                    247       6613  Burkholderia pseudomallei Pasteur 52237
                    248       6606  Burkholderia multivorans CGD2M
                    249       6583  Bacillus thuringiensis serovar sotto str. T04001
                    250       6559  Cyanothece sp. (strain PCC 7822)
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          240560 (  2%)
                    Bacteria        8487034 ( 63%)
                    Eukaryota       3688204 ( 27%)
                    Viruses         1065952 (  8%)
                    Other             17871 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  76819 (  2%)           (  1%)
                    Other Mammalia        224462 (  6%)           (  2%)
                    Other Vertebrata      330698 (  9%)           (  2%)
                    Viridiplantae         854021 ( 23%)           (  6%)
                    Fungi                 747122 ( 20%)           (  6%)
                    Insecta               594896 ( 16%)           (  4%)
                    Nematoda              110164 (  3%)           (  1%)
                    Other                 750022 ( 20%)           (  6%)
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To   Number             From   To   Number
                       1-  50   292375             1001-1100    79243
                      51- 100  1075468             1101-1200    55943
                     101- 150  1235137             1201-1300    38514
                     151- 200  1193939             1301-1400    25345
                     201- 250  1198848             1401-1500    20338
                     251- 300  1162072             1501-1600    14570
                     301- 350  1055383             1601-1700    10900
                     351- 400   817315             1701-1800     8490
                     401- 450   690025             1801-1900     6865
                     451- 500   576932             1901-2000     5772
                     501- 550   392196             2001-2100     4656
                     551- 600   302216             2101-2200     4824
                     601- 650   218858             2201-2300     3807
                     651- 700   170294             2301-2400     3018
                     701- 750   146940             2401-2500     2566
                     751- 800   132374             >2500        22530
                     801- 850    97168
                     851- 900    88286
                     901- 950    60246
                     951-1000    45682
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 321 amino acids.
                    
                    The shortest sequence is Q16047_HUMAN:     4 amino acids.
                    The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
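
The average length quoted above is consistent with the totals given in the introduction. The Python sketch below assumes the average is taken over all entries, fragments included; this is an assumption, not something the release notes state explicitly.

```python
# Totals quoted in the introduction of this release.
total_amino_acids = 4_340_005_282
total_entries = 13_499_622

# Assumption: the average covers every entry, including fragments.
average_length = total_amino_acids / total_entries

print(f"average sequence length: {average_length:.2f} amino acids")
print(f"rounded: {round(average_length)}")
```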
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
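
The "Average per entry" column in the tables of this section appears to be the total number of lines divided by the total number of entries in the release (13499622, from the introduction). The Python sketch below checks that assumption against the four top-level totals reported in this section; the dictionary and variable names are illustrative only.

```python
total_entries = 13_499_622  # from the introduction

# Top-level line totals as reported in the tables of this section.
line_totals = {
    "References (RL)": 16_412_412,
    "Comments (CC)": 7_722_366,
    "Features (FT)": 4_556_438,
    "Cross-references (DR)": 155_141_089,
}

# Assumed definition: average per entry = total lines / total entries.
average_per_entry = {
    line_type: f"{total / total_entries:.2f}"
    for line_type, total in line_totals.items()
}

for line_type, average in average_per_entry.items():
    print(f"{line_type:<24}{line_totals[line_type]:>12}{average:>8}")
```

The computed values match the published averages for all four line types, which supports the assumed definition.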
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------

                    References (RL)                    16412412                1.22
                       Submitted to EMBL/GenBank/DDBJ   9690518   8494508      0.72
                       Journal                          6549736   5888054      0.49
                       Submitted to other databases       64721     64129     <0.01
                       Thesis                              7609      7551     <0.01
                       Book citation                       5239      5188     <0.01
                       Other                              94589     92902      0.01
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 312437
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Comments (CC)                       7722366                0.57
                       CATALYTIC ACTIVITY                804284    733539      0.06     4
                       CAUTION                          2889702   2889702      0.21     1
                       COFACTOR                          227090    216774      0.02     8
                       DOMAIN                             23309     21521     <0.01     9
                       FUNCTION                          898865    807018      0.07     3
                       INTERACTION                         2466      2466     <0.01    11
                       MISCELLANEOUS                      22085     22081     <0.01    10
                       PATHWAY                           314937    287618      0.02     7
                       SIMILARITY                       1631699   1376918      0.12     2
                       SUBCELLULAR LOCATION              449584    449583      0.03     6
                       SUBUNIT                           458345    458343      0.03     5
                    
                    Total number of comment topics: 11
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Features (FT)                       4556438                0.34
                       CHAIN                             469506    370416      0.03     2
                       NON_TER                          3774902   2238994      0.28     1
                       SIGNAL                            311437    311376      0.02     3
                       TRANSIT                              593       593     <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             155141089               11.49                                                    
                    AGD                                 2561      2561     <0.01    77   Organism-specific databases                
                    ANU-2DPAGE                            56        56     <0.01    95   2D gel databases                           
                    Allergome                           1907      1369     <0.01    81   Protein family/group databases             
                    ArachnoServer                         66        66     <0.01    93   Organism-specific databases                
                    ArrayExpress                       92772     92761      0.01    49   Gene expression databases                  
                    BRENDA                              2833      2770     <0.01    75   Enzyme and pathway databases               
                    Bgee                              128415    128314      0.01    45   Gene expression databases                  
                    BioCyc                           1623943   1589359      0.12    21   Enzyme and pathway databases               
                    CAZy                               74686     70177      0.01    52   Protein family/group databases             
                    CGD                                 6771      6771     <0.01    72   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     5         5     <0.01    98   2D gel databases                           
                    CTD                               168892    168165      0.01    43   Organism-specific databases                
                    CYGD                                   5         5     <0.01    99   Organism-specific databases                
                    DIP                                 2757      2752     <0.01    76   Protein-protein interaction databases      
                    EMBL                            15106672  13470794      1.12     3   Sequence databases                         
                    Ensembl                           386266    233775      0.03    31   Genome annotation databases                
                    EnsemblBacteria                   566979    536744      0.04    28   Genome annotation databases                
                    EnsemblFungi                      106184    106107      0.01    46   Genome annotation databases                
                    EnsemblMetazoa                    292212    272911      0.02    34   Genome annotation databases                
                    EnsemblPlants                     262161    234685      0.02    35   Genome annotation databases                
                    EnsemblProtists                    33713     33075     <0.01    60   Genome annotation databases                
                    EuPathDB                          151357    151357      0.01    44   Organism-specific databases                
                    FlyBase                           195069    193534      0.01    41   Organism-specific databases                
                    GO                              26502863   8300766      1.96     2   Ontologies                                 
                    Gene3D                           5231610   4243747      0.39     9   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01   102   Organism-specific databases                
                    GeneID                           5487569   5373398      0.41     7   Genome annotation databases                
                    GeneTree                         1161123   1160782      0.09    23   Phylogenomic databases                     
                    Genevestigator                    100458    100448      0.01    48   Gene expression databases                  
                    GenoList                           14753     14480     <0.01    66   Organism-specific databases                
                    GenomeReviews                    3653607   3567575      0.27    12   Genome annotation databases                
                    Gramene                            68814     68814      0.01    53   Organism-specific databases                
                    H-InvDB                              601       490     <0.01    85   Organism-specific databases                
                    HAMAP                             851570    841935      0.06    26   Family and domain databases                
                    HGNC                               65322     63555     <0.01    55   Organism-specific databases                
                    HOGENOM                          2201247   2201203      0.16    17   Phylogenomic databases                     
                    HOVERGEN                          317735    317735      0.02    33   Phylogenomic databases                     
                    HSSP                              253727    253454      0.02    36   3D structure databases                     
                    IPI                               240238    240231      0.02    38   Sequence databases                         
                    InParanoid                        195230    195140      0.01    39   Phylogenomic databases                     
                    IntAct                             15580     15580     <0.01    64   Protein-protein interaction databases      
                    InterPro                        27394048  10025779      2.03     1   Family and domain databases                
                    KEGG                             4749170   4653210      0.35    10   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    73   Organism-specific databases                
                    Leproma                              936       935     <0.01    84   Organism-specific databases                
                    MEROPS                             66590     65146     <0.01    54   Protein family/group databases             
                    MGI                                42919     42884     <0.01    58   Organism-specific databases                
                    MINT                                9011      9011     <0.01    70   Protein-protein interaction databases      
                    NMPDR                             920995    920985      0.07    25   Genome annotation databases                
                    NextBio                            46975     46972     <0.01    57   Other                                      
                    OMA                              2429359   2429357      0.18    16   Phylogenomic databases                     
                    OrthoDB                           429419    429418      0.03    30   Phylogenomic databases                     
                    PANTHER                          2111653   1990855      0.16    20   Family and domain databases                
                    PDB                                13522      8038     <0.01    67   3D structure databases                     
                    PDBsum                             13192      7845     <0.01    69   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    90   2D gel databases                           
                    PIR                               175272    142437      0.01    42   Sequence databases                         
                    PIRSF                             728525    728525      0.05    27   Family and domain databases                
                    PMAP-CutDB                           253       253     <0.01    87   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01   100   2D gel databases                           
                    PRIDE                             103346    103344      0.01    47   Proteomic databases                        
                    PRINTS                           2115410   1877031      0.16    19   Family and domain databases                
                    PROSITE                          6368150   4257291      0.47     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    97   Enzyme and pathway databases               
                    PeptideAtlas                         147       147     <0.01    89   Proteomic databases                        
                    PeroxiBase                          2497      2488     <0.01    78   Protein family/group databases             
                    Pfam                            12729161   9479748      0.94     4   Family and domain databases                
                    PharmGKB                              85        85     <0.01    92   Organism-specific databases                
                    PhosphoSite                         1758      1758     <0.01    82   PTM databases                              
                    PhylomeDB                         372617    372585      0.03    32   Phylogenomic databases                     
                    ProDom                            245606    229926      0.02    37   Family and domain databases                
                    ProMEX                               421       421     <0.01    86   Proteomic databases                        
                    ProtClustDB                      2735015   2735015      0.20    13   Phylogenomic databases                     
                    ProteinModelPortal               4143594   4142639      0.31    11   3D structure databases                     
                    PseudoCAP                           4344      4341     <0.01    74   Organism-specific databases                
                    REBASE                             14922     14383     <0.01    65   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   95        94     <0.01    91   2D gel databases                           
                    RGD                                17465     17365     <0.01    63   Organism-specific databases                
                    Reactome                              58        55     <0.01    94   Enzyme and pathway databases               
                    RefSeq                           5502746   5375724      0.41     6   Sequence databases                         
                    SGD                                  246       246     <0.01    88   Organism-specific databases                
                    SMART                            2696241   2086303      0.20    14   Family and domain databases                
                    SMR                              2140303   2140303      0.16    18   3D structure databases                     
                    STRING                           1204301   1204163      0.09    22   Protein-protein interaction databases      
                    SUPFAM                           5358712   4436906      0.40     8   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    96   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01   101   2D gel databases                           
                    TAIR                               18594     18510     <0.01    62   Organism-specific databases                
                    TCDB                                2380      2371     <0.01    79   Protein family/group databases             
                    TIGR                              195082    188034      0.01    40   Genome annotation databases                
                    TIGRFAMs                         2562179   2336667      0.19    15   Family and domain databases                
                    TubercuList                         2230      2225     <0.01    80   Organism-specific databases                
                    UCSC                               49802     49802     <0.01    56   Genome annotation databases                
                    UniGene                           462775    432969      0.03    29   Sequence databases                         
                    VectorBase                         78969     78457      0.01    50   Genome annotation databases                
                    World-2DPAGE                         946       941     <0.01    83   2D gel databases                           
                    WormBase                           41311     41182     <0.01    59   Organism-specific databases                
                    Xenbase                            13209     13173     <0.01    68   Organism-specific databases                
                    ZFIN                               21601     21596     <0.01    61   Organism-specific databases                
                    dictyBase                           8184      8184     <0.01    71   Organism-specific databases                
                    eggNOG                           1146406   1146406      0.08    24   Phylogenomic databases                     
                    euHCVdb                            74723     74720      0.01    51   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 129
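
The rows of the table above are whitespace-separated, so they can be read back mechanically. The following is a minimal sketch, not an official parser. The field names (name, total, entries, per_entry, rank, category) are hypothetical labels chosen for illustration. It also assumes that the fourth column is the total number of cross-references divided by the 13499622 entries of this release, which agrees with the printed values but is not stated in this part of the document.

```python
import re

TOTAL_ENTRIES = 13499622  # entries in release 2011_02, from the introduction

# Field names are assumptions made for this sketch; the table header is not repeated here.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<total>\d+)\s+(?P<entries>\d+)\s+"
    r"(?P<per_entry><?\d+\.\d+)\s+(?P<rank>\d+)\s+(?P<category>\S.*?)\s*$"
)


def parse_row(line):
    """Split one table row into typed fields (hypothetical field names)."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"not a table row: {line!r}")
    return {
        "name": m["name"],
        "total": int(m["total"]),
        "entries": int(m["entries"]),
        "per_entry": m["per_entry"],  # kept as text because of "<0.01"
        "rank": int(m["rank"]),
        "category": m["category"],
    }


def per_entry_matches(row):
    """Check the printed ratio against total / TOTAL_ENTRIES (assumed definition)."""
    ratio = row["total"] / TOTAL_ENTRIES
    if row["per_entry"].startswith("<"):
        return ratio < float(row["per_entry"][1:])
    # Printed values have two decimals, so allow half a unit in the last place.
    return abs(ratio - float(row["per_entry"])) <= 0.005


row = parse_row("InterPro   27394048  10025779      2.03     1   Family and domain databases   ")
print(row["name"], row["rank"], per_entry_matches(row))
```

Keeping the ratio column as text avoids guessing a numeric value for rows printed as "<0.01".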
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.61   Gln (Q) 3.84   Leu (L) 9.83   Ser (S) 6.69
                    Arg (R) 5.46   Glu (E) 6.13   Lys (K) 5.27   Thr (T) 5.62
                    Asn (N) 4.15   Gly (G) 7.12   Met (M) 2.48   Trp (W) 1.31
                    Asp (D) 5.30   His (H) 2.19   Phe (F) 4.04   Tyr (Y) 3.05
                    Cys (C) 1.26   Ile (I) 6.03   Pro (P) 4.73   Val (V) 6.74
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    
                    [Figure: amino acid composition chart. Legend: gray = aliphatic, red = acidic,
                    green = small hydroxy, blue = basic, black = aromatic, white = amide, yellow = sulfur]
                    
                    
                    5.2  Classification of the amino acids by their frequency (most to least frequent)
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
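
The ordering in 5.2 follows directly from the percentages in 5.1. As a minimal sketch, the values from 5.1 can be sorted in decreasing order to reproduce the list. The dictionary name COMPOSITION and the variable names are chosen here for illustration only.

```python
# Percentages copied from section 5.1 (the 20 standard amino acids only).
COMPOSITION = {
    "Ala": 8.61, "Gln": 3.84, "Leu": 9.83, "Ser": 6.69,
    "Arg": 5.46, "Glu": 6.13, "Lys": 5.27, "Thr": 5.62,
    "Asn": 4.15, "Gly": 7.12, "Met": 2.48, "Trp": 1.31,
    "Asp": 5.30, "His": 2.19, "Phe": 4.04, "Tyr": 3.05,
    "Cys": 1.26, "Ile": 6.03, "Pro": 4.73, "Val": 6.74,
}

# Sort residue names by percentage, most frequent first.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)

# Sum over the standard residues; the small remainder to 100 % comes from
# Xaa (0.05) and from rounding of the individual values.
standard_total = sum(COMPOSITION.values())

print(", ".join(ranking))
print(f"{standard_total:.2f}")
```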
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 463306
                    Total number of entries encoded on a Plasmid: 188097
                    Total number of entries encoded on a Plastid: 11782
                    Total number of entries encoded on a Plastid; Apicoplast: 365
                    Total number of entries encoded on a Plastid; Chloroplast: 129938
                    Total number of entries encoded on a Plastid; Cyanelle: 8
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 444
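
The counts above can be put in proportion to the 13499622 entries of the release. The sketch below does this arithmetic. It assumes that the five plastid lines are disjoint categories, so that they can be added up, which this document does not state explicitly. The names ORGANELLE_COUNTS and share are illustrative.

```python
TOTAL_ENTRIES = 13499622  # entries in release 2011_02, from the introduction

# Counts copied from the lines above.
ORGANELLE_COUNTS = {
    "Mitochondrion": 463306,
    "Plasmid": 188097,
    "Plastid": 11782,
    "Plastid; Apicoplast": 365,
    "Plastid; Chloroplast": 129938,
    "Plastid; Cyanelle": 8,
    "Plastid; Non-photosynthetic plastid": 444,
}


def share(count):
    """Percentage of all entries in the release."""
    return 100.0 * count / TOTAL_ENTRIES


# Assumption: the plastid lines do not overlap, so their sum is the plastid total.
plastid_total = sum(
    n for name, n in ORGANELLE_COUNTS.items() if name.startswith("Plastid")
)

for name, n in ORGANELLE_COUNTS.items():
    print(f"{name}: {n} ({share(n):.2f} %)")
print(f"All plastid categories: {plastid_total} ({share(plastid_total):.2f} %)")
```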