                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2011_08 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2011_08 of 27-Jul-2011 of UniProtKB/TrEMBL contains 16504022 sequence entries,
                    comprising 5357406695 amino acids.
                    
                    538274 sequences have been added since release 2011_07, the sequence data of
                    2976 existing entries has been updated and the annotations of
                    5723714 entries have been revised. This represents an increase of 4%.
                    
                    Number of fragments: 2655446
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           14122     0.09%
                    2: Evidence at transcript level       530839     3.22%
                    3: Inferred from homology            3729175    22.60%
                    4: Predicted                        12229886    74.10%
                    5: Uncertain                               0     0.00%
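
The percentages in the protein existence table can be recomputed from the entry counts. The following is a minimal sketch, assuming each percentage is the level's entry count divided by the release total of 16504022 entries, rounded to two decimals. The names `PE_COUNTS` and `pe_percentages` are illustrative only and are not part of any UniProt tool.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
PE_COUNTS = {
    1: 14122,      # Evidence at protein level
    2: 530839,     # Evidence at transcript level
    3: 3729175,    # Inferred from homology
    4: 12229886,   # Predicted
    5: 0,          # Uncertain
}
TOTAL_ENTRIES = 16504022  # sequence entries in release 2011_08


def pe_percentages(counts, total):
    """Return each level's share of the total in percent, rounded to 2 decimals."""
    return {level: round(100.0 * n / total, 2) for level, n in counts.items()}


if __name__ == "__main__":
    for level, pct in pe_percentages(PE_COUNTS, TOTAL_ENTRIES).items():
        print(f"PE {level}: {pct:.2f}%")
```

The five counts add up to the release total, so the percentages cover every entry.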
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 376640
                    
                    The first twenty species represent 1348489 sequences:   8.2 % of the
                    total number of entries.
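
The subtotal and share quoted for the first twenty species can be checked against the frequencies listed in section 2.2. This is a small sketch under the assumption that the share is the subtotal divided by the release total, rounded to one decimal; `TOP_20` and `top_share` are hypothetical names used only for this illustration.

```python
# Frequencies of the twenty most represented species (ranks 1-20 in section 2.2).
TOP_20 = [
    393851, 95258, 89812, 61801, 58162, 56851, 53920, 51548, 50471, 45125,
    45004, 44057, 42025, 42012, 39841, 39354, 37694, 34795, 33644, 33264,
]
TOTAL_ENTRIES = 16504022  # sequence entries in release 2011_08


def top_share(freqs, total):
    """Return (subtotal, percentage of total rounded to one decimal)."""
    subtotal = sum(freqs)
    return subtotal, round(100.0 * subtotal / total, 1)


if __name__ == "__main__":
    subtotal, pct = top_share(TOP_20, TOTAL_ENTRIES)
    print(f"{subtotal} sequences: {pct} % of the total number of entries")
```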
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x:18081
                                             2x:65795
                                             3x:33309
                                             4x:19998
                                             5x:12070
                                             6x: 8506
                                             7x: 6194
                                             8x: 4844
                                             9x: 3849
                                            10x: 7560
                                        11- 20x:19037
                                        21- 50x: 6779
                                        51-100x: 2402
                                          >100x: 5483
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     393851  Human immunodeficiency virus 1
                    2      95258  Oryza sativa subsp. japonica (Rice)
                    3      89812  Homo sapiens (Human)
                    4      61801  Hepatitis C virus
                    5      58162  Mus musculus (Mouse)
                    6      56851  uncultured bacterium
                    7      53920  Vitis vinifera (Grape)
                    8      51548  Danio rerio (Zebrafish) (Brachydanio rerio)
                    9      50471  Trichomonas vaginalis
                    10      45125  Arabidopsis thaliana (Mouse-ear cress)
                    11      45004  Hepatitis B virus (HBV)
                    12      44057  Populus trichocarpa (Western balsam poplar) 
                    13      42025  Zea mays (Maize)
                    14      42012  Callithrix jacchus (White-tufted-ear marmoset)
                    15      39841  Paramecium tetraurelia
                    16      39354  Oryza sativa subsp. indica (Rice)
                    17      37694  Macaca mulatta (Rhesus macaque)
                    18      34795  Physcomitrella patens subsp. patens (Moss)
                    19      33644  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    20      33264  Selaginella moellendorffii (Spikemoss)
                    21      33003  Drosophila melanogaster (Fruit fly)
                    22      32604  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
                    23      32317  Rattus norvegicus (Rat)
                    24      31830  Caenorhabditis remanei (Caenorhabditis vulgaris)
                    25      31558  Monodelphis domestica (Gray short-tailed opossum)
                    26      31298  Ricinus communis (Castor bean)
                    27      30824  Trypanosoma cruzi
                    28      30523  Daphnia pulex (Water flea)
                    29      29162  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    30      29024  Oikopleura dioica (Tunicate)
                    31      28793  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    32      28090  Tetraodon nigroviridis (Green puffer)
                    33      27589  Bos taurus (Bovine)
                    34      27032  Canis familiaris (Dog) (Canis lupus familiaris)
                    35      26911  Ornithorhynchus anatinus (Duckbill platypus)
                    36      24811  Nematostella vectensis (Starlet sea anemone)
                    37      24676  Gallus gallus (Chicken)
                    38      24637  Sus scrofa (Pig)
                    39      23631  Equus caballus (Horse)
                    40      23597  Ralstonia solanacearum (Pseudomonas solanacearum)
                    41      23244  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    42      23115  Perkinsus marinus ATCC 50983
                    43      22686  Escherichia coli
                    44      21628  Caenorhabditis elegans
                    45      21523  Hordeum vulgare var. distichum (Two-rowed barley)
                    46      21245  Caenorhabditis briggsae
                    47      21089  Ixodes scapularis (Black-legged tick) (Deer tick)
                    48      20982  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
                    49      20434  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
                    50      19158  Toxoplasma gondii
                    51      18893  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    52      18771  mine drainage metagenome
                    53      18588  Drosophila simulans (Fruit fly)
                    54      17843  Ailuropoda melanoleuca (Giant panda)
                    55      17841  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    56      17603  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
                    57      17031  Drosophila yakuba (Fruit fly)
                    58      16977  Tribolium castaneum (Red flour beetle)
                    59      16736  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    60      16706  Drosophila persimilis (Fruit fly)
                    61      16425  Ectocarpus siliculosus (Brown alga)
                    62      16295  Loa loa (Eye worm)
                    63      16243  Trichinella spiralis (Trichina worm)
                    64      16237  Melampsora larici-populina 98AG31
                    65      16233  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    66      16179  Drosophila sechellia (Fruit fly)
                    67      15979  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    68      15767  Phaeosphaeria nodorum (strain SN15 / FGSC 10173) (Glume blotch fungus) 
                    69      15715  Naegleria gruberi (Amoeba)
                    70      15634  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    71      15418  Drosophila willistoni (Fruit fly)
                    72      15247  Tetrahymena thermophila SB210
                    73      15138  Drosophila ananassae (Fruit fly)
                    74      15029  Harpegnathos saltator
                    75      14921  Drosophila erecta (Fruit fly)
                    76      14828  Hepatitis C virus subtype 1a
                    77      14820  Chlamydomonas reinhardtii (Chlamydomonas smithii)
                    78      14791  Camponotus floridanus
                    79      14782  Drosophila mojavensis (Fruit fly)
                    80      14694  Drosophila virilis (Fruit fly)
                    81      14671  Plasmodium chabaudi
                    82      14651  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    83      14634  Volvox carteri f. nagariensis
                    84      14452  Anopheles gambiae (African malaria mosquito)
                    85      14322  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
                    86      14243  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    87      14063  Hepatitis C virus subtype 1b
                    88      13964  Acromyrmex echinatior (Panamanian leafcutter ant)
                    89      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
                    90      13510  Schistosoma mansoni (Blood fluke)
                    91      13506  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    92      13482  Plasmodium falciparum
                    93      13330  Aspergillus flavus 
                    94      13278  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    95      13170  Magnaporthe oryzae (strain 70-15 / FGSC 8958) (Rice blast fungus) 
                    96      13122  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
                    97      12983  Albugo laibachii Nc14
                    98      12950  Stigmatella aurantiaca (strain DW4/3-1)
                    99      12938  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    100      12685  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    101      12643  Glycine max (Soybean) (Glycine hispida)
                    102      12547  Xenopus laevis (African clawed frog)
                    103      12448  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
                    104      12444  Polysphondylium pallidum (Cellular slime mold)
                    105      12352  Dictyostelium purpureum (Slime mold)
                    106      12206  Dictyostelium fasciculatum
                    107      11996  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    108      11996  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
                    109      11717  Thalassiosira pseudonana (Marine diatom)
                    110      11703  Salpingoeca sp. ATCC 50818
                    111      11687  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
                    112      11647  Anopheles darlingi (Mosquito)
                    113      11645  Plasmodium berghei (strain Anka)
                    114      11591  Aspergillus oryzae (strain ATCC 42149 / RIB 40)
                    115      11563  Trichoplax adhaerens (Trichoplax reptans)
                    116      11510  Aureococcus anophagefferens
                    117      11497  Brugia malayi (Filarial nematode worm)
                    118      11361  Helicobacter pylori (Campylobacter pylori)
                    119      11283  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
                    120      11211  Ktedonobacter racemifer DSM 44963
                    121      10966  Streptomyces clavuligerus ATCC 27064
                    122      10929  Schistosoma japonicum (Blood fluke)
                    123      10842  Pediculus humanus subsp. corporis (Body louse)
                    124      10823  Chaetomium globosum  
                    125      10774  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
                    126      10573  Metarhizium robertsii (strain ARSEF 23) (Metarhizium anisopliae)
                    127      10552  Podospora anserina (strain S / DSM 980 / FGSC 10383) (Pleurage anserina)
                    128      10382  Pseudomonas syringae pv. glycinea str. race 4
                    129      10379  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    130      10358  Aspergillus nidulans FGSC A4
                    131      10357  Phaeodactylum tricornutum (strain CCAP 1055/1)
                    132      10276  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
                    133      10206  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    134      10196  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
                    135      10169  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    136      10141  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
                    137      10113  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
                    138      10089  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    139      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
                    140      10055  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    141      10015  Streptomyces bingchenggensis (strain BCW-1)
                    142       9986  Rabies virus
                    143       9835  Chlorella variabilis
                    144       9824  Metarhizium acridum (strain CQMa 102)
                    145       9709  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    146       9663  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
                    147       9536  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    148       9513  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    149       9488  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    150       9484  Streptomyces violaceusniger Tu 4113
                    151       9445  Ajellomyces capsulata (strain H88) (Darling's disease fungus) 
                    152       9422  Salmo salar (Atlantic salmon)
                    153       9239  Monosiga brevicollis (Choanoflagellate)
                    154       9210  Candida albicans (Yeast)
                    155       9202  Amycolatopsis mediterranei (strain U-32)
                    156       9177  Streptomyces himastatinicus ATCC 53653
                    157       9166  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    158       9154  Emericella nidulans (Aspergillus nidulans)
                    159       9148  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    160       9136  Pseudomonas syringae pv. pisi str. 1704B
                    161       9113  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    162       9067  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    163       9021  Neurospora crassa 
                    164       9014  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
                    165       8994  Dictyostelium discoideum (Slime mold)
                    166       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    167       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    168       8940  Burkholderia sp. TJI49
                    169       8900  Catenulispora acidiphila 
                    170       8862  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
                    171       8812  Trypanosoma brucei
                    172       8799  Aspergillus clavatus 
                    173       8777  Pseudomonas syringae pv. japonica str. M301072PT
                    174       8757  Rhodococcus sp. (strain RHA1)
                    175       8705  Trichophyton rubrum (strain ATCC MYA-4607 / CBS 118892) (Athlete's foot fungus)
                    176       8701  Paracoccidioides brasiliensis (strain Pb18)
                    177       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    178       8676  Trichophyton equinum (strain ATCC MYA-4606 / CBS 127.97) (Horse ringworm fungus)
                    179       8663  Arthroderma otae (strain CBS 113480) (Microsporum canis)
                    180       8610  Batrachochytrium dendrobatidis JAM81
                    181       8599  Entamoeba dispar SAW760
                    182       8520  Trichophyton tonsurans (strain CBS 112818) (Scalp ringworm fungus)
                    183       8437  Plesiocystis pacifica SIR-1
                    184       8394  Streptomyces sp. AA4
                    185       8374  Capsaspora owczarzaki ATCC 30864
                    186       8310  Grosmannia clavigera (strain kw1407 / UAMH 11150) (Blue stain fungus) 
                    187       8302  Entamoeba histolytica
                    188       8296  Bradyrhizobium japonicum
                    189       8274  Leishmania major
                    190       8249  Microscilla marina ATCC 23134
                    191       8207  uncultured archaeon
                    192       8202  Leishmania infantum
                    193       8202  Streptomyces sviceus ATCC 29083
                    194       8201  Microcoleus chthonoplastes PCC 7420
                    195       8185  Leishmania braziliensis
                    196       8164  Pseudomonas aeruginosa
                    197       8163  Frankia sp. EUN1f
                    198       8154  Burkholderia xenovorans (strain LB400)
                    199       8044  Leishmania mexicana MHOM/GT/2001/U1103
                    200       7961  Leishmania donovani BPK282A1
                    201       7958  Trichophyton verrucosum (strain HKI 0517)
                    202       7955  Ostreococcus tauri
                    203       7943  Rhodococcus opacus (strain B4)
                    204       7917  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    205       7907  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    206       7866  Streptomyces ghanaensis ATCC 14672
                    207       7856  Acaryochloris marina (strain MBIC 11017)
                    208       7826  Paracoccidioides brasiliensis (strain Pb03)
                    209       7823  Burkholderia sp. Ch1-1
                    210       7808  Plasmodium yoelii yoelii
                    211       7710  Uncinocarpus reesii (strain UAMH 1704)
                    212       7706  Streptomyces viridochromogenes DSM 40736
                    213       7571  Clostridium hathewayi DSM 13479
                    214       7563  Burkholderia pseudomallei MSHR346
                    215       7528  Streptomyces sp. C
                    216       7523  Streptomyces lividans TK24
                    217       7519  Solibacter usitatus (strain Ellin6076)
                    218       7490  Pseudomonas syringae pv. mori str. 301020
                    219       7487  Tuber melanosporum (strain Mel28) (Perigord black truffle)
                    220       7475  Burkholderia pseudomallei 1710a
                    221       7472  Streptomyces coelicolor
                    222       7465  Burkholderia pseudomallei Pakistan 9
                    223       7459  Burkholderia sp. H160
                    224       7448  Streptomyces venezuelae 
                    225       7443  Kitasatospora setae  
                    226       7385  Ostreococcus lucimarinus (strain CCE9901)
                    227       7383  Lyngbya majuscula 3L
                    228       7367  Burkholderia pseudomallei 576
                    229       7351  Burkholderia gladioli BSR3
                    230       7349  Burkholderia pseudomallei 305
                    231       7274  Clostridium bolteae ATCC BAA-613
                    232       7241  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    233       7231  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    234       7227  Streptomyces avermitilis
                    235       7177  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    236       7152  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    237       7145  Giardia intestinalis (strain ATCC 50803 / WB clone C6) (Giardia lamblia)
                    238       7140  Burkholderia pseudomallei 1106b
                    239       7130  Burkholderia phymatum (strain DSM 17167 / STM815)
                    240       7124  Burkholderia ambifaria MEX-5
                    241       7120  Pseudomonas syringae Cit 7
                    242       7111  Medicago truncatula (Barrel medic) (Medicago tribuloides)
                    243       7111  Neospora caninum Liverpool
                    244       7079  Frankia sp. (strain EuI1c)
                    245       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    246       7017  Myxococcus xanthus (strain DK 1622)
                    247       7005  Mucilaginibacter paludis DSM 18603
                    248       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    249       6974  Rhodopirellula baltica
                    250       6959  Frankia sp. (strain EAN1pec)
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          281106 (  2%)
                    Bacteria       10511730 ( 64%)
                    Eukaryota       4505604 ( 27%)
                    Viruses         1166675 (  7%)
                    Other             38906 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  89848 (  2%)           (  1%)
                    Other Mammalia        445384 ( 10%)           (  3%)
                    Other Vertebrata      412166 (  9%)           (  2%)
                    Viridiplantae         904259 ( 20%)           (  5%)
                    Fungi                 909788 ( 20%)           (  6%)
                    Insecta               682412 ( 15%)           (  4%)
                    Nematoda              136892 (  3%)           (  1%)
                    Other                 924855 ( 21%)           (  6%)
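
Both percentage columns of the Eukaryota breakdown follow from the sequence counts. The sketch below assumes that each value is rounded to the nearest whole percent, first against the Eukaryota subtotal and then against the complete database; `EUKARYOTA` and `shares` are illustrative names, not part of the release tooling.

```python
# Sequence counts per category within Eukaryota, copied from the table above.
EUKARYOTA = {
    "Human": 89848,
    "Other Mammalia": 445384,
    "Other Vertebrata": 412166,
    "Viridiplantae": 904259,
    "Fungi": 909788,
    "Insecta": 682412,
    "Nematoda": 136892,
    "Other": 924855,
}
DATABASE_TOTAL = 16504022  # sequence entries in release 2011_08


def shares(counts, database_total):
    """Map each category to (% of the group, % of the complete database)."""
    group_total = sum(counts.values())
    return {
        name: (round(100.0 * n / group_total), round(100.0 * n / database_total))
        for name, n in counts.items()
    }


if __name__ == "__main__":
    for name, (of_group, of_db) in shares(EUKARYOTA, DATABASE_TOTAL).items():
        print(f"{name:<18}{EUKARYOTA[name]:>8} ({of_group:>3}%) ({of_db:>3}%)")
```

The category counts sum to 4505604, the Eukaryota figure given in the kingdom table.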
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To   Number            From   To   Number
                       1-  50   362649            1001-1100    98348
                      51- 100  1315972            1101-1200    69283
                     101- 150  1514969            1201-1300    48322
                     151- 200  1466821            1301-1400    31729
                     201- 250  1477012            1401-1500    25413
                     251- 300  1428355            1501-1600    18160
                     301- 350  1301328            1601-1700    13689
                     351- 400  1000335            1701-1800    10665
                     401- 450   851596            1801-1900     8718
                     451- 500   710554            1901-2000     7415
                     501- 550   478921            2001-2100     5941
                     551- 600   370128            2101-2200     5936
                     601- 650   269717            2201-2300     4671
                     651- 700   210704            2301-2400     3787
                     701- 750   181178            2401-2500     3211
                     751- 800   162325            >2500        27736
                     801- 850   121299
                     851- 900   109787
                     901- 950    75356
                     951-1000    56546
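
Because the size table excludes fragments, its bins should add up to the total number of entries minus the number of fragments reported in the introduction. The following sketch checks that relationship and finds the most populated bin; `SIZE_BINS`, `binned_total` and `modal_bin` are names chosen for this illustration only.

```python
# (lower, upper, count) per length bin; upper=None marks the open-ended ">2500" bin.
SIZE_BINS = [
    (1, 50, 362649), (51, 100, 1315972), (101, 150, 1514969),
    (151, 200, 1466821), (201, 250, 1477012), (251, 300, 1428355),
    (301, 350, 1301328), (351, 400, 1000335), (401, 450, 851596),
    (451, 500, 710554), (501, 550, 478921), (551, 600, 370128),
    (601, 650, 269717), (651, 700, 210704), (701, 750, 181178),
    (751, 800, 162325), (801, 850, 121299), (851, 900, 109787),
    (901, 950, 75356), (951, 1000, 56546), (1001, 1100, 98348),
    (1101, 1200, 69283), (1201, 1300, 48322), (1301, 1400, 31729),
    (1401, 1500, 25413), (1501, 1600, 18160), (1601, 1700, 13689),
    (1701, 1800, 10665), (1801, 1900, 8718), (1901, 2000, 7415),
    (2001, 2100, 5941), (2101, 2200, 5936), (2201, 2300, 4671),
    (2301, 2400, 3787), (2401, 2500, 3211), (2501, None, 27736),
]
TOTAL_ENTRIES = 16504022  # sequence entries in release 2011_08
FRAGMENTS = 2655446       # number of fragments from the introduction


def binned_total(bins):
    """Sum the sequence counts over all length bins."""
    return sum(count for _, _, count in bins)


def modal_bin(bins):
    """Return the (lower, upper, count) bin holding the most sequences."""
    return max(bins, key=lambda b: b[2])


if __name__ == "__main__":
    print("binned sequences:", binned_total(SIZE_BINS))
    print("entries minus fragments:", TOTAL_ENTRIES - FRAGMENTS)
    print("most populated bin:", modal_bin(SIZE_BINS))
```

For this release the two totals agree, and the most populated bin is 101-150 amino acids.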
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is   324 amino acids.
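
The average length can be related to the totals given in the introduction. This sketch assumes the average is the total number of amino acids divided by the total number of entries (fragments included); under that assumption the reported value of 324 matches the truncated quotient, while the exact quotient is slightly higher. The function name `average_length` is hypothetical.

```python
TOTAL_AMINO_ACIDS = 5357406695  # amino acids in release 2011_08
TOTAL_ENTRIES = 16504022        # sequence entries in release 2011_08


def average_length(total_residues, total_entries):
    """Return (exact mean, mean truncated to a whole number of residues)."""
    exact = total_residues / total_entries
    truncated = total_residues // total_entries
    return exact, truncated


if __name__ == "__main__":
    exact, truncated = average_length(TOTAL_AMINO_ACIDS, TOTAL_ENTRIES)
    print(f"exact mean: {exact:.2f}, truncated: {truncated}")
```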
                    
                    The shortest sequence is Q16047_HUMAN:     4 amino acids.
                    The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------

                    References (RL)                    20232134                1.23
                       Submitted to EMBL/GenBank/DDBJ  11795158  10441094      0.71
                       Journal                          7867958   7205860      0.48
                       Submitted to other databases      410308    403887      0.02
                       Thesis                              8171      8113     <0.01
                       Book citation                       5677      5628     <0.01
                       Other                             144862    142799      0.01
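
The "Average per entry" column appears to be the total number of lines divided by the number of entries in the whole database, not by the number of entries carrying that line. The sketch below works under that assumption, which is consistent with the reference figures above, and prints values below 0.01 as "<0.01". `REFERENCE_LINES` and `format_average` are illustrative names.

```python
TOTAL_ENTRIES = 16504022  # sequence entries in release 2011_08

# Total number of reference lines per subtype, copied from the table above.
REFERENCE_LINES = {
    "Submitted to EMBL/GenBank/DDBJ": 11795158,
    "Journal": 7867958,
    "Submitted to other databases": 410308,
    "Thesis": 8171,
    "Book citation": 5677,
    "Other": 144862,
}


def format_average(total_lines, total_entries=TOTAL_ENTRIES):
    """Format lines per database entry with 2 decimals, or '<0.01' if it rounds below 0.01."""
    avg = round(total_lines / total_entries, 2)
    return "<0.01" if avg < 0.01 else f"{avg:.2f}"


if __name__ == "__main__":
    all_references = sum(REFERENCE_LINES.values())
    print(f"{'References (RL)':<32}{all_references:>9}  {format_average(all_references)}")
    for name, total in REFERENCE_LINES.items():
        print(f"   {name:<29}{total:>9}  {format_average(total)}")
```

The subtype totals add up to the 20232134 reference lines reported for the whole line type.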
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 324075
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Comments (CC)                      16430145                1.00
                       CATALYTIC ACTIVITY               1644616   1519570      0.10     4
                       CAUTION                          4524661   4524655      0.27     2
                       COFACTOR                          530462    505394      0.03     8
                       DOMAIN                             40424     38177     <0.01     9
                       FUNCTION                         1941397   1784013      0.12     3
                       INTERACTION                         2430      2430     <0.01    11
                       MISCELLANEOUS                      38064     38006     <0.01    10
                       PATHWAY                           846108    783299      0.05     6
                       SIMILARITY                       4773277   4142834      0.29     1
                       SUBCELLULAR LOCATION             1366200   1358145      0.08     5
                       SUBUNIT                           722506    719923      0.04     7
                    
                    Total number of comment topics: 11
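
The Rank column is consistent with ordering the comment topics by their total number of lines, largest first. The following sketch reproduces the ranks under that assumption; `COMMENT_LINES` and `rank_by_total` are hypothetical names introduced for the example.

```python
# Total number of comment (CC) lines per topic, copied from the table above.
COMMENT_LINES = {
    "CATALYTIC ACTIVITY": 1644616,
    "CAUTION": 4524661,
    "COFACTOR": 530462,
    "DOMAIN": 40424,
    "FUNCTION": 1941397,
    "INTERACTION": 2430,
    "MISCELLANEOUS": 38064,
    "PATHWAY": 846108,
    "SIMILARITY": 4773277,
    "SUBCELLULAR LOCATION": 1366200,
    "SUBUNIT": 722506,
}


def rank_by_total(totals):
    """Assign rank 1 to the topic with the most lines, rank 2 to the next, and so on."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {topic: position for position, topic in enumerate(ordered, start=1)}


if __name__ == "__main__":
    ranks = rank_by_total(COMMENT_LINES)
    for topic in sorted(ranks, key=ranks.get):
        print(f"{ranks[topic]:>2}  {topic:<22}{COMMENT_LINES[topic]:>9}")
```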
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Features (FT)                       5285303                0.32
                       CHAIN                             531214    420121      0.03     2
                       NON_TER                          4387360   2653900      0.27     1
                       SIGNAL                            366132    365039      0.02     3
                       TRANSIT                              597       597     <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             186079252               11.27
                       AGD                                 2530      2530     <0.01    77   Organism-specific databases
                       ANU-2DPAGE                            56        56     <0.01    94   2D gel databases
                       Allergome                           2097      1507     <0.01    81   Protein family/group databases
                       ArachnoServer                         66        66     <0.01    93   Organism-specific databases
                       ArrayExpress                       90780     90769      0.01    49   Gene expression databases
                       BRENDA                              2770      2738     <0.01    75   Enzyme and pathway databases
                       Bgee                              109843    109646      0.01    46   Gene expression databases
                       BioCyc                            671072    656733      0.04    27   Enzyme and pathway databases
                       CAZy                               74460     69950     <0.01    53   Protein family/group databases
                       CGD                                 6743      6743     <0.01    72   Organism-specific databases
                       COMPLUYEAST-2DPAGE                     5         5     <0.01    98   2D gel databases
                       CTD                               235625    234678      0.01    39   Organism-specific databases
                       CYGD                                   2         2     <0.01   100   Organism-specific databases
                       DIP                                 2739      2734     <0.01    76   Protein-protein interaction databases
                       EMBL                            18537337  16305305      1.12     3   Sequence databases
                       Ensembl                           453029    432226      0.03    31   Genome annotation databases
                       EnsemblBacteria                   567580    532693      0.03    29   Genome annotation databases
                       EnsemblFungi                      108180    108085      0.01    47   Genome annotation databases
                       EnsemblMetazoa                    317807    295100      0.02    33   Genome annotation databases
                       EnsemblPlants                     259658    232431      0.02    37   Genome annotation databases
                       EnsemblProtists                    72634     71475     <0.01    54   Genome annotation databases
                       EuPathDB                          182663    182662      0.01    44   Organism-specific databases
                       FlyBase                           195662    194112      0.01    41   Organism-specific databases
                       GO                              30738068  10168218      1.86     2   Ontologies
                       Gene3D                           7159745   5744325      0.43     6   Family and domain databases
                       GeneDB_Spombe                          1         1     <0.01   102   Organism-specific databases
                       GeneID                           6039089   5922563      0.37     9   Genome annotation databases
                       GeneTree                         1152291   1151946      0.07    23   Phylogenomic databases
                    Genevestigator                     96533     96528      0.01    48   Gene expression databases                  
                    GenoList                           14747     14475     <0.01    67   Organism-specific databases                
                    GenomeReviews                    4257898   4159573      0.26    12   Genome annotation databases                
                    Gramene                            68653     68653     <0.01    56   Organism-specific databases                
                    H-InvDB                              595       484     <0.01    85   Organism-specific databases                
                    HAMAP                            1293250   1278382      0.08    22   Family and domain databases                
                    HGNC                               75378     73569     <0.01    51   Organism-specific databases                
                    HOGENOM                          2194017   2193975      0.13    20   Phylogenomic databases                     
                    HOVERGEN                          315086    315075      0.02    34   Phylogenomic databases                     
                    HSSP                              252749    252510      0.02    38   3D structure databases                     
                    IPI                               308690    308374      0.02    35   Sequence databases                         
                    InParanoid                        192876    192809      0.01    43   Phylogenomic databases                     
                    IntAct                             16074     16074     <0.01    65   Protein-protein interaction databases      
                    InterPro                        34954739  12544349      2.12     1   Family and domain databases                
                    KEGG                             5171753   5069790      0.31    11   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    73   Organism-specific databases                
                    Leproma                              936       935     <0.01    84   Organism-specific databases                
                    MEROPS                             70330     68849     <0.01    55   Protein family/group databases             
                    MGI                                34389     34169     <0.01    60   Organism-specific databases                
                    MINT                                8944      8944     <0.01    70   Protein-protein interaction databases      
                    NMPDR                             920222    920218      0.06    26   Genome annotation databases                
                    NextBio                            45632     45629     <0.01    58   Other                                      
                    OMA                              2420211   2420209      0.15    18   Phylogenomic databases                     
                    OrthoDB                           579803    579639      0.04    28   Phylogenomic databases                     
                    PANTHER                          2195518   2115922      0.13    19   Family and domain databases                
                    PDB                                15035      8822     <0.01    66   3D structure databases                     
                    PDBsum                             14575      8581     <0.01    68   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    89   2D gel databases                           
                    PIR                               174837    141998      0.01    45   Sequence databases                         
                    PIRSF                            1027578   1027314      0.06    25   Family and domain databases                
                    PMAP-CutDB                           238       238     <0.01    87   Other                                      
                    PMMA-2DPAGE                            2         2     <0.01    99   2D gel databases                           
                    PRIDE                             213071    212845      0.01    40   Proteomic databases                        
                    PRINTS                           2681140   2388173      0.16    16   Family and domain databases                
                    PROSITE                          8146933   5456901      0.49     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    97   Enzyme and pathway databases               
                    PeptideAtlas                         147       147     <0.01    88   Proteomic databases                        
                    PeroxiBase                          2501      2494     <0.01    78   Protein family/group databases             
                    Pfam                            15792633  11759029      0.96     4   Family and domain databases                
                    PharmGKB                              83        83     <0.01    92   Organism-specific databases                
                    PhosphoSite                         1550      1550     <0.01    82   PTM databases                              
                    PhylomeDB                         371240    371208      0.02    32   Phylogenomic databases                     
                    ProDom                            299575    281850      0.02    36   Family and domain databases                
                    ProMEX                               321       321     <0.01    86   Proteomic databases                        
                    ProtClustDB                      2730566   2730555      0.17    15   Phylogenomic databases                     
                    ProteinModelPortal               5552651   5547561      0.34    10   3D structure databases                     
                    PseudoCAP                           4342      4339     <0.01    74   Organism-specific databases                
                    REBASE                             19896     19258     <0.01    63   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   90        89     <0.01    91   2D gel databases                           
                    RGD                                20564     20378     <0.01    62   Organism-specific databases                
                    Reactome                              93        90     <0.01    90   Enzyme and pathway databases               
                    RefSeq                           6060255   5928982      0.37     8   Sequence databases                         
                    SGD                                   12        12     <0.01    96   Organism-specific databases                
                    SMART                            3386598   2614631      0.21    13   Family and domain databases                
                    SMR                              2146863   2146863      0.13    21   3D structure databases                     
                    STRING                           2608592   2608405      0.16    17   Protein-protein interaction databases      
                    SUPFAM                           6832597   5647928      0.41     7   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    95   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01   101   2D gel databases                           
                    TAIR                               16884     16803     <0.01    64   Organism-specific databases                
                    TCDB                                2431      2422     <0.01    79   Protein family/group databases             
                    TIGR                              195029    187970      0.01    42   Genome annotation databases                
                    TIGRFAMs                         3369352   3071592      0.20    14   Family and domain databases                
                    TubercuList                         2116      2111     <0.01    80   Organism-specific databases                
                    UCSC                               49193     49193     <0.01    57   Genome annotation databases                
                    UniGene                           482023    451439      0.03    30   Sequence databases                         
                    VectorBase                         75602     75093     <0.01    50   Genome annotation databases                
                    World-2DPAGE                         943       938     <0.01    83   2D gel databases                           
                    WormBase                           41494     41335     <0.01    59   Organism-specific databases                
                    Xenbase                            13203     13173     <0.01    69   Organism-specific databases                
                    ZFIN                               21552     21547     <0.01    61   Organism-specific databases                
                    dictyBase                           7654      7654     <0.01    71   Organism-specific databases                
                    eggNOG                           1145089   1145089      0.07    24   Phylogenomic databases                     
                    euHCVdb                            75268     75265     <0.01    52   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 129
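
The "Average per entry" column above appears to be the total number of lines of a given type divided by the number of entries in the release (16504022, from the introduction). The sketch below reproduces a few rows under that reading. The function name `average_per_entry` is hypothetical, and the rule that values rounding to 0.00 are shown as "<0.01" is an assumption inferred from the table, not a documented formatting rule.

```python
# Entry count taken from the release introduction (release 2011_08).
TOTAL_ENTRIES = 16_504_022


def average_per_entry(total_lines: int, entries: int = TOTAL_ENTRIES) -> str:
    """Format total_lines / entries like the 'Average per entry' column."""
    text = f"{total_lines / entries:.2f}"
    # Assumption: averages that round to 0.00 are displayed as "<0.01".
    return "<0.01" if text == "0.00" else text


if __name__ == "__main__":
    # Totals copied from the cross-reference table.
    rows = [
        ("Cross-references (DR)", 186_079_252),
        ("InterPro", 34_954_739),
        ("ArrayExpress", 90_780),
        ("CAZy", 74_460),
    ]
    for name, total in rows:
        print(f"{name:<24}{average_per_entry(total):>6}")
```

Under these assumptions the four rows come out as 11.27, 2.12, 0.01 and <0.01, matching the values listed in the table.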
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.61   Gln (Q) 3.89   Leu (L) 9.86   Ser (S) 6.72
                    Arg (R) 5.46   Glu (E) 6.14   Lys (K) 5.25   Thr (T) 5.61
                    Asn (N) 4.12   Gly (G) 7.11   Met (M) 2.48   Trp (W) 1.31
                    Asp (D) 5.30   His (H) 2.20   Phe (F) 4.02   Tyr (Y) 3.04
                    Cys (C) 1.27   Ile (I) 5.99   Pro (P) 4.74   Val (V) 6.75
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.04
                    
                    
                    
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
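
The ordering in 5.2 can be checked against the percentages in 5.1 by sorting the twenty standard residues by frequency, highest first. This is a minimal sketch: the dictionary simply restates the values from 5.1, and the names `COMPOSITION` and `rank_by_frequency` are illustrative, not part of any UniProt tool.

```python
# Percent composition from section 5.1 (ambiguity codes B, Z, X omitted).
COMPOSITION = {
    "Ala": 8.61, "Gln": 3.89, "Leu": 9.86, "Ser": 6.72,
    "Arg": 5.46, "Glu": 6.14, "Lys": 5.25, "Thr": 5.61,
    "Asn": 4.12, "Gly": 7.11, "Met": 2.48, "Trp": 1.31,
    "Asp": 5.30, "His": 2.20, "Phe": 4.02, "Tyr": 3.04,
    "Cys": 1.27, "Ile": 5.99, "Pro": 4.74, "Val": 6.75,
}


def rank_by_frequency(composition: dict[str, float]) -> list[str]:
    """Return residue names ordered from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


if __name__ == "__main__":
    print(", ".join(rank_by_frequency(COMPOSITION)))
```

No two residues share the same percentage in this release, so the sort order is unambiguous.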
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 544616
                    Total number of entries encoded on a Plasmid: 216558
                    Total number of entries encoded on a Plastid: 13188
                    Total number of entries encoded on a Plastid; Apicoplast: 367
                    Total number of entries encoded on a Plastid; Chloroplast: 142296
                    Total number of entries encoded on a Plastid; Cyanelle: 8
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 448