                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2010_07 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2010_07 of 15-Jun-2010 of UniProtKB/TrEMBL contains 11109684 sequence entries,
                    comprising 3575035695 amino acids.
                    
                    247333 sequences have been added since release 2010_06, the sequence data of
                    1025 existing entries has been updated and the annotations of
                    2821578 entries have been revised. This represents an increase of 2% in the
                    number of entries.
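The growth figure can be checked from the counts above. This is a minimal sketch that assumes no entries were removed between the two releases, so the size of release 2010_06 is taken to be the current size minus the additions; the variable names are illustrative only.

```python
# Counts taken from the introduction of this release note.
entries_2010_07 = 11_109_684    # sequence entries in release 2010_07
added_since_2010_06 = 247_333   # sequences added since release 2010_06

# Assumption: no entries were deleted, so the previous release size is
# the current size minus the number of added sequences.
entries_2010_06 = entries_2010_07 - added_since_2010_06

growth_pct = 100 * added_since_2010_06 / entries_2010_06
print(f"release 2010_06 (estimated): {entries_2010_06} entries")
print(f"growth: {growth_pct:.2f}%")
```

Under that assumption the growth is a little over 2%, which agrees with the rounded figure quoted above.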
                    
                    Number of fragments: 1863974
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           30745     0.28%
                    2: Evidence at transcript level       468101     4.21%
                    3: Inferred from homology            2112855    19.02%
                    4: Predicted                         8497983    76.49%
                    5: Uncertain                               0     0.00%
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 243295
                    
                    The twenty most represented species account for 1161729 sequences, or 10.5% of the
                    total number of entries.
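The share quoted above can be reproduced from the first twenty rows of the table in section 2.2. The sketch below copies those twenty frequencies by hand; the list name is illustrative.

```python
# Total entries in release 2010_07 (from the introduction).
total_entries = 11_109_684

# Frequencies of the twenty most represented species (table 2.2, rows 1-20).
top20 = [
    335139, 95623, 71708, 57098, 50404,
    47622, 44516, 44040, 42098, 41892,
    39843, 39309, 38475, 34760, 33629,
    31215, 30436, 29073, 28088, 26761,
]

top20_total = sum(top20)
share_pct = 100 * top20_total / total_entries
print(f"{top20_total} sequences, {share_pct:.1f}% of all entries")
```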
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 10395
                                             2x: 43582
                                             3x: 23757
                                             4x: 14293
                                             5x:  8972
                                             6x:  6567
                                             7x:  4520
                                             8x:  3662
                                             9x:  2960
                                            10x:  5003
                                        11- 20x: 14778
                                        21- 50x:  5315
                                        51-100x:  1987
                                          >100x:  3942
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     335139  Human immunodeficiency virus 1
                    2      95623  Oryza sativa subsp. japonica (Rice)
                    3      71708  Homo sapiens (Human)
                    4      57098  Hepatitis C virus
                    5      50404  Trichomonas vaginalis
                    6      47622  Mus musculus (Mouse)
                    7      44516  uncultured bacterium
                    8      44040  Populus trichocarpa (Western balsam poplar) 
                    9      42098  Arabidopsis thaliana (Mouse-ear cress)
                    10      41892  Zea mays (Maize)
                    11      39843  Paramecium tetraurelia
                    12      39309  Oryza sativa subsp. indica (Rice)
                    13      38475  Hepatitis B virus (HBV)
                    14      34760  Physcomitrella patens subsp. patens
                    15      33629  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    16      31215  Ricinus communis (Castor bean)
                    17      30436  Drosophila melanogaster (Fruit fly)
                    18      29073  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    19      28088  Tetraodon nigroviridis (Green puffer)
                    20      26761  Danio rerio (Zebrafish) (Brachydanio rerio)
                    21      25088  Vitis vinifera (Grape)
                    22      24830  Nematostella vectensis (Starlet sea anemone)
                    23      23511  Rattus norvegicus (Rat)
                    24      23115  Perkinsus marinus ATCC 50983
                    25      21134  Caenorhabditis elegans
                    26      21081  Ixodes scapularis (Black-legged tick) (Deer tick)
                    27      20674  Trypanosoma cruzi
                    28      18874  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    29      18111  Caenorhabditis briggsae
                    30      17861  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    31      17785  Ailuropoda melanoleuca (Giant panda)
                    32      17610  Phytophthora infestans T30-4
                    33      17446  Drosophila simulans (Fruit fly)
                    34      17438  Escherichia coli
                    35      16902  Drosophila yakuba (Fruit fly)
                    36      16753  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    37      16717  Drosophila persimilis (Fruit fly)
                    38      16715  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    39      16256  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    40      16195  Drosophila sechellia (Fruit fly)
                    41      15957  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    42      15874  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    43      15715  Naegleria gruberi (Amoeba)
                    44      15674  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    45      15428  Drosophila willistoni (Fruit fly)
                    46      15251  Tetrahymena thermophila SB210
                    47      15150  Drosophila ananassae (Fruit fly)
                    48      14937  Drosophila erecta (Fruit fly)
                    49      14814  Chlamydomonas reinhardtii
                    50      14784  Drosophila mojavensis (Fruit fly)
                    51      14770  Anopheles gambiae (African malaria mosquito)
                    52      14701  Drosophila virilis (Fruit fly)
                    53      14673  Plasmodium chabaudi
                    54      14662  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    55      14274  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    56      13819  Candida albicans (Yeast)
                    57      13642  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    58      13472  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    59      13438  Schistosoma mansoni (Blood fluke)
                    60      13390  Aspergillus flavus 
                    61      12979  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    62      12732  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    63      12713  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
                    64      12519  Xenopus laevis (African clawed frog)
                    65      12448  Glycine max (Soybean) (Glycine hispida)
                    66      12340  Polysphondylium pallidum PN500
                    67      12033  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    68      11875  Aspergillus oryzae
                    69      11801  Plasmodium berghei
                    70      11605  Hepatitis C virus subtype 1b
                    71      11571  Trichoplax adhaerens
                    72      11500  Brugia malayi (Filarial nematode worm)
                    73      10939  Sordaria macrospora
                    74      10900  Schistosoma japonicum (Blood fluke)
                    75      10868  Chaetomium globosum (Soil fungus)
                    76      10852  Plasmodium falciparum
                    77      10725  Podospora anserina
                    78      10662  Ralstonia solanacearum (Pseudomonas solanacearum)
                    79      10441  Picea sitchensis (Sitka spruce)
                    80      10427  Aspergillus nidulans FGSC A4
                    81      10422  Neurospora crassa
                    82      10403  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    83      10335  Phaeodactylum tricornutum CCAP 1055/1
                    84      10279  Micromonas pusilla CCMP1545
                    85      10232  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    86      10141  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    87      10130  Helicobacter pylori (Campylobacter pylori)
                    88      10128  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    89      10115  Micromonas sp. RCC299
                    90      10093  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    91       9860  Bos taurus (Bovine)
                    92       9771  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    93       9680  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    94       9666  Trypanosoma brucei gambiense DAL972
                    95       9636  Cryptococcus neoformans (Filobasidiella neoformans)
                    96       9582  Aspergillus fumigatus (Sartorya fumigata)
                    97       9574  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    98       9540  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    99       9526  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    100       9469  Trypanosoma brucei
                    101       9357  Salmo salar (Atlantic salmon)
                    102       9243  Monosiga brevicollis (Choanoflagellate)
                    103       9215  Emericella nidulans (Aspergillus nidulans)
                    104       9195  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    105       9173  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    106       9122  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    107       9096  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    108       9033  Plasmodium vivax
                    109       9022  Dictyostelium discoideum (Slime mold)
                    110       8978  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    111       8964  Thalassiosira pseudonana (Marine diatom)
                    112       8955  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    113       8912  Catenulispora acidiphila 
                    114       8885  Aspergillus clavatus
                    115       8774  Rhodococcus sp. (strain RHA1)
                    116       8720  Paracoccidioides brasiliensis (strain Pb18)
                    117       8708  Nannizzia otae (strain CBS 113480) (Microsporum canis) (Arthroderma otae)
                    118       8700  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    119       8667  Rabies virus
                    120       8603  Entamoeba dispar SAW760
                    121       8523  Stigmatella aurantiaca DW4/3-1
                    122       8437  Plesiocystis pacifica SIR-1
                    123       8299  Entamoeba histolytica
                    124       8253  Streptomyces sviceus ATCC 29083
                    125       8249  Microscilla marina ATCC 23134
                    126       8201  Microcoleus chthonoplastes PCC 7420
                    127       8196  Bradyrhizobium japonicum
                    128       8163  Frankia sp. EUN1f
                    129       8154  Burkholderia xenovorans (strain LB400)
                    130       8098  Toxoplasma gondii GT1
                    131       8028  Pseudomonas aeruginosa
                    132       8028  Trichophyton verrucosum HKI 0517
                    133       8025  Leishmania infantum
                    134       7980  Arthroderma benhamiae CBS 112371
                    135       7980  Toxoplasma gondii ME49
                    136       7958  Ostreococcus tauri
                    137       7952  Rhodococcus opacus (strain B4)
                    138       7916  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    139       7891  Leishmania braziliensis
                    140       7860  Paracoccidioides brasiliensis (strain Pb03)
                    141       7857  Acaryochloris marina (strain MBIC 11017)
                    142       7838  Toxoplasma gondii VEG
                    143       7813  Plasmodium yoelii yoelii
                    144       7747  Uncinocarpus reesii (strain UAMH 1704)
                    145       7571  Clostridium hathewayi DSM 13479
                    146       7563  Burkholderia pseudomallei MSHR346
                    147       7520  Solibacter usitatus (strain Ellin6076)
                    148       7501  Tuber melanosporum (Perigord truffle)
                    149       7489  Streptomyces coelicolor
                    150       7475  Burkholderia pseudomallei 1710a
                    151       7465  Burkholderia pseudomallei Pakistan 9
                    152       7459  Burkholderia sp. H160
                    153       7397  Ostreococcus lucimarinus (strain CCE9901)
                    154       7379  Streptomyces sp. ACT-1
                    155       7367  Burkholderia pseudomallei 576
                    156       7349  Burkholderia pseudomallei 305
                    157       7310  Frankia sp. EuI1c
                    158       7274  Clostridium bolteae ATCC BAA-613
                    159       7243  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    160       7237  Streptomyces avermitilis
                    161       7232  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    162       7225  Burkholderia sp. CCGE1002
                    163       7211  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    164       7193  Medicago truncatula (Barrel medic)
                    165       7179  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    166       7149  Giardia lamblia ATCC 50803
                    167       7140  Burkholderia pseudomallei 1106b
                    168       7132  Burkholderia phymatum (strain DSM 17167 / STM815)
                    169       7124  Rhizobium loti (Mesorhizobium loti)
                    170       7124  Burkholderia ambifaria MEX-5
                    171       7119  Leishmania major
                    172       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    173       7017  Myxococcus xanthus (strain DK 1622)
                    174       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    175       6979  Rhodopirellula baltica
                    176       6967  Frankia sp. (strain EAN1pec)
                    177       6943  Streptomyces sp. Mg1
                    178       6940  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    179       6923  Burkholderia ambifaria IOP40-10
                    180       6913  Saccharopolyspora erythraea (strain NRRL 23338)
                    181       6911  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    182       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    183       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    184       6862  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    185       6817  Clostridium asparagiforme DSM 15981
                    186       6772  Burkholderia pseudomallei (strain 1106a)
                    187       6744  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    188       6740  Burkholderia pseudomallei (strain 668)
                    189       6725  Burkholderia graminis C4D1M
                    190       6714  Rhizobium leguminosarum bv. viciae (strain 3841)
                    191       6712  Rhodococcus erythropolis SK121
                    192       6705  Chthoniobacter flavus Ellin428
                    193       6702  Streptomyces flavogriseus ATCC 33331
                    194       6692  Bacillus thuringiensis IBL 200
                    195       6684  Haliangium ochraceum (strain DSM 14365 / JCM 11303 / SMP-2)
                    196       6679  Mesorhizobium opportunistum WSM2075
                    197       6662  Burkholderia pseudomallei S13
                    198       6657  Burkholderia cepacia (strain J2315 / LMG 16656) (Burkholderia cenocepacia 
                    199       6655  Bacillus thuringiensis IBL 4222
                    200       6644  Beggiatoa sp. PS
                    201       6627  Burkholderia cenocepacia (strain MC0-3)
                    202       6614  Burkholderia multivorans CGD2
                    203       6614  Sus scrofa (Pig)
                    204       6613  Burkholderia pseudomallei Pasteur 52237
                    205       6606  Burkholderia multivorans CGD2M
                    206       6605  Hepatitis C virus subtype 1a
                    207       6583  Bacillus thuringiensis serovar sotto str. T04001
                    208       6581  Streptococcus pneumoniae
                    209       6527  Burkholderia multivorans CGD1
                    210       6521  Streptomyces sp. ACTE
                    211       6514  Frankia alni (strain ACN14a)
                    212       6498  bacterium Ellin514
                    213       6497  Burkholderia cenocepacia (strain HI2424)
                    214       6488  Bacillus thuringiensis serovar monterrey BGSC 4AJ1
                    215       6463  Planctomyces maris DSM 8797
                    216       6462  Streptomyces clavuligerus ATCC 27064
                    217       6427  Agrobacterium radiobacter (strain K84 / ATCC BAA-868)
                    218       6417  Methylobacterium sp. (strain 4-46)
                    219       6413  Cyanothece sp. CCY0110
                    220       6390  Ustilago maydis (Smut fungus)
                    221       6388  Bradyrhizobium sp. (strain ORS278)
                    222       6379  Stackebrandtia nassauensis 
                    223       6372  Micromonospora aurantiaca ATCC 27029
                    224       6360  Rhizobium meliloti (Sinorhizobium meliloti)
                    225       6356  'Nostoc azollae' 0708
                    226       6347  Micromonospora sp. L5
                    227       6336  Burkholderia ambifaria (strain MC40-6)
                    228       6322  Bacillus thuringiensis serovar thuringiensis str. T01001
                    229       6310  Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
                    230       6309  Hahella chejuensis (strain KCTC 2396)
                    231       6298  Bacillus thuringiensis Bt407
                    232       6294  Burkholderia pseudomallei 406e
                    233       6290  Nostoc punctiforme (strain ATCC 29133 / PCC 73102)
                    234       6288  Burkholderia pseudomallei 1655
                    235       6273  uncultured archaeon
                    236       6272  Labrenzia aggregata IAM 12614
                    237       6252  Clostridiales bacterium 1_7_47FAA
                    238       6242  Bacillus thuringiensis serovar berliner ATCC 10792
                    239       6237  Geobacillus sp. (strain Y412MC10)
                    240       6233  Rhodococcus erythropolis (strain PR4 / NBRC 100887)
                    241       6212  Candida tropicalis (strain ATCC MYA-3404 / T1) (Yeast)
                    242       6210  Burkholderia ambifaria (strain ATCC BAA-244 / AMMD) (Burkholderia cepacia 
                    243       6206  Paenibacillus sp. (strain JDR-2)
                    244       6184  Methylobacterium extorquens (strain ATCC 14718 / DSM 1338 / AM1)
                    245       6182  Oryza sativa (Rice)
                    246       6172  Methylobacterium radiotolerans (strain ATCC 27329 / DSM 1819 / JCM 2831)
                    247       6154  Ralstonia eutropha  (Cupriavidus necator 
                    248       6148  Gallus gallus (Chicken)
                    249       6143  Burkholderia sp. CCGE1003
                    250       6129  Bacillus thuringiensis serovar israelensis ATCC 35646
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          215948 (  2%)
                    Bacteria        6862341 ( 62%)
                    Eukaryota       3069757 ( 28%)
                    Viruses          950301 (  9%)
                    Other             11336 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  71743 (  2%)           (  1%)
                    Other Mammalia        193457 (  6%)           (  2%)
                    Other Vertebrata      293908 ( 10%)           (  3%)
                    Viridiplantae         710181 ( 23%)           (  6%)
                    Fungi                 647912 ( 21%)           (  6%)
                    Insecta               422276 ( 14%)           (  4%)
                    Nematoda               61713 (  2%)           (  1%)
                    Other                 668567 ( 22%)           (  6%)
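The two percentage columns in the table above use different denominators: the first is relative to the Eukaryota total, the second to the whole database. A short sketch of that calculation, with whole-number rounding assumed to match the table:

```python
# Counts copied from the "Within Eukaryota" table.
total_entries = 11_109_684  # all entries in UniProtKB/TrEMBL 2010_07

eukaryota = {
    "Human": 71_743,
    "Other Mammalia": 193_457,
    "Other Vertebrata": 293_908,
    "Viridiplantae": 710_181,
    "Fungi": 647_912,
    "Insecta": 422_276,
    "Nematoda": 61_713,
    "Other": 668_567,
}

# The category counts add up to the Eukaryota row of the kingdom table.
eukaryota_total = sum(eukaryota.values())

shares = {}
for category, count in eukaryota.items():
    pct_of_eukaryota = round(100 * count / eukaryota_total)
    pct_of_database = round(100 * count / total_entries)
    shares[category] = (pct_of_eukaryota, pct_of_database)
    print(f"{category:<18}{count:>8} ({pct_of_eukaryota:>3}%) ({pct_of_database:>3}%)")
```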
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To   Number             From   To   Number
                       1-  50   243201             1001-1100    66090
                      51- 100   881197             1101-1200    46653
                     101- 150  1014080             1201-1300    31876
                     151- 200   980207             1301-1400    21128
                     201- 250   981677             1401-1500    17007
                     251- 300   949921             1501-1600    12253
                     301- 350   863743             1601-1700     9017
                     351- 400   671739             1701-1800     7156
                     401- 450   564133             1801-1900     5767
                     451- 500   472262             1901-2000     4845
                     501- 550   323478             2001-2100     3950
                     551- 600   248893             2101-2200     4056
                     601- 650   181253             2201-2300     3209
                     651- 700   141174             2301-2400     2523
                     701- 750   121388             2401-2500     2180
                     751- 800   108513                 >2500    19115
                     801- 850    80629
                     851- 900    72944
                     901- 950    50096
                     951-1000    38357
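Because fragments are excluded, the bins above should add up to the total number of entries minus the number of fragments given in the introduction. A quick consistency check, with the bin counts copied from the table in reading order (left column, then right column):

```python
total_entries = 11_109_684  # from the introduction
fragments = 1_863_974       # "Number of fragments" in the introduction

# Bin counts for lengths 1-50 up to 951-1000 (left column of the table).
bins_up_to_1000 = [
    243201, 881197, 1014080, 980207, 981677, 949921, 863743, 671739,
    564133, 472262, 323478, 248893, 181253, 141174, 121388, 108513,
    80629, 72944, 50096, 38357,
]
# Bin counts for lengths 1001-1100 up to >2500 (right column of the table).
bins_above_1000 = [
    66090, 46653, 31876, 21128, 17007, 12253, 9017, 7156,
    5767, 4845, 3950, 4056, 3209, 2523, 2180, 19115,
]

non_fragment_total = sum(bins_up_to_1000) + sum(bins_above_1000)
print(non_fragment_total, total_entries - fragments)
```

Both numbers come out the same, so the size table covers every non-fragment entry exactly once.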
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 321 amino acids.

                    The shortest sequence is Q16047_HUMAN: 4 amino acids.
                    The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
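The average length follows from the two totals given in the introduction. In the sketch below the mean is computed over all entries, fragments included, which is an assumption; on that basis the reported value matches the mean with the fractional part dropped rather than rounded.

```python
total_amino_acids = 3_575_035_695  # from the introduction
total_entries = 11_109_684         # from the introduction

# Assumption: the average is taken over all entries, fragments included.
mean_length = total_amino_acids / total_entries
truncated_mean = total_amino_acids // total_entries  # integer division drops the fraction

print(f"mean length: {mean_length:.2f} amino acids")
print(f"truncated:   {truncated_mean} amino acids")
```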
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------

                    References (RL)                    13474133                1.21
                       Submitted to EMBL/GenBank/DDBJ   7915773   7005057      0.71
                       Journal                          5421677   4912217      0.49
                       Submitted to other databases       29229     29213     <0.01
                       Thesis                              7396      7339     <0.01
                       Book citation                       5159      5108     <0.01
                       Other                              94899     94576      0.01
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 294050
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Comments (CC)                       8636729                0.78
                       CATALYTIC ACTIVITY                796771    728720      0.07     4
                       CAUTION                          2792651   2792651      0.25     1
                       COFACTOR                          242643    235814      0.02     8
                       DOMAIN                              6033      6033     <0.01    10
                       FUNCTION                          909887    840432      0.08     3
                       INTERACTION                         2473      2473     <0.01    11
                       MISCELLANEOUS                      23128     23126     <0.01     9
                       PATHWAY                           321697    294777      0.03     7
                       SIMILARITY                       2598971   2231886      0.23     2
                       SUBCELLULAR LOCATION              602103    602055      0.05     5
                       SUBUNIT                           340372    339882      0.03     6
                    
                    Total number of comment topics: 11
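The "Average per entry" and "Rank" columns of the comments table can be derived from the raw counts. The sketch below assumes the average is the total number of lines divided by the number of entries in the whole database (not by the number of entries carrying that topic) and that the rank orders topics by total line count; both assumptions are inferred from the figures, not stated in the release note.

```python
total_entries = 11_109_684  # entries in release 2010_07

# topic -> (total number of lines, number of entries with at least one such line)
comments = {
    "CATALYTIC ACTIVITY": (796771, 728720),
    "CAUTION": (2792651, 2792651),
    "COFACTOR": (242643, 235814),
    "DOMAIN": (6033, 6033),
    "FUNCTION": (909887, 840432),
    "INTERACTION": (2473, 2473),
    "MISCELLANEOUS": (23128, 23126),
    "PATHWAY": (321697, 294777),
    "SIMILARITY": (2598971, 2231886),
    "SUBCELLULAR LOCATION": (602103, 602055),
    "SUBUNIT": (340372, 339882),
}

total_lines = sum(lines for lines, _ in comments.values())

# Assumed definition: average = total lines / all database entries.
average = {topic: lines / total_entries for topic, (lines, _) in comments.items()}

# Assumed definition: rank 1 is the topic with the most lines.
by_lines = sorted(comments, key=lambda topic: comments[topic][0], reverse=True)
rank = {topic: position for position, topic in enumerate(by_lines, start=1)}

print(f"Comments (CC): {total_lines} lines, {total_lines / total_entries:.2f} per entry")
for topic in by_lines:
    print(f"{rank[topic]:>2}  {topic:<22}{average[topic]:.2f}")
```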
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Features (FT)                       3831536                0.34
                       CHAIN                             413579    325888      0.04     2
                       NON_TER                          3147645   1862453      0.28     1
                       SIGNAL                            269724    269483      0.02     3
                       TRANSIT                              588       588     <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             124719497               11.23
                       AGD                                 3869      3869     <0.01    74   Organism-specific databases
                       ANU-2DPAGE                            58        58     <0.01    92   2D gel databases
                       ArachnoServer                        368       368     <0.01    85   Organism-specific databases
                       ArrayExpress                       94681     94668      0.01    45   Gene expression databases
                       BRENDA                              2911      2842     <0.01    75   Enzyme and pathway databases
                       Bgee                              130459    130360      0.01    42   Gene expression databases
                       BioCyc                            795015    769808      0.07    23   Enzyme and pathway databases
                       CAZy                               36159     33811     <0.01    56   Protein family/group databases
                       CGD                                 6802      6802     <0.01    69   Organism-specific databases
                       COMPLUYEAST-2DPAGE                     5         5     <0.01    96   2D gel databases
                       CTD                               152120    151419      0.01    40   Organism-specific databases
                       CYGD                                   5         5     <0.01    97   Organism-specific databases
                       DIP                                 2583      2578     <0.01    76   Protein-protein interaction databases
                       EMBL                            12361532  11093447      1.11     3   Sequence databases
                       Ensembl                           308580    185263      0.03    31   Genome annotation databases
                       EnsemblBacteria                   478047    448578      0.04    25   Genome annotation databases
                       EnsemblFungi                       78081     78002      0.01    46   Genome annotation databases
                       EnsemblMetazoa                    291110    251394      0.03    32   Genome annotation databases
                       EnsemblPlants                      55107     48781     <0.01    50   Genome annotation databases
                       EnsemblProtists                    15268     15158     <0.01    62   Genome annotation databases
                       EuPathDB                          151376    151376      0.01    41   Organism-specific databases
                       FlyBase                           195465    193934      0.02    36   Organism-specific databases
                       GO                              22492618   6865647      2.02     1   Ontologies
                       Gene3D                           3287573   2779687      0.30    10   Family and domain databases
                       GeneDB_Spombe                          1         1     <0.01   100   Organism-specific databases
                       GeneID                           4690497   4517112      0.42     7   Genome annotation databases
                       Genevestigator                    102757    102746      0.01    44   Gene expression databases
                       GenoList                           14765     14492     <0.01    63   Organism-specific databases
                       GenomeReviews                    3221557   3138390      0.29    11   Genome annotation databases
                       Gramene                            69043     69043      0.01    48   Organism-specific databases
                       H-InvDB                              540       441     <0.01    83   Organism-specific databases
                       HAMAP                             420415    418571      0.04    27   Family and domain databases
                       HGNC                               51482     49623     <0.01    51   Organism-specific databases
                       HOGENOM                          2204370   2204297      0.20    15   Phylogenomic databases
                    HOVERGEN                          320118    319443      0.03    30   Phylogenomic databases                     
                    HSSP                              255091    254814      0.02    33   3D structure databases                     
                    IPI                               202436    202436      0.02    34   Sequence databases                         
                    InParanoid                        197379    197284      0.02    35   Phylogenomic databases                     
                    IntAct                             13480     13480     <0.01    64   Protein-protein interaction databases      
                    InterPro                        21391014   8103081      1.93     2   Family and domain databases                
                    KEGG                             4116222   4021536      0.37     9   Genome annotation databases                
                    LegioList                           5143      5115     <0.01    71   Organism-specific databases                
                    Leproma                              942       941     <0.01    82   Organism-specific databases                
                    MEROPS                             67331     66047      0.01    49   Protein family/group databases             
                    MGI                                37768     37517     <0.01    55   Organism-specific databases                
                    MINT                                4462      4462     <0.01    72   Protein-protein interaction databases      
                    NMPDR                             926522    926511      0.08    22   Genome annotation databases                
                    NextBio                            48259     48256     <0.01    53   Other                                      
                    OMA                              2437800   2437798      0.22    14   Phylogenomic databases                     
                    OrthoDB                           430604    430603      0.04    26   Phylogenomic databases                     
                    PANTHER                          1687712   1590421      0.15    19   Family and domain databases                
                    PDB                                12132      7273     <0.01    66   3D structure databases                     
                    PDBsum                              5296      3104     <0.01    70   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    89   2D gel databases                           
                    PIR                               176353    143491      0.02    39   Sequence databases                         
                    PIRSF                             561068    561068      0.05    24   Family and domain databases                
                    PMAP-CutDB                           262       262     <0.01    86   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    98   2D gel databases                           
                    PRIDE                             104833    104832      0.01    43   Proteomic databases                        
                    PRINTS                           1716526   1506530      0.15    18   Family and domain databases                
                    PROSITE                          5168257   3416392      0.47     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    95   Enzyme and pathway databases               
                    PeptideAtlas                         148       148     <0.01    88   Proteomic databases                        
                    PeroxiBase                          2283      2277     <0.01    79   Protein family/group databases             
                    Pfam                            10243032   7640476      0.92     4   Family and domain databases                
                    PharmGKB                              86        86     <0.01    91   Organism-specific databases                
                    PhosphoSite                         1804      1804     <0.01    80   PTM databases                              
                    PhylomeDB                         373652    373620      0.03    29   Phylogenomic databases                     
                    ProDom                            195251    184366      0.02    38   Family and domain databases                
                    ProMEX                               451       451     <0.01    84   Proteomic databases                        
                    ProtClustDB                      2624972   2624956      0.24    13   Phylogenomic databases                     
                    PseudoCAP                           4349      4346     <0.01    73   Organism-specific databases                
                    REBASE                              7667      7399     <0.01    68   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   96        95     <0.01    90   2D gel databases                           
                    RGD                                18826     18736     <0.01    61   Organism-specific databases                
                    Reactome                              55        53     <0.01    93   Enzyme and pathway databases               
                    RefSeq                           4832969   4646639      0.44     6   Sequence databases                         
                    SGD                                  249       249     <0.01    87   Organism-specific databases                
                    SMART                            2110013   1645149      0.19    16   Family and domain databases                
                    SMR                              3054796   3054787      0.27    12   3D structure databases                     
                    STRING                           1204211   1204067      0.11    20   Protein-protein interaction databases      
                    SUPFAM                           4398366   3612749      0.40     8   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    94   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01    99   2D gel databases                           
                    TAIR                               19284     19201     <0.01    60   Organism-specific databases                
                    TCDB                                2284      2265     <0.01    78   Protein family/group databases             
                    TIGR                              195268    188215      0.02    37   Genome annotation databases                
                    TIGRFAMs                         2026815   1854332      0.18    17   Family and domain databases                
                    TubercuList                         2307      2301     <0.01    77   Organism-specific databases                
                    UCSC                               50973     50973     <0.01    52   Genome annotation databases                
                    UniGene                           396689    363699      0.04    28   Sequence databases                         
                    VectorBase                         47589     47121     <0.01    54   Genome annotation databases                
                    World-2DPAGE                         947       942     <0.01    81   2D gel databases                           
                    WormBase                           19514     19418     <0.01    59   Organism-specific databases                
                    WormPep                            19522     19418     <0.01    58   Organism-specific databases                
                    Xenbase                            12889     12520     <0.01    65   Organism-specific databases                
                    ZFIN                               20279     20274     <0.01    57   Organism-specific databases                
                    dictyBase                           8170      8169     <0.01    67   Organism-specific databases
                    eggNOG                           1150045   1150045      0.10    21   Phylogenomic databases
                    euHCVdb                            71270     71267      0.01    47   Organism-specific databases
                    
                    Number of explicitly cross-referenced databases: 126
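
The per-entry figures in the table above can be checked with a short script. This is a minimal sketch, not part of the release tooling. It assumes that the first numeric column is the number of cross-references, the second is the number of entries carrying at least one such cross-reference, the third is the number of cross-references divided by the total entry count given in the introduction (11109684), and the fourth is the rank by number of cross-references. The names `parse_row` and `refs_per_entry` are hypothetical helpers introduced only for this illustration.

```python
# Total number of entries in this release, as stated in the introduction.
TOTAL_ENTRIES = 11109684


def parse_row(line):
    """Split one table row into (database, cross_refs, entries, rank).

    Assumes whitespace-separated columns and a database name without spaces.
    The third numeric column may be the string '<0.01', so it is not parsed.
    """
    fields = line.split()
    name = fields[0]
    cross_refs = int(fields[1])
    entries = int(fields[2])
    rank = int(fields[4])
    return name, cross_refs, entries, rank


def refs_per_entry(cross_refs, total=TOTAL_ENTRIES):
    """Average number of cross-references per database entry (assumed meaning)."""
    return cross_refs / total


# Three rows copied from the table above.
rows = [
    "EMBL   12361532  11093447  1.11  3  Sequence databases",
    "GO     22492618   6865647  2.02  1  Ontologies",
    "Pfam   10243032   7640476  0.92  4  Family and domain databases",
]

for row in rows:
    name, cross_refs, entries, rank = parse_row(row)
    print(f"{name:<6} rank {rank}  {refs_per_entry(cross_refs):.2f}")
```

Under these assumptions the computed ratios for EMBL, GO and Pfam are 1.11, 2.02 and 0.92, which agree with the values printed in the table.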
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.56   Gln (Q) 3.88   Leu (L) 9.81   Ser (S) 6.71
                    Arg (R) 5.45   Glu (E) 6.15   Lys (K) 5.32   Thr (T) 5.61
                    Asn (N) 4.18   Gly (G) 7.08   Met (M) 2.45   Trp (W) 1.31
                    Asp (D) 5.29   His (H) 2.19   Phe (F) 4.04   Tyr (Y) 3.07
                    Cys (C) 1.28   Ile (I) 6.03   Pro (P) 4.72   Val (V) 6.72
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.04
                    
                    
                    
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Lys, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
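
The ordering in 5.2 follows directly from the percentages in 5.1. As a minimal sketch (the variable names `composition` and `ranked` are chosen here for illustration), sorting the twenty standard residues by descending percentage reproduces the list:

```python
# Percent composition copied from section 5.1 (standard residues only).
composition = {
    "Ala": 8.56, "Gln": 3.88, "Leu": 9.81, "Ser": 6.71,
    "Arg": 5.45, "Glu": 6.15, "Lys": 5.32, "Thr": 5.61,
    "Asn": 4.18, "Gly": 7.08, "Met": 2.45, "Trp": 1.31,
    "Asp": 5.29, "His": 2.19, "Phe": 4.04, "Tyr": 3.07,
    "Cys": 1.28, "Ile": 6.03, "Pro": 4.72, "Val": 6.72,
}

# Sort residue names from most to least frequent.
ranked = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranked))
```

No two residues share the same percentage in this release, so the order is unambiguous.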
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 323631
                    Total number of entries encoded on a Plasmid: 165939
                    Total number of entries encoded on a Plastid: 9837
                    Total number of entries encoded on a Plastid; Apicoplast: 334
                    Total number of entries encoded on a Plastid; Chloroplast: 115631
                    Total number of entries encoded on a Plastid; Cyanelle: 7
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 441