                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2010_10 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2010_10 of 05-Oct-2010 of UniProtKB/TrEMBL contains 12098541 sequence entries,
                    comprising 3892640047 amino acids.
                    
                    524827 sequences have been added since release 2010_09, the sequence data of
                    1496 existing entries has been updated and the annotations of
                    4079550 entries have been revised. This represents an increase of 4%.
                    
                    Number of fragments: 2048964
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           41366     0.34%
                    2: Evidence at transcript level       471823     3.90%
                    3: Inferred from homology            2632443    21.76%
                    4: Predicted                         8952909    74.00%
                    5: Uncertain                               0     0.00%
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 284773
                    
                    The first twenty species represent 1219160 sequences, or 10.1% of the
                    total number of entries.
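
As a quick consistency check, the top-twenty figure above can be reproduced from the twenty largest frequencies listed in section 2.2. The sketch below is illustrative only: the variable names are not part of the release notes, and the total entry count is the one stated in the introduction.

```python
# Frequencies of the twenty most represented species, copied from table 2.2.
top_twenty = [
    347203, 95434, 75476, 57976, 50816,
    50404, 49355, 47727, 44035, 41939,
    41551, 41188, 39843, 39316, 35129,
    33638, 33195, 32624, 31268, 31043,
]
total_entries = 12098541  # entry count stated in the introduction

top_twenty_total = sum(top_twenty)
share = 100 * top_twenty_total / total_entries

print(top_twenty_total)   # 1219160
print(f"{share:.1f}%")    # 10.1%
```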
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented       1x: 12715
                                              2x: 51250
                                              3x: 27274
                                              4x: 16131
                                              5x:  9962
                                              6x:  7197
                                              7x:  5023
                                              8x:  3946
                                              9x:  3147
                                             10x:  5844
                                         11- 20x: 15988
                                         21- 50x:  5592
                                         51-100x:  2063
                                           >100x:  4202
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     347203  Human immunodeficiency virus 1
                    2      95434  Oryza sativa subsp. japonica (Rice)
                    3      75476  Homo sapiens (Human)
                    4      57976  Hepatitis C virus
                    5      50816  Vitis vinifera (Grape)
                    6      50404  Trichomonas vaginalis
                    7      49355  Mus musculus (Mouse)
                    8      47727  uncultured bacterium
                    9      44035  Populus trichocarpa (Western balsam poplar) 
                    10      41939  Zea mays (Maize)
                    11      41551  Arabidopsis thaliana (Mouse-ear cress)
                    12      41188  Hepatitis B virus (HBV)
                    13      39843  Paramecium tetraurelia
                    14      39316  Oryza sativa subsp. indica (Rice)
                    15      35129  Physcomitrella patens (Moss)
                    16      33638  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    17      33195  Selaginella moellendorffii (Spikemoss)
                    18      32624  Arabidopsis lyrata subsp. lyrata
                    19      31268  Ricinus communis (Castor bean)
                    20      31043  Drosophila melanogaster (Fruit fly)
                    21      29117  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    22      28089  Tetraodon nigroviridis (Green puffer)
                    23      26772  Danio rerio (Zebrafish) (Brachydanio rerio)
                    24      25201  Ralstonia solanacearum (Pseudomonas solanacearum)
                    25      24812  Nematostella vectensis (Starlet sea anemone)
                    26      23490  Rattus norvegicus (Rat)
                    27      23115  Perkinsus marinus ATCC 50983
                    28      21239  Caenorhabditis elegans
                    29      21083  Ixodes scapularis (Black-legged tick) (Deer tick)
                    30      20673  Trypanosoma cruzi
                    31      18871  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    32      18079  Caenorhabditis briggsae
                    33      17964  Escherichia coli
                    34      17927  Drosophila simulans (Fruit fly)
                    35      17855  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    36      17797  Ailuropoda melanoleuca (Giant panda)
                    37      17607  Phytophthora infestans T30-4
                    38      17163  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    39      16968  Tribolium castaneum (Red flour beetle)
                    40      16890  Drosophila yakuba (Fruit fly)
                    41      16745  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    42      16710  Drosophila persimilis (Fruit fly)
                    43      16366  Ectocarpus siliculosus (Brown alga)
                    44      16251  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    45      16183  Drosophila sechellia (Fruit fly)
                    46      15946  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    47      15871  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    48      15718  Naegleria gruberi (Amoeba)
                    49      15669  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    50      15422  Drosophila willistoni (Fruit fly)
                    51      15250  Tetrahymena thermophila SB210
                    52      15143  Drosophila ananassae (Fruit fly)
                    53      14926  Drosophila erecta (Fruit fly)
                    54      14813  Chlamydomonas reinhardtii
                    55      14779  Drosophila mojavensis (Fruit fly)
                    56      14764  Anopheles gambiae (African malaria mosquito)
                    57      14697  Drosophila virilis (Fruit fly)
                    58      14673  Plasmodium chabaudi
                    59      14656  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    60      14634  Volvox carteri f. nagariensis
                    61      14267  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    62      13621  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    63      13496  Schistosoma mansoni (Blood fluke)
                    64      13368  Aspergillus flavus 
                    65      13293  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    66      13128  Schizophyllum commune H4-8
                    67      12971  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    68      12802  Giardia lamblia (Giardia intestinalis)
                    69      12721  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    70      12710  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
                    71      12496  Glycine max (Soybean) (Glycine hispida)
                    72      12490  Xenopus laevis (African clawed frog)
                    73      12447  Polysphondylium pallidum (Cellular slime mold)
                    74      12027  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    75      12008  Hepatitis C virus subtype 1b
                    76      11855  Aspergillus oryzae
                    77      11800  Plasmodium berghei
                    78      11664  Plasmodium falciparum
                    79      11569  Trichoplax adhaerens
                    80      11498  Brugia malayi (Filarial nematode worm)
                    81      11211  Ktedonobacter racemifer DSM 44963
                    82      10936  Sordaria macrospora
                    83      10916  Schistosoma japonicum (Blood fluke)
                    84      10864  Chaetomium globosum (Soil fungus)
                    85      10674  Podospora anserina
                    86      10440  Picea sitchensis (Sitka spruce)
                    87      10414  Neurospora crassa
                    88      10401  Aspergillus nidulans FGSC A4
                    89      10392  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    90      10331  Phaeodactylum tricornutum CCAP 1055/1
                    91      10279  Micromonas pusilla CCMP1545
                    92      10228  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    93      10188  Helicobacter pylori (Campylobacter pylori)
                    94      10120  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    95      10115  Micromonas sp. RCC299
                    96      10106  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    97      10061  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    98      10019  Streptomyces bingchenggensis (strain BCW-1)
                    99       9887  Bos taurus (Bovine)
                    100       9860  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    101       9749  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    102       9664  Trypanosoma brucei gambiense DAL972
                    103       9627  Cryptococcus neoformans (Filobasidiella neoformans)
                    104       9568  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    105       9557  Aspergillus fumigatus (Sartorya fumigata)
                    106       9535  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    107       9520  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    108       9484  Trypanosoma brucei
                    109       9359  Salmo salar (Atlantic salmon)
                    110       9243  Monosiga brevicollis (Choanoflagellate)
                    111       9243  Plasmodium vivax
                    112       9241  Candida albicans (Yeast)
                    113       9204  Amycolatopsis mediterranei (strain U-32)
                    114       9190  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    115       9189  Emericella nidulans (Aspergillus nidulans)
                    116       9177  Streptomyces hygroscopicus ATCC 53653
                    117       9169  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    118       9156  Rabies virus
                    119       9118  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    120       9092  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    121       9001  Dictyostelium discoideum (Slime mold)
                    122       8977  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    123       8964  Thalassiosira pseudonana (Marine diatom)
                    124       8949  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    125       8908  Catenulispora acidiphila 
                    126       8866  Aspergillus clavatus
                    127       8768  Rhodococcus sp. (strain RHA1)
                    128       8743  Toxoplasma gondii
                    129       8715  Paracoccidioides brasiliensis (strain Pb18)
                    130       8699  Nannizzia otae (strain CBS 113480) (Microsporum canis) (Arthroderma otae)
                    131       8695  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    132       8603  Entamoeba dispar SAW760
                    133       8523  Stigmatella aurantiaca DW4/3-1
                    134       8437  Plesiocystis pacifica SIR-1
                    135       8394  Streptomyces sp. AA4
                    136       8299  Entamoeba histolytica
                    137       8249  Microscilla marina ATCC 23134
                    138       8237  Leishmania major
                    139       8220  Bradyrhizobium japonicum
                    140       8202  Streptomyces sviceus ATCC 29083
                    141       8201  Microcoleus chthonoplastes PCC 7420
                    142       8164  Frankia sp. EUN1f
                    143       8154  Burkholderia xenovorans (strain LB400)
                    144       8053  Pseudomonas aeruginosa
                    145       8026  Leishmania infantum
                    146       7989  Trichophyton verrucosum (strain HKI 0517)
                    147       7978  Toxoplasma gondii ME49
                    148       7956  Ostreococcus tauri
                    149       7947  Rhodococcus opacus (strain B4)
                    150       7940  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    151       7916  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    152       7888  Leishmania braziliensis
                    153       7867  Streptomyces ghanaensis ATCC 14672
                    154       7857  Acaryochloris marina (strain MBIC 11017)
                    155       7855  Paracoccidioides brasiliensis (strain Pb03)
                    156       7836  Toxoplasma gondii VEG
                    157       7823  Burkholderia sp. Ch1-1
                    158       7811  Plasmodium yoelii yoelii
                    159       7743  Uncinocarpus reesii (strain UAMH 1704)
                    160       7708  Streptomyces viridochromogenes DSM 40736
                    161       7571  Clostridium hathewayi DSM 13479
                    162       7563  Burkholderia pseudomallei MSHR346
                    163       7528  Streptomyces sp. C
                    164       7523  Streptomyces lividans TK24
                    165       7519  Solibacter usitatus (strain Ellin6076)
                    166       7500  Tuber melanosporum (Perigord truffle)
                    167       7485  Streptomyces coelicolor
                    168       7475  Burkholderia pseudomallei 1710a
                    169       7465  Burkholderia pseudomallei Pakistan 9
                    170       7459  Burkholderia sp. H160
                    171       7392  Ostreococcus lucimarinus (strain CCE9901)
                    172       7379  Streptomyces sp. ACT-1
                    173       7367  Burkholderia pseudomallei 576
                    174       7349  Burkholderia pseudomallei 305
                    175       7337  Streptomyces clavuligerus ATCC 27064
                    176       7310  Frankia sp. EuI1c
                    177       7274  Clostridium bolteae ATCC BAA-613
                    178       7243  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    179       7232  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    180       7230  Streptomyces avermitilis
                    181       7205  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    182       7196  Medicago truncatula (Barrel medic)
                    183       7179  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    184       7140  Burkholderia pseudomallei 1106b
                    185       7132  Burkholderia phymatum (strain DSM 17167 / STM815)
                    186       7124  Burkholderia ambifaria MEX-5
                    187       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    188       7017  Myxococcus xanthus (strain DK 1622)
                    189       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    190       6978  Rhodopirellula baltica
                    191       6963  Frankia sp. (strain EAN1pec)
                    192       6943  Streptomyces sp. Mg1
                    193       6936  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    194       6923  Burkholderia ambifaria IOP40-10
                    195       6909  Saccharopolyspora erythraea (strain NRRL 23338)
                    196       6907  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    197       6892  Streptomyces roseosporus NRRL 15998
                    198       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    199       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    200       6866  Burkholderia sp. (strain CCGE1002)
                    201       6866  Streptomyces pristinaespiralis ATCC 25486
                    202       6859  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    203       6818  Rhizobium loti (Mesorhizobium loti)
                    204       6817  Clostridium asparagiforme DSM 15981
                    205       6772  Burkholderia pseudomallei (strain 1106a)
                    206       6740  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    207       6740  Burkholderia pseudomallei (strain 668)
                    208       6725  Burkholderia graminis C4D1M
                    209       6716  Sus scrofa (Pig)
                    210       6714  Rhizobium leguminosarum bv. viciae (strain 3841)
                    211       6712  Rhodococcus erythropolis SK121
                    212       6712  Hepatitis C virus subtype 1a
                    213       6705  Chthoniobacter flavus Ellin428
                    214       6702  Streptomyces flavogriseus ATCC 33331
                    215       6692  Bacillus thuringiensis IBL 200
                    216       6690  delta proteobacterium NaphS2
                    217       6684  Haliangium ochraceum (strain DSM 14365 / JCM 11303 / SMP-2)
                    218       6679  Mesorhizobium opportunistum WSM2075
                    219       6662  Burkholderia pseudomallei S13
                    220       6657  Burkholderia cepacia (strain J2315 / LMG 16656) (Burkholderia cenocepacia 
                    221       6655  Bacillus thuringiensis IBL 4222
                    222       6647  Streptococcus pneumoniae
                    223       6644  Beggiatoa sp. PS
                    224       6627  Burkholderia cenocepacia (strain MC0-3)
                    225       6614  Burkholderia multivorans CGD2
                    226       6613  Burkholderia pseudomallei Pasteur 52237
                    227       6606  Burkholderia multivorans CGD2M
                    228       6583  Bacillus thuringiensis serovar sotto str. T04001
                    229       6532  uncultured archaeon
                    230       6527  Burkholderia multivorans CGD1
                    231       6521  Streptomyces sp. ACTE
                    232       6509  Frankia alni (strain ACN14a)
                    233       6498  bacterium Ellin514
                    234       6497  Burkholderia cenocepacia (strain HI2424)
                    235       6488  Bacillus thuringiensis serovar monterrey BGSC 4AJ1
                    236       6463  Planctomyces maris DSM 8797
                    237       6453  Mycobacterium parascrofulaceum ATCC BAA-614
                    238       6427  Agrobacterium radiobacter (strain K84 / ATCC BAA-868)
                    239       6417  Methylobacterium sp. (strain 4-46)
                    240       6413  Cyanothece sp. CCY0110
                    241       6388  Ustilago maydis (Smut fungus)
                    242       6388  Bradyrhizobium sp. (strain ORS278)
                    243       6377  Clostridium carboxidivorans P7
                    244       6376  Stackebrandtia nassauensis 
                    245       6360  Rhizobium meliloti (Sinorhizobium meliloti)
                    246       6347  Micromonospora sp. L5
                    247       6337  Streptomyces griseoflavus Tu4000
                    248       6336  Burkholderia ambifaria (strain MC40-6)
                    249       6335  Streptomyces sp. SPB78
                    250       6322  Bacillus thuringiensis serovar thuringiensis str. T01001
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          228669 (  2%)
                    Bacteria        7517963 ( 62%)
                    Eukaryota       3337363 ( 28%)
                    Viruses          998108 (  8%)
                    Other             16437 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  75511 (  2%)           (  1%)
                    Other Mammalia        200355 (  6%)           (  2%)
                    Other Vertebrata      308697 (  9%)           (  3%)
                    Viridiplantae         826934 ( 25%)           (  7%)
                    Fungi                 669234 ( 20%)           (  6%)
                    Insecta               491090 ( 15%)           (  4%)
                    Nematoda               61963 (  2%)           (  1%)
                    Other                 703579 ( 21%)           (  6%)
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To   Number         From   To   Number
                       1-  50   259709         1001-1100    71194
                      51- 100   957161         1101-1200    50240
                     101- 150  1100782         1201-1300    34372
                     151- 200  1065407         1301-1400    22698
                     201- 250  1069112         1401-1500    18236
                     251- 300  1035682         1501-1600    13083
                     301- 350   941138         1601-1700     9795
                     351- 400   731546         1701-1800     7705
                     401- 450   615422         1801-1900     6178
                     451- 500   515155         1901-2000     5202
                     501- 550   352124         2001-2100     4202
                     551- 600   270531         2101-2200     4359
                     601- 650   196048         2201-2300     3454
                     651- 700   152713         2301-2400     2715
                     701- 750   131418         2401-2500     2360
                     751- 800   118254             >2500    20395
                     801- 850    87020
                     851- 900    78770
                     901- 950    54057
                     951-1000    41340
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 321 amino acids.

                    The shortest sequence is Q16047_HUMAN: 4 amino acids.
                    The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
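
The average length quoted above can be related to the totals given in the introduction. The following sketch assumes the average is taken over all entries (fragments included) and that the reported value is the integer part of the quotient; both are inferences from the numbers, not statements made in the release notes.

```python
total_amino_acids = 3892640047  # from the introduction
total_entries = 12098541        # from the introduction

exact_average = total_amino_acids / total_entries       # floating-point mean
truncated_average = total_amino_acids // total_entries  # integer part only

print(f"{exact_average:.2f}")
print(truncated_average)  # 321, the value reported above
```

Under these assumptions the reported 321 corresponds to truncation rather than rounding, since rounding the exact mean to the nearest integer would give 322.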
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------

                    References (RL)                    14788683                 1.22
                      Submitted to EMBL/GenBank/DDBJ    8782757   7735142       0.73
                      Journal                           5864260   5294931       0.48
                      Submitted to other databases        33169     33138      <0.01
                      Thesis                               7505      7447      <0.01
                      Book citation                        5233      5182      <0.01
                      Other                               95759     95436       0.01
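
The "Average per entry" column in the table above is consistent with dividing each line total by the number of entries in the whole release (12098541), rounding to two decimals, and printing "<0.01" when the rounded value would be 0.00. The sketch below illustrates that reading; the function name and the formatting rule are assumptions inferred from the figures, not a documented procedure.

```python
DB_ENTRIES = 12098541  # total entries in release 2010_10

def average_per_entry(total_lines, db_entries=DB_ENTRIES):
    """Format lines-per-entry the way the table appears to (assumed rule)."""
    rounded = f"{total_lines / db_entries:.2f}"
    # Values that round to 0.00 are shown as "<0.01" in the table.
    return "<0.01" if rounded == "0.00" else rounded

reference_totals = {
    "References (RL)": 14788683,
    "Submitted to EMBL/GenBank/DDBJ": 8782757,
    "Journal": 5864260,
    "Submitted to other databases": 33169,
    "Other": 95759,
}

for line_type, total in reference_totals.items():
    print(f"{line_type:<33}{total:>10}  {average_per_entry(total):>6}")
```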
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 299422
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Comments (CC)                      10358602                 0.86
                      CATALYTIC ACTIVITY                 989416    920195       0.08     4
                      CAUTION                           2853600   2853600       0.24     2
                      COFACTOR                           309195    298288       0.03     8
                      DOMAIN                              13828     12217      <0.01    10
                      FUNCTION                          1189791   1100982       0.10     3
                      INTERACTION                          5034      5034      <0.01    11
                      MISCELLANEOUS                       32360     32353      <0.01     9
                      PATHWAY                            472134    437286       0.04     6
                      SIMILARITY                        3262610   2850831       0.27     1
                      SUBCELLULAR LOCATION               809930    808936       0.07     5
                      SUBUNIT                            420704    418707       0.03     7
                    
                    Total number of comment topics: 11
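
The Rank column for the comment topics appears to order topics by their total number of lines rather than by the number of entries carrying them (ranking by entries would place CAUTION ahead of SIMILARITY). The sketch below reproduces the ranking under that assumption; the helper name is hypothetical.

```python
# Total number of lines per comment topic, copied from the table above.
comment_totals = {
    "CATALYTIC ACTIVITY": 989416,
    "CAUTION": 2853600,
    "COFACTOR": 309195,
    "DOMAIN": 13828,
    "FUNCTION": 1189791,
    "INTERACTION": 5034,
    "MISCELLANEOUS": 32360,
    "PATHWAY": 472134,
    "SIMILARITY": 3262610,
    "SUBCELLULAR LOCATION": 809930,
    "SUBUNIT": 420704,
}

def rank_by_total(totals):
    """Map each topic to its 1-based rank, largest total first."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {topic: position for position, topic in enumerate(ordered, start=1)}

ranks = rank_by_total(comment_totals)
for topic in comment_totals:
    print(f"{topic:<22}{ranks[topic]:>3}")
```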
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Features (FT)                       4171129                 0.34
                      CHAIN                              435724    341534       0.04     2
                      NON_TER                           3448296   2047482       0.29     1
                      SIGNAL                             286519    286054       0.02     3
                      TRANSIT                               590       590      <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             140559151               11.62                                                    
                    AGD                                 2579      2579     <0.01    76   Organism-specific databases                
                    ANU-2DPAGE                            56        56     <0.01    92   2D gel databases                           
                    ArachnoServer                        278       278     <0.01    85   Organism-specific databases                
                    ArrayExpress                       94116     94103      0.01    48   Gene expression databases                  
                    BRENDA                              2879      2813     <0.01    74   Enzyme and pathway databases               
                    Bgee                              129616    129514      0.01    44   Gene expression databases                  
                    BioCyc                           1624484   1589889      0.13    21   Enzyme and pathway databases               
                    CAZy                               74811     70297      0.01    49   Protein family/group databases             
                    CGD                                 6785      6785     <0.01    71   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     5         5     <0.01    96   2D gel databases                           
                    CTD                               169132    168214      0.01    42   Organism-specific databases                
                    CYGD                                   5         5     <0.01    97   Organism-specific databases                
                    DIP                                 2713      2708     <0.01    75   Protein-protein interaction databases      
                    EMBL                            13511311  12082372      1.12     3   Sequence databases                         
                    Ensembl                           321980    187001      0.03    31   Genome annotation databases                
                    EnsemblBacteria                   501820    471927      0.04    27   Genome annotation databases                
                    EnsemblFungi                       98137     98028      0.01    47   Genome annotation databases                
                    EnsemblMetazoa                    296741    252026      0.02    33   Genome annotation databases                
                    EnsemblPlants                     208472    193412      0.02    37   Genome annotation databases                
                    EnsemblProtists                    24295     24116     <0.01    59   Genome annotation databases                
                    EuPathDB                          151353    151353      0.01    43   Organism-specific databases                
                    FlyBase                           195335    193805      0.02    39   Organism-specific databases                
                    GO                              23199284   7416221      1.92     2   Ontologies                                 
                    Gene3D                           3806516   3225529      0.31    11   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01   100   Organism-specific databases                
                    GeneID                           5213822   5100885      0.43     7   Genome annotation databases                
                    Genevestigator                    101781    101771      0.01    46   Gene expression databases                  
                    GenoList                           14760     14487     <0.01    64   Organism-specific databases                
                    GenomeReviews                    3459499   3374416      0.29    12   Genome annotation databases                
                    Gramene                            68922     68922      0.01    51   Organism-specific databases                
                    H-InvDB                              534       437     <0.01    83   Organism-specific databases                
                    HAMAP                             579439    576893      0.05    26   Family and domain databases                
                    HGNC                               61491     59797      0.01    53   Organism-specific databases                
                    HOGENOM                          2203238   2203106      0.18    17   Phylogenomic databases                     
                    HOVERGEN                          320452    318735      0.03    32   Phylogenomic databases                     
                    HSSP                              254430    254156      0.02    34   3D structure databases                     
                    IPI                               227753    227753      0.02    36   Sequence databases                         
                    InParanoid                        196626    196531      0.02    38   Phylogenomic databases                     
                    IntAct                             15686     15686     <0.01    63   Protein-protein interaction databases      
                    InterPro                        24668310   9332106      2.04     1   Family and domain databases                
                    KEGG                             4410639   4319089      0.36     9   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    72   Organism-specific databases                
                    Leproma                              939       938     <0.01    82   Organism-specific databases                
                    MEROPS                             66006     64780      0.01    52   Protein family/group databases             
                    MGI                                42859     42845     <0.01    57   Organism-specific databases                
                    MINT                                9157      9157     <0.01    68   Protein-protein interaction databases      
                    NMPDR                             924586    924575      0.08    24   Genome annotation databases                
                    NextBio                            47877     47874     <0.01    55   Other                                      
                    OMA                              2431705   2431703      0.20    14   Phylogenomic databases                     
                    OrthoDB                           429819    429818      0.04    28   Phylogenomic databases                     
                    PANTHER                          1975661   1862822      0.16    20   Family and domain databases                
                    PDB                                12849      7697     <0.01    66   3D structure databases                     
                    PDBsum                             12603      7535     <0.01    67   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    89   2D gel databases                           
                    PIR                               175899    143047      0.01    41   Sequence databases                         
                    PIRSF                             657347    657347      0.05    25   Family and domain databases                
                    PMAP-CutDB                           256       256     <0.01    86   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    98   2D gel databases                           
                    PRIDE                             104290    104288      0.01    45   Proteomic databases                        
                    PRINTS                           1985928   1755945      0.16    19   Family and domain databases                
                    PROSITE                          5976206   3991218      0.49     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    95   Enzyme and pathway databases               
                    PeptideAtlas                         148       148     <0.01    88   Proteomic databases                        
                    PeroxiBase                          2464      2456     <0.01    77   Protein family/group databases             
                    Pfam                            11961805   8906319      0.99     4   Family and domain databases                
                    PharmGKB                              85        85     <0.01    91   Organism-specific databases                
                    PhosphoSite                         1803      1803     <0.01    80   PTM databases                              
                    PhylomeDB                         373619    373587      0.03    30   Phylogenomic databases                     
                    ProDom                            237982    223400      0.02    35   Family and domain databases                
                    ProMEX                               443       443     <0.01    84   Proteomic databases                        
                    ProtClustDB                      2624165   2624149      0.22    13   Phylogenomic databases                     
                    ProteinModelPortal               4077545   4077060      0.34    10   3D structure databases                     
                    PseudoCAP                           4346      4343     <0.01    73   Organism-specific databases                
                    REBASE                              8260      7895     <0.01    69   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   96        95     <0.01    90   2D gel databases                           
                    RGD                                17538     17369     <0.01    62   Organism-specific databases                
                    Reactome                              56        53     <0.01    93   Enzyme and pathway databases               
                    RefSeq                           5230225   5105743      0.43     6   Sequence databases                         
                    SGD                                  247       247     <0.01    87   Organism-specific databases                
                    SMART                            2389410   1859391      0.20    15   Family and domain databases                
                    SMR                              2111997   2111738      0.17    18   3D structure databases                     
                    STRING                           1205697   1205548      0.10    22   Protein-protein interaction databases      
                    SUPFAM                           4807351   3984807      0.40     8   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    94   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01    99   2D gel databases                           
                    TAIR                               18933     18853     <0.01    61   Organism-specific databases                
                    TCDB                                2343      2334     <0.01    78   Protein family/group databases             
                    TIGR                              195146    188098      0.02    40   Genome annotation databases                
                    TIGRFAMs                         2374504   2166021      0.20    16   Family and domain databases                
                    TubercuList                         2257      2252     <0.01    79   Organism-specific databases                
                    UCSC                               50698     50698     <0.01    54   Genome annotation databases                
                    UniGene                           428090    396206      0.04    29   Sequence databases                         
                    VectorBase                         47574     47106     <0.01    56   Genome annotation databases                
                    World-2DPAGE                         947       942     <0.01    81   2D gel databases                           
                    WormBase                           41259     41132     <0.01    58   Organism-specific databases                
                    Xenbase                            12902     12878     <0.01    65   Organism-specific databases                
                    ZFIN                               21561     21556     <0.01    60   Organism-specific databases                
                    dictyBase                           8159      8158     <0.01    70   Organism-specific databases                
                    eggNOG                           1147794   1147794      0.09    23   Phylogenomic databases                     
                    euHCVdb                            72340     72337      0.01    50   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 126
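
The rows of the table above can be read mechanically. The following is a minimal Python sketch, not part of the release tooling. It assumes that the first numeric column is the total number of cross-references to the database, the second the number of entries carrying at least one such cross-reference, the third the first column divided by the total number of entries (12098541, from the introduction), and the fourth the rank by the first column. These column meanings are inferred from the figures, and the names XrefRow, parse_row and expected_ratio are hypothetical.

```python
import re
from typing import NamedTuple

# Total number of entries, taken from the introduction of this release.
TOTAL_ENTRIES = 12098541


class XrefRow(NamedTuple):
    database: str
    xrefs: int      # assumed: total cross-references to the database
    entries: int    # assumed: entries with at least one such cross-reference
    per_entry: str  # printed ratio, kept as text because of "<0.01"
    rank: int
    category: str   # may be empty when the column is blank


# Name, two integers, a ratio (optionally "<"-prefixed), a rank, then free text.
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+(<?\d+\.\d+)\s+(\d+)\s*(.*?)\s*$")


def parse_row(line: str) -> XrefRow:
    """Split one fixed-width table row into typed fields."""
    match = ROW.match(line)
    if match is None:
        raise ValueError(f"not a table row: {line!r}")
    name, xrefs, entries, per_entry, rank, category = match.groups()
    return XrefRow(name, int(xrefs), int(entries), per_entry, int(rank), category)


def expected_ratio(row: XrefRow) -> str:
    """Recompute the ratio column under the assumption stated above."""
    text = f"{row.xrefs / TOTAL_ENTRIES:.2f}"
    return "<0.01" if text == "0.00" else text


sample = [
    "EMBL     13511311  12082372      1.12     3   Sequence databases",
    "GO       23199284   7416221      1.92     2   Ontologies",
    "BRENDA       2879      2813     <0.01    74   Enzyme and pathway databases",
    "HGNC        61491     59797      0.01    53   Organism-specific databases",
]

rows = [parse_row(line) for line in sample]
for row in rows:
    print(row.database, row.per_entry, expected_ratio(row))
```

For the four sample rows the recomputed ratio agrees with the printed one, which is consistent with the assumed reading of the columns but does not prove it.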
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.62   Gln (Q) 3.85   Leu (L) 9.83   Ser (S) 6.70
                    Arg (R) 5.47   Glu (E) 6.13   Lys (K) 5.26   Thr (T) 5.61
                    Asn (N) 4.15   Gly (G) 7.12   Met (M) 2.46   Trp (W) 1.31
                    Asp (D) 5.29   His (H) 2.19   Phe (F) 4.03   Tyr (Y) 3.06
                    Cys (C) 1.27   Ile (I) 6.01   Pro (P) 4.74   Val (V) 6.74
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
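
The percentages in 5.1 can be turned into approximate residue counts by multiplying them with the total number of amino acids given in the introduction (3892640047). The sketch below is illustrative only; the names parse_composition and estimated_counts are hypothetical, and because the percentages are rounded to two decimals, each estimate is only accurate to roughly 0.005% of the total (about 195000 residues).

```python
import re

# Total number of amino acids, taken from the introduction of this release.
TOTAL_AMINO_ACIDS = 3892640047

COMPOSITION_TEXT = """
Ala (A) 8.62   Gln (Q) 3.85   Leu (L) 9.83   Ser (S) 6.70
Arg (R) 5.47   Glu (E) 6.13   Lys (K) 5.26   Thr (T) 5.61
Asn (N) 4.15   Gly (G) 7.12   Met (M) 2.46   Trp (W) 1.31
Asp (D) 5.29   His (H) 2.19   Phe (F) 4.03   Tyr (Y) 3.06
Cys (C) 1.27   Ile (I) 6.01   Pro (P) 4.74   Val (V) 6.74
"""

# One cell looks like "Ala (A) 8.62": three-letter name, one-letter code, percent.
CELL = re.compile(r"([A-Z][a-z]{2}) \(([A-Z])\) (\d+\.\d+)")


def parse_composition(text: str) -> dict[str, float]:
    """Map one-letter codes to the percentage printed in the table."""
    return {code: float(pct) for _name, code, pct in CELL.findall(text)}


def estimated_counts(composition: dict[str, float], total: int) -> dict[str, int]:
    """Approximate residue counts from rounded percentages."""
    return {code: round(total * pct / 100) for code, pct in composition.items()}


composition = parse_composition(COMPOSITION_TEXT)
counts = estimated_counts(composition, TOTAL_AMINO_ACIDS)
for code in sorted(counts):
    print(code, composition[code], counts[code])
```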
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
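
The ordering in 5.2 follows from sorting the percentages of 5.1 in descending order. A short Python sketch of that step (the variable names are illustrative):

```python
# Percentages copied from the table in 5.1.
COMPOSITION = {
    "Ala": 8.62, "Gln": 3.85, "Leu": 9.83, "Ser": 6.70,
    "Arg": 5.47, "Glu": 6.13, "Lys": 5.26, "Thr": 5.61,
    "Asn": 4.15, "Gly": 7.12, "Met": 2.46, "Trp": 1.31,
    "Asp": 5.29, "His": 2.19, "Phe": 4.03, "Tyr": 3.06,
    "Cys": 1.27, "Ile": 6.01, "Pro": 4.74, "Val": 6.74,
}

# Most frequent residue first; all twenty values are distinct, so no ties arise.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranking))
```

The printed order matches the list given above, from Leu down to Cys.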
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 394382
                    Total number of entries encoded on a Plasmid: 176974
                    Total number of entries encoded on a Plastid: 10189
                    Total number of entries encoded on a Plastid; Apicoplast: 335
                    Total number of entries encoded on a Plastid; Chloroplast: 122430
                    Total number of entries encoded on a Plastid; Cyanelle: 7
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 441