                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2010_09 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2010_09 of 10-Aug-2010 of UniProtKB/TrEMBL contains 11636205 sequence entries,
                    comprising 3746823921 amino acids.
                    
                    300428 sequences have been added since release 2010_08, the sequence data of
                    9476 existing entries has been updated and the annotations of
                    6032912 entries have been revised. This represents an increase of 3%.
                    
                    Number of fragments: 1938739
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           35014     0.30%
                    2: Evidence at transcript level       469360     4.03%
                    3: Inferred from homology            2371373    20.38%
                    4: Predicted                         8760458    75.29%
                    5: Uncertain                               0     0.00%
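
The percentages in the protein existence table follow directly from the entry counts. The Python sketch below is illustrative only: the variable names are hypothetical and not part of the release, and rounding to two decimals is an assumption inferred from the printed values.

```python
# Entry counts copied from the protein existence (PE) table above.
pe_counts = {
    "1: Evidence at protein level": 35014,
    "2: Evidence at transcript level": 469360,
    "3: Inferred from homology": 2371373,
    "4: Predicted": 8760458,
    "5: Uncertain": 0,
}

# Total number of sequence entries stated in the introduction.
total_entries = 11636205

# Share of each PE level, formatted to two decimals (assumed rounding rule).
pe_percent = {
    level: f"{100 * count / total_entries:.2f}%"
    for level, count in pe_counts.items()
}

for level, pct in pe_percent.items():
    print(f"{level:<34}{pe_counts[level]:>10}{pct:>10}")
```

The five counts add up to the stated total of 11636205 entries, so every entry carries exactly one PE level.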
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 248802
                    
                    The first twenty species represent 1204378 sequences:  10.4 % of the
                    total number of entries.
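
The figure for the first twenty species can be checked against the frequencies listed for ranks 1 to 20 in section 2.2. This is a minimal Python sketch; the variable names are hypothetical, and one-decimal rounding is assumed from the printed value.

```python
# Frequencies of ranks 1-20, copied from the table in section 2.2.
top20 = [
    343787, 95560, 75272, 57781, 50404,
    48570, 48278, 46625, 44040, 41904,
    41775, 39844, 39331, 39085, 34760,
    33633, 32381, 31273, 30960, 29115,
]

# Total number of sequence entries stated in the introduction.
total_entries = 11636205

top20_total = sum(top20)
top20_share = 100 * top20_total / total_entries

print(top20_total, f"{top20_share:.1f} %")
```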
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented 1x:10568
                                        2x:44361
                                        3x:24307
                                        4x:14839
                                        5x: 9202
                                        6x: 6739
                                        7x: 4718
                                        8x: 3771
                                        9x: 3013
                                       10x: 5284
                                   11- 20x:15286
                                   21- 50x: 5466
                                   51-100x: 2034
                                     >100x: 4101
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     343787  Human immunodeficiency virus 1
                    2      95560  Oryza sativa subsp. japonica (Rice)
                    3      75272  Homo sapiens (Human)
                    4      57781  Hepatitis C virus
                    5      50404  Trichomonas vaginalis
                    6      48570  Mus musculus (Mouse)
                    7      48278  Vitis vinifera (Grape)
                    8      46625  uncultured bacterium
                    9      44040  Populus trichocarpa (Western balsam poplar) 
                    10      41904  Zea mays (Maize)
                    11      41775  Arabidopsis thaliana (Mouse-ear cress)
                    12      39844  Paramecium tetraurelia
                    13      39331  Oryza sativa subsp. indica (Rice)
                    14      39085  Hepatitis B virus (HBV)
                    15      34760  Physcomitrella patens subsp. patens
                    16      33633  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    17      32381  Arabidopsis lyrata subsp. lyrata
                    18      31273  Ricinus communis (Castor bean)
                    19      30960  Drosophila melanogaster (Fruit fly)
                    20      29115  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    21      28089  Tetraodon nigroviridis (Green puffer)
                    22      26805  Danio rerio (Zebrafish) (Brachydanio rerio)
                    23      24832  Nematostella vectensis (Starlet sea anemone)
                    24      23487  Rattus norvegicus (Rat)
                    25      23115  Perkinsus marinus ATCC 50983
                    26      21188  Caenorhabditis elegans
                    27      21081  Ixodes scapularis (Black-legged tick) (Deer tick)
                    28      20676  Trypanosoma cruzi
                    29      18872  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    30      18092  Caenorhabditis briggsae
                    31      17929  Drosophila simulans (Fruit fly)
                    32      17861  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    33      17790  Ailuropoda melanoleuca (Giant panda)
                    34      17618  Escherichia coli
                    35      17610  Phytophthora infestans T30-4
                    36      16964  Tribolium castaneum (Red flour beetle)
                    37      16898  Drosophila yakuba (Fruit fly)
                    38      16845  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    39      16752  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    40      16713  Drosophila persimilis (Fruit fly)
                    41      16255  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    42      16188  Drosophila sechellia (Fruit fly)
                    43      15953  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    44      15874  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    45      15715  Naegleria gruberi (Amoeba)
                    46      15674  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    47      15425  Drosophila willistoni (Fruit fly)
                    48      15250  Tetrahymena thermophila SB210
                    49      15147  Drosophila ananassae (Fruit fly)
                    50      14931  Drosophila erecta (Fruit fly)
                    51      14814  Chlamydomonas reinhardtii
                    52      14782  Drosophila mojavensis (Fruit fly)
                    53      14767  Anopheles gambiae (African malaria mosquito)
                    54      14700  Drosophila virilis (Fruit fly)
                    55      14673  Plasmodium chabaudi
                    56      14659  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    57      14272  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    58      13833  Candida albicans (Yeast)
                    59      13627  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    60      13493  Schistosoma mansoni (Blood fluke)
                    61      13376  Aspergillus flavus 
                    62      13298  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    63      12977  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    64      12727  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    65      12713  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
                    66      12507  Xenopus laevis (African clawed frog)
                    67      12472  Glycine max (Soybean) (Glycine hispida)
                    68      12340  Polysphondylium pallidum PN500
                    69      12032  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    70      12008  Hepatitis C virus subtype 1b
                    71      11863  Aspergillus oryzae
                    72      11801  Plasmodium berghei
                    73      11571  Trichoplax adhaerens
                    74      11500  Brugia malayi (Filarial nematode worm)
                    75      11211  Ktedonobacter racemifer DSM 44963
                    76      10968  Plasmodium falciparum
                    77      10939  Sordaria macrospora
                    78      10900  Schistosoma japonicum (Blood fluke)
                    79      10868  Chaetomium globosum (Soil fungus)
                    80      10679  Podospora anserina
                    81      10663  Ralstonia solanacearum (Pseudomonas solanacearum)
                    82      10441  Picea sitchensis (Sitka spruce)
                    83      10419  Neurospora crassa
                    84      10410  Aspergillus nidulans FGSC A4
                    85      10399  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    86      10334  Phaeodactylum tricornutum CCAP 1055/1
                    87      10279  Micromonas pusilla CCMP1545
                    88      10232  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    89      10174  Helicobacter pylori (Campylobacter pylori)
                    90      10126  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    91      10115  Micromonas sp. RCC299
                    92      10112  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    93      10084  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    94      10019  Streptomyces bingchenggensis BCW-1
                    95       9885  Ectocarpus siliculosus (Brown alga)
                    96       9872  Bos taurus (Bovine)
                    97       9823  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    98       9755  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    99       9666  Trypanosoma brucei gambiense DAL972
                    100       9634  Cryptococcus neoformans (Filobasidiella neoformans)
                    101       9574  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    102       9567  Aspergillus fumigatus (Sartorya fumigata)
                    103       9540  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    104       9526  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    105       9487  Trypanosoma brucei
                    106       9362  Salmo salar (Atlantic salmon)
                    107       9244  Monosiga brevicollis (Choanoflagellate)
                    108       9204  Plasmodium vivax
                    109       9197  Emericella nidulans (Aspergillus nidulans)
                    110       9195  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    111       9173  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    112       9118  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    113       9096  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    114       9015  Dictyostelium discoideum (Slime mold)
                    115       8978  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    116       8965  Thalassiosira pseudonana (Marine diatom)
                    117       8951  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    118       8908  Catenulispora acidiphila 
                    119       8873  Aspergillus clavatus
                    120       8770  Rhodococcus sp. (strain RHA1)
                    121       8743  Rabies virus
                    122       8720  Paracoccidioides brasiliensis (strain Pb18)
                    123       8710  Toxoplasma gondii
                    124       8706  Nannizzia otae (strain CBS 113480) (Microsporum canis) (Arthroderma otae)
                    125       8696  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    126       8603  Entamoeba dispar SAW760
                    127       8523  Stigmatella aurantiaca DW4/3-1
                    128       8437  Plesiocystis pacifica SIR-1
                    129       8299  Entamoeba histolytica
                    130       8249  Microscilla marina ATCC 23134
                    131       8209  Bradyrhizobium japonicum
                    132       8202  Streptomyces sviceus ATCC 29083
                    133       8201  Microcoleus chthonoplastes PCC 7420
                    134       8163  Frankia sp. EUN1f
                    135       8154  Burkholderia xenovorans (strain LB400)
                    136       8046  Pseudomonas aeruginosa
                    137       8027  Trichophyton verrucosum (strain HKI 0517)
                    138       8025  Leishmania infantum
                    139       7980  Toxoplasma gondii ME49
                    140       7978  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    141       7957  Ostreococcus tauri
                    142       7948  Rhodococcus opacus (strain B4)
                    143       7916  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    144       7891  Leishmania braziliensis
                    145       7867  Streptomyces ghanaensis ATCC 14672
                    146       7860  Paracoccidioides brasiliensis (strain Pb03)
                    147       7857  Acaryochloris marina (strain MBIC 11017)
                    148       7838  Toxoplasma gondii VEG
                    149       7823  Burkholderia sp. Ch1-1
                    150       7813  Plasmodium yoelii yoelii
                    151       7747  Uncinocarpus reesii (strain UAMH 1704)
                    152       7571  Clostridium hathewayi DSM 13479
                    153       7563  Burkholderia pseudomallei MSHR346
                    154       7523  Streptomyces lividans TK24
                    155       7519  Solibacter usitatus (strain Ellin6076)
                    156       7501  Tuber melanosporum (Perigord truffle)
                    157       7487  Streptomyces coelicolor
                    158       7475  Burkholderia pseudomallei 1710a
                    159       7465  Burkholderia pseudomallei Pakistan 9
                    160       7459  Burkholderia sp. H160
                    161       7396  Ostreococcus lucimarinus (strain CCE9901)
                    162       7379  Streptomyces sp. ACT-1
                    163       7367  Burkholderia pseudomallei 576
                    164       7349  Burkholderia pseudomallei 305
                    165       7337  Streptomyces clavuligerus ATCC 27064
                    166       7310  Frankia sp. EuI1c
                    167       7274  Clostridium bolteae ATCC BAA-613
                    168       7243  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    169       7232  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    170       7232  Streptomyces avermitilis
                    171       7211  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    172       7191  Medicago truncatula (Barrel medic)
                    173       7179  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    174       7149  Giardia lamblia ATCC 50803
                    175       7140  Burkholderia pseudomallei 1106b
                    176       7132  Burkholderia phymatum (strain DSM 17167 / STM815)
                    177       7124  Burkholderia ambifaria MEX-5
                    178       7119  Leishmania major
                    179       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    180       7017  Myxococcus xanthus (strain DK 1622)
                    181       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    182       6979  Rhodopirellula baltica
                    183       6963  Frankia sp. (strain EAN1pec)
                    184       6943  Streptomyces sp. Mg1
                    185       6936  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    186       6923  Burkholderia ambifaria IOP40-10
                    187       6909  Saccharopolyspora erythraea (strain NRRL 23338)
                    188       6907  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    189       6892  Streptomyces roseosporus NRRL 15998
                    190       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    191       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    192       6866  Burkholderia sp. (strain CCGE1002)
                    193       6866  Streptomyces pristinaespiralis ATCC 25486
                    194       6859  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    195       6817  Clostridium asparagiforme DSM 15981
                    196       6817  Rhizobium loti (Mesorhizobium loti)
                    197       6772  Burkholderia pseudomallei (strain 1106a)
                    198       6740  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    199       6740  Burkholderia pseudomallei (strain 668)
                    200       6725  Burkholderia graminis C4D1M
                    201       6714  Rhizobium leguminosarum bv. viciae (strain 3841)
                    202       6712  Rhodococcus erythropolis SK121
                    203       6712  Hepatitis C virus subtype 1a
                    204       6705  Chthoniobacter flavus Ellin428
                    205       6702  Streptomyces flavogriseus ATCC 33331
                    206       6692  Bacillus thuringiensis IBL 200
                    207       6685  Sus scrofa (Pig)
                    208       6684  Haliangium ochraceum (strain DSM 14365 / JCM 11303 / SMP-2)
                    209       6679  Mesorhizobium opportunistum WSM2075
                    210       6662  Burkholderia pseudomallei S13
                    211       6657  Burkholderia cepacia (strain J2315 / LMG 16656) (Burkholderia cenocepacia 
                    212       6655  Bacillus thuringiensis IBL 4222
                    213       6644  Beggiatoa sp. PS
                    214       6643  Streptococcus pneumoniae
                    215       6627  Burkholderia cenocepacia (strain MC0-3)
                    216       6614  Burkholderia multivorans CGD2
                    217       6613  Burkholderia pseudomallei Pasteur 52237
                    218       6606  Burkholderia multivorans CGD2M
                    219       6583  Bacillus thuringiensis serovar sotto str. T04001
                    220       6527  Burkholderia multivorans CGD1
                    221       6521  Streptomyces sp. ACTE
                    222       6509  Frankia alni (strain ACN14a)
                    223       6498  bacterium Ellin514
                    224       6497  Burkholderia cenocepacia (strain HI2424)
                    225       6488  Bacillus thuringiensis serovar monterrey BGSC 4AJ1
                    226       6463  Planctomyces maris DSM 8797
                    227       6458  uncultured archaeon
                    228       6453  Mycobacterium parascrofulaceum ATCC BAA-614
                    229       6427  Agrobacterium radiobacter (strain K84 / ATCC BAA-868)
                    230       6417  Methylobacterium sp. (strain 4-46)
                    231       6413  Cyanothece sp. CCY0110
                    232       6391  Ustilago maydis (Smut fungus)
                    233       6388  Bradyrhizobium sp. (strain ORS278)
                    234       6377  Clostridium carboxidivorans P7
                    235       6376  Stackebrandtia nassauensis 
                    236       6372  Micromonospora aurantiaca ATCC 27029
                    237       6360  Rhizobium meliloti (Sinorhizobium meliloti)
                    238       6347  Micromonospora sp. L5
                    239       6336  Burkholderia ambifaria (strain MC40-6)
                    240       6322  Bacillus thuringiensis serovar thuringiensis str. T01001
                    241       6309  Hahella chejuensis (strain KCTC 2396)
                    242       6298  Bacillus thuringiensis Bt407
                    243       6294  Burkholderia pseudomallei 406e
                    244       6290  Nostoc punctiforme (strain ATCC 29133 / PCC 73102)
                    245       6288  Burkholderia pseudomallei 1655
                    246       6272  Labrenzia aggregata IAM 12614
                    247       6263  Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
                    248       6252  Clostridiales bacterium 1_7_47FAA
                    249       6242  Bacillus thuringiensis serovar berliner ATCC 10792
                    250       6237  Geobacillus sp. (strain Y412MC10)
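
Rows of the table above consist of a rank, a frequency and a species name separated by runs of spaces. The Python sketch below shows one way to read them into tuples. The function name and the regular expression are hypothetical, and it assumes that species names never start with a digit-only token, which holds for the rows shown here.

```python
import re

# rank, frequency, then the species name up to any trailing whitespace.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S.*?)\s*$")


def parse_species_rows(lines):
    """Return (rank, frequency, species) tuples; skip header and rule lines."""
    rows = []
    for line in lines:
        match = ROW.match(line)
        if match:
            rank, freq, species = match.groups()
            rows.append((int(rank), int(freq), species))
    return rows


sample = [
    "------  ---------  --------------------------------------------",
    "Number  Frequency  Species",
    "------  ---------  --------------------------------------------",
    "1     343787  Human immunodeficiency virus 1",
    "2      95560  Oryza sativa subsp. japonica (Rice)",
    "9      44040  Populus trichocarpa (Western balsam poplar) ",
]

rows = parse_species_rows(sample)
for row in rows:
    print(row)
```

Header and separator lines are dropped because they do not begin with two integers, and trailing spaces in a species name are stripped.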
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          221254 (  2%)
                    Bacteria        7241640 ( 62%)
                    Eukaryota       3182637 ( 27%)
                    Viruses          978380 (  8%)
                    Other             12293 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  75307 (  2%)           (  1%)
                    Other Mammalia        196537 (  6%)           (  2%)
                    Other Vertebrata      300096 (  9%)           (  3%)
                    Viridiplantae         771819 ( 24%)           (  7%)
                    Fungi                 654174 ( 21%)           (  6%)
                    Insecta               440947 ( 14%)           (  4%)
                    Nematoda               61889 (  2%)           (  1%)
                    Other                 681868 ( 21%)           (  6%)
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From  To   Number             From   To   Number
                      1-  50   250228             1001-1100    69015
                     51- 100   924138             1101-1200    48676
                    101- 150  1064115             1201-1300    33186
                    151- 200  1028833             1301-1400    21951
                    201- 250  1030667             1401-1500    17645
                    251- 300   998232             1501-1600    12695
                    301- 350   907370             1601-1700     9294
                    351- 400   705807             1701-1800     7431
                    401- 450   593429             1801-1900     5968
                    451- 500   496037             1901-2000     5025
                    501- 550   339913             2001-2100     4052
                    551- 600   260966             2101-2200     4207
                    601- 650   189541             2201-2300     3321
                    651- 700   147627             2301-2400     2615
                    701- 750   126824             2401-2500     2275
                    751- 800   113570             >2500        19696
                    801- 850    84302
                    851- 900    76461
                    901- 950    52321
                    951-1000    40033
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 321 amino acids.
                    
                    The shortest sequence is Q16047_HUMAN:     4 amino acids.
                    The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
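
The average length can be related to the totals given in the introduction. The Python sketch below assumes the average is taken over all entries, fragments included, and that the reported value is the truncated quotient. The source states neither, and the variable names are hypothetical.

```python
# Totals stated in the introduction of this release.
total_amino_acids = 3746823921
total_entries = 11636205

# Exact mean and its integer (truncated) part.
mean_length = total_amino_acids / total_entries
truncated_mean = total_amino_acids // total_entries

print(f"mean = {mean_length:.3f}, truncated = {truncated_mean}")
```

Under these assumptions the truncated quotient reproduces the reported 321 amino acids, while the exact mean lies just below 322.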
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------

                    References (RL)                    14026518                1.21
                       Submitted to EMBL/GenBank/DDBJ   8268191   7337470      0.71
                       Journal                          5621079   5113408      0.48
                       Submitted to other databases       32328     32303     <0.01
                       Thesis                              7358      7301     <0.01
                       Book citation                       5148      5097     <0.01
                       Other                              92414     92169      0.01
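
The "Average per entry" column appears to be the total number of lines divided by the total number of entries in the release (11636205), not by the number of entries carrying such a line. The Python sketch below reproduces the printed values under that assumption. The function name and the "<0.01" display rule are hypothetical and inferred from the table.

```python
TOTAL_ENTRIES = 11636205  # total entries stated in the introduction


def per_entry(line_count, total_entries=TOTAL_ENTRIES):
    """Format the average number of lines per entry as shown in the tables."""
    text = f"{line_count / total_entries:.2f}"
    # Assumed display rule: non-zero averages that round to 0.00 print as "<0.01".
    if text == "0.00" and line_count > 0:
        return "<0.01"
    return text


for name, count in [
    ("References (RL)", 14026518),
    ("Submitted to EMBL/GenBank/DDBJ", 8268191),
    ("Journal", 5621079),
    ("Thesis", 7358),
    ("Other", 92414),
]:
    print(f"{name:<34}{count:>10}{per_entry(count):>10}")
```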
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 294488
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Comments (CC)                       9383246                0.81
                       CATALYTIC ACTIVITY                900535    818741      0.08     4
                       CAUTION                          2806201   2806201      0.24     2
                       COFACTOR                          276685    268564      0.02     8
                       DOMAIN                              7065      7065     <0.01    10
                       FUNCTION                         1028220    948436      0.09     3
                       INTERACTION                         4991      4991     <0.01    11
                       MISCELLANEOUS                      29528     29524     <0.01     9
                       PATHWAY                           369455    338969      0.03     7
                       SIMILARITY                       2899486   2501673      0.25     1
                       SUBCELLULAR LOCATION              671167    671093      0.06     5
                       SUBUNIT                           389913    389341      0.03     6
                    
                    Total number of comment topics: 11
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Features (FT)                       3969173                0.34
                       CHAIN                             431108    338998      0.04     2
                       NON_TER                          3255007   1937249      0.28     1
                       SIGNAL                            282473    282224      0.02     3
                       TRANSIT                              585       585     <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             135660703               11.66                                                    
                    AGD                                 3867      3867     <0.01    74   Organism-specific databases                
                    ANU-2DPAGE                            57        57     <0.01    92   2D gel databases                           
                    ArachnoServer                        368       368     <0.01    85   Organism-specific databases                
                    ArrayExpress                       94335     94322      0.01    48   Gene expression databases                  
                    BRENDA                              2896      2829     <0.01    75   Enzyme and pathway databases               
                    Bgee                              129875    129773      0.01    44   Gene expression databases                  
                    BioCyc                           1624603   1590003      0.14    21   Enzyme and pathway databases               
                    CAZy                               74835     70319      0.01    49   Protein family/group databases             
                    CGD                                 6800      6800     <0.01    71   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     5         5     <0.01    96   2D gel databases                           
                    CTD                               168152    167240      0.01    42   Organism-specific databases                
                    CYGD                                   5         5     <0.01    97   Organism-specific databases                
                    DIP                                 2595      2590     <0.01    76   Protein-protein interaction databases      
                    EMBL                            12937646  11620022      1.11     3   Sequence databases                         
                    Ensembl                           307503    183015      0.03    32   Genome annotation databases                
                    EnsemblBacteria                   501890    471980      0.04    26   Genome annotation databases                
                    EnsemblFungi                       98203     98094      0.01    47   Genome annotation databases                
                    EnsemblMetazoa                    296929    252179      0.03    33   Genome annotation databases                
                    EnsemblPlants                     208984    193826      0.02    36   Genome annotation databases                
                    EnsemblProtists                    24305     24126     <0.01    59   Genome annotation databases                
                    EuPathDB                          151376    151376      0.01    43   Organism-specific databases                
                    FlyBase                           195388    193858      0.02    38   Organism-specific databases                
                    GO                              22430100   7063276      1.93     2   Ontologies                                 
                    Gene3D                           3558942   3008316      0.31    11   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01   100   Organism-specific databases                
                    GeneID                           4812957   4638284      0.41     7   Genome annotation databases                
                    Genevestigator                    102199    102189      0.01    46   Gene expression databases                  
                    GenoList                           14761     14488     <0.01    63   Organism-specific databases                
                    GenomeReviews                    3326689   3242535      0.29    12   Genome annotation databases                
                    Gramene                            69004     69004      0.01    51   Organism-specific databases                
                    H-InvDB                              536       439     <0.01    83   Organism-specific databases                
                    HAMAP                             474416    472382      0.04    27   Family and domain databases                
                    HGNC                               58978     57110      0.01    53   Organism-specific databases                
                    HOGENOM                          2203549   2203470      0.19    18   Phylogenomic databases                     
                    HOVERGEN                          319771    318999      0.03    31   Phylogenomic databases                     
                    HSSP                              254790    254515      0.02    34   3D structure databases                     
                    IPI                               223001    222997      0.02    35   Sequence databases                         
                    InParanoid                        196949    196854      0.02    37   Phylogenomic databases                     
                    IntAct                             13536     13536     <0.01    64   Protein-protein interaction databases      
                    InterPro                        23392854   8820003      2.01     1   Family and domain databases                
                    KEGG                             4310056   4214451      0.37     9   Genome annotation databases                
                    LegioList                           5142      5114     <0.01    72   Organism-specific databases                
                    Leproma                              940       939     <0.01    82   Organism-specific databases                
                    MEROPS                             66109     64877      0.01    52   Protein family/group databases             
                    MGI                                42026     42014     <0.01    57   Organism-specific databases                
                    MINT                                9174      9174     <0.01    68   Protein-protein interaction databases      
                    NMPDR                             926089    926078      0.08    24   Genome annotation databases                
                    NextBio                            48024     48021     <0.01    55   Other                                      
                    OMA                              2434116   2434114      0.21    15   Phylogenomic databases                     
                    OrthoDB                           430347    430346      0.04    28   Phylogenomic databases                     
                    PANTHER                          1816321   1710429      0.16    20   Family and domain databases                
                    PDB                                12239      7337     <0.01    66   3D structure databases                     
                    PDBsum                             12118      7270     <0.01    67   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    89   2D gel databases                           
                    PIR                               176080    143222      0.02    41   Sequence databases                         
                    PIRSF                             624407    624407      0.05    25   Family and domain databases                
                    PMAP-CutDB                           259       259     <0.01    86   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    98   2D gel databases                           
                    PRIDE                             104527    104526      0.01    45   Proteomic databases                        
                    PRINTS                           1818001   1602589      0.16    19   Family and domain databases                
                    PROSITE                          5638923   3747331      0.48     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    95   Enzyme and pathway databases               
                    PeptideAtlas                         148       148     <0.01    88   Proteomic databases                        
                    PeroxiBase                          2278      2272     <0.01    78   Protein family/group databases             
                    Pfam                            11278475   8393551      0.97     4   Family and domain databases                
                    PharmGKB                              85        85     <0.01    91   Organism-specific databases                
                    PhosphoSite                         1794      1794     <0.01    80   PTM databases                              
                    PhylomeDB                         373234    373202      0.03    30   Phylogenomic databases                     
                    ProDom                            194477    183587      0.02    40   Family and domain databases                
                    ProMEX                               449       449     <0.01    84   Proteomic databases                        
                    ProtClustDB                      2624394   2624378      0.23    14   Phylogenomic databases                     
                    ProteinModelPortal               4094296   4094174      0.35    10                                              
                    PseudoCAP                           4347      4344     <0.01    73   Organism-specific databases                
                    REBASE                              7826      7556     <0.01    70   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   96        95     <0.01    90   2D gel databases                           
                    RGD                                17454     17369     <0.01    62   Organism-specific databases                
                    Reactome                              56        53     <0.01    93   Enzyme and pathway databases               
                    RefSeq                           4960231   4773049      0.43     6   Sequence databases                         
                    SGD                                  249       249     <0.01    87   Organism-specific databases                
                    SMART                            2259092   1763202      0.19    16   Family and domain databases                
                    SMR                              3047121   3046550      0.26    13   3D structure databases                     
                    STRING                           1206996   1206847      0.10    22   Protein-protein interaction databases      
                    SUPFAM                           4527833   3743521      0.39     8   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    94   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01    99   2D gel databases                           
                    TAIR                               19075     18993     <0.01    61   Organism-specific databases                
                    TCDB                                2298      2286     <0.01    77   Protein family/group databases             
                    TIGR                              195174    188124      0.02    39   Genome annotation databases                
                    TIGRFAMs                         2249053   2051397      0.19    17   Family and domain databases                
                    TubercuList                         2275      2270     <0.01    79   Organism-specific databases                
                    UCSC                               50800     50800     <0.01    54   Genome annotation databases                
                    UniGene                           428516    396895      0.04    29   Sequence databases                         
                    VectorBase                         47583     47115     <0.01    56   Genome annotation databases                
                    World-2DPAGE                         947       942     <0.01    81   2D gel databases                           
                    WormBase                           41187     41063     <0.01    58   Organism-specific databases                
                    Xenbase                            12716     12693     <0.01    65   Organism-specific databases                
                    ZFIN                               21511     21506     <0.01    60   Organism-specific databases                
                    dictyBase                           8164      8163     <0.01    69                                              
                    eggNOG                           1149439   1149439      0.10    23                                              
                    euHCVdb                            72339     72336      0.01    50                                              
                    
                    Number of explicitly cross-referenced databases: 125
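
The fourth column of the table above appears to be the number of cross-references divided by the total number of entries in the release (11636205, from the introduction), printed to two decimals, with values that round to 0.00 shown as <0.01. This reading is inferred from the figures and is an assumption, not a documented definition. A minimal Python sketch that reproduces a few rows under that assumption (the function name refs_per_entry is hypothetical):

```python
# Total entries in UniProtKB/TrEMBL release 2010_09 (from the introduction).
TOTAL_ENTRIES = 11636205


def refs_per_entry(cross_references: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format the average number of cross-references per database entry.

    Assumption: the table's fourth column is cross_references / total_entries,
    shown to two decimals, with anything that rounds to 0.00 shown as '<0.01'.
    """
    text = f"{cross_references / total_entries:.2f}"
    return "<0.01" if text == "0.00" else text


# (database, number of cross-references) copied from the table above.
rows = [("InterPro", 23392854), ("GO", 22430100), ("EMBL", 12937646),
        ("BioCyc", 1624603), ("HGNC", 58978), ("UCSC", 50800)]

for name, count in rows:
    print(f"{name:10s} {count:>10d} {refs_per_entry(count):>6s}")
```

Each formatted value matches the corresponding row of the table, which supports the assumed definition but does not confirm it.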
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.58   Gln (Q) 3.86   Leu (L) 9.81   Ser (S) 6.70
                    Arg (R) 5.47   Glu (E) 6.15   Lys (K) 5.30   Thr (T) 5.61
                    Asn (N) 4.17   Gly (G) 7.10   Met (M) 2.45   Trp (W) 1.31
                    Asp (D) 5.30   His (H) 2.19   Phe (F) 4.03   Tyr (Y) 3.07
                    Cys (C) 1.28   Ile (I) 6.02   Pro (P) 4.73   Val (V) 6.73
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    
                    [Figure: amino acid composition, color-coded by residue class]
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Lys, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
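
The ordering above can be reproduced from the percentages in section 5.1 by sorting them in descending order. A minimal Python sketch, assuming the rounded two-decimal values are sufficient: Lys and Asp tie at 5.30 at this precision, so the sketch relies on a stable sort and the row-wise input order to keep Lys first. That matches the list above, but it is an assumption about how the tie was originally resolved.

```python
# Percent composition from section 5.1, entered row by row as printed.
composition = {
    "Ala": 8.58, "Gln": 3.86, "Leu": 9.81, "Ser": 6.70,
    "Arg": 5.47, "Glu": 6.15, "Lys": 5.30, "Thr": 5.61,
    "Asn": 4.17, "Gly": 7.10, "Met": 2.45, "Trp": 1.31,
    "Asp": 5.30, "His": 2.19, "Phe": 4.03, "Tyr": 3.07,
    "Cys": 1.28, "Ile": 6.02, "Pro": 4.73, "Val": 6.73,
}

# sorted() is stable even with reverse=True, so equal values (Lys, Asp)
# keep their input order.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```

Run as written, it prints the residues in the same order as the list above.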
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 330856
                    Total number of entries encoded on a Plasmid: 171263
                    Total number of entries encoded on a Plastid: 10116
                    Total number of entries encoded on a Plastid; Apicoplast: 335
                    Total number of entries encoded on a Plastid; Chloroplast: 119372
                    Total number of entries encoded on a Plastid; Cyanelle: 7
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 441