                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2010_08 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2010_08 of 13-Jul-2010 of UniProtKB/TrEMBL contains 11397958 sequence entries,
                    comprising 3661877567 amino acids.
                    
                    323546 sequences have been added since release 2010_07, the sequence data of
                    1430 existing entries has been updated and the annotations of
                    2365970 entries have been revised. This represents a 3% increase in the number of entries.
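The 3% figure can be reproduced from the two counts quoted above. The following is a minimal sketch, assuming that the increase is measured against the size of the previous release and that no entries were removed between releases; neither assumption is stated in the release notes, and the variable names are illustrative.

```python
# Figures quoted in the introduction of release 2010_08.
current_entries = 11_397_958
added_since_previous = 323_546

# Assumption: no entries were deleted, so the previous release (2010_07)
# held the current count minus the newly added sequences.
previous_entries = current_entries - added_since_previous

increase_percent = 100 * added_since_previous / previous_entries
print(f"Increase over release 2010_07: {increase_percent:.1f}%")
```

Measuring against the current release size instead gives a slightly smaller value, which also rounds to 3%.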
                    
                    Number of fragments: 1894933
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           34159     0.30%
                    2: Evidence at transcript level       467632     4.10%
                    3: Inferred from homology            2258143    19.81%
                    4: Predicted                         8638024    75.79%
                    5: Uncertain                               0     0.00%
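The percentage column is each PE count divided by the total number of entries. A short sketch that recomputes it from the counts in the table (the dictionary and variable names are illustrative, not part of any UniProt tool):

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 34_159,
    "2: Evidence at transcript level": 467_632,
    "3: Inferred from homology": 2_258_143,
    "4: Predicted": 8_638_024,
    "5: Uncertain": 0,
}

# The five levels partition the database, so their sum is the entry total.
total_entries = sum(pe_counts.values())

# Share of each level, as a percentage rounded to two decimals.
pe_percent = {
    label: round(100 * count / total_entries, 2)
    for label, count in pe_counts.items()
}

for label, count in pe_counts.items():
    print(f"{label:<34}{count:>9}{pe_percent[label]:>9.2f}%")
```

The counts sum to the 11397958 entries reported in the introduction, which is a quick way to confirm that every entry carries exactly one PE level.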
                    
                    The growth of the database is summarized below.

                    [Figure: growth of the database]


                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 246315
                    
                    The twenty most represented species account for 1173110 sequences,
                    or 10.3% of the total number of entries.
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x:10483
                                             2x:43982
                                             3x:24034
                                             4x:14668
                                             5x: 9104
                                             6x: 6665
                                             7x: 4587
                                             8x: 3741
                                             9x: 2990
                                            10x: 5180
                                        11- 20x:15072
                                        21- 50x: 5427
                                        51-100x: 1998
                                          >100x: 4037
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     340747  Human immunodeficiency virus 1
                    2      95546  Oryza sativa subsp. japonica (Rice)
                    3      74426  Homo sapiens (Human)
                    4      57771  Hepatitis C virus
                    5      50404  Trichomonas vaginalis
                    6      48582  Mus musculus (Mouse)
                    7      45548  uncultured bacterium
                    8      44040  Populus trichocarpa (Western balsam poplar) 
                    9      42008  Arabidopsis thaliana (Mouse-ear cress)
                    10      41893  Zea mays (Maize)
                    11      39843  Paramecium tetraurelia
                    12      39307  Oryza sativa subsp. indica (Rice)
                    13      38931  Hepatitis B virus (HBV)
                    14      34760  Physcomitrella patens subsp. patens
                    15      33632  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    16      31273  Ricinus communis (Castor bean)
                    17      30467  Drosophila melanogaster (Fruit fly)
                    18      29073  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    19      28089  Tetraodon nigroviridis (Green puffer)
                    20      26770  Danio rerio (Zebrafish) (Brachydanio rerio)
                    21      25119  Vitis vinifera (Grape)
                    22      24832  Nematostella vectensis (Starlet sea anemone)
                    23      23512  Rattus norvegicus (Rat)
                    24      23115  Perkinsus marinus ATCC 50983
                    25      21137  Caenorhabditis elegans
                    26      21081  Ixodes scapularis (Black-legged tick) (Deer tick)
                    27      20675  Trypanosoma cruzi
                    28      18872  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    29      18099  Caenorhabditis briggsae
                    30      17861  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    31      17781  Ailuropoda melanoleuca (Giant panda)
                    32      17610  Phytophthora infestans T30-4
                    33      17469  Escherichia coli
                    34      17441  Drosophila simulans (Fruit fly)
                    35      16899  Drosophila yakuba (Fruit fly)
                    36      16751  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    37      16733  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    38      16714  Drosophila persimilis (Fruit fly)
                    39      16255  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    40      16189  Drosophila sechellia (Fruit fly)
                    41      15954  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    42      15874  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    43      15715  Naegleria gruberi (Amoeba)
                    44      15674  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    45      15426  Drosophila willistoni (Fruit fly)
                    46      15251  Tetrahymena thermophila SB210
                    47      15146  Drosophila ananassae (Fruit fly)
                    48      14932  Drosophila erecta (Fruit fly)
                    49      14814  Chlamydomonas reinhardtii
                    50      14783  Drosophila mojavensis (Fruit fly)
                    51      14767  Anopheles gambiae (African malaria mosquito)
                    52      14701  Drosophila virilis (Fruit fly)
                    53      14673  Plasmodium chabaudi
                    54      14660  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    55      14272  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    56      13818  Candida albicans (Yeast)
                    57      13629  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    58      13472  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    59      13441  Schistosoma mansoni (Blood fluke)
                    60      13378  Aspergillus flavus 
                    61      12979  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    62      12728  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    63      12713  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
                    64      12516  Xenopus laevis (African clawed frog)
                    65      12468  Glycine max (Soybean) (Glycine hispida)
                    66      12340  Polysphondylium pallidum PN500
                    67      12032  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    68      11979  Hepatitis C virus subtype 1b
                    69      11865  Aspergillus oryzae
                    70      11801  Plasmodium berghei
                    71      11571  Trichoplax adhaerens
                    72      11500  Brugia malayi (Filarial nematode worm)
                    73      10939  Sordaria macrospora
                    74      10900  Schistosoma japonicum (Blood fluke)
                    75      10868  Chaetomium globosum (Soil fungus)
                    76      10863  Plasmodium falciparum
                    77      10725  Podospora anserina
                    78      10663  Ralstonia solanacearum (Pseudomonas solanacearum)
                    79      10441  Picea sitchensis (Sitka spruce)
                    80      10420  Neurospora crassa
                    81      10412  Aspergillus nidulans FGSC A4
                    82      10401  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    83      10334  Phaeodactylum tricornutum CCAP 1055/1
                    84      10279  Micromonas pusilla CCMP1545
                    85      10232  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    86      10136  Helicobacter pylori (Campylobacter pylori)
                    87      10129  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    88      10115  Micromonas sp. RCC299
                    89      10114  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    90      10088  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    91       9864  Bos taurus (Bovine)
                    92       9757  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    93       9748  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    94       9666  Trypanosoma brucei gambiense DAL972
                    95       9634  Cryptococcus neoformans (Filobasidiella neoformans)
                    96       9574  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    97       9568  Aspergillus fumigatus (Sartorya fumigata)
                    98       9540  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    99       9526  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    100       9469  Trypanosoma brucei
                    101       9361  Salmo salar (Atlantic salmon)
                    102       9243  Monosiga brevicollis (Choanoflagellate)
                    103       9201  Emericella nidulans (Aspergillus nidulans)
                    104       9195  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    105       9173  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    106       9122  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    107       9096  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    108       9033  Plasmodium vivax
                    109       9015  Dictyostelium discoideum (Slime mold)
                    110       8978  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    111       8965  Thalassiosira pseudonana (Marine diatom)
                    112       8955  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    113       8912  Catenulispora acidiphila 
                    114       8875  Aspergillus clavatus
                    115       8774  Rhodococcus sp. (strain RHA1)
                    116       8743  Rabies virus
                    117       8720  Paracoccidioides brasiliensis (strain Pb18)
                    118       8708  Nannizzia otae (strain CBS 113480) (Microsporum canis) (Arthroderma otae)
                    119       8700  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    120       8603  Entamoeba dispar SAW760
                    121       8523  Stigmatella aurantiaca DW4/3-1
                    122       8437  Plesiocystis pacifica SIR-1
                    123       8299  Entamoeba histolytica
                    124       8253  Streptomyces sviceus ATCC 29083
                    125       8249  Microscilla marina ATCC 23134
                    126       8201  Microcoleus chthonoplastes PCC 7420
                    127       8196  Bradyrhizobium japonicum
                    128       8163  Frankia sp. EUN1f
                    129       8154  Burkholderia xenovorans (strain LB400)
                    130       8098  Toxoplasma gondii GT1
                    131       8032  Pseudomonas aeruginosa
                    132       8028  Trichophyton verrucosum (strain HKI 0517)
                    133       8025  Leishmania infantum
                    134       7980  Arthroderma benhamiae (strain CBS 112371) (Trichophyton mentagrophytes)
                    135       7980  Toxoplasma gondii ME49
                    136       7957  Ostreococcus tauri
                    137       7952  Rhodococcus opacus (strain B4)
                    138       7916  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    139       7891  Leishmania braziliensis
                    140       7867  Streptomyces ghanaensis ATCC 14672
                    141       7860  Paracoccidioides brasiliensis (strain Pb03)
                    142       7857  Acaryochloris marina (strain MBIC 11017)
                    143       7838  Toxoplasma gondii VEG
                    144       7823  Burkholderia sp. Ch1-1
                    145       7813  Plasmodium yoelii yoelii
                    146       7747  Uncinocarpus reesii (strain UAMH 1704)
                    147       7571  Clostridium hathewayi DSM 13479
                    148       7563  Burkholderia pseudomallei MSHR346
                    149       7523  Streptomyces lividans TK24
                    150       7520  Solibacter usitatus (strain Ellin6076)
                    151       7501  Tuber melanosporum (Perigord truffle)
                    152       7489  Streptomyces coelicolor
                    153       7475  Burkholderia pseudomallei 1710a
                    154       7465  Burkholderia pseudomallei Pakistan 9
                    155       7459  Burkholderia sp. H160
                    156       7396  Ostreococcus lucimarinus (strain CCE9901)
                    157       7379  Streptomyces sp. ACT-1
                    158       7367  Burkholderia pseudomallei 576
                    159       7349  Burkholderia pseudomallei 305
                    160       7337  Streptomyces clavuligerus ATCC 27064
                    161       7310  Frankia sp. EuI1c
                    162       7274  Clostridium bolteae ATCC BAA-613
                    163       7243  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    164       7237  Streptomyces avermitilis
                    165       7232  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    166       7211  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    167       7189  Medicago truncatula (Barrel medic)
                    168       7179  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    169       7149  Giardia lamblia ATCC 50803
                    170       7140  Burkholderia pseudomallei 1106b
                    171       7132  Burkholderia phymatum (strain DSM 17167 / STM815)
                    172       7124  Burkholderia ambifaria MEX-5
                    173       7119  Leishmania major
                    174       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    175       7017  Myxococcus xanthus (strain DK 1622)
                    176       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    177       6979  Rhodopirellula baltica
                    178       6967  Frankia sp. (strain EAN1pec)
                    179       6943  Streptomyces sp. Mg1
                    180       6940  Kribbella flavida (strain DSM 17836 / JCM 10339 / NBRC 14399)
                    181       6923  Burkholderia ambifaria IOP40-10
                    182       6913  Saccharopolyspora erythraea (strain NRRL 23338)
                    183       6911  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    184       6892  Streptomyces roseosporus NRRL 15998
                    185       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    186       6867  Spirosoma linguale (strain ATCC 33905 / DSM 74 / LMG 10896)
                    187       6866  Burkholderia sp. CCGE1002
                    188       6860  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    189       6817  Clostridium asparagiforme DSM 15981
                    190       6817  Rhizobium loti (Mesorhizobium loti)
                    191       6772  Burkholderia pseudomallei (strain 1106a)
                    192       6744  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    193       6740  Burkholderia pseudomallei (strain 668)
                    194       6725  Burkholderia graminis C4D1M
                    195       6714  Rhizobium leguminosarum bv. viciae (strain 3841)
                    196       6712  Rhodococcus erythropolis SK121
                    197       6705  Chthoniobacter flavus Ellin428
                    198       6702  Streptomyces flavogriseus ATCC 33331
                    199       6692  Bacillus thuringiensis IBL 200
                    200       6686  Hepatitis C virus subtype 1a
                    201       6684  Haliangium ochraceum (strain DSM 14365 / JCM 11303 / SMP-2)
                    202       6679  Mesorhizobium opportunistum WSM2075
                    203       6662  Burkholderia pseudomallei S13
                    204       6657  Burkholderia cepacia (strain J2315 / LMG 16656) (Burkholderia cenocepacia 
                    205       6655  Bacillus thuringiensis IBL 4222
                    206       6655  Sus scrofa (Pig)
                    207       6644  Beggiatoa sp. PS
                    208       6642  Streptococcus pneumoniae
                    209       6627  Burkholderia cenocepacia (strain MC0-3)
                    210       6614  Burkholderia multivorans CGD2
                    211       6613  Burkholderia pseudomallei Pasteur 52237
                    212       6606  Burkholderia multivorans CGD2M
                    213       6583  Bacillus thuringiensis serovar sotto str. T04001
                    214       6527  Burkholderia multivorans CGD1
                    215       6521  Streptomyces sp. ACTE
                    216       6514  Frankia alni (strain ACN14a)
                    217       6498  bacterium Ellin514
                    218       6497  Burkholderia cenocepacia (strain HI2424)
                    219       6488  Bacillus thuringiensis serovar monterrey BGSC 4AJ1
                    220       6463  Planctomyces maris DSM 8797
                    221       6453  Mycobacterium parascrofulaceum ATCC BAA-614
                    222       6429  Streptomyces sp. SPB74
                    223       6427  Agrobacterium radiobacter (strain K84 / ATCC BAA-868)
                    224       6417  Methylobacterium sp. (strain 4-46)
                    225       6413  Cyanothece sp. CCY0110
                    226       6390  Ustilago maydis (Smut fungus)
                    227       6388  Bradyrhizobium sp. (strain ORS278)
                    228       6379  Stackebrandtia nassauensis 
                    229       6377  Clostridium carboxidivorans P7
                    230       6372  Micromonospora aurantiaca ATCC 27029
                    231       6360  Rhizobium meliloti (Sinorhizobium meliloti)
                    232       6356  'Nostoc azollae' 0708
                    233       6347  Micromonospora sp. L5
                    234       6342  uncultured archaeon
                    235       6336  Burkholderia ambifaria (strain MC40-6)
                    236       6322  Bacillus thuringiensis serovar thuringiensis str. T01001
                    237       6309  Hahella chejuensis (strain KCTC 2396)
                    238       6305  Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
                    239       6298  Bacillus thuringiensis Bt407
                    240       6294  Burkholderia pseudomallei 406e
                    241       6290  Nostoc punctiforme (strain ATCC 29133 / PCC 73102)
                    242       6288  Burkholderia pseudomallei 1655
                    243       6272  Labrenzia aggregata IAM 12614
                    244       6252  Clostridiales bacterium 1_7_47FAA
                    245       6242  Bacillus thuringiensis serovar berliner ATCC 10792
                    246       6237  Geobacillus sp. (strain Y412MC10)
                    247       6233  Rhodococcus erythropolis (strain PR4 / NBRC 100887)
                    248       6212  Candida tropicalis (strain ATCC MYA-3404 / T1) (Yeast)
                    249       6210  Burkholderia ambifaria (strain ATCC BAA-244 / AMMD) (Burkholderia cepacia 
                    250       6206  Paenibacillus sp. (strain JDR-2)
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          218934 (  2%)
                    Bacteria        7111911 ( 62%)
                    Eukaryota       3085696 ( 27%)
                    Viruses          969142 (  9%)
                    Other             12274 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  74461 (  2%)           (  1%)
                    Other Mammalia        195097 (  6%)           (  2%)
                    Other Vertebrata      297068 ( 10%)           (  3%)
                    Viridiplantae         714366 ( 23%)           (  6%)
                    Fungi                 648462 ( 21%)           (  6%)
                    Insecta               424405 ( 14%)           (  4%)
                    Nematoda               61820 (  2%)           (  1%)
                    Other                 670017 ( 22%)           (  6%)
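The two percentage columns use different denominators: the first divides by the Eukaryota total, the second by the whole database. A minimal sketch that recomputes both from the counts above (names are illustrative):

```python
# Total number of entries in release 2010_08 (from the introduction).
TOTAL_ENTRIES = 11_397_958

# Sequence counts per eukaryotic category, copied from the table above.
eukaryota = {
    "Human": 74_461,
    "Other Mammalia": 195_097,
    "Other Vertebrata": 297_068,
    "Viridiplantae": 714_366,
    "Fungi": 648_462,
    "Insecta": 424_405,
    "Nematoda": 61_820,
    "Other": 670_017,
}

# The categories partition Eukaryota, so their sum is the kingdom total.
eukaryota_total = sum(eukaryota.values())

# (percent of Eukaryota, percent of complete database), rounded to integers.
shares = {
    name: (round(100 * n / eukaryota_total), round(100 * n / TOTAL_ENTRIES))
    for name, n in eukaryota.items()
}

for name, (pct_euk, pct_db) in shares.items():
    print(f"{name:<18}{eukaryota[name]:>8} ({pct_euk:>3}%) ({pct_db:>3}%)")
```

The category counts add up to the 3085696 Eukaryota sequences listed in the kingdom table.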
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To   Number             From   To   Number
                       1-  50   248050             1001-1100    67037
                      51- 100   908032             1101-1200    47435
                     101- 150  1044597             1201-1300    32326
                     151- 200  1008957             1301-1400    21402
                     201- 250  1011058             1401-1500    17225
                     251- 300   978902             1501-1600    12398
                     301- 350   889005             1601-1700     9121
                     351- 400   690659             1701-1800     7202
                     401- 450   580423             1801-1900     5817
                     451- 500   485220             1901-2000     4887
                     501- 550   331915             2001-2100     3977
                     551- 600   255076             2101-2200     4120
                     601- 650   185182             2201-2300     3245
                     651- 700   144168             2301-2400     2555
                     701- 750   124000             2401-2500     2201
                     751- 800   110912             >2500        19289
                     801- 850    82204
                     851- 900    74464
                     901- 950    50989
                     951-1000    38975
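Because the table excludes fragments, its counts should add up to the total number of entries minus the number of fragments reported in the introduction. The sketch below performs that cross-check; the list and variable names are illustrative, and the counts are copied from the table above.

```python
# Figures from the introduction.
TOTAL_ENTRIES = 11_397_958
FRAGMENTS = 1_894_933

# Counts for the 50-residue bins 1-50, 51-100, ..., 951-1000.
bins_of_50 = [
    248050, 908032, 1044597, 1008957, 1011058, 978902, 889005, 690659,
    580423, 485220, 331915, 255076, 185182, 144168, 124000, 110912,
    82204, 74464, 50989, 38975,
]

# Counts for the 100-residue bins 1001-1100, ..., 2401-2500.
bins_of_100 = [
    67037, 47435, 32326, 21402, 17225, 12398, 9121, 7202,
    5817, 4887, 3977, 4120, 3245, 2555, 2201,
]

# Sequences longer than 2500 residues.
over_2500 = 19289

binned_total = sum(bins_of_50) + sum(bins_of_100) + over_2500
non_fragments = TOTAL_ENTRIES - FRAGMENTS

print("Sequences in size table:", binned_total)
print("Entries minus fragments:", non_fragments)
print("Consistent:", binned_total == non_fragments)
```

With the figures of this release the two totals agree, which confirms that the size table covers every non-fragment entry exactly once.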
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is   321 amino acids.
                    
                    The shortest sequence is Q16047_HUMAN:     4 amino acids.
                    The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------

                    References (RL)                    13674519                1.20
                       Submitted to EMBL/GenBank/DDBJ   8084086   7195206      0.71
                       Journal                          5449959   4966456      0.48
                       Submitted to other databases       32342     32317     <0.01
                       Thesis                              7352      7295     <0.01
                       Book citation                       5149      5098     <0.01
                       Other                              95631     95386      0.01
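The "Average per entry" column appears to be the total number of lines divided by the total number of entries in the database (11397958), not by the number of entries that carry such a line. The sketch below reproduces the column under that reading. The rule that values rounding to 0.00 are printed as "<0.01" is inferred from the table rather than documented, and the function name is hypothetical.

```python
# Total number of entries in release 2010_08 (from the introduction).
TOTAL_ENTRIES = 11_397_958


def average_per_entry(line_count: int, entries: int = TOTAL_ENTRIES) -> str:
    """Format the average number of lines per database entry.

    Assumption inferred from the table: a non-zero average that rounds
    to 0.00 is shown as "<0.01".
    """
    text = f"{line_count / entries:.2f}"
    if text == "0.00" and line_count > 0:
        return "<0.01"
    return text


# Total line counts copied from the references table above.
reference_lines = {
    "References (RL)": 13_674_519,
    "Submitted to EMBL/GenBank/DDBJ": 8_084_086,
    "Journal": 5_449_959,
    "Submitted to other databases": 32_342,
    "Other": 95_631,
}

for line_type, count in reference_lines.items():
    print(f"{line_type:<34}{count:>9}{average_per_entry(count):>11}")
```

The same function applied to the totals in the comment, feature and cross-reference tables should give their "Average per entry" values under the same assumptions.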
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 293059
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Comments (CC)                       9062745                0.80
                       CATALYTIC ACTIVITY                849284    774493      0.07     4
                       CAUTION                          2807260   2807260      0.25     1
                       COFACTOR                          255820    248528      0.02     8
                       DOMAIN                              6822      6822     <0.01    10
                       FUNCTION                          984338    908363      0.09     3
                       INTERACTION                         5003      5003     <0.01    11
                       MISCELLANEOUS                      23993     23989     <0.01     9
                       PATHWAY                           354202    325448      0.03     7
                       SIMILARITY                       2759159   2371036      0.24     2
                       SUBCELLULAR LOCATION              649975    648404      0.06     5
                       SUBUNIT                           366889    366359      0.03     6
                    
                    Total number of comment topics: 11
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ---------------------------------  -------- ---------  ---------  ----

                    Features (FT)                       3893275                0.34
                       CHAIN                             422012    330647      0.04     2
                       NON_TER                          3196268   1893430      0.28     1
                       SIGNAL                            274410    274159      0.02     3
                       TRANSIT                              585       585     <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             128941331               11.31
                       AGD                                 3867      3867     <0.01    73   Organism-specific databases
                       ANU-2DPAGE                            57        57     <0.01    91   2D gel databases
                       ArachnoServer                        368       368     <0.01    84   Organism-specific databases
                       ArrayExpress                       94526     94513      0.01    46   Gene expression databases
                       BRENDA                              2902      2834     <0.01    74   Enzyme and pathway databases
                       Bgee                              129974    129872      0.01    43   Gene expression databases
                       BioCyc                            794988    769782      0.07    23   Enzyme and pathway databases
                       CAZy                               74851     70333      0.01    48   Protein family/group databases
                       CGD                                 6800      6800     <0.01    70   Organism-specific databases
                       COMPLUYEAST-2DPAGE                     5         5     <0.01    95   2D gel databases
                       CTD                               161869    161171      0.01    40   Organism-specific databases
                       CYGD                                   5         5     <0.01    96   Organism-specific databases
                       DIP                                 2581      2576     <0.01    75   Protein-protein interaction databases
                       EMBL                            12673634  11381754      1.11     3   Sequence databases
                       Ensembl                           311317    188176      0.03    31   Genome annotation databases
                       EnsemblBacteria                   478019    448557      0.04    25   Genome annotation databases
                       EnsemblFungi                       88269     88179      0.01    47   Genome annotation databases
                       EnsemblMetazoa                    290551    251029      0.03    32   Genome annotation databases
                       EnsemblPlants                     143055    130200      0.01    42   Genome annotation databases
                       EnsemblProtists                    15268     15158     <0.01    61   Genome annotation databases
                       EuPathDB                          151376    151376      0.01    41   Organism-specific databases
                       FlyBase                           195400    193870      0.02    36   Organism-specific databases
                       GO                              22731378   6948945      1.99     2   Ontologies
                       Gene3D                           3468781   2932783      0.30    10   Family and domain databases
                       GeneDB_Spombe                          1         1     <0.01    99   Organism-specific databases
                       GeneID                           4778158   4603533      0.42     7   Genome annotation databases
                       Genevestigator                    102549    102538      0.01    45   Gene expression databases
                       GenoList                           14765     14492     <0.01    62   Organism-specific databases
                       GenomeReviews                    3221376   3138206      0.28    11   Genome annotation databases
                       Gramene                            69007     69007      0.01    50   Organism-specific databases
                       H-InvDB                              536       439     <0.01    82   Organism-specific databases
                       HAMAP                             457521    455547      0.04    26   Family and domain databases
                       HGNC                               53555     51705     <0.01    52   Organism-specific databases
                       HOGENOM                          2204104   2204031      0.19    16   Phylogenomic databases
                    HOVERGEN                          319805    319130      0.03    30   Phylogenomic databases                     
                    HSSP                              254941    254665      0.02    33   3D structure databases                     
                    IPI                               222307    222305      0.02    34   Sequence databases                         
                    InParanoid                        197200    197105      0.02    35   Phylogenomic databases                     
                    IntAct                             13478     13478     <0.01    63   Protein-protein interaction databases      
                    InterPro                        22846839   8607679      2.00     1   Family and domain databases                
                    KEGG                             4187150   4092264      0.37     9   Genome annotation databases                
                    LegioList                           5143      5115     <0.01    71   Organism-specific databases                
                    Leproma                              942       941     <0.01    81   Organism-specific databases                
                    MEROPS                             66431     65199      0.01    51   Protein family/group databases             
                    MGI                                42067     42055     <0.01    56   Organism-specific databases                
                    MINT                                9193      9193     <0.01    67   Protein-protein interaction databases      
                    NMPDR                             926401    926390      0.08    22   Genome annotation databases                
                    NextBio                            48157     48154     <0.01    54   Other                                      
                    OMA                              2437490   2437488      0.21    14   Phylogenomic databases                     
                    OrthoDB                           430390    430389      0.04    27   Phylogenomic databases                     
                    PANTHER                          1784193   1680976      0.16    18   Family and domain databases                
                    PDB                                12266      7347     <0.01    65   3D structure databases                     
                    PDBsum                             11947      7156     <0.01    66   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    88   2D gel databases                           
                    PIR                               176265    143405      0.02    39   Sequence databases                         
                    PIRSF                             607127    607127      0.05    24   Family and domain databases                
                    PMAP-CutDB                           260       260     <0.01    85   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    97   2D gel databases                           
                    PRIDE                             104656    104655      0.01    44   Proteomic databases                        
                    PRINTS                           1772737   1563256      0.16    19   Family and domain databases                
                    PROSITE                          5523442   3665106      0.48     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    94   Enzyme and pathway databases               
                    PeptideAtlas                         148       148     <0.01    87   Proteomic databases                        
                    PeroxiBase                          2282      2276     <0.01    78   Protein family/group databases             
                    Pfam                            10992666   8175362      0.96     4   Family and domain databases                
                    PharmGKB                              85        85     <0.01    90   Organism-specific databases                
                    PhosphoSite                         1797      1797     <0.01    79   PTM databases                              
                    PhylomeDB                         373476    373444      0.03    29   Phylogenomic databases                     
                    ProDom                            194982    184096      0.02    38   Family and domain databases                
                    ProMEX                               450       450     <0.01    83   Proteomic databases                        
                    ProtClustDB                      2624875   2624859      0.23    13   Phylogenomic databases                     
                    PseudoCAP                           4347      4344     <0.01    72   Organism-specific databases                
                    REBASE                              7667      7399     <0.01    69   Protein family/group databases             
                    REPRODUCTION-2DPAGE                   96        95     <0.01    89   2D gel databases                           
                    RGD                                18760     18672     <0.01    60   Organism-specific databases                
                    Reactome                              55        53     <0.01    92   Enzyme and pathway databases               
                    RefSeq                           4901262   4714451      0.43     6   Sequence databases                         
                    SGD                                  249       249     <0.01    86   Organism-specific databases                
                    SMART                            2213805   1727093      0.19    15   Family and domain databases                
                    SMR                              3054227   3054218      0.27    12   3D structure databases                     
                    STRING                           1203938   1203795      0.11    20   Protein-protein interaction databases      
                    SUPFAM                           4422071   3655945      0.39     8   Family and domain databases                
                    SWISS-2DPAGE                          29        29     <0.01    93   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01    98   2D gel databases                           
                    TAIR                               19219     19136     <0.01    59   Organism-specific databases                
                    TCDB                                2301      2282     <0.01    76   Protein family/group databases             
                    TIGR                              195252    188199      0.02    37   Genome annotation databases                
                    TIGRFAMs                         2182641   1990890      0.19    17   Family and domain databases                
                    TubercuList                         2300      2294     <0.01    77   Organism-specific databases                
                    UCSC                               50891     50891     <0.01    53   Genome annotation databases                
                    UniGene                           396228    363299      0.03    28   Sequence databases                         
                    VectorBase                         47583     47115     <0.01    55   Genome annotation databases                
                    World-2DPAGE                         947       942     <0.01    80   2D gel databases                           
                    WormBase                           41251     41127     <0.01    57   Organism-specific databases                
                    Xenbase                            12464     12445     <0.01    64   Organism-specific databases                
                    ZFIN                               21355     21350     <0.01    58   Organism-specific databases                
                    dictyBase                           8164      8163     <0.01    68   Organism-specific databases                
                    eggNOG                           1149841   1149841      0.10    21   Phylogenomic databases                     
                    euHCVdb                            71269     71266      0.01    49   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 126
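
The rows of the table above are whitespace-aligned, so they can be read with a single regular expression. The following is a minimal sketch, not part of the release notes. The field names (xrefs, entries, ratio, rank, category) and the helper parse_row are assumptions made for illustration, inferred from the shape of the rows; the per-entry value is kept as text because it can appear as "<0.01".

```python
import re

# Assumed column meaning (hypothetical field names):
# database name, cross-reference count, entry count,
# per-entry ratio, rank, and an optional category label.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<xrefs>\d+)\s+(?P<entries>\d+)\s+"
    r"(?P<ratio><?\d+\.\d+)\s+(?P<rank>\d+)\s*(?P<category>.*?)\s*$"
)


def parse_row(line):
    """Return a dict for one table row, or None if the line does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    row = m.groupdict()
    for key in ("xrefs", "entries", "rank"):
        row[key] = int(row[key])
    return row


# A few rows copied from the table above.
sample = [
    "EMBL                            12673634  11381754      1.11     3   Sequence databases",
    "GO                              22731378   6948945      1.99     2   Ontologies",
    "InterPro                        22846839   8607679      2.00     1   Family and domain databases",
    "SGD                                  249       249     <0.01    86   Organism-specific databases",
]

rows = [parse_row(line) for line in sample]
by_rank = sorted(rows, key=lambda r: r["rank"])
for r in by_rank:
    print(r["rank"], r["name"], r["xrefs"], r["category"])
```

Rows with no category label parse with an empty category string, so they can be detected and handled separately.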
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.59   Gln (Q) 3.87   Leu (L) 9.81   Ser (S) 6.69
                    Arg (R) 5.46   Glu (E) 6.14   Lys (K) 5.30   Thr (T) 5.62
                    Asn (N) 4.17   Gly (G) 7.10   Met (M) 2.45   Trp (W) 1.31
                    Asp (D) 5.30   His (H) 2.19   Phe (F) 4.03   Tyr (Y) 3.07
                    Cys (C) 1.27   Ile (I) 6.03   Pro (P) 4.73   Val (V) 6.73
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.04
                    
                    
                    
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Lys, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
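
The ordering in 5.2 follows from sorting the percentages in 5.1 in descending order. The sketch below reproduces it; the dictionary name composition is an illustrative choice, and the values are copied from 5.1 in reading order. Lys and Asp are tied at 5.30; Python's sort is stable, so they keep the order in which they appear in 5.1.

```python
# Percent composition from section 5.1, in row-by-row reading order.
composition = {
    "Ala": 8.59, "Gln": 3.87, "Leu": 9.81, "Ser": 6.69,
    "Arg": 5.46, "Glu": 6.14, "Lys": 5.30, "Thr": 5.62,
    "Asn": 4.17, "Gly": 7.10, "Met": 2.45, "Trp": 1.31,
    "Asp": 5.30, "His": 2.19, "Phe": 4.03, "Tyr": 3.07,
    "Cys": 1.27, "Ile": 6.03, "Pro": 4.73, "Val": 6.73,
}

# Sort residue names by descending frequency; ties keep input order.
ranked = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranked))
```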
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 326834
                    Total number of entries encoded on a Plasmid: 168999
                    Total number of entries encoded on a Plastid: 9999
                    Total number of entries encoded on a Plastid; Apicoplast: 335
                    Total number of entries encoded on a Plastid; Chloroplast: 117965
                    Total number of entries encoded on a Plastid; Cyanelle: 7
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 441