                    UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2010_04 STATISTICS
                    
                    
                    1.  INTRODUCTION
                    
                    Release 2010_04 of 23-Mar-2010 of UniProtKB/TrEMBL contains 10618387 sequence entries,
                    comprising 3423871800 amino acids.
                    
                    128150 sequences have been added since release 40.15, the sequence data of
                    542 existing entries has been updated and the annotations of
                    1846108 entries have been revised. This represents an increase of 1%.
                    
                    Number of fragments: 1821744
                    
                    Protein existence (PE):              entries      %
                    1: Evidence at protein level           17277     0.16%
                    2: Evidence at transcript level       459682     4.33%
                    3: Inferred from homology            2140790    20.16%
                    4: Predicted                         8000638    75.35%
                    5: Uncertain                               0     0.00%
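
The percentage column of the protein existence table can be rechecked from the entry counts. The following Python sketch is illustrative only: the names `pe_counts` and `pe_percentages` are hypothetical, and rounding to two decimals is an assumption inferred from the printed values.

```python
# Entry counts copied from the protein existence (PE) table above.
pe_counts = {
    "1: Evidence at protein level": 17277,
    "2: Evidence at transcript level": 459682,
    "3: Inferred from homology": 2140790,
    "4: Predicted": 8000638,
    "5: Uncertain": 0,
}

# The five categories together should cover every entry in the release.
total_entries = sum(pe_counts.values())


def pe_percentages(counts):
    """Return each category's share of all entries, in percent (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


percentages = pe_percentages(pe_counts)
for name, pct in percentages.items():
    print(f"{name:<35}{pe_counts[name]:>10}{pct:>9.2f}%")
```

Under these assumptions the category counts add up to the 10618387 entries reported in the introduction.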
                    
                    The growth of the database is summarized below.
                    
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 232037
                    
                    The first twenty species represent 1172650 sequences: 11% of the
                    total number of entries.
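
The figure above can be reproduced from the first twenty frequencies in the table of the most represented species (section 2.2). This is a minimal Python sketch; the variable names are hypothetical, and rounding to a whole percent is an assumption.

```python
# Frequencies of the twenty most represented species, copied from section 2.2.
top20_frequencies = [
    324879, 95731, 77399, 56162, 50804,
    50402, 44033, 43064, 42846, 42368,
    41872, 39843, 39281, 36885, 34761,
    33625, 31213, 30325, 29072, 28085,
]

TOTAL_ENTRIES = 10618387  # entries in release 2010_04 (from the introduction)

top20_total = sum(top20_frequencies)
top20_share = 100 * top20_total / TOTAL_ENTRIES

print(f"{top20_total} sequences: {top20_share:.0f}% of all entries")
```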
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 10052
                                             2x: 40872
                                             3x: 22634
                                             4x: 13664
                                             5x:  8343
                                             6x:  6104
                                             7x:  4280
                                             8x:  3492
                                             9x:  2814
                                            10x:  4360
                                        11- 20x: 14220
                                        21- 50x:  5044
                                        51-100x:  1936
                                          >100x:  3753
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     324879  Human immunodeficiency virus 1
                    2      95731  Oryza sativa subsp. japonica (Rice)
                    3      77399  Homo sapiens (Human)
                    4      56162  Hepatitis C virus
                    5      50804  Vitis vinifera (Grape)
                    6      50402  Trichomonas vaginalis
                    7      44033  Populus trichocarpa (Western balsam poplar) 
                    8      43064  uncultured bacterium
                    9      42846  Mus musculus (Mouse)
                    10      42368  Arabidopsis thaliana (Mouse-ear cress)
                    11      41872  Zea mays (Maize)
                    12      39843  Paramecium tetraurelia
                    13      39281  Oryza sativa subsp. indica (Rice)
                    14      36885  Hepatitis B virus (HBV)
                    15      34761  Physcomitrella patens subsp. patens
                    16      33625  Sorghum bicolor (Sorghum) (Sorghum vulgare)
                    17      31213  Ricinus communis (Castor bean)
                    18      30325  Drosophila melanogaster (Fruit fly)
                    19      29072  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    20      28085  Tetraodon nigroviridis (Green puffer)
                    21      26550  Danio rerio (Zebrafish) (Brachydanio rerio)
                    22      24831  Nematostella vectensis (Starlet sea anemone)
                    23      23115  Perkinsus marinus ATCC 50983
                    24      21081  Ixodes scapularis (Black-legged tick) (Deer tick)
                    25      20920  Caenorhabditis elegans
                    26      20632  Trypanosoma cruzi
                    27      18809  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    28      18095  Caenorhabditis briggsae
                    29      17862  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    30      17786  Ailuropoda melanoleuca (Giant panda)
                    31      17610  Phytophthora infestans T30-4
                    32      17449  Drosophila simulans (Fruit fly)
                    33      17385  Escherichia coli
                    34      16909  Drosophila yakuba (Fruit fly)
                    35      16757  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    36      16723  Drosophila persimilis (Fruit fly)
                    37      16272  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    38      16258  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    39      16200  Drosophila sechellia (Fruit fly)
                    40      15963  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    41      15874  Phaeosphaeria nodorum (Glume blotch fungus) (Septoria nodorum)
                    42      15717  Naegleria gruberi (Amoeba)
                    43      15676  Nectria haematococca (strain 77-13-4 / FGSC 9596 / MPVI) 
                    44      15434  Drosophila willistoni (Fruit fly)
                    45      15251  Tetrahymena thermophila SB210
                    46      15157  Drosophila ananassae (Fruit fly)
                    47      14944  Drosophila erecta (Fruit fly)
                    48      14794  Drosophila mojavensis (Fruit fly)
                    49      14771  Anopheles gambiae (African malaria mosquito)
                    50      14762  Chlamydomonas reinhardtii
                    51      14708  Drosophila virilis (Fruit fly)
                    52      14670  Plasmodium chabaudi
                    53      14669  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    54      14276  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    55      13811  Candida albicans (Yeast)
                    56      13709  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    57      13473  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    58      13457  Aspergillus flavus 
                    59      13436  Schistosoma mansoni (Blood fluke)
                    60      12980  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
                    61      12733  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
                    62      12715  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
                    63      12515  Xenopus laevis (African clawed frog)
                    64      12430  Glycine max (Soybean)
                    65      12340  Polysphondylium pallidum PN500
                    66      12034  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    67      11905  Aspergillus oryzae
                    68      11796  Plasmodium berghei
                    69      11572  Trichoplax adhaerens
                    70      11496  Brugia malayi (Filarial nematode worm)
                    71      11475  Hepatitis C virus subtype 1b
                    72      10941  Sordaria macrospora
                    73      10895  Schistosoma japonicum (Blood fluke)
                    74      10869  Chaetomium globosum (Soil fungus)
                    75      10693  Podospora anserina
                    76      10653  Ralstonia solanacearum (Pseudomonas solanacearum)
                    77      10519  Aspergillus nidulans FGSC A4
                    78      10421  Neurospora crassa
                    79      10404  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
                    80      10335  Phaeodactylum tricornutum CCAP 1055/1
                    81      10279  Micromonas pusilla CCMP1545
                    82      10233  Verticillium albo-atrum (strain VaMs.102) (Verticillium wilt)
                    83      10225  Plasmodium falciparum
                    84      10204  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
                    85      10193  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
                    86      10115  Micromonas sp. RCC299
                    87      10091  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    88       9834  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    89       9834  Bos taurus (Bovine)
                    90       9666  Trypanosoma brucei gambiense DAL972
                    91       9643  Aspergillus fumigatus (Sartorya fumigata)
                    92       9637  Cryptococcus neoformans (Filobasidiella neoformans)
                    93       9602  Helicobacter pylori (Campylobacter pylori)
                    94       9599  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    95       9575  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
                    96       9541  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
                    97       9527  Ajellomyces dermatitidis (strain ER-3) (Blastomyces dermatitidis)
                    98       9470  Trypanosoma brucei
                    99       9358  Salmo salar (Atlantic salmon)
                    100       9328  Emericella nidulans (Aspergillus nidulans)
                    101       9245  Monosiga brevicollis (Choanoflagellate)
                    102       9196  Ajellomyces capsulata (strain ATCC 26029 / G186AR / H82 / RMSCC 2432)  
                    103       9174  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    104       9122  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    105       9097  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
                    106       9076  Dictyostelium discoideum (Slime mold)
                    107       8978  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
                    108       8960  Rattus norvegicus (Rat)
                    109       8955  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
                    110       8924  Aspergillus clavatus
                    111       8912  Catenulispora acidiphila 
                    112       8775  Rhodococcus sp. (strain RHA1)
                    113       8721  Paracoccidioides brasiliensis (strain Pb18)
                    114       8712  Nannizzia otae (strain CBS 113480) (Microsporum canis) (Arthroderma otae)
                    115       8700  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
                    116       8603  Entamoeba dispar SAW760
                    117       8523  Stigmatella aurantiaca DW4/3-1
                    118       8439  Plasmodium vivax
                    119       8437  Plesiocystis pacifica SIR-1
                    120       8381  Rabies virus
                    121       8253  Streptomyces sviceus ATCC 29083
                    122       8249  Microscilla marina ATCC 23134
                    123       8201  Microcoleus chthonoplastes PCC 7420
                    124       8172  Bradyrhizobium japonicum
                    125       8163  Frankia sp. EUN1f
                    126       8154  Burkholderia xenovorans (strain LB400)
                    127       8116  Picea sitchensis (Sitka spruce)
                    128       8098  Toxoplasma gondii GT1
                    129       8026  Leishmania infantum
                    130       7988  Pseudomonas aeruginosa
                    131       7980  Toxoplasma gondii ME49
                    132       7963  Thalassiosira pseudonana (Marine diatom)
                    133       7958  Ostreococcus tauri
                    134       7953  Rhodococcus opacus (strain B4)
                    135       7952  Entamoeba histolytica HM-1:IMSS
                    136       7916  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
                    137       7889  Leishmania braziliensis
                    138       7861  Paracoccidioides brasiliensis (strain Pb03)
                    139       7857  Acaryochloris marina (strain MBIC 11017)
                    140       7838  Toxoplasma gondii VEG
                    141       7813  Plasmodium yoelii yoelii
                    142       7748  Uncinocarpus reesii (strain UAMH 1704)
                    143       7571  Clostridium hathewayi DSM 13479
                    144       7563  Burkholderia pseudomallei MSHR346
                    145       7520  Solibacter usitatus (strain Ellin6076)
                    146       7489  Streptomyces coelicolor
                    147       7475  Burkholderia pseudomallei 1710a
                    148       7465  Burkholderia pseudomallei Pakistan 9
                    149       7459  Burkholderia sp. H160
                    150       7397  Ostreococcus lucimarinus (strain CCE9901)
                    151       7379  Streptomyces sp. ACT-1
                    152       7367  Burkholderia pseudomallei 576
                    153       7349  Burkholderia pseudomallei 305
                    154       7310  Frankia sp. EuI1c
                    155       7274  Clostridium bolteae ATCC BAA-613
                    156       7243  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    157       7237  Streptomyces avermitilis
                    158       7232  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
                    159       7225  Burkholderia sp. CCGE1002
                    160       7212  Coccidioides posadasii (strain C735) (Valley fever fungus)
                    161       7179  Chitinophaga pinensis (strain ATCC 43595 / DSM 2588 / NCIB 11800 / UQM 2034)
                    162       7171  Medicago truncatula (Barrel medic)
                    163       7140  Burkholderia pseudomallei 1106b
                    164       7132  Burkholderia phymatum (strain DSM 17167 / STM815)
                    165       7125  Rhizobium loti (Mesorhizobium loti)
                    166       7124  Burkholderia ambifaria MEX-5
                    167       7120  Leishmania major
                    168       7033  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
                    169       7017  Myxococcus xanthus (strain DK 1622)
                    170       6985  Rhizobium leguminosarum bv. trifolii (strain WSM1325)
                    171       6979  Rhodopirellula baltica
                    172       6967  Frankia sp. (strain EAN1pec)
                    173       6943  Streptomyces sp. Mg1
                    174       6940  Kribbella flavida DSM 17836
                    175       6923  Burkholderia ambifaria IOP40-10
                    176       6913  Saccharopolyspora erythraea (strain NRRL 23338)
                    177       6911  Actinosynnema mirum (strain ATCC 29888 / DSM 43827 / NBRC 14064 / IMRU 3971)
                    178       6882  Burkholderia multivorans (strain ATCC 17616 / 249)
                    179       6867  Spirosoma linguale DSM 74
                    180       6862  Burkholderia phytofirmans (strain DSM 17436 / PsJN)
                    181       6817  Clostridium asparagiforme DSM 15981
                    182       6772  Burkholderia pseudomallei (strain 1106a)
                    183       6744  Streptomyces griseus subsp. griseus (strain JCM 4626 / NBRC 13350)
                    184       6740  Burkholderia pseudomallei (strain 668)
                    185       6725  Burkholderia graminis C4D1M
                    186       6714  Rhizobium leguminosarum bv. viciae (strain 3841)
                    187       6712  Rhodococcus erythropolis SK121
                    188       6705  Chthoniobacter flavus Ellin428
                    189       6702  Streptomyces flavogriseus ATCC 33331
                    190       6692  Bacillus thuringiensis IBL 200
                    191       6684  Haliangium ochraceum (strain DSM 14365 / JCM 11303 / SMP-2)
                    192       6679  Mesorhizobium opportunistum WSM2075
                    193       6662  Burkholderia pseudomallei S13
                    194       6657  Burkholderia cepacia (strain J2315 / LMG 16656) (Burkholderia cenocepacia 
                    195       6655  Bacillus thuringiensis IBL 4222
                    196       6644  Beggiatoa sp. PS
                    197       6627  Burkholderia cenocepacia (strain MC0-3)
                    198       6614  Burkholderia multivorans CGD2
                    199       6613  Burkholderia pseudomallei Pasteur 52237
                    200       6606  Burkholderia multivorans CGD2M
                    201       6583  Bacillus thuringiensis serovar sotto str. T04001
                    202       6567  Stackebrandtia nassauensis DSM 44728
                    203       6563  Streptococcus pneumoniae
                    204       6527  Burkholderia multivorans CGD1
                    205       6521  Streptomyces sp. ACTE
                    206       6514  Frankia alni (strain ACN14a)
                    207       6498  bacterium Ellin514
                    208       6497  Burkholderia cenocepacia (strain HI2424)
                    209       6496  Sus scrofa (Pig)
                    210       6488  Bacillus thuringiensis serovar monterrey BGSC 4AJ1
                    211       6463  Planctomyces maris DSM 8797
                    212       6462  Streptomyces clavuligerus ATCC 27064
                    213       6427  Agrobacterium radiobacter (strain K84 / ATCC BAA-868)
                    214       6417  Methylobacterium sp. (strain 4-46)
                    215       6413  Cyanothece sp. CCY0110
                    216       6390  Ustilago maydis (Smut fungus)
                    217       6388  Bradyrhizobium sp. (strain ORS278)
                    218       6372  Micromonospora aurantiaca ATCC 27029
                    219       6356  Rhizobium meliloti (Sinorhizobium meliloti)
                    220       6356  'Nostoc azollae' 0708
                    221       6347  Micromonospora sp. L5
                    222       6336  Burkholderia ambifaria (strain MC40-6)
                    223       6322  Bacillus thuringiensis serovar thuringiensis str. T01001
                    224       6322  Giardia lamblia ATCC 50803
                    225       6318  Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
                    226       6309  Hahella chejuensis (strain KCTC 2396)
                    227       6298  Bacillus thuringiensis Bt407
                    228       6294  Burkholderia pseudomallei 406e
                    229       6290  Nostoc punctiforme (strain ATCC 29133 / PCC 73102)
                    230       6288  Burkholderia pseudomallei 1655
                    231       6272  Labrenzia aggregata IAM 12614
                    232       6252  Clostridiales bacterium 1_7_47FAA
                    233       6242  Bacillus thuringiensis serovar berliner ATCC 10792
                    234       6237  Geobacillus sp. Y412MC10
                    235       6234  Rhodococcus erythropolis (strain PR4 / NBRC 100887)
                    236       6214  Candida tropicalis (strain ATCC MYA-3404 / T1) (Yeast)
                    237       6210  Burkholderia ambifaria (strain ATCC BAA-244 / AMMD) (Burkholderia cepacia 
                    238       6206  Paenibacillus sp. (strain JDR-2)
                    239       6184  Methylobacterium extorquens (strain ATCC 14718 / DSM 1338 / AM1)
                    240       6172  Methylobacterium radiotolerans (strain ATCC 27329 / DSM 1819 / JCM 2831)
                    241       6162  Oryza sativa (Rice)
                    242       6154  Ralstonia eutropha  (Cupriavidus necator 
                    243       6129  Bacillus thuringiensis serovar israelensis ATCC 35646
                    244       6110  Lyngbya sp. PCC 8106
                    245       6097  Hepatitis C virus subtype 1a
                    246       6092  Rhizobium leguminosarum bv. trifolii (strain WSM2304)
                    247       6081  Ralstonia metallidurans (strain CH34 / ATCC 43123 / DSM 2839)
                    248       6073  Burkholderia sp. CCGE1001
                    249       6072  Bacillus anthracis
                    250       6069  Gallus gallus (Chicken)
                    
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          209426 (  2%)
                    Bacteria        6473479 ( 61%)
                    Eukaryota       3008585 ( 28%)
                    Viruses          916040 (  9%)
                    Other             10856 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  77421 (  3%)           (  1%)
                    Other Mammalia        172123 (  6%)           (  2%)
                    Other Vertebrata      287692 ( 10%)           (  3%)
                    Viridiplantae         722718 ( 24%)           (  7%)
                    Fungi                 623060 ( 21%)           (  6%)
                    Insecta               403166 ( 13%)           (  4%)
                    Nematoda               61319 (  2%)           (  1%)
                    Other                 661086 ( 22%)           (  6%)
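
Both percentage columns of the Eukaryota table follow from the sequence counts. The Python sketch below recomputes them; the names are hypothetical, and rounding to the nearest whole percent is an assumption inferred from the printed values.

```python
# Sequence counts copied from the "Within Eukaryota" table above.
eukaryota_counts = {
    "Human": 77421,
    "Other Mammalia": 172123,
    "Other Vertebrata": 287692,
    "Viridiplantae": 722718,
    "Fungi": 623060,
    "Insecta": 403166,
    "Nematoda": 61319,
    "Other": 661086,
}

TOTAL_ENTRIES = 10618387  # entries in release 2010_04 (from the introduction)

# The categories partition Eukaryota, so their sum is the kingdom total.
eukaryota_total = sum(eukaryota_counts.values())

# Map each category to (% of Eukaryota, % of the complete database).
shares = {
    name: (round(100 * n / eukaryota_total), round(100 * n / TOTAL_ENTRIES))
    for name, n in eukaryota_counts.items()
}

for name, (pct_euk, pct_db) in shares.items():
    print(f"{name:<18}{eukaryota_counts[name]:>8} ({pct_euk:>3}%)  ({pct_db:>3}%)")
```

The category counts sum to 3008585, the Eukaryota total given in the kingdom table.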
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To    Number        From   To    Number
                       1-  50    229129        1001-1100     63217
                      51- 100    835723        1101-1200     44702
                     101- 150    964328        1201-1300     30634
                     151- 200    932766        1301-1400     20330
                     201- 250    933574        1401-1500     16375
                     251- 300    902949        1501-1600     11790
                     301- 350    820898        1601-1700      8725
                     351- 400    640113        1701-1800      6920
                     401- 450    537206        1801-1900      5550
                     451- 500    449194        1901-2000      4662
                     501- 550    309247        2001-2100      3804
                     551- 600    237090        2101-2200      3918
                     601- 650    173068        2201-2300      3081
                     651- 700    134825        2301-2400      2445
                     701- 750    115581        2401-2500      2127
                     751- 800    103279        >2500         18631
                     801- 850     76883
                     851- 900     69038
                     901- 950     47972
                     951-1000     36869
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is 322 amino acids.

                    The shortest sequence is Q16047_HUMAN: 4 amino acids.
                    The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
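
The average length quoted above is consistent with dividing the total number of amino acids by the total number of entries, both taken from the introduction. The Python sketch below shows that arithmetic; whether the published figure was rounded or truncated is not stated, and both give the same value here.

```python
TOTAL_AMINO_ACIDS = 3423871800  # amino acids in release 2010_04
TOTAL_ENTRIES = 10618387        # sequence entries in release 2010_04

# Mean sequence length over all entries, fragments included.
average_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

print(f"Average sequence length: {average_length:.2f} amino acids")
print(f"Rounded to a whole residue: {round(average_length)}")
```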
                    
                    
                    
                    4.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                           Total  Number of    Average
                    Line type / subtype                   number    entries  per entry
                    ---------------------------------  ---------  ---------  ---------

                    References (RL)                     12988769                  1.22
                      Submitted to EMBL/GenBank/DDBJ     7674944    6794675       0.72
                      Journal                            5194160    4690906       0.49
                      Submitted to other databases         16556      16548      <0.01
                      Thesis                                7388       7331      <0.01
                      Book citation                         5123       5072      <0.01
                      Other                                90598      90181       0.01
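
The "Average per entry" column appears to be the total number of lines divided by the total number of entries in the release, not by the number of entries that carry such a line. That reading is an inference from the printed values, not something the report states. A minimal Python sketch under that assumption, with hypothetical names and an assumed "<0.01" formatting rule:

```python
TOTAL_ENTRIES = 10618387  # entries in release 2010_04 (from the introduction)

# Total number of lines per reference subtype, copied from the table above.
reference_line_totals = {
    "Submitted to EMBL/GenBank/DDBJ": 7674944,
    "Journal": 5194160,
    "Submitted to other databases": 16556,
    "Thesis": 7388,
    "Book citation": 5123,
    "Other": 90598,
}


def average_per_entry(line_total, total_entries=TOTAL_ENTRIES):
    """Format lines per entry; values that round to 0.00 print as '<0.01'."""
    rounded = round(line_total / total_entries, 2)
    return "<0.01" if rounded < 0.01 else f"{rounded:.2f}"


averages = {name: average_per_entry(n) for name, n in reference_line_totals.items()}

# The subtype totals add up to the overall References (RL) line count.
all_reference_lines = sum(reference_line_totals.values())

print(f"{'References (RL)':<33}{all_reference_lines:>10}  {average_per_entry(all_reference_lines)}")
for name, avg in averages.items():
    print(f"  {name:<31}{reference_line_totals[name]:>10}  {avg}")
```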
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 290306
                    
                    
                                                           Total  Number of    Average
                    Line type / subtype                   number    entries  per entry  Rank
                    ---------------------------------  ---------  ---------  ---------  ----

                    Comments (CC)                        7984688                  0.75
                      CATALYTIC ACTIVITY                  704958     649607       0.07     4
                      CAUTION                            2733850    2733850       0.26     1
                      COFACTOR                            217667     212257       0.02     8
                      DOMAIN                                2823       2823      <0.01    10
                      FUNCTION                            770682     708327       0.07     3
                      INTERACTION                           2463       2463      <0.01    11
                      MISCELLANEOUS                        19924      19921      <0.01     9
                      PATHWAY                             250347     229912       0.02     7
                      SIMILARITY                         2583638    2231584       0.24     2
                      SUBCELLULAR LOCATION                440787     440742       0.04     5
                      SUBUNIT                             257549     257545       0.02     6
                    
                    Total number of comment topics: 11
                    
                    
                                                           Total  Number of    Average
                    Line type / subtype                   number    entries  per entry  Rank
                    ---------------------------------  ---------  ---------  ---------  ----

                    Features (FT)                        3723331                  0.35
                      CHAIN                               407163     321155       0.04     2
                      NON_TER                            3051266    1820218       0.29     1
                      SIGNAL                              264317     264317       0.02     3
                      TRANSIT                                585        585      <0.01     4
                    
                    Total number of feature keys: 4
                    
                    
                    Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank  Category
                    ---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
                    Cross-references (DR)             119591141               11.26                                                    
                    AGD                                 3871      3871     <0.01    68   Organism-specific databases                
                    ANU-2DPAGE                            58        58     <0.01    86   2D gel databases                           
                    ArachnoServer                         77        77     <0.01    85   Organism-specific databases                
                    ArrayExpress                       95202     95189      0.01    43   Gene expression databases                  
                    BRENDA                              2925      2852     <0.01    70   Enzyme and pathway databases               
                    Bgee                              108695    108582      0.01    40   Gene expression databases                  
                    BioCyc                            797073    771835      0.08    23   Enzyme and pathway databases               
                    CAZy                               36259     33900     <0.01    52   Protein family/group databases             
                    CGD                                 6805      6805     <0.01    63   Organism-specific databases                
                    COMPLUYEAST-2DPAGE                     1         1     <0.01    95   2D gel databases                           
                    CTD                               151908    151202      0.01    38   Organism-specific databases                
                    CYGD                                   6         6     <0.01    91   Organism-specific databases                
                    DIP                                 2587      2582     <0.01    71   Protein-protein interaction databases      
                    EMBL                            11826930  10612450      1.11     3   Sequence databases                         
                    Ensembl                           347604    215321      0.03    29   Genome annotation databases                
                    EuPathDB                          151379    151379      0.01    39   Organism-specific databases                
                    FlyBase                           195584    194052      0.02    35   Organism-specific databases                
                    GO                              20986874   6407073      1.98     1   Ontologies                                 
                    Gene3D                           3225680   2727330      0.30    10   Family and domain databases                
                    GeneDB_Spombe                          1         1     <0.01    94   Organism-specific databases                
                    GeneID                           4555530   4449775      0.43     7   Genome annotation databases                
                    Genevestigator                    103275    103264      0.01    41   Gene expression databases                  
                    GenoList                           14766     14493     <0.01    57   Organism-specific databases                
                    GenomeReviews                    3083521   3001468      0.29    11   Genome annotation databases                
                    Gramene                            69136     69136      0.01    44   Organism-specific databases                
                    H-InvDB                              543       443     <0.01    77   Organism-specific databases                
                    HAMAP                             353188    351571      0.03    28   Family and domain databases                
                    HGNC                               57568     53298      0.01    47   Organism-specific databases                
                    HOGENOM                          2206290   2206216      0.21    15   Phylogenomic databases                     
                    HOVERGEN                          321543    320841      0.03    30   Phylogenomic databases                     
                    HSSP                              255522    255241      0.02    31   3D structure databases                     
                    IPI                               207747    207746      0.02    32   Sequence databases                         
                    InParanoid                        197880    197784      0.02    33   Phylogenomic databases                     
                    IntAct                             13480     13480     <0.01    58   Protein-protein interaction databases      
                    InterPro                        20689040   7891014      1.95     2   Family and domain databases                
                    KEGG                             4045597   3951165      0.38     9   Genome annotation databases                
                    LegioList                           5143      5115     <0.01    65   Organism-specific databases                
                    Leproma                              943       942     <0.01    76   Organism-specific databases                
                    MEROPS                             67375     66091      0.01    46   Protein family/group databases             
                    MGI                                37569     37312     <0.01    51   Organism-specific databases                
                    MINT                                4469      4469     <0.01    66   Protein-protein interaction databases      
                    NMPDR                             928330    928319      0.09    22   Genome annotation databases                
                    NextBio                            48608     48605     <0.01    49   Other                                      
                    OMA                              2439980   2439978      0.23    14   Phylogenomic databases                     
                    OrthoDB                           431486    431485      0.04    25   Phylogenomic databases                     
                    PANTHER                          1654992   1559393      0.16    19   Family and domain databases                
                    PDB                                11516      6881     <0.01    60   3D structure databases                     
                    PDBsum                              5332      3120     <0.01    64   3D structure databases                     
                    PHCI-2DPAGE                          102       102     <0.01    83   2D gel databases                           
                    PIR                               176564    143695      0.02    37   Sequence databases                         
                    PIRSF                             545630    545630      0.05    24   Family and domain databases                
                    PMAP-CutDB                           268       268     <0.01    80   Other                                      
                    PMMA-2DPAGE                            3         3     <0.01    92   2D gel databases                           
                    PRIDE                             101141    101141      0.01    42   Proteomic databases                        
                    PRINTS                           1683918   1477702      0.16    18   Family and domain databases                
                    PROSITE                          5095867   3365723      0.48     5   Family and domain databases                
                    Pathway_Interaction_DB                11         9     <0.01    89   Enzyme and pathway databases               
                    PeptideAtlas                         149       149     <0.01    82   Proteomic databases                        
                    PeroxiBase                          2287      2281     <0.01    73   Protein family/group databases             
                    Pfam                             9765341   7322549      0.92     4   Family and domain databases                
                    PharmGKB                              88        88     <0.01    84   Organism-specific databases                
                    PhosphoSite                         1737      1737     <0.01    75   PTM databases                              
                    PhylomeDB                         374285    374253      0.04    27   Phylogenomic databases                     
                    ProDom                            195780    184886      0.02    34   Family and domain databases                
                    ProMEX                               457       457     <0.01    78   Proteomic databases                        
                    ProtClustDB                      2625367   2625350      0.25    13   Phylogenomic databases                     
                    PseudoCAP                           4350      4347     <0.01    67   Organism-specific databases                
                    REBASE                              7579      7314     <0.01    62   Protein family/group databases             
                    REPRODUCTION-2DPAGE                    8         8     <0.01    90   2D gel databases                           
                    RGD                                 3555      3549     <0.01    69   Organism-specific databases                
                    Reactome                              56        53     <0.01    87   Enzyme and pathway databases               
                    RefSeq                           4601862   4486340      0.43     6   Sequence databases                         
                    SGD                                  251       251     <0.01    81   Organism-specific databases                
                    SMART                            2078460   1620595      0.20    16   Family and domain databases                
                    SMR                              3057273   3057263      0.29    12   3D structure databases                     
                    STRING                           1205299   1205150      0.11    20   Protein-protein interaction databases      
                    SUPFAM                           4314151   3545063      0.41     8   Family and domain databases                
                    SWISS-2DPAGE                          28        28     <0.01    88   2D gel databases                           
                    Siena-2DPAGE                           2         2     <0.01    93   2D gel databases                           
                    TAIR                               19429     19346     <0.01    56   Organism-specific databases                
                    TCDB                                2200      2181     <0.01    74   Protein family/group databases             
                    TIGR                              195320    188268      0.02    36   Genome annotation databases                
                    TIGRFAMs                         1976636   1808228      0.19    17   Family and domain databases                
                    TubercuList                         2340      2334     <0.01    72   Organism-specific databases                
                    UCSC                               53841     53761      0.01    48   Genome annotation databases                
                    UniGene                           401818    368661      0.04    26   Sequence databases                         
                    VectorBase                         47597     47129     <0.01    50   Genome annotation databases                
                    World-2DPAGE                         406       406     <0.01    79   2D gel databases                           
                    WormBase                           19595     19499     <0.01    55   Organism-specific databases                
                    WormPep                            19605     19499     <0.01    54   Organism-specific databases                
                    Xenbase                            12895     12505     <0.01    59   Organism-specific databases                
                    ZFIN                               20300     20295     <0.01    53   Organism-specific databases                
                    dictyBase                           8236      8235     <0.01    61   Organism-specific databases                
                    eggNOG                           1150726   1150726      0.11    21   Phylogenomic databases                     
                    euHCVdb                            68440     68439      0.01    45   Organism-specific databases                
                    
                    Number of explicitly cross-referenced databases: 125
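
The ratio column in the table above appears to be the number of cross-references divided by the total number of entries in the release (10618387, from the introduction), rounded to two decimals and printed as "<0.01" when it rounds to zero. That reading is an assumption, not something the table states, but it reproduces the printed values for the rows checked. A minimal Python sketch, with the hypothetical helper name per_entry_ratio:

```python
# Total number of entries in release 2010_04, taken from the introduction.
TOTAL_ENTRIES = 10_618_387


def per_entry_ratio(cross_refs: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format cross-references per entry the way the table seems to print it.

    Assumption: the column is cross_refs / total_entries, rounded to two
    decimals, with values that round to 0.00 shown as "<0.01".
    """
    rounded = round(cross_refs / total_entries, 2)
    if rounded < 0.01:
        return "<0.01"
    return f"{rounded:.2f}"


if __name__ == "__main__":
    # Cross-reference counts copied from the table above.
    for name, count in [("GO", 20986874), ("EMBL", 11826930),
                        ("Pfam", 9765341), ("BRENDA", 2925)]:
        print(f"{name:<8}{count:>10}  {per_entry_ratio(count)}")
```

Rounding before the "<0.01" check matters near the threshold: UCSC (53841 cross-references) comes out as 0.01, while VectorBase (47597) falls below it.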
                    
                    
                    5.  AMINO ACID COMPOSITION
                    
                    5.1  Composition in percent for the complete database
                    
                    Ala (A) 8.56   Gln (Q) 3.88   Leu (L) 9.81   Ser (S) 6.72
                    Arg (R) 5.46   Glu (E) 6.14   Lys (K) 5.31   Thr (T) 5.61
                    Asn (N) 4.18   Gly (G) 7.07   Met (M) 2.45   Trp (W) 1.31
                    Asp (D) 5.29   His (H) 2.20   Phe (F) 4.04   Tyr (Y) 3.06
                    Cys (C) 1.29   Ile (I) 6.02   Pro (P) 4.74   Val (V) 6.71
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    5.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Lys, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Trp, Cys
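
The ordering in 5.2 follows directly from the percentages in 5.1: sorting the twenty standard residues by decreasing frequency gives the same list. A short Python sketch of that step, using only the values printed in 5.1 (the variable names are illustrative):

```python
# Composition in percent for the complete database, copied from section 5.1.
composition = {
    "Ala": 8.56, "Gln": 3.88, "Leu": 9.81, "Ser": 6.72,
    "Arg": 5.46, "Glu": 6.14, "Lys": 5.31, "Thr": 5.61,
    "Asn": 4.18, "Gly": 7.07, "Met": 2.45, "Trp": 1.31,
    "Asp": 5.29, "His": 2.20, "Phe": 4.04, "Tyr": 3.06,
    "Cys": 1.29, "Ile": 6.02, "Pro": 4.74, "Val": 6.71,
}

# Sort residue names by decreasing percentage; no two values are tied.
ranked = sorted(composition, key=composition.get, reverse=True)

if __name__ == "__main__":
    print(", ".join(ranked))
```

The ambiguous codes Asx, Glx and Xaa are left out, since 5.2 lists only the twenty standard amino acids.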
                    
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of entries encoded on a Mitochondrion: 302454
                    Total number of entries encoded on a Plasmid: 157243
                    Total number of entries encoded on a Plastid: 10121
                    Total number of entries encoded on a Plastid; Apicoplast: 334
                    Total number of entries encoded on a Plastid; Chloroplast: 109062
                    Total number of entries encoded on a Plastid; Cyanelle: 7
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 440