         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2014_07 STATISTICS


1.  INTRODUCTION

Release 2014_07 of 09-Jul-2014 of UniProtKB/TrEMBL contains 79824243 sequence entries,
comprising 25191011511 amino acids.

10834485 sequences have been added since release 2014_06, the sequence data of
8249 existing entries has been updated, and the annotations of
15430886 entries have been revised. The number of entries has increased by 16%.
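
The 16% figure can be rechecked from the counts above. The following Python
sketch assumes that the increase is measured against the size of release
2014_06, and that this size equals the current total minus the added
sequences (deleted entries are not reported here, so they are ignored). The
function name is illustrative only.

```python
# Entry counts quoted in the introduction of this release note.
TOTAL_2014_07 = 79_824_243
ADDED_SINCE_2014_06 = 10_834_485


def growth_percent(total_now: int, added: int) -> float:
    """Growth relative to the previous release, assuming no deletions."""
    previous = total_now - added  # assumed size of release 2014_06
    return 100.0 * added / previous


growth = growth_percent(TOTAL_2014_07, ADDED_SINCE_2014_06)
print(f"{growth:.1f}%")  # rounds to the 16% quoted above
```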

Number of fragments: 5980309

Protein existence (PE):              entries      %
1: Evidence at protein level          465666     0.58%
2: Evidence at transcript level       939132     1.18%
3: Inferred from homology           17797990    22.30%
4: Predicted                        60621455    75.94%
5: Uncertain                               0     0.00%
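
The percentage column is each PE count divided by the sum of all PE counts.
A minimal Python sketch of that arithmetic follows; the dictionary keys and
the function name are chosen for illustration and are not part of any
UniProt tool.

```python
# Protein existence (PE) counts copied from the table above.
PE_COUNTS = {
    "1: Evidence at protein level": 465_666,
    "2: Evidence at transcript level": 939_132,
    "3: Inferred from homology": 17_797_990,
    "4: Predicted": 60_621_455,
    "5: Uncertain": 0,
}


def pe_percentages(counts: dict) -> dict:
    """Return each PE level's share of all entries, rounded to 2 decimals."""
    total = sum(counts.values())
    return {level: round(100.0 * n / total, 2) for level, n in counts.items()}


for level, pct in pe_percentages(PE_COUNTS).items():
    print(f"{level:<34}{pct:6.2f}%")
```

The five counts sum to 79824243, the number of entries given in the
introduction.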

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 498088

   The twenty most represented species account for 2439803 sequences, or 3.1%
   of the total number of entries.


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:200100
                            2x:80012
                            3x:43045
                            4x:30499
                            5x:18005
                            6x:13340
                            7x: 9588
                            8x: 7663
                            9x: 6001
                           10x:11077
                       11- 20x:39021
                       21- 50x:11770
                       51-100x: 4675
                         >100x:23292
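
A table of this shape can be derived from a mapping of species to entry
counts. The Python sketch below shows one way to do the binning. The input
mapping and the function names are hypothetical, and the bin labels are
written without the padding spaces used in the table.

```python
from collections import Counter

# Bin labels in the order used by table 2.1.
BIN_LABELS = [f"{n}x" for n in range(1, 11)] + ["11-20x", "21-50x", "51-100x", ">100x"]


def occurrence_bin(entry_count: int) -> str:
    """Map the number of entries of one species to its bin label."""
    if entry_count < 1:
        raise ValueError("a represented species has at least one entry")
    if entry_count <= 10:
        return f"{entry_count}x"
    if entry_count <= 20:
        return "11-20x"
    if entry_count <= 50:
        return "21-50x"
    if entry_count <= 100:
        return "51-100x"
    return ">100x"


def occurrence_table(entries_per_species: dict) -> dict:
    """Count how many species fall into each bin."""
    tally = Counter(occurrence_bin(n) for n in entries_per_species.values())
    return {label: tally.get(label, 0) for label in BIN_LABELS}


# Hypothetical toy input, not taken from the release.
toy = {"sp_a": 1, "sp_b": 1, "sp_c": 2, "sp_d": 15, "sp_e": 100, "sp_f": 595001}
print(occurrence_table(toy))
```

Because every species falls into exactly one bin, the bin counts add up to
the number of species represented.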


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     595001  Human immunodeficiency virus 1
       2     352020  marine sediment metagenome
       3     222011  uncultured bacterium
       4     120080  Homo sapiens (Human)
       5     106100  Triticum aestivum (Wheat)
       6      96724  Oryza sativa subsp. japonica (Rice)
       7      96066  Hepatitis C virus
       8      86850  Hepatitis B virus (HBV)
       9      73951  Glycine max (Soybean) (Glycine hispida)
      10      73055  mine drainage metagenome
      11      70544  Hordeum vulgare var. distichum (Two-rowed barley)
      12      69517  Macaca mulatta (Rhesus macaque)
      13      67671  Phytophthora parasitica (Potato buckeye rot agent)
      14      65421  Ancylostoma ceylanicum
      15      60710  human gut metagenome
      16      60416  Zea mays (Maize)
      17      57480  Mus musculus (Mouse)
      18      56235  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      19      55018  Callithrix jacchus (White-tufted-ear marmoset)
      20      54933  Solanum tuberosum (Potato)
      21      54159  Vitis vinifera (Grape)
      22      53334  Danio rerio (Zebrafish) (Brachydanio rerio)
      23      50661  Trichomonas vaginalis
      24      49270  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      25      48911  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      26      47056  Populus trichocarpa (Western balsam poplar) 
      27      44275  Eucalyptus grandis (Flooded gum)
      28      41207  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      29      40450  Escherichia coli
      30      40083  Arabidopsis thaliana (Mouse-ear cress)
      31      39923  Reticulomyxa filosa
      32      39886  Oryza sativa subsp. indica (Rice)
      33      39852  Paramecium tetraurelia
      34      39391  Setaria italica (Foxtail millet) (Panicum italicum)
      35      38796  Mustela putorius furo (European domestic ferret) (Mustela furo)
      36      38212  Simian immunodeficiency virus (SIV)
      37      37309  Acyrthosiphon pisum (Pea aphid)
      38      37233  Drosophila melanogaster (Fruit fly)
      39      36602  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      40      35952  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      41      35672  Ailuropoda melanoleuca (Giant panda)
      42      35599  Emiliania huxleyi CCMP1516
      43      35317  Physcomitrella patens subsp. patens (Moss)
      44      35137  Caenorhabditis japonica
      45      34570  Thalassiosira oceanica (Marine diatom)
      46      34555  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      47      33870  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      48      33686  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      49      33258  Selaginella moellendorffii (Spikemoss)
      50      32772  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      51      32448  Sus scrofa (Pig)
      52      32409  Phaseolus vulgaris (Kidney bean) (French bean)
      53      32342  Oryza brachyantha
      54      32142  Oryza glaberrima (African rice)
      55      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      56      32050  Capitella teleta (Polychaete worm)
      57      31988  Anas platyrhynchos (Domestic duck) (Anas boschas)
      58      31864  Pan troglodytes (Chimpanzee)
      59      31403  Ricinus communis (Castor bean)
      60      31290  Citrus clementina
      61      30976  Daphnia pulex (Water flea)
      62      30713  Caenorhabditis brenneri (Nematode worm)
      63      30181  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      64      29845  Rhizophagus irregularis (strain DAOM 181602 / DAOM 197198 / MUCL 43194)  
      65      29815  Amphimedon queenslandica (Sponge)
      66      29495  Strongylocentrotus purpuratus (Purple sea urchin)
      67      29321  Pristionchus pacificus (Parasitic nematode)
      68      29194  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      69      29083  Oikopleura dioica (Tunicate)
      70      28875  Mimulus guttatus (Spotted monkey flower) (Yellow monkey flower)
      71      28835  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      72      28825  Capsella rubella
      73      28669  Rhizophagus irregularis DAOM 197198w
      74      28637  Prunus persica (Peach) (Amygdalus persica)
      75      28382  Eutrema salsugineum (Saltwater cress) (Sisymbrium salsugineum)
      76      28104  Gasterosteus aculeatus (Three-spined stickleback)
      77      27849  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      78      27697  Canis familiaris (Dog) (Canis lupus familiaris)
      79      27542  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      80      27539  Equus caballus (Horse)
      81      27434  Amborella trichopoda
      82      27090  Gorilla gorilla gorilla (Lowland gorilla)
      83      26921  Tetrahymena thermophila (strain SB210)
      84      26856  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      85      26770  Morus notabilis
      86      26489  Phytophthora parasitica CJ01A1
      87      26477  Phytophthora parasitica P1569
      88      26452  Phytophthora parasitica P10297
      89      26438  Phytophthora parasitica (strain INRA-310)
      90      26371  Ovis aries (Sheep)
      91      25990  Oryzias latipes (Medaka fish) (Japanese ricefish)
      92      25831  Bos taurus (Bovine)
      93      25825  Loxodonta africana (African elephant)
      94      25764  Rattus norvegicus (Rat)
      95      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      96      25025  Aphanomyces astaci
      97      24917  Nematostella vectensis (Starlet sea anemone)
      98      24590  Guillardia theta CCMP2712
      99      24211  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
     100      23804  Astyanax mexicanus (Blind cave fish) (Astyanax fasciatus mexicanus)
     101      23742  Ornithorhynchus anatinus (Duckbill platypus)
     102      23687  Lottia gigantea (Giant owl limpet)
     103      23651  Dendroctonus ponderosae (Mountain pine beetle)
     104      23565  Oxytricha trifallax
     105      23496  Latimeria chalumnae (West Indian ocean coelacanth)
     106      23369  Helobdella robusta (Californian leech)
     107      23357  Caenorhabditis elegans
     108      23318  Fusarium oxysporum f. sp. melonis 26406
     109      23271  Fusarium oxysporum f. sp. conglutinans race 2 54008
     110      23263  Fusarium oxysporum f. sp. pisi HDV247
     111      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
     112      22780  Monodelphis domestica (Gray short-tailed opossum)
     113      22754  Fusarium oxysporum f. sp. raphani 54005
     114      22564  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
     115      22525  Lepisosteus oculatus (Spotted gar)
     116      22319  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
     117      22248  Fusarium oxysporum f. sp. vasinfectum 25433
     118      22174  gut metagenome
     119      21935  Oryctolagus cuniculus (Rabbit)
     120      21713  Haemonchus contortus (Barber pole worm)
     121      21689  Fusarium oxysporum f. sp. radicis-lycopersici 26381
     122      21661  Fusarium oxysporum Fo47
     123      21549  Fusarium oxysporum f. sp. lycopersici MN25
     124      21548  Heterocephalus glaber (Naked mole rat)
     125      21544  Gallus gallus (Chicken)
     126      21398  Caenorhabditis briggsae
     127      21339  Anopheles darlingi (Mosquito)
     128      21220  Echinococcus granulosus (Hydatid tapeworm)
     129      21171  Myotis lucifugus (Little brown bat)
     130      21136  Ixodes scapularis (Black-legged tick) (Deer tick)
     131      21031  Felis catus (Cat) (Felis silvestris catus)
     132      20865  Tupaia chinensis (Chinese tree shrew)
     133      20805  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     134      20767  Fusarium oxysporum FOSC 3-a
     135      20540  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     136      20168  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     137      20115  Ciona savignyi (Pacific transparent sea squirt)
     138      20097  Cavia porcellus (Guinea pig)
     139      20061  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     140      20028  Camelus ferus (Wild Bactrian camel)
     141      19989  Callorhynchus milii (Elephant fish) (Australian ghost shark)
     142      19826  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     143      19807  Fusarium oxysporum f. sp. cubense tropical race 4 54006
     144      19702  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     145      19609  Bactrocera dorsalis (Oriental fruit fly) (Dacus dorsalis)
     146      19602  Anolis carolinensis (Green anole) (American chameleon)
     147      19594  Aphanomyces invadans
     148      19573  Brugia malayi (Filarial nematode worm)
     149      19561  Pteropus alecto (Black flying fox)
     150      19522  Wuchereria bancrofti
     151      19300  Myotis brandtii (Brandt's bat)
     152      19200  Trypanosoma cruzi (strain CL Brener)
     153      19194  Necator americanus (Human hookworm)
     154      19062  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     155      18967  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     156      18865  Drosophila simulans (Fruit fly)
     157      18600  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     158      18559  Bos mutus
     159      18488  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     160      18417  Ophiophagus hannah (King cobra) (Naja hannah)
     161      18272  Tetranychus urticae (Two-spotted spider mite)
     162      18126  Atta cephalotes (Leafcutter ant)
     163      18049  Anopheles gambiae (African malaria mosquito)
     164      18047  Saprolegnia diclina VS20
     165      17976  Moniliophthora roreri (strain MCA 2997) (Cocoa frosty pod rot fungus) 
     166      17888  Plasmodium falciparum
     167      17872  Hepatitis C virus subtype 1b
     168      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     169      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     170      17749  Bombyx mori (Silk moth)
     171      17683  Genlisea aurea
     172      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     173      17590  Gibberella moniliformis (strain M3125 / FGSC 7600)  
     174      17474  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     175      17384  Ceratitis capitata (Mediterranean fruit fly) (Tephritis capitata)
     176      17289  Nasonia vitripennis (Parasitic wasp)
     177      17104  Drosophila yakuba (Fruit fly)
     178      17094  uncultured archaeon
     179      17078  Tribolium castaneum (Red flour beetle)
     180      16949  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     181      16919  Meleagris gallopavo (Common turkey)
     182      16796  Klebsiella pneumoniae
     183      16723  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     184      16715  Drosophila persimilis (Fruit fly)
     185      16639  Fusarium oxysporum f. sp. lycopersici  
     186      16619  Rhodnius prolixus (Triatomid bug)
     187      16528  Cerapachys biroi (Ant)
     188      16430  Ectocarpus siliculosus (Brown alga)
     189      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     190      16380  Listeria monocytogenes
     191      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     192      16331  Danaus plexippus (Monarch butterfly)
     193      16276  Trichinella spiralis (Trichina worm)
     194      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     195      16218  Neovison vison (American mink) (Mustela vison)
     196      16208  Ixodes ricinus (Common tick)
     197      16193  Drosophila sechellia (Fruit fly)
     198      16192  Schistosoma japonicum (Blood fluke)
     199      16149  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     200      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     201      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     202      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     203      15718  Naegleria gruberi (Amoeba)
     204      15663  Helicobacter pylori (Campylobacter pylori)
     205      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     206      15592  Phytophthora ramorum (Sudden oak death agent)
     207      15561  Rabies virus
     208      15467  Myotis davidii (David's myotis)
     209      15423  Drosophila willistoni (Fruit fly)
     210      15412  Pestalotiopsis fici W106-1
     211      15380  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     212      15355  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     213      15354  Loa loa (Eye worm) (Filaria loa)
     214      15155  Drosophila ananassae (Fruit fly)
     215      15153  Pythium ultimum DAOM BR144
     216      15057  Pararge aegeria (speckled wood butterfly)
     217      15042  Harpegnathos saltator (Jerdon's jumping ant)
     218      15033  Strigamia maritima (European centipede) (Geophilus maritimus)
     219      14944  Acanthamoeba castellanii str. Neff
     220      14928  Drosophila erecta (Fruit fly)
     221      14869  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     222      14801  Camponotus floridanus (Florida carpenter ant)
     223      14794  Drosophila mojavensis (Fruit fly)
     224      14790  Gibberella fujikuroi (strain CBS 195.34 / IMI 58289 / NRRL A-6831)  
     225      14713  Plasmodium chabaudi
     226      14708  Drosophila virilis (Fruit fly)
     227      14654  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     228      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     229      14597  Angomonas deanei
     230      14417  Volvox carteri (Green alga)
     231      14364  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     232      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     233      14235  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     234      14159  Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
     235      13971  Acromyrmex echinatior (Panamanian leafcutter ant) 
     236      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     237      13896  Porcine reproductive and respiratory syndrome virus (PRRSV)
     238      13879  Clonorchis sinensis (Chinese liver fluke)
     239      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     240      13806  Fomitopsis pinicola (strain FP-58527) (Brown rot fungus)
     241      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     242      13767  Gibberella zeae (Wheat head blight fungus) (Fusarium graminearum)
     243      13765  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     244      13759  Colletotrichum fioriniae PJ7
     245      13707  Trypanosoma cruzi
     246      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     247      13445  Giardia intestinalis (Giardia lamblia)
     248      13438  Hepatitis C virus subtype 1a
     249      13417  Cladophialophora psammophila CBS 110553
     250      13345  Aspergillus flavus 
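
The rows of table 2.2 follow a simple "rank, frequency, species" layout, so
they can be read back with a regular expression. The Python sketch below
parses such rows and computes the share of the database covered by the top N
species. The function names are illustrative, and the sample holds only the
first three rows of the table.

```python
import re

# rank, frequency, then the species name up to the end of the line
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S.*?)\s*$")


def parse_species_rows(lines):
    """Return (rank, frequency, species) tuples; non-data lines are skipped."""
    rows = []
    for line in lines:
        match = ROW.match(line)
        if match:
            rows.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return rows


def top_n_share(rows, n, total_entries):
    """Return (sequences covered, percent of database) for the n largest rows."""
    top = sorted(rows, key=lambda row: row[1], reverse=True)[:n]
    covered = sum(freq for _, freq, _ in top)
    return covered, 100.0 * covered / total_entries


SAMPLE = [
    "  ------  ---------  --------------------------------------------",
    "  Number  Frequency  Species",
    "       1     595001  Human immunodeficiency virus 1",
    "       2     352020  marine sediment metagenome",
    "       3     222011  uncultured bacterium",
]

rows = parse_species_rows(SAMPLE)
covered, percent = top_n_share(rows, 2, 79_824_243)
print(covered, f"{percent:.1f}%")
```

Applied to all 250 rows with n = 20, the same function should give the
2439803 sequences stated at the start of section 2.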


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          796882 (  1%)
    Bacteria       65532440 ( 82%)
    Eukaryota      10896348 ( 14%)
    Viruses         2058511 (  3%)
    Other            540061 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 120133 (  1%)           (  0%)
     Other Mammalia       1080262 ( 10%)           (  1%)
     Other Vertebrata     1031767 (  9%)           (  1%)
     Viridiplantae        2090243 ( 19%)           (  3%)
     Fungi                2896464 ( 27%)           (  4%)
     Insecta              1061294 ( 10%)           (  1%)
     Nematoda              391330 (  4%)           (  0%)
     Other                2224855 ( 20%)           (  3%)



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1837589             1001-1100   392218
                 51- 100 7299259             1101-1200   285948
                101- 150 8487314             1201-1300   204516
                151- 200 7980615             1301-1400   114875
                201- 250 8193604             1401-1500   101031
                251- 300 8010111             1501-1600    65553
                301- 350 7204134             1601-1700    51292
                351- 400 5341691             1701-1800    31494
                401- 450 4664109             1801-1900    27439
                451- 500 3765146             1901-2000    21233
                501- 550 2391025             2001-2100    21210
                551- 600 1822896             2101-2200    29303
                601- 650 1288975             2201-2300    16544
                651- 700 1035199             2301-2400    13886
                701- 750  796844             2401-2500    12121
                751- 800  680356             >2500        82800
                801- 850  528136
                851- 900  486119
                901- 950  330240
                951-1000  229109
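
The table uses bins of 50 residues up to 1000, bins of 100 residues from
1001 to 2500, and one open bin above that. A minimal Python sketch of this
binning rule is given below. The function name is hypothetical and the
labels omit the padding spaces of the table.

```python
def size_bin(length: int) -> str:
    """Return the size-table bin label for a sequence length in amino acids."""
    if length < 1:
        raise ValueError("sequence length must be at least 1")
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100  # bin width changes at 1000
    low = ((length - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


# The shortest and longest sequence lengths quoted in this section.
print(size_bin(2), size_bin(36805))
```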



   The average sequence length in UniProtKB/TrEMBL is   315 amino acids.

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
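
The average length can be rechecked from the totals in the introduction.
The Python sketch below assumes that the average is taken over all entries,
fragments included, since the amino acid total covers the whole release. It
also assumes that the reported value is truncated rather than rounded, which
is what the numbers suggest.

```python
# Totals quoted in the introduction of this release note.
TOTAL_AMINO_ACIDS = 25_191_011_511
TOTAL_ENTRIES = 79_824_243


def average_length(total_residues: int, entries: int):
    """Return the (truncated, exact) average sequence length."""
    return total_residues // entries, total_residues / entries


truncated, exact = average_length(TOTAL_AMINO_ACIDS, TOTAL_ENTRIES)
print(truncated, round(exact, 2))  # 315 315.58
```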



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    91120741                1.14                                                    
   Submitted to EMBL/GenBank/DDBJ  63509397  60623168      0.80                                                    
   Journal                         25450736  24064301      0.32                                                    
   Submitted to other databases     2132734   2125470      0.03                                                    
   Thesis                             18730     18671     <0.01                                                    
   Book citation                       9143      9080     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 513946


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                     125959850                1.58                                                    
   CATALYTIC ACTIVITY               8689006   7964653      0.11     4                                              
   CAUTION                         57137816  57089476      0.72     1                                              
   COFACTOR                         3856026   3532006      0.05     8                                              
   DOMAIN                            414293    396559      0.01     9                                              
   ENZYME REGULATION                 136332    136332     <0.01    11                                              
   FUNCTION                         9890805   9415182      0.12     3                                              
   INTERACTION                         1731      1731     <0.01    12                                              
   MISCELLANEOUS                     248696    248480     <0.01    10                                              
   PATHWAY                          4480117   4047635      0.06     7                                              
   SIMILARITY                      27359956  20946009      0.34     2                                              
   SUBCELLULAR LOCATION             8300407   7985731      0.10     5                                              
   SUBUNIT                          5444665   5393008      0.07     6                                              

Total number of comment topics: 12
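
The "Average per entry" and "Rank" columns can be recomputed from the totals.
The Python sketch below assumes that the average is the total number of
lines divided by all 79824243 entries, that values rounding to 0.00 are
printed as "<0.01", and that rank orders subtypes by total number. These
rules are inferred from the tables, not taken from UniProt documentation.

```python
TOTAL_ENTRIES = 79_824_243  # entries in release 2014_07


def format_average(total_lines: int, entries: int = TOTAL_ENTRIES) -> str:
    """Average lines per entry, formatted like the tables in this section."""
    rounded = round(total_lines / entries, 2)
    return "<0.01" if rounded < 0.01 else f"{rounded:.2f}"


def rank_by_total(totals: dict) -> dict:
    """Rank subtypes by total number, largest first (rank 1)."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


# A few comment topics copied from the table above.
COMMENT_TOTALS = {
    "CAUTION": 57_137_816,
    "SIMILARITY": 27_359_956,
    "FUNCTION": 9_890_805,
    "DOMAIN": 414_293,
    "MISCELLANEOUS": 248_696,
}

ranks = rank_by_total(COMMENT_TOTALS)
for name, total in COMMENT_TOTALS.items():
    print(f"{name:<16}{total:>10}  {format_average(total):>6}  {ranks[name]}")
```

The ranks printed here are relative to the five topics in the sample, so
they match the table only for the topics that are also its top three.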


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      45336284                0.57                                                    
   ACT_SITE                         3687863   2310901      0.05     5                                              
   BINDING                          8015859   2069575      0.10     2                                              
   CARBOHYD                             760       287     <0.01    27                                              
   CHAIN                             904614    720736      0.01    10                                              
   COILED                            142142     71738     <0.01    16                                              
   COMPBIAS                           22862     22710     <0.01    22                                              
   CROSSLNK                           20723     15012     <0.01    23                                              
   DISULFID                          172306    132219     <0.01    15                                              
   DNA_BIND                          131281    123272     <0.01    17                                              
   DOMAIN                           1619496   1279009      0.02     8                                              
   INIT_MET                           22943     22943     <0.01    21                                              
   INTRAMEM                             392        56     <0.01    28                                              
   LIPID                             126778     63389     <0.01    19                                              
   METAL                            7542616   1959844      0.09     3                                              
   MOD_RES                           587786    543283      0.01    12                                              
   MOTIF                             459248    295683      0.01    14                                              
   NON_STD                             1945      1820     <0.01    26                                              
   NON_TER                          8880326   5983702      0.11     1                                              
   NP_BIND                          3117224   1873481      0.04     6                                              
   PEPTIDE                              121       121     <0.01    29                                              
   PROPEP                              6934      6934     <0.01    24                                              
   REGION                           2559242   1393541      0.03     7                                              
   REPEAT                             99844     23165     <0.01    20                                              
   SIGNAL                            786973    783250      0.01    11                                              
   SITE                             1132818    563326      0.01     9                                              
   TOPO_DOM                          541397    113728      0.01    13                                              
   TRANSIT                             1971      1961     <0.01    25                                              
   TRANSMEM                         4621337    824441      0.06     4                                              
   ZN_FING                           128483    115186     <0.01    18                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             745234558                9.34                                                    
   Allergome                           3726      3087     <0.01    83   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   102   Organism-specific databases                
   ArrayExpress                       62343     62343     <0.01    55   Gene expression databases                  
   BRENDA                              2608      2580     <0.01    86   Enzyme and pathway databases               
   Bgee                               96722     96722     <0.01    49   Gene expression databases                  
   BindingDB                           5716      5716     <0.01    78   Chemistry                                  
   BioCyc                           5770091   5692556      0.07    21   Enzyme and pathway databases               
   CAZy                               73866     69412     <0.01    53   Protein family/group databases             
   CGD                                 6777      6777     <0.01    76   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   108   2D gel databases                           
   CTD                               430662    429321      0.01    37   Organism-specific databases                
   ChEMBL                               660       660     <0.01    94   Chemistry                                  
   ChiTaRS                            64523     64523     <0.01    54   Other                                      
   ConoServer                           159       159     <0.01   100   Organism-specific databases                
   DIP                                 3058      3053     <0.01    85   Protein-protein interaction databases      
   DNASU                              41956     41630     <0.01    62   Protocols and materials databases          
   EMBL                            84053528  78615603      1.05     3   Sequence databases                         
   Ensembl                          1110659   1095904      0.01    30   Genome annotation databases                
   EnsemblBacteria                 37513898  36914619      0.47     7   Genome annotation databases                
   EnsemblFungi                      409083    406595      0.01    38   Genome annotation databases                
   EnsemblMetazoa                    901656    885455      0.01    33   Genome annotation databases                
   EnsemblPlants                     777254    739527      0.01    34   Genome annotation databases                
   EnsemblProtists                   199527    196899     <0.01    42   Genome annotation databases                
   EuPathDB                          161171    161170     <0.01    48   Organism-specific databases                
   EvolutionaryTrace                   7920      7920     <0.01    75   Other                                      
   FlyBase                           198838    197368     <0.01    43   Organism-specific databases                
   GO                             108710658  37969099      1.36     2   Ontologies                                 
   Gene3D                          40601177  31750966      0.51     5   Family and domain databases                
   GeneID                          11551934  11279782      0.14    13   Genome annotation databases                
   GeneTree                         1024706   1024648      0.01    31   Phylogenomic databases                     
   Genevestigator                     82796     82792     <0.01    50   Gene expression databases                  
   GenoList                           14730     14457     <0.01    71   Organism-specific databases                
   GenomeRNAi                         24521     24521     <0.01    66   Other                                      
   Gramene                           197627    197627     <0.01    44   Organism-specific databases                
   GuidetoPHARMACOLOGY                   21        21     <0.01   106   Chemistry                                  
   H-InvDB                              601       454     <0.01    95   Organism-specific databases                
   HAMAP                            8780424   8658456      0.11    16   Family and domain databases                
   HGNC                               46956     46876     <0.01    60   Organism-specific databases                
   HOGENOM                          3645179   3645136      0.05    26   Phylogenomic databases                     
   HOVERGEN                          303409    303401     <0.01    40   Phylogenomic databases                     
   InParanoid                        180666    180666     <0.01    45   Phylogenomic databases                     
   IntAct                             13758     13758     <0.01    72   Protein-protein interaction databases      
   InterPro                       157097367  53375470      1.97     1   Family and domain databases                
   KEGG                            10274859  10032838      0.13    14   Genome annotation databases                
   KO                               4304486   4282026      0.05    24   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    80   Organism-specific databases                
   Leproma                             1272      1270     <0.01    89   Organism-specific databases                
   MEROPS                            175124    175124     <0.01    46   Protein family/group databases             
   MGI                                52371     51931     <0.01    57   Organism-specific databases                
   MIM                                    4         4     <0.01   109   Organism-specific databases                
   MINT                               10143     10142     <0.01    73   Protein-protein interaction databases      
   MaxQB                               1617      1617     <0.01    88   Proteomic databases                        
   NextBio                           204776    204775     <0.01    41   Other                                      
   OGP                                    3         3     <0.01   110   2D gel databases                           
   OMA                              7296110   7296104      0.09    19   Phylogenomic databases                     
   OrthoDB                          5180884   5180882      0.06    22   Phylogenomic databases                     
   PANTHER                          8293511   8079817      0.10    17   Family and domain databases                
   PATRIC                           8251535   8251341      0.10    18   Genome annotation databases                
   PDB                                23730     12685     <0.01    67   3D structure databases                     
   PDBsum                             23650     12637     <0.01    68   3D structure databases                     
   PIR                               171527    138687     <0.01    47   Sequence databases                         
   PIRSF                            6968743   6912778      0.09    20   Family and domain databases                
   PMAP-CutDB                           199       199     <0.01    99   Other                                      
   PRIDE                             921537    921537      0.01    32   Proteomic databases                        
   PRINTS                           9598797   8651782      0.12    15   Family and domain databases                
   PRO                                26970     26969     <0.01    64   Other                                      
   PROSITE                         33312754  22402991      0.42     8   Family and domain databases                
   PaxDb                              28464     28462     <0.01    63   Proteomic databases                        
   PeptideAtlas                         127       127     <0.01   101   Proteomic databases                        
   PeroxiBase                          2588      2580     <0.01    87   Protein family/group databases             
   Pfam                            68443830  49866760      0.86     4   Family and domain databases                
   PharmGKB                            3303      3303     <0.01    84   Organism-specific databases                
   PhosSite                             890       878     <0.01    92   PTM databases                              
   PhosphoSite                         1090      1090     <0.01    91   PTM databases                              
   PhylomeDB                         384332    384332     <0.01    39   Phylogenomic databases                     
   PomBase                                1         1     <0.01   111   Organism-specific databases                
   PptaseDB                              38        36     <0.01   104   Protein family/group databases             
   ProDom                           1333184   1295310      0.02    29   Family and domain databases                
   ProMEX                              5291      5291     <0.01    79   Proteomic databases                        
   ProteinModelPortal              16909066  16909066      0.21    10   3D structure databases                     
   PseudoCAP                           4506      4500     <0.01    81   Organism-specific databases                
   REBASE                             47892     47865     <0.01    58   Protein family/group databases             
   REPRODUCTION-2DPAGE                   65        64     <0.01   103   2D gel databases                           
   RGD                                21282     20266     <0.01    70   Organism-specific databases                
   Reactome                             244       202     <0.01    98   Enzyme and pathway databases               
   RefSeq                          11902011  11459551      0.15    12   Sequence databases                         
   SABIO-RK                             517       517     <0.01    96   Enzyme and pathway databases               
   SGD                                   17        17     <0.01   107   Organism-specific databases                
   SMART                           14413573  11000044      0.18    11   Family and domain databases                
   SMR                              4705333   4705333      0.06    23   3D structure databases                     
   STRING                           3131210   3131041      0.04    27   Protein-protein interaction databases      
   SUPFAM                          38387683  30893589      0.48     6   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   105   2D gel databases                           
   SignaLink                           4237      4234     <0.01    82   Enzyme and pathway databases               
   TAIR                               21725     21607     <0.01    69   Organism-specific databases                
   TCDB                                6053      6045     <0.01    77   Protein family/group databases             
   TIGRFAMs                        17947257  16362830      0.22     9   Family and domain databases                
   TreeFam                           587931    587929      0.01    35   Phylogenomic databases                     
   TubercuList                         1101      1100     <0.01    90   Organism-specific databases                
   UCSC                               57886     57689     <0.01    56   Genome annotation databases                
   UniGene                           554360    521351      0.01    36   Sequence databases                         
   UniPathway                       4021306   3732054      0.05    25   Enzyme and pathway databases               
   VectorBase                         78248     77731     <0.01    51   Genome annotation databases                
   World-2DPAGE                         671       666     <0.01    93   2D gel databases                           
   WormBase                           43220     43046     <0.01    61   Organism-specific databases                
   Xenbase                            25135     25075     <0.01    65   Organism-specific databases                
   ZFIN                               47696     47213     <0.01    59   Organism-specific databases                
   dictyBase                           7997      7775     <0.01    74   Organism-specific databases                
   eggNOG                           2754746   2754712      0.03    28   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    52   Organism-specific databases                
   mycoCLAP                             416       416     <0.01    97   Protein family/group databases             

Number of explicitly cross-referenced databases: 130
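
The rows of the table above can be read mechanically. The following is a minimal sketch, not part of the release itself. It assumes the columns are, in order: database name, number of cross-references, number of entries carrying at least one cross-reference, cross-references per entry, rank, and category. That reading is inferred from the values, not taken from the table header. The names `ROW`, `parse_row`, `format_ratio` and `SAMPLE` are hypothetical. The total of 79824243 entries is the figure given in the introduction.

```python
import re

# Total number of entries in this release, as stated in the introduction.
TOTAL_ENTRIES = 79824243

# Assumed column layout: name, cross-references, entries, ratio, rank, category.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<xrefs>\d+)\s+(?P<entries>\d+)\s+"
    r"(?P<ratio><?\d+\.\d+)\s+(?P<rank>\d+)\s+(?P<category>.*\S)\s*$"
)


def parse_row(line):
    """Split one table row into typed fields; raise if the layout differs."""
    m = ROW.match(line)
    if m is None:
        raise ValueError("unrecognised row: %r" % line)
    return {
        "name": m.group("name"),
        "xrefs": int(m.group("xrefs")),
        "entries": int(m.group("entries")),
        "ratio": m.group("ratio"),  # kept as text because of "<0.01"
        "rank": int(m.group("rank")),
        "category": m.group("category"),
    }


def format_ratio(xrefs, total=TOTAL_ENTRIES):
    """Cross-references per entry, printed the way the table prints it."""
    text = "%.2f" % (xrefs / total)
    return "<0.01" if text == "0.00" else text


# A few rows copied verbatim from the table.
SAMPLE = """
   GO                             108710658  37969099      1.36     2   Ontologies
   InterPro                       157097367  53375470      1.97     1   Family and domain databases
   Pfam                            68443830  49866760      0.86     4   Family and domain databases
   TreeFam                           587931    587929      0.01    35   Phylogenomic databases
   MIM                                    4         4     <0.01   109   Organism-specific databases
"""

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
for row in rows:
    # Compare the recomputed ratio with the one printed in the table.
    print(row["name"], row["ratio"], format_ratio(row["xrefs"]))
```

For the sampled rows the recomputed ratio matches the printed one, which supports reading the fourth column as cross-references divided by the total number of entries.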


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.89   Gln (Q) 3.98   Leu (L) 9.91   Ser (S) 6.35
   Arg (R) 5.33   Glu (E) 6.09   Lys (K) 5.25   Thr (T) 5.57
   Asn (N) 4.15   Gly (G) 7.21   Met (M) 2.48   Trp (W) 1.25
   Asp (D) 5.44   His (H) 2.20   Phe (F) 3.99   Tyr (Y) 3.07
   Cys (C) 1.09   Ile (I) 6.23   Pro (P) 4.49   Val (V) 6.93

   Asx (B) 0      Glx (Z) 0      Xaa (X) 0.01


   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Ile, Glu, Thr, Asp, Arg, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
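
The ordering in 5.2 follows from the percentages in 5.1. The short sketch below is an illustration only and not how the release statistics are produced. It sorts the twenty standard amino acids by decreasing frequency and compares the result with the list above. The names `COMPOSITION`, `LISTED_ORDER` and `ranked` are hypothetical.

```python
# Percentages copied from table 5.1 (the twenty standard amino acids only).
COMPOSITION = {
    "Ala": 8.89, "Gln": 3.98, "Leu": 9.91, "Ser": 6.35,
    "Arg": 5.33, "Glu": 6.09, "Lys": 5.25, "Thr": 5.57,
    "Asn": 4.15, "Gly": 7.21, "Met": 2.48, "Trp": 1.25,
    "Asp": 5.44, "His": 2.20, "Phe": 3.99, "Tyr": 3.07,
    "Cys": 1.09, "Ile": 6.23, "Pro": 4.49, "Val": 6.93,
}

# Order as printed in section 5.2.
LISTED_ORDER = [
    "Leu", "Ala", "Gly", "Val", "Ser", "Ile", "Glu", "Thr", "Asp", "Arg",
    "Lys", "Pro", "Asn", "Phe", "Gln", "Tyr", "Met", "His", "Trp", "Cys",
]

# Sort residue names by decreasing percentage.
ranked = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)

# Sum of the twenty printed percentages.
total = sum(COMPOSITION.values())

print(", ".join(ranked))
print("matches section 5.2:", ranked == LISTED_ORDER)
print("sum of percentages: %.2f" % total)
```

The twenty values sum to 99.90 and not 100, which is consistent with each figure being rounded to two decimals and with the 0.01 listed for Xaa (X).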



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 748773
Total number of entries encoded on a Plasmid: 425082
Total number of entries encoded on a Plastid: 33831
Total number of entries encoded on a Plastid; Apicoplast: 929
Total number of entries encoded on a Plastid; Chloroplast: 277575
Total number of entries encoded on a Plastid; Cyanelle: 49
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1797