         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2014_09 STATISTICS


1.  INTRODUCTION

Release 2014_09 of 01-Oct-2014 of UniProtKB/TrEMBL contains 83955074 sequence entries,
comprising 26515172718 amino acids.

1923825 sequences have been added since release 2014_08, the sequence data of
7997 existing entries has been updated and the annotations of
82029730 entries have been revised. The added sequences represent an increase
of 2% in the number of entries.
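
The 2% increase can be checked from the counts quoted above. The sketch below assumes that the increase is measured against the size of the previous release and that no entries were deleted between releases; neither assumption is stated in this document, and the variable names are illustrative.

```python
# Counts quoted in the introduction of this release note.
total_entries = 83_955_074   # entries in release 2014_09
added_entries = 1_923_825    # sequences added since release 2014_08

# Assumption: nothing was deleted, so the previous release size
# is simply the current size minus the additions.
previous_entries = total_entries - added_entries

# Growth relative to the (assumed) previous release size.
growth_percent = 100 * added_entries / previous_entries
print(f"previous release (assumed): {previous_entries} entries")
print(f"growth: {growth_percent:.2f}%")
```

Under these assumptions the result rounds to the 2% quoted in the text.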

Number of fragments: 1668

Protein existence (PE):              entries      %
1: Evidence at protein level           44837     0.05%
2: Evidence at transcript level       996424     1.19%
3: Inferred from homology           19885577    23.69%
4: Predicted                        63028236    75.07%
5: Uncertain                               0     0.00%
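
The percentage column of the protein existence table can be recomputed from the entry counts. The following is a minimal sketch, assuming each percentage is the entry count divided by the total number of entries in the release (83955074, from the introduction); the dictionary and variable names are illustrative.

```python
# Total entries in release 2014_09, from the introduction.
total_entries = 83_955_074

# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 44_837,
    "2: Evidence at transcript level": 996_424,
    "3: Inferred from homology": 19_885_577,
    "4: Predicted": 63_028_236,
    "5: Uncertain": 0,
}

# Share of the whole database held by each PE level, in percent.
pe_percent = {
    level: 100 * count / total_entries for level, count in pe_counts.items()
}

for level, count in pe_counts.items():
    print(f"{level:<34}{count:>10}{pe_percent[level]:>9.2f}%")
```

The five counts add up to the total number of entries, so every entry carries exactly one PE level in this release.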

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 507847

   The first twenty species represent 2481089 sequences: 3% of the
   total number of entries.
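
The figure for the first twenty species can be reproduced by summing the first twenty frequencies listed in table 2.2. The sketch below is a simple check, assuming the percentage is taken against the total number of entries in the release; the variable names are illustrative.

```python
# Frequencies of the twenty most represented species, copied from table 2.2.
top20_frequencies = [
    600007, 352020, 234606, 118322, 106153,
    96739, 96670, 88547, 73977, 73055,
    70544, 69595, 69530, 67671, 65421,
    64105, 61006, 60710, 57380, 55031,
]

# Total entries in release 2014_09, from the introduction.
total_entries = 83_955_074

top20_total = sum(top20_frequencies)
top20_share = 100 * top20_total / total_entries

print(f"top 20 species: {top20_total} sequences")
print(f"share of database: {top20_share:.2f}%")
```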


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:20370
                            2x:81141
                            3x:43838
                            4x:31383
                            5x:18495
                            6x:13522
                            7x: 9751
                            8x: 7743
                            9x: 6170
                           10x:11229
                       11- 20x:39714
                       21- 50x:12145
                       51-100x: 4824
                         >100x:24186


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     600007  Human immunodeficiency virus 1
       2     352020  marine sediment metagenome
       3     234606  uncultured bacterium
       4     118322  Homo sapiens (Human)
       5     106153  Triticum aestivum (Wheat)
       6      96739  Hepatitis C virus
       7      96670  Oryza sativa subsp. japonica (Rice)
       8      88547  Hepatitis B virus (HBV)
       9      73977  Glycine max (Soybean) (Glycine hispida)
      10      73055  mine drainage metagenome
      11      70544  Hordeum vulgare var. distichum (Two-rowed barley)
      12      69595  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      13      69530  Macaca mulatta (Rhesus macaque)
      14      67671  Phytophthora parasitica (Potato buckeye rot agent)
      15      65421  Ancylostoma ceylanicum
      16      64105  Escherichia coli
      17      61006  Zea mays (Maize)
      18      60710  human gut metagenome
      19      57380  Mus musculus (Mouse)
      20      55031  Callithrix jacchus (White-tufted-ear marmoset)
      21      54927  Solanum tuberosum (Potato)
      22      54173  Vitis vinifera (Grape)
      23      53348  Danio rerio (Zebrafish) (Brachydanio rerio)
      24      50661  Trichomonas vaginalis
      25      49716  Oncorhynchus mykiss (Rainbow trout) (Salmo gairdneri)
      26      49274  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      27      48911  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      28      47052  Populus trichocarpa (Western balsam poplar) 
      29      44328  Citrus sinensis (Sweet orange) (Citrus aurantium var. sinensis)
      30      44275  Eucalyptus grandis (Flooded gum)
      31      41209  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      32      40872  Theobroma cacao (Cacao) (Cocoa)
      33      39923  Reticulomyxa filosa
      34      39896  Oryza sativa subsp. indica (Rice)
      35      39848  Paramecium tetraurelia
      36      39787  Arabidopsis thaliana (Mouse-ear cress)
      37      39391  Setaria italica (Foxtail millet) (Panicum italicum)
      38      38796  Mustela putorius furo (European domestic ferret) (Mustela furo)
      39      38357  Simian immunodeficiency virus (SIV)
      40      37309  Acyrthosiphon pisum (Pea aphid)
      41      37219  Drosophila melanogaster (Fruit fly)
      42      36609  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      43      35979  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      44      35672  Ailuropoda melanoleuca (Giant panda)
      45      35599  Emiliania huxleyi CCMP1516
      46      35323  Physcomitrella patens subsp. patens (Moss)
      47      35137  Caenorhabditis japonica
      48      34630  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      49      34570  Thalassiosira oceanica (Marine diatom)
      50      34556  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      51      33882  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      52      33687  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      53      33258  Selaginella moellendorffii (Spikemoss)
      54      32772  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      55      32485  Sus scrofa (Pig)
      56      32410  Phaseolus vulgaris (Kidney bean) (French bean)
      57      32342  Oryza brachyantha
      58      32174  Oryza glaberrima (African rice)
      59      32123  Caenorhabditis remanei (Caenorhabditis vulgaris)
      60      32050  Capitella teleta (Polychaete worm)
      61      31995  Anas platyrhynchos (Domestic duck) (Anas boschas)
      62      31865  Pan troglodytes (Chimpanzee)
      63      31403  Ricinus communis (Castor bean)
      64      31290  Citrus clementina
      65      30981  Daphnia pulex (Water flea)
      66      30713  Caenorhabditis brenneri (Nematode worm)
      67      30181  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      68      29845  Rhizophagus irregularis (strain DAOM 181602 / DAOM 197198 / MUCL 43194)  
      69      29815  Amphimedon queenslandica (Sponge)
      70      29564  Vibrio parahaemolyticus
      71      29494  Strongylocentrotus purpuratus (Purple sea urchin)
      72      29333  Pristionchus pacificus (Parasitic nematode)
      73      29205  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      74      29083  Oikopleura dioica (Tunicate)
      75      28885  Erythranthe guttata (Yellow monkey flower) (Mimulus guttatus)
      76      28838  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      77      28825  Capsella rubella
      78      28669  Rhizophagus irregularis DAOM 197198w
      79      28642  Prunus persica (Peach) (Amygdalus persica)
      80      28382  Eutrema salsugineum (Saltwater cress) (Sisymbrium salsugineum)
      81      28105  Gasterosteus aculeatus (Three-spined stickleback)
      82      27923  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      83      27762  Canis familiaris (Dog) (Canis lupus familiaris)
      84      27554  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      85      27543  Equus caballus (Horse)
      86      27519  Jatropha curcas (Barbados nut)
      87      27434  Amborella trichopoda
      88      27090  Gorilla gorilla gorilla (Lowland gorilla)
      89      26921  Tetrahymena thermophila (strain SB210)
      90      26858  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      91      26770  Morus notabilis
      92      26489  Phytophthora parasitica CJ01A1
      93      26477  Phytophthora parasitica P1569
      94      26452  Phytophthora parasitica P10297
      95      26438  Phytophthora parasitica (strain INRA-310)
      96      26391  Ovis aries (Sheep)
      97      25995  Oryzias latipes (Medaka fish) (Japanese ricefish)
      98      25825  Loxodonta africana (African elephant)
      99      25821  Bos taurus (Bovine)
     100      25772  Rattus norvegicus (Rat)
     101      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
     102      25594  Coffea canephora (Robusta coffee)
     103      25025  Aphanomyces astaci
     104      24917  Nematostella vectensis (Starlet sea anemone)
     105      24590  Guillardia theta CCMP2712
     106      24375  Oxytricha trifallax
     107      24301  Tetraselmis sp. GSL018
     108      23808  Astyanax mexicanus (Blind cave fish) (Astyanax fasciatus mexicanus)
     109      23742  Ornithorhynchus anatinus (Duckbill platypus)
     110      23687  Lottia gigantea (Giant owl limpet)
     111      23651  Dendroctonus ponderosae (Mountain pine beetle)
     112      23503  Caenorhabditis elegans
     113      23496  Latimeria chalumnae (West Indian ocean coelacanth)
     114      23373  Helobdella robusta (Californian leech)
     115      23318  Fusarium oxysporum f. sp. melonis 26406
     116      23271  Fusarium oxysporum f. sp. conglutinans race 2 54008
     117      23263  Fusarium oxysporum f. sp. pisi HDV247
     118      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
     119      22780  Monodelphis domestica (Gray short-tailed opossum)
     120      22754  Fusarium oxysporum f. sp. raphani 54005
     121      22564  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
     122      22527  Lepisosteus oculatus (Spotted gar)
     123      22323  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
     124      22248  Fusarium oxysporum f. sp. vasinfectum 25433
     125      22174  gut metagenome
     126      21922  Oryctolagus cuniculus (Rabbit)
     127      21754  Haemonchus contortus (Barber pole worm)
     128      21689  Fusarium oxysporum f. sp. radicis-lycopersici 26381
     129      21661  Fusarium oxysporum Fo47
     130      21549  Fusarium oxysporum f. sp. lycopersici MN25
     131      21548  Heterocephalus glaber (Naked mole rat)
     132      21538  Gallus gallus (Chicken)
     133      21398  Caenorhabditis briggsae
     134      21357  Galerina marginata CBS 339.88
     135      21339  Anopheles darlingi (Mosquito)
     136      21235  Echinococcus granulosus (Hydatid tapeworm)
     137      21171  Myotis lucifugus (Little brown bat)
     138      21137  Ixodes scapularis (Black-legged tick) (Deer tick)
     139      21036  Felis catus (Cat) (Felis silvestris catus)
     140      20867  Tupaia chinensis (Chinese tree shrew)
     141      20805  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     142      20767  Fusarium oxysporum FOSC 3-a
     143      20540  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     144      20168  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     145      20115  Ciona savignyi (Pacific transparent sea squirt)
     146      20098  Cavia porcellus (Guinea pig)
     147      20062  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     148      20052  Saprolegnia parasitica (strain CBS 223.65)
     149      20028  Camelus ferus (Wild Bactrian camel)
     150      19996  Callorhynchus milii (Elephant fish) (Australian ghost shark)
     151      19830  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     152      19807  Fusarium oxysporum f. sp. cubense tropical race 4 54006
     153      19702  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     154      19640  Klebsiella pneumoniae
     155      19621  Bactrocera dorsalis (Oriental fruit fly) (Dacus dorsalis)
     156      19619  Brugia malayi (Filarial nematode worm)
     157      19602  Anolis carolinensis (Green anole) (American chameleon)
     158      19594  Aphanomyces invadans
     159      19561  Pteropus alecto (Black flying fox)
     160      19522  Wuchereria bancrofti
     161      19300  Myotis brandtii (Brandt's bat)
     162      19200  Trypanosoma cruzi (strain CL Brener)
     163      19196  Necator americanus (Human hookworm)
     164      19062  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     165      19017  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     166      18867  Drosophila simulans (Fruit fly)
     167      18600  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     168      18561  Bos mutus
     169      18488  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     170      18417  Ophiophagus hannah (King cobra) (Naja hannah)
     171      18272  Tetranychus urticae (Two-spotted spider mite)
     172      18213  uncultured archaeon
     173      18141  Plasmodium falciparum
     174      18126  Atta cephalotes (Leafcutter ant)
     175      18048  Anopheles gambiae (African malaria mosquito)
     176      18047  Saprolegnia diclina VS20
     177      17979  Hepatitis C virus subtype 1b
     178      17976  Moniliophthora roreri (strain MCA 2997) (Cocoa frosty pod rot fungus) 
     179      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     180      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     181      17758  Bombyx mori (Silk moth)
     182      17683  Genlisea aurea
     183      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     184      17590  Gibberella moniliformis (strain M3125 / FGSC 7600)  
     185      17487  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     186      17457  Listeria monocytogenes
     187      17384  Ceratitis capitata (Mediterranean fruit fly) (Tephritis capitata)
     188      17289  Nasonia vitripennis (Parasitic wasp)
     189      17104  Drosophila yakuba (Fruit fly)
     190      17078  Tribolium castaneum (Red flour beetle)
     191      16949  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     192      16918  Meleagris gallopavo (Common turkey)
     193      16828  Pseudomonas aeruginosa
     194      16723  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     195      16715  Drosophila persimilis (Fruit fly)
     196      16638  Fusarium oxysporum f. sp. lycopersici  
     197      16618  Rhodnius prolixus (Triatomid bug)
     198      16534  Cerapachys biroi (Ant)
     199      16484  Botryobasidium botryosum FD-172 SS1
     200      16430  Ectocarpus siliculosus (Brown alga)
     201      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     202      16371  Opisthorchis viverrini
     203      16341  Jaapia argillacea MUCL 33604
     204      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     205      16332  Danaus plexippus (Monarch butterfly)
     206      16282  Trichinella spiralis (Trichina worm)
     207      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     208      16222  Schistosoma japonicum (Blood fluke)
     209      16219  Neovison vison (American mink) (Mustela vison)
     210      16208  Ixodes ricinus (Common tick)
     211      16193  Drosophila sechellia (Fruit fly)
     212      16149  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     213      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     214      16059  Helicobacter pylori (Campylobacter pylori)
     215      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     216      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     217      15741  Rabies virus
     218      15718  Naegleria gruberi (Amoeba)
     219      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     220      15593  Phytophthora ramorum (Sudden oak death agent)
     221      15467  Myotis davidii (David's myotis)
     222      15423  Drosophila willistoni (Fruit fly)
     223      15412  Pestalotiopsis fici W106-1
     224      15380  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     225      15355  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     226      15349  Loa loa (Eye worm) (Filaria loa)
     227      15155  Drosophila ananassae (Fruit fly)
     228      15153  Pythium ultimum DAOM BR144
     229      15064  Pararge aegeria (speckled wood butterfly)
     230      15042  Harpegnathos saltator (Jerdon's jumping ant)
     231      15033  Strigamia maritima (European centipede) (Geophilus maritimus)
     232      14944  Acanthamoeba castellanii str. Neff
     233      14928  Drosophila erecta (Fruit fly)
     234      14869  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     235      14801  Camponotus floridanus (Florida carpenter ant)
     236      14794  Drosophila mojavensis (Fruit fly)
     237      14790  Gibberella fujikuroi (strain CBS 195.34 / IMI 58289 / NRRL A-6831)  
     238      14713  Plasmodium chabaudi
     239      14708  Drosophila virilis (Fruit fly)
     240      14654  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     241      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     242      14597  Angomonas deanei
     243      14553  Zootermopsis nevadensis (Dampwood termite)
     244      14417  Volvox carteri (Green alga)
     245      14366  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     246      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     247      14235  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     248      14159  Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
     249      14061  Porcine reproductive and respiratory syndrome virus (PRRSV)
     250      13971  Acromyrmex echinatior (Panamanian leafcutter ant) 


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          813059 (  1%)
    Bacteria       69015058 ( 82%)
    Eukaryota      11494044 ( 14%)
    Viruses         2092298 (  2%)
    Other            540614 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 118377 (  1%)           (  0%)
     Other Mammalia       1092381 ( 10%)           (  1%)
     Other Vertebrata     1087485 (  9%)           (  1%)
     Viridiplantae        2277886 ( 20%)           (  3%)
     Fungi                3141101 ( 27%)           (  4%)
     Insecta              1089384 (  9%)           (  1%)
     Nematoda              391731 (  3%)           (  0%)
     Other                2295699 ( 20%)           (  3%)



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 2419450             1001-1100   443826
                 51- 100 8625949             1101-1200   318041
                101- 150 9860616             1201-1300   227634
                151- 200 9308649             1301-1400   130368
                201- 250 9494382             1401-1500   113453
                251- 300 8919501             1501-1600    74363
                301- 350 7985050             1601-1700    60863
                351- 400 5893117             1701-1800    37185
                401- 450 5134035             1801-1900    32248
                451- 500 4138564             1901-2000    24757
                501- 550 2611803             2001-2100    24420
                551- 600 1986578             2101-2200    32434
                601- 650 1405430             2201-2300    18691
                651- 700 1139988             2301-2400    15734
                701- 750  877462             2401-2500    13770
                751- 800  746429             >2500        96405
                801- 850  583852
                851- 900  532918
                901- 950  365572
                951-1000  259869



   The average sequence length in UniProtKB/TrEMBL is   315 amino acids.

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
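
The average length can be cross-checked against the totals given in the introduction. This sketch assumes the average is the total number of amino acids divided by the total number of entries, fragments included; the document does not say how the figure is rounded.

```python
# Totals quoted in the introduction of this release note.
total_amino_acids = 26_515_172_718
total_entries = 83_955_074

# Mean sequence length over all entries.
average_length = total_amino_acids / total_entries

print(f"average length: {average_length:.2f} amino acids")
print(f"integer part:   {int(average_length)}")
```

The quotient lies between 315 and 316, so the quoted value of 315 is consistent with the fractional part being dropped rather than rounded; that reading is an inference from the numbers.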



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    95478618                1.14                                                    
   Submitted to EMBL/GenBank/DDBJ  66461542  63390902      0.79                                                    
   Journal                         26860840  25352955      0.32                                                    
   Submitted to other databases     2128347   2121201      0.03                                                    
   Thesis                             18749     18690     <0.01                                                    
   Book citation                       9139      9076     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 521659
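
The "Average per entry" column in the reference table above appears to be the total number of lines divided by the total number of entries in the release (83955074, from the introduction), not by the number of entries that carry such a line. This is an inference from the figures rather than something the document states. A minimal check, with illustrative names:

```python
# Total entries in release 2014_09, from the introduction.
total_entries = 83_955_074

# "Total number" column of the reference (RL) table above.
reference_line_totals = {
    "References (RL)": 95_478_618,
    "Submitted to EMBL/GenBank/DDBJ": 66_461_542,
    "Journal": 26_860_840,
    "Submitted to other databases": 2_128_347,
}

# Assumed definition: lines of this type per entry, over the whole database.
average_per_entry = {
    name: total / total_entries for name, total in reference_line_totals.items()
}

for name, average in average_per_entry.items():
    print(f"{name:<32}{average:>6.2f}")
```

Dividing instead by the "Number of entries" column would give values of at least 1 for every subtype, which does not match the table.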


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                     140883416                1.68                                                    
   CATALYTIC ACTIVITY              10150255   9293686      0.12     4                                              
   CAUTION                         60313160  60256561      0.72     1                                              
   COFACTOR                         4715613   4269875      0.06     8                                              
   DOMAIN                            498828    477377      0.01     9                                              
   ENZYME REGULATION                 166894    166893     <0.01    11                                              
   FUNCTION                        11742710  11096880      0.14     3                                              
   INTERACTION                         1717      1717     <0.01    12                                              
   MISCELLANEOUS                     326256    318698     <0.01    10                                              
   PATHWAY                          5297460   4764928      0.06     7                                              
   SIMILARITY                      31656465  24488537      0.38     2                                              
   SUBCELLULAR LOCATION             9593132   9200486      0.11     5                                              
   SUBUNIT                          6420926   6328727      0.08     6                                              

Total number of comment topics: 12


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      52117684                0.62                                                    
   ACT_SITE                         4465964   2809774      0.05     5                                              
   BINDING                          9743165   2506325      0.12     1                                              
   CARBOHYD                             802       302     <0.01    27                                              
   CHAIN                             911210    721758      0.01    10                                              
   COILED                            168779     83933     <0.01    16                                              
   COMPBIAS                           28359     28203     <0.01    21                                              
   CROSSLNK                           27360     19612     <0.01    22                                              
   DISULFID                          200689    153519     <0.01    15                                              
   DNA_BIND                          149666    140814     <0.01    18                                              
   DOMAIN                           1789875   1434881      0.02     8                                              
   INIT_MET                           26937     26937     <0.01    23                                              
   INTRAMEM                             392        56     <0.01    28                                              
   LIPID                             143640     71820     <0.01    19                                              
   METAL                            8993557   2375005      0.11     3                                              
   MOD_RES                           694132    643066      0.01    12                                              
   MOTIF                             545644    350574      0.01    14                                              
   NON_STD                             2000      1858     <0.01    26                                              
   NON_TER                          9179815   6198110      0.11     2                                              
   NP_BIND                          3663132   2192144      0.04     6                                              
   PEPTIDE                              126       126     <0.01    29                                              
   PROPEP                              8825      8825     <0.01    24                                              
   REGION                           3044893   1671252      0.04     7                                              
   REPEAT                            119900     27941     <0.01    20                                              
   SIGNAL                            804466    800737      0.01    11                                              
   SITE                             1331046    667595      0.02     9                                              
   TOPO_DOM                          615578    129310      0.01    13                                              
   TRANSIT                             2059      2047     <0.01    25                                              
   TRANSMEM                         5301491    951823      0.06     4                                              
   ZN_FING                           154182    138044     <0.01    17                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             773288133                9.21                                                    
   Allergome                           3777      3128     <0.01    84   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   103   Organism-specific databases                
   ArrayExpress                       60152     60152     <0.01    56   Gene expression databases                  
   BRENDA                              2593      2566     <0.01    87   Enzyme and pathway databases               
   Bgee                               94886     94886     <0.01    50   Gene expression databases                  
   BindingDB                           5710      5710     <0.01    79   Chemistry                                  
   BioCyc                           5769661   5692136      0.07    22   Enzyme and pathway databases               
   CAZy                               73853     69401     <0.01    54   Protein family/group databases             
   CGD                                 6776      6776     <0.01    77   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   109   2D gel databases                           
   CTD                               458391    457111      0.01    37   Organism-specific databases                
   ChEMBL                               660       660     <0.01    95   Chemistry                                  
   ChiTaRS                            63063     63063     <0.01    55   Other                                      
   ConoServer                           159       159     <0.01   100   Organism-specific databases                
   DIP                                 3118      3113     <0.01    86   Protein-protein interaction databases      
   DNASU                              41874     41548     <0.01    63   Protocols and materials databases          
   DrugBank                             146        58     <0.01   101   Chemistry                                  
   EMBL                            88640733  82746895      1.06     3   Sequence databases                         
   Ensembl                          1105782   1091546      0.01    30   Genome annotation databases                
   EnsemblBacteria                 37472795  36873353      0.45     7   Genome annotation databases                
   EnsemblFungi                      409069    406581     <0.01    38   Genome annotation databases                
   EnsemblMetazoa                    903726    887438      0.01    33   Genome annotation databases                
   EnsemblPlants                     777027    739319      0.01    34   Genome annotation databases                
   EnsemblProtists                   199523    196898     <0.01    42   Genome annotation databases                
   EuPathDB                          161170    161169     <0.01    48   Organism-specific databases                
   EvolutionaryTrace                   7899      7899     <0.01    76   Other                                      
   FlyBase                           198822    197352     <0.01    43   Organism-specific databases                
   GO                             122989324  41858635      1.46     2   Ontologies                                 
   Gene3D                          40535391  31696764      0.48     5   Family and domain databases                
   GeneID                          11535404  11258513      0.14    13   Genome annotation databases                
   GeneTree                         1024274   1024216      0.01    31   Phylogenomic databases                     
   Genevestigator                     82503     82499     <0.01    51   Gene expression databases                  
   GenoList                           14727     14454     <0.01    72   Organism-specific databases                
   GenomeRNAi                         23493     23493     <0.01    69   Other                                      
   Gramene                           196950    196950     <0.01    44   Organism-specific databases                
   GuidetoPHARMACOLOGY                   20        20     <0.01   107   Chemistry                                  
   H-InvDB                              598       451     <0.01    96   Organism-specific databases                
   HAMAP                            8758842   8637000      0.10    16   Family and domain databases                
   HGNC                               43352     43295     <0.01    61   Organism-specific databases                
   HOGENOM                          3644089   3644047      0.04    26   Phylogenomic databases                     
   HOVERGEN                          302485    302479     <0.01    40   Phylogenomic databases                     
   InParanoid                        180459    180459     <0.01    45   Phylogenomic databases                     
   IntAct                             11883     11883     <0.01    73   Protein-protein interaction databases      
   InterPro                       156774727  53297385      1.87     1   Family and domain databases                
   KEGG                            10382654  10144687      0.12    14   Genome annotation databases                
   KO                               4453633   4430703      0.05    24   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    80   Organism-specific databases                
   Leproma                             1272      1270     <0.01    90   Organism-specific databases                
   MEROPS                            174949    174949     <0.01    46   Protein family/group databases             
   MGI                                52142     51795     <0.01    58   Organism-specific databases                
   MIM                                    4         4     <0.01   110   Organism-specific databases                
   MINT                               10119     10118     <0.01    74   Protein-protein interaction databases      
   MaxQB                               1412      1412     <0.01    89   Proteomic databases                        
   NextBio                           201483    201482     <0.01    41   Other                                      
   OGP                                    3         3     <0.01   111   2D gel databases                           
   OMA                              7293448   7293429      0.09    20   Phylogenomic databases                     
   OrthoDB                          5180204   5180202      0.06    23   Phylogenomic databases                     
   PANTHER                          8313411   8094765      0.10    18   Family and domain databases                
   PATRIC                           8248373   8248176      0.10    19   Genome annotation databases                
   PDB                                24657     13129     <0.01    67   3D structure databases                     
   PDBsum                             24279     12885     <0.01    68   3D structure databases                     
   PIR                               171292    138456     <0.01    47   Sequence databases                         
   PIRSF                            6961341   6905437      0.08    21   Family and domain databases                
   PMAP-CutDB                           199       199     <0.01    99   Other                                      
   PRIDE                             919481    919481      0.01    32   Proteomic databases                        
   PRINTS                           9585429   8639690      0.11    15   Family and domain databases                
   PRO                                26931     26930     <0.01    65   Other                                      
   PROSITE                         33117819  22324599      0.39     8   Family and domain databases                
   PaxDb                              28405     28403     <0.01    64   Proteomic databases                        
   PeptideAtlas                         127       127     <0.01   102   Proteomic databases                        
   PeroxiBase                          2588      2580     <0.01    88   Protein family/group databases             
   Pfam                            68278383  49751002      0.81     4   Family and domain databases                
   PharmGKB                            3214      3214     <0.01    85   Organism-specific databases                
   PhosSite                             889       877     <0.01    93   PTM databases                              
   PhosphoSite                         1083      1083     <0.01    92   PTM databases                              
   PhylomeDB                         381547    381547     <0.01    39   Phylogenomic databases                     
   PomBase                                2         2     <0.01   112   Organism-specific databases                
   PptaseDB                              38        36     <0.01   105   Protein family/group databases             
   ProDom                           1312568   1274695      0.02    29   Family and domain databases                
   ProMEX                              5014      5014     <0.01    81   Proteomic databases                        
   ProteinModelPortal              16983939  16983939      0.20    11   3D structure databases                     
   PseudoCAP                           4506      4500     <0.01    82   Organism-specific databases                
   REBASE                             48236     48223     <0.01    59   Protein family/group databases             
   REPRODUCTION-2DPAGE                   65        64     <0.01   104   2D gel databases                           
   RGD                                21572     20547     <0.01    70   Organism-specific databases                
   Reactome                           97096     43035     <0.01    49   Enzyme and pathway databases               
   RefSeq                          17723277  14288131      0.21    10   Sequence databases                         
   SABIO-RK                             481       481     <0.01    97   Enzyme and pathway databases               
   SGD                                    7         7     <0.01   108   Organism-specific databases                
   SMART                           14385227  10978287      0.17    12   Family and domain databases                
   SMR                              8584381   8584381      0.10    17   3D structure databases                     
   STRING                           3130920   3130748      0.04    27   Protein-protein interaction databases      
   SUPFAM                          38360873  30880718      0.46     6   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   106   2D gel databases                           
   SignaLink                           4125      4123     <0.01    83   Enzyme and pathway databases               
   TAIR                               21521     21403     <0.01    71   Organism-specific databases                
   TCDB                                6228      6219     <0.01    78   Protein family/group databases             
   TIGRFAMs                        17885211  16302413      0.21     9   Family and domain databases                
   TreeFam                           587637    587635      0.01    35   Phylogenomic databases                     
   TubercuList                         1100      1099     <0.01    91   Organism-specific databases                
   UCSC                               56694     56497     <0.01    57   Genome annotation databases                
   UniGene                           551221    518430      0.01    36   Sequence databases                         
   UniPathway                       4057686   3754162      0.05    25   Enzyme and pathway databases               
   VectorBase                         78242     77725     <0.01    52   Genome annotation databases                
   World-2DPAGE                         671       666     <0.01    94   2D gel databases                           
   WormBase                           43210     43088     <0.01    62   Organism-specific databases                
   Xenbase                            25042     24983     <0.01    66   Organism-specific databases                
   ZFIN                               47324     47267     <0.01    60   Organism-specific databases                
   dictyBase                           7997      7775     <0.01    75   Organism-specific databases                
   eggNOG                           2754448   2754414      0.03    28   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    53   Organism-specific databases                
   mycoCLAP                             414       414     <0.01    98   Protein family/group databases             

Number of explicitly cross-referenced databases: 132
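
The ratio column in the table above can be spot-checked against the release size. The sketch below assumes that the column is the first numeric column divided by the total number of entries in the release (83955074, from the introduction), printed to two decimals, with "<0.01" used when the value would round below 0.01. That reading is inferred from the figures and is not stated by the table itself; the helper name xrefs_per_entry is illustrative only.

```python
TOTAL_ENTRIES = 83955074  # entries in release 2014_09, from the introduction

# (first numeric column, printed ratio) copied from a few rows of the table
ROWS = {
    "GO": (122989324, "1.46"),
    "InterPro": (156774727, "1.87"),
    "Pfam": (68278383, "0.81"),
    "RefSeq": (17723277, "0.21"),
    "FlyBase": (198822, "<0.01"),
}


def xrefs_per_entry(count, total=TOTAL_ENTRIES):
    """Format count/total as the table appears to (assumed convention):
    two decimals, or '<0.01' when the ratio is below 0.005."""
    ratio = count / total
    return "<0.01" if ratio < 0.005 else f"{ratio:.2f}"


for name, (count, printed) in ROWS.items():
    computed = xrefs_per_entry(count)
    print(f"{name:10s} computed={computed:>6s} printed={printed:>6s}")
```

Under that assumption the computed and printed values agree for the rows sampled here.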


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.93   Gln (Q) 3.98   Leu (L) 9.93   Ser (S) 6.35
   Arg (R) 5.35   Glu (E) 6.08   Lys (K) 5.21   Thr (T) 5.56
   Asn (N) 4.13   Gly (G) 7.22   Met (M) 2.48   Trp (W) 1.25
   Asp (D) 5.43   His (H) 2.21   Phe (F) 3.98   Tyr (Y) 3.06
   Cys (C) 1.09   Ile (I) 6.20   Pro (P) 4.50   Val (V) 6.94

   Asx (B) 0      Glx (Z) 0      Xaa (X) 0.01




   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Ile, Glu, Thr, Asp, Arg, Lys, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Trp, Cys
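
The ordering in 5.2 follows from sorting the percentages in 5.1 in descending order. A minimal sketch, using the values from 5.1; the alphabetical tie-break for Gln and Phe (both 3.98) is an assumption that happens to match the listed order, and the function name rank_by_frequency is illustrative only.

```python
# Percent composition of the 20 standard amino acids, copied from 5.1
COMPOSITION = {
    "Ala": 8.93, "Gln": 3.98, "Leu": 9.93, "Ser": 6.35,
    "Arg": 5.35, "Glu": 6.08, "Lys": 5.21, "Thr": 5.56,
    "Asn": 4.13, "Gly": 7.22, "Met": 2.48, "Trp": 1.25,
    "Asp": 5.43, "His": 2.21, "Phe": 3.98, "Tyr": 3.06,
    "Cys": 1.09, "Ile": 6.20, "Pro": 4.50, "Val": 6.94,
}


def rank_by_frequency(composition):
    """Return residue names sorted by descending percentage.
    Ties are broken alphabetically (an assumed convention)."""
    ordered = sorted(composition.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ordered]


ranking = rank_by_frequency(COMPOSITION)
print(", ".join(ranking))
```

The printed list starts with Leu and ends with Cys, matching the classification above.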



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 768049
Total number of entries encoded on a Plasmid: 449555
Total number of entries encoded on a Plastid: 36405
Total number of entries encoded on a Plastid; Apicoplast: 
Total number of entries encoded on a Plastid; Chloroplast: 63
Total number of entries encoded on a Plastid; Cyanelle: 
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: