         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2014_08 STATISTICS


1.  INTRODUCTION

Release 2014_08 of 03-Sep-2014 of UniProtKB/TrEMBL contains 82126897 sequence entries,
comprising 25939595569 amino acids.

Since release 2014_07, 2344312 sequences have been added, the sequence data of
2884 existing entries has been updated and the annotations of
21087020 entries have been revised. The added sequences represent an increase
of 3% in the number of entries.

Number of fragments: 6116721

Protein existence (PE):              entries      %
1: Evidence at protein level          539486     0.66%
2: Evidence at transcript level       948101     1.15%
3: Inferred from homology           20613304    25.10%
4: Predicted                        60026006    73.09%
5: Uncertain                               0     0.00%
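
The percentages in the PE table can be recomputed from the entry counts. The
short Python sketch below does that; the variable and function names are
illustrative only and are not part of any UniProt tool. It assumes that each
percentage is the level's share of all entries, rounded to two decimals.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 539486,
    "2: Evidence at transcript level": 948101,
    "3: Inferred from homology": 20613304,
    "4: Predicted": 60026006,
    "5: Uncertain": 0,
}

# The five levels should add up to the number of entries in the release.
total_entries = sum(pe_counts.values())


def pe_percentages(counts):
    """Return each level's share of all entries, rounded to two decimals."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 2) for level, n in counts.items()}


for level, pct in pe_percentages(pe_counts).items():
    print(f"{level:<34}{pe_counts[level]:>10}{pct:>9.2f}%")
```

The sum of the five counts equals the 82126897 entries stated in the
introduction, so every entry carries exactly one PE level.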

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 503562

   The first twenty species represent 2459919 sequences: 3% of the
   total number of entries.


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:202310
                            2x:80604
                            3x:43502
                            4x:30849
                            5x:18265
                            6x:13433
                            7x: 9652
                            8x: 7696
                            9x: 6111
                           10x:11148
                       11- 20x:39433
                       21- 50x:11992
                       51-100x: 4791
                         >100x:23776


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     597659  Human immunodeficiency virus 1
       2     352020  marine sediment metagenome
       3     232523  uncultured bacterium
       4     120798  Homo sapiens (Human)
       5     106132  Triticum aestivum (Wheat)
       6      96674  Oryza sativa subsp. japonica (Rice)
       7      96400  Hepatitis C virus
       8      87612  Hepatitis B virus (HBV)
       9      73944  Glycine max (Soybean) (Glycine hispida)
      10      73055  mine drainage metagenome
      11      70544  Hordeum vulgare var. distichum (Two-rowed barley)
      12      69530  Macaca mulatta (Rhesus macaque)
      13      67671  Phytophthora parasitica (Potato buckeye rot agent)
      14      65421  Ancylostoma ceylanicum
      15      60996  Zea mays (Maize)
      16      60710  human gut metagenome
      17      59498  Escherichia coli
      18      57475  Mus musculus (Mouse)
      19      56235  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      20      55022  Callithrix jacchus (White-tufted-ear marmoset)
      21      54931  Solanum tuberosum (Potato)
      22      54170  Vitis vinifera (Grape)
      23      53333  Danio rerio (Zebrafish) (Brachydanio rerio)
      24      50661  Trichomonas vaginalis
      25      49705  Oncorhynchus mykiss (Rainbow trout) (Salmo gairdneri)
      26      49274  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      27      48911  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      28      47052  Populus trichocarpa (Western balsam poplar) 
      29      44328  Citrus sinensis (Sweet orange) (Citrus aurantium var. sinensis)
      30      44275  Eucalyptus grandis (Flooded gum)
      31      41207  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      32      40872  Theobroma cacao (Cacao) (Cocoa)
      33      39959  Arabidopsis thaliana (Mouse-ear cress)
      34      39923  Reticulomyxa filosa
      35      39894  Oryza sativa subsp. indica (Rice)
      36      39852  Paramecium tetraurelia
      37      39391  Setaria italica (Foxtail millet) (Panicum italicum)
      38      38796  Mustela putorius furo (European domestic ferret) (Mustela furo)
      39      38357  Simian immunodeficiency virus (SIV)
      40      37309  Acyrthosiphon pisum (Pea aphid)
      41      37230  Drosophila melanogaster (Fruit fly)
      42      36609  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      43      35971  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      44      35672  Ailuropoda melanoleuca (Giant panda)
      45      35599  Emiliania huxleyi CCMP1516
      46      35316  Physcomitrella patens subsp. patens (Moss)
      47      35137  Caenorhabditis japonica
      48      34626  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      49      34570  Thalassiosira oceanica (Marine diatom)
      50      34556  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      51      33882  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      52      33687  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      53      33258  Selaginella moellendorffii (Spikemoss)
      54      32772  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      55      32466  Sus scrofa (Pig)
      56      32411  Phaseolus vulgaris (Kidney bean) (French bean)
      57      32342  Oryza brachyantha
      58      32174  Oryza glaberrima (African rice)
      59      32123  Caenorhabditis remanei (Caenorhabditis vulgaris)
      60      32050  Capitella teleta (Polychaete worm)
      61      31988  Anas platyrhynchos (Domestic duck) (Anas boschas)
      62      31864  Pan troglodytes (Chimpanzee)
      63      31403  Ricinus communis (Castor bean)
      64      31290  Citrus clementina
      65      30981  Daphnia pulex (Water flea)
      66      30713  Caenorhabditis brenneri (Nematode worm)
      67      30181  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      68      29845  Rhizophagus irregularis (strain DAOM 181602 / DAOM 197198 / MUCL 43194)  
      69      29815  Amphimedon queenslandica (Sponge)
      70      29495  Strongylocentrotus purpuratus (Purple sea urchin)
      71      29333  Pristionchus pacificus (Parasitic nematode)
      72      29194  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      73      29083  Oikopleura dioica (Tunicate)
      74      28875  Erythranthe guttata (Yellow monkey flower) (Mimulus guttatus)
      75      28837  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      76      28825  Capsella rubella
      77      28669  Rhizophagus irregularis DAOM 197198w
      78      28638  Prunus persica (Peach) (Amygdalus persica)
      79      28382  Eutrema salsugineum (Saltwater cress) (Sisymbrium salsugineum)
      80      28104  Gasterosteus aculeatus (Three-spined stickleback)
      81      27923  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      82      27754  Canis familiaris (Dog) (Canis lupus familiaris)
      83      27545  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      84      27543  Equus caballus (Horse)
      85      27517  Jatropha curcas (Barbados nut)
      86      27434  Amborella trichopoda
      87      27090  Gorilla gorilla gorilla (Lowland gorilla)
      88      26921  Tetrahymena thermophila (strain SB210)
      89      26857  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      90      26770  Morus notabilis
      91      26489  Phytophthora parasitica CJ01A1
      92      26477  Phytophthora parasitica P1569
      93      26452  Phytophthora parasitica P10297
      94      26438  Phytophthora parasitica (strain INRA-310)
      95      26387  Ovis aries (Sheep)
      96      25995  Oryzias latipes (Medaka fish) (Japanese ricefish)
      97      25826  Bos taurus (Bovine)
      98      25825  Loxodonta africana (African elephant)
      99      25762  Rattus norvegicus (Rat)
     100      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
     101      25025  Aphanomyces astaci
     102      24917  Nematostella vectensis (Starlet sea anemone)
     103      24590  Guillardia theta CCMP2712
     104      24301  Tetraselmis sp. GSL018
     105      23808  Astyanax mexicanus (Blind cave fish) (Astyanax fasciatus mexicanus)
     106      23742  Ornithorhynchus anatinus (Duckbill platypus)
     107      23687  Lottia gigantea (Giant owl limpet)
     108      23651  Dendroctonus ponderosae (Mountain pine beetle)
     109      23565  Oxytricha trifallax
     110      23511  Caenorhabditis elegans
     111      23496  Latimeria chalumnae (West Indian ocean coelacanth)
     112      23369  Helobdella robusta (Californian leech)
     113      23318  Fusarium oxysporum f. sp. melonis 26406
     114      23271  Fusarium oxysporum f. sp. conglutinans race 2 54008
     115      23263  Fusarium oxysporum f. sp. pisi HDV247
     116      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
     117      22780  Monodelphis domestica (Gray short-tailed opossum)
     118      22754  Fusarium oxysporum f. sp. raphani 54005
     119      22564  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
     120      22525  Lepisosteus oculatus (Spotted gar)
     121      22321  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
     122      22248  Fusarium oxysporum f. sp. vasinfectum 25433
     123      22174  gut metagenome
     124      21929  Oryctolagus cuniculus (Rabbit)
     125      21713  Haemonchus contortus (Barber pole worm)
     126      21689  Fusarium oxysporum f. sp. radicis-lycopersici 26381
     127      21661  Fusarium oxysporum Fo47
     128      21549  Fusarium oxysporum f. sp. lycopersici MN25
     129      21548  Heterocephalus glaber (Naked mole rat)
     130      21536  Gallus gallus (Chicken)
     131      21398  Caenorhabditis briggsae
     132      21357  Galerina marginata CBS 339.88
     133      21339  Anopheles darlingi (Mosquito)
     134      21234  Echinococcus granulosus (Hydatid tapeworm)
     135      21171  Myotis lucifugus (Little brown bat)
     136      21136  Ixodes scapularis (Black-legged tick) (Deer tick)
     137      21035  Felis catus (Cat) (Felis silvestris catus)
     138      20865  Tupaia chinensis (Chinese tree shrew)
     139      20805  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     140      20767  Fusarium oxysporum FOSC 3-a
     141      20540  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     142      20168  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     143      20115  Ciona savignyi (Pacific transparent sea squirt)
     144      20098  Cavia porcellus (Guinea pig)
     145      20062  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     146      20052  Saprolegnia parasitica (strain CBS 223.65)
     147      20028  Camelus ferus (Wild Bactrian camel)
     148      19996  Callorhynchus milii (Elephant fish) (Australian ghost shark)
     149      19830  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     150      19807  Fusarium oxysporum f. sp. cubense tropical race 4 54006
     151      19702  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     152      19619  Brugia malayi (Filarial nematode worm)
     153      19616  Bactrocera dorsalis (Oriental fruit fly) (Dacus dorsalis)
     154      19603  Klebsiella pneumoniae
     155      19602  Anolis carolinensis (Green anole) (American chameleon)
     156      19594  Aphanomyces invadans
     157      19561  Pteropus alecto (Black flying fox)
     158      19522  Wuchereria bancrofti
     159      19300  Myotis brandtii (Brandt's bat)
     160      19200  Trypanosoma cruzi (strain CL Brener)
     161      19196  Necator americanus (Human hookworm)
     162      19062  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     163      19016  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     164      18866  Drosophila simulans (Fruit fly)
     165      18600  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     166      18561  Bos mutus
     167      18488  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     168      18417  Ophiophagus hannah (King cobra) (Naja hannah)
     169      18272  Tetranychus urticae (Two-spotted spider mite)
     170      18134  Plasmodium falciparum
     171      18126  Atta cephalotes (Leafcutter ant)
     172      18049  Anopheles gambiae (African malaria mosquito)
     173      18047  Saprolegnia diclina VS20
     174      17977  Hepatitis C virus subtype 1b
     175      17976  Moniliophthora roreri (strain MCA 2997) (Cocoa frosty pod rot fungus) 
     176      17873  uncultured archaeon
     177      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     178      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     179      17754  Bombyx mori (Silk moth)
     180      17683  Genlisea aurea
     181      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     182      17590  Gibberella moniliformis (strain M3125 / FGSC 7600)  
     183      17486  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     184      17384  Ceratitis capitata (Mediterranean fruit fly) (Tephritis capitata)
     185      17289  Nasonia vitripennis (Parasitic wasp)
     186      17104  Drosophila yakuba (Fruit fly)
     187      17078  Tribolium castaneum (Red flour beetle)
     188      16949  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     189      16919  Meleagris gallopavo (Common turkey)
     190      16723  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     191      16715  Drosophila persimilis (Fruit fly)
     192      16638  Fusarium oxysporum f. sp. lycopersici  
     193      16618  Rhodnius prolixus (Triatomid bug)
     194      16534  Cerapachys biroi (Ant)
     195      16484  Botryobasidium botryosum FD-172 SS1
     196      16430  Ectocarpus siliculosus (Brown alga)
     197      16398  Listeria monocytogenes
     198      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     199      16341  Jaapia argillacea MUCL 33604
     200      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     201      16332  Danaus plexippus (Monarch butterfly)
     202      16276  Trichinella spiralis (Trichina worm)
     203      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     204      16218  Neovison vison (American mink) (Mustela vison)
     205      16208  Ixodes ricinus (Common tick)
     206      16193  Drosophila sechellia (Fruit fly)
     207      16193  Schistosoma japonicum (Blood fluke)
     208      16149  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     209      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     210      16055  Helicobacter pylori (Campylobacter pylori)
     211      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     212      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     213      15735  Rabies virus
     214      15718  Naegleria gruberi (Amoeba)
     215      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     216      15593  Phytophthora ramorum (Sudden oak death agent)
     217      15467  Myotis davidii (David's myotis)
     218      15423  Drosophila willistoni (Fruit fly)
     219      15412  Pestalotiopsis fici W106-1
     220      15380  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     221      15355  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     222      15354  Loa loa (Eye worm) (Filaria loa)
     223      15155  Drosophila ananassae (Fruit fly)
     224      15153  Pythium ultimum DAOM BR144
     225      15064  Pararge aegeria (Speckled wood butterfly)
     226      15042  Harpegnathos saltator (Jerdon's jumping ant)
     227      15033  Strigamia maritima (European centipede) (Geophilus maritimus)
     228      14944  Acanthamoeba castellanii str. Neff
     229      14928  Drosophila erecta (Fruit fly)
     230      14869  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     231      14801  Camponotus floridanus (Florida carpenter ant)
     232      14794  Drosophila mojavensis (Fruit fly)
     233      14790  Gibberella fujikuroi (strain CBS 195.34 / IMI 58289 / NRRL A-6831)  
     234      14713  Plasmodium chabaudi
     235      14708  Drosophila virilis (Fruit fly)
     236      14654  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     237      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     238      14597  Angomonas deanei
     239      14553  Zootermopsis nevadensis (Dampwood termite)
     240      14417  Volvox carteri (Green alga)
     241      14364  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     242      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     243      14235  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     244      14159  Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
     245      13982  Porcine reproductive and respiratory syndrome virus (PRRSV)
     246      13971  Acromyrmex echinatior (Panamanian leafcutter ant) 
     247      13948  Rhizoctonia solani AG-8 WAC10335
     248      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     249      13880  Clonorchis sinensis (Chinese liver fluke)
     250      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
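
The figure quoted at the start of section 2 for the first twenty species can
be checked against table 2.2. The Python sketch below sums the frequencies of
ranks 1 to 20 and expresses the result as a share of all entries; the names
used are illustrative only.

```python
# Frequencies of the twenty most represented species (ranks 1-20 in table 2.2).
top20 = [
    597659, 352020, 232523, 120798, 106132,
    96674, 96400, 87612, 73944, 73055,
    70544, 69530, 67671, 65421, 60996,
    60710, 59498, 57475, 56235, 55022,
]
TOTAL_ENTRIES = 82126897  # stated in the introduction

top20_sequences = sum(top20)
top20_share = 100 * top20_sequences / TOTAL_ENTRIES

print(top20_sequences, f"{top20_share:.1f}%")
```

The sum reproduces the 2459919 sequences reported above, and the share rounds
to the stated 3%.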


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          812290 (  1%)
    Bacteria       67377292 ( 82%)
    Eukaryota      11321436 ( 14%)
    Viruses         2075771 (  3%)
    Other            540107 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 120853 (  1%)           (  0%)
     Other Mammalia       1091632 ( 10%)           (  1%)
     Other Vertebrata     1083898 ( 10%)           (  1%)
     Viridiplantae        2234207 ( 20%)           (  3%)
     Fungi                3049659 ( 27%)           (  4%)
     Insecta              1081018 ( 10%)           (  1%)
     Nematoda              391623 (  3%)           (  0%)
     Other                2268546 ( 20%)           (  3%)
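
Both percentage columns of the Eukaryota breakdown follow from the sequence
counts. The Python sketch below recomputes them, assuming that the printed
values are rounded to whole percents; the names are illustrative only.

```python
TOTAL_ENTRIES = 82126897  # all UniProtKB/TrEMBL entries in this release

# Sequence counts per category within Eukaryota, copied from the table above.
eukaryota = {
    "Human": 120853,
    "Other Mammalia": 1091632,
    "Other Vertebrata": 1083898,
    "Viridiplantae": 2234207,
    "Fungi": 3049659,
    "Insecta": 1081018,
    "Nematoda": 391623,
    "Other": 2268546,
}

eukaryota_total = sum(eukaryota.values())


def shares(count):
    """Return (% of Eukaryota, % of the complete database) as whole percents."""
    return (
        round(100 * count / eukaryota_total),
        round(100 * count / TOTAL_ENTRIES),
    )


for category, count in eukaryota.items():
    of_euk, of_db = shares(count)
    print(f"{category:<18}{count:>9} ({of_euk:>3}%) ({of_db:>3}%)")
```

The category counts add up to the 11321436 Eukaryota sequences listed in the
kingdom table. Human and Nematoda show 0% of the complete database only
because their shares (about 0.15% and 0.48%) round down.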



3.  SEQUENCE SIZE

   Repartition of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1872588             1001-1100   405178
                 51- 100 7484538             1101-1200   294525
                101- 150 8722439             1201-1300   210912
                151- 200 8211751             1301-1400   118628
                201- 250 8432689             1401-1500   104072
                251- 300 8261548             1501-1600    67466
                301- 350 7437063             1601-1700    52780
                351- 400 5505220             1701-1800    32695
                401- 450 4806572             1801-1900    28240
                451- 500 3882134             1901-2000    21880
                501- 550 2463826             2001-2100    21817
                551- 600 1876075             2101-2200    29915
                601- 650 1327154             2201-2300    16981
                651- 700 1064954             2301-2400    14181
                701- 750  820845             2401-2500    12431
                751- 800  700882             >2500        84965
                801- 850  544894
                851- 900  500891
                901- 950  341340
                951-1000  236107



   The average sequence length in UniProtKB/TrEMBL is   315 amino acids.

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
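
The size table and the average length can be cross-checked against the totals
given in the introduction. The Python sketch below is illustrative only (the
names are not from any UniProt tool). It checks that the size classes cover
every non-fragment entry and divides the total number of amino acids by the
number of entries.

```python
TOTAL_ENTRIES = 82126897          # from the introduction
FRAGMENTS = 6116721               # from the introduction
TOTAL_AMINO_ACIDS = 25939595569   # from the introduction

# Counts per size class, in table order: 1-50, 51-100, ..., 2401-2500, >2500.
size_bins = [
    1872588, 7484538, 8722439, 8211751, 8432689, 8261548, 7437063, 5505220,
    4806572, 3882134, 2463826, 1876075, 1327154, 1064954, 820845, 700882,
    544894, 500891, 341340, 236107,
    405178, 294525, 210912, 118628, 104072, 67466, 52780, 32695, 28240,
    21880, 21817, 29915, 16981, 14181, 12431, 84965,
]

non_fragment_entries = sum(size_bins)

# Average over all entries; the quotient is a little under 316.
average_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES
truncated_average = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES

print(non_fragment_entries, f"{average_length:.2f}", truncated_average)
```

The size classes sum to the number of entries minus the number of fragments.
The reported average of 315 matches the quotient with its fractional part
dropped; that the report truncates rather than rounds is an inference from the
numbers, not a documented rule.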



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    93121353                1.13                                                    
   Submitted to EMBL/GenBank/DDBJ  64787191  61852899      0.79                                                    
   Journal                         26173752  24760146      0.32                                                    
   Submitted to other databases     2132521   2125198      0.03                                                    
   Thesis                             18750     18691     <0.01                                                    
   Book citation                       9138      9075     <0.01                                                    
   Patent                                 1         1     <0.01                                                    
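
The "Average per entry" column appears to be the total number of lines divided
by the total number of entries in the release (not by the number of entries
that contain such a line), with values that would print as 0.00 shown as
"<0.01". The Python sketch below reproduces the column under that assumption;
the function name is illustrative only.

```python
TOTAL_ENTRIES = 82126897  # all UniProtKB/TrEMBL entries in this release


def average_per_entry(total_lines, entries=TOTAL_ENTRIES):
    """Format lines-per-entry as in the table: two decimals, or '<0.01'."""
    formatted = f"{total_lines / entries:.2f}"
    # Assumption: anything that rounds to 0.00 is printed as "<0.01".
    return "<0.01" if formatted == "0.00" else formatted


# Total numbers copied from the References (RL) table above.
reference_totals = {
    "References (RL)": 93121353,
    "Submitted to EMBL/GenBank/DDBJ": 64787191,
    "Journal": 26173752,
    "Submitted to other databases": 2132521,
    "Thesis": 18750,
    "Book citation": 9138,
    "Patent": 1,
}

for subtype, total in reference_totals.items():
    print(f"{subtype:<32}{total:>10}  {average_per_entry(total):>6}")
```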

Total number of distinct authors cited in UniProtKB/TrEMBL: 518065


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                     138807601                1.69                                                    
   CATALYTIC ACTIVITY              10068012   9220402      0.12     4                                              
   CAUTION                         58833578  58779127      0.72     1                                              
   COFACTOR                         4545971   4169931      0.06     8                                              
   DOMAIN                            498234    477565      0.01     9                                              
   ENZYME REGULATION                 164543    164543     <0.01    11                                              
   FUNCTION                        11608859  11077531      0.14     3                                              
   INTERACTION                         1723      1723     <0.01    12                                              
   MISCELLANEOUS                     319049    318834     <0.01    10                                              
   PATHWAY                          5255083   4740538      0.06     7                                              
   SIMILARITY                      31700866  24248445      0.39     2                                              
   SUBCELLULAR LOCATION             9460060   9161027      0.12     5                                              
   SUBUNIT                          6351623   6307444      0.08     6                                              

Total number of comment topics: 12
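
The Rank column is consistent with ordering the subtypes by their total
number, largest first. The Python sketch below derives the ranks for the
comment topics under that assumption; the names are illustrative only.

```python
# Total number of lines per comment topic, copied from the table above.
comment_totals = {
    "CATALYTIC ACTIVITY": 10068012,
    "CAUTION": 58833578,
    "COFACTOR": 4545971,
    "DOMAIN": 498234,
    "ENZYME REGULATION": 164543,
    "FUNCTION": 11608859,
    "INTERACTION": 1723,
    "MISCELLANEOUS": 319049,
    "PATHWAY": 5255083,
    "SIMILARITY": 31700866,
    "SUBCELLULAR LOCATION": 9460060,
    "SUBUNIT": 6351623,
}

# Sort topics by total number, descending, and number them from 1.
ordered = sorted(comment_totals, key=comment_totals.get, reverse=True)
ranks = {topic: position for position, topic in enumerate(ordered, start=1)}

for topic in comment_totals:
    print(f"{topic:<22}{comment_totals[topic]:>10}{ranks[topic]:>6}")
```

The topic totals also add up to the 138807601 comment lines reported on the
"Comments (CC)" row.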


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      52023096                0.63                                                    
   ACT_SITE                         4439672   2802957      0.05     5                                              
   BINDING                          9746292   2506599      0.12     1                                              
   CARBOHYD                             784       295     <0.01    27                                              
   CHAIN                             906854    722316      0.01    10                                              
   COILED                            168809     83948     <0.01    16                                              
   COMPBIAS                           21809     21653     <0.01    23                                              
   CROSSLNK                           27366     19619     <0.01    21                                              
   DISULFID                          198777    153576     <0.01    15                                              
   DNA_BIND                          149675    140825     <0.01    18                                              
   DOMAIN                           1832267   1452945      0.02     8                                              
   INIT_MET                           26944     26944     <0.01    22                                              
   INTRAMEM                             392        56     <0.01    28                                              
   LIPID                             143656     71828     <0.01    19                                              
   METAL                            8996557   2375674      0.11     3                                              
   MOD_RES                           695162    644216      0.01    12                                              
   MOTIF                             545788    350667      0.01    14                                              
   NON_STD                             1963      1838     <0.01    26                                              
   NON_TER                          9072442   6121294      0.11     2                                              
   NP_BIND                          3661422   2192353      0.04     6                                              
   PEPTIDE                              126       126     <0.01    29                                              
   PROPEP                              8825      8825     <0.01    24                                              
   REGION                           3046274   1671969      0.04     7                                              
   REPEAT                            119966     27950     <0.01    20                                              
   SIGNAL                            805521    801787      0.01    11                                              
   SITE                             1331426    667809      0.02     9                                              
   TOPO_DOM                          615521    129314      0.01    13                                              
   TRANSIT                             2059      2047     <0.01    25                                              
   TRANSMEM                         5302537    952029      0.06     4                                              
   ZN_FING                           154210    138068     <0.01    17                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             765886470                9.33                                                    
   Allergome                           3721      3082     <0.01    84   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   102   Organism-specific databases                
   ArrayExpress                       61940     61940     <0.01    56   Gene expression databases                  
   BRENDA                              2597      2570     <0.01    87   Enzyme and pathway databases               
   Bgee                               96593     96593     <0.01    50   Gene expression databases                  
   BindingDB                           5712      5712     <0.01    79   Chemistry                                  
   BioCyc                           5769771   5692242      0.07    21   Enzyme and pathway databases               
   CAZy                               73862     69409     <0.01    54   Protein family/group databases             
   CGD                                 6776      6776     <0.01    77   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   108   2D gel databases                           
   CTD                               457009    455648      0.01    37   Organism-specific databases                
   ChEMBL                               660       660     <0.01    95   Chemistry                                  
   ChiTaRS                            64353     64353     <0.01    55   Other                                      
   ConoServer                           159       159     <0.01   100   Organism-specific databases                
   DIP                                 3053      3048     <0.01    86   Protein-protein interaction databases      
   DNASU                              41921     41595     <0.01    63   Protocols and materials databases          
   EMBL                            86544781  80918518      1.05     3   Sequence databases                         
   Ensembl                          1110300   1095580      0.01    30   Genome annotation databases                
   EnsemblBacteria                 37492027  36892478      0.46     7   Genome annotation databases                
   EnsemblFungi                      409069    406581     <0.01    38   Genome annotation databases                
   EnsemblMetazoa                    903821    887518      0.01    33   Genome annotation databases                
   EnsemblPlants                     777150    739430      0.01    34   Genome annotation databases                
   EnsemblProtists                   199527    196899     <0.01    42   Genome annotation databases                
   EuPathDB                          161170    161169     <0.01    48   Organism-specific databases                
   EvolutionaryTrace                   7910      7910     <0.01    76   Other                                      
   FlyBase                           198837    197367     <0.01    43   Organism-specific databases                
   GO                             121128061  41767941      1.47     2   Ontologies                                 
   Gene3D                          40574590  31728155      0.49     5   Family and domain databases                
   GeneID                          11549981  11272854      0.14    13   Genome annotation databases                
   GeneTree                         1024648   1024590      0.01    31   Phylogenomic databases                     
   Genevestigator                     82675     82671     <0.01    51   Gene expression databases                  
   GenoList                           14727     14454     <0.01    72   Organism-specific databases                
   GenomeRNAi                         24286     24286     <0.01    68   Other                                      
   Gramene                           197042    197042     <0.01    44   Organism-specific databases                
   GuidetoPHARMACOLOGY                   20        20     <0.01   106   Chemistry                                  
   H-InvDB                              600       453     <0.01    96   Organism-specific databases                
   HAMAP                            8763211   8641293      0.11    16   Family and domain databases                
   HGNC                               46641     46561     <0.01    61   Organism-specific databases                
   HOGENOM                          3644858   3644815      0.04    26   Phylogenomic databases                     
   HOVERGEN                          303126    303118     <0.01    40   Phylogenomic databases                     
   InParanoid                        180571    180571     <0.01    45   Phylogenomic databases                     
   IntAct                             11918     11918     <0.01    73   Protein-protein interaction databases      
   InterPro                       156927892  53350766      1.91     1   Family and domain databases                
   KEGG                            10390150  10151942      0.13    14   Genome annotation databases                
   KO                               4450748   4427816      0.05    24   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    80   Organism-specific databases                
   Leproma                             1272      1270     <0.01    90   Organism-specific databases                
   MEROPS                            175068    175068     <0.01    46   Protein family/group databases             
   MGI                                52365     51925     <0.01    58   Organism-specific databases                
   MIM                                    4         4     <0.01   109   Organism-specific databases                
   MINT                               10134     10133     <0.01    74   Protein-protein interaction databases      
   MaxQB                               1501      1501     <0.01    89   Proteomic databases                        
   NextBio                           204349    204348     <0.01    41   Other                                      
   OGP                                    3         3     <0.01   110   2D gel databases                           
   OMA                              7295955   7295949      0.09    19   Phylogenomic databases                     
   OrthoDB                          5180770   5180768      0.06    22   Phylogenomic databases                     
   PANTHER                          8324111   8105105      0.10    17   Family and domain databases                
   PATRIC                           8248432   8248235      0.10    18   Genome annotation databases                
   PDB                                24287     12901     <0.01    67   3D structure databases                     
   PDBsum                             23923     12730     <0.01    69   3D structure databases                     
   PIR                               171394    138557     <0.01    47   Sequence databases                         
   PIRSF                            6966067   6910127      0.08    20   Family and domain databases                
   PMAP-CutDB                           199       199     <0.01    99   Other                                      
   PRIDE                             920713    920713      0.01    32   Proteomic databases                        
   PRINTS                           9595111   8648399      0.12    15   Family and domain databases                
   PRO                                26953     26952     <0.01    65   Other                                      
   PROSITE                         33157563  22349280      0.40     8   Family and domain databases                
   PaxDb                              28443     28441     <0.01    64   Proteomic databases                        
   PeptideAtlas                         127       127     <0.01   101   Proteomic databases                        
   PeroxiBase                          2588      2580     <0.01    88   Protein family/group databases             
   Pfam                            68344738  49799436      0.83     4   Family and domain databases                
   PharmGKB                            3275      3275     <0.01    85   Organism-specific databases                
   PhosSite                             889       877     <0.01    93   PTM databases                              
   PhosphoSite                         1088      1088     <0.01    92   PTM databases                              
   PhylomeDB                         366842    366842     <0.01    39   Phylogenomic databases                     
   PomBase                                1         1     <0.01   111   Organism-specific databases                
   PptaseDB                              38        36     <0.01   104   Protein family/group databases             
   ProDom                           1313239   1275367      0.02    29   Family and domain databases                
   ProMEX                              5108      5108     <0.01    81   Proteomic databases                        
   ProteinModelPortal              17006780  17006780      0.21    11   3D structure databases                     
   PseudoCAP                           4506      4500     <0.01    82   Organism-specific databases                
   REBASE                             48270     48248     <0.01    59   Protein family/group databases             
   REPRODUCTION-2DPAGE                   65        64     <0.01   103   2D gel databases                           
   RGD                                21600     20571     <0.01    71   Organism-specific databases                
   Reactome                           97213     43088     <0.01    49   Enzyme and pathway databases               
   RefSeq                          17738113  14302499      0.22    10   Sequence databases                         
   SABIO-RK                             485       485     <0.01    97   Enzyme and pathway databases               
   SGD                                    7         7     <0.01   107   Organism-specific databases                
   SMART                           14406964  10994713      0.18    12   Family and domain databases                
   SMR                              4687925   4687925      0.06    23   3D structure databases                     
   STRING                           3131080   3130908      0.04    27   Protein-protein interaction databases      
   SUPFAM                          38398389  30911114      0.47     6   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   105   2D gel databases                           
   SignaLink                           4221      4219     <0.01    83   Enzyme and pathway databases               
   TAIR                               21651     21533     <0.01    70   Organism-specific databases                
   TCDB                                6171      6163     <0.01    78   Protein family/group databases             
   TIGRFAMs                        17896364  16312623      0.22     9   Family and domain databases                
   TreeFam                           587909    587907      0.01    35   Phylogenomic databases                     
   TubercuList                         1101      1100     <0.01    91   Organism-specific databases                
   UCSC                               57606     57409     <0.01    57   Genome annotation databases                
   UniGene                           553675    520737      0.01    36   Sequence databases                         
   UniPathway                       4023420   3734299      0.05    25   Enzyme and pathway databases               
   VectorBase                         78248     77731     <0.01    52   Genome annotation databases                
   World-2DPAGE                         671       666     <0.01    94   2D gel databases                           
   WormBase                           43111     42989     <0.01    62   Organism-specific databases                
   Xenbase                            25051     24992     <0.01    66   Organism-specific databases                
   ZFIN                               47331     47257     <0.01    60   Organism-specific databases                
   dictyBase                           7997      7775     <0.01    75   Organism-specific databases                
   eggNOG                           2754615   2754581      0.03    28   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    53   Organism-specific databases                
   mycoCLAP                             414       414     <0.01    98   Protein family/group databases             

Number of explicitly cross-referenced databases: 132
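
The ratio column in the table above appears to be the number of cross-references divided by the total number of entries in the release (82126897, from the introduction). The following is a minimal sketch for checking that reading on a few rows. The column meanings, the helper name parse_row, and the assumption that database names contain no spaces are illustrative and are not stated in this document.

```python
# Total entries in this release, as stated in the introduction.
TOTAL_ENTRIES = 82126897


def parse_row(line):
    """Split one table row into fields (column meanings are assumed)."""
    parts = line.split()
    name = parts[0]                 # database name, assumed to be one token
    xrefs = int(parts[1])           # assumed: number of cross-references
    entries = int(parts[2])         # assumed: entries carrying a cross-reference
    ratio_text = parts[3]           # kept as text because of values like "<0.01"
    rank = int(parts[4])
    category = " ".join(parts[5:])  # the category may contain spaces
    return name, xrefs, entries, ratio_text, rank, category


# Three rows copied from the table above.
rows = [
    "GO                             121128061  41767941      1.47     2   Ontologies",
    "InterPro                       156927892  53350766      1.91     1   Family and domain databases",
    "Pfam                            68344738  49799436      0.83     4   Family and domain databases",
]

for row in rows:
    name, xrefs, entries, ratio_text, rank, category = parse_row(row)
    per_entry = xrefs / TOTAL_ENTRIES
    print(f"{name}: computed {per_entry:.2f}, table {ratio_text}")
```

For these three rows the computed value agrees with the printed ratio to two decimals, which supports this reading of the column but does not confirm it for every row.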


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.91   Gln (Q) 3.98   Leu (L) 9.93   Ser (S) 6.35
   Arg (R) 5.34   Glu (E) 6.08   Lys (K) 5.23   Thr (T) 5.56
   Asn (N) 4.14   Gly (G) 7.21   Met (M) 2.48   Trp (W) 1.25
   Asp (D) 5.43   His (H) 2.21   Phe (F) 3.98   Tyr (Y) 3.07
   Cys (C) 1.09   Ile (I) 6.21   Pro (P) 4.50   Val (V) 6.93

   Asx (B) 0      Glx (Z) 0      Xaa (X) 0.01



   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Ile, Glu, Thr, Asp, Arg, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
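
The ordering in 5.2 can be reproduced from the percentages in 5.1 by sorting in descending order. This is a small sketch under one caveat: Phe and Gln are both printed as 3.98, so their relative order cannot be recovered from the rounded values and presumably depends on unrounded figures not shown here. The variable names are illustrative.

```python
# Percentages copied from table 5.1 (rounded to two decimals in the source).
composition = {
    "Ala": 8.91, "Gln": 3.98, "Leu": 9.93, "Ser": 6.35,
    "Arg": 5.34, "Glu": 6.08, "Lys": 5.23, "Thr": 5.56,
    "Asn": 4.14, "Gly": 7.21, "Met": 2.48, "Trp": 1.25,
    "Asp": 5.43, "His": 2.21, "Phe": 3.98, "Tyr": 3.07,
    "Cys": 1.09, "Ile": 6.21, "Pro": 4.50, "Val": 6.93,
}

# Sort residues from most to least frequent. sorted() is stable, so tied
# residues (Phe and Gln) keep the order in which they appear in the dict.
ranked = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranked))

# Sum over the twenty standard residues. The small remainder up to 100 is
# due to rounding and to the ambiguous codes (B, Z, X) listed separately.
total = sum(composition.values())
print(f"Sum over the twenty standard residues: {total:.2f}")
```

Apart from the Phe/Gln tie, the sorted list matches the classification given above.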



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 758682
Total number of entries encoded on a Plasmid: 436692
Total number of entries encoded on a Plastid: 36272
Total number of entries encoded on a Plastid; Apicoplast: 929
Total number of entries encoded on a Plastid; Chloroplast: 282003
Total number of entries encoded on a Plastid; Cyanelle: 49
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1612