Release 2013_07 of 26-Jun-2013 of UniProtKB/TrEMBL contains 39870577 sequence entries,
comprising 12710398609 amino acids.

4424183 sequences have been added since release 2013_06, the sequence data of
19309 existing entries have been updated and the annotations of
5720960 entries have been revised. This represents a 12% increase in the number of entries.
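As a rough check of the 12% figure, the sketch below divides the number of added sequences by the size of the previous release. It assumes, hypothetically, that no entries were deleted between releases, so the previous release held the current total minus the additions; the release notes do not state this.

```python
# Counts quoted above for release 2013_07.
TOTAL_ENTRIES = 39_870_577
ADDED_SINCE_2013_06 = 4_424_183


def growth_percent(total: int, added: int) -> float:
    """Percentage growth over the previous release.

    Hypothetical assumption: no entries were removed, so the previous
    release contained exactly total - added entries.
    """
    previous = total - added
    return 100.0 * added / previous


print(f"{growth_percent(TOTAL_ENTRIES, ADDED_SINCE_2013_06):.0f}%")  # 12%
```

Without rounding, the ratio is about 12.48%.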

Number of fragments: 4288475

Protein existence (PE):              entries      %
1: Evidence at protein level           20216     0.05%
2: Evidence at transcript level       812810     2.04%
3: Inferred from homology            8298200    20.81%
4: Predicted                        30739351    77.10%
5: Uncertain                               0     0.00%
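Each entry carries exactly one protein existence level, so the five counts should add up to the 39870577 entries of the release, and each percentage is a count divided by that total. A minimal sketch of the check:

```python
TOTAL_ENTRIES = 39_870_577

PE_COUNTS = {
    "1: Evidence at protein level": 20_216,
    "2: Evidence at transcript level": 812_810,
    "3: Inferred from homology": 8_298_200,
    "4: Predicted": 30_739_351,
    "5: Uncertain": 0,
}


def pe_percentages(counts: dict, total: int) -> dict:
    """Share of all entries for each PE level, rounded to two decimals."""
    return {level: round(100.0 * n / total, 2) for level, n in counts.items()}


# One PE level per entry, so the levels partition the release.
assert sum(PE_COUNTS.values()) == TOTAL_ENTRIES

for level, pct in pe_percentages(PE_COUNTS, TOTAL_ENTRIES).items():
    print(f"{level:<32} {pct:6.2f}%")
```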

The growth of the database is summarized below.

[Figure: growth of UniProtKB/TrEMBL]



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 420885

   The first twenty species in table 2.2 represent 1880024 sequences: 4.7% of the
   total number of entries.
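The 1880024 figure is the sum of the first twenty frequencies in table 2.2, and 4.7% is that sum relative to all entries in the release:

```python
TOTAL_ENTRIES = 39_870_577

# Frequencies of the first twenty species in table 2.2, in rank order.
TOP_20 = [
    540_812, 198_853, 113_505, 96_879, 86_162, 73_827, 70_410, 69_137,
    60_526, 59_545, 56_486, 56_143, 54_890, 54_112, 52_254, 50_601,
    49_237, 48_893, 44_560, 43_192,
]

covered = sum(TOP_20)
share = 100.0 * covered / TOTAL_ENTRIES
print(covered, f"{share:.1f}%")  # 1880024 4.7%
```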


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:174077
                            2x:69383
                            3x:37407
                            4x:26173
                            5x:15865
                            6x:11346
                            7x: 8582
                            8x: 6759
                            9x: 5370
                           10x:10453
                       11- 20x:29607
                       21- 50x: 9993
                       51-100x: 3845
                         >100x:12025
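Every species falls into exactly one of these bins, so the bins should add up to the 420885 species reported at the start of section 2. The sketch below uses that to derive the number of species with a single entry from the other bins.

```python
TOTAL_SPECIES = 420_885

# All bins of table 2.1 except "1x" (species with exactly one entry).
OTHER_BINS = {
    "2x": 69_383, "3x": 37_407, "4x": 26_173, "5x": 15_865,
    "6x": 11_346, "7x": 8_582, "8x": 6_759, "9x": 5_370,
    "10x": 10_453, "11-20x": 29_607, "21-50x": 9_993,
    "51-100x": 3_845, ">100x": 12_025,
}

# Species not accounted for by the other bins must have exactly one entry.
singletons = TOTAL_SPECIES - sum(OTHER_BINS.values())
print(singletons)  # 174077
```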


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     540812  Human immunodeficiency virus 1
       2     198853  uncultured bacterium
       3     113505  Homo sapiens (Human)
       4      96879  Oryza sativa subsp. japonica (Rice)
       5      86162  Hepatitis C virus
       6      73827  Glycine max (Soybean) (Glycine hispida)
       7      70410  Hordeum vulgare var. distichum (Two-rowed barley)
       8      69137  Macaca mulatta (Rhesus macaque)
       9      60526  Zea mays (Maize)
      10      59545  Hepatitis B virus (HBV)
      11      56486  Mus musculus (Mouse)
      12      56143  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      13      54890  Solanum tuberosum (Potato)
      14      54112  Vitis vinifera (Grape)
      15      52254  Danio rerio (Zebrafish) (Brachydanio rerio)
      16      50601  Trichomonas vaginalis
      17      49237  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      18      48893  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      19      44560  Populus trichocarpa (Western balsam poplar) 
      20      43192  Callithrix jacchus (White-tufted-ear marmoset)
      21      41473  Arabidopsis thaliana (Mouse-ear cress)
      22      41202  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      23      39850  Paramecium tetraurelia
      24      39832  Oryza sativa subsp. indica (Rice)
      25      39300  Setaria italica (Foxtail millet) (Panicum italicum)
      26      38791  Mustela putorius furo (European domestic ferret) (Mustela furo)
      27      38163  human gut metagenome
      28      36631  Drosophila melanogaster (Fruit fly)
      29      36522  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      30      35899  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      31      35621  Ailuropoda melanoleuca (Giant panda)
      32      35599  Emiliania huxleyi CCMP1516
      33      35195  Acyrthosiphon pisum (Pea aphid)
      34      35066  Caenorhabditis japonica
      35      34927  Simian immunodeficiency virus (SIV)
      36      34830  Physcomitrella patens subsp. patens (Moss)
      37      34569  Thalassiosira oceanica (Marine diatom)
      38      33821  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      39      33253  Selaginella moellendorffii (Spikemoss)
      40      32767  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      41      32342  Oryza brachyantha
      42      32177  Sus scrofa (Pig)
      43      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      44      32094  Oryza glaberrima (African rice)
      45      31848  Pan troglodytes (Chimpanzee)
      46      31384  Ricinus communis (Castor bean)
      47      30921  Daphnia pulex (Water flea)
      48      30300  Caenorhabditis brenneri (Nematode worm)
      49      30146  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      50      29815  Amphimedon queenslandica (Sponge)
      51      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      52      29317  Pristionchus pacificus (Parasitic nematode)
      53      29183  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      54      29054  Oikopleura dioica (Tunicate)
      55      28833  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      56      28825  Capsella rubella
      57      28753  Escherichia coli
      58      28610  Prunus persica (Peach) (Amygdalus persica)
      59      28489  Canis familiaris (Dog) (Canis lupus familiaris)
      60      28075  Gasterosteus aculeatus (Three-spined stickleback)
      61      27983  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      62      27738  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      63      27504  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      64      27453  Equus caballus (Horse)
      65      27089  Gorilla gorilla gorilla (Lowland gorilla)
      66      26824  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      67      25909  Oryzias latipes (Medaka fish) (Japanese ricefish)
      68      25796  Loxodonta africana (African elephant)
      69      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      70      25636  Bos taurus (Bovine)
      71      25622  Rattus norvegicus (Rat)
      72      25091  Oryctolagus cuniculus (Rabbit)
      73      24904  Nematostella vectensis (Starlet sea anemone)
      74      24643  Tetrahymena thermophila (strain SB210)
      75      24590  Guillardia theta CCMP2712
      76      24373  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      77      24208  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      78      23716  Ornithorhynchus anatinus (Duckbill platypus)
      79      23565  Oxytricha trifallax
      80      23502  Latimeria chalumnae (West Indian ocean coelacanth)
      81      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      82      22750  Monodelphis domestica (Gray short-tailed opossum)
      83      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      84      22553  Caenorhabditis elegans
      85      22313  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      86      22163  gut metagenome
      87      21548  Heterocephalus glaber (Naked mole rat)
      88      21338  Caenorhabditis briggsae
      89      21260  Gallus gallus (Chicken)
      90      21106  Ixodes scapularis (Black-legged tick) (Deer tick)
      91      20936  Felis catus (Cat) (Felis silvestris catus)
      92      20861  Myotis lucifugus (Little brown bat)
      93      20838  Tupaia chinensis (Chinese tree shrew)
      94      20758  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
      95      20512  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
      96      20133  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      97      20114  Ciona savignyi (Pacific transparent sea squirt)
      98      20072  Cavia porcellus (Guinea pig)
      99      19985  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     100      19816  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     101      19680  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     102      19551  Anolis carolinensis (Green anole) (American chameleon)
     103      19544  Pteropus alecto (Black flying fox)
     104      19438  Wuchereria bancrofti
     105      19331  Toxoplasma gondii
     106      19200  Trypanosoma cruzi (strain CL Brener)
     107      19057  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     108      18943  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     109      18856  Drosophila simulans (Fruit fly)
     110      18771  mine drainage metagenome
     111      18592  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     112      18555  Bos grunniens mutus
     113      18121  Atta cephalotes (Leafcutter ant)
     114      18023  Anopheles gambiae (African malaria mosquito)
     115      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     116      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
     117      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     118      17512  Bombyx mori (Silk moth)
     119      17412  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     120      17287  Anas platyrhynchos (Domestic duck) (Anas boschas)
     121      17282  Nasonia vitripennis (Parasitic wasp)
     122      17046  Tribolium castaneum (Red flour beetle)
     123      17040  Drosophila yakuba (Fruit fly)
     124      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     125      16899  Meleagris gallopavo (Common turkey)
     126      16714  Drosophila persimilis (Fruit fly)
     127      16698  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     128      16643  Fusarium oxysporum f. sp. lycopersici  
     129      16538  Plasmodium falciparum
     130      16426  Ectocarpus siliculosus (Brown alga)
     131      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     132      16319  Hepatitis C virus subtype 1b
     133      16315  Danaus plexippus (Monarch butterfly)
     134      16273  Trichinella spiralis (Trichina worm)
     135      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     136      16188  Drosophila sechellia (Fruit fly)
     137      16147  Schistosoma japonicum (Blood fluke)
     138      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     139      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     140      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     141      15716  Naegleria gruberi (Amoeba)
     142      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     143      15568  Phytophthora ramorum (Sudden oak death agent)
     144      15461  Myotis davidii (David's myotis)
     145      15421  Drosophila willistoni (Fruit fly)
     146      15371  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     147      15354  Loa loa (Eye worm) (Filaria loa)
     148      15345  Fusarium oxysporum f. sp. cubense race 1
     149      15225  Pythium ultimum
     150      15177  Hepatitis C virus subtype 1a
     151      15144  Drosophila ananassae (Fruit fly)
     152      15040  Harpegnathos saltator (Jerdon's jumping ant)
     153      14938  Acanthamoeba castellanii str. Neff
     154      14927  Drosophila erecta (Fruit fly)
     155      14858  Chlamydomonas reinhardtii (Chlamydomonas smithii)
      156      14853  Dendroctonus ponderosae (Mountain pine beetle)
     157      14801  Camponotus floridanus (Florida carpenter ant)
     158      14791  Drosophila mojavensis (Fruit fly)
     159      14713  Plasmodium chabaudi
     160      14704  Drosophila virilis (Fruit fly)
     161      14652  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     162      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     163      14417  Volvox carteri (Green alga)
     164      14341  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     165      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     166      14293  Ralstonia solanacearum (Pseudomonas solanacearum)
     167      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     168      14164  uncultured archaeon
     169      14147  Fusarium oxysporum f. sp. cubense race 4
     170      13970  Acromyrmex echinatior (Panamanian leafcutter ant) 
     171      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     172      13876  Clonorchis sinensis (Chinese liver fluke)
     173      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     174      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     175      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     176      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     177      13587  Trypanosoma cruzi
     178      13551  Rabies virus
     179      13345  Aspergillus flavus 
     180      13336  Colletotrichum orbiculare   
     181      13266  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     182      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     183      13062  Pseudocercospora fijiensis CIRAD86
     184      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     185      12983  Albugo laibachii Nc14
     186      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     187      12950  Stigmatella aurantiaca (strain DW4/3-1)
     188      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     189      12858  Magnaporthe oryzae Y34
     190      12857  Bipolaris maydis C5
     191      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     192      12711  Magnaporthe oryzae P131
     193      12705  Bipolaris maydis ATCC 48331
     194      12696  Trypanosoma congolense (strain IL3000)
     195      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     196      12681  Schistosoma mansoni (Blood fluke)
     197      12666  Porcine reproductive and respiratory syndrome virus (PRRSV)
     198      12623  Xenopus laevis (African clawed frog)
     199      12586  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     200      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     201      12440  Polysphondylium pallidum (Cellular slime mold)
     202      12414  Dothistroma septosporum NZE10
     203      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     204      12352  Dictyostelium purpureum (Slime mold)
     205      12263  Helicobacter pylori (Campylobacter pylori)
     206      12197  Rhizoctonia solani AG-1 IB
     207      12174  Bipolaris sorokiniana ND90Pr
     208      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     209      12078  Ceriporiopsis subvermispora B
     210      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     211      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     212      11941  Emericella nidulans  
     213      11931  Apis mellifera (Honeybee)
     214      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     215      11780  Piriformospora indica (strain DSM 11827)
     216      11752  Chondrocladia sp. SMF<DEU
     217      11751  Cladorhiza sp. SMF<DEU
     218      11750  Abyssocladia sp. SMF<DEU
     219      11726  Phelloderma sp. SMF<DEU
     220      11719  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     221      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     222      11687  Setosphaeria turcica Et28A
     223      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     224      11682  Eutypa lata UCREL1
     225      11678  Anopheles darlingi (Mosquito)
     226      11644  Plasmodium berghei (strain Anka)
     227      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     228      11567  Trichoplax adhaerens (Trichoplax reptans)
     229      11557  Trypanosoma vivax (strain Y486)
     230      11515  Puccinia triticina (isolate 1-1 / race 1 (BBBD)) (Brown leaf rust fungus)
     231      11514  Aureococcus anophagefferens (Harmful bloom alga)
     232      11499  Brugia malayi (Filarial nematode worm)
     233      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     234      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     235      11396  Aspergillus oryzae (strain 3.042) (Yellow koji mold)
     236      11303  Magnaporthe poae (strain ATCC 64411 / 73-15) (Kentucky bluegrass fungus)
     237      11278  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     238      11211  Ktedonobacter racemifer DSM 44963
     239      11211  Agaricus bisporus var. burnettii (strain JB137-S8 / ATCC MYA-4627 / FGSC 10392) 
     240      11205  Rhipicephalus pulchellus
     241      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     242      11018  Botryotinia fuckeliana BcDW1
     243      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     244      10964  Streptomyces clavuligerus 
     245      10949  Aspergillus niger 
     246      10839  Pediculus humanus subsp. corporis (Body louse)
     247      10822  Chaetomium globosum  
     248      10667  Klebsiella pneumoniae
     249      10570  Metarhizium anisopliae (strain ARSEF 23 / ATCC MYA-3075)
     250      10563  Amycolatopsis mediterranei S699


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          682461 (  2%)
    Bacteria       29347555 ( 74%)
    Eukaryota       8009624 ( 20%)
    Viruses         1727348 (  4%)
    Other            103588 ( <1%)
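The percentages are whole-number shares of the release total; judging from the "Other" row, shares that round to zero appear to be printed as "<1%" in this table (an inferred convention, not a documented one). A sketch reproducing the column:

```python
TOTAL_ENTRIES = 39_870_577

KINGDOMS = {
    "Archaea": 682_461,
    "Bacteria": 29_347_555,
    "Eukaryota": 8_009_624,
    "Viruses": 1_727_348,
    "Other": 103_588,
}


def format_share(count: int, total: int) -> str:
    """Whole-number percentage; inferred rule: a share rounding to 0 prints as '<1%'."""
    pct = round(100.0 * count / total)
    return "<1%" if pct == 0 else f"{pct}%"


for kingdom, count in KINGDOMS.items():
    print(f"{kingdom:<10} {count:>9} ({format_share(count, TOTAL_ENTRIES):>4})")
```

The five counts add up to 39870576, one fewer than the release total, so the sketch divides by the release total rather than by their sum.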



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 113541 (  1%)           (  0%)
     Other Mammalia        972442 ( 12%)           (  2%)
     Other Vertebrata      832674 ( 10%)           (  2%)
     Viridiplantae        1659747 ( 21%)           (  4%)
     Fungi                1895195 ( 24%)           (  5%)
     Insecta               837150 ( 10%)           (  2%)
     Nematoda              253317 (  3%)           (  1%)
     Other                1445558 ( 18%)           (  4%)
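Each category is reported against two denominators: the Eukaryota total from section 2.3 and the whole release. A sketch of both calculations, which also checks that the categories add up to the Eukaryota total:

```python
EUKARYOTA_TOTAL = 8_009_624  # Eukaryota row of the kingdom table
TOTAL_ENTRIES = 39_870_577

CATEGORIES = {
    "Human": 113_541,
    "Other Mammalia": 972_442,
    "Other Vertebrata": 832_674,
    "Viridiplantae": 1_659_747,
    "Fungi": 1_895_195,
    "Insecta": 837_150,
    "Nematoda": 253_317,
    "Other": 1_445_558,
}


def shares(count: int) -> tuple[int, int]:
    """Whole-number percentages of Eukaryota and of the complete database."""
    return (round(100.0 * count / EUKARYOTA_TOTAL),
            round(100.0 * count / TOTAL_ENTRIES))


# The categories partition Eukaryota, so they must sum to its total.
assert sum(CATEGORIES.values()) == EUKARYOTA_TOTAL

for name, count in CATEGORIES.items():
    of_euk, of_all = shares(count)
    print(f"{name:<17} {count:>8} ({of_euk:3d}%) ({of_all:3d}%)")
```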



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1081342             1001-1100   218488
                 51- 100 3539453             1101-1200   152016
                101- 150 3949730             1201-1300   110144
                151- 200 3820893             1301-1400    66193
                201- 250 3856786             1401-1500    54798
                251- 300 3733013             1501-1600    37519
                301- 350 3387587             1601-1700    27555
                351- 400 2533453             1701-1800    20825
                401- 450 2199647             1801-1900    16853
                451- 500 1803753             1901-2000    14188
                501- 550 1159728             2001-2100    11479
                551- 600  891511             2101-2200    11652
                601- 650  652231             2201-2300     8958
                651- 700  513845             2301-2400     7197
                701- 750  428711             2401-2500     6358
                751- 800  369740             >2500        49096
                801- 850  287879
                851- 900  257424
                901- 950  176983
                951-1000  125074
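Because fragments are excluded, the length bins should cover exactly the complete sequences, that is, all entries minus the 4288475 fragments. A minimal check:

```python
TOTAL_ENTRIES = 39_870_577
FRAGMENTS = 4_288_475

# Counts per length bin from the table above, in the order listed:
# 1-50 ... 951-1000, then 1001-1100 ... 2401-2500, then >2500.
BIN_COUNTS = [
    1_081_342, 3_539_453, 3_949_730, 3_820_893, 3_856_786,
    3_733_013, 3_387_587, 2_533_453, 2_199_647, 1_803_753,
    1_159_728, 891_511, 652_231, 513_845, 428_711,
    369_740, 287_879, 257_424, 176_983, 125_074,
    218_488, 152_016, 110_144, 66_193, 54_798,
    37_519, 27_555, 20_825, 16_853, 14_188,
    11_479, 11_652, 8_958, 7_197, 6_358,
    49_096,
]

complete = TOTAL_ENTRIES - FRAGMENTS
print(complete, sum(BIN_COUNTS))  # 35582102 35582102
```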



   The average sequence length in UniProtKB/TrEMBL is   318 amino acids.

   The shortest sequence is G0XMK1_9MYRT:     1 amino acid.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
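The average is consistent with dividing the total residue count from the release summary by the number of entries (fragments included) and truncating the result. This is an inference from the figures, since rounding would give 319:

```python
TOTAL_RESIDUES = 12_710_398_609
TOTAL_ENTRIES = 39_870_577

exact = TOTAL_RESIDUES / TOTAL_ENTRIES
print(f"{exact:.2f}")                   # 318.79
print(TOTAL_RESIDUES // TOTAL_ENTRIES)  # 318 (integer division truncates)
```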



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes, for some UniProtKB/TrEMBL line types, the total
number of lines, the number of entries with at least one such line, and the
average number of such lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    47453309                1.19                                                    
   Submitted to EMBL/GenBank/DDBJ  27498574  25804171      0.69                                                    
   Journal                         18194441  17193814      0.46                                                    
   Submitted to other databases     1743312   1733326      0.04                                                    
   Thesis                             10252     10194     <0.01                                                    
   Book citation                       6729      6679     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 467537
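In these tables the "Average per entry" column appears to be the total number of lines divided by all 39870577 entries of the release, not by the "Number of entries" column (for Journal, 18194441 / 39870577 ≈ 0.46, whereas 18194441 / 17193814 ≈ 1.06). A sketch of that reading, with values below 0.01 printed as "<0.01":

```python
TOTAL_ENTRIES = 39_870_577

# Total number of RL lines per reference subtype, from the table above.
REFERENCES = {
    "Submitted to EMBL/GenBank/DDBJ": 27_498_574,
    "Journal": 18_194_441,
    "Submitted to other databases": 1_743_312,
    "Thesis": 10_252,
    "Book citation": 6_729,
    "Patent": 1,
}


def per_entry(total_lines: int) -> str:
    """Lines per entry over ALL entries in the release (inferred reading)."""
    avg = total_lines / TOTAL_ENTRIES
    return "<0.01" if avg < 0.01 else f"{avg:.2f}"


all_refs = sum(REFERENCES.values())
print(all_refs, per_entry(all_refs))  # 47453309 1.19
for subtype, lines in REFERENCES.items():
    print(f"{subtype:<32} {per_entry(lines):>6}")
```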


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      50589992                1.27                                                    
   CATALYTIC ACTIVITY               3854286   3509040      0.10     4                                              
   CAUTION                         22002732  21986480      0.55     1                                              
   COFACTOR                         1537907   1425228      0.04     8                                              
   DOMAIN                            159062    152983     <0.01     9                                              
   FUNCTION                         4362880   4101790      0.11     3                                              
   INTERACTION                         1241      1241     <0.01    11                                              
   MISCELLANEOUS                     103932    103736     <0.01    10                                              
   PATHWAY                          1942891   1756672      0.05     7                                              
   SIMILARITY                      10963854   9534094      0.27     2                                              
   SUBCELLULAR LOCATION             3460064   3304373      0.09     5                                              
   SUBUNIT                          2201143   2177927      0.06     6                                              

Total number of comment topics: 11
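The Rank column orders comment topics by their total number of lines, largest first. A sketch; the alphabetical tie-break is a hypothetical choice, since no two topics share a total here:

```python
COMMENTS = {
    "CATALYTIC ACTIVITY": 3_854_286,
    "CAUTION": 22_002_732,
    "COFACTOR": 1_537_907,
    "DOMAIN": 159_062,
    "FUNCTION": 4_362_880,
    "INTERACTION": 1_241,
    "MISCELLANEOUS": 103_932,
    "PATHWAY": 1_942_891,
    "SIMILARITY": 10_963_854,
    "SUBCELLULAR LOCATION": 3_460_064,
    "SUBUNIT": 2_201_143,
}


def rank_by_total(counts: dict[str, int]) -> dict[str, int]:
    """Rank 1 = largest total; ties (hypothetically) broken alphabetically."""
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return {name: rank for rank, name in enumerate(ordered, start=1)}


ranks = rank_by_total(COMMENTS)
assert sum(COMMENTS.values()) == 50_589_992  # the Comments (CC) total
print(ranks["CAUTION"], ranks["SIMILARITY"], ranks["INTERACTION"])  # 1 2 11
```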


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       8247210                0.21                                                    
   CHAIN                             863589    707932      0.02     2                                              
   NON_TER                          6695204   4290216      0.17     1                                              
   SIGNAL                            687464    684052      0.02     3                                              
   TRANSIT                              953       953     <0.01     4                                              

Total number of feature keys: 4
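Dividing a total by the "Number of entries" column, rather than by all entries, gives the mean number of lines among entries that carry the feature at all, a figure the table does not list. A hypothetical helper for it:

```python
# (total lines, entries with at least one line) from the Features table.
FEATURES = {
    "CHAIN": (863_589, 707_932),
    "NON_TER": (6_695_204, 4_290_216),
    "SIGNAL": (687_464, 684_052),
    "TRANSIT": (953, 953),
}


def lines_per_annotated_entry(total: int, entries: int) -> float:
    """Mean lines per entry, counting only entries that carry the feature."""
    return total / entries


# The four keys account for every FT line counted in this table.
assert sum(total for total, _ in FEATURES.values()) == 8_247_210

for key, (total, entries) in FEATURES.items():
    print(f"{key:<8} {lines_per_annotated_entry(total, entries):.2f}")
```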


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             412627363               10.35                                                    
   Allergome                           3441      2811     <0.01    84   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   101   Organism-specific databases                
   ArrayExpress                      181976    181976     <0.01    45   Gene expression databases                  
   BRENDA                              2649      2620     <0.01    86   Enzyme and pathway databases               
   Bgee                               99985     99985     <0.01    50   Gene expression databases                  
   BindingDB                           5832      5832     <0.01    77   Other                                      
   BioCyc                           5640220   5572900      0.14    16   Enzyme and pathway databases               
   CAZy                               74016     69543     <0.01    55   Protein family/group databases             
   CGD                                 7033      7033     <0.01    76   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   107   2D gel databases                           
   CTD                               343851    342536      0.01    38   Organism-specific databases                
   ChEMBL                               575       575     <0.01    93   Other                                      
   ChiTaRS                            66075     66075     <0.01    56   Other                                      
   ConoServer                           160       160     <0.01    98   Organism-specific databases                
   DIP                                 2819      2814     <0.01    85   Protein-protein interaction databases      
   DNASU                              42379     42045     <0.01    62   Protocols and materials databases          
   EMBL                            42985867  38851135      1.08     3   Sequence databases                         
   Ensembl                          1002626    988146      0.03    29   Genome annotation databases                
   EnsemblBacteria                 17887154  17613603      0.45     5   Genome annotation databases                
   EnsemblFungi                      351527    349533      0.01    37   Genome annotation databases                
   EnsemblMetazoa                    675975    660741      0.02    32   Genome annotation databases                
   EnsemblPlants                     654277    620753      0.02    33   Genome annotation databases                
   EnsemblProtists                   156294    153898     <0.01    47   Genome annotation databases                
   EuPathDB                           98298     98155     <0.01    51   Organism-specific databases                
   EvolutionaryTrace                   8057      8057     <0.01    74   Other                                      
   FlyBase                           196137    194669     <0.01    43   Organism-specific databases                
   GO                              69969308  22189221      1.75     2   Ontologies                                 
   Gene3D                          16164626  12757468      0.41     8   Family and domain databases                
   GeneID                           9806109   9550403      0.25    10   Genome annotation databases                
   GeneTree                          835031    834974      0.02    30   Phylogenomic databases                     
   Genevestigator                     86639     86633     <0.01    52   Gene expression databases                  
   GenoList                           14733     14460     <0.01    72   Organism-specific databases                
   GenomeRNAi                         19688     19687     <0.01    67   Other                                      
   Gramene                           204087    204087      0.01    42   Organism-specific databases                
   H-InvDB                              618       470     <0.01    92   Organism-specific databases                
   HAMAP                            3753183   3705656      0.09    20   Family and domain databases                
   HGNC                               47697     47625     <0.01    59   Organism-specific databases                
   HOGENOM                          3654332   3654287      0.09    21   Phylogenomic databases                     
   HOVERGEN                          305672    305661      0.01    39   Phylogenomic databases                     
   IPI                               280400    279507      0.01    40   Sequence databases                         
   InParanoid                        186646    186646     <0.01    44   Phylogenomic databases                     
   IntAct                             17278     17278     <0.01    70   Protein-protein interaction databases      
   InterPro                        78016077  27389719      1.96     1   Family and domain databases                
   KEGG                             8703238   8505171      0.22    12   Genome annotation databases                
   KO                               3491832   3475462      0.09    22   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    79   Organism-specific databases                
   Leproma                             1272      1270     <0.01    88   Organism-specific databases                
   MEROPS                            138815    138814     <0.01    49   Protein family/group databases             
   MGI                                51880     51393     <0.01    58   Organism-specific databases                
   MINT                               10273     10272     <0.01    73   Protein-protein interaction databases      
   NextBio                           209279    209262      0.01    41   Other                                      
   OMA                              4858518   4858301      0.12    19   Phylogenomic databases                     
   OrthoDB                           553232    553189      0.01    34   Phylogenomic databases                     
   PANTHER                          5197651   4895654      0.13    18   Family and domain databases                
   PATRIC                           8286176   8286059      0.21    13   Genome annotation databases                
   PDB                                19652     10984     <0.01    68   3D structure databases                     
   PDBsum                             19376     10778     <0.01    69   3D structure databases                     
   PIR                               172427    139599     <0.01    46   Sequence databases                         
   PIRSF                            3158606   3155409      0.08    23   Family and domain databases                
   PMAP-CutDB                           209       209     <0.01    96   Other                                      
   PRIDE                             458080    458080      0.01    36   Proteomic databases                        
   PRINTS                           5291012   4729704      0.13    17   Family and domain databases                
   PROSITE                         17537175  11645562      0.44     6   Family and domain databases                
   Pathway_Interaction_DB                10         8     <0.01   106   Enzyme and pathway databases               
   PaxDb                              29103     29101     <0.01    64   Proteomic databases                        
   PeptideAtlas                         129       129     <0.01    99   Proteomic databases                        
   PeroxiBase                          2577      2569     <0.01    87   Protein family/group databases             
   Pfam                            34841351  25535862      0.87     4   Family and domain databases                
   PharmGKB                            3624      3624     <0.01    83   Organism-specific databases                
   PhosphoSite                         1130      1130     <0.01    89   PTM databases                              
   PhylomeDB                         144842    144842     <0.01    48   Phylogenomic databases                     
   PomBase                               40        27     <0.01   102   Organism-specific databases                
   PptaseDB                              36        35     <0.01   103   Protein family/group databases             
   ProDom                            704655    677155      0.02    31   Family and domain databases                
   ProMEX                              5238      5238     <0.01    78   Proteomic databases                        
   ProtClustDB                      2719708   2719696      0.07    26   Phylogenomic databases                     
   ProteinModelPortal               9640241   9640241      0.24    11   3D structure databases                     
   PseudoCAP                           4534      4528     <0.01    80   Organism-specific databases                
   REBASE                             38073     38064     <0.01    63   Protein family/group databases             
   REPRODUCTION-2DPAGE                   66        65     <0.01   100   2D gel databases                           
   RGD                                21093     20186     <0.01    66   Organism-specific databases                
   Reactome                             180       145     <0.01    97   Enzyme and pathway databases               
   RefSeq                           9850605   9562860      0.25     9   Sequence databases                         
   SABIO-RK                             482       482     <0.01    94   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   105   Organism-specific databases                
   SMART                            7774632   5891681      0.19    15   Family and domain databases                
   SMR                              2071072   2071072      0.05    27   3D structure databases                     
   STRING                           2904093   2904024      0.07    24   Protein-protein interaction databases      
   SUPFAM                          16318781  13172320      0.41     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   104   2D gel databases                           
   SignaLink                           4411      4409     <0.01    81   Enzyme and pathway databases               
   TAIR                               15296     15223     <0.01    71   Organism-specific databases                
   TCDB                                4124      4117     <0.01    82   Protein family/group databases             
   TIGRFAMs                         8257785   7536476      0.21    14   Family and domain databases                
   TubercuList                         1108      1107     <0.01    90   Organism-specific databases                
   UCSC                               58198     58027     <0.01    57   Genome annotation databases                
   UniGene                           551892    522187      0.01    35   Sequence databases                         
   UniPathway                       1600182   1489649      0.04    28   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    53   Genome annotation databases                
   World-2DPAGE                         673       668     <0.01    91   2D gel databases                           
   WormBase                           42416     42244     <0.01    61   Organism-specific databases                
   Xenbase                            25583     25514     <0.01    65   Organism-specific databases                
   ZFIN                               45655     45084     <0.01    60   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    75   Organism-specific databases                
   eggNOG                           2768514   2768494      0.07    25   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    54   Organism-specific databases                
   mycoCLAP                             422       422     <0.01    95   Protein family/group databases             

Number of explicitly cross-referenced databases: 128
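
The per-entry ratio and rank columns can be checked against the raw counts. The figures are consistent with two readings. The ratio appears to be the number of cross-references divided by the 39870577 entries in this release (Pfam: 34841351 / 39870577 ≈ 0.87). The rank appears to follow the number of cross-references rather than the number of entries: TAIR lists more entries than PDB (15223 vs 10984) but is ranked 71st against PDB's 68th, matching their cross-reference counts (15296 vs 19652). Both readings are inferred from the numbers, not from a documented definition. The short Python sketch below recomputes them for a few rows copied from the table.

```python
# Sketch: check two ASSUMED column definitions of the cross-reference table
# against a few rows copied from it:
#   ratio = cross-references / total entries in the release
#   rank  = position when databases are sorted by number of cross-references

TOTAL_ENTRIES = 39870577  # UniProtKB/TrEMBL release 2013_07

ROWS = [
    # (database, cross-references, entries, ratio as printed, rank as printed)
    ("Pfam",    34841351, 25535862, "0.87",   4),
    ("PROSITE", 17537175, 11645562, "0.44",   6),
    ("SUPFAM",  16318781, 13172320, "0.41",   7),
    ("RefSeq",   9850605,  9562860, "0.25",   9),
    ("OMA",      4858518,  4858301, "0.12",  19),
    ("PDB",        19652,    10984, "<0.01", 68),
    ("TAIR",       15296,    15223, "<0.01", 71),
]


def format_ratio(xrefs: int, total: int = TOTAL_ENTRIES) -> str:
    """Cross-references per release entry, rounded the way the table prints it."""
    ratio = xrefs / total
    return "<0.01" if ratio < 0.01 else f"{ratio:.2f}"


def ranks_follow(column: int) -> bool:
    """True if sorting ROWS by the given count column (largest first)
    puts the printed ranks in increasing order."""
    ordered = sorted(ROWS, key=lambda row: row[column], reverse=True)
    ranks = [row[4] for row in ordered]
    return ranks == sorted(ranks)


if __name__ == "__main__":
    for name, xrefs, _entries, printed, _rank in ROWS:
        print(f"{name:8s} computed={format_ratio(xrefs):>6s} table={printed:>6s}")
    print("rank follows cross-references:", ranks_follow(1))
    print("rank follows entries:", ranks_follow(2))
```

Sorting by entries instead puts SUPFAM ahead of PROSITE and TAIR ahead of PDB. Neither order matches the printed ranks, which is why the sketch favours cross-references as the ranking key.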


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.67   Gln (Q) 4.00   Leu (L) 9.97   Ser (S) 6.57
   Arg (R) 5.38   Glu (E) 6.21   Lys (K) 5.29   Thr (T) 5.55
   Asn (N) 4.10   Gly (G) 7.09   Met (M) 2.48   Trp (W) 1.29
   Asp (D) 5.32   His (H) 2.19   Phe (F) 4.05   Tyr (Y) 3.05
   Cys (C) 1.20   Ile (I) 6.06   Pro (P) 4.60   Val (V) 6.79

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.02
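
Assuming the percentages are pooled over all residues in the release (12710398609 amino acids), each figure is the number of occurrences of a residue code divided by the total number of residues, times 100. Under that reading, long sequences weigh more than short ones. The following minimal Python sketch illustrates the definition on two made-up sequences. The toy data are hypothetical. How UniProt's own pipeline rounds values and treats the ambiguity codes B, Z and X is not stated here, so the sketch simply counts them like any other letter.

```python
from collections import Counter


def percent_composition(sequences):
    """Pool residues from all sequences and return {residue code: percent of all residues}."""
    counts = Counter()
    for seq in sequences:
        # Each letter is one residue; ambiguity codes (B, Z, X) are counted
        # like any other code (an assumption, see the lead-in).
        counts.update(seq.upper())
    total = sum(counts.values())
    return {code: 100.0 * n / total for code, n in counts.items()}


# Hypothetical toy sequences (not UniProt data)
toy = ["MKTAYIAKQR", "MLLAXGB"]
composition = percent_composition(toy)

for code, pct in sorted(composition.items(), key=lambda item: item[1], reverse=True):
    print(f"{code} {pct:.2f}")
```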


   5.2  Amino acids in order of decreasing frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
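
This ordering follows directly from the percentages in 5.1. Sorting the 20 standard amino acids by their value, largest first, reproduces the list above. Here is a short Python check, with the percentages copied from 5.1:

```python
# Percentages for the 20 standard amino acids, copied from table 5.1
COMPOSITION = {
    "Ala": 8.67, "Arg": 5.38, "Asn": 4.10, "Asp": 5.32, "Cys": 1.20,
    "Gln": 4.00, "Glu": 6.21, "Gly": 7.09, "His": 2.19, "Ile": 6.06,
    "Leu": 9.97, "Lys": 5.29, "Met": 2.48, "Phe": 4.05, "Pro": 4.60,
    "Ser": 6.57, "Thr": 5.55, "Trp": 1.29, "Tyr": 3.05, "Val": 6.79,
}


def rank_by_frequency(composition):
    """Residue names ordered from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


print(", ".join(rank_by_frequency(COMPOSITION)))
# Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe, Gln, Tyr, Met, His, Trp, Cys
```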



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 628837
Total number of entries encoded on a Plasmid: 343696
Total number of entries encoded on a Plastid (subtype not specified): 26528
Total number of entries encoded on a Plastid; Apicoplast: 719
Total number of entries encoded on a Plastid; Chloroplast: 230497
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1031