         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2014_10 STATISTICS


1.  INTRODUCTION

Release 2014_10 of 29-Oct-2014 of UniProtKB/TrEMBL contains 86536393 sequence entries,
comprising 27389160364 amino acids.

2646634 sequences have been added since release 2014_09, the sequence data of
11693 existing entries has been updated and the annotations of
33921871 entries have been revised. The added sequences represent an increase
of about 3% in the number of entries.
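
As a rough check on the 3% figure, the sketch below assumes it is the number of
added sequences relative to the size of the previous release, taken as the
current entry count minus the additions. That reading is an assumption, not
something stated in the release notes, and it ignores any entries deleted
between releases.

```python
# Figures quoted in the introduction.
entries_now = 86_536_393
added = 2_646_634

# Assumption: no entries were removed between releases, so the previous
# release held entries_now - added entries.
entries_before = entries_now - added
growth_pct = 100 * added / entries_before

print(f"growth since 2014_09: {growth_pct:.1f}%")
```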

Number of fragments: 6385327

Protein existence (PE):              entries      %
1: Evidence at protein level           42628     0.05%
2: Evidence at transcript level      1007112     1.16%
3: Inferred from homology           20758196    23.99%
4: Predicted                        64728457    74.80%
5: Uncertain                               0     0.00%
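
The percentage column can be reproduced from the entry counts alone. The
snippet below is a minimal sketch that assumes each percentage is the count
divided by the sum of all five counts, rounded to two decimals; the variable
names are illustrative.

```python
# Entry counts copied from the protein existence table.
pe_counts = {
    "1: Evidence at protein level": 42_628,
    "2: Evidence at transcript level": 1_007_112,
    "3: Inferred from homology": 20_758_196,
    "4: Predicted": 64_728_457,
    "5: Uncertain": 0,
}

total = sum(pe_counts.values())

# Print each level with its share of the release.
for label, count in pe_counts.items():
    share = 100 * count / total
    print(f"{label:<34}{count:>10}{share:>9.2f}%")
```

The sum of the five counts equals the total number of entries quoted for the
release.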

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 519715

   The first twenty species represent 2560549 sequences, or 3% of the
   total number of entries.
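
   The figure for the first twenty species can be recomputed from the
   frequencies of ranks 1 to 20 in table 2.2. The sketch below assumes the
   percentage is taken relative to the total number of entries in the release.

```python
# Frequencies of the twenty most represented species (table 2.2, ranks 1-20).
top20 = [
    611181, 352020, 236743, 120799, 110303,
    100505, 97177, 96660, 90350, 78201,
    73978, 73055, 70544, 69592, 69555,
    67671, 65421, 60710, 58550, 57534,
]
total_entries = 86_536_393

top20_sequences = sum(top20)
top20_share = 100 * top20_sequences / total_entries

print(top20_sequences, f"{top20_share:.0f}%")
```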


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:207837
                            2x:82922
                            3x:44935
                            4x:31930
                            5x:18718
                            6x:13724
                            7x: 9877
                            8x: 7925
                            9x: 6241
                           10x:11340
                       11- 20x:41276
                       21- 50x:13285
                       51-100x: 5013
                         >100x:24692


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     611181  Human immunodeficiency virus 1
       2     352020  marine sediment metagenome
       3     236743  uncultured bacterium
       4     120799  Homo sapiens (Human)
       5     110303  Triticum aestivum (Wheat)
       6     100505  Brassica napus (Rape)
       7      97177  Hepatitis C virus
       8      96660  Oryza sativa subsp. japonica (Rice)
       9      90350  Hepatitis B virus (HBV)
      10      78201  Escherichia coli
      11      73978  Glycine max (Soybean) (Glycine hispida)
      12      73055  mine drainage metagenome
      13      70544  Hordeum vulgare var. distichum (Two-rowed barley)
      14      69592  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      15      69555  Macaca mulatta (Rhesus macaque)
      16      67671  Phytophthora parasitica (Potato buckeye rot agent)
      17      65421  Ancylostoma ceylanicum
      18      60710  human gut metagenome
      19      58550  Mus musculus (Mouse)
      20      57534  Zea mays (Maize)
      21      55041  Callithrix jacchus (White-tufted-ear marmoset)
      22      54930  Solanum tuberosum (Potato)
      23      54204  Vitis vinifera (Grape)
      24      53351  Danio rerio (Zebrafish) (Brachydanio rerio)
      25      50661  Trichomonas vaginalis
      26      49734  Oncorhynchus mykiss (Rainbow trout) (Salmo gairdneri)
      27      49274  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      28      48911  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      29      48116  Vibrio parahaemolyticus
      30      47063  Populus trichocarpa (Western balsam poplar) 
      31      44332  Citrus sinensis (Sweet orange) (Citrus aurantium var. sinensis)
      32      44277  Eucalyptus grandis (Flooded gum)
      33      41211  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      34      40872  Theobroma cacao (Cacao) (Cocoa)
      35      39923  Reticulomyxa filosa
      36      39906  Oryza sativa subsp. indica (Rice)
      37      39848  Paramecium tetraurelia
      38      39609  Arabidopsis thaliana (Mouse-ear cress)
      39      39391  Setaria italica (Foxtail millet) (Panicum italicum)
      40      39276  Simian immunodeficiency virus (SIV)
      41      38814  Mustela putorius furo (European domestic ferret) (Mustela furo)
      42      37312  Acyrthosiphon pisum (Pea aphid)
      43      37300  Drosophila melanogaster (Fruit fly)
      44      36609  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      45      35983  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      46      35672  Ailuropoda melanoleuca (Giant panda)
      47      35599  Emiliania huxleyi CCMP1516
      48      35324  Physcomitrella patens subsp. patens (Moss)
      49      35138  Caenorhabditis japonica
      50      34630  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      51      34570  Thalassiosira oceanica (Marine diatom)
      52      34566  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      53      33883  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      54      33712  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      55      33260  Selaginella moellendorffii (Spikemoss)
      56      32772  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      57      32645  Vibrio cholerae
      58      32533  Sus scrofa (Pig)
      59      32415  Phaseolus vulgaris (Kidney bean) (French bean)
      60      32342  Oryza brachyantha
      61      32205  Oryza glaberrima (African rice)
      62      32123  Caenorhabditis remanei (Caenorhabditis vulgaris)
      63      32101  Capitella teleta (Polychaete worm)
      64      32005  Anas platyrhynchos (Domestic duck) (Anas boschas)
      65      31896  Pan troglodytes (Chimpanzee)
      66      31404  Ricinus communis (Castor bean)
      67      31290  Citrus clementina
      68      30981  Daphnia pulex (Water flea)
      69      30713  Caenorhabditis brenneri (Nematode worm)
      70      30184  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      71      30177  Staphylococcus aureus
      72      29845  Rhizophagus irregularis (strain DAOM 181602 / DAOM 197198 / MUCL 43194)  
      73      29815  Amphimedon queenslandica (Sponge)
      74      29494  Strongylocentrotus purpuratus (Purple sea urchin)
      75      29334  Pristionchus pacificus (Parasitic nematode)
      76      29328  Klebsiella pneumoniae
      77      29205  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      78      29083  Oikopleura dioica (Tunicate)
      79      28885  Erythranthe guttata (Yellow monkey flower) (Mimulus guttatus)
      80      28840  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      81      28826  Capsella rubella
      82      28669  Rhizophagus irregularis DAOM 197198w
      83      28643  Prunus persica (Peach) (Amygdalus persica)
      84      28382  Eutrema salsugineum (Saltwater cress) (Sisymbrium salsugineum)
      85      28195  Gasterosteus aculeatus (Three-spined stickleback)
      86      27952  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      87      27772  Canis familiaris (Dog) (Canis lupus familiaris)
      88      27683  Pseudomonas aeruginosa
      89      27561  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      90      27546  Equus caballus (Horse)
      91      27519  Jatropha curcas (Barbados nut)
      92      27438  Amborella trichopoda
      93      27101  Gorilla gorilla gorilla (Lowland gorilla)
      94      27017  Stegodyphus mimosarum
      95      26921  Tetrahymena thermophila (strain SB210)
      96      26859  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      97      26771  Morus notabilis
      98      26517  Phytophthora parasitica P1976
      99      26489  Phytophthora parasitica CJ01A1
     100      26477  Phytophthora parasitica P1569
     101      26452  Phytophthora parasitica P10297
     102      26438  Phytophthora parasitica (strain INRA-310)
     103      26420  Ovis aries (Sheep)
     104      26058  Listeria monocytogenes
     105      25996  Oryzias latipes (Medaka fish) (Japanese ricefish)
     106      25842  Bos taurus (Bovine)
     107      25832  Loxodonta africana (African elephant)
     108      25761  Rattus norvegicus (Rat)
     109      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
     110      25594  Coffea canephora (Robusta coffee)
     111      25025  Aphanomyces astaci
     112      24918  Nematostella vectensis (Starlet sea anemone)
     113      24590  Guillardia theta CCMP2712
     114      24375  Oxytricha trifallax
     115      24301  Tetraselmis sp. GSL018
     116      23809  Astyanax mexicanus (Blind cave fish) (Astyanax fasciatus mexicanus)
     117      23742  Ornithorhynchus anatinus (Duckbill platypus)
     118      23687  Lottia gigantea (Giant owl limpet)
     119      23651  Dendroctonus ponderosae (Mountain pine beetle)
     120      23548  Caenorhabditis elegans
     121      23497  Latimeria chalumnae (West Indian ocean coelacanth)
     122      23382  Helobdella robusta (Californian leech)
     123      23365  Arabis alpina (Alpine rock-cress)
     124      23318  Fusarium oxysporum f. sp. melonis 26406
     125      23271  Fusarium oxysporum f. sp. conglutinans race 2 54008
     126      23263  Fusarium oxysporum f. sp. pisi HDV247
     127      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
     128      22809  Monodelphis domestica (Gray short-tailed opossum)
     129      22754  Fusarium oxysporum f. sp. raphani 54005
     130      22565  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
     131      22527  Lepisosteus oculatus (Spotted gar)
     132      22325  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
     133      22248  Fusarium oxysporum f. sp. vasinfectum 25433
     134      22174  gut metagenome
     135      21972  Trichuris suis (pig whipworm)
     136      21927  Oryctolagus cuniculus (Rabbit)
     137      21754  Haemonchus contortus (Barber pole worm)
     138      21689  Fusarium oxysporum f. sp. radicis-lycopersici 26381
     139      21661  Fusarium oxysporum Fo47
     140      21549  Gallus gallus (Chicken)
     141      21549  Fusarium oxysporum f. sp. lycopersici MN25
     142      21547  Heterocephalus glaber (Naked mole rat)
     143      21398  Caenorhabditis briggsae
     144      21357  Galerina marginata CBS 339.88
     145      21257  Echinococcus granulosus (Hydatid tapeworm)
     146      21188  Ixodes scapularis (Black-legged tick) (Deer tick)
     147      21173  Myotis lucifugus (Little brown bat)
     148      21037  Felis catus (Cat) (Felis silvestris catus)
     149      20969  Poecilia formosa (Amazon molly) (Limia formosa)
     150      20867  Tupaia chinensis (Chinese tree shrew)
     151      20805  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     152      20768  Stylonychia lemnae
     153      20767  Fusarium oxysporum FOSC 3-a
     154      20541  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     155      20539  Bacillus subtilis
     156      20168  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     157      20115  Ciona savignyi (Pacific transparent sea squirt)
     158      20105  Cavia porcellus (Guinea pig)
     159      20088  Helicobacter pylori (Campylobacter pylori)
     160      20062  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     161      20052  Saprolegnia parasitica (strain CBS 223.65)
     162      20028  Camelus ferus (Wild Bactrian camel)
     163      19998  Callorhynchus milii (Elephant fish) (Australian ghost shark)
     164      19836  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     165      19807  Fusarium oxysporum f. sp. cubense tropical race 4 54006
     166      19704  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     167      19625  Bactrocera dorsalis (Oriental fruit fly) (Dacus dorsalis)
     168      19619  Brugia malayi (Filarial nematode worm)
     169      19607  Anolis carolinensis (Green anole) (American chameleon)
     170      19594  Aphanomyces invadans
     171      19562  Pteropus alecto (Black flying fox)
     172      19522  Wuchereria bancrofti
     173      19425  Anopheles sinensis
     174      19300  Myotis brandtii (Brandt's bat)
     175      19200  Trypanosoma cruzi (strain CL Brener)
     176      19196  Necator americanus (Human hookworm)
     177      19062  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     178      19017  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     179      18924  Drosophila simulans (Fruit fly)
     180      18767  Mycobacterium tuberculosis
     181      18600  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     182      18561  Bos mutus
     183      18488  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     184      18417  Ophiophagus hannah (King cobra) (Naja hannah)
     185      18411  uncultured archaeon
     186      18331  Plasmodium falciparum
     187      18294  Tetranychus urticae (Two-spotted spider mite)
     188      18125  Atta cephalotes (Leafcutter ant)
     189      18053  Anopheles gambiae (African malaria mosquito)
     190      18047  Saprolegnia diclina VS20
     191      17990  Hepatitis C virus subtype 1b
     192      17976  Moniliophthora roreri (strain MCA 2997) (Cocoa frosty pod rot fungus) 
     193      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     194      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     195      17784  Bombyx mori (Silk moth)
     196      17683  Genlisea aurea
     197      17615  Bacillus cereus
     198      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     199      17590  Gibberella moniliformis (strain M3125 / FGSC 7600)  
     200      17490  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     201      17393  Rhizobium radiobacter (Agrobacterium tumefaciens) (Agrobacterium radiobacter)
     202      17384  Ceratitis capitata (Mediterranean fruit fly) (Tephritis capitata)
     203      17289  Nasonia vitripennis (Parasitic wasp)
     204      17107  Drosophila yakuba (Fruit fly)
     205      17080  Tribolium castaneum (Red flour beetle)
     206      16949  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     207      16929  Meleagris gallopavo (Common turkey)
     208      16723  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     209      16715  Drosophila persimilis (Fruit fly)
     210      16638  Fusarium oxysporum f. sp. lycopersici  
     211      16619  Rhodnius prolixus (Triatomid bug)
     212      16534  Cerapachys biroi (Ant)
     213      16484  Botryobasidium botryosum FD-172 SS1
     214      16453  Apis mellifera (Honeybee)
     215      16430  Ectocarpus siliculosus (Brown alga)
     216      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     217      16372  Opisthorchis viverrini
     218      16341  Jaapia argillacea MUCL 33604
     219      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     220      16332  Danaus plexippus (Monarch butterfly)
     221      16282  Trichinella spiralis (Trichina worm)
     222      16268  Streptococcus mitis
     223      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     224      16223  Schistosoma japonicum (Blood fluke)
     225      16219  Neovison vison (American mink) (Mustela vison)
     226      16209  Ixodes ricinus (Common tick)
     227      16195  Streptomyces scabiei
     228      16185  Drosophila sechellia (Fruit fly)
     229      16149  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     230      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     231      15929  Acinetobacter baumannii
     232      15898  Vibrio vulnificus
     233      15815  Rabies virus
     234      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     235      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     236      15720  Pseudomonas syringae
     237      15718  Naegleria gruberi (Amoeba)
     238      15662  Plasmodium berghei (strain Anka)
     239      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     240      15593  Phytophthora ramorum (Sudden oak death agent)
     241      15467  Myotis davidii (David's myotis)
     242      15423  Drosophila willistoni (Fruit fly)
     243      15412  Pestalotiopsis fici W106-1
     244      15380  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     245      15355  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     246      15349  Loa loa (Eye worm) (Filaria loa)
     247      15155  Drosophila ananassae (Fruit fly)
     248      15153  Pythium ultimum DAOM BR144
     249      15064  Pararge aegeria (speckled wood butterfly)
     250      15042  Harpegnathos saltator (Jerdon's jumping ant)


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          872771 (  1%)
    Bacteria       70964587 ( 82%)
    Eukaryota      12023270 ( 14%)
    Viruses         2134881 (  2%)
    Other            540883 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 120854 (  1%)           (  0%)
     Other Mammalia       1096409 (  9%)           (  1%)
     Other Vertebrata     1141501 (  9%)           (  1%)
     Viridiplantae        2423013 ( 20%)           (  3%)
     Fungi                3253321 ( 27%)           (  4%)
     Insecta              1110873 (  9%)           (  1%)
     Nematoda              423571 (  4%)           (  0%)
     Other                2453728 ( 20%)           (  3%)
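
   The two percentage columns of the Eukaryota table use different
   denominators. As a sketch, assuming the first is relative to the sum of the
   eukaryotic categories and the second to the total number of entries in the
   release, both can be recomputed as follows.

```python
database_total = 86_536_393

# Sequence counts copied from the "Within Eukaryota" table.
eukaryota = {
    "Human": 120_854,
    "Other Mammalia": 1_096_409,
    "Other Vertebrata": 1_141_501,
    "Viridiplantae": 2_423_013,
    "Fungi": 3_253_321,
    "Insecta": 1_110_873,
    "Nematoda": 423_571,
    "Other": 2_453_728,
}

eukaryota_total = sum(eukaryota.values())

for category, count in eukaryota.items():
    of_eukaryota = 100 * count / eukaryota_total
    of_database = 100 * count / database_total
    print(f"{category:<18}{count:>9} ({of_eukaryota:3.0f}%) ({of_database:3.0f}%)")
```

   The category counts add up to the Eukaryota row of the kingdom table.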



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1985994             1001-1100   431415
                 51- 100 7884052             1101-1200   312800
                101- 150 9170910             1201-1300   223977
                151- 200 8650428             1301-1400   126546
                201- 250 8876890             1401-1500   110731
                251- 300 8701697             1501-1600    72272
                301- 350 7840084             1601-1700    56412
                351- 400 5808048             1701-1800    35148
                401- 450 5070256             1801-1900    30300
                451- 500 4099823             1901-2000    23468
                501- 550 2604236             2001-2100    23301
                551- 600 1980298             2101-2200    31463
                601- 650 1402968             2201-2300    17937
                651- 700 1124825             2301-2400    14996
                701- 750  869680             2401-2500    13183
                751- 800  743429             >2500        91466
                801- 850  578758
                851- 900  530261
                901- 950  361909
                951-1000  251105
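
   Because the table excludes fragments, its bins would be expected to add up
   to the total number of entries minus the number of fragments quoted in the
   introduction. The sketch below checks this with the counts copied from the
   table. Treating the two figures as directly comparable is an assumption.

```python
# Counts copied from the size table, in ascending length order:
# twenty 50-residue bins (1-1000), fifteen 100-residue bins (1001-2500),
# and the open-ended >2500 bin.
bins_1_to_1000 = [
    1985994, 7884052, 9170910, 8650428, 8876890,
    8701697, 7840084, 5808048, 5070256, 4099823,
    2604236, 1980298, 1402968, 1124825, 869680,
    743429, 578758, 530261, 361909, 251105,
]
bins_1001_to_2500 = [
    431415, 312800, 223977, 126546, 110731,
    72272, 56412, 35148, 30300, 23468,
    23301, 31463, 17937, 14996, 13183,
]
over_2500 = 91466

total_entries = 86_536_393
fragments = 6_385_327

binned = sum(bins_1_to_1000) + sum(bins_1001_to_2500) + over_2500
non_fragments = total_entries - fragments

print(binned, non_fragments, binned == non_fragments)
```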



   The average sequence length in UniProtKB/TrEMBL is 316 amino acids.

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
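
   The average length can be approximated from the totals given in the
   introduction. Dividing the amino acid count by the entry count gives a
   value of about 316.5, so the reported 316 looks like the integer part of
   the mean rather than a rounded value. That is an inference from the
   numbers, not a documented rule.

```python
# Totals quoted in the introduction.
total_residues = 27_389_160_364
total_entries = 86_536_393

mean_length = total_residues / total_entries       # exact mean
truncated_mean = total_residues // total_entries   # integer part only

print(f"{mean_length:.2f}", truncated_mean)
```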



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    98883350                1.14                                                    
   Submitted to EMBL/GenBank/DDBJ  69061919  65597341      0.80                                                    
   Journal                         27636396  26133715      0.32                                                    
   Submitted to other databases     2156994   2149604      0.02                                                    
   Thesis                             18901     18842     <0.01                                                    
   Book citation                       9139      9076     <0.01                                                    
   Patent                                 1         1     <0.01                                                    
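
The "Average per entry" column appears to be the total number of lines divided
by the total number of entries in the release, not by the number of entries
that carry such a line. The sketch below reproduces the column under that
assumption. The helper name average_per_entry and the rule that values
rounding to 0.00 are shown as "<0.01" are inferred from the tables, not taken
from UniProt documentation.

```python
total_entries = 86_536_393

def average_per_entry(line_count, entries=total_entries):
    """Format lines-per-entry the way the tables seem to (assumed rule)."""
    formatted = f"{line_count / entries:.2f}"
    # Values that would print as 0.00 are reported as "<0.01".
    return "<0.01" if formatted == "0.00" else formatted

# Totals copied from the references table.
reference_lines = {
    "References (RL)": 98_883_350,
    "Submitted to EMBL/GenBank/DDBJ": 69_061_919,
    "Journal": 27_636_396,
    "Submitted to other databases": 2_156_994,
    "Thesis": 18_901,
}

for name, count in reference_lines.items():
    print(f"{name:<32}{count:>10}  {average_per_entry(count):>6}")
```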

Total number of distinct authors cited in UniProtKB/TrEMBL: 529080


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                     145354351                1.68                                                    
   CATALYTIC ACTIVITY              10652172   9778777      0.12     4                                              
   CAUTION                         61416781  61340597      0.71     1                                              
   COFACTOR                         4884633   4489725      0.06     8                                              
   DOMAIN                            524632    503034      0.01     9                                              
   ENZYME REGULATION                 175502    175502     <0.01    11                                              
   FUNCTION                        12270085  11679565      0.14     3                                              
   INTERACTION                         1797      1797     <0.01    12                                              
   MISCELLANEOUS                     323790    323534     <0.01    10                                              
   PATHWAY                          5529109   4979793      0.06     7                                              
   SIMILARITY                      32919178  25521259      0.38     2                                              
   SUBCELLULAR LOCATION             9949444   9609942      0.11     5                                              
   SUBUNIT                          6707228   6652800      0.08     6                                              

Total number of comment topics: 12
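
The Rank column is consistent with ordering the subtypes by their total number
of lines, largest first. The sketch below recomputes the ranks for the comment
topics under that assumption, with the totals copied from the table.

```python
# Total number of lines per comment topic (from the table).
comment_totals = {
    "CATALYTIC ACTIVITY": 10_652_172,
    "CAUTION": 61_416_781,
    "COFACTOR": 4_884_633,
    "DOMAIN": 524_632,
    "ENZYME REGULATION": 175_502,
    "FUNCTION": 12_270_085,
    "INTERACTION": 1_797,
    "MISCELLANEOUS": 323_790,
    "PATHWAY": 5_529_109,
    "SIMILARITY": 32_919_178,
    "SUBCELLULAR LOCATION": 9_949_444,
    "SUBUNIT": 6_707_228,
}

# Rank 1 is the topic with the most lines.
ordered = sorted(comment_totals, key=comment_totals.get, reverse=True)
ranks = {topic: position for position, topic in enumerate(ordered, start=1)}

for topic in comment_totals:
    print(f"{topic:<22}{comment_totals[topic]:>10}{ranks[topic]:>4}")
```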


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      54977729                0.64                                                    
   ACT_SITE                         4724918   2937440      0.05     5                                              
   BINDING                         10240263   2636743      0.12     1                                              
   CARBOHYD                             837       312     <0.01    27                                              
   CHAIN                             922448    730557      0.01    10                                              
   COILED                            189021     99683     <0.01    16                                              
   COMPBIAS                           29482     29320     <0.01    21                                              
   CROSSLNK                           29005     20691     <0.01    22                                              
   DISULFID                          211153    160724     <0.01    15                                              
   DNA_BIND                          160512    150683     <0.01    18                                              
   DOMAIN                           1947198   1552789      0.02     8                                              
   INIT_MET                           28741     28741     <0.01    23                                              
   INTRAMEM                             392        56     <0.01    28                                              
   LIPID                             153424     76712     <0.01    19                                              
   METAL                            9508316   2501594      0.11     2                                              
   MOD_RES                           731888    677199      0.01    12                                              
   MOTIF                             580261    374039      0.01    14                                              
   NON_STD                             2031      1889     <0.01    26                                              
   NON_TER                          9445608   6390171      0.11     3                                              
   NP_BIND                          3866654   2314255      0.04     6                                              
   PEPTIDE                              127       127     <0.01    29                                              
   PROPEP                              9310      9310     <0.01    24                                              
   REGION                           3206477   1759756      0.04     7                                              
   REPEAT                            126026     29277     <0.01    20                                              
   SIGNAL                            827191    823428      0.01    11                                              
   SITE                             1405216    705116      0.02     9                                              
   TOPO_DOM                          665414    138523      0.01    13                                              
   TRANSIT                             2199      2187     <0.01    25                                              
   TRANSMEM                         5802577   1040282      0.07     4                                              
   ZN_FING                           161040    144197     <0.01    17                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             784241996                9.06                                                    
   Allergome                           3782      3131     <0.01    83   Protein family/group databases             
   ArachnoServer                         98        98     <0.01   103   Organism-specific databases                
   BRENDA                              2570      2543     <0.01    89   Enzyme and pathway databases               
   Bgee                               94631     94631     <0.01    51   Gene expression databases                  
   BindingDB                           5706      5706     <0.01    79   Chemistry                                  
   BioCyc                           5767363   5689926      0.07    22   Enzyme and pathway databases               
   CAZy                               73827     69376     <0.01    55   Protein family/group databases             
   CGD                                 6763      6763     <0.01    77   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   109   2D gel databases                           
   CTD                               462243    461028      0.01    39   Organism-specific databases                
   ChEMBL                               785       785     <0.01    94   Chemistry                                  
   ChiTaRS                            62898     62898     <0.01    56   Other                                      
   ConoServer                           159       159     <0.01   100   Organism-specific databases                
   DIP                                 3111      3106     <0.01    86   Protein-protein interaction databases      
   DNASU                              41860     41534     <0.01    63   Protocols and materials databases          
   DrugBank                             145        57     <0.01   101   Chemistry                                  
   EMBL                            92522957  85335972      1.07     3   Sequence databases                         
   Ensembl                          1131354   1116491      0.01    31   Genome annotation databases                
   EnsemblBacteria                 37461540  36862186      0.43     7   Genome annotation databases                
   EnsemblFungi                      467828    465306      0.01    38   Genome annotation databases                
   EnsemblMetazoa                    917571    901271      0.01    34   Genome annotation databases                
   EnsemblPlants                     815070    774880      0.01    35   Genome annotation databases                
   EnsemblProtists                   190946    188510     <0.01    47   Genome annotation databases                
   EuPathDB                          161153    161152     <0.01    49   Organism-specific databases                
   EvolutionaryTrace                   7870      7870     <0.01    76   Other                                      
   ExpressionAtlas                   199058    199058     <0.01    44   Gene expression databases                  
   FlyBase                           198803    197332     <0.01    45   Organism-specific databases                
   GO                             122045464  41675982      1.41     2   Ontologies                                 
   Gene3D                          40524709  31690530      0.47     5   Family and domain databases                
   GeneID                          11584790  11305888      0.13    13   Genome annotation databases                
   GeneTree                         1064457   1064420      0.01    32   Phylogenomic databases                     
   Genevestigator                     82315     82311     <0.01    52   Gene expression databases                  
   GenoList                           14727     14454     <0.01    73   Organism-specific databases                
   GenomeRNAi                         23461     23461     <0.01    69   Other                                      
   Gramene                           196911    196911     <0.01    46   Organism-specific databases                
   GuidetoPHARMACOLOGY                   20        20     <0.01   107   Chemistry                                  
   H-InvDB                              596       449     <0.01    96   Organism-specific databases                
   HAMAP                            8769627   8647821      0.10    16   Family and domain databases                
   HGNC                               45998     45932     <0.01    61   Organism-specific databases                
   HOGENOM                          3643827   3643781      0.04    26   Phylogenomic databases                     
   HOVERGEN                          302390    302379     <0.01    41   Phylogenomic databases                     
   InParanoid                       2737778   2737778      0.03    29   Phylogenomic databases                     
   IntAct                             15427     15427     <0.01    72   Protein-protein interaction databases      
   InterPro                       156803233  53269447      1.81     1   Family and domain databases                
   KEGG                            10375679  10137833      0.12    14   Genome annotation databases                
   KO                               4457605   4434682      0.05    24   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    80   Organism-specific databases                
   Leproma                             1272      1270     <0.01    90   Organism-specific databases                
   MEROPS                            225859    225858     <0.01    42   Protein family/group databases             
   MGI                                53249     52876     <0.01    58   Organism-specific databases                
   MIM                                    4         4     <0.01   110   Organism-specific databases                
   MINT                               10107     10106     <0.01    74   Protein-protein interaction databases      
   MaxQB                               2722      2721     <0.01    87   Proteomic databases                        
   NextBio                           201227    201126     <0.01    43   Other                                      
   OGP                                    3         3     <0.01   111   2D gel databases                           
   OMA                              7282992   7282967      0.08    20   Phylogenomic databases                     
   OrthoDB                          5179328   5179325      0.06    23   Phylogenomic databases                     
   PANTHER                          8416463   8181246      0.10    18   Family and domain databases                
   PATRIC                           8246286   8246089      0.10    19   Genome annotation databases                
   PDB                                24964     13296     <0.01    67   3D structure databases                     
   PDBsum                             24854     13216     <0.01    68   3D structure databases                     
   PIR                               171190    138357     <0.01    48   Sequence databases                         
   PIRSF                            6958516   6902635      0.08    21   Family and domain databases                
   PMAP-CutDB                           199       199     <0.01    99   Other                                      
   PRIDE                             918067    918067      0.01    33   Proteomic databases                        
   PRINTS                           9559950   8615949      0.11    15   Family and domain databases                
   PRO                                26895     26894     <0.01    65   Other                                      
   PROSITE                         33233338  22353094      0.38     8   Family and domain databases                
   PaxDb                              28372     28370     <0.01    64   Proteomic databases                        
   PeptideAtlas                         127       127     <0.01   102   Proteomic databases                        
   PeroxiBase                          2588      2580     <0.01    88   Protein family/group databases             
   Pfam                            68239509  49716451      0.79     4   Family and domain databases                
   PharmGKB                            3210      3210     <0.01    85   Organism-specific databases                
   PhosSite                             888       876     <0.01    93   PTM databases                              
   PhosphoSite                         1078      1078     <0.01    92   PTM databases                              
   PhylomeDB                         378978    378978     <0.01    40   Phylogenomic databases                     
   PomBase                                2         2     <0.01   112   Organism-specific databases                
   PptaseDB                              38        36     <0.01   105   Protein family/group databases             
   ProDom                           1323871   1286000      0.02    30   Family and domain databases                
   ProMEX                              3270      3270     <0.01    84   Proteomic databases                        
   ProteinModelPortal              21932000  21932000      0.25     9   3D structure databases                     
   PseudoCAP                           4504      4498     <0.01    81   Organism-specific databases                
   REBASE                             48151     48140     <0.01    59   Protein family/group databases             
   REPRODUCTION-2DPAGE                   65        64     <0.01   104   2D gel databases                           
   RGD                                21545     20538     <0.01    70   Organism-specific databases                
   Reactome                           97053     43007     <0.01    50   Enzyme and pathway databases               
   RefSeq                          17671300  14251619      0.20    11   Sequence databases                         
   SABIO-RK                             531       531     <0.01    97   Enzyme and pathway databases               
   SGD                                    7         7     <0.01   108   Organism-specific databases                
   SMART                           14372238  10969106      0.17    12   Family and domain databases                
   SMR                              8581039   8581039      0.10    17   3D structure databases                     
   STRING                           3130731   3130557      0.04    27   Protein-protein interaction databases      
   SUPFAM                          38363037  30886778      0.44     6   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   106   2D gel databases                           
   SignaLink                           4115      4110     <0.01    82   Enzyme and pathway databases               
   TAIR                               21390     21272     <0.01    71   Organism-specific databases                
   TCDB                                6277      6268     <0.01    78   Protein family/group databases             
   TIGRFAMs                        17895760  16324619      0.21    10   Family and domain databases                
   TreeFam                           587603    587601      0.01    36   Phylogenomic databases                     
   TubercuList                         1100      1099     <0.01    91   Organism-specific databases                
   UCSC                               56609     56390     <0.01    57   Genome annotation databases                
   UniGene                           549515    513325      0.01    37   Sequence databases                         
   UniPathway                       4051415   3754293      0.05    25   Enzyme and pathway databases               
   VectorBase                         78242     77725     <0.01    53   Genome annotation databases                
   World-2DPAGE                         671       666     <0.01    95   2D gel databases                           
   WormBase                           43162     43040     <0.01    62   Organism-specific databases                
   Xenbase                            25031     24972     <0.01    66   Organism-specific databases                
   ZFIN                               47300     47245     <0.01    60   Organism-specific databases                
   dictyBase                           7995      7773     <0.01    75   Organism-specific databases                
   eggNOG                           2754240   2754205      0.03    28   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    54   Organism-specific databases                
   mycoCLAP                             412       412     <0.01    98   Protein family/group databases             

Number of explicitly cross-referenced databases: 132
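
The rows of the table above can be read programmatically. The following is a minimal sketch, not part of the release itself. It assumes that the first numeric column is the number of cross-references, the second is the number of entries carrying at least one such cross-reference, and the third is the number of cross-references divided by the total number of entries in the release (86536393, from the introduction). The column headers are not shown in this part of the table, so these meanings are assumptions. The names `ROW`, `parse_row` and `xrefs_per_entry` are hypothetical.

```python
import re

# Total entries in release 2014_10, as stated in the introduction.
TOTAL_ENTRIES = 86536393

# One table row: name, two integer counts, a ratio (possibly "<0.01"),
# a rank, and a free-text category. Column meanings are assumptions.
# Database names containing spaces would not match this pattern.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<xrefs>\d+)\s+(?P<entries>\d+)\s+"
    r"(?P<ratio><?\d+\.\d+)\s+(?P<rank>\d+)\s+(?P<category>.+?)\s*$"
)


def parse_row(line):
    """Split one fixed-width row into a dictionary of typed fields."""
    match = ROW.match(line)
    if match is None:
        raise ValueError("unrecognised row: %r" % line)
    return {
        "name": match.group("name"),
        "xrefs": int(match.group("xrefs")),
        "entries": int(match.group("entries")),
        "ratio": match.group("ratio"),  # kept as text because of "<0.01"
        "rank": int(match.group("rank")),
        "category": match.group("category"),
    }


def xrefs_per_entry(row, total=TOTAL_ENTRIES):
    """Cross-references per database entry, rounded to two decimals."""
    return round(row["xrefs"] / total, 2)


gene3d = parse_row(
    "   Gene3D                          40524709  31690530      0.47"
    "     5   Family and domain databases                "
)
print(gene3d["name"], xrefs_per_entry(gene3d))
```

Under these assumptions the recomputed value for Gene3D agrees with the 0.47 printed in the table, and the same holds for InterPro (1.81). Rows printed as <0.01 are kept as text, since the exact value cannot be recovered from the rounded figure.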


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.94   Gln (Q) 3.98   Leu (L) 9.93   Ser (S) 6.36
   Arg (R) 5.36   Glu (E) 6.08   Lys (K) 5.21   Thr (T) 5.56
   Asn (N) 4.12   Gly (G) 7.22   Met (M) 2.48   Trp (W) 1.26
   Asp (D) 5.43   His (H) 2.21   Phe (F) 3.98   Tyr (Y) 3.06
   Cys (C) 1.09   Ile (I) 6.18   Pro (P) 4.51   Val (V) 6.93

   Asx (B) 0      Glx (Z) 0      Xaa (X) 0.01


   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Ile, Glu, Thr, Asp, Arg, Lys, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Trp, Cys
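
The ordering in 5.2 follows directly from the percentages in 5.1. The sketch below reproduces it by sorting the twenty standard residues by their composition values. The dictionary name `composition` is hypothetical; the numbers are copied from 5.1. Gln and Phe are both listed at 3.98, so their relative order cannot be decided from the two-decimal figures; the sketch simply keeps the order in which they appear in 5.1, which is an assumption.

```python
# Composition in percent, copied from section 5.1 (read row by row).
composition = {
    "Ala": 8.94, "Gln": 3.98, "Leu": 9.93, "Ser": 6.36,
    "Arg": 5.36, "Glu": 6.08, "Lys": 5.21, "Thr": 5.56,
    "Asn": 4.12, "Gly": 7.22, "Met": 2.48, "Trp": 1.26,
    "Asp": 5.43, "His": 2.21, "Phe": 3.98, "Tyr": 3.06,
    "Cys": 1.09, "Ile": 6.18, "Pro": 4.51, "Val": 6.93,
}

# Sort from most to least frequent. Python's sort is stable, also with
# reverse=True, so tied values (Gln, Phe) keep their insertion order.
ranked = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranked))

# The twenty standard residues do not sum to exactly 100 because of
# rounding and the ambiguous codes (B, Z, X) listed separately in 5.1.
total = sum(composition.values())
print("sum of listed percentages: %.2f" % total)
```

The resulting list matches the classification given in 5.2.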



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 785292
Total number of entries encoded on a Plasmid: 465722
Total number of entries encoded on a Plastid: 37638
Total number of entries encoded on a Plastid; Apicoplast: 
Total number of entries encoded on a Plastid; Chloroplast: 63
Total number of entries encoded on a Plastid; Cyanelle: 
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: