         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2014_06 STATISTICS


1.  INTRODUCTION

Release 2014_06 of 11-Jun-2014 of UniProtKB/TrEMBL contains 69014937 sequence entries,
comprising 21709983645 amino acids.

13038709 sequences have been added since release 2014_05, the sequence data of
874 existing entries has been updated and the annotations of
20714904 entries have been revised. This represents an increase of 22%.

Number of fragments: 5792275

Protein existence (PE):              entries      %
1: Evidence at protein level           45600     0.07%
2: Evidence at transcript level       869096     1.26%
3: Inferred from homology           14491866    21.00%
4: Predicted                        53608375    77.68%
5: Uncertain                               0     0.00%
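
The percentages in the PE table can be reproduced from the entry counts alone. The following is a minimal sketch in Python; the names `pe_counts` and `pe_share` are illustrative only and are not part of any UniProt tool.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 45600,
    "2: Evidence at transcript level": 869096,
    "3: Inferred from homology": 14491866,
    "4: Predicted": 53608375,
    "5: Uncertain": 0,
}

# The five PE levels partition the release, so their sum is the entry total.
total_entries = sum(pe_counts.values())


def pe_share(count: int, total: int) -> float:
    """Return the share of entries as a percentage, rounded to two decimals."""
    return round(100 * count / total, 2)


for level, count in pe_counts.items():
    print(f"{level:<34}{count:>10}{pe_share(count, total_entries):>9.2f}%")
```

The sum of the five counts equals the 69014937 entries stated in the introduction, which is a quick way to check that no row was lost in transcription.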

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 488595

   The first twenty species represent 2430428 sequences:   3.5 % of the
   total number of entries.
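
   The figure for the first twenty species can be checked against the
   frequencies listed in table 2.2. A minimal sketch in Python, with
   illustrative variable names:

```python
# Frequencies of the twenty most represented species (table 2.2, ranks 1-20).
top_twenty = [
    593155, 352020, 219630, 116747, 106042, 96759, 94901, 86317, 73937, 73055,
    70496, 69509, 67671, 65421, 60710, 60416, 57460, 56235, 55018, 54929,
]
total_entries = 69014937  # sequence entries in release 2014_06

top_twenty_total = sum(top_twenty)
top_twenty_share = 100 * top_twenty_total / total_entries

print(top_twenty_total)              # 2430428
print(f"{top_twenty_share:.1f} %")   # 3.5 %
```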


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:198587
                            2x:78963
                            3x:42536
                            4x:30256
                            5x:17811
                            6x:12810
                            7x: 9436
                            8x: 7526
                            9x: 5895
                           10x:10979
                       11- 20x:37222
                       21- 50x:11621
                       51-100x: 4620
                         >100x:20333


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     593155  Human immunodeficiency virus 1
       2     352020  marine sediment metagenome
       3     219630  uncultured bacterium
       4     116747  Homo sapiens (Human)
       5     106042  Triticum aestivum (Wheat)
       6      96759  Oryza sativa subsp. japonica (Rice)
       7      94901  Hepatitis C virus
       8      86317  Hepatitis B virus (HBV)
       9      73937  Glycine max (Soybean) (Glycine hispida)
      10      73055  mine drainage metagenome
      11      70496  Hordeum vulgare var. distichum (Two-rowed barley)
      12      69509  Macaca mulatta (Rhesus macaque)
      13      67671  Phytophthora parasitica (Potato buckeye rot agent)
      14      65421  Ancylostoma ceylanicum
      15      60710  human gut metagenome
      16      60416  Zea mays (Maize)
      17      57460  Mus musculus (Mouse)
      18      56235  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      19      55018  Callithrix jacchus (White-tufted-ear marmoset)
      20      54929  Solanum tuberosum (Potato)
      21      54157  Vitis vinifera (Grape)
      22      53329  Danio rerio (Zebrafish) (Brachydanio rerio)
      23      50606  Trichomonas vaginalis
      24      49267  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      25      48911  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      26      47057  Populus trichocarpa (Western balsam poplar) 
      27      41207  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      28      40348  Arabidopsis thaliana (Mouse-ear cress)
      29      39923  Reticulomyxa filosa
      30      39885  Oryza sativa subsp. indica (Rice)
      31      39850  Paramecium tetraurelia
      32      39391  Setaria italica (Foxtail millet) (Panicum italicum)
      33      38796  Mustela putorius furo (European domestic ferret) (Mustela furo)
      34      38068  Simian immunodeficiency virus (SIV)
      35      37309  Acyrthosiphon pisum (Pea aphid)
      36      37227  Drosophila melanogaster (Fruit fly)
      37      36602  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      38      35951  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      39      35672  Ailuropoda melanoleuca (Giant panda)
      40      35599  Emiliania huxleyi CCMP1516
      41      35315  Physcomitrella patens subsp. patens (Moss)
      42      35137  Caenorhabditis japonica
      43      34570  Thalassiosira oceanica (Marine diatom)
      44      34551  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      45      33865  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      46      33686  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      47      33426  Escherichia coli
      48      33258  Selaginella moellendorffii (Spikemoss)
      49      32772  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      50      32439  Sus scrofa (Pig)
      51      32409  Phaseolus vulgaris (Kidney bean) (French bean)
      52      32342  Oryza brachyantha
      53      32142  Oryza glaberrima (African rice)
      54      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      55      32050  Capitella teleta (Polychaete worm)
      56      31959  Anas platyrhynchos (Domestic duck) (Anas boschas)
      57      31861  Pan troglodytes (Chimpanzee)
      58      31402  Ricinus communis (Castor bean)
      59      31290  Citrus clementina
      60      30957  Daphnia pulex (Water flea)
      61      30713  Caenorhabditis brenneri (Nematode worm)
      62      30181  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      63      29845  Rhizophagus irregularis (strain DAOM 181602 / DAOM 197198 / MUCL 43194)  
      64      29815  Amphimedon queenslandica (Sponge)
      65      29498  Strongylocentrotus purpuratus (Purple sea urchin)
      66      29321  Pristionchus pacificus (Parasitic nematode)
      67      29194  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      68      29083  Oikopleura dioica (Tunicate)
      69      28875  Mimulus guttatus (Spotted monkey flower) (Yellow monkey flower)
      70      28832  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      71      28825  Capsella rubella
      72      28669  Rhizophagus irregularis DAOM 197198w
      73      28637  Prunus persica (Peach) (Amygdalus persica)
      74      28382  Eutrema salsugineum (Saltwater cress) (Sisymbrium salsugineum)
      75      28104  Gasterosteus aculeatus (Three-spined stickleback)
      76      27811  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      77      27691  Canis familiaris (Dog) (Canis lupus familiaris)
      78      27537  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      79      27513  Equus caballus (Horse)
      80      27434  Amborella trichopoda
      81      27090  Gorilla gorilla gorilla (Lowland gorilla)
      82      26921  Tetrahymena thermophila (strain SB210)
      83      26854  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      84      26763  Morus notabilis
      85      26489  Phytophthora parasitica CJ01A1
      86      26477  Phytophthora parasitica P1569
      87      26452  Phytophthora parasitica P10297
      88      26438  Phytophthora parasitica (strain INRA-310)
      89      26367  Ovis aries (Sheep)
      90      25985  Oryzias latipes (Medaka fish) (Japanese ricefish)
      91      25828  Bos taurus (Bovine)
      92      25825  Loxodonta africana (African elephant)
      93      25745  Rattus norvegicus (Rat)
      94      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      95      25025  Aphanomyces astaci
      96      24915  Nematostella vectensis (Starlet sea anemone)
      97      24590  Guillardia theta CCMP2712
      98      24211  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      99      23804  Astyanax mexicanus (Blind cave fish) (Astyanax fasciatus mexicanus)
     100      23742  Ornithorhynchus anatinus (Duckbill platypus)
     101      23687  Lottia gigantea (Giant owl limpet)
     102      23650  Dendroctonus ponderosae (Mountain pine beetle)
     103      23565  Oxytricha trifallax
     104      23496  Latimeria chalumnae (West Indian ocean coelacanth)
     105      23369  Helobdella robusta (Californian leech)
     106      23363  Caenorhabditis elegans
     107      23318  Fusarium oxysporum f. sp. melonis 26406
     108      23271  Fusarium oxysporum f. sp. conglutinans race 2 54008
     109      23263  Fusarium oxysporum f. sp. pisi HDV247
     110      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
     111      22780  Monodelphis domestica (Gray short-tailed opossum)
     112      22754  Fusarium oxysporum f. sp. raphani 54005
     113      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
     114      22525  Lepisosteus oculatus (Spotted gar)
     115      22319  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
     116      22248  Fusarium oxysporum f. sp. vasinfectum 25433
     117      22174  gut metagenome
     118      21933  Oryctolagus cuniculus (Rabbit)
     119      21709  Haemonchus contortus (Barber pole worm)
     120      21689  Fusarium oxysporum f. sp. radicis-lycopersici 26381
     121      21661  Fusarium oxysporum Fo47
     122      21549  Fusarium oxysporum f. sp. lycopersici MN25
     123      21548  Heterocephalus glaber (Naked mole rat)
     124      21530  Gallus gallus (Chicken)
     125      21398  Caenorhabditis briggsae
     126      21339  Anopheles darlingi (Mosquito)
     127      21217  Echinococcus granulosus (Hydatid tapeworm)
     128      21136  Ixodes scapularis (Black-legged tick) (Deer tick)
     129      21124  Myotis lucifugus (Little brown bat)
     130      21028  Felis catus (Cat) (Felis silvestris catus)
     131      20865  Tupaia chinensis (Chinese tree shrew)
     132      20805  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     133      20767  Fusarium oxysporum FOSC 3-a
     134      20534  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     135      20149  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     136      20115  Ciona savignyi (Pacific transparent sea squirt)
     137      20097  Cavia porcellus (Guinea pig)
     138      20061  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     139      20028  Camelus ferus (Wild Bactrian camel)
     140      19989  Callorhynchus milii (Elephant fish) (Australian ghost shark)
     141      19826  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     142      19807  Fusarium oxysporum f. sp. cubense tropical race 4 54006
     143      19701  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     144      19601  Anolis carolinensis (Green anole) (American chameleon)
     145      19561  Pteropus alecto (Black flying fox)
     146      19522  Wuchereria bancrofti
     147      19300  Myotis brandtii (Brandt's bat)
     148      19201  Trypanosoma cruzi (strain CL Brener)
     149      19194  Necator americanus (Human hookworm)
     150      19062  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     151      18966  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     152      18861  Drosophila simulans (Fruit fly)
     153      18602  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     154      18559  Bos mutus
     155      18479  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     156      18417  Ophiophagus hannah (King cobra) (Naja hannah)
     157      18249  Tetranychus urticae (Two-spotted spider mite)
     158      18126  Atta cephalotes (Leafcutter ant)
     159      18048  Anopheles gambiae (African malaria mosquito)
     160      18047  Saprolegnia diclina VS20
     161      17976  Moniliophthora roreri (strain MCA 2997) (Cocoa frosty pod rot fungus) 
     162      17857  Hepatitis C virus subtype 1b
     163      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     164      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     165      17744  Bombyx mori (Silk moth)
     166      17683  Genlisea aurea
     167      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     168      17590  Gibberella moniliformis (strain M3125 / FGSC 7600)  
     169      17467  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     170      17383  Ceratitis capitata (Mediterranean fruit fly) (Tephritis capitata)
     171      17289  Nasonia vitripennis (Parasitic wasp)
     172      17269  Plasmodium falciparum
     173      17082  Drosophila yakuba (Fruit fly)
     174      17071  Tribolium castaneum (Red flour beetle)
     175      16949  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     176      16919  Meleagris gallopavo (Common turkey)
     177      16909  uncultured archaeon
     178      16715  Drosophila persimilis (Fruit fly)
     179      16698  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     180      16639  Fusarium oxysporum f. sp. lycopersici  
     181      16619  Rhodnius prolixus (Triatomid bug)
     182      16481  Klebsiella pneumoniae
     183      16430  Ectocarpus siliculosus (Brown alga)
     184      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     185      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     186      16330  Danaus plexippus (Monarch butterfly)
     187      16276  Trichinella spiralis (Trichina worm)
     188      16272  Listeria monocytogenes
     189      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     190      16218  Neovison vison (American mink) (Mustela vison)
     191      16208  Ixodes ricinus (Common tick)
     192      16191  Drosophila sechellia (Fruit fly)
     193      16191  Schistosoma japonicum (Blood fluke)
     194      16149  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     195      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     196      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     197      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     198      15718  Naegleria gruberi (Amoeba)
     199      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     200      15592  Phytophthora ramorum (Sudden oak death agent)
     201      15491  Rabies virus
     202      15467  Myotis davidii (David's myotis)
     203      15423  Drosophila willistoni (Fruit fly)
     204      15412  Pestalotiopsis fici W106-1
     205      15380  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     206      15355  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     207      15354  Loa loa (Eye worm) (Filaria loa)
     208      15155  Drosophila ananassae (Fruit fly)
     209      15153  Pythium ultimum DAOM BR144
     210      15057  Pararge aegeria (speckled wood butterfly)
     211      15042  Harpegnathos saltator (Jerdon's jumping ant)
     212      15012  Strigamia maritima (European centipede) (Geophilus maritimus)
     213      14944  Acanthamoeba castellanii str. Neff
     214      14928  Drosophila erecta (Fruit fly)
     215      14869  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     216      14801  Camponotus floridanus (Florida carpenter ant)
     217      14794  Drosophila mojavensis (Fruit fly)
     218      14790  Gibberella fujikuroi (strain CBS 195.34 / IMI 58289 / NRRL A-6831)  
     219      14713  Plasmodium chabaudi
     220      14708  Drosophila virilis (Fruit fly)
     221      14654  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     222      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     223      14597  Angomonas deanei
     224      14417  Volvox carteri (Green alga)
     225      14356  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     226      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     227      14235  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     228      14159  Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
     229      13971  Acromyrmex echinatior (Panamanian leafcutter ant) 
     230      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     231      13883  Porcine reproductive and respiratory syndrome virus (PRRSV)
     232      13879  Clonorchis sinensis (Chinese liver fluke)
     233      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     234      13806  Fomitopsis pinicola (strain FP-58527) (Brown rot fungus)
     235      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     236      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     237      13761  Gibberella zeae (Wheat head blight fungus) (Fusarium graminearum)
     238      13759  Colletotrichum fioriniae PJ7
     239      13704  Trypanosoma cruzi
     240      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     241      13437  Hepatitis C virus subtype 1a
     242      13422  Giardia intestinalis (Giardia lamblia)
     243      13417  Cladophialophora psammophila CBS 110553
     244      13345  Aspergillus flavus 
     245      13338  Colletotrichum orbiculare   
     246      13306  Pyronema omphalodes (strain CBS 100304) (Pyronema confluens)
     247      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     248      13189  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     249      13159  Heterobasidion irregulare TC 32-1
     250      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          793641 (  1%)
    Bacteria       55039916 ( 80%)
    Eukaryota      10614989 ( 15%)
    Viruses         2026330 (  3%)
    Other            540060 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 116800 (  1%)           (  0%)
     Other Mammalia       1078488 ( 10%)           (  2%)
     Other Vertebrata     1026268 ( 10%)           (  1%)
     Viridiplantae        2034272 ( 19%)           (  3%)
     Fungi                2824619 ( 27%)           (  4%)
     Insecta              1014560 ( 10%)           (  1%)
     Nematoda              370129 (  3%)           (  1%)
     Other                2149853 ( 20%)           (  3%)
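
   The two percentage columns of the Eukaryota table use different
   denominators: the Eukaryota total and the complete database. The sketch
   below, in Python, recomputes both. It assumes the report rounds to the
   nearest whole percent, which matches the printed values but is not stated
   in the report itself; all names are illustrative.

```python
# Sequence counts per category within Eukaryota, copied from the table above.
eukaryota_counts = {
    "Human": 116800,
    "Other Mammalia": 1078488,
    "Other Vertebrata": 1026268,
    "Viridiplantae": 2034272,
    "Fungi": 2824619,
    "Insecta": 1014560,
    "Nematoda": 370129,
    "Other": 2149853,
}

eukaryota_total = sum(eukaryota_counts.values())  # matches the Eukaryota row
database_total = 69014937                         # all entries in the release


def whole_percent(count: int, total: int) -> int:
    """Share of `total`, rounded to the nearest whole percent (assumed rule)."""
    return round(100 * count / total)


for category, count in eukaryota_counts.items():
    of_eukaryota = whole_percent(count, eukaryota_total)
    of_database = whole_percent(count, database_total)
    print(f"{category:<18}{count:>9} ({of_eukaryota:>3}%) ({of_database:>3}%)")
```

   Human shows 0% of the complete database because 116800 of 69014937 entries
   is about 0.17%, which rounds down to zero.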



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1735581             1001-1100   347098
                 51- 100 6319264             1101-1200   246478
                101- 150 7165847             1201-1300   181911
                151- 200 6883904             1301-1400    98644
                201- 250 6983661             1401-1500    88437
                251- 300 6797847             1501-1600    56190
                301- 350 6152305             1601-1700    38966
                351- 400 4524994             1701-1800    28788
                401- 450 3948487             1801-1900    23452
                451- 500 3201977             1901-2000    19926
                501- 550 1993459             2001-2100    18490
                551- 600 1549082             2101-2200    19170
                601- 650 1116072             2201-2300    14773
                651- 700  893828             2301-2400    12814
                701- 750  703203             2401-2500    11086
                751- 800  591473             >2500        71688
                801- 850  473145
                851- 900  424004
                901- 950  289910
                951-1000  196708
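
   Because fragments are excluded, the bins of the size table should add up to
   the number of entries minus the number of fragments given in the
   introduction. A minimal consistency check in Python, with illustrative
   variable names:

```python
# Bin counts from the size table, in order of increasing length.
# Lengths 1-1000 in steps of 50 (left-hand columns).
bins_1_to_1000 = [
    1735581, 6319264, 7165847, 6883904, 6983661, 6797847, 6152305, 4524994,
    3948487, 3201977, 1993459, 1549082, 1116072, 893828, 703203, 591473,
    473145, 424004, 289910, 196708,
]
# Lengths 1001-2500 in steps of 100, then the >2500 bin (right-hand columns).
bins_over_1000 = [
    347098, 246478, 181911, 98644, 88437, 56190, 38966, 28788, 23452, 19926,
    18490, 19170, 14773, 12814, 11086, 71688,
]

total_entries = 69014937  # from the introduction
fragments = 5792275       # "Number of fragments" in the introduction

binned_total = sum(bins_1_to_1000) + sum(bins_over_1000)
expected_total = total_entries - fragments

print(binned_total, expected_total, binned_total == expected_total)
```

   Both totals come to 63222662, so the table accounts for every
   non-fragment entry.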



   The average sequence length in UniProtKB/TrEMBL is   314 amino acids.
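
   The average can be derived from the two totals given in the introduction.
   The sketch below, in Python, suggests that the reported value is the
   quotient with the fractional part dropped rather than rounded; this is an
   inference from the numbers, not something the report states.

```python
total_amino_acids = 21709983645  # from the introduction
total_entries = 69014937         # from the introduction

exact_average = total_amino_acids / total_entries       # about 314.57
truncated_average = total_amino_acids // total_entries  # integer division

print(f"{exact_average:.2f}")  # 314.57
print(truncated_average)       # 314
```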

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    80016734                1.16                                                    
   Submitted to EMBL/GenBank/DDBJ  52669013  49990835      0.76                                                    
   Journal                         25202037  23831406      0.37                                                    
   Submitted to other databases     2119992   2112715      0.03                                                    
   Thesis                             18728     18669     <0.01                                                    
   Book citation                       6963      6900     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 510274
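
The "Average per entry" column is the total number of lines divided by the number of entries in the release (69014937), not by the number of entries that carry such a line. A minimal sketch in Python follows; the function name is illustrative, and the rule that values rounding to 0.00 are printed as "<0.01" is inferred from the tables rather than documented.

```python
TOTAL_ENTRIES = 69014937  # sequence entries in release 2014_06


def average_per_entry(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format lines per entry as in the report (assumed formatting rule)."""
    average = total_lines / total_entries
    formatted = f"{average:.2f}"
    # Values that would print as 0.00 appear as "<0.01" in the report.
    return "<0.01" if formatted == "0.00" else formatted


# Totals copied from the References (RL) table above.
reference_totals = {
    "References (RL)": 80016734,
    "Submitted to EMBL/GenBank/DDBJ": 52669013,
    "Journal": 25202037,
    "Submitted to other databases": 2119992,
    "Thesis": 18728,
    "Book citation": 6963,
    "Patent": 1,
}

for line_type, total in reference_totals.items():
    print(f"{line_type:<32}{total:>10}{average_per_entry(total):>8}")
```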


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                     100828685                1.46                                                    
   CATALYTIC ACTIVITY               6735171   6174351      0.10     4                                              
   CAUTION                         46755645  46716466      0.68     1                                              
   COFACTOR                         2975670   2726211      0.04     8                                              
   DOMAIN                            314783    301112     <0.01     9                                              
   ENZYME REGULATION                 104539    104539     <0.01    11                                              
   FUNCTION                         7793326   7361747      0.11     3                                              
   INTERACTION                         1739      1739     <0.01    12                                              
   MISCELLANEOUS                     177082    176867     <0.01    10                                              
   PATHWAY                          3396934   3078049      0.05     7                                              
   SIMILARITY                      21811344  16556359      0.32     2                                              
   SUBCELLULAR LOCATION             6589962   6325441      0.10     5                                              
   SUBUNIT                          4172490   4132131      0.06     6                                              

Total number of comment topics: 12


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      37236170                0.54                                                    
   ACT_SITE                         2871011   1806851      0.04     5                                              
   BINDING                          6163488   1601939      0.09     2                                              
   CARBOHYD                             756       284     <0.01    27                                              
   CHAIN                             894907    715871      0.01     9                                              
   COILED                            100802     57589     <0.01    18                                              
   COMPBIAS                           15768     15629     <0.01    22                                              
   CROSSLNK                           15045     10175     <0.01    23                                              
   DISULFID                          143957    110998     <0.01    15                                              
   DNA_BIND                          104430     97468     <0.01    16                                              
   DOMAIN                           1238758    971814      0.02     8                                              
   INIT_MET                           18472     18472     <0.01    21                                              
   INTRAMEM                             392        56     <0.01    28                                              
   LIPID                             102416     51208     <0.01    17                                              
   METAL                            5866388   1521236      0.09     3                                              
   MOD_RES                           463827    420202      0.01    13                                              
   MOTIF                             355043    228762      0.01    14                                              
   NON_STD                             1934      1809     <0.01    25                                              
   NON_TER                          8647180   5795497      0.13     1                                              
   NP_BIND                          2332360   1398268      0.03     6                                              
   PEPTIDE                              112       112     <0.01    29                                              
   PROPEP                              6697      6697     <0.01    24                                              
   REGION                           1985210   1077191      0.03     7                                              
   REPEAT                             81418     18735     <0.01    20                                              
   SIGNAL                            758908    755242      0.01    11                                              
   SITE                              846898    427070      0.01    10                                              
   TOPO_DOM                          464004     91495      0.01    12                                              
   TRANSIT                             1909      1899     <0.01    26                                              
   TRANSMEM                         3655087    639749      0.05     4                                              
   ZN_FING                            98993     89525     <0.01    19                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             671129030                9.72                                                    
   Allergome                           3726      3089     <0.01    83   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   102   Organism-specific databases                
   ArrayExpress                       63798     63798     <0.01    55   Gene expression databases                  
   BRENDA                              2614      2586     <0.01    86   Enzyme and pathway databases               
   Bgee                               97140     97140     <0.01    49   Gene expression databases                  
   BindingDB                           5749      5749     <0.01    78   Chemistry                                  
   BioCyc                           5683488   5604905      0.08    21   Enzyme and pathway databases               
   CAZy                               73931     69466     <0.01    53   Protein family/group databases             
   CGD                                 6802      6802     <0.01    76   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   108   2D gel databases                           
   CTD                               439795    438409      0.01    37   Organism-specific databases                
   ChEMBL                               663       663     <0.01    94   Chemistry                                  
   ChiTaRS                            64682     64682     <0.01    54   Other                                      
   ConoServer                           159       159     <0.01   100   Organism-specific databases                
   DIP                                 3011      3006     <0.01    85   Protein-protein interaction databases      
   DNASU                              42011     41685     <0.01    62   Protocols and materials databases          
   EMBL                            72989099  67811322      1.06     3   Sequence databases                         
   Ensembl                          1110957   1096183      0.02    30   Genome annotation databases                
   EnsemblBacteria                 37537879  36938582      0.54     5   Genome annotation databases                
   EnsemblFungi                      409107    406619      0.01    38   Genome annotation databases                
   EnsemblMetazoa                    890856    874560      0.01    33   Genome annotation databases                
   EnsemblPlants                     777451    739700      0.01    34   Genome annotation databases                
   EnsemblProtists                   199530    196902     <0.01    42   Genome annotation databases                
   EuPathDB                          159765    159764     <0.01    48   Organism-specific databases                
   EvolutionaryTrace                   7938      7938     <0.01    75   Other                                      
   FlyBase                           198838    197368     <0.01    43   Organism-specific databases                
   GO                             105332202  35556927      1.53     2   Ontologies                                 
   Gene3D                          34988893  27375255      0.51     6   Family and domain databases                
   GeneID                          11184864  10883245      0.16    13   Genome annotation databases                
   GeneTree                         1024743   1024685      0.01    31   Phylogenomic databases                     
   Genevestigator                     83026     83022     <0.01    50   Gene expression databases                  
   GenoList                           14730     14457     <0.01    71   Organism-specific databases                
   GenomeRNAi                         24762     24762     <0.01    66   Other                                      
   Gramene                           197731    197731     <0.01    44   Organism-specific databases                
   GuidetoPHARMACOLOGY                   21        21     <0.01   106   Chemistry                                  
   H-InvDB                              602       455     <0.01    95   Organism-specific databases                
   HAMAP                            7121134   7027042      0.10    19   Family and domain databases                
   HGNC                               47177     47097     <0.01    60   Organism-specific databases                
   HOGENOM                          3645588   3645545      0.05    25   Phylogenomic databases                     
   HOVERGEN                          303717    303709     <0.01    40   Phylogenomic databases                     
   InParanoid                        185386    185386     <0.01    45   Phylogenomic databases                     
   IntAct                             14358     14358     <0.01    72   Protein-protein interaction databases      
   InterPro                       134913837  46001313      1.95     1   Family and domain databases                
   KEGG                            10285237  10043164      0.15    14   Genome annotation databases                
   KO                               4276934   4255481      0.06    24   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    80   Organism-specific databases                
   Leproma                             1272      1270     <0.01    89   Organism-specific databases                
   MEROPS                            175332    175332     <0.01    46   Protein family/group databases             
   MGI                                52335     51895     <0.01    57   Organism-specific databases                
   MIM                                    4         4     <0.01   109   Organism-specific databases                
   MINT                               10151     10150     <0.01    73   Protein-protein interaction databases      
   MaxQB                               1750      1750     <0.01    88   Proteomic databases                        
   NextBio                           205211    205208     <0.01    41   Other                                      
   OGP                                    3         3     <0.01   110   2D gel databases                           
   OMA                              7296442   7296436      0.11    17   Phylogenomic databases                     
   OrthoDB                          5181053   5181051      0.08    22   Phylogenomic databases                     
   PANTHER                          7185536   6995350      0.10    18   Family and domain databases                
   PATRIC                           8253147   8252952      0.12    16   Genome annotation databases                
   PDB                                23608     12628     <0.01    67   3D structure databases                     
   PDBsum                             23472     12551     <0.01    68   3D structure databases                     
   PIR                               171672    138822     <0.01    47   Sequence databases                         
   PIRSF                            5743694   5698832      0.08    20   Family and domain databases                
   PMAP-CutDB                           200       200     <0.01    99   Other                                      
   PRIDE                             926328    926328      0.01    32   Proteomic databases                        
   PRINTS                           8334906   7536830      0.12    15   Family and domain databases                
   PRO                                26998     26997     <0.01    64   Other                                      
   PROSITE                         28955488  19357694      0.42     8   Family and domain databases                
   PaxDb                              28498     28496     <0.01    63   Proteomic databases                        
   PeptideAtlas                         127       127     <0.01   101   Proteomic databases                        
   PeroxiBase                          2590      2582     <0.01    87   Protein family/group databases             
   Pfam                            58895781  42891662      0.85     4   Family and domain databases                
   PharmGKB                            3353      3353     <0.01    84   Organism-specific databases                
   PhosSite                             890       878     <0.01    92   PTM databases                              
   PhosphoSite                         1093      1093     <0.01    91   PTM databases                              
   PhylomeDB                         314762    314762     <0.01    39   Phylogenomic databases                     
   PomBase                                1         1     <0.01   111   Organism-specific databases                
   PptaseDB                              38        36     <0.01   104   Protein family/group databases             
   ProDom                           1160041   1124054      0.02    29   Family and domain databases                
   ProMEX                              5295      5295     <0.01    79   Proteomic databases                        
   ProteinModelPortal              16265986  16265986      0.24     9   3D structure databases                     
   PseudoCAP                           4506      4500     <0.01    81   Organism-specific databases                
   REBASE                             47922     47907     <0.01    58   Protein family/group databases             
   REPRODUCTION-2DPAGE                   65        64     <0.01   103   2D gel databases                           
   RGD                                21291     20275     <0.01    70   Organism-specific databases                
   Reactome                             244       202     <0.01    98   Enzyme and pathway databases               
   RefSeq                          11484850  11059164      0.17    12   Sequence databases                         
   SABIO-RK                             514       514     <0.01    96   Enzyme and pathway databases               
   SGD                                   18        18     <0.01   107   Organism-specific databases                
   SMART                           12625220   9618171      0.18    11   Family and domain databases                
   SMR                              4720121   4720121      0.07    23   3D structure databases                     
   STRING                           3131451   3131281      0.05    27   Protein-protein interaction databases      
   SUPFAM                          33105381  26617438      0.48     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   105   2D gel databases                           
   SignaLink                           4304      4301     <0.01    82   Enzyme and pathway databases               
   TAIR                               21899     21780     <0.01    69   Organism-specific databases                
   TCDB                                5927      5917     <0.01    77   Protein family/group databases             
   TIGRFAMs                        14743384  13448926      0.21    10   Family and domain databases                
   TreeFam                           587946    587944      0.01    35   Phylogenomic databases                     
   TubercuList                         1101      1100     <0.01    90   Organism-specific databases                
   UCSC                               58107     57898     <0.01    56   Genome annotation databases                
   UniGene                           555078    522189      0.01    36   Sequence databases                         
   UniPathway                       3309536   3075333      0.05    26   Enzyme and pathway databases               
   VectorBase                         78248     77731     <0.01    51   Genome annotation databases                
   World-2DPAGE                         671       666     <0.01    93   2D gel databases                           
   WormBase                           43105     42931     <0.01    61   Organism-specific databases                
   Xenbase                            25529     25468     <0.01    65   Organism-specific databases                
   ZFIN                               47704     47234     <0.01    59   Organism-specific databases                
   dictyBase                           7997      7775     <0.01    74   Organism-specific databases                
   eggNOG                           2754961   2754927      0.04    28   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    52   Organism-specific databases                
   mycoCLAP                             459       458     <0.01    97   Protein family/group databases             

Number of explicitly cross-referenced databases: 130
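
The third numeric column of the table above appears to be the total number
of cross-references to a database divided by the total number of entries in
the release (69014937, from the Introduction), with ratios that round to
0.00 printed as <0.01. This reading is inferred from the figures and is an
assumption, not something stated in these release notes. A minimal Python
sketch that reproduces a few rows under that assumption (the function name
per_entry is hypothetical):

```python
# Total number of entries in release 2014_06 (from the Introduction).
TOTAL_ENTRIES = 69014937

# (database, total cross-references) copied from the table above.
ROWS = [
    ("InterPro", 134913837),
    ("GO", 105332202),
    ("Pfam", 58895781),
    ("PhylomeDB", 314762),
]


def per_entry(total_links, total_entries=TOTAL_ENTRIES):
    """Format cross-references per entry the way the table appears to."""
    value = f"{total_links / total_entries:.2f}"
    # Assumption: ratios that round to 0.00 are shown as "<0.01".
    return "<0.01" if value == "0.00" else value


for name, links in ROWS:
    print(f"{name:<12}{per_entry(links):>8}")
```

The four sample rows were chosen to cover the two largest databases, one
mid-range value, and one value below the 0.01 threshold.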


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.52   Gln (Q) 4.04   Leu (L) 9.92   Ser (S) 6.45
   Arg (R) 5.17   Glu (E) 6.22   Lys (K) 5.51   Thr (T) 5.54
   Asn (N) 4.27   Gly (G) 6.99   Met (M) 2.51   Trp (W) 1.23
   Asp (D) 5.39   His (H) 2.20   Phe (F) 4.08   Tyr (Y) 3.15
   Cys (C) 1.12   Ile (I) 6.36   Pro (P) 4.40   Val (V) 6.80

   Asx (B) 0      Glx (Z) 0      Xaa (X) 0.02



   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Ile, Glu, Thr, Lys, Asp, Arg, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
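
   The ordering in 5.2 follows from sorting the percentages of 5.1 in
   descending order. A short Python sketch that rebuilds the list from the
   values above (the variable names are illustrative only):

```python
# Composition in percent, copied from table 5.1 (B, Z and X are omitted).
COMPOSITION = {
    "Ala": 8.52, "Gln": 4.04, "Leu": 9.92, "Ser": 6.45,
    "Arg": 5.17, "Glu": 6.22, "Lys": 5.51, "Thr": 5.54,
    "Asn": 4.27, "Gly": 6.99, "Met": 2.51, "Trp": 1.23,
    "Asp": 5.39, "His": 2.20, "Phe": 4.08, "Tyr": 3.15,
    "Cys": 1.12, "Ile": 6.36, "Pro": 4.40, "Val": 6.80,
}

# Sort residue names by descending frequency. No two values tie in this
# release, so the order is unambiguous.
ranked = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)

print(", ".join(ranked))
```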



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 732214
Total number of entries encoded on a Plasmid: 421698
Total number of entries encoded on a Plastid: 32557
Total number of entries encoded on a Plastid; Apicoplast: 902
Total number of entries encoded on a Plastid; Chloroplast: 273578
Total number of entries encoded on a Plastid; Cyanelle: 9
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1797