         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2013_06 STATISTICS


1.  INTRODUCTION

Release 2013_06 of UniProtKB/TrEMBL, dated 29-May-2013, contains 35502518 sequence
entries, comprising 11384440438 amino acids.

Since release 2013_05, 1540013 sequences have been added, the sequence data of
2441 existing entries has been updated, and the annotations of
20461639 entries have been revised. The added sequences represent an increase
of 4% in the number of entries.

Number of fragments: 4172806

Protein existence (PE):              entries      %
1: Evidence at protein level           20110     0.06%
2: Evidence at transcript level       818675     2.31%
3: Inferred from homology            8304253    23.39%
4: Predicted                        26359480    74.25%
5: Uncertain                               0     0.00%
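
The percentages in the protein existence table can be reproduced from the entry counts. The following Python sketch is illustrative only: the names `pe_counts` and `pe_percentages` are hypothetical, and it assumes that each percentage is the entry count divided by the sum of all PE counts, rounded to two decimals.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 20110,
    "2: Evidence at transcript level": 818675,
    "3: Inferred from homology": 8304253,
    "4: Predicted": 26359480,
    "5: Uncertain": 0,
}

# The PE counts add up to the number of entries stated in the introduction.
total_entries = sum(pe_counts.values())


def pe_percentages(counts):
    """Return each PE level's share of all entries, in percent (2 decimals)."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 2) for level, n in counts.items()}


for level, pct in pe_percentages(pe_counts).items():
    print(f"{level:<34}{pct:6.2f}%")
```

Under this assumption the five counts sum to the 35502518 entries given in the introduction, so the table covers every entry in the release.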

The growth of the database is summarized below.

   [Figure: growth of the UniProtKB/TrEMBL database]



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 414148

   The twenty most represented species account for 1873925 sequences, or 5.3% of
   the total number of entries.
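
   As a cross-check, the figure for the twenty most represented species can be recomputed from the first twenty rows of the table in section 2.2. This is a minimal Python sketch; the variable names are hypothetical, and the frequencies are copied from that table.

```python
# Frequencies of the 20 most represented species (ranks 1-20 in section 2.2).
top20 = [
    536430, 197588, 114031, 96907, 85633,
    73814, 70409, 69085, 60528, 59487,
    56505, 56141, 54889, 54103, 51924,
    50601, 49236, 48886, 44560, 43168,
]

# Total number of entries in release 2013_06 (from the introduction).
total_entries = 35502518

top20_sum = sum(top20)
share = 100 * top20_sum / total_entries

print(f"{top20_sum} sequences: {share:.1f}% of all entries")
```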


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:17310
                            2x:68636
                            3x:36979
                            4x:24949
                            5x:15555
                            6x:11220
                            7x: 8527
                            8x: 6709
                            9x: 5282
                           10x:10336
                       11- 20x:28519
                       21- 50x: 9780
                       51-100x: 3801
                         >100x:10753


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     536430  Human immunodeficiency virus 1
       2     197588  uncultured bacterium
       3     114031  Homo sapiens (Human)
       4      96907  Oryza sativa subsp. japonica (Rice)
       5      85633  Hepatitis C virus
       6      73814  Glycine max (Soybean) (Glycine hispida)
       7      70409  Hordeum vulgare var. distichum (Two-rowed barley)
       8      69085  Macaca mulatta (Rhesus macaque)
       9      60528  Zea mays (Maize)
      10      59487  Hepatitis B virus (HBV)
      11      56505  Mus musculus (Mouse)
      12      56141  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      13      54889  Solanum tuberosum (Potato)
      14      54103  Vitis vinifera (Grape)
      15      51924  Danio rerio (Zebrafish) (Brachydanio rerio)
      16      50601  Trichomonas vaginalis
      17      49236  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      18      48886  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      19      44560  Populus trichocarpa (Western balsam poplar) 
      20      43168  Callithrix jacchus (White-tufted-ear marmoset)
      21      41680  Arabidopsis thaliana (Mouse-ear cress)
      22      41201  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      23      39850  Paramecium tetraurelia
      24      39829  Oryza sativa subsp. indica (Rice)
      25      39299  Setaria italica (Foxtail millet) (Panicum italicum)
      26      38791  Mustela putorius furo (European domestic ferret) (Mustela furo)
      27      38163  human gut metagenome
      28      36522  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      29      35895  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      30      35621  Ailuropoda melanoleuca (Giant panda)
      31      35195  Acyrthosiphon pisum (Pea aphid)
      32      35066  Caenorhabditis japonica
      33      34927  Simian immunodeficiency virus (SIV)
      34      34828  Physcomitrella patens subsp. patens (Moss)
      35      34633  Drosophila melanogaster (Fruit fly)
      36      34569  Thalassiosira oceanica (Marine diatom)
      37      33821  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      38      33252  Selaginella moellendorffii (Spikemoss)
      39      32767  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      40      32342  Oryza brachyantha
      41      32145  Sus scrofa (Pig)
      42      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      43      32094  Oryza glaberrima (African rice)
      44      31848  Pan troglodytes (Chimpanzee)
      45      31384  Ricinus communis (Castor bean)
      46      30920  Daphnia pulex (Water flea)
      47      30300  Caenorhabditis brenneri (Nematode worm)
      48      30145  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      49      29815  Amphimedon queenslandica (Sponge)
      50      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      51      29316  Pristionchus pacificus (Parasitic nematode)
      52      29179  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      53      29054  Oikopleura dioica (Tunicate)
      54      28836  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      55      28737  Escherichia coli
      56      28610  Prunus persica (Peach) (Amygdalus persica)
      57      28484  Canis familiaris (Dog) (Canis lupus familiaris)
      58      28070  Gasterosteus aculeatus (Three-spined stickleback)
      59      27732  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      60      27501  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      61      27450  Equus caballus (Horse)
      62      27102  Gorilla gorilla gorilla (Lowland gorilla)
      63      26849  Gallus gallus (Chicken)
      64      26823  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      65      25907  Oryzias latipes (Medaka fish) (Japanese ricefish)
      66      25796  Loxodonta africana (African elephant)
      67      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      68      25618  Bos taurus (Bovine)
      69      25614  Rattus norvegicus (Rat)
      70      25086  Oryctolagus cuniculus (Rabbit)
      71      24904  Nematostella vectensis (Starlet sea anemone)
      72      24643  Tetrahymena thermophila (strain SB210)
      73      24590  Guillardia theta CCMP2712
      74      24373  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      75      24207  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      76      23716  Ornithorhynchus anatinus (Duckbill platypus)
      77      23565  Oxytricha trifallax
      78      23498  Latimeria chalumnae (West Indian ocean coelacanth)
      79      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      80      22742  Monodelphis domestica (Gray short-tailed opossum)
      81      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      82      22503  Caenorhabditis elegans
      83      22312  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      84      22163  gut metagenome
      85      22116  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      86      21548  Heterocephalus glaber (Naked mole rat)
      87      21340  Caenorhabditis briggsae
      88      21106  Ixodes scapularis (Black-legged tick) (Deer tick)
      89      20934  Felis catus (Cat) (Felis silvestris catus)
      90      20861  Myotis lucifugus (Little brown bat)
      91      20838  Tupaia chinensis (Chinese tree shrew)
      92      20758  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
      93      20512  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
      94      20133  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      95      20114  Ciona savignyi (Pacific transparent sea squirt)
      96      20072  Cavia porcellus (Guinea pig)
      97      19985  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
      98      19816  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      99      19678  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     100      19544  Pteropus alecto (Black flying fox)
     101      19438  Wuchereria bancrofti
     102      19329  Toxoplasma gondii
     103      19312  Anolis carolinensis (Green anole) (American chameleon)
     104      19200  Trypanosoma cruzi (strain CL Brener)
     105      19057  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     106      18943  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     107      18856  Drosophila simulans (Fruit fly)
     108      18771  mine drainage metagenome
     109      18592  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     110      18555  Bos grunniens mutus
     111      18121  Atta cephalotes (Leafcutter ant)
     112      18008  Anopheles gambiae (African malaria mosquito)
     113      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     114      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
     115      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     116      17512  Bombyx mori (Silk moth)
     117      17397  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     118      17282  Nasonia vitripennis (Parasitic wasp)
     119      17040  Drosophila yakuba (Fruit fly)
     120      17036  Tribolium castaneum (Red flour beetle)
     121      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     122      16894  Meleagris gallopavo (Common turkey)
     123      16714  Drosophila persimilis (Fruit fly)
     124      16643  Fusarium oxysporum f. sp. lycopersici  
     125      16468  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     126      16426  Ectocarpus siliculosus (Brown alga)
     127      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     128      16328  Plasmodium falciparum
     129      16319  Hepatitis C virus subtype 1b
     130      16315  Danaus plexippus (Monarch butterfly)
     131      16273  Trichinella spiralis (Trichina worm)
     132      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     133      16188  Drosophila sechellia (Fruit fly)
     134      16146  Schistosoma japonicum (Blood fluke)
     135      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     136      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     137      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     138      15716  Naegleria gruberi (Amoeba)
     139      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     140      15568  Phytophthora ramorum (Sudden oak death agent)
     141      15461  Myotis davidii (David's myotis)
     142      15421  Drosophila willistoni (Fruit fly)
     143      15371  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     144      15354  Loa loa (Eye worm) (Filaria loa)
     145      15225  Pythium ultimum
     146      15177  Hepatitis C virus subtype 1a
     147      15144  Drosophila ananassae (Fruit fly)
     148      15040  Harpegnathos saltator (Jerdon's jumping ant)
     149      14937  Acanthamoeba castellanii str. Neff
     150      14927  Drosophila erecta (Fruit fly)
     151      14857  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     152      14801  Camponotus floridanus (Florida carpenter ant)
     153      14791  Drosophila mojavensis (Fruit fly)
     154      14713  Plasmodium chabaudi
     155      14704  Drosophila virilis (Fruit fly)
     156      14652  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     157      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     158      14417  Volvox carteri (Green alga)
     159      14341  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     160      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     161      14275  Ralstonia solanacearum (Pseudomonas solanacearum)
     162      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     163      14114  uncultured archaeon
     164      13970  Acromyrmex echinatior (Panamanian leafcutter ant) 
     165      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     166      13876  Clonorchis sinensis (Chinese liver fluke)
     167      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     168      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     169      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     170      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     171      13587  Trypanosoma cruzi
     172      13513  Rabies virus
     173      13345  Aspergillus flavus 
     174      13266  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     175      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     176      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     177      12983  Albugo laibachii Nc14
     178      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     179      12950  Stigmatella aurantiaca (strain DW4/3-1)
     180      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     181      12858  Magnaporthe oryzae Y34
     182      12857  Bipolaris maydis C5
     183      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     184      12711  Magnaporthe oryzae P131
     185      12696  Trypanosoma congolense (strain IL3000)
     186      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     187      12680  Schistosoma mansoni (Blood fluke)
     188      12635  Porcine reproductive and respiratory syndrome virus (PRRSV)
     189      12619  Xenopus laevis (African clawed frog)
     190      12586  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     191      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     192      12440  Polysphondylium pallidum (Cellular slime mold)
     193      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     194      12352  Dictyostelium purpureum (Slime mold)
     195      12223  Helicobacter pylori (Campylobacter pylori)
     196      12197  Rhizoctonia solani AG-1 IB
     197      12174  Bipolaris sorokiniana ND90Pr
     198      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     199      12078  Ceriporiopsis subvermispora B
     200      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     201      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     202      11941  Emericella nidulans  
     203      11931  Apis mellifera (Honeybee)
     204      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     205      11780  Piriformospora indica (strain DSM 11827)
     206      11752  Chondrocladia sp. SMF<DEU
     207      11751  Cladorhiza sp. SMF<DEU
     208      11750  Abyssocladia sp. SMF<DEU
     209      11726  Phelloderma sp. SMF<DEU
     210      11719  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     211      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     212      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     213      11682  Eutypa lata UCREL1
     214      11678  Anopheles darlingi (Mosquito)
     215      11644  Plasmodium berghei (strain Anka)
     216      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     217      11567  Trichoplax adhaerens (Trichoplax reptans)
     218      11557  Trypanosoma vivax (strain Y486)
     219      11515  Puccinia triticina (isolate 1-1 / race 1 (BBBD)) (Brown leaf rust fungus)
     220      11514  Aureococcus anophagefferens (Harmful bloom alga)
     221      11499  Brugia malayi (Filarial nematode worm)
     222      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     223      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     224      11396  Aspergillus oryzae (strain 3.042) (Yellow koji mold)
     225      11303  Magnaporthe poae (strain ATCC 64411 / 73-15) (Kentucky bluegrass fungus)
     226      11278  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     227      11211  Ktedonobacter racemifer DSM 44963
     228      11211  Agaricus bisporus var. burnettii (strain JB137-S8 / ATCC MYA-4627 / FGSC 10392) 
     229      11205  Rhipicephalus pulchellus
     230      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     231      11018  Botryotinia fuckeliana BcDW1
     232      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     233      10964  Streptomyces clavuligerus 
     234      10949  Aspergillus niger 
     235      10839  Pediculus humanus subsp. corporis (Body louse)
     236      10822  Chaetomium globosum  
     237      10570  Metarhizium anisopliae (strain ARSEF 23 / ATCC MYA-3075)
     238      10563  Amycolatopsis mediterranei S699
     239      10561  Klebsiella pneumoniae
     240      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     241      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     242      10508  Baudoinia compniacensis UAMH 10762
     243      10499  Rhizoctonia solani AG-1 IA
     244      10397  Agaricus bisporus var. bisporus (strain H97 / ATCC MYA-4626 / FGSC 10389) 
     245      10394  Pseudocercospora fijiensis CIRAD86
     246      10387  Pseudomonas syringae pv. glycinea str. race 4
     247      10381  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     248      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     249      10368  Cystobacter fuscus DSM 2262
     250      10361  Beauveria bassiana (strain ARSEF 2860) (White muscardine disease fungus) 


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          672982 (  2%)
    Bacteria       25261484 ( 71%)
    Eukaryota       7764851 ( 22%)
    Viruses         1699620 (  5%)
    Other            103580 ( <1%)



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 114067 (  1%)           (  0%)
     Other Mammalia        970261 ( 12%)           (  3%)
     Other Vertebrata      811638 ( 10%)           (  2%)
     Viridiplantae        1619940 ( 21%)           (  5%)
     Fungi                1778301 ( 23%)           (  5%)
     Insecta               817655 ( 11%)           (  2%)
     Nematoda              253110 (  3%)           (  1%)
     Other                1399879 ( 18%)           (  4%)
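
   The two percentage columns of the Eukaryota table use different denominators: the Eukaryota total from the kingdom table and the total number of entries in the release. The Python sketch below recomputes both. The names are hypothetical, and rounding to the nearest whole percent is an assumption that happens to match the published figures.

```python
# Sequence counts per category within Eukaryota, copied from the table above.
eukaryota = {
    "Human": 114067,
    "Other Mammalia": 970261,
    "Other Vertebrata": 811638,
    "Viridiplantae": 1619940,
    "Fungi": 1778301,
    "Insecta": 817655,
    "Nematoda": 253110,
    "Other": 1399879,
}

EUKARYOTA_TOTAL = 7764851   # Eukaryota row of the kingdom table
DATABASE_TOTAL = 35502518   # all entries in release 2013_06


def shares(count):
    """Return (% of Eukaryota, % of complete database), rounded to integers."""
    return (round(100 * count / EUKARYOTA_TOTAL),
            round(100 * count / DATABASE_TOTAL))


for category, count in eukaryota.items():
    pct_euk, pct_db = shares(count)
    print(f"{category:<18}{count:>9} ({pct_euk:>3}%) ({pct_db:>3}%)")
```

   The category counts add up exactly to the Eukaryota total, so the eight categories partition the kingdom.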



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  950508             1001-1100   201225
                 51- 100 3114919             1101-1200   139681
                101- 150 3456218             1201-1300    99807
                151- 200 3352819             1301-1400    61794
                201- 250 3368254             1401-1500    50778
                251- 300 3254157             1501-1600    34839
                301- 350 2959822             1601-1700    26152
                351- 400 2235245             1701-1800    19874
                401- 450 1933109             1801-1900    16190
                451- 500 1584759             1901-2000    13778
                501- 550 1036738             2001-2100    10741
                551- 600  797339             2101-2200    11113
                601- 650  582660             2201-2300     8509
                651- 700  459591             2301-2400     6807
                701- 750  384920             2401-2500     5978
                751- 800  337327             >2500        47312
                801- 850  260911
                851- 900  231256
                901- 950  159290
                951-1000  115292
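
   Because the size table excludes fragments, its bins should add up to the number of entries minus the number of fragments reported in the introduction. The Python sketch below performs that consistency check; the variable names are hypothetical and the counts are copied from the table.

```python
# Bin counts from the size table, in order of increasing sequence length.
# First list: bins 1-50 through 951-1000; second list: 1001-1100 through >2500.
bins_up_to_1000 = [
    950508, 3114919, 3456218, 3352819, 3368254,
    3254157, 2959822, 2235245, 1933109, 1584759,
    1036738, 797339, 582660, 459591, 384920,
    337327, 260911, 231256, 159290, 115292,
]
bins_above_1000 = [
    201225, 139681, 99807, 61794, 50778,
    34839, 26152, 19874, 16190, 13778,
    10741, 11113, 8509, 6807, 5978,
    47312,
]

TOTAL_ENTRIES = 35502518  # from the introduction
FRAGMENTS = 4172806       # "Number of fragments" in the introduction

binned = sum(bins_up_to_1000) + sum(bins_above_1000)
non_fragments = TOTAL_ENTRIES - FRAGMENTS

print(f"sequences in size table: {binned}")
print(f"entries minus fragments: {non_fragments}")
print("consistent" if binned == non_fragments else "mismatch")
```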

   


   The average sequence length in UniProtKB/TrEMBL is 320 amino acids.

   The shortest sequence is G0XMK1_9MYRT: 1 amino acid.
   The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    42869491                1.21                                                    
   Submitted to EMBL/GenBank/DDBJ  23647566  22049069      0.67                                                    
   Journal                         17455167  16480849      0.49                                                    
   Submitted to other databases     1749780   1740003      0.05                                                    
   Thesis                             10249     10191     <0.01                                                    
   Book citation                       6728      6678     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 468154
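
The "Average per entry" column is consistent with dividing the total number of lines by the total number of entries in the release (35502518), not by the number of entries that carry such a line. This is an inference from the figures, not something the report states. A minimal Python sketch under that assumption follows; `average_per_entry` is a hypothetical helper, and the `<0.01` cut-off at 0.005 is likewise an assumption.

```python
TOTAL_ENTRIES = 35502518  # all entries in release 2013_06

# Total line counts for the reference (RL) subtypes, copied from the table above.
reference_lines = {
    "References (RL)": 42869491,
    "Submitted to EMBL/GenBank/DDBJ": 23647566,
    "Journal": 17455167,
    "Submitted to other databases": 1749780,
    "Thesis": 10249,
    "Book citation": 6728,
    "Patent": 1,
}


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format lines per entry as in the table; tiny non-zero values show as '<0.01'."""
    avg = total_lines / total_entries
    if 0 < avg < 0.005:
        return "<0.01"
    return f"{avg:.2f}"


for name, total in reference_lines.items():
    print(f"{name:<32}{total:>10}  {average_per_entry(total):>6}")
```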


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      46577497                1.31                                                    
   CATALYTIC ACTIVITY               3849285   3511815      0.11     4                                              
   CAUTION                         18022014  18005745      0.51     1                                              
   COFACTOR                         1526715   1424147      0.04     8                                              
   DOMAIN                            159242    153157     <0.01     9                                              
   FUNCTION                         4359107   4102002      0.12     3                                              
   INTERACTION                         1252      1252     <0.01    11                                              
   MISCELLANEOUS                     106371    106175     <0.01    10                                              
   PATHWAY                          1932472   1758596      0.05     7                                              
   SIMILARITY                      10960929   9531257      0.31     2                                              
   SUBCELLULAR LOCATION             3461337   3308163      0.10     5                                              
   SUBUNIT                          2198773   2175521      0.06     6                                              

Total number of comment topics: 11


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       8072497                0.23                                                    
   CHAIN                             854285    703439      0.02     2                                              
   NON_TER                          6533786   4173795      0.18     1                                              
   SIGNAL                            683470    680026      0.02     3                                              
   TRANSIT                              956       956     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             405819641               11.43                                                    
   Allergome                           3416      2788     <0.01    83   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   101   Organism-specific databases                
   ArrayExpress                      202653    202653      0.01    43   Gene expression databases                  
   BRENDA                              2654      2625     <0.01    85   Enzyme and pathway databases               
   Bgee                              101270    101270     <0.01    50   Gene expression databases                  
   BindingDB                           5841      5841     <0.01    77   Other                                      
   BioCyc                           5640270   5572945      0.16    16   Enzyme and pathway databases               
   CAZy                               74016     69543     <0.01    55   Protein family/group databases             
   CGD                                 7054      7054     <0.01    76   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   107   2D gel databases                           
   CTD                               341831    340491      0.01    38   Organism-specific databases                
   ChEMBL                               575       575     <0.01    93   Other                                      
   ChiTaRS                            66853     66853     <0.01    56   Other                                      
   ConoServer                           160       160     <0.01    98   Organism-specific databases                
   DIP                                 2818      2813     <0.01    84   Protein-protein interaction databases      
   DNASU                              42726     42392     <0.01    61   Protocols and materials databases          
   EMBL                            38562389  34484238      1.09     3   Sequence databases                         
   Ensembl                          1009287    994638      0.03    29   Genome annotation databases                
   EnsemblBacteria                 18691470  18413883      0.53     5   Genome annotation databases                
   EnsemblFungi                      351527    349533      0.01    37   Genome annotation databases                
   EnsemblMetazoa                    663751    648432      0.02    32   Genome annotation databases                
   EnsemblPlants                     620727    587638      0.02    33   Genome annotation databases                
   EnsemblProtists                   156294    153898     <0.01    47   Genome annotation databases                
   EuPathDB                           98298     98155     <0.01    51   Organism-specific databases                
   EvolutionaryTrace                   8077      8077     <0.01    74   Other                                      
   FlyBase                           196563    195096      0.01    44   Organism-specific databases                
   GO                              68233816  21786429      1.92     2   Ontologies                                 
   Gene3D                          16187179  12775204      0.46     8   Family and domain databases                
   GeneID                           9509047   9258081      0.27    10   Genome annotation databases                
   GeneTree                          843115    843059      0.02    30   Phylogenomic databases                     
   Genevestigator                     86873     86866     <0.01    52   Gene expression databases                  
   GenoList                           14733     14460     <0.01    72   Organism-specific databases                
   GenomeRNAi                         20682     20682     <0.01    66   Other                                      
   Gramene                           204107    204107      0.01    42   Organism-specific databases                
   H-InvDB                              622       474     <0.01    92   Organism-specific databases                
   HAMAP                            3758202   3710610      0.11    20   Family and domain databases                
   HGNC                               48800     48730     <0.01    59   Organism-specific databases                
   HOGENOM                          3654777   3654732      0.10    21   Phylogenomic databases                     
   HOVERGEN                          306043    306032      0.01    39   Phylogenomic databases                     
   IPI                               288589    287815      0.01    40   Sequence databases                         
   InParanoid                        186997    186997      0.01    45   Phylogenomic databases                     
   IntAct                             17355     17355     <0.01    69   Protein-protein interaction databases      
   InterPro                        78130833  27431015      2.20     1   Family and domain databases                
   KEGG                             8705350   8507048      0.25    11   Genome annotation databases                
   KO                               3465462   3450311      0.10    22   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    79   Organism-specific databases                
   Leproma                             1272      1270     <0.01    88   Organism-specific databases                
   MEROPS                            138967    138966     <0.01    49   Protein family/group databases             
   MGI                                51878     51413     <0.01    58   Organism-specific databases                
   MINT                               10315     10314     <0.01    73   Protein-protein interaction databases      
   NextBio                           211420    211412      0.01    41   Other                                      
   OMA                              4864556   4864340      0.14    19   Phylogenomic databases                     
   OrthoDB                           553408    553365      0.02    34   Phylogenomic databases                     
   PANTHER                          5205538   4903000      0.15    18   Family and domain databases                
   PATRIC                           8304026   8303909      0.23    13   Genome annotation databases                
   PDB                                19380     10853     <0.01    67   3D structure databases                     
   PDBsum                             19163     10676     <0.01    68   3D structure databases                     
   PIR                               172526    139696     <0.01    46   Sequence databases                         
   PIRSF                            3163072   3159870      0.09    23   Family and domain databases                
   PMAP-CutDB                           211       211     <0.01    96   Other                                      
   PRIDE                             453752    453752      0.01    36   Proteomic databases                        
   PRINTS                           5298266   4735632      0.15    17   Family and domain databases                
   PROSITE                         17566234  11664392      0.49     6   Family and domain databases                
   Pathway_Interaction_DB                10         8     <0.01   106   Enzyme and pathway databases               
   PaxDb                              29197     29196     <0.01    64   Proteomic databases                        
   PeptideAtlas                         130       130     <0.01    99   Proteomic databases                        
   PeroxiBase                          2578      2570     <0.01    86   Protein family/group databases             
   Pfam                            34892591  25573111      0.98     4   Family and domain databases                
   PharmGKB                            3799      3799     <0.01    82   Organism-specific databases                
   PhosphoSite                         1131      1131     <0.01    89   PTM databases                              
   PhylomeDB                         145079    145079     <0.01    48   Phylogenomic databases                     
   PomBase                               40        27     <0.01   102   Organism-specific databases                
   PptaseDB                              36        35     <0.01   103   Protein family/group databases             
   ProDom                            705334    677834      0.02    31   Family and domain databases                
   ProMEX                              5244      5244     <0.01    78   Proteomic databases                        
   ProtClustDB                      2719880   2719868      0.08    26   Phylogenomic databases                     
   ProteinModelPortal               8479305   8479305      0.24    12   3D structure databases                     
   PseudoCAP                           4535      4529     <0.01    80   Organism-specific databases                
   REBASE                             37133     37114     <0.01    63   Protein family/group databases             
   REPRODUCTION-2DPAGE                   66        65     <0.01   100   2D gel databases                           
   RGD                                21101     20144     <0.01    65   Organism-specific databases                
   Reactome                             177       142     <0.01    97   Enzyme and pathway databases               
   RefSeq                           9567988   9284614      0.27     9   Sequence databases                         
   SABIO-RK                             484       484     <0.01    94   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   105   Organism-specific databases                
   SMART                            7789396   5902452      0.22    15   Family and domain databases                
   SMR                              2073873   2073873      0.06    27   3D structure databases                     
   STRING                           2904297   2904228      0.08    24   Protein-protein interaction databases      
   SUPFAM                          16342041  13190913      0.46     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   104   2D gel databases                           
   SignaLink                           4469      4469     <0.01    81   Enzyme and pathway databases               
   TAIR                               15409     15336     <0.01    71   Organism-specific databases                
   TCDB                                2381      2370     <0.01    87   Protein family/group databases             
   TIGRFAMs                         8269680   7547337      0.23    14   Family and domain databases                
   TubercuList                         1111      1110     <0.01    90   Organism-specific databases                
   UCSC                               58894     58847     <0.01    57   Genome annotation databases                
   UniGene                           553378    523433      0.02    35   Sequence databases                         
   UniPathway                       1602182   1491500      0.05    28   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    53   Genome annotation databases                
   World-2DPAGE                         673       668     <0.01    91   2D gel databases                           
   WormBase                           42427     42254     <0.01    62   Organism-specific databases                
   Xenbase                            16039     16007     <0.01    70   Organism-specific databases                
   ZFIN                               44553     44300     <0.01    60   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    75   Organism-specific databases                
   eggNOG                           2768733   2768713      0.08    25   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    54   Organism-specific databases                
   mycoCLAP                             422       422     <0.01    95   Protein family/group databases             

Number of explicitly cross-referenced databases: 128
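
   The rows of the table above are fixed-width text and can be read
   programmatically. The Python sketch below is a minimal illustration, not
   an official parser. The field names (refs, entries, avg, rank, category)
   are hypothetical labels chosen here, and the reading of the third numeric
   column as cross-references divided by the 35502518 entries of the release
   is an assumption that happens to reproduce the printed values.

```python
import re

TOTAL_ENTRIES = 35502518  # release size stated in the introduction

# Assumed row layout: name, two integer counts, a ratio (possibly "<0.01"),
# a rank, and a free-text category.
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+(<?\d+\.\d+)\s+(\d+)\s+(.+?)\s*$")


def parse_row(line):
    """Split one table row into a dict of typed fields."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    name, refs, entries, avg, rank, category = m.groups()
    return {
        "name": name,
        "refs": int(refs),
        "entries": int(entries),
        "avg": avg,  # kept as text because it may be "<0.01"
        "rank": int(rank),
        "category": category,
    }


def per_entry(refs, total=TOTAL_ENTRIES):
    """Format refs/total the way the table appears to (assumption)."""
    ratio = refs / total
    return "<0.01" if ratio < 0.005 else f"{ratio:.2f}"


row = parse_row("   Pfam      34892591  25573111      0.98     4   Family and domain databases   ")
print(row["name"], row["refs"], per_entry(row["refs"]))
```

   Keeping the ratio column as text avoids losing the distinction between a
   value printed as 0.01 and one printed as <0.01.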


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.66   Gln (Q) 3.98   Leu (L) 9.96   Ser (S) 6.63
   Arg (R) 5.43   Glu (E) 6.19   Lys (K) 5.26   Thr (T) 5.55
   Asn (N) 4.09   Gly (G) 7.09   Met (M) 2.47   Trp (W) 1.30
   Asp (D) 5.33   His (H) 2.20   Phe (F) 4.03   Tyr (Y) 3.03
   Cys (C) 1.23   Ile (I) 6.00   Pro (P) 4.65   Val (V) 6.79

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03

   

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
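
   The ordering in 5.2 follows directly from the percentages in 5.1. As a
   small sketch (the variable names are chosen here for illustration),
   sorting the twenty standard residues by descending percentage reproduces
   the list above:

```python
# Percentages copied from section 5.1 (standard residues only).
composition = {
    "Ala": 8.66, "Gln": 3.98, "Leu": 9.96, "Ser": 6.63,
    "Arg": 5.43, "Glu": 6.19, "Lys": 5.26, "Thr": 5.55,
    "Asn": 4.09, "Gly": 7.09, "Met": 2.47, "Trp": 1.30,
    "Asp": 5.33, "His": 2.20, "Phe": 4.03, "Tyr": 3.03,
    "Cys": 1.23, "Ile": 6.00, "Pro": 4.65, "Val": 6.79,
}

# Most frequent residue first; no two values are equal, so the order is unambiguous.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```

   The twenty percentages add up to slightly less than 100 because each
   value is rounded to two decimals and the ambiguous codes (B, Z, X) are
   listed separately.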



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 621146
Total number of entries encoded on a Plasmid: 341741
Total number of entries encoded on a Plastid: 26404
Total number of entries encoded on a Plastid; Apicoplast: 719
Total number of entries encoded on a Plastid; Chloroplast: 226837
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 993