         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2013_05 STATISTICS


1.  INTRODUCTION

Release 2013_05 of 01-May-2013 of UniProtKB/TrEMBL contains 33995348 sequence entries,
comprising 10924561758 amino acids.

903961 sequences have been added since release 2013_04; the sequence data of
4324 existing entries has been updated and the annotations of
10537476 entries have been revised. This represents an increase of 3% in the
number of entries.
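The 3% figure can be reproduced from the counts above. A minimal Python sketch, assuming (this page does not say so) that the previous release held the current total minus the newly added sequences, i.e. that no entries were removed between 2013_04 and 2013_05; the variable names are illustrative:

```python
# Figures from release 2013_05 of UniProtKB/TrEMBL.
entries_2013_05 = 33_995_348
added_since_2013_04 = 903_961

# Assumption: no entries were removed, so the previous release held
# the current total minus the newly added sequences.
entries_2013_04 = entries_2013_05 - added_since_2013_04

growth_pct = 100 * added_since_2013_04 / entries_2013_04
print(f"previous release (estimated): {entries_2013_04}")
print(f"growth: {growth_pct:.2f}%")  # about 2.73%, i.e. 3% once rounded
```

Under that assumption the increase is about 2.7%, consistent with the rounded 3% reported.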

Number of fragments: 4100897

Protein existence (PE):              entries      %
1: Evidence at protein level           20120     0.06%
2: Evidence at transcript level       823021     2.42%
3: Inferred from homology            7548231    22.20%
4: Predicted                        25603976    75.32%
5: Uncertain                               0     0.00%
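Each percentage in the protein existence table is the count divided by all 33995348 entries, and the five counts add up to exactly that total. A minimal Python sketch of the calculation (the dictionary layout is illustrative):

```python
# Protein existence (PE) counts from release 2013_05.
total_entries = 33_995_348
pe_counts = {
    "1: Evidence at protein level": 20_120,
    "2: Evidence at transcript level": 823_021,
    "3: Inferred from homology": 7_548_231,
    "4: Predicted": 25_603_976,
    "5: Uncertain": 0,
}

# Each entry carries exactly one PE level, so the counts add up to the total.
assert sum(pe_counts.values()) == total_entries

# Share of the database per PE level, rounded to two decimals as in the table.
pe_percent = {level: round(100 * n / total_entries, 2) for level, n in pe_counts.items()}
for level, pct in pe_percent.items():
    print(f"{level:<33} {pct:6.2f}%")
```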

[Figure: growth of the database]



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 410256

   The twenty most represented species account for 1857998 sequences, 5.5% of
   the total number of entries.


   2.1  Table of the frequency of occurrence of species
        (number of species represented by N entries)

        Species represented 1x:171643
                            2x: 67993
                            3x: 36708
                            4x: 24640
                            5x: 15470
                            6x: 11155
                            7x:  8438
                            8x:  6569
                            9x:  5230
                           10x: 10271
                       11- 20x: 28350
                       21- 50x:  9689
                       51-100x:  3732
                         >100x: 10368
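Because every species falls into exactly one frequency bin, the bins should add up to the 410256 species reported above. A minimal Python sketch that uses this to derive the 1x count as the species total minus all the other bins (the partition is the stated assumption; variable names are illustrative):

```python
total_species = 410_256

# Number of species per frequency bin (entries per species), 2x and above.
bins = {
    "2x": 67_993, "3x": 36_708, "4x": 24_640, "5x": 15_470,
    "6x": 11_155, "7x": 8_438, "8x": 6_569, "9x": 5_230,
    "10x": 10_271, "11-20x": 28_350, "21-50x": 9_689,
    "51-100x": 3_732, ">100x": 10_368,
}

# Assumption: the bins partition the species, so the 1x count is the remainder.
species_1x = total_species - sum(bins.values())
print(f"species represented once: {species_1x}")
```

Under that assumption, 171643 species are represented by a single entry.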


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
    Rank    Entries  Species
  ------  ---------  --------------------------------------------
       1     523805  Human immunodeficiency virus 1
       2     195513  uncultured bacterium
       3     113824  Homo sapiens (Human)
       4      96921  Oryza sativa subsp. japonica (Rice)
       5      85142  Hepatitis C virus
       6      73811  Glycine max (Soybean) (Glycine hispida)
       7      70409  Hordeum vulgare var. distichum (Two-rowed barley)
       8      69085  Macaca mulatta (Rhesus macaque)
       9      60531  Zea mays (Maize)
      10      59047  Hepatitis B virus (HBV)
      11      56489  Mus musculus (Mouse)
      12      56140  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      13      54887  Solanum tuberosum (Potato)
      14      54101  Vitis vinifera (Grape)
      15      51843  Danio rerio (Zebrafish) (Brachydanio rerio)
      16      50601  Trichomonas vaginalis
      17      49236  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      18      48886  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      19      44560  Populus trichocarpa (Western balsam poplar) 
      20      43167  Callithrix jacchus (White-tufted-ear marmoset)
      21      41736  Arabidopsis thaliana (Mouse-ear cress)
      22      41188  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      23      39850  Paramecium tetraurelia
      24      39828  Oryza sativa subsp. indica (Rice)
      25      39296  Setaria italica (Foxtail millet) (Panicum italicum)
      26      38791  Mustela putorius furo (European domestic ferret) (Mustela furo)
      27      38163  human gut metagenome
      28      36522  Musa acuminata subsp. malaccensis
      29      35893  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      30      35604  Ailuropoda melanoleuca (Giant panda)
      31      35195  Acyrthosiphon pisum (Pea aphid)
      32      35066  Caenorhabditis japonica
      33      34828  Physcomitrella patens subsp. patens (Moss)
      34      34623  Drosophila melanogaster (Fruit fly)
      35      34569  Thalassiosira oceanica (Marine diatom)
      36      33821  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      37      33252  Selaginella moellendorffii (Spikemoss)
      38      32767  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      39      32342  Oryza brachyantha
      40      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      41      32113  Sus scrofa (Pig)
      42      32094  Oryza glaberrima (African rice)
      43      31848  Pan troglodytes (Chimpanzee)
      44      31844  Simian immunodeficiency virus (SIV)
      45      31384  Ricinus communis (Castor bean)
      46      30920  Daphnia pulex (Water flea)
      47      30300  Caenorhabditis brenneri (Nematode worm)
      48      30145  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      49      29815  Amphimedon queenslandica (Sponge)
      50      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      51      29316  Pristionchus pacificus (Parasitic nematode)
      52      29178  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      53      29053  Oikopleura dioica (Tunicate)
      54      28837  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      55      28701  Escherichia coli
      56      28483  Canis familiaris (Dog) (Canis lupus familiaris)
      57      28061  Gasterosteus aculeatus (Three-spined stickleback)
      58      27731  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      59      27501  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      60      27421  Equus caballus (Horse)
      61      27102  Gorilla gorilla gorilla (Lowland gorilla)
      62      26837  Gallus gallus (Chicken)
      63      26821  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      64      25907  Oryzias latipes (Medaka fish) (Japanese ricefish)
      65      25796  Loxodonta africana (African elephant)
      66      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      67      25606  Rattus norvegicus (Rat)
      68      25491  Bos taurus (Bovine)
      69      25084  Oryctolagus cuniculus (Rabbit)
      70      24903  Nematostella vectensis (Starlet sea anemone)
      71      24643  Tetrahymena thermophila (strain SB210)
      72      24590  Guillardia theta CCMP2712
      73      24205  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      74      23715  Ornithorhynchus anatinus (Duckbill platypus)
      75      23565  Oxytricha trifallax
      76      23498  Latimeria chalumnae (West Indian ocean coelacanth)
      77      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      78      22718  Monodelphis domestica (Gray short-tailed opossum)
      79      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      80      22505  Caenorhabditis elegans
      81      22303  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      82      22163  gut metagenome
      83      21548  Heterocephalus glaber (Naked mole rat)
      84      21341  Caenorhabditis briggsae
      85      21089  Ixodes scapularis (Black-legged tick) (Deer tick)
      86      20933  Felis catus (Cat) (Felis silvestris catus)
      87      20861  Myotis lucifugus (Little brown bat)
      88      20838  Tupaia chinensis (Chinese tree shrew)
      89      20758  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
      90      20507  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
      91      20133  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      92      20114  Ciona savignyi (Pacific transparent sea squirt)
      93      20072  Cavia porcellus (Guinea pig)
      94      19984  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
      95      19816  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      96      19678  Taeniopygia guttata (Zebra finch) (Poephila guttata)
      97      19544  Pteropus alecto (Black flying fox)
      98      19438  Wuchereria bancrofti
      99      19332  Toxoplasma gondii
     100      19262  Anolis carolinensis (Green anole) (American chameleon)
     101      19200  Trypanosoma cruzi (strain CL Brener)
     102      18941  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     103      18856  Drosophila simulans (Fruit fly)
     104      18771  mine drainage metagenome
     105      18592  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     106      18555  Bos grunniens mutus
     107      18121  Atta cephalotes (Leafcutter ant)
     108      17999  Anopheles gambiae (African malaria mosquito)
     109      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     110      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
     111      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     112      17506  Bombyx mori (Silk moth)
     113      17394  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     114      17282  Nasonia vitripennis (Parasitic wasp)
     115      17040  Drosophila yakuba (Fruit fly)
     116      17022  Tribolium castaneum (Red flour beetle)
     117      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     118      16894  Meleagris gallopavo (Common turkey)
     119      16714  Drosophila persimilis (Fruit fly)
     120      16643  Fusarium oxysporum f. sp. lycopersici  
     121      16470  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     122      16426  Ectocarpus siliculosus (Brown alga)
     123      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     124      16328  Plasmodium falciparum
     125      16319  Hepatitis C virus subtype 1b
     126      16315  Danaus plexippus (Monarch butterfly)
     127      16273  Trichinella spiralis (Trichina worm)
     128      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     129      16188  Drosophila sechellia (Fruit fly)
     130      16142  Schistosoma japonicum (Blood fluke)
     131      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     132      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     133      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     134      15716  Naegleria gruberi (Amoeba)
     135      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     136      15568  Phytophthora ramorum (Sudden oak death agent)
     137      15461  Myotis davidii (David's myotis)
     138      15420  Drosophila willistoni (Fruit fly)
     139      15371  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     140      15354  Loa loa (Eye worm) (Filaria loa)
     141      15225  Pythium ultimum
     142      15177  Hepatitis C virus subtype 1a
     143      15144  Drosophila ananassae (Fruit fly)
     144      15040  Harpegnathos saltator (Jerdon's jumping ant)
     145      14937  Acanthamoeba castellanii str. Neff
     146      14927  Drosophila erecta (Fruit fly)
     147      14856  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     148      14801  Camponotus floridanus (Florida carpenter ant)
     149      14788  Drosophila mojavensis (Fruit fly)
     150      14713  Plasmodium chabaudi
     151      14701  Drosophila virilis (Fruit fly)
     152      14650  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     153      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     154      14417  Volvox carteri (Green alga)
     155      14341  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     156      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     157      14275  Ralstonia solanacearum (Pseudomonas solanacearum)
     158      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     159      13966  Acromyrmex echinatior (Panamanian leafcutter ant) 
     160      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     161      13868  Clonorchis sinensis (Chinese liver fluke)
     162      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     163      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     164      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     165      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     166      13544  Trypanosoma cruzi
     167      13509  uncultured archaeon
     168      13345  Aspergillus flavus 
     169      13266  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     170      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     171      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     172      12983  Albugo laibachii Nc14
     173      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     174      12950  Stigmatella aurantiaca (strain DW4/3-1)
     175      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     176      12863  Rabies virus
     177      12858  Magnaporthe oryzae Y34
     178      12857  Cochliobolus heterostrophus C5
     179      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     180      12711  Magnaporthe oryzae P131
     181      12696  Trypanosoma congolense (strain IL3000)
     182      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     183      12680  Schistosoma mansoni (Blood fluke)
     184      12617  Xenopus laevis (African clawed frog)
     185      12586  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     186      12470  Porcine reproductive and respiratory syndrome virus (PRRSV)
     187      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     188      12440  Polysphondylium pallidum (Cellular slime mold)
     189      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     190      12352  Dictyostelium purpureum (Slime mold)
     191      12204  Helicobacter pylori (Campylobacter pylori)
     192      12174  Cochliobolus sativus ND90Pr
     193      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     194      12078  Ceriporiopsis subvermispora B
     195      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     196      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     197      11941  Emericella nidulans  
     198      11931  Apis mellifera (Honeybee)
     199      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     200      11780  Piriformospora indica (strain DSM 11827)
     201      11752  Chondrocladia sp. SMF<DEU
     202      11751  Cladorhiza sp. SMF<DEU
     203      11750  Abyssocladia sp. SMF<DEU
     204      11726  Phelloderma sp. SMF<DEU
     205      11719  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     206      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     207      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     208      11678  Anopheles darlingi (Mosquito)
     209      11644  Plasmodium berghei (strain Anka)
     210      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     211      11566  Trichoplax adhaerens (Trichoplax reptans)
     212      11557  Trypanosoma vivax (strain Y486)
     213      11515  Puccinia triticina (isolate 1-1 / race 1 (BBBD)) (Brown leaf rust fungus)
     214      11514  Aureococcus anophagefferens (Harmful bloom alga)
     215      11499  Brugia malayi (Filarial nematode worm)
     216      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     217      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     218      11396  Aspergillus oryzae (strain 3.042) (Yellow koji mold)
     219      11303  Magnaporthe poae (strain ATCC 64411 / 73-15) (Kentucky bluegrass fungus)
     220      11278  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     221      11211  Ktedonobacter racemifer DSM 44963
     222      11211  Agaricus bisporus var. burnettii (strain JB137-S8 / ATCC MYA-4627 / FGSC 10392) 
     223      11205  Rhipicephalus pulchellus
     224      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     225      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     226      10964  Streptomyces clavuligerus 
     227      10949  Aspergillus niger 
     228      10839  Pediculus humanus subsp. corporis (Body louse)
     229      10822  Chaetomium globosum  
     230      10570  Metarhizium anisopliae (strain ARSEF 23 / ATCC MYA-3075)
     231      10563  Amycolatopsis mediterranei S699
     232      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     233      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     234      10508  Baudoinia compniacensis UAMH 10762
     235      10499  Rhizoctonia solani AG-1 IA
     236      10471  Klebsiella pneumoniae
     237      10397  Agaricus bisporus var. bisporus (strain H97 / ATCC MYA-4626 / FGSC 10389) 
     238      10394  Pseudocercospora fijiensis CIRAD86
     239      10387  Pseudomonas syringae pv. glycinea str. race 4
     240      10381  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     241      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     242      10368  Cystobacter fuscus DSM 2262
     243      10361  Beauveria bassiana (strain ARSEF 2860) (White muscardine disease fungus) 
     244      10354  Phaeodactylum tricornutum (strain CCAP 1055/1)
     245      10273  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
     246      10221  Shigella flexneri 1235-66
     247      10216  Burkholderia terrae BS001
     248      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136) 
     249      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
     250      10179  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
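The statement in the taxonomic overview that the twenty most represented species account for 1857998 sequences (5.5% of all entries) can be checked against ranks 1 to 20 of the table above. A minimal Python sketch:

```python
total_entries = 33_995_348

# Entry counts of the twenty most represented species (table 2.2, ranks 1-20).
top20 = [
    523_805, 195_513, 113_824, 96_921, 85_142,
    73_811, 70_409, 69_085, 60_531, 59_047,
    56_489, 56_140, 54_887, 54_101, 51_843,
    50_601, 49_236, 48_886, 44_560, 43_167,
]

top20_total = sum(top20)
share_pct = 100 * top20_total / total_entries
print(f"{top20_total} sequences, {share_pct:.1f}% of all entries")
```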


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          667946 (  2%)
    Bacteria       23978439 ( 71%)
    Eukaryota       7577959 ( 22%)
    Viruses         1667575 (  5%)
    Other            103428 ( <1%)
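The percentages in this table are whole-number shares of all 33995348 entries. A minimal Python sketch that reproduces them; the rule of printing "<1%" when a share would round to 0% is an assumed reading of the table's convention. Note that the five listed counts sum to 33995347, one fewer than the entry total.

```python
total_entries = 33_995_348

kingdoms = {
    "Archaea": 667_946,
    "Bacteria": 23_978_439,
    "Eukaryota": 7_577_959,
    "Viruses": 1_667_575,
    "Other": 103_428,
}

def percent_label(count: int, total: int) -> str:
    """Whole-number percentage; assumed convention: '<1%' if it would round to 0."""
    pct = 100 * count / total
    return "<1%" if pct < 0.5 else f"{round(pct)}%"

for name, count in kingdoms.items():
    print(f"{name:<10} {count:>9} ({percent_label(count, total_entries)})")

# The listed counts add up to 33995347, one fewer than the entry total.
print(sum(kingdoms.values()))
```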



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 113860 (  2%)           ( <1%)
     Other Mammalia        969350 ( 13%)           (  3%)
     Other Vertebrata      788930 ( 10%)           (  2%)
     Viridiplantae        1543228 ( 20%)           (  5%)
     Fungi                1709296 ( 23%)           (  5%)
     Insecta               814484 ( 11%)           (  2%)
     Nematoda              253061 (  3%)           (  1%)
     Other                1385750 ( 18%)           (  4%)
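The eukaryotic categories add up exactly to the 7577959 Eukaryota sequences, and each row carries two percentages with different denominators: the Eukaryota total and the whole database. A minimal Python sketch of both calculations (variable names are illustrative):

```python
total_entries = 33_995_348
eukaryota_total = 7_577_959

categories = {
    "Human": 113_860,
    "Other Mammalia": 969_350,
    "Other Vertebrata": 788_930,
    "Viridiplantae": 1_543_228,
    "Fungi": 1_709_296,
    "Insecta": 814_484,
    "Nematoda": 253_061,
    "Other": 1_385_750,
}

# The categories partition Eukaryota, so they add up to its total.
assert sum(categories.values()) == eukaryota_total

for name, count in categories.items():
    of_euk = round(100 * count / eukaryota_total)  # % of Eukaryota
    of_all = round(100 * count / total_entries)    # % of the complete database
    print(f"{name:<17} {count:>8} ({of_euk:>2}%) ({of_all:>2}%)")
```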



3.  SEQUENCE SIZE

   Distribution of the sequences by length in amino acids (excluding fragments);
   bins are 50 residues wide up to 1000 and 100 residues wide from 1001 to 2500.

               From   To  Number             From   To   Number
                  1-  50  877802             1001-1100   193384
                 51- 100 2957949             1101-1200   134331
                101- 150 3296114             1201-1300    96003
                151- 200 3199485             1301-1400    59866
                201- 250 3217381             1401-1500    48917
                251- 300 3109969             1501-1600    33664
                301- 350 2827778             1601-1700    25401
                351- 400 2135818             1701-1800    19179
                401- 450 1846804             1801-1900    15800
                451- 500 1515344             1901-2000    13363
                501- 550  992003             2001-2100    10490
                551- 600  764948             2101-2200    10827
                601- 650  556935             2201-2300     8315
                651- 700  439101             2301-2400     6627
                701- 750  369167             2401-2500     5824
                751- 800  324731             >2500        46309
                801- 850  249594
                851- 900  221623
                901- 950  152811
                951-1000  110794
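Since the length table excludes fragments, its bins should add up to the total number of entries minus the 4100897 fragments given in the introduction. A minimal Python sketch of that check, listing the counts in table order (left column first, then right column):

```python
total_entries = 33_995_348
fragments = 4_100_897

# Counts per length bin: twenty 50-residue bins (1-1000),
# fifteen 100-residue bins (1001-2500), then the >2500 bin.
bin_counts = [
    877_802, 2_957_949, 3_296_114, 3_199_485, 3_217_381,
    3_109_969, 2_827_778, 2_135_818, 1_846_804, 1_515_344,
    992_003, 764_948, 556_935, 439_101, 369_167,
    324_731, 249_594, 221_623, 152_811, 110_794,
    193_384, 134_331, 96_003, 59_866, 48_917,
    33_664, 25_401, 19_179, 15_800, 13_363,
    10_490, 10_827, 8_315, 6_627, 5_824,
    46_309,
]

complete_sequences = total_entries - fragments
assert len(bin_counts) == 20 + 15 + 1
print(sum(bin_counts), complete_sequences)
```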

   


   The average sequence length in UniProtKB/TrEMBL is   321 amino acids.

   The shortest sequence is G0XMK1_9MYRT:     1 amino acid.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
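The average length matches the total residue count divided by the total number of entries from the introduction, which suggests (an inference, not stated on this page) that fragments are included in the average. A minimal Python sketch:

```python
total_residues = 10_924_561_758
total_entries = 33_995_348

# Assumption: the mean is taken over all entries, fragments included.
mean_length = total_residues / total_entries
print(f"{mean_length:.2f}")  # about 321.35, reported as 321 amino acids
```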



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes, for some UniProtKB/TrEMBL line types, the total
number of lines, the number of entries with at least one such line, and the
average number of such lines per entry (computed over all entries in the release).

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    41570299                1.22                                                    
   Submitted to EMBL/GenBank/DDBJ  23624743  21666462      0.69                                                    
   Journal                         16179130  15278473      0.48                                                    
   Submitted to other databases     1749502   1739844      0.05                                                    
   Thesis                             10232     10174     <0.01                                                    
   Book citation                       6673      6624     <0.01                                                    
   Unpublished observations              18        18     <0.01                                                    
   Patent                                 1         1     <0.01                                                    
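The "Average per entry" column is the line total divided by all 33995348 entries, not only by the entries that carry such a line (for example, 16179130 Journal lines / 33995348 entries gives 0.48). A minimal Python sketch reproducing the column; the rule of printing "<0.01" when the value would round to 0.00 is an assumed reading of the table's convention:

```python
total_entries = 33_995_348

# Total RL lines per reference type, from the table above.
reference_lines = {
    "Submitted to EMBL/GenBank/DDBJ": 23_624_743,
    "Journal": 16_179_130,
    "Submitted to other databases": 1_749_502,
    "Thesis": 10_232,
    "Book citation": 6_673,
    "Unpublished observations": 18,
    "Patent": 1,
}

def per_entry(count: int, total: int) -> str:
    """Average lines per entry; assumed convention: '<0.01' if it rounds to 0.00."""
    avg = count / total
    return "<0.01" if avg < 0.005 else f"{avg:.2f}"

all_references = sum(reference_lines.values())
print("References (RL)", all_references, per_entry(all_references, total_entries))
for kind, count in reference_lines.items():
    print(f"   {kind:<31} {count:>9} {per_entry(count, total_entries):>6}")
```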

Total number of distinct authors cited in UniProtKB/TrEMBL: 465164


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      41240101                1.21                                                    
   CATALYTIC ACTIVITY               3308418   3019667      0.10     4                                              
   CAUTION                         16845801  16839248      0.50     1                                              
   COFACTOR                         1285956   1203459      0.04     8                                              
   DOMAIN                            130531    125292     <0.01     9                                              
   FUNCTION                         3697597   3466860      0.11     3                                              
   INTERACTION                         1186      1186     <0.01    11                                              
   MISCELLANEOUS                      91635     91525     <0.01    10                                              
   PATHWAY                          1645249   1500095      0.05     7                                              
   SIMILARITY                       9455840   8214491      0.28     2                                              
   SUBCELLULAR LOCATION             2948871   2818067      0.09     5                                              
   SUBUNIT                          1829017   1806589      0.05     6                                              
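The Rank column appears to order comment topics by their total number of lines, largest first; this is an inference from the numbers, not a documented definition. A minimal Python sketch that reproduces the ranks and checks that the topics add up to the CC total:

```python
# Total CC lines per comment topic, from the table above.
cc_totals = {
    "CATALYTIC ACTIVITY": 3_308_418,
    "CAUTION": 16_845_801,
    "COFACTOR": 1_285_956,
    "DOMAIN": 130_531,
    "FUNCTION": 3_697_597,
    "INTERACTION": 1_186,
    "MISCELLANEOUS": 91_635,
    "PATHWAY": 1_645_249,
    "SIMILARITY": 9_455_840,
    "SUBCELLULAR LOCATION": 2_948_871,
    "SUBUNIT": 1_829_017,
}

# The topics add up to the Comments (CC) total.
assert sum(cc_totals.values()) == 41_240_101

# Assumed ranking rule: rank 1 is the topic with the most lines.
ranked = sorted(cc_totals, key=cc_totals.get, reverse=True)
ranks = {topic: i for i, topic in enumerate(ranked, start=1)}

for topic in cc_totals:
    print(f"{topic:<22} rank {ranks[topic]}")
```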

Total number of comment topics: 11


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       7943144                0.23                                                    
   CHAIN                             843607    693899      0.02     2                                              
   NON_TER                          6432298   4101790      0.19     1                                              
   SIGNAL                            666381    663097      0.02     3                                              
   TRANSIT                              858       858     <0.01     4                                              
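Dividing by all entries hides how often a feature repeats within the entries that have it. A minimal Python sketch that reproduces the 0.23 overall average and also computes lines per annotated entry from the first two columns (the second figure is not part of the original table; variable names are illustrative):

```python
total_entries = 33_995_348

# (total lines, entries with at least one such line) per feature key.
ft_stats = {
    "CHAIN": (843_607, 693_899),
    "NON_TER": (6_432_298, 4_101_790),
    "SIGNAL": (666_381, 663_097),
    "TRANSIT": (858, 858),
}

ft_total = sum(lines for lines, _ in ft_stats.values())
print(f"Features (FT): {ft_total} lines, {ft_total / total_entries:.2f} per entry")

# Lines per entry among only the entries that carry the feature.
for key, (lines, entries) in ft_stats.items():
    print(f"   {key:<8} {lines / entries:.2f} per annotated entry")
```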

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             383667227               11.29                                                    
   Allergome                           3393      2766     <0.01    83   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   101   Organism-specific databases                
   ArrayExpress                      218954    218954      0.01    42   Gene expression databases                  
   BRENDA                              2658      2629     <0.01    85   Enzyme and pathway databases               
   Bgee                              102712    102712     <0.01    52   Gene expression databases                  
   BindingDB                           5955      5955     <0.01    78   Other                                      
   BioCyc                           3255827   3220245      0.10    22   Enzyme and pathway databases               
   CAZy                               74028     69555     <0.01    56   Protein family/group databases             
   CGD                                 7054      7054     <0.01    77   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   107   2D gel databases                           
   CTD                               332780    331463      0.01    37   Organism-specific databases                
   ChEMBL                               575       575     <0.01    93   Other                                      
   ChiTaRS                            66866     66866     <0.01    57   Other                                      
   ConoServer                           160       160     <0.01    98   Organism-specific databases                
   DIP                                 2826      2821     <0.01    84   Protein-protein interaction databases      
   DNASU                              42740     42406     <0.01    62   Protocols and materials databases          
   EMBL                            36993623  32977069      1.09     3   Sequence databases                         
   Ensembl                          1009080    994448      0.03    29   Genome annotation databases                
   EnsemblBacteria                 18704300  18426663      0.55     5   Genome annotation databases                
   EnsemblFungi                      325900    324116      0.01    38   Genome annotation databases                
   EnsemblMetazoa                    663754    648435      0.02    32   Genome annotation databases                
   EnsemblPlants                     620797    587699      0.02    33   Genome annotation databases                
   EnsemblProtists                   156294    153898     <0.01    48   Genome annotation databases                
   EuPathDB                          147096    146644     <0.01    50   Organism-specific databases                
   EvolutionaryTrace                   8106      8106     <0.01    75   Other                                      
   FlyBase                           196563    195096      0.01    45   Organism-specific databases                
   GO                              66325976  20891939      1.95     2   Ontologies                                 
   Gene3D                          14301165  11397729      0.42     7   Family and domain databases                
   GeneID                           9309938   9077859      0.27    10   Genome annotation databases                
   GeneTree                          843131    843075      0.02    30   Phylogenomic databases                     
   Genevestigator                     86985     86979     <0.01    53   Gene expression databases                  
   GenoList                           14733     14460     <0.01    73   Organism-specific databases                
   GenomeRNAi                         20700     20700     <0.01    67   Other                                      
   Gramene                           197834    197834      0.01    44   Organism-specific databases                
   H-InvDB                              622       474     <0.01    92   Organism-specific databases                
   HAMAP                            3500990   3457343      0.10    20   Family and domain databases                
   HGNC                               48809     48739     <0.01    60   Organism-specific databases                
   HOGENOM                          3654882   3654838      0.11    19   Phylogenomic databases                     
   HOVERGEN                          306067    306056      0.01    39   Phylogenomic databases                     
   HSSP                              249915    249789      0.01    41   3D structure databases                     
   IPI                               288744    287970      0.01    40   Sequence databases                         
   InParanoid                        187048    187048      0.01    46   Phylogenomic databases                     
   IntAct                             17157     17157     <0.01    71   Protein-protein interaction databases      
   InterPro                        71621756  25861461      2.11     1   Family and domain databases                
   KEGG                             8411536   8209781      0.25    12   Genome annotation databases                
   KO                               3346723   3331699      0.10    21   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    80   Organism-specific databases                
   Leproma                             1272      1270     <0.01    88   Organism-specific databases                
   MEROPS                            139047    139046     <0.01    51   Protein family/group databases             
   MGI                                51879     51415     <0.01    59   Organism-specific databases                
   MINT                                8529      8529     <0.01    74   Protein-protein interaction databases      
   NextBio                           211463    211459      0.01    43   Other                                      
   OMA                              4864679   4864466      0.14    18   Phylogenomic databases                     
   OrthoDB                           553412    553369      0.02    34   Phylogenomic databases                     
   PANTHER                          4896683   4613764      0.14    17   Family and domain databases                
   PATRIC                           8306795   8306678      0.24    13   Genome annotation databases                
   PDB                                19204     10750     <0.01    69   3D structure databases                     
   PDBsum                             18994     10576     <0.01    70   3D structure databases                     
   PIR                               172585    139752      0.01    47   Sequence databases                         
   PIRSF                            2997611   2994604      0.09    24   Family and domain databases                
   PMAP-CutDB                           211       211     <0.01    96   Other                                      
   PRIDE                             468047    468047      0.01    36   Proteomic databases                        
   PRINTS                           5016674   4485428      0.15    16   Family and domain databases                
   PROSITE                         16607774  11029053      0.49     6   Family and domain databases                
   Pathway_Interaction_DB                10         8     <0.01   106   Enzyme and pathway databases               
   PaxDb                              29746     29745     <0.01    65   Proteomic databases                        
   PeptideAtlas                         130       130     <0.01    99   Proteomic databases                        
   PeroxiBase                          2578      2570     <0.01    86   Protein family/group databases             
   Pfam                            32997724  24186964      0.97     4   Family and domain databases                
   PharmGKB                            3801      3801     <0.01    82   Organism-specific databases                
   PhosphoSite                         1135      1135     <0.01    89   PTM databases                              
   PhylomeDB                         148028    148028     <0.01    49   Phylogenomic databases                     
   PomBase                               40        27     <0.01   102   Organism-specific databases                
   PptaseDB                              36        35     <0.01   103   Protein family/group databases             
   ProDom                            669827    643658      0.02    31   Family and domain databases                
   ProMEX                              5409      5409     <0.01    79   Proteomic databases                        
   ProtClustDB                      2720024   2720024      0.08    26   Phylogenomic databases                     
   ProteinModelPortal               8499716   8499716      0.25    11   3D structure databases                     
   PseudoCAP                           4537      4531     <0.01    81   Organism-specific databases                
   REBASE                             34866     34862     <0.01    64   Protein family/group databases             
   REPRODUCTION-2DPAGE                   67        66     <0.01   100   2D gel databases                           
   RGD                                19671     18794     <0.01    68   Organism-specific databases                
   Reactome                             177       142     <0.01    97   Enzyme and pathway databases               
   RefSeq                           9358301   9093050      0.28     9   Sequence databases                         
   SABIO-RK                             505       505     <0.01    94   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   105   Organism-specific databases                
   SMART                            7341818   5564394      0.22    15   Family and domain databases                
   SMR                              2074915   2074915      0.06    27   3D structure databases                     
   STRING                           3030527   2963333      0.09    23   Protein-protein interaction databases      
   SUPFAM                          13631100  11211778      0.40     8   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   104   2D gel databases                           
   TAIR                               15462     15389     <0.01    72   Organism-specific databases                
   TCDB                                2381      2370     <0.01    87   Protein family/group databases             
   TIGRFAMs                         7812943   7136023      0.23    14   Family and domain databases                
   TubercuList                         1111      1110     <0.01    90   Organism-specific databases                
   UCSC                               60557     60399     <0.01    58   Genome annotation databases                
   UniGene                           536230    506184      0.02    35   Sequence databases                         
   UniPathway                       1603320   1492555      0.05    28   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    54   Genome annotation databases                
   World-2DPAGE                         673       668     <0.01    91   2D gel databases                           
   WormBase                           42170     42050     <0.01    63   Organism-specific databases                
   Xenbase                            25630     25569     <0.01    66   Organism-specific databases                
   ZFIN                               44455     44192     <0.01    61   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    76   Organism-specific databases                
   eggNOG                           2768815   2768795      0.08    25   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    55   Organism-specific databases                
   mycoCLAP                             422       422     <0.01    95   Protein family/group databases             
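The table header is not repeated here. Judging from the rows, though, the decimal column appears to be the first count column divided by the 33995348 entries in this release, rounded to two decimals. Values that would round to 0.00 are printed as "<0.01". For example, Pfam has 32997724 in the first count column, and 32997724 / 33995348 is about 0.97. The following is a minimal sketch of that calculation. The column definition is inferred from the numbers rather than stated in this section, and `xref_fraction` is a hypothetical helper name.

```python
# Total entries in UniProtKB/TrEMBL release 2013_05 (stated in Section 1).
TOTAL_ENTRIES = 33_995_348


def xref_fraction(n_xrefs: int, total: int = TOTAL_ENTRIES) -> str:
    """Format n_xrefs / total the way the table appears to (inferred, not documented).

    The ratio is rounded to two decimals; anything that rounds to 0.00
    is shown as "<0.01", matching the table's convention.
    """
    text = f"{n_xrefs / total:.2f}"
    return "<0.01" if text == "0.00" else text


# First count column for a few rows of the table above.
for name, n in [("Pfam", 32997724), ("KEGG", 8411536), ("PIR", 172585), ("MEROPS", 139047)]:
    print(f"{name:<8} {xref_fraction(n)}")
```

This reproduces the printed values for these rows (0.97, 0.25, 0.01 and <0.01).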

Number of explicitly cross-referenced databases: 128


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.68   Gln (Q) 3.98   Leu (L) 9.96   Ser (S) 6.62
   Arg (R) 5.44   Glu (E) 6.19   Lys (K) 5.24   Thr (T) 5.56
   Asn (N) 4.08   Gly (G) 7.10   Met (M) 2.47   Trp (W) 1.30
   Asp (D) 5.33   His (H) 2.20   Phe (F) 4.02   Tyr (Y) 3.03
   Cys (C) 1.23   Ile (I) 5.98   Pro (P) 4.66   Val (V) 6.79

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03

   

   [Figure: amino acid composition, residues grouped by class: aliphatic,
   acidic, small hydroxy, basic, aromatic, amide, sulfur]
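Each percentage in 5.1 can be read as the count of one residue letter divided by the total number of residues in the release, times 100. The following is a minimal sketch of that calculation. It assumes the composition is pooled over all residues of all entries rather than averaged per entry. Ambiguity codes such as B, Z and X are counted like any other letter, in line with the table listing them. `composition_percent` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def composition_percent(sequences: Iterable[str]) -> dict[str, float]:
    """Percent of each residue letter, pooled over all sequences (assumed method)."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(seq.upper())  # count every letter, including B, Z and X
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


# Toy input: two short sequences, 12 residues in total.
comp = composition_percent(["MKTAYIAK", "GLLA"])
print({aa: round(p, 2) for aa, p in comp.items()})
```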


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
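The ordering in 5.2 is simply the twenty standard residues from 5.1 sorted by descending percentage. The sketch below takes the values as printed in 5.1 and checks that they yield the same sequence. The dictionary and function names are illustrative.

```python
# Percentages for the 20 standard residues, as printed in Section 5.1.
COMPOSITION = {
    "Ala": 8.68, "Arg": 5.44, "Asn": 4.08, "Asp": 5.33, "Cys": 1.23,
    "Gln": 3.98, "Glu": 6.19, "Gly": 7.10, "His": 2.20, "Ile": 5.98,
    "Leu": 9.96, "Lys": 5.24, "Met": 2.47, "Phe": 4.02, "Pro": 4.66,
    "Ser": 6.62, "Thr": 5.56, "Trp": 1.30, "Tyr": 3.03, "Val": 6.79,
}


def by_frequency(composition: dict[str, float]) -> list[str]:
    """Residue names ordered from most to least frequent."""
    return sorted(composition, key=lambda aa: composition[aa], reverse=True)


order = by_frequency(COMPOSITION)
print(", ".join(order))
```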



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 615030
Total number of entries encoded on a Plasmid: 331490
Total number of entries encoded on a Plastid: 25982
Total number of entries encoded on a Plastid; Apicoplast: 719
Total number of entries encoded on a Plastid; Chloroplast: 224989
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 952
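The plastid counts are split by plastid type. The generic "Plastid" line (25982) is smaller than the chloroplast line (224989), so it cannot be a total over the subtypes. If the categories are mutually exclusive, which is an assumption and not stated here, the plastid lines sum to 252650 entries. The following minimal sketch parses the lines above and sums them by top-level organelle, taking the text before ";" as the top level. `totals_by_organelle` is a hypothetical helper.

```python
import re
from collections import defaultdict

# The "encoded on" lines from Section 6, copied verbatim.
LINES = """\
Total number of entries encoded on a Mitochondrion: 615030
Total number of entries encoded on a Plasmid: 331490
Total number of entries encoded on a Plastid: 25982
Total number of entries encoded on a Plastid; Apicoplast: 719
Total number of entries encoded on a Plastid; Chloroplast: 224989
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 952
"""

PATTERN = re.compile(r"^Total number of entries encoded on a (.+): (\d+)$")


def totals_by_organelle(text: str) -> dict[str, int]:
    """Sum counts per top-level organelle, assuming categories do not overlap."""
    totals: defaultdict = defaultdict(int)
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if m is None:
            continue  # skip lines that are not "encoded on" statistics
        category, count = m.group(1), int(m.group(2))
        top_level = category.split(";")[0].strip()  # "Plastid; Chloroplast" -> "Plastid"
        totals[top_level] += count
    return dict(totals)


totals = totals_by_organelle(LINES)
print(totals)
```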