         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2013_02 STATISTICS


1.  INTRODUCTION

Release 2013_02 of 06-Feb-2013 of UniProtKB/TrEMBL contains 29769971 sequence entries,
comprising 9585856378 amino acids.

Since release 2013_01, 525038 sequences have been added, the sequence data of
1205 existing entries have been updated, and the annotations of
5240727 entries have been revised. The added sequences represent an increase
of about 2% in the number of entries.
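
The 2% figure can be reproduced from the counts above. The sketch below is an
illustration only: it assumes the percentage is the number of added sequences
relative to the size of the previous release, and that no entries were deleted
in between (the release notes do not say either way). The variable names are
hypothetical.

```python
# Counts quoted in the introduction.
entries_2013_02 = 29_769_971
added_since_2013_01 = 525_038

# Assumption: no entries were removed, so the previous release held
# the current total minus the newly added sequences.
entries_2013_01 = entries_2013_02 - added_since_2013_01

growth_pct = 100 * added_since_2013_01 / entries_2013_01
print(f"{growth_pct:.2f}%")
```

Under these assumptions the growth is a little under 2%, which rounds to the
figure quoted above.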

Number of fragments: 3848870

Protein existence (PE):              entries      %
1: Evidence at protein level           19466     0.07%
2: Evidence at transcript level       656011     2.20%
3: Inferred from homology            6776080    22.76%
4: Predicted                        22318414    74.97%
5: Uncertain                               0     0.00%
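
The percentage column of the protein existence table can be recomputed from
the entry counts. This is a minimal sketch, assuming each percentage is the
count divided by the sum of all five categories and rounded to two decimals;
the dictionary name is illustrative.

```python
# Entry counts per protein existence (PE) level, copied from the table.
pe_counts = {
    "1: Evidence at protein level": 19_466,
    "2: Evidence at transcript level": 656_011,
    "3: Inferred from homology": 6_776_080,
    "4: Predicted": 22_318_414,
    "5: Uncertain": 0,
}

total = sum(pe_counts.values())

# Share of each PE level, as a percentage rounded to two decimals.
pe_pct = {label: round(100 * n / total, 2) for label, n in pe_counts.items()}

for label, pct in pe_pct.items():
    print(f"{label:<34}{pe_counts[label]:>10}{pct:>9.2f}%")
```

The five counts add up to the total number of entries given in the
introduction, so every entry carries exactly one PE level in this release.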

The growth of the database is summarized below.

   [Figure: growth of the database]



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 392101

   The twenty most represented species account for 1781212 sequences, or 6%
   of the total number of entries.


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:164586
                            2x:65220
                            3x:35391
                            4x:23634
                            5x:14855
                            6x:10778
                            7x: 8079
                            8x: 6316
                            9x: 5107
                           10x:10004
                       11- 20x:26212
                       21- 50x: 9127
                       51-100x: 3524
                         >100x: 9268


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     505379  Human immunodeficiency virus 1
       2     182291  uncultured bacterium
       3     113265  Homo sapiens (Human)
       4      96948  Oryza sativa subsp. japonica (Rice)
       5      82817  Hepatitis C virus
       6      73727  Glycine max (Soybean) (Glycine hispida)
       7      68971  Macaca mulatta (Rhesus macaque)
       8      60447  Zea mays (Maize)
       9      58294  Mus musculus (Mouse)
      10      56286  Hepatitis B virus (HBV)
      11      56117  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      12      54200  Danio rerio (Zebrafish) (Brachydanio rerio)
      13      54091  Vitis vinifera (Grape)
      14      50594  Trichomonas vaginalis
      15      49231  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      16      48878  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      17      44541  Populus trichocarpa (Western balsam poplar) 
      18      43144  Callithrix jacchus (White-tufted-ear marmoset)
      19      42141  Arabidopsis thaliana (Mouse-ear cress)
      20      39850  Paramecium tetraurelia
      21      39807  Oryza sativa subsp. indica (Rice)
      22      39293  Setaria italica (Foxtail millet) (Panicum italicum)
      23      38163  human gut metagenome
      24      35889  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      25      35602  Ailuropoda melanoleuca (Giant panda)
      26      35193  Acyrthosiphon pisum (Pea aphid)
      27      35066  Caenorhabditis japonica
      28      34802  Physcomitrella patens subsp. patens (Moss)
      29      34453  Thalassiosira oceanica (Marine diatom)
      30      34230  Drosophila melanogaster (Fruit fly)
      31      33915  Rattus norvegicus (Rat)
      32      33778  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      33      33267  Selaginella moellendorffii (Spikemoss)
      34      32769  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      35      32339  Oryza brachyantha
      36      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      37      32093  Oryza glaberrima (African rice)
      38      31833  Pan troglodytes (Chimpanzee)
      39      31722  Sus scrofa (Pig)
      40      31397  Ricinus communis (Castor bean)
      41      30917  Daphnia pulex (Water flea)
      42      30300  Caenorhabditis brenneri (Nematode worm)
      43      30145  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      44      29815  Amphimedon queenslandica (Sponge)
      45      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      46      29315  Pristionchus pacificus
      47      29178  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      48      29053  Oikopleura dioica (Tunicate)
      49      28400  Escherichia coli
      50      28351  Simian immunodeficiency virus (SIV)
      51      28242  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      52      28236  Canis familiaris (Dog) (Canis lupus familiaris)
      53      28055  Gasterosteus aculeatus (Three-spined stickleback)
      54      27687  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      55      27491  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      56      27089  Gorilla gorilla gorilla (Lowland gorilla)
      57      26818  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      58      26795  Gallus gallus (Chicken)
      59      25900  Oryzias latipes (Medaka fish) (Japanese ricefish)
      60      25758  Loxodonta africana (African elephant)
      61      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      62      25424  Bos taurus (Bovine)
      63      25081  Oryctolagus cuniculus (Rabbit)
      64      24881  Nematostella vectensis (Starlet sea anemone)
      65      24643  Tetrahymena thermophila (strain SB210)
      66      24200  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      67      24061  Equus caballus (Horse)
      68      23714  Ornithorhynchus anatinus (Duckbill platypus)
      69      23565  Oxytricha trifallax
      70      23225  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      71      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      72      22715  Monodelphis domestica (Gray short-tailed opossum)
      73      22561  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      74      22502  Caenorhabditis elegans
      75      22304  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      76      22163  gut metagenome
      77      21821  Latimeria chalumnae (West Indian ocean coelacanth)
      78      21727  Hordeum vulgare var. distichum (Two-rowed barley)
      79      21546  Heterocephalus glaber (Naked mole rat)
      80      21339  Caenorhabditis briggsae
      81      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
      82      20734  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
      83      20510  Myotis lucifugus (Little brown bat)
      84      20130  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      85      20114  Ciona savignyi (Pacific transparent sea squirt)
      86      20069  Cavia porcellus (Guinea pig)
      87      19969  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
      88      19671  Taeniopygia guttata (Zebra finch) (Poephila guttata)
      89      19438  Wuchereria bancrofti
      90      19331  Toxoplasma gondii
      91      19200  Trypanosoma cruzi (strain CL Brener)
      92      18988  Anolis carolinensis (Green anole) (American chameleon)
      93      18936  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      94      18828  Drosophila simulans (Fruit fly)
      95      18771  mine drainage metagenome
      96      18537  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
      97      18121  Atta cephalotes (Leafcutter ant)
      98      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
      99      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
     100      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     101      17388  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     102      17381  Bombyx mori (Silk moth)
     103      17277  Nasonia vitripennis (Parasitic wasp)
     104      17031  Drosophila yakuba (Fruit fly)
     105      17015  Tribolium castaneum (Red flour beetle)
     106      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     107      16871  Meleagris gallopavo (Common turkey)
     108      16714  Drosophila persimilis (Fruit fly)
     109      16643  Fusarium oxysporum f. sp. lycopersici  
     110      16475  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     111      16426  Ectocarpus siliculosus (Brown alga)
     112      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     113      16306  Danaus plexippus (Monarch butterfly)
     114      16263  Trichinella spiralis (Trichina worm)
     115      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     116      16187  Drosophila sechellia (Fruit fly)
     117      16141  Schistosoma japonicum (Blood fluke)
     118      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     119      15930  Hepatitis C virus subtype 1b
     120      15816  Plasmodium falciparum
     121      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     122      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     123      15715  Naegleria gruberi (Amoeba)
     124      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     125      15647  Anopheles gambiae (African malaria mosquito)
     126      15566  Phytophthora ramorum (Sudden oak death agent)
     127      15420  Drosophila willistoni (Fruit fly)
     128      15354  Loa loa (Eye worm) (Filaria loa)
     129      15225  Pythium ultimum
     130      15173  Hepatitis C virus subtype 1a
     131      15143  Drosophila ananassae (Fruit fly)
     132      15038  Harpegnathos saltator (Jerdon's jumping ant)
     133      14927  Drosophila erecta (Fruit fly)
     134      14852  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     135      14800  Camponotus floridanus (Florida carpenter ant)
     136      14788  Drosophila mojavensis (Fruit fly)
     137      14701  Drosophila virilis (Fruit fly)
     138      14697  Plasmodium chabaudi
     139      14650  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     140      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     141      14417  Volvox carteri (Green alga)
     142      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     143      14336  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     144      14260  Ralstonia solanacearum (Pseudomonas solanacearum)
     145      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     146      13966  Acromyrmex echinatior (Panamanian leafcutter ant) 
     147      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     148      13864  Clonorchis sinensis (Chinese liver fluke)
     149      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     150      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     151      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     152      13538  Trypanosoma cruzi
     153      13346  Aspergillus flavus 
     154      13266  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     155      13186  Mustela putorius furo (European domestic ferret) (Mustela furo)
     156      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     157      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     158      12983  Albugo laibachii Nc14
     159      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     160      12950  Stigmatella aurantiaca (strain DW4/3-1)
     161      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     162      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     163      12696  Trypanosoma congolense (strain IL3000)
     164      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     165      12675  Schistosoma mansoni (Blood fluke)
     166      12609  Xenopus laevis (African clawed frog)
     167      12466  uncultured archaeon
     168      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     169      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     170      12440  Polysphondylium pallidum (Cellular slime mold)
     171      12407  Rabies virus
     172      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     173      12352  Dictyostelium purpureum (Slime mold)
     174      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     175      12043  Porcine reproductive and respiratory syndrome virus (PRRSV)
     176      12006  Helicobacter pylori (Campylobacter pylori)
     177      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     178      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     179      11944  Emericella nidulans  
     180      11915  Apis mellifera (Honeybee)
     181      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     182      11780  Piriformospora indica (strain DSM 11827)
     183      11752  Chondrocladia sp. SMF<DEU
     184      11751  Cladorhiza sp. SMF<DEU
     185      11750  Abyssocladia sp. SMF<DEU
     186      11726  Phelloderma sp. SMF<DEU
     187      11716  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     188      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     189      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     190      11674  Anopheles darlingi (Mosquito)
     191      11644  Plasmodium berghei (strain Anka)
     192      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     193      11566  Trichoplax adhaerens (Trichoplax reptans)
     194      11557  Trypanosoma vivax (strain Y486)
     195      11515  Puccinia triticina (isolate 1-1 / race 1 (BBBD)) (Brown leaf rust fungus)
     196      11514  Aureococcus anophagefferens (Harmful bloom alga)
     197      11499  Brugia malayi (Filarial nematode worm)
     198      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     199      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     200      11396  Aspergillus oryzae (strain 3.042) (Yellow koji mold)
     201      11278  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     202      11211  Ktedonobacter racemifer DSM 44963
     203      11211  Agaricus bisporus var. burnettii (strain JB137-S8 / ATCC MYA-4627 / FGSC 10392) 
     204      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     205      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     206      10964  Streptomyces clavuligerus 
     207      10949  Aspergillus niger 
     208      10839  Pediculus humanus subsp. corporis (Body louse)
     209      10822  Chaetomium globosum  
     210      10570  Metarhizium anisopliae (strain ARSEF 23 / ATCC MYA-3075)
     211      10563  Amycolatopsis mediterranei S699
     212      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     213      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     214      10397  Agaricus bisporus var. bisporus (strain H97 / ATCC MYA-4626 / FGSC 10389) 
     215      10393  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     216      10387  Pseudomonas syringae pv. glycinea str. race 4
     217      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     218      10361  Beauveria bassiana (strain ARSEF 2860) (White muscardine disease fungus) 
     219      10354  Phaeodactylum tricornutum (strain CCAP 1055/1)
     220      10273  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
     221      10221  Shigella flexneri 1235-66
     222      10216  Burkholderia terrae BS001
     223      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136) 
     224      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
     225      10170  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     226      10137  Klebsiella pneumoniae
     227      10127  Trypanosoma cruzi marinkellei
     228      10113  Burkholderia sp. BT03
     229      10109  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
     230      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
     231      10087  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
     232      10051  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
     233      10034  Marssonina brunnea f. sp. multigermtubi (strain MB_m1) 
     234      10013  Streptomyces bingchenggensis (strain BCW-1)
     235       9846  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
     236       9836  Chlorella variabilis (Green alga)
     237       9822  Metarhizium acridum (strain CQMa 102)
     238       9799  Coccomyxa subellipsoidea C-169
     239       9760  Thielavia terrestris (strain ATCC 38088 / NRRL 8126) (Acremonium alabamense)
     240       9722  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
     241       9662  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
     242       9651  Cordyceps militaris (strain CM01) (Caterpillar fungus)
     243       9597  Streptomyces cattleya 
     244       9533  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
     245       9513  Salmo salar (Atlantic salmon)
     246       9510  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
     247       9485  Ajellomyces dermatitidis (strain ER-3 / ATCC MYA-2586) 
     248       9483  Coccidioides immitis (strain RS) (Valley fever fungus)
     249       9443  Ajellomyces capsulata (strain H88) (Darling's disease fungus) 
     250       9391  Exophiala dermatitidis (strain ATCC 34100 / CBS 525.76 / NIH/UT8656)  
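
For readers who want to process the table above programmatically, the rows can
be split into rank, frequency and species name. The following is a sketch
only: the regular expression and the function name `parse_species_rows` are
assumptions made for illustration, not part of any UniProt tool, and the
sample contains just the header and the first three rows.

```python
import re

# A data row is: rank, frequency, then the species name (which may itself
# contain digits and parentheses). Header and separator lines do not match.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(.+?)\s*$")

def parse_species_rows(text):
    """Return (rank, frequency, species) tuples for each data row in text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rank, freq, species = match.groups()
            rows.append((int(rank), int(freq), species))
    return rows

sample = """\
  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     505379  Human immunodeficiency virus 1
       2     182291  uncultured bacterium
       3     113265  Homo sapiens (Human)
"""

rows = parse_species_rows(sample)
top_n_total = sum(freq for _, freq, _ in rows)
print(rows[0], top_n_total)
```

Summing the frequency column over the first twenty parsed rows of the full
table gives the 1781212 sequences quoted at the start of this section.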


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          408998 (  1%)
    Bacteria       20754270 ( 70%)
    Eukaryota       6930174 ( 23%)
    Viruses         1573917 (  5%)
    Other            102611 ( <1%)



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 113301 (  2%)           ( <1%)
     Other Mammalia        854362 ( 12%)           (  3%)
     Other Vertebrata      755854 ( 11%)           (  3%)
     Viridiplantae        1350138 ( 19%)           (  5%)
     Fungi                1539503 ( 22%)           (  5%)
     Insecta               785777 ( 11%)           (  3%)
     Nematoda              252719 (  4%)           (  1%)
     Other                1278520 ( 18%)           (  4%)



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  775042             1001-1100   171622
                 51- 100 2553462             1101-1200   120222
                101- 150 2849472             1201-1300    84451
                151- 200 2763465             1301-1400    53705
                201- 250 2778653             1401-1500    43490
                251- 300 2691417             1501-1600    30043
                301- 350 2444006             1601-1700    22850
                351- 400 1852310             1701-1800    17324
                401- 450 1597169             1801-1900    14355
                451- 500 1309991             1901-2000    12257
                501- 550  865799             2001-2100     9626
                551- 600  667225             2101-2200     9840
                601- 650  487626             2201-2300     7678
                651- 700  383051             2301-2400     6149
                701- 750  322876             2401-2500     5249
                751- 800  284678             >2500        42521
                801- 850  216967
                851- 900  193438
                901- 950  134444
                951-1000   98628
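
Because the table excludes fragments, its bins should add up to the total
number of entries minus the number of fragments given in the introduction.
The sketch below checks this with the figures from this document; the list
and variable names are illustrative.

```python
# Bin counts copied from the size table, in order of increasing length.
bins_1_to_1000 = [
    775042, 2553462, 2849472, 2763465, 2778653, 2691417, 2444006,
    1852310, 1597169, 1309991, 865799, 667225, 487626, 383051,
    322876, 284678, 216967, 193438, 134444, 98628,
]
bins_1001_and_up = [
    171622, 120222, 84451, 53705, 43490, 30043, 22850, 17324,
    14355, 12257, 9626, 9840, 7678, 6149, 5249,
    42521,  # sequences longer than 2500 amino acids
]

total_entries = 29_769_971
fragments = 3_848_870

non_fragment_total = sum(bins_1_to_1000) + sum(bins_1001_and_up)
print(non_fragment_total, total_entries - fragments)
```

Both numbers come out to 25921101, so the table accounts for every
non-fragment entry in the release.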

   


   The average sequence length in UniProtKB/TrEMBL is 321 amino acids.

   The shortest sequence is G0XMK1_9MYRT: 1 amino acid.
   The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
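
The average length can be cross-checked against the totals in the
introduction. This sketch assumes the average is taken over all entries
(fragments included) and is reported truncated rather than rounded; that
reading is an inference from the numbers, not something stated in the release
notes.

```python
total_amino_acids = 9_585_856_378
total_entries = 29_769_971

# Exact mean length, and the same value truncated to a whole number.
mean_length = total_amino_acids / total_entries
truncated_mean = total_amino_acids // total_entries

print(f"{mean_length:.3f}", truncated_mean)
```

The exact quotient is just under 322, so the reported value of 321 matches
integer truncation of the mean.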



4.  STATISTICS FOR SOME LINE TYPES

The following tables summarize, for selected UniProtKB/TrEMBL line types, the
total number of lines, the number of entries with at least one such line, and
the average number of such lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    36397719                1.22                                                    
   Submitted to EMBL/GenBank/DDBJ  19948537  18410901      0.67                                                    
   Journal                         14943652  14067688      0.50                                                    
   Submitted to other databases     1488771   1479518      0.05                                                    
   Thesis                             10147     10089     <0.01                                                    
   Book citation                       6592      6543     <0.01                                                    
   Unpublished observations              19        19     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 459053
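
The "Average per entry" column in the tables of this section appears to be the
total number of lines divided by the total number of entries in the release,
shown to two decimals, with values that would round to 0.00 printed as
"<0.01". The helper below is a sketch under that assumption; `per_entry` is a
hypothetical name.

```python
TOTAL_ENTRIES = 29_769_971  # entries in release 2013_02

def per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format the average number of lines per entry as in the tables."""
    avg = total_lines / total_entries
    # Assumption: averages that would round to 0.00 are shown as "<0.01".
    return "<0.01" if avg < 0.005 else f"{avg:.2f}"

print(per_entry(36_397_719))  # References (RL)
print(per_entry(19_948_537))  # Submitted to EMBL/GenBank/DDBJ
print(per_entry(10_147))      # Thesis
```

Note that the divisor is the number of entries in the whole release, not the
number of entries that carry the line type.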


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      36781758                1.24                                                    
   CATALYTIC ACTIVITY               3149433   2864798      0.11     4                                              
   CAUTION                         13759836  13759572      0.46     1                                              
   COFACTOR                         1196474   1106504      0.04     8                                              
   DOMAIN                            122367    117394     <0.01     9                                              
   FUNCTION                         3466943   3237072      0.12     3                                              
   INTERACTION                          687       687     <0.01    11                                              
   MISCELLANEOUS                      85682     85585     <0.01    10                                              
   PATHWAY                          1555678   1417623      0.05     7                                              
   SIMILARITY                       8932250   7753761      0.30     2                                              
   SUBCELLULAR LOCATION             2795930   2669515      0.09     5                                              
   SUBUNIT                          1716478   1697346      0.06     6                                              

Total number of comment topics: 11
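
The Rank column is consistent with ordering the subtypes by their total
number of lines, largest first. The sketch below reproduces the ranks of the
comment topics under that assumption; the dictionary name is illustrative.

```python
# Total number of lines per comment topic, copied from the table above.
comment_totals = {
    "CATALYTIC ACTIVITY": 3_149_433,
    "CAUTION": 13_759_836,
    "COFACTOR": 1_196_474,
    "DOMAIN": 122_367,
    "FUNCTION": 3_466_943,
    "INTERACTION": 687,
    "MISCELLANEOUS": 85_682,
    "PATHWAY": 1_555_678,
    "SIMILARITY": 8_932_250,
    "SUBCELLULAR LOCATION": 2_795_930,
    "SUBUNIT": 1_716_478,
}

# Rank 1 is the topic with the most lines.
ordered = sorted(comment_totals, key=comment_totals.get, reverse=True)
ranks = {topic: position for position, topic in enumerate(ordered, start=1)}

for topic in comment_totals:
    print(f"{topic:<22}{comment_totals[topic]:>10}{ranks[topic]:>4}")
```

The topic totals also add up to the 36781758 comment lines reported in the
first row of the table.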


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       7480261                0.25                                                    
   CHAIN                             792979    650192      0.03     2                                              
   NON_TER                          6063641   3849547      0.20     1                                              
   SIGNAL                            622774    619488      0.02     3                                              
   TRANSIT                              867       867     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             331667139               11.14                                                    
   AGD                                 2525      2525     <0.01    86   Organism-specific databases                
   ANU-2DPAGE                            52        52     <0.01   102   2D gel databases                           
   Allergome                           3231      2620     <0.01    82   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   101   Organism-specific databases                
   ArrayExpress                       86676     86676     <0.01    52   Gene expression databases                  
   BRENDA                              2672      2643     <0.01    84   Enzyme and pathway databases               
   Bgee                              117662    117662     <0.01    49   Gene expression databases                  
   BioCyc                           3255992   3220393      0.11    20   Enzyme and pathway databases               
   CAZy                               74116     69638     <0.01    56   Protein family/group databases             
   CGD                                 7064      7064     <0.01    77   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   108   2D gel databases                           
   CTD                               321285    319727      0.01    38   Organism-specific databases                
   ChEMBL                               576       576     <0.01    93   Other                                      
   ConoServer                           160       160     <0.01    97   Organism-specific databases                
   DIP                                 2833      2828     <0.01    83   Protein-protein interaction databases      
   DNASU                              43414     43080     <0.01    61   Protocols and materials databases          
   EMBL                            32643142  28926578      1.10     3   Sequence databases                         
   Ensembl                           957191    941531      0.03    29   Genome annotation databases                
   EnsemblBacteria                   834763    800682      0.03    30   Genome annotation databases                
   EnsemblFungi                      262813    261328      0.01    41   Genome annotation databases                
   EnsemblMetazoa                    628399    613213      0.02    32   Genome annotation databases                
   EnsemblPlants                     425129    405612      0.01    37   Genome annotation databases                
   EnsemblProtists                   126332    124845     <0.01    48   Genome annotation databases                
   EuPathDB                          147099    146647     <0.01    46   Organism-specific databases                
   EvolutionaryTrace                   8168      8168     <0.01    75   Other                                      
   FlyBase                           196628    195161      0.01    43   Organism-specific databases                
   GO                              59783168  18457280      2.01     2   Ontologies                                 
   Gene3D                          12537967   9979493      0.42     6   Family and domain databases                
   GeneID                           8847495   8634018      0.30     9   Genome annotation databases                
   GeneTree                          814814    814762      0.03    31   Phylogenomic databases                     
   Genevestigator                     93166     93158     <0.01    51   Gene expression databases                  
   GenoList                           14735     14462     <0.01    73   Organism-specific databases                
   GenomeRNAi                         20921     20921     <0.01    67   Other                                      
   GenomeReviews                    4251448   4152650      0.14    16   Genome annotation databases                
   Gramene                            67599     67599     <0.01    57   Organism-specific databases                
   H-InvDB                              625       477     <0.01    92   Organism-specific databases                
   HAMAP                            2915102   2879087      0.10    22   Family and domain databases                
   HGNC                               48479     48400     <0.01    59   Organism-specific databases                
   HOGENOM                          3658375   3658331      0.12    19   Phylogenomic databases                     
   HOVERGEN                          310831    310820      0.01    39   Phylogenomic databases                     
   HSSP                              250582    250355      0.01    42   3D structure databases                     
   IPI                               308397    307772      0.01    40   Sequence databases                         
   InParanoid                        189414    189414      0.01    44   Phylogenomic databases                     
   IntAct                             16838     16838     <0.01    71   Protein-protein interaction databases      
   InterPro                        63130105  22741531      2.12     1   Family and domain databases                
   KEGG                             8080735   7880559      0.27    11   Genome annotation databases                
   KO                               3175198   3161289      0.11    21   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    79   Organism-specific databases                
   Leproma                             1272      1270     <0.01    89   Organism-specific databases                
   MEROPS                             81106     81105     <0.01    53   Protein family/group databases             
   MGI                                35034     34596     <0.01    63   Organism-specific databases                
   MINT                                8586      8586     <0.01    74   Protein-protein interaction databases      
   NextBio                           103415    103121     <0.01    50   Other                                      
   OMA                              3888531   3888237      0.13    18   Phylogenomic databases                     
   OrthoDB                           556956    556920      0.02    34   Phylogenomic databases                     
   PANTHER                          4181125   3953196      0.14    17   Family and domain databases                
   PATRIC                           8310218   8310111      0.28    10   Genome annotation databases                
   PDB                                18361     10284     <0.01    68   3D structure databases                     
   PDBsum                             18243     10177     <0.01    69   3D structure databases                     
   PHCI-2DPAGE                           99        99     <0.01    99   2D gel databases                           
   PIR                               173603    140764      0.01    45   Sequence databases                         
   PIRSF                            2514399   2513725      0.08    26   Family and domain databases                
   PMAP-CutDB                           213       213     <0.01    95   Other                                      
   PMMA-2DPAGE                            2         2     <0.01   109   2D gel databases                           
   PRIDE                             479818    479818      0.02    36   Proteomic databases                        
   PRINTS                           4465650   3970455      0.15    15   Family and domain databases                
   PROSITE                         14601107   9681571      0.49     5   Family and domain databases                
   Pathway_Interaction_DB                11         9     <0.01   107   Enzyme and pathway databases               
   PaxDb                              16952     16952     <0.01    70   Proteomic databases                        
   PeptideAtlas                         141       141     <0.01    98   Proteomic databases                        
   PeroxiBase                          2558      2550     <0.01    85   Protein family/group databases             
   Pfam                            28766781  21127520      0.97     4   Family and domain databases                
   PharmGKB                            4118      4118     <0.01    81   Organism-specific databases                
   PhosphoSite                         1164      1164     <0.01    90   PTM databases                              
   PhylomeDB                         144585    144585     <0.01    47   Phylogenomic databases                     
   PomBase                               40        27     <0.01   103   Organism-specific databases                
   PptaseDB                              36        34     <0.01   104   Protein family/group databases             
   ProDom                            572548    547775      0.02    33   Family and domain databases                
   ProMEX                              5656      5656     <0.01    78   Proteomic databases                        
   ProtClustDB                      2720650   2720650      0.09    24   Phylogenomic databases                     
   ProteinModelPortal               7745426   7745426      0.26    12   3D structure databases                     
   PseudoCAP                           4539      4533     <0.01    80   Organism-specific databases                
   REBASE                             33355     33352     <0.01    64   Protein family/group databases             
   REPRODUCTION-2DPAGE                   83        82     <0.01   100   2D gel databases                           
   RGD                                24750     24427     <0.01    66   Organism-specific databases                
   Reactome                             209       179     <0.01    96   Enzyme and pathway databases               
   RefSeq                           8887742   8644506      0.30     8   Sequence databases                         
   SGD                                   11        11     <0.01   106   Organism-specific databases                
   SMART                            6506398   4931381      0.22    14   Family and domain databases                
   SMR                              1667325   1667325      0.06    27   3D structure databases                     
   STRING                           2587837   2587837      0.09    25   Protein-protein interaction databases      
   SUPFAM                          12023261   9889541      0.40     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   105   2D gel databases                           
   Siena-2DPAGE                           2         2     <0.01   110   2D gel databases                           
   TAIR                               15667     15591     <0.01    72   Organism-specific databases                
   TCDB                                2388      2376     <0.01    87   Protein family/group databases             
   TIGRFAMs                         6646975   6061703      0.22    13   Family and domain databases                
   TubercuList                         1976      1971     <0.01    88   Organism-specific databases                
   UCSC                               63802     63649     <0.01    58   Genome annotation databases                
   UniGene                           544426    513173      0.02    35   Sequence databases                         
   UniPathway                       1521780   1416138      0.05    28   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    54   Genome annotation databases                
   World-2DPAGE                         675       670     <0.01    91   2D gel databases                           
   WormBase                           42290     42172     <0.01    62   Organism-specific databases                
   Xenbase                            25493     25371     <0.01    65   Organism-specific databases                
   ZFIN                               45974     45712     <0.01    60   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    76   Organism-specific databases                
   eggNOG                           2770702   2770681      0.09    23   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    55   Organism-specific databases                
   mycoCLAP                             411       411     <0.01    94   Protein family/group databases             

Number of explicitly cross-referenced databases: 136
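
The third numeric column of the table above appears to be the number of
cross-references divided by the total number of entries in the release
(29769971, from the introduction), with values that round to 0.00 printed
as "<0.01". The sketch below reproduces that column for a few rows under this
assumption; the names TOTAL_ENTRIES, CROSS_REFERENCES and per_entry are
illustrative and not part of the release.

```python
# Total number of entries in release 2013_02, taken from the introduction.
TOTAL_ENTRIES = 29_769_971

# Cross-reference counts (first numeric column), copied from the table.
CROSS_REFERENCES = {
    "InterPro": 63_130_105,
    "Pfam": 28_766_781,
    "KEGG": 8_080_735,
    "IPI": 308_397,
    "MGI": 35_034,
}


def per_entry(count, total=TOTAL_ENTRIES):
    """Format the average number of cross-references per entry.

    Assumption: values that round to 0.00 are shown as "<0.01".
    """
    text = f"{count / total:.2f}"
    return "<0.01" if text == "0.00" else text


for name, count in CROSS_REFERENCES.items():
    print(f"{name:<12}{count:>10}{per_entry(count):>8}")
```

For the rows used here, the computed values agree with the table (for example
2.12 for InterPro and <0.01 for MGI).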


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.63   Gln (Q) 3.97   Leu (L) 9.92   Ser (S) 6.65
   Arg (R) 5.42   Glu (E) 6.18   Lys (K) 5.29   Thr (T) 5.56
   Asn (N) 4.11   Gly (G) 7.08   Met (M) 2.47   Trp (W) 1.30
   Asp (D) 5.32   His (H) 2.20   Phe (F) 4.03   Tyr (Y) 3.05
   Cys (C) 1.24   Ile (I) 6.00   Pro (P) 4.67   Val (V) 6.77

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03

   

   [Figure: amino acid composition of the complete database]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
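
The ordering in 5.2 follows directly from the percentages in 5.1. As a minimal
sketch, sorting the twenty standard residues by their percentage in descending
order gives the same list; the names COMPOSITION and ranked are illustrative.

```python
# Percent composition of the twenty standard amino acids, copied from 5.1.
COMPOSITION = {
    "Ala": 8.63, "Gln": 3.97, "Leu": 9.92, "Ser": 6.65,
    "Arg": 5.42, "Glu": 6.18, "Lys": 5.29, "Thr": 5.56,
    "Asn": 4.11, "Gly": 7.08, "Met": 2.47, "Trp": 1.30,
    "Asp": 5.32, "His": 2.20, "Phe": 4.03, "Tyr": 3.05,
    "Cys": 1.24, "Ile": 6.00, "Pro": 4.67, "Val": 6.77,
}

# Rank residues from most to least frequent.
ranked = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)

print(", ".join(ranked))
# Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe, Gln, Tyr, Met, His, Trp, Cys
```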



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 584715
Total number of entries encoded on a Plasmid: 315657
Total number of entries encoded on a Plastid: 24866
Total number of entries encoded on a Plastid; Apicoplast: 701
Total number of entries encoded on a Plastid; Chloroplast: 214738
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 927