         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2013_03 STATISTICS


1.  INTRODUCTION

Release 2013_03 of 06-Mar-2013 of UniProtKB/TrEMBL contains 32153798 sequence entries,
comprising 10331927364 amino acids.

2425909 sequences have been added since release 2013_02, the sequence data of
303 existing entries has been updated and the annotations of
14996955 entries have been revised. The added sequences represent an increase
of 8% in the number of entries.
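
The 8% figure can be reproduced from the counts quoted above. The following is
a minimal sketch, assuming that the size of release 2013_02 equals the current
total minus the added sequences (that is, that no entries were deleted between
the two releases); the variable names are illustrative only.

```python
# Figures quoted in the introduction of release 2013_03.
total_entries = 32_153_798   # sequence entries in release 2013_03
added_entries = 2_425_909    # sequences added since release 2013_02

# Assumption: no entries were removed, so the previous release size is
# simply the current total minus the additions.
previous_entries = total_entries - added_entries

# Growth relative to the previous release, as a percentage.
growth_pct = 100 * added_entries / previous_entries
print(f"previous release: {previous_entries} entries")
print(f"growth: {growth_pct:.1f}%")
```

Rounded to a whole number, the result agrees with the 8% stated above.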

Number of fragments: 3947047

Protein existence (PE):              entries      %
1: Evidence at protein level           19760     0.06%
2: Evidence at transcript level       713680     2.22%
3: Inferred from homology            7119059    22.14%
4: Predicted                        24301299    75.58%
5: Uncertain                               0     0.00%
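
The percentages in the protein existence table are each count divided by the
total number of entries. A short sketch of that arithmetic follows; the
dictionary layout is an illustrative choice, not part of the release files.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 19_760,
    "2: Evidence at transcript level": 713_680,
    "3: Inferred from homology": 7_119_059,
    "4: Predicted": 24_301_299,
    "5: Uncertain": 0,
}

# The five levels together should account for every entry in the release.
total = sum(pe_counts.values())

# Share of each level, rounded to two decimals as in the table.
pe_percent = {label: round(100 * n / total, 2) for label, n in pe_counts.items()}

for label, n in pe_counts.items():
    print(f"{label:<34}{n:>10}{pe_percent[label]:>9.2f}%")
```

The counts sum to 32153798, the number of entries given in the introduction.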

The growth of the database is summarized below.

   



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 397892

   The first twenty species represent 1789417 sequences, or 5.6% of the
   total number of entries.
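
   The figure for the first twenty species can be checked against the twenty
   highest frequencies listed in table 2.2. This is a small sketch of that
   check; the list and variable names are illustrative only.

```python
# Twenty highest frequencies from table 2.2 (ranks 1 to 20).
top_twenty = [
    515480, 183711, 112364, 96935, 83508,
    73734, 68976, 60451, 57439, 56514,
    56117, 54093, 51887, 50601, 49235,
    48878, 44560, 43164, 41920, 39850,
]
total_entries = 32_153_798  # sequence entries in release 2013_03

top_twenty_total = sum(top_twenty)
share_pct = 100 * top_twenty_total / total_entries
print(f"{top_twenty_total} sequences, {share_pct:.1f}% of all entries")
# prints "1789417 sequences, 5.6% of all entries"
```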


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:166483
                            2x:65967
                            3x:35756
                            4x:24165
                            5x:15090
                            6x:10950
                            7x: 8205
                            8x: 6407
                            9x: 5169
                           10x:10159
                       11- 20x:26764
                       21- 50x: 9299
                       51-100x: 3610
                         >100x: 9868


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     515480  Human immunodeficiency virus 1
       2     183711  uncultured bacterium
       3     112364  Homo sapiens (Human)
       4      96935  Oryza sativa subsp. japonica (Rice)
       5      83508  Hepatitis C virus
       6      73734  Glycine max (Soybean) (Glycine hispida)
       7      68976  Macaca mulatta (Rhesus macaque)
       8      60451  Zea mays (Maize)
       9      57439  Hepatitis B virus (HBV)
      10      56514  Mus musculus (Mouse)
      11      56117  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      12      54093  Vitis vinifera (Grape)
      13      51887  Danio rerio (Zebrafish) (Brachydanio rerio)
      14      50601  Trichomonas vaginalis
      15      49235  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      16      48878  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      17      44560  Populus trichocarpa (Western balsam poplar) 
      18      43164  Callithrix jacchus (White-tufted-ear marmoset)
      19      41920  Arabidopsis thaliana (Mouse-ear cress)
      20      39850  Paramecium tetraurelia
      21      39811  Oryza sativa subsp. indica (Rice)
      22      39293  Setaria italica (Foxtail millet) (Panicum italicum)
      23      38163  human gut metagenome
      24      35889  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      25      35602  Ailuropoda melanoleuca (Giant panda)
      26      35193  Acyrthosiphon pisum (Pea aphid)
      27      35066  Caenorhabditis japonica
      28      34807  Physcomitrella patens subsp. patens (Moss)
      29      34569  Thalassiosira oceanica (Marine diatom)
      30      34509  Drosophila melanogaster (Fruit fly)
      31      33924  Rattus norvegicus (Rat)
      32      33778  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      33      33267  Selaginella moellendorffii (Spikemoss)
      34      32769  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      35      32339  Oryza brachyantha
      36      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      37      32093  Oryza glaberrima (African rice)
      38      31835  Pan troglodytes (Chimpanzee)
      39      31747  Sus scrofa (Pig)
      40      31397  Ricinus communis (Castor bean)
      41      30918  Daphnia pulex (Water flea)
      42      30300  Caenorhabditis brenneri (Nematode worm)
      43      30145  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      44      29815  Amphimedon queenslandica (Sponge)
      45      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      46      29316  Pristionchus pacificus
      47      29178  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      48      29053  Oikopleura dioica (Tunicate)
      49      28836  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      50      28504  Escherichia coli
      51      28448  Canis familiaris (Dog) (Canis lupus familiaris)
      52      28351  Simian immunodeficiency virus (SIV)
      53      28056  Gasterosteus aculeatus (Three-spined stickleback)
      54      27723  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      55      27498  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      56      27413  Equus caballus (Horse)
      57      27089  Gorilla gorilla gorilla (Lowland gorilla)
      58      26820  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      59      26805  Gallus gallus (Chicken)
      60      25904  Oryzias latipes (Medaka fish) (Japanese ricefish)
      61      25758  Loxodonta africana (African elephant)
      62      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      63      25423  Bos taurus (Bovine)
      64      25081  Oryctolagus cuniculus (Rabbit)
      65      24899  Nematostella vectensis (Starlet sea anemone)
      66      24643  Tetrahymena thermophila (strain SB210)
      67      24590  Guillardia theta CCMP2712
      68      24200  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      69      23715  Ornithorhynchus anatinus (Duckbill platypus)
      70      23565  Oxytricha trifallax
      71      23227  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      72      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      73      22715  Monodelphis domestica (Gray short-tailed opossum)
      74      22561  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      75      22453  Caenorhabditis elegans
      76      22304  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      77      22163  gut metagenome
      78      21821  Latimeria chalumnae (West Indian ocean coelacanth)
      79      21734  Hordeum vulgare var. distichum (Two-rowed barley)
      80      21546  Heterocephalus glaber (Naked mole rat)
      81      21339  Caenorhabditis briggsae
      82      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
      83      20854  Myotis lucifugus (Little brown bat)
      84      20735  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
      85      20130  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      86      20114  Ciona savignyi (Pacific transparent sea squirt)
      87      20072  Cavia porcellus (Guinea pig)
      88      19969  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
      89      19673  Taeniopygia guttata (Zebra finch) (Poephila guttata)
      90      19544  Pteropus alecto (Black flying fox)
      91      19438  Wuchereria bancrofti
      92      19331  Toxoplasma gondii
      93      19258  Anolis carolinensis (Green anole) (American chameleon)
      94      19200  Trypanosoma cruzi (strain CL Brener)
      95      18936  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      96      18847  Drosophila simulans (Fruit fly)
      97      18771  mine drainage metagenome
      98      18591  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
      99      18121  Atta cephalotes (Leafcutter ant)
     100      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     101      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
     102      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     103      17477  Bombyx mori (Silk moth)
     104      17393  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     105      17278  Nasonia vitripennis (Parasitic wasp)
     106      17039  Drosophila yakuba (Fruit fly)
     107      17022  Tribolium castaneum (Red flour beetle)
     108      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     109      16871  Meleagris gallopavo (Common turkey)
     110      16714  Drosophila persimilis (Fruit fly)
     111      16643  Fusarium oxysporum f. sp. lycopersici  
     112      16475  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     113      16426  Ectocarpus siliculosus (Brown alga)
     114      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     115      16317  Hepatitis C virus subtype 1b
     116      16306  Danaus plexippus (Monarch butterfly)
     117      16263  Trichinella spiralis (Trichina worm)
     118      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     119      16187  Drosophila sechellia (Fruit fly)
     120      16142  Schistosoma japonicum (Blood fluke)
     121      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     122      15917  Plasmodium falciparum
     123      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     124      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     125      15716  Naegleria gruberi (Amoeba)
     126      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     127      15647  Anopheles gambiae (African malaria mosquito)
     128      15566  Phytophthora ramorum (Sudden oak death agent)
     129      15461  Myotis davidii (David's myotis)
     130      15420  Drosophila willistoni (Fruit fly)
     131      15371  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     132      15354  Loa loa (Eye worm) (Filaria loa)
     133      15225  Pythium ultimum
     134      15177  Hepatitis C virus subtype 1a
     135      15143  Drosophila ananassae (Fruit fly)
     136      15038  Harpegnathos saltator (Jerdon's jumping ant)
     137      14927  Drosophila erecta (Fruit fly)
     138      14852  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     139      14800  Camponotus floridanus (Florida carpenter ant)
     140      14788  Drosophila mojavensis (Fruit fly)
     141      14713  Plasmodium chabaudi
     142      14701  Drosophila virilis (Fruit fly)
     143      14650  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     144      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     145      14417  Volvox carteri (Green alga)
     146      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     147      14337  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     148      14261  Ralstonia solanacearum (Pseudomonas solanacearum)
     149      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     150      13966  Acromyrmex echinatior (Panamanian leafcutter ant) 
     151      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     152      13865  Clonorchis sinensis (Chinese liver fluke)
     153      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     154      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     155      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     156      13540  Trypanosoma cruzi
     157      13346  Aspergillus flavus 
     158      13266  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     159      13186  Mustela putorius furo (European domestic ferret) (Mustela furo)
     160      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     161      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     162      12983  Albugo laibachii Nc14
     163      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     164      12950  Stigmatella aurantiaca (strain DW4/3-1)
     165      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     166      12858  Magnaporthe oryzae Y34
     167      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     168      12711  Magnaporthe oryzae P131
     169      12696  Trypanosoma congolense (strain IL3000)
     170      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     171      12679  Schistosoma mansoni (Blood fluke)
     172      12648  uncultured archaeon
     173      12613  Xenopus laevis (African clawed frog)
     174      12596  Rabies virus
     175      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     176      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     177      12440  Polysphondylium pallidum (Cellular slime mold)
     178      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     179      12352  Dictyostelium purpureum (Slime mold)
     180      12179  Porcine reproductive and respiratory syndrome virus (PRRSV)
     181      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     182      12019  Helicobacter pylori (Campylobacter pylori)
     183      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     184      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     185      11944  Emericella nidulans  
     186      11932  Apis mellifera (Honeybee)
     187      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     188      11780  Piriformospora indica (strain DSM 11827)
     189      11752  Chondrocladia sp. SMF<DEU
     190      11751  Cladorhiza sp. SMF<DEU
     191      11750  Abyssocladia sp. SMF<DEU
     192      11726  Phelloderma sp. SMF<DEU
     193      11716  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     194      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     195      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     196      11674  Anopheles darlingi (Mosquito)
     197      11644  Plasmodium berghei (strain Anka)
     198      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     199      11566  Trichoplax adhaerens (Trichoplax reptans)
     200      11557  Trypanosoma vivax (strain Y486)
     201      11515  Puccinia triticina (isolate 1-1 / race 1 (BBBD)) (Brown leaf rust fungus)
     202      11514  Aureococcus anophagefferens (Harmful bloom alga)
     203      11499  Brugia malayi (Filarial nematode worm)
     204      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     205      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     206      11396  Aspergillus oryzae (strain 3.042) (Yellow koji mold)
     207      11278  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     208      11211  Ktedonobacter racemifer DSM 44963
     209      11211  Agaricus bisporus var. burnettii (strain JB137-S8 / ATCC MYA-4627 / FGSC 10392) 
     210      11205  Rhipicephalus pulchellus
     211      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     212      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     213      10964  Streptomyces clavuligerus 
     214      10949  Aspergillus niger 
     215      10839  Pediculus humanus subsp. corporis (Body louse)
     216      10822  Chaetomium globosum  
     217      10570  Metarhizium anisopliae (strain ARSEF 23 / ATCC MYA-3075)
     218      10563  Amycolatopsis mediterranei S699
     219      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     220      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     221      10397  Agaricus bisporus var. bisporus (strain H97 / ATCC MYA-4626 / FGSC 10389) 
     222      10393  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     223      10389  Klebsiella pneumoniae
     224      10387  Pseudomonas syringae pv. glycinea str. race 4
     225      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     226      10361  Beauveria bassiana (strain ARSEF 2860) (White muscardine disease fungus) 
     227      10354  Phaeodactylum tricornutum (strain CCAP 1055/1)
     228      10273  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
     229      10221  Shigella flexneri 1235-66
     230      10216  Burkholderia terrae BS001
     231      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136) 
     232      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
     233      10170  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     234      10127  Trypanosoma cruzi marinkellei
     235      10113  Burkholderia sp. BT03
     236      10109  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
     237      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
     238      10087  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
     239      10051  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
     240      10034  Marssonina brunnea f. sp. multigermtubi (strain MB_m1) 
     241      10033  Streptomyces turgidiscabies Car8
     242      10013  Streptomyces bingchenggensis (strain BCW-1)
     243       9846  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
     244       9836  Chlorella variabilis (Green alga)
     245       9822  Metarhizium acridum (strain CQMa 102)
     246       9799  Coccomyxa subellipsoidea C-169
     247       9760  Thielavia terrestris (strain ATCC 38088 / NRRL 8126) (Acremonium alabamense)
     248       9722  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
     249       9662  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
     250       9651  Cordyceps militaris (strain CM01) (Caterpillar fungus)


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          428746 (  1%)
    Bacteria       22935705 ( 71%)
    Eukaryota       7086829 ( 22%)
    Viruses         1599881 (  5%)
    Other            102636 ( <1%)



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 112400 (  2%)           ( <1%)
     Other Mammalia        892709 ( 13%)           (  3%)
     Other Vertebrata      761191 ( 11%)           (  2%)
     Viridiplantae        1355593 ( 19%)           (  4%)
     Fungi                1592955 ( 22%)           (  5%)
     Insecta               793663 ( 11%)           (  2%)
     Nematoda              252797 (  4%)           (  1%)
     Other                1325521 ( 19%)           (  4%)



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  828429             1001-1100   182907
                 51- 100 2785036             1101-1200   127616
                101- 150 3105405             1201-1300    90686
                151- 200 3016684             1301-1400    56716
                201- 250 3036066             1401-1500    46154
                251- 300 2937555             1501-1600    31790
                301- 350 2671707             1601-1700    23993
                351- 400 2015847             1701-1800    18104
                401- 450 1743838             1801-1900    15008
                451- 500 1432306             1901-2000    12724
                501- 550  936915             2001-2100    10011
                551- 600  720853             2101-2200    10299
                601- 650  525532             2201-2300     7911
                651- 700  414320             2301-2400     6345
                701- 750  347951             2401-2500     5482
                751- 800  306150             >2500        44388
                801- 850  234708
                851- 900  208744
                901- 950  144118
                951-1000  104453

   


   The average sequence length in UniProtKB/TrEMBL is   321 amino acids.

   The shortest sequence is G0XMK1_9MYRT:     1 amino acid.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
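
   The size table and the average length can be cross-checked against the
   totals given in the introduction. The sketch below assumes that the average
   is the total number of amino acids divided by the total number of entries
   (fragments included), which is consistent with the quoted value of 321; the
   data structure and names are illustrative only.

```python
# (first length, last length, number of sequences) for each bin of the
# size table; the final bin (>2500) has no upper bound.
size_bins = [
    (1, 50, 828429), (51, 100, 2785036), (101, 150, 3105405),
    (151, 200, 3016684), (201, 250, 3036066), (251, 300, 2937555),
    (301, 350, 2671707), (351, 400, 2015847), (401, 450, 1743838),
    (451, 500, 1432306), (501, 550, 936915), (551, 600, 720853),
    (601, 650, 525532), (651, 700, 414320), (701, 750, 347951),
    (751, 800, 306150), (801, 850, 234708), (851, 900, 208744),
    (901, 950, 144118), (951, 1000, 104453), (1001, 1100, 182907),
    (1101, 1200, 127616), (1201, 1300, 90686), (1301, 1400, 56716),
    (1401, 1500, 46154), (1501, 1600, 31790), (1601, 1700, 23993),
    (1701, 1800, 18104), (1801, 1900, 15008), (1901, 2000, 12724),
    (2001, 2100, 10011), (2101, 2200, 10299), (2201, 2300, 7911),
    (2301, 2400, 6345), (2401, 2500, 5482), (2501, None, 44388),
]

total_entries = 32_153_798        # all entries, fragments included
fragments = 3_947_047             # "Number of fragments" in section 1
total_residues = 10_331_927_364   # amino acids in the release

# The table excludes fragments, so its bins should sum to the
# number of non-fragment entries.
binned = sum(count for _, _, count in size_bins)
non_fragment = total_entries - fragments

# Assumption: the average is taken over all entries, fragments included.
mean_length = total_residues / total_entries

# Share of non-fragment sequences of at most 500 amino acids.
short = sum(count for _, high, count in size_bins if high is not None and high <= 500)

print(f"binned sequences: {binned} (expected {non_fragment})")
print(f"average length:   {mean_length:.0f} amino acids")
print(f"share <= 500 aa:  {100 * short / binned:.1f}%")
```

   The bins sum exactly to the number of entries minus the number of
   fragments, which confirms that the table covers every non-fragment entry.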



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    39198685                1.22                                                    
   Submitted to EMBL/GenBank/DDBJ  22325583  20456713      0.69                                                    
   Journal                         15365925  14479742      0.48                                                    
   Submitted to other databases     1490406   1480958      0.05                                                    
   Thesis                             10159     10101     <0.01                                                    
   Book citation                       6592      6543     <0.01                                                    
   Unpublished observations              19        19     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 459623
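
The "Average per entry" column appears to be the total number of lines divided
by the total number of entries in the release, not by the number of entries
that carry such a line. The sketch below reproduces the column for the
reference subtypes under that assumption; the helper `per_entry` and the rule
of printing "<0.01" for values that round to 0.00 are inferred from the tables,
not taken from UniProt documentation.

```python
total_entries = 32_153_798  # sequence entries in release 2013_03

# "Total number" column for each reference (RL) subtype.
reference_lines = {
    "Submitted to EMBL/GenBank/DDBJ": 22_325_583,
    "Journal": 15_365_925,
    "Submitted to other databases": 1_490_406,
    "Thesis": 10_159,
    "Book citation": 6_592,
    "Unpublished observations": 19,
    "Patent": 1,
}

def per_entry(total_lines, entries=total_entries):
    """Average lines per entry, formatted as in the tables (hypothetical helper)."""
    text = f"{total_lines / entries:.2f}"
    # Assumed display rule: values that round to 0.00 are shown as "<0.01".
    return "<0.01" if text == "0.00" else text

all_references = sum(reference_lines.values())
print(f"{'References (RL)':<34}{all_references:>10}{per_entry(all_references):>8}")
for subtype, count in reference_lines.items():
    print(f"   {subtype:<31}{count:>10}{per_entry(count):>8}")
```

The subtype totals sum to 39198685, the figure shown on the References (RL)
line.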


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      39569613                1.23                                                    
   CATALYTIC ACTIVITY               3273373   2985145      0.10     4                                              
   CAUTION                         15665965  15660420      0.49     1                                              
   COFACTOR                         1246082   1164224      0.04     8                                              
   DOMAIN                            122452    117484     <0.01     9                                              
   FUNCTION                         3593838   3364428      0.11     3                                              
   INTERACTION                         1065      1065     <0.01    11                                              
   MISCELLANEOUS                      86096     85999     <0.01    10                                              
   PATHWAY                          1620818   1477949      0.05     7                                              
   SIMILARITY                       9290895   8077324      0.29     2                                              
   SUBCELLULAR LOCATION             2894365   2767113      0.09     5                                              
   SUBUNIT                          1774664   1757870      0.06     6                                              

Total number of comment topics: 11


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       7688541                0.24                                                    
   CHAIN                             829719    685330      0.03     2                                              
   NON_TER                          6201265   3947756      0.19     1                                              
   SIGNAL                            656690    653409      0.02     3                                              
   TRANSIT                              867       867     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             344026934               10.70                                                    
   Allergome                           3359      2739     <0.01    84   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   102   Organism-specific databases                
   ArrayExpress                      105814    105814     <0.01    51   Gene expression databases                  
   BRENDA                              2664      2635     <0.01    87   Enzyme and pathway databases               
   Bgee                              105257    105257     <0.01    52   Gene expression databases                  
   BindingDB                           6106      6106     <0.01    79   Other                                      
   BioCyc                           3255938   3220342      0.10    21   Enzyme and pathway databases               
   CAZy                               74069     69595     <0.01    57   Protein family/group databases             
   CGD                                 7064      7064     <0.01    78   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   108   2D gel databases                           
   CTD                               327728    326349      0.01    38   Organism-specific databases                
   ChEMBL                               578       578     <0.01    94   Other                                      
   ChiTaRS                            67815     67815     <0.01    58   Other                                      
   ConoServer                           160       160     <0.01    99   Organism-specific databases                
   DIP                                 2837      2832     <0.01    85   Protein-protein interaction databases      
   DNASU                              43046     42712     <0.01    63   Protocols and materials databases          
   EMBL                            35075584  31310271      1.09     3   Sequence databases                         
   Ensembl                           958835    943099      0.03    29   Genome annotation databases                
   EnsemblBacteria                   836197    801399      0.03    30   Genome annotation databases                
   EnsemblFungi                      262752    261267      0.01    41   Genome annotation databases                
   EnsemblMetazoa                    629729    614547      0.02    32   Genome annotation databases                
   EnsemblPlants                     424987    405481      0.01    37   Genome annotation databases                
   EnsemblProtists                   126332    124845     <0.01    50   Genome annotation databases                
   EuPathDB                          147099    146647     <0.01    48   Organism-specific databases                
   EvolutionaryTrace                   8158      8158     <0.01    76   Other                                      
   FlyBase                           196582    195115      0.01    44   Organism-specific databases                
   GO                              59810013  18424070      1.86     2   Ontologies                                 
   Gene3D                          13184726  10507701      0.41     6   Family and domain databases                
   GeneID                           8941795   8725377      0.28     9   Genome annotation databases                
   GeneTree                          800543    800486      0.02    31   Phylogenomic databases                     
   Genevestigator                     87506     87500     <0.01    54   Gene expression databases                  
   GenoList                           14734     14461     <0.01    74   Organism-specific databases                
   GenomeRNAi                         20337     20337     <0.01    68   Other                                      
   GenomeReviews                    4252728   4153213      0.13    18   Genome annotation databases                
   Gramene                           198346    198346      0.01    43   Organism-specific databases                
   H-InvDB                              624       476     <0.01    93   Organism-specific databases                
   HAMAP                            3086240   3048148      0.10    22   Family and domain databases                
   HGNC                               48055     47975     <0.01    61   Organism-specific databases                
   HOGENOM                          3657046   3657002      0.11    19   Phylogenomic databases                     
   HOVERGEN                          306726    306715      0.01    39   Phylogenomic databases                     
   HSSP                              250450    250238      0.01    42   3D structure databases                     
   IPI                               301966    301370      0.01    40   Sequence databases                         
   InParanoid                        187338    187338      0.01    45   Phylogenomic databases                     
   IntAct                             16903     16903     <0.01    71   Protein-protein interaction databases      
   InterPro                        65788333  23835487      2.05     1   Family and domain databases                
   KEGG                             8258120   8055511      0.26    12   Genome annotation databases                
   KO                               3257833   3243491      0.10    20   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    81   Organism-specific databases                
   Leproma                             1272      1270     <0.01    90   Organism-specific databases                
   MEROPS                            139170    139169     <0.01    49   Protein family/group databases             
   MGI                                51927     51464     <0.01    60   Organism-specific databases                
   MINT                                8557      8557     <0.01    75   Protein-protein interaction databases      
   NextBio                           101722    101447     <0.01    53   Other                                      
   OMA                              4868308   4868118      0.15    15   Phylogenomic databases                     
   OrthoDB                           555564    555532      0.02    34   Phylogenomic databases                     
   PANTHER                          4520971   4260626      0.14    17   Family and domain databases                
   PATRIC                           8310882   8310773      0.26    11   Genome annotation databases                
   PDB                                18332     10280     <0.01    70   3D structure databases                     
   PDBsum                             18501     10315     <0.01    69   3D structure databases                     
   PIR                               174195    141356      0.01    46   Sequence databases                         
   PIRSF                            2683968   2682164      0.08    25   Family and domain databases                
   PMAP-CutDB                           213       213     <0.01    98   Other                                      
   PRIDE                             467582    467582      0.01    36   Proteomic databases                        
   PRINTS                           4654865   4153382      0.14    16   Family and domain databases                
   PROSITE                         15290136  10148819      0.48     5   Family and domain databases                
   Pathway_Interaction_DB                11         9     <0.01   107   Enzyme and pathway databases               
   PaxDb                              16791     16791     <0.01    72   Proteomic databases                        
   PeptideAtlas                         134       134     <0.01   100   Proteomic databases                        
   PeroxiBase                          2570      2562     <0.01    88   Protein family/group databases             
   Pfam                            30239425  22229416      0.94     4   Family and domain databases                
   PharmGKB                            3967      3967     <0.01    83   Organism-specific databases                
   PhosphoSite                         1145      1145     <0.01    91   PTM databases                              
   PhylomeDB                         159392    159392     <0.01    47   Phylogenomic databases                     
   PomBase                               40        27     <0.01   103   Organism-specific databases                
   PptaseDB                              37        35     <0.01   104   Protein family/group databases             
   ProDom                            600096    574584      0.02    33   Family and domain databases                
   ProMEX                              5608      5608     <0.01    80   Proteomic databases                        
   ProtClustDB                      2721229   2721217      0.08    24   Phylogenomic databases                     
   ProteinModelPortal               8350867   8350867      0.26    10   3D structure databases                     
   PseudoCAP                           4537      4531     <0.01    82   Organism-specific databases                
   REBASE                             34039     34031     <0.01    65   Protein family/group databases             
   REPRODUCTION-2DPAGE                   71        70     <0.01   101   2D gel databases                           
   RGD                                24737     24414     <0.01    67   Organism-specific databases                
   Reactome                             219       188     <0.01    97   Enzyme and pathway databases               
   RefSeq                           8974767   8728686      0.28     8   Sequence databases                         
   SABIO-RK                             518       518     <0.01    95   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   106   Organism-specific databases                
   SMART                            6825356   5171792      0.21    14   Family and domain databases                
   SMR                              1665444   1665444      0.05    27   3D structure databases                     
   STRING                           2579435   2579435      0.08    26   Protein-protein interaction databases      
   SUPFAM                          12576188  10342743      0.39     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   105   2D gel databases                           
   TAIR                               15562     15486     <0.01    73   Organism-specific databases                
   TCDB                                2393      2381     <0.01    89   Protein family/group databases             
   TIGRFAMs                         6979252   6393994      0.22    13   Family and domain databases                
   TubercuList                         2699      2690     <0.01    86   Organism-specific databases                
   UCSC                               62246     62101     <0.01    59   Genome annotation databases                
   UniGene                           538706    508420      0.02    35   Sequence databases                         
   UniPathway                       1585034   1476465      0.05    28   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    55   Genome annotation databases                
   World-2DPAGE                         673       668     <0.01    92   2D gel databases                           
   WormBase                           42241     42123     <0.01    64   Organism-specific databases                
   Xenbase                            25695     25566     <0.01    66   Organism-specific databases                
   ZFIN                               44460     44200     <0.01    62   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    77   Organism-specific databases                
   eggNOG                           2768952   2768932      0.09    23   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    56   Organism-specific databases                
   mycoCLAP                             422       422     <0.01    96   Protein family/group databases             

Number of explicitly cross-referenced databases: 129
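
The third numeric column of the table above appears consistent with the number of cross-references divided by the total number of entries in the release (32153798, from the introduction). The column header is not reproduced in this part of the file, so that reading is an assumption. The sketch below recomputes the column for a handful of rows copied from the table. The function name and the "<0.01" cutoff at 0.005 are illustrative choices, not taken from the release notes.

```python
# Total entry count as stated in the release introduction.
TOTAL_ENTRIES = 32_153_798

# Cross-reference counts copied from a few rows of the table above.
XREF_COUNTS = {
    "InterPro": 65_788_333,
    "Pfam": 30_239_425,
    "PROSITE": 15_290_136,
    "KEGG": 8_258_120,
    "InParanoid": 187_338,
    "IntAct": 16_903,
}


def average_per_entry(xrefs: int, total: int = TOTAL_ENTRIES) -> str:
    """Format cross-references per entry in the style of the table."""
    avg = xrefs / total
    # Assumption: values that would round to 0.00 are printed as "<0.01".
    return "<0.01" if avg < 0.005 else f"{avg:.2f}"


for name, xrefs in XREF_COUNTS.items():
    print(f"{name:<12}{average_per_entry(xrefs):>8}")
```

For the rows listed, the recomputed values agree with the printed column, which supports the per-entry reading without confirming it.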


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.67   Gln (Q) 3.99   Leu (L) 9.96   Ser (S) 6.61
   Arg (R) 5.43   Glu (E) 6.17   Lys (K) 5.25   Thr (T) 5.56
   Asn (N) 4.10   Gly (G) 7.09   Met (M) 2.48   Trp (W) 1.30
   Asp (D) 5.31   His (H) 2.20   Phe (F) 4.02   Tyr (Y) 3.04
   Cys (C) 1.23   Ile (I) 6.00   Pro (P) 4.66   Val (V) 6.78

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03

   

   Legend for the color-coded composition chart: gray = aliphatic,
           red = acidic, green = small hydroxy, blue = basic,
           black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
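
   The ordering in 5.2 can be reproduced from the percentages in 5.1 by sorting the twenty standard residues in descending order of frequency. The following is a minimal sketch. The dictionary name is illustrative, and the ambiguous codes (Asx, Glx, Xaa) are left out on the assumption that 5.2 covers only the standard residues.

```python
# Percent composition of the twenty standard amino acids, copied from 5.1.
COMPOSITION = {
    "Ala": 8.67, "Arg": 5.43, "Asn": 4.10, "Asp": 5.31, "Cys": 1.23,
    "Gln": 3.99, "Glu": 6.17, "Gly": 7.09, "His": 2.20, "Ile": 6.00,
    "Leu": 9.96, "Lys": 5.25, "Met": 2.48, "Phe": 4.02, "Pro": 4.66,
    "Ser": 6.61, "Thr": 5.56, "Trp": 1.30, "Tyr": 3.04, "Val": 6.78,
}

# Sort residue names from most to least frequent; no two values tie,
# so the resulting order is unambiguous.
ranked = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)

print(", ".join(ranked))
```

   Run against the values in 5.1, this prints the residues in the same order as the list in 5.2.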



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 597746
Total number of entries encoded on a Plasmid: 327333
Total number of entries encoded on a Plastid: 25354
Total number of entries encoded on a Plastid; Apicoplast: 715
Total number of entries encoded on a Plastid; Chloroplast: 218133
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 928