
         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2012_08 STATISTICS


1.  INTRODUCTION

Release 2012_08 of UniProtKB/TrEMBL (05-Sep-2012) contains 23994583 sequence entries,
comprising 7812677847 amino acids.

Since release 2012_07, 845189 sequences have been added, the sequence data of
578 existing entries have been updated, and the annotations of
4890319 entries have been revised. This represents an increase of 3%.

Number of fragments: 3438105

Protein existence (PE):              entries      %
1: Evidence at protein level           13619     0.06%
2: Evidence at transcript level       609496     2.54%
3: Inferred from homology            5413442    22.56%
4: Predicted                        17958026    74.84%
5: Uncertain                               0     0.00%
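As a quick consistency check, the five PE counts should add up to the 23994583 entries of the release (each UniProtKB entry carries a single PE level), and dividing each count by that total reproduces the percentage column. A small Python sketch; the dictionary and function names are illustrative only:

```python
# Protein existence (PE) counts for release 2012_08, copied from the table above.
TOTAL_ENTRIES = 23994583
PE_COUNTS = {
    "1: Evidence at protein level": 13619,
    "2: Evidence at transcript level": 609496,
    "3: Inferred from homology": 5413442,
    "4: Predicted": 17958026,
    "5: Uncertain": 0,
}

def pe_percentages(counts, total):
    """Return each PE level's share of all entries, rounded to two decimals."""
    return {level: round(100 * n / total, 2) for level, n in counts.items()}

# Every entry has exactly one PE level, so the counts should sum to the total.
assert sum(PE_COUNTS.values()) == TOTAL_ENTRIES

for level, pct in pe_percentages(PE_COUNTS, TOTAL_ENTRIES).items():
    print(f"{level:<35}{pct:6.2f}%")
```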

The growth of the database is summarized below.

   



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 362185

   The twenty most represented species account for 1596818 sequences, 6.7% of
   the total number of entries.


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:151435
                            2x:61722
                            3x:33439
                            4x:21281
                            5x:13743
                            6x: 9949
                            7x: 7606
                            8x: 5801
                            9x: 4680
                           10x: 9102
                       11- 20x:24168
                       21- 50x: 8377
                       51-100x: 3185
                         >100x: 7697
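Table 2.1 groups species by how many entries each one has. The sketch below shows one way such a table could be derived, assuming a flat list with one species name per entry; the input format and the function names are hypothetical, not UniProt's own pipeline, and the class labels simply mirror the table's rows.

```python
from collections import Counter

def frequency_class(n):
    """Map a species' entry count to the class labels used in table 2.1."""
    if n < 1:
        raise ValueError("a represented species has at least one entry")
    if n <= 10:
        return f"{n}x"          # classes 1x .. 10x are exact counts
    if n <= 20:
        return "11-20x"
    if n <= 50:
        return "21-50x"
    if n <= 100:
        return "51-100x"
    return ">100x"

def frequency_table(species_of_entries):
    """Count entries per species, then count species per frequency class."""
    per_species = Counter(species_of_entries)
    return Counter(frequency_class(n) for n in per_species.values())

# Hypothetical toy input: one species name per entry.
entries = ["A"] * 1 + ["B"] * 2 + ["C"] * 2 + ["D"] * 15 + ["E"] * 120
print(frequency_table(entries))
```

Summing every class of the resulting table gives the number of distinct species, which is how the rows of 2.1 relate to the species total quoted at the start of section 2.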


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     471654  Human immunodeficiency virus 1
       2     111086  Homo sapiens (Human)
       3      97002  Oryza sativa subsp. japonica (Rice)
       4      75518  uncultured bacterium
       5      71681  Hepatitis C virus
       6      68901  Macaca mulatta (Rhesus macaque)
       7      61391  Mus musculus (Mouse)
       8      61192  Glycine max (Soybean) (Glycine hispida)
       9      56103  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      10      54440  Danio rerio (Zebrafish) (Brachydanio rerio)
      11      54062  Vitis vinifera (Grape)
      12      53012  Hepatitis B virus (HBV)
      13      50556  Trichomonas vaginalis
      14      49226  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      15      48866  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      16      44070  Populus trichocarpa (Western balsam poplar) 
      17      43131  Callithrix jacchus (White-tufted-ear marmoset)
      18      42983  Arabidopsis thaliana (Mouse-ear cress)
      19      42094  Zea mays (Maize)
      20      39850  Paramecium tetraurelia
      21      39601  Oryza sativa subsp. indica (Rice)
      22      35599  Ailuropoda melanoleuca (Giant panda)
      23      34801  Physcomitrella patens subsp. patens (Moss)
      24      33915  Drosophila melanogaster (Fruit fly)
      25      33909  Rattus norvegicus (Rat)
      26      33756  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      27      33258  Selaginella moellendorffii (Spikemoss)
      28      33069  Sus scrofa (Pig)
      29      32922  Monodelphis domestica (Gray short-tailed opossum)
      30      32680  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      31      32093  Oryza glaberrima (African rice)
      32      32033  Caenorhabditis remanei (Caenorhabditis vulgaris)
      33      31388  Ricinus communis (Castor bean)
      34      30853  Daphnia pulex (Water flea)
      35      30300  Caenorhabditis brenneri (Nematode worm)
      36      30144  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      37      29816  Amphimedon queenslandica (Sponge)
      38      29445  Strongylocentrotus purpuratus (Purple sea urchin)
      39      29315  Pristionchus pacificus
      40      29166  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      41      29026  Oikopleura dioica (Tunicate)
      42      28861  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      43      28037  Gasterosteus aculeatus (Three-spined stickleback)
      44      28016  Bos taurus (Bovine)
      45      27895  Simian immunodeficiency virus (SIV)
      46      27885  Canis familiaris (Dog) (Canis lupus familiaris)
      47      27427  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      48      27086  Gorilla gorilla gorilla (Lowland gorilla)
      49      26871  Ornithorhynchus anatinus (Duckbill platypus)
      50      26726  Gallus gallus (Chicken)
      51      25888  Oryzias latipes (Medaka fish) (Japanese ricefish)
      52      25755  Loxodonta africana (African elephant)
      53      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      54      25438  Caenorhabditis japonica
      55      25066  Oryctolagus cuniculus (Rabbit)
      56      24843  Nematostella vectensis (Starlet sea anemone)
      57      24658  Escherichia coli
      58      24199  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      59      24158  Pongo abelii (Sumatran orangutan)
      60      24026  Equus caballus (Horse)
      61      23322  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      62      23220  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      63      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      64      22835  Pan troglodytes (Chimpanzee)
      65      22520  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      66      22405  Caenorhabditis elegans
      67      21821  Latimeria chalumnae (West Indian ocean coelacanth)
      68      21681  Hordeum vulgare var. distichum (Two-rowed barley)
      69      21546  Heterocephalus glaber (Naked mole rat)
      70      21339  Caenorhabditis briggsae
      71      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
      72      20853  Myotis lucifugus (Little brown bat)
      73      20128  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      74      20113  Ciona savignyi (Pacific transparent sea squirt)
      75      20053  Cavia porcellus (Guinea pig)
      76      19972  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
      77      19654  Taeniopygia guttata (Zebra finch) (Poephila guttata)
      78      19246  Toxoplasma gondii
      79      19201  Trypanosoma cruzi (strain CL Brener)
      80      19151  Anolis carolinensis (Green anole) (American chameleon)
      81      19032  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
      82      18912  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      83      18771  mine drainage metagenome
      84      18632  Drosophila simulans (Fruit fly)
      85      18121  Atta cephalotes (Leafcutter ant)
      86      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
      87      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
      88      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
      89      17351  Bombyx mori (Silk moth)
      90      17031  Drosophila yakuba (Fruit fly)
      91      16999  Tribolium castaneum (Red flour beetle)
      92      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
      93      16856  Meleagris gallopavo (Common turkey)
      94      16769  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
      95      16712  Drosophila persimilis (Fruit fly)
      96      16451  Drosophila pseudoobscura pseudoobscura (Fruit fly)
      97      16425  Ectocarpus siliculosus (Brown alga)
      98      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
      99      16306  Loa loa (Eye worm) (Filaria loa)
     100      16303  Danaus plexippus (Monarch butterfly)
     101      16263  Trichinella spiralis (Trichina worm)
     102      16239  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
     103      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     104      16234  Colletotrichum higginsianum
     105      16190  Drosophila sechellia (Fruit fly)
     106      15794  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     107      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     108      15729  Hepatitis C virus subtype 1b
     109      15715  Naegleria gruberi (Amoeba)
     110      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     111      15620  Anopheles gambiae (African malaria mosquito)
     112      15557  Phytophthora ramorum (Sudden oak death agent)
     113      15418  Drosophila willistoni (Fruit fly)
     114      15254  Plasmodium falciparum
     115      15230  Tetrahymena thermophila (strain SB210)
     116      15142  Drosophila ananassae (Fruit fly)
     117      15036  Harpegnathos saltator (Jerdon's jumping ant)
     118      15004  Hepatitis C virus subtype 1a
     119      14922  Drosophila erecta (Fruit fly)
     120      14848  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     121      14796  Camponotus floridanus (Florida carpenter ant)
     122      14788  Drosophila mojavensis (Fruit fly)
     123      14700  Drosophila virilis (Fruit fly)
     124      14697  Plasmodium chabaudi
     125      14649  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     126      14417  Volvox carteri (Green alga)
     127      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     128      14336  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     129      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     130      13966  Acromyrmex echinatior (Panamanian leafcutter ant) 
     131      13863  Clonorchis sinensis (Chinese liver fluke)
     132      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     133      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     134      13329  Aspergillus flavus 
     135      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     136      13175  Mustela putorius furo (European domestic ferret) (Mustela furo)
     137      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     138      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     139      12983  Albugo laibachii Nc14
     140      12950  Stigmatella aurantiaca (strain DW4/3-1)
     141      12936  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     142      12935  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     143      12919  Trypanosoma cruzi
     144      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     145      12696  Trypanosoma congolense (strain IL3000)
     146      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     147      12600  Schistosoma mansoni (Blood fluke)
     148      12600  Xenopus laevis (African clawed frog)
     149      12547  Ralstonia solanacearum (Pseudomonas solanacearum)
     150      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     151      12440  Polysphondylium pallidum (Cellular slime mold)
     152      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     153      12352  Dictyostelium purpureum (Slime mold)
     154      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     155      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     156      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     157      11947  Emericella nidulans  
     158      11899  Apis mellifera (Honeybee)
     159      11853  Rabies virus
     160      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     161      11780  Piriformospora indica (strain DSM 11827)
     162      11715  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     163      11714  Helicobacter pylori (Campylobacter pylori)
     164      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp.)
     165      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     166      11669  Anopheles darlingi (Mosquito)
     167      11644  Plasmodium berghei (strain Anka)
     168      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     169      11562  Trichoplax adhaerens (Trichoplax reptans)
     170      11557  Trypanosoma vivax (strain Y486)
     171      11514  Aureococcus anophagefferens (Harmful bloom alga)
     172      11499  Brugia malayi (Filarial nematode worm)
     173      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     174      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     175      11314  Porcine reproductive and respiratory syndrome virus (PRRSV)
     176      11280  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     177      11211  Ktedonobacter racemifer DSM 44963
     178      11179  uncultured archaeon
     179      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     180      11042  Schistosoma japonicum (Blood fluke)
     181      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     182      10964  Streptomyces clavuligerus 
     183      10949  Aspergillus niger 
     184      10839  Pediculus humanus subsp. corporis (Body louse)
     185      10822  Chaetomium globosum  
     186      10570  Metarhizium robertsii (strain ARSEF 23 / ATCC MYA-3075) (Metarhizium anisopliae)
     187      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     188      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     189      10387  Pseudomonas syringae pv. glycinea str. race 4
     190      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     191      10377  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     192      10354  Phaeodactylum tricornutum (strain CCAP 1055/1)
     193      10273  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
     194      10221  Shigella flexneri 1235-66
     195      10216  Burkholderia terrae BS001
     196      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136) 
     197      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
     198      10171  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     199      10109  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
     200      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
     201      10087  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
     202      10051  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
     203      10013  Streptomyces bingchenggensis (strain BCW-1)
     204       9846  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
     205       9836  Chlorella variabilis (Green alga)
     206       9822  Metarhizium acridum (strain CQMa 102)
     207       9799  Coccomyxa subellipsoidea C-169
     208       9760  Thielavia terrestris (strain ATCC 38088 / NRRL 8126) (Acremonium alabamense)
     209       9703  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
     210       9692  Klebsiella pneumoniae
     211       9662  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
     212       9651  Cordyceps militaris (strain CM01) (Caterpillar fungus)
     213       9597  Streptomyces cattleya 
     214       9551  Amycolatopsis mediterranei S699
     215       9533  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
     216       9510  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
     217       9485  Ajellomyces dermatitidis (strain ER-3 / ATCC MYA-2586) 
     218       9482  Salmo salar (Atlantic salmon)
     219       9443  Ajellomyces capsulata (strain H88) (Darling's disease fungus) 
     220       9391  Exophiala dermatitidis (strain ATCC 34100 / CBS 525.76 / NIH/UT8656)  
     221       9236  Monosiga brevicollis (Choanoflagellate)
     222       9201  Amycolatopsis mediterranei (strain U-32)
     223       9197  Streptomyces himastatinicus ATCC 53653
     224       9154  Ajellomyces capsulata (strain G186AR / H82 / ATCC MYA-2454 / RMSCC 2432)  
     225       9146  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
     226       9139  Pseudomonas syringae pv. pisi str. 1704B
     227       9113  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
     228       9112  Hypocrea jecorina (strain QM6a) (Trichoderma reesei)
     229       9083  Thielavia heterothallica (strain ATCC 42464 / BCRC 31852 / DSM 1799) 
     230       9076  Saccharomyces cerevisiae x Saccharomyces kudriavzevii VIN7
     231       9064  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
     232       9046  Streptomyces hygroscopicus subsp. jinggangensis (strain 5008)
     233       9008  Neurospora crassa 
     234       8991  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
     235       8988  Dictyostelium discoideum (Slime mold)
     236       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
     237       8955  Rhodococcus opacus M213
     238       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
     239       8941  Streptomyces violaceusniger Tu 4113
     240       8940  Burkholderia sp. TJI49
     241       8900  Catenulispora acidiphila 
     242       8859  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
     243       8849  Pichia sorbitophila  
     244       8796  Aspergillus clavatus 
     245       8794  Bradyrhizobium japonicum USDA 6
     246       8783  Pseudomonas syringae pv. japonica str. M301072
     247       8776  uncultured crenarchaeote
     248       8755  Rhodococcus sp. (strain RHA1)
     249       8738  Trypanosoma brucei brucei (strain 927/4 GUTat10.1)
     250       8705  Trichophyton rubrum (strain ATCC MYA-4607 / CBS 118892) (Athlete's foot fungus)
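The figure quoted at the start of section 2 (the top twenty species account for 1596818 sequences, 6.7% of all entries) can be re-derived from ranks 1 to 20 of this table. A minimal check in Python, with the frequencies copied by hand from the table:

```python
TOTAL_ENTRIES = 23994583  # entries in release 2012_08 (section 1)

# Frequencies of the twenty most represented species (ranks 1-20 in table 2.2).
TOP_20 = [
    471654, 111086, 97002, 75518, 71681, 68901, 61391, 61192, 56103, 54440,
    54062, 53012, 50556, 49226, 48866, 44070, 43131, 42983, 42094, 39850,
]

top_sum = sum(TOP_20)
share = 100 * top_sum / TOTAL_ENTRIES
print(f"{top_sum} sequences, {share:.1f}% of all entries")
```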


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          374730 (  2%)
    Bacteria       15952120 ( 66%)
    Eukaryota       6189838 ( 26%)
    Viruses         1436178 (  6%)
    Other             41716 ( <1%)



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 111122 (  2%)           ( <1%)
     Other Mammalia        849095 ( 14%)           (  4%)
     Other Vertebrata      693072 ( 11%)           (  3%)
     Viridiplantae        1171886 ( 19%)           (  5%)
     Fungi                1316987 ( 21%)           (  5%)
     Insecta               709377 ( 11%)           (  3%)
     Nematoda              224115 (  4%)           (  1%)
     Other                1114184 ( 18%)           (  5%)
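Each Eukaryota row carries two percentages: one relative to the 6189838 eukaryotic sequences and one relative to all 23994583 entries. The Python sketch below computes both. Printing non-zero shares that round to 0 as "<1%" follows the kingdom table's convention and is an assumption about how the report formats such values; `pct` is a hypothetical helper name.

```python
TOTAL_ENTRIES = 23994583  # entries in release 2012_08 (section 1)
EUKARYOTA = {
    "Human": 111122,
    "Other Mammalia": 849095,
    "Other Vertebrata": 693072,
    "Viridiplantae": 1171886,
    "Fungi": 1316987,
    "Insecta": 709377,
    "Nematoda": 224115,
    "Other": 1114184,
}

def pct(part, whole):
    """Whole-number percentage; a non-zero share that rounds to 0 prints as '<1%'."""
    value = round(100 * part / whole)
    if value == 0 and part > 0:
        return "<1%"
    return f"{value}%"

# The categories partition Eukaryota, so their sum is the Eukaryota row of the kingdom table.
eukaryota_total = sum(EUKARYOTA.values())
for category, n in EUKARYOTA.items():
    print(f"{category:<18}{n:>9}  {pct(n, eukaryota_total):>4}  {pct(n, TOTAL_ENTRIES):>4}")
```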



3.  SEQUENCE SIZE

   Distribution of the sequences by length in amino acids (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  578487             1001-1100   141565
                 51- 100 2000928             1101-1200   100139
                101- 150 2245006             1201-1300    70411
                151- 200 2177124             1301-1400    45399
                201- 250 2194475             1401-1500    36626
                251- 300 2123212             1501-1600    25726
                301- 350 1934342             1601-1700    19480
                351- 400 1473688             1701-1800    14968
                401- 450 1265847             1801-1900    12487
                451- 500 1045817             1901-2000    10642
                501- 550  699699             2001-2100     8461
                551- 600  540244             2101-2200     8469
                601- 650  394203             2201-2300     6733
                651- 700  309202             2301-2400     5296
                701- 750  263688             2401-2500     4546
                751- 800  234789             >2500        37169
                801- 850  177717
                851- 900  159140
                901- 950  109561
                951-1000   81192
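The length bins above are 50 residues wide up to 1000, 100 residues wide from 1001 to 2500, and open-ended beyond that. A hypothetical helper that assigns a sequence length to its bin, as a sketch of how such a distribution could be tallied; fragments would have to be filtered out beforehand, as the table does.

```python
from collections import Counter

def size_class(length):
    """Return the (From, To) bin of the size table for a length, or '>2500'."""
    if length < 1:
        raise ValueError("sequence length must be positive")
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100   # 50-residue bins up to 1000, then 100
    upper = -(-length // width) * width      # ceiling to the bin's upper bound
    return (upper - width + 1, upper)

def size_distribution(lengths):
    """Tally lengths (fragments already removed) into size bins."""
    return Counter(size_class(n) for n in lengths)

print(size_class(325))
print(size_distribution([1, 50, 51, 1000, 1001, 36805]))
```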

   


   The average sequence length in UniProtKB/TrEMBL is   325 amino acids.

   The shortest sequence is G0XMK1_9MYRT:     1 amino acid.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
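The average can be cross-checked against section 1: 7812677847 amino acids over 23994583 entries gives about 325.6. The reported value of 325 matches the integer part, which suggests (but does not confirm) that the figure is truncated rather than rounded and is taken over all entries, fragments included.

```python
TOTAL_RESIDUES = 7812677847   # amino acids in release 2012_08 (section 1)
TOTAL_ENTRIES = 23994583      # sequence entries in release 2012_08 (section 1)

mean_length = TOTAL_RESIDUES / TOTAL_ENTRIES   # exact mean, about 325.6
reported = TOTAL_RESIDUES // TOTAL_ENTRIES     # integer part, assumed to be what is printed

print(f"mean length: {mean_length:.1f} aa, integer part: {reported} aa")
```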



4.  STATISTICS FOR SOME LINE TYPES

The following tables summarize, for some UniProtKB/TrEMBL line types, the total
number of lines, the number of entries with at least one such line, and the
average number of such lines per entry (averaged over all entries in the release).

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    29329426                1.22                                                    
   Submitted to EMBL/GenBank/DDBJ  16018177  14802304      0.67                                                    
   Journal                         12040856  11294816      0.50                                                    
   Submitted to other databases     1254114   1249669      0.05                                                    
   Thesis                              9787      9729     <0.01                                                    
   Book citation                       6466      6417     <0.01                                                    
   Unpublished observations              25        25     <0.01                                                    
   Patent                                 1         1     <0.01                                                    
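The "Average per entry" column is the total number of lines divided by all 23994583 entries of the release, not by the number of entries carrying the line (for example, 29329426 / 23994583 is about 1.22). The sketch below reproduces the reference rows; the rule that averages below 0.005 print as "<0.01" is inferred from the tables in this section, not documented, and `per_entry` is a hypothetical name.

```python
TOTAL_ENTRIES = 23994583  # entries in release 2012_08 (section 1)
REFERENCE_LINES = {
    "Submitted to EMBL/GenBank/DDBJ": 16018177,
    "Journal": 12040856,
    "Submitted to other databases": 1254114,
    "Thesis": 9787,
    "Book citation": 6466,
    "Unpublished observations": 25,
    "Patent": 1,
}

def per_entry(total_lines, entries=TOTAL_ENTRIES):
    """Average per entry: '<0.01' if it would round to 0.00, else two decimals."""
    avg = total_lines / entries
    return "<0.01" if avg < 0.005 else f"{avg:.2f}"

all_refs = sum(REFERENCE_LINES.values())  # the subtypes add up to the RL total
print("References (RL)", all_refs, per_entry(all_refs))
for subtype, n in REFERENCE_LINES.items():
    print(f"   {subtype:<32}{n:>9}  {per_entry(n):>6}")
```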

Total number of distinct authors cited in UniProtKB/TrEMBL: 441349


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      26596179                1.11                                                    
   CATALYTIC ACTIVITY               2378656   2187428      0.10     4                                              
   CAUTION                          9180854   9180833      0.38     1                                              
   COFACTOR                          855907    801218      0.04     8                                              
   DOMAIN                             80705     76919     <0.01     9                                              
   FUNCTION                         2623473   2445486      0.11     3                                              
   INTERACTION                          681       681     <0.01    11                                              
   MISCELLANEOUS                      45983     45891     <0.01    10                                              
   PATHWAY                          1163627   1060602      0.05     7                                              
   SIMILARITY                       6979428   6081534      0.29     2                                              
   SUBCELLULAR LOCATION             2060212   1973427      0.09     5                                              
   SUBUNIT                          1226653   1224648      0.05     6                                              
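The Rank column orders the subtypes by total number of lines, largest first. A minimal Python sketch that reproduces the comment-topic ranks; breaking ties alphabetically is an assumption, since no ties occur in this table.

```python
COMMENT_TOTALS = {
    "CATALYTIC ACTIVITY": 2378656,
    "CAUTION": 9180854,
    "COFACTOR": 855907,
    "DOMAIN": 80705,
    "FUNCTION": 2623473,
    "INTERACTION": 681,
    "MISCELLANEOUS": 45983,
    "PATHWAY": 1163627,
    "SIMILARITY": 6979428,
    "SUBCELLULAR LOCATION": 2060212,
    "SUBUNIT": 1226653,
}

def rank_by_total(totals):
    """Rank 1 = largest total number of lines; ties broken alphabetically (assumed)."""
    ordered = sorted(totals, key=lambda name: (-totals[name], name))
    return {name: rank for rank, name in enumerate(ordered, start=1)}

ranks = rank_by_total(COMMENT_TOTALS)
for name in sorted(COMMENT_TOTALS):  # print in the table's alphabetical order
    print(f"{name:<22}{COMMENT_TOTALS[name]:>9}{ranks[name]:>5}")
```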

Total number of comment topics: 11


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       6610922                0.28                                                    
   CHAIN                             659778    532614      0.03     2                                              
   NON_TER                          5463439   3438661      0.23     1                                              
   SIGNAL                            486945    485188      0.02     3                                              
   TRANSIT                              760       760     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             267592413               11.15                                                    
   AGD                                 2525      2525     <0.01    84   Organism-specific databases                
   ANU-2DPAGE                            52        52     <0.01    99   2D gel databases                           
   Allergome                           2832      2222     <0.01    80   Protein family/group databases             
   ArachnoServer                         66        66     <0.01    98   Organism-specific databases                
   ArrayExpress                       87540     87538     <0.01    52   Gene expression databases                  
   BRENDA                              2701      2670     <0.01    81   Enzyme and pathway databases               
   Bgee                              140672    140597      0.01    47   Gene expression databases                  
   BioCyc                            671014    656577      0.03    31   Enzyme and pathway databases               
   CAZy                               74165     69685     <0.01    55   Protein family/group databases             
   CGD                                 7083      7083     <0.01    76   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     5         5     <0.01   105   2D gel databases                           
   CTD                               310074    308361      0.01    38   Organism-specific databases                
   CYGD                                   2         2     <0.01   107   Organism-specific databases                
   ConoServer                           160       160     <0.01    94   Organism-specific databases                
   DIP                                 2687      2682     <0.01    82   Protein-protein interaction databases      
   DNASU                              44138     43814     <0.01    61   Protocols and materials databases          
   EMBL                            26522459  23301458      1.11     3   Sequence databases                         
   Ensembl                           964731    947862      0.04    28   Genome annotation databases                
   EnsemblBacteria                   835273    801157      0.03    30   Genome annotation databases                
   EnsemblFungi                      224058    223679      0.01    42   Genome annotation databases                
   EnsemblMetazoa                    499295    491771      0.02    34   Genome annotation databases                
   EnsemblPlants                     296093    287989      0.01    39   Genome annotation databases                
   EnsemblProtists                   115275    114213     <0.01    49   Genome annotation databases                
   EuPathDB                          178972    178971      0.01    45   Organism-specific databases                
   EvolutionaryTrace                   8210      8210     <0.01    74   Other                                      
   FlyBase                           195215    193670      0.01    43   Organism-specific databases                
   GO                              45517372  14466143      1.90     2   Ontologies                                 
   Gene3D                           9820717   7823162      0.41     6   Family and domain databases                
   GeneID                           8121744   7951090      0.34    10   Genome annotation databases                
   GeneTree                          893871    893809      0.04    29   Phylogenomic databases                     
   Genevestigator                     94320     94313     <0.01    51   Gene expression databases                  
   GenoList                           14735     14462     <0.01    72   Organism-specific databases                
   GenomeRNAi                         22196     22196     <0.01    67   Other                                      
   GenomeReviews                    4253282   4154498      0.18    15   Genome annotation databases                
   Gramene                            67670     67670     <0.01    56   Organism-specific databases                
   H-InvDB                              634       485     <0.01    90   Organism-specific databases                
   HAMAP                            2026116   2005211      0.08    24   Family and domain databases                
   HGNC                               48455     48363     <0.01    59   Organism-specific databases                
   HOGENOM                          3660720   3660698      0.15    17   Phylogenomic databases                     
   HOVERGEN                          312431    312423      0.01    37   Phylogenomic databases                     
   HSSP                              250989    250763      0.01    41   3D structure databases                     
   IPI                               321116    320959      0.01    36   Sequence databases                         
   InParanoid                        190230    190209      0.01    44   Phylogenomic databases                     
   IntAct                             16853     16852     <0.01    70   Protein-protein interaction databases      
   InterPro                        49092950  17669140      2.05     1   Family and domain databases                
   KEGG                             6543999   6433191      0.27    12   Genome annotation databases                
   KO                               2522116   2511223      0.11    23   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    77   Organism-specific databases                
   Leproma                             1272      1270     <0.01    88   Organism-specific databases                
   MEROPS                             62637     62635     <0.01    58   Protein family/group databases             
   MGI                                37531     37529     <0.01    63   Organism-specific databases                
   MINT                                8614      8614     <0.01    73   Protein-protein interaction databases      
   NextBio                           105833    105832     <0.01    50   Other                                      
   OMA                              3905493   3905460      0.16    16   Phylogenomic databases                     
   OrthoDB                           567380    567378      0.02    32   Phylogenomic databases                     
   PANTHER                          3422886   3234937      0.14    19   Family and domain databases                
   PATRIC                           8342487   8342399      0.35     8   Genome annotation databases                
   PDB                                17177      9772     <0.01    68   3D structure databases                     
   PDBsum                             17077      9679     <0.01    69   3D structure databases                     
   PHCI-2DPAGE                           99        99     <0.01    96   2D gel databases                           
   PIR                               174063    141216      0.01    46   Sequence databases                         
   PIRSF                            1706464   1706028      0.07    25   Family and domain databases                
   PMAP-CutDB                           220       220     <0.01    92   Other                                      
   PMMA-2DPAGE                            2         2     <0.01   106   2D gel databases                           
   PRIDE                             255384    255384      0.01    40   Proteomic databases                        
   PRINTS                           3617689   3203250      0.15    18   Family and domain databases                
   PROSITE                         11589156   7621233      0.48     5   Family and domain databases                
   Pathway_Interaction_DB                11         9     <0.01   104   Enzyme and pathway databases               
   PeptideAtlas                         144       144     <0.01    95   Proteomic databases                        
   PeroxiBase                          2551      2543     <0.01    83   Protein family/group databases             
   Pfam                            22153435  16346289      0.92     4   Family and domain databases                
   PharmGKB                            4691      4691     <0.01    78   Organism-specific databases                
   PhosphoSite                         1537      1537     <0.01    87   PTM databases                              
   PhylomeDB                         124017    124017      0.01    48   Phylogenomic databases                     
   PomBase                               40        27     <0.01   100   Organism-specific databases                
   PptaseDB                              36        34     <0.01   101   Protein family/group databases             
   ProDom                            437474    416072      0.02    35   Family and domain databases                
   ProMEX                               281       281     <0.01    91   Proteomic databases                        
   ProtClustDB                      2721803   2721803      0.11    21   Phylogenomic databases                     
   ProteinModelPortal               6774997   6774991      0.28    11   3D structure databases                     
   PseudoCAP                           4550      4544     <0.01    79   Organism-specific databases                
   REBASE                             28482     28479     <0.01    64   Protein family/group databases             
   REPRODUCTION-2DPAGE                   86        85     <0.01    97   2D gel databases                           
   RGD                                24920     24614     <0.01    66   Organism-specific databases                
   Reactome                             218       181     <0.01    93   Enzyme and pathway databases               
   RefSeq                           8148664   7952673      0.34     9   Sequence databases                         
   SGD                                   11        11     <0.01   103   Organism-specific databases                
   SMART                            5195236   3926197      0.22    13   Family and domain databases                
   SMR                              1445927   1445927      0.06    26   3D structure databases                     
   STRING                           2593585   2593582      0.11    22   Protein-protein interaction databases      
   SUPFAM                           9486040   7794857      0.40     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   102   2D gel databases                           
   Siena-2DPAGE                           2         2     <0.01   108   2D gel databases                           
   TAIR                               16108     16029     <0.01    71   Organism-specific databases                
   TCDB                                2408      2396     <0.01    85   Protein family/group databases             
   TIGRFAMs                         4879934   4451828      0.20    14   Family and domain databases                
   TubercuList                         2052      2047     <0.01    86   Organism-specific databases                
   UCSC                               65032     65031     <0.01    57   Genome annotation databases                
   UniGene                           533676    501284      0.02    33   Sequence databases                         
   UniPathway                       1073173    999402      0.04    27   Enzyme and pathway databases               
   VectorBase                         78371     77856     <0.01    53   Genome annotation databases                
   World-2DPAGE                         936       931     <0.01    89   2D gel databases                           
   WormBase                           42290     42171     <0.01    62   Organism-specific databases                
   Xenbase                            25668     25593     <0.01    65   Organism-specific databases                
   ZFIN                               45479     44953     <0.01    60   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    75   Organism-specific databases                
   eggNOG                           2780842   2780841      0.12    20   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    54   Organism-specific databases                

Number of explicitly cross-referenced databases: 135
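
The two derived columns of the table above can be reproduced from the cross-reference counts. In the rows checked here, the fourth column equals the number of cross-references divided by the 23994583 entries of this release, shown as <0.01 when it rounds to zero. The fifth column ranks databases by descending cross-reference count. These column meanings are inferred from the numbers and are not stated in this excerpt. The Python sketch below checks a few rows copied from the table; the helpers parse_row and format_ratio are hypothetical.

```python
# Sketch: recompute two derived columns of the cross-reference table.
# ASSUMPTION (inferred from the numbers, not stated in the release notes):
#   column 4 = cross-references / total entries, "<0.01" if it rounds to 0
#   column 5 = rank by descending number of cross-references

TOTAL_ENTRIES = 23994583  # entries in UniProtKB/TrEMBL release 2012_08

# A few rows copied from the table.
ROWS = """\
InterPro                        49092950  17669140      2.05     1   Family and domain databases
Pfam                            22153435  16346289      0.92     4   Family and domain databases
PROSITE                         11589156   7621233      0.48     5   Family and domain databases
PhylomeDB                         124017    124017      0.01    48   Phylogenomic databases
NextBio                           105833    105832     <0.01    50   Other
IntAct                             16853     16852     <0.01    70   Protein-protein interaction databases
"""


def parse_row(line: str) -> dict:
    """Split one table row; the category (last field) may contain spaces."""
    name, xrefs, entries, ratio, rank, category = line.split(maxsplit=5)
    return {
        "name": name,
        "xrefs": int(xrefs),
        "entries": int(entries),
        "ratio": ratio,
        "rank": int(rank),
        "category": category.strip(),
    }


def format_ratio(xrefs: int, total: int = TOTAL_ENTRIES) -> str:
    """Cross-references per database entry, rounded to two decimals."""
    text = f"{xrefs / total:.2f}"
    return "<0.01" if text == "0.00" else text


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

for row in rows:
    print(f"{row['name']:<10} printed={row['ratio']:>6} "
          f"recomputed={format_ratio(row['xrefs']):>6}")

# Sorting by descending cross-reference count should reproduce the rank order.
ranks = [row["rank"] for row in sorted(rows, key=lambda r: r["xrefs"], reverse=True)]
print(ranks)
```

If the inferred meaning is right, every recomputed value matches the printed one and the ranks come out in ascending order.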


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.61   Gln (Q) 3.94   Leu (L) 9.89   Ser (S) 6.70
   Arg (R) 5.46   Glu (E) 6.19   Lys (K) 5.27   Thr (T) 5.58
   Asn (N) 4.10   Gly (G) 7.08   Met (M) 2.46   Trp (W) 1.30
   Asp (D) 5.32   His (H) 2.21   Phe (F) 4.01   Tyr (Y) 3.03
   Cys (C) 1.28   Ile (I) 5.95   Pro (P) 4.73   Val (V) 6.76

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03

   [Chart of the amino acid composition not reproduced. Legend: gray = aliphatic,
   red = acidic, green = small hydroxy, blue = basic, black = aromatic,
   white = amide, yellow = sulfur]
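
The release notes do not say how these percentages are computed. A minimal sketch follows. It assumes that residues are counted over all sequences in the database, that the totals are divided by the total number of residues, and that the ambiguity codes B, Z and X are counted as ordinary letters. The helper composition_percent and the two short example sequences are made up for illustration.

```python
from collections import Counter


def composition_percent(sequences):
    """Percentage of each residue letter, pooled over all sequences (assumed method)."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())  # B, Z and X are counted like any other letter
    total = sum(counts.values())
    return {aa: 100 * n / total for aa, n in sorted(counts.items())}


# Two invented 10-residue sequences, 20 residues in total.
example = ["MKTAYIAKQR", "MALWMRLLPL"]
comp = composition_percent(example)
print(comp["L"], comp["A"], comp["M"])  # 20.0 15.0 15.0
```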


   5.2  Classification of the amino acids by their frequency (most to least frequent)

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
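
This ordering follows directly from the percentages in 5.1. The short Python check below sorts those values in decreasing order and reproduces the list. The dictionary name COMPOSITION is illustrative.

```python
# Percentages copied from table 5.1 (ambiguity codes B, Z, X omitted).
COMPOSITION = {
    "Ala": 8.61, "Gln": 3.94, "Leu": 9.89, "Ser": 6.70,
    "Arg": 5.46, "Glu": 6.19, "Lys": 5.27, "Thr": 5.58,
    "Asn": 4.10, "Gly": 7.08, "Met": 2.46, "Trp": 1.30,
    "Asp": 5.32, "His": 2.21, "Phe": 4.01, "Tyr": 3.03,
    "Cys": 1.28, "Ile": 5.95, "Pro": 4.73, "Val": 6.76,
}

# Sort residue names by their percentage, largest first.
order = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(order))
# Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe, Gln, Tyr, Met, His, Trp, Cys
```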



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 538011
Total number of entries encoded on a Plasmid: 293737
Total number of entries encoded on a Plastid: 20840
Total number of entries encoded on a Plastid; Apicoplast: 660
Total number of entries encoded on a Plastid; Chloroplast: 196845
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 892
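
These categories appear to correspond to the values of the organelle/plasmid (OG) line in UniProtKB flat-file entries, such as "Mitochondrion.", "Plasmid <name>." and "Plastid; Chloroplast.". The release notes do not say how the counts were actually tallied. The sketch below makes three assumptions: each entry falls into one category, all plasmid names are collapsed into "Plasmid", and entries without an OG line are not counted. The entries and the helpers classify_og and organelle_counts are hypothetical.

```python
from collections import Counter


def classify_og(og_text: str) -> str:
    """Map the text of an entry's OG line(s) to a statistics category (assumed rule)."""
    text = og_text.strip().rstrip(".")
    if text.startswith("Plasmid"):
        return "Plasmid"  # plasmid names vary, so they are collapsed into one bucket
    return text  # e.g. "Mitochondrion" or "Plastid; Chloroplast"


def organelle_counts(lines):
    """Count entries per OG category; '//' terminates each flat-file entry."""
    counts = Counter()
    og_parts = []
    for line in lines:
        if line.startswith("OG   "):
            og_parts.append(line[5:].strip())  # OG lines may continue over several lines
        elif line.startswith("//"):
            if og_parts:
                counts[classify_og(" ".join(og_parts))] += 1
            og_parts = []
    return counts


# Made-up entries; only the ID, OG and terminator lines are shown.
SAMPLE = """\
ID   TEST1_HUMAN             Unreviewed;       100 AA.
OG   Mitochondrion.
//
ID   TEST2_ARATH             Unreviewed;       200 AA.
OG   Plastid; Chloroplast.
//
ID   TEST3_ECOLX             Unreviewed;        50 AA.
OG   Plasmid pWR100.
//
ID   TEST4_YEAST             Unreviewed;       300 AA.
//
"""

counts = organelle_counts(SAMPLE.splitlines())
print(dict(counts))
```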