         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2013_04 STATISTICS


1.  INTRODUCTION

Release 2013_04 of 03-Apr-2013 of UniProtKB/TrEMBL contains 33106277 sequence entries,
comprising 10624970319 amino acids.

1013762 sequences have been added since release 2013_03, the sequence data of
6865 existing entries has been updated, and the annotations of
22203949 entries have been revised. The added sequences represent an increase
of about 3% in the number of entries.
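
As a rough check on the 3% figure, the sketch below divides the number of added
sequences by an estimate of the previous release size. The estimate (current
total minus additions) is an assumption: it ignores any entries deleted or
merged between releases, and the variable names are illustrative only.

```python
# Figures quoted in the introduction above.
total_entries = 33_106_277  # entries in release 2013_04
added = 1_013_762           # sequences added since release 2013_03

# Assumption: previous release size ~ current total minus additions
# (entries removed between releases are not accounted for).
previous_estimate = total_entries - added

growth_pct = 100 * added / previous_estimate
print(f"Estimated growth: {growth_pct:.1f}%")
```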

Number of fragments: 4020435

Protein existence (PE):              entries      %
1: Evidence at protein level           19382     0.06%
2: Evidence at transcript level       809602     2.45%
3: Inferred from homology            7125769    21.52%
4: Predicted                        25151524    75.97%
5: Uncertain                               0     0.00%
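
The percentages in the protein existence table can be reproduced from the entry
counts. The following is a minimal sketch, assuming each percentage is the
count divided by the total number of entries and rounded to two decimals; the
dictionary and variable names are illustrative.

```python
# Protein existence (PE) counts copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 19_382,
    "2: Evidence at transcript level": 809_602,
    "3: Inferred from homology": 7_125_769,
    "4: Predicted": 25_151_524,
    "5: Uncertain": 0,
}
total_entries = 33_106_277  # total entries in release 2013_04

# Every entry carries exactly one PE level, so the counts should add up.
pe_sum = sum(pe_counts.values())

for level, count in pe_counts.items():
    pct = 100 * count / total_entries
    print(f"{level:<34}{count:>10}{pct:>9.2f}%")
```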

The growth of the database is summarized below.

   



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 404679

   The twenty most represented species account for 1843685 sequences, or 5.6% of
   the total number of entries.


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:169691
                            2x: 66901
                            3x: 36176
                            4x: 24427
                            5x: 15370
                            6x: 11066
                            7x:  8333
                            8x:  6513
                            9x:  5198
                           10x: 10205
                       11- 20x: 27524
                       21- 50x:  9525
                       51-100x:  3682
                         >100x: 10068
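
   Since every species falls into exactly one frequency bin, the bin counts
   should sum to the species total quoted at the start of this section. As a
   cross-check, the sketch below derives the single-occurrence (1x) count
   implied by the other bins. The names are illustrative, and the check assumes
   the remaining bin counts are correct as printed.

```python
total_species = 404_679  # species total quoted at the start of section 2

# Bin counts from the table above, excluding the single-occurrence (1x) bin.
other_bins = {
    "2x": 66_901, "3x": 36_176, "4x": 24_427, "5x": 15_370,
    "6x": 11_066, "7x": 8_333, "8x": 6_513, "9x": 5_198,
    "10x": 10_205, "11-20x": 27_524, "21-50x": 9_525,
    "51-100x": 3_682, ">100x": 10_068,
}

# Whatever is not covered by the other bins must be species seen exactly once.
implied_1x = total_species - sum(other_bins.values())
print(f"Implied 1x count: {implied_1x}")
```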


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     519249  Human immunodeficiency virus 1
       2     185925  uncultured bacterium
       3     114498  Homo sapiens (Human)
       4      96928  Oryza sativa subsp. japonica (Rice)
       5      84696  Hepatitis C virus
       6      73735  Glycine max (Soybean) (Glycine hispida)
       7      70402  Hordeum vulgare var. distichum (Two-rowed barley)
       8      68976  Macaca mulatta (Rhesus macaque)
       9      60450  Zea mays (Maize)
      10      58936  Hepatitis B virus (HBV)
      11      56506  Mus musculus (Mouse)
      12      56117  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      13      54883  Solanum tuberosum (Potato)
      14      54096  Vitis vinifera (Grape)
      15      51841  Danio rerio (Zebrafish) (Brachydanio rerio)
      16      50601  Trichomonas vaginalis
      17      49236  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      18      48885  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      19      44560  Populus trichocarpa (Western balsam poplar) 
      20      43165  Callithrix jacchus (White-tufted-ear marmoset)
      21      41838  Arabidopsis thaliana (Mouse-ear cress)
      22      39850  Paramecium tetraurelia
      23      39825  Oryza sativa subsp. indica (Rice)
      24      39293  Setaria italica (Foxtail millet) (Panicum italicum)
      25      38163  human gut metagenome
      26      36522  Musa acuminata subsp. malaccensis
      27      35893  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      28      35602  Ailuropoda melanoleuca (Giant panda)
      29      35193  Acyrthosiphon pisum (Pea aphid)
      30      35066  Caenorhabditis japonica
      31      34809  Physcomitrella patens subsp. patens (Moss)
      32      34569  Thalassiosira oceanica (Marine diatom)
      33      34521  Drosophila melanogaster (Fruit fly)
      34      33778  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      35      33252  Selaginella moellendorffii (Spikemoss)
      36      32769  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      37      32339  Oryza brachyantha
      38      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      39      32093  Oryza glaberrima (African rice)
      40      31886  Sus scrofa (Pig)
      41      31835  Pan troglodytes (Chimpanzee)
      42      31540  Simian immunodeficiency virus (SIV)
      43      31400  Ricinus communis (Castor bean)
      44      30918  Daphnia pulex (Water flea)
      45      30300  Caenorhabditis brenneri (Nematode worm)
      46      30145  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      47      29815  Amphimedon queenslandica (Sponge)
      48      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      49      29316  Pristionchus pacificus
      50      29178  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      51      29053  Oikopleura dioica (Tunicate)
      52      28838  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      53      28533  Escherichia coli
      54      28451  Canis familiaris (Dog) (Canis lupus familiaris)
      55      28056  Gasterosteus aculeatus (Three-spined stickleback)
      56      27725  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      57      27498  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      58      27414  Equus caballus (Horse)
      59      27089  Gorilla gorilla gorilla (Lowland gorilla)
      60      26820  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      61      26814  Gallus gallus (Chicken)
      62      25905  Oryzias latipes (Medaka fish) (Japanese ricefish)
      63      25793  Loxodonta africana (African elephant)
      64      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      65      25613  Rattus norvegicus (Rat)
      66      25485  Bos taurus (Bovine)
      67      25081  Oryctolagus cuniculus (Rabbit)
      68      24903  Nematostella vectensis (Starlet sea anemone)
      69      24643  Tetrahymena thermophila (strain SB210)
      70      24590  Guillardia theta CCMP2712
      71      24200  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      72      23715  Ornithorhynchus anatinus (Duckbill platypus)
      73      23565  Oxytricha trifallax
      74      23227  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      75      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      76      22715  Monodelphis domestica (Gray short-tailed opossum)
      77      22561  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      78      22490  Caenorhabditis elegans
      79      22303  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      80      22163  gut metagenome
      81      21821  Latimeria chalumnae (West Indian ocean coelacanth)
      82      21546  Heterocephalus glaber (Naked mole rat)
      83      21342  Caenorhabditis briggsae
      84      21089  Ixodes scapularis (Black-legged tick) (Deer tick)
      85      20855  Myotis lucifugus (Little brown bat)
      86      20838  Tupaia chinensis (Chinese tree shrew)
      87      20737  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
      88      20130  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      89      20114  Ciona savignyi (Pacific transparent sea squirt)
      90      20072  Cavia porcellus (Guinea pig)
      91      19978  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
      92      19678  Taeniopygia guttata (Zebra finch) (Poephila guttata)
      93      19544  Pteropus alecto (Black flying fox)
      94      19438  Wuchereria bancrofti
      95      19331  Toxoplasma gondii
      96      19258  Anolis carolinensis (Green anole) (American chameleon)
      97      19200  Trypanosoma cruzi (strain CL Brener)
      98      18936  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      99      18849  Drosophila simulans (Fruit fly)
     100      18771  mine drainage metagenome
     101      18591  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     102      18555  Bos grunniens mutus
     103      18121  Atta cephalotes (Leafcutter ant)
     104      17998  Anopheles gambiae (African malaria mosquito)
     105      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     106      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
     107      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     108      17490  Bombyx mori (Silk moth)
     109      17393  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     110      17282  Nasonia vitripennis (Parasitic wasp)
     111      17039  Drosophila yakuba (Fruit fly)
     112      17022  Tribolium castaneum (Red flour beetle)
     113      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     114      16884  Meleagris gallopavo (Common turkey)
     115      16714  Drosophila persimilis (Fruit fly)
     116      16643  Fusarium oxysporum f. sp. lycopersici  
     117      16469  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     118      16426  Ectocarpus siliculosus (Brown alga)
     119      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     120      16317  Hepatitis C virus subtype 1b
     121      16315  Danaus plexippus (Monarch butterfly)
     122      16263  Trichinella spiralis (Trichina worm)
     123      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     124      16187  Drosophila sechellia (Fruit fly)
     125      16142  Schistosoma japonicum (Blood fluke)
     126      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     127      15917  Plasmodium falciparum
     128      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     129      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     130      15716  Naegleria gruberi (Amoeba)
     131      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     132      15568  Phytophthora ramorum (Sudden oak death agent)
     133      15461  Myotis davidii (David's myotis)
     134      15420  Drosophila willistoni (Fruit fly)
     135      15371  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     136      15354  Loa loa (Eye worm) (Filaria loa)
     137      15225  Pythium ultimum
     138      15177  Hepatitis C virus subtype 1a
     139      15143  Drosophila ananassae (Fruit fly)
     140      15038  Harpegnathos saltator (Jerdon's jumping ant)
     141      14937  Acanthamoeba castellanii str. Neff
     142      14927  Drosophila erecta (Fruit fly)
     143      14855  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     144      14801  Camponotus floridanus (Florida carpenter ant)
     145      14788  Drosophila mojavensis (Fruit fly)
     146      14713  Plasmodium chabaudi
     147      14701  Drosophila virilis (Fruit fly)
     148      14650  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     149      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     150      14417  Volvox carteri (Green alga)
     151      14341  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     152      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     153      14262  Ralstonia solanacearum (Pseudomonas solanacearum)
     154      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     155      13966  Acromyrmex echinatior (Panamanian leafcutter ant) 
     156      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     157      13865  Clonorchis sinensis (Chinese liver fluke)
     158      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     159      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     160      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     161      13540  Trypanosoma cruzi
     162      13346  Aspergillus flavus 
     163      13266  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     164      13187  Mustela putorius furo (European domestic ferret) (Mustela furo)
     165      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     166      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     167      12983  Albugo laibachii Nc14
     168      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     169      12950  Stigmatella aurantiaca (strain DW4/3-1)
     170      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     171      12858  Magnaporthe oryzae Y34
     172      12752  uncultured archaeon
     173      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     174      12711  Magnaporthe oryzae P131
     175      12696  Trypanosoma congolense (strain IL3000)
     176      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     177      12679  Schistosoma mansoni (Blood fluke)
     178      12621  Rabies virus
     179      12617  Xenopus laevis (African clawed frog)
     180      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     181      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     182      12440  Polysphondylium pallidum (Cellular slime mold)
     183      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     184      12352  Dictyostelium purpureum (Slime mold)
     185      12202  Helicobacter pylori (Campylobacter pylori)
     186      12192  Porcine reproductive and respiratory syndrome virus (PRRSV)
     187      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     188      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     189      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     190      11944  Emericella nidulans  
     191      11931  Apis mellifera (Honeybee)
     192      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     193      11780  Piriformospora indica (strain DSM 11827)
     194      11752  Chondrocladia sp. SMF<DEU
     195      11751  Cladorhiza sp. SMF<DEU
     196      11750  Abyssocladia sp. SMF<DEU
     197      11726  Phelloderma sp. SMF<DEU
     198      11718  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     199      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     200      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     201      11674  Anopheles darlingi (Mosquito)
     202      11644  Plasmodium berghei (strain Anka)
     203      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     204      11566  Trichoplax adhaerens (Trichoplax reptans)
     205      11557  Trypanosoma vivax (strain Y486)
     206      11515  Puccinia triticina (isolate 1-1 / race 1 (BBBD)) (Brown leaf rust fungus)
     207      11514  Aureococcus anophagefferens (Harmful bloom alga)
     208      11499  Brugia malayi (Filarial nematode worm)
     209      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     210      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     211      11396  Aspergillus oryzae (strain 3.042) (Yellow koji mold)
     212      11278  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     213      11211  Ktedonobacter racemifer DSM 44963
     214      11211  Agaricus bisporus var. burnettii (strain JB137-S8 / ATCC MYA-4627 / FGSC 10392) 
     215      11205  Rhipicephalus pulchellus
     216      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     217      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     218      10964  Streptomyces clavuligerus 
     219      10949  Aspergillus niger 
     220      10839  Pediculus humanus subsp. corporis (Body louse)
     221      10822  Chaetomium globosum  
     222      10570  Metarhizium anisopliae (strain ARSEF 23 / ATCC MYA-3075)
     223      10563  Amycolatopsis mediterranei S699
     224      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     225      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     226      10487  Rhizoctonia solani AG-1 IA
     227      10458  Klebsiella pneumoniae
     228      10397  Agaricus bisporus var. bisporus (strain H97 / ATCC MYA-4626 / FGSC 10389) 
     229      10393  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     230      10387  Pseudomonas syringae pv. glycinea str. race 4
     231      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     232      10368  Cystobacter fuscus DSM 2262
     233      10361  Beauveria bassiana (strain ARSEF 2860) (White muscardine disease fungus) 
     234      10354  Phaeodactylum tricornutum (strain CCAP 1055/1)
     235      10273  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
     236      10221  Shigella flexneri 1235-66
     237      10216  Burkholderia terrae BS001
     238      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136) 
     239      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
     240      10179  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     241      10127  Trypanosoma cruzi marinkellei
     242      10113  Burkholderia sp. BT03
     243      10109  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
     244      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
     245      10087  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
     246      10051  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
     247      10034  Marssonina brunnea f. sp. multigermtubi (strain MB_m1) 
     248      10033  Streptomyces turgidiscabies Car8
     249      10013  Streptomyces bingchenggensis (strain BCW-1)
     250       9846  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
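
   The figure quoted at the start of this section for the twenty most
   represented species can be recomputed from the first twenty rows of the
   table. This is a small sketch with illustrative names; it assumes the share
   is taken relative to the total number of entries in the release.

```python
# Frequencies of the twenty most represented species (rows 1-20 above).
top20 = [
    519_249, 185_925, 114_498, 96_928, 84_696,
    73_735, 70_402, 68_976, 60_450, 58_936,
    56_506, 56_117, 54_883, 54_096, 51_841,
    50_601, 49_236, 48_885, 44_560, 43_165,
]
total_entries = 33_106_277  # total entries in release 2013_04

top20_total = sum(top20)
share_pct = 100 * top20_total / total_entries
print(f"{top20_total} sequences, {share_pct:.1f}% of all entries")
```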


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          656515 (  2%)
    Bacteria       23398523 ( 71%)
    Eukaryota       7322732 ( 22%)
    Viruses         1625623 (  5%)
    Other            102883 ( <1%)



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 114534 (  2%)           ( <1%)
     Other Mammalia        925342 ( 13%)           (  3%)
     Other Vertebrata      764473 ( 10%)           (  2%)
     Viridiplantae        1497346 ( 20%)           (  5%)
     Fungi                1613872 ( 22%)           (  5%)
     Insecta               810973 ( 11%)           (  2%)
     Nematoda              252935 (  3%)           (  1%)
     Other                1343257 ( 18%)           (  4%)
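
   The two percentage columns use different denominators: the Eukaryota total
   and the complete database. The sketch below recomputes both for each
   category, assuming the Eukaryota total is the sum of the category counts
   (it matches the Eukaryota row of the kingdom table). Names are illustrative.

```python
total_entries = 33_106_277  # complete database, release 2013_04

# Category counts copied from the "Within Eukaryota" table above.
eukaryota = {
    "Human": 114_534,
    "Other Mammalia": 925_342,
    "Other Vertebrata": 764_473,
    "Viridiplantae": 1_497_346,
    "Fungi": 1_613_872,
    "Insecta": 810_973,
    "Nematoda": 252_935,
    "Other": 1_343_257,
}
euk_total = sum(eukaryota.values())

for name, count in eukaryota.items():
    of_euk = 100 * count / euk_total      # share of Eukaryota
    of_db = 100 * count / total_entries   # share of the whole database
    print(f"{name:<18}{count:>9}  {of_euk:5.1f}%  {of_db:5.1f}%")
```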



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  856496             1001-1100   187633
                 51- 100 2882118             1101-1200   130443
                101- 150 3211075             1201-1300    92997
                151- 200 3113312             1301-1400    58050
                201- 250 3129768             1401-1500    47309
                251- 300 3026062             1501-1600    32568
                301- 350 2753184             1601-1700    24512
                351- 400 2079132             1701-1800    18485
                401- 450 1796018             1801-1900    15319
                451- 500 1474171             1901-2000    12943
                501- 550  963716             2001-2100    10185
                551- 600  742077             2101-2200    10522
                601- 650  540568             2201-2300     8062
                651- 700  426427             2301-2400     6432
                701- 750  358099             2401-2500     5574
                751- 800  314466             >2500        45030
                801- 850  242084
                851- 900  215565
                901- 950  148121
                951-1000  107319
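
   Because the histogram excludes fragments, its bins should add up to the
   total number of entries minus the number of fragments given in the
   introduction. The following sketch performs that check; the list layout and
   names are illustrative.

```python
total_entries = 33_106_277  # total entries, from the introduction
fragments = 4_020_435       # number of fragments, from the introduction

# Bin counts from the table above: 1-50 up to 951-1000 (steps of 50) ...
bins_to_1000 = [
    856_496, 2_882_118, 3_211_075, 3_113_312, 3_129_768,
    3_026_062, 2_753_184, 2_079_132, 1_796_018, 1_474_171,
    963_716, 742_077, 540_568, 426_427, 358_099,
    314_466, 242_084, 215_565, 148_121, 107_319,
]
# ... then 1001-1100 up to 2401-2500 (steps of 100), and finally >2500.
bins_above_1000 = [
    187_633, 130_443, 92_997, 58_050, 47_309,
    32_568, 24_512, 18_485, 15_319, 12_943,
    10_185, 10_522, 8_062, 6_432, 5_574,
    45_030,
]

histogram_total = sum(bins_to_1000) + sum(bins_above_1000)
non_fragments = total_entries - fragments
print(histogram_total, non_fragments, histogram_total == non_fragments)
```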

   


   The average sequence length in UniProtKB/TrEMBL is 320 amino acids.

   The shortest sequence is G0XMK1_9MYRT:     1 amino acid.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
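
   The average length can be approximated from the totals in the introduction.
   This sketch assumes the average is the total number of amino acids divided
   by the total number of entries (fragments included) and that the reported
   value is truncated to a whole number; both points are assumptions, not
   something this document states.

```python
total_residues = 10_624_970_319  # amino acids in release 2013_04
total_entries = 33_106_277       # sequence entries in release 2013_04

mean_length = total_residues / total_entries
print(f"Mean length: {mean_length:.1f} amino acids")

# Assumption: the report drops the fractional part rather than rounding.
reported_style = int(mean_length)
print(f"Truncated: {reported_style}")
```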



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    40441113                1.22                                                    
   Submitted to EMBL/GenBank/DDBJ  23032928  21104334      0.70                                                    
   Journal                         15765747  14876010      0.48                                                    
   Submitted to other databases     1625514   1615886      0.05                                                    
   Thesis                             10232     10174     <0.01                                                    
   Book citation                       6673      6624     <0.01                                                    
   Unpublished observations              18        18     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 462029
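
The "Average per entry" column appears to be the total number of lines divided
by the total number of entries in the release (not by the number of entries
that have such a line), with values that round to zero shown as "<0.01". The
sketch below reproduces a few rows under that assumption; `per_entry` is a
hypothetical helper, not part of any UniProt tool.

```python
TOTAL_ENTRIES = 33_106_277  # total entries in release 2013_04


def per_entry(total_lines: int, entries: int = TOTAL_ENTRIES) -> str:
    """Format the average number of lines per entry as the tables show it."""
    average = total_lines / entries
    # Assumption: anything that would round to 0.00 is printed as "<0.01".
    return "<0.01" if average < 0.005 else f"{average:.2f}"


# Total line counts copied from the references table above.
rl_totals = {
    "References (RL)": 40_441_113,
    "Submitted to EMBL/GenBank/DDBJ": 23_032_928,
    "Journal": 15_765_747,
    "Thesis": 10_232,
}

for line_type, total in rl_totals.items():
    print(f"{line_type:<32}{total:>10}  {per_entry(total):>6}")
```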


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      40630948                1.23                                                    
   CATALYTIC ACTIVITY               3307337   3015931      0.10     4                                              
   CAUTION                         16297799  16291336      0.49     1                                              
   COFACTOR                         1279056   1196195      0.04     8                                              
   DOMAIN                            129331    124091     <0.01     9                                              
   FUNCTION                         3679278   3448998      0.11     3                                              
   INTERACTION                         1188      1188     <0.01    11                                              
   MISCELLANEOUS                      90170     90060     <0.01    10                                              
   PATHWAY                          1641545   1496882      0.05     7                                              
   SIMILARITY                       9441981   8204637      0.29     2                                              
   SUBCELLULAR LOCATION             2949798   2817983      0.09     5                                              
   SUBUNIT                          1813465   1795982      0.05     6                                              

Total number of comment topics: 11
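
The "Rank" column is consistent with ordering the subtypes by total number of
lines, largest first. A minimal sketch of that ordering for the comment topics
follows; the ranking rule is inferred from the table rather than documented
here, and the names are illustrative.

```python
# Total number of lines per comment topic, copied from the table above.
cc_totals = {
    "CATALYTIC ACTIVITY": 3_307_337,
    "CAUTION": 16_297_799,
    "COFACTOR": 1_279_056,
    "DOMAIN": 129_331,
    "FUNCTION": 3_679_278,
    "INTERACTION": 1_188,
    "MISCELLANEOUS": 90_170,
    "PATHWAY": 1_641_545,
    "SIMILARITY": 9_441_981,
    "SUBCELLULAR LOCATION": 2_949_798,
    "SUBUNIT": 1_813_465,
}

# Sort topics by descending total and number them from 1.
ordered = sorted(cc_totals, key=cc_totals.get, reverse=True)
ranks = {topic: position for position, topic in enumerate(ordered, start=1)}

for topic in sorted(ranks):
    print(f"{topic:<22}{cc_totals[topic]:>10}{ranks[topic]:>5}")
```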


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       7811976                0.24                                                    
   CHAIN                             834921    689560      0.03     2                                              
   NON_TER                          6311760   4021210      0.19     1                                              
   SIGNAL                            664429    661142      0.02     3                                              
   TRANSIT                              866       866     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             366178356               11.06                                                    
   Allergome                           3393      2766     <0.01    84   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   102   Organism-specific databases                
   ArrayExpress                      205532    205532      0.01    44   Gene expression databases                  
   BRENDA                              2663      2634     <0.01    86   Enzyme and pathway databases               
   Bgee                              102912    102912     <0.01    53   Gene expression databases                  
   BindingDB                           5980      5980     <0.01    79   Other                                      
   BioCyc                           3255856   3220269      0.10    22   Enzyme and pathway databases               
   CAZy                               74046     69573     <0.01    57   Protein family/group databases             
   CGD                                 7060      7060     <0.01    78   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   108   2D gel databases                           
   CTD                               328436    326819      0.01    38   Organism-specific databases                
   ChEMBL                               575       575     <0.01    94   Other                                      
   ChiTaRS                            67236     67236     <0.01    58   Other                                      
   ConoServer                           160       160     <0.01    99   Organism-specific databases                
   DIP                                 2831      2826     <0.01    85   Protein-protein interaction databases      
   DNASU                              42816     42482     <0.01    63   Protocols and materials databases          
   EMBL                            36037391  32153877      1.09     3   Sequence databases                         
   Ensembl                           952865    936592      0.03    30   Genome annotation databases                
   EnsemblBacteria                 18709505  18431216      0.57     5   Genome annotation databases                
   EnsemblFungi                      314536    312819      0.01    39   Genome annotation databases                
   EnsemblMetazoa                    663787    648462      0.02    32   Genome annotation databases                
   EnsemblPlants                     579731    546873      0.02    34   Genome annotation databases                
   EnsemblProtists                   141975    140063     <0.01    51   Genome annotation databases                
   EuPathDB                          147096    146644     <0.01    49   Organism-specific databases                
   EvolutionaryTrace                   8124      8124     <0.01    76   Other                                      
   FlyBase                           196577    195110      0.01    46   Organism-specific databases                
   GO                              61634024  19077629      1.86     2   Ontologies                                 
   Gene3D                          13173012  10498279      0.40     7   Family and domain databases                
   GeneID                           9247137   9031364      0.28    10   Genome annotation databases                
   GeneTree                          798208    798152      0.02    31   Phylogenomic databases                     
   Genevestigator                     87116     87111     <0.01    54   Gene expression databases                  
   GenoList                           14733     14460     <0.01    74   Organism-specific databases                
   GenomeRNAi                         20854     20854     <0.01    68   Other                                      
   GenomeReviews                    4250279   4151525      0.13    19                                              
   Gramene                           197852    197852      0.01    45   Organism-specific databases                
   H-InvDB                              623       475     <0.01    93   Organism-specific databases                
   HAMAP                            3084879   3046799      0.09    23   Family and domain databases                
   HGNC                               49519     49435     <0.01    61   Organism-specific databases                
   HOGENOM                          3655044   3655000      0.11    20   Phylogenomic databases                     
   HOVERGEN                          306216    306205      0.01    40   Phylogenomic databases                     
   HSSP                              250106    249906      0.01    42   3D structure databases                     
   IPI                               289418    288644      0.01    41   Sequence databases                         
   InParanoid                        187146    187146      0.01    47   Phylogenomic databases                     
   IntAct                             17091     17091     <0.01    72   Protein-protein interaction databases      
   InterPro                        65726590  23814901      1.99     1   Family and domain databases                
   KEGG                             8254799   8053595      0.25    13   Genome annotation databases                
   KO                               3269394   3254647      0.10    21   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    81   Organism-specific databases                
   Leproma                             1272      1270     <0.01    89   Organism-specific databases                
   MEROPS                            139106    139105     <0.01    52   Protein family/group databases             
   MGI                                51910     51445     <0.01    60   Organism-specific databases                
   MINT                                8534      8534     <0.01    75   Protein-protein interaction databases      
   NextBio                           213218    212553      0.01    43   Other                                      
   OMA                              4865744   4865533      0.15    16   Phylogenomic databases                     
   OrthoDB                           553449    553406      0.02    35   Phylogenomic databases                     
   PANTHER                          4516103   4256137      0.14    18   Family and domain databases                
   PATRIC                           8306833   8306711      0.25    12   Genome annotation databases                
   PDB                                18266     10250     <0.01    71   3D structure databases                     
   PDBsum                             18814     10501     <0.01    70   3D structure databases                     
   PIR                               172642    139807      0.01    48   Sequence databases                         
   PIRSF                            2682176   2680375      0.08    27   Family and domain databases                
   PMAP-CutDB                           211       211     <0.01    97   Other                                      
   PRIDE                             456732    456732      0.01    37   Proteomic databases                        
   PRINTS                           4649432   4149162      0.14    17   Family and domain databases                
   PROSITE                         15271295  10138123      0.46     6   Family and domain databases                
   Pathway_Interaction_DB                10         8     <0.01   107   Enzyme and pathway databases               
   PaxDb                              29790     29790     <0.01    66   Proteomic databases                        
   PeptideAtlas                         130       130     <0.01   100   Proteomic databases                        
   PeroxiBase                          2577      2569     <0.01    87   Protein family/group databases             
   Pfam                            30212572  22210979      0.91     4   Family and domain databases                
   PharmGKB                            3837      3837     <0.01    83   Organism-specific databases                
   PhosphoSite                         1137      1137     <0.01    90   PTM databases                              
   PhylomeDB                         145656    145656     <0.01    50   Phylogenomic databases                     
   PomBase                               40        27     <0.01   103   Organism-specific databases                
   PptaseDB                              36        34     <0.01   104   Protein family/group databases             
   ProDom                            599788    574285      0.02    33   Family and domain databases                
   ProMEX                              5151      5151     <0.01    80   Proteomic databases                        
   ProtClustDB                      2720205   2720191      0.08    26   Phylogenomic databases                     
   ProteinModelPortal               8502693   8502693      0.26    11   3D structure databases                     
   PseudoCAP                           4537      4531     <0.01    82   Organism-specific databases                
   REBASE                             34549     34536     <0.01    65   Protein family/group databases             
   REPRODUCTION-2DPAGE                   67        66     <0.01   101   2D gel databases                           
   RGD                                19736     18838     <0.01    69   Organism-specific databases                
   Reactome                             210       180     <0.01    98   Enzyme and pathway databases               
   RefSeq                           9289370   9033551      0.28     9   Sequence databases                         
   SABIO-RK                             487       487     <0.01    95   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   106   Organism-specific databases                
   SMART                            6814560   5164401      0.21    15   Family and domain databases                
   SMR                              1663826   1663826      0.05    28   3D structure databases                     
   STRING                           3030883   2963663      0.09    24   Protein-protein interaction databases      
   SUPFAM                          12565092  10333824      0.38     8   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   105   2D gel databases                           
   TAIR                               15515     15439     <0.01    73   Organism-specific databases                
   TCDB                                2381      2370     <0.01    88   Protein family/group databases             
   TIGRFAMs                         6975536   6390610      0.21    14   Family and domain databases                
   TubercuList                         1111      1110     <0.01    91   Organism-specific databases                
   UCSC                               60731     60574     <0.01    59   Genome annotation databases                
   UniGene                           536617    506517      0.02    36   Sequence databases                         
   UniPathway                       1599524   1489335      0.05    29   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    55   Genome annotation databases                
   World-2DPAGE                         673       668     <0.01    92   2D gel databases                           
   WormBase                           42209     42090     <0.01    64   Organism-specific databases                
   Xenbase                            25698     25566     <0.01    67   Organism-specific databases                
   ZFIN                               44430     44178     <0.01    62   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    77   Organism-specific databases                
   eggNOG                           2768951   2768931      0.08    25   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    56   Organism-specific databases                
   mycoCLAP                             422       422     <0.01    96   Protein family/group databases             

Number of explicitly cross-referenced databases: 128


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.69   Gln (Q) 3.98   Leu (L) 9.96   Ser (S) 6.61
   Arg (R) 5.44   Glu (E) 6.18   Lys (K) 5.22   Thr (T) 5.56
   Asn (N) 4.08   Gly (G) 7.10   Met (M) 2.47   Trp (W) 1.31
   Asp (D) 5.33   His (H) 2.20   Phe (F) 4.01   Tyr (Y) 3.03
   Cys (C) 1.23   Ile (I) 5.98   Pro (P) 4.66   Val (V) 6.80

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03

   

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
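
   The ordering in 5.2 follows directly from the percentages in 5.1. As a
   minimal sketch (not the procedure used to produce this release), the
   ranking can be reproduced by sorting the twenty standard residues by
   their percentage. The dictionary name `composition` and the helper
   `rank_by_frequency` are illustrative names, not part of any UniProt tool.

```python
# Percentages copied from section 5.1 (complete database, release 2013_04).
# Ambiguity codes (Asx, Glx, Xaa) are left out because 5.2 ranks only the
# twenty standard amino acids.
composition = {
    "Ala": 8.69, "Gln": 3.98, "Leu": 9.96, "Ser": 6.61,
    "Arg": 5.44, "Glu": 6.18, "Lys": 5.22, "Thr": 5.56,
    "Asn": 4.08, "Gly": 7.10, "Met": 2.47, "Trp": 1.31,
    "Asp": 5.33, "His": 2.20, "Phe": 4.01, "Tyr": 3.03,
    "Cys": 1.23, "Ile": 5.98, "Pro": 4.66, "Val": 6.80,
}


def rank_by_frequency(comp):
    """Return residue names ordered from most to least frequent."""
    return sorted(comp, key=comp.get, reverse=True)


ranking = rank_by_frequency(composition)
print(", ".join(ranking))
```

   No two residues share the same percentage in this release, so the sort
   order is unambiguous.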



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 608485
Total number of entries encoded on a Plasmid: 328102
Total number of entries encoded on a Plastid: 25769
Total number of entries encoded on a Plastid; Apicoplast: 715
Total number of entries encoded on a Plastid; Chloroplast: 221154
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 928