Release 2013_09 of 18-Sep-2013 of UniProtKB/TrEMBL contains 42821879 sequence entries,
comprising 13630914768 amino acids.

1396586 sequences have been added since release 2013_08, the sequence data of
7581 existing entries has been updated, and the annotations of
10916723 entries have been revised. The added sequences represent an increase
of 3% in the number of entries.
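
The 3% figure can be roughly cross-checked from the two counts quoted above.
The Python sketch below is illustrative only: the variable names are made up,
and it assumes the previous release size is simply the current total minus the
added sequences (entries deleted between releases are ignored).

```python
# Figures quoted in the release statistics above.
total_entries = 42_821_879   # entries in release 2013_09
added_entries = 1_396_586    # sequences added since release 2013_08

# Assumption: previous release size = current total - additions.
previous_entries = total_entries - added_entries

# Relative growth in the number of entries, in percent.
growth_percent = 100 * added_entries / previous_entries

print(f"{growth_percent:.1f}% growth")  # prints "3.4% growth", i.e. about 3%
```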

Number of fragments: 4420020

Protein existence (PE):              entries      %
1: Evidence at protein level           20622     0.05%
2: Evidence at transcript level       818805     1.91%
3: Inferred from homology            9893252    23.10%
4: Predicted                        32089200    74.94%
5: Uncertain                               0     0.00%

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 429833

   The first twenty species represent 1892237 sequences, or 4.4% of the
   total number of entries.
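
   The share of the twenty most represented species can be recomputed from
   the frequencies listed in table 2.2. The following Python sketch is only an
   illustration; the list is copied by hand from ranks 1 to 20 of that table
   and the variable names are hypothetical.

```python
# Frequencies of the 20 most represented species (ranks 1-20 of table 2.2).
top20 = [
    546713, 201102, 113507, 96854, 89025,
    73840, 70413, 69149, 60522, 60361,
    56803, 56145, 54890, 54131, 52260,
    50601, 49263, 48906, 44560, 43192,
]
total_entries = 42_821_879  # entries in release 2013_09

top20_total = sum(top20)
share_percent = 100 * top20_total / total_entries

print(top20_total, f"{share_percent:.1f}%")  # prints "1892237 4.4%"
```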


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:17720
                            2x:70277
                            3x:37887
                            4x:27067
                            5x:16361
                            6x:11384
                            7x: 8710
                            8x: 6853
                            9x: 5437
                           10x:10574
                       11- 20x:30839
                       21- 50x:10249
                       51-100x: 3983
                         >100x:13003


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     546713  Human immunodeficiency virus 1
       2     201102  uncultured bacterium
       3     113507  Homo sapiens (Human)
       4      96854  Oryza sativa subsp. japonica (Rice)
       5      89025  Hepatitis C virus
       6      73840  Glycine max (Soybean) (Glycine hispida)
       7      70413  Hordeum vulgare var. distichum (Two-rowed barley)
       8      69149  Macaca mulatta (Rhesus macaque)
       9      60522  Zea mays (Maize)
      10      60361  Hepatitis B virus (HBV)
      11      56803  Mus musculus (Mouse)
      12      56145  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      13      54890  Solanum tuberosum (Potato)
      14      54131  Vitis vinifera (Grape)
      15      52260  Danio rerio (Zebrafish) (Brachydanio rerio)
      16      50601  Trichomonas vaginalis
      17      49263  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      18      48906  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      19      44560  Populus trichocarpa (Western balsam poplar) 
      20      43192  Callithrix jacchus (White-tufted-ear marmoset)
      21      41214  Arabidopsis thaliana (Mouse-ear cress)
      22      41204  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      23      39850  Paramecium tetraurelia
      24      39842  Oryza sativa subsp. indica (Rice)
      25      39300  Setaria italica (Foxtail millet) (Panicum italicum)
      26      38798  Mustela putorius furo (European domestic ferret) (Mustela furo)
      27      38163  human gut metagenome
      28      36691  Drosophila melanogaster (Fruit fly)
      29      36522  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      30      35920  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      31      35631  Ailuropoda melanoleuca (Giant panda)
      32      35599  Emiliania huxleyi CCMP1516
      33      35205  Acyrthosiphon pisum (Pea aphid)
      34      35112  Simian immunodeficiency virus (SIV)
      35      35066  Caenorhabditis japonica
      36      34830  Physcomitrella patens subsp. patens (Moss)
      37      34570  Thalassiosira oceanica (Marine diatom)
      38      34369  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      39      33845  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      40      33253  Selaginella moellendorffii (Spikemoss)
      41      32767  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      42      32342  Oryza brachyantha
      43      32204  Sus scrofa (Pig)
      44      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      45      32094  Oryza glaberrima (African rice)
      46      31849  Pan troglodytes (Chimpanzee)
      47      31386  Ricinus communis (Castor bean)
      48      31207  Capitella teleta
      49      30926  Daphnia pulex (Water flea)
      50      30712  Caenorhabditis brenneri (Nematode worm)
      51      30146  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      52      29815  Amphimedon queenslandica (Sponge)
      53      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      54      29318  Pristionchus pacificus (Parasitic nematode)
      55      29183  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      56      29054  Oikopleura dioica (Tunicate)
      57      28856  Escherichia coli
      58      28835  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      59      28825  Capsella rubella
      60      28614  Prunus persica (Peach) (Amygdalus persica)
      61      28521  Canis familiaris (Dog) (Canis lupus familiaris)
      62      28099  Gasterosteus aculeatus (Three-spined stickleback)
      63      27753  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      64      27504  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      65      27460  Equus caballus (Horse)
      66      27089  Gorilla gorilla gorilla (Lowland gorilla)
      67      26827  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      68      25970  Oryzias latipes (Medaka fish) (Japanese ricefish)
      69      25797  Loxodonta africana (African elephant)
      70      25721  Rattus norvegicus (Rat)
      71      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      72      25655  Bos taurus (Bovine)
      73      25100  Oryctolagus cuniculus (Rabbit)
      74      24905  Nematostella vectensis (Starlet sea anemone)
      75      24643  Tetrahymena thermophila (strain SB210)
      76      24590  Guillardia theta CCMP2712
      77      24374  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      78      24208  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      79      23716  Ornithorhynchus anatinus (Duckbill platypus)
      80      23565  Oxytricha trifallax
      81      23502  Latimeria chalumnae (West Indian ocean coelacanth)
      82      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      83      22751  Monodelphis domestica (Gray short-tailed opossum)
      84      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      85      22525  Caenorhabditis elegans
      86      22313  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      87      22163  gut metagenome
      88      21548  Heterocephalus glaber (Naked mole rat)
      89      21346  Caenorhabditis briggsae
      90      21311  Gallus gallus (Chicken)
      91      21125  Ixodes scapularis (Black-legged tick) (Deer tick)
      92      20940  Felis catus (Cat) (Felis silvestris catus)
      93      20867  Myotis lucifugus (Little brown bat)
      94      20838  Tupaia chinensis (Chinese tree shrew)
      95      20760  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
      96      20512  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
      97      20133  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      98      20114  Ciona savignyi (Pacific transparent sea squirt)
      99      20073  Cavia porcellus (Guinea pig)
     100      19985  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     101      19816  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     102      19684  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     103      19551  Anolis carolinensis (Green anole) (American chameleon)
     104      19546  Pteropus alecto (Black flying fox)
     105      19438  Wuchereria bancrofti
     106      19336  Toxoplasma gondii
     107      19200  Trypanosoma cruzi (strain CL Brener)
     108      19057  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     109      18949  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     110      18855  Drosophila simulans (Fruit fly)
     111      18771  mine drainage metagenome
     112      18592  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     113      18555  Bos grunniens mutus
     114      18115  Atta cephalotes (Leafcutter ant)
     115      18026  Anopheles gambiae (African malaria mosquito)
     116      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     117      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     118      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     119      17520  Bombyx mori (Silk moth)
     120      17408  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     121      17301  Anas platyrhynchos (Domestic duck) (Anas boschas)
     122      17282  Nasonia vitripennis (Parasitic wasp)
     123      17047  Tribolium castaneum (Red flour beetle)
     124      17040  Drosophila yakuba (Fruit fly)
     125      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     126      16917  Meleagris gallopavo (Common turkey)
     127      16714  Drosophila persimilis (Fruit fly)
     128      16698  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     129      16649  Plasmodium falciparum
     130      16639  Fusarium oxysporum f. sp. lycopersici  
     131      16469  Hepatitis C virus subtype 1b
     132      16426  Ectocarpus siliculosus (Brown alga)
     133      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     134      16329  Danaus plexippus (Monarch butterfly)
     135      16274  Trichinella spiralis (Trichina worm)
     136      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     137      16188  Drosophila sechellia (Fruit fly)
     138      16156  Schistosoma japonicum (Blood fluke)
     139      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     140      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     141      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     142      15716  Naegleria gruberi (Amoeba)
     143      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     144      15568  Phytophthora ramorum (Sudden oak death agent)
     145      15461  Myotis davidii (David's myotis)
     146      15421  Drosophila willistoni (Fruit fly)
     147      15371  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     148      15354  Loa loa (Eye worm) (Filaria loa)
     149      15345  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     150      15225  Pythium ultimum
     151      15177  Hepatitis C virus subtype 1a
     152      15144  Drosophila ananassae (Fruit fly)
     153      15057  Pararge aegeria (speckled wood butterfly)
     154      15041  Harpegnathos saltator (Jerdon's jumping ant)
     155      15040  Klebsiella pneumoniae
     156      14942  Acanthamoeba castellanii str. Neff
     157      14927  Drosophila erecta (Fruit fly)
     158      14910  Dendroctonus ponderosae (mountain pine beetle)
     159      14861  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     160      14801  Camponotus floridanus (Florida carpenter ant)
     161      14792  Fusarium fujikuroi IMI 58289
     162      14791  Drosophila mojavensis (Fruit fly)
     163      14713  Plasmodium chabaudi
     164      14704  Drosophila virilis (Fruit fly)
     165      14652  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     166      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     167      14592  uncultured archaeon
     168      14419  Rabies virus
     169      14417  Volvox carteri (Green alga)
     170      14341  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     171      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     172      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     173      14147  Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
     174      13970  Acromyrmex echinatior (Panamanian leafcutter ant) 
     175      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     176      13876  Clonorchis sinensis (Chinese liver fluke)
     177      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     178      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     179      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     180      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     181      13588  Trypanosoma cruzi
     182      13345  Aspergillus flavus 
     183      13329  Colletotrichum orbiculare   
     184      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     185      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     186      13109  Petromyzon marinus (Sea lamprey)
     187      13082  Glarea lozoyensis ATCC 20868
     188      13062  Mycosphaerella fijiensis (strain CIRAD86) (Black leaf streak disease fungus) 
     189      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     190      12983  Albugo laibachii Nc14
     191      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     192      12950  Stigmatella aurantiaca (strain DW4/3-1)
     193      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     194      12856  Cochliobolus heterostrophus (strain C5 / ATCC 48332 / race O)  
     195      12846  Magnaporthe oryzae (strain Y34) (Rice blast fungus) (Pyricularia oryzae)
     196      12754  Porcine reproductive and respiratory syndrome virus (PRRSV)
     197      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     198      12711  Magnaporthe oryzae (strain P131) (Rice blast fungus) (Pyricularia oryzae)
     199      12703  Cochliobolus heterostrophus (strain C4 / ATCC 48331 / race T)  
     200      12697  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     201      12696  Trypanosoma congolense (strain IL3000)
     202      12681  Schistosoma mansoni (Blood fluke)
     203      12630  Xenopus laevis (African clawed frog)
     204      12586  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     205      12464  Helicobacter pylori (Campylobacter pylori)
     206      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     207      12440  Polysphondylium pallidum (Cellular slime mold)
     208      12414  Mycosphaerella pini (strain NZE10 / CBS 128990) (Red band needle blight fungus) 
     209      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     210      12352  Dictyostelium purpureum (Slime mold)
     211      12197  Thanatephorus cucumeris (strain AG1-IB / isolate 7/3/14)  
     212      12174  Cochliobolus sativus (strain ND90Pr / ATCC 201652)  
     213      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     214      12143  Mucor circinelloides f. circinelloides (strain 1006PhL) (Mucormycosis agent) 
     215      12078  Ceriporiopsis subvermispora (strain B) (White-rot fungus)
     216      12012  Apis mellifera (Honeybee)
     217      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     218      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     219      11941  Emericella nidulans  
     220      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     221      11780  Piriformospora indica (strain DSM 11827)
     222      11752  Chondrocladia sp. SMF<DEU
     223      11751  Cladorhiza sp. SMF<DEU
     224      11750  Abyssocladia sp. SMF<DEU
     225      11726  Phelloderma sp. SMF<DEU
     226      11719  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     227      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     228      11687  Setosphaeria turcica (strain 28A) (Northern leaf blight fungus) 
     229      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     230      11682  Eutypa lata (strain UCR-EL1) (Grapevine dieback disease fungus) 
     231      11679  Anopheles darlingi (Mosquito)
     232      11639  Plasmodium berghei (strain Anka)
     233      11603  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     234      11567  Trichoplax adhaerens (Trichoplax reptans)
     235      11557  Trypanosoma vivax (strain Y486)
     236      11518  Aureococcus anophagefferens (Harmful bloom alga)
     237      11515  Puccinia triticina (isolate 1-1 / race 1 (BBBD)) (Brown leaf rust fungus)
     238      11499  Brugia malayi (Filarial nematode worm)
     239      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     240      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     241      11396  Aspergillus oryzae (strain 3.042) (Yellow koji mold)
     242      11303  Magnaporthe poae (strain ATCC 64411 / 73-15) (Kentucky bluegrass fungus)
     243      11278  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     244      11211  Ktedonobacter racemifer DSM 44963
     245      11211  Agaricus bisporus var. burnettii (strain JB137-S8 / ATCC MYA-4627 / FGSC 10392) 
     246      11205  Rhipicephalus pulchellus
     247      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     248      11018  Botryotinia fuckeliana (strain BcDW1) (Noble rot fungus) (Botrytis cinerea)
     249      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     250      10964  Streptomyces clavuligerus 


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          697897 (  2%)
    Bacteria       32004412 ( 75%)
    Eukaryota       8237075 ( 19%)
    Viruses         1777493 (  4%)
    Other            105001 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 113547 (  1%)           (  0%)
     Other Mammalia        975279 ( 12%)           (  2%)
     Other Vertebrata      856288 ( 10%)           (  2%)
     Viridiplantae        1675936 ( 20%)           (  4%)
     Fungi                2000482 ( 24%)           (  5%)
     Insecta               861317 ( 10%)           (  2%)
     Nematoda              253797 (  3%)           (  1%)
     Other                1500429 ( 18%)           (  4%)
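
   The two percentage columns of the Eukaryota table use different
   denominators: the Eukaryota total and the complete database. A minimal
   Python sketch of that calculation follows, assuming the percentages are
   rounded to the nearest integer; the names used are illustrative.

```python
total_entries = 42_821_879  # complete database, release 2013_09

# Sequence counts per category, copied from the table above.
eukaryota = {
    "Human": 113547,
    "Other Mammalia": 975279,
    "Other Vertebrata": 856288,
    "Viridiplantae": 1675936,
    "Fungi": 2000482,
    "Insecta": 861317,
    "Nematoda": 253797,
    "Other": 1500429,
}
eukaryota_total = sum(eukaryota.values())  # equals the Eukaryota row: 8237075

# For each category: (% of Eukaryota, % of the complete database).
percentages = {
    name: (round(100 * n / eukaryota_total), round(100 * n / total_entries))
    for name, n in eukaryota.items()
}

for name, (pct_euk, pct_db) in percentages.items():
    print(f"{name:<18}{eukaryota[name]:>9} ({pct_euk:>3}%) ({pct_db:>3}%)")
```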



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1152890             1001-1100   233810
                 51- 100 3819409             1101-1200   162232
                101- 150 4263111             1201-1300   116662
                151- 200 4132686             1301-1400    69927
                201- 250 4172983             1401-1500    57952
                251- 300 4045536             1501-1600    39581
                301- 350 3662444             1601-1700    28778
                351- 400 2738225             1701-1800    21648
                401- 450 2379339             1801-1900    17548
                451- 500 1948639             1901-2000    14786
                501- 550 1246753             2001-2100    11923
                551- 600  962056             2101-2200    12091
                601- 650  702351             2201-2300     9309
                651- 700  552722             2301-2400     7474
                701- 750  459727             2401-2500     6605
                751- 800  395879             >2500        50761
                801- 850  308222
                851- 900  275494
                901- 950  189152
                951-1000  133154



   The average sequence length in UniProtKB/TrEMBL is   318 amino acids.

   The shortest sequence is G0XMK1_9MYRT:     1 amino acid.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
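
   The average length and the size table can be checked against the totals
   given at the top of this document. The Python sketch below is illustrative:
   it assumes the average is taken over all entries (fragments included) and
   that the size table covers every non-fragment entry. The variable names are
   hypothetical.

```python
total_entries = 42_821_879          # entries in release 2013_09
total_amino_acids = 13_630_914_768  # amino acids in release 2013_09
fragments = 4_420_020               # "Number of fragments"

# Average sequence length over all entries.
average_length = total_amino_acids / total_entries

# Bin counts from the size table: 1-50 up to 951-1000 (steps of 50),
# then 1001-1100 up to 2401-2500 (steps of 100), then >2500.
size_bins = [
    1152890, 3819409, 4263111, 4132686, 4172983, 4045536, 3662444,
    2738225, 2379339, 1948639, 1246753, 962056, 702351, 552722,
    459727, 395879, 308222, 275494, 189152, 133154,
    233810, 162232, 116662, 69927, 57952, 39581, 28778, 21648,
    17548, 14786, 11923, 12091, 9309, 7474, 6605, 50761,
]
complete_sequences = total_entries - fragments

print(round(average_length))                 # prints 318
print(sum(size_bins) == complete_sequences)  # prints True
```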



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    50749561                1.19                                                    
   Submitted to EMBL/GenBank/DDBJ  30016028  28188723      0.70                                                    
   Journal                         18960414  17938735      0.44                                                    
   Submitted to other databases     1756014   1745086      0.04                                                    
   Thesis                             10355     10297     <0.01                                                    
   Book citation                       6749      6699     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 476409


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      58956525                1.38                                                    
   CATALYTIC ACTIVITY               4641945   4194295      0.11     4                                              
   CAUTION                         24680360  24660034      0.58     1                                              
   COFACTOR                         1867584   1733466      0.04     8                                              
   DOMAIN                            197544    189952     <0.01     9                                              
   ENZYME REGULATION                  55788     55788     <0.01    11                                              
   FUNCTION                         5209978   4929003      0.12     3                                              
   INTERACTION                         1262      1262     <0.01    12                                              
   MISCELLANEOUS                     125316    125120     <0.01    10                                              
   PATHWAY                          2337778   2126819      0.05     7                                              
   SIMILARITY                      13013817  11329028      0.30     2                                              
   SUBCELLULAR LOCATION             4114352   3963753      0.10     5                                              
   SUBUNIT                          2710801   2686049      0.06     6                                              

Total number of comment topics: 12
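
The "Average per entry" and "Rank" columns can be derived from the "Total
number" column. The following Python sketch reproduces them for the comment
topics. It is an illustration under two assumptions inferred from the table
rather than stated in it: the average is the total divided by the number of
entries in the release, and values that round to 0.00 are shown as "<0.01".

```python
total_entries = 42_821_879  # entries in release 2013_09

# Total number of comment lines per topic, copied from the table above.
comment_totals = {
    "CATALYTIC ACTIVITY": 4641945,
    "CAUTION": 24680360,
    "COFACTOR": 1867584,
    "DOMAIN": 197544,
    "ENZYME REGULATION": 55788,
    "FUNCTION": 5209978,
    "INTERACTION": 1262,
    "MISCELLANEOUS": 125316,
    "PATHWAY": 2337778,
    "SIMILARITY": 13013817,
    "SUBCELLULAR LOCATION": 4114352,
    "SUBUNIT": 2710801,
}

def average_label(total: int) -> str:
    """Format the per-entry average the way the table appears to."""
    text = f"{total / total_entries:.2f}"
    return "<0.01" if text == "0.00" else text

# Rank 1 is the topic with the largest total number of lines.
ranked = sorted(comment_totals, key=comment_totals.get, reverse=True)
ranks = {topic: position for position, topic in enumerate(ranked, start=1)}

for topic in sorted(comment_totals):
    total = comment_totals[topic]
    print(f"{topic:<22}{total:>10}  {average_label(total):>6}  {ranks[topic]:>3}")
```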


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      24345276                0.57                                                    
   ACT_SITE                         1735575   1067948      0.04     5                                              
   BINDING                          3656439    955064      0.09     2                                              
   CARBOHYD                             353       138     <0.01    28                                              
   CHAIN                             867813    708743      0.02     8                                              
   COILED                             64458     35295     <0.01    17                                              
   COMPBIAS                           10895     10895     <0.01    22                                              
   CROSSLNK                           10106      6751     <0.01    23                                              
   DISULFID                           85239     65565     <0.01    15                                              
   DNA_BIND                           51908     47617     <0.01    19                                              
   DOMAIN                            667148    517796      0.02    10                                              
   INIT_MET                           12646     12646     <0.01    21                                              
   INTRAMEM                             385        55     <0.01    27                                              
   LIPID                              63866     31933     <0.01    18                                              
   METAL                            3472915    896396      0.08     3                                              
   MOD_RES                           287205    258431      0.01    13                                              
   MOTIF                             195807    118437     <0.01    14                                              
   NON_STD                             1855      1676     <0.01    25                                              
   NON_TER                          6880692   4421808      0.16     1                                              
   NP_BIND                          1312418    785644      0.03     6                                              
   PEPTIDE                               34        34     <0.01    29                                              
   PROPEP                              4604      4604     <0.01    24                                              
   REGION                           1118492    617220      0.03     7                                              
   REPEAT                             50440     12156     <0.01    20                                              
   SIGNAL                            703463    700186      0.02     9                                              
   SITE                              397122    231130      0.01    11                                              
   TOPO_DOM                          296514     60297      0.01    12                                              
   TRANSIT                             1440      1440     <0.01    26                                              
   TRANSMEM                         2328592    407596      0.05     4                                              
   ZN_FING                            66852     60240     <0.01    16                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             457803165               10.69                                                    
   Allergome                           3478      2845     <0.01    84   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   102   Organism-specific databases                
   ArrayExpress                      185650    185650     <0.01    45   Gene expression databases                  
   BRENDA                              2642      2614     <0.01    86   Enzyme and pathway databases               
   Bgee                               99589     99589     <0.01    51   Gene expression databases                  
   BindingDB                           5825      5825     <0.01    77   Other                                      
   BioCyc                           5639949   5572624      0.13    18   Enzyme and pathway databases               
   CAZy                               74011     69538     <0.01    55   Protein family/group databases             
   CGD                                 7033      7033     <0.01    76   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   107   2D gel databases                           
   CTD                               351972    350648      0.01    38   Organism-specific databases                
   ChEMBL                               606       606     <0.01    94   Other                                      
   ChiTaRS                            65575     65575     <0.01    56   Other                                      
   ConoServer                           160       160     <0.01    99   Organism-specific databases                
   DIP                                 2873      2868     <0.01    85   Protein-protein interaction databases      
   DNASU                              42308     41974     <0.01    62   Protocols and materials databases          
   EMBL                            46113664  41791436      1.08     3   Sequence databases                         
   Ensembl                          1014023    999501      0.02    29   Genome annotation databases                
   EnsemblBacteria                 17859986  17586110      0.42     8   Genome annotation databases                
   EnsemblFungi                      372634    370469      0.01    37   Genome annotation databases                
   EnsemblMetazoa                    693448    677918      0.02    33   Genome annotation databases                
   EnsemblPlants                     654086    620584      0.02    34   Genome annotation databases                
   EnsemblProtists                   156283    153887     <0.01    47   Genome annotation databases                
   EuPathDB                          147096    147094     <0.01    49   Organism-specific databases                
   EvolutionaryTrace                   8045      8045     <0.01    74   Other                                      
   FlyBase                           196093    194626     <0.01    43   Organism-specific databases                
   GO                              73053695  23628003      1.71     2   Ontologies                                 
   Gene3D                          18982525  14964234      0.44     7   Family and domain databases                
   GeneID                           9877826   9621471      0.23    12   Genome annotation databases                
   GeneTree                          900564    900506      0.02    31   Phylogenomic databases                     
   Genevestigator                     86308     86303     <0.01    52   Gene expression databases                  
   GenoList                           14732     14459     <0.01    71   Organism-specific databases                
   GenomeRNAi                         19362     19362     <0.01    69   Other                                      
   Gramene                           204041    204041     <0.01    42   Organism-specific databases                
   H-InvDB                              611       464     <0.01    93   Organism-specific databases                
   HAMAP                            4676484   4616658      0.11    20   Family and domain databases                
   HGNC                               47308     47231     <0.01    59   Organism-specific databases                
   HOGENOM                          3653902   3653857      0.09    23   Phylogenomic databases                     
   HOVERGEN                          305195    305184      0.01    39   Phylogenomic databases                     
   IPI                               279279    278387      0.01    40   Sequence databases                         
   InParanoid                        186428    186428     <0.01    44   Phylogenomic databases                     
   IntAct                             12340     12340     <0.01    72   Protein-protein interaction databases      
   InterPro                        91953594  32259129      2.15     1   Family and domain databases                
   KEGG                             8939612   8718349      0.21    14   Genome annotation databases                
   KO                               3718574   3700849      0.09    22   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    79   Organism-specific databases                
   Leproma                             1272      1270     <0.01    88   Organism-specific databases                
   MEROPS                            138706    138705     <0.01    50   Protein family/group databases             
   MGI                                52364     51864     <0.01    58   Organism-specific databases                
   MIM                                    4         4     <0.01   108   Organism-specific databases                
   MINT                               10254     10253     <0.01    73   Protein-protein interaction databases      
   NextBio                           208199    208187     <0.01    41   Other                                      
   OGP                                    3         3     <0.01   109   2D gel databases                           
   OMA                              4858165   4857948      0.11    19   Phylogenomic databases                     
   OrthoDB                           553137    553094      0.01    35   Phylogenomic databases                     
   PANTHER                          6134456   5761264      0.14    16   Family and domain databases                
   PATRIC                           8281125   8281000      0.19    15   Genome annotation databases                
   PDB                                20274     11224     <0.01    67   3D structure databases                     
   PDBsum                             19943     11001     <0.01    68   3D structure databases                     
   PIR                               172301    139477     <0.01    46   Sequence databases                         
   PIRSF                            3862148   3858028      0.09    21   Family and domain databases                
   PMAP-CutDB                           209       209     <0.01    98   Other                                      
   PRIDE                             931362    931362      0.02    30   Proteomic databases                        
   PRINTS                           6116639   5484225      0.14    17   Family and domain databases                
   PROSITE                         20498431  13677251      0.48     5   Family and domain databases                
   PaxDb                              29030     29028     <0.01    64   Proteomic databases                        
   PeptideAtlas                         129       129     <0.01   100   Proteomic databases                        
   PeroxiBase                          2595      2587     <0.01    87   Protein family/group databases             
   Pfam                            41306136  30224533      0.96     4   Family and domain databases                
   PharmGKB                            3572      3572     <0.01    83   Organism-specific databases                
   PhosSite                             616       604     <0.01    92   PTM databases                              
   PhosphoSite                         1125      1125     <0.01    89   PTM databases                              
   PhylomeDB                         147513    147513     <0.01    48   Phylogenomic databases                     
   PomBase                               40        27     <0.01   103   Organism-specific databases                
   PptaseDB                              36        35     <0.01   104   Protein family/group databases             
   ProDom                            825962    796392      0.02    32   Family and domain databases                
   ProMEX                              5387      5387     <0.01    78   Proteomic databases                        
   ProtClustDB                      2719510   2719499      0.06    26   Phylogenomic databases                     
   ProteinModelPortal              10532143  10532143      0.25     9   3D structure databases                     
   PseudoCAP                           4533      4527     <0.01    80   Organism-specific databases                
   REBASE                             40133     40127     <0.01    63   Protein family/group databases             
   REPRODUCTION-2DPAGE                   66        65     <0.01   101   2D gel databases                           
   RGD                                21119     20290     <0.01    66   Organism-specific databases                
   Reactome                             228       174     <0.01    97   Enzyme and pathway databases               
   RefSeq                           9916615   9627663      0.23    11   Sequence databases                         
   SABIO-RK                             480       480     <0.01    95   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   106   Organism-specific databases                
   SMART                            9043980   6868802      0.21    13   Family and domain databases                
   SMR                              2600631   2600631      0.06    27   3D structure databases                     
   STRING                           2903825   2903756      0.07    24   Protein-protein interaction databases      
   SUPFAM                          19135150  15447562      0.45     6   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   105   2D gel databases                           
   SignaLink                           4399      4397     <0.01    82   Enzyme and pathway databases               
   TAIR                               15152     15079     <0.01    70   Organism-specific databases                
   TCDB                                4463      4455     <0.01    81   Protein family/group databases             
   TIGRFAMs                        10136144   9247943      0.24    10   Family and domain databases                
   TubercuList                         1101      1100     <0.01    90   Organism-specific databases                
   UCSC                               59397     59234     <0.01    57   Genome annotation databases                
   UniGene                           551026    521416      0.01    36   Sequence databases                         
   UniPathway                       2272596   2114725      0.05    28   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    53   Genome annotation databases                
   World-2DPAGE                         673       668     <0.01    91   2D gel databases                           
   WormBase                           42521     42348     <0.01    61   Organism-specific databases                
   Xenbase                            25592     25514     <0.01    65   Organism-specific databases                
   ZFIN                               45721     45153     <0.01    60   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    75   Organism-specific databases                
   eggNOG                           2768244   2768224      0.06    25   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    54   Organism-specific databases                
   mycoCLAP                             422       422     <0.01    96   Protein family/group databases             

Number of explicitly cross-referenced databases: 127
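
The third numeric column of the table above appears to be the number of cross-references divided by the total number of entries in the release (42821879, from the release header). The following is a minimal Python sketch of that arithmetic for a few rows copied from the table. The function name and the choice of rows are illustrative assumptions, not part of the release files.

```python
# Total number of sequence entries in release 2013_09 (from the release header).
TOTAL_ENTRIES = 42_821_879

# (database, number of cross-references) copied from the table above.
ROWS = [
    ("InterPro", 91_953_594),
    ("GO", 73_053_695),
    ("Pfam", 41_306_136),
    ("EnsemblBacteria", 17_859_986),
]


def xrefs_per_entry(xrefs, total=TOTAL_ENTRIES):
    """Average number of cross-references per database entry (hypothetical helper)."""
    return round(xrefs / total, 2)


for name, xrefs in ROWS:
    print(f"{name:<16} {xrefs:>10} {xrefs_per_entry(xrefs):>6.2f}")
```

For these four rows the computed values agree with the table (2.15, 1.71, 0.96 and 0.42), which supports this reading of the column.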


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.66   Gln (Q) 3.99   Leu (L) 9.95   Ser (S) 6.54
   Arg (R) 5.36   Glu (E) 6.22   Lys (K) 5.32   Thr (T) 5.55
   Asn (N) 4.11   Gly (G) 7.09   Met (M) 2.49   Trp (W) 1.29
   Asp (D) 5.33   His (H) 2.19   Phe (F) 4.05   Tyr (Y) 3.07
   Cys (C) 1.20   Ile (I) 6.09   Pro (P) 4.57   Val (V) 6.80

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.02


   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
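
   The ordering in 5.2 follows directly from the percentages in 5.1. As a
   minimal sketch (the dictionary and function names are illustrative
   assumptions), it can be reproduced by sorting the twenty standard amino
   acids by their composition value, highest first:

```python
# Composition in percent, copied from section 5.1.
COMPOSITION = {
    "Ala": 8.66, "Gln": 3.99, "Leu": 9.95, "Ser": 6.54,
    "Arg": 5.36, "Glu": 6.22, "Lys": 5.32, "Thr": 5.55,
    "Asn": 4.11, "Gly": 7.09, "Met": 2.49, "Trp": 1.29,
    "Asp": 5.33, "His": 2.19, "Phe": 4.05, "Tyr": 3.07,
    "Cys": 1.20, "Ile": 6.09, "Pro": 4.57, "Val": 6.80,
}


def rank_by_frequency(composition):
    """Return the amino acid codes ordered from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


print(", ".join(rank_by_frequency(COMPOSITION)))
```

   No two values in 5.1 are equal, so the ordering is unambiguous at the
   two-decimal precision given.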



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 643390
Total number of entries encoded on a Plasmid: 350611
Total number of entries encoded on a Plastid: 26952
Total number of entries encoded on a Plastid; Apicoplast: 750
Total number of entries encoded on a Plastid; Chloroplast: 236998
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1059