Release 2013_10 of 16-Oct-2013 of UniProtKB/TrEMBL contains 44746523 sequence entries,
comprising 14225235989 amino acids.

1984369 sequences have been added since release 2013_09, the sequence data of
3624 existing entries has been updated and the annotations of
26319729 entries have been revised. The added sequences represent an increase
of 5% in the number of entries.

Number of fragments: 4614728

Protein existence (PE):              entries      %
1: Evidence at protein level           20706     0.05%
2: Evidence at transcript level       840025     1.88%
3: Inferred from homology           10518430    23.51%
4: Predicted                        33367362    74.57%
5: Uncertain                               0     0.00%
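
The percentages in the protein existence table can be reproduced by dividing each count by the total number of entries in the release. The following is a minimal Python sketch under that assumption; the names `PE_COUNTS` and `pe_percentages` are illustrative and not part of any UniProt tooling.

```python
# Protein existence (PE) counts copied from the table above.
PE_COUNTS = {
    "1: Evidence at protein level": 20706,
    "2: Evidence at transcript level": 840025,
    "3: Inferred from homology": 10518430,
    "4: Predicted": 33367362,
    "5: Uncertain": 0,
}

# Total number of sequence entries in release 2013_10.
TOTAL_ENTRIES = 44746523


def pe_percentages(counts, total):
    """Return each count as a percentage of total, rounded to two decimals."""
    return {level: round(100.0 * n / total, 2) for level, n in counts.items()}


PE_PERCENT = pe_percentages(PE_COUNTS, TOTAL_ENTRIES)

for level, pct in PE_PERCENT.items():
    print(f"{level:<34}{PE_COUNTS[level]:>10}{pct:>9.2f}%")
```

The five counts add up to the total number of entries, so every entry carries exactly one PE level in this release.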

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 445145

   The twenty most represented species account for 1935156 sequences, or 4.3% of the
   total number of entries.
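
The figure for the first twenty species can be checked against the first twenty rows of the table of the most represented species in section 2.2. A short Python sketch, with the counts copied from that table and illustrative variable names:

```python
# Entry counts for the twenty most represented species, copied from the
# first twenty rows of the table in section 2.2.
TOP_20_COUNTS = [
    551836, 204290, 114032, 96864, 89478,
    73877, 73054, 70493, 69158, 63878,
    60539, 56746, 56231, 54895, 54138,
    52311, 50603, 49264, 48906, 44563,
]

# Total number of sequence entries in release 2013_10.
TOTAL_ENTRIES = 44746523

top_20_total = sum(TOP_20_COUNTS)
top_20_share = 100.0 * top_20_total / TOTAL_ENTRIES

print(f"{top_20_total} sequences: {top_20_share:.1f} % of all entries")
```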


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:184427
                            2x:73728
                            3x:39343
                            4x:27704
                            5x:16723
                            6x:11630
                            7x: 8904
                            8x: 7007
                            9x: 5496
                           10x:10644
                       11- 20x:31324
                       21- 50x:10473
                       51-100x: 4106
                         >100x:13636


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     551836  Human immunodeficiency virus 1
       2     204290  uncultured bacterium
       3     114032  Homo sapiens (Human)
       4      96864  Oryza sativa subsp. japonica (Rice)
       5      89478  Hepatitis C virus
       6      73877  Glycine max (Soybean) (Glycine hispida)
       7      73054  mine drainage metagenome
       8      70493  Hordeum vulgare var. distichum (Two-rowed barley)
       9      69158  Macaca mulatta (Rhesus macaque)
      10      63878  Hepatitis B virus (HBV)
      11      60539  Zea mays (Maize)
      12      56746  Mus musculus (Mouse)
      13      56231  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      14      54895  Solanum tuberosum (Potato)
      15      54138  Vitis vinifera (Grape)
      16      52311  Danio rerio (Zebrafish) (Brachydanio rerio)
      17      50603  Trichomonas vaginalis
      18      49264  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      19      48906  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      20      44563  Populus trichocarpa (Western balsam poplar) 
      21      43221  Callithrix jacchus (White-tufted-ear marmoset)
      22      41202  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      23      40989  Arabidopsis thaliana (Mouse-ear cress)
      24      39880  Oryza sativa subsp. indica (Rice)
      25      39850  Paramecium tetraurelia
      26      39300  Setaria italica (Foxtail millet) (Panicum italicum)
      27      38798  Mustela putorius furo (European domestic ferret) (Mustela furo)
      28      38163  human gut metagenome
      29      36700  Drosophila melanogaster (Fruit fly)
      30      36598  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      31      35921  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      32      35652  Ailuropoda melanoleuca (Giant panda)
      33      35599  Emiliania huxleyi CCMP1516
      34      35207  Acyrthosiphon pisum (Pea aphid)
      35      35177  Simian immunodeficiency virus (SIV)
      36      35066  Caenorhabditis japonica
      37      34831  Physcomitrella patens subsp. patens (Moss)
      38      34570  Thalassiosira oceanica (Marine diatom)
      39      34473  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      40      33847  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      41      33660  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      42      33256  Selaginella moellendorffii (Spikemoss)
      43      32767  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      44      32342  Oryza brachyantha
      45      32242  Sus scrofa (Pig)
      46      32140  Oryza glaberrima (African rice)
      47      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      48      31850  Pan troglodytes (Chimpanzee)
      49      31389  Ricinus communis (Castor bean)
      50      31207  Capitella teleta
      51      30950  Daphnia pulex (Water flea)
      52      30712  Caenorhabditis brenneri (Nematode worm)
      53      30146  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      54      29815  Amphimedon queenslandica (Sponge)
      55      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      56      29318  Pristionchus pacificus (Parasitic nematode)
      57      29183  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      58      29054  Oikopleura dioica (Tunicate)
      59      28910  Escherichia coli
      60      28830  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      61      28825  Capsella rubella
      62      28618  Prunus persica (Peach) (Amygdalus persica)
      63      28511  Canis familiaris (Dog) (Canis lupus familiaris)
      64      28099  Gasterosteus aculeatus (Three-spined stickleback)
      65      27766  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      66      27513  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      67      27462  Equus caballus (Horse)
      68      27089  Gorilla gorilla gorilla (Lowland gorilla)
      69      26832  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      70      25972  Oryzias latipes (Medaka fish) (Japanese ricefish)
      71      25797  Loxodonta africana (African elephant)
      72      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      73      25691  Rattus norvegicus (Rat)
      74      25660  Bos taurus (Bovine)
      75      25103  Oryctolagus cuniculus (Rabbit)
      76      24904  Nematostella vectensis (Starlet sea anemone)
      77      24643  Tetrahymena thermophila (strain SB210)
      78      24590  Guillardia theta CCMP2712
      79      24208  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      80      23716  Ornithorhynchus anatinus (Duckbill platypus)
      81      23565  Oxytricha trifallax
      82      23502  Latimeria chalumnae (West Indian ocean coelacanth)
      83      23361  Helobdella robusta (Californian leech)
      84      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      85      22751  Monodelphis domestica (Gray short-tailed opossum)
      86      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      87      22555  Caenorhabditis elegans
      88      22314  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      89      22163  gut metagenome
      90      21548  Heterocephalus glaber (Naked mole rat)
      91      21346  Caenorhabditis briggsae
      92      21321  Gallus gallus (Chicken)
      93      21128  Ixodes scapularis (Black-legged tick) (Deer tick)
      94      20991  Felis catus (Cat) (Felis silvestris catus)
      95      20867  Myotis lucifugus (Little brown bat)
      96      20838  Tupaia chinensis (Chinese tree shrew)
      97      20765  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
      98      20513  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
      99      20133  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     100      20114  Ciona savignyi (Pacific transparent sea squirt)
     101      20073  Cavia porcellus (Guinea pig)
     102      20028  Camelus ferus (Wild Bactrian camel)
     103      19985  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     104      19818  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     105      19685  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     106      19551  Anolis carolinensis (Green anole) (American chameleon)
     107      19546  Pteropus alecto (Black flying fox)
     108      19520  Wuchereria bancrofti
     109      19300  Myotis brandtii (Brandt's bat)
     110      19201  Trypanosoma cruzi (strain CL Brener)
     111      19057  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     112      18957  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     113      18855  Drosophila simulans (Fruit fly)
     114      18594  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     115      18555  Bos grunniens mutus
     116      18234  Tetranychus urticae (Two-spotted spider mite)
     117      18115  Atta cephalotes (Leafcutter ant)
     118      18047  Saprolegnia diclina VS20
     119      18026  Anopheles gambiae (African malaria mosquito)
     120      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     121      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     122      17688  Bombyx mori (Silk moth)
     123      17683  Genlisea aurea
     124      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     125      17417  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     126      17333  Anas platyrhynchos (Domestic duck) (Anas boschas)
     127      17284  Nasonia vitripennis (Parasitic wasp)
     128      17090  Plasmodium falciparum
     129      17053  Tribolium castaneum (Red flour beetle)
     130      17040  Drosophila yakuba (Fruit fly)
     131      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     132      16918  Meleagris gallopavo (Common turkey)
     133      16714  Drosophila persimilis (Fruit fly)
     134      16698  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     135      16639  Fusarium oxysporum f. sp. lycopersici  
     136      16608  Rhodnius prolixus (Triatomid bug)
     137      16470  Hepatitis C virus subtype 1b
     138      16426  Ectocarpus siliculosus (Brown alga)
     139      16388  Colletotrichum gloeosporioides Cg-14
     140      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     141      16329  Danaus plexippus (Monarch butterfly)
     142      16275  Trichinella spiralis (Trichina worm)
     143      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     144      16188  Drosophila sechellia (Fruit fly)
     145      16157  Schistosoma japonicum (Blood fluke)
     146      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     147      16056  Listeria monocytogenes
     148      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     149      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     150      15716  Naegleria gruberi (Amoeba)
     151      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     152      15568  Phytophthora ramorum (Sudden oak death agent)
     153      15462  Myotis davidii (David's myotis)
     154      15422  Drosophila willistoni (Fruit fly)
     155      15371  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     156      15354  Loa loa (Eye worm) (Filaria loa)
     157      15345  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     158      15225  Pythium ultimum
     159      15192  Klebsiella pneumoniae
     160      15144  Drosophila ananassae (Fruit fly)
     161      15057  Pararge aegeria (speckled wood butterfly)
     162      15042  Harpegnathos saltator (Jerdon's jumping ant)
     163      15011  Strigamia maritima (European centipede) (Geophilus maritimus)
     164      14942  Acanthamoeba castellanii str. Neff
     165      14927  Drosophila erecta (Fruit fly)
     166      14910  Dendroctonus ponderosae (mountain pine beetle)
     167      14861  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     168      14801  Camponotus floridanus (Florida carpenter ant)
     169      14794  Drosophila mojavensis (Fruit fly)
     170      14792  Fusarium fujikuroi IMI 58289
     171      14713  Plasmodium chabaudi
     172      14707  Drosophila virilis (Fruit fly)
     173      14654  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     174      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     175      14606  uncultured archaeon
     176      14542  Rabies virus
     177      14535  Angomonas deanei
     178      14417  Volvox carteri (Green alga)
     179      14341  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     180      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     181      14235  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     182      14147  Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
     183      13970  Acromyrmex echinatior (Panamanian leafcutter ant) 
     184      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     185      13876  Clonorchis sinensis (Chinese liver fluke)
     186      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     187      13806  Fomitopsis pinicola (strain FP-58527) (Brown rot fungus)
     188      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     189      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     190      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     191      13624  Trypanosoma cruzi
     192      13408  Hepatitis C virus subtype 1a
     193      13345  Aspergillus flavus 
     194      13329  Colletotrichum orbiculare   
     195      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     196      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     197      13110  Petromyzon marinus (Sea lamprey)
     198      13082  Glarea lozoyensis ATCC 20868
     199      13062  Mycosphaerella fijiensis (strain CIRAD86) (Black leaf streak disease fungus) 
     200      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     201      12983  Albugo laibachii Nc14
     202      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     203      12950  Stigmatella aurantiaca (strain DW4/3-1)
     204      12943  Porcine reproductive and respiratory syndrome virus (PRRSV)
     205      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     206      12856  Cochliobolus heterostrophus (strain C5 / ATCC 48332 / race O)  
     207      12846  Magnaporthe oryzae (strain Y34) (Rice blast fungus) (Pyricularia oryzae)
     208      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     209      12711  Magnaporthe oryzae (strain P131) (Rice blast fungus) (Pyricularia oryzae)
     210      12703  Cochliobolus heterostrophus (strain C4 / ATCC 48331 / race T)  
     211      12697  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     212      12696  Trypanosoma congolense (strain IL3000)
     213      12685  Schistosoma mansoni (Blood fluke)
     214      12627  Xenopus laevis (African clawed frog)
     215      12586  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     216      12473  Helicobacter pylori (Campylobacter pylori)
     217      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     218      12440  Polysphondylium pallidum (Cellular slime mold)
     219      12414  Mycosphaerella pini (strain NZE10 / CBS 128990) (Red band needle blight fungus) 
     220      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     221      12352  Dictyostelium purpureum (Slime mold)
     222      12197  Thanatephorus cucumeris (strain AG1-IB / isolate 7/3/14)  
     223      12174  Cochliobolus sativus (strain ND90Pr / ATCC 201652)  
     224      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     225      12143  Mucor circinelloides f. circinelloides (strain 1006PhL) (Mucormycosis agent) 
     226      12078  Ceriporiopsis subvermispora (strain B) (White-rot fungus)
     227      12015  Apis mellifera (Honeybee)
     228      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     229      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     230      11939  Emericella nidulans  
     231      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     232      11780  Piriformospora indica (strain DSM 11827)
     233      11752  Chondrocladia sp. SMF<DEU
     234      11751  Cladorhiza sp. SMF<DEU
     235      11750  Abyssocladia sp. SMF<DEU
     236      11735  Gloeophyllum trabeum (strain ATCC 11539 / FP-39264 / Madison 617) 
     237      11726  Phelloderma sp. SMF<DEU
     238      11719  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     239      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     240      11687  Setosphaeria turcica (strain 28A) (Northern leaf blight fungus) 
     241      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     242      11682  Eutypa lata (strain UCR-EL1) (Grapevine dieback disease fungus) 
     243      11679  Anopheles darlingi (Mosquito)
     244      11639  Plasmodium berghei (strain Anka)
     245      11603  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     246      11567  Trichoplax adhaerens (Trichoplax reptans)
     247      11557  Trypanosoma vivax (strain Y486)
     248      11518  Aureococcus anophagefferens (Harmful bloom alga)
     249      11515  Puccinia triticina (isolate 1-1 / race 1 (BBBD)) (Brown leaf rust fungus)
     250      11500  Megaselia scalaris (Humpbacked fly) (Phora scalaris)


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          714272 (  2%)
    Bacteria       33475341 ( 75%)
    Eukaryota       8582819 ( 19%)
    Viruses         1814774 (  4%)
    Other            159316 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 114072 (  1%)           ( <1%)
     Other Mammalia       1016711 ( 12%)           (  2%)
     Other Vertebrata      867669 ( 10%)           (  2%)
     Viridiplantae        1711583 ( 20%)           (  4%)
     Fungi                2093859 ( 24%)           (  5%)
     Insecta               910431 ( 11%)           (  2%)
     Nematoda              254230 (  3%)           (  1%)
     Other                1614264 ( 19%)           (  4%)



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1200676             1001-1100   243398
                 51- 100 3991828             1101-1200   168448
                101- 150 4460887             1201-1300   121553
                151- 200 4324299             1301-1400    72818
                201- 250 4368786             1401-1500    60317
                251- 300 4236136             1501-1600    41102
                301- 350 3831040             1601-1700    29962
                351- 400 2856444             1701-1800    22557
                401- 450 2489131             1801-1900    18207
                451- 500 2030442             1901-2000    15427
                501- 550 1299878             2001-2100    12439
                551- 600 1003303             2101-2200    12509
                601- 650  732211             2201-2300     9659
                651- 700  576588             2301-2400     7784
                701- 750  478539             2401-2500     6842
                751- 800  412344             >2500        52657
                801- 850  321392
                851- 900  286952
                901- 950  197206
                951-1000  138034



   The average sequence length in UniProtKB/TrEMBL is   317 amino acids.

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
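
The average length reported above can be compared with the release totals given at the top of this document (14225235989 amino acids in 44746523 entries). The sketch below is only a consistency check; the source does not state how the average is computed, and the variable names are illustrative.

```python
# Release totals quoted at the top of this document.
TOTAL_AMINO_ACIDS = 14225235989
TOTAL_ENTRIES = 44746523

# Exact mean length, and the same value with the fractional part dropped.
mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES
truncated_mean = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES

print(f"mean length:      {mean_length:.2f} amino acids")
print(f"truncated to int: {truncated_mean} amino acids")
```

The exact mean lies between 317 and 318, so the reported value of 317 is consistent with truncating the quotient rather than rounding it. That reading is an assumption.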



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    53137101                1.19                                                    
   Submitted to EMBL/GenBank/DDBJ  31435357  29565329      0.70                                                    
   Journal                         19811909  18786903      0.44                                                    
   Submitted to other databases     1872698   1861254      0.04                                                    
   Thesis                             10349     10291     <0.01                                                    
   Book citation                       6787      6737     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 481775
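
The "Average per entry" column in the table above appears to be the total number of lines divided by the total number of entries in the release (44746523), with values that round to zero shown as "<0.01". The source does not spell this out, so the following Python sketch is an assumption that happens to reproduce the listed values; `average_per_entry` is an illustrative name.

```python
# Total number of sequence entries in release 2013_10.
TOTAL_ENTRIES = 44746523


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format lines-per-entry to two decimals, using '<0.01' for tiny values."""
    average = total_lines / total_entries
    formatted = f"{average:.2f}"
    return "<0.01" if formatted == "0.00" else formatted


# Totals copied from the References (RL) table above.
REFERENCE_TOTALS = {
    "References (RL)": 53137101,
    "Submitted to EMBL/GenBank/DDBJ": 31435357,
    "Journal": 19811909,
    "Submitted to other databases": 1872698,
    "Thesis": 10349,
    "Book citation": 6787,
    "Patent": 1,
}

for line_type, total in REFERENCE_TOTALS.items():
    print(f"{line_type:<32}{total:>10}{average_per_entry(total):>8}")
```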


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      64476130                1.44                                                    
   CATALYTIC ACTIVITY               4946853   4531940      0.11     4                                              
   CAUTION                         26006088  25984927      0.58     1                                              
   COFACTOR                         2052386   1905996      0.05     8                                              
   DOMAIN                            215374    206441     <0.01     9                                              
   ENZYME REGULATION                  60308     60308     <0.01    11                                              
   FUNCTION                         5634679   5333490      0.13     3                                              
   INTERACTION                         1260      1260     <0.01    12                                              
   MISCELLANEOUS                     134747    134549     <0.01    10                                              
   PATHWAY                          2529775   2291953      0.06     7                                              
   SIMILARITY                      15534204  12066062      0.35     2                                              
   SUBCELLULAR LOCATION             4420390   4258932      0.10     5                                              
   SUBUNIT                          2940066   2914905      0.07     6                                              

Total number of comment topics: 12
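
The Rank column appears to order the subtypes by their total number, largest first. This is inferred from the listed values rather than stated in the source. A small Python sketch of that ordering for the comment topics, using illustrative names:

```python
# Total number of comment (CC) lines per topic, copied from the table above.
COMMENT_TOTALS = {
    "CATALYTIC ACTIVITY": 4946853,
    "CAUTION": 26006088,
    "COFACTOR": 2052386,
    "DOMAIN": 215374,
    "ENZYME REGULATION": 60308,
    "FUNCTION": 5634679,
    "INTERACTION": 1260,
    "MISCELLANEOUS": 134747,
    "PATHWAY": 2529775,
    "SIMILARITY": 15534204,
    "SUBCELLULAR LOCATION": 4420390,
    "SUBUNIT": 2940066,
}


def rank_by_total(totals):
    """Map each topic to its 1-based rank, with the largest total ranked 1."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {topic: position for position, topic in enumerate(ordered, start=1)}


COMMENT_RANKS = rank_by_total(COMMENT_TOTALS)

for topic in COMMENT_TOTALS:
    print(f"{topic:<24}{COMMENT_TOTALS[topic]:>10}{COMMENT_RANKS[topic]:>6}")
```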


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      26314715                0.59                                                    
   ACT_SITE                         1929507   1187792      0.04     5                                              
   BINDING                          4083682   1068731      0.09     2                                              
   CARBOHYD                             352       137     <0.01    28                                              
   CHAIN                             868064    709107      0.02     8                                              
   COILED                             69579     38131     <0.01    18                                              
   COMPBIAS                           11871     11871     <0.01    22                                              
   CROSSLNK                           10514      7075     <0.01    23                                              
   DISULFID                           95402     74242     <0.01    15                                              
   DNA_BIND                           56642     52001     <0.01    19                                              
   DOMAIN                            727121    563496      0.02     9                                              
   INIT_MET                           13644     13644     <0.01    21                                              
   INTRAMEM                             385        55     <0.01    27                                              
   LIPID                              70040     35020     <0.01    17                                              
   METAL                            3867392   1005074      0.09     3                                              
   MOD_RES                           315964    285339      0.01    13                                              
   MOTIF                             216042    130696     <0.01    14                                              
   NON_STD                             1839      1687     <0.01    25                                              
   NON_TER                          7151102   4616582      0.16     1                                              
   NP_BIND                          1450024    867813      0.03     6                                              
   PEPTIDE                               73        73     <0.01    29                                              
   PROPEP                              5038      5038     <0.01    24                                              
   REGION                           1249677    693206      0.03     7                                              
   REPEAT                             54701     13149     <0.01    20                                              
   SIGNAL                            711457    708096      0.02    10                                              
   SITE                              433982    251820      0.01    11                                              
   TOPO_DOM                          318564     64851      0.01    12                                              
   TRANSIT                             1261      1261     <0.01    26                                              
   TRANSMEM                         2526862    445483      0.06     4                                              
   ZN_FING                            73934     66709     <0.01    16                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             492723458               11.01                                                    
   Allergome                           3747      3110     <0.01    84   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   103   Organism-specific databases                
   ArrayExpress                      199571    199571     <0.01    44   Gene expression databases                  
   BRENDA                              2629      2601     <0.01    87   Enzyme and pathway databases               
   Bgee                               99456     99456     <0.01    51   Gene expression databases                  
   BindingDB                           5816      5816     <0.01    78   Other                                      
   BioCyc                           5639769   5572454      0.13    19   Enzyme and pathway databases               
   CAZy                               73987     69514     <0.01    55   Protein family/group databases             
   CGD                                 7031      7031     <0.01    77   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   108   2D gel databases                           
   CTD                               360920    359603      0.01    38   Organism-specific databases                
   ChEMBL                               604       604     <0.01    95   Other                                      
   ChiTaRS                            65549     65549     <0.01    56   Other                                      
   ConoServer                           160       160     <0.01   100   Organism-specific databases                
   DIP                                 2954      2949     <0.01    86   Protein-protein interaction databases      
   DNASU                              42290     41956     <0.01    62   Protocols and materials databases          
   EMBL                            48032002  43625475      1.07     3   Sequence databases                         
   Ensembl                          1014851    999364      0.02    30   Genome annotation databases                
   EnsemblBacteria                 17842812  17569379      0.40     8   Genome annotation databases                
   EnsemblFungi                      676637    370445      0.02    35   Genome annotation databases                
   EnsemblMetazoa                   1261191    786816      0.03    29   Genome annotation databases                
   EnsemblPlants                     876600    653693      0.02    34   Genome annotation databases                
   EnsemblProtists                   305908    192874      0.01    39   Genome annotation databases                
   EuPathDB                          142862    142860     <0.01    50   Organism-specific databases                
   EvolutionaryTrace                   8011      8011     <0.01    75   Other                                      
   FlyBase                           199072    197601     <0.01    45   Organism-specific databases                
   GO                              81772033  26180237      1.83     2   Ontologies                                 
   Gene3D                          20277709  15987708      0.45     7   Family and domain databases                
   GeneID                          10154696   9887731      0.23    12   Genome annotation databases                
   GeneTree                          900471    900413      0.02    32   Phylogenomic databases                     
   Genevestigator                     86053     86046     <0.01    52   Gene expression databases                  
   GenoList                           14730     14457     <0.01    72   Organism-specific databases                
   GenomeRNAi                         19334     19334     <0.01    70   Other                                      
   Gramene                           204001    204001     <0.01    43   Organism-specific databases                
   H-InvDB                              610       463     <0.01    94   Organism-specific databases                
   HAMAP                            4671421   4611770      0.10    20   Family and domain databases                
   HGNC                               47275     47199     <0.01    59   Organism-specific databases                
   HOGENOM                          3653566   3653521      0.08    23   Phylogenomic databases                     
   HOVERGEN                          305090    305079      0.01    40   Phylogenomic databases                     
   IPI                               278853    277962      0.01    41   Sequence databases                         
   InParanoid                        186268    186268     <0.01    46   Phylogenomic databases                     
   IntAct                             12135     12135     <0.01    73   Protein-protein interaction databases      
   InterPro                        99207993  34792462      2.22     1   Family and domain databases                
   KEGG                             9279763   9056808      0.21    14   Genome annotation databases                
   KO                               3781731   3763500      0.08    22   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    80   Organism-specific databases                
   Leproma                             1272      1270     <0.01    89   Organism-specific databases                
   MEROPS                            179811    179811     <0.01    47   Protein family/group databases             
   MGI                                52057     51631     <0.01    58   Organism-specific databases                
   MIM                                    4         4     <0.01   109   Organism-specific databases                
   MINT                               10242     10241     <0.01    74   Protein-protein interaction databases      
   NextBio                           207984    207980     <0.01    42   Other                                      
   OGP                                    3         3     <0.01   110   2D gel databases                           
   OMA                              6334143   6334143      0.14    17   Phylogenomic databases                     
   OrthoDB                           553050    553007      0.01    37   Phylogenomic databases                     
   PANTHER                          6163168   5874980      0.14    18   Family and domain databases                
   PATRIC                           8278595   8278470      0.19    15   Genome annotation databases                
   PDB                                20178     11179     <0.01    69   3D structure databases                     
   PDBsum                             21294     11605     <0.01    67   3D structure databases                     
   PIR                               172218    139354     <0.01    48   Sequence databases                         
   PIRSF                            4300013   4266351      0.10    21   Family and domain databases                
   PMAP-CutDB                           209       209     <0.01    99   Other                                      
   PRIDE                             945370    945370      0.02    31   Proteomic databases                        
   PRINTS                           6493801   5850454      0.15    16   Family and domain databases                
   PRO                                27312     27312     <0.01    65   Other                                      
   PROSITE                         21877969  14639313      0.49     5   Family and domain databases                
   PaxDb                              28993     28991     <0.01    64   Proteomic databases                        
   PeptideAtlas                         129       129     <0.01   101   Proteomic databases                        
   PeroxiBase                          2595      2587     <0.01    88   Protein family/group databases             
   Pfam                            44533348  32534229      1.00     4   Family and domain databases                
   PharmGKB                            3565      3565     <0.01    85   Organism-specific databases                
   PhosSite                             694       682     <0.01    92   PTM databases                              
   PhosphoSite                         1110      1110     <0.01    90   PTM databases                              
   PhylomeDB                         146434    146434     <0.01    49   Phylogenomic databases                     
   PomBase                               40        27     <0.01   104   Organism-specific databases                
   PptaseDB                              36        35     <0.01   105   Protein family/group databases             
   ProDom                            886137    855201      0.02    33   Family and domain databases                
   ProMEX                              5523      5523     <0.01    79   Proteomic databases                        
   ProtClustDB                      2717876   2717876      0.06    27   Phylogenomic databases                     
   ProteinModelPortal              12081160  12081160      0.27     9   3D structure databases                     
   PseudoCAP                           4529      4523     <0.01    82   Organism-specific databases                
   REBASE                             41461     41459     <0.01    63   Protein family/group databases             
   REPRODUCTION-2DPAGE                   66        65     <0.01   102   2D gel databases                           
   RGD                                21147     20254     <0.01    68   Organism-specific databases                
   Reactome                             242       185     <0.01    98   Enzyme and pathway databases               
   RefSeq                          10215395   9889092      0.23    11   Sequence databases                         
   SABIO-RK                             531       531     <0.01    96   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   107   Organism-specific databases                
   SMART                            9685761   7365481      0.22    13   Family and domain databases                
   SMR                              3517807   3517807      0.08    24   3D structure databases                     
   STRING                           2903591   2903522      0.06    25   Protein-protein interaction databases      
   SUPFAM                          21391866  17202908      0.48     6   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   106   2D gel databases                           
   SignaLink                           4398      4396     <0.01    83   Enzyme and pathway databases               
   TAIR                               15038     14965     <0.01    71   Organism-specific databases                
   TCDB                                4926      4919     <0.01    81   Protein family/group databases             
   TIGRFAMs                        11016497  10046296      0.25    10   Family and domain databases                
   TubercuList                         1094      1093     <0.01    91   Organism-specific databases                
   UCSC                               59272     59124     <0.01    57   Genome annotation databases                
   UniGene                           565516    535272      0.01    36   Sequence databases                         
   UniPathway                       2461628   2289315      0.06    28   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    53   Genome annotation databases                
   World-2DPAGE                         672       667     <0.01    93   2D gel databases                           
   WormBase                           42479     42306     <0.01    61   Organism-specific databases                
   Xenbase                            25503     25448     <0.01    66   Organism-specific databases                
   ZFIN                               44333     44333     <0.01    60   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    76   Organism-specific databases                
   eggNOG                           2768030   2768010      0.06    26   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    54   Organism-specific databases                
   mycoCLAP                             423       422     <0.01    97   Protein family/group databases             

Number of explicitly cross-referenced databases: 128
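
The "per entry" figures in the table above appear to be the number of
cross-references divided by the total number of entries in the release
(44746523), rounded to two decimals, with values that round to zero shown
as "<0.01". The sketch below reproduces a few rows under that assumption.
The function name per_entry and the rounding rule are illustrative and
inferred from the figures; they are not taken from the release
documentation.

```python
# Assumption: "per entry" = cross-references / total entries in the release.
TOTAL_ENTRIES = 44_746_523  # entry count stated for release 2013_10


def per_entry(xrefs: int, total: int = TOTAL_ENTRIES) -> str:
    """Format the average number of cross-references per entry.

    Values that round to 0.00 are shown as "<0.01", which matches the
    convention the table seems to use (inferred, not documented here).
    """
    rounded = round(xrefs / total, 2)
    if rounded < 0.01:
        return "<0.01"
    return f"{rounded:.2f}"


# A few cross-reference counts copied from the table above.
sample_rows = {
    "InterPro": 99_207_993,
    "GO": 81_772_033,
    "Pfam": 44_533_348,
    "EnsemblProtists": 305_908,
    "FlyBase": 199_072,
}

for name, xrefs in sample_rows.items():
    print(f"{name:<16} {xrefs:>10} {per_entry(xrefs):>6}")
```

With these inputs the formatted values agree with the corresponding table
rows (2.22, 1.83, 1.00, 0.01 and <0.01).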


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.66   Gln (Q) 3.99   Leu (L) 9.96   Ser (S) 6.55
   Arg (R) 5.35   Glu (E) 6.22   Lys (K) 5.33   Thr (T) 5.55
   Asn (N) 4.11   Gly (G) 7.08   Met (M) 2.49   Trp (W) 1.28
   Asp (D) 5.34   His (H) 2.19   Phe (F) 4.05   Tyr (Y) 3.08
   Cys (C) 1.19   Ile (I) 6.10   Pro (P) 4.56   Val (V) 6.80

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.02


   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
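
The ordering in 5.2 can be derived from the percentages in 5.1 by sorting
the twenty standard amino acids in descending order of frequency. The
sketch below does this with the values copied from 5.1; the variable names
are illustrative only.

```python
# Composition in percent, copied from section 5.1 (standard amino acids only).
composition = {
    "Ala": 8.66, "Gln": 3.99, "Leu": 9.96, "Ser": 6.55,
    "Arg": 5.35, "Glu": 6.22, "Lys": 5.33, "Thr": 5.55,
    "Asn": 4.11, "Gly": 7.08, "Met": 2.49, "Trp": 1.28,
    "Asp": 5.34, "His": 2.19, "Phe": 4.05, "Tyr": 3.08,
    "Cys": 1.19, "Ile": 6.10, "Pro": 4.56, "Val": 6.80,
}

# Sort by descending frequency; no two values are equal, so the order is unambiguous.
ranking = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranking))
```

The printed order matches the classification given in 5.2, from Leu (most
frequent) to Cys (least frequent).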



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 670500
Total number of entries encoded on a Plasmid: 361559
Total number of entries encoded on a Plastid: 27943
Total number of entries encoded on a Plastid; Apicoplast: 812
Total number of entries encoded on a Plastid; Chloroplast: 241665
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1222