Release 2013_11 of 13-Nov-2013 of UniProtKB/TrEMBL contains 48180424 sequence entries,
comprising 15282737498 amino acids.

Since release 2013_10, 3500305 sequences have been added, the sequence data
of 24395 existing entries have been updated, and the annotations of
21817495 entries have been revised. The added sequences represent an
increase of 8% in the number of entries.
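
As a rough check of the quoted increase, the sketch below recomputes it from the entry counts given above. It assumes that the size of release 2013_10 equals the current total minus the added sequences (that is, it ignores any entries deleted between releases). The helper name `growth_percent` is illustrative only.

```python
# Figures quoted in the release notes above.
TOTAL_ENTRIES = 48180424   # entries in release 2013_11
ADDED_ENTRIES = 3500305    # sequences added since release 2013_10


def growth_percent(total: int, added: int) -> float:
    """Growth relative to the previous release, assumed to hold total - added entries."""
    previous = total - added
    return 100.0 * added / previous


growth = growth_percent(TOTAL_ENTRIES, ADDED_ENTRIES)
print(f"Increase in entries: {growth:.1f}%")
```

Under that assumption the unrounded value is a little below 8%, which is consistent with the rounded figure in the text.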

Number of fragments: 4732694

Protein existence (PE):              entries      %
1: Evidence at protein level           21139     0.04%
2: Evidence at transcript level       876852     1.82%
3: Inferred from homology           10835741    22.49%
4: Predicted                        36446692    75.65%
5: Uncertain                               0     0.00%
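
The percentage column can be reproduced from the entry counts alone. The following is a minimal sketch, assuming each percentage is the share of all entries in the release, rounded to two decimals; the dictionary and variable names are illustrative.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
PE_COUNTS = {
    "1: Evidence at protein level": 21139,
    "2: Evidence at transcript level": 876852,
    "3: Inferred from homology": 10835741,
    "4: Predicted": 36446692,
    "5: Uncertain": 0,
}

# The five levels partition the release, so their sum is the total entry count.
pe_total = sum(PE_COUNTS.values())

# Share of each level as a percentage of all entries, rounded to two decimals.
pe_percent = {
    level: round(100.0 * count / pe_total, 2)
    for level, count in PE_COUNTS.items()
}

for level, percent in pe_percent.items():
    print(f"{level:<34}{PE_COUNTS[level]:>10}{percent:>9.2f}%")
```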

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 451756

   The twenty most represented species account for 1955462 sequences, or
   4.1% of the total number of entries.
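
The figure for the twenty most represented species can be checked against the first twenty rows of the table in section 2.2. This is a small sketch of that arithmetic; the variable names are illustrative.

```python
TOTAL_ENTRIES = 48180424  # entries in release 2013_11

# Frequencies of ranks 1 to 20, copied from the table in section 2.2.
TOP_20_FREQUENCIES = [
    559432, 205559, 114401, 96859, 89642,
    73882, 73054, 70511, 69188, 64190,
    60537, 56728, 56232, 54979, 54899,
    54141, 52454, 50603, 49265, 48906,
]

top_20_total = sum(TOP_20_FREQUENCIES)
top_20_share = 100.0 * top_20_total / TOTAL_ENTRIES

print(f"{top_20_total} sequences, {top_20_share:.1f}% of all entries")
```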


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:18601
                            2x:74776
                            3x:40156
                            4x:28217
                            5x:16968
                            6x:11827
                            7x: 8997
                            8x: 7065
                            9x: 5557
                           10x:10698
                       11- 20x:32100
                       21- 50x:10675
                       51-100x: 4220
                         >100x:14481


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     559432  Human immunodeficiency virus 1
       2     205559  uncultured bacterium
       3     114401  Homo sapiens (Human)
       4      96859  Oryza sativa subsp. japonica (Rice)
       5      89642  Hepatitis C virus
       6      73882  Glycine max (Soybean) (Glycine hispida)
       7      73054  mine drainage metagenome
       8      70511  Hordeum vulgare var. distichum (Two-rowed barley)
       9      69188  Macaca mulatta (Rhesus macaque)
      10      64190  Hepatitis B virus (HBV)
      11      60537  Zea mays (Maize)
      12      56728  Mus musculus (Mouse)
      13      56232  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      14      54979  Callithrix jacchus (White-tufted-ear marmoset)
      15      54899  Solanum tuberosum (Potato)
      16      54141  Vitis vinifera (Grape)
      17      52454  Danio rerio (Zebrafish) (Brachydanio rerio)
      18      50603  Trichomonas vaginalis
      19      49265  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      20      48906  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      21      44567  Populus trichocarpa (Western balsam poplar) 
      22      41202  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      23      40894  Arabidopsis thaliana (Mouse-ear cress)
      24      39893  Oryza sativa subsp. indica (Rice)
      25      39850  Paramecium tetraurelia
      26      39300  Setaria italica (Foxtail millet) (Panicum italicum)
      27      38798  Mustela putorius furo (European domestic ferret) (Mustela furo)
      28      38163  human gut metagenome
      29      36753  Drosophila melanogaster (Fruit fly)
      30      36598  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      31      36444  Simian immunodeficiency virus (SIV)
      32      35922  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      33      35652  Ailuropoda melanoleuca (Giant panda)
      34      35599  Emiliania huxleyi CCMP1516
      35      35208  Acyrthosiphon pisum (Pea aphid)
      36      35066  Caenorhabditis japonica
      37      34832  Physcomitrella patens subsp. patens (Moss)
      38      34570  Thalassiosira oceanica (Marine diatom)
      39      34474  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      40      33850  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      41      33663  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      42      33256  Selaginella moellendorffii (Spikemoss)
      43      32767  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      44      32342  Oryza brachyantha
      45      32313  Sus scrofa (Pig)
      46      32141  Oryza glaberrima (African rice)
      47      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      48      31849  Pan troglodytes (Chimpanzee)
      49      31802  Anas platyrhynchos (Domestic duck) (Anas boschas)
      50      31389  Ricinus communis (Castor bean)
      51      31207  Capitella teleta
      52      30954  Daphnia pulex (Water flea)
      53      30712  Caenorhabditis brenneri (Nematode worm)
      54      30146  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      55      29815  Amphimedon queenslandica (Sponge)
      56      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      57      29318  Pristionchus pacificus (Parasitic nematode)
      58      29230  Escherichia coli
      59      29183  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      60      29054  Oikopleura dioica (Tunicate)
      61      28830  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      62      28825  Capsella rubella
      63      28626  Prunus persica (Peach) (Amygdalus persica)
      64      28518  Canis familiaris (Dog) (Canis lupus familiaris)
      65      28099  Gasterosteus aculeatus (Three-spined stickleback)
      66      27767  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      67      27516  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      68      27468  Equus caballus (Horse)
      69      27089  Gorilla gorilla gorilla (Lowland gorilla)
      70      26834  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      71      25974  Oryzias latipes (Medaka fish) (Japanese ricefish)
      72      25797  Loxodonta africana (African elephant)
      73      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      74      25689  Rattus norvegicus (Rat)
      75      25679  Bos taurus (Bovine)
      76      24904  Nematostella vectensis (Starlet sea anemone)
      77      24643  Tetrahymena thermophila (strain SB210)
      78      24590  Guillardia theta CCMP2712
      79      24209  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      80      23716  Ornithorhynchus anatinus (Duckbill platypus)
      81      23565  Oxytricha trifallax
      82      23502  Latimeria chalumnae (West Indian ocean coelacanth)
      83      23361  Helobdella robusta (Californian leech)
      84      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      85      22751  Monodelphis domestica (Gray short-tailed opossum)
      86      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      87      22549  Caenorhabditis elegans
      88      22313  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      89      22163  gut metagenome
      90      21893  Oryctolagus cuniculus (Rabbit)
      91      21547  Heterocephalus glaber (Naked mole rat)
      92      21415  Gallus gallus (Chicken)
      93      21346  Caenorhabditis briggsae
      94      21128  Ixodes scapularis (Black-legged tick) (Deer tick)
      95      20996  Felis catus (Cat) (Felis silvestris catus)
      96      20867  Myotis lucifugus (Little brown bat)
      97      20838  Tupaia chinensis (Chinese tree shrew)
      98      20768  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
      99      20513  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     100      20133  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     101      20114  Ciona savignyi (Pacific transparent sea squirt)
     102      20073  Cavia porcellus (Guinea pig)
     103      20028  Camelus ferus (Wild Bactrian camel)
     104      19992  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     105      19818  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     106      19686  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     107      19553  Anolis carolinensis (Green anole) (American chameleon)
     108      19546  Pteropus alecto (Black flying fox)
     109      19520  Wuchereria bancrofti
     110      19300  Myotis brandtii (Brandt's bat)
     111      19201  Trypanosoma cruzi (strain CL Brener)
     112      19058  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     113      18957  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     114      18857  Drosophila simulans (Fruit fly)
     115      18597  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     116      18555  Bos grunniens mutus
     117      18477  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     118      18243  Tetranychus urticae (Two-spotted spider mite)
     119      18113  Atta cephalotes (Leafcutter ant)
     120      18047  Saprolegnia diclina VS20
     121      18026  Anopheles gambiae (African malaria mosquito)
     122      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     123      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     124      17696  Bombyx mori (Silk moth)
     125      17683  Genlisea aurea
     126      17607  Hepatitis C virus subtype 1b
     127      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     128      17420  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     129      17284  Nasonia vitripennis (Parasitic wasp)
     130      17191  Plasmodium falciparum
     131      17053  Tribolium castaneum (Red flour beetle)
     132      17040  Drosophila yakuba (Fruit fly)
     133      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     134      16919  Meleagris gallopavo (Common turkey)
     135      16714  Drosophila persimilis (Fruit fly)
     136      16698  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     137      16639  Fusarium oxysporum f. sp. lycopersici  
     138      16608  Rhodnius prolixus (Triatomid bug)
     139      16426  Ectocarpus siliculosus (Brown alga)
     140      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     141      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     142      16329  Danaus plexippus (Monarch butterfly)
     143      16275  Trichinella spiralis (Trichina worm)
     144      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     145      16189  Drosophila sechellia (Fruit fly)
     146      16158  Schistosoma japonicum (Blood fluke)
     147      16148  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     148      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     149      16076  Listeria monocytogenes
     150      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     151      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     152      15716  Naegleria gruberi (Amoeba)
     153      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     154      15568  Phytophthora ramorum (Sudden oak death agent)
     155      15465  Myotis davidii (David's myotis)
     156      15422  Drosophila willistoni (Fruit fly)
     157      15371  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     158      15354  Loa loa (Eye worm) (Filaria loa)
     159      15345  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     160      15314  uncultured archaeon
     161      15228  Pythium ultimum
     162      15203  Klebsiella pneumoniae
     163      15144  Drosophila ananassae (Fruit fly)
     164      15057  Pararge aegeria (Speckled wood butterfly)
     165      15042  Harpegnathos saltator (Jerdon's jumping ant)
     166      15011  Strigamia maritima (European centipede) (Geophilus maritimus)
     167      14942  Acanthamoeba castellanii str. Neff
     168      14927  Drosophila erecta (Fruit fly)
     169      14911  Dendroctonus ponderosae (Mountain pine beetle)
     170      14861  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     171      14801  Camponotus floridanus (Florida carpenter ant)
     172      14794  Drosophila mojavensis (Fruit fly)
     173      14792  Gibberella fujikuroi (strain CBS 195.34 / IMI 58289 / NRRL A-6831)  
     174      14713  Plasmodium chabaudi
     175      14707  Drosophila virilis (Fruit fly)
     176      14666  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     177      14646  Rabies virus
     178      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     179      14562  Angomonas deanei
     180      14417  Volvox carteri (Green alga)
     181      14346  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     182      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     183      14235  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     184      14147  Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
     185      13970  Acromyrmex echinatior (Panamanian leafcutter ant) 
     186      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     187      13878  Clonorchis sinensis (Chinese liver fluke)
     188      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     189      13806  Fomitopsis pinicola (strain FP-58527) (Brown rot fungus)
     190      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     191      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     192      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     193      13626  Trypanosoma cruzi
     194      13410  Hepatitis C virus subtype 1a
     195      13345  Aspergillus flavus 
     196      13329  Colletotrichum orbiculare   
     197      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     198      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     199      13114  Petromyzon marinus (Sea lamprey)
     200      13082  Glarea lozoyensis (strain ATCC 20868 / MF5171)
     201      13062  Mycosphaerella fijiensis (strain CIRAD86) (Black leaf streak disease fungus) 
     202      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     203      12983  Albugo laibachii Nc14
     204      12967  Porcine reproductive and respiratory syndrome virus (PRRSV)
     205      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     206      12950  Stigmatella aurantiaca (strain DW4/3-1)
     207      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     208      12856  Cochliobolus heterostrophus (strain C5 / ATCC 48332 / race O)  
     209      12846  Magnaporthe oryzae (strain Y34) (Rice blast fungus) (Pyricularia oryzae)
     210      12746  Schistosoma mansoni (Blood fluke)
     211      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     212      12711  Magnaporthe oryzae (strain P131) (Rice blast fungus) (Pyricularia oryzae)
     213      12703  Cochliobolus heterostrophus (strain C4 / ATCC 48331 / race T)  
     214      12697  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     215      12696  Trypanosoma congolense (strain IL3000)
     216      12641  Helicobacter pylori (Campylobacter pylori)
     217      12628  Xenopus laevis (African clawed frog)
     218      12586  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     219      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     220      12440  Polysphondylium pallidum (Cellular slime mold)
     221      12414  Mycosphaerella pini (strain NZE10 / CBS 128990) (Red band needle blight fungus) 
     222      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     223      12352  Dictyostelium purpureum (Slime mold)
     224      12300  Enterococcus gallinarum EGD-AAK12
     225      12197  Thanatephorus cucumeris (strain AG1-IB / isolate 7/3/14)  
     226      12174  Cochliobolus sativus (strain ND90Pr / ATCC 201652)  
     227      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     228      12143  Mucor circinelloides f. circinelloides (strain 1006PhL) (Mucormycosis agent) 
     229      12078  Ceriporiopsis subvermispora (strain B) (White-rot fungus)
     230      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     231      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     232      11990  Apis mellifera (Honeybee)
     233      11939  Emericella nidulans  
     234      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     235      11780  Piriformospora indica (strain DSM 11827)
     236      11752  Chondrocladia sp. SMF<DEU
     237      11751  Cladorhiza sp. SMF<DEU
     238      11750  Abyssocladia sp. SMF<DEU
     239      11735  Gloeophyllum trabeum (strain ATCC 11539 / FP-39264 / Madison 617) 
     240      11726  Phelloderma sp. SMF<DEU
     241      11719  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     242      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     243      11687  Setosphaeria turcica (strain 28A) (Northern leaf blight fungus) 
     244      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     245      11682  Eutypa lata (strain UCR-EL1) (Grapevine dieback disease fungus) 
     246      11679  Anopheles darlingi (Mosquito)
     247      11639  Plasmodium berghei (strain Anka)
     248      11603  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     249      11567  Trichoplax adhaerens (Trichoplax reptans)
     250      11557  Trypanosoma vivax (strain Y486)


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          745991 (  2%)
    Bacteria       36728167 ( 76%)
    Eukaryota       8698331 ( 18%)
    Viruses         1844284 (  4%)
    Other            163650 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 114441 (  1%)           (  0%)
     Other Mammalia       1026596 ( 12%)           (  2%)
     Other Vertebrata      910096 ( 10%)           (  2%)
     Viridiplantae        1721356 ( 20%)           (  4%)
     Fungi                2116794 ( 24%)           (  4%)
     Insecta               921030 ( 11%)           (  2%)
     Nematoda              262582 (  3%)           (  1%)
     Other                1625436 ( 19%)           (  3%)
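
Each category above carries two percentages: its share of Eukaryota and its share of the complete database. The sketch below recomputes both, assuming the values are rounded to whole percent; `shares` is an illustrative helper name.

```python
TOTAL_ENTRIES = 48180424  # entries in release 2013_11

# Sequence counts per category, copied from the "Within Eukaryota" table.
EUKARYOTA_COUNTS = {
    "Human": 114441,
    "Other Mammalia": 1026596,
    "Other Vertebrata": 910096,
    "Viridiplantae": 1721356,
    "Fungi": 2116794,
    "Insecta": 921030,
    "Nematoda": 262582,
    "Other": 1625436,
}

eukaryota_total = sum(EUKARYOTA_COUNTS.values())


def shares(count: int) -> tuple[int, int]:
    """Return (% of Eukaryota, % of the complete database), rounded to whole percent."""
    return (
        round(100.0 * count / eukaryota_total),
        round(100.0 * count / TOTAL_ENTRIES),
    )


share_table = {name: shares(count) for name, count in EUKARYOTA_COUNTS.items()}

for name, (of_eukaryota, of_database) in share_table.items():
    print(f"{name:<18}{EUKARYOTA_COUNTS[name]:>9} ({of_eukaryota:>3}%) ({of_database:>3}%)")
```

The category counts add up to the Eukaryota total given in the kingdom table, so the first percentage column sums to roughly 100% up to rounding.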



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1296663             1001-1100   257376
                 51- 100 4329902             1101-1200   179273
                101- 150 4827686             1201-1300   129170
                151- 200 4691513             1301-1400    76175
                201- 250 4749124             1401-1500    63692
                251- 300 4602861             1501-1600    43209
                301- 350 4155199             1601-1700    31284
                351- 400 3098057             1701-1800    23610
                401- 450 2701060             1801-1900    18780
                451- 500 2203596             1901-2000    16006
                501- 550 1394995             2001-2100    13042
                551- 600 1075147             2101-2200    13192
                601- 650  789943             2201-2300     9931
                651- 700  621597             2301-2400     8142
                701- 750  513399             2401-2500     7034
                751- 800  441958             >2500        54872
                801- 850  346509
                851- 900  309535
                901- 950  208873
                951-1000  145325
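
Because the table excludes fragments, its bins should add up to the total number of entries minus the number of fragments reported earlier. The following sketch performs that consistency check and also derives the share of sequences of at most 500 amino acids; the variable names are illustrative.

```python
TOTAL_ENTRIES = 48180424  # entries in release 2013_11
FRAGMENTS = 4732694       # "Number of fragments" from the introduction

# Bin counts copied from the size table, in order of increasing length.
BIN_COUNTS = [
    1296663, 4329902, 4827686, 4691513, 4749124,  # 1-250, bins of 50
    4602861, 4155199, 3098057, 2701060, 2203596,  # 251-500
    1394995, 1075147, 789943, 621597, 513399,     # 501-750
    441958, 346509, 309535, 208873, 145325,       # 751-1000
    257376, 179273, 129170, 76175, 63692,         # 1001-1500, bins of 100
    43209, 31284, 23610, 18780, 16006,            # 1501-2000
    13042, 13192, 9931, 8142, 7034,               # 2001-2500
    54872,                                        # >2500
]

non_fragment_entries = TOTAL_ENTRIES - FRAGMENTS
binned_entries = sum(BIN_COUNTS)

# The first ten bins cover lengths 1-500.
share_up_to_500 = 100.0 * sum(BIN_COUNTS[:10]) / binned_entries

print(f"Entries in table:      {binned_entries}")
print(f"Total minus fragments: {non_fragment_entries}")
print(f"Share with <= 500 aa:  {share_up_to_500:.1f}%")
```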



   The average sequence length in UniProtKB/TrEMBL is 317 amino acids.

   The shortest sequence is C4PYW0_SCHMA: 2 amino acids.
   The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
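
The average length follows from the totals in the introduction. This is a minimal sketch, assuming the average is taken over all entries (fragments included) and rounded to a whole number of amino acids.

```python
TOTAL_ENTRIES = 48180424         # entries in release 2013_11
TOTAL_AMINO_ACIDS = 15282737498  # amino acids in release 2013_11

average_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

print(f"Average sequence length: {average_length:.0f} amino acids")
```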



4.  STATISTICS FOR SOME LINE TYPES

The following tables summarize the total number of some UniProtKB/TrEMBL lines,
the number of entries with at least one such line, and the average number of
such lines per entry (computed over all entries in the release).

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    56835386                1.18                                                    
   Submitted to EMBL/GenBank/DDBJ  34606884  32634836      0.72                                                    
   Journal                         20309329  19273096      0.42                                                    
   Submitted to other databases     1902016   1890484      0.04                                                    
   Thesis                             10350     10292     <0.01                                                    
   Book citation                       6806      6756     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 485105
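
The "Average per entry" column appears to be the total number of lines divided by the number of all entries in the release, not by the number of entries that carry such a line. The sketch below reproduces the column for the reference rows under that assumption; `format_average` is an illustrative helper, and the "<0.01" rule is inferred from the tables rather than documented.

```python
TOTAL_ENTRIES = 48180424  # entries in release 2013_11

# Total line counts copied from the references table above.
REFERENCE_TOTALS = {
    "References (RL)": 56835386,
    "Submitted to EMBL/GenBank/DDBJ": 34606884,
    "Journal": 20309329,
    "Submitted to other databases": 1902016,
    "Thesis": 10350,
    "Patent": 1,
}


def format_average(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format lines per entry; nonzero values that round to 0.00 are shown as '<0.01'."""
    text = f"{total_lines / total_entries:.2f}"
    if text == "0.00" and total_lines > 0:
        return "<0.01"
    return text


averages = {name: format_average(total) for name, total in REFERENCE_TOTALS.items()}

for name, average in averages.items():
    print(f"{name:<34}{REFERENCE_TOTALS[name]:>10}{average:>8}")
```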


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      69122222                1.43                                                    
   CATALYTIC ACTIVITY               5073653   4652623      0.11     4                                              
   CAUTION                         29224604  29202856      0.61     1                                              
   COFACTOR                         2130718   1979661      0.04     8                                              
   DOMAIN                            231180    221737     <0.01     9                                              
   ENZYME REGULATION                  63347     63347     <0.01    11                                              
   FUNCTION                         5810445   5511263      0.12     3                                              
   INTERACTION                         1617      1617     <0.01    12                                              
   MISCELLANEOUS                     139311    139113     <0.01    10                                              
   PATHWAY                          2606888   2360780      0.05     7                                              
   SIMILARITY                      15786465  12255435      0.33     2                                              
   SUBCELLULAR LOCATION             5001558   4810149      0.10     5                                              
   SUBUNIT                          3052436   3027302      0.06     6                                              

Total number of comment topics: 12


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      27503339                0.57                                                    
   ACT_SITE                         2036909   1251753      0.04     5                                              
   BINDING                          4376243   1137811      0.09     2                                              
   CARBOHYD                             352       137     <0.01    28                                              
   CHAIN                             873408    711146      0.02     8                                              
   COILED                             72715     39587     <0.01    18                                              
   COMPBIAS                           12640     12640     <0.01    22                                              
   CROSSLNK                           10974      7404     <0.01    23                                              
   DISULFID                          100418     78462     <0.01    15                                              
   DNA_BIND                           58400     53636     <0.01    19                                              
   DOMAIN                            764806    591854      0.02     9                                              
   INIT_MET                           14201     14201     <0.01    21                                              
   INTRAMEM                             385        55     <0.01    27                                              
   LIPID                              73432     36716     <0.01    17                                              
   METAL                            4111186   1070686      0.09     3                                              
   MOD_RES                           330379    299590      0.01    12                                              
   MOTIF                             227220    137451     <0.01    14                                              
   NON_STD                             1854      1702     <0.01    25                                              
   NON_TER                          7307217   4734585      0.15     1                                              
   NP_BIND                          1532604    913402      0.03     6                                              
   PEPTIDE                               96        96     <0.01    29                                              
   PROPEP                              5148      5148     <0.01    24                                              
   REGION                           1324990    737232      0.03     7                                              
   REPEAT                             57089     13747     <0.01    20                                              
   SIGNAL                            715144    711758      0.01    10                                              
   SITE                              456088    266052      0.01    11                                              
   TOPO_DOM                          325941     67264      0.01    13                                              
   TRANSIT                             1286      1286     <0.01    26                                              
   TRANSMEM                         2633713    465995      0.05     4                                              
   ZN_FING                            78501     70859     <0.01    16                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             518543452               10.76                                                    
   Allergome                           3747      3110     <0.01    84   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   103   Organism-specific databases                
   ArrayExpress                      208455    208455     <0.01    41   Gene expression databases                  
   BRENDA                              2626      2598     <0.01    87   Enzyme and pathway databases               
   Bgee                               99270     99270     <0.01    51   Gene expression databases                  
   BindingDB                           5781      5781     <0.01    78   Other                                      
   BioCyc                           5639472   5572169      0.12    19   Enzyme and pathway databases               
   CAZy                               73983     69512     <0.01    55   Protein family/group databases             
   CGD                                 7000      7000     <0.01    77   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   108   2D gel databases                           
   CTD                               361038    359718      0.01    38   Organism-specific databases                
   ChEMBL                               657       657     <0.01    94   Other                                      
   ChiTaRS                            65436     65436     <0.01    56   Other                                      
   ConoServer                           160       160     <0.01   100   Organism-specific databases                
   DIP                                 2951      2946     <0.01    86   Protein-protein interaction databases      
   DNASU                              42271     41937     <0.01    62   Protocols and materials databases          
   EMBL                            51632999  47060171      1.07     3   Sequence databases                         
   Ensembl                          1042968   1028345      0.02    30   Genome annotation databases                
   EnsemblBacteria                 29687705  29261806      0.62     5   Genome annotation databases                
   EnsemblFungi                      379077    376903      0.01    37   Genome annotation databases                
   EnsemblMetazoa                    802615    786415      0.02    34   Genome annotation databases                
   EnsemblPlants                     677983    645353      0.01    35   Genome annotation databases                
   EnsemblProtists                   195423    192855     <0.01    45   Genome annotation databases                
   EuPathDB                          142862    142860     <0.01    50   Organism-specific databases                
   EvolutionaryTrace                   8007      8007     <0.01    75   Other                                      
   FlyBase                           199062    197591     <0.01    44   Organism-specific databases                
   GO                              87219701  27551334      1.81     2   Ontologies                                 
   Gene3D                          20253617  15968411      0.42     8   Family and domain databases                
   GeneID                          10345692  10063018      0.21    12   Genome annotation databases                
   GeneTree                          886103    886045      0.02    32   Phylogenomic databases                     
   Genevestigator                     85944     85938     <0.01    52   Gene expression databases                  
   GenoList                           14730     14457     <0.01    73   Organism-specific databases                
   GenomeRNAi                         19304     19304     <0.01    70   Other                                      
   Gramene                           203961    203961     <0.01    43   Organism-specific databases                
   H-InvDB                              609       462     <0.01    95   Organism-specific databases                
   HAMAP                            4666523   4606918      0.10    21   Family and domain databases                
   HGNC                               47512     47427     <0.01    59   Organism-specific databases                
   HOGENOM                          3653459   3653414      0.08    24   Phylogenomic databases                     
   HOVERGEN                          304967    304956      0.01    39   Phylogenomic databases                     
   IPI                               278460    277562      0.01    40   Sequence databases                         
   InParanoid                        186173    186173     <0.01    46   Phylogenomic databases                     
   IntAct                             16701     16701     <0.01    71   Protein-protein interaction databases      
   InterPro                        99089041  34749690      2.06     1   Family and domain databases                
   KEGG                             9305065   9081874      0.19    14   Genome annotation databases                
   KO                               3800294   3782350      0.08    23   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    80   Organism-specific databases                
   Leproma                             1272      1270     <0.01    89   Organism-specific databases                
   MEROPS                            179589    179589     <0.01    47   Protein family/group databases             
   MGI                                52041     51616     <0.01    58   Organism-specific databases                
   MIM                                    4         4     <0.01   109   Organism-specific databases                
   MINT                               10232     10231     <0.01    74   Protein-protein interaction databases      
   NextBio                           207750    207743     <0.01    42   Other                                      
   OGP                                    3         3     <0.01   110   2D gel databases                           
   OMA                              6332760   6332758      0.13    17   Phylogenomic databases                     
   OrthoDB                          5210364   5210364      0.11    20   Phylogenomic databases                     
   PANTHER                          6156329   5868503      0.13    18   Family and domain databases                
   PATRIC                           8267963   8267836      0.17    15   Genome annotation databases                
   PDB                                20166     11179     <0.01    69   3D structure databases                     
   PDBsum                             21494     11698     <0.01    67   3D structure databases                     
   PIR                               172171    139308     <0.01    48   Sequence databases                         
   PIRSF                            4295626   4262000      0.09    22   Family and domain databases                
   PMAP-CutDB                           207       207     <0.01    99   Other                                      
   PRIDE                             961513    961513      0.02    31   Proteomic databases                        
   PRINTS                           6486332   5843814      0.13    16   Family and domain databases                
   PRO                                27294     27294     <0.01    65   Other                                      
   PROSITE                         21851280  14621715      0.45     6   Family and domain databases                
   PaxDb                              28974     28972     <0.01    64   Proteomic databases                        
   PeptideAtlas                         128       128     <0.01   101   Proteomic databases                        
   PeroxiBase                          2595      2587     <0.01    88   Protein family/group databases             
   Pfam                            44480004  32495094      0.92     4   Family and domain databases                
   PharmGKB                            3551      3551     <0.01    85   Organism-specific databases                
   PhosSite                             784       772     <0.01    92   PTM databases                              
   PhosphoSite                         1109      1109     <0.01    90   PTM databases                              
   PhylomeDB                         146294    146294     <0.01    49   Phylogenomic databases                     
   PomBase                               40        27     <0.01   104   Organism-specific databases                
   PptaseDB                              36        35     <0.01   105   Protein family/group databases             
   ProDom                            885593    854657      0.02    33   Family and domain databases                
   ProMEX                              5650      5650     <0.01    79   Proteomic databases                        
   ProtClustDB                      2717744   2717743      0.06    28   Phylogenomic databases                     
   ProteinModelPortal              13020932  13020932      0.27     9   3D structure databases                     
   PseudoCAP                           4529      4523     <0.01    82   Organism-specific databases                
   REBASE                             41441     41415     <0.01    63   Protein family/group databases             
   REPRODUCTION-2DPAGE                   66        65     <0.01   102   2D gel databases                           
   RGD                                21066     20241     <0.01    68   Organism-specific databases                
   Reactome                             242       186     <0.01    98   Enzyme and pathway databases               
   RefSeq                          10569634  10214663      0.22    11   Sequence databases                         
   SABIO-RK                             505       505     <0.01    96   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   107   Organism-specific databases                
   SMART                            9672036   7355585      0.20    13   Family and domain databases                
   SMR                              3517074   3517074      0.07    25   3D structure databases                     
   STRING                           2903501   2903432      0.06    26   Protein-protein interaction databases      
   SUPFAM                          21365875  17182096      0.44     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   106   2D gel databases                           
   SignaLink                           4393      4391     <0.01    83   Enzyme and pathway databases               
   TAIR                               14984     14911     <0.01    72   Organism-specific databases                
   TCDB                                4966      4959     <0.01    81   Protein family/group databases             
   TIGRFAMs                        11004414  10035208      0.23    10   Family and domain databases                
   TubercuList                         1094      1093     <0.01    91   Organism-specific databases                
   UCSC                               59163     59062     <0.01    57   Genome annotation databases                
   UniGene                           565286    535071      0.01    36   Sequence databases                         
   UniPathway                       2536562   2358342      0.05    29   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    53   Genome annotation databases                
   World-2DPAGE                         672       667     <0.01    93   2D gel databases                           
   WormBase                           42440     42267     <0.01    61   Organism-specific databases                
   Xenbase                            25503     25448     <0.01    66   Organism-specific databases                
   ZFIN                               45651     45184     <0.01    60   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    76   Organism-specific databases                
   eggNOG                           2767945   2767925      0.06    27   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    54   Organism-specific databases                
   mycoCLAP                             423       422     <0.01    97   Protein family/group databases             

Number of explicitly cross-referenced databases: 128
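
The fourth column of the table above is consistent with the total number of
cross-references divided by the 48180424 entries in this release. The sketch
below recomputes that ratio for a few rows copied from the table. The
"<0.01" cut-off (any value that would round to 0.00) and the helper name
per_entry are assumptions made for illustration, not documented UniProt
behaviour.

```python
# Total TrEMBL entries in release 2013_11 (taken from the release header).
TOTAL_ENTRIES = 48_180_424

# Database name -> total cross-references, copied from rows of the table.
ROWS = {
    "InterPro": 99_089_041,
    "GO": 87_219_701,
    "Pfam": 44_480_004,
    "EnsemblBacteria": 29_687_705,
    "EnsemblProtists": 195_423,
}


def per_entry(total_xrefs, total_entries=TOTAL_ENTRIES):
    """Return cross-references per entry, formatted like the table column.

    Assumption: ratios that would round to 0.00 are printed as "<0.01".
    """
    ratio = total_xrefs / total_entries
    return "<0.01" if ratio < 0.005 else f"{ratio:.2f}"


for name, xrefs in ROWS.items():
    print(f"{name:<16} {per_entry(xrefs):>6}")
```

For the rows sampled here, the recomputed values match the printed column
(for example 2.06 for InterPro and <0.01 for EnsemblProtists).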


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.65   Gln (Q) 3.98   Leu (L) 9.97   Ser (S) 6.52
   Arg (R) 5.33   Glu (E) 6.22   Lys (K) 5.35   Thr (T) 5.54
   Asn (N) 4.14   Gly (G) 7.08   Met (M) 2.50   Trp (W) 1.28
   Asp (D) 5.34   His (H) 2.18   Phe (F) 4.05   Tyr (Y) 3.08
   Cys (C) 1.19   Ile (I) 6.14   Pro (P) 4.53   Val (V) 6.81

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.02


   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Lys, Asp, Arg, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
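
   The ordering in 5.2 can be reproduced from the percentages in 5.1. The
   sketch below is only a consistency check: it sorts the twenty standard
   residues by their listed percentage and sums them. The variable names are
   illustrative and not part of any UniProt tool.

```python
# Percent composition of the 20 standard amino acids, copied from 5.1.
COMPOSITION = {
    "Ala": 8.65, "Gln": 3.98, "Leu": 9.97, "Ser": 6.52,
    "Arg": 5.33, "Glu": 6.22, "Lys": 5.35, "Thr": 5.54,
    "Asn": 4.14, "Gly": 7.08, "Met": 2.50, "Trp": 1.28,
    "Asp": 5.34, "His": 2.18, "Phe": 4.05, "Tyr": 3.08,
    "Cys": 1.19, "Ile": 6.14, "Pro": 4.53, "Val": 6.81,
}

# Rank residues from most to least frequent.
ranked = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranked))

# Sum of the rounded percentages for the 20 standard residues.
total = sum(COMPOSITION.values())
print(f"Sum of listed percentages: {total:.2f}")
```

   The sorted list matches the order given in 5.2. The twenty listed values
   sum to 99.88; with Xaa (0.02) the total is 99.90, and the remainder is
   presumably due to rounding of the individual percentages.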



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 678768
Total number of entries encoded on a Plasmid: 365690
Total number of entries encoded on a Plastid: 28022
Total number of entries encoded on a Plastid; Apicoplast: 812
Total number of entries encoded on a Plastid; Chloroplast: 249669
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1222