         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2014_01 STATISTICS


1.  INTRODUCTION

UniProtKB/TrEMBL release 2014_01 of 22-Jan-2014 contains 51616950 sequence entries,
comprising 16381125439 amino acids.

2960550 sequences have been added since release 2013_12, the sequence data of
3596 existing entries has been updated, and the annotations of
10841278 entries have been revised. The newly added sequences correspond to an
increase of 6%.
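
As a rough check of the 6% figure, the number of added sequences can be compared with the approximate size of the previous release. The sketch below assumes that no entries were removed between releases (an assumption, not something this document states), so the size of release 2013_12 is estimated as the current total minus the additions.

```python
# Figures quoted in the introduction.
TOTAL_ENTRIES = 51_616_950   # entries in release 2014_01
ADDED_ENTRIES = 2_960_550    # sequences added since release 2013_12

# Assumption: no entries were deleted, so the previous release size is
# approximated by subtracting the additions from the current total.
previous_entries = TOTAL_ENTRIES - ADDED_ENTRIES

growth_percent = 100 * ADDED_ENTRIES / previous_entries
print(f"Approximate growth: {growth_percent:.1f}%")
```

Under that assumption the result rounds to the quoted 6%; the exact release-to-release growth may differ slightly if entries were also deleted.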

Number of fragments: 4899984

Protein existence (PE):              entries      %
1: Evidence at protein level           21809     0.04%
2: Evidence at transcript level       910375     1.76%
3: Inferred from homology           12431487    24.08%
4: Predicted                        38253279    74.11%
5: Uncertain                               0     0.00%
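
The percentages in the protein existence table appear to be each category's share of the total entry count. A minimal sketch that recomputes them from the figures above (the variable and function names are illustrative only):

```python
TOTAL_ENTRIES = 51_616_950  # entries in release 2014_01

# Entry counts per protein existence (PE) level, copied from the table.
pe_counts = {
    "1: Evidence at protein level": 21_809,
    "2: Evidence at transcript level": 910_375,
    "3: Inferred from homology": 12_431_487,
    "4: Predicted": 38_253_279,
    "5: Uncertain": 0,
}

def pe_percentages(counts, total):
    """Return each category's share of `total`, in percent."""
    return {level: 100 * n / total for level, n in counts.items()}

shares = pe_percentages(pe_counts, TOTAL_ENTRIES)
for level, pct in shares.items():
    print(f"{level:<34}{pe_counts[level]:>10}{pct:>9.2f}%")
```

The five counts add up to the total number of entries, so every entry carries exactly one PE level.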

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 461285

   The first twenty species represent 1981745 sequences, or 3.8% of the
   total number of entries.
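
The figure for the first twenty species can be reproduced by summing the frequencies of ranks 1 to 20 in the table of section 2.2 and dividing by the total entry count from the introduction. A short sketch (names are illustrative):

```python
TOTAL_ENTRIES = 51_616_950  # entries in release 2014_01

# Frequencies of the twenty most represented species (section 2.2, ranks 1-20).
top20 = [
    567_661, 208_791, 115_348, 96_846, 91_032,
    76_893, 73_893, 73_054, 70_512, 69_209,
    60_381, 56_588, 56_235, 54_977, 54_905,
    54_146, 52_500, 50_603, 49_265, 48_906,
]

top20_total = sum(top20)
share = 100 * top20_total / TOTAL_ENTRIES
print(f"{top20_total} sequences, {share:.1f}% of all entries")
```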


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:189694
                            2x:75796
                            3x:40864
                            4x:28973
                            5x:17152
                            6x:12016
                            7x: 9120
                            8x: 7171
                            9x: 5647
                           10x:10749
                       11- 20x:33566
                       21- 50x:10918
                       51-100x: 4339
                         >100x:15280


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     567661  Human immunodeficiency virus 1
       2     208791  uncultured bacterium
       3     115348  Homo sapiens (Human)
       4      96846  Oryza sativa subsp. japonica (Rice)
       5      91032  Hepatitis C virus
       6      76893  Hepatitis B virus (HBV)
       7      73893  Glycine max (Soybean) (Glycine hispida)
       8      73054  mine drainage metagenome
       9      70512  Hordeum vulgare var. distichum (Two-rowed barley)
      10      69209  Macaca mulatta (Rhesus macaque)
      11      60381  Zea mays (Maize)
      12      56588  Mus musculus (Mouse)
      13      56235  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      14      54977  Callithrix jacchus (White-tufted-ear marmoset)
      15      54905  Solanum tuberosum (Potato)
      16      54146  Vitis vinifera (Grape)
      17      52500  Danio rerio (Zebrafish) (Brachydanio rerio)
      18      50603  Trichomonas vaginalis
      19      49265  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      20      48906  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      21      47018  Populus trichocarpa (Western balsam poplar) 
      22      41202  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      23      40742  Arabidopsis thaliana (Mouse-ear cress)
      24      39896  Oryza sativa subsp. indica (Rice)
      25      39850  Paramecium tetraurelia
      26      39363  Setaria italica (Foxtail millet) (Panicum italicum)
      27      38796  Mustela putorius furo (European domestic ferret) (Mustela furo)
      28      38163  human gut metagenome
      29      36757  Simian immunodeficiency virus (SIV)
      30      36733  Drosophila melanogaster (Fruit fly)
      31      36598  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      32      35928  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      33      35675  Ailuropoda melanoleuca (Giant panda)
      34      35599  Emiliania huxleyi CCMP1516
      35      35303  Physcomitrella patens subsp. patens (Moss)
      36      35208  Acyrthosiphon pisum (Pea aphid)
      37      35066  Caenorhabditis japonica
      38      34570  Thalassiosira oceanica (Marine diatom)
      39      34495  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      40      33853  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      41      33666  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      42      33256  Selaginella moellendorffii (Spikemoss)
      43      32772  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      44      32342  Oryza brachyantha
      45      32327  Sus scrofa (Pig)
      46      32142  Oryza glaberrima (African rice)
      47      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      48      31849  Pan troglodytes (Chimpanzee)
      49      31818  Anas platyrhynchos (Domestic duck) (Anas boschas)
      50      31390  Ricinus communis (Castor bean)
      51      31290  Citrus clementina
      52      31207  Capitella teleta (Polychaete worm)
      53      30954  Daphnia pulex (Water flea)
      54      30712  Caenorhabditis brenneri (Nematode worm)
      55      30147  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      56      29823  Rhizophagus irregularis DAOM 181602
      57      29815  Amphimedon queenslandica (Sponge)
      58      29528  Escherichia coli
      59      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      60      29318  Pristionchus pacificus (Parasitic nematode)
      61      29183  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      62      29054  Oikopleura dioica (Tunicate)
      63      28825  Capsella rubella
      64      28823  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      65      28631  Prunus persica (Peach) (Amygdalus persica)
      66      28380  Thellungiella salsuginea (Saltwater cress) (Arabidopsis glauca)
      67      28101  Gasterosteus aculeatus (Three-spined stickleback)
      68      27784  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      69      27555  Canis familiaris (Dog) (Canis lupus familiaris)
      70      27517  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      71      27472  Equus caballus (Horse)
      72      27089  Gorilla gorilla gorilla (Lowland gorilla)
      73      26840  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      74      25979  Oryzias latipes (Medaka fish) (Japanese ricefish)
      75      25797  Loxodonta africana (African elephant)
      76      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      77      25685  Bos taurus (Bovine)
      78      25675  Rattus norvegicus (Rat)
      79      24914  Nematostella vectensis (Starlet sea anemone)
      80      24643  Tetrahymena thermophila (strain SB210)
      81      24590  Guillardia theta CCMP2712
      82      24210  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      83      23717  Ornithorhynchus anatinus (Duckbill platypus)
      84      23687  Lottia gigantea (Giant owl limpet)
      85      23650  Dendroctonus ponderosae (Mountain pine beetle)
      86      23565  Oxytricha trifallax
      87      23495  Latimeria chalumnae (West Indian ocean coelacanth)
      88      23378  Helobdella robusta (Californian leech)
      89      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      90      22986  Caenorhabditis elegans
      91      22751  Monodelphis domestica (Gray short-tailed opossum)
      92      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      93      22311  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      94      22163  gut metagenome
      95      21895  Oryctolagus cuniculus (Rabbit)
      96      21547  Heterocephalus glaber (Naked mole rat)
      97      21440  Gallus gallus (Chicken)
      98      21346  Caenorhabditis briggsae
      99      21129  Ixodes scapularis (Black-legged tick) (Deer tick)
     100      21009  Felis catus (Cat) (Felis silvestris catus)
     101      20867  Myotis lucifugus (Little brown bat)
     102      20850  Tupaia chinensis (Chinese tree shrew)
     103      20772  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     104      20514  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     105      20133  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     106      20115  Ciona savignyi (Pacific transparent sea squirt)
     107      20082  Cavia porcellus (Guinea pig)
     108      20059  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     109      20028  Camelus ferus (Wild Bactrian camel)
     110      19818  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     111      19686  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     112      19556  Anolis carolinensis (Green anole) (American chameleon)
     113      19546  Pteropus alecto (Black flying fox)
     114      19520  Wuchereria bancrofti
     115      19300  Myotis brandtii (Brandt's bat)
     116      19201  Trypanosoma cruzi (strain CL Brener)
     117      19059  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     118      18957  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     119      18857  Drosophila simulans (Fruit fly)
     120      18600  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     121      18560  Haemonchus contortus (Barber pole worm)
     122      18557  Bos mutus
     123      18477  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     124      18243  Tetranychus urticae (Two-spotted spider mite)
     125      18113  Atta cephalotes (Leafcutter ant)
     126      18047  Saprolegnia diclina VS20
     127      18027  Anopheles gambiae (African malaria mosquito)
     128      17907  Moniliophthora roreri MCA 2997
     129      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     130      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     131      17703  Bombyx mori (Silk moth)
     132      17683  Genlisea aurea
     133      17664  Hepatitis C virus subtype 1b
     134      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     135      17437  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     136      17284  Nasonia vitripennis (Parasitic wasp)
     137      17204  Plasmodium falciparum
     138      17064  Tribolium castaneum (Red flour beetle)
     139      17040  Drosophila yakuba (Fruit fly)
     140      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     141      16919  Meleagris gallopavo (Common turkey)
     142      16714  Drosophila persimilis (Fruit fly)
     143      16698  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     144      16639  Fusarium oxysporum f. sp. lycopersici  
     145      16609  Rhodnius prolixus (Triatomid bug)
     146      16427  Ectocarpus siliculosus (Brown alga)
     147      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     148      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     149      16329  Danaus plexippus (Monarch butterfly)
     150      16275  Trichinella spiralis (Trichina worm)
     151      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     152      16214  Neovison vison (American mink) (Mustela vison)
     153      16196  Ixodes ricinus (Common tick)
     154      16189  Drosophila sechellia (Fruit fly)
     155      16189  Schistosoma japonicum (Blood fluke)
     156      16148  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     157      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     158      16057  Listeria monocytogenes
     159      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     160      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     161      15716  Naegleria gruberi (Amoeba)
     162      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     163      15592  Phytophthora ramorum (Sudden oak death agent)
     164      15469  uncultured archaeon
     165      15465  Myotis davidii (David's myotis)
     166      15422  Drosophila willistoni (Fruit fly)
     167      15371  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     168      15354  Loa loa (Eye worm) (Filaria loa)
     169      15345  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     170      15228  Pythium ultimum
     171      15144  Drosophila ananassae (Fruit fly)
     172      15057  Pararge aegeria (Speckled wood butterfly)
     173      15042  Harpegnathos saltator (Jerdon's jumping ant)
     174      15011  Strigamia maritima (European centipede) (Geophilus maritimus)
     175      14942  Acanthamoeba castellanii str. Neff
     176      14927  Drosophila erecta (Fruit fly)
     177      14857  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     178      14801  Camponotus floridanus (Florida carpenter ant)
     179      14794  Drosophila mojavensis (Fruit fly)
     180      14790  Gibberella fujikuroi (strain CBS 195.34 / IMI 58289 / NRRL A-6831)  
     181      14727  Rabies virus
     182      14713  Plasmodium chabaudi
     183      14708  Drosophila virilis (Fruit fly)
     184      14654  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     185      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     186      14597  Angomonas deanei
     187      14417  Volvox carteri (Green alga)
     188      14346  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     189      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     190      14235  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     191      14147  Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
     192      14085  Toxoplasma gondii
     193      13970  Acromyrmex echinatior (Panamanian leafcutter ant) 
     194      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     195      13878  Clonorchis sinensis (Chinese liver fluke)
     196      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     197      13806  Fomitopsis pinicola (strain FP-58527) (Brown rot fungus)
     198      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     199      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     200      13696  Trypanosoma cruzi
     201      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     202      13421  Hepatitis C virus subtype 1a
     203      13345  Aspergillus flavus 
     204      13329  Colletotrichum orbiculare   
     205      13306  Pyronema omphalodes CBS 100304
     206      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     207      13184  Porcine reproductive and respiratory syndrome virus (PRRSV)
     208      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     209      13115  Petromyzon marinus (Sea lamprey)
     210      13082  Glarea lozoyensis (strain ATCC 20868 / MF5171)
     211      13062  Mycosphaerella fijiensis (strain CIRAD86) (Black leaf streak disease fungus) 
     212      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     213      12983  Albugo laibachii Nc14
     214      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     215      12950  Stigmatella aurantiaca (strain DW4/3-1)
     216      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     217      12856  Cochliobolus heterostrophus (strain C5 / ATCC 48332 / race O)  
     218      12846  Magnaporthe oryzae (strain Y34) (Rice blast fungus) (Pyricularia oryzae)
     219      12746  Schistosoma mansoni (Blood fluke)
     220      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     221      12711  Magnaporthe oryzae (strain P131) (Rice blast fungus) (Pyricularia oryzae)
     222      12703  Cochliobolus heterostrophus (strain C4 / ATCC 48331 / race T)  
     223      12697  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     224      12696  Trypanosoma congolense (strain IL3000)
     225      12652  Helicobacter pylori (Campylobacter pylori)
     226      12624  Xenopus laevis (African clawed frog)
     227      12586  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     228      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     229      12440  Polysphondylium pallidum (Cellular slime mold)
     230      12414  Mycosphaerella pini (strain NZE10 / CBS 128990) (Red band needle blight fungus) 
     231      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     232      12352  Dictyostelium purpureum (Slime mold)
     233      12300  Enterococcus gallinarum EGD-AAK12
     234      12197  Thanatephorus cucumeris (strain AG1-IB / isolate 7/3/14)  
     235      12174  Cochliobolus sativus (strain ND90Pr / ATCC 201652)  
     236      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     237      12143  Mucor circinelloides f. circinelloides (strain 1006PhL) (Mucormycosis agent) 
     238      12078  Ceriporiopsis subvermispora (strain B) (White-rot fungus)
     239      11997  Apis mellifera (Honeybee)
     240      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     241      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     242      11939  Emericella nidulans  
     243      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     244      11780  Piriformospora indica (strain DSM 11827)
     245      11752  Chondrocladia sp. SMF<DEU
     246      11751  Cladorhiza sp. SMF<DEU
     247      11750  Abyssocladia sp. SMF<DEU
     248      11735  Gloeophyllum trabeum (strain ATCC 11539 / FP-39264 / Madison 617) 
     249      11726  Phelloderma sp. SMF<DEU
     250      11719  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          771455 (  1%)
    Bacteria       39689268 ( 77%)
    Eukaryota       9095829 ( 18%)
    Viruses         1895709 (  4%)
    Other            164688 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 115388 (  1%)           (  0%)
     Other Mammalia       1044422 ( 11%)           (  2%)
     Other Vertebrata      917280 ( 10%)           (  2%)
     Viridiplantae        1803516 ( 20%)           (  3%)
     Fungi                2211018 ( 24%)           (  4%)
     Insecta               948171 ( 10%)           (  2%)
     Nematoda              281267 (  3%)           (  1%)
     Other                1774767 ( 20%)           (  3%)
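
The two percentage columns of the Eukaryota table use different denominators: the first is relative to the Eukaryota total, the second to the whole database. A sketch that recomputes both, assuming the printed values are simply rounded to whole percent (the rounding rule is inferred from the table, not documented here):

```python
TOTAL_ENTRIES = 51_616_950  # all entries in release 2014_01
EUKARYOTA = 9_095_829       # Eukaryota row of the kingdom table

# Sequence counts per category, copied from the table above.
eukaryota_categories = {
    "Human": 115_388,
    "Other Mammalia": 1_044_422,
    "Other Vertebrata": 917_280,
    "Viridiplantae": 1_803_516,
    "Fungi": 2_211_018,
    "Insecta": 948_171,
    "Nematoda": 281_267,
    "Other": 1_774_767,
}

def shares(count):
    """Return (% of Eukaryota, % of the complete database) for one category."""
    return 100 * count / EUKARYOTA, 100 * count / TOTAL_ENTRIES

for name, count in eukaryota_categories.items():
    of_euk, of_db = shares(count)
    print(f"{name:<18}{count:>9} ({of_euk:3.0f}%) ({of_db:3.0f}%)")
```

The category counts add up exactly to the Eukaryota total, which is why the first percentage column sums to roughly 100%.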



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1369881             1001-1100   273950
                 51- 100 4633388             1101-1200   190584
                101- 150 5190795             1201-1300   138220
                151- 200 5047005             1301-1400    80818
                201- 250 5119977             1401-1500    67763
                251- 300 4967574             1501-1600    45544
                301- 350 4486893             1601-1700    33322
                351- 400 3333406             1701-1800    25019
                401- 450 2912843             1801-1900    19986
                451- 500 2377275             1901-2000    16870
                501- 550 1497022             2001-2100    13882
                551- 600 1154691             2101-2200    13929
                601- 650  846701             2201-2300    10487
                651- 700  667355             2301-2400     8647
                701- 750  551200             2401-2500     7536
                751- 800  473462             >2500        58555
                801- 850  371289
                851- 900  331123
                901- 950  224790
                951-1000  155184



   The average sequence length in UniProtKB/TrEMBL is 317 amino acids.

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
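
The average length is consistent with dividing the total number of amino acids by the total number of entries, both quoted in the introduction. The sketch below assumes the average is taken over all entries, fragments included; the document does not say so explicitly.

```python
# Totals quoted in the introduction.
TOTAL_AMINO_ACIDS = 16_381_125_439
TOTAL_ENTRIES = 51_616_950

# Assumption: the average is computed over every entry, including fragments.
average_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES
print(f"Average sequence length: {average_length:.0f} amino acids")
```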



4.  STATISTICS FOR SOME LINE TYPES

The following tables give, for selected UniProtKB/TrEMBL line types, the total
number of lines, the number of entries with at least one such line, and the
average number of such lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    60671577                1.18                                                    
   Submitted to EMBL/GenBank/DDBJ  37758260  35349479      0.73                                                    
   Journal                         20990472  19898039      0.41                                                    
   Submitted to other databases     1905391   1897149      0.04                                                    
   Thesis                             10551     10492     <0.01                                                    
   Book citation                       6902      6852     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 491946
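
The "Average per entry" column appears to be the total number of lines divided by the number of entries in the whole database, not by the number of entries that carry such a line. A minimal sketch under that reading; the "<0.01" rule for very small values is inferred from the tables and the function name is illustrative:

```python
TOTAL_ENTRIES = 51_616_950  # entries in release 2014_01

# Total number of reference lines per subtype, copied from the table above.
reference_lines = {
    "References (RL)": 60_671_577,
    "Submitted to EMBL/GenBank/DDBJ": 37_758_260,
    "Journal": 20_990_472,
    "Submitted to other databases": 1_905_391,
    "Thesis": 10_551,
    "Book citation": 6_902,
    "Patent": 1,
}

def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format lines-per-entry to two decimals, showing '<0.01' for tiny values."""
    formatted = f"{total_lines / total_entries:.2f}"
    # Inferred rule: values that would print as 0.00 are shown as "<0.01".
    return "<0.01" if formatted == "0.00" else formatted

for subtype, total in reference_lines.items():
    print(f"{subtype:<34}{total:>9}  {average_per_entry(total):>6}")
```

For example, journal references give 20990472 / 51616950, which prints as 0.41; dividing by the 19898039 entries that actually have a journal reference would give about 1.05 instead.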


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      78098444                1.51                                                    
   CATALYTIC ACTIVITY               5761682   5263827      0.11     4                                              
   CAUTION                         32027826  31993897      0.62     1                                              
   COFACTOR                         2486748   2278028      0.05     8                                              
   DOMAIN                            271052    258952      0.01     9                                              
   ENZYME REGULATION                  83464     83464     <0.01    11                                              
   FUNCTION                         6696762   6318820      0.13     3                                              
   INTERACTION                         1706      1706     <0.01    12                                              
   MISCELLANEOUS                     154963    154759     <0.01    10                                              
   PATHWAY                          2949519   2670026      0.06     7                                              
   SIMILARITY                      18448266  14129946      0.36     2                                              
   SUBCELLULAR LOCATION             5665218   5449458      0.11     5                                              
   SUBUNIT                          3551238   3521412      0.07     6                                              

Total number of comment topics: 12


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      31435886                0.61                                                    
   ACT_SITE                         2381217   1484344      0.05     5                                              
   BINDING                          5086719   1328449      0.10     2                                              
   CARBOHYD                             602       232     <0.01    27                                              
   CHAIN                             887257    719649      0.02     9                                              
   COILED                             89135     50693     <0.01    17                                              
   COMPBIAS                           14125     14001     <0.01    22                                              
   CROSSLNK                           12148      8067     <0.01    23                                              
   DISULFID                          112656     87430     <0.01    15                                              
   DNA_BIND                           89827     83943     <0.01    16                                              
   DOMAIN                            988685    760902      0.02     8                                              
   INIT_MET                           16265     16265     <0.01    21                                              
   INTRAMEM                             392        56     <0.01    28                                              
   LIPID                              82702     41351     <0.01    19                                              
   METAL                            4897531   1261871      0.09     3                                              
   MOD_RES                           385589    346350      0.01    13                                              
   MOTIF                             310062    199653      0.01    14                                              
   NON_STD                             1878      1727     <0.01    25                                              
   NON_TER                          7533983   4902620      0.15     1                                              
   NP_BIND                          1813293   1089270      0.04     6                                              
   PEPTIDE                               99        99     <0.01    29                                              
   PROPEP                              5983      5983     <0.01    24                                              
   REGION                           1605652    888506      0.03     7                                              
   REPEAT                             75178     18452     <0.01    20                                              
   SIGNAL                            741335    737870      0.01    10                                              
   SITE                              702630    341492      0.01    11                                              
   TOPO_DOM                          401445     79137      0.01    12                                              
   TRANSIT                             1315      1315     <0.01    26                                              
   TRANSMEM                         3111790    547062      0.06     4                                              
   ZN_FING                            86393     77938     <0.01    18                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             570151772               11.05                                                    
   Allergome                           3655      3019     <0.01    83   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   101   Organism-specific databases                
   ArrayExpress                      199033    199033     <0.01    41   Gene expression databases                  
   BRENDA                              2624      2596     <0.01    86   Enzyme and pathway databases               
   Bgee                               99055     99055     <0.01    50   Gene expression databases                  
   BindingDB                           5762      5762     <0.01    77   Chemistry                                  
   BioCyc                           5682060   5604570      0.11    20   Enzyme and pathway databases               
   CAZy                               73978     69509     <0.01    54   Protein family/group databases             
   CGD                                 6895      6895     <0.01    76   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   108   2D gel databases                           
   CTD                               383170    381833      0.01    38   Organism-specific databases                
   ChEMBL                               656       656     <0.01    93   Chemistry                                  
   ChiTaRS                            65353     65353     <0.01    55   Other                                      
   ConoServer                           160       160     <0.01    99   Organism-specific databases                
   DIP                                 2970      2965     <0.01    85   Protein-protein interaction databases      
   DNASU                              42214     41887     <0.01    62   Protocols and materials databases          
   EMBL                            55092867  50512747      1.07     3   Sequence databases                         
   Ensembl                          1041412   1026927      0.02    30   Genome annotation databases                
   EnsemblBacteria                 29678059  29250533      0.57     5   Genome annotation databases                
   EnsemblFungi                      385492    383159      0.01    37   Genome annotation databases                
   EnsemblMetazoa                    802419    786258      0.02    34   Genome annotation databases                
   EnsemblPlants                     670798    639490      0.01    35   Genome annotation databases                
   EnsemblProtists                   195422    192854     <0.01    44   Genome annotation databases                
   EuPathDB                          157794    157792     <0.01    48   Organism-specific databases                
   EvolutionaryTrace                   8002      8002     <0.01    74   Other                                      
   FlyBase                           199021    197549     <0.01    42   Organism-specific databases                
   GO                              96532248  31130254      1.87     2   Ontologies                                 
   Gene3D                          23118137  18226180      0.45     8   Family and domain databases                
   GeneID                          10657953  10361985      0.21    13   Genome annotation databases                
   GeneTree                          953903    953846      0.02    32   Phylogenomic databases                     
   Genevestigator                     85648     85640     <0.01    51   Gene expression databases                  
   GenoList                           14730     14457     <0.01    72   Organism-specific databases                
   GenomeRNAi                         19219     19219     <0.01    69   Other                                      
   Gramene                           198710    198710     <0.01    43   Organism-specific databases                
   GuidetoPHARMACOLOGY                   21        21     <0.01   106   Chemistry                                  
   H-InvDB                              607       460     <0.01    94   Organism-specific databases                
   HAMAP                            6190588   6108214      0.12    19   Family and domain databases                
   HGNC                               47479     47394     <0.01    58   Organism-specific databases                
   HOGENOM                          3646899   3646857      0.07    24   Phylogenomic databases                     
   HOVERGEN                          304740    304730      0.01    39   Phylogenomic databases                     
   InParanoid                        185941    185941     <0.01    45   Phylogenomic databases                     
   IntAct                             15395     15395     <0.01    70   Protein-protein interaction databases      
   InterPro                       113785660  39766541      2.20     1   Family and domain databases                
   KEGG                             9533977   9300115      0.18    14   Genome annotation databases                
   KO                               3912060   3893640      0.08    23   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    80   Organism-specific databases                
   Leproma                             1272      1270     <0.01    88   Organism-specific databases                
   MEROPS                            179419    179419     <0.01    46   Protein family/group databases             
   MGI                                51886     51459     <0.01    57   Organism-specific databases                
   MIM                                    4         4     <0.01   109   Organism-specific databases                
   MINT                               10204     10203     <0.01    73   Protein-protein interaction databases      
   NextBio                           207195    207193     <0.01    40   Other                                      
   OGP                                    3         3     <0.01   110   2D gel databases                           
   OMA                              6329746   6329740      0.12    18   Phylogenomic databases                     
   OrthoDB                          5207982   5207981      0.10    21   Phylogenomic databases                     
   PANTHER                          7109465   6755422      0.14    17   Family and domain databases                
   PATRIC                           8267874   8267747      0.16    15   Genome annotation databases                
   PDB                                22223     12082     <0.01    66   3D structure databases                     
   PDBsum                             21826     11834     <0.01    67   3D structure databases                     
   PIR                               172034    139176     <0.01    47   Sequence databases                         
   PIRSF                            5003493   4964901      0.10    22   Family and domain databases                
   PMAP-CutDB                           201       201     <0.01    98   Other                                      
   PRIDE                             935668    935668      0.02    33   Proteomic databases                        
   PRINTS                           7331385   6624816      0.14    16   Family and domain databases                
   PRO                                27245     27245     <0.01    64   Other                                      
   PROSITE                         25273896  16886074      0.49     6   Family and domain databases                
   PaxDb                              28894     28892     <0.01    63   Proteomic databases                        
   PeptideAtlas                         128       128     <0.01   100   Proteomic databases                        
   PeroxiBase                          2594      2586     <0.01    87   Protein family/group databases             
   Pfam                            50903424  37202469      0.99     4   Family and domain databases                
   PharmGKB                            3518      3518     <0.01    84   Organism-specific databases                
   PhosSite                             784       772     <0.01    91   PTM databases                              
   PhosphoSite                         1103      1103     <0.01    89   PTM databases                              
   PhylomeDB                         145339    145339     <0.01    49   Phylogenomic databases                     
   PomBase                               40        27     <0.01   103   Organism-specific databases                
   PptaseDB                              36        35     <0.01   104   Protein family/group databases             
   ProDom                           1025131    991508      0.02    31   Family and domain databases                
   ProMEX                              5337      5337     <0.01    78   Proteomic databases                        
   ProtClustDB                      2710571   2710571      0.05    29   Phylogenomic databases                     
   ProteinModelPortal              12934209  12934209      0.25     9   3D structure databases                     
   PseudoCAP                           4519      4513     <0.01    81   Organism-specific databases                
   REBASE                             43257     43233     <0.01    60   Protein family/group databases             
   REPRODUCTION-2DPAGE                   65        64     <0.01   102   2D gel databases                           
   RGD                                21084     20223     <0.01    68   Organism-specific databases                
   Reactome                             242       186     <0.01    97   Enzyme and pathway databases               
   RefSeq                          10913471  10524524      0.21    12   Sequence databases                         
   SABIO-RK                             497       497     <0.01    95   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   107   Organism-specific databases                
   SMART                           11038289   8402812      0.21    11   Family and domain databases                
   SMR                              3509614   3509614      0.07    25   3D structure databases                     
   STRING                           2900609   2900533      0.06    26   Protein-protein interaction databases      
   SUPFAM                          24392043  19604295      0.47     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   105   2D gel databases                           
   SignaLink                           4364      4362     <0.01    82   Enzyme and pathway databases               
   TAIR                               14892     14819     <0.01    71   Organism-specific databases                
   TCDB                                5273      5263     <0.01    79   Protein family/group databases             
   TIGRFAMs                        12858572  11725060      0.25    10   Family and domain databases                
   TubercuList                         1093      1092     <0.01    90   Organism-specific databases                
   UCSC                               58892     58860     <0.01    56   Genome annotation databases                
   UniGene                           561469    531320      0.01    36   Sequence databases                         
   UniPathway                       2869313   2667365      0.06    27   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    52   Genome annotation databases                
   World-2DPAGE                         672       667     <0.01    92   2D gel databases                           
   WormBase                           42407     42233     <0.01    61   Organism-specific databases                
   Xenbase                            25533     25472     <0.01    65   Organism-specific databases                
   ZFIN                               45637     45172     <0.01    59   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    75   Organism-specific databases                
   eggNOG                           2755836   2755802      0.05    28   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    53   Organism-specific databases                
   mycoCLAP                             455       454     <0.01    96   Protein family/group databases             

Number of explicitly cross-referenced databases: 128
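
The rows of the table above can be read programmatically. The following is a minimal sketch, not part of the release tooling. It assumes the columns are, in order: database abbreviation, number of cross-references, number of entries carrying at least one such cross-reference, average, rank, and category. The average column appears consistent with dividing the number of cross-references by the 51616950 entries of this release, with values that round to 0.00 printed as "<0.01". The names ROW, parse_row and average_label are hypothetical.

```python
import re

# Release size stated in the introduction of this report.
TOTAL_ENTRIES = 51616950

# One table row: name, two integer counts, an average (possibly "<0.01"),
# a rank, and a free-text category that may contain spaces.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<refs>\d+)\s+(?P<entries>\d+)\s+"
    r"(?P<avg><?\d+\.\d+)\s+(?P<rank>\d+)\s+(?P<category>.+?)\s*$"
)


def parse_row(line):
    """Split one fixed-width table row into typed fields."""
    match = ROW.match(line)
    if match is None:
        raise ValueError("not a table row: %r" % line)
    return {
        "name": match["name"],
        "refs": int(match["refs"]),
        "entries": int(match["entries"]),
        "avg": match["avg"],
        "rank": int(match["rank"]),
        "category": match["category"],
    }


def average_label(refs, total=TOTAL_ENTRIES):
    """Format refs/total the way the average column appears to be printed."""
    label = "%.2f" % (refs / total)
    return "<0.01" if label == "0.00" else label


# Three rows copied from the table above.
sample = [
    "   InterPro                       113785660  39766541      2.20     1   Family and domain databases",
    "   GO                              96532248  31130254      1.87     2   Ontologies",
    "   MIM                                    4         4     <0.01   109   Organism-specific databases",
]

rows = [parse_row(line) for line in sample]
for row in rows:
    # Compare the recomputed average with the printed one.
    print(row["name"], row["avg"], average_label(row["refs"]))
```

Matching with a regular expression rather than splitting on whitespace keeps multi-word categories such as "Family and domain databases" in a single field.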


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.71   Gln (Q) 4.00   Leu (L) 10.0   Ser (S) 6.51
   Arg (R) 5.36   Glu (E) 6.21   Lys (K) 5.30   Thr (T) 5.52
   Asn (N) 4.11   Gly (G) 7.10   Met (M) 2.50   Trp (W) 1.29
   Asp (D) 5.33   His (H) 2.18   Phe (F) 4.04   Tyr (Y) 3.07
   Cys (C) 1.19   Ile (I) 6.11   Pro (P) 4.53   Val (V) 6.81

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.02


   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
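
   The ordering in 5.2 follows from sorting the percentages of table 5.1 in
   descending order. A minimal sketch of that step, with the values copied
   from 5.1 (the names composition and rank_by_frequency are hypothetical,
   and the ambiguity codes B, Z and X are left out):

```python
# Percentages copied from table 5.1 (the twenty standard amino acids).
composition = {
    "Ala": 8.71, "Gln": 4.00, "Leu": 10.0, "Ser": 6.51,
    "Arg": 5.36, "Glu": 6.21, "Lys": 5.30, "Thr": 5.52,
    "Asn": 4.11, "Gly": 7.10, "Met": 2.50, "Trp": 1.29,
    "Asp": 5.33, "His": 2.18, "Phe": 4.04, "Tyr": 3.07,
    "Cys": 1.19, "Ile": 6.11, "Pro": 4.53, "Val": 6.81,
}


def rank_by_frequency(percentages):
    """Return residue names ordered from most to least frequent."""
    return sorted(percentages, key=percentages.get, reverse=True)


ranking = rank_by_frequency(composition)
print(", ".join(ranking))
```

   No two residues share the same percentage in this release, so the
   ordering is unambiguous.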



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 692583
Total number of entries encoded on a Plasmid: 373929
Total number of entries encoded on a Plastid: 29662
Total number of entries encoded on a Plastid; Apicoplast: 877
Total number of entries encoded on a Plastid; Chloroplast: 258660
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1393