         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2014_04 STATISTICS


1.  INTRODUCTION

Release 2014_04 of 16-Apr-2014 of UniProtKB/TrEMBL contains 54958551 sequence entries,
comprising 17473872940 amino acids.

759560 sequences have been added since release 2014_03, the sequence data of
13762 existing entries has been updated and the annotations of
16654879 entries have been revised. This represents an increase of 2%.

Number of fragments: 5207483

Protein existence (PE):              entries      %
1: Evidence at protein level           22220     0.04%
2: Evidence at transcript level       856181     1.56%
3: Inferred from homology           13964112    25.41%
4: Predicted                        40116038    72.99%
5: Uncertain                               0     0.00%
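
The percentages in the protein existence table can be recomputed from the entry counts. The following is a minimal Python sketch, assuming each percentage is the count divided by the total number of entries in the release and rounded to two decimals; the names `pe_counts` and `percent_of_total` are illustrative and not part of any UniProt tool.

```python
# Counts copied from the protein existence (PE) table above.
TOTAL_ENTRIES = 54958551

pe_counts = {
    "1: Evidence at protein level": 22220,
    "2: Evidence at transcript level": 856181,
    "3: Inferred from homology": 13964112,
    "4: Predicted": 40116038,
    "5: Uncertain": 0,
}


def percent_of_total(count, total=TOTAL_ENTRIES):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Hypothetical helper table: PE level -> percentage of all entries.
pe_percent = {level: percent_of_total(n) for level, n in pe_counts.items()}

for level, pct in pe_percent.items():
    print(f"{level:<34}{pe_counts[level]:>10}{pct:>9.2f}%")
```

Under that assumption the five counts add up to the release total, so every entry falls into exactly one PE level.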

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 475794

   The first twenty species represent 2101753 sequences:   3.8 % of the
   total number of entries.


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:19540
                            2x:77828
                            3x:42094
                            4x:29732
                            5x:17470
                            6x:12585
                            7x: 9295
                            8x: 7448
                            9x: 5810
                           10x:10884
                       11- 20x:35216
                       21- 50x:11416
                       51-100x: 4513
                         >100x:16096


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     587027  Human immunodeficiency virus 1
       2     217044  uncultured bacterium
       3     116309  Homo sapiens (Human)
       4     105969  Triticum aestivum (Wheat)
       5      96787  Oryza sativa subsp. japonica (Rice)
       6      92236  Hepatitis C virus
       7      80678  Hepatitis B virus (HBV)
       8      73921  Glycine max (Soybean) (Glycine hispida)
       9      73055  mine drainage metagenome
      10      70518  Hordeum vulgare var. distichum (Two-rowed barley)
      11      69434  Macaca mulatta (Rhesus macaque)
      12      67669  Phytophthora parasitica (Potato buckeye rot agent)
      13      60710  human gut metagenome
      14      60424  Zea mays (Maize)
      15      56816  Mus musculus (Mouse)
      16      56236  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      17      55011  Callithrix jacchus (White-tufted-ear marmoset)
      18      54919  Solanum tuberosum (Potato)
      19      54154  Vitis vinifera (Grape)
      20      52836  Danio rerio (Zebrafish) (Brachydanio rerio)
      21      50605  Trichomonas vaginalis
      22      49267  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      23      48911  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      24      47013  Populus trichocarpa (Western balsam poplar) 
      25      41207  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      26      40352  Arabidopsis thaliana (Mouse-ear cress)
      27      39875  Oryza sativa subsp. indica (Rice)
      28      39850  Paramecium tetraurelia
      29      39364  Setaria italica (Foxtail millet) (Panicum italicum)
      30      38796  Mustela putorius furo (European domestic ferret) (Mustela furo)
      31      37446  Simian immunodeficiency virus (SIV)
      32      36744  Drosophila melanogaster (Fruit fly)
      33      36598  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      34      35950  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      35      35675  Ailuropoda melanoleuca (Giant panda)
      36      35599  Emiliania huxleyi CCMP1516
      37      35307  Physcomitrella patens subsp. patens (Moss)
      38      35212  Acyrthosiphon pisum (Pea aphid)
      39      35137  Caenorhabditis japonica
      40      34570  Thalassiosira oceanica (Marine diatom)
      41      34506  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      42      33864  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      43      33684  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      44      33256  Selaginella moellendorffii (Spikemoss)
      45      32772  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      46      32420  Sus scrofa (Pig)
      47      32342  Oryza brachyantha
      48      32302  Phaseolus vulgaris (Kidney bean) (French bean)
      49      32142  Oryza glaberrima (African rice)
      50      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      51      31953  Anas platyrhynchos (Domestic duck) (Anas boschas)
      52      31861  Pan troglodytes (Chimpanzee)
      53      31403  Ricinus communis (Castor bean)
      54      31290  Citrus clementina
      55      31207  Capitella teleta (Polychaete worm)
      56      30955  Daphnia pulex (Water flea)
      57      30713  Caenorhabditis brenneri (Nematode worm)
      58      30281  Escherichia coli
      59      30150  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      60      29845  Rhizophagus irregularis (strain DAOM 181602 / DAOM 197198 / MUCL 43194)  
      61      29815  Amphimedon queenslandica (Sponge)
      62      29470  Strongylocentrotus purpuratus (Purple sea urchin)
      63      29321  Pristionchus pacificus (Parasitic nematode)
      64      29193  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      65      29057  Oikopleura dioica (Tunicate)
      66      28827  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      67      28825  Capsella rubella
      68      28632  Prunus persica (Peach) (Amygdalus persica)
      69      28380  Thellungiella salsuginea (Saltwater cress) (Arabidopsis glauca)
      70      28104  Gasterosteus aculeatus (Three-spined stickleback)
      71      27804  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      72      27633  Canis familiaris (Dog) (Canis lupus familiaris)
      73      27527  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      74      27495  Equus caballus (Horse)
      75      27434  Amborella trichopoda
      76      27089  Gorilla gorilla gorilla (Lowland gorilla)
      77      26848  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      78      26848  Tetrahymena thermophila (strain SB210)
      79      26489  Phytophthora parasitica CJ01A1
      80      26477  Phytophthora parasitica P1569
      81      26452  Phytophthora parasitica P10297
      82      26438  Phytophthora parasitica (strain INRA-310)
      83      26336  Ovis aries (Sheep)
      84      25979  Oryzias latipes (Medaka fish) (Japanese ricefish)
      85      25820  Loxodonta africana (African elephant)
      86      25787  Bos taurus (Bovine)
      87      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      88      25699  Rattus norvegicus (Rat)
      89      25025  Aphanomyces astaci
      90      24915  Nematostella vectensis (Starlet sea anemone)
      91      24590  Guillardia theta CCMP2712
      92      24211  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      93      23804  Astyanax mexicanus (Blind cave fish) (Astyanax fasciatus mexicanus)
      94      23742  Ornithorhynchus anatinus (Duckbill platypus)
      95      23687  Lottia gigantea (Giant owl limpet)
      96      23650  Dendroctonus ponderosae (Mountain pine beetle)
      97      23565  Oxytricha trifallax
      98      23496  Latimeria chalumnae (West Indian ocean coelacanth)
      99      23369  Helobdella robusta (Californian leech)
     100      23285  Caenorhabditis elegans
     101      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
     102      22780  Monodelphis domestica (Gray short-tailed opossum)
     103      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
     104      22525  Lepisosteus oculatus (Spotted gar)
     105      22317  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
     106      22174  gut metagenome
     107      21926  Oryctolagus cuniculus (Rabbit)
     108      21706  Haemonchus contortus (Barber pole worm)
     109      21546  Heterocephalus glaber (Naked mole rat)
     110      21495  Gallus gallus (Chicken)
     111      21398  Caenorhabditis briggsae
     112      21203  Echinococcus granulosus (Hydatid tapeworm)
     113      21135  Ixodes scapularis (Black-legged tick) (Deer tick)
     114      21023  Felis catus (Cat) (Felis silvestris catus)
     115      20880  Myotis lucifugus (Little brown bat)
     116      20854  Tupaia chinensis (Chinese tree shrew)
     117      20802  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     118      20534  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     119      20146  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     120      20115  Ciona savignyi (Pacific transparent sea squirt)
     121      20092  Cavia porcellus (Guinea pig)
     122      20061  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     123      20028  Camelus ferus (Wild Bactrian camel)
     124      19976  Callorhynchus milii (Elephant fish) (Australian ghost shark)
     125      19824  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     126      19688  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     127      19601  Anolis carolinensis (Green anole) (American chameleon)
     128      19561  Pteropus alecto (Black flying fox)
     129      19522  Wuchereria bancrofti
     130      19300  Myotis brandtii (Brandt's bat)
     131      19201  Trypanosoma cruzi (strain CL Brener)
     132      19190  Necator americanus (Human hookworm)
     133      19062  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     134      18965  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     135      18861  Drosophila simulans (Fruit fly)
     136      18602  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     137      18559  Bos mutus
     138      18479  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     139      18417  Ophiophagus hannah (King cobra) (Naja hannah)
     140      18248  Tetranychus urticae (Two-spotted spider mite)
     141      18126  Atta cephalotes (Leafcutter ant)
     142      18048  Anopheles gambiae (African malaria mosquito)
     143      18047  Saprolegnia diclina VS20
     144      17976  Moniliophthora roreri (strain MCA 2997) (Cocoa frosty pod rot fungus) 
     145      17850  Hepatitis C virus subtype 1b
     146      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     147      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     148      17731  Bombyx mori (Silk moth)
     149      17683  Genlisea aurea
     150      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     151      17590  Gibberella moniliformis (strain M3125 / FGSC 7600)  
     152      17445  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     153      17381  Ceratitis capitata (Mediterranean fruit fly) (Tephritis capitata)
     154      17288  Nasonia vitripennis (Parasitic wasp)
     155      17245  Plasmodium falciparum
     156      17070  Tribolium castaneum (Red flour beetle)
     157      17042  Drosophila yakuba (Fruit fly)
     158      16949  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     159      16919  Meleagris gallopavo (Common turkey)
     160      16715  Drosophila persimilis (Fruit fly)
     161      16698  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     162      16639  Fusarium oxysporum f. sp. lycopersici  
     163      16619  Rhodnius prolixus (Triatomid bug)
     164      16487  uncultured archaeon
     165      16427  Ectocarpus siliculosus (Brown alga)
     166      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     167      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     168      16330  Danaus plexippus (Monarch butterfly)
     169      16275  Trichinella spiralis (Trichina worm)
     170      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     171      16215  Neovison vison (American mink) (Mustela vison)
     172      16204  Ixodes ricinus (Common tick)
     173      16191  Drosophila sechellia (Fruit fly)
     174      16191  Schistosoma japonicum (Blood fluke)
     175      16148  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     176      16112  Listeria monocytogenes
     177      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     178      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     179      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     180      15718  Naegleria gruberi (Amoeba)
     181      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     182      15592  Phytophthora ramorum (Sudden oak death agent)
     183      15467  Myotis davidii (David's myotis)
     184      15423  Drosophila willistoni (Fruit fly)
     185      15412  Pestalotiopsis fici W106-1
     186      15380  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     187      15355  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     188      15354  Loa loa (Eye worm) (Filaria loa)
     189      15228  Pythium ultimum
     190      15155  Drosophila ananassae (Fruit fly)
     191      15057  Pararge aegeria (Speckled wood butterfly)
     192      15042  Harpegnathos saltator (Jerdon's jumping ant)
     193      15012  Strigamia maritima (European centipede) (Geophilus maritimus)
     194      14987  Rabies virus
     195      14944  Acanthamoeba castellanii str. Neff
     196      14928  Drosophila erecta (Fruit fly)
     197      14869  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     198      14801  Camponotus floridanus (Florida carpenter ant)
     199      14794  Drosophila mojavensis (Fruit fly)
     200      14790  Gibberella fujikuroi (strain CBS 195.34 / IMI 58289 / NRRL A-6831)  
     201      14713  Plasmodium chabaudi
     202      14708  Drosophila virilis (Fruit fly)
     203      14654  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     204      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     205      14597  Angomonas deanei
     206      14417  Volvox carteri (Green alga)
     207      14346  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     208      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     209      14235  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     210      14157  Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
     211      13971  Acromyrmex echinatior (Panamanian leafcutter ant) 
     212      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     213      13879  Clonorchis sinensis (Chinese liver fluke)
     214      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     215      13806  Fomitopsis pinicola (strain FP-58527) (Brown rot fungus)
     216      13805  Porcine reproductive and respiratory syndrome virus (PRRSV)
     217      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     218      13768  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     219      13704  Trypanosoma cruzi
     220      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     221      13425  Hepatitis C virus subtype 1a
     222      13354  Giardia intestinalis (Giardia lamblia)
     223      13345  Aspergillus flavus 
     224      13338  Colletotrichum orbiculare   
     225      13306  Pyronema omphalodes (strain CBS 100304) (Pyronema confluens)
     226      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     227      13189  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     228      13159  Heterobasidion irregulare TC 32-1
     229      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     230      13115  Petromyzon marinus (Sea lamprey)
     231      13082  Glarea lozoyensis (strain ATCC 20868 / MF5171)
     232      13062  Mycosphaerella fijiensis (strain CIRAD86) (Black leaf streak disease fungus) 
     233      13040  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     234      12983  Albugo laibachii Nc14
     235      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     236      12951  Stigmatella aurantiaca (strain DW4/3-1)
     237      12879  Bipolaris victoriae FI3
     238      12856  Cochliobolus heterostrophus (strain C5 / ATCC 48332 / race O)  
     239      12851  Bipolaris zeicola 26-R-13
     240      12846  Magnaporthe oryzae (strain Y34) (Rice blast fungus) (Pyricularia oryzae)
     241      12768  Helicobacter pylori (Campylobacter pylori)
     242      12758  Schistosoma mansoni (Blood fluke)
     243      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     244      12711  Magnaporthe oryzae (strain P131) (Rice blast fungus) (Pyricularia oryzae)
     245      12703  Cochliobolus heterostrophus (strain C4 / ATCC 48331 / race T)  
     246      12697  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     247      12696  Trypanosoma congolense (strain IL3000)
     248      12645  Xenopus laevis (African clawed frog)
     249      12586  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     250      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          787838 (  1%)
    Bacteria       42090933 ( 77%)
    Eukaryota       9921566 ( 18%)
    Viruses         1970587 (  4%)
    Other            187626 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 116362 (  1%)           (  0%)
     Other Mammalia       1071905 ( 11%)           (  2%)
     Other Vertebrata     1017432 ( 10%)           (  2%)
     Viridiplantae        1970598 ( 20%)           (  4%)
     Fungi                2376975 ( 24%)           (  4%)
     Insecta               981915 ( 10%)           (  2%)
     Nematoda              304308 (  3%)           (  1%)
     Other                2082071 ( 21%)           (  4%)
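
The Eukaryota table reports each category against two denominators: the Eukaryota total and the complete database. The sketch below recomputes both columns in Python, assuming the percentages are rounded to the nearest integer; the names `eukaryota_counts` and `share` are illustrative only.

```python
# Counts copied from the tables in section 2.3.
TOTAL_ENTRIES = 54958551      # complete database
EUKARYOTA_ENTRIES = 9921566   # Eukaryota row of the kingdom table

eukaryota_counts = {
    "Human": 116362,
    "Other Mammalia": 1071905,
    "Other Vertebrata": 1017432,
    "Viridiplantae": 1970598,
    "Fungi": 2376975,
    "Insecta": 981915,
    "Nematoda": 304308,
    "Other": 2082071,
}


def share(count, denominator):
    """Return count as a whole-number percentage of denominator."""
    return round(100.0 * count / denominator)


# One row per category: (name, count, % of Eukaryota, % of database).
rows = [
    (name, n, share(n, EUKARYOTA_ENTRIES), share(n, TOTAL_ENTRIES))
    for name, n in eukaryota_counts.items()
]

for name, n, pct_euk, pct_db in rows:
    print(f"{name:<18}{n:>10} ({pct_euk:>3}%) ({pct_db:>3}%)")
```

The category counts sum to the Eukaryota total, which is why the first percentage column adds up to 100% while the second adds up to roughly the 18% share listed for Eukaryota.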



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1447977             1001-1100   293655
                 51- 100 4936657             1101-1200   204540
                101- 150 5534275             1201-1300   148509
                151- 200 5370816             1301-1400    87757
                201- 250 5446085             1401-1500    73225
                251- 300 5289715             1501-1600    49416
                301- 350 4773692             1601-1700    36211
                351- 400 3547899             1701-1800    27207
                401- 450 3098033             1801-1900    21903
                451- 500 2525735             1901-2000    18465
                501- 550 1597981             2001-2100    15356
                551- 600 1232989             2101-2200    15149
                601- 650  903595             2201-2300    11418
                651- 700  711724             2301-2400     9437
                701- 750  588023             2401-2500     8271
                751- 800  504356             >2500        64802
                801- 850  395383
                851- 900  352849
                901- 950  240833
                951-1000  167130



   The average sequence length in UniProtKB/TrEMBL is   317 amino acids.

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
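
The average length can be checked against the totals given in the introduction. This is a small Python sketch, assuming the average is the total number of amino acids divided by the total number of entries; the report does not state how the figure is derived or rounded, so the truncation shown here is only one reading that reproduces 317.

```python
# Totals copied from section 1 of this report.
TOTAL_AMINO_ACIDS = 17473872940
TOTAL_ENTRIES = 54958551

# Exact mean length in amino acids per entry.
mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

# Integer division drops the fractional part (assumed to match the report).
truncated_mean = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES

print(f"mean length:      {mean_length:.3f}")
print(f"truncated length: {truncated_mean}")
```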



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    65240740                1.19                                                    
   Submitted to EMBL/GenBank/DDBJ  40131513  37778612      0.73                                                    
   Journal                         23017103  21716819      0.42                                                    
   Submitted to other databases     2074155   2067035      0.04                                                    
   Thesis                             11004     10945     <0.01                                                    
   Book citation                       6964      6901     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 502859
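
The "Average per entry" column appears to be the total number of lines divided by the total number of entries in the release (54958551), not by the number of entries that carry such a line. The Python sketch below reproduces the column under that assumption, including the "<0.01" notation for values that would otherwise round to 0.00; the function name `average_per_entry` is illustrative.

```python
# Total number of entries in release 2014_04 (from section 1).
TOTAL_ENTRIES = 54958551


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format the mean number of lines per entry as in the tables."""
    avg = total_lines / total_entries
    # Non-zero values that would print as 0.00 are shown as "<0.01".
    if 0 < avg < 0.005:
        return "<0.01"
    return f"{avg:.2f}"


# Rows copied from the References (RL) table: (label, total number).
reference_rows = [
    ("References (RL)", 65240740),
    ("Submitted to EMBL/GenBank/DDBJ", 40131513),
    ("Journal", 23017103),
    ("Submitted to other databases", 2074155),
    ("Thesis", 11004),
]

for label, total in reference_rows:
    print(f"{label:<32}{total:>10}  {average_per_entry(total):>6}")
```

Dividing by the release total rather than by the "Number of entries" column explains why, for example, Journal references average 0.42 per entry even though most entries that cite a journal cite only one.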


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      86255547                1.57                                                    
   CATALYTIC ACTIVITY               6523305   5978028      0.12     4                                              
   CAUTION                         34291509  34252598      0.62     1                                              
   COFACTOR                         2870583   2625495      0.05     8                                              
   DOMAIN                            303738    290326      0.01     9                                              
   ENZYME REGULATION                  93173     93173     <0.01    11                                              
   FUNCTION                         7549120   7146266      0.14     3                                              
   INTERACTION                         1716      1716     <0.01    12                                              
   MISCELLANEOUS                     173811    173599     <0.01    10                                              
   PATHWAY                          3325690   3013446      0.06     7                                              
   SIMILARITY                      20752886  15919028      0.38     2                                              
   SUBCELLULAR LOCATION             6345981   6117740      0.12     5                                              
   SUBUNIT                          4024035   3982980      0.07     6                                              

Total number of comment topics: 12


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      35194822                0.64                                                    
   ACT_SITE                         2753447   1719425      0.05     5                                              
   BINDING                          5824289   1523762      0.11     2                                              
   CARBOHYD                             662       251     <0.01    27                                              
   CHAIN                             868362    694099      0.02     9                                              
   COILED                             99046     56560     <0.01    17                                              
   COMPBIAS                           15416     15286     <0.01    22                                              
   CROSSLNK                           14439      9790     <0.01    23                                              
   DISULFID                          136626    106508     <0.01    15                                              
   DNA_BIND                          103343     96407     <0.01    16                                              
   DOMAIN                           1139217    882915      0.02     8                                              
   INIT_MET                           18139     18139     <0.01    21                                              
   INTRAMEM                             392        56     <0.01    28                                              
   LIPID                              97318     48659     <0.01    18                                              
   METAL                            5649850   1454466      0.10     3                                              
   MOD_RES                           435361    392591      0.01    13                                              
   MOTIF                             349299    225107      0.01    14                                              
   NON_STD                             1893      1768     <0.01    25                                              
   NON_TER                          7954520   5210392      0.14     1                                              
   NP_BIND                          2065927   1230930      0.04     6                                              
   PEPTIDE                              111       111     <0.01    29                                              
   PROPEP                              6607      6607     <0.01    24                                              
   REGION                           1870147   1030827      0.03     7                                              
   REPEAT                             84238     20778     <0.01    20                                              
   SIGNAL                            732670    729495      0.01    11                                              
   SITE                              818677    409097      0.01    10                                              
   TOPO_DOM                          461419     90574      0.01    12                                              
   TRANSIT                             1582      1574     <0.01    26                                              
   TRANSMEM                         3594721    628620      0.07     4                                              
   ZN_FING                            97104     87784     <0.01    19                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             621414858               11.31                                                    
   Allergome                           3722      3085     <0.01    84   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   102   Organism-specific databases                
   ArrayExpress                       64482     64482     <0.01    56   Gene expression databases                  
   BRENDA                              2619      2591     <0.01    87   Enzyme and pathway databases               
   Bgee                               98641     98641     <0.01    50   Gene expression databases                  
   BindingDB                           5751      5751     <0.01    78   Chemistry                                  
   BioCyc                           5683890   5605351      0.10    20   Enzyme and pathway databases               
   CAZy                               73945     69479     <0.01    54   Protein family/group databases             
   CGD                                 6803      6803     <0.01    77   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   109   2D gel databases                           
   CTD                               423781    422421      0.01    38   Organism-specific databases                
   ChEMBL                               659       659     <0.01    94   Chemistry                                  
   ChiTaRS                            64952     64952     <0.01    55   Other                                      
   ConoServer                           159       159     <0.01   100   Organism-specific databases                
   DIP                                 3019      3014     <0.01    86   Protein-protein interaction databases      
   DNASU                              42075     41749     <0.01    63   Protocols and materials databases          
   EMBL                            58685641  53766967      1.07     3   Sequence databases                         
   Ensembl                          1110574   1095961      0.02    31   Genome annotation databases                
   EnsemblBacteria                 29539837  29111790      0.54     6   Genome annotation databases                
   EnsemblFungi                      401487    399114      0.01    39   Genome annotation databases                
   EnsemblMetazoa                    816499    800163      0.01    34   Genome annotation databases                
   EnsemblPlants                     777630    739869      0.01    35   Genome annotation databases                
   EnsemblProtists                   191191    188622     <0.01    45   Genome annotation databases                
   EuPathDB                          159768    159768     <0.01    49   Organism-specific databases                
   EvolutionaryTrace                   7972      7972     <0.01    76   Other                                      
   FlyBase                           198979    197508     <0.01    44   Organism-specific databases                
   GO                             102006258  34095873      1.86     2   Ontologies                                 
   Gene3D                          30226031  23768492      0.55     5   Family and domain databases                
   GeneID                          11053811  10747355      0.20    13   Genome annotation databases                
   GeneTree                         1005602   1005545      0.02    32   Phylogenomic databases                     
   Genevestigator                     85176     85169     <0.01    51   Gene expression databases                  
   GenoList                           14730     14457     <0.01    72   Organism-specific databases                
   GenomeRNAi                         25295     25295     <0.01    68   Other                                      
   Gramene                           204418    204418     <0.01    43   Organism-specific databases                
   GuidetoPHARMACOLOGY                   21        21     <0.01   107   Chemistry                                  
   H-InvDB                              603       456     <0.01    95   Organism-specific databases                
   HAMAP                            6992559   6900162      0.13    18   Family and domain databases                
   HGNC                               47319     47238     <0.01    59   Organism-specific databases                
   HOGENOM                          3646176   3646133      0.07    24   Phylogenomic databases                     
   HOVERGEN                          304146    304138      0.01    40   Phylogenomic databases                     
   InParanoid                        185616    185616     <0.01    46   Phylogenomic databases                     
   IntAct                             11875     11875     <0.01    73   Protein-protein interaction databases      
   InterPro                       127035125  44360270      2.31     1   Family and domain databases                
   KEGG                             9951150   9721104      0.18    14   Genome annotation databases                
   KO                               4172098   4151304      0.08    23   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    81   Organism-specific databases                
   Leproma                             1272      1270     <0.01    89   Organism-specific databases                
   MEROPS                            175369    175369     <0.01    47   Protein family/group databases             
   MGI                                52111     51673     <0.01    58   Organism-specific databases                
   MIM                                    4         4     <0.01   110   Organism-specific databases                
   MINT                               10176     10175     <0.01    74   Protein-protein interaction databases      
   NextBio                           206032    206031     <0.01    42   Other                                      
   OGP                                    3         3     <0.01   111   2D gel databases                           
   OMA                              6305701   6305695      0.11    19   Phylogenomic databases                     
   OrthoDB                          5181516   5181515      0.09    22   Phylogenomic databases                     
   PANTHER                          7905878   7508451      0.14    17   Family and domain databases                
   PATRIC                           8253286   8253156      0.15    15   Genome annotation databases                
   PDB                                23181     12512     <0.01    69   3D structure databases                     
   PDBsum                             22705     12198     <0.01    70   3D structure databases                     
   PIR                               171836    138981     <0.01    48   Sequence databases                         
   PIRSF                            5619982   5576222      0.10    21   Family and domain databases                
   PMAP-CutDB                           200       200     <0.01    99   Other                                      
   PRIDE                             930153    930153      0.02    33   Proteomic databases                        
   PRINTS                           8144215   7370584      0.15    16   Family and domain databases                
   PRO                                27172     27172     <0.01    66   Other                                      
   PROSITE                         28156399  18828363      0.51     8   Family and domain databases                
   PaxDb                              28701     28699     <0.01    65   Proteomic databases                        
   PeptideAtlas                         128       128     <0.01   101   Proteomic databases                        
   PeroxiBase                          2591      2583     <0.01    88   Protein family/group databases             
   Pfam                            56817285  41538228      1.03     4   Family and domain databases                
   PharmGKB                            3397      3397     <0.01    85   Organism-specific databases                
   PhosSite                             801       789     <0.01    92   PTM databases                              
   PhosphoSite                         1093      1093     <0.01    91   PTM databases                              
   PhylomeDB                         209381    209381     <0.01    41   Phylogenomic databases                     
   PomBase                               40        27     <0.01   104   Organism-specific databases                
   PptaseDB                              38        36     <0.01   105   Protein family/group databases             
   ProDom                           1139072   1103532      0.02    30   Family and domain databases                
   ProMEX                              5315      5315     <0.01    80   Proteomic databases                        
   ProtClustDB                      2709831   2709816      0.05    28   Phylogenomic databases                     
   ProteinModelPortal              14405304  14405304      0.26    10   3D structure databases                     
   PseudoCAP                           4516      4510     <0.01    82   Organism-specific databases                
   REBASE                             47264     47239     <0.01    60   Protein family/group databases             
   REPRODUCTION-2DPAGE                   65        64     <0.01   103   2D gel databases                           
   RGD                                21277     20261     <0.01    71   Organism-specific databases                
   Reactome                             228       187     <0.01    98   Enzyme and pathway databases               
   RefSeq                          11352257  10927824      0.21    12   Sequence databases                         
   SABIO-RK                             521       521     <0.01    96   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   108   Organism-specific databases                
   SMART                           12241829   9326185      0.22    11   Family and domain databases                
   SMR                              2601540   2601540      0.05    29   3D structure databases                     
   STRING                           3131848   3131741      0.06    26   Protein-protein interaction databases      
   SUPFAM                          28286201  22762822      0.51     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   106   2D gel databases                           
   SignaLink                           4332      4330     <0.01    83   Enzyme and pathway databases               
   TAIR                               35213     21964     <0.01    64   Organism-specific databases                
   TCDB                                5692      5682     <0.01    79   Protein family/group databases             
   TIGRFAMs                        14446454  13176085      0.26     9   Family and domain databases                
   TreeFam                           588368    588366      0.01    36   Phylogenomic databases                     
   TubercuList                         1101      1100     <0.01    90   Organism-specific databases                
   UCSC                               58540     58368     <0.01    57   Genome annotation databases                
   UniGene                           553244    520400      0.01    37   Sequence databases                         
   UniPathway                       3155377   2930897      0.06    25   Enzyme and pathway databases               
   VectorBase                         78248     77731     <0.01    52   Genome annotation databases                
   World-2DPAGE                         671       666     <0.01    93   2D gel databases                           
   WormBase                           42812     42640     <0.01    62   Organism-specific databases                
   Xenbase                            25438     25377     <0.01    67   Organism-specific databases                
   ZFIN                               45265     45234     <0.01    61   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    75   Organism-specific databases                
   eggNOG                           2755330   2755296      0.05    27   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    53   Organism-specific databases                
   mycoCLAP                             464       463     <0.01    97   Protein family/group databases             

Number of explicitly cross-referenced databases: 130
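
The rows of the table above can be read programmatically. The following is a
minimal Python sketch, not part of any UniProt tooling: the names XrefRow and
parse_xref_row are hypothetical, and the column meanings are assumptions
inferred from the numbers (cross-reference count, entry count, a ratio, rank,
category). It assumes that database names contain no spaces, as in the rows
shown here, and that the ratio column is the cross-reference count divided by
the 54958551 entries given in the introduction.

```python
from typing import NamedTuple

# Total entries in release 2014_04, taken from the introduction.
TOTAL_ENTRIES = 54958551


class XrefRow(NamedTuple):
    database: str
    cross_refs: int   # assumed: number of cross-references
    entries: int      # assumed: number of entries carrying them
    ratio: str        # kept as text, small values are printed as "<0.01"
    rank: int
    category: str


def parse_xref_row(line: str) -> XrefRow:
    """Split one table row on whitespace (hypothetical helper)."""
    parts = line.split()
    # The category is the only field that may contain spaces,
    # so everything after the rank is joined back together.
    return XrefRow(
        database=parts[0],
        cross_refs=int(parts[1]),
        entries=int(parts[2]),
        ratio=parts[3],
        rank=int(parts[4]),
        category=" ".join(parts[5:]),
    )


go = parse_xref_row(
    "   GO                             102006258  34095873      1.86     2   Ontologies"
)
pdb = parse_xref_row(
    "   PDB                                23181     12512     <0.01    69   3D structure databases"
)

# Recompute the ratio column under the stated assumption.
go_ratio = round(go.cross_refs / TOTAL_ENTRIES, 2)
print(go.database, go_ratio, go.ratio)
print(pdb.database, pdb.ratio, pdb.category)
```

Keeping the ratio as text avoids guessing a numeric value for rows printed
as "<0.01".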


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.73   Gln (Q) 4.00   Leu (L) 10.0   Ser (S) 6.52
   Arg (R) 5.38   Glu (E) 6.20   Lys (K) 5.29   Thr (T) 5.51
   Asn (N) 4.11   Gly (G) 7.10   Met (M) 2.50   Trp (W) 1.29
   Asp (D) 5.34   His (H) 2.18   Phe (F) 4.03   Tyr (Y) 3.06
   Cys (C) 1.19   Ile (I) 6.10   Pro (P) 4.54   Val (V) 6.81

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.02


   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
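
   The ordering in 5.2 follows directly from the percentages in 5.1. As a
   minimal sketch (the variable names are illustrative only), sorting the
   twenty standard amino acids by descending percentage reproduces the list:

```python
# Percentages copied from table 5.1 (standard amino acids only).
composition = {
    "Ala": 8.73, "Gln": 4.00, "Leu": 10.0, "Ser": 6.52,
    "Arg": 5.38, "Glu": 6.20, "Lys": 5.29, "Thr": 5.51,
    "Asn": 4.11, "Gly": 7.10, "Met": 2.50, "Trp": 1.29,
    "Asp": 5.34, "His": 2.18, "Phe": 4.03, "Tyr": 3.06,
    "Cys": 1.19, "Ile": 6.10, "Pro": 4.54, "Val": 6.81,
}

# Sort residue names by descending frequency; no two values are equal,
# so the order is unambiguous.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```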



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 718693
Total number of entries encoded on a Plasmid: 413155
Total number of entries encoded on a Plastid: 31737
Total number of entries encoded on a Plastid; Apicoplast: 893
Total number of entries encoded on a Plastid; Chloroplast: 267907
Total number of entries encoded on a Plastid; Cyanelle: 9
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1641