         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2014_02 STATISTICS


1.  INTRODUCTION

Release 2014_02 of 19-Feb-2014 of UniProtKB/TrEMBL contains 52707211 sequence entries,
comprising 16719843101 amino acids.

1163762 sequences have been added since release 2014_01, the sequence data of
3116 existing entries has been updated and the annotations of
24329746 entries have been revised. The added sequences represent an increase
of about 2% in the number of entries.

Number of fragments: 4997911

Protein existence (PE):              entries      %
1: Evidence at protein level           21783     0.04%
2: Evidence at transcript level       757685     1.44%
3: Inferred from homology           12914196    24.50%
4: Predicted                        39013547    74.02%
5: Uncertain                               0     0.00%
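
The percentages in the table above follow from dividing each count by the
total number of entries. A minimal Python sketch of that arithmetic is given
here; the variable and function names are illustrative assumptions and are not
part of any UniProt tool.

```python
# Counts copied from the protein existence (PE) table above.
pe_counts = {
    "1: Evidence at protein level": 21783,
    "2: Evidence at transcript level": 757685,
    "3: Inferred from homology": 12914196,
    "4: Predicted": 39013547,
    "5: Uncertain": 0,
}

# The five PE levels partition the database, so their sum is the entry count.
total_entries = sum(pe_counts.values())


def pe_percentages(counts):
    """Return each count as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 2) for label, n in counts.items()}


pe_percent = pe_percentages(pe_counts)
for label, pct in pe_percent.items():
    print(f"{label:<34}{pe_counts[label]:>10}{pct:>9.2f}%")
```

The sum of the five counts equals the number of sequence entries stated at the
start of this section, which is a quick way to confirm the table is complete.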

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 468089

   The first twenty species represent 1999142 sequences, or 3.8% of the
   total number of entries.
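
   As a worked check, this figure can be reproduced by summing the
   frequencies of ranks 1 to 20 in the table of section 2.2 and dividing by
   the total number of entries given in the introduction. The sketch below is
   illustrative only; the names used are assumptions.

```python
# Frequencies of ranks 1-20, copied from the table in section 2.2.
top20_frequencies = [
    580420, 210555, 116225, 96837, 91801,
    77442, 73914, 73055, 70514, 69268,
    60404, 56821, 56235, 54980, 54908,
    54147, 52840, 50603, 49266, 48907,
]
TOTAL_ENTRIES = 52707211  # from the introduction

top20_total = sum(top20_frequencies)
top20_share = 100.0 * top20_total / TOTAL_ENTRIES
print(f"{top20_total} sequences: {top20_share:.1f}% of all entries")
```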


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:19384
                            2x:76454
                            3x:41135
                            4x:29122
                            5x:17145
                            6x:12336
                            7x: 9104
                            8x: 7287
                            9x: 5692
                           10x:10817
                       11- 20x:34145
                       21- 50x:11033
                       51-100x: 4399
                         >100x:15577


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     580420  Human immunodeficiency virus 1
       2     210555  uncultured bacterium
       3     116225  Homo sapiens (Human)
       4      96837  Oryza sativa subsp. japonica (Rice)
       5      91801  Hepatitis C virus
       6      77442  Hepatitis B virus (HBV)
       7      73914  Glycine max (Soybean) (Glycine hispida)
       8      73055  mine drainage metagenome
       9      70514  Hordeum vulgare var. distichum (Two-rowed barley)
      10      69268  Macaca mulatta (Rhesus macaque)
      11      60404  Zea mays (Maize)
      12      56821  Mus musculus (Mouse)
      13      56235  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      14      54980  Callithrix jacchus (White-tufted-ear marmoset)
      15      54908  Solanum tuberosum (Potato)
      16      54147  Vitis vinifera (Grape)
      17      52840  Danio rerio (Zebrafish) (Brachydanio rerio)
      18      50603  Trichomonas vaginalis
      19      49266  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      20      48907  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      21      47011  Populus trichocarpa (Western balsam poplar) 
      22      41205  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      23      40605  Arabidopsis thaliana (Mouse-ear cress)
      24      39896  Oryza sativa subsp. indica (Rice)
      25      39850  Paramecium tetraurelia
      26      39363  Setaria italica (Foxtail millet) (Panicum italicum)
      27      38796  Mustela putorius furo (European domestic ferret) (Mustela furo)
      28      38198  human gut metagenome
      29      36798  Simian immunodeficiency virus (SIV)
      30      36778  Drosophila melanogaster (Fruit fly)
      31      36598  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      32      35951  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      33      35675  Ailuropoda melanoleuca (Giant panda)
      34      35599  Emiliania huxleyi CCMP1516
      35      35305  Physcomitrella patens subsp. patens (Moss)
      36      35209  Acyrthosiphon pisum (Pea aphid)
      37      35066  Caenorhabditis japonica
      38      34570  Thalassiosira oceanica (Marine diatom)
      39      34505  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      40      33864  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      41      33675  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      42      33256  Selaginella moellendorffii (Spikemoss)
      43      32772  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      44      32362  Sus scrofa (Pig)
      45      32342  Oryza brachyantha
      46      32300  Phaseolus vulgaris (Kidney bean) (French bean)
      47      32142  Oryza glaberrima (African rice)
      48      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      49      31856  Pan troglodytes (Chimpanzee)
      50      31824  Anas platyrhynchos (Domestic duck) (Anas boschas)
      51      31392  Ricinus communis (Castor bean)
      52      31290  Citrus clementina
      53      31207  Capitella teleta (Polychaete worm)
      54      30954  Daphnia pulex (Water flea)
      55      30712  Caenorhabditis brenneri (Nematode worm)
      56      30147  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      57      29847  Rhizophagus irregularis (strain DAOM 181602 / DAOM 197198 / MUCL 43194)  
      58      29815  Amphimedon queenslandica (Sponge)
      59      29674  Escherichia coli
      60      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      61      29319  Pristionchus pacificus (Parasitic nematode)
      62      29185  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      63      29054  Oikopleura dioica (Tunicate)
      64      28825  Capsella rubella
      65      28823  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      66      28632  Prunus persica (Peach) (Amygdalus persica)
      67      28380  Thellungiella salsuginea (Saltwater cress) (Arabidopsis glauca)
      68      28101  Gasterosteus aculeatus (Three-spined stickleback)
      69      27797  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      70      27628  Canis familiaris (Dog) (Canis lupus familiaris)
      71      27519  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      72      27482  Equus caballus (Horse)
      73      27089  Gorilla gorilla gorilla (Lowland gorilla)
      74      26844  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      75      26477  Phytophthora parasitica P1569
      76      25975  Oryzias latipes (Medaka fish) (Japanese ricefish)
      77      25798  Loxodonta africana (African elephant)
      78      25761  Bos taurus (Bovine)
      79      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      80      25690  Rattus norvegicus (Rat)
      81      24916  Nematostella vectensis (Starlet sea anemone)
      82      24643  Tetrahymena thermophila (strain SB210)
      83      24590  Guillardia theta CCMP2712
      84      24212  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      85      23717  Ornithorhynchus anatinus (Duckbill platypus)
      86      23687  Lottia gigantea (Giant owl limpet)
      87      23650  Dendroctonus ponderosae (Mountain pine beetle)
      88      23565  Oxytricha trifallax
      89      23496  Latimeria chalumnae (West Indian ocean coelacanth)
      90      23369  Helobdella robusta (Californian leech)
      91      23196  Caenorhabditis elegans
      92      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      93      22751  Monodelphis domestica (Gray short-tailed opossum)
      94      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      95      22311  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      96      22174  gut metagenome
      97      21904  Oryctolagus cuniculus (Rabbit)
      98      21546  Heterocephalus glaber (Naked mole rat)
      99      21472  Gallus gallus (Chicken)
     100      21397  Caenorhabditis briggsae
     101      21129  Ixodes scapularis (Black-legged tick) (Deer tick)
     102      21013  Felis catus (Cat) (Felis silvestris catus)
     103      20867  Myotis lucifugus (Little brown bat)
     104      20854  Tupaia chinensis (Chinese tree shrew)
     105      20776  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     106      20531  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     107      20133  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     108      20115  Ciona savignyi (Pacific transparent sea squirt)
     109      20078  Cavia porcellus (Guinea pig)
     110      20059  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     111      20028  Camelus ferus (Wild Bactrian camel)
     112      19818  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     113      19687  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     114      19561  Pteropus alecto (Black flying fox)
     115      19558  Anolis carolinensis (Green anole) (American chameleon)
     116      19520  Wuchereria bancrofti
     117      19300  Myotis brandtii (Brandt's bat)
     118      19201  Trypanosoma cruzi (strain CL Brener)
     119      19062  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     120      18958  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     121      18860  Drosophila simulans (Fruit fly)
     122      18600  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     123      18599  Haemonchus contortus (Barber pole worm)
     124      18559  Bos mutus
     125      18477  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     126      18417  Ophiophagus hannah (King cobra) (Naja hannah)
     127      18246  Tetranychus urticae (Two-spotted spider mite)
     128      18113  Atta cephalotes (Leafcutter ant)
     129      18047  Saprolegnia diclina VS20
     130      18039  Anopheles gambiae (African malaria mosquito)
     131      17907  Moniliophthora roreri (strain MCA 2997) (Cocoa frosty pod rot fungus) 
     132      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     133      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     134      17709  Bombyx mori (Silk moth)
     135      17683  Genlisea aurea
     136      17663  Hepatitis C virus subtype 1b
     137      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     138      17442  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     139      17284  Nasonia vitripennis (Parasitic wasp)
     140      17222  Plasmodium falciparum
     141      17066  Tribolium castaneum (Red flour beetle)
     142      17042  Drosophila yakuba (Fruit fly)
     143      16949  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     144      16919  Meleagris gallopavo (Common turkey)
     145      16715  Drosophila persimilis (Fruit fly)
     146      16698  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     147      16639  Fusarium oxysporum f. sp. lycopersici  
     148      16614  Rhodnius prolixus (Triatomid bug)
     149      16427  Ectocarpus siliculosus (Brown alga)
     150      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     151      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     152      16329  Danaus plexippus (Monarch butterfly)
     153      16275  Trichinella spiralis (Trichina worm)
     154      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     155      16214  Neovison vison (American mink) (Mustela vison)
     156      16199  Ixodes ricinus (Common tick)
     157      16191  Drosophila sechellia (Fruit fly)
     158      16190  Schistosoma japonicum (Blood fluke)
     159      16148  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     160      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     161      16073  Listeria monocytogenes
     162      15815  uncultured archaeon
     163      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     164      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     165      15716  Naegleria gruberi (Amoeba)
     166      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     167      15592  Phytophthora ramorum (Sudden oak death agent)
     168      15467  Myotis davidii (David's myotis)
     169      15423  Drosophila willistoni (Fruit fly)
     170      15380  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     171      15355  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     172      15354  Loa loa (Eye worm) (Filaria loa)
     173      15228  Pythium ultimum
     174      15155  Drosophila ananassae (Fruit fly)
     175      15057  Pararge aegeria (Speckled wood butterfly)
     176      15042  Harpegnathos saltator (Jerdon's jumping ant)
     177      15011  Strigamia maritima (European centipede) (Geophilus maritimus)
     178      14944  Acanthamoeba castellanii str. Neff
     179      14928  Drosophila erecta (Fruit fly)
     180      14869  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     181      14801  Camponotus floridanus (Florida carpenter ant)
     182      14800  Rabies virus
     183      14794  Drosophila mojavensis (Fruit fly)
     184      14790  Gibberella fujikuroi (strain CBS 195.34 / IMI 58289 / NRRL A-6831)  
     185      14713  Plasmodium chabaudi
     186      14710  Drosophila virilis (Fruit fly)
     187      14654  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     188      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     189      14597  Angomonas deanei
     190      14417  Volvox carteri (Green alga)
     191      14346  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     192      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     193      14235  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     194      14157  Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
     195      13971  Acromyrmex echinatior (Panamanian leafcutter ant) 
     196      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     197      13878  Clonorchis sinensis (Chinese liver fluke)
     198      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     199      13806  Fomitopsis pinicola (strain FP-58527) (Brown rot fungus)
     200      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     201      13769  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     202      13702  Trypanosoma cruzi
     203      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     204      13526  Porcine reproductive and respiratory syndrome virus (PRRSV)
     205      13421  Hepatitis C virus subtype 1a
     206      13395  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     207      13345  Aspergillus flavus 
     208      13338  Colletotrichum orbiculare   
     209      13308  Giardia intestinalis (Giardia lamblia)
     210      13306  Pyronema omphalodes (strain CBS 100304) (Pyronema confluens)
     211      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     212      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     213      13115  Petromyzon marinus (Sea lamprey)
     214      13082  Glarea lozoyensis (strain ATCC 20868 / MF5171)
     215      13062  Mycosphaerella fijiensis (strain CIRAD86) (Black leaf streak disease fungus) 
     216      13041  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     217      12983  Albugo laibachii Nc14
     218      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     219      12951  Stigmatella aurantiaca (strain DW4/3-1)
     220      12856  Cochliobolus heterostrophus (strain C5 / ATCC 48332 / race O)  
     221      12846  Magnaporthe oryzae (strain Y34) (Rice blast fungus) (Pyricularia oryzae)
     222      12750  Schistosoma mansoni (Blood fluke)
     223      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     224      12711  Magnaporthe oryzae (strain P131) (Rice blast fungus) (Pyricularia oryzae)
     225      12703  Cochliobolus heterostrophus (strain C4 / ATCC 48331 / race T)  
     226      12697  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     227      12696  Trypanosoma congolense (strain IL3000)
     228      12694  Helicobacter pylori (Campylobacter pylori)
     229      12632  Xenopus laevis (African clawed frog)
     230      12586  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     231      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     232      12440  Polysphondylium pallidum (Cellular slime mold)
     233      12414  Mycosphaerella pini (strain NZE10 / CBS 128990) (Red band needle blight fungus) 
     234      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     235      12352  Dictyostelium purpureum (Slime mold)
     236      12300  Enterococcus gallinarum EGD-AAK12
     237      12197  Thanatephorus cucumeris (strain AG1-IB / isolate 7/3/14)  
     238      12174  Cochliobolus sativus (strain ND90Pr / ATCC 201652)  
     239      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     240      12143  Mucor circinelloides f. circinelloides (strain 1006PhL) (Mucormycosis agent) 
     241      12078  Ceriporiopsis subvermispora (strain B) (White-rot fungus)
     242      11997  Apis mellifera (Honeybee)
     243      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     244      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     245      11934  Emericella nidulans  
     246      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     247      11780  Piriformospora indica (strain DSM 11827)
     248      11752  Chondrocladia sp. SMF<DEU
     249      11751  Cladorhiza sp. SMF<DEU
     250      11750  Abyssocladia sp. SMF<DEU


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          781540 (  1%)
    Bacteria       40604689 ( 77%)
    Eukaryota       9225252 ( 18%)
    Viruses         1930916 (  4%)
    Other            164813 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 116278 (  1%)           (  0%)
     Other Mammalia       1046026 ( 11%)           (  2%)
     Other Vertebrata      939951 ( 10%)           (  2%)
     Viridiplantae        1839679 ( 20%)           (  3%)
     Fungi                2223355 ( 24%)           (  4%)
     Insecta               952083 ( 10%)           (  2%)
     Nematoda              281755 (  3%)           (  1%)
     Other                1826125 ( 20%)           (  3%)
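
   Each Eukaryota category carries two percentages: one relative to the
   Eukaryota total and one relative to the complete database. The following
   sketch recomputes both, rounded to whole percent as in the table. The
   function and variable names are assumptions made for illustration.

```python
TOTAL_ENTRIES = 52707211  # all of UniProtKB/TrEMBL, from the introduction

# Counts copied from the "Within Eukaryota" table.
eukaryota_counts = {
    "Human": 116278,
    "Other Mammalia": 1046026,
    "Other Vertebrata": 939951,
    "Viridiplantae": 1839679,
    "Fungi": 2223355,
    "Insecta": 952083,
    "Nematoda": 281755,
    "Other": 1826125,
}
eukaryota_total = sum(eukaryota_counts.values())


def shares(count):
    """Return (% of Eukaryota, % of the complete database) as whole numbers."""
    return (
        round(100.0 * count / eukaryota_total),
        round(100.0 * count / TOTAL_ENTRIES),
    )


eukaryota_shares = {name: shares(n) for name, n in eukaryota_counts.items()}
for name, (pct_euk, pct_db) in eukaryota_shares.items():
    print(f"{name:<18}{eukaryota_counts[name]:>9} ({pct_euk:>3}%) ({pct_db:>3}%)")
```

   The category counts add up to the Eukaryota figure in the kingdom table
   (9225252), so the categories cover all eukaryotic entries.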



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1397177             1001-1100   278474
                 51- 100 4730183             1101-1200   193854
                101- 150 5300256             1201-1300   140563
                151- 200 5153416             1301-1400    82203
                201- 250 5233091             1401-1500    68834
                251- 300 5081375             1501-1600    46169
                301- 350 4586209             1601-1700    33912
                351- 400 3406975             1701-1800    25420
                401- 450 2974513             1801-1900    20284
                451- 500 2425567             1901-2000    17139
                501- 550 1528989             2001-2100    14240
                551- 600 1179695             2101-2200    14158
                601- 650  863382             2201-2300    10620
                651- 700  680458             2301-2400     8794
                701- 750  561901             2401-2500     7661
                751- 800  482260             >2500        59468
                801- 850  377963
                851- 900  337039
                901- 950  228910
                951-1000  158148
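
   Because fragments are excluded, the bin counts above should add up to the
   total number of entries minus the number of fragments (both given in the
   introduction). The sketch below performs that consistency check; the
   variable names are assumptions chosen for illustration.

```python
TOTAL_ENTRIES = 52707211  # from the introduction
FRAGMENTS = 4997911       # "Number of fragments" in the introduction

# Counts per 50-residue bin for lengths 1-1000 (left-hand columns).
bins_1_to_1000 = [
    1397177, 4730183, 5300256, 5153416, 5233091,
    5081375, 4586209, 3406975, 2974513, 2425567,
    1528989, 1179695, 863382, 680458, 561901,
    482260, 377963, 337039, 228910, 158148,
]
# Counts per 100-residue bin for lengths 1001-2500 (right-hand columns).
bins_1001_to_2500 = [
    278474, 193854, 140563, 82203, 68834,
    46169, 33912, 25420, 20284, 17139,
    14240, 14158, 10620, 8794, 7661,
]
over_2500 = 59468  # the final ">2500" bin

non_fragment_total = sum(bins_1_to_1000) + sum(bins_1001_to_2500) + over_2500
print(non_fragment_total, TOTAL_ENTRIES - FRAGMENTS)
```

   Both printed values are the same, so the table accounts for every
   non-fragment entry.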



   The average sequence length in UniProtKB/TrEMBL is 317 amino acids.

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
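
   The stated average is consistent with dividing the total number of amino
   acids by the number of sequence entries, both taken from the introduction.
   A short sketch, assuming both totals cover every entry in the release
   (fragments included):

```python
TOTAL_AMINO_ACIDS = 16719843101  # from the introduction
TOTAL_ENTRIES = 52707211         # from the introduction

average_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES
print(f"average length: {average_length:.0f} amino acids")
```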



4.  STATISTICS FOR SOME LINE TYPES

The following tables summarize the total number of some UniProtKB/TrEMBL lines,
the number of entries with at least one such line, and the average number of
such lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    62066762                1.18                                                    
   Submitted to EMBL/GenBank/DDBJ  38389002  36173743      0.73                                                    
   Journal                         21754507  20651210      0.41                                                    
   Submitted to other databases     1905691   1897495      0.04                                                    
   Thesis                             10599     10540     <0.01                                                    
   Book citation                       6962      6899     <0.01                                                    
   Patent                                 1         1     <0.01                                                    
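
The "Average per entry" column is consistent with dividing the total number of
lines by the total number of entries in the release, with values that round to
0.00 shown as "<0.01". The sketch below reproduces the column under that
assumption; the function name and the exact rounding rule are inferred from the
table, not taken from UniProt documentation.

```python
TOTAL_ENTRIES = 52707211  # from the introduction


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format lines-per-entry to 2 decimals; tiny values become '<0.01'."""
    text = f"{total_lines / total_entries:.2f}"
    return "<0.01" if text == "0.00" else text


# Totals copied from the References (RL) table above.
reference_totals = {
    "References (RL)": 62066762,
    "Submitted to EMBL/GenBank/DDBJ": 38389002,
    "Journal": 21754507,
    "Submitted to other databases": 1905691,
    "Thesis": 10599,
    "Book citation": 6962,
    "Patent": 1,
}
reference_averages = {
    name: average_per_entry(total) for name, total in reference_totals.items()
}
for name, avg in reference_averages.items():
    print(f"{name:<34}{reference_totals[name]:>9}  {avg:>8}")
```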

Total number of distinct authors cited in UniProtKB/TrEMBL: 495539


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      80861651                1.53                                                    
   CATALYTIC ACTIVITY               6023695   5477362      0.11     4                                              
   CAUTION                         33071479  33036125      0.63     1                                              
   COFACTOR                         2637265   2412527      0.05     8                                              
   DOMAIN                            287560    274814      0.01     9                                              
   ENZYME REGULATION                  88857     88857     <0.01    11                                              
   FUNCTION                         7012297   6628038      0.13     3                                              
   INTERACTION                         1702      1702     <0.01    12                                              
   MISCELLANEOUS                     164308    164093     <0.01    10                                              
   PATHWAY                          3095318   2800773      0.06     7                                              
   SIMILARITY                      18865224  14505486      0.36     2                                              
   SUBCELLULAR LOCATION             5874895   5659429      0.11     5                                              
   SUBUNIT                          3739051   3709285      0.07     6                                              

Total number of comment topics: 12
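
The "Rank" column above is consistent with ordering the comment topics by
their total number, largest first. The sketch below derives the ranks that way
and also checks that the topic totals add up to the Comments (CC) total. The
ordering rule is inferred from the table and the names are assumptions.

```python
# Total number of lines per comment topic, copied from the table above.
comment_totals = {
    "CATALYTIC ACTIVITY": 6023695,
    "CAUTION": 33071479,
    "COFACTOR": 2637265,
    "DOMAIN": 287560,
    "ENZYME REGULATION": 88857,
    "FUNCTION": 7012297,
    "INTERACTION": 1702,
    "MISCELLANEOUS": 164308,
    "PATHWAY": 3095318,
    "SIMILARITY": 18865224,
    "SUBCELLULAR LOCATION": 5874895,
    "SUBUNIT": 3739051,
}

# Sort topics by descending total; rank 1 is the most frequent topic.
ordered_topics = sorted(comment_totals, key=comment_totals.get, reverse=True)
comment_ranks = {topic: rank for rank, topic in enumerate(ordered_topics, start=1)}
comment_grand_total = sum(comment_totals.values())

for topic in ordered_topics:
    print(f"{comment_ranks[topic]:>2}  {topic:<22}{comment_totals[topic]:>10}")
```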


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      33392134                0.63                                                    
   ACT_SITE                         2576914   1606021      0.05     5                                              
   BINDING                          5469865   1428762      0.10     2                                              
   CARBOHYD                             602       232     <0.01    27                                              
   CHAIN                             889492    720721      0.02     9                                              
   COILED                             94246     53704     <0.01    17                                              
   COMPBIAS                           14790     14664     <0.01    22                                              
   CROSSLNK                           13617      9211     <0.01    23                                              
   DISULFID                          128300    100265     <0.01    15                                              
   DNA_BIND                           99219     92535     <0.01    16                                              
   DOMAIN                           1066877    823115      0.02     8                                              
   INIT_MET                           17440     17440     <0.01    21                                              
   INTRAMEM                             392        56     <0.01    28                                              
   LIPID                              91526     45763     <0.01    19                                              
   METAL                            5293136   1360526      0.10     3                                              
   MOD_RES                           414437    373108      0.01    13                                              
   MOTIF                             333380    214873      0.01    14                                              
   NON_STD                             1887      1736     <0.01    25                                              
   NON_TER                          7671596   5000646      0.15     1                                              
   NP_BIND                          1952692   1170616      0.04     6                                              
   PEPTIDE                              101       101     <0.01    29                                              
   PROPEP                              6244      6244     <0.01    24                                              
   REGION                           1739167    960152      0.03     7                                              
   REPEAT                             74907     17198     <0.01    20                                              
   SIGNAL                            754375    750951      0.01    11                                              
   SITE                              754920    365096      0.01    10                                              
   TOPO_DOM                          445108     86999      0.01    12                                              
   TRANSIT                             1333      1333     <0.01    26                                              
   TRANSMEM                         3393046    592107      0.06     4                                              
   ZN_FING                            92525     83569     <0.01    18                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             606401919               11.51                                                    
   Allergome                           3724      3087     <0.01    82   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   100   Organism-specific databases                
   BRENDA                              2623      2595     <0.01    85   Enzyme and pathway databases               
   Bgee                               98855     98855     <0.01    49   Gene expression databases                  
   BindingDB                           5758      5758     <0.01    76   Chemistry                                  
   BioCyc                           5682030   5604542      0.11    20   Enzyme and pathway databases               
   CAZy                               73964     69497     <0.01    53   Protein family/group databases             
   CGD                                 6894      6894     <0.01    75   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   107   2D gel databases                           
   CTD                               397089    395736      0.01    38   Organism-specific databases                
   ChEMBL                               656       656     <0.01    92   Chemistry                                  
   ChiTaRS                            65205     65205     <0.01    54   Other                                      
   ConoServer                           160       160     <0.01    98   Organism-specific databases                
   DIP                                 3016      3011     <0.01    84   Protein-protein interaction databases      
   DNASU                              42200     41873     <0.01    61   Protocols and materials databases          
   EMBL                            56243599  51615854      1.07     3   Sequence databases                         
   Ensembl                          1041909   1027378      0.02    31   Genome annotation databases                
   EnsemblBacteria                 29662836  29236968      0.56     5   Genome annotation databases                
   EnsemblFungi                      385986    383652      0.01    39   Genome annotation databases                
   EnsemblMetazoa                    802297    786157      0.02    34   Genome annotation databases                
   EnsemblPlants                     670724    639432      0.01    35   Genome annotation databases                
   EnsemblProtists                   193936    191369     <0.01    43   Genome annotation databases                
   EuPathDB                          154744    154742     <0.01    47   Organism-specific databases                
   EvolutionaryTrace                   7993      7993     <0.01    74   Other                                      
   FlyBase                           199008    197536     <0.01    42   Organism-specific databases                
   GO                             112112173  33446195      2.13     2   Ontologies                                 
   Gene3D                          24503469  19317788      0.46     8   Family and domain databases                
   GeneID                          10742713  10444185      0.20    13   Genome annotation databases                
   GeneTree                          954513    954453      0.02    32   Phylogenomic databases                     
   Genevestigator                     85466     85460     <0.01    50   Gene expression databases                  
   GenoList                           14730     14457     <0.01    70   Organism-specific databases                
   GenomeRNAi                         19183     19183     <0.01    68   Other                                      
   GuidetoPHARMACOLOGY                   21        21     <0.01   105   Chemistry                                  
   H-InvDB                              607       460     <0.01    93   Organism-specific databases                
   HAMAP                            6659283   6570768      0.13    18   Family and domain databases                
   HGNC                               47655     47569     <0.01    57   Organism-specific databases                
   HOGENOM                          3646659   3646616      0.07    24   Phylogenomic databases                     
   HOVERGEN                          304594    304584      0.01    40   Phylogenomic databases                     
   InParanoid                        185791    185791     <0.01    44   Phylogenomic databases                     
   IntAct                             13430     13430     <0.01    71   Protein-protein interaction databases      
   InterPro                       120924597  42215578      2.29     1   Family and domain databases                
   KEGG                             9737407   9510339      0.18    14   Genome annotation databases                
   KO                               4030703   4010892      0.08    23   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    79   Organism-specific databases                
   Leproma                             1272      1270     <0.01    87   Organism-specific databases                
   MEROPS                            179367    179367     <0.01    45   Protein family/group databases             
   MGI                                52122     51684     <0.01    56   Organism-specific databases                
   MIM                                    4         4     <0.01   108   Organism-specific databases                
   MINT                               10195     10194     <0.01    72   Protein-protein interaction databases      
   NextBio                           206717    206711     <0.01    41   Other                                      
   OGP                                    3         3     <0.01   109   2D gel databases                           
   OMA                              6328455   6328449      0.12    19   Phylogenomic databases                     
   OrthoDB                          5207822   5207821      0.10    22   Phylogenomic databases                     
   PANTHER                          7519909   7141890      0.14    17   Family and domain databases                
   PATRIC                           8266550   8266421      0.16    15   Genome annotation databases                
   PDB                                22195     12063     <0.01    66   3D structure databases                     
   PDBsum                             22279     12021     <0.01    65   3D structure databases                     
   PIR                               171944    139087     <0.01    46   Sequence databases                         
   PIRSF                            5363359   5321824      0.10    21   Family and domain databases                
   PMAP-CutDB                           201       201     <0.01    97   Other                                      
   PRIDE                             949549    949549      0.02    33   Proteomic databases                        
   PRINTS                           7766026   7027170      0.15    16   Family and domain databases                
   PRO                                27228     27228     <0.01    63   Other                                      
   PROSITE                         26832946  17934297      0.51     6   Family and domain databases                
   PaxDb                              28848     28846     <0.01    62   Proteomic databases                        
   PeptideAtlas                         128       128     <0.01    99   Proteomic databases                        
   PeroxiBase                          2594      2586     <0.01    86   Protein family/group databases             
   Pfam                            54049227  39507464      1.03     4   Family and domain databases                
   PharmGKB                            3505      3505     <0.01    83   Organism-specific databases                
   PhosSite                             784       772     <0.01    90   PTM databases                              
   PhosphoSite                         1099      1099     <0.01    88   PTM databases                              
   PhylomeDB                         145307    145307     <0.01    48   Phylogenomic databases                     
   PomBase                               40        27     <0.01   102   Organism-specific databases                
   PptaseDB                              36        35     <0.01   103   Protein family/group databases             
   ProDom                           1092901   1058621      0.02    30   Family and domain databases                
   ProMEX                              5335      5335     <0.01    78   Proteomic databases                        
   ProtClustDB                      2710275   2710275      0.05    28   Phylogenomic databases                     
   ProteinModelPortal              14522009  14522009      0.28     9   3D structure databases                     
   PseudoCAP                           4519      4513     <0.01    80   Organism-specific databases                
   REBASE                             43861     43827     <0.01    59   Protein family/group databases             
   REPRODUCTION-2DPAGE                   65        64     <0.01   101   2D gel databases                           
   RGD                                21160     20221     <0.01    67   Organism-specific databases                
   Reactome                             242       186     <0.01    96   Enzyme and pathway databases               
   RefSeq                          11000240  10611475      0.21    12   Sequence databases                         
   SABIO-RK                             546       546     <0.01    94   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   106   Organism-specific databases                
   SMART                           11686563   8895992      0.22    11   Family and domain databases                
   SMR                              2617681   2617681      0.05    29   3D structure databases                     
   STRING                           2900476   2900400      0.06    26   Protein-protein interaction databases      
   SUPFAM                          25851286  20776688      0.49     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   104   2D gel databases                           
   SignaLink                           4346      4344     <0.01    81   Enzyme and pathway databases               
   TAIR                               14804     14731     <0.01    69   Organism-specific databases                
   TCDB                                5361      5351     <0.01    77   Protein family/group databases             
   TIGRFAMs                        13777721  12562016      0.26    10   Family and domain databases                
   TreeFam                           588451    588449      0.01    36   Phylogenomic databases                     
   TubercuList                         1093      1092     <0.01    89   Organism-specific databases                
   UCSC                               58810     58719     <0.01    55   Genome annotation databases                
   UniGene                           559369    529259      0.01    37   Sequence databases                         
   UniPathway                       3010299   2798096      0.06    25   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    51   Genome annotation databases                
   World-2DPAGE                         671       666     <0.01    91   2D gel databases                           
   WormBase                           42298     42126     <0.01    60   Organism-specific databases                
   Xenbase                            25530     25469     <0.01    64   Organism-specific databases                
   ZFIN                               45240     45197     <0.01    58   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    73   Organism-specific databases                
   eggNOG                           2755640   2755606      0.05    27   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    52   Organism-specific databases                
   mycoCLAP                             457       456     <0.01    95   Protein family/group databases             

Number of explicitly cross-referenced databases: 130
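
The per-entry averages in the table above can be checked against the release total. The sketch below assumes the columns are, in order: database name, number of cross-references, number of entries carrying at least one such cross-reference, average number of cross-references per entry in the release, rank, and category. These column meanings are inferred from the values, not stated here. The total of 52707211 entries is taken from the introduction; parse_row, ROW and TOTAL_ENTRIES are illustrative names, not part of any UniProt tool.

```python
import re

# Total number of entries in release 2014_02 (from the introduction).
TOTAL_ENTRIES = 52707211

# Assumed column layout: name, cross-references, entries, average, rank, category.
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+(<?\d+\.\d+)\s+(\d+)\s+(.*\S)\s*$")


def parse_row(line):
    """Split one table row into a dict; return None if the line does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    name, xrefs, entries, average, rank, category = m.groups()
    return {
        "name": name,
        "xrefs": int(xrefs),
        "entries": int(entries),
        "average": average,  # kept as text because of values such as "<0.01"
        "rank": int(rank),
        "category": category,
    }


row = parse_row(
    "   InterPro                       120924597  42215578      2.29     1   Family and domain databases   "
)
# Under the assumed layout, the average is cross-references / total entries.
computed_average = round(row["xrefs"] / TOTAL_ENTRIES, 2)
print(row["name"], computed_average, row["average"])
```

Under this reading, the computed value for InterPro agrees with the 2.29 printed in the table; the same check can be applied to any other row whose average is not reported as "<0.01".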


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.73   Gln (Q) 4.00   Leu (L) 10.0   Ser (S) 6.50
   Arg (R) 5.37   Glu (E) 6.20   Lys (K) 5.28   Thr (T) 5.52
   Asn (N) 4.10   Gly (G) 7.11   Met (M) 2.50   Trp (W) 1.29
   Asp (D) 5.34   His (H) 2.18   Phe (F) 4.03   Tyr (Y) 3.06
   Cys (C) 1.19   Ile (I) 6.11   Pro (P) 4.54   Val (V) 6.81

   Asx (B) 0.000  Glx (Z) 0      Xaa (X) 0.02


   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
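
   The ordering in 5.2 follows from sorting the percentages in 5.1 in
   descending order. A minimal sketch, with the values copied from the
   table in 5.1 (the variable names are illustrative):

```python
# Percentages from section 5.1, keyed by three-letter code.
composition = {
    "Ala": 8.73, "Gln": 4.00, "Leu": 10.0, "Ser": 6.50,
    "Arg": 5.37, "Glu": 6.20, "Lys": 5.28, "Thr": 5.52,
    "Asn": 4.10, "Gly": 7.11, "Met": 2.50, "Trp": 1.29,
    "Asp": 5.34, "His": 2.18, "Phe": 4.03, "Tyr": 3.06,
    "Cys": 1.19, "Ile": 6.11, "Pro": 4.54, "Val": 6.81,
}

# Order the residues from most to least frequent.
by_frequency = sorted(composition, key=composition.get, reverse=True)
print(", ".join(by_frequency))
```

   No two residues share the same percentage at this precision, so the
   ordering is unambiguous.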



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 700702
Total number of entries encoded on a Plasmid: 378342
Total number of entries encoded on a Plastid: 30886
Total number of entries encoded on a Plastid; Apicoplast: 877
Total number of entries encoded on a Plastid; Chloroplast: 262436
Total number of entries encoded on a Plastid; Cyanelle: 9
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1518