         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2014_11 STATISTICS


1.  INTRODUCTION

Release 2014_11 of 26-Nov-2014 of UniProtKB/TrEMBL contains 88589455 sequence entries,
comprising 28116383037 amino acids.

2085787 sequences have been added since release 2014_10, the sequence data of
16349 existing entries has been updated and the annotations of
12846474 entries have been revised. This represents an increase of 3%.

Number of fragments: 6755811

Protein existence (PE):              entries      %
1: Evidence at protein level           43960     0.05%
2: Evidence at transcript level      1013697     1.14%
3: Inferred from homology           20963312    23.66%
4: Predicted                        66568486    75.14%
5: Uncertain                               0     0.00%
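
As a consistency check, the percentages in the table above can be recomputed
from the entry counts. The sketch below is illustrative only: the names
PE_COUNTS and pe_percentages are assumptions of this example and are not part
of any UniProt tool. The counts are copied from the table.

```python
# Entry counts copied from the protein existence (PE) table above.
PE_COUNTS = {
    "1: Evidence at protein level": 43960,
    "2: Evidence at transcript level": 1013697,
    "3: Inferred from homology": 20963312,
    "4: Predicted": 66568486,
    "5: Uncertain": 0,
}


def pe_percentages(counts):
    """Return each level's share of all entries, in percent, to 2 decimals."""
    total = sum(counts.values())
    return {level: round(100.0 * n / total, 2) for level, n in counts.items()}


if __name__ == "__main__":
    for level, pct in pe_percentages(PE_COUNTS).items():
        print(f"{level:<34}{pct:6.2f}%")
```

The five counts add up to the 88589455 entries stated at the start of the
introduction, so every entry carries exactly one PE level.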

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 524861

   The twenty most represented species account for 2591418 sequences, or
   2.9% of the total number of entries.


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:209661
                            2x:83576
                            3x:45333
                            4x:32367
                            5x:18956
                            6x:13867
                            7x: 9969
                            8x: 8006
                            9x: 6405
                           10x:11420
                       11- 20x:41747
                       21- 50x:13371
                       51-100x: 5062
                         >100x:25121


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     611911  Human immunodeficiency virus 1
       2     352020  marine sediment metagenome
       3     239462  uncultured bacterium
       4     120839  Homo sapiens (Human)
       5     103776  Triticum aestivum (Wheat)
       6     100509  Brassica napus (Rape)
       7      98175  Hepatitis C virus
       8      96659  Oryza sativa subsp. japonica (Rice)
       9      90696  Hepatitis B virus (HBV)
      10      84324  Zea mays (Maize)
      11      83635  Escherichia coli
      12      73990  Glycine max (Soybean) (Glycine hispida)
      13      73055  mine drainage metagenome
      14      70544  Hordeum vulgare var. distichum (Two-rowed barley)
      15      69593  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      16      69572  Macaca mulatta (Rhesus macaque)
      17      67671  Phytophthora parasitica (Potato buckeye rot agent)
      18      65421  Ancylostoma ceylanicum
      19      60710  human gut metagenome
      20      58856  Burkholderia pseudomallei (Pseudomonas pseudomallei)
      21      58557  Mus musculus (Mouse)
      22      55041  Callithrix jacchus (White-tufted-ear marmoset)
      23      54931  Solanum tuberosum (Potato)
      24      54204  Vitis vinifera (Grape)
      25      53349  Danio rerio (Zebrafish) (Brachydanio rerio)
      26      50661  Trichomonas vaginalis
      27      49742  Oncorhynchus mykiss (Rainbow trout) (Salmo gairdneri)
      28      49274  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      29      48911  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      30      48124  Vibrio parahaemolyticus
      31      47085  Populus trichocarpa (Western balsam poplar) 
      32      44332  Citrus sinensis (Sweet orange) (Citrus aurantium var. sinensis)
      33      44277  Eucalyptus grandis (Flooded gum)
      34      41211  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      35      40872  Theobroma cacao (Cacao) (Cocoa)
      36      39923  Reticulomyxa filosa
      37      39910  Oryza sativa subsp. indica (Rice)
      38      39848  Paramecium tetraurelia
      39      39392  Setaria italica (Foxtail millet) (Panicum italicum)
      40      39341  Arabidopsis thaliana (Mouse-ear cress)
      41      39297  Simian immunodeficiency virus (SIV)
      42      38814  Mustela putorius furo (European domestic ferret) (Mustela furo)
      43      37312  Acyrthosiphon pisum (Pea aphid)
      44      37297  Drosophila melanogaster (Fruit fly)
      45      36609  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      46      35984  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      47      35673  Ailuropoda melanoleuca (Giant panda)
      48      35599  Emiliania huxleyi CCMP1516
      49      35327  Physcomitrella patens subsp. patens (Moss)
      50      35138  Caenorhabditis japonica
      51      34629  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      52      34570  Thalassiosira oceanica (Marine diatom)
      53      34566  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      54      33883  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      55      33752  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      56      33261  Selaginella moellendorffii (Spikemoss)
      57      32772  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      58      32646  Vibrio cholerae
      59      32543  Sus scrofa (Pig)
      60      32416  Phaseolus vulgaris (Kidney bean) (French bean)
      61      32342  Oryza brachyantha
      62      32206  Oryza glaberrima (African rice)
      63      32123  Caenorhabditis remanei (Caenorhabditis vulgaris)
      64      32101  Capitella teleta (Polychaete worm)
      65      32032  Staphylococcus aureus
      66      32013  Anas platyrhynchos (Domestic duck) (Anas boschas)
      67      31896  Pan troglodytes (Chimpanzee)
      68      31404  Ricinus communis (Castor bean)
      69      31290  Citrus clementina
      70      31203  Klebsiella pneumoniae
      71      30981  Daphnia pulex (Water flea)
      72      30713  Caenorhabditis brenneri (Nematode worm)
      73      30501  Poecilia formosa (Amazon molly) (Limia formosa)
      74      30184  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      75      29845  Rhizophagus irregularis (strain DAOM 181602 / DAOM 197198 / MUCL 43194)  
      76      29815  Amphimedon queenslandica (Sponge)
      77      29644  Pseudomonas aeruginosa
      78      29494  Strongylocentrotus purpuratus (Purple sea urchin)
      79      29334  Pristionchus pacificus (Parasitic nematode)
      80      29205  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      81      29083  Oikopleura dioica (Tunicate)
      82      28885  Erythranthe guttata (Yellow monkey flower) (Mimulus guttatus)
      83      28841  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      84      28826  Capsella rubella
      85      28669  Rhizophagus irregularis DAOM 197198w
      86      28643  Prunus persica (Peach) (Amygdalus persica)
      87      28382  Eutrema salsugineum (Saltwater cress) (Sisymbrium salsugineum)
      88      28196  Gasterosteus aculeatus (Three-spined stickleback)
      89      28004  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      90      27774  Canis familiaris (Dog) (Canis lupus familiaris)
      91      27562  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      92      27555  Equus caballus (Horse)
      93      27523  Jatropha curcas (Barbados nut)
      94      27438  Amborella trichopoda
      95      27101  Gorilla gorilla gorilla (Lowland gorilla)
      96      27017  Stegodyphus mimosarum
      97      26921  Tetrahymena thermophila (strain SB210)
      98      26861  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      99      26771  Morus notabilis
     100      26517  Phytophthora parasitica P1976
     101      26489  Phytophthora parasitica CJ01A1
     102      26477  Phytophthora parasitica P1569
     103      26452  Phytophthora parasitica P10297
     104      26438  Phytophthora parasitica (strain INRA-310)
     105      26421  Ovis aries (Sheep)
     106      26102  Rattus norvegicus (Rat)
     107      26061  Listeria monocytogenes
     108      26002  Oryzias latipes (Medaka fish) (Japanese ricefish)
     109      25846  Bos taurus (Bovine)
     110      25832  Loxodonta africana (African elephant)
     111      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
     112      25594  Coffea canephora (Robusta coffee)
     113      25025  Aphanomyces astaci
     114      24920  Nematostella vectensis (Starlet sea anemone)
     115      24590  Guillardia theta CCMP2712
     116      24375  Oxytricha trifallax
     117      24301  Tetraselmis sp. GSL018
     118      23809  Astyanax mexicanus (Blind cave fish) (Astyanax fasciatus mexicanus)
     119      23743  Ornithorhynchus anatinus (Duckbill platypus)
     120      23687  Lottia gigantea (Giant owl limpet)
     121      23651  Dendroctonus ponderosae (Mountain pine beetle)
     122      23544  Caenorhabditis elegans
     123      23497  Latimeria chalumnae (West Indian ocean coelacanth)
     124      23382  Helobdella robusta (Californian leech)
     125      23365  Arabis alpina (Alpine rock-cress)
     126      23318  Fusarium oxysporum f. sp. melonis 26406
     127      23271  Fusarium oxysporum f. sp. conglutinans race 2 54008
     128      23263  Fusarium oxysporum f. sp. pisi HDV247
     129      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
     130      22809  Monodelphis domestica (Gray short-tailed opossum)
     131      22754  Fusarium oxysporum f. sp. raphani 54005
     132      22565  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
     133      22528  Lepisosteus oculatus (Spotted gar)
     134      22325  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
     135      22248  Fusarium oxysporum f. sp. vasinfectum 25433
     136      22174  gut metagenome
     137      21972  Trichuris suis (pig whipworm)
     138      21957  Comamonas testosteroni (Pseudomonas testosteroni)
     139      21929  Oryctolagus cuniculus (Rabbit)
     140      21754  Haemonchus contortus (Barber pole worm)
     141      21689  Fusarium oxysporum f. sp. radicis-lycopersici 26381
     142      21661  Fusarium oxysporum Fo47
     143      21621  Gallus gallus (Chicken)
     144      21549  Fusarium oxysporum f. sp. lycopersici MN25
     145      21547  Heterocephalus glaber (Naked mole rat)
     146      21505  Burkholderia cepacia (Pseudomonas cepacia)
     147      21398  Caenorhabditis briggsae
     148      21357  Galerina marginata CBS 339.88
     149      21279  Echinococcus granulosus (Hydatid tapeworm)
     150      21210  Ixodes scapularis (Black-legged tick) (Deer tick)
     151      21173  Myotis lucifugus (Little brown bat)
     152      21042  Felis catus (Cat) (Felis silvestris catus)
     153      20867  Tupaia chinensis (Chinese tree shrew)
     154      20805  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     155      20768  Stylonychia lemnae
     156      20767  Fusarium oxysporum FOSC 3-a
     157      20544  Bacillus subtilis
     158      20541  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     159      20465  Fukomys damarensis (Damaraland mole rat) (Cryptomys damarensis)
     160      20295  Helicobacter pylori (Campylobacter pylori)
     161      20168  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     162      20115  Ciona savignyi (Pacific transparent sea squirt)
     163      20114  Papio anubis (Olive baboon)
     164      20106  Cavia porcellus (Guinea pig)
     165      20062  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     166      20052  Saprolegnia parasitica (strain CBS 223.65)
     167      20028  Camelus ferus (Wild Bactrian camel)
     168      19998  Callorhynchus milii (Elephant fish) (Australian ghost shark)
     169      19945  Burkholderia cenocepacia
     170      19837  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     171      19807  Fusarium oxysporum f. sp. cubense tropical race 4 54006
     172      19704  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     173      19687  Mesorhizobium plurifarium
     174      19635  Bactrocera dorsalis (Oriental fruit fly) (Dacus dorsalis)
     175      19625  Anolis carolinensis (Green anole) (American chameleon)
     176      19619  Brugia malayi (Filarial nematode worm)
     177      19594  Aphanomyces invadans
     178      19562  Pteropus alecto (Black flying fox)
     179      19523  Wuchereria bancrofti
     180      19426  Anopheles sinensis
     181      19300  Myotis brandtii (Brandt's bat)
     182      19200  Trypanosoma cruzi (strain CL Brener)
     183      19196  Necator americanus (Human hookworm)
     184      19138  Mycobacterium tuberculosis
     185      19112  Ixodes ricinus (Common tick)
     186      19093  uncultured archaeon
     187      19062  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     188      19017  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     189      18924  Drosophila simulans (Fruit fly)
     190      18600  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     191      18561  Bos mutus
     192      18551  Vibrio vulnificus
     193      18545  Acinetobacter baumannii
     194      18488  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     195      18417  Ophiophagus hannah (King cobra) (Naja hannah)
     196      18373  Nonlabens ulvanivorans
     197      18352  Plasmodium falciparum
     198      18294  Tetranychus urticae (Two-spotted spider mite)
     199      18125  Atta cephalotes (Leafcutter ant)
     200      18053  Anopheles gambiae (African malaria mosquito)
     201      18047  Saprolegnia diclina VS20
     202      17990  Hepatitis C virus subtype 1b
     203      17976  Moniliophthora roreri (strain MCA 2997) (Cocoa frosty pod rot fungus) 
     204      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     205      17795  Bombyx mori (Silk moth)
     206      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     207      17683  Genlisea aurea
     208      17620  Bacillus cereus
     209      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     210      17590  Gibberella moniliformis (strain M3125 / FGSC 7600)  
     211      17496  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     212      17399  Rhizobium radiobacter (Agrobacterium tumefaciens) (Agrobacterium radiobacter)
     213      17384  Ceratitis capitata (Mediterranean fruit fly) (Tephritis capitata)
     214      17289  Nasonia vitripennis (Parasitic wasp)
     215      17107  Drosophila yakuba (Fruit fly)
     216      17080  Tribolium castaneum (Red flour beetle)
     217      16949  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     218      16933  Meleagris gallopavo (Common turkey)
     219      16723  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     220      16715  Drosophila persimilis (Fruit fly)
     221      16676  Enterobacter agglomerans (Erwinia herbicola) (Pantoea agglomerans)
     222      16638  Fusarium oxysporum f. sp. lycopersici  
     223      16620  Rhodnius prolixus (Triatomid bug)
     224      16534  Cerapachys biroi (Ant)
     225      16484  Botryobasidium botryosum FD-172 SS1
     226      16453  Apis mellifera (Honeybee)
     227      16437  Streptococcus mitis
     228      16430  Ectocarpus siliculosus (Brown alga)
     229      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     230      16372  Opisthorchis viverrini
     231      16341  Jaapia argillacea MUCL 33604
     232      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     233      16335  Burkholderia mallei (Pseudomonas mallei)
     234      16332  Danaus plexippus (Monarch butterfly)
     235      16282  Trichinella spiralis (Trichina worm)
     236      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     237      16226  Neovison vison (American mink) (Mustela vison)
     238      16223  Schistosoma japonicum (Blood fluke)
     239      16195  Streptomyces scabiei
     240      16185  Drosophila sechellia (Fruit fly)
     241      16164  Pectobacterium carotovorum subsp. brasiliense
     242      16149  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     243      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     244      16071  Ralstonia solanacearum (Pseudomonas solanacearum)
     245      15871  Bacillus mycoides
     246      15856  Rabies virus
     247      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     248      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     249      15744  Pseudomonas syringae
     250      15718  Naegleria gruberi (Amoeba)
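
The figure quoted at the start of section 2 for the twenty most represented
species can be reproduced from the first twenty rows of the table above. This
is a minimal sketch; TOP_TWENTY, TOTAL_ENTRIES and share_of_total are names
made up for the example.

```python
# Frequencies of the twenty most represented species, copied from table 2.2.
TOP_TWENTY = [
    611911, 352020, 239462, 120839, 103776,
    100509, 98175, 96659, 90696, 84324,
    83635, 73990, 73055, 70544, 69593,
    69572, 67671, 65421, 60710, 58856,
]
TOTAL_ENTRIES = 88589455  # total entries stated in the introduction


def share_of_total(counts, total):
    """Return (sum of counts, that sum as a percentage of total)."""
    subtotal = sum(counts)
    return subtotal, 100.0 * subtotal / total


if __name__ == "__main__":
    subtotal, pct = share_of_total(TOP_TWENTY, TOTAL_ENTRIES)
    print(f"{subtotal} sequences, {pct:.1f}% of all entries")
```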


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          883641 (  1%)
    Bacteria       72337266 ( 82%)
    Eukaryota      12665769 ( 14%)
    Viruses         2152015 (  2%)
    Other            550763 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 120894 (  1%)           (  0%)
     Other Mammalia       1137426 (  9%)           (  1%)
     Other Vertebrata     1502600 ( 12%)           (  2%)
     Viridiplantae        2450652 ( 19%)           (  3%)
     Fungi                3422274 ( 27%)           (  4%)
     Insecta              1116362 (  9%)           (  1%)
     Nematoda              436093 (  3%)           (  0%)
     Other                2479468 ( 20%)           (  3%)
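
The two percentage columns above use different denominators: the Eukaryota
total for the first and the whole database for the second. The following
sketch recomputes both, rounding to whole percent. The names EUKARYOTA and
shares are assumptions of this example, and plain rounding is assumed, which
matches the rows shown.

```python
TOTAL_ENTRIES = 88589455  # all UniProtKB/TrEMBL entries in this release

# Sequence counts copied from the "Within Eukaryota" table above.
EUKARYOTA = {
    "Human": 120894,
    "Other Mammalia": 1137426,
    "Other Vertebrata": 1502600,
    "Viridiplantae": 2450652,
    "Fungi": 3422274,
    "Insecta": 1116362,
    "Nematoda": 436093,
    "Other": 2479468,
}


def shares(counts, database_total):
    """Map each category to (% of the group total, % of the whole database)."""
    group_total = sum(counts.values())
    return {
        name: (round(100.0 * n / group_total), round(100.0 * n / database_total))
        for name, n in counts.items()
    }


if __name__ == "__main__":
    for name, (of_group, of_db) in shares(EUKARYOTA, TOTAL_ENTRIES).items():
        print(f"{name:<18}{of_group:3d}% of Eukaryota, {of_db:3d}% of database")
```

The category counts sum to 12665769, the Eukaryota figure in the kingdom
table. A database share shown as 0% means less than half a percent, not zero
sequences.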



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 2005655             1001-1100   443147
                 51- 100 8027888             1101-1200   321285
                101- 150 9348495             1201-1300   229979
                151- 200 8826413             1301-1400   130738
                201- 250 9060237             1401-1500   114226
                251- 300 8884875             1501-1600    74731
                301- 350 8011696             1601-1700    57902
                351- 400 5933394             1701-1800    36386
                401- 450 5181144             1801-1900    31588
                451- 500 4190501             1901-2000    24420
                501- 550 2665336             2001-2100    24099
                551- 600 2026805             2101-2200    32173
                601- 650 1436019             2201-2300    18427
                651- 700 1151874             2301-2400    15355
                701- 750  891666             2401-2500    13596
                751- 800  762092             >2500        94265
                801- 850  594111
                851- 900  543211
                901- 950  371700
                951-1000  258215
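
Because the table above excludes fragments, its bins should add up to the
total number of entries minus the number of fragments, both stated in the
introduction. The sketch below checks this; SIZE_BINS and non_fragment_count
are names chosen for the example.

```python
# Bin counts copied from the size table above, in order of increasing length:
# twenty 50-residue bins (1-50 ... 951-1000), fifteen 100-residue bins
# (1001-1100 ... 2401-2500), then the open-ended >2500 bin.
SIZE_BINS = [
    2005655, 8027888, 9348495, 8826413, 9060237,
    8884875, 8011696, 5933394, 5181144, 4190501,
    2665336, 2026805, 1436019, 1151874, 891666,
    762092, 594111, 543211, 371700, 258215,
    443147, 321285, 229979, 130738, 114226,
    74731, 57902, 36386, 31588, 24420,
    24099, 32173, 18427, 15355, 13596,
    94265,
]
TOTAL_ENTRIES = 88589455  # from the introduction
FRAGMENTS = 6755811       # "Number of fragments" in the introduction


def non_fragment_count(total_entries, fragments):
    """Number of entries that are not flagged as fragments."""
    return total_entries - fragments


if __name__ == "__main__":
    expected = non_fragment_count(TOTAL_ENTRIES, FRAGMENTS)
    print(sum(SIZE_BINS), expected, sum(SIZE_BINS) == expected)
```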



   The average sequence length in UniProtKB/TrEMBL is 317 amino acids.

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
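
The average length quoted above agrees with dividing the total number of
amino acids by the total number of entries, both taken from the introduction.
A minimal sketch, with constant and function names that are assumptions of
this example:

```python
TOTAL_AMINO_ACIDS = 28116383037  # from the introduction
TOTAL_ENTRIES = 88589455         # from the introduction


def average_length(amino_acids, entries):
    """Mean sequence length in amino acids."""
    return amino_acids / entries


if __name__ == "__main__":
    mean = average_length(TOTAL_AMINO_ACIDS, TOTAL_ENTRIES)
    print(f"{mean:.1f} amino acids, reported as {round(mean)}")
```

These totals include fragments, so the quotient describes all entries and not
only the non-fragment sequences counted in the size table.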



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                   101348188                1.14                                                    
   Submitted to EMBL/GenBank/DDBJ  71105586  67393325      0.80                                                    
   Journal                         27976867  26537065      0.32                                                    
   Submitted to other databases     2237468   2229787      0.03                                                    
   Thesis                             18905     18846     <0.01                                                    
   Book citation                       9361      9298     <0.01                                                    
   Patent                                 1         1     <0.01                                                    
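
The "Average per entry" column appears to be the total number of lines
divided by the total number of entries in the release (88589455), printed to
two decimals, with values that would print as 0.00 shown as "<0.01". That
rule is inferred from the rows above and is not documented here; the function
below is a sketch under that assumption, and its name is hypothetical.

```python
TOTAL_ENTRIES = 88589455  # total entries stated in the introduction


def average_per_entry(line_count, total_entries=TOTAL_ENTRIES):
    """Format lines per entry: two decimals, or "<0.01" when that rounds to 0.00.

    The "<0.01" rule is an assumption inferred from the printed tables.
    """
    text = f"{line_count / total_entries:.2f}"
    return "<0.01" if text == "0.00" else text


if __name__ == "__main__":
    rows = [
        ("References (RL)", 101348188),
        ("Submitted to EMBL/GenBank/DDBJ", 71105586),
        ("Journal", 27976867),
        ("Submitted to other databases", 2237468),
        ("Thesis", 18905),
    ]
    for label, count in rows:
        print(f"{label:<32}{average_per_entry(count):>6}")
```

Note that the divisor is the whole database, not the "Number of entries"
column, which counts only entries with at least one such line.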

Total number of distinct authors cited in UniProtKB/TrEMBL: 533375


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                     147400550                1.66                                                    
   CATALYTIC ACTIVITY              10724620   9847646      0.12     4                                              
   CAUTION                         62519554  62468633      0.71     1                                              
   COFACTOR                         4956463   4530744      0.06     8                                              
   DOMAIN                            559849    538261      0.01     9                                              
   ENZYME REGULATION                 208527    208527     <0.01    11                                              
   FUNCTION                        12457027  11822881      0.14     3                                              
   INTERACTION                         1799      1799     <0.01    12                                              
   MISCELLANEOUS                     323678    323450     <0.01    10                                              
   PATHWAY                          5616290   5043591      0.06     7                                              
   SIMILARITY                      33172241  25787926      0.37     2                                              
   SUBCELLULAR LOCATION            10034693   9694595      0.11     5                                              
   SUBUNIT                          6825809   6774155      0.08     6                                              

Total number of comment topics: 12
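
The Rank column in the comment table is consistent with ordering the topics
by their total number of lines, largest first. The sketch below reproduces
that ordering; COMMENT_TOTALS and rank_by_total are names invented for the
example, and the totals are copied from the table.

```python
# "Total number" column of the Comments (CC) table above.
COMMENT_TOTALS = {
    "CATALYTIC ACTIVITY": 10724620,
    "CAUTION": 62519554,
    "COFACTOR": 4956463,
    "DOMAIN": 559849,
    "ENZYME REGULATION": 208527,
    "FUNCTION": 12457027,
    "INTERACTION": 1799,
    "MISCELLANEOUS": 323678,
    "PATHWAY": 5616290,
    "SIMILARITY": 33172241,
    "SUBCELLULAR LOCATION": 10034693,
    "SUBUNIT": 6825809,
}


def rank_by_total(totals):
    """Rank topics from 1 (most lines) downwards by total line count."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


if __name__ == "__main__":
    for name, rank in sorted(rank_by_total(COMMENT_TOTALS).items(),
                             key=lambda item: item[1]):
        print(f"{rank:2d}  {name}")
```

The twelve topic totals also sum to the 147400550 comment lines reported in
the first row of the table.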


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      55856514                0.63                                                    
   ACT_SITE                         4873488   3019626      0.06     5                                              
   BINDING                         10165195   2633519      0.11     1                                              
   CARBOHYD                             144        62     <0.01    28                                              
   CHAIN                             902751    713381      0.01    10                                              
   COILED                            189298    107102     <0.01    16                                              
   COMPBIAS                           29478     29316     <0.01    21                                              
   CROSSLNK                           28999     20687     <0.01    22                                              
   DISULFID                          207391    159060     <0.01    15                                              
   DNA_BIND                          160492    150663     <0.01    18                                              
   DOMAIN                           1946828   1552505      0.02     8                                              
   INIT_MET                           28737     28737     <0.01    23                                              
   INTRAMEM                             392        56     <0.01    27                                              
   LIPID                             150680     75340     <0.01    19                                              
   METAL                            9560375   2514487      0.11     3                                              
   MOD_RES                           743818    688936      0.01    12                                              
   MOTIF                             580093    373917      0.01    14                                              
   NON_STD                             2073      1931     <0.01    26                                              
   NON_TER                         10117406   6760513      0.11     2                                              
   NP_BIND                          3885402   2319788      0.04     6                                              
   PEPTIDE                              127       127     <0.01    29                                              
   PROPEP                              9309      9309     <0.01    24                                              
   REGION                           3207877   1765114      0.04     7                                              
   REPEAT                            125998     29272     <0.01    20                                              
   SIGNAL                            841279    836604      0.01    11                                              
   SITE                             1396522    705131      0.02     9                                              
   TOPO_DOM                          665374    138507      0.01    13                                              
   TRANSIT                             2197      2185     <0.01    25                                              
   TRANSMEM                         5873787   1047354      0.07     4                                              
   ZN_FING                           161004    144167     <0.01    17                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             787553527                8.89                                                    
   Allergome                           3784      3132     <0.01    83   Protein family/group databases             
   ArachnoServer                         99        99     <0.01   103   Organism-specific databases                
   BRENDA                              2563      2536     <0.01    89   Enzyme and pathway databases               
   Bgee                               94388     94388     <0.01    51   Gene expression databases                  
   BindingDB                          90076     90075     <0.01    52   Chemistry                                  
   BioCyc                           5767304   5689880      0.07    22   Enzyme and pathway databases               
   CAZy                               73758     69310     <0.01    57   Protein family/group databases             
   CGD                                 6762      6762     <0.01    78   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   109   2D gel databases                           
   CTD                               462955    461742      0.01    39   Organism-specific databases                
   ChEMBL                               785       785     <0.01    94   Chemistry                                  
   ChiTaRS                            87469     87309     <0.01    53   Other                                      
   ConoServer                           159       159     <0.01   100   Organism-specific databases                
   DIP                                 3138      3133     <0.01    86   Protein-protein interaction databases      
   DNASU                              41852     41526     <0.01    64   Protocols and materials databases          
   DrugBank                             145        57     <0.01   101   Chemistry                                  
   EMBL                            95181592  87376628      1.07     3   Sequence databases                         
   Ensembl                          1161069   1145756      0.01    31   Genome annotation databases                
   EnsemblBacteria                 37454640  36855082      0.42     7   Genome annotation databases                
   EnsemblFungi                      467824    465302      0.01    38   Genome annotation databases                
   EnsemblMetazoa                    917497    901210      0.01    33   Genome annotation databases                
   EnsemblPlants                     845278    804613      0.01    35   Genome annotation databases                
   EnsemblProtists                   190946    188510     <0.01    47   Genome annotation databases                
   EuPathDB                          161153    161152     <0.01    50   Organism-specific databases                
   EvolutionaryTrace                   7865      7865     <0.01    77   Other                                      
   ExpressionAtlas                   180035    180035     <0.01    48   Gene expression databases                  
   FlyBase                           199497    198023     <0.01    45   Organism-specific databases                
   GO                             122219828  41651354      1.38     2   Ontologies                                 
   Gene3D                          40509165  31677693      0.46     5   Family and domain databases                
   GeneID                          11577891  11299005      0.13    13   Genome annotation databases                
   GeneTree                         1067519   1067478      0.01    32   Phylogenomic databases                     
   Genevestigator                     82079     82075     <0.01    54   Gene expression databases                  
   GenoList                           14727     14454     <0.01    73   Organism-specific databases                
   GenomeRNAi                         23428     23428     <0.01    70   Other                                      
   Gramene                           194465    194465     <0.01    46   Organism-specific databases                
   GuidetoPHARMACOLOGY                   20        20     <0.01   107   Chemistry                                  
   H-InvDB                              595       448     <0.01    96   Organism-specific databases                
   HAMAP                            8767498   8645725      0.10    16   Family and domain databases                
   HGNC                               46090     46019     <0.01    62   Organism-specific databases                
   HOGENOM                          3641390   3641344      0.04    26   Phylogenomic databases                     
   HOVERGEN                          302309    302299     <0.01    41   Phylogenomic databases                     
   InParanoid                       2688661   2688661      0.03    29   Phylogenomic databases                     
   IntAct                             12793     12793     <0.01    74   Protein-protein interaction databases      
   InterPro                       156742558  53246391      1.77     1   Family and domain databases                
   KEGG                            10710412  10483797      0.12    14   Genome annotation databases                
   KO                               4565890   4542879      0.05    24   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    80   Organism-specific databases                
   Leproma                             1272      1270     <0.01    90   Organism-specific databases                
   MEROPS                            225752    225751     <0.01    42   Protein family/group databases             
   MGI                                53241     52876     <0.01    59   Organism-specific databases                
   MIM                                    4         4     <0.01   110   Organism-specific databases                
   MINT                               10098     10097     <0.01    75   Protein-protein interaction databases      
   MaxQB                               2572      2571     <0.01    88   Proteomic databases                        
   NextBio                           200676    200572     <0.01    44   Other                                      
   OGP                                    3         3     <0.01   111   2D gel databases                           
   OMA                              7279541   7279515      0.08    20   Phylogenomic databases                     
   OrthoDB                          5178942   5178939      0.06    23   Phylogenomic databases                     
   PANTHER                          8412585   8177471      0.09    18   Family and domain databases                
   PATRIC                           8244023   8243826      0.09    19   Genome annotation databases                
   PDB                                25266     13427     <0.01    67   3D structure databases                     
   PDBsum                             25153     13354     <0.01    68   3D structure databases                     
   PIR                               171115    138283     <0.01    49   Sequence databases                         
   PIRSF                            6956640   6900777      0.08    21   Family and domain databases                
   PMAP-CutDB                           199       199     <0.01    99   Other                                      
   PRIDE                             915136    915136      0.01    34   Proteomic databases                        
   PRINTS                           9556589   8612923      0.11    15   Family and domain databases                
   PRO                                26851     26850     <0.01    66   Other                                      
   PROSITE                         33219547  22343712      0.37     8   Family and domain databases                
   PaxDb                              28321     28319     <0.01    65   Proteomic databases                        
   PeptideAtlas                         127       127     <0.01   102   Proteomic databases                        
   PeroxiBase                          2583      2575     <0.01    87   Protein family/group databases             
   Pfam                            68212023  49695702      0.77     4   Family and domain databases                
   PharmGKB                            3205      3205     <0.01    85   Organism-specific databases                
   PhosSite                             888       876     <0.01    93   PTM databases                              
   PhosphoSite                         1078      1078     <0.01    92   PTM databases                              
   PhylomeDB                         362653    362653     <0.01    40   Phylogenomic databases                     
   PomBase                                2         2     <0.01   112   Organism-specific databases                
   PptaseDB                              38        36     <0.01   105   Protein family/group databases             
   ProDom                           1323493   1285631      0.01    30   Family and domain databases                
   ProMEX                              3374      3374     <0.01    84   Proteomic databases                        
   ProteinModelPortal              21931409  21931409      0.25     9   3D structure databases                     
   PseudoCAP                           4504      4498     <0.01    81   Organism-specific databases                
   REBASE                             48318     48312     <0.01    60   Protein family/group databases             
   REPRODUCTION-2DPAGE                   65        64     <0.01   104   2D gel databases                           
   RGD                                22343     21058     <0.01    71   Organism-specific databases                
   Reactome                          210597     73625     <0.01    43   Enzyme and pathway databases               
   RefSeq                          17664112  14244735      0.20    11   Sequence databases                         
   SABIO-RK                             514       514     <0.01    97   Enzyme and pathway databases               
   SGD                                    7         7     <0.01   108   Organism-specific databases                
   SMART                           14366133  10964243      0.16    12   Family and domain databases                
   SMR                              8582266   8582266      0.10    17   3D structure databases                     
   STRING                           3127734   3127560      0.04    27   Protein-protein interaction databases      
   SUPFAM                          38347851  30873903      0.43     6   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   106   2D gel databases                           
   SignaLink                           4107      4102     <0.01    82   Enzyme and pathway databases               
   TAIR                               21228     21110     <0.01    72   Organism-specific databases                
   TCDB                                6317      6308     <0.01    79   Protein family/group databases             
   TIGRFAMs                        17890605  16319869      0.20    10   Family and domain databases                
   TreeFam                           587484    587482      0.01    36   Phylogenomic databases                     
   TubercuList                         1100      1099     <0.01    91   Organism-specific databases                
   UCSC                               56397     56193     <0.01    58   Genome annotation databases                
   UniGene                           549051    512912      0.01    37   Sequence databases                         
   UniPathway                       4085428   3775732      0.05    25   Enzyme and pathway databases               
   VectorBase                         78242     77725     <0.01    55   Genome annotation databases                
   World-2DPAGE                         671       666     <0.01    95   2D gel databases                           
   WormBase                           43365     43034     <0.01    63   Organism-specific databases                
   Xenbase                            25019     24961     <0.01    69   Organism-specific databases                
   ZFIN                               47336     47259     <0.01    61   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    76   Organism-specific databases                
   eggNOG                           2749539   2749504      0.03    28   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    56   Organism-specific databases                
   mycoCLAP                             411       411     <0.01    98   Protein family/group databases             

Number of explicitly cross-referenced databases: 132
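
The rows of the table above can be read programmatically. The sketch below is a minimal illustration, not part of the release tooling. It assumes the six columns are: database name, total number of cross-references, number of entries carrying at least one such cross-reference, average cross-references per entry, rank, and category. That reading is inferred from the figures: dividing the InterPro and Pfam totals by the 88589455 entries reported in section 1 reproduces the printed 1.77 and 0.77. The names `parse_row`, `ROW` and `TOTAL_ENTRIES` are hypothetical.

```python
import re

# Total number of entries in release 2014_11 (section 1 of this document).
TOTAL_ENTRIES = 88589455

# One table row: name, cross-references, entries, average, rank, category.
# The column meanings are an assumption inferred from the numbers.
ROW = re.compile(
    r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+(<?\d+\.\d+)\s+(\d+)\s+(.+?)\s*$"
)


def parse_row(line):
    """Parse one row of the cross-reference table; return None if it does not match."""
    match = ROW.match(line)
    if match is None:
        return None
    name, xrefs, entries, average, rank, category = match.groups()
    return {
        "name": name,
        "xrefs": int(xrefs),
        "entries": int(entries),
        "average": average,  # kept as text because values such as "<0.01" occur
        "rank": int(rank),
        "category": category,
        # Recomputed average cross-references per entry, for comparison.
        "recomputed": round(int(xrefs) / TOTAL_ENTRIES, 2),
    }


row = parse_row(
    "   InterPro                       156742558  53246391"
    "      1.77     1   Family and domain databases   "
)
print(row["name"], row["average"], row["recomputed"])
```

Rows whose printed average is `<0.01` parse the same way; only the recomputed value then carries numeric information.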


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.94   Gln (Q) 3.99   Leu (L) 9.94   Ser (S) 6.37
   Arg (R) 5.37   Glu (E) 6.08   Lys (K) 5.20   Thr (T) 5.56
   Asn (N) 4.12   Gly (G) 7.21   Met (M) 2.48   Trp (W) 1.26
   Asp (D) 5.43   His (H) 2.21   Phe (F) 3.98   Tyr (Y) 3.05
   Cys (C) 1.10   Ile (I) 6.16   Pro (P) 4.52   Val (V) 6.93

   Asx (B) 0      Glx (Z) 0      Xaa (X) 0.01


   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Ile, Glu, Thr, Asp, Arg, Lys, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Trp, Cys
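
The ordering in 5.2 follows directly from the percentages in 5.1. As a minimal sketch (the names `COMPOSITION` and `rank_by_frequency` are hypothetical, and the values are copied by hand from table 5.1), sorting the twenty standard amino acids by descending percentage reproduces the list above:

```python
# Percentages copied from table 5.1 (complete database, release 2014_11).
COMPOSITION = {
    "Ala": 8.94, "Gln": 3.99, "Leu": 9.94, "Ser": 6.37,
    "Arg": 5.37, "Glu": 6.08, "Lys": 5.20, "Thr": 5.56,
    "Asn": 4.12, "Gly": 7.21, "Met": 2.48, "Trp": 1.26,
    "Asp": 5.43, "His": 2.21, "Phe": 3.98, "Tyr": 3.05,
    "Cys": 1.10, "Ile": 6.16, "Pro": 4.52, "Val": 6.93,
}


def rank_by_frequency(composition):
    """Return residue names ordered from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


ranking = rank_by_frequency(COMPOSITION)
print(", ".join(ranking))

# The twenty standard residues do not sum to exactly 100 because each
# value is rounded and the ambiguous codes (B, Z, X) are listed separately.
total = sum(COMPOSITION.values())
print(f"{total:.2f}")
```

No two residues share the same percentage in this release, so the ordering does not depend on how ties would be broken.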



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 795828
Total number of entries encoded on a Plasmid: 472720
Total number of entries encoded on a Plastid: 39297
Total number of entries encoded on a Plastid; Apicoplast: 
Total number of entries encoded on a Plastid; Chloroplast: 63
Total number of entries encoded on a Plastid; Cyanelle: 
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: