         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2012_09 STATISTICS


1.  INTRODUCTION

Release 2012_09 of 03-Oct-2012 of UniProtKB/TrEMBL contains 26079526 sequence entries,
comprising 8448404066 amino acids.

2117581 sequences have been added since release 2012_08; the sequence data of
5919 existing entries has been updated, and the annotations of
10268520 entries have been revised. This represents an 8% increase in the
number of entries.
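The report does not say which denominator the 8% uses. A minimal sketch, assuming no entries were deleted between releases (the 2012_08 total is not given here, so `implied_previous` is a hypothetical reconstruction), shows that both plausible readings truncate to 8%:

```python
# Figures quoted in the introduction of release 2012_09.
TOTAL_ENTRIES = 26_079_526
ADDED_SINCE_2012_08 = 2_117_581

def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# Reading 1: new sequences as a share of the current release.
share_of_current = percent(ADDED_SINCE_2012_08, TOTAL_ENTRIES)

# Reading 2: growth relative to the previous release, ASSUMING no entries
# were removed in between (hypothetical; the report gives no 2012_08 total).
implied_previous = TOTAL_ENTRIES - ADDED_SINCE_2012_08
growth_vs_previous = percent(ADDED_SINCE_2012_08, implied_previous)

print(f"{share_of_current:.2f}% of current entries")
print(f"{growth_vs_previous:.2f}% over the implied previous total")
```

The first reading gives about 8.12% and the second about 8.84%; either is consistent with the whole-number figure quoted above.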

Number of fragments: 3557875

Protein existence (PE):              entries      %
1: Evidence at protein level           13830     0.05%
2: Evidence at transcript level       622131     2.39%
3: Inferred from homology            5690937    21.82%
4: Predicted                        19752628    75.74%
5: Uncertain                               0     0.00%
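Each UniProtKB entry carries exactly one protein existence level, so the five counts should add up to the release total, and the percentage column is each count divided by that total. A short check using the figures above:

```python
TOTAL_ENTRIES = 26_079_526

# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    1: 13_830,      # Evidence at protein level
    2: 622_131,     # Evidence at transcript level
    3: 5_690_937,   # Inferred from homology
    4: 19_752_628,  # Predicted
    5: 0,           # Uncertain
}

# One PE level per entry: the levels partition the release.
assert sum(pe_counts.values()) == TOTAL_ENTRIES

# Percentage column, rounded to two decimals as in the table.
pe_percent = {level: round(100 * n / TOTAL_ENTRIES, 2) for level, n in pe_counts.items()}
print(pe_percent)
```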

The growth of the database is summarized below.

[Figure: growth of the database]

2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 374439

   The first twenty species represent 1613223 sequences, 6.2% of the
   total number of entries.
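The 1613223 figure is the sum of the frequencies for ranks 1 to 20 in the table of the most represented species (section 2.2). A quick check of that sum and of the 6.2% share:

```python
TOTAL_ENTRIES = 26_079_526

# Frequencies of the twenty most represented species (section 2.2, ranks 1-20).
top20 = [
    485_879, 110_812, 96_985, 78_485, 74_044,
    68_941, 61_230, 58_282, 56_109, 54_503,
    54_076, 53_262, 50_556, 49_227, 48_875,
    44_070, 43_131, 42_789, 42_117, 39_850,
]

top20_total = sum(top20)
top20_share = 100 * top20_total / TOTAL_ENTRIES  # percent of all entries
print(top20_total, f"{top20_share:.1f}%")
```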


   2.1 Table of the frequency of occurrence of species
       (number of species represented by N entries)

        Species represented 1x:15747
                            2x:62858
                            3x:34034
                            4x:22225
                            5x:14136
                            6x:10270
                            7x: 7834
                            8x: 6020
                            9x: 4859
                           10x: 9545
                       11- 20x:24913
                       21- 50x: 8656
                       51-100x: 3302
                         >100x: 8308
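Table 2.1 places each species in one row according to how many entries it has: single counts from 1x to 10x, then the ranges 11-20x, 21-50x, 51-100x and >100x. A minimal sketch of that grouping, assuming per-species entry counts are available as a mapping (the function names and the toy input are hypothetical, not UniProt tooling):

```python
from collections import Counter

def frequency_class(n_entries: int) -> str:
    """Map a species' entry count to a row label of table 2.1."""
    if n_entries < 1:
        raise ValueError("a represented species has at least one entry")
    if n_entries <= 10:
        return f"{n_entries}x"
    if n_entries <= 20:
        return "11-20x"
    if n_entries <= 50:
        return "21-50x"
    if n_entries <= 100:
        return "51-100x"
    return ">100x"

def frequency_table(entries_per_species: dict[str, int]) -> Counter:
    """Count how many species fall into each frequency class."""
    return Counter(frequency_class(n) for n in entries_per_species.values())

# Hypothetical toy input, not UniProtKB data.
toy = {"species A": 1, "species B": 1, "species C": 7, "species D": 15, "species E": 485_879}
print(frequency_table(toy))
```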


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     485879  Human immunodeficiency virus 1
       2     110812  Homo sapiens (Human)
       3      96985  Oryza sativa subsp. japonica (Rice)
       4      78485  uncultured bacterium
       5      74044  Hepatitis C virus
       6      68941  Macaca mulatta (Rhesus macaque)
       7      61230  Glycine max (Soybean) (Glycine hispida)
       8      58282  Mus musculus (Mouse)
       9      56109  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      10      54503  Danio rerio (Zebrafish) (Brachydanio rerio)
      11      54076  Vitis vinifera (Grape)
      12      53262  Hepatitis B virus (HBV)
      13      50556  Trichomonas vaginalis
      14      49227  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      15      48875  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      16      44070  Populus trichocarpa (Western balsam poplar) 
      17      43131  Callithrix jacchus (White-tufted-ear marmoset)
      18      42789  Arabidopsis thaliana (Mouse-ear cress)
      19      42117  Zea mays (Maize)
      20      39850  Paramecium tetraurelia
      21      39606  Oryza sativa subsp. indica (Rice)
      22      35600  Ailuropoda melanoleuca (Giant panda)
      23      34801  Physcomitrella patens subsp. patens (Moss)
      24      34072  Drosophila melanogaster (Fruit fly)
      25      33925  Rattus norvegicus (Rat)
      26      33757  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      27      33262  Selaginella moellendorffii (Spikemoss)
      28      33140  Sus scrofa (Pig)
      29      32926  Monodelphis domestica (Gray short-tailed opossum)
      30      32680  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      31      32339  Oryza brachyantha
      32      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      33      32093  Oryza glaberrima (African rice)
      34      31396  Ricinus communis (Castor bean)
      35      30855  Daphnia pulex (Water flea)
      36      30300  Caenorhabditis brenneri (Nematode worm)
      37      30141  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      38      29816  Amphimedon queenslandica (Sponge)
      39      29448  Strongylocentrotus purpuratus (Purple sea urchin)
      40      29315  Pristionchus pacificus
      41      29168  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      42      29036  Oikopleura dioica (Tunicate)
      43      28847  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      44      28055  Gasterosteus aculeatus (Three-spined stickleback)
      45      28040  Bos taurus (Bovine)
      46      27998  Simian immunodeficiency virus (SIV)
      47      27899  Canis familiaris (Dog) (Canis lupus familiaris)
      48      27628  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      49      27482  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      50      27086  Gorilla gorilla gorilla (Lowland gorilla)
      51      26931  Ornithorhynchus anatinus (Duckbill platypus)
      52      26758  Gallus gallus (Chicken)
      53      25895  Oryzias latipes (Medaka fish) (Japanese ricefish)
      54      25755  Loxodonta africana (African elephant)
      55      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      56      25438  Caenorhabditis japonica
      57      25067  Oryctolagus cuniculus (Rabbit)
      58      24996  Escherichia coli
      59      24849  Nematostella vectensis (Starlet sea anemone)
      60      24643  Tetrahymena thermophila (strain SB210)
      61      24199  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      62      24159  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      63      24045  Equus caballus (Horse)
      64      23221  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      65      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      66      22846  Pan troglodytes (Chimpanzee)
      67      22535  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      68      22428  Caenorhabditis elegans
      69      21821  Latimeria chalumnae (West Indian ocean coelacanth)
      70      21693  Hordeum vulgare var. distichum (Two-rowed barley)
      71      21546  Heterocephalus glaber (Naked mole rat)
      72      21339  Caenorhabditis briggsae
      73      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
      74      20853  Myotis lucifugus (Little brown bat)
      75      20129  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      76      20114  Ciona savignyi (Pacific transparent sea squirt)
      77      20055  Cavia porcellus (Guinea pig)
      78      19972  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
      79      19654  Taeniopygia guttata (Zebra finch) (Poephila guttata)
      80      19319  Toxoplasma gondii
      81      19200  Trypanosoma cruzi (strain CL Brener)
      82      19152  Anolis carolinensis (Green anole) (American chameleon)
      83      19035  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
      84      18916  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      85      18771  mine drainage metagenome
      86      18710  Drosophila simulans (Fruit fly)
      87      18121  Atta cephalotes (Leafcutter ant)
      88      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
      89      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
      90      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
      91      17366  Bombyx mori (Silk moth)
      92      17031  Drosophila yakuba (Fruit fly)
      93      17007  Tribolium castaneum (Red flour beetle)
      94      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
      95      16860  Meleagris gallopavo (Common turkey)
      96      16777  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
      97      16714  Drosophila persimilis (Fruit fly)
      98      16475  Drosophila pseudoobscura pseudoobscura (Fruit fly)
      99      16426  Ectocarpus siliculosus (Brown alga)
     100      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     101      16304  Danaus plexippus (Monarch butterfly)
     102      16263  Trichinella spiralis (Trichina worm)
     103      16239  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
     104      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     105      16234  Colletotrichum higginsianum
     106      16190  Drosophila sechellia (Fruit fly)
     107      16139  Schistosoma japonicum (Blood fluke)
     108      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     109      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     110      15730  Hepatitis C virus subtype 1b
     111      15715  Naegleria gruberi (Amoeba)
     112      15678  Plasmodium falciparum
     113      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     114      15622  Anopheles gambiae (African malaria mosquito)
     115      15557  Phytophthora ramorum (Sudden oak death agent)
     116      15419  Drosophila willistoni (Fruit fly)
     117      15354  Loa loa (Eye worm) (Filaria loa)
     118      15142  Drosophila ananassae (Fruit fly)
     119      15036  Harpegnathos saltator (Jerdon's jumping ant)
     120      15014  Hepatitis C virus subtype 1a
     121      14922  Drosophila erecta (Fruit fly)
     122      14848  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     123      14796  Camponotus floridanus (Florida carpenter ant)
     124      14788  Drosophila mojavensis (Fruit fly)
     125      14700  Drosophila virilis (Fruit fly)
     126      14697  Plasmodium chabaudi
     127      14650  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     128      14417  Volvox carteri (Green alga)
     129      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     130      14336  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     131      14328  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     132      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     133      13966  Acromyrmex echinatior (Panamanian leafcutter ant) 
     134      13863  Clonorchis sinensis (Chinese liver fluke)
     135      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     136      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     137      13329  Aspergillus flavus 
     138      13266  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     139      13178  Mustela putorius furo (European domestic ferret) (Mustela furo)
     140      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     141      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     142      12985  Trypanosoma cruzi
     143      12983  Albugo laibachii Nc14
     144      12950  Stigmatella aurantiaca (strain DW4/3-1)
     145      12936  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     146      12935  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     147      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     148      12696  Trypanosoma congolense (strain IL3000)
     149      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     150      12650  Schistosoma mansoni (Blood fluke)
     151      12596  Xenopus laevis (African clawed frog)
     152      12549  Ralstonia solanacearum (Pseudomonas solanacearum)
     153      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     154      12440  Polysphondylium pallidum (Cellular slime mold)
     155      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     156      12352  Dictyostelium purpureum (Slime mold)
     157      12308  Rabies virus
     158      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     159      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     160      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     161      11945  Emericella nidulans  
     162      11908  Apis mellifera (Honeybee)
     163      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     164      11780  Piriformospora indica (strain DSM 11827)
     165      11715  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     166      11714  Helicobacter pylori (Campylobacter pylori)
     167      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     168      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     169      11678  Porcine reproductive and respiratory syndrome virus (PRRSV)
     170      11669  Anopheles darlingi (Mosquito)
     171      11644  Plasmodium berghei (strain Anka)
     172      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     173      11566  Trichoplax adhaerens (Trichoplax reptans)
     174      11557  Trypanosoma vivax (strain Y486)
     175      11515  Puccinia triticina (isolate 1-1 / race 1 (BBBD)) (Brown leaf rust fungus)
     176      11514  Aureococcus anophagefferens (Harmful bloom alga)
     177      11499  Brugia malayi (Filarial nematode worm)
     178      11488  uncultured archaeon
     179      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     180      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     181      11396  Aspergillus oryzae (strain 3.042) (Yellow koji mold)
     182      11280  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     183      11211  Ktedonobacter racemifer DSM 44963
     184      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     185      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     186      10964  Streptomyces clavuligerus 
     187      10949  Aspergillus niger 
     188      10839  Pediculus humanus subsp. corporis (Body louse)
     189      10822  Chaetomium globosum  
     190      10570  Metarhizium robertsii (strain ARSEF 23 / ATCC MYA-3075) (Metarhizium anisopliae)
     191      10563  Amycolatopsis mediterranei S699
     192      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     193      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     194      10387  Pseudomonas syringae pv. glycinea str. race 4
     195      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     196      10377  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     197      10354  Phaeodactylum tricornutum (strain CCAP 1055/1)
     198      10273  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
     199      10221  Shigella flexneri 1235-66
     200      10216  Burkholderia terrae BS001
     201      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136) 
     202      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
     203      10171  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     204      10113  Burkholderia sp. BT03
     205      10109  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
     206      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
     207      10087  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
     208      10051  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
     209      10013  Streptomyces bingchenggensis (strain BCW-1)
     210       9846  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
     211       9836  Chlorella variabilis (Green alga)
     212       9822  Metarhizium acridum (strain CQMa 102)
     213       9803  Klebsiella pneumoniae
     214       9799  Coccomyxa subellipsoidea C-169
     215       9760  Thielavia terrestris (strain ATCC 38088 / NRRL 8126) (Acremonium alabamense)
     216       9704  Coccidioides immitis (strain RS) (Valley fever fungus)
     217       9703  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
     218       9662  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
     219       9651  Cordyceps militaris (strain CM01) (Caterpillar fungus)
     220       9597  Streptomyces cattleya 
     221       9533  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
     222       9510  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
     223       9496  Salmo salar (Atlantic salmon)
     224       9485  Ajellomyces dermatitidis (strain ER-3 / ATCC MYA-2586) 
     225       9443  Ajellomyces capsulata (strain H88) (Darling's disease fungus) 
     226       9391  Exophiala dermatitidis (strain ATCC 34100 / CBS 525.76 / NIH/UT8656)  
     227       9237  Monosiga brevicollis (Choanoflagellate)
     228       9201  Amycolatopsis mediterranei (strain U-32)
     229       9197  Streptomyces himastatinicus ATCC 53653
     230       9154  Ajellomyces capsulata (strain G186AR / H82 / ATCC MYA-2454 / RMSCC 2432)  
     231       9146  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
     232       9139  Pseudomonas syringae pv. pisi str. 1704B
     233       9135  Rhodococcus sp. JVH1
     234       9113  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
     235       9111  Hypocrea jecorina (strain QM6a) (Trichoderma reesei)
     236       9080  Thielavia heterothallica (strain ATCC 42464 / BCRC 31852 / DSM 1799) 
     237       9076  Saccharomyces cerevisiae x Saccharomyces kudriavzevii VIN7
     238       9064  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
     239       9046  Streptomyces hygroscopicus subsp. jinggangensis (strain 5008)
     240       9008  Neurospora crassa 
     241       8991  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
     242       8986  Dictyostelium discoideum (Slime mold)
     243       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
     244       8955  Rhodococcus opacus M213
     245       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
     246       8941  Streptomyces violaceusniger Tu 4113
     247       8940  Burkholderia sp. TJI49
     248       8900  Catenulispora acidiphila 
     249       8859  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
     250       8849  Pichia sorbitophila  


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          391721 (  2%)
    Bacteria       17815805 ( 68%)
    Eukaryota       6343962 ( 24%)
    Viruses         1486176 (  6%)
    Other             41861 ( <1%)
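The percentages above are each count divided by the release total, shown as whole numbers, with "<1%" for shares under one percent. A short sketch reproducing that column, assuming this formatting rule (it is inferred from the table, not documented):

```python
TOTAL_ENTRIES = 26_079_526

kingdoms = {
    "Archaea": 391_721,
    "Bacteria": 17_815_805,
    "Eukaryota": 6_343_962,
    "Viruses": 1_486_176,
    "Other": 41_861,
}

def pct_label(part: int, whole: int) -> str:
    """Whole-number percentage, with '<1%' for shares below one percent."""
    pct = 100 * part / whole
    return "<1%" if pct < 1 else f"{round(pct)}%"

labels = {name: pct_label(n, TOTAL_ENTRIES) for name, n in kingdoms.items()}
print(labels)
```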



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 110848 (  2%)           ( <1%)
     Other Mammalia        852660 ( 13%)           (  3%)
     Other Vertebrata      707972 ( 11%)           (  3%)
     Viridiplantae        1213922 ( 19%)           (  5%)
     Fungi                1377259 ( 22%)           (  5%)
     Insecta               719967 ( 11%)           (  3%)
     Nematoda              223391 (  4%)           (  1%)
     Other                1137943 ( 18%)           (  4%)
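The two percentage columns use different denominators: the Eukaryota total (6343962) and the whole release (26079526). A small sketch that recomputes both and checks that the eight categories add up to the Eukaryota count:

```python
TOTAL_ENTRIES = 26_079_526
EUKARYOTA = 6_343_962

categories = {
    "Human": 110_848,
    "Other Mammalia": 852_660,
    "Other Vertebrata": 707_972,
    "Viridiplantae": 1_213_922,
    "Fungi": 1_377_259,
    "Insecta": 719_967,
    "Nematoda": 223_391,
    "Other": 1_137_943,
}

# The categories partition Eukaryota, so they add up to its total.
assert sum(categories.values()) == EUKARYOTA

for name, n in categories.items():
    of_euk = 100 * n / EUKARYOTA      # "% of Eukaryota"
    of_db = 100 * n / TOTAL_ENTRIES   # "% of the complete database"
    print(f"{name:<17} {n:>8}  {of_euk:5.1f}%  {of_db:4.1f}%")
```

Human, for example, is about 1.7% of Eukaryota but only about 0.4% of the whole release.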



3.  SEQUENCE SIZE

   Distribution of the sequences by length in amino acids (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  651580             1001-1100   151620
                 51- 100 2196564             1101-1200   106319
                101- 150 2460274             1201-1300    74753
                151- 200 2391445             1301-1400    47863
                201- 250 2412503             1401-1500    38546
                251- 300 2340005             1501-1600    26898
                301- 350 2128907             1601-1700    20386
                351- 400 1614720             1701-1800    15540
                401- 450 1391649             1801-1900    12971
                451- 500 1144053             1901-2000    11105
                501- 550  761018             2001-2100     8735
                551- 600  584840             2101-2200     8890
                601- 650  426695             2201-2300     6953
                651- 700  334821             2301-2400     5500
                701- 750  283708             2401-2500     4729
                751- 800  252380             >2500        38843
                801- 850  191237
                851- 900  170476
                901- 950  117880
                951-1000   87245
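Because fragments are excluded, the length bins should account for every non-fragment entry, that is, the release total minus the 3557875 fragments. Summing both columns of the table confirms this:

```python
TOTAL_ENTRIES = 26_079_526
FRAGMENTS = 3_557_875

# Counts per length bin (amino acids), read column by column from the table above.
short_bins = [  # 1-50, 51-100, ..., 951-1000 (50-residue bins)
    651_580, 2_196_564, 2_460_274, 2_391_445, 2_412_503,
    2_340_005, 2_128_907, 1_614_720, 1_391_649, 1_144_053,
    761_018, 584_840, 426_695, 334_821, 283_708,
    252_380, 191_237, 170_476, 117_880, 87_245,
]
long_bins = [  # 1001-1100, ..., 2401-2500 (100-residue bins), then >2500
    151_620, 106_319, 74_753, 47_863, 38_546,
    26_898, 20_386, 15_540, 12_971, 11_105,
    8_735, 8_890, 6_953, 5_500, 4_729,
    38_843,
]

binned = sum(short_bins) + sum(long_bins)
non_fragments = TOTAL_ENTRIES - FRAGMENTS
print(binned, non_fragments)
```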

   


   The average sequence length in UniProtKB/TrEMBL is 323 amino acids.

   The shortest sequence is G0XMK1_9MYRT:     1 amino acid.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
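Dividing the total number of amino acids by the total number of entries (fragments included) reproduces the quoted average if the fractional part is truncated rather than rounded; that truncation is an assumption about how the figure was produced:

```python
TOTAL_AMINO_ACIDS = 8_448_404_066
TOTAL_ENTRIES = 26_079_526

exact_mean = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES        # true arithmetic mean
truncated_mean = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES   # integer division drops the fraction
print(f"{exact_mean:.2f}", truncated_mean)
```

The exact mean is about 323.95 residues, which would round to 324.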



4.  STATISTICS FOR SOME LINE TYPES

The following table gives, for selected UniProtKB/TrEMBL line types, the total
number of lines, the number of entries containing at least one such line, and
the average number of such lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    32015495                1.23                                                    
   Submitted to EMBL/GenBank/DDBJ  17904276  16296348      0.69                                                    
   Journal                         12791326  12025854      0.49                                                    
   Submitted to other databases     1303598   1300736      0.05                                                    
   Thesis                              9793      9735     <0.01                                                    
   Book citation                       6482      6433     <0.01                                                    
   Unpublished observations              19        19     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 447028
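The "Average per entry" column is the total number of lines divided by all 26079526 entries in the release, not by the "Number of entries" column (17904276 / 16296348 would give about 1.10, not 0.69). A short sketch reproducing the column, including the "<0.01" display for very small averages:

```python
TOTAL_ENTRIES = 26_079_526

# Total RL lines per reference subtype, from the table above.
rl_totals = {
    "Submitted to EMBL/GenBank/DDBJ": 17_904_276,
    "Journal": 12_791_326,
    "Submitted to other databases": 1_303_598,
    "Thesis": 9_793,
    "Book citation": 6_482,
    "Unpublished observations": 19,
    "Patent": 1,
}

def per_entry(total_lines: int, entries: int = TOTAL_ENTRIES) -> str:
    """Average lines per entry over ALL entries, formatted like the table."""
    avg = total_lines / entries
    return "<0.01" if avg < 0.01 else f"{avg:.2f}"

assert sum(rl_totals.values()) == 32_015_495  # subtypes add up to the RL total
print(per_entry(32_015_495))                  # all references
for subtype, n in rl_totals.items():
    print(f"{subtype:<32} {per_entry(n)}")
```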


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      28979685                1.11                                                    
   CATALYTIC ACTIVITY               2518829   2299112      0.10     4                                              
   CAUTION                         10982706  10982665      0.42     1                                              
   COFACTOR                          906503    845961      0.03     8                                              
   DOMAIN                             81727     77768     <0.01     9                                              
   FUNCTION                         2669596   2502950      0.10     3                                              
   INTERACTION                          686       686     <0.01    11                                              
   MISCELLANEOUS                      53651     53555     <0.01    10                                              
   PATHWAY                          1228549   1118537      0.05     6                                              
   SIMILARITY                       7202965   6246809      0.28     2                                              
   SUBCELLULAR LOCATION             2119986   2024404      0.08     5                                              
   SUBUNIT                          1214487   1202592      0.05     7                                              

Total number of comment topics: 11
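The Rank column matches an ordering of topics by total number of lines rather than by number of entries (by entries, SUBUNIT would come before PATHWAY). A short sketch, assuming that ranking rule, reproduces it:

```python
# Total CC lines per comment topic, copied from the table above.
cc_totals = {
    "CATALYTIC ACTIVITY": 2_518_829,
    "CAUTION": 10_982_706,
    "COFACTOR": 906_503,
    "DOMAIN": 81_727,
    "FUNCTION": 2_669_596,
    "INTERACTION": 686,
    "MISCELLANEOUS": 53_651,
    "PATHWAY": 1_228_549,
    "SIMILARITY": 7_202_965,
    "SUBCELLULAR LOCATION": 2_119_986,
    "SUBUNIT": 1_214_487,
}

# The topics add up to the Comments (CC) total.
assert sum(cc_totals.values()) == 28_979_685

# Rank 1 goes to the topic with the most lines (no ties occur in this table).
by_total = sorted(cc_totals, key=cc_totals.get, reverse=True)
ranks = {topic: position for position, topic in enumerate(by_total, start=1)}
print(ranks)
```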


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       6997135                0.27                                                    
   CHAIN                             764039    633887      0.03     2                                              
   NON_TER                          5644629   3558462      0.22     1                                              
   SIGNAL                            587707    585894      0.02     3                                              
   TRANSIT                              760       760     <0.01     4                                              

Total number of feature keys: 4
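Because "Average per entry" is taken over all entries, it understates how often a feature key occurs within the entries that actually carry it. A small sketch comparing the two averages for the four feature keys above:

```python
TOTAL_ENTRIES = 26_079_526

# (total FT lines, entries with at least one such line), from the table above.
ft = {
    "CHAIN": (764_039, 633_887),
    "NON_TER": (5_644_629, 3_558_462),
    "SIGNAL": (587_707, 585_894),
    "TRANSIT": (760, 760),
}

assert sum(lines for lines, _ in ft.values()) == 6_997_135  # the FT total

for key, (lines, entries_with_key) in ft.items():
    over_all = lines / TOTAL_ENTRIES          # the table's "Average per entry"
    over_carriers = lines / entries_with_key  # average among entries that have the key
    print(f"{key:<8} {over_all:.2f} {over_carriers:.2f}")
```

For NON_TER, for example, the table's 0.22 corresponds to roughly 1.59 lines per entry among the entries that have at least one NON_TER line.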


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             284511129               10.91                                                    
   AGD                                 2525      2525     <0.01    84   Organism-specific databases                
   ANU-2DPAGE                            52        52     <0.01    99   2D gel databases                           
   Allergome                           2863      2252     <0.01    79   Protein family/group databases             
   ArachnoServer                         66        66     <0.01    98   Organism-specific databases                
   ArrayExpress                       87350     87283     <0.01    52   Gene expression databases                  
   BRENDA                              2691      2661     <0.01    81   Enzyme and pathway databases               
   Bgee                              127446    127431     <0.01    47   Gene expression databases                  
   BioCyc                            670989    656552      0.03    31   Enzyme and pathway databases               
   CAZy                               74160     69679     <0.01    56   Protein family/group databases             
   CGD                                 7083      7083     <0.01    75   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     5         5     <0.01   105   2D gel databases                           
   CTD                               310553    309232      0.01    40   Organism-specific databases                
   ConoServer                           160       160     <0.01    94   Organism-specific databases                
   DIP                                 2683      2678     <0.01    82   Protein-protein interaction databases      
   DNASU                              43928     43603     <0.01    60   Protocols and materials databases          
   EMBL                            28597074  25295259      1.10     3   Sequence databases                         
   Ensembl                           957439    941027      0.04    28   Genome annotation databases                
   EnsemblBacteria                   835245    801141      0.03    30   Genome annotation databases                
   EnsemblFungi                      250072    249518      0.01    42   Genome annotation databases                
   EnsemblMetazoa                    499254    491730      0.02    34   Genome annotation databases                
   EnsemblPlants                     327995    319843      0.01    37   Genome annotation databases                
   EnsemblProtists                   115271    114209     <0.01    49   Genome annotation databases                
   EuPathDB                          178957    178954      0.01    45   Organism-specific databases                
   EvolutionaryTrace                   8193      8193     <0.01    73   Other                                      
   FlyBase                           195212    193667      0.01    43   Organism-specific databases                
   GO                              47301477  15128564      1.81     2   Ontologies                                 
   Gene3D                          10661051   8486041      0.41     6   Family and domain databases                
   GeneID                           8241874   8070238      0.32    10   Genome annotation databases                
   GeneTree                          886196    886134      0.03    29   Phylogenomic databases                     
   Genevestigator                     93944     93937     <0.01    51   Gene expression databases                  
   GenoList                           14735     14462     <0.01    71   Organism-specific databases                
   GenomeRNAi                         21961     21961     <0.01    66   Other                                      
   GenomeReviews                    4252975   4154204      0.16    15   Genome annotation databases                
   Gramene                            67642     67642     <0.01    57   Organism-specific databases                
   H-InvDB                              630       481     <0.01    90   Organism-specific databases                
   HAMAP                            2343664   2315238      0.09    24   Family and domain databases                
   HGNC                               46507     46429     <0.01    59   Organism-specific databases                
   HOGENOM                          3659754   3659728      0.14    18   Phylogenomic databases                     
   HOVERGEN                          311886    311876      0.01    39   Phylogenomic databases                     
   HSSP                              250861    250635      0.01    41   3D structure databases                     
   IPI                               313894    313659      0.01    38   Sequence databases                         
   InParanoid                        190203    190068      0.01    44   Phylogenomic databases                     
   IntAct                             16873     16872     <0.01    69   Protein-protein interaction databases      
   InterPro                        53452303  19263803      2.05     1   Family and domain databases                
   KEGG                             7504616   7339175      0.29    11   Genome annotation databases                
   KO                               2938098   2924979      0.11    20   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    76   Organism-specific databases                
   Leproma                             1272      1270     <0.01    87   Organism-specific databases                
   MEROPS                             81383     81383     <0.01    53   Protein family/group databases             
   MGI                                34593     34324     <0.01    62   Organism-specific databases                
   MINT                                8607      8607     <0.01    72   Protein-protein interaction databases      
   NextBio                           105294    105283     <0.01    50   Other                                      
   OMA                              3904296   3904258      0.15    16   Phylogenomic databases                     
   OrthoDB                           567163    567162      0.02    32   Phylogenomic databases                     
   PANTHER                          3430142   3260754      0.13    19   Family and domain databases                
   PATRIC                           8331617   8331529      0.32     8   Genome annotation databases                
   PDB                                17685      9977     <0.01    67   3D structure databases                     
   PDBsum                             17344      9782     <0.01    68   3D structure databases                     
   PHCI-2DPAGE                           99        99     <0.01    96   2D gel databases                           
   PIR                               173952    141107      0.01    46   Sequence databases                         
   PIRSF                            2054356   2053848      0.08    25   Family and domain databases                
   PMAP-CutDB                           215       215     <0.01    93   Other                                      
   PMMA-2DPAGE                            2         2     <0.01   106   2D gel databases                           
   PRIDE                             370369    370368      0.01    36   Proteomic databases                        
   PRINTS                           3885992   3442340      0.15    17   Family and domain databases                
   PROSITE                         12558932   8268533      0.48     5   Family and domain databases                
   Pathway_Interaction_DB                11         9     <0.01   104   Enzyme and pathway databases               
   PeptideAtlas                         144       144     <0.01    95   Proteomic databases                        
   PeroxiBase                          2557      2549     <0.01    83   Protein family/group databases             
   Pfam                            24170505  17815728      0.93     4   Family and domain databases                
   PharmGKB                            4506      4506     <0.01    78   Organism-specific databases                
   PhosphoSite                         1209      1209     <0.01    88   PTM databases                              
   PhylomeDB                         117119    117119     <0.01    48   Phylogenomic databases                     
   PomBase                               40        27     <0.01   100   Organism-specific databases                
   PptaseDB                              36        34     <0.01   101   Protein family/group databases             
   ProDom                            485481    462841      0.02    35   Family and domain databases                
   ProMEX                               280       280     <0.01    91   Proteomic databases                        
   ProtClustDB                      2721475   2721475      0.10    22   Phylogenomic databases                     
   ProteinModelPortal               7067374   7067328      0.27    12   3D structure databases                     
   PseudoCAP                           4544      4538     <0.01    77   Organism-specific databases                
   REBASE                             30962     30962     <0.01    63   Protein family/group databases             
   REPRODUCTION-2DPAGE                   86        85     <0.01    97   2D gel databases                           
   RGD                                24904     24579     <0.01    65   Organism-specific databases                
   Reactome                             217       180     <0.01    92   Enzyme and pathway databases               
   RefSeq                           8273065   8075492      0.32     9   Sequence databases                         
   SGD                                   11        11     <0.01   103   Organism-specific databases                
   SMART                            5618627   4248486      0.22    13   Family and domain databases                
   SMR                              1539633   1539633      0.06    26   3D structure databases                     
   STRING                           2592869   2592718      0.10    23   Protein-protein interaction databases      
   SUPFAM                          10274390   8443066      0.39     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   102   2D gel databases                           
   Siena-2DPAGE                           2         2     <0.01   107   2D gel databases                           
   TAIR                               16006     15927     <0.01    70   Organism-specific databases                
   TCDB                                2406      2394     <0.01    85   Protein family/group databases             
   TIGRFAMs                         5437343   4960091      0.21    14   Family and domain databases                
   TubercuList                         2032      2027     <0.01    86   Organism-specific databases                
   UCSC                               64466     64450     <0.01    58   Genome annotation databases                
   UniGene                           556045    522269      0.02    33   Sequence databases                         
   UniPathway                       1071968    998290      0.04    27   Enzyme and pathway databases               
   VectorBase                         78371     77856     <0.01    54   Genome annotation databases                
   World-2DPAGE                         936       931     <0.01    89   2D gel databases                           
   WormBase                           42240     42121     <0.01    61   Organism-specific databases                
   Xenbase                            25646     25571     <0.01    64   Organism-specific databases                
   ZFIN                                2808      2808     <0.01    80   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    74   Organism-specific databases                
   eggNOG                           2780663   2780662      0.11    21   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    55   Organism-specific databases                

Number of explicitly cross-referenced databases: 135
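The cross-reference table can be loaded programmatically, for example to check how the rank column is ordered or to count databases per category. The table header is not part of this excerpt, so the sketch below assumes this column layout: name, number of cross-references, number of entries, percent, rank, category. It is a minimal Python sketch run on a few rows copied from the table, not a UniProt tool.

```python
import re
from collections import Counter

# A few rows copied from the cross-reference table above.
ROWS = """\
   InterPro                        53452303  19263803      2.05     1   Family and domain databases
   Pfam                            24170505  17815728      0.93     4   Family and domain databases
   PROSITE                         12558932   8268533      0.48     5   Family and domain databases
   KEGG                             7504616   7339175      0.29    11   Genome annotation databases
   STRING                           2592869   2592718      0.10    23   Protein-protein interaction databases
   IntAct                             16873     16872     <0.01    69   Protein-protein interaction databases
"""

# ASSUMED column layout: name, cross-references, entries, percent, rank, category.
ROW_RE = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+(<?[\d.]+)\s+(\d+)\s+(.+?)\s*$")


def parse_rows(text):
    """Return one dict per table row; raise ValueError on a malformed line."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        m = ROW_RE.match(line)
        if m is None:
            raise ValueError(f"unparsed row: {line!r}")
        name, xrefs, entries, _pct, rank, category = m.groups()
        records.append({"name": name, "xrefs": int(xrefs), "entries": int(entries),
                        "rank": int(rank), "category": category})
    return records


records = parse_rows(ROWS)

# If the rank column orders databases by cross-reference count, both sorts agree.
by_xrefs = sorted(records, key=lambda r: r["xrefs"], reverse=True)
by_rank = sorted(records, key=lambda r: r["rank"])
print([r["name"] for r in by_xrefs] == [r["name"] for r in by_rank])  # True

# Number of databases per category within this excerpt.
print(Counter(r["category"] for r in records))
```

On these rows, sorting by cross-reference count reproduces the rank order. This is consistent with, but does not prove, the assumption that the rank column ranks databases by number of cross-references.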


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.67   Gln (Q) 3.95   Leu (L) 9.92   Ser (S) 6.66
   Arg (R) 5.46   Glu (E) 6.17   Lys (K) 5.25   Thr (T) 5.57
   Asn (N) 4.09   Gly (G) 7.10   Met (M) 2.46   Trp (W) 1.30
   Asp (D) 5.32   His (H) 2.21   Phe (F) 4.01   Tyr (Y) 3.03
   Cys (C) 1.26   Ile (I) 5.96   Pro (P) 4.71   Val (V) 6.77

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03

   [Chart of amino acid composition not reproduced. Legend: gray = aliphatic,
   red = acidic, green = small hydroxy, blue = basic, black = aromatic,
   white = amide, yellow = sulfur]
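A composition table like 5.1 can be computed by counting each residue letter and dividing by the total number of residues. The Python sketch below assumes residues are pooled over all sequences, so longer entries weigh more, and that ambiguous codes such as X are counted like any other letter. The release notes do not describe UniProt's actual procedure, and the toy sequences are hypothetical.

```python
from collections import Counter


def composition_percent(sequences):
    """Percentage of each residue letter, pooled over all residues in `sequences`.

    Assumption: pooled counting (not a per-entry average); ambiguous letters
    such as X are counted like standard residues.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())  # count every letter in the sequence
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in counts.items()}


# Hypothetical toy sequences, for illustration only.
toy = ["MKTAYIAKQR", "GLLAXW"]
comp = composition_percent(toy)
for aa, pct in sorted(comp.items(), key=lambda kv: -kv[1]):
    print(f"{aa} {pct:.2f}")
```

With 16 residues in total and three alanines, A comes out at 18.75 %. The values for a full database would be formatted the same way as table 5.1.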


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
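The ordering in 5.2 follows directly from the percentages in 5.1. The short Python check below sorts the 20 standard residues by their listed percentage. It uses only the values from table 5.1 and excludes Asx, Glx and Xaa.

```python
# Percentages copied from table 5.1 (20 standard amino acids only).
COMPOSITION = {
    "Ala": 8.67, "Arg": 5.46, "Asn": 4.09, "Asp": 5.32, "Cys": 1.26,
    "Gln": 3.95, "Glu": 6.17, "Gly": 7.10, "His": 2.21, "Ile": 5.96,
    "Leu": 9.92, "Lys": 5.25, "Met": 2.46, "Phe": 4.01, "Pro": 4.71,
    "Ser": 6.66, "Thr": 5.57, "Trp": 1.30, "Tyr": 3.03, "Val": 6.77,
}

# Sort residue names by descending percentage (no ties in this data).
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranking))
# Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe, Gln, Tyr, Met, His, Trp, Cys

# The rounded standard-residue values sum to slightly less than 100.
print(f"{sum(COMPOSITION.values()):.2f}")  # 99.87
```

The sorted order matches the classification above.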



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 554799
Total number of entries encoded on a Plasmid: 297050
Total number of entries encoded on a Plastid: 21487
Total number of entries encoded on a Plastid; Apicoplast: 695
Total number of entries encoded on a Plastid; Chloroplast: 202617
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 893