         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2012_04 STATISTICS


1.  INTRODUCTION

Release 2012_04 of 18-Apr-2012 of UniProtKB/TrEMBL contains 21552793 sequence entries,
comprising 7048241206 amino acids.

1107595 sequences have been added since release 2012_03, the sequence data of
16415 existing entries has been updated, and the annotations of
7457587 entries have been revised. The added sequences represent an increase
of 5% in the number of entries.
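
As a rough check of the 5% figure, the added sequences can be compared with
the size of the previous release. The Python sketch below assumes that no
entries were removed between releases 2012_03 and 2012_04, so the previous
total is an estimate and not a number taken from this document.

```python
# Figures taken from the introduction of this release note.
total_entries = 21_552_793   # entries in release 2012_04
added_entries = 1_107_595    # sequences added since release 2012_03

# Assumption (not stated in the release note): no entries were removed,
# so the previous release held the current total minus the additions.
previous_entries = total_entries - added_entries

growth_percent = 100 * added_entries / previous_entries
print(f"previous release (estimated): {previous_entries} entries")
print(f"growth: {growth_percent:.1f}%")
```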

Number of fragments: 3189796

Protein existence (PE):              entries      %
1: Evidence at protein level           13209     0.06%
2: Evidence at transcript level       566854     2.63%
3: Inferred from homology            4723994    21.92%
4: Predicted                        16248736    75.39%
5: Uncertain                               0     0.00%
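
Each percentage in the protein existence table is that category's share of
all entries. A minimal Python sketch that recomputes the shares from the
counts above (the variable names are illustrative only):

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 13_209,
    "2: Evidence at transcript level": 566_854,
    "3: Inferred from homology": 4_723_994,
    "4: Predicted": 16_248_736,
    "5: Uncertain": 0,
}

# The five levels cover every entry, so their sum is the release total.
total = sum(pe_counts.values())

# Share of each level as a percentage of all entries.
shares = {label: 100 * count / total for label, count in pe_counts.items()}

for label, count in pe_counts.items():
    print(f"{label:<34}{count:>10}{shares[label]:>9.2f}%")
```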

The growth of the database is summarized below.

[Figure not available in this text version.]

2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 344346

   The first twenty species represent 1517185 sequences: 7% of the
   total number of entries.
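
The figure for the first twenty species can be reproduced from the
frequencies of ranks 1 to 20 in table 2.2. A minimal Python sketch, with the
twenty counts copied from that table and the release total taken from the
introduction:

```python
# Frequencies of the 20 most represented species (ranks 1-20 of table 2.2).
top20 = [
    452538, 110339, 96890, 70606, 67824,
    61396, 54044, 53126, 51316, 50483,
    50130, 49329, 49221, 48816, 44070,
    43599, 42092, 42047, 39850, 39469,
]
total_entries = 21_552_793  # entries in release 2012_04

top20_sequences = sum(top20)
top20_share = 100 * top20_sequences / total_entries

print(f"{top20_sequences} sequences, {top20_share:.0f}% of all entries")
```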


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:144675
                            2x: 60020
                            3x: 31774
                            4x: 20238
                            5x: 12895
                            6x:  9262
                            7x:  7092
                            8x:  5350
                            9x:  4355
                           10x:  8673
                       11- 20x: 22168
                       21- 50x:  7854
                       51-100x:  3018
                         >100x:  6972
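
Each species falls into exactly one bin, so the bin counts should add up to
the total number of species given above (344346). The Python sketch below
uses that relation to derive the count of the 1x bin from the total and the
other thirteen bins, which can then be compared with the value printed in
the table. It assumes the bins are exhaustive and mutually exclusive, which
the release note does not state explicitly.

```python
total_species = 344_346  # total number of species in this release

# Counts for every bin except "1x", copied from the table above.
other_bins = {
    "2x": 60_020, "3x": 31_774, "4x": 20_238, "5x": 12_895,
    "6x": 9_262, "7x": 7_092, "8x": 5_350, "9x": 4_355,
    "10x": 8_673, "11-20x": 22_168, "21-50x": 7_854,
    "51-100x": 3_018, ">100x": 6_972,
}

# Species represented exactly once = total minus all other bins.
implied_1x = total_species - sum(other_bins.values())
print(f"species represented 1x (implied): {implied_1x}")
```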


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     452538  Human immunodeficiency virus 1
       2     110339  Homo sapiens (Human)
       3      96890  Oryza sativa subsp. japonica (Rice)
       4      70606  uncultured bacterium
       5      67824  Hepatitis C virus
       6      61396  Mus musculus (Mouse)
       7      54044  Vitis vinifera (Grape)
       8      53126  Danio rerio (Zebrafish) (Brachydanio rerio)
       9      51316  Macaca mulatta (Rhesus macaque)
      10      50483  Trichomonas vaginalis
      11      50130  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      12      49329  Hepatitis B virus (HBV)
      13      49221  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      14      48816  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      15      44070  Populus trichocarpa (Western balsam poplar) 
      16      43599  Arabidopsis thaliana (Mouse-ear cress)
      17      42092  Zea mays (Maize)
      18      42047  Callithrix jacchus (White-tufted-ear marmoset)
      19      39850  Paramecium tetraurelia
      20      39469  Oryza sativa subsp. indica (Rice)
      21      35598  Ailuropoda melanoleuca (Giant panda)
      22      34801  Physcomitrella patens subsp. patens (Moss)
      23      33909  Rattus norvegicus (Rat)
      24      33735  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      25      33573  Drosophila melanogaster (Fruit fly)
      26      33271  Selaginella moellendorffii (Spikemoss)
      27      32674  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      28      31827  Caenorhabditis remanei (Caenorhabditis vulgaris)
      29      31583  Monodelphis domestica (Gray short-tailed opossum)
      30      31381  Ricinus communis (Castor bean)
      31      30550  Daphnia pulex (Water flea)
      32      30300  Caenorhabditis brenneri (Nematode worm)
      33      29427  Strongylocentrotus purpuratus (Purple sea urchin)
      34      29295  Pristionchus pacificus
      35      29164  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      36      29026  Oikopleura dioica (Tunicate)
      37      28880  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      38      28036  Gasterosteus aculeatus (Three-spined stickleback)
      39      28018  Bos taurus (Bovine)
      40      27314  Canis familiaris (Dog) (Canis lupus familiaris)
      41      27086  Gorilla gorilla gorilla (Lowland gorilla)
      42      26871  Ornithorhynchus anatinus (Duckbill platypus)
      43      26056  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
      44      25866  Oryzias latipes (Medaka fish) (Japanese ricefish)
      45      25755  Loxodonta africana (African elephant)
      46      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      47      25438  Caenorhabditis japonica
      48      25046  Oryctolagus cuniculus (Rabbit)
      49      24950  Sus scrofa (Pig)
      50      24897  Gallus gallus (Chicken)
      51      24825  Nematostella vectensis (Starlet sea anemone)
      52      24188  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      53      24157  Escherichia coli
      54      24056  Pongo abelii (Sumatran orangutan)
      55      23820  Equus caballus (Horse)
      56      23156  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      57      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      58      23101  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      59      22829  Pan troglodytes (Chimpanzee)
      60      22519  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      61      22085  Caenorhabditis elegans
      62      21863  Latimeria chalumnae (West Indian ocean coelacanth)
      63      21665  Hordeum vulgare var. distichum (Two-rowed barley)
      64      21546  Heterocephalus glaber (Naked mole rat)
      65      21226  Caenorhabditis briggsae
      66      21085  Ixodes scapularis (Black-legged tick) (Deer tick)
      67      20852  Myotis lucifugus (Little brown bat)
      68      20124  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      69      20110  Ciona savignyi (Pacific transparent sea squirt)
      70      20039  Cavia porcellus (Guinea pig)
      71      19648  Taeniopygia guttata (Zebra finch) (Poephila guttata)
      72      19205  Toxoplasma gondii
      73      19201  Trypanosoma cruzi (strain CL Brener)
      74      19011  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
      75      18908  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      76      18771  mine drainage metagenome
      77      18623  Drosophila simulans (Fruit fly)
      78      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
      79      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
      80      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
      81      17031  Drosophila yakuba (Fruit fly)
      82      16994  Tribolium castaneum (Red flour beetle)
      83      16758  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
      84      16712  Drosophila persimilis (Fruit fly)
      85      16647  Ralstonia solanacearum (Pseudomonas solanacearum)
      86      16425  Ectocarpus siliculosus (Brown alga)
      87      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
      88      16306  Loa loa (Eye worm) (Filaria loa)
      89      16303  Danaus plexippus (Monarch butterfly)
      90      16264  Trichinella spiralis (Trichina worm)
      91      16239  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
      92      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
      93      16190  Drosophila sechellia (Fruit fly)
      94      16129  Colletotrichum higginsianum
      95      15983  Drosophila pseudoobscura pseudoobscura (Fruit fly)
      96      15977  Meleagris gallopavo (Common turkey)
      97      15794  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
      98      15761  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
      99      15714  Naegleria gruberi (Amoeba)
     100      15622  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     101      15622  Anopheles gambiae (African malaria mosquito)
     102      15554  Phytophthora ramorum (Sudden oak death agent)
     103      15418  Drosophila willistoni (Fruit fly)
     104      15230  Tetrahymena thermophila (strain SB210)
     105      15142  Drosophila ananassae (Fruit fly)
     106      15031  Harpegnathos saltator (Jerdon's jumping ant)
     107      14964  Hepatitis C virus subtype 1a
     108      14922  Drosophila erecta (Fruit fly)
     109      14854  Hepatitis C virus subtype 1b
     110      14854  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     111      14793  Camponotus floridanus (Florida carpenter ant)
     112      14781  Drosophila mojavensis (Fruit fly)
     113      14695  Drosophila virilis (Fruit fly)
     114      14669  Plasmodium chabaudi
     115      14649  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     116      14417  Volvox carteri (Green alga)
     117      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     118      14332  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     119      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     120      14185  Plasmodium falciparum
     121      13966  Acromyrmex echinatior (Panamanian leafcutter ant) 
     122      13863  Clonorchis sinensis (Chinese liver fluke)
     123      13767  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     124      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     125      13328  Aspergillus flavus 
     126      13268  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     127      13173  Mustela putorius furo (European domestic ferret) (Mustela furo)
     128      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     129      13042  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     130      12983  Albugo laibachii Nc14
     131      12950  Stigmatella aurantiaca (strain DW4/3-1)
     132      12936  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     133      12764  Glycine max (Soybean) (Glycine hispida)
     134      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     135      12696  Trypanosoma congolense (strain IL3000)
     136      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     137      12604  Schistosoma mansoni (Blood fluke)
     138      12583  Xenopus laevis (African clawed frog)
     139      12521  Trypanosoma cruzi
     140      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     141      12440  Polysphondylium pallidum (Cellular slime mold)
     142      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     143      12352  Dictyostelium purpureum (Slime mold)
     144      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     145      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     146      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     147      11933  Emericella nidulans  
     148      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     149      11780  Piriformospora indica (strain DSM 11827)
     150      11715  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     151      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     152      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     153      11666  Anopheles darlingi (Mosquito)
     154      11644  Plasmodium berghei (strain Anka)
     155      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     156      11562  Trichoplax adhaerens (Trichoplax reptans)
     157      11557  Trypanosoma vivax Y486
     158      11516  Helicobacter pylori (Campylobacter pylori)
     159      11514  Aureococcus anophagefferens (Harmful bloom alga)
     160      11498  Brugia malayi (Filarial nematode worm)
     161      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     162      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     163      11455  Rabies virus
     164      11289  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     165      11211  Ktedonobacter racemifer DSM 44963
     166      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     167      10998  Schistosoma japonicum (Blood fluke)
     168      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     169      10966  Streptomyces clavuligerus ATCC 27064
     170      10949  Aspergillus niger 
     171      10926  Porcine reproductive and respiratory syndrome virus (PRRSV)
     172      10839  Pediculus humanus subsp. corporis (Body louse)
     173      10820  Chaetomium globosum  
     174      10570  Metarhizium robertsii (strain ARSEF 23 / ATCC MYA-3075) (Metarhizium anisopliae)
     175      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     176      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     177      10387  Pseudomonas syringae pv. glycinea str. race 4
     178      10385  uncultured archaeon
     179      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     180      10377  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     181      10354  Phaeodactylum tricornutum (strain CCAP 1055/1)
     182      10274  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
     183      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136) 
     184      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
     185      10171  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     186      10110  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
     187      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
     188      10087  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
     189      10051  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
     190      10013  Streptomyces bingchenggensis (strain BCW-1)
     191       9846  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
     192       9836  Chlorella variabilis (Green alga)
     193       9822  Metarhizium acridum (strain CQMa 102)
     194       9760  Thielavia terrestris (strain ATCC 38088 / NRRL 8126) (Acremonium alabamense)
     195       9703  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
     196       9662  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
     197       9651  Cordyceps militaris (strain CM01) (Caterpillar fungus)
     198       9642  Klebsiella pneumoniae
     199       9551  Amycolatopsis mediterranei S699
     200       9533  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
     201       9510  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
     202       9485  Ajellomyces dermatitidis (strain ER-3 / ATCC MYA-2586) 
     203       9444  Salmo salar (Atlantic salmon)
     204       9443  Ajellomyces capsulata (strain H88) (Darling's disease fungus) 
     205       9391  Exophiala dermatitidis NIH/UT8656
     206       9341  Anolis carolinensis (Green anole) (American chameleon)
     207       9237  Monosiga brevicollis (Choanoflagellate)
     208       9201  Amycolatopsis mediterranei (strain U-32)
     209       9197  Streptomyces himastatinicus ATCC 53653
     210       9154  Ajellomyces capsulata (strain G186AR / H82 / ATCC MYA-2454 / RMSCC 2432)  
     211       9146  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
     212       9139  Pseudomonas syringae pv. pisi str. 1704B
     213       9113  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
     214       9112  Hypocrea jecorina (strain QM6a) (Trichoderma reesei)
     215       9083  Thielavia heterothallica (strain ATCC 42464 / BCRC 31852 / DSM 1799) 
     216       9076  Saccharomyces cerevisiae x Saccharomyces kudriavzevii VIN7
     217       9064  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
     218       9046  Streptomyces hygroscopicus subsp. jinggangensis (strain 5008)
     219       9009  Neurospora crassa 
     220       8992  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
     221       8988  Dictyostelium discoideum (Slime mold)
     222       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
     223       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
     224       8941  Streptomyces violaceusniger Tu 4113
     225       8940  Burkholderia sp. TJI49
     226       8900  Catenulispora acidiphila 
     227       8859  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
     228       8849  Pichia sorbitophila  
     229       8796  Aspergillus clavatus 
     230       8794  Bradyrhizobium japonicum USDA 6
     231       8783  Pseudomonas syringae pv. japonica str. M301072PT
     232       8755  Rhodococcus sp. (strain RHA1)
     233       8741  Trypanosoma brucei brucei (strain 927/4 GUTat10.1)
     234       8705  Trichophyton rubrum (strain ATCC MYA-4607 / CBS 118892) (Athlete's foot fungus)
     235       8699  Streptomyces coelicoflavus ZG0656
     236       8698  Paracoccidioides brasiliensis (strain Pb18)
     237       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
     238       8676  Trichophyton equinum (strain ATCC MYA-4606 / CBS 127.97) (Horse ringworm fungus)
     239       8661  Arthroderma otae (strain ATCC MYA-4605 / CBS 113480) (Microsporum canis)
     240       8605  Batrachochytrium dendrobatidis (strain JAM81 / FGSC 10211) (Frog chytrid fungus)
     241       8599  Entamoeba dispar (strain ATCC PRA-260 / SAW760)
     242       8520  Trichophyton tonsurans (strain CBS 112818) (Scalp ringworm fungus)
     243       8494  Nocardia brasiliensis ATCC 700358
     244       8458  uncultured crenarchaeote
     245       8437  Plesiocystis pacifica SIR-1
     246       8428  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
     247       8394  Streptomyces sp. AA4
     248       8374  Capsaspora owczarzaki (strain ATCC 30864)
     249       8320  Frankia sp. CN3
     250       8312  Entamoeba histolytica
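
The rows of table 2.2 follow a simple layout: rank, frequency, then the
species name, separated by runs of spaces. A minimal Python sketch for
reading such rows is shown below. The regular expression and the function
name parse_row are illustrative assumptions, not part of any UniProt tool,
and the three sample rows are copied from the table.

```python
import re

# rank, frequency, then the species name; trailing spaces are dropped.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(.+?)\s*$")

def parse_row(line):
    """Return (rank, frequency, species), or None for non-data lines."""
    match = ROW.match(line)
    if match is None:
        return None
    rank, frequency, species = match.groups()
    return int(rank), int(frequency), species

sample_lines = [
    "  Number  Frequency  Species",
    "  ------  ---------  --------------------------------------------",
    "       1     452538  Human immunodeficiency virus 1",
    "       2     110339  Homo sapiens (Human)",
    "      15      44070  Populus trichocarpa (Western balsam poplar) ",
]

# Header and separator lines do not match and are skipped.
rows = [row for row in map(parse_row, sample_lines) if row is not None]
for rank, frequency, species in rows:
    print(rank, frequency, species)
```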


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          359795 (  2%)
    Bacteria       14057762 ( 65%)
    Eukaryota       5734597 ( 27%)
    Viruses         1359408 (  6%)
    Other             41230 ( <1%)



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 110375 (  2%)           (  1%)
     Other Mammalia        796195 ( 14%)           (  4%)
     Other Vertebrata      635218 ( 11%)           (  3%)
     Viridiplantae        1004940 ( 18%)           (  5%)
     Fungi                1246995 ( 22%)           (  6%)
     Insecta               647207 ( 11%)           (  3%)
     Nematoda              223008 (  4%)           (  1%)
     Other                1070659 ( 19%)           (  5%)
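
The two percentage columns above use different denominators: the first is
relative to all Eukaryota sequences, the second to the complete database.
A minimal Python sketch that recomputes both from the counts in the table
(the variable names are illustrative only):

```python
TOTAL_ENTRIES = 21_552_793  # entries in release 2012_04

# Sequence counts per category, copied from the "Within Eukaryota" table.
eukaryota = {
    "Human": 110_375,
    "Other Mammalia": 796_195,
    "Other Vertebrata": 635_218,
    "Viridiplantae": 1_004_940,
    "Fungi": 1_246_995,
    "Insecta": 647_207,
    "Nematoda": 223_008,
    "Other": 1_070_659,
}

# The categories add up to the Eukaryota row of the kingdom table.
eukaryota_total = sum(eukaryota.values())

# (% of Eukaryota, % of the complete database) for each category.
shares = {
    name: (100 * n / eukaryota_total, 100 * n / TOTAL_ENTRIES)
    for name, n in eukaryota.items()
}

for name, (of_euk, of_db) in shares.items():
    print(f"{name:<18}{eukaryota[name]:>9} ({of_euk:3.0f}%) ({of_db:3.0f}%)")
```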



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  502883             1001-1100   128934
                 51- 100 1769605             1101-1200    91527
                101- 150 2001734             1201-1300    64112
                151- 200 1940381             1301-1400    41604
                201- 250 1955405             1401-1500    33594
                251- 300 1894630             1501-1600    23598
                301- 350 1729892             1601-1700    17851
                351- 400 1318465             1701-1800    13917
                401- 450 1129651             1801-1900    11555
                451- 500  938346             1901-2000     9881
                501- 550  630786             2001-2100     7864
                551- 600  488197             2101-2200     7748
                601- 650  355376             2201-2300     6250
                651- 700  278403             2301-2400     4971
                701- 750  237757             2401-2500     4235
                751- 800  211854             >2500        34658
                801- 850  160107
                851- 900  144172
                901- 950   99209
                951-1000   73845

   


   The average sequence length in UniProtKB/TrEMBL is 327 amino acids.

   The shortest sequence is G0XMK1_9MYRT: 1 amino acid.
   The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
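
The average length is consistent with the totals given in the introduction:
the total number of amino acids divided by the number of entries. A minimal
Python sketch of that calculation:

```python
total_amino_acids = 7_048_241_206  # amino acids in release 2012_04
total_entries = 21_552_793         # sequence entries in release 2012_04

# Mean sequence length over all entries, fragments included.
average_length = total_amino_acids / total_entries
print(f"average sequence length: {average_length:.0f} amino acids")
```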



4.  STATISTICS FOR SOME LINE TYPES

The following tables summarize, for selected UniProtKB/TrEMBL line types, the
total number of lines, the number of entries with at least one such line, and
the average number of such lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    26002758                1.21                                                    
   Submitted to EMBL/GenBank/DDBJ  14524039  13311501      0.67                                                    
   Journal                         10475344   9789198      0.49                                                    
   Submitted to other databases      987312    977935      0.05                                                    
   Thesis                              9593      9535     <0.01                                                    
   Book citation                       6441      6392     <0.01                                                    
   Unpublished observations              28        28     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 430368
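
The "Average per entry" column appears to be the total number of lines
divided by the number of entries in the release (21552793), rounded to two
decimals, with values that round to 0.00 shown as "<0.01". The Python sketch
below works under that assumption; the function name average_per_entry is
illustrative only.

```python
TOTAL_ENTRIES = 21_552_793  # entries in release 2012_04

def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format lines-per-entry the way the tables in this section do."""
    text = f"{total_lines / total_entries:.2f}"
    # Assumed convention: averages that round to 0.00 are shown as "<0.01".
    return "<0.01" if text == "0.00" else text

# Totals copied from the References (RL) table above.
reference_totals = {
    "References (RL)": 26_002_758,
    "Submitted to EMBL/GenBank/DDBJ": 14_524_039,
    "Journal": 10_475_344,
    "Submitted to other databases": 987_312,
    "Thesis": 9_593,
}

for name, total in reference_totals.items():
    print(f"{name:<32}{total:>10}  {average_per_entry(total):>6}")
```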


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      22595201                1.05                                                    
   CATALYTIC ACTIVITY               2014204   1857672      0.09     4                                              
   CAUTION                          7656770   7656756      0.36     1                                              
   COFACTOR                          718747    675885      0.03     8                                              
   DOMAIN                             65120     61993     <0.01     9                                              
   FUNCTION                         2242613   2080600      0.10     3                                              
   INTERACTION                          661       661     <0.01    11                                              
   MISCELLANEOUS                      37526     37442     <0.01    10                                              
   PATHWAY                           980449    889465      0.05     7                                              
   SIMILARITY                       6072424   5280201      0.28     2                                              
   SUBCELLULAR LOCATION             1780806   1708421      0.08     5                                              
   SUBUNIT                          1025881   1024224      0.05     6                                              

Total number of comment topics: 11


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       6154601                0.29                                                    
   CHAIN                             616765    494718      0.03     2                                              
   NON_TER                          5092607   3190040      0.24     1                                              
   SIGNAL                            444465    442925      0.02     3                                              
   TRANSIT                              764       764     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             240523861               11.16                                                    
   AGD                                 2525      2525     <0.01    81   Organism-specific databases                
   ANU-2DPAGE                            53        53     <0.01    98   2D gel databases                           
   Allergome                           2478      1878     <0.01    83   Protein family/group databases             
   ArachnoServer                         66        66     <0.01    97   Organism-specific databases                
   ArrayExpress                       88260     88258     <0.01    51   Gene expression databases                  
   BRENDA                              2738      2707     <0.01    79   Enzyme and pathway databases               
   Bgee                              142181    142112      0.01    47   Gene expression databases                  
   BioCyc                            670208    655827      0.03    29   Enzyme and pathway databases               
   CAZy                               74185     69705     <0.01    54   Protein family/group databases             
   CGD                                 7089      7089     <0.01    75   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     5         5     <0.01   103   2D gel databases                           
   CTD                               290142    288566      0.01    37   Organism-specific databases                
   CYGD                                   2         2     <0.01   105   Organism-specific databases                
   ConoServer                           160       160     <0.01    93   Organism-specific databases                
   DIP                                 2672      2667     <0.01    80   Protein-protein interaction databases      
   DNASU                              44010     43633     <0.01    60   Protocols and materials databases          
   EMBL                            23984951  21070508      1.11     3   Sequence databases                         
   Ensembl                           891416    875760      0.04    27   Genome annotation databases                
   EnsemblBacteria                   835289    801150      0.04    28   Genome annotation databases                
   EnsemblFungi                      167227    167012      0.01    46   Genome annotation databases                
   EnsemblMetazoa                    391491    381476      0.02    34   Genome annotation databases                
   EnsemblPlants                     274763    249327      0.01    38   Genome annotation databases                
   EnsemblProtists                    93177     91857     <0.01    50   Genome annotation databases                
   EuPathDB                          178983    178982      0.01    44   Organism-specific databases                
   EvolutionaryTrace                   8261      8261     <0.01    73   Other                                      
   FlyBase                           195544    193997      0.01    41   Organism-specific databases                
   GO                              38651690  12927459      1.79     2   Ontologies                                 
   Gene3D                           9195408   7337900      0.43     6   Family and domain databases                
   GeneID                           6908460   6781627      0.32    10   Genome annotation databases                
   GeneTree                          639664    639596      0.03    30   Phylogenomic databases                     
   Genevestigator                     94956     94950     <0.01    49   Gene expression databases                  
   GenoList                           14736     14463     <0.01    71   Organism-specific databases                
   GenomeReviews                    4252246   4153688      0.20    15   Genome annotation databases                
   Gramene                            67760     67760     <0.01    55   Organism-specific databases                
   H-InvDB                              579       474     <0.01    89   Organism-specific databases                
   HAMAP                            1802258   1783669      0.08    24   Family and domain databases                
   HGNC                               48933     48848     <0.01    59   Organism-specific databases                
   HOGENOM                          2190587   2190587      0.10    23   Phylogenomic databases                     
   HOVERGEN                          313616    313612      0.01    36   Phylogenomic databases                     
   HSSP                              251113    250884      0.01    40   3D structure databases                     
   IPI                               324441    324292      0.02    35   Sequence databases                         
   InParanoid                        190849    190832      0.01    43   Phylogenomic databases                     
   IntAct                             16798     16798     <0.01    67   Protein-protein interaction databases      
   InterPro                        45087238  16267665      2.09     1   Family and domain databases                
   KEGG                             6061453   5950015      0.28    12   Genome annotation databases                
   KO                               2306246   2295532      0.11    22   Phylogenomic databases                     
   LegioList                           5139      5111     <0.01    77   Organism-specific databases                
   Leproma                              936       935     <0.01    88   Organism-specific databases                
   MEROPS                             62671     62670     <0.01    56   Protein family/group databases             
   MGI                                37835     37580     <0.01    63   Organism-specific databases                
   MINT                                8665      8665     <0.01    72   Protein-protein interaction databases      
   NextBio                           106994    106993     <0.01    48   Other                                      
   OMA                              3305073   3305062      0.15    17   Phylogenomic databases                     
   OrthoDB                           567740    567738      0.03    31   Phylogenomic databases                     
   PANTHER                          3150148   2977688      0.15    18   Family and domain databases                
   PATRIC                           8352830   8352797      0.39     8   Genome annotation databases                
   PDB                                16147      9357     <0.01    70   3D structure databases                     
   PDBsum                             16227      9300     <0.01    69   3D structure databases                     
   PHCI-2DPAGE                           99        99     <0.01    95   2D gel databases                           
   PIR                               173782    140959      0.01    45   Sequence databases                         
   PIRSF                            1534201   1533823      0.07    25   Family and domain databases                
   PMAP-CutDB                           228       228     <0.01    91   Other                                      
   PMMA-2DPAGE                            2         2     <0.01   104   2D gel databases                           
   PRIDE                             257856    257761      0.01    39   Proteomic databases                        
   PRINTS                           3349789   2961788      0.16    16   Family and domain databases                
   PROSITE                         10611056   7019570      0.49     5   Family and domain databases                
   Pathway_Interaction_DB                11         9     <0.01   102   Enzyme and pathway databases               
   PeptideAtlas                         146       146     <0.01    94   Proteomic databases                        
   PeroxiBase                          2522      2514     <0.01    82   Protein family/group databases             
   Pfam                            20245631  14984253      0.94     4   Family and domain databases                
   PharmGKB                            5284      5284     <0.01    76   Organism-specific databases                
   PhosphoSite                         1559      1559     <0.01    86   PTM databases                              
   PhylomeDB                          56962     56959     <0.01    58   Phylogenomic databases                     
   PomBase                               40        27     <0.01    99   Organism-specific databases                
   ProDom                            401055    380235      0.02    33   Family and domain databases                
   ProMEX                               294       294     <0.01    90   Proteomic databases                        
   ProtClustDB                      2724711   2724710      0.13    20   Phylogenomic databases                     
   ProteinModelPortal               6224432   6222063      0.29    11   3D structure databases                     
   PseudoCAP                           4563      4557     <0.01    78   Organism-specific databases                
   REBASE                             26509     26503     <0.01    64   Protein family/group databases             
   REPRODUCTION-2DPAGE                   89        88     <0.01    96   2D gel databases                           
   RGD                                24880     24585     <0.01    66   Organism-specific databases                
   Reactome                             184       162     <0.01    92   Enzyme and pathway databases               
   RefSeq                           6951593   6799787      0.32     9   Sequence databases                         
   SGD                                   11        11     <0.01   101   Organism-specific databases                
   SMART                            4817138   3640072      0.22    13   Family and domain databases                
   SMR                              1073809   1073809      0.05    26   3D structure databases                     
   STRING                           2596544   2596461      0.12    21   Protein-protein interaction databases      
   SUPFAM                           8785798   7216755      0.41     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   100   2D gel databases                           
   Siena-2DPAGE                           2         2     <0.01   106   2D gel databases                           
   TAIR                               16354     16275     <0.01    68   Organism-specific databases                
   TCDB                                2407      2395     <0.01    84   Protein family/group databases             
   TIGR                              194872    187778      0.01    42   Genome annotation databases                
   TIGRFAMs                         4415361   4028625      0.20    14   Family and domain databases                
   TubercuList                         2064      2059     <0.01    85   Organism-specific databases                
   UCSC                               59737     59736     <0.01    57   Genome annotation databases                
   UniGene                           515257    486033      0.02    32   Sequence databases                         
   VectorBase                         78371     77856     <0.01    52   Genome annotation databases                
   World-2DPAGE                         936       931     <0.01    87   2D gel databases                           
   WormBase                           38343     38333     <0.01    62   Organism-specific databases                
   Xenbase                            25172     25130     <0.01    65   Organism-specific databases                
   ZFIN                               41996     41210     <0.01    61   Organism-specific databases                
   dictyBase                           7997      7775     <0.01    74   Organism-specific databases                
   eggNOG                           2781607   2781606      0.13    19   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    53   Organism-specific databases                

Number of explicitly cross-referenced databases: 133
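
The third numeric column of the table above appears to be the number of
cross-references divided by the total number of entries in the release
(21552793, from the introduction), not by the number of entries that carry
the cross-reference. The sketch below checks that reading against a few rows.
The names TOTAL_ENTRIES, XREFS, refs_per_entry and format_ratio are
illustrative only, and the "<0.01" cut-off at 0.005 is an assumption inferred
from the rows shown, not a documented rule.

```python
# Total number of entries in release 2012_04 (from the introduction).
TOTAL_ENTRIES = 21_552_793

# (number of cross-references, number of entries) copied from the table rows.
XREFS = {
    "InterPro": (45_087_238, 16_267_665),
    "Pfam": (20_245_631, 14_984_253),
    "PROSITE": (10_611_056, 7_019_570),
    "KEGG": (6_061_453, 5_950_015),
    "IntAct": (16_798, 16_798),
}


def refs_per_entry(name):
    """Cross-references per database entry, averaged over ALL entries."""
    refs, _entries = XREFS[name]
    return refs / TOTAL_ENTRIES


def format_ratio(value):
    """Format like the table: two decimals, or '<0.01' for tiny values.

    Assumption: values that would round to 0.00 are printed as '<0.01'.
    """
    return "<0.01" if value < 0.005 else f"{value:.2f}"


for name in XREFS:
    print(f"{name:<10} {format_ratio(refs_per_entry(name)):>6}")
```

Under these assumptions the computed values agree with the table for the
rows listed (for example 2.09 for InterPro and 0.28 for KEGG).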


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.60   Gln (Q) 3.93   Leu (L) 9.86   Ser (S) 6.71
   Arg (R) 5.48   Glu (E) 6.19   Lys (K) 5.26   Thr (T) 5.60
   Asn (N) 4.09   Gly (G) 7.09   Met (M) 2.46   Trp (W) 1.30
   Asp (D) 5.32   His (H) 2.22   Phe (F) 4.00   Tyr (Y) 3.03
   Cys (C) 1.28   Ile (I) 5.94   Pro (P) 4.75   Val (V) 6.76

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03

   

   [Figure omitted: composition chart colour-coded by residue class]
   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Amino acids in decreasing order of frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
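
The ordering in 5.2 follows directly from the percentages in 5.1. As a
minimal sketch, the values can be sorted in decreasing order to reproduce the
list; the names COMPOSITION and ranked are illustrative, and the ambiguity
codes (Asx, Glx, Xaa) are left out because 5.2 lists only the twenty standard
residues.

```python
# Percent composition of the twenty standard amino acids, copied from 5.1.
COMPOSITION = {
    "Ala": 8.60, "Gln": 3.93, "Leu": 9.86, "Ser": 6.71,
    "Arg": 5.48, "Glu": 6.19, "Lys": 5.26, "Thr": 5.60,
    "Asn": 4.09, "Gly": 7.09, "Met": 2.46, "Trp": 1.30,
    "Asp": 5.32, "His": 2.22, "Phe": 4.00, "Tyr": 3.03,
    "Cys": 1.28, "Ile": 5.94, "Pro": 4.75, "Val": 6.76,
}

# Rank residues from most to least frequent. No two values are equal,
# so the order does not depend on how ties would be broken.
ranked = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranked))

# Sanity check on the transcription: the twenty values should total
# close to 100% (rounding and the ambiguity codes account for the rest).
total = sum(COMPOSITION.values())
print(f"Sum of listed percentages: {total:.2f}")
```

The printed order starts with Leu, Ala, Gly and ends with His, Trp, Cys,
matching the list above.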



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 508566
Total number of entries encoded on a Plasmid: 275992
Total number of entries encoded on a Plastid: 19058
Total number of entries encoded on a Plastid; Apicoplast: 394
Total number of entries encoded on a Plastid; Chloroplast: 178654
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 772