         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2012_06 STATISTICS


1.  INTRODUCTION

Release 2012_06 of 13-Jun-2012 of UniProtKB/TrEMBL contains 22660469 sequence entries,
comprising 7407531063 amino acids.

545714 sequences have been added since release 2012_05, the sequence data of
2600 existing entries has been updated and the annotations of
3337305 entries have been revised. This represents an increase of 3%.

Number of fragments: 3314194

Protein existence (PE):              entries      %
1: Evidence at protein level           13298     0.06%
2: Evidence at transcript level       589750     2.60%
3: Inferred from homology            4991528    22.03%
4: Predicted                        17065893    75.31%
5: Uncertain                               0     0.00%
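
The percentages in the table above can be recomputed from the entry counts. The following is a minimal sketch rather than the release tooling: the name `pe_counts` and the rounding to two decimals are assumptions made for illustration.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 13298,
    "2: Evidence at transcript level": 589750,
    "3: Inferred from homology": 4991528,
    "4: Predicted": 17065893,
    "5: Uncertain": 0,
}

# The five levels together should cover every entry in the release.
total = sum(pe_counts.values())

# Share of the database per PE level, rounded to two decimals (assumed rounding).
pe_percent = {level: round(100 * n / total, 2) for level, n in pe_counts.items()}

for level, pct in pe_percent.items():
    print(f"{level:<34}{pe_counts[level]:>10}{pct:>9.2f}%")
```

With these counts, `total` equals the 22660469 entries reported at the top of this section.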

The growth of the database is summarized below.

   [Figure: growth of the database]


2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 352511

   The twenty most represented species account for 1573419 sequences, or 6.9% of the
   total number of entries.


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:14745
                            2x:60802
                            3x:32600
                            4x:20651
                            5x:13274
                            6x: 9640
                            7x: 7318
                            8x: 5541
                            9x: 4445
                           10x: 8888
                       11- 20x:23307
                       21- 50x: 8122
                       51-100x: 3119
                         >100x: 7350


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     460970  Human immunodeficiency virus 1
       2     110877  Homo sapiens (Human)
       3      96967  Oryza sativa subsp. japonica (Rice)
       4      73149  uncultured bacterium
       5      69483  Hepatitis C virus
       6      67870  Macaca mulatta (Rhesus macaque)
       7      61432  Mus musculus (Mouse)
       8      61182  Glycine max (Soybean) (Glycine hispida)
       9      54056  Vitis vinifera (Grape)
      10      53919  Danio rerio (Zebrafish) (Brachydanio rerio)
      11      52348  Hepatitis B virus (HBV)
      12      50555  Trichomonas vaginalis
      13      50127  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      14      49221  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      15      48818  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      16      44070  Populus trichocarpa (Western balsam poplar) 
      17      43300  Arabidopsis thaliana (Mouse-ear cress)
      18      43129  Callithrix jacchus (White-tufted-ear marmoset)
      19      42096  Zea mays (Maize)
      20      39850  Paramecium tetraurelia
      21      39551  Oryza sativa subsp. indica (Rice)
      22      35599  Ailuropoda melanoleuca (Giant panda)
      23      34801  Physcomitrella patens subsp. patens (Moss)
      24      33924  Rattus norvegicus (Rat)
      25      33874  Drosophila melanogaster (Fruit fly)
      26      33739  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      27      33269  Selaginella moellendorffii (Spikemoss)
      28      32920  Monodelphis domestica (Gray short-tailed opossum)
      29      32667  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      30      32093  Oryza glaberrima (African rice)
      31      31827  Caenorhabditis remanei (Caenorhabditis vulgaris)
      32      31382  Ricinus communis (Castor bean)
      33      30550  Daphnia pulex (Water flea)
      34      30300  Caenorhabditis brenneri (Nematode worm)
      35      30143  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      36      29813  Amphimedon queenslandica (Sponge)
      37      29444  Strongylocentrotus purpuratus (Purple sea urchin)
      38      29315  Pristionchus pacificus
      39      29164  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      40      29026  Oikopleura dioica (Tunicate)
      41      28861  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      42      28036  Gasterosteus aculeatus (Three-spined stickleback)
      43      27961  Bos taurus (Bovine)
      44      27875  Canis familiaris (Dog) (Canis lupus familiaris)
      45      27308  Simian immunodeficiency virus (SIV)
      46      27086  Gorilla gorilla gorilla (Lowland gorilla)
      47      26871  Ornithorhynchus anatinus (Duckbill platypus)
      48      26698  Gallus gallus (Chicken)
      49      25867  Oryzias latipes (Medaka fish) (Japanese ricefish)
      50      25755  Loxodonta africana (African elephant)
      51      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      52      25438  Caenorhabditis japonica
      53      25056  Oryctolagus cuniculus (Rabbit)
      54      25018  Sus scrofa (Pig)
      55      24828  Nematostella vectensis (Starlet sea anemone)
      56      24360  Escherichia coli
      57      24198  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      58      24057  Pongo abelii (Sumatran orangutan)
      59      23997  Equus caballus (Horse)
      60      23219  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      61      23158  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      62      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      63      22834  Pan troglodytes (Chimpanzee)
      64      22520  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      65      22281  Caenorhabditis elegans
      66      21863  Latimeria chalumnae (West Indian ocean coelacanth)
      67      21661  Hordeum vulgare var. distichum (Two-rowed barley)
      68      21546  Heterocephalus glaber (Naked mole rat)
      69      21339  Caenorhabditis briggsae
      70      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
      71      20853  Myotis lucifugus (Little brown bat)
      72      20126  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      73      20110  Ciona savignyi (Pacific transparent sea squirt)
      74      20050  Cavia porcellus (Guinea pig)
      75      19651  Taeniopygia guttata (Zebra finch) (Poephila guttata)
      76      19242  Toxoplasma gondii
      77      19201  Trypanosoma cruzi (strain CL Brener)
      78      19049  Anolis carolinensis (Green anole) (American chameleon)
      79      19012  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
      80      18911  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      81      18771  mine drainage metagenome
      82      18632  Drosophila simulans (Fruit fly)
      83      18116  Atta cephalotes (Leafcutter ant)
      84      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
      85      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
      86      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
      87      17329  Bombyx mori (Silk moth)
      88      17031  Drosophila yakuba (Fruit fly)
      89      16998  Tribolium castaneum (Red flour beetle)
      90      16968  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
      91      16856  Meleagris gallopavo (Common turkey)
      92      16766  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
      93      16712  Drosophila persimilis (Fruit fly)
      94      16425  Ectocarpus siliculosus (Brown alga)
      95      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
      96      16306  Loa loa (Eye worm) (Filaria loa)
      97      16303  Danaus plexippus (Monarch butterfly)
      98      16264  Trichinella spiralis (Trichina worm)
      99      16239  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
     100      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     101      16190  Drosophila sechellia (Fruit fly)
     102      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     103      15983  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     104      15794  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     105      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     106      15714  Naegleria gruberi (Amoeba)
     107      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     108      15619  Anopheles gambiae (African malaria mosquito)
     109      15557  Phytophthora ramorum (Sudden oak death agent)
     110      15418  Drosophila willistoni (Fruit fly)
     111      15417  Hepatitis C virus subtype 1b
     112      15230  Tetrahymena thermophila (strain SB210)
     113      15142  Drosophila ananassae (Fruit fly)
     114      15031  Harpegnathos saltator (Jerdon's jumping ant)
     115      14964  Hepatitis C virus subtype 1a
     116      14922  Drosophila erecta (Fruit fly)
     117      14849  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     118      14793  Camponotus floridanus (Florida carpenter ant)
     119      14781  Drosophila mojavensis (Fruit fly)
     120      14697  Plasmodium chabaudi
     121      14695  Drosophila virilis (Fruit fly)
     122      14649  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     123      14417  Volvox carteri (Green alga)
     124      14411  Plasmodium falciparum
     125      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     126      14333  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     127      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     128      13966  Acromyrmex echinatior (Panamanian leafcutter ant) 
     129      13863  Clonorchis sinensis (Chinese liver fluke)
     130      13767  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     131      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     132      13329  Aspergillus flavus 
     133      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     134      13175  Mustela putorius furo (European domestic ferret) (Mustela furo)
     135      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     136      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     137      12983  Albugo laibachii Nc14
     138      12950  Stigmatella aurantiaca (strain DW4/3-1)
     139      12936  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     140      12936  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     141      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     142      12696  Trypanosoma congolense (strain IL3000)
     143      12683  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     144      12597  Schistosoma mansoni (Blood fluke)
     145      12588  Xenopus laevis (African clawed frog)
     146      12530  Trypanosoma cruzi
     147      12476  Ralstonia solanacearum (Pseudomonas solanacearum)
     148      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     149      12440  Polysphondylium pallidum (Cellular slime mold)
     150      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     151      12352  Dictyostelium purpureum (Slime mold)
     152      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     153      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     154      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     155      11947  Emericella nidulans  
     156      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     157      11780  Piriformospora indica (strain DSM 11827)
     158      11755  Apis mellifera (Honeybee)
     159      11715  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     160      11708  Helicobacter pylori (Campylobacter pylori)
     161      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     162      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     163      11673  Rabies virus
     164      11666  Anopheles darlingi (Mosquito)
     165      11644  Plasmodium berghei (strain Anka)
     166      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     167      11562  Trichoplax adhaerens (Trichoplax reptans)
     168      11557  Trypanosoma vivax Y486
     169      11514  Aureococcus anophagefferens (Harmful bloom alga)
     170      11498  Brugia malayi (Filarial nematode worm)
     171      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     172      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     173      11289  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     174      11211  Ktedonobacter racemifer DSM 44963
     175      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     176      11042  Schistosoma japonicum (Blood fluke)
     177      11029  Porcine reproductive and respiratory syndrome virus (PRRSV)
     178      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     179      10966  Streptomyces clavuligerus ATCC 27064
     180      10949  Aspergillus niger 
     181      10839  Pediculus humanus subsp. corporis (Body louse)
     182      10822  Chaetomium globosum  
     183      10660  uncultured archaeon
     184      10570  Metarhizium robertsii (strain ARSEF 23 / ATCC MYA-3075) (Metarhizium anisopliae)
     185      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     186      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     187      10387  Pseudomonas syringae pv. glycinea str. race 4
     188      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     189      10377  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     190      10354  Phaeodactylum tricornutum (strain CCAP 1055/1)
     191      10273  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
     192      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136) 
     193      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
     194      10171  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     195      10109  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
     196      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
     197      10087  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
     198      10051  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
     199      10013  Streptomyces bingchenggensis (strain BCW-1)
     200       9846  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
     201       9836  Chlorella variabilis (Green alga)
     202       9822  Metarhizium acridum (strain CQMa 102)
     203       9799  Coccomyxa subellipsoidea C-169
     204       9760  Thielavia terrestris (strain ATCC 38088 / NRRL 8126) (Acremonium alabamense)
     205       9703  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
     206       9662  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
     207       9660  Klebsiella pneumoniae
     208       9651  Cordyceps militaris (strain CM01) (Caterpillar fungus)
     209       9597  Streptomyces cattleya 
     210       9551  Amycolatopsis mediterranei S699
     211       9533  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
     212       9510  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
     213       9485  Ajellomyces dermatitidis (strain ER-3 / ATCC MYA-2586) 
     214       9470  Salmo salar (Atlantic salmon)
     215       9443  Ajellomyces capsulata (strain H88) (Darling's disease fungus) 
     216       9391  Exophiala dermatitidis (strain ATCC 34100 / CBS 525.76 / NIH/UT8656)  
     217       9236  Monosiga brevicollis (Choanoflagellate)
     218       9201  Amycolatopsis mediterranei (strain U-32)
     219       9197  Streptomyces himastatinicus ATCC 53653
     220       9154  Ajellomyces capsulata (strain G186AR / H82 / ATCC MYA-2454 / RMSCC 2432)  
     221       9146  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
     222       9139  Pseudomonas syringae pv. pisi str. 1704B
     223       9113  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
     224       9112  Hypocrea jecorina (strain QM6a) (Trichoderma reesei)
     225       9083  Thielavia heterothallica (strain ATCC 42464 / BCRC 31852 / DSM 1799) 
     226       9076  Saccharomyces cerevisiae x Saccharomyces kudriavzevii VIN7
     227       9064  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
     228       9046  Streptomyces hygroscopicus subsp. jinggangensis (strain 5008)
     229       9010  Neurospora crassa 
     230       8992  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
     231       8988  Dictyostelium discoideum (Slime mold)
     232       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
     233       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
     234       8941  Streptomyces violaceusniger Tu 4113
     235       8940  Burkholderia sp. TJI49
     236       8900  Catenulispora acidiphila 
     237       8859  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
     238       8849  Pichia sorbitophila  
     239       8796  Aspergillus clavatus 
     240       8794  Bradyrhizobium japonicum USDA 6
     241       8783  Pseudomonas syringae pv. japonica str. M301072PT
     242       8755  Rhodococcus sp. (strain RHA1)
     243       8738  Trypanosoma brucei brucei (strain 927/4 GUTat10.1)
     244       8705  Trichophyton rubrum (strain ATCC MYA-4607 / CBS 118892) (Athlete's foot fungus)
     245       8699  Streptomyces coelicoflavus ZG0656
     246       8698  Paracoccidioides brasiliensis (strain Pb18)
     247       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
     248       8676  Trichophyton equinum (strain ATCC MYA-4606 / CBS 127.97) (Horse ringworm fungus)
     249       8661  Arthroderma otae (strain ATCC MYA-4605 / CBS 113480) (Microsporum canis)
     250       8627  uncultured crenarchaeote
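
   Rows of the table above follow a fixed layout (rank, frequency, species name), so
   they can be read with a small parser. The sketch below is one possible approach; the
   pattern `ROW` and the function `parse_species_rows` are hypothetical names, and the
   column layout is inferred from the table itself rather than from a format specification.

```python
import re

# One data row: rank, frequency, then the species name (layout inferred from the table).
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(.+?)\s*$")

def parse_species_rows(text):
    """Return (rank, frequency, species) tuples for every data row in `text`."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header and separator lines contain no leading digits and are skipped
            rank, freq, species = match.groups()
            rows.append((int(rank), int(freq), species))
    return rows

# First three rows of the table, including its header, used as sample input.
sample = """\
  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     460970  Human immunodeficiency virus 1
       2     110877  Homo sapiens (Human)
       3      96967  Oryza sativa subsp. japonica (Rice)
"""

rows = parse_species_rows(sample)
top3_total = sum(freq for _, freq, _ in rows)
print(rows[0], top3_total)
```

   Summing the frequency column over the first twenty parsed rows reproduces the
   1573419 sequences quoted at the start of this section.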


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          364432 (  2%)
    Bacteria       14809737 ( 65%)
    Eukaryota       6044091 ( 27%)
    Viruses         1400811 (  6%)
    Other             41397 ( <1%)



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 110913 (  2%)           (  0%)
     Other Mammalia        817943 ( 14%)           (  4%)
     Other Vertebrata      657347 ( 11%)           (  3%)
     Viridiplantae        1142329 ( 19%)           (  5%)
     Fungi                1287145 ( 21%)           (  6%)
     Insecta               699291 ( 12%)           (  3%)
     Nematoda              223516 (  4%)           (  1%)
     Other                1105607 ( 18%)           (  5%)
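
   Each category in the table above is expressed against two denominators: the
   Eukaryota total and the complete database. A short sketch of that calculation
   follows; the names `eukaryota`, `pct` and `shares` are illustrative, and rounding
   to the nearest whole percent is an assumption that happens to match every row shown.

```python
DATABASE_TOTAL = 22660469  # total entries in this release (section 1)

# Sequence counts per category, copied from the table above.
eukaryota = {
    "Human": 110913,
    "Other Mammalia": 817943,
    "Other Vertebrata": 657347,
    "Viridiplantae": 1142329,
    "Fungi": 1287145,
    "Insecta": 699291,
    "Nematoda": 223516,
    "Other": 1105607,
}

eukaryota_total = sum(eukaryota.values())

def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to a whole percent."""
    return round(100 * part / whole)

# Map each category to (% of Eukaryota, % of the complete database).
shares = {
    name: (pct(n, eukaryota_total), pct(n, DATABASE_TOTAL))
    for name, n in eukaryota.items()
}

for name, (of_euk, of_db) in shares.items():
    print(f"{name:<18}{of_euk:>4}%{of_db:>6}%")
```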



3.  SEQUENCE SIZE

   Repartition of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  526351             1001-1100   135374
                 51- 100 1865318             1101-1200    95930
                101- 150 2110429             1201-1300    67187
                151- 200 2046185             1301-1400    43570
                201- 250 2064259             1401-1500    35125
                251- 300 1998644             1501-1600    24702
                301- 350 1819596             1601-1700    18684
                351- 400 1389771             1701-1800    14597
                401- 450 1192841             1801-1900    12122
                451- 500  986044             1901-2000    10317
                501- 550  662920             2001-2100     8186
                551- 600  513505             2101-2200     8153
                601- 650  374823             2201-2300     6539
                651- 700  293597             2301-2400     5163
                701- 750  250714             2401-2500     4406
                751- 800  223008             >2500        36018
                801- 850  168717
                851- 900  151453
                901- 950  104399
                951-1000   77628
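
   Because the table above excludes fragments, its bins should add up to the number
   of entries minus the number of fragments given in section 1. The sketch below checks
   this; the variable names are illustrative, and the counts are copied from the table.

```python
ENTRIES = 22660469    # total entries in the release (section 1)
FRAGMENTS = 3314194   # number of fragments (section 1)

# Bin counts in table order: 50-residue bins from 1 to 1000,
# 100-residue bins from 1001 to 2500, then the >2500 bin.
size_bins = [
    526351, 1865318, 2110429, 2046185, 2064259,
    1998644, 1819596, 1389771, 1192841, 986044,
    662920, 513505, 374823, 293597, 250714,
    223008, 168717, 151453, 104399, 77628,
    135374, 95930, 67187, 43570, 35125,
    24702, 18684, 14597, 12122, 10317,
    8186, 8153, 6539, 5163, 4406,
    36018,
]

non_fragment_entries = ENTRIES - FRAGMENTS
binned_total = sum(size_bins)

print(binned_total, non_fragment_entries, binned_total == non_fragment_entries)
```

   For this release the two totals agree, which supports reading the table as a
   complete count of all non-fragment entries.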

   


   The average sequence length in UniProtKB/TrEMBL is   326 amino acids.

   The shortest sequence is G0XMK1_9MYRT:     1 amino acid.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
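
   The average length quoted above is consistent with dividing the total number of
   amino acids by the number of entries (both from section 1) and dropping the
   fractional part. Whether the release tooling truncates in this way is an
   assumption; the sketch only shows that the arithmetic agrees.

```python
TOTAL_AMINO_ACIDS = 7407531063  # total amino acids in the release (section 1)
ENTRIES = 22660469              # total entries in the release (section 1)

# Exact mean length, and the same value truncated to a whole number (assumed behaviour).
mean_length = TOTAL_AMINO_ACIDS / ENTRIES
mean_length_truncated = TOTAL_AMINO_ACIDS // ENTRIES

print(f"mean = {mean_length:.1f}, truncated = {mean_length_truncated}")
```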



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    27512173                1.21                                                    
   Submitted to EMBL/GenBank/DDBJ  15169950  13873929      0.67                                                    
   Journal                         11122174  10390977      0.49                                                    
   Submitted to other databases     1203785   1194242      0.05                                                    
   Thesis                              9772      9713     <0.01                                                    
   Book citation                       6466      6417     <0.01                                                    
   Unpublished observations              25        25     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 435507
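
The "Average per entry" column in the tables of this section appears to be the total number of lines divided by the total number of entries in the release, printed with two decimals, with values that would print as 0.00 shown as "<0.01". This rule is inferred from the rows shown here, not taken from the release tooling, and the function name `average_per_entry` is hypothetical.

```python
ENTRIES = 22660469  # total entries in the release (section 1)

def average_per_entry(total_lines, entries=ENTRIES):
    """Format lines-per-entry as in the tables above (inferred rule)."""
    text = f"{total_lines / entries:.2f}"
    # Values that round to 0.00 are reported as "<0.01" in the tables.
    return "<0.01" if text == "0.00" else text

# Totals copied from the References (RL) table above.
print(average_per_entry(27512173))  # all references
print(average_per_entry(15169950))  # submitted to EMBL/GenBank/DDBJ
print(average_per_entry(9772))      # thesis
```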


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      24077543                1.06                                                    
   CATALYTIC ACTIVITY               2163029   1987245      0.10     4                                              
   CAUTION                          8214835   8214818      0.36     1                                              
   COFACTOR                          778296    731932      0.03     8                                              
   DOMAIN                             71511     68195     <0.01     9                                              
   FUNCTION                         2380335   2218609      0.11     3                                              
   INTERACTION                          757       757     <0.01    11                                              
   MISCELLANEOUS                      39392     39313     <0.01    10                                              
   PATHWAY                          1056355    960623      0.05     7                                              
   SIMILARITY                       6401629   5591980      0.28     2                                              
   SUBCELLULAR LOCATION             1878580   1799622      0.08     5                                              
   SUBUNIT                          1092824   1091083      0.05     6                                              

Total number of comment topics: 11


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       6368750                0.28                                                    
   CHAIN                             632633    506221      0.03     2                                              
   NON_TER                          5278584   3314694      0.23     1                                              
   SIGNAL                            456765    455255      0.02     3                                              
   TRANSIT                              768       768     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             253978590               11.21                                                    
   AGD                                 2525      2525     <0.01    83   Organism-specific databases                
   ANU-2DPAGE                            52        52     <0.01    98   2D gel databases                           
   Allergome                           2804      2195     <0.01    79   Protein family/group databases             
   ArachnoServer                         66        66     <0.01    97   Organism-specific databases                
   ArrayExpress                       87821     87767     <0.01    52   Gene expression databases                  
   BRENDA                              2734      2703     <0.01    80   Enzyme and pathway databases               
   Bgee                              141404    141266      0.01    47   Gene expression databases                  
   BioCyc                            670488    656076      0.03    30   Enzyme and pathway databases               
   CAZy                               74198     69716     <0.01    55   Protein family/group databases             
   CGD                                 7083      7083     <0.01    75   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     5         5     <0.01   104   2D gel databases                           
   CTD                               291289    289592      0.01    38   Organism-specific databases                
   CYGD                                   2         2     <0.01   106   Organism-specific databases                
   ConoServer                           160       160     <0.01    93   Organism-specific databases                
   DIP                                 2672      2667     <0.01    81   Protein-protein interaction databases      
   DNASU                              44287     43909     <0.01    60   Protocols and materials databases          
   EMBL                            25060425  21969452      1.11     3   Sequence databases                         
   Ensembl                           915041    897632      0.04    27   Genome annotation databases                
   EnsemblBacteria                   837975    802803      0.04    29   Genome annotation databases                
   EnsemblFungi                      197689    197359      0.01    41   Genome annotation databases                
   EnsemblMetazoa                    473503    467046      0.02    33   Genome annotation databases                
   EnsemblPlants                     296320    288216      0.01    37   Genome annotation databases                
   EnsemblProtists                   100023     98984     <0.01    50   Genome annotation databases                
   EuPathDB                          178974    178973      0.01    45   Organism-specific databases                
   EvolutionaryTrace                   8253      8253     <0.01    73   Other                                      
   FlyBase                           195524    193977      0.01    42   Organism-specific databases                
   GO                              41621895  13960783      1.84     2   Ontologies                                 
   Gene3D                           9569473   7622825      0.42     6   Family and domain databases                
   GeneID                           7437118   7287509      0.33    10   Genome annotation databases                
   GeneTree                          845121    845053      0.04    28   Phylogenomic databases                     
   Genevestigator                     94677     94671     <0.01    51   Gene expression databases                  
   GenoList                           14736     14463     <0.01    71   Organism-specific databases                
   GenomeReviews                    4252888   4154115      0.19    15   Genome annotation databases                
   Gramene                            67725     67725     <0.01    56   Organism-specific databases                
   H-InvDB                              644       490     <0.01    89   Organism-specific databases                
   HAMAP                            1947303   1927129      0.09    24   Family and domain databases                
   HGNC                               48573     48486     <0.01    59   Organism-specific databases                
   HOGENOM                          3660812   3660812      0.16    16   Phylogenomic databases                     
   HOVERGEN                          312794    312786      0.01    36   Phylogenomic databases                     
   HSSP                              250947    250720      0.01    39   3D structure databases                     
   IPI                               323100    322950      0.01    35   Sequence databases                         
   InParanoid                        190682    190556      0.01    44   Phylogenomic databases                     
   IntAct                             16729     16729     <0.01    67   Protein-protein interaction databases      
   InterPro                        47707019  17182039      2.11     1   Family and domain databases                
   KEGG                             6118111   6001203      0.27    11   Genome annotation databases                
   KO                               2329377   2318711      0.10    23   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    76   Organism-specific databases                
   Leproma                              936       935     <0.01    88   Organism-specific databases                
   MEROPS                             62654     62652     <0.01    57   Protein family/group databases             
   MGI                                37926     37663     <0.01    63   Organism-specific databases                
   MINT                                8625      8625     <0.01    72   Protein-protein interaction databases      
   NextBio                           106072    106071     <0.01    48   Other                                      
   OMA                              3304053   3304042      0.15    19   Phylogenomic databases                     
   OrthoDB                           567596    567594      0.03    31   Phylogenomic databases                     
   PANTHER                          3327543   3144983      0.15    18   Family and domain databases                
   PATRIC                           8351679   8351647      0.37     8   Genome annotation databases                
   PDB                                16153      9362     <0.01    70   3D structure databases                     
   PDBsum                             16206      9288     <0.01    69   3D structure databases                     
   PHCI-2DPAGE                           99        99     <0.01    95   2D gel databases                           
   PIR                               173812    140982      0.01    46   Sequence databases                         
   PIRSF                            1653788   1653354      0.07    25   Family and domain databases                
   PMAP-CutDB                           221       221     <0.01    91   Other                                      
   PMMA-2DPAGE                            2         2     <0.01   105   2D gel databases                           
   PRIDE                             218401    218401      0.01    40   Proteomic databases                        
   PRINTS                           3528970   3124931      0.16    17   Family and domain databases                
   PROSITE                         11245165   7424321      0.50     5   Family and domain databases                
   Pathway_Interaction_DB                11         9     <0.01   103   Enzyme and pathway databases               
   PeptideAtlas                         144       144     <0.01    94   Proteomic databases                        
   PeroxiBase                          2541      2533     <0.01    82   Protein family/group databases             
   Pfam                            21519785  15882466      0.95     4   Family and domain databases                
   PharmGKB                            4796      4796     <0.01    77   Organism-specific databases                
   PhosphoSite                         1545      1545     <0.01    86   PTM databases                              
   PhylomeDB                         100570    100570     <0.01    49   Phylogenomic databases                     
   PomBase                               40        27     <0.01    99   Organism-specific databases                
   PptaseDB                              37        35     <0.01   100   Protein family/group databases             
   ProDom                            426413    405460      0.02    34   Family and domain databases                
   ProMEX                               287       287     <0.01    90   Proteomic databases                        
   ProtClustDB                      2724141   2724126      0.12    21   Phylogenomic databases                     
   ProteinModelPortal               6053248   6051518      0.27    12   3D structure databases                     
   PseudoCAP                           4559      4553     <0.01    78   Organism-specific databases                
   REBASE                             27414     27361     <0.01    64   Protein family/group databases             
   REPRODUCTION-2DPAGE                   86        85     <0.01    96   2D gel databases                           
   RGD                                24929     24625     <0.01    66   Organism-specific databases                
   Reactome                             192       169     <0.01    92   Enzyme and pathway databases               
   RefSeq                           7469775   7290233      0.33     9   Sequence databases                         
   SGD                                   11        11     <0.01   102   Organism-specific databases                
   SMART                            5068575   3830023      0.22    13   Family and domain databases                
   SMR                              1099582   1099582      0.05    26   3D structure databases                     
   STRING                           2595613   2595524      0.11    22   Protein-protein interaction databases      
   SUPFAM                           9248127   7598290      0.41     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   101   2D gel databases                           
   Siena-2DPAGE                           2         2     <0.01   107   2D gel databases                           
   TAIR                               16253     16174     <0.01    68   Organism-specific databases                
   TCDB                                2412      2400     <0.01    84   Protein family/group databases             
   TIGR                              195179    187907      0.01    43   Genome annotation databases                
   TIGRFAMs                         4720126   4305972      0.21    14   Family and domain databases                
   TubercuList                         2064      2059     <0.01    85   Organism-specific databases                
   UCSC                               58954     58953     <0.01    58   Genome annotation databases                
   UniGene                           534253    501803      0.02    32   Sequence databases                         
   VectorBase                         78371     77856     <0.01    53   Genome annotation databases                
   World-2DPAGE                         936       931     <0.01    87   2D gel databases                           
   WormBase                           38297     38287     <0.01    62   Organism-specific databases                
   Xenbase                            25668     25593     <0.01    65   Organism-specific databases                
   ZFIN                               43150     42891     <0.01    61   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    74   Organism-specific databases                
   eggNOG                           2781133   2781132      0.12    20   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    54   Organism-specific databases                

Number of explicitly cross-referenced databases: 133
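
The ratio column in the table above appears to equal the number of
cross-references divided by the total number of entries in the release
(22660469, from the introduction). This reading is inferred from the
figures, not stated in this section, and the function names below are
hypothetical. A minimal Python sketch under that assumption:

```python
# Total entries in release 2012_06, as given in the introduction.
TOTAL_ENTRIES = 22660469


def xrefs_per_entry(xref_count, total_entries=TOTAL_ENTRIES):
    """Average number of cross-references per database entry (assumed definition)."""
    return xref_count / total_entries


def format_ratio(ratio):
    """Render a ratio with two decimals, or '<0.01' when it rounds to zero."""
    text = f"{ratio:.2f}"
    return "<0.01" if text == "0.00" else text


# Cross-reference counts copied from the table above.
samples = [
    ("InterPro", 47707019),
    ("Pfam", 21519785),
    ("KEGG", 6118111),
    ("IntAct", 16729),
]

for name, count in samples:
    print(f"{name:<10} {format_ratio(xrefs_per_entry(count))}")
```

For the four sample rows the computed values agree with the ratio column
as printed in the table.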


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.58   Gln (Q) 3.93   Leu (L) 9.87   Ser (S) 6.72
   Arg (R) 5.46   Glu (E) 6.20   Lys (K) 5.30   Thr (T) 5.59
   Asn (N) 4.10   Gly (G) 7.07   Met (M) 2.45   Trp (W) 1.30
   Asp (D) 5.32   His (H) 2.21   Phe (F) 4.01   Tyr (Y) 3.04
   Cys (C) 1.28   Ile (I) 5.96   Pro (P) 4.73   Val (V) 6.75

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03

   

   [Figure omitted: color-coded amino acid composition]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
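
   The ordering above can be reproduced by sorting the percentages of
   section 5.1 in descending order. A minimal Python sketch, with the
   values copied from that table and illustrative variable names:

```python
# Percent composition copied from section 5.1.
composition = {
    "Ala": 8.58, "Gln": 3.93, "Leu": 9.87, "Ser": 6.72,
    "Arg": 5.46, "Glu": 6.20, "Lys": 5.30, "Thr": 5.59,
    "Asn": 4.10, "Gly": 7.07, "Met": 2.45, "Trp": 1.30,
    "Asp": 5.32, "His": 2.21, "Phe": 4.01, "Tyr": 3.04,
    "Cys": 1.28, "Ile": 5.96, "Pro": 4.73, "Val": 6.75,
}

# Sort residue names from most to least frequent.
ranked = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranked))
```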



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 524782
Total number of entries encoded on a Plasmid: 284173
Total number of entries encoded on a Plastid: 19527
Total number of entries encoded on a Plastid; Apicoplast: 633
Total number of entries encoded on a Plastid; Chloroplast: 186538
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 861