         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2012_05 STATISTICS


1.  INTRODUCTION

Release 2012_05 of UniProtKB/TrEMBL, dated 16-May-2012, contains 22128511 sequence entries,
comprising 7226807757 amino acids.

Since release 2012_04, 596191 sequences have been added, the sequence data of
525 existing entries have been updated, and the annotations of
11707722 entries have been revised. The additions represent an increase of 3% in the number of entries.
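A minimal Python sketch of the arithmetic behind the 3% figure. It assumes the increase is measured against the entry count of release 2012_04, and that no entries were deleted between the two releases (deletions are not reported here), so the previous count is taken as the current total minus the additions:

```python
# Entry counts quoted in the release notes above.
total_2012_05 = 22128511
added_since_2012_04 = 596191

# Assumption: no entries were deleted, so the previous release
# held the current total minus the newly added sequences.
total_2012_04 = total_2012_05 - added_since_2012_04

increase_pct = 100 * added_since_2012_04 / total_2012_04
print(f"Increase: {increase_pct:.2f}% (rounded: {round(increase_pct)}%)")
```

Under that assumption the exact increase is just under 2.8%, which rounds to the 3% quoted.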

Number of fragments: 3260498

Protein existence (PE):              entries      %
1: Evidence at protein level           13517     0.06%
2: Evidence at transcript level       583804     2.64%
3: Inferred from homology            4659456    21.06%
4: Predicted                        16871734    76.24%
5: Uncertain                               0     0.00%
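The protein existence table can be re-derived as follows. In UniProtKB every entry carries exactly one PE level, so the five counts should add up to the 22128511 entries, and each percentage is the count divided by that total. The level labels are copied from the table; the code itself is only an illustrative sketch:

```python
pe_counts = {
    "1: Evidence at protein level": 13517,
    "2: Evidence at transcript level": 583804,
    "3: Inferred from homology": 4659456,
    "4: Predicted": 16871734,
    "5: Uncertain": 0,
}
total_entries = 22128511

# Each entry has exactly one PE level, so the counts must cover every entry.
assert sum(pe_counts.values()) == total_entries

for level, count in pe_counts.items():
    # Percentage of all entries, printed to two decimals as in the table.
    print(f"{level:<34}{count:>10}{100 * count / total_entries:>9.2f}%")
```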

The growth of the database is summarized below.

   [Figure: growth of the database]



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 348753

   The twenty most represented species account for 1540110 sequences, 7% of the
   total number of entries.
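The 7% share can be checked against the first twenty rows of table 2.2. This sketch assumes "the first twenty species" means ranks 1 to 20 of that table and divides their summed frequencies by the total number of entries:

```python
# Frequencies of ranks 1-20 in table 2.2 (assumed to be "the first twenty species").
top20 = [456950, 110235, 96999, 72467, 68541, 64858, 61451, 54052, 53932, 50483,
         50130, 49860, 49221, 48816, 44070, 43425, 43129, 42095, 39850, 39546]
total_entries = 22128511

top20_sum = sum(top20)
share = 100 * top20_sum / total_entries
print(top20_sum, f"{share:.2f}%")
```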


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:146143
                            2x:60217
                            3x:32296
                            4x:20470
                            5x:12992
                            6x: 9562
                            7x: 7306
                            8x: 5489
                            9x: 4384
                           10x: 8782
                       11- 20x:22799
                       21- 50x: 7998
                       51-100x: 3086
                         >100x: 7229
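The buckets of table 2.1 should add up to the 348753 species reported above. The sketch below checks this with the "1x" bucket taken as 146143, which is the only value that makes the buckets sum to that total; the last lines derive that value from the other buckets:

```python
# Number of species represented by a given number of entries (table 2.1).
# The "1x" bucket uses 146143, the value consistent with the species total.
species_by_occurrence = {
    "1x": 146143, "2x": 60217, "3x": 32296, "4x": 20470, "5x": 12992,
    "6x": 9562, "7x": 7306, "8x": 5489, "9x": 4384, "10x": 8782,
    "11-20x": 22799, "21-50x": 7998, "51-100x": 3086, ">100x": 7229,
}
total_species = 348753

bucket_sum = sum(species_by_occurrence.values())
print(bucket_sum, bucket_sum == total_species)

# Value the 1x bucket must take for all buckets to add up to the total.
others = bucket_sum - species_by_occurrence["1x"]
print("implied 1x bucket:", total_species - others)
```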


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     456950  Human immunodeficiency virus 1
       2     110235  Homo sapiens (Human)
       3      96999  Oryza sativa subsp. japonica (Rice)
       4      72467  uncultured bacterium
       5      68541  Hepatitis C virus
       6      64858  Macaca mulatta (Rhesus macaque)
       7      61451  Mus musculus (Mouse)
       8      54052  Vitis vinifera (Grape)
       9      53932  Danio rerio (Zebrafish) (Brachydanio rerio)
      10      50483  Trichomonas vaginalis
      11      50130  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      12      49860  Hepatitis B virus (HBV)
      13      49221  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      14      48816  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      15      44070  Populus trichocarpa (Western balsam poplar) 
      16      43425  Arabidopsis thaliana (Mouse-ear cress)
      17      43129  Callithrix jacchus (White-tufted-ear marmoset)
      18      42095  Zea mays (Maize)
      19      39850  Paramecium tetraurelia
      20      39546  Oryza sativa subsp. indica (Rice)
      21      35599  Ailuropoda melanoleuca (Giant panda)
      22      34801  Physcomitrella patens subsp. patens (Moss)
      23      33939  Rattus norvegicus (Rat)
      24      33797  Drosophila melanogaster (Fruit fly)
      25      33735  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      26      33272  Selaginella moellendorffii (Spikemoss)
      27      32917  Monodelphis domestica (Gray short-tailed opossum)
      28      32672  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      29      31827  Caenorhabditis remanei (Caenorhabditis vulgaris)
      30      31381  Ricinus communis (Castor bean)
      31      30550  Daphnia pulex (Water flea)
      32      30300  Caenorhabditis brenneri (Nematode worm)
      33      29430  Strongylocentrotus purpuratus (Purple sea urchin)
      34      29315  Pristionchus pacificus
      35      29164  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      36      29026  Oikopleura dioica (Tunicate)
      37      28865  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      38      28036  Gasterosteus aculeatus (Three-spined stickleback)
      39      27959  Bos taurus (Bovine)
      40      27787  Canis familiaris (Dog) (Canis lupus familiaris)
      41      27308  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
      42      27086  Gorilla gorilla gorilla (Lowland gorilla)
      43      26871  Ornithorhynchus anatinus (Duckbill platypus)
      44      26682  Gallus gallus (Chicken)
      45      25867  Oryzias latipes (Medaka fish) (Japanese ricefish)
      46      25755  Loxodonta africana (African elephant)
      47      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      48      25438  Caenorhabditis japonica
      49      25051  Oryctolagus cuniculus (Rabbit)
      50      24976  Sus scrofa (Pig)
      51      24825  Nematostella vectensis (Starlet sea anemone)
      52      24285  Escherichia coli
      53      24188  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      54      24056  Pongo abelii (Sumatran orangutan)
      55      23997  Equus caballus (Horse)
      56      23219  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      57      23158  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      58      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      59      22830  Pan troglodytes (Chimpanzee)
      60      22520  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      61      22310  Caenorhabditis elegans
      62      21863  Latimeria chalumnae (West Indian ocean coelacanth)
      63      21665  Hordeum vulgare var. distichum (Two-rowed barley)
      64      21546  Heterocephalus glaber (Naked mole rat)
      65      21341  Caenorhabditis briggsae
      66      21085  Ixodes scapularis (Black-legged tick) (Deer tick)
      67      20852  Myotis lucifugus (Little brown bat)
      68      20124  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      69      20110  Ciona savignyi (Pacific transparent sea squirt)
      70      20041  Cavia porcellus (Guinea pig)
      71      19648  Taeniopygia guttata (Zebra finch) (Poephila guttata)
      72      19228  Toxoplasma gondii
      73      19201  Trypanosoma cruzi (strain CL Brener)
      74      19049  Anolis carolinensis (Green anole) (American chameleon)
      75      19012  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
      76      18911  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      77      18771  mine drainage metagenome
      78      18632  Drosophila simulans (Fruit fly)
      79      18116  Atta cephalotes (Leafcutter ant)
      80      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
      81      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
      82      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
      83      17318  Bombyx mori (Silk moth)
      84      17031  Drosophila yakuba (Fruit fly)
      85      16997  Tribolium castaneum (Red flour beetle)
      86      16851  Meleagris gallopavo (Common turkey)
      87      16761  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
      88      16712  Drosophila persimilis (Fruit fly)
      89      16650  Ralstonia solanacearum (Pseudomonas solanacearum)
      90      16425  Ectocarpus siliculosus (Brown alga)
      91      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
      92      16306  Loa loa (Eye worm) (Filaria loa)
      93      16303  Danaus plexippus (Monarch butterfly)
      94      16264  Trichinella spiralis (Trichina worm)
      95      16239  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
      96      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
      97      16190  Drosophila sechellia (Fruit fly)
      98      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
      99      15983  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     100      15794  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     101      15761  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     102      15714  Naegleria gruberi (Amoeba)
     103      15652  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     104      15620  Anopheles gambiae (African malaria mosquito)
     105      15554  Phytophthora ramorum (Sudden oak death agent)
     106      15418  Drosophila willistoni (Fruit fly)
     107      15230  Tetrahymena thermophila (strain SB210)
     108      15142  Drosophila ananassae (Fruit fly)
     109      15031  Harpegnathos saltator (Jerdon's jumping ant)
     110      14964  Hepatitis C virus subtype 1a
     111      14922  Drosophila erecta (Fruit fly)
     112      14854  Hepatitis C virus subtype 1b
     113      14849  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     114      14793  Camponotus floridanus (Florida carpenter ant)
     115      14781  Drosophila mojavensis (Fruit fly)
     116      14697  Plasmodium chabaudi
     117      14695  Drosophila virilis (Fruit fly)
     118      14649  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     119      14417  Volvox carteri (Green alga)
     120      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     121      14332  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     122      14284  Plasmodium falciparum
     123      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     124      13966  Acromyrmex echinatior (Panamanian leafcutter ant) 
     125      13863  Clonorchis sinensis (Chinese liver fluke)
     126      13767  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     127      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     128      13328  Aspergillus flavus 
     129      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     130      13173  Mustela putorius furo (European domestic ferret) (Mustela furo)
     131      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     132      13042  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     133      12983  Albugo laibachii Nc14
     134      12950  Stigmatella aurantiaca (strain DW4/3-1)
     135      12936  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     136      12767  Glycine max (Soybean) (Glycine hispida)
     137      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     138      12696  Trypanosoma congolense (strain IL3000)
     139      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     140      12598  Schistosoma mansoni (Blood fluke)
     141      12585  Xenopus laevis (African clawed frog)
     142      12519  Trypanosoma cruzi
     143      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     144      12440  Polysphondylium pallidum (Cellular slime mold)
     145      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     146      12352  Dictyostelium purpureum (Slime mold)
     147      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     148      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     149      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     150      11947  Emericella nidulans  
     151      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     152      11780  Piriformospora indica (strain DSM 11827)
     153      11755  Apis mellifera (Honeybee)
     154      11715  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     155      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     156      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     157      11666  Anopheles darlingi (Mosquito)
     158      11644  Plasmodium berghei (strain Anka)
     159      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     160      11562  Trichoplax adhaerens (Trichoplax reptans)
     161      11557  Trypanosoma vivax Y486
     162      11519  Rabies virus
     163      11518  Helicobacter pylori (Campylobacter pylori)
     164      11514  Aureococcus anophagefferens (Harmful bloom alga)
     165      11498  Brugia malayi (Filarial nematode worm)
     166      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     167      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     168      11289  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     169      11211  Ktedonobacter racemifer DSM 44963
     170      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     171      11001  Schistosoma japonicum (Blood fluke)
     172      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     173      10969  Porcine reproductive and respiratory syndrome virus (PRRSV)
     174      10966  Streptomyces clavuligerus ATCC 27064
     175      10949  Aspergillus niger 
     176      10839  Pediculus humanus subsp. corporis (Body louse)
     177      10820  Chaetomium globosum  
     178      10570  Metarhizium robertsii (strain ARSEF 23 / ATCC MYA-3075) (Metarhizium anisopliae)
     179      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     180      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     181      10517  uncultured archaeon
     182      10387  Pseudomonas syringae pv. glycinea str. race 4
     183      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     184      10377  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     185      10354  Phaeodactylum tricornutum (strain CCAP 1055/1)
     186      10274  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
     187      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136) 
     188      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
     189      10171  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     190      10110  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
     191      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
     192      10087  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
     193      10051  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
     194      10013  Streptomyces bingchenggensis (strain BCW-1)
     195       9846  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
     196       9836  Chlorella variabilis (Green alga)
     197       9822  Metarhizium acridum (strain CQMa 102)
     198       9760  Thielavia terrestris (strain ATCC 38088 / NRRL 8126) (Acremonium alabamense)
     199       9703  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
     200       9662  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
     201       9651  Cordyceps militaris (strain CM01) (Caterpillar fungus)
     202       9650  Klebsiella pneumoniae
     203       9597  Streptomyces cattleya 
     204       9551  Amycolatopsis mediterranei S699
     205       9533  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
     206       9510  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
     207       9485  Ajellomyces dermatitidis (strain ER-3 / ATCC MYA-2586) 
     208       9468  Salmo salar (Atlantic salmon)
     209       9443  Ajellomyces capsulata (strain H88) (Darling's disease fungus) 
     210       9391  Exophiala dermatitidis (strain ATCC 34100 / CBS 525.76 / NIH/UT8656)  
     211       9236  Monosiga brevicollis (Choanoflagellate)
     212       9201  Amycolatopsis mediterranei (strain U-32)
     213       9197  Streptomyces himastatinicus ATCC 53653
     214       9154  Ajellomyces capsulata (strain G186AR / H82 / ATCC MYA-2454 / RMSCC 2432)  
     215       9146  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
     216       9139  Pseudomonas syringae pv. pisi str. 1704B
     217       9113  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
     218       9112  Hypocrea jecorina (strain QM6a) (Trichoderma reesei)
     219       9083  Thielavia heterothallica (strain ATCC 42464 / BCRC 31852 / DSM 1799) 
     220       9076  Saccharomyces cerevisiae x Saccharomyces kudriavzevii VIN7
     221       9064  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
     222       9046  Streptomyces hygroscopicus subsp. jinggangensis (strain 5008)
     223       9009  Neurospora crassa 
     224       8992  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
     225       8988  Dictyostelium discoideum (Slime mold)
     226       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
     227       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
     228       8941  Streptomyces violaceusniger Tu 4113
     229       8940  Burkholderia sp. TJI49
     230       8900  Catenulispora acidiphila 
     231       8859  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
     232       8849  Pichia sorbitophila  
     233       8796  Aspergillus clavatus 
     234       8794  Bradyrhizobium japonicum USDA 6
     235       8783  Pseudomonas syringae pv. japonica str. M301072PT
     236       8755  Rhodococcus sp. (strain RHA1)
     237       8738  Trypanosoma brucei brucei (strain 927/4 GUTat10.1)
     238       8705  Trichophyton rubrum (strain ATCC MYA-4607 / CBS 118892) (Athlete's foot fungus)
     239       8699  Streptomyces coelicoflavus ZG0656
     240       8698  Paracoccidioides brasiliensis (strain Pb18)
     241       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
     242       8676  Trichophyton equinum (strain ATCC MYA-4606 / CBS 127.97) (Horse ringworm fungus)
     243       8661  Arthroderma otae (strain ATCC MYA-4605 / CBS 113480) (Microsporum canis)
     244       8604  Batrachochytrium dendrobatidis (strain JAM81 / FGSC 10211) (Frog chytrid fungus)
     245       8599  Entamoeba dispar (strain ATCC PRA-260 / SAW760)
     246       8577  uncultured crenarchaeote
     247       8520  Trichophyton tonsurans (strain CBS 112818) (Scalp ringworm fungus)
     248       8494  Nocardia brasiliensis ATCC 700358
     249       8437  Plesiocystis pacifica SIR-1
     250       8422  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          362209 (  2%)
    Bacteria       14505509 ( 66%)
    Eukaryota       5839103 ( 26%)
    Viruses         1380399 (  6%)
    Other             41290 ( <1%)



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 110271 (  2%)           ( <1%)
     Other Mammalia        813899 ( 14%)           (  4%)
     Other Vertebrata      653528 ( 11%)           (  3%)
     Viridiplantae        1013231 ( 17%)           (  5%)
     Fungi                1256301 ( 22%)           (  6%)
     Insecta               694851 ( 12%)           (  3%)
     Nematoda              223421 (  4%)           (  1%)
     Other                1073601 ( 18%)           (  5%)
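Each Eukaryota category is divided once by the Eukaryota total (5839103, from the kingdom table) and once by the size of the whole release. A short sketch of both columns, which also shows that Human, at just under 0.5% of the release, rounds to zero at whole-percent precision:

```python
eukaryota = {
    "Human": 110271, "Other Mammalia": 813899, "Other Vertebrata": 653528,
    "Viridiplantae": 1013231, "Fungi": 1256301, "Insecta": 694851,
    "Nematoda": 223421, "Other": 1073601,
}
total_eukaryota = 5839103   # Eukaryota row of the kingdom table
total_entries = 22128511    # whole UniProtKB/TrEMBL release

# The categories partition Eukaryota, so they must add up to its total.
assert sum(eukaryota.values()) == total_eukaryota

for category, n in eukaryota.items():
    pct_euk = 100 * n / total_eukaryota   # "% of Eukaryota" column
    pct_all = 100 * n / total_entries     # "% of the complete database" column
    print(f"{category:<18}{n:>9} ({pct_euk:5.1f}%) ({pct_all:4.1f}%)")
```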



3.  SEQUENCE SIZE

   Distribution of the sequences by length in amino acids (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  517410             1001-1100   131623
                 51- 100 1825304             1101-1200    93485
                101- 150 2059762             1201-1300    65431
                151- 200 1996059             1301-1400    42502
                201- 250 2013434             1401-1500    34248
                251- 300 1949174             1501-1600    24078
                301- 350 1773912             1601-1700    18205
                351- 400 1353817             1701-1800    14244
                401- 450 1161729             1801-1900    11812
                451- 500  960493             1901-2000    10116
                501- 550  644801             2001-2100     8020
                551- 600  499957             2101-2200     7958
                601- 650  364926             2201-2300     6396
                651- 700  285351             2301-2400     5043
                701- 750  243941             2401-2500     4311
                751- 800  216963             >2500        35338
                801- 850  163788
                851- 900  147660
                901- 950  101516
                951-1000   75206
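Because fragments are excluded, the length bins should account for every entry that is not a fragment, that is the total number of entries minus the 3260498 fragments reported in the introduction. A sketch of that consistency check:

```python
# 50-residue bins from 1-50 up to 951-1000 (left column).
short_bins = [517410, 1825304, 2059762, 1996059, 2013434, 1949174, 1773912,
              1353817, 1161729, 960493, 644801, 499957, 364926, 285351,
              243941, 216963, 163788, 147660, 101516, 75206]
# 100-residue bins from 1001-1100 up to 2401-2500 (right column).
long_bins = [131623, 93485, 65431, 42502, 34248, 24078, 18205, 14244,
             11812, 10116, 8020, 7958, 6396, 5043, 4311]
over_2500 = 35338

total_entries = 22128511
fragments = 3260498

complete = total_entries - fragments          # entries that are not fragments
binned = sum(short_bins) + sum(long_bins) + over_2500
print(complete, binned, complete == binned)
```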

   


   The average sequence length in UniProtKB/TrEMBL is   326 amino acids.

   The shortest sequence is G0XMK1_9MYRT:     1 amino acid.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
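The quoted average appears to be the amino acid count from the introduction divided by the number of entries (fragments included), truncated to an integer: the exact quotient is about 326.6, so rounding would give 327. This reading is inferred from the two totals and is not a documented formula:

```python
total_residues = 7226807757   # amino acids in the release (introduction)
total_entries = 22128511      # entries in the release, fragments included

mean_length = total_residues / total_entries
print(f"{mean_length:.2f}")   # exact mean over all entries
print(int(mean_length))       # integer part, matching the value quoted above
```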



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes, for some UniProtKB/TrEMBL line types, the total
number of lines, the number of entries with at least one such line, and the
average number of such lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    26768952                1.21                                                    
   Submitted to EMBL/GenBank/DDBJ  14836812  13596834      0.67                                                    
   Journal                         10867895  10157357      0.49                                                    
   Submitted to other databases     1048026   1038488      0.05                                                    
   Thesis                              9748      9690     <0.01                                                    
   Book citation                       6445      6396     <0.01                                                    
   Unpublished observations              25        25     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 432838
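The "Average per entry" column of the references table seems to divide each line count by all 22128511 entries of the release, not by the number of entries that carry the line (14836812 / 13596834 would give 1.09, not 0.67), and to print values that round to 0.00 as "<0.01". A sketch of that inferred rule, with the hypothetical helper name per_entry:

```python
total_entries = 22128511
references = {
    "Submitted to EMBL/GenBank/DDBJ": 14836812,
    "Journal": 10867895,
    "Submitted to other databases": 1048026,
    "Thesis": 9748,
    "Book citation": 6445,
    "Unpublished observations": 25,
    "Patent": 1,
}

def per_entry(total_lines: int, entries: int = total_entries) -> str:
    """Hypothetical helper: lines per entry over the whole release,
    formatted like the table (two decimals, '<0.01' when it rounds to zero)."""
    text = f"{total_lines / entries:.2f}"
    return "<0.01" if text == "0.00" else text

assert sum(references.values()) == 26768952   # matches the RL total
print("References (RL)", per_entry(sum(references.values())))
for subtype, n in references.items():
    print(f"   {subtype:<32}{per_entry(n):>8}")
```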


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      22656035                1.02                                                    
   CATALYTIC ACTIVITY               1995721   1837242      0.09     4                                              
   CAUTION                          8004506   8004492      0.36     1                                              
   COFACTOR                          711583    668753      0.03     8                                              
   DOMAIN                             65050     61926     <0.01     9                                              
   FUNCTION                         2206849   2051798      0.10     3                                              
   INTERACTION                          976       976     <0.01    11                                              
   MISCELLANEOUS                      36889     36810     <0.01    10                                              
   PATHWAY                           968594    879940      0.04     7                                              
   SIMILARITY                       5932374   5197844      0.27     2                                              
   SUBCELLULAR LOCATION             1730358   1660412      0.08     5                                              
   SUBUNIT                          1003135   1001348      0.05     6                                              

Total number of comment topics: 11
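The Rank column of the comments table appears to order the topics by their total number of lines, with 1 for the most frequent. The sketch below re-derives the ranks from the "Total number" column; the ranking rule is inferred from the table rather than stated in it:

```python
comments = {
    "CATALYTIC ACTIVITY": 1995721, "CAUTION": 8004506, "COFACTOR": 711583,
    "DOMAIN": 65050, "FUNCTION": 2206849, "INTERACTION": 976,
    "MISCELLANEOUS": 36889, "PATHWAY": 968594, "SIMILARITY": 5932374,
    "SUBCELLULAR LOCATION": 1730358, "SUBUNIT": 1003135,
}

# Rank 1 goes to the topic with the most lines (no ties occur here).
ranked = sorted(comments, key=comments.get, reverse=True)
ranks = {topic: i for i, topic in enumerate(ranked, start=1)}

assert sum(comments.values()) == 22656035   # matches the CC total
for topic in sorted(comments):
    print(f"{topic:<22}{comments[topic]:>9}{ranks[topic]:>4}")
```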


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       6274628                0.28                                                    
   CHAIN                             625097    501118      0.03     2                                              
   NON_TER                          5199523   3260955      0.23     1                                              
   SIGNAL                            449242    447706      0.02     3                                              
   TRANSIT                              766       766     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             243865729               11.02                                                    
   AGD                                 2525      2525     <0.01    83   Organism-specific databases                
   ANU-2DPAGE                            53        53     <0.01    98   2D gel databases                           
   Allergome                           2747      2141     <0.01    79   Protein family/group databases             
   ArachnoServer                         66        66     <0.01    97   Organism-specific databases                
   ArrayExpress                       87852     87852     <0.01    51   Gene expression databases                  
   BRENDA                              2736      2705     <0.01    80   Enzyme and pathway databases               
   Bgee                              141434    141434      0.01    47   Gene expression databases                  
   BioCyc                            670496    656092      0.03    30   Enzyme and pathway databases               
   CAZy                               74192     69711     <0.01    54   Protein family/group databases             
   CGD                                 7083      7083     <0.01    75   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     5         5     <0.01   103   2D gel databases                           
   CTD                               290173    288481      0.01    37   Organism-specific databases                
   CYGD                                   2         2     <0.01   105   Organism-specific databases                
   ConoServer                           160       160     <0.01    93   Organism-specific databases                
   DIP                                 2672      2667     <0.01    81   Protein-protein interaction databases      
   DNASU                              43779     43386     <0.01    59   Protocols and materials databases          
   EMBL                            24550144  21589479      1.11     3   Sequence databases                         
   Ensembl                           915245    897777      0.04    27   Genome annotation databases                
   EnsemblBacteria                   835157    801043      0.04    29   Genome annotation databases                
   EnsemblFungi                      184806    184477      0.01    43   Genome annotation databases                
   EnsemblMetazoa                    443645    437310      0.02    33   Genome annotation databases                
   EnsemblPlants                     177795    175040      0.01    45   Genome annotation databases                
   EnsemblProtists                   100023     98984     <0.01    49   Genome annotation databases                
   EuPathDB                          178974    178973      0.01    44   Organism-specific databases                
   EvolutionaryTrace                   8264      8264     <0.01    73   Other                                      
   FlyBase                           195529    193982      0.01    40   Organism-specific databases                
   GO                              39722333  13249408      1.80     2   Ontologies                                 
   Gene3D                           9185979   7330443      0.42     6   Family and domain databases                
   GeneID                           7070437   6942892      0.32    10   Genome annotation databases                
   GeneTree                          845289    845221      0.04    28   Phylogenomic databases                     
   Genevestigator                     94800     94793     <0.01    50   Gene expression databases                  
   GenoList                           14736     14463     <0.01    71   Organism-specific databases                
   GenomeReviews                    4252736   4154128      0.19    15   Genome annotation databases                
   Gramene                            67749     67749     <0.01    55   Organism-specific databases                
   H-InvDB                              576       472     <0.01    89   Organism-specific databases                
   HAMAP                            1800261   1781690      0.08    24   Family and domain databases                
   HGNC                               48652     48565     <0.01    58   Organism-specific databases                
   HOGENOM                          3659822   3659821      0.17    16   Phylogenomic databases                     
   HOVERGEN                          312986    312978      0.01    36   Phylogenomic databases                     
   HSSP                              250983    250754      0.01    38   3D structure databases                     
   IPI                               323493    323343      0.01    35   Sequence databases                         
   InParanoid                        190525    190525      0.01    42   Phylogenomic databases                     
   IntAct                             16741     16741     <0.01    67   Protein-protein interaction databases      
   InterPro                        45044261  16252282      2.04     1   Family and domain databases                
   KEGG                             6079851   5968261      0.27    11   Genome annotation databases                
   KO                               2312554   2301946      0.10    23   Phylogenomic databases                     
   LegioList                           5139      5111     <0.01    76   Organism-specific databases                
   Leproma                              936       935     <0.01    88   Organism-specific databases                
   MEROPS                             62662     62660     <0.01    56   Protein family/group databases             
   MGI                                37939     37683     <0.01    62   Organism-specific databases                
   MINT                                8630      8630     <0.01    72   Protein-protein interaction databases      
   NextBio                           106244    106243     <0.01    48   Other                                      
   OMA                              3305216   3305205      0.15    18   Phylogenomic databases                     
   OrthoDB                           567643    567641      0.03    31   Phylogenomic databases                     
   PANTHER                          3147174   2974875      0.14    19   Family and domain databases                
   PATRIC                           8352093   8352061      0.38     8   Genome annotation databases                
   PDB                                16175      9370     <0.01    70   3D structure databases                     
   PDBsum                             16240      9303     <0.01    69   3D structure databases                     
   PHCI-2DPAGE                           99        99     <0.01    95   2D gel databases                           
   PIR                               173878    141044      0.01    46   Sequence databases                         
   PIRSF                            1532705   1532327      0.07    25   Family and domain databases                
   PMAP-CutDB                           222       222     <0.01    91   Other                                      
   PMMA-2DPAGE                            2         2     <0.01   104   2D gel databases                           
   PRIDE                             209062    209062      0.01    39   Proteomic databases                        
   PRINTS                           3346953   2959252      0.15    17   Family and domain databases                
   PROSITE                         10602158   7013250      0.48     5   Family and domain databases                
   Pathway_Interaction_DB                11         9     <0.01   102   Enzyme and pathway databases               
   PeptideAtlas                         145       145     <0.01    94   Proteomic databases                        
   PeroxiBase                          2540      2532     <0.01    82   Protein family/group databases             
   Pfam                            20227128  14970317      0.91     4   Family and domain databases                
   PharmGKB                            4884      4884     <0.01    77   Organism-specific databases                
   PhosphoSite                         1550      1550     <0.01    86   PTM databases                              
   PhylomeDB                          34752     34752     <0.01    63   Phylogenomic databases                     
   PomBase                               40        27     <0.01    99   Organism-specific databases                
   ProDom                            400876    380056      0.02    34   Family and domain databases                
   ProMEX                               288       288     <0.01    90   Proteomic databases                        
   ProtClustDB                      2723987   2723974      0.12    21   Phylogenomic databases                     
   ProteinModelPortal               6052644   6052644      0.27    12   3D structure databases                     
   PseudoCAP                           4562      4556     <0.01    78   Organism-specific databases                
   REBASE                             27415     27357     <0.01    64   Protein family/group databases             
   REPRODUCTION-2DPAGE                   86        85     <0.01    96   2D gel databases                           
   RGD                                24941     24636     <0.01    66   Organism-specific databases                
   Reactome                             187       164     <0.01    92   Enzyme and pathway databases               
   RefSeq                           7102680   6945574      0.32     9   Sequence databases                         
   SGD                                   11        11     <0.01   101   Organism-specific databases                
   SMART                            4812723   3636731      0.22    13   Family and domain databases                
   SMR                              1099779   1099779      0.05    26   3D structure databases                     
   STRING                           2595521   2595521      0.12    22   Protein-protein interaction databases      
   SUPFAM                           8777249   7209675      0.40     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   100   2D gel databases                           
   Siena-2DPAGE                           2         2     <0.01   106   2D gel databases                           
   TAIR                               16295     16216     <0.01    68   Organism-specific databases                
   TCDB                                2414      2402     <0.01    84   Protein family/group databases             
   TIGR                              194978    187867      0.01    41   Genome annotation databases                
   TIGRFAMs                         4410922   4024621      0.20    14   Family and domain databases                
   TubercuList                         2064      2059     <0.01    85   Organism-specific databases                
   UCSC                               59058     59057     <0.01    57   Genome annotation databases                
   UniGene                           534667    502156      0.02    32   Sequence databases                         
   VectorBase                         78371     77856     <0.01    52   Genome annotation databases                
   World-2DPAGE                         936       931     <0.01    87   2D gel databases                           
   WormBase                           38303     38293     <0.01    61   Organism-specific databases                
   Xenbase                            25676     25601     <0.01    65   Organism-specific databases                
   ZFIN                               41940     41172     <0.01    60   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    74   Organism-specific databases                
   eggNOG                           2781322   2781321      0.13    20   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    53   Organism-specific databases                
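
In the rows above, the rank column appears to order databases by the first numeric column (for example, InterPro at 45044261 is rank 1 and Pfam at 20227128 is rank 4). The sketch below checks that reading against a few rows copied from the table. The interpretation of the columns is an assumption, and the names ROWS, relative_order and ranks_agree are hypothetical.

```python
# Rows copied from the table above: (database, first numeric column, rank).
# Assumption: the rank column orders databases by the first numeric column.
ROWS = [
    ("InterPro", 45044261, 1),
    ("Pfam", 20227128, 4),
    ("PROSITE", 10602158, 5),
    ("SUPFAM", 8777249, 7),
    ("PATRIC", 8352093, 8),
    ("RefSeq", 7102680, 9),
    ("KEGG", 6079851, 11),
    ("ProteinModelPortal", 6052644, 12),
    ("SMART", 4812723, 13),
    ("TIGRFAMs", 4410922, 14),
]

def relative_order(rows):
    """Database names sorted by the first numeric column, largest first."""
    return [name for name, count, _ in sorted(rows, key=lambda r: r[1], reverse=True)]

def ranks_agree(rows):
    """True if ordering by count matches ordering by the printed rank."""
    by_rank = [name for name, _, rank in sorted(rows, key=lambda r: r[2])]
    return relative_order(rows) == by_rank

print(relative_order(ROWS)[:3])
print(ranks_agree(ROWS))
```

Gaps in the rank sequence (2, 3, 6, 10) come from databases listed earlier in the table, so only the relative order within this subset is compared.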

Number of explicitly cross-referenced databases: 133
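
In every row of the table the first count is at least as large as the second, which is consistent with (but does not prove) the first being the number of cross-references and the second the number of entries carrying at least one. Under that assumption, and assuming the UniProtKB flat-file layout where cross-references sit on "DR" lines and entries end with "//", both counts and the number of distinct databases could be tallied roughly as follows. The function name and the sample entries are hypothetical, and the accession values in the sample are placeholders.

```python
from collections import defaultdict

def tally_cross_refs(flat_file_text):
    """Return {database: (cross-reference lines, distinct entries)}.

    Assumes 'DR   <Database>; ...' lines and '//' entry terminators.
    """
    refs = defaultdict(int)      # total DR lines naming each database
    entries = defaultdict(set)   # indices of entries citing each database
    entry_index = 0
    for line in flat_file_text.splitlines():
        if line.startswith("//"):
            entry_index += 1
        elif line.startswith("DR   "):
            database = line[5:].split(";", 1)[0].strip()
            refs[database] += 1
            entries[database].add(entry_index)
    return {db: (refs[db], len(entries[db])) for db in refs}

# Made-up entries; accession values are placeholders.
SAMPLE = """ID   EXAMPLE1
DR   InterPro; IPR000001; Example.
DR   InterPro; IPR000002; Example.
DR   Pfam; PF00001; Example; 1.
//
ID   EXAMPLE2
DR   InterPro; IPR000001; Example.
//
"""

counts = tally_cross_refs(SAMPLE)
print(counts)       # {'InterPro': (3, 2), 'Pfam': (1, 1)}
print(len(counts))  # distinct databases: 2
```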


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.58   Gln (Q) 3.93   Leu (L) 9.87   Ser (S) 6.71
   Arg (R) 5.46   Glu (E) 6.20   Lys (K) 5.30   Thr (T) 5.59
   Asn (N) 4.11   Gly (G) 7.07   Met (M) 2.45   Trp (W) 1.30
   Asp (D) 5.32   His (H) 2.21   Phe (F) 4.01   Tyr (Y) 3.04
   Cys (C) 1.28   Ile (I) 5.96   Pro (P) 4.73   Val (V) 6.75

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03
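
A composition "for the complete database" can be read as residue counts pooled over all sequences and divided by the total number of residues, rather than an average of per-entry percentages; that reading is an assumption here. A minimal sketch of the pooled calculation, with ambiguous letters such as X kept in the denominator as the Xaa row above suggests, might look like this. The function name and toy sequences are hypothetical and not taken from UniProtKB.

```python
from collections import Counter

def composition_percent(sequences):
    """Percent of each residue letter, pooled over all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    # Ambiguity codes (B, Z, X) count toward the total like any residue.
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}

# Toy sequences for illustration only.
toy = ["MKLLA", "GAVLX"]
for aa, pct in composition_percent(toy).items():
    print(f"{aa} {pct:.2f}")
```

Pooling weights long sequences more heavily than short ones; averaging per-entry compositions would give a different figure.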

   [Figure: amino acid composition chart, image not available]
   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
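
The ordering in 5.2 follows directly from the percentages in 5.1. The sketch below reproduces it by sorting the twenty standard residues by their transcribed values. The dictionary name is hypothetical, and the ambiguity codes B, Z and X are left out because 5.2 lists only the twenty standard amino acids.

```python
# Percentages transcribed from table 5.1 (three-letter code -> percent).
COMPOSITION = {
    "Ala": 8.58, "Gln": 3.93, "Leu": 9.87, "Ser": 6.71,
    "Arg": 5.46, "Glu": 6.20, "Lys": 5.30, "Thr": 5.59,
    "Asn": 4.11, "Gly": 7.07, "Met": 2.45, "Trp": 1.30,
    "Asp": 5.32, "His": 2.21, "Phe": 4.01, "Tyr": 3.04,
    "Cys": 1.28, "Ile": 5.96, "Pro": 4.73, "Val": 6.75,
}

# Most frequent first; there are no ties among these values.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranking))
```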



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 516963
Total number of entries encoded on a Plasmid: 280063
Total number of entries encoded on a Plastid: 19155
Total number of entries encoded on a Plastid; Apicoplast: 626
Total number of entries encoded on a Plastid; Chloroplast: 184777
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 861
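
The categories above match the wording of the organelle ("OG") line in the UniProtKB flat-file format, for example "OG   Mitochondrion." or "OG   Plastid; Chloroplast.". Assuming counts like these are obtained by tallying that line once per entry, a rough sketch follows. Plasmid names vary from entry to entry, so they are collapsed into one bucket, while a bare "Plastid." stays separate from its subtypes, which would be consistent with the separate "Plastid" total above. The function names and sample entries are hypothetical.

```python
from collections import Counter

def categorize(og_text):
    """Map the joined OG text of one entry to a reporting category."""
    if og_text.startswith("Plasmid"):
        return "Plasmid"  # plasmid names differ; report them together
    return og_text.rstrip(".")

def tally_encoded_on(flat_file_text):
    """Count entries per OG category; assumes '//' ends each entry."""
    counts = Counter()
    og_lines = []
    for line in flat_file_text.splitlines():
        if line.startswith("OG   "):
            og_lines.append(line[5:].strip())
        elif line.startswith("//"):
            if og_lines:  # entries without an OG line are not counted
                counts[categorize(" ".join(og_lines))] += 1
            og_lines = []
    return counts

# Made-up entries for illustration only.
SAMPLE = """ID   EXAMPLE1
OG   Mitochondrion.
//
ID   EXAMPLE2
OG   Plastid; Chloroplast.
//
ID   EXAMPLE3
OG   Plasmid pEXAMPLE.
//
ID   EXAMPLE4
//
"""

print(sorted(tally_encoded_on(SAMPLE).items()))
```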