         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2012_07 STATISTICS


1.  INTRODUCTION

Release 2012_07 of 11-Jul-2012 of UniProtKB/TrEMBL contains 23165610 sequence entries,
comprising 7575340945 amino acids.

Since release 2012_06, 528328 sequences have been added, the sequence data of
7774 existing entries have been updated, and the annotations of
6799150 entries have been revised. The added sequences represent an increase of about 2%.

Number of fragments: 3380053

Protein existence (PE):              entries      %
1: Evidence at protein level           13216     0.06%
2: Evidence at transcript level       595260     2.57%
3: Inferred from homology            5155530    22.26%
4: Predicted                        17401604    75.12%
5: Uncertain                               0     0.00%
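
The percentages in the protein existence table can be reproduced from the entry counts. The following is a minimal Python sketch, assuming each percentage is the count divided by the total number of entries given in the introduction and rounded to two decimals. The names `pe_counts` and `percentage` are illustrative and not part of any UniProt tool.

```python
# Counts copied from the protein existence (PE) table above.
TOTAL_ENTRIES = 23165610  # total entries in release 2012_07

pe_counts = {
    "1: Evidence at protein level": 13216,
    "2: Evidence at transcript level": 595260,
    "3: Inferred from homology": 5155530,
    "4: Predicted": 17401604,
    "5: Uncertain": 0,
}


def percentage(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Assumption: the table rounds each share to two decimal places.
pe_percentages = {
    level: percentage(count, TOTAL_ENTRIES) for level, count in pe_counts.items()
}

for level, pct in pe_percentages.items():
    print(f"{level:<34}{pe_counts[level]:>10}{pct:>9.2f}%")
```

Under this assumption the five counts also add up to the total number of entries, so every entry carries exactly one PE level.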

The growth of the database is summarized in a figure (not reproduced in this text version).


2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 358176

   The first twenty species represent 1586292 sequences, or 6.8% of the
   total number of entries.


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:149262
                            2x:61369
                            3x:33252
                            4x:21135
                            5x:13633
                            6x: 9842
                            7x: 7551
                            8x: 5757
                            9x: 4552
                           10x: 9002
                       11- 20x:23902
                       21- 50x: 8276
                       51-100x: 3151
                         >100x: 7492


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     469841  Human immunodeficiency virus 1
       2     110822  Homo sapiens (Human)
       3      96966  Oryza sativa subsp. japonica (Rice)
       4      74502  uncultured bacterium
       5      70442  Hepatitis C virus
       6      68894  Macaca mulatta (Rhesus macaque)
       7      61412  Mus musculus (Mouse)
       8      61196  Glycine max (Soybean) (Glycine hispida)
       9      54462  Danio rerio (Zebrafish) (Brachydanio rerio)
      10      54065  Vitis vinifera (Grape)
      11      52715  Hepatitis B virus (HBV)
      12      50556  Trichomonas vaginalis
      13      50126  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      14      49221  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      15      48828  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      16      44070  Populus trichocarpa (Western balsam poplar) 
      17      43129  Callithrix jacchus (White-tufted-ear marmoset)
      18      43095  Arabidopsis thaliana (Mouse-ear cress)
      19      42100  Zea mays (Maize)
      20      39850  Paramecium tetraurelia
      21      39556  Oryza sativa subsp. indica (Rice)
      22      35599  Ailuropoda melanoleuca (Giant panda)
      23      34802  Physcomitrella patens subsp. patens (Moss)
      24      33923  Rattus norvegicus (Rat)
      25      33907  Drosophila melanogaster (Fruit fly)
      26      33752  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      27      33270  Selaginella moellendorffii (Spikemoss)
      28      33066  Sus scrofa (Pig)
      29      32922  Monodelphis domestica (Gray short-tailed opossum)
      30      32668  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      31      32093  Oryza glaberrima (African rice)
      32      31827  Caenorhabditis remanei (Caenorhabditis vulgaris)
      33      31387  Ricinus communis (Castor bean)
      34      30843  Daphnia pulex (Water flea)
      35      30300  Caenorhabditis brenneri (Nematode worm)
      36      30144  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      37      29815  Amphimedon queenslandica (Sponge)
      38      29444  Strongylocentrotus purpuratus (Purple sea urchin)
      39      29315  Pristionchus pacificus
      40      29165  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      41      29026  Oikopleura dioica (Tunicate)
      42      28857  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      43      28036  Gasterosteus aculeatus (Three-spined stickleback)
      44      27976  Bos taurus (Bovine)
      45      27886  Simian immunodeficiency virus (SIV)
      46      27883  Canis familiaris (Dog) (Canis lupus familiaris)
      47      27425  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      48      27086  Gorilla gorilla gorilla (Lowland gorilla)
      49      26871  Ornithorhynchus anatinus (Duckbill platypus)
      50      26705  Gallus gallus (Chicken)
      51      25868  Oryzias latipes (Medaka fish) (Japanese ricefish)
      52      25755  Loxodonta africana (African elephant)
      53      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      54      25438  Caenorhabditis japonica
      55      25060  Oryctolagus cuniculus (Rabbit)
      56      24843  Nematostella vectensis (Starlet sea anemone)
      57      24397  Escherichia coli
      58      24199  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      59      24057  Pongo abelii (Sumatran orangutan)
      60      24009  Equus caballus (Horse)
      61      23230  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      62      23219  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      63      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      64      22835  Pan troglodytes (Chimpanzee)
      65      22520  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      66      22398  Caenorhabditis elegans
      67      21821  Latimeria chalumnae (West Indian ocean coelacanth)
      68      21663  Hordeum vulgare var. distichum (Two-rowed barley)
      69      21546  Heterocephalus glaber (Naked mole rat)
      70      21339  Caenorhabditis briggsae
      71      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
      72      20853  Myotis lucifugus (Little brown bat)
      73      20126  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      74      20110  Ciona savignyi (Pacific transparent sea squirt)
      75      20050  Cavia porcellus (Guinea pig)
      76      19972  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
      77      19652  Taeniopygia guttata (Zebra finch) (Poephila guttata)
      78      19246  Toxoplasma gondii
      79      19201  Trypanosoma cruzi (strain CL Brener)
      80      19151  Anolis carolinensis (Green anole) (American chameleon)
      81      19012  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
      82      18912  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      83      18771  mine drainage metagenome
      84      18632  Drosophila simulans (Fruit fly)
      85      18121  Atta cephalotes (Leafcutter ant)
      86      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
      87      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
      88      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
      89      17337  Bombyx mori (Silk moth)
      90      17031  Drosophila yakuba (Fruit fly)
      91      16999  Tribolium castaneum (Red flour beetle)
      92      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
      93      16856  Meleagris gallopavo (Common turkey)
      94      16767  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
      95      16712  Drosophila persimilis (Fruit fly)
      96      16425  Ectocarpus siliculosus (Brown alga)
      97      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
      98      16306  Loa loa (Eye worm) (Filaria loa)
      99      16303  Danaus plexippus (Monarch butterfly)
     100      16264  Trichinella spiralis (Trichina worm)
     101      16239  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
     102      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     103      16190  Drosophila sechellia (Fruit fly)
     104      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     105      15983  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     106      15794  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     107      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     108      15714  Naegleria gruberi (Amoeba)
     109      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     110      15619  Anopheles gambiae (African malaria mosquito)
     111      15557  Phytophthora ramorum (Sudden oak death agent)
     112      15454  Hepatitis C virus subtype 1b
     113      15418  Drosophila willistoni (Fruit fly)
     114      15230  Tetrahymena thermophila (strain SB210)
     115      15142  Drosophila ananassae (Fruit fly)
     116      15036  Harpegnathos saltator (Jerdon's jumping ant)
     117      15000  Hepatitis C virus subtype 1a
     118      14982  Plasmodium falciparum
     119      14922  Drosophila erecta (Fruit fly)
     120      14851  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     121      14796  Camponotus floridanus (Florida carpenter ant)
     122      14781  Drosophila mojavensis (Fruit fly)
     123      14697  Plasmodium chabaudi
     124      14695  Drosophila virilis (Fruit fly)
     125      14649  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     126      14417  Volvox carteri (Green alga)
     127      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     128      14336  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     129      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     130      13966  Acromyrmex echinatior (Panamanian leafcutter ant) 
     131      13863  Clonorchis sinensis (Chinese liver fluke)
     132      13767  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     133      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     134      13329  Aspergillus flavus 
     135      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     136      13175  Mustela putorius furo (European domestic ferret) (Mustela furo)
     137      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     138      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     139      12983  Albugo laibachii Nc14
     140      12950  Stigmatella aurantiaca (strain DW4/3-1)
     141      12936  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     142      12935  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     143      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     144      12696  Trypanosoma congolense (strain IL3000)
     145      12683  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     146      12597  Schistosoma mansoni (Blood fluke)
     147      12589  Xenopus laevis (African clawed frog)
     148      12541  Ralstonia solanacearum (Pseudomonas solanacearum)
     149      12532  Trypanosoma cruzi
     150      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     151      12440  Polysphondylium pallidum (Cellular slime mold)
     152      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     153      12352  Dictyostelium purpureum (Slime mold)
     154      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     155      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     156      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     157      11947  Emericella nidulans  
     158      11897  Apis mellifera (Honeybee)
     159      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     160      11797  Rabies virus
     161      11780  Piriformospora indica (strain DSM 11827)
     162      11715  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     163      11714  Helicobacter pylori (Campylobacter pylori)
     164      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     165      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     166      11669  Anopheles darlingi (Mosquito)
     167      11644  Plasmodium berghei (strain Anka)
     168      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     169      11562  Trichoplax adhaerens (Trichoplax reptans)
     170      11557  Trypanosoma vivax (strain Y486)
     171      11514  Aureococcus anophagefferens (Harmful bloom alga)
     172      11499  Brugia malayi (Filarial nematode worm)
     173      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     174      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     175      11289  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     176      11211  Ktedonobacter racemifer DSM 44963
     177      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     178      11070  Porcine reproductive and respiratory syndrome virus (PRRSV)
     179      11042  Schistosoma japonicum (Blood fluke)
     180      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     181      10966  Streptomyces clavuligerus ATCC 27064
     182      10949  Aspergillus niger 
     183      10884  uncultured archaeon
     184      10839  Pediculus humanus subsp. corporis (Body louse)
     185      10822  Chaetomium globosum  
     186      10570  Metarhizium robertsii (strain ARSEF 23 / ATCC MYA-3075) (Metarhizium anisopliae)
     187      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     188      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     189      10387  Pseudomonas syringae pv. glycinea str. race 4
     190      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     191      10377  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     192      10354  Phaeodactylum tricornutum (strain CCAP 1055/1)
     193      10273  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
     194      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136) 
     195      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
     196      10171  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     197      10109  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
     198      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
     199      10087  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
     200      10051  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
     201      10013  Streptomyces bingchenggensis (strain BCW-1)
     202       9846  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
     203       9836  Chlorella variabilis (Green alga)
     204       9822  Metarhizium acridum (strain CQMa 102)
     205       9799  Coccomyxa subellipsoidea C-169
     206       9760  Thielavia terrestris (strain ATCC 38088 / NRRL 8126) (Acremonium alabamense)
     207       9703  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
     208       9686  Klebsiella pneumoniae
     209       9662  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
     210       9651  Cordyceps militaris (strain CM01) (Caterpillar fungus)
     211       9597  Streptomyces cattleya 
     212       9551  Amycolatopsis mediterranei S699
     213       9533  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
     214       9510  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
     215       9485  Ajellomyces dermatitidis (strain ER-3 / ATCC MYA-2586) 
     216       9477  Salmo salar (Atlantic salmon)
     217       9443  Ajellomyces capsulata (strain H88) (Darling's disease fungus) 
     218       9391  Exophiala dermatitidis (strain ATCC 34100 / CBS 525.76 / NIH/UT8656)  
     219       9236  Monosiga brevicollis (Choanoflagellate)
     220       9201  Amycolatopsis mediterranei (strain U-32)
     221       9197  Streptomyces himastatinicus ATCC 53653
     222       9154  Ajellomyces capsulata (strain G186AR / H82 / ATCC MYA-2454 / RMSCC 2432)  
     223       9146  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
     224       9139  Pseudomonas syringae pv. pisi str. 1704B
     225       9113  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
     226       9112  Hypocrea jecorina (strain QM6a) (Trichoderma reesei)
     227       9083  Thielavia heterothallica (strain ATCC 42464 / BCRC 31852 / DSM 1799) 
     228       9076  Saccharomyces cerevisiae x Saccharomyces kudriavzevii VIN7
     229       9064  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
     230       9046  Streptomyces hygroscopicus subsp. jinggangensis (strain 5008)
     231       9010  Neurospora crassa 
     232       8992  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
     233       8988  Dictyostelium discoideum (Slime mold)
     234       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
     235       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
     236       8941  Streptomyces violaceusniger Tu 4113
     237       8940  Burkholderia sp. TJI49
     238       8900  Catenulispora acidiphila 
     239       8859  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
     240       8849  Pichia sorbitophila  
     241       8796  Aspergillus clavatus 
     242       8794  Bradyrhizobium japonicum USDA 6
     243       8783  Pseudomonas syringae pv. japonica str. M301072
     244       8755  Rhodococcus sp. (strain RHA1)
     245       8738  Trypanosoma brucei brucei (strain 927/4 GUTat10.1)
     246       8715  uncultured crenarchaeote
     247       8705  Trichophyton rubrum (strain ATCC MYA-4607 / CBS 118892) (Athlete's foot fungus)
     248       8699  Streptomyces coelicoflavus ZG0656
     249       8698  Paracoccidioides brasiliensis (strain Pb18)
     250       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
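
The top-twenty figure quoted at the start of this section can be checked against the first twenty rows of the table above. This is a small Python sketch, assuming the quoted share is the sum of those twenty frequencies divided by the total number of entries. The variable names are illustrative only.

```python
TOTAL_ENTRIES = 23165610  # total entries in release 2012_07

# Frequencies of ranks 1-20, copied from the table of the most
# represented species.
top_twenty = [
    469841, 110822, 96966, 74502, 70442,
    68894, 61412, 61196, 54462, 54065,
    52715, 50556, 50126, 49221, 48828,
    44070, 43129, 43095, 42100, 39850,
]

top_twenty_total = sum(top_twenty)
top_twenty_share = 100.0 * top_twenty_total / TOTAL_ENTRIES

print(f"{top_twenty_total} sequences: {top_twenty_share:.1f}% of all entries")
```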


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          365254 (  2%)
    Bacteria       15185817 ( 66%)
    Eukaryota       6150596 ( 27%)
    Viruses         1422320 (  6%)
    Other             41622 ( <1%)



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 110858 (  2%)           (  0%)
     Other Mammalia        847921 ( 14%)           (  4%)
     Other Vertebrata      690387 ( 11%)           (  3%)
     Viridiplantae        1153366 ( 19%)           (  5%)
     Fungi                1310959 ( 21%)           (  6%)
     Insecta               702059 ( 11%)           (  3%)
     Nematoda              223717 (  4%)           (  1%)
     Other                1111329 ( 18%)           (  5%)
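
The two percentage columns of the Eukaryota table use different denominators. The sketch below, in Python, assumes that the first column divides by the Eukaryota count and the second by the total number of entries, both rounded to whole percent. The helper `whole_percent` is a hypothetical name used only for this illustration.

```python
TOTAL_ENTRIES = 23165610  # total entries in release 2012_07

# Counts copied from the kingdom table above.
kingdoms = {
    "Archaea": 365254,
    "Bacteria": 15185817,
    "Eukaryota": 6150596,
    "Viruses": 1422320,
    "Other": 41622,
}

# Counts copied from the "Within Eukaryota" table above.
eukaryota = {
    "Human": 110858,
    "Other Mammalia": 847921,
    "Other Vertebrata": 690387,
    "Viridiplantae": 1153366,
    "Fungi": 1310959,
    "Insecta": 702059,
    "Nematoda": 223717,
    "Other": 1111329,
}


def whole_percent(count, total):
    """Return count as a percentage of total, rounded to a whole number."""
    return round(100.0 * count / total)


eukaryota_total = kingdoms["Eukaryota"]
for category, count in eukaryota.items():
    of_eukaryota = whole_percent(count, eukaryota_total)
    of_database = whole_percent(count, TOTAL_ENTRIES)
    print(f"{category:<18}{count:>9} ({of_eukaryota:>3}%) ({of_database:>3}%)")
```

With these inputs the category counts add up exactly to the Eukaryota total, and the Human share of the complete database is below 0.5%, which is why it rounds to 0%.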



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  540298             1001-1100   138462
                 51- 100 1907642             1101-1200    98083
                101- 150 2156466             1201-1300    68811
                151- 200 2092300             1301-1400    44644
                201- 250 2110561             1401-1500    35913
                251- 300 2043822             1501-1600    25299
                301- 350 1862030             1601-1700    19086
                351- 400 1421245             1701-1800    14872
                401- 450 1220547             1801-1900    12389
                451- 500 1009225             1901-2000    10535
                501- 550  677598             2001-2100     8373
                551- 600  524885             2101-2200     8309
                601- 650  382967             2201-2300     6695
                651- 700  299947             2301-2400     5274
                701- 750  256138             2401-2500     4497
                751- 800  228338             >2500        36703
                801- 850  172592
                851- 900  154863
                901- 950  106761
                951-1000   79387

   


   The average sequence length in UniProtKB/TrEMBL is 327 amino acids.

   The shortest sequence is G0XMK1_9MYRT: 1 amino acid.
   The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
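
The figures in this section can be cross-checked against the introduction. The Python sketch below makes two assumptions: that the size table covers every entry that is not a fragment, and that the average length is the total number of amino acids divided by the total number of entries (fragments included). The variable names are illustrative.

```python
TOTAL_ENTRIES = 23165610        # from the introduction
TOTAL_AMINO_ACIDS = 7575340945  # from the introduction
FRAGMENTS = 3380053             # "Number of fragments" in the introduction

# Counts per size bin, copied from the table above, in order:
# twenty bins of width 50 (1-1000), fifteen bins of width 100
# (1001-2500), and a final bin for sequences longer than 2500.
size_bin_counts = [
    540298, 1907642, 2156466, 2092300, 2110561,
    2043822, 1862030, 1421245, 1220547, 1009225,
    677598, 524885, 382967, 299947, 256138,
    228338, 172592, 154863, 106761, 79387,
    138462, 98083, 68811, 44644, 35913,
    25299, 19086, 14872, 12389, 10535,
    8373, 8309, 6695, 5274, 4497,
    36703,
]

non_fragment_entries = TOTAL_ENTRIES - FRAGMENTS
binned_entries = sum(size_bin_counts)
average_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

print(f"entries in size table: {binned_entries}")
print(f"entries minus fragments: {non_fragment_entries}")
print(f"average length: {average_length:.0f} amino acids")
```

Both checks agree with the published numbers: the bins sum to the number of non-fragment entries, and the quotient rounds to 327.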



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    28049036                1.21                                                    
   Submitted to EMBL/GenBank/DDBJ  15459128  14225472      0.67                                                    
   Journal                         11319251  10586695      0.49                                                    
   Submitted to other databases     1254391   1249340      0.05                                                    
   Thesis                              9774      9716     <0.01                                                    
   Book citation                       6466      6417     <0.01                                                    
   Unpublished observations              25        25     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 437825
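
The "Average per entry" column appears to be the total number of lines divided by the total number of entries in the release, not by the number of entries that contain such a line. The Python sketch below illustrates that reading, assuming values that round to 0.00 are printed as "<0.01". The function `average_per_entry` is a hypothetical helper, not part of any UniProt software.

```python
TOTAL_ENTRIES = 23165610  # total entries in release 2012_07


def average_per_entry(line_total, total_entries=TOTAL_ENTRIES):
    """Format lines-per-entry the way the tables in this section show it.

    Assumption: the average is taken over all entries in the release,
    and anything that would round to 0.00 is shown as "<0.01".
    """
    text = f"{line_total / total_entries:.2f}"
    return "<0.01" if text == "0.00" else text


# Totals copied from the References (RL) table above.
reference_totals = {
    "References (RL)": 28049036,
    "Submitted to EMBL/GenBank/DDBJ": 15459128,
    "Journal": 11319251,
    "Submitted to other databases": 1254391,
    "Thesis": 9774,
}

for line_type, total in reference_totals.items():
    print(f"{line_type:<34}{total:>10}  {average_per_entry(total):>6}")
```

The same helper can be applied to the comment, feature and cross-reference tables that follow, since they share the column layout.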


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      25094081                1.08                                                    
   CATALYTIC ACTIVITY               2252977   2073020      0.10     4                                              
   CAUTION                          8549648   8549631      0.37     1                                              
   COFACTOR                          816211    764724      0.04     8                                              
   DOMAIN                             76754     73149     <0.01     9                                              
   FUNCTION                         2493058   2321461      0.11     3                                              
   INTERACTION                          676       676     <0.01    11                                              
   MISCELLANEOUS                      43976     43884     <0.01    10                                              
   PATHWAY                          1113180   1006992      0.05     7                                              
   SIMILARITY                       6624842   5787777      0.29     2                                              
   SUBCELLULAR LOCATION             1960229   1877903      0.08     5                                              
   SUBUNIT                          1162530   1160614      0.05     6                                              

Total number of comment topics: 11


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       6490393                0.28                                                    
   CHAIN                             641199    514893      0.03     2                                              
   NON_TER                          5380769   3380577      0.23     1                                              
   SIGNAL                            467660    465950      0.02     3                                              
   TRANSIT                              765       765     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             263942017               11.39                                                    
   AGD                                 2525      2525     <0.01    83   Organism-specific databases                
   ANU-2DPAGE                            52        52     <0.01    98   2D gel databases                           
   Allergome                           2836      2224     <0.01    79   Protein family/group databases             
   ArachnoServer                         66        66     <0.01    97   Organism-specific databases                
   ArrayExpress                       87715     87660     <0.01    52   Gene expression databases                  
   BRENDA                              2734      2703     <0.01    80   Enzyme and pathway databases               
   Bgee                              140950    140776      0.01    47   Gene expression databases                  
   BioCyc                            670897    656473      0.03    30   Enzyme and pathway databases               
   CAZy                               74206     69724     <0.01    55   Protein family/group databases             
   CGD                                 7083      7083     <0.01    75   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     5         5     <0.01   104   2D gel databases                           
   CTD                               305493    303791      0.01    37   Organism-specific databases                
   CYGD                                   2         2     <0.01   106   Organism-specific databases                
   ConoServer                           160       160     <0.01    93   Organism-specific databases                
   DIP                                 2694      2689     <0.01    81   Protein-protein interaction databases      
   DNASU                              44180     43856     <0.01    61   Protocols and materials databases          
   EMBL                            25654414  22472619      1.11     3   Sequence databases                         
   Ensembl                           971579    952891      0.04    27   Genome annotation databases                
   EnsemblBacteria                   838320    803156      0.04    29   Genome annotation databases                
   EnsemblFungi                      197688    197358      0.01    41   Genome annotation databases                
   EnsemblMetazoa                    473484    467026      0.02    33   Genome annotation databases                
   EnsemblPlants                     296206    288102      0.01    38   Genome annotation databases                
   EnsemblProtists                   100023     98984     <0.01    50   Genome annotation databases                
   EuPathDB                          178974    178973      0.01    45   Organism-specific databases                
   EvolutionaryTrace                   8250      8250     <0.01    73   Other                                      
   FlyBase                           195522    193975      0.01    42   Organism-specific databases                
   GO                              44928425  14373475      1.94     2   Ontologies                                 
   Gene3D                           9828016   7828926      0.42     6   Family and domain databases                
   GeneID                           8030517   7859476      0.35    10   Genome annotation databases                
   GeneTree                          844958    844888      0.04    28   Phylogenomic databases                     
   Genevestigator                     94467     94460     <0.01    51   Gene expression databases                  
   GenoList                           14735     14462     <0.01    71   Organism-specific databases                
   GenomeReviews                    4253579   4154800      0.18    15   Genome annotation databases                
   Gramene                            67723     67723     <0.01    56   Organism-specific databases                
   H-InvDB                              639       489     <0.01    89   Organism-specific databases                
   HAMAP                            2027872   2006948      0.09    24   Family and domain databases                
   HGNC                               48462     48370     <0.01    59   Organism-specific databases                
   HOGENOM                          3660912   3660890      0.16    17   Phylogenomic databases                     
   HOVERGEN                          312604    312596      0.01    36   Phylogenomic databases                     
   HSSP                              251062    250836      0.01    39   3D structure databases                     
   IPI                               321418    321261      0.01    35   Sequence databases                         
   InParanoid                        190614    190487      0.01    44   Phylogenomic databases                     
   IntAct                             16823     16823     <0.01    67   Protein-protein interaction databases      
   InterPro                        49131687  17682648      2.12     1   Family and domain databases                
   KEGG                             6459696   6343218      0.28    11   Genome annotation databases                
   KO                               2466983   2455964      0.11    23   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    76   Organism-specific databases                
   Leproma                             1272      1270     <0.01    87   Organism-specific databases                
   MEROPS                             62655     62653     <0.01    58   Protein family/group databases             
   MGI                                37534     37532     <0.01    63   Organism-specific databases                
   MINT                                8621      8621     <0.01    72   Protein-protein interaction databases      
   NextBio                           105959    105958     <0.01    48   Other                                      
   OMA                              3905662   3905629      0.17    16   Phylogenomic databases                     
   OrthoDB                           567608    567606      0.02    31   Phylogenomic databases                     
   PANTHER                          3425022   3236968      0.15    19   Family and domain databases                
   PATRIC                           8343240   8343152      0.36     8   Genome annotation databases                
   PDB                                16148      9361     <0.01    70   3D structure databases                     
   PDBsum                             16197      9284     <0.01    68   3D structure databases                     
   PHCI-2DPAGE                           99        99     <0.01    95   2D gel databases                           
   PIR                               174018    141209      0.01    46   Sequence databases                         
   PIRSF                            1707973   1707536      0.07    25   Family and domain databases                
   PMAP-CutDB                           220       220     <0.01    91   Other                                      
   PMMA-2DPAGE                            2         2     <0.01   105   2D gel databases                           
   PRIDE                             219472    219472      0.01    40   Proteomic databases                        
   PRINTS                           3620054   3205406      0.16    18   Family and domain databases                
   PROSITE                         11597487   7626689      0.50     5   Family and domain databases                
   Pathway_Interaction_DB                11         9     <0.01   103   Enzyme and pathway databases               
   PeptideAtlas                         144       144     <0.01    94   Proteomic databases                        
   PeroxiBase                          2541      2533     <0.01    82   Protein family/group databases             
   Pfam                            22170694  16358760      0.96     4   Family and domain databases                
   PharmGKB                            4746      4746     <0.01    77   Organism-specific databases                
   PhosphoSite                         1543      1543     <0.01    86   PTM databases                              
   PhylomeDB                         100522    100522     <0.01    49   Phylogenomic databases                     
   PomBase                               40        27     <0.01    99   Organism-specific databases                
   PptaseDB                              36        34     <0.01   100   Protein family/group databases             
   ProDom                            437698    416296      0.02    34   Family and domain databases                
   ProMEX                               283       283     <0.01    90   Proteomic databases                        
   ProtClustDB                      2722715   2722703      0.12    21   Phylogenomic databases                     
   ProteinModelPortal               6050548   6048773      0.26    12   3D structure databases                     
   PseudoCAP                           4559      4553     <0.01    78   Organism-specific databases                
   REBASE                             27973     27962     <0.01    64   Protein family/group databases             
   REPRODUCTION-2DPAGE                   86        85     <0.01    96   2D gel databases                           
   RGD                                24930     24625     <0.01    66   Organism-specific databases                
   Reactome                             192       169     <0.01    92   Enzyme and pathway databases               
   RefSeq                           8060614   7860838      0.35     9   Sequence databases                         
   SGD                                   11        11     <0.01   102   Organism-specific databases                
   SMART                            5199973   3929523      0.22    13   Family and domain databases                
   SMR                              1268211   1268211      0.05    26   3D structure databases                     
   STRING                           2596816   2596702      0.11    22   Protein-protein interaction databases      
   SUPFAM                           9493132   7800525      0.41     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   101   2D gel databases                           
   Siena-2DPAGE                           2         2     <0.01   107   2D gel databases                           
   TAIR                               16162     16083     <0.01    69   Organism-specific databases                
   TCDB                                2413      2401     <0.01    84   Protein family/group databases             
   TIGR                              195164    187892      0.01    43   Genome annotation databases                
   TIGRFAMs                         4884568   4456033      0.21    14   Family and domain databases                
   TubercuList                         2052      2047     <0.01    85   Organism-specific databases                
   UCSC                               65092     65091     <0.01    57   Genome annotation databases                
   UniGene                           534016    501597      0.02    32   Sequence databases                         
   VectorBase                         78371     77856     <0.01    53   Genome annotation databases                
   World-2DPAGE                         936       931     <0.01    88   2D gel databases                           
   WormBase                           38279     38268     <0.01    62   Organism-specific databases                
   Xenbase                            25664     25589     <0.01    65   Organism-specific databases                
   ZFIN                               45296     44953     <0.01    60   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    74   Organism-specific databases                
   eggNOG                           2781062   2781061      0.12    20   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    54   Organism-specific databases                

Number of explicitly cross-referenced databases: 133


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.58   Gln (Q) 3.93   Leu (L) 9.87   Ser (S) 6.72
   Arg (R) 5.46   Glu (E) 6.20   Lys (K) 5.30   Thr (T) 5.59
   Asn (N) 4.10   Gly (G) 7.07   Met (M) 2.45   Trp (W) 1.30
   Asp (D) 5.32   His (H) 2.21   Phe (F) 4.01   Tyr (Y) 3.04
   Cys (C) 1.28   Ile (I) 5.95   Pro (P) 4.73   Val (V) 6.75

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03

   

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
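
   The ordering above can be reproduced from the percentages in section 5.1.
   The following is a minimal sketch, not the procedure used to build this
   release: the dictionary name `composition` and the helper
   `rank_by_frequency` are illustrative assumptions, and the values are
   copied by hand from the table in 5.1.

```python
# Percentages copied from the table in section 5.1
# (three-letter residue code -> percent of all residues).
composition = {
    "Ala": 8.58, "Gln": 3.93, "Leu": 9.87, "Ser": 6.72,
    "Arg": 5.46, "Glu": 6.20, "Lys": 5.30, "Thr": 5.59,
    "Asn": 4.10, "Gly": 7.07, "Met": 2.45, "Trp": 1.30,
    "Asp": 5.32, "His": 2.21, "Phe": 4.01, "Tyr": 3.04,
    "Cys": 1.28, "Ile": 5.95, "Pro": 4.73, "Val": 6.75,
}


def rank_by_frequency(table):
    """Return residue codes sorted from most to least frequent.

    Hypothetical helper for illustration only.
    """
    return sorted(table, key=table.get, reverse=True)


ranking = rank_by_frequency(composition)
total = sum(composition.values())

# Print the ranking in the same comma-separated form as section 5.2.
print(", ".join(ranking))
# The twenty standard residues need not sum to exactly 100 because of
# rounding and the ambiguous codes (B, Z, X) listed separately in 5.1.
print(f"Sum of the twenty standard residues: {total:.2f}")
```

   Because no two residues share the same percentage in this release, the
   sort has no ties and the result does not depend on the input order.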



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 531337
Total number of entries encoded on a Plasmid: 286280
Total number of entries encoded on a Plastid: 20730
Total number of entries encoded on a Plastid; Apicoplast: 660
Total number of entries encoded on a Plastid; Chloroplast: 194092
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 868