         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2012_03 STATISTICS


1.  INTRODUCTION

Release 2012_03 of 21-Mar-2012 of UniProtKB/TrEMBL contains 20639311 sequence entries,
comprising 6750540747 amino acids.

527121 sequences have been added since release 2012_02, the sequence data of
318 existing entries has been updated and the annotations of
6289752 entries have been revised. This represents an increase of about 3%
in the number of entries.

Number of fragments: 3230979

Protein existence (PE):              entries      %
1: Evidence at protein level           13183     0.06%
2: Evidence at transcript level       564310     2.73%
3: Inferred from homology            4750386    23.02%
4: Predicted                        15311432    74.19%
5: Uncertain                               0     0.00%
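
The percentages in the table above are each category's share of all entries.
A minimal Python sketch of that arithmetic follows, using the counts quoted
above. The names PE_COUNTS and pe_shares are illustrative only and are not
part of any UniProt tool.

```python
# Entry counts per protein-existence (PE) level, copied from the table above.
PE_COUNTS = {
    "1: Evidence at protein level": 13183,
    "2: Evidence at transcript level": 564310,
    "3: Inferred from homology": 4750386,
    "4: Predicted": 15311432,
    "5: Uncertain": 0,
}


def pe_shares(counts):
    """Return each category's share of the total as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


shares = pe_shares(PE_COUNTS)
for label, pct in shares.items():
    print(f"{label:<34}{PE_COUNTS[label]:>10}{pct:>9.2f}%")
```

The five counts add up to the 20639311 entries stated in the introduction,
so the shares are taken over the whole release.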

The growth of the database is summarized below.

   [Figure: growth of UniProtKB/TrEMBL]


2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 422873

   The twenty most represented species account for 1488603 sequences, or
   7.2% of the total number of entries.
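
   The figure for the twenty most represented species can be checked against
   the frequencies listed in table 2.2. A short Python sketch, with the twenty
   frequencies copied from that table; the variable names are illustrative.

```python
# Frequencies of the twenty most represented species, copied from table 2.2.
TOP20 = [
    446913, 104945, 96919, 68479, 67683, 60427, 54041, 53144, 51315, 50483,
    50129, 48816, 48390, 44070, 43824, 42093, 42046, 39850, 39438, 35598,
]
TOTAL_ENTRIES = 20639311  # total number of entries, from the introduction

top20_total = sum(TOP20)
top20_share = 100 * top20_total / TOTAL_ENTRIES

print(top20_total, f"{top20_share:.1f}%")
```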


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:203412
                            2x: 73056
                            3x: 36260
                            4x: 21842
                            5x: 13565
                            6x:  9577
                            7x:  7249
                            8x:  5407
                            9x:  4384
                           10x:  8736
                       11- 20x: 22023
                       21- 50x:  7749
                       51-100x:  2918
                         >100x:  6695
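
   A table of this shape can be derived from per-species entry counts by
   assigning each species to a class (1x to 10x individually, then three
   ranges, then >100x) and counting the species per class. The sketch below
   is one way to do this in Python. The function names and the toy input are
   hypothetical and do not describe how the release statistics were produced.

```python
from collections import Counter

# Inclusive bounds and labels of the range classes used in the table above.
RANGE_CLASSES = [(11, 20, "11- 20x"), (21, 50, "21- 50x"), (51, 100, "51-100x")]


def occurrence_class(n_entries):
    """Map a species' entry count to its frequency-class label."""
    if n_entries < 1:
        raise ValueError("a represented species has at least one entry")
    if n_entries <= 10:
        return f"{n_entries}x"
    for low, high, label in RANGE_CLASSES:
        if low <= n_entries <= high:
            return label
    return ">100x"


def frequency_table(entries_per_species):
    """Count how many species fall into each frequency class."""
    return Counter(occurrence_class(n) for n in entries_per_species.values())


# Hypothetical toy input: species name -> number of entries.
toy = {"species A": 1, "species B": 1, "species C": 7,
       "species D": 15, "species E": 446913}
table = frequency_table(toy)
print(table)
```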


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     446913  Human immunodeficiency virus 1
       2     104945  Homo sapiens (Human)
       3      96919  Oryza sativa subsp. japonica (Rice)
       4      68479  uncultured bacterium
       5      67683  Hepatitis C virus
       6      60427  Mus musculus (Mouse)
       7      54041  Vitis vinifera (Grape)
       8      53144  Danio rerio (Zebrafish) (Brachydanio rerio)
       9      51315  Macaca mulatta (Rhesus macaque)
      10      50483  Trichomonas vaginalis
      11      50129  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      12      48816  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      13      48390  Hepatitis B virus (HBV)
      14      44070  Populus trichocarpa (Western balsam poplar) 
      15      43824  Arabidopsis thaliana (Mouse-ear cress)
      16      42093  Zea mays (Maize)
      17      42046  Callithrix jacchus (White-tufted-ear marmoset)
      18      39850  Paramecium tetraurelia
      19      39438  Oryza sativa subsp. indica (Rice)
      20      35598  Ailuropoda melanoleuca (Giant panda)
      21      34801  Physcomitrella patens subsp. patens (Moss)
      22      33894  Rattus norvegicus (Rat)
      23      33726  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      24      33511  Drosophila melanogaster (Fruit fly)
      25      33271  Selaginella moellendorffii (Spikemoss)
      26      32604  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      27      31827  Caenorhabditis remanei (Caenorhabditis vulgaris)
      28      31570  Monodelphis domestica (Gray short-tailed opossum)
      29      31381  Ricinus communis (Castor bean)
      30      30550  Daphnia pulex (Water flea)
      31      30300  Caenorhabditis brenneri (Nematode worm)
      32      29161  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      33      29026  Oikopleura dioica (Tunicate)
      34      28899  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      35      28093  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      36      28013  Gasterosteus aculeatus (Three-spined stickleback)
      37      27964  Bos taurus (Bovine)
      38      27306  Canis familiaris (Dog) (Canis lupus familiaris)
      39      27086  Gorilla gorilla gorilla (Lowland gorilla)
      40      26873  Ornithorhynchus anatinus (Duckbill platypus)
      41      26046  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
      42      25869  Oryzias latipes (Medaka fish) (Japanese ricefish)
      43      25755  Loxodonta africana (African elephant)
      44      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      45      25043  Oryctolagus cuniculus (Rabbit)
      46      24923  Sus scrofa (Pig)
      47      24864  Gallus gallus (Chicken)
      48      24825  Nematostella vectensis (Starlet sea anemone)
      49      24188  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      50      24056  Pongo abelii (Sumatran orangutan)
      51      23988  Escherichia coli
      52      23763  Equus caballus (Horse)
      53      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      54      23102  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      55      23100  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      56      22822  Pan troglodytes (Chimpanzee)
      57      22519  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      58      22122  Caenorhabditis elegans
      59      21566  Hordeum vulgare var. distichum (Two-rowed barley)
      60      21546  Heterocephalus glaber (Naked mole rat)
      61      21228  Caenorhabditis briggsae
      62      21085  Ixodes scapularis (Black-legged tick) (Deer tick)
      63      20981  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
      64      20851  Myotis lucifugus (Little brown bat)
      65      20423  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
      66      20124  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      67      20033  Cavia porcellus (Guinea pig)
      68      19662  Ralstonia solanacearum (Pseudomonas solanacearum)
      69      19648  Taeniopygia guttata (Zebra finch) (Poephila guttata)
      70      19201  Trypanosoma cruzi (strain CL Brener)
      71      19201  Toxoplasma gondii
      72      18907  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      73      18771  mine drainage metagenome
      74      18606  Drosophila simulans (Fruit fly)
      75      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
      76      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
      77      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
      78      17031  Drosophila yakuba (Fruit fly)
      79      16992  Tribolium castaneum (Red flour beetle)
      80      16754  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
      81      16712  Drosophila persimilis (Fruit fly)
      82      16425  Ectocarpus siliculosus (Brown alga)
      83      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
      84      16306  Loa loa (Eye worm) (Filaria loa)
      85      16295  Danaus plexippus (Monarch butterfly)
      86      16264  Trichinella spiralis (Trichina worm)
      87      16239  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
      88      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
      89      16190  Drosophila sechellia (Fruit fly)
      90      16129  Colletotrichum higginsianum
      91      15983  Drosophila pseudoobscura pseudoobscura (Fruit fly)
      92      15976  Meleagris gallopavo (Common turkey)
      93      15761  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
      94      15714  Naegleria gruberi (Amoeba)
      95      15623  Anopheles gambiae (African malaria mosquito)
      96      15622  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
      97      15418  Drosophila willistoni (Fruit fly)
      98      15230  Tetrahymena thermophila (strain SB210)
      99      15142  Drosophila ananassae (Fruit fly)
     100      15031  Harpegnathos saltator (Jerdon's jumping ant)
     101      14964  Hepatitis C virus subtype 1a
     102      14922  Drosophila erecta (Fruit fly)
     103      14848  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     104      14815  Hepatitis C virus subtype 1b
     105      14793  Camponotus floridanus (Florida carpenter ant)
     106      14781  Drosophila mojavensis (Fruit fly)
     107      14695  Drosophila virilis (Fruit fly)
     108      14669  Plasmodium chabaudi
     109      14649  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     110      14417  Volvox carteri (Green alga)
     111      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     112      14333  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     113      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     114      14106  Plasmodium falciparum
     115      13966  Acromyrmex echinatior (Panamanian leafcutter ant) 
     116      13862  Clonorchis sinensis (Chinese liver fluke)
     117      13767  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     118      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     119      13328  Aspergillus flavus 
     120      13268  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     121      13172  Mustela putorius furo (European domestic ferret) (Mustela furo)
     122      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     123      13042  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     124      12983  Albugo laibachii Nc14
     125      12950  Stigmatella aurantiaca (strain DW4/3-1)
     126      12936  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     127      12765  Glycine max (Soybean) (Glycine hispida)
     128      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     129      12696  Trypanosoma congolense (strain IL3000)
     130      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     131      12604  Schistosoma mansoni (Blood fluke)
     132      12576  Xenopus laevis (African clawed frog)
     133      12509  Trypanosoma cruzi
     134      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     135      12440  Polysphondylium pallidum (Cellular slime mold)
     136      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     137      12352  Dictyostelium purpureum (Slime mold)
     138      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     139      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     140      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     141      11933  Emericella nidulans  
     142      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     143      11780  Piriformospora indica (strain DSM 11827)
     144      11715  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     145      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     146      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     147      11646  Anopheles darlingi (Mosquito)
     148      11644  Plasmodium berghei (strain Anka)
     149      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     150      11562  Trichoplax adhaerens (Trichoplax reptans)
     151      11557  Trypanosoma vivax Y486
     152      11514  Aureococcus anophagefferens (Harmful bloom alga)
     153      11499  Brugia malayi (Filarial nematode worm)
     154      11480  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     155      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     156      11477  Helicobacter pylori (Campylobacter pylori)
     157      11289  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     158      11211  Ktedonobacter racemifer DSM 44963
     159      11211  Rabies virus
     160      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     161      10997  Schistosoma japonicum (Blood fluke)
     162      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     163      10966  Streptomyces clavuligerus ATCC 27064
     164      10949  Aspergillus niger 
     165      10844  Porcine reproductive and respiratory syndrome virus (PRRSV)
     166      10839  Pediculus humanus subsp. corporis (Body louse)
     167      10820  Chaetomium globosum  
     168      10570  Metarhizium robertsii (strain ARSEF 23 / ATCC MYA-3075) (Metarhizium anisopliae)
     169      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     170      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     171      10387  Pseudomonas syringae pv. glycinea str. race 4
     172      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     173      10377  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     174      10354  Phaeodactylum tricornutum (strain CCAP 1055/1)
     175      10274  Micromonas pusilla (strain CCMP1545) (Picoplanktonic green alga)
     176      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136) 
     177      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
     178      10171  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     179      10110  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
     180      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
     181      10087  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
     182      10051  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
     183      10013  Streptomyces bingchenggensis (strain BCW-1)
     184       9846  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
     185       9836  Chlorella variabilis (Green alga)
     186       9835  uncultured archaeon
     187       9822  Metarhizium acridum (strain CQMa 102)
     188       9760  Thielavia terrestris (strain ATCC 38088 / NRRL 8126) (Acremonium alabamense)
     189       9703  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
     190       9662  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
     191       9651  Cordyceps militaris (strain CM01) (Caterpillar fungus)
     192       9551  Amycolatopsis mediterranei S699
     193       9533  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
     194       9510  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
     195       9485  Ajellomyces dermatitidis (strain ER-3 / ATCC MYA-2586) 
     196       9443  Ajellomyces capsulata (strain H88) (Darling's disease fungus) 
     197       9443  Salmo salar (Atlantic salmon)
     198       9341  Anolis carolinensis (Green anole) (American chameleon)
     199       9331  Klebsiella pneumoniae
     200       9237  Monosiga brevicollis (Choanoflagellate)
     201       9201  Amycolatopsis mediterranei (strain U-32)
     202       9197  Streptomyces himastatinicus ATCC 53653
     203       9154  Ajellomyces capsulata (strain G186AR / H82 / ATCC MYA-2454 / RMSCC 2432)  
     204       9146  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
     205       9139  Pseudomonas syringae pv. pisi str. 1704B
     206       9113  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
     207       9112  Hypocrea jecorina (strain QM6a) (Trichoderma reesei)
     208       9081  Thielavia heterothallica (strain ATCC 42464 / BCRC 31852 / DSM 1799) 
     209       9076  Saccharomyces cerevisiae x Saccharomyces kudriavzevii VIN7
     210       9064  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
     211       9046  Streptomyces hygroscopicus subsp. jinggangensis 5008
     212       9009  Neurospora crassa 
     213       9006  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
     214       8989  Dictyostelium discoideum (Slime mold)
     215       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
     216       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
     217       8941  Streptomyces violaceusniger Tu 4113
     218       8940  Burkholderia sp. TJI49
     219       8900  Catenulispora acidiphila 
     220       8859  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
     221       8826  Millerozyma farinosa CBS 7064 (Pichia farinosa CBS 7064)
     222       8796  Aspergillus clavatus 
     223       8794  Bradyrhizobium japonicum USDA 6
     224       8783  Pseudomonas syringae pv. japonica str. M301072PT
     225       8755  Rhodococcus sp. (strain RHA1)
     226       8741  Trypanosoma brucei brucei (strain 927/4 GUTat10.1)
     227       8705  Trichophyton rubrum (strain ATCC MYA-4607 / CBS 118892) (Athlete's foot fungus)
     228       8699  Streptomyces coelicoflavus ZG0656
     229       8698  Paracoccidioides brasiliensis (strain Pb18)
     230       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
     231       8676  Trichophyton equinum (strain ATCC MYA-4606 / CBS 127.97) (Horse ringworm fungus)
     232       8661  Arthroderma otae (strain ATCC MYA-4605 / CBS 113480) (Microsporum canis)
     233       8605  Batrachochytrium dendrobatidis (strain JAM81 / FGSC 10211) (Frog chytrid fungus)
     234       8599  Entamoeba dispar (strain ATCC PRA-260 / SAW760)
     235       8520  Trichophyton tonsurans (strain CBS 112818) (Scalp ringworm fungus)
     236       8437  Plesiocystis pacifica SIR-1
     237       8428  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
     238       8394  Streptomyces sp. AA4
     239       8381  Bradyrhizobium japonicum
     240       8374  Capsaspora owczarzaki (strain ATCC 30864)
     241       8320  Frankia sp. CN3
     242       8312  Entamoeba histolytica
     243       8308  Grosmannia clavigera (strain kw1407 / UAMH 11150) (Blue stain fungus) 
     244       8265  Leishmania major
     245       8248  Microscilla marina ATCC 23134
     246       8242  Actinoplanes sp. (strain ATCC 31044 / CBS 674.73 / SE50/110)
     247       8202  Bradyrhizobium sp. STM 3843
     248       8202  Streptomyces sviceus ATCC 29083
     249       8201  Leishmania infantum
     250       8201  Microcoleus chthonoplastes PCC 7420


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          353827 (  2%)
    Bacteria       13239984 ( 64%)
    Eukaryota       5662670 ( 27%)
    Viruses         1342673 (  7%)
    Other             40156 ( <1%)



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 104981 (  2%)           (  1%)
     Other Mammalia        792218 ( 14%)           (  4%)
     Other Vertebrata      593140 ( 10%)           (  3%)
     Viridiplantae         997094 ( 18%)           (  5%)
     Fungi                1241032 ( 22%)           (  6%)
     Insecta               743049 ( 13%)           (  4%)
     Nematoda              168100 (  3%)           (  1%)
     Other                1023056 ( 18%)           (  5%)
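
   Each category above carries two percentages: one relative to Eukaryota and
   one relative to the complete database. A minimal Python sketch of both
   calculations, using the counts from the table; the variable names are
   illustrative, and rounding to whole percent is an assumption that happens
   to reproduce the printed values.

```python
# Sequence counts per category, copied from the "Within Eukaryota" table above.
EUKARYOTA = {
    "Human": 104981,
    "Other Mammalia": 792218,
    "Other Vertebrata": 593140,
    "Viridiplantae": 997094,
    "Fungi": 1241032,
    "Insecta": 743049,
    "Nematoda": 168100,
    "Other": 1023056,
}
TOTAL_ENTRIES = 20639311  # all UniProtKB/TrEMBL entries, from the introduction

euk_total = sum(EUKARYOTA.values())

# category -> (% of Eukaryota, % of the complete database), whole percent
shares = {
    name: (round(100 * n / euk_total), round(100 * n / TOTAL_ENTRIES))
    for name, n in EUKARYOTA.items()
}

for name, (within, overall) in shares.items():
    print(f"{name:<18}{EUKARYOTA[name]:>9} ({within:>3}%) ({overall:>3}%)")
```

   The category counts add up to the 5662670 Eukaryota sequences given in the
   kingdom table.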



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  459370             1001-1100   124451
                 51- 100 1663844             1101-1200    87840
                101- 150 1895808             1201-1300    61470
                151- 200 1836531             1301-1400    40064
                201- 250 1851447             1401-1500    32204
                251- 300 1795586             1501-1600    22815
                301- 350 1639100             1601-1700    17203
                351- 400 1252871             1701-1800    13532
                401- 450 1073566             1801-1900    11170
                451- 500  892136             1901-2000     9560
                501- 550  601675             2001-2100     7611
                551- 600  466904             2101-2200     7516
                601- 650  339819             2201-2300     6005
                651- 700  266019             2301-2400     4764
                701- 750  228417             2401-2500     4087
                751- 800  203901             >2500        33755
                801- 850  153004
                851- 900  138053
                901- 950   95033
                951-1000   71201

   


   The average sequence length in UniProtKB/TrEMBL is 327 amino acids.

   The shortest sequence is G0XMK1_9MYRT:     1 amino acid.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
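
   The average length is consistent with dividing the total number of amino
   acids by the total number of entries, both given in the introduction. The
   Python sketch below shows that division and a possible way to assign a
   sequence length to the size classes of the table above (50-residue classes
   up to 1000, 100-residue classes up to 2500, then >2500). It assumes the
   average is taken over all entries, fragments included, which the text does
   not state. The name size_class is illustrative.

```python
TOTAL_AMINO_ACIDS = 6750540747  # from the introduction
TOTAL_ENTRIES = 20639311        # from the introduction

# Assumption: the average covers all entries, including fragments.
average_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES
print(round(average_length))


def size_class(length):
    """Return the size class of the table above for a sequence length."""
    if length < 1:
        raise ValueError("sequence length must be at least 1")
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100
    low = (length - 1) // width * width + 1
    return f"{low}-{low + width - 1}"


print(size_class(1), size_class(327), size_class(36805))
```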



4.  STATISTICS FOR SOME LINE TYPES

The following tables give, for selected UniProtKB/TrEMBL line types, the total
number of lines, the number of entries with at least one such line, and the
average number of lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    24944342                1.21                                                    
   Submitted to EMBL/GenBank/DDBJ  13925437  12504876      0.67                                                    
   Journal                         10185261   9528232      0.49                                                    
   Submitted to other databases      817788    810517      0.04                                                    
   Thesis                              9405      9347     <0.01                                                    
   Book citation                       6422      6373     <0.01                                                    
   Unpublished observations              28        28     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 427314
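
The "Average per entry" column matches the total number of lines divided by
the total number of entries in the release (20639311), not by the number of
entries that carry the line. A small Python sketch of that calculation
follows. The function name is illustrative, and printing "<0.01" for values
that round to 0.00 is an assumption inferred from the tables.

```python
TOTAL_ENTRIES = 20639311  # total number of entries, from the introduction


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format the average number of lines per entry as in the tables."""
    text = f"{total_lines / total_entries:.2f}"
    # Assumption: non-zero averages that round to 0.00 are shown as "<0.01".
    if text == "0.00" and total_lines > 0:
        return "<0.01"
    return text


print(average_per_entry(24944342))  # References (RL)
print(average_per_entry(13925437))  # Submitted to EMBL/GenBank/DDBJ
print(average_per_entry(9405))      # Thesis
```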


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      22050686                1.07                                                    
   CATALYTIC ACTIVITY               2110159   1954096      0.10     4                                              
   CAUTION                          6889072   6889059      0.33     1                                              
   COFACTOR                          690686    649548      0.03     8                                              
   DOMAIN                             59293     56359     <0.01     9                                              
   FUNCTION                         2307240   2149669      0.11     3                                              
   INTERACTION                          637       637     <0.01    11                                              
   MISCELLANEOUS                      37274     37203     <0.01    10                                              
   PATHWAY                          1092590   1004758      0.05     6                                              
   SIMILARITY                       6013293   5256494      0.29     2                                              
   SUBCELLULAR LOCATION             1860051   1789448      0.09     5                                              
   SUBUNIT                           990391    980419      0.05     7                                              

Total number of comment topics: 11


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       6277989                0.30                                                    
   CHAIN                             609880    488240      0.03     2                                              
   NON_TER                          5230897   3231140      0.25     1                                              
   SIGNAL                            436450    435011      0.02     3                                              
   TRANSIT                              762       762     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             236359316               11.45                                                    
   AGD                                 2525      2525     <0.01    80   Organism-specific databases                
   ANU-2DPAGE                            53        53     <0.01    97   2D gel databases                           
   Allergome                           2474      1874     <0.01    82   Protein family/group databases             
   ArachnoServer                         66        66     <0.01    96   Organism-specific databases                
   ArrayExpress                       88461     88461     <0.01    50   Gene expression databases                  
   BRENDA                              2744      2713     <0.01    78   Enzyme and pathway databases               
   Bgee                              142539    142539      0.01    48   Gene expression databases                  
   BioCyc                            670168    655816      0.03    30   Enzyme and pathway databases               
   CAZy                               74174     69694     <0.01    54   Protein family/group databases             
   CGD                                 7089      7089     <0.01    74   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     5         5     <0.01   102   2D gel databases                           
   CTD                               290555    288980      0.01    38   Organism-specific databases                
   CYGD                                   2         2     <0.01   104   Organism-specific databases                
   ConoServer                           160       160     <0.01    92   Organism-specific databases                
   DIP                                 2673      2668     <0.01    79   Protein-protein interaction databases      
   DNASU                              44193     43816     <0.01    58   Protocols and materials databases          
   EMBL                            23145720  20219788      1.12     3   Sequence databases                         
   Ensembl                           824677    808970      0.04    29   Genome annotation databases                
   EnsemblBacteria                   835187    801070      0.04    28   Genome annotation databases                
   EnsemblFungi                      181659    181390      0.01    45   Genome annotation databases                
   EnsemblMetazoa                    308222    298648      0.01    37   Genome annotation databases                
   EnsemblPlants                     274938    249489      0.01    39   Genome annotation databases                
   EnsemblProtists                    77613     76547     <0.01    52   Genome annotation databases                
   EuPathDB                          178984    178983      0.01    46   Organism-specific databases                
   FlyBase                           195550    194003      0.01    42   Organism-specific databases                
   GO                              38189526  12618285      1.85     2   Ontologies                                 
   Gene3D                           9097744   7288446      0.44     6   Family and domain databases                
   GeneID                           6749748   6626540      0.33    10   Genome annotation databases                
   GeneTree                          647992    647925      0.03    31   Phylogenomic databases                     
   Genevestigator                     95203     95196     <0.01    49   Gene expression databases                  
   GenoList                           14737     14464     <0.01    71   Organism-specific databases                
   GenomeReviews                    4251245   4152900      0.21    15   Genome annotation databases                
   Gramene                            67783     67783     <0.01    55   Organism-specific databases                
   H-InvDB                              580       475     <0.01    88   Organism-specific databases                
   HAMAP                            1756238   1738077      0.09    24   Family and domain databases                
   HGNC                               43889     43810     <0.01    60   Organism-specific databases                
   HOGENOM                          2190027   2190027      0.11    23   Phylogenomic databases                     
   HOVERGEN                          313996    313996      0.02    36   Phylogenomic databases                     
   HSSP                              251147    250906      0.01    40   3D structure databases                     
   IPI                               325327    325189      0.02    35   Sequence databases                         
   InParanoid                        190945    190945      0.01    44   Phylogenomic databases                     
   IntAct                             16862     16862     <0.01    67   Protein-protein interaction databases      
   InterPro                        44181526  15952069      2.14     1   Family and domain databases                
   KEGG                             5308636   5210084      0.26    12   Genome annotation databases                
   KO                               2310506   2299710      0.11    22   Phylogenomic databases                     
   LegioList                           5139      5111     <0.01    75   Organism-specific databases                
   Leproma                              936       935     <0.01    86   Organism-specific databases                
   MEROPS                             55559     55559     <0.01    56   Protein family/group databases             
   MGI                                36876     36599     <0.01    63   Organism-specific databases                
   MINT                                8676      8676     <0.01    72   Protein-protein interaction databases      
   NextBio                            43947     43946     <0.01    59   Other                                      
   OMA                              3304603   3304592      0.16    17   Phylogenomic databases                     
   OrthoDB                           567838    567836      0.03    32   Phylogenomic databases                     
   PANTHER                          3191768   3024540      0.15    18   Family and domain databases                
   PATRIC                           8359222   8359189      0.41     8   Genome annotation databases                
   PDB                                16128      9351     <0.01    70   3D structure databases                     
   PDBsum                             16229      9303     <0.01    69   3D structure databases                     
   PHCI-2DPAGE                          100       100     <0.01    94   2D gel databases                           
   PIR                               173692    140870      0.01    47   Sequence databases                         
   PIRSF                            1495519   1495146      0.07    25   Family and domain databases                
   PMAP-CutDB                           229       229     <0.01    90   Other                                      
   PMMA-2DPAGE                            2         2     <0.01   103   2D gel databases                           
   PRIDE                             231994    231994      0.01    41   Proteomic databases                        
   PRINTS                           3393997   3021477      0.16    16   Family and domain databases                
   PROSITE                         10510292   6977639      0.51     5   Family and domain databases                
   Pathway_Interaction_DB                11         9     <0.01   101   Enzyme and pathway databases               
   PeptideAtlas                         146       146     <0.01    93   Proteomic databases                        
   PeroxiBase                          2522      2514     <0.01    81   Protein family/group databases             
   Pfam                            19760111  14684629      0.96     4   Family and domain databases                
   PharmGKB                            2850      2850     <0.01    77   Organism-specific databases                
   PhosphoSite                         1563      1563     <0.01    85   PTM databases                              
   PhylomeDB                         915489    915489      0.04    27   Phylogenomic databases                     
   PomBase                               40        27     <0.01    98   Organism-specific databases                
   ProDom                            391280    370814      0.02    34   Family and domain databases                
   ProMEX                               296       296     <0.01    89   Proteomic databases                        
   ProtClustDB                      2723896   2723896      0.13    20   Phylogenomic databases                     
   ProteinModelPortal               5860438   5860438      0.28    11   3D structure databases                     
   PseudoCAP                           4563      4557     <0.01    76   Organism-specific databases                
   REBASE                             24458     24458     <0.01    66   Protein family/group databases             
   REPRODUCTION-2DPAGE                   89        88     <0.01    95   2D gel databases                           
   RGD                                24872     24585     <0.01    65   Organism-specific databases                
   Reactome                             184       162     <0.01    91   Enzyme and pathway databases               
   RefSeq                           6772892   6629264      0.33     9   Sequence databases                         
   SGD                                   11        11     <0.01   100   Organism-specific databases                
   SMART                            4655329   3528126      0.23    13   Family and domain databases                
   SMR                              1001421   1001421      0.05    26   3D structure databases                     
   STRING                           2596970   2596970      0.13    21   Protein-protein interaction databases      
   SUPFAM                           8705555   7177643      0.42     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01    99   2D gel databases                           
   Siena-2DPAGE                           2         2     <0.01   105   2D gel databases                           
   TAIR                               16427     16347     <0.01    68   Organism-specific databases                
   TCDB                                2391      2379     <0.01    83   Protein family/group databases             
   TIGR                              194496    187455      0.01    43   Genome annotation databases                
   TIGRFAMs                         4312397   3934498      0.21    14   Family and domain databases                
   TubercuList                         2064      2059     <0.01    84   Organism-specific databases                
   UCSC                               53392     53392     <0.01    57   Genome annotation databases                
   UniGene                           516207    486936      0.03    33   Sequence databases                         
   VectorBase                         78371     77856     <0.01    51   Genome annotation databases                
   World-2DPAGE                         929       924     <0.01    87   2D gel databases                           
   WormBase                           38382     38372     <0.01    62   Organism-specific databases                
   Xenbase                            25180     25138     <0.01    64   Organism-specific databases                
   ZFIN                               42197     41414     <0.01    61   Organism-specific databases                
   dictyBase                           7998      7776     <0.01    73   Organism-specific databases                
   eggNOG                           2781768   2781768      0.13    19   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    53   Organism-specific databases                

Number of explicitly cross-referenced databases: 132
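
The third numeric column of the table above is consistent with the number of
cross-references divided by the total number of entries in this release
(20639311, from the Introduction). This is an inference from the figures, not
a definition stated in the table. The sketch below reproduces the column for a
few rows; the function name refs_per_entry and the 0.005 cut-off for printing
"<0.01" are assumptions made for illustration.

```python
# Total number of entries in release 2012_03 (taken from the Introduction).
TOTAL_ENTRIES = 20639311

# Cross-reference counts copied from a few rows of the table above.
xref_counts = {
    "InterPro": 44181526,
    "Pfam": 19760111,
    "PROSITE": 10510292,
    "KEGG": 5308636,
    "MEROPS": 55559,
}


def refs_per_entry(count, total=TOTAL_ENTRIES):
    """Return cross-references per entry, formatted like the table column.

    Assumption: values that would round to 0.00 are shown as "<0.01".
    """
    ratio = count / total
    return "<0.01" if ratio < 0.005 else f"{ratio:.2f}"


for name, count in xref_counts.items():
    print(f"{name:<10} {refs_per_entry(count):>6}")
```

For the rows listed, the computed values agree with the printed column
(for example, InterPro gives 2.14 and MEROPS gives <0.01).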


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.60   Gln (Q) 3.91   Leu (L) 9.88   Ser (S) 6.73
   Arg (R) 5.47   Glu (E) 6.17   Lys (K) 5.25   Thr (T) 5.60
   Asn (N) 4.09   Gly (G) 7.10   Met (M) 2.47   Trp (W) 1.31
   Asp (D) 5.31   His (H) 2.21   Phe (F) 4.01   Tyr (Y) 3.03
   Cys (C) 1.28   Ile (I) 5.95   Pro (P) 4.77   Val (V) 6.75

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03

   

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
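
   The ordering in 5.2 can be checked against the percentages in 5.1 by
   sorting the twenty standard residues in descending order of frequency.
   A minimal sketch, with the values copied from the table in 5.1 (the
   variable names are illustrative only):

```python
# Percent composition of the complete database, copied from section 5.1.
composition = {
    "Ala": 8.60, "Arg": 5.47, "Asn": 4.09, "Asp": 5.31, "Cys": 1.28,
    "Gln": 3.91, "Glu": 6.17, "Gly": 7.10, "His": 2.21, "Ile": 5.95,
    "Leu": 9.88, "Lys": 5.25, "Met": 2.47, "Phe": 4.01, "Pro": 4.77,
    "Ser": 6.73, "Thr": 5.60, "Trp": 1.31, "Tyr": 3.03, "Val": 6.75,
}

# Sort residue names from most to least frequent.
ranked = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranked))
```

   No two residues share the same percentage in this release, so the
   ordering is unambiguous.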



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 622109
Total number of entries encoded on a Plasmid: 273850
Total number of entries encoded on a Plastid: 15672
Total number of entries encoded on a Plastid; Apicoplast: 388
Total number of entries encoded on a Plastid; Chloroplast: 176646
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 772