         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2014_05 STATISTICS


1.  INTRODUCTION

UniProtKB/TrEMBL release 2014_05 of 14-May-2014 contains 56010222 sequence entries,
comprising 17785675050 amino acids.

Since release 2014_04, 1096365 sequences have been added, the sequence data of
18185 existing entries has been updated, and the annotations of
13742268 entries have been revised. The added sequences represent an increase
of about 2% in the number of entries.

Number of fragments: 5506759

Protein existence (PE):              entries      %
1: Evidence at protein level           26770     0.05%
2: Evidence at transcript level       854251     1.53%
3: Inferred from homology           14322100    25.57%
4: Predicted                        40807101    72.86%
5: Uncertain                               0     0.00%
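
The percentage column above can be reproduced from the entry counts alone. The following is a minimal sketch, assuming each percentage is the level's share of all entries, rounded to two decimals; the names `pe_counts` and `pe_percentages` are illustrative and not part of any UniProt tooling.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 26770,
    "2: Evidence at transcript level": 854251,
    "3: Inferred from homology": 14322100,
    "4: Predicted": 40807101,
    "5: Uncertain": 0,
}

# The five levels partition the release, so their sum is the entry total.
total_entries = sum(pe_counts.values())


def pe_percentages(counts):
    """Return each level's share of all entries, in percent (two decimals)."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 2) for level, n in counts.items()}


for level, pct in pe_percentages(pe_counts).items():
    print(f"{level:<34}{pe_counts[level]:>10}{pct:>9.2f}%")
```

The sum of the five counts equals the 56010222 entries reported in the introduction, which is a quick consistency check on the table.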

The growth of the database is summarized below.



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 479260

   The twenty most represented species account for 2405826 sequences, or 4.3%
   of the total number of entries.
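
As a worked check of this figure, the sketch below sums the frequencies of the first twenty rows of the table in section 2.2 and divides by the release total from the introduction. The variable names are illustrative only.

```python
# Frequencies of ranks 1-20 from "2.2 Table of the most represented species".
top20 = [
    590031, 352018, 217903, 115939, 105994,
    96773, 92711, 81523, 73928, 73055,
    70495, 69506, 67669, 60710, 60414,
    56828, 56237, 55011, 54924, 54157,
]

TOTAL_ENTRIES = 56010222  # entries in release 2014_05 (section 1)

top20_total = sum(top20)
share = 100 * top20_total / TOTAL_ENTRIES  # share of the database, in percent

print(f"{top20_total} sequences, {share:.1f}% of all entries")
```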


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:19714
                            2x:78358
                            3x:42287
                            4x:29956
                            5x:17604
                            6x:12653
                            7x: 9356
                            8x: 7478
                            9x: 5852
                           10x:10918
                       11- 20x:35389
                       21- 50x:11477
                       51-100x: 4564
                         >100x:16223


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     590031  Human immunodeficiency virus 1
       2     352018  marine sediment metagenome
       3     217903  uncultured bacterium
       4     115939  Homo sapiens (Human)
       5     105994  Triticum aestivum (Wheat)
       6      96773  Oryza sativa subsp. japonica (Rice)
       7      92711  Hepatitis C virus
       8      81523  Hepatitis B virus (HBV)
       9      73928  Glycine max (Soybean) (Glycine hispida)
      10      73055  mine drainage metagenome
      11      70495  Hordeum vulgare var. distichum (Two-rowed barley)
      12      69506  Macaca mulatta (Rhesus macaque)
      13      67669  Phytophthora parasitica (Potato buckeye rot agent)
      14      60710  human gut metagenome
      15      60414  Zea mays (Maize)
      16      56828  Mus musculus (Mouse)
      17      56237  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      18      55011  Callithrix jacchus (White-tufted-ear marmoset)
      19      54924  Solanum tuberosum (Potato)
      20      54157  Vitis vinifera (Grape)
      21      53267  Danio rerio (Zebrafish) (Brachydanio rerio)
      22      50605  Trichomonas vaginalis
      23      49267  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      24      48911  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      25      47057  Populus trichocarpa (Western balsam poplar) 
      26      41207  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      27      40465  Arabidopsis thaliana (Mouse-ear cress)
      28      39882  Oryza sativa subsp. indica (Rice)
      29      39850  Paramecium tetraurelia
      30      39364  Setaria italica (Foxtail millet) (Panicum italicum)
      31      38796  Mustela putorius furo (European domestic ferret) (Mustela furo)
      32      38067  Simian immunodeficiency virus (SIV)
      33      37309  Acyrthosiphon pisum (Pea aphid)
      34      36712  Drosophila melanogaster (Fruit fly)
      35      36598  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      36      35950  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      37      35672  Ailuropoda melanoleuca (Giant panda)
      38      35599  Emiliania huxleyi CCMP1516
      39      35307  Physcomitrella patens subsp. patens (Moss)
      40      35137  Caenorhabditis japonica
      41      34570  Thalassiosira oceanica (Marine diatom)
      42      34549  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      43      33864  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      44      33684  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      45      33258  Selaginella moellendorffii (Spikemoss)
      46      33016  Escherichia coli
      47      32772  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      48      32426  Sus scrofa (Pig)
      49      32342  Oryza brachyantha
      50      32302  Phaseolus vulgaris (Kidney bean) (French bean)
      51      32142  Oryza glaberrima (African rice)
      52      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      53      32050  Capitella teleta (Polychaete worm)
      54      31956  Anas platyrhynchos (Domestic duck) (Anas boschas)
      55      31861  Pan troglodytes (Chimpanzee)
      56      31402  Ricinus communis (Castor bean)
      57      31290  Citrus clementina
      58      30955  Daphnia pulex (Water flea)
      59      30713  Caenorhabditis brenneri (Nematode worm)
      60      30181  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      61      29845  Rhizophagus irregularis (strain DAOM 181602 / DAOM 197198 / MUCL 43194)  
      62      29815  Amphimedon queenslandica (Sponge)
      63      29471  Strongylocentrotus purpuratus (Purple sea urchin)
      64      29321  Pristionchus pacificus (Parasitic nematode)
      65      29193  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      66      29083  Oikopleura dioica (Tunicate)
      67      28831  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      68      28825  Capsella rubella
      69      28636  Prunus persica (Peach) (Amygdalus persica)
      70      28382  Thellungiella salsuginea (Saltwater cress) (Arabidopsis glauca)
      71      28104  Gasterosteus aculeatus (Three-spined stickleback)
      72      27804  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      73      27648  Canis familiaris (Dog) (Canis lupus familiaris)
      74      27532  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      75      27502  Equus caballus (Horse)
      76      27434  Amborella trichopoda
      77      27090  Gorilla gorilla gorilla (Lowland gorilla)
      78      26921  Tetrahymena thermophila (strain SB210)
      79      26849  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      80      26763  Morus notabilis
      81      26489  Phytophthora parasitica CJ01A1
      82      26477  Phytophthora parasitica P1569
      83      26452  Phytophthora parasitica P10297
      84      26438  Phytophthora parasitica (strain INRA-310)
      85      26349  Ovis aries (Sheep)
      86      25984  Oryzias latipes (Medaka fish) (Japanese ricefish)
      87      25825  Loxodonta africana (African elephant)
      88      25795  Bos taurus (Bovine)
      89      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      90      25710  Rattus norvegicus (Rat)
      91      25025  Aphanomyces astaci
      92      24915  Nematostella vectensis (Starlet sea anemone)
      93      24590  Guillardia theta CCMP2712
      94      24211  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      95      23804  Astyanax mexicanus (Blind cave fish) (Astyanax fasciatus mexicanus)
      96      23742  Ornithorhynchus anatinus (Duckbill platypus)
      97      23687  Lottia gigantea (Giant owl limpet)
      98      23650  Dendroctonus ponderosae (Mountain pine beetle)
      99      23565  Oxytricha trifallax
     100      23496  Latimeria chalumnae (West Indian ocean coelacanth)
     101      23369  Helobdella robusta (Californian leech)
     102      23318  Fusarium oxysporum f. sp. melonis 26406
     103      23283  Caenorhabditis elegans
     104      23271  Fusarium oxysporum f. sp. conglutinans race 2 54008
     105      23263  Fusarium oxysporum f. sp. pisi HDV247
     106      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
     107      22780  Monodelphis domestica (Gray short-tailed opossum)
     108      22754  Fusarium oxysporum f. sp. raphani 54005
     109      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
     110      22525  Lepisosteus oculatus (Spotted gar)
     111      22319  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
     112      22248  Fusarium oxysporum f. sp. vasinfectum 25433
     113      22174  gut metagenome
     114      21931  Oryctolagus cuniculus (Rabbit)
     115      21706  Haemonchus contortus (Barber pole worm)
     116      21689  Fusarium oxysporum f. sp. radicis-lycopersici 26381
     117      21661  Fusarium oxysporum Fo47
     118      21549  Fusarium oxysporum f. sp. lycopersici MN25
     119      21546  Heterocephalus glaber (Naked mole rat)
     120      21520  Gallus gallus (Chicken)
     121      21398  Caenorhabditis briggsae
     122      21339  Anopheles darlingi (Mosquito)
     123      21206  Echinococcus granulosus (Hydatid tapeworm)
     124      21136  Ixodes scapularis (Black-legged tick) (Deer tick)
     125      21026  Felis catus (Cat) (Felis silvestris catus)
     126      20897  Myotis lucifugus (Little brown bat)
     127      20864  Tupaia chinensis (Chinese tree shrew)
     128      20805  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     129      20767  Fusarium oxysporum FOSC 3-a
     130      20534  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     131      20149  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     132      20115  Ciona savignyi (Pacific transparent sea squirt)
     133      20097  Cavia porcellus (Guinea pig)
     134      20061  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     135      20028  Camelus ferus (Wild Bactrian camel)
     136      19976  Callorhynchus milii (Elephant fish) (Australian ghost shark)
     137      19826  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     138      19807  Fusarium oxysporum f. sp. cubense tropical race 4 54006
     139      19688  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     140      19601  Anolis carolinensis (Green anole) (American chameleon)
     141      19561  Pteropus alecto (Black flying fox)
     142      19522  Wuchereria bancrofti
     143      19300  Myotis brandtii (Brandt's bat)
     144      19201  Trypanosoma cruzi (strain CL Brener)
     145      19190  Necator americanus (Human hookworm)
     146      19062  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     147      18966  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     148      18861  Drosophila simulans (Fruit fly)
     149      18602  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     150      18559  Bos mutus
     151      18479  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     152      18417  Ophiophagus hannah (King cobra) (Naja hannah)
     153      18248  Tetranychus urticae (Two-spotted spider mite)
     154      18126  Atta cephalotes (Leafcutter ant)
     155      18048  Anopheles gambiae (African malaria mosquito)
     156      18047  Saprolegnia diclina VS20
     157      17976  Moniliophthora roreri (strain MCA 2997) (Cocoa frosty pod rot fungus) 
     158      17850  Hepatitis C virus subtype 1b
     159      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     160      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     161      17740  Bombyx mori (Silk moth)
     162      17683  Genlisea aurea
     163      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     164      17590  Gibberella moniliformis (strain M3125 / FGSC 7600)  
     165      17456  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     166      17383  Ceratitis capitata (Mediterranean fruit fly) (Tephritis capitata)
     167      17289  Nasonia vitripennis (Parasitic wasp)
     168      17269  Plasmodium falciparum
     169      17104  Drosophila yakuba (Fruit fly)
     170      17071  Tribolium castaneum (Red flour beetle)
     171      16949  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     172      16919  Meleagris gallopavo (Common turkey)
     173      16903  uncultured archaeon
     174      16715  Drosophila persimilis (Fruit fly)
     175      16698  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     176      16639  Fusarium oxysporum f. sp. lycopersici  
     177      16619  Rhodnius prolixus (Triatomid bug)
     178      16430  Ectocarpus siliculosus (Brown alga)
     179      16414  Klebsiella pneumoniae
     180      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     181      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     182      16330  Danaus plexippus (Monarch butterfly)
     183      16276  Trichinella spiralis (Trichina worm)
     184      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     185      16215  Neovison vison (American mink) (Mustela vison)
     186      16205  Ixodes ricinus (Common tick)
     187      16191  Drosophila sechellia (Fruit fly)
     188      16191  Schistosoma japonicum (Blood fluke)
     189      16148  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     190      16113  Listeria monocytogenes
     191      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     192      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     193      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     194      15718  Naegleria gruberi (Amoeba)
     195      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     196      15592  Phytophthora ramorum (Sudden oak death agent)
     197      15467  Myotis davidii (David's myotis)
     198      15423  Drosophila willistoni (Fruit fly)
     199      15412  Pestalotiopsis fici W106-1
     200      15411  Rabies virus
     201      15380  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     202      15355  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     203      15354  Loa loa (Eye worm) (Filaria loa)
     204      15228  Pythium ultimum
     205      15155  Drosophila ananassae (Fruit fly)
     206      15057  Pararge aegeria (Speckled wood butterfly)
     207      15042  Harpegnathos saltator (Jerdon's jumping ant)
     208      15012  Strigamia maritima (European centipede) (Geophilus maritimus)
     209      14944  Acanthamoeba castellanii str. Neff
     210      14928  Drosophila erecta (Fruit fly)
     211      14869  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     212      14801  Camponotus floridanus (Florida carpenter ant)
     213      14794  Drosophila mojavensis (Fruit fly)
     214      14790  Gibberella fujikuroi (strain CBS 195.34 / IMI 58289 / NRRL A-6831)  
     215      14713  Plasmodium chabaudi
     216      14708  Drosophila virilis (Fruit fly)
     217      14654  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     218      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     219      14597  Angomonas deanei
     220      14417  Volvox carteri (Green alga)
     221      14356  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     222      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     223      14235  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     224      14157  Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
     225      13971  Acromyrmex echinatior (Panamanian leafcutter ant) 
     226      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     227      13879  Clonorchis sinensis (Chinese liver fluke)
     228      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     229      13820  Porcine reproductive and respiratory syndrome virus (PRRSV)
     230      13806  Fomitopsis pinicola (strain FP-58527) (Brown rot fungus)
     231      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     232      13768  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     233      13704  Trypanosoma cruzi
     234      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     235      13425  Hepatitis C virus subtype 1a
     236      13417  Cladophialophora psammophila CBS 110553
     237      13400  Giardia intestinalis (Giardia lamblia)
     238      13345  Aspergillus flavus 
     239      13338  Colletotrichum orbiculare   
     240      13306  Pyronema omphalodes (strain CBS 100304) (Pyronema confluens)
     241      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     242      13189  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     243      13159  Heterobasidion irregulare TC 32-1
     244      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     245      13115  Petromyzon marinus (Sea lamprey)
     246      13082  Glarea lozoyensis (strain ATCC 20868 / MF5171)
     247      13062  Mycosphaerella fijiensis (strain CIRAD86) (Black leaf streak disease fungus) 
     248      13040  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     249      12983  Albugo laibachii Nc14
     250      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 


   
   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          793289 (  1%)
    Bacteria       42443528 ( 76%)
    Eukaryota      10248904 ( 18%)
    Viruses         1984679 (  4%)
    Other            539821 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 115992 (  1%)           ( <1%)
     Other Mammalia       1072913 ( 10%)           (  2%)
     Other Vertebrata     1021787 ( 10%)           (  2%)
     Viridiplantae        2000760 ( 20%)           (  4%)
     Fungi                2651485 ( 26%)           (  5%)
     Insecta               997204 ( 10%)           (  2%)
     Nematoda              304525 (  3%)           (  1%)
     Other                2084238 ( 20%)           (  4%)
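
The two percentage columns differ only in their denominator: the Eukaryota subtotal for the first, the whole database for the second. The sketch below recomputes both, assuming whole-percent rounding (so shares below 0.5% come out as 0); the helper name `shares` is hypothetical.

```python
TOTAL_ENTRIES = 56010222  # entries in release 2014_05 (section 1)

# Sequence counts per category, copied from the "Within Eukaryota" table.
eukaryota = {
    "Human": 115992,
    "Other Mammalia": 1072913,
    "Other Vertebrata": 1021787,
    "Viridiplantae": 2000760,
    "Fungi": 2651485,
    "Insecta": 997204,
    "Nematoda": 304525,
    "Other": 2084238,
}

# The categories add up to the Eukaryota row of the kingdom table.
euk_total = sum(eukaryota.values())


def shares(counts, denominator):
    """Return each category's share of `denominator`, as a whole percent."""
    return {name: round(100 * n / denominator) for name, n in counts.items()}


of_eukaryota = shares(eukaryota, euk_total)
of_database = shares(eukaryota, TOTAL_ENTRIES)

for name in eukaryota:
    print(f"{name:<18}{eukaryota[name]:>9}  {of_eukaryota[name]:>3}%  {of_database[name]:>3}%")
```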



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1466401             1001-1100   300205
                 51- 100 5018781             1101-1200   209209
                101- 150 5616140             1201-1300   151959
                151- 200 5443039             1301-1400    89996
                201- 250 5515460             1401-1500    75279
                251- 300 5359854             1501-1600    50739
                301- 350 4839408             1601-1700    37053
                351- 400 3600347             1701-1800    27884
                401- 450 3142571             1801-1900    22488
                451- 500 2564912             1901-2000    18908
                501- 550 1630555             2001-2100    15772
                551- 600 1257215             2101-2200    15439
                601- 650  921455             2201-2300    11684
                651- 700  725730             2301-2400     9663
                701- 750  599411             2401-2500     8554
                751- 800  513421             >2500        65924
                801- 850  402626
                851- 900  359059
                901- 950  245621
                951-1000  170701



   The average sequence length in UniProtKB/TrEMBL is   317 amino acids.

   The shortest sequence is C4PYW0_SCHMA:     2 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
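
The average length can be approximated from the totals given in the introduction. This is a sketch under the assumption that the average is taken over all entries (total amino acids divided by total entries); the report does not state how the figure was computed or rounded.

```python
TOTAL_AMINO_ACIDS = 17785675050  # from section 1
TOTAL_ENTRIES = 56010222         # from section 1

# Plain arithmetic mean over all entries.
mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

# Integer division truncates the fractional part.
truncated_length = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES

print(f"mean length: {mean_length:.2f} amino acids")
print(f"truncated:   {truncated_length} amino acids")
```

Under this assumption the quotient falls between 317 and 318, so the reported value of 317 is consistent with truncation rather than rounding to the nearest integer.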



4.  STATISTICS FOR SOME LINE TYPES

The following tables give, for selected UniProtKB/TrEMBL line types, the total
number of lines, the number of entries with at least one such line, and the
average number of lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    66699462                1.19                                                    
   Submitted to EMBL/GenBank/DDBJ  40035063  37454763      0.71                                                    
   Journal                         24526868  23212851      0.44                                                    
   Submitted to other databases     2119507   2112218      0.04                                                    
   Thesis                             11060     11001     <0.01                                                    
   Book citation                       6963      6900     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 505852
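
The "Average per entry" column appears to be the total number of lines divided by the total number of entries in the release, not by the number of entries that carry such a line. The sketch below illustrates this reading; the `<0.01` cutoff is an assumption inferred from the tables, and `average_per_entry` is a hypothetical helper.

```python
TOTAL_ENTRIES = 56010222  # entries in release 2014_05 (section 1)


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format lines-per-entry as in the tables (assumed rounding rule).

    Values that round below 0.01 are shown as "<0.01"; this cutoff is
    inferred from the published figures, not documented.
    """
    rounded = round(total_lines / total_entries, 2)
    if total_lines > 0 and rounded < 0.01:
        return "<0.01"
    return f"{rounded:.2f}"


# Totals copied from the References (RL) table.
reference_totals = {
    "References (RL)": 66699462,
    "Submitted to EMBL/GenBank/DDBJ": 40035063,
    "Journal": 24526868,
    "Submitted to other databases": 2119507,
    "Thesis": 11060,
}

for name, total in reference_totals.items():
    print(f"{name:<32}{total:>10}  {average_per_entry(total):>6}")
```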


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      87664997                1.57                                                    
   CATALYTIC ACTIVITY               6615627   6065097      0.12     4                                              
   CAUTION                         34871017  34831818      0.62     1                                              
   COFACTOR                         2918079   2670376      0.05     8                                              
   DOMAIN                            307485    293936      0.01     9                                              
   ENZYME REGULATION                 103756    103756     <0.01    11                                              
   FUNCTION                         7596657   7225710      0.14     3                                              
   INTERACTION                         1735      1735     <0.01    12                                              
   MISCELLANEOUS                     175654    175439     <0.01    10                                              
   PATHWAY                          3364559   3048730      0.06     7                                              
   SIMILARITY                      21235424  16347569      0.38     2                                              
   SUBCELLULAR LOCATION             6399731   6184862      0.11     5                                              
   SUBUNIT                          4075273   4036321      0.07     6                                              

Total number of comment topics: 12


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      36070790                0.64                                                    
   ACT_SITE                         2798230   1745070      0.05     5                                              
   BINDING                          5967420   1562799      0.11     2                                              
   CARBOHYD                             684       265     <0.01    27                                              
   CHAIN                             886099    710599      0.02     9                                              
   COILED                            100138     57209     <0.01    18                                              
   COMPBIAS                           15600     15461     <0.01    22                                              
   CROSSLNK                           14836     10037     <0.01    23                                              
   DISULFID                          142777    110165     <0.01    15                                              
   DNA_BIND                          103932     96987     <0.01    16                                              
   DOMAIN                           1139640    881958      0.02     8                                              
   INIT_MET                           18354     18354     <0.01    21                                              
   INTRAMEM                             392        56     <0.01    28                                              
   LIPID                             101916     50958     <0.01    17                                              
   METAL                            5735602   1478610      0.10     3                                              
   MOD_RES                           456347    412838      0.01    13                                              
   MOTIF                             352464    227111      0.01    14                                              
   NON_STD                             1928      1803     <0.01    25                                              
   NON_TER                          8319047   5509839      0.15     1                                              
   NP_BIND                          2094450   1251836      0.04     6                                              
   PEPTIDE                              111       111     <0.01    29                                              
   PROPEP                              6642      6642     <0.01    24                                              
   REGION                           1962351   1062690      0.04     7                                              
   REPEAT                             80262     18475     <0.01    20                                              
   SIGNAL                            753044    749380      0.01    11                                              
   SITE                              832860    418627      0.01    10                                              
   TOPO_DOM                          462766     91055      0.01    12                                              
   TRANSIT                             1865      1855     <0.01    26                                              
   TRANSMEM                         3622816    634034      0.06     4                                              
   ZN_FING                            98217     88805     <0.01    19                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             625942710               11.18                                                    
   Allergome                           3726      3089     <0.01    83   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   102   Organism-specific databases                
   ArrayExpress                       63875     63875     <0.01    55   Gene expression databases                  
   BRENDA                              2615      2587     <0.01    86   Enzyme and pathway databases               
   Bgee                               97260     97260     <0.01    49   Gene expression databases                  
   BindingDB                           5750      5750     <0.01    78   Chemistry                                  
   BioCyc                           5683694   5605162      0.10    20   Enzyme and pathway databases               
   CAZy                               73938     69473     <0.01    53   Protein family/group databases             
   CGD                                 6802      6802     <0.01    76   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   108   2D gel databases                           
   CTD                               423160    421810      0.01    37   Organism-specific databases                
   ChEMBL                               658       658     <0.01    94   Chemistry                                  
   ChiTaRS                            64735     64735     <0.01    54   Other                                      
   ConoServer                           159       159     <0.01   100   Organism-specific databases                
   DIP                                 3016      3011     <0.01    85   Protein-protein interaction databases      
   DNASU                              42030     41704     <0.01    62   Protocols and materials databases          
   EMBL                            59809087  54806542      1.07     3   Sequence databases                         
   Ensembl                          1110468   1095655      0.02    30   Genome annotation databases                
   EnsemblBacteria                 29537351  29111551      0.53     6   Genome annotation databases                
   EnsemblFungi                      401487    399114      0.01    38   Genome annotation databases                
   EnsemblMetazoa                    862252    845647      0.02    33   Genome annotation databases                
   EnsemblPlants                     777545    739788      0.01    34   Genome annotation databases                
   EnsemblProtists                   191177    188614     <0.01    44   Genome annotation databases                
   EuPathDB                          159765    159764     <0.01    48   Organism-specific databases                
   EvolutionaryTrace                   7951      7951     <0.01    75   Other                                      
   FlyBase                           198949    197478     <0.01    42   Organism-specific databases                
   GO                             103858610  35110616      1.85     2   Ontologies                                 
   Gene3D                          30606342  24071550      0.55     5   Family and domain databases                
   GeneID                          11172329  10870936      0.20    13   Genome annotation databases                
   GeneTree                         1005908   1005849      0.02    31   Phylogenomic databases                     
   Genevestigator                     83151     83145     <0.01    50   Gene expression databases                  
   GenoList                           14730     14457     <0.01    71   Organism-specific databases                
   GenomeRNAi                         24948     24948     <0.01    66   Other                                      
   Gramene                           197805    197805     <0.01    43   Organism-specific databases                
   GuidetoPHARMACOLOGY                   21        21     <0.01   106   Chemistry                                  
   H-InvDB                              603       456     <0.01    95   Organism-specific databases                
   HAMAP                            7059274   6966050      0.13    19   Family and domain databases                
   HGNC                               46932     46852     <0.01    60   Organism-specific databases                
   HOGENOM                          3645748   3645705      0.07    24   Phylogenomic databases                     
   HOVERGEN                          303800    303792      0.01    39   Phylogenomic databases                     
   InParanoid                        185530    185530     <0.01    45   Phylogenomic databases                     
   IntAct                             14116     14116     <0.01    72   Protein-protein interaction databases      
   InterPro                       128190845  44923084      2.29     1   Family and domain databases                
   KEGG                            10026282   9790420      0.18    14   Genome annotation databases                
   KO                               4178290   4157145      0.07    23   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    80   Organism-specific databases                
   Leproma                             1272      1270     <0.01    89   Organism-specific databases                
   MEROPS                            175360    175360     <0.01    46   Protein family/group databases             
   MGI                                52123     51684     <0.01    57   Organism-specific databases                
   MIM                                    4         4     <0.01   109   Organism-specific databases                
   MINT                               10165     10164     <0.01    73   Protein-protein interaction databases      
   MaxQB                               1750      1750     <0.01    88   Proteomic databases                        
   NextBio                           205504    205480     <0.01    40   Other                                      
   OGP                                    3         3     <0.01   110   2D gel databases                           
   OMA                              7296653   7296650      0.13    17   Phylogenomic databases                     
   OrthoDB                          5181256   5181254      0.09    22   Phylogenomic databases                     
   PANTHER                          7122130   6938880      0.13    18   Family and domain databases                
   PATRIC                           8253224   8253094      0.15    15   Genome annotation databases                
   PDB                                23336     12524     <0.01    67   3D structure databases                     
   PDBsum                             23261     12479     <0.01    68   3D structure databases                     
   PIR                               171740    138887     <0.01    47   Sequence databases                         
   PIRSF                            5682066   5637748      0.10    21   Family and domain databases                
   PMAP-CutDB                           200       200     <0.01    99   Other                                      
   PRIDE                             926568    926568      0.02    32   Proteomic databases                        
   PRINTS                           8251799   7465069      0.15    16   Family and domain databases                
   PRO                                27008     27007     <0.01    64   Other                                      
   PROSITE                         28546450  19078875      0.51     8   Family and domain databases                
   PaxDb                              28538     28536     <0.01    63   Proteomic databases                        
   PeptideAtlas                         127       127     <0.01   101   Proteomic databases                        
   PeroxiBase                          2590      2582     <0.01    87   Protein family/group databases             
   Pfam                            57535735  42036169      1.03     4   Family and domain databases                
   PharmGKB                            3361      3361     <0.01    84   Organism-specific databases                
   PhosSite                             890       878     <0.01    92   PTM databases                              
   PhosphoSite                         1093      1093     <0.01    91   PTM databases                              
   PhylomeDB                         202969    202969     <0.01    41   Phylogenomic databases                     
   PomBase                                1         1     <0.01   111   Organism-specific databases                
   PptaseDB                              38        36     <0.01   104   Protein family/group databases             
   ProDom                           1142753   1106957      0.02    29   Family and domain databases                
   ProMEX                              5298      5298     <0.01    79   Proteomic databases                        
   ProteinModelPortal              14389285  14389285      0.26    10   3D structure databases                     
   PseudoCAP                           4507      4501     <0.01    81   Organism-specific databases                
   REBASE                             47267     47238     <0.01    58   Protein family/group databases             
   REPRODUCTION-2DPAGE                   65        64     <0.01   103   2D gel databases                           
   RGD                                21275     20259     <0.01    70   Organism-specific databases                
   Reactome                             244       202     <0.01    98   Enzyme and pathway databases               
   RefSeq                          11476167  11051096      0.20    12   Sequence databases                         
   SABIO-RK                             518       518     <0.01    96   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   107   Organism-specific databases                
   SMART                           12437127   9469941      0.22    11   Family and domain databases                
   SMR                              2628046   2628046      0.05    28   3D structure databases                     
   STRING                           3131635   3131527      0.06    26   Protein-protein interaction databases      
   SUPFAM                          28650377  23056195      0.51     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   105   2D gel databases                           
   SignaLink                           4313      4311     <0.01    82   Enzyme and pathway databases               
   TAIR                               22001     21882     <0.01    69   Organism-specific databases                
   TCDB                                5828      5818     <0.01    77   Protein family/group databases             
   TIGRFAMs                        14586796  13304252      0.26     9   Family and domain databases                
   TreeFam                           587959    587957      0.01    35   Phylogenomic databases                     
   TubercuList                         1101      1100     <0.01    90   Organism-specific databases                
   UCSC                               58215     58043     <0.01    56   Genome annotation databases                
   UniGene                           555419    522608      0.01    36   Sequence databases                         
   UniPathway                       3277781   3045883      0.06    25   Enzyme and pathway databases               
   VectorBase                         78248     77731     <0.01    51   Genome annotation databases                
   World-2DPAGE                         671       666     <0.01    93   2D gel databases                           
   WormBase                           43128     42955     <0.01    61   Organism-specific databases                
   Xenbase                            25526     25465     <0.01    65   Organism-specific databases                
   ZFIN                               47116     46817     <0.01    59   Organism-specific databases                
   dictyBase                           7997      7775     <0.01    74   Organism-specific databases                
   eggNOG                           2755141   2755107      0.05    27   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    52   Organism-specific databases                
   mycoCLAP                             464       463     <0.01    97   Protein family/group databases             

Number of explicitly cross-referenced databases: 130


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.73   Gln (Q) 4.00   Leu (L) 10.0   Ser (S) 6.52
   Arg (R) 5.38   Glu (E) 6.20   Lys (K) 5.29   Thr (T) 5.52
   Asn (N) 4.11   Gly (G) 7.10   Met (M) 2.50   Trp (W) 1.29
   Asp (D) 5.34   His (H) 2.18   Phe (F) 4.03   Tyr (Y) 3.06
   Cys (C) 1.19   Ile (I) 6.09   Pro (P) 4.55   Val (V) 6.81

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.02


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
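
   The ordering in 5.2 follows from sorting the percentages in table 5.1 in
   descending order. A minimal Python sketch of that step is given here; the
   names `composition` and `rank_by_frequency` are illustrative only and are
   not part of any UniProt tool. The values are copied from table 5.1, and
   the ambiguity codes (Asx, Glx, Xaa) are left out because 5.2 does not
   list them.

```python
# Percentages copied from table 5.1 (three-letter code -> percent of all
# residues in the release). Ambiguity codes B, Z and X are omitted.
composition = {
    "Ala": 8.73, "Gln": 4.00, "Leu": 10.0, "Ser": 6.52,
    "Arg": 5.38, "Glu": 6.20, "Lys": 5.29, "Thr": 5.52,
    "Asn": 4.11, "Gly": 7.10, "Met": 2.50, "Trp": 1.29,
    "Asp": 5.34, "His": 2.18, "Phe": 4.03, "Tyr": 3.06,
    "Cys": 1.19, "Ile": 6.09, "Pro": 4.55, "Val": 6.81,
}


def rank_by_frequency(percentages):
    """Return residue names ordered from most to least frequent."""
    return sorted(percentages, key=percentages.get, reverse=True)


ranking = rank_by_frequency(composition)
print(", ".join(ranking))
```

   With the values above, no two residues share the same percentage, so the
   sort has no ties to break and the result does not depend on input order.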



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 724344
Total number of entries encoded on a Plasmid: 419851
Total number of entries encoded on a Plastid: 32173
Total number of entries encoded on a Plastid; Apicoplast: 902
Total number of entries encoded on a Plastid; Chloroplast: 270042
Total number of entries encoded on a Plastid; Cyanelle: 9
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1641