         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2013_12 STATISTICS


1.  INTRODUCTION

Release 2013_12 of 11-Dec-2013 of UniProtKB/TrEMBL contains 48701576 sequence entries,
comprising 15448487119 amino acids.

532633 sequences have been added since release 2013_11, the sequence data of
13181 existing entries has been updated, and the annotations of
12538446 entries have been revised. The added sequences represent an
increase of about 1% in the number of entries.
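
As a rough check of the 1% figure, the growth can be recomputed from the
counts given above. The sketch below is in Python and assumes that the
increase is measured against the entry count before the additions and that
no entries were deleted between releases; the source states neither, and the
variable names are illustrative only.

```python
# Counts copied from the introduction of this release.
entries_now = 48701576      # sequence entries in release 2013_12
entries_added = 532633      # sequences added since release 2013_11

# Assumption: the previous release held the current total minus the additions.
entries_before = entries_now - entries_added

# Relative growth in percent, measured against the earlier total.
growth_percent = 100 * entries_added / entries_before
print(f"Growth since 2013_11: {growth_percent:.1f}%")
```

Under these assumptions the result rounds to the 1% quoted in the text.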

Number of fragments: 4761286

Protein existence (PE):              entries      %
1: Evidence at protein level           21691     0.04%
2: Evidence at transcript level       871408     1.79%
3: Inferred from homology           11516683    23.65%
4: Predicted                        36291794    74.52%
5: Uncertain                               0     0.00%
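
The percentages in the table above are each count divided by the total number
of entries. A minimal Python sketch of that arithmetic follows; the counts are
copied from the table, while the dictionary and function names are
illustrative and not part of any UniProt tooling.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 21691,
    "2: Evidence at transcript level": 871408,
    "3: Inferred from homology": 11516683,
    "4: Predicted": 36291794,
    "5: Uncertain": 0,
}

def pe_percentages(counts):
    """Return each count as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}

pe_total = sum(pe_counts.values())
pe_share = pe_percentages(pe_counts)

for label, n in pe_counts.items():
    print(f"{label:<34}{n:>10}{pe_share[label]:>9.2f}%")
```

The five counts add up to the 48701576 entries reported in the introduction,
so every entry carries exactly one PE level.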

The growth of the database is summarized below.

2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 455480

   The first twenty species represent 1970061 sequences: 4% of the
   total number of entries.
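
The share quoted above can be reproduced by summing the first twenty
frequencies of table 2.2 and dividing by the total number of entries. The
following Python sketch does this; the frequencies are copied from the table,
and the variable names are illustrative.

```python
# Frequencies of the twenty most represented species (table 2.2, ranks 1-20).
top20_frequencies = [
    563866, 206410, 114605, 96850, 89807,
    73890, 73054, 70511, 69186, 66972,
    60585, 56702, 56232, 55024, 54979,
    54899, 54144, 52477, 50603, 49265,
]

total_entries = 48701576  # entries in release 2013_12

top20_sequences = sum(top20_frequencies)
top20_share = 100 * top20_sequences / total_entries

print(f"{top20_sequences} sequences, {top20_share:.0f}% of all entries")
```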


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:18744
                            2x:75208
                            3x:40342
                            4x:28409
                            5x:17056
                            6x:11879
                            7x: 9031
                            8x: 7085
                            9x: 5570
                           10x:10685
                       11- 20x:33120
                       21- 50x:10759
                       51-100x: 4270
                         >100x:14620


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     563866  Human immunodeficiency virus 1
       2     206410  uncultured bacterium
       3     114605  Homo sapiens (Human)
       4      96850  Oryza sativa subsp. japonica (Rice)
       5      89807  Hepatitis C virus
       6      73890  Glycine max (Soybean) (Glycine hispida)
       7      73054  mine drainage metagenome
       8      70511  Hordeum vulgare var. distichum (Two-rowed barley)
       9      69186  Macaca mulatta (Rhesus macaque)
      10      66972  Hepatitis B virus (HBV)
      11      60585  Zea mays (Maize)
      12      56702  Mus musculus (Mouse)
      13      56232  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      14      55024  Populus trichocarpa (Western balsam poplar) 
      15      54979  Callithrix jacchus (White-tufted-ear marmoset)
      16      54899  Solanum tuberosum (Potato)
      17      54144  Vitis vinifera (Grape)
      18      52477  Danio rerio (Zebrafish) (Brachydanio rerio)
      19      50603  Trichomonas vaginalis
      20      49265  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      21      48906  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      22      41202  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      23      40802  Arabidopsis thaliana (Mouse-ear cress)
      24      39893  Oryza sativa subsp. indica (Rice)
      25      39850  Paramecium tetraurelia
      26      39300  Setaria italica (Foxtail millet) (Panicum italicum)
      27      38798  Mustela putorius furo (European domestic ferret) (Mustela furo)
      28      38163  human gut metagenome
      29      36778  Drosophila melanogaster (Fruit fly)
      30      36598  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      31      36444  Simian immunodeficiency virus (SIV)
      32      35923  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      33      35652  Ailuropoda melanoleuca (Giant panda)
      34      35599  Emiliania huxleyi CCMP1516
      35      35208  Acyrthosiphon pisum (Pea aphid)
      36      35066  Caenorhabditis japonica
      37      34832  Physcomitrella patens subsp. patens (Moss)
      38      34570  Thalassiosira oceanica (Marine diatom)
      39      34492  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      40      33850  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      41      33663  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      42      33256  Selaginella moellendorffii (Spikemoss)
      43      32767  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      44      32342  Oryza brachyantha
      45      32327  Sus scrofa (Pig)
      46      32141  Oryza glaberrima (African rice)
      47      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      48      31849  Pan troglodytes (Chimpanzee)
      49      31802  Anas platyrhynchos (Domestic duck) (Anas boschas)
      50      31389  Ricinus communis (Castor bean)
      51      31207  Capitella teleta (Polychaete worm)
      52      30954  Daphnia pulex (Water flea)
      53      30712  Caenorhabditis brenneri (Nematode worm)
      54      30147  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      55      29815  Amphimedon queenslandica (Sponge)
      56      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      57      29318  Pristionchus pacificus (Parasitic nematode)
      58      29234  Escherichia coli
      59      29183  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      60      29054  Oikopleura dioica (Tunicate)
      61      28829  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      62      28825  Capsella rubella
      63      28628  Prunus persica (Peach) (Amygdalus persica)
      64      28519  Canis familiaris (Dog) (Canis lupus familiaris)
      65      28099  Gasterosteus aculeatus (Three-spined stickleback)
      66      27781  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      67      27517  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      68      27473  Equus caballus (Horse)
      69      27089  Gorilla gorilla gorilla (Lowland gorilla)
      70      26834  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      71      25975  Oryzias latipes (Medaka fish) (Japanese ricefish)
      72      25797  Loxodonta africana (African elephant)
      73      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      74      25681  Rattus norvegicus (Rat)
      75      25680  Bos taurus (Bovine)
      76      24914  Nematostella vectensis (Starlet sea anemone)
      77      24643  Tetrahymena thermophila (strain SB210)
      78      24590  Guillardia theta CCMP2712
      79      24210  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      80      23717  Ornithorhynchus anatinus (Duckbill platypus)
      81      23649  Dendroctonus ponderosae (Mountain pine beetle)
      82      23565  Oxytricha trifallax
      83      23502  Latimeria chalumnae (West Indian ocean coelacanth)
      84      23361  Helobdella robusta (Californian leech)
      85      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      86      22998  Caenorhabditis elegans
      87      22751  Monodelphis domestica (Gray short-tailed opossum)
      88      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      89      22313  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      90      22163  gut metagenome
      91      21893  Oryctolagus cuniculus (Rabbit)
      92      21547  Heterocephalus glaber (Naked mole rat)
      93      21422  Gallus gallus (Chicken)
      94      21346  Caenorhabditis briggsae
      95      21128  Ixodes scapularis (Black-legged tick) (Deer tick)
      96      21001  Felis catus (Cat) (Felis silvestris catus)
      97      20867  Myotis lucifugus (Little brown bat)
      98      20850  Tupaia chinensis (Chinese tree shrew)
      99      20770  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
     100      20513  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
     101      20133  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
     102      20114  Ciona savignyi (Pacific transparent sea squirt)
     103      20081  Cavia porcellus (Guinea pig)
     104      20028  Camelus ferus (Wild Bactrian camel)
     105      19992  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     106      19818  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     107      19686  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     108      19553  Anolis carolinensis (Green anole) (American chameleon)
     109      19546  Pteropus alecto (Black flying fox)
     110      19520  Wuchereria bancrofti
     111      19300  Myotis brandtii (Brandt's bat)
     112      19201  Trypanosoma cruzi (strain CL Brener)
     113      19058  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     114      18957  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     115      18857  Drosophila simulans (Fruit fly)
     116      18599  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     117      18557  Bos grunniens mutus (wild yak)
     118      18477  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     119      18243  Tetranychus urticae (Two-spotted spider mite)
     120      18113  Atta cephalotes (Leafcutter ant)
     121      18047  Saprolegnia diclina VS20
     122      18026  Anopheles gambiae (African malaria mosquito)
     123      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     124      17784  Fusarium oxysporum (strain Fo5176) (Fusarium vascular wilt)
     125      17700  Bombyx mori (Silk moth)
     126      17683  Genlisea aurea
     127      17618  Hepatitis C virus subtype 1b
     128      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     129      17426  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     130      17284  Nasonia vitripennis (Parasitic wasp)
     131      17200  Plasmodium falciparum
     132      17056  Tribolium castaneum (Red flour beetle)
     133      17040  Drosophila yakuba (Fruit fly)
     134      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     135      16919  Meleagris gallopavo (Common turkey)
     136      16714  Drosophila persimilis (Fruit fly)
     137      16698  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     138      16639  Fusarium oxysporum f. sp. lycopersici  
     139      16609  Rhodnius prolixus (Triatomid bug)
     140      16426  Ectocarpus siliculosus (Brown alga)
     141      16388  Colletotrichum gloeosporioides (strain Cg-14) (Anthracnose fungus) 
     142      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     143      16329  Danaus plexippus (Monarch butterfly)
     144      16275  Trichinella spiralis (Trichina worm)
     145      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     146      16189  Drosophila sechellia (Fruit fly)
     147      16189  Schistosoma japonicum (Blood fluke)
     148      16148  Ficedula albicollis (Collared flycatcher) (Muscicapa albicollis)
     149      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     150      16076  Listeria monocytogenes
     151      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     152      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     153      15716  Naegleria gruberi (Amoeba)
     154      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     155      15568  Phytophthora ramorum (Sudden oak death agent)
     156      15465  Myotis davidii (David's myotis)
     157      15463  uncultured archaeon
     158      15422  Drosophila willistoni (Fruit fly)
     159      15371  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     160      15354  Loa loa (Eye worm) (Filaria loa)
     161      15345  Fusarium oxysporum f. sp. cubense (strain race 1) (Panama disease fungus)
     162      15310  Klebsiella pneumoniae
     163      15228  Pythium ultimum
     164      15144  Drosophila ananassae (Fruit fly)
     165      15057  Pararge aegeria (speckled wood butterfly)
     166      15042  Harpegnathos saltator (Jerdon's jumping ant)
     167      15011  Strigamia maritima (European centipede) (Geophilus maritimus)
     168      14942  Acanthamoeba castellanii str. Neff
     169      14927  Drosophila erecta (Fruit fly)
     170      14857  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     171      14801  Camponotus floridanus (Florida carpenter ant)
     172      14794  Drosophila mojavensis (Fruit fly)
     173      14790  Gibberella fujikuroi (strain CBS 195.34 / IMI 58289 / NRRL A-6831)  
     174      14713  Plasmodium chabaudi
     175      14707  Drosophila virilis (Fruit fly)
     176      14666  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     177      14647  Rabies virus
     178      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     179      14562  Angomonas deanei
     180      14417  Volvox carteri (Green alga)
     181      14346  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     182      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     183      14235  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     184      14147  Fusarium oxysporum f. sp. cubense (strain race 4) (Panama disease fungus)
     185      13970  Acromyrmex echinatior (Panamanian leafcutter ant) 
     186      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     187      13878  Clonorchis sinensis (Chinese liver fluke)
     188      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     189      13806  Fomitopsis pinicola (strain FP-58527) (Brown rot fungus)
     190      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     191      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     192      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     193      13626  Trypanosoma cruzi
     194      13421  Hepatitis C virus subtype 1a
     195      13345  Aspergillus flavus 
     196      13329  Colletotrichum orbiculare   
     197      13306  Pyronema omphalodes CBS 100304
     198      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     199      13137  Porcine reproductive and respiratory syndrome virus (PRRSV)
     200      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     201      13114  Petromyzon marinus (Sea lamprey)
     202      13082  Glarea lozoyensis (strain ATCC 20868 / MF5171)
     203      13062  Mycosphaerella fijiensis (strain CIRAD86) (Black leaf streak disease fungus) 
     204      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     205      12983  Albugo laibachii Nc14
     206      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     207      12950  Stigmatella aurantiaca (strain DW4/3-1)
     208      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     209      12856  Cochliobolus heterostrophus (strain C5 / ATCC 48332 / race O)  
     210      12846  Magnaporthe oryzae (strain Y34) (Rice blast fungus) (Pyricularia oryzae)
     211      12746  Schistosoma mansoni (Blood fluke)
     212      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     213      12711  Magnaporthe oryzae (strain P131) (Rice blast fungus) (Pyricularia oryzae)
     214      12703  Cochliobolus heterostrophus (strain C4 / ATCC 48331 / race T)  
     215      12697  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     216      12696  Trypanosoma congolense (strain IL3000)
     217      12652  Helicobacter pylori (Campylobacter pylori)
     218      12623  Xenopus laevis (African clawed frog)
     219      12586  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     220      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     221      12440  Polysphondylium pallidum (Cellular slime mold)
     222      12414  Mycosphaerella pini (strain NZE10 / CBS 128990) (Red band needle blight fungus) 
     223      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     224      12352  Dictyostelium purpureum (Slime mold)
     225      12300  Enterococcus gallinarum EGD-AAK12
     226      12197  Thanatephorus cucumeris (strain AG1-IB / isolate 7/3/14)  
     227      12174  Cochliobolus sativus (strain ND90Pr / ATCC 201652)  
     228      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     229      12143  Mucor circinelloides f. circinelloides (strain 1006PhL) (Mucormycosis agent) 
     230      12078  Ceriporiopsis subvermispora (strain B) (White-rot fungus)
     231      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     232      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     233      11987  Apis mellifera (Honeybee)
     234      11939  Emericella nidulans  
     235      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     236      11780  Piriformospora indica (strain DSM 11827)
     237      11752  Chondrocladia sp. SMF<DEU
     238      11751  Cladorhiza sp. SMF<DEU
     239      11750  Abyssocladia sp. SMF<DEU
     240      11735  Gloeophyllum trabeum (strain ATCC 11539 / FP-39264 / Madison 617) 
     241      11726  Phelloderma sp. SMF<DEU
     242      11719  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     243      11703  Salpingoeca rosetta (strain ATCC 50818 / BSB-021)
     244      11687  Setosphaeria turcica (strain 28A) (Northern leaf blight fungus) 
     245      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     246      11682  Eutypa lata (strain UCR-EL1) (Grapevine dieback disease fungus) 
     247      11679  Anopheles darlingi (Mosquito)
     248      11639  Plasmodium berghei (strain Anka)
     249      11603  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     250      11567  Trichoplax adhaerens (Trichoplax reptans)


   
   2.3  Taxonomic distribution of the sequences

   Kingdom        sequences (% of the database)
    Archaea          748111 (  2%)
    Bacteria       37158519 ( 76%)
    Eukaryota       8764101 ( 18%)
    Viruses         1866298 (  4%)
    Other            164546 ( <1%)



   Within Eukaryota:

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 114645 (  1%)           ( <1%)
     Other Mammalia       1027057 ( 12%)           (  2%)
     Other Vertebrata      913015 ( 10%)           (  2%)
     Viridiplantae        1744003 ( 20%)           (  4%)
     Fungi                2138439 ( 24%)           (  4%)
     Insecta               936820 ( 11%)           (  2%)
     Nematoda              263046 (  3%)           (  1%)
     Other                1627076 ( 19%)           (  3%)



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1310570             1001-1100   259852
                 51- 100 4377094             1101-1200   181234
                101- 150 4882800             1201-1300   130279
                151- 200 4743688             1301-1400    76823
                201- 250 4804765             1401-1500    64287
                251- 300 4657874             1501-1600    43542
                301- 350 4202483             1601-1700    31608
                351- 400 3133473             1701-1800    23845
                401- 450 2731245             1801-1900    18964
                451- 500 2228098             1901-2000    16175
                501- 550 1409752             2001-2100    13177
                551- 600 1088084             2101-2200    13308
                601- 650  799821             2201-2300    10014
                651- 700  628410             2301-2400     8235
                701- 750  519616             2401-2500     7079
                751- 800  447926             >2500        55437
                801- 850  350246
                851- 900  312750
                901- 950  210923
                951-1000  146813



   The average sequence length in UniProtKB/TrEMBL is 317 amino acids.

   The shortest sequence is C4PYW0_SCHMA: 2 amino acids.
   The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
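
The average length reported above is consistent with the totals given in the
introduction: amino acids divided by entries. A short Python check follows. It
assumes the average is taken over all entries, fragments included, which the
source does not state explicitly; the variable names are illustrative.

```python
# Totals copied from the introduction of this release.
total_amino_acids = 15448487119
total_entries = 48701576

# Mean sequence length over all entries (assumption: fragments are included).
average_length = total_amino_acids / total_entries

print(f"Average sequence length: {average_length:.0f} amino acids")
```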



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    57375972                1.18                                                    
   Submitted to EMBL/GenBank/DDBJ  34820649  32813191      0.71                                                    
   Journal                         20628293  19524210      0.42                                                    
   Submitted to other databases     1909870   1898290      0.04                                                    
   Thesis                             10353     10294     <0.01                                                    
   Book citation                       6806      6756     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 488710
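
The "Average per entry" column in these tables appears to be the total number
of lines divided by the total number of entries in the release, shown with two
decimals, with "<0.01" used when the value rounds to zero. The Python sketch
below reproduces the column under that assumption. The rounding rule is
inferred from the table rows and is not documented in the source, and the
function name is hypothetical.

```python
TOTAL_ENTRIES = 48701576  # entries in release 2013_12

def per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format the average number of lines per entry as the tables do.

    Assumption: values that would round to 0.00 are shown as "<0.01".
    """
    average = total_lines / total_entries
    if average < 0.005:
        return "<0.01"
    return f"{average:.2f}"

# Totals copied from the References (RL) rows above.
reference_totals = {
    "References (RL)": 57375972,
    "Submitted to EMBL/GenBank/DDBJ": 34820649,
    "Journal": 20628293,
    "Submitted to other databases": 1909870,
    "Thesis": 10353,
    "Book citation": 6806,
    "Patent": 1,
}

for line_type, total in reference_totals.items():
    print(f"{line_type:<32}{total:>10}  {per_entry(total):>6}")
```

Note that the divisor is the size of the whole release, not the number of
entries that carry at least one such line.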


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      72499805                1.49                                                    
   CATALYTIC ACTIVITY               5464682   4999496      0.11     4                                              
   CAUTION                         29575380  29551581      0.61     1                                              
   COFACTOR                         2327004   2147873      0.05     8                                              
   DOMAIN                            253267    243054      0.01     9                                              
   ENZYME REGULATION                  77216     77216     <0.01    11                                              
   FUNCTION                         6308925   5969790      0.13     3                                              
   INTERACTION                         1693      1693     <0.01    12                                              
   MISCELLANEOUS                     152024    151820     <0.01    10                                              
   PATHWAY                          2840822   2571124      0.06     7                                              
   SIMILARITY                      16791533  13004701      0.34     2                                              
   SUBCELLULAR LOCATION             5359407   5160974      0.11     5                                              
   SUBUNIT                          3347852   3321192      0.07     6                                              

Total number of comment topics: 12


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                      29834651                0.61                                                    
   ACT_SITE                         2236754   1384383      0.05     5                                              
   BINDING                          4852942   1276727      0.10     2                                              
   CARBOHYD                             611       233     <0.01    27                                              
   CHAIN                             876690    712032      0.02     8                                              
   COILED                             81033     44851     <0.01    18                                              
   COMPBIAS                           14002     13878     <0.01    22                                              
   CROSSLNK                           12065      8005     <0.01    23                                              
   DISULFID                          110011     85042     <0.01    15                                              
   DNA_BIND                           69169     63322     <0.01    19                                              
   DOMAIN                            858386    667242      0.02     9                                              
   INIT_MET                           16140     16140     <0.01    21                                              
   INTRAMEM                             392        56     <0.01    28                                              
   LIPID                              82062     41031     <0.01    17                                              
   METAL                            4680296   1203879      0.10     3                                              
   MOD_RES                           375349    336352      0.01    13                                              
   MOTIF                             251773    153227      0.01    14                                              
   NON_STD                             1857      1706     <0.01    25                                              
   NON_TER                          7349181   4763229      0.15     1                                              
   NP_BIND                          1709117   1019297      0.04     6                                              
   PEPTIDE                               98        98     <0.01    29                                              
   PROPEP                              5852      5852     <0.01    24                                              
   REGION                           1479386    821508      0.03     7                                              
   REPEAT                             64300     15225     <0.01    20                                              
   SIGNAL                            728678    725255      0.01    10                                              
   SITE                              510333    297048      0.01    11                                              
   TOPO_DOM                          399669     78546      0.01    12                                              
   TRANSIT                             1312      1312     <0.01    26                                              
   TRANSMEM                         2982111    522111      0.06     4                                              
   ZN_FING                            85082     76721     <0.01    16                                              

Total number of feature keys: 29


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             533839251               10.96                                                    
   Allergome                           3750      3111     <0.01    84   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   103   Organism-specific databases                
   ArrayExpress                      183686    183686     <0.01    46   Gene expression databases                  
   BRENDA                              2625      2597     <0.01    87   Enzyme and pathway databases               
   Bgee                               99133     99133     <0.01    51   Gene expression databases                  
   BindingDB                           5767      5767     <0.01    78   Chemistry                                  
   BioCyc                           5680951   5603481      0.12    19   Enzyme and pathway databases               
   CAZy                               73974     69505     <0.01    55   Protein family/group databases             
   CGD                                 6936      6936     <0.01    77   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   109   2D gel databases                           
   CTD                               364974    363652      0.01    38   Organism-specific databases                
   ChEMBL                               656       656     <0.01    94   Chemistry                                  
   ChiTaRS                            65411     65411     <0.01    56   Other                                      
   ConoServer                           160       160     <0.01   100   Organism-specific databases                
   DIP                                 2961      2956     <0.01    86   Protein-protein interaction databases      
   DNASU                              42254     41920     <0.01    62   Protocols and materials databases          
   EMBL                            52182992  47573668      1.07     3   Sequence databases                         
   Ensembl                          1042662   1028074      0.02    30   Genome annotation databases                
   EnsemblBacteria                 29683438  29258951      0.61     5   Genome annotation databases                
   EnsemblFungi                      386896    384559      0.01    37   Genome annotation databases                
   EnsemblMetazoa                    802460    786287      0.02    34   Genome annotation databases                
   EnsemblPlants                     677482    644888      0.01    35   Genome annotation databases                
   EnsemblProtists                   195422    192854     <0.01    44   Genome annotation databases                
   EuPathDB                          157794    157792     <0.01    49   Organism-specific databases                
   EvolutionaryTrace                   8004      8004     <0.01    75   Other                                      
   FlyBase                           199049    197577     <0.01    43   Organism-specific databases                
   GO                              90385898  28797349      1.86     2   Ontologies                                 
   Gene3D                          21163202  16696675      0.43     8   Family and domain databases                
   GeneID                          10405167  10116265      0.21    12   Genome annotation databases                
   GeneTree                          886004    885946      0.02    33   Phylogenomic databases                     
   Genevestigator                     85810     85802     <0.01    52   Gene expression databases                  
   GenoList                           14730     14457     <0.01    72   Organism-specific databases                
   GenomeRNAi                         19280     19280     <0.01    70   Other                                      
   Gramene                           202547    202547     <0.01    42   Organism-specific databases                
   GuidetoPHARMACOLOGY                   21        21     <0.01   107   Chemistry                                  
   H-InvDB                              609       462     <0.01    95   Organism-specific databases                
   HAMAP                            4920916   4858506      0.10    21   Family and domain databases                
   HGNC                               47497     47412     <0.01    59   Organism-specific databases                
   HOGENOM                          3652621   3652551      0.08    24   Phylogenomic databases                     
   HOVERGEN                          304896    304885      0.01    39   Phylogenomic databases                     
   IPI                               278246    277349      0.01    40   Sequence databases                         
   InParanoid                        186045    186045     <0.01    45   Phylogenomic databases                     
   IntAct                             12175     12175     <0.01    73   Protein-protein interaction databases      
   InterPro                       103504906  36340908      2.13     1   Family and domain databases                
   KEGG                             9297378   9076142      0.19    14   Genome annotation databases                
   KO                               3813117   3795173      0.08    23   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    80   Organism-specific databases                
   Leproma                             1272      1270     <0.01    89   Organism-specific databases                
   MEROPS                            179543    179543     <0.01    47   Protein family/group databases             
   MGI                                52023     51598     <0.01    58   Organism-specific databases                
   MIM                                    4         4     <0.01   110   Organism-specific databases                
   MINT                               10216     10215     <0.01    74   Protein-protein interaction databases      
   NextBio                           207583    207575     <0.01    41   Other                                      
   OGP                                    3         3     <0.01   111   2D gel databases                           
   OMA                              6332417   6332408      0.13    18   Phylogenomic databases                     
   OrthoDB                          5210173   5210173      0.11    20   Phylogenomic databases                     
   PANTHER                          6477692   6176028      0.13    17   Family and domain databases                
   PATRIC                           8267896   8267768      0.17    15   Genome annotation databases                
   PDB                                20790     11391     <0.01    69   3D structure databases                     
   PDBsum                             21616     11736     <0.01    67   3D structure databases                     
   PIR                               172098    139238     <0.01    48   Sequence databases                         
   PIRSF                            4497751   4462255      0.09    22   Family and domain databases                
   PMAP-CutDB                           207       207     <0.01    99   Other                                      
   PRIDE                             937764    937764      0.02    31   Proteomic databases                        
   PRINTS                           6775871   6108210      0.14    16   Family and domain databases                
   PRO                                27272     27272     <0.01    65   Other                                      
   PROSITE                         22833683  15299197      0.47     6   Family and domain databases                
   PaxDb                              28914     28912     <0.01    64   Proteomic databases                        
   PeptideAtlas                         128       128     <0.01   101   Proteomic databases                        
   PeroxiBase                          2596      2588     <0.01    88   Protein family/group databases             
   Pfam                            46469016  33974163      0.95     4   Family and domain databases                
   PharmGKB                            3541      3541     <0.01    85   Organism-specific databases                
   PhosSite                             784       772     <0.01    92   PTM databases                              
   PhosphoSite                         1109      1109     <0.01    90   PTM databases                              
   PhylomeDB                         145366    145366     <0.01    50   Phylogenomic databases                     
   PomBase                               40        27     <0.01   104   Organism-specific databases                
   PptaseDB                              36        35     <0.01   105   Protein family/group databases             
   ProDom                            917529    885976      0.02    32   Family and domain databases                
   ProMEX                              5374      5374     <0.01    79   Proteomic databases                        
   ProtClustDB                      2715930   2715883      0.06    29   Phylogenomic databases                     
   ProteinModelPortal              12944726  12944726      0.27     9   3D structure databases                     
   PseudoCAP                           4528      4522     <0.01    82   Organism-specific databases                
   REBASE                             41493     41486     <0.01    63   Protein family/group databases             
   REPRODUCTION-2DPAGE                   66        65     <0.01   102   2D gel databases                           
   RGD                                21122     20231     <0.01    68   Organism-specific databases                
   Reactome                             242       186     <0.01    98   Enzyme and pathway databases               
   RefSeq                          10630592  10268080      0.22    11   Sequence databases                         
   SABIO-RK                             499       499     <0.01    96   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   108   Organism-specific databases                
   SMART                           10106079   7695210      0.21    13   Family and domain databases                
   SMR                              3495492   3495492      0.07    25   3D structure databases                     
   STRING                           2903087   2903010      0.06    26   Protein-protein interaction databases      
   SUPFAM                          22324494  17959112      0.46     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   106   2D gel databases                           
   SignaLink                           4371      4369     <0.01    83   Enzyme and pathway databases               
   TAIR                               14928     14855     <0.01    71   Organism-specific databases                
   TCDB                                5073      5063     <0.01    81   Protein family/group databases             
   TIGRFAMs                        11540318  10523035      0.24    10   Family and domain databases                
   TubercuList                         1094      1093     <0.01    91   Organism-specific databases                
   UCSC                               59105     58943     <0.01    57   Genome annotation databases                
   UniGene                           564535    534338      0.01    36   Sequence databases                         
   UniPathway                       2763922   2568498      0.06    28   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    53   Genome annotation databases                
   World-2DPAGE                         672       667     <0.01    93   2D gel databases                           
   WormBase                           42341     42167     <0.01    61   Organism-specific databases                
   Xenbase                            25532     25471     <0.01    66   Organism-specific databases                
   ZFIN                               45638     45171     <0.01    60   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    76   Organism-specific databases                
   eggNOG                           2765381   2765300      0.06    27   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    54   Organism-specific databases                
   mycoCLAP                             423       422     <0.01    97   Protein family/group databases             

Number of explicitly cross-referenced databases: 129
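
The numeric columns in the table above can be cross-checked with a short script. The sketch below is not part of the release notes. It assumes the columns are, in order: database name, total number of cross-references, number of entries carrying at least one such cross-reference, average cross-references per entry, rank by total cross-references, and category. These meanings are inferred from the values, since the table header lies outside this excerpt. The names `parse_row` and `format_average` are hypothetical helpers, and the divisor is the entry count stated in the introduction (48701576).

```python
# Total entries in release 2013_12, as stated in the introduction.
TOTAL_ENTRIES = 48_701_576

# A few rows copied verbatim from the table above.
SAMPLE_ROWS = """\
   InterPro                       103504906  36340908      2.13     1   Family and domain databases
   GO                              90385898  28797349      1.86     2   Ontologies
   Pfam                            46469016  33974163      0.95     4   Family and domain databases
   EnsemblFungi                      386896    384559      0.01    37   Genome annotation databases
   PDB                                20790     11391     <0.01    69   3D structure databases
"""


def parse_row(line):
    """Split one table row into fields (column meanings are assumed)."""
    # Database names in this table contain no spaces, so the first five
    # whitespace-separated tokens are fixed fields; the rest is the category.
    name, xrefs, entries, average, rank, category = line.split(None, 5)
    return {
        "name": name,
        "xrefs": int(xrefs),
        "entries": int(entries),
        "average": average,  # kept as text because it may read "<0.01"
        "rank": int(rank),
        "category": category.strip(),
    }


def format_average(xrefs, total=TOTAL_ENTRIES):
    """Format cross-references per entry the way the table prints it."""
    text = f"{xrefs / total:.2f}"
    return "<0.01" if text == "0.00" else text


parsed = [parse_row(line) for line in SAMPLE_ROWS.splitlines() if line.strip()]
for row in parsed:
    print(row["name"], row["average"], format_average(row["xrefs"]))
```

For the sampled rows, the recomputed average matches the printed column, which supports the assumed reading that the fourth column is total cross-references divided by the total number of entries.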


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.64   Gln (Q) 3.98   Leu (L) 9.97   Ser (S) 6.52
   Arg (R) 5.32   Glu (E) 6.22   Lys (K) 5.36   Thr (T) 5.54
   Asn (N) 4.14   Gly (G) 7.08   Met (M) 2.51   Trp (W) 1.28
   Asp (D) 5.34   His (H) 2.17   Phe (F) 4.05   Tyr (Y) 3.08
   Cys (C) 1.19   Ile (I) 6.16   Pro (P) 4.52   Val (V) 6.81

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.02

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Lys, Asp, Arg, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
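
The ordering in 5.2 follows directly from the percentages in 5.1. As a minimal sketch (not part of the release notes), the following Python snippet sorts the twenty standard residues by their listed percentage. The dictionary values are copied from 5.1, and `rank_by_frequency` is a hypothetical helper name.

```python
# Percentages copied from section 5.1 (three-letter code -> percent).
COMPOSITION = {
    "Ala": 8.64, "Gln": 3.98, "Leu": 9.97, "Ser": 6.52,
    "Arg": 5.32, "Glu": 6.22, "Lys": 5.36, "Thr": 5.54,
    "Asn": 4.14, "Gly": 7.08, "Met": 2.51, "Trp": 1.28,
    "Asp": 5.34, "His": 2.17, "Phe": 4.05, "Tyr": 3.08,
    "Cys": 1.19, "Ile": 6.16, "Pro": 4.52, "Val": 6.81,
}


def rank_by_frequency(composition):
    """Return residue names sorted from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


ranking = rank_by_frequency(COMPOSITION)
total_percent = sum(COMPOSITION.values())

print(", ".join(ranking))
print(f"Sum of the twenty standard residues: {total_percent:.2f}%")
```

The twenty listed values sum to 99.88%; the remainder presumably comes from Xaa (0.02) and from rounding each value to two decimals.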



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 683125
Total number of entries encoded on a Plasmid: 368828
Total number of entries encoded on a Plastid: 28787
Total number of entries encoded on a Plastid; Apicoplast: 842
Total number of entries encoded on a Plastid; Chloroplast: 253180
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1263
