Release 15.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 57.0 and the UniProtKB/TrEMBL Protein Database release 40.0.

More information on these databases can be found in the user manual section "What is the UniProt Knowledgebase?".


UniProtKB/Swiss-Prot protein knowledgebase release 57.0 statistics

Release 57.0 of 24-Mar-09 of UniProtKB/Swiss-Prot contains 428'650 sequence entries, comprising 154'416'236 amino acids abstracted from 177'584 references.

The growth of the database is summarized below.

Release Date Number of entries Number of amino acids
2.0 09/86 3'939 900'163
3.0 11/86 4'160 969'641
4.0 04/87 4'387 1'036'010
5.0 09/87 5'205 1'327'683
6.0 01/88 6'102 1'653'982
7.0 04/88 6'821 1'885'771
8.0 08/88 7'724 2'224'465
9.0 11/88 8'702 2'498'140
10.0 03/89 10'008 2'952'613
11.0 07/89 10'856 3'265'966
12.0 10/89 12'305 3'797'482
13.0 01/90 13'837 4'347'336
14.0 04/90 15'409 4'914'264
15.0 08/90 16'941 5'486'399
16.0 11/90 18'364 5'986'949
17.0 02/91 20'024 6'524'504
18.0 05/91 20'772 6'792'034
19.0 08/91 21'795 7'173'785
20.0 11/91 22'654 7'500'130
21.0 03/92 23'742 7'866'596
22.0 05/92 25'044 8'375'696
23.0 08/92 26'706 9'011'391
24.0 12/92 28'154 9'545'427
25.0 04/93 29'955 10'214'020
26.0 07/93 31'808 10'875'091
27.0 10/93 33'329 11'484'420
28.0 02/94 36'000 12'496'420
29.0 06/94 38'303 13'464'008
30.0 10/94 40'292 14'147'368
31.0 02/95 43'470 15'335'248
32.0 11/95 49'340 17'385'503
33.0 02/96 52'205 18'531'384
34.0 10/96 59'021 21'210'389
35.0 11/97 69'113 25'083'768
36.0 07/98 74'019 26'840'295
37.0 12/98 77'977 28'268'293
38.0 07/99 80'000 29'085'965
39.0 05/00 86'593 31'411'114
40.0 10/01 101'602 37'315'215
41.0 02/03 122'564 44'986'459
42.0 10/03 135'850 50'046'799
43.0 03/04 146'720 54'093'154
44.0 07/04 153'871 56'608'159
45.0 10/04 163'235 59'631'787
46.0 02/05 168'297 61'443'278
47.0 05/05 181'571 65'746'672
48.0 09/05 194'317 70'391'852
49.0 02/06 207'132 75'438'310
50.0 05/06 222'289 81'585'146
51.0 10/06 241'242 88'541'632
52.0 03/07 260'167 95'638'062
53.0 05/07 269'293 98'902'758
54.0 07/07 276'256 101'466'206
55.0 02/08 356'194 127'836'513
56.0 07/08 392'667 141'217'034
57.0 03/09 428'650 154'416'236
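
The counts in this table use an apostrophe as the thousands separator, so they need normalizing before any arithmetic. The Python sketch below parses two rows copied from the table and computes the net change in entries between them. The helper names `parse_row` and `growth` are illustrative only and are not part of any UniProt tool.

```python
def parse_row(line):
    """Parse one table row: release, date (MM/YY), entries, amino acids."""
    release, date, entries, residues = line.split()
    return {
        "release": release,
        "date": date,
        # Strip the apostrophe thousands separator before converting.
        "entries": int(entries.replace("'", "")),
        "amino_acids": int(residues.replace("'", "")),
    }


def growth(old, new, field="entries"):
    """Return the absolute and relative (percent) net change of a field."""
    delta = new[field] - old[field]
    return delta, 100.0 * delta / old[field]


rows = [parse_row(r) for r in (
    "56.0 07/08 392'667 141'217'034",
    "57.0 03/09 428'650 154'416'236",
)]
added, pct = growth(rows[0], rows[1])
print(f"Net change from release {rows[0]['release']} to {rows[1]['release']}: "
      f"{added} entries ({pct:.1f}%)")
```

This gives the net difference between two table rows only; it says nothing about how many entries were added or deleted in between.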

In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively open reading frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.


Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

  • be as complete as possible. All sequences available at a given time should be immediately included in UniProtKB/Swiss-Prot. This also includes sequence corrections and updates;
  • provide a higher level of annotation;
  • provide cross-references to specialized database(s) that contain, among other data, some information about the genes that code for these proteins;
  • provide specific indexes and documents.

Our efforts to annotate human sequence entries as completely as possible gave rise to the HPI project, and the bacterial model organisms became the focus of the HAMAP project. The current status of the model organisms not covered by these two projects is as follows:

Organism Database cross-references Index file Number of sequences
A.thaliana TAIR arath.txt 7'876
C.albicans None yet calbican.txt 767
C.elegans Wormpep celegans.txt 3'218
D.discoideum DictyBase dicty.txt 3'557
D.melanogaster FlyBase fly.txt 2'904
M.musculus MGD mgdtosp.txt 16'101
S.cerevisiae SGD yeast.txt 6'552
S.pombe GeneDB_SPombe pombe.txt 4'752

UniProtKB/Swiss-Prot release statistics
                    
                    UniProtKB/Swiss-Prot protein knowledgebase release 57.0 statistics
                    
                    
                    1.  INTRODUCTION
                    
                    Release 57.0 of 24-Mar-09 of UniProtKB/Swiss-Prot contains 428650 sequence entries,
                    comprising 154416236 amino acids abstracted from 177584 references. 
                    
                    36053 sequences have been added since release 56.0, the sequence data of
                    2010 existing entries has been updated and the annotations of
                    368500 entries have been revised.
                    
                    Number of fragments: 8328
                    Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 27591 
                    
                    
                    Protein existence (PE):           entries     %
                    
                    1: Evidence at protein level        63411   14.8%
                    2: Evidence at transcript level     64726   15.1%
                    3: Inferred from homology          285291   66.6%
                    4: Predicted                        13812    3.2%
                    5: Uncertain                         1410    0.3%
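
The percentages in the protein existence table follow directly from the entry counts. As a minimal sketch, the Python snippet below recomputes them from the counts listed above and checks that the five categories add up to the release total of 428650 entries. The variable names are illustrative.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 63411,
    "2: Evidence at transcript level": 64726,
    "3: Inferred from homology": 285291,
    "4: Predicted": 13812,
    "5: Uncertain": 1410,
}

total = sum(pe_counts.values())

# Share of each PE level, rounded to one decimal place as in the table.
pe_percent = {label: round(100 * n / total, 1) for label, n in pe_counts.items()}

for label, n in pe_counts.items():
    print(f"{label:<34}{n:>8}{pe_percent[label]:>7.1f}%")
print(f"{'Total':<34}{total:>8}")
```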
                    
                    
                    
                    2.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/Swiss-Prot: 11669
                    
                    The first twenty species represent 103439 sequences:  24.1 % of the total
                    number of entries.
                    
                    
                    2.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 5235
                                             2x: 1703
                                             3x:  854
                                             4x:  556
                                             5x:  413
                                             6x:  319
                                             7x:  228
                                             8x:  194
                                             9x:  172
                                            10x:  101
                                        11- 20x:  515
                                        21- 50x:  364
                                        51-100x:  217
                                          >100x:  798
                    
                    
                    2.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1      20333  Homo sapiens (Human)
                    2      16101  Mus musculus (Mouse)
                    3       7876  Arabidopsis thaliana (Mouse-ear cress)
                    4       7314  Rattus norvegicus (Rat)
                    5       6552  Saccharomyces cerevisiae (Baker's yeast)
                    6       5600  Bos taurus (Bovine)
                    7       4752  Schizosaccharomyces pombe (Fission yeast)
                    8       4342  Escherichia coli (strain K12)
                    9       3600  Bacillus subtilis
                    10       3557  Dictyostelium discoideum (Slime mold)
                    11       3218  Caenorhabditis elegans
                    12       2980  Xenopus laevis (African clawed frog)
                    13       2904  Drosophila melanogaster (Fruit fly)
                    14       2429  Danio rerio (Zebrafish) (Brachydanio rerio)
                    15       2199  Pongo abelii (Sumatran orangutan)
                    16       2104  Gallus gallus (Chicken)
                    17       2044  Oryza sativa subsp. japonica (Rice)
                    18       1979  Escherichia coli O157:H7
                    19       1782  Methanocaldococcus jannaschii (Methanococcus jannaschii)
                    20       1773  Haemophilus influenzae
                    21       1736  Salmonella typhimurium
                    22       1652  Escherichia coli O6
                    23       1649  Shigella flexneri
                    24       1462  Mycobacterium tuberculosis
                    25       1343  Sus scrofa (Pig)
                    26       1334  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    27       1323  Salmonella typhi
                    28       1260  Pseudomonas aeruginosa
                    29       1198  Mycobacterium bovis
                    30       1140  Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
                    31       1012  Synechocystis sp. (strain PCC 6803)
                    32        989  Archaeoglobus fulgidus
                    33        980  Yersinia pestis
                    34        927  Vibrio cholerae
                    35        909  Acanthamoeba polyphaga mimivirus (APMV)
                    36        904  Salmonella paratyphi A
                    37        898  Rhizobium meliloti (Sinorhizobium meliloti)
                    38        896  Staphylococcus aureus (strain N315)
                    39        896  Staphylococcus aureus (strain Mu50 / ATCC 700699)
                    40        881  Oryctolagus cuniculus (Rabbit)
                    41        869  Staphylococcus aureus (strain COL)
                    42        867  Staphylococcus aureus (strain MW2)
                    43        862  Staphylococcus aureus (strain MSSA476)
                    44        859  Staphylococcus aureus (strain MRSA252)
                    45        854  Salmonella choleraesuis
                    46        846  Escherichia coli O6:K15:H31 (strain 536 / UPEC)
                    47        844  Yersinia pseudotuberculosis
                    48        842  Shigella sonnei (strain Ss046)
                    49        795  Escherichia coli O9:H4 (strain HS)
                    50        794  Shigella boydii serotype 4 (strain Sb227)
                    51        784  Ashbya gossypii (Yeast) (Eremothecium gossypii)
                    52        784  Escherichia coli O139:H28 (strain E24377A / ETEC)
                    53        783  Escherichia coli (strain UTI89 / UPEC)
                    54        782  Vibrio parahaemolyticus
                    55        776  Shigella dysenteriae serotype 1 (strain Sd197)
                    56        767  Candida albicans (Yeast)
                    57        765  Pasteurella multocida
                    58        764  Aquifex aeolicus
                    59        760  Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)
                    60        758  Kluyveromyces lactis (Yeast) (Candida sphaerica)
                    61        756  Canis familiaris (Dog)
                    62        751  Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
                    63        745  Neurospora crassa
                    64        723  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
                    65        722  Streptomyces coelicolor
                    66        722  Staphylococcus epidermidis (strain ATCC 12228)
                    67        719  Shigella flexneri serotype 5b (strain 8401)
                    68        719  Vibrio vulnificus
                    69        716  Photorhabdus luminescens subsp. laumondii
                    70        715  Candida glabrata (Yeast) (Torulopsis glabrata)
                    71        709  Bacillus halodurans
                    72        703  Vibrio vulnificus (strain YJ016)
                    73        694  Bacillus anthracis
                    74        693  Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
                    75        688  Yersinia pestis bv. Antiqua (strain Nepal516)
                    76        687  Mycoplasma pneumoniae
                    77        682  Yersinia pestis bv. Antiqua (strain Antiqua)
                    78        677  Pan troglodytes (Chimpanzee)
                    79        677  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
                    80        671  Staphylococcus aureus (strain NCTC 8325)
                    81        670  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
                    82        669  Escherichia coli O1:K1 / APEC
                    83        668  Anabaena sp. (strain PCC 7120)
                    84        662  Enterobacter sp. (strain 638)
                    85        660  Pseudomonas syringae pv. tomato
                    86        655  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
                    87        653  Pseudomonas putida (strain KT2440)
                    88        652  Mycobacterium leprae
                    89        637  Escherichia coli
                    90        635  Yersinia pestis (strain Pestoides F)
                    91        631  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
                    92        631  Bradyrhizobium japonicum
                    93        626  Staphylococcus aureus (strain USA300)
                    94        620  Zea mays (Maize)
                    95        615  Serratia proteamaculans (strain 568)
                    96        614  Treponema pallidum
                    97        613  Bacillus cereus (strain ATCC 14579 / DSM 31)
                    98        603  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    99        602  Staphylococcus aureus (strain bovine RF122 / ET3-1)
                    100        601  Shewanella oneidensis
                    101        600  Methanobacterium thermoautotrophicum
                    102        600  Ralstonia solanacearum (Pseudomonas solanacearum)
                    103        591  Rhizobium loti (Mesorhizobium loti)
                    104        590  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
                    105        583  Listeria monocytogenes
                    106        583  Rickettsia prowazekii
                    107        579  Photobacterium profundum (Photobacterium sp. (strain SS9))
                    108        579  Helicobacter pylori (Campylobacter pylori)
                    109        576  Xanthomonas campestris pv. campestris
                    110        575  Listeria innocua
                    111        573  Lactococcus lactis subsp. lactis (Streptococcus lactis)
                    112        573  Staphylococcus haemolyticus (strain JCSC1435)
                    113        572  Buchnera aphidicola subsp. Acyrthosiphon pisum 
                    114        570  Neisseria meningitidis serogroup B
                    115        569  Emericella nidulans (Aspergillus nidulans)
                    116        566  Enterobacter sakazakii (strain ATCC BAA-894)
                    117        565  Staphylococcus saprophyticus subsp. saprophyticus 
                    118        563  Yarrowia lipolytica (Candida lipolytica)
                    119        562  Brucella melitensis
                    120        562  Buchnera aphidicola subsp. Schizaphis graminum
                    121        561  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
                    122        560  Helicobacter pylori J99 (Campylobacter pylori J99)
                    123        559  Bacillus cereus (strain ATCC 10987)
                    124        559  Brucella suis
                    125        546  Neisseria meningitidis serogroup A
                    126        540  Bacillus thuringiensis subsp. konkukian
                    127        539  Xanthomonas axonopodis pv. citri (Citrus canker)
                    128        536  Caulobacter crescentus (Caulobacter vibrioides)
                    129        534  Clostridium acetobutylicum
                    130        534  Pseudomonas syringae pv. syringae (strain B728a)
                    131        531  Bacillus cereus (strain ZK / E33L)
                    132        530  Oceanobacillus iheyensis
                    133        529  Pseudomonas aeruginosa (strain UCBPP-PA14)
                    134        526  Bacillus licheniformis (strain DSM 13 / ATCC 14580)
                    135        525  Pseudomonas fluorescens (strain Pf0-1)
                    136        524  Vibrio fischeri (strain ATCC 700601 / ES114)
                    137        521  Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
                    138        516  Listeria monocytogenes serotype 4b (strain F2365)
                    139        512  Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
                    140        510  Streptococcus pneumoniae
                    141        510  Xylella fastidiosa
                    142        508  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
                    143        507  Buchnera aphidicola subsp. Baizongia pistaciae
                    144        502  Thermotoga maritima
                    145        501  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
                    146        496  Chromobacterium violaceum
                    147        493  Bordetella parapertussis
                    148        493  Rickettsia conorii
                    149        493  Sodalis glossinidius (strain morsitans)
                    150        493  Bordetella pertussis
                    151        492  Vibrio cholerae serotype O1 (strain ATCC 39541 / Ogawa 395 / O395)
                    152        491  Haemophilus ducreyi
                    153        485  Brucella abortus
                    154        483  Mycoplasma genitalium
                    155        483  Deinococcus radiodurans
                    156        480  Pseudomonas aeruginosa (strain PA7)
                    157        479  Clostridium perfringens
                    158        475  Corynebacterium glutamicum (Brevibacterium flavum)
                    159        474  Pseudomonas entomophila (strain L48)
                    160        473  Haemophilus influenzae (strain 86-028NP)
                    161        472  Methanosarcina acetivorans
                    162        472  Xanthomonas campestris pv. campestris (strain 8004)
                    163        470  Geobacillus kaustophilus
                    164        469  Streptomyces avermitilis
                    165        469  Bacillus clausii (strain KSM-K16)
                    166        468  Mannheimia succiniciproducens (strain MBEL55E)
                    167        468  Burkholderia pseudomallei (Pseudomonas pseudomallei)
                    168        463  Shewanella sp. (strain MR-7)
                    169        462  Vibrio harveyi (strain ATCC BAA-1116 / BB120)
                    170        460  Pyrococcus horikoshii
                    171        460  Thermosynechococcus elongatus (strain BP-1)
                    172        460  Shewanella sp. (strain MR-4)
                    173        459  Staphylococcus aureus (strain Newman)
                    174        458  Synechococcus elongatus (strain PCC 7942) (Anacystis nidulans R2)
                    175        457  Oryza sativa subsp. indica (Rice)
                    176        456  Brucella abortus (strain 2308)
                    177        456  Pyrococcus abyssi
                    178        455  Enterococcus faecalis (Streptococcus faecalis)
                    179        453  Methanosarcina mazei (Methanosarcina frisia)
                    180        452  Halobacterium salinarium (Halobacterium halobium)
                    181        448  Rickettsia felis (Rickettsia azadi)
                    182        447  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
                    183        447  Aspergillus fumigatus (Sartorya fumigata)
                    184        446  Rhodopseudomonas palustris
                    185        446  Lactobacillus plantarum
                    186        445  Burkholderia mallei (Pseudomonas mallei)
                    187        445  Anabaena variabilis (strain ATCC 29413 / PCC 7937)
                    188        444  Pseudomonas putida (strain F1 / ATCC 700007)
                    189        443  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    190        443  Xanthomonas campestris pv. vesicatoria (strain 85-10)
                    191        441  Streptococcus mutans
                    192        441  Ovis aries (Sheep)
                    193        440  Acinetobacter sp. (strain ADP1)
                    194        440  Bacillus amyloliquefaciens (strain FZB42)
                    195        439  Chlamydia trachomatis
                    196        438  Thermoanaerobacter tengcongensis
                    197        438  Staphylococcus aureus (strain Mu3 / ATCC 700698)
                    198        437  Pyrococcus furiosus
                    199        435  Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
                    200        435  Shewanella frigidimarina (strain NCIMB 400)
                    201        435  Rickettsia bellii (strain RML369-C)
                    202        434  Pseudomonas putida (strain GB-1)
                    203        434  Shewanella sp. (strain ANA-3)
                    204        433  Streptococcus pyogenes serotype M6
                    205        433  Nicotiana tabacum (Common tobacco)
                    206        433  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
                    207        430  Ralstonia eutropha  (Cupriavidus necator 
                    208        427  Borrelia burgdorferi (Lyme disease spirochete)
                    209        427  Methylococcus capsulatus
                    210        427  Campylobacter jejuni
                    211        426  Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
                    212        422  Shewanella baltica (strain OS185)
                    213        422  Chlamydia pneumoniae (Chlamydophila pneumoniae)
                    214        418  Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
                    215        418  Gloeobacter violaceus
                    216        418  Pseudoalteromonas haloplanktis (strain TAC 125)
                    217        417  Hahella chejuensis (strain KCTC 2396)
                    218        415  Streptococcus pyogenes serotype M1
                    219        414  Mycobacterium paratuberculosis
                    220        413  Pseudomonas mendocina (strain ymp)
                    221        412  Chlamydia muridarum
                    222        412  Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
                    223        412  Sulfolobus solfataricus
                    224        412  Burkholderia xenovorans (strain LB400)
                    225        411  Staphylococcus aureus (strain JH1)
                    226        411  Nitrosomonas europaea
                    227        409  Streptococcus pyogenes serotype M18
                    228        409  Rhizobium sp. (strain NGR234)
                    229        409  Dechloromonas aromatica (strain RCB)
                    230        408  Shewanella sp. (strain W3-18-1)
                    231        408  Streptococcus pyogenes serotype M3
                    232        408  Shewanella putrefaciens (strain CN-32 / ATCC BAA-453)
                    233        407  Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1) 
                    234        407  Shewanella baltica (strain OS195)
                    235        405  Staphylococcus aureus (strain JH9)
                    236        405  Aeromonas salmonicida (strain A449)
                    237        404  Rickettsia typhi
                    238        404  Shewanella denitrificans (strain OS217 / ATCC BAA-1090 / DSM 15013)
                    239        403  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
                    240        401  Shewanella baltica (strain OS155 / ATCC BAA-1091)
                    241        400  Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
                    242        400  Chlorobium tepidum
                    243        400  Idiomarina loihiensis
                    244        400  Synechococcus sp. (strain WH8102)
                    245        399  Haemophilus influenzae (strain PittEE)
                    246        399  Burkholderia cenocepacia (strain AU 1054)
                    247        397  Shewanella amazonensis (strain ATCC BAA-1098 / SB2B)
                    248        397  Caenorhabditis briggsae
                    249        396  Actinobacillus pleuropneumoniae serotype 5b (strain L20)
                    250        396  Corynebacterium efficiens
                    
                    
                    
                    2.3  Taxonomic distribution of the sequences
                    
                    Kingdom        sequences (% of the database)
                    Archaea           15698 (  4%)
                    Bacteria         249878 ( 58%)
                    Eukaryota        150533 ( 35%)
                    Viruses           12541 (  3%)
                    
                    
                    Within Eukaryota:
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  20334 ( 14%)           (  5%)
                    Other Mammalia         43931 ( 29%)           ( 10%)
                    Other Vertebrata       14925 ( 10%)           (  3%)
                    Viridiplantae          27014 ( 18%)           (  6%)
                    Fungi                  23102 ( 15%)           (  5%)
                    Insecta                 6145 (  4%)           (  1%)
                    Nematoda                3869 (  3%)           (  1%)
                    Other                  11213 (  7%)           (  3%)
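
The two tables in this section can be cross-checked against each other: the kingdom counts should sum to the release total, and the Eukaryota categories should sum to the Eukaryota count. The Python sketch below performs that check and recomputes the rounded percentages. The helper `share` is a hypothetical name introduced for illustration.

```python
TOTAL_ENTRIES = 428650  # release total stated in the introduction

kingdoms = {
    "Archaea": 15698,
    "Bacteria": 249878,
    "Eukaryota": 150533,
    "Viruses": 12541,
}

eukaryota = {
    "Human": 20334,
    "Other Mammalia": 43931,
    "Other Vertebrata": 14925,
    "Viridiplantae": 27014,
    "Fungi": 23102,
    "Insecta": 6145,
    "Nematoda": 3869,
    "Other": 11213,
}


def share(count, total):
    """Percentage of total, rounded to a whole number as in the tables."""
    return round(100 * count / total)


# Consistency checks between the tables and the release total.
assert sum(kingdoms.values()) == TOTAL_ENTRIES
assert sum(eukaryota.values()) == kingdoms["Eukaryota"]

for name, n in eukaryota.items():
    print(f"{name:<18}{n:>7} ({share(n, kingdoms['Eukaryota']):>3}%)"
          f" ({share(n, TOTAL_ENTRIES):>3}%)")
```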
                    
                    
                    
                    3.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From  To  Number             From   To   Number
                      1-  50    7410             1001-1100     3070
                     51- 100   31441             1101-1200     2119
                    101- 150   44644             1201-1300     1666
                    151- 200   45150             1301-1400     1581
                    201- 250   45195             1401-1500     1289
                    251- 300   39221             1501-1600      599
                    301- 350   39049             1601-1700      472
                    351- 400   34750             1701-1800      389
                    401- 450   27771             1801-1900      364
                    451- 500   22997             1901-2000      301
                    501- 550   16055             2001-2100      184
                    551- 600   11862             2101-2200      255
                    601- 650   10145             2201-2300      263
                    651- 700    7254             2301-2400      162
                    701- 750    6070             2401-2500      118
                    751- 800    4263             >2500          938
                    801- 850    3652
                    851- 900    4290
                    901- 950    3123
                    951-1000    2210
                    
                    
                    The average sequence length in UniProtKB/Swiss-Prot is 360 amino acids.
                    
                    The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
                    The longest sequence is  TITIN_MOUSE (A2ASS6): 35213 amino acids.
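
The average length quoted above is consistent with dividing the total number of amino acids by the total number of entries, both taken from the introduction. The short Python sketch below shows that arithmetic. It assumes the average is taken over all entries, fragments included, which the report does not state explicitly.

```python
# Totals stated in the introduction of this release.
total_amino_acids = 154_416_236
total_entries = 428_650

# Assumption: the average covers all entries, including fragments.
average_length = total_amino_acids / total_entries

print(f"Average sequence length: {average_length:.2f} "
      f"(rounded: {round(average_length)}) amino acids")
```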
                    
                    
                    4.  JOURNAL CITATIONS
                    
                    Note: the following citation statistics reflect the number of distinct
                    journal citations.
                    
                    Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1975
                    
                    
                    4.1 Table of the frequency of journal citations
                    
                    Journals cited      1x:  647
                                        2x:  267
                                        3x:  132
                                        4x:  107
                                        5x:   77
                                        6x:   60
                                        7x:   38
                                        8x:   41
                                        9x:   33
                                       10x:   23
                                   11- 20x:  151
                                   21- 50x:  157
                                   51-100x:   91
                                     >100x:  151
                    
                    4.2  List of the most cited journals in UniProtKB/Swiss-Prot
                    
                    Nb    Citations   Journal name
                    --    ---------   -------------------------------------------------------------
                    1        16828   Journal of Biological Chemistry
                    2         7853   Proceedings of the National Academy of Sciences of the U.S.A.
                    3         4843   Journal of Bacteriology
                    4         4434   Gene
                    5         4294   Biochemical and Biophysical Research Communications
                    6         4221   Nucleic Acids Research
                    7         3817   FEBS Letters
                    8         3625   Biochemistry
                    9         3557   The EMBO Journal
                    10         3205   Molecular and Cellular Biology
                    11         3051   Nature
                    12         3045   European Journal of Biochemistry
                    13         2879   Biochimica et Biophysica Acta
                    14         2828   Journal of Molecular Biology
                    15         2489   Cell
                    16         2457   Genomics
                    17         2075   Biochemical Journal
                    18         1957   Science
                    19         1785   Journal of Virology
                    20         1652   Molecular Microbiology
                    21         1472   Journal of Cell Biology
                    22         1453   Plant Molecular Biology
                    23         1293   Molecular and General Genetics
                    24         1269   Virology
                    25         1247   Genes and Development
                    26         1247   Nature Genetics
                    27         1235   Human Molecular Genetics
                    28         1177   Plant Physiology
                    29         1142   The American Journal of Human Genetics
                    30         1132   Journal of Biochemistry
                    31         1129   Oncogene
                    32         1034   Development
                    33          972   Human Mutation
                    34          935   Journal of Immunology
                    35          911   Genetics
                    36          909   Molecular Biology of the Cell
                    37          836   Infection and Immunity
                    38          833   Structure
                    39          810   Journal of General Virology
                    40          779   Archives of Biochemistry and Biophysics
                    41          777   The Plant Cell
                    42          734   Blood
                    43          728   Yeast
                    44          706   Microbiology
                    45          696   Molecular Cell
                    46          645   Developmental Biology
                    47          641   The Plant Journal
                    48          640   Journal of Cell Science
                    49          624   FEMS Microbiology Letters
                    50          618   Cancer Research
                    51          580   Human Genetics
                    52          574   Nature Structural Biology
                    53          565   Current Biology
                    54          553   Mechanisms of Development
                    55          515   Current Genetics
                    56          495   Acta Crystallographica, Section D
                    57          494   Journal of Neuroscience
                    58          490   Applied and Environmental Microbiology
                    59          487   Protein Science
                    60          481   Journal of Clinical Investigation
                    61          472   Neuron
                    62          464   Mammalian Genome
                    63          461   Toxicon
                    64          428   Immunogenetics
                    65          422   The Journal of Experimental Medicine
                    66          418   Molecular Endocrinology
                    67          416   American Journal of Physiology
                    68          411   Molecular and Biochemical Parasitology
                    69          388   Journal of Neurochemistry
                    70          368   Endocrinology
                    71          367   Journal of Molecular Evolution
                    72          359   DNA and Cell Biology
                    73          358   The Journal of Clinical Endocrinology and Metabolism
                    74          351   DNA Sequence
                    75          342   Molecular Biology and Evolution
                    76          328   Bioscience, Biotechnology, and Biochemistry
                    77          324   Journal of Medical Genetics
                    78          308   Proteins
                    79          308   Brain Research. Molecular Brain Research
                    80          287   Biological Chemistry Hoppe-Seyler
                    81          273   Cytogenetics and Cell Genetics
                    82          267   Comparative Biochemistry and Physiology
                    83          266   Peptides
                    84          265   Journal of Investigative Dermatology
                    85          265   Antimicrobial Agents and Chemotherapy
                    86          256   Plant and Cell Physiology
                    87          250   Molecular Pharmacology
                    88          248   Biology of Reproduction
                    89          246   Nature Cell Biology
                    90          246   Experimental Cell Research
                    91          245   Journal of General Microbiology
                    92          234   Genome Research
                    93          221   Virus Research
                    94          218   Neurology
                    95          215   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
                    96          208   Developmental Dynamics
                    97          204   RNA
                    98          201   DNA Research
                    99          197   Molecular Plant-Microbe Interactions
                    100          193   Biochimie
                    101          192   European Journal of Immunology
                    102          184   Annals of Neurology
                    103          183   Tissue Antigens
                    104          182   European Journal of Human Genetics
                    105          181   Planta
                    106          179   Developmental Cell
                    107          173   Journal of Human Genetics
                    108          172   Genes to Cells
                    109          168   Immunity
                    110          166   Molecular and Cellular Endocrinology
                    111          161   Eukaryotic cell
                    112          161   Molecular Phylogenetics and Evolution
                    113          160   Archives of Microbiology
                    114          159   DNA
                    115          158   American Journal of Medical Genetics
                    116          157   The New England Journal of Medicine
                    117          152   Hemoglobin
                    118          150   Insect Biochemistry and Molecular Biology
                    119          148   Bioorganicheskaia Khimiia
                    120          147   Investigative Ophthalmology and Visual Science
                    121          144   Molecular Reproduction and Development
                    122          140   Diabetes
                    123          138   Molecular Immunology
                    124          138   Glycobiology
                    125          135   Animal Genetics
                    126          132   General and Comparative Endocrinology
                    127          128   Molecular and Cellular Neuroscience
                    128          128   International Journal of Cancer
                    129          127   Clinical Genetics
                    130          124   The FASEB Journal
                    131          124   Archives of Virology
                    132          123   EMBO Reports
                    133          119   Agricultural and Biological Chemistry
                    134          119   Molecular Genetics and Metabolism
                    135          115   British Journal of Haematology
                    136          114   Nature Structural and Molecular Biology
                    137          113   Molecular Genetics and Genomics
                    138          112   Journal of Cellular Biochemistry
                    139          111   Journal of Protein Chemistry
                    140          110   The FEBS Journal
                    141          109   Biological Chemistry
                    142          107   Thrombosis and Haemostasis
                    143          107   Journal of Neuroscience Research
                    144          107   Journal of the American Chemical Society
                    145          106   American Journal of Medical Genetics. Part A
                    146          105   Nature Immunology
                    147          105   Neuroscience Letters
                    148          105   Journal of Lipid Research
                    149          104   Journal of Molecular Endocrinology
                    150          103   Protein Expression and Purification
                    
                    
                    5.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                        Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ------------------------------------  -------- ---------  ---------
                    
                    References (RL)                       781540                 1.82        
                    Journal                            628701     333076      1.47       1
                    Submitted to EMBL/GenBank/DDBJ     141218     129587      0.33       2
                    Submitted to other databases         9617       8507      0.02       3
                    Book citation                         622        611     <0.01       4
                    Plant Gene Register                   556        544     <0.01       5
                    Thesis                                389        387     <0.01       6
                    Unpublished observations              288        284     <0.01       7
                    Patent                                143        141     <0.01       8
                    Worm Breeder's Gazette                  6          6     <0.01       9
                    
                    Total number of distinct authors cited in UniProtKB/Swiss-Prot: 271220
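
The "Average per entry" and "Rank" columns in the tables of this section can be reproduced from the totals alone. The sketch below assumes that the average is the total number of lines divided by the 428'650 entries of release 57.0, that values rounding below 0.01 are printed as "<0.01", and that the rank is the position when sorting by decreasing total. These rules are inferred from the published figures, not taken from UniProt documentation, and the function names are illustrative.

```python
# Assumption: number of entries in release 57.0, as stated in the release summary.
TOTAL_ENTRIES = 428_650

# (subtype, total number of lines), copied from the References (RL) table.
reference_lines = [
    ("Journal", 628701),
    ("Submitted to EMBL/GenBank/DDBJ", 141218),
    ("Submitted to other databases", 9617),
    ("Book citation", 622),
    ("Plant Gene Register", 556),
    ("Thesis", 389),
    ("Unpublished observations", 288),
    ("Patent", 143),
    ("Worm Breeder's Gazette", 6),
]


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format the average as the tables do: two decimals, or '<0.01'."""
    avg = total_lines / total_entries
    # Assumed rule: anything that would round to 0.00 is shown as '<0.01'.
    return "<0.01" if avg < 0.005 else f"{avg:.2f}"


def ranked(rows):
    """Return (rank, subtype, total, average) tuples by decreasing total."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [
        (rank, name, total, average_per_entry(total))
        for rank, (name, total) in enumerate(ordered, start=1)
    ]


for rank, name, total, avg in ranked(reference_lines):
    print(f"{name:<34}{total:>8}  {avg:>6}  {rank:>4}")
```

Under these assumptions the Journal row comes out at 1.47 with rank 1, and the overall References (RL) total of 781540 gives 1.82, matching the table.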
                    
                                                        Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ------------------------------------  -------- ---------  ---------  ----
                    Comments (CC)                        1789693                 4.18        
                    ALLERGEN                              452        452     <0.01      26
                    ALTERNATIVE PRODUCTS                17720      17720      0.04      12
                    BIOPHYSICOCHEMICAL PROPERTIES        2500       2500      0.01      22
                    BIOTECHNOLOGY                         241        239     <0.01      28
                    CATALYTIC ACTIVITY                 175352     160149      0.41       4
                    CAUTION                              6045       5925      0.01      19
                    COFACTOR                            75945      69750      0.18       7
                    DEVELOPMENTAL STAGE                  7930       7930      0.02      16
                    DISEASE                              4495       3090      0.01      20
                    DISRUPTION PHENOTYPE                 1609       1609     <0.01      23
                    DOMAIN                              26533      23452      0.06      11
                    ENZYME REGULATION                    6664       6664      0.02      18
                    FUNCTION                           310429     299159      0.72       2
                    INDUCTION                            9805       9805      0.02      15
                    INTERACTION                         11265      11265      0.03      14
                    MASS SPECTROMETRY                    3883       2946      0.01      21
                    MISCELLANEOUS                       27141      24903      0.06      10
                    PATHWAY                             98123      89621      0.23       6
                    PHARMACEUTICAL                         80         80     <0.01      29
                    POLYMORPHISM                          735        706     <0.01      24
                    PTM                                 31005      25377      0.07       8
                    RNA EDITING                           560        560     <0.01      25
                    SEQUENCE CAUTION                    11577      11577      0.03      13
                    SIMILARITY                         497292     404571      1.16       1
                    SUBCELLULAR LOCATION               249793     245311      0.58       3
                    SUBUNIT                            173827     173827      0.41       5
                    TISSUE SPECIFICITY                  30654      30654      0.07       9
                    TOXIC DOSE                            392        384     <0.01      27
                    WEB RESOURCE                         7646       6129      0.02      17
                    
                    Total number of comment topics: 29
                    
                    
                                                        Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank
                    ------------------------------------  -------- ---------  ---------  ----
                    Features (FT)                        2723263                 6.35        
                    ACT_SITE                           104207      62057      0.24      11
                    BINDING                            155163      49238      0.36       4
                    CA_BIND                              3566       1449      0.01      35
                    CARBOHYD                            89711      23184      0.21      12
                    CHAIN                              434909     424497      1.01       1
                    COILED                              16267      10833      0.04      26
                    COMPBIAS                            43223      23345      0.10      18
                    CONFLICT                           111609      38964      0.26       9
                    CROSSLNK                             4122       2734      0.01      34
                    DISULFID                            88560      23114      0.21      13
                    DNA_BIND                             9381       8704      0.02      31
                    DOMAIN                             126447      73205      0.29       6
                    HELIX                              112953      11607      0.26       8
                    INIT_MET                            12879      12879      0.03      27
                    LIPID                                9803       6321      0.02      29
                    METAL                              208340      52300      0.49       3
                    MOD_RES                            129042      42164      0.30       5
                    MOTIF                               28332      18294      0.07      22
                    MUTAGEN                             26290       6325      0.06      25
                    NON_CONS                             1569        627     <0.01      36
                    NON_STD                               340        266     <0.01      38
                    NON_TER                             11304       8588      0.03      28
                    NP_BIND                             82019      55056      0.19      14
                    PEPTIDE                              7852       4848      0.02      32
                    PROPEP                               9800       8166      0.02      30
                    REGION                              69832      39220      0.16      17
                    REPEAT                              81125      11967      0.19      15
                    SIGNAL                              30875      30865      0.07      20
                    SITE                                29264      17057      0.07      21
                    STRAND                             116537      10976      0.27       7
                    TOPO_DOM                           107079      21833      0.25      10
                    TRANSIT                              5932       5846      0.01      33
                    TRANSMEM                           292168      59681      0.68       2
                    TURN                                27890       9332      0.07      23
                    UNSURE                                946        300     <0.01      37
                    VAR_SEQ                             37208      15859      0.09      19
                    VARIANT                             70313      15203      0.16      16
                    ZN_FING                             26406      11048      0.06      24
                    
                    Total number of feature keys: 38
                    
                    
                    
                                                        Total    Number of  Average
                    Line type / subtype                number   entries    per entry  Rank      Category
                    ------------------------------------  -------- ---------  ---------  ----      -------------------------------------------
                    Cross-references (DR)                8910885                20.79                          
                    2DBase-Ecoli                           84         84     <0.01     102      2D gel databases                             
                    Aarhus/Ghent-2DPAGE                   126         96     <0.01      99      2D gel databases                             
                    AGD                                   790        784     <0.01      77      Organism-specific databases
                    ANU-2DPAGE                             23         23     <0.01     109      2D gel databases                             
                    ArrayExpress                        54151      54151      0.13      30      Gene expression databases                    
                    Bgee                                35505      35495      0.08      34      Gene expression databases
                    BindingDB                             297        297     <0.01      92      Other       
                    BioCyc                             157015     148907      0.37      14      Enzyme and pathway databases                 
                    BRENDA                              65123      62330      0.15      26      Enzyme and pathway databases                 
                    BuruList                              296        296     <0.01      93      Organism-specific databases                  
                    CGD                                   514        512     <0.01      82      Organism-specific databases
                    CleanEx                             30264      29611      0.07      37      Gene expression databases                    
                    COMPLUYEAST-2DPAGE                     59         59     <0.01     104      2D gel databases                             
                    Cornea-2DPAGE                          67         67     <0.01     103      2D gel databases                             
                    CYGD                                 6628       6522      0.02      52      Organism-specific databases
                    dictyBase                            3667       3557      0.01      65      Organism-specific databases                  
                    DIP                                  9016       8966      0.02      47      Protein-protein interaction databases
                    DisProt                               397        394     <0.01      86      3D structure databases                       
                    DOSAC-COBS-2DPAGE                     150        150     <0.01      98      2D gel databases                             
                    DrugBank                             5316       1625      0.01      54      Other       
                    EchoBASE                             4159       4124      0.01      61      Organism-specific databases                  
                    ECO2DBASE                             351        299     <0.01      90      2D gel databases                             
                    EcoGene                              4331       4328      0.01      60      Organism-specific databases                  
                    EMBL                               733511     419465      1.71       3      Sequence databases                           
                    Ensembl                             68473      66943      0.16      25      Genome annotation databases                  
                    euHCVdb                                55         44     <0.01     105      Organism-specific databases
                    FlyBase                              4415       4043      0.01      59      Organism-specific databases                  
                    Gene3D                             194637     161088      0.45      13      Family and domain databases                  
                    GeneCards                           21183      19899      0.05      38      Organism-specific databases                  
                    GeneDB_Spombe                        4793       4749      0.01      56      Organism-specific databases                  
                    GeneFarm                             2504       2483      0.01      70      Organism-specific databases                  
                    GeneID                             381309     363101      0.89       7      Genome annotation databases                  
                    GenomeReviews                      284894     266392      0.66       9      Genome annotation databases                  
                    GermOnline                          41962      41352      0.10      33      Gene expression databases                    
                    GlycoSuiteDB                          280        280     <0.01      94      PTM databases
                    GO                                1730543     399299      4.04       1      Ontologies
                    Gramene                              3990       3990      0.01      62      Organism-specific databases                  
                    H-InvDB                             11259       9565      0.03      46      Organism-specific databases                  
                    HAMAP                              232695     232581      0.54      10      Family and domain databases                  
                    HGNC                                19216      19059      0.04      40      Organism-specific databases
                    HOGENOM                            204967     204967      0.48      12      Phylogenomic databases                       
                    HOVERGEN                            76378      76378      0.18      24      Phylogenomic databases                       
                    HPA                                  6200       4994      0.01      53      Organism-specific databases
                    HSC-2DPAGE                             85         85     <0.01     101      2D gel databases                             
                    HSSP                                84683      84683      0.20      23      3D structure databases
                    IntAct                              20253      20251      0.05      39      Protein-protein interaction databases        
                    InterPro                          1083300     399424      2.53       2      Family and domain databases                  
                    IPI                                 85696      61732      0.20      22      Sequence databases
                    KEGG                               355038     334366      0.83       8      Genome annotation databases                  
                    LegioList                             725        723     <0.01      78      Organism-specific databases                  
                    Leproma                               655        652     <0.01      81      Organism-specific databases                  
                    LinkHub                             18287      18287      0.04      41      Other       
                    ListiList                            1159       1151     <0.01      75      Organism-specific databases                  
                    MaizeGDB                              469        464     <0.01      84      Organism-specific databases                  
                    MEROPS                               7866       7604      0.02      49      Protein family/group databases               
                    MGI                                 15977      15927      0.04      43      Organism-specific databases
                    MIM                                 15492      12279      0.04      45      Organism-specific databases
                    MypuList                              201        201     <0.01      97      Organism-specific databases                  
                    NextBio                             48267      48265      0.11      32      Other       
                    NMPDR                              122888     122860      0.29      16      Genome annotation databases                  
                    OGP                                   378        378     <0.01      88      2D gel databases
                    Orphanet                             3382       1995      0.01      67      Organism-specific databases                  
                    PANTHER                            155906     143918      0.36      15      Family and domain databases                  
                    Pathway_Interaction_DB               4568       1665      0.01      58      Enzyme and pathway databases                 
                    PDB                                 56257      13928      0.13      28      3D structure databases
                    PDBsum                              56248      13927      0.13      29      3D structure databases                       
                    PeptideAtlas                         5167       5167      0.01      55      Proteomic databases                          
                    PeroxiBase                            662        646     <0.01      80      Protein family/group databases               
                    Pfam                               559239     391183      1.30       4      Family and domain databases                  
                    PharmGKB                            15843      15831      0.04      44      Organism-specific databases                  
                    PHCI-2DPAGE                           245        245     <0.01      96      2D gel databases                             
                    PhosphoSite                         16726      16726      0.04      42      PTM databases
                    PhosSite                              266        266     <0.01      95      PTM databases
                    PhotoList                             716        716     <0.01      79      Organism-specific databases                  
                    PIR                                113036     103221      0.26      19      Sequence databases
                    PIRSF                               64427      64427      0.15      27      Family and domain databases                  
                    PMMA-2DPAGE                            52         52     <0.01     106      2D gel databases                             
                    PptaseDB                               34         34     <0.01     107      Protein family/group databases               
                    PRIDE                               33839      33839      0.08      35      Proteomic databases                          
                    PRINTS                             114793      98243      0.27      18      Family and domain databases                  
                    ProDom                             114838     111821      0.27      17      Family and domain databases                  
                    ProMEX                                431        431     <0.01      85      Proteomic databases
                    PROSITE                            385455     243280      0.90       6      Family and domain databases                  
                    PseudoCAP                            1199       1190     <0.01      73      Organism-specific databases                  
                    Rat-heart-2DPAGE                       28         28     <0.01     108      2D gel databases                             
                    Reactome                             4620       2749      0.01      57      Enzyme and pathway databases                 
                    REBASE                                354        345     <0.01      89      Protein family/group databases
                    RefSeq                             396625     363318      0.93       5      Sequence databases                           
                    REPRODUCTION-2DPAGE                  1030        942     <0.01      76      2D gel databases                             
                    RGD                                  7194       7189      0.02      50      Organism-specific databases
                    SagaList                              381        380     <0.01      87      Organism-specific databases                  
                    SGD                                  6640       6537      0.02      51      Organism-specific databases
                    Siena-2DPAGE                          102        102     <0.01     100      2D gel databases                             
                    SMART                              111692      85039      0.26      20      Family and domain databases                  
                    SMR                                 50798      50798      0.12      31      3D structure databases
                    SubtiList                            3537       3535      0.01      66      Organism-specific databases                  
                    SWISS-2DPAGE                         1182       1182     <0.01      74      2D gel databases                             
                    TAIR                                 7957       7843      0.02      48      Organism-specific databases
                    TCDB                                 3095       3060      0.01      69      Protein family/group databases
                    TIGR                                32672      31933      0.08      36      Genome annotation databases
                    TIGRFAMs                           215422     200843      0.50      11      Family and domain databases                  
                    TubercuList                          1490       1454     <0.01      72      Organism-specific databases                  
                    UniGene                             85716      78769      0.20      21      Sequence databases                           
                    VectorBase                            305        296     <0.01      91      Genome annotation databases                  
                    World-2DPAGE                          501        501     <0.01      83      2D gel databases                             
                    WormBase                             3670       3585      0.01      64      Organism-specific databases                  
                    WormPep                              3933       3209      0.01      63      Organism-specific databases                  
                    Xenbase                              3227       3160      0.01      68      Organism-specific databases                  
                    ZFIN                                 2373       2357      0.01      71      Organism-specific databases
                    
                    Total number of cross-referenced databases: 109
                    
                    6.  AMINO ACID COMPOSITION
                    
                    6.1  Composition in percent for the complete database
                    
                    Ala (A) 8.17   Gln (Q) 3.95   Leu (L) 9.67   Ser (S) 6.62
                    Arg (R) 5.50   Glu (E) 6.74   Lys (K) 5.87   Thr (T) 5.34
                    Asn (N) 4.07   Gly (G) 7.04   Met (M) 2.41   Trp (W) 1.09
                    Asp (D) 5.42   His (H) 2.28   Phe (F) 3.88   Tyr (Y) 2.93
                    Cys (C) 1.40   Ile (I) 5.94   Pro (P) 4.74   Val (V) 6.82
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00
                    
                    
                    6.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
                    Phe, Tyr, Met, His, Cys, Trp
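
The classification in 6.2 appears to be the composition table of 6.1 sorted by decreasing percentage. A minimal sketch of that ordering, using the values from 6.1 (the dictionary and function names are illustrative, not part of any UniProt tool):

```python
# Percent composition for the complete database, copied from section 6.1.
composition = {
    "Ala": 8.17, "Gln": 3.95, "Leu": 9.67, "Ser": 6.62,
    "Arg": 5.50, "Glu": 6.74, "Lys": 5.87, "Thr": 5.34,
    "Asn": 4.07, "Gly": 7.04, "Met": 2.41, "Trp": 1.09,
    "Asp": 5.42, "His": 2.28, "Phe": 3.88, "Tyr": 2.93,
    "Cys": 1.40, "Ile": 5.94, "Pro": 4.74, "Val": 6.82,
}


def classify_by_frequency(percentages):
    """Return the residue names ordered from most to least frequent."""
    ordered = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered]


print(", ".join(classify_by_frequency(composition)))
```

Because the twenty percentages in 6.1 are all distinct, the ordering is unambiguous and no tie-breaking rule is needed here.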
                    
                    
                    7.  MISCELLANEOUS STATISTICS
                    
                    4433 entries are encoded on a mitochondrion, and 3492 are encoded on a plasmid.
                    
                    11919 entries are encoded on a plastid, 
                    of which 20 are encoded on apicoplasts, 
                    11406 on chloroplasts, 
                    39 on chromatophores,
                    145 on cyanelles, 
                    149 on non-photosynthetic plastids and 
                    199 on unspecified types of plastid.
                    
                    Number of entries with at least one sequence correction: 64801
                    
                    
                    
                

UniProtKB/TrEMBL protein database release 40.0 statistics

                    
                    1.  INTRODUCTION
                    
                    Release 40.0 of 24-Mar-2009 of UniProtKB/TrEMBL contains 7'753'442 sequence entries
                    comprising 2'459'135'421 amino acids.
                    
                    1'700'878 sequences have been added since release 39, the sequence data of
                    24'829 existing entries has been updated and the annotations of
                    4'218'268 entries have been revised. This represents an increase of 31%.
                    
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 8.54   Gln (Q) 3.93   Leu (L) 9.83   Ser (S) 6.84
                    Arg (R) 5.54   Glu (E) 6.09   Lys (K) 5.22   Thr (T) 5.60
                    Asn (N) 4.17   Gly (G) 7.05   Met (M) 2.42   Trp (W) 1.33
                    Asp (D) 5.26   His (H) 2.22   Phe (F) 4.02   Tyr (Y) 3.02
                    Cys (C) 1.36   Ile (I) 5.89   Pro (P) 4.84   Val (V) 6.65
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.07
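
A composition table of this kind can be computed by counting residues over all sequences and dividing by the total residue count. The following is a minimal Python sketch under that assumption; it is not the procedure used to produce the release statistics, and the function name composition_percent and the two toy sequences are purely illustrative.

```python
from collections import Counter

def composition_percent(sequences):
    """Return {one-letter code: percent of all residues} for the given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())  # count every residue letter
    total = sum(counts.values())
    if total == 0:
        return {}  # no residues, nothing to report
    return {aa: 100.0 * n / total for aa, n in counts.items()}

# Toy input (hypothetical sequences, not taken from the database).
toy = ["MKLA", "ALLG"]
for aa, pct in sorted(composition_percent(toy).items()):
    print(f"{aa} {pct:.2f}")
```

Ambiguity codes such as B, Z and X are counted like any other letter here, which is consistent with their appearing as separate rows in the table.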
                    
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Cys, Trp
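
The classification above is simply the composition table of section 2.1 sorted by decreasing percentage. A short Python sketch of that sort is given here; the percentages are copied from section 2.1, while the variable and function names are illustrative.

```python
# Percentages copied from section 2.1 (complete database).
composition = {
    "Ala": 8.54, "Gln": 3.93, "Leu": 9.83, "Ser": 6.84,
    "Arg": 5.54, "Glu": 6.09, "Lys": 5.22, "Thr": 5.60,
    "Asn": 4.17, "Gly": 7.05, "Met": 2.42, "Trp": 1.33,
    "Asp": 5.26, "His": 2.22, "Phe": 4.02, "Tyr": 3.02,
    "Cys": 1.36, "Ile": 5.89, "Pro": 4.84, "Val": 6.65,
}

def rank_by_frequency(table):
    """Return residue names ordered from most to least frequent."""
    return sorted(table, key=table.get, reverse=True)

ranking = rank_by_frequency(composition)
print(", ".join(ranking))
```

No two residues share the same percentage in this release, so the order is unambiguous.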
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 193405
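
The species total can be cross-checked against the frequency table in section 3.1, whose rows should add up to the same figure. The sketch below does this in Python; the counts are copied from section 3.1 and the variable names are illustrative.

```python
# Number of species by number of entries per species (section 3.1).
species_by_occurrence = {
    "1x": 87309, "2x": 34896, "3x": 18055, "4x": 10752, "5x": 6919,
    "6x": 4612, "7x": 3548, "8x": 2634, "9x": 2158, "10x": 2513,
    "11-20x": 11482, "21-50x": 4206, "51-100x": 1591, ">100x": 2730,
}

total_species = sum(species_by_occurrence.values())
singletons = 100 * species_by_occurrence["1x"] / total_species

print(total_species)
print(f"{singletons:.1f}% of species are represented by a single entry")
```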
                    
                    The first twenty species represent 1076730 sequences:  14.3 % of the
                    total number of entries.
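
The two figures above can be reproduced from the tables in sections 3.2 and 3.3. In the Python sketch below, the twenty counts are copied from section 3.2; the denominator is taken to be the sum of the kingdom counts in section 3.3, which is an assumption inferred from the numbers rather than a documented rule.

```python
# Entry counts of the twenty most represented species (section 3.2).
top_twenty = [
    262485, 95008, 67189, 54387, 50193, 50188, 46361, 44805, 43980, 43557,
    39850, 38756, 34771, 33127, 31220, 30108, 29407, 28078, 26658, 26602,
]

# Kingdom counts (section 3.3); their sum is the assumed denominator.
kingdom_counts = {
    "Archaea": 137049, "Bacteria": 4165881, "Eukaryota": 2492847,
    "Viruses": 733065, "Other": 8599,
}

total_top = sum(top_twenty)
classified = sum(kingdom_counts.values())
share = 100 * total_top / classified

print(total_top)
print(f"{share:.1f} %")
```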
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented 1x:87309
                    2x:34896
                    3x:18055
                    4x:10752
                    5x: 6919
                    6x: 4612
                    7x: 3548
                    8x: 2634
                    9x: 2158
                    10x: 2513
                    11- 20x:11482
                    21- 50x: 4206
                    51-100x: 1591
                    >100x: 2730
                    
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     262485  Human immunodeficiency virus 1
                    2      95008  Oryza sativa subsp. japonica (Rice)
                    3      67189  Homo sapiens (Human)
                    4      54387  Vitis vinifera (Grape)
                    5      50193  Branchiostoma floridae (Florida lancelet) (Amphioxus)
                    6      50188  Trichomonas vaginalis G3
                    7      46361  Hepatitis C virus
                    8      44805  Mus musculus (Mouse)
                    9      43980  Populus trichocarpa (Western balsam poplar) 
                    10      43557  Arabidopsis thaliana (Mouse-ear cress)
                    11      39850  Paramecium tetraurelia
                    12      38756  Oryza sativa subsp. indica (Rice)
                    13      34771  Physcomitrella patens subsp. patens
                    14      33127  uncultured bacterium
                    15      31220  Ricinus communis (Castor bean)
                    16      30108  Zea mays (Maize)
                    17      29407  Drosophila melanogaster (Fruit fly)
                    18      28078  Tetraodon nigroviridis (Green puffer)
                    19      26658  Hepatitis B virus (HBV)
                    20      26602  Danio rerio (Zebrafish) (Brachydanio rerio)
                    21      24830  Nematostella vectensis (Starlet sea anemone)
                    22      21418  Caenorhabditis briggsae
                    23      21089  Ixodes scapularis (Black-legged tick) (Deer tick)
                    24      20639  Caenorhabditis elegans
                    25      20525  Trypanosoma cruzi
                    26      18820  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
                    27      17880  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
                    28      17513  Drosophila simulans (Fruit fly)
                    29      16989  Drosophila yakuba (Fruit fly)
                    30      16785  Drosophila persimilis (Fruit fly)
                    31      16779  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
                    32      16685  Tetrahymena thermophila SB210
                    33      16281  Drosophila sechellia (Fruit fly)
                    34      16281  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
                    35      16064  Drosophila pseudoobscura pseudoobscura (Fruit fly)
                    36      15883  Phaeosphaeria nodorum (Septoria nodorum)
                    37      15513  Drosophila willistoni (Fruit fly)
                    38      15064  Drosophila ananassae (Fruit fly)
                    39      15040  Drosophila erecta (Fruit fly)
                    40      14781  Drosophila mojavensis (Fruit fly)
                    41      14756  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
                    42      14736  Drosophila virilis (Fruit fly)
                    43      14724  Chlamydomonas reinhardtii
                    44      14675  Plasmodium chabaudi
                    45      14301  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
                    46      14296  Anopheles gambiae (African malaria mosquito)
                    47      13747  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
                    48      13489  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
                    49      13487  Aspergillus flavus NRRL3357
                    50      12996  Talaromyces stipitatus ATCC 10500
                    51      12810  Xenopus laevis (African clawed frog)
                    52      12772  Penicillium chrysogenum Wisconsin 54-1255
                    53      12737  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
                    54      12057  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
                    55      11927  Aspergillus oryzae
                    56      11793  Plasmodium berghei
                    57      11612  Thalassiosira pseudonana CCMP1335
                    58      11574  Trichoplax adhaerens
                    59      11562  Brugia malayi (Filarial nematode worm)
                    60      11045  Escherichia coli
                    61      10926  Hepatitis C virus subtype 1b
                    62      10892  Chaetomium globosum (Soil fungus)
                    63      10709  Podospora anserina
                    64      10559  Ralstonia solanacearum (Pseudomonas solanacearum)
                    65      10467  Dictyostelium discoideum (Slime mold)
                    66      10427  Neurospora crassa
                    67      10422  Penicillium marneffei ATCC 18224
                    68      10336  Phaeodactylum tricornutum CCAP 1055/1
                    69      10294  Coccidioides immitis
                    70      10288  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    71      10238  Aspergillus terreus (strain NIH 2624)
                    72      10230  Neosartorya fischeri (Aspergillus fischerianus)
                    73       9892  Schistosoma japonicum (Blood fluke)
                    74       9878  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
                    75       9813  Bos taurus (Bovine)
                    76       9669  Cryptococcus neoformans (Filobasidiella neoformans)
                    77       9665  Aspergillus fumigatus (Sartorya fumigata)
                    78       9471  Trypanosoma brucei
                    79       9416  Emericella nidulans (Aspergillus nidulans)
                    80       9258  Monosiga brevicollis (Choanoflagellate)
                    81       9192  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
                    82       9190  Candida albicans (Yeast)
                    83       9166  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
                    84       9090  Rattus norvegicus (Rat)
                    85       8983  Postia placenta Mad-698-R
                    86       8954  Aspergillus clavatus
                    87       8932  Porcine reproductive and respiratory syndrome virus (PRRSV)
                    88       8915  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    89       8884  Helicobacter pylori (Campylobacter pylori)
                    90       8809  Rhodococcus sp. (strain RHA1)
                    91       8731  Escherichia coli (strain 55989 / EAEC)
                    92       8607  Entamoeba dispar SAW760
                    93       8523  Stigmatella aurantiaca DW4/3-1
                    94       8437  Plesiocystis pacifica SIR-1
                    95       8275  Plasmodium falciparum
                    96       8253  Streptomyces sviceus ATCC 29083
                    97       8249  Microscilla marina ATCC 23134
                    98       8201  Microcoleus chthonoplastes PCC 7420
                    99       8180  Burkholderia xenovorans (strain LB400)
                    100       8129  Bradyrhizobium japonicum
                    
                    
                    
                    3.3  Taxonomic distribution of the sequences
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea          137049 (  2%)
                    Bacteria        4165881 ( 55%)
                    Eukaryota       2492847 ( 33%)
                    Viruses          733065 ( 10%)
                    Other              8599 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  67203 (  3%)           (  1%)
                    Other Mammalia        144489 (  6%)           (  2%)
                    Other Vertebrata      247964 ( 10%)           (  3%)
                    Viridiplantae         608401 ( 24%)           (  8%)
                    Fungi                 448052 ( 18%)           (  6%)
                    Insecta               366292 ( 15%)           (  5%)
                    Nematoda               59871 (  2%)           (  1%)
                    Other                 550575 ( 22%)           (  7%)
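
The percentages in both tables of this section can be recomputed from the raw counts. The Python sketch below assumes that "% of the database" is taken relative to the sum of the five kingdom counts and that values are rounded to whole percent; both assumptions are inferred from the figures. The names used are illustrative.

```python
kingdoms = {
    "Archaea": 137049, "Bacteria": 4165881, "Eukaryota": 2492847,
    "Viruses": 733065, "Other": 8599,
}
eukaryota = {
    "Human": 67203, "Other Mammalia": 144489, "Other Vertebrata": 247964,
    "Viridiplantae": 608401, "Fungi": 448052, "Insecta": 366292,
    "Nematoda": 59871, "Other": 550575,
}

database_total = sum(kingdoms.values())
eukaryota_total = sum(eukaryota.values())

def pct(part, whole):
    """Whole-number percentage of part in whole."""
    return round(100 * part / whole)

for name, n in eukaryota.items():
    print(f"{name:<18}{n:>8} ({pct(n, eukaryota_total):>3}%) ({pct(n, database_total):>3}%)")
```

The eight Eukaryota categories add up exactly to the Eukaryota row of the kingdom table, so the two tables are mutually consistent.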
                    
                    
                    
                    4.  SEQUENCE SIZE
                    
                    Repartition of the sequences by size (excluding fragments)
                    
                    From   To  Number             From   To   Number
                    1-  50  148439             1001-1100    47501
                    51- 100  564448             1101-1200    33699
                    101- 150  664698             1201-1300    23447
                    151- 200  640557             1301-1400    16104
                    201- 250  639142             1401-1500    12723
                    251- 300  614727             1501-1600     9280
                    301- 350  566716             1601-1700     7128
                    351- 400  445777             1701-1800     5819
                    401- 450  370905             1801-1900     4539
                    451- 500  311596             1901-2000     3897
                    501- 550  222942             2001-2100     3109
                    551- 600  167193             2101-2200     3213
                    601- 650  123928             2201-2300     2449
                    651- 700   97281             2301-2400     1977
                    701- 750   83493             2401-2500     1658
                    751- 800   74339             >2500        15038
                    801- 850   56358
                    851- 900   50046
                    901- 950   35680
                    951-1000   28236
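
The size distribution above can be summarized with a few lines of Python. The sketch below copies the bin counts from the table, then reports the number of non-fragment sequences covered, the most populated bin and the share of sequences of at most 500 amino acids. The variable names are illustrative.

```python
# (size range, number of non-fragment sequences), copied from the table above.
bins = [
    ("1-50", 148439), ("51-100", 564448), ("101-150", 664698),
    ("151-200", 640557), ("201-250", 639142), ("251-300", 614727),
    ("301-350", 566716), ("351-400", 445777), ("401-450", 370905),
    ("451-500", 311596), ("501-550", 222942), ("551-600", 167193),
    ("601-650", 123928), ("651-700", 97281), ("701-750", 83493),
    ("751-800", 74339), ("801-850", 56358), ("851-900", 50046),
    ("901-950", 35680), ("951-1000", 28236), ("1001-1100", 47501),
    ("1101-1200", 33699), ("1201-1300", 23447), ("1301-1400", 16104),
    ("1401-1500", 12723), ("1501-1600", 9280), ("1601-1700", 7128),
    ("1701-1800", 5819), ("1801-1900", 4539), ("1901-2000", 3897),
    ("2001-2100", 3109), ("2101-2200", 3213), ("2201-2300", 2449),
    ("2301-2400", 1977), ("2401-2500", 1658), (">2500", 15038),
]

total = sum(count for _, count in bins)
modal_range, modal_count = max(bins, key=lambda b: b[1])
up_to_500 = sum(count for _, count in bins[:10])  # first ten bins end at 500

print(total)
print(modal_range, modal_count)
print(f"{100 * up_to_500 / total:.1f}% of sequences have at most 500 amino acids")
```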
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is   326 amino acids.
                    
                    The shortest sequence is Q16047_HUMAN:     4 amino acids.
                    The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
                    
                    
                    
                    5.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                    9885975              1.31
                    Submitted to EMBL/GenBank/DDBJ  5435937   4559102    0.72
                    Journal                         4342038   3784310    0.58
                    Thesis                             7110      7054   <0.01
                    Submitted to other databases       4628      4620   <0.01
                    Book citation                      4517      4471   <0.01
                    Other                             91745     90595    0.01
                    
                    Comments (CC)                      5086365              0.67
                    SIMILARITY                      1529419   1283560    0.20
                    CAUTION                         1522553   1522553    0.20
                    FUNCTION                         550977    487269    0.07
                    CATALYTIC ACTIVITY               520201    431091    0.07
                    SUBCELLULAR LOCATION             451790    420050    0.06
                    SUBUNIT                          244362    219337    0.03
                    COFACTOR                         159132    146980    0.02
                    PATHWAY                           98881     95980    0.01
                    MISCELLANEOUS                      5827      5827   <0.01
                    INTERACTION                        2627      2627   <0.01
                    DOMAIN                              596       596   <0.01
                    
                    Features (FT)                      2925361              0.39
                    NON_TER                         2416053   1437324    0.32
                    CHAIN                            314678    246799    0.04
                    SIGNAL                           194054    194054    0.03
                    TRANSIT                             576       576   <0.01
                    
                    Cross-references (DR)             71009542              9.42
                    GO                             13450712   4329612    1.78
                    InterPro                       12179422   5288931    1.62
                    EMBL                            8471780   7530279    1.12
                    Pfam                            6773292   5019113    0.90
                    PROSITE                         3705440   2401590    0.49
                    RefSeq                          3705117   3565145    0.49
                    GeneID                          3690642   3558116    0.49
                    KEGG                            2964305   2877187    0.39
                    Gene3D                          2221743   1897018    0.29
                    GenomeReviews                   2190231   2128466    0.29
                    SMART                           1291968   1012465    0.17
                    TIGRFAMs                        1203718   1100600    0.16
                    PANTHER                         1141465   1081325    0.15
                    PRINTS                          1132757    986972    0.15
                    HOGENOM                         1046657   1046653    0.14
                    NMPDR                            941154    941143    0.12
                    BioCyc                           833412    811420    0.11
                    ProDom                           669141    638944    0.09
                    SMR                              490641    490505    0.07
                    UniGene                          360267    332031    0.05
                    PIRSF                            337379    337379    0.04
                    HOVERGEN                         309523    309327    0.04
                    HSSP                             259517    259229    0.03
                    TIGR                             197613    190359    0.03
                    FlyBase                          194222    192697    0.03
                    IPI                              193089    193089    0.03
                    PIR                              179433    146432    0.02
                    Ensembl                          150065    144403    0.02
                    ArrayExpress                      95180     95144    0.01
                    Bgee                              80715     80670    0.01
                    Gramene                           69538     69538    0.01
                    PRIDE                             60147     60147    0.01
                    euHCVdb                           55083     55082    0.01
                    NextBio                           53147     53147    0.01
                    MGI                               39407     39130    0.01
                    VectorBase                        28981     28654   <0.01
                    HGNC                              27771     27735   <0.01
                    MEROPS                            25649     24990   <0.01
                    ZFIN                              19621     19615   <0.01
                    WormPep                           18815     18712   <0.01
                    WormBase                          18806     18712   <0.01
                    TAIR                              18615     18566   <0.01
                    IntAct                            12617     12617   <0.01
                    LinkHub                           11554     11554   <0.01
                    Xenbase                           10331     10045   <0.01
                    dictyBase                          9048      9047   <0.01
                    CGD                                6852      6852   <0.01
                    PDBsum                             5675      3203   <0.01
                    PDB                                5675      3203   <0.01
                    LegioList                          5178      5150   <0.01
                    ListiList                          4656      4639   <0.01
                    PseudoCAP                          4369      4366   <0.01
                    PhotoList                          3964      3840   <0.01
                    BuruList                           3944      3910   <0.01
                    AGD                                3904      3904   <0.01
                    RGD                                3684      3678   <0.01
                    REBASE                             3674      3650   <0.01
                    BRENDA                             2972      2902   <0.01
                    TubercuList                        2500      2494   <0.01
                    DIP                                2229      2224   <0.01
                    PeroxiBase                         2093      2088   <0.01
                    TCDB                               1977      1958   <0.01
                    SagaList                           1713      1619   <0.01
                    PhosphoSite                        1250      1250   <0.01
                    Leproma                             952       951   <0.01
                    MypuList                            581       577   <0.01
                    ProMEX                              473       473   <0.01
                    World-2DPAGE                        412       412   <0.01
                    SGD                                 317       317   <0.01
                    GeneDB_Spombe                       206       202   <0.01
                    PeptideAtlas                        165       165   <0.01
                    PHCI-2DPAGE                         102       102   <0.01
                    PharmGKB                             89        89   <0.01
                    Reactome                             68        64   <0.01
                    ANU-2DPAGE                           58        58   <0.01
                    SWISS-2DPAGE                         29        29   <0.01
                    Pathway_Interaction_DB               16        13   <0.01
                    CYGD                                 16        16   <0.01
                    REPRODUCTION-2DPAGE                  13        13   <0.01
                    PMMA-2DPAGE                           3         3   <0.01
                    Siena-2DPAGE                          2         2   <0.01
                    COMPLUYEAST-2DPAGE                    1         1   <0.01
                    
                    Number of explicitly cross-referenced databases: 110
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 271939
                    
                    Total number of entries encoded on a Mitochondrion: 246125
                    Total number of entries encoded on a Plasmid: 121183
                    Total number of entries encoded on a Plastid: 7064
                    Total number of entries encoded on a Plastid; Apicoplast: 316
                    Total number of entries encoded on a Plastid; Chloroplast: 85864
                    Total number of entries encoded on a Plastid; Cyanelle: 7
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 419
                    
                    Number of fragments: 1439360
                    
                

Submissions and Updates

We welcome feedback from our users. We would especially appreciate your notifying us if you find that sequences belonging to your field of expertise are missing from the database. We also would like to be notified about annotations to be updated, if, for example, the function of a protein has been clarified or if new information about post-translational modifications has become available.

Submit new sequence data, updates and corrections at http://www.uniprot.org/help/submissions.

For all queries regarding submissions to UniProtKB, please contact: datasubs@ebi.ac.uk


Download information

Minor releases (every 3 weeks)

The latest data of the UniProt Knowledgebase is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/downloads. The data is further supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic

Major releases

For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 3 times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on DVD from the EBI.


Contact

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address: help@uniprot.org
WWW server: http://www.ebi.ac.uk/


SIB Swiss Institute of Bioinformatics
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 379 50 50
Fax: (+41 22) 379 58 58
Electronic mail address: help@uniprot.org
WWW server: http://www.expasy.org/


Protein Information Resource (PIR)
Georgetown University Medical Center
3300 Whitehaven St., Suite 1200
Washington, DC 20008
United States of America

Telephone: (+1 202) 687 1039
Fax: (+1 202) 687 0057
Electronic mail address: help@uniprot.org
WWW server: http://pir.georgetown.edu

Citation

If you want to cite UniProt in a publication, please use the following reference:

The UniProt Consortium
"The Universal Protein Resource (UniProt) 2009"
Nucleic Acids Res. 37:D169-D174(2009) 10.1093/nar/gkn664