Release 3.0 of the UniProt Knowledgebase is composed of the UniProt/Swiss-Prot Protein Knowledgebase release 45.0 and the UniProt/TrEMBL Protein Database release 28.0.

More information on these databases can be found in the user manual, under "What is the UniProt Knowledgebase?".


UniProt/Swiss-Prot Protein Knowledgebase release 45.0 statistics

Release 45.0 of 25-Oct-2004 of Swiss-Prot contains 163'235 sequence entries, comprising 59'631'787 amino acids abstracted from 120'520 references.

The growth of the database is summarized below.

Release  Date (MM/YY)  Number of entries  Number of amino acids
2.0 09/86 3'939 900'163
3.0 11/86 4'160 969'641
4.0 04/87 4'387 1'036'010
5.0 09/87 5'205 1'327'683
6.0 01/88 6'102 1'653'982
7.0 04/88 6'821 1'885'771
8.0 08/88 7'724 2'224'465
9.0 11/88 8'702 2'498'140
10.0 03/89 10'008 2'952'613
11.0 07/89 10'856 3'265'966
12.0 10/89 12'305 3'797'482
13.0 01/90 13'837 4'347'336
14.0 04/90 15'409 4'914'264
15.0 08/90 16'941 5'486'399
16.0 11/90 18'364 5'986'949
17.0 02/91 20'024 6'524'504
18.0 05/91 20'772 6'792'034
19.0 08/91 21'795 7'173'785
20.0 11/91 22'654 7'500'130
21.0 03/92 23'742 7'866'596
22.0 05/92 25'044 8'375'696
23.0 08/92 26'706 9'011'391
24.0 12/92 28'154 9'545'427
25.0 04/93 29'955 10'214'020
26.0 07/93 31'808 10'875'091
27.0 10/93 33'329 11'484'420
28.0 02/94 36'000 12'496'420
29.0 06/94 38'303 13'464'008
30.0 10/94 40'292 14'147'368
31.0 02/95 43'470 15'335'248
32.0 11/95 49'340 17'385'503
33.0 02/96 52'205 18'531'384
34.0 10/96 59'021 21'210'389
35.0 11/97 69'113 25'083'768
36.0 07/98 74'019 26'840'295
37.0 12/98 77'977 28'268'293
38.0 07/99 80'000 29'085'965
39.0 05/00 86'593 31'411'114
40.0 10/01 101'602 37'315'215
41.0 02/03 122'564 44'986'459
42.0 10/03 135'850 50'046'799
43.0 03/04 146'720 54'093'154
44.0 07/04 153'871 56'608'159
45.0 10/04 163'235 59'631'787
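The counts in this table use an apostrophe as the thousands separator, so they need a small normalisation step before any arithmetic. The following is a minimal Python sketch using three rows copied from the table above. The helper name parse_count and the derived average length per entry are illustrative assumptions and are not part of the release notes.

```python
def parse_count(text: str) -> int:
    """Convert a count such as "59'631'787" to an integer."""
    return int(text.replace("'", ""))


# (release, date, entries, amino acids), copied from the growth table.
rows = [
    ("2.0", "09/86", "3'939", "900'163"),
    ("40.0", "10/01", "101'602", "37'315'215"),
    ("45.0", "10/04", "163'235", "59'631'787"),
]

for release, date, entries, residues in rows:
    n_entries = parse_count(entries)
    n_residues = parse_count(residues)
    # Mean sequence length is derived here for illustration only.
    print(f"{release:>5} {date}  {n_entries:>7} entries  "
          f"{n_residues / n_entries:6.1f} aa per entry")
```

The same helper can be applied to every row if the full table is needed in numeric form.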

In rare cases, Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProt/Swiss-Prot but have since been deleted from the database.


Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

  • be as complete as possible. All sequences available at a given time should be immediately included in UniProt/Swiss-Prot. This also includes sequence corrections and updates;
  • provide a higher level of annotation;
  • provide cross-references to specialized database(s) that contain, among other data, some information about the genes that code for these proteins;
  • provide specific indexes and documents.

Our efforts to annotate human sequence entries as completely as possible gave rise to the HPI project, and the bacterial model organisms became the focus of the HAMAP project. The current status of the model organisms that are not covered by these two projects is as follows:

Organism        Database cross-references  Index file    Number of sequences
A.thaliana      None yet                   arath.txt     2'981
C.albicans      None yet                   calbican.txt  305
C.elegans       Wormpep                    celegans.txt  2'543
D.discoideum    DictyBase                  dicty.txt     323
D.melanogaster  FlyBase                    fly.txt       2'118
M.musculus      MGD                        mgdtosp.txt   8'368
S.cerevisiae    SGD                        yeast.txt     4'992
S.pombe         GeneDB_SPombe              pombe.txt     2'672

UniProt/Swiss-Prot release statistics
                    
                    1.  INTRODUCTION
                    
                    Release 45.0 of 25-Oct-2004 of UniProt/Swiss-Prot contains 163235 sequence 
                    entries, comprising 59631787 amino acids abstracted from 120520 references. 
                    
                    6183 sequences have been added since release 44, the sequence data of
                    2851 existing entries has been updated and the annotations of
                    71220 entries have been revised. This represents an increase of 4%.
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 7.82   Gln (Q) 3.94   Leu (L) 9.62   Ser (S) 6.87
                    Arg (R) 5.32   Glu (E) 6.60   Lys (K) 5.93   Thr (T) 5.46
                    Asn (N) 4.20   Gly (G) 6.94   Met (M) 2.37   Trp (W) 1.16
                    Asp (D) 5.30   His (H) 2.27   Phe (F) 4.01   Tyr (Y) 3.07
                    Cys (C) 1.56   Ile (I) 5.90   Pro (P) 4.85   Val (V) 6.71
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.01
                    
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Thr, Arg, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Cys, Trp
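
The ordering in 2.2 can be reproduced by sorting the percentages from 2.1 in descending order. The sketch below is a minimal illustration in Python; the variable names are arbitrary, and the values are copied from the composition table (the ambiguity codes B, Z and X are left out).

```python
# Percent composition copied from section 2.1 (standard residues only).
composition = {
    "Ala": 7.82, "Gln": 3.94, "Leu": 9.62, "Ser": 6.87,
    "Arg": 5.32, "Glu": 6.60, "Lys": 5.93, "Thr": 5.46,
    "Asn": 4.20, "Gly": 6.94, "Met": 2.37, "Trp": 1.16,
    "Asp": 5.30, "His": 2.27, "Phe": 4.01, "Tyr": 3.07,
    "Cys": 1.56, "Ile": 5.90, "Pro": 4.85, "Val": 6.71,
}

# Sort residue names by their percentage, most frequent first.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```

No two residues share the same percentage in this release, so the ordering does not depend on how ties are broken.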
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of 
                    UniProt/Swiss-Prot: 8703
                    
                    The first twenty species represent 61239 sequences:  37.5 % of the total
                    number of entries.
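
The figure for the first twenty species can be checked against the table in 3.2. The following Python sketch sums the twenty largest counts listed there and divides by the total number of entries in this release; the variable names are illustrative only.

```python
# Entry counts of the twenty most represented species (section 3.2).
top_twenty = [
    11539, 8368, 4992, 4838, 3976, 2981, 2750, 2672, 2543, 2118,
    1782, 1773, 1690, 1506, 1454, 1399, 1344, 1307, 1114, 1093,
]
total_entries = 163235  # entries in release 45.0

top_twenty_total = sum(top_twenty)
share_percent = 100 * top_twenty_total / total_entries
print(f"{top_twenty_total} sequences: {share_percent:.1f} % of the total")
```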
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented 1x: 4130
                                        2x: 1366
                                        3x:  690
                                        4x:  455
                                        5x:  282
                                        6x:  261
                                        7x:  196
                                        8x:  151
                                        9x:  132
                                       10x:   84
                                   11- 20x:  364
                                   21- 50x:  276
                                   51-100x:   97
                                     >100x:  219
                    
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1      11539  Homo sapiens (Human)
                    2       8368  Mus musculus (Mouse)
                    3       4992  Saccharomyces cerevisiae (Baker's yeast)
                    4       4838  Escherichia coli
                    5       3976  Rattus norvegicus (Rat)
                    6       2981  Arabidopsis thaliana (Mouse-ear cress)
                    7       2750  Bacillus subtilis
                    8       2672  Schizosaccharomyces pombe (Fission yeast)
                    9       2543  Caenorhabditis elegans
                    10       2118  Drosophila melanogaster (Fruit fly)
                    11       1782  Methanococcus jannaschii
                    12       1773  Haemophilus influenzae
                    13       1690  Escherichia coli O157:H7
                    14       1506  Bos taurus (Bovine)
                    15       1454  Salmonella typhimurium
                    16       1399  Mycobacterium tuberculosis
                    17       1344  Escherichia coli O6
                    18       1307  Shigella flexneri
                    19       1114  Gallus gallus (Chicken)
                    20       1093  Mycobacterium bovis
                    21       1036  Salmonella typhi
                    22       1004  Pseudomonas aeruginosa
                    23        957  Synechocystis sp. (strain PCC 6803)
                    24        951  Archaeoglobus fulgidus
                    25        904  Sus scrofa (Pig)
                    26        900  Xenopus laevis (African clawed frog)
                    27        803  Rhizobium meliloti (Sinorhizobium meliloti)
                    28        784  Vibrio cholerae
                    29        753  Yersinia pestis
                    30        744  Aquifex aeolicus
                    31        742  Oryctolagus cuniculus (Rabbit)
                    32        687  Mycoplasma pneumoniae
                    33        676  Pasteurella multocida
                    34        619  Streptomyces coelicolor
                    35        618  Vibrio parahaemolyticus
                    36        609  Mycobacterium leprae
                    37        608  Bacillus halodurans
                    38        606  Treponema pallidum
                    39        572  Buchnera aphidicola (subsp. Acyrthosiphon pisum) 
                    40        571  Methanobacterium thermoautotrophicum
                    41        571  Vibrio vulnificus
                    42        566  Anabaena sp. (strain PCC 7120)
                    43        561  Buchnera aphidicola (subsp. Schizaphis graminum)
                    44        560  Helicobacter pylori (Campylobacter pylori)
                    45        546  Rickettsia prowazekii
                    46        541  Helicobacter pylori J99 (Campylobacter pylori J99)
                    47        536  Staphylococcus aureus (strain Mu50 / ATCC 700699)
                    48        534  Staphylococcus aureus (strain N315)
                    49        517  Staphylococcus aureus (strain MW2)
                    50        511  Lactococcus lactis (subsp. lactis) (Streptococcus lactis)
                    51        508  Zea mays (Maize)
                    52        507  Pseudomonas putida (strain KT2440)
                    53        507  Buchnera aphidicola (subsp. Baizongia pistaciae)
                    54        500  Pseudomonas syringae (pv. tomato)
                    55        496  Ralstonia solanacearum (Pseudomonas solanacearum)
                    56        491  Listeria monocytogenes
                    57        489  Staphylococcus epidermidis
                    58        488  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    59        486  Mycoplasma genitalium
                    60        486  Listeria innocua
                    61        481  Rhizobium loti (Mesorhizobium loti)
                    62        477  Xanthomonas campestris (pv. campestris)
                    63        475  Neisseria meningitidis (serogroup B)
                    64        473  Neisseria meningitidis (serogroup A)
                    65        465  Clostridium acetobutylicum
                    66        461  Caulobacter crescentus
                    67        460  Bradyrhizobium japonicum
                    68        457  Thermotoga maritima
                    69        456  Bacillus anthracis
                    70        445  Canis familiaris (Dog)
                    71        439  Xanthomonas axonopodis (pv. citri)
                    72        436  Xylella fastidiosa
                    73        431  Streptococcus pneumoniae
                    74        430  Deinococcus radiodurans
                    75        430  Oryza sativa (Rice)
                    76        424  Pyrococcus horikoshii
                    77        424  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
                    78        421  Chlamydia trachomatis
                    79        420  Pyrococcus abyssi
                    80        417  Borrelia burgdorferi (Lyme disease spirochete)
                    81        411  Shewanella oneidensis
                    82        409  Chlamydia pneumoniae (Chlamydophila pneumoniae)
                    83        408  Brucella melitensis
                    84        407  Brucella suis
                    85        405  Clostridium perfringens
                    86        403  Rhizobium sp. (strain NGR234)
                    87        399  Methanosarcina acetivorans
                    88        399  Chlamydia muridarum
                    89        395  Corynebacterium glutamicum (Brevibacterium flavum)
                    90        389  Halobacterium sp. (strain NRC-1 / ATCC 700922 / JCM 11081)
                    91        386  Bacillus cereus (strain ATCC 14579 / DSM 31)
                    92        386  Methanosarcina mazei (Methanosarcina frisia)
                    93        380  Pyrococcus furiosus
                    94        378  Campylobacter jejuni
                    95        375  Sulfolobus solfataricus
                    96        371  Thermoanaerobacter tengcongensis
                    97        368  Oceanobacillus iheyensis
                    98        365  Neurospora crassa
                    99        364  Lactobacillus plantarum
                    100        361  Streptococcus pyogenes
                    101        361  Nicotiana tabacum (Common tobacco)
                    102        360  Ovis aries (Sheep)
                    103        359  Rickettsia conorii
                    104        353  Vibrio vulnificus (strain YJ016)
                    105        350  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
                    106        349  Photorhabdus luminescens (subsp. laumondii)
                    107        347  Synechococcus elongatus (Thermosynechococcus elongatus)
                    108        340  Brachydanio rerio (Zebrafish) (Danio rerio)
                    109        337  Streptococcus mutans
                    110        332  Aeropyrum pernix
                    111        329  Chlorobium tepidum
                    112        323  Dictyostelium discoideum (Slime mold)
                    113        317  Streptococcus pyogenes (serotype M18)
                    114        312  Streptococcus pyogenes (serotype M3)
                    115        312  Staphylococcus aureus
                    116        309  Methanopyrus kandleri
                    117        305  Candida albicans (Yeast)
                    118        302  Pisum sativum (Garden pea)
                    119        301  Sulfolobus tokodaii
                    120        299  Enterococcus faecalis (Streptococcus faecalis)
                    121        287  Thermoplasma acidophilum
                    122        282  Corynebacterium efficiens
                    123        282  Triticum aestivum (Wheat)
                    124        280  Bordetella pertussis
                    125        278  Haemophilus ducreyi
                    126        277  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
                    127        270  Hordeum vulgare (Barley)
                    128        269  Streptomyces avermitilis
                    129        268  Fusobacterium nucleatum (subsp. nucleatum)
                    130        268  Bacteriophage T4
                    131        266  Bordetella parapertussis
                    132        263  Chromobacterium violaceum
                    133        263  Nitrosomonas europaea
                    134        263  Glycine max (Soybean)
                    135        257  Lycopersicon esculentum (Tomato)
                    136        256  Cavia porcellus (Guinea pig)
                    137        255  Streptococcus agalactiae (serotype V)
                    138        254  Vaccinia virus (strain Copenhagen)
                    139        253  Rhodobacter capsulatus (Rhodopseudomonas capsulata)
                    140        253  Pyrobaculum aerophilum
                    141        253  Thermoplasma volcanium
                    142        253  Streptococcus agalactiae (serotype III)
                    143        252  Solanum tuberosum (Potato)
                    144        252  Leptospira interrogans
                    145        249  Pan troglodytes (Chimpanzee)
                    146        249  Pseudomonas putida
                    147        238  Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
                    148        237  Spinacia oleracea (Spinach)
                    149        232  Bacillus stearothermophilus
                    150        221  Wigglesworthia glossinidia brevipalpis
                    151        220  Porphyra purpurea
                    152        218  Chlamydophila caviae
                    153        215  Clostridium tetani
                    154        214  Coxiella burnetii
                    155        212  Synechococcus sp. (strain WH8102)
                    156        212  Chlamydomonas reinhardtii
                    157        207  Gloeobacter violaceus
                    158        207  Bacteroides thetaiotaomicron
                    159        206  Equus caballus (Horse)
                    160        206  Prochlorococcus marinus
                    161        204  Klebsiella pneumoniae
                    162        203  Prochlorococcus marinus (strain MIT 9313)
                    163        201  Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
                    164        200  Kluyveromyces lactis (Yeast)
                    
                    
                    3.3  Taxonomic distribution of the sequences
                    
                    Kingdom        sequences (% of the database)
                    Archaea            8886 (  5%)
                    Bacteria          71350 ( 44%)
                    Eukaryota         74328 ( 46%)
                    Viruses            8671 (  5%)
                    
                    
                    Within Eukaryota:
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  11539 ( 16%)           (  7%)
                    Other Mammalia         20961 ( 28%)           ( 13%)
                    Other Vertebrata        6796 (  9%)           (  4%)
                    Viridiplantae          11474 ( 15%)           (  7%)
                    Fungi                  11135 ( 15%)           (  7%)
                    Insecta                 4073 (  5%)           (  2%)
                    Nematoda                2792 (  4%)           (  2%)
                    Other                   5558 (  7%)           (  3%)
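
The percentages in 3.3 appear to be rounded to whole numbers, relative either to the complete database or to the Eukaryota subtotal. The Python sketch below recomputes them under that assumption; the dictionary names and the helper percent are illustrative and do not come from the release notes.

```python
# Counts copied from section 3.3.
kingdoms = {"Archaea": 8886, "Bacteria": 71350,
            "Eukaryota": 74328, "Viruses": 8671}
eukaryota = {"Human": 11539, "Other Mammalia": 20961,
             "Other Vertebrata": 6796, "Viridiplantae": 11474,
             "Fungi": 11135, "Insecta": 4073,
             "Nematoda": 2792, "Other": 5558}


def percent(part: int, whole: int) -> int:
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


database_total = sum(kingdoms.values())
eukaryota_total = kingdoms["Eukaryota"]

for name, count in kingdoms.items():
    print(f"{name:<18}{count:>6} ({percent(count, database_total):>3}%)")

for name, count in eukaryota.items():
    print(f"{name:<18}{count:>6} "
          f"({percent(count, eukaryota_total):>3}%) "
          f"({percent(count, database_total):>3}%)")
```

The four kingdom counts add up to the 163235 entries of the release, and the eight categories add up to the Eukaryota count, so both tables partition their totals.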
                    
                    
                    4.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To  Number             From   To   Number
                       1-  50    3089             1001-1100     1382
                      51- 100   11502             1101-1200     1000
                     101- 150   16542             1201-1300      723
                     151- 200   15548             1301-1400      539
                     201- 250   16096             1401-1500      424
                     251- 300   13706             1501-1600      272
                     301- 350   14454             1601-1700      206
                     351- 400   12990             1701-1800      143
                     401- 450    9990             1801-1900      162
                     451- 500    8487             1901-2000      129
                     501- 550    6464             2001-2100       80
                     551- 600    4396             2101-2200      125
                     601- 650    3735             2201-2300      111
                     651- 700    2616             2301-2400       71
                     701- 750    2206             2401-2500       63
                     751- 800    1864             >2500          435
                     801- 850    1486
                     851- 900    1662
                     901- 950    1135
                     951-1000     954
                    
                    The average sequence length in UniProt/Swiss-Prot is 365 amino acids.
                    
                    The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
                    The longest sequence is   SNE1_HUMAN (Q8NF91):  8797 amino acids.
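
The average length quoted above is consistent with dividing the total number of amino acids by the total number of entries, both taken from the introduction. A minimal Python check, with illustrative variable names:

```python
total_amino_acids = 59631787  # from the introduction
total_entries = 163235        # from the introduction

average_length = total_amino_acids / total_entries
print(f"Average sequence length: {average_length:.2f} amino acids")
print(f"Rounded: {round(average_length)}")
```

Because the division uses the totals for all entries, this check suggests that the average covers the whole database, whereas the size table above excludes fragments.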
                    
                    
                    5.  JOURNAL CITATIONS
                    
                    Note: the following citation statistics reflect the number of distinct
                    journal citations.
                    
                    Total number of journals cited in this release of UniProt/Swiss-Prot: 1516
                    
                    
                    5.1 Table of the frequency of journal citations
                    
                    Journals cited 1x:  556
                                   2x:  203
                                   3x:  105
                                   4x:   63
                                   5x:   66
                                   6x:   33
                                   7x:   34
                                   8x:   23
                                   9x:   26
                                  10x:   15
                              11- 20x:  120
                              21- 50x:  113
                              51-100x:   53
                                >100x:  106
                    
                    
                    5.2  List of the most cited journals in UniProt/Swiss-Prot
                    
                    Nb    Citations   Journal name
                    --    ---------   -------------------------------------------------------------
                    1        10930   Journal of Biological Chemistry
                    2         5666   Proceedings of the National Academy of Sciences of the U.S.A.
                    3         3967   Journal of Bacteriology
                    4         3761   Nucleic Acids Research
                    5         3697   Gene
                    6         3004   Biochemical and Biophysical Research Communications
                    7         2997   FEBS Letters
                    8         2710   Biochemistry
                    9         2655   European Journal of Biochemistry
                    10         2516   The EMBO Journal
                    11         2342   Nature
                    12         2271   Biochimica et Biophysica Acta
                    13         2061   Journal of Molecular Biology
                    14         1977   Genomics
                    15         1856   Cell
                    16         1839   Molecular and Cellular Biology
                    17         1447   Biochemical Journal
                    18         1365   Science
                    19         1223   Molecular Microbiology
                    20         1183   Plant Molecular Biology
                    21         1181   Molecular and General Genetics
                    22          944   Journal of Biochemistry
                    23          895   Human Molecular Genetics
                    24          893   Virology
                    25          886   Journal of Cell Biology
                    26          817   Nature Genetics
                    27          733   Genes and Development
                    28          710   Journal of Virology
                    29          702   The American Journal of Human Genetics
                    30          670   Oncogene
                    31          667   Plant Physiology
                    32          654   Human Mutation
                    33          603   Journal of Immunology
                    34          592   Yeast
                    35          590   Infection and Immunity
                    36          564   Structure
                    37          544   Archives of Biochemistry and Biophysics
                    38          535   Journal of General Virology
                    39          519   Microbiology
                    40          517   Development
                    41          500   FEMS Microbiology Letters
                    42          470   Nature Structural Biology
                    43          467   Genetics
                    44          432   Human Genetics
                    45          423   Current Genetics
                    46          416   Blood
                    47          379   Molecular and Biochemical Parasitology
                    48          366   Applied and Environmental Microbiology
                    49          346   Journal of Clinical Investigation
                    50          334   Developmental Biology
                    51          333   Mammalian Genome
                    52          333   Protein Science
                    53          329   Molecular Endocrinology
                    54          322   Cancer Research
                    55          317   Molecular Biology of the Cell
                    56          310   Immunogenetics
                    57          308   Journal of Molecular Evolution
                    58          308   Neuron
                    59          304   DNA and Cell Biology
                    60          304   Mechanisms of Development
                    61          304   Acta Crystallographica, Section D
                    62          298   The Journal of Experimental Medicine
                    63          291   Journal of Cell Science
                    64          289   The Plant Cell
                    65          275   Biological Chemistry Hoppe-Seyler
                    66          267   Endocrinology
                    67          261   DNA Sequence
                    68          257   Journal of Neuroscience
                    69          254   The Plant Journal
                    70          236   Journal of General Microbiology
                    71          232   Journal of Neurochemistry
                    72          231   Molecular Biology and Evolution
                    73          230   The Journal of Clinical Endocrinology and Metabolism
                    74          228   Brain Research. Molecular Brain Research
                    75          216   Molecular Cell
                    76          214   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
                    77          212   Toxicon
                    78          205   Cytogenetics and Cell Genetics
                    79          199   American Journal of Physiology
                    80          198   Comparative Biochemistry and Physiology
                    81          196   Current Biology
                    82          194   Bioscience, Biotechnology, and Biochemistry
                    83          176   Antimicrobial Agents and Chemotherapy
                    84          176   Molecular Pharmacology
                    85          159   Proteins
                    86          156   DNA
                    87          149   Journal of Investigative Dermatology
                    88          147   Journal of Medical Genetics
                    89          146   DNA Research
                    90          146   Peptides
                    91          146   Tissue Antigens
                    92          141   Molecular Plant-Microbe Interactions
                    93          141   Virus Research
                    94          140   Biochimie
                    95          138   Genome Research
                    96          138   American Journal of Medical Genetics
                    97          134   Bioorganicheskaia Khimiia
                    98          126   European Journal of Immunology
                    99          123   Molecular and Cellular Endocrinology
                    100          123   Hemoglobin
                    101          121   Plant and Cell Physiology
                    102          117   Biology of Reproduction
                    103          115   Agricultural and Biological Chemistry
                    104          112   Insect Biochemistry and Molecular Biology
                    105          106   Archives of Microbiology
                    106          102   General and Comparative Endocrinology
                    107          100   Diabetes
                    
                    
                    6.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProt/Swiss-Prot 
                    lines, as well as the number of entries with at least one such line, and the
                    frequency of the lines.
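
The "Average per entry" column in the table that follows appears to be the total number of lines divided by the total number of entries in the release (163235), not by the number of entries that carry the line. The Python sketch below reproduces a few values under that assumption. The function name per_entry and the rule for printing "<0.01" are inferred from the table and are not documented in the release notes.

```python
TOTAL_ENTRIES = 163235  # entries in release 45.0


def per_entry(total_lines: int) -> str:
    """Format lines per entry the way the table seems to (assumed rule)."""
    text = f"{total_lines / TOTAL_ENTRIES:.2f}"
    # Values that would print as 0.00 are shown as "<0.01" in the table.
    return "<0.01" if text == "0.00" else text


# (line type, total number of lines), copied from the table.
samples = [
    ("References (RL)", 316266),
    ("Journal", 280711),
    ("Submitted to Swiss-Prot", 776),
    ("Comments (CC)", 582509),
    ("Features (FT)", 917536),
    ("CROSSLNK", 494),
]

for name, total in samples:
    print(f"{name:<26}{total:>8}  {per_entry(total):>6}")
```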
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                     316266              1.94
                    Journal                          280711    153830    1.72
                    Submitted to EMBL/GenBank/DDBJ    32597     28003    0.20
                    Submitted to Swiss-Prot             776       771   <0.01
                    Unpublished observations            495       491   <0.01
                    Plant Gene Register                 489       478   <0.01
                    Book citation                       478       466   <0.01
                    Thesis                              274       272   <0.01
                    Submitted to other databases        201       200   <0.01
                    Unpublished results                 128       126   <0.01
                    Patent                              114       112   <0.01
                    Worm Breeder's Gazette                3         3   <0.01
                    
                    Comments (CC)                       582509              3.57
                    SIMILARITY                       166945    143752    1.02
                    FUNCTION                         106598    104154    0.65
                    SUBCELLULAR LOCATION              78760     78760    0.48
                    CATALYTIC ACTIVITY                57355     53901    0.35
                    SUBUNIT                           51150     51150    0.31
                    PATHWAY                           27659     26444    0.17
                    COFACTOR                          19319     19319    0.12
                    TISSUE SPECIFICITY                18041     18041    0.11
                    PTM                               10656      9454    0.07
                    MISCELLANEOUS                      9854      9021    0.06
                    DOMAIN                             6436      5646    0.04
                    ALTERNATIVE PRODUCTS               6148      6148    0.04
                    CAUTION                            5318      4858    0.03
                    INDUCTION                          4483      4483    0.03
                    DEVELOPMENTAL STAGE                4214      4214    0.03
                    DISEASE                            2765      2034    0.02
                    ENZYME REGULATION                  2308      2308    0.01
                    MASS SPECTROMETRY                  1501      1322    0.01
                    DATABASE                           1443      1361    0.01
                    POLYMORPHISM                        491       479   <0.01
                    ALLERGEN                            366       366   <0.01
                    RNA EDITING                         316       316   <0.01
                    TOXIC DOSE                          244       242   <0.01
                    BIOTECHNOLOGY                        89        89   <0.01
                    PHARMACEUTICAL                       50        50   <0.01
                    
                    Features (FT)                       917536              5.62
                    DOMAIN                           132159     41072    0.81
                    TRANSMEM                         103438     22456    0.63
                    TURN                              62434      4661    0.38
                    METAL                             61199     15293    0.37
                    CONFLICT                          61029     21454    0.37
                    STRAND                            57250      4165    0.35
                    CARBOHYD                          54750     13451    0.34
                    DISULFID                          51096     13514    0.31
                    HELIX                             45067      4518    0.28
                    ACT_SITE                          35908     21679    0.22
                    REPEAT                            35282      4995    0.22
                    VARIANT                           29737      5516    0.18
                    CHAIN                             28344     22968    0.17
                    NP_BIND                           22707     15604    0.14
                    SIGNAL                            17660     17658    0.11
                    MOD_RES                           17515      9614    0.11
                    BINDING                           13576      9401    0.08
                    SITE                              13553      8092    0.08
                    VARSPLIC                          12101      5353    0.07
                    NON_TER                           10873      8300    0.07
                    ZN_FING                           10322      3821    0.06
                    MUTAGEN                            8574      2356    0.05
                    INIT_MET                           7129      7083    0.04
                    PROPEP                             5683      4814    0.03
                    LIPID                              5008      3289    0.03
                    DNA_BIND                           4983      4681    0.03
                    TRANSIT                            3020      2995    0.02
                    PEPTIDE                            2983      1241    0.02
                    CA_BIND                            2178       896    0.01
                    NON_CONS                            925       459    0.01
                    CROSSLNK                            494       389   <0.01
                    UNSURE                              373       153   <0.01
                    SE_CYS                              186       129   <0.01
                    
                    Cross-references (DR)              1573986              9.64
                    InterPro                         332339    147362    2.04
                    EMBL                             313738    155986    1.92
                    Pfam                             190733    140008    1.17
                    PROSITE                          144507     90826    0.89
                    PIR                               91028     83972    0.56
                    HSSP                              68288     68288    0.42
                    PRINTS                            58993     48028    0.36
                    GO                                54709     16394    0.34
                    TIGRFAMs                          54414     47723    0.33
                    HAMAP                             48541     48434    0.30
                    ProDom                            43929     42102    0.27
                    SMART                             39027     29738    0.24
                    PDB                               24640      6662    0.15
                    TIGR                              16273     15819    0.10
                    Genew                             10611     10554    0.07
                    MIM                               10078      8331    0.06
                    MGD                                8016      7978    0.05
                    SGD                                5041      4981    0.03
                    GermOnline                         4927      4876    0.03
                    PIRSF                              4793      4793    0.03
                    EcoGene                            4228      4226    0.03
                    EchoBASE                           4159      4127    0.03
                    MEROPS                             3989      3889    0.02
                    H-InvDB                            3677      3659    0.02
                    WormPep                            2876      2535    0.02
                    RGD                                2782      2780    0.02
                    SubtiList                          2702      2701    0.02
                    FlyBase                            2701      2655    0.02
                    GeneDB_SPombe                      2700      2670    0.02
                    TRANSFAC                           2691      2412    0.02
                    IntAct                             2549      2549    0.02
                    WormBase                           2488      2426    0.02
                    TubercuList                        1427      1391    0.01
                    StyGene                            1407      1404    0.01
                    SWISS-2DPAGE                       1113      1113    0.01
                    ListiList                           978       955    0.01
                    Reactome                            712       712   <0.01
                    Leproma                             613       609   <0.01
                    Gramene                             562       557   <0.01
                    GeneFarm                            500       499   <0.01
                    MaizeDB                             412       407   <0.01
                    HIV                                 370       354   <0.01
                    REBASE                              365       360   <0.01
                    OGP                                 358       358   <0.01
                    ECO2DBASE                           351       299   <0.01
                    PhotoList                           349       349   <0.01
                    DictyBase                           324       322   <0.01
                    ZFIN                                307       300   <0.01
                    GlycoSuiteDB                        262       262   <0.01
                    SagaList                            254       253   <0.01
                    PHCI-2DPAGE                         239       239   <0.01
                    AGD                                 187       182   <0.01
                    MypuList                            168       168   <0.01
                    Aarhus/Ghent-2DPAGE                 128        98   <0.01
                    Siena-2DPAGE                        103       103   <0.01
                    HSC-2DPAGE                           85        85   <0.01
                    COMPLUYEAST-2DPAGE                   59        59   <0.01
                    PhosSite                             54        54   <0.01
                    PMMA-2DPAGE                          52        52   <0.01
                    Maize-2DPAGE                         39        39   <0.01
                    Rat-heart-2DPAGE                     28        28   <0.01
                    ANU-2DPAGE                           13        13   <0.01
                    
                    
                    7.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in UniProt/Swiss-Prot: 191089
                    
                    Total number of entries encoded on a chloroplast: 3657
                    Total number of entries encoded on a mitochondrion: 2947
                    Total number of entries encoded on a cyanelle: 145
                    Total number of entries encoded on a plasmid: 2817
                    
                    Number of fragments: 8448
                    Number of additional sequences encoded on splice variants: 9436
                    
                

UniProt/TrEMBL protein database release 28.0 statistics

                    
                    1.  INTRODUCTION
                    
                    Release 28.0 of 25-Oct-2004 of UniProt/TrEMBL has been produced in sync
                    with UniProt/Swiss-Prot release 45 and EMBL/DDBJ/GenBank nucleotide
                    sequence database release 80, including updates until 24 September.
                    It contains 1'449'374 sequence entries, comprising 452'535'149 amino acids.
                    
                    126'364 sequences have been added since release 27, and the sequence and annotation
                    data of 56'945 entries have been revised. This represents an increase of 10.31%.
                    
                    In the document delac_tr.txt, you will find a list of all accession numbers
                    which were previously present in UniProt/TrEMBL, but which have now been
                    deleted from the database. Most deletions follow the removal of the
                    corresponding CDS from the source nucleotide sequence databases
                    EMBL-Bank/DDBJ/GenBank. In addition, some entries are recognised to be
                    Open Reading Frames (ORFs) that have been wrongly predicted to code for proteins.
                    When there is enough evidence that these hypothetical proteins are not real,
                    we take the decision to remove them from TrEMBL. 
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 7.75   Gln (Q) 3.86   Leu (L) 9.73   Ser (S) 7.07
                    Arg (R) 5.30   Glu (E) 6.05   Lys (K) 5.57   Thr (T) 5.74
                    Asn (N) 4.49   Gly (G) 6.92   Met (M) 2.41   Trp (W) 1.37
                    Asp (D) 5.09   His (H) 2.27   Phe (F) 4.14   Tyr (Y) 3.15
                    Cys (C) 1.50   Ile (I) 6.03   Pro (P) 4.92   Val (V) 6.48
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.07
                    
                    
                    
                    Legend: gray = aliphatic, red = acidic, green = small hydroxy,
                    blue = basic, black = aromatic, white = amide, yellow = sulfur
                    
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Ser, Gly, Val, Glu, Ile, Thr, Lys, Arg, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Cys, Trp
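
The ranking in section 2.2 can be reproduced by sorting the percentages of section 2.1 in descending order. The following is a minimal sketch; the dictionary name `composition` and the variable `ranking` are illustrative choices, and the values are copied by hand from the table above.

```python
# Percentages copied from section 2.1 (complete TrEMBL release 28.0).
composition = {
    "Ala": 7.75, "Gln": 3.86, "Leu": 9.73, "Ser": 7.07,
    "Arg": 5.30, "Glu": 6.05, "Lys": 5.57, "Thr": 5.74,
    "Asn": 4.49, "Gly": 6.92, "Met": 2.41, "Trp": 1.37,
    "Asp": 5.09, "His": 2.27, "Phe": 4.14, "Tyr": 3.15,
    "Cys": 1.50, "Ile": 6.03, "Pro": 4.92, "Val": 6.48,
}

# Sort the residue names from most to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```

No two residues share the same percentage in this release, so the order does not depend on how ties are broken.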
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of TrEMBL: 79556
                    
                    The first twenty species represent  443604 sequences:  30.6 % of the
                    total number of entries.
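
As a cross-check, the figure above can be recomputed from the first twenty rows of the table in section 3.2 and the total number of entries given in the introduction. This is only a sketch; the variable names are illustrative and the counts are copied by hand from this document.

```python
# Entry counts of the twenty most represented species (section 3.2, ranks 1-20).
top_twenty = [
    111043, 45936, 43435, 38180, 37472, 23882, 20025, 19828, 15632, 10995,
    9060, 8837, 8183, 7811, 7588, 7438, 7199, 7102, 7021, 6937,
]

# Total number of sequence entries in TrEMBL release 28.0 (section 1).
total_entries = 1_449_374

covered = sum(top_twenty)
share = 100 * covered / total_entries
print(covered, f"{share:.1f} %")
```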
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 39735
                                             2x: 14935
                                             3x:  7591
                                             4x:  3995
                                             5x:  2237
                                             6x:  1761
                                             7x:  1190
                                             8x:  1034
                                             9x:   803
                                            10x:   607
                                        11- 20x:  2592
                                        21- 50x:  1609
                                        51-100x:   651
                                          >100x:   816
                    
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     111043  Human immunodeficiency virus 1
                    2      45936  Homo sapiens (Human)
                    3      43435  Oryza sativa (japonica cultivar-group)
                    4      38180  Arabidopsis thaliana (Mouse-ear cress)
                    5      37472  Mus musculus (Mouse)
                    6      23882  Drosophila melanogaster (Fruit fly)
                    7      20025  Caenorhabditis elegans
                    8      19828  Hepatitis C virus
                    9      15632  Anopheles gambiae str. PEST
                    10      10995  Neurospora crassa
                    11       9060  Xenopus laevis (African clawed frog)
                    12       8837  Brachydanio rerio (Zebrafish) (Danio rerio)
                    13       8183  Bradyrhizobium japonicum
                    14       7811  Plasmodium yoelii yoelii
                    15       7588  Streptomyces coelicolor
                    16       7438  Streptomyces avermitilis
                    17       7199  Rhizobium loti (Mesorhizobium loti)
                    18       7102  Rhodopirellula baltica
                    19       7021  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    20       6937  Rattus norvegicus (Rat)
                    21       6488  Hepatitis B virus
                    22       6414  Yarrowia lipolytica CLIB99
                    23       6397  Giardia lamblia ATCC 50803
                    24       6354  Pseudomonas aeruginosa
                    25       6322  Bacillus anthracis
                    26       6249  Debaryomyces hansenii CBS767
                    27       5879  Escherichia coli
                    28       5707  Burkholderia pseudomallei K96243
                    29       5701  Bacillus cereus (strain ATCC 10987)
                    30       5685  Rhizobium meliloti (Sinorhizobium meliloti)
                    31       5575  Anabaena sp. (strain PCC 7120)
                    32       5275  Photobacterium profundum (Photobacterium sp. (strain SS9))
                    33       5269  Yersinia pestis
                    34       5231  Plasmodium falciparum (isolate 3D7)
                    35       5133  Kluyveromyces lactis NRRL Y-1140
                    36       5128  Bacillus cereus ZK
                    37       5062  Bacillus thuringiensis (subsp. konkukian)
                    38       5029  Candida glabrata CBS138
                    39       5000  Pseudomonas syringae (pv. tomato)
                    40       4995  uncultured bacterium
                    41       4979  Bacillus licheniformis DSM 13
                    42       4964  Escherichia coli O157:H7
                    43       4936  Helicobacter pylori (Campylobacter pylori)
                    44       4862  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
                    45       4860  Bacteroides fragilis
                    46       4855  Bacillus cereus (strain ATCC 14579 / DSM 31)
                    47       4807  Pseudomonas putida (strain KT2440)
                    48       4707  Rhodopseudomonas palustris
                    49       4698  Ralstonia solanacearum (Pseudomonas solanacearum)
                    50       4637  Bacteroides thetaiotaomicron
                    51       4637  Vibrio vulnificus (strain YJ016)
                    52       4629  Leptospira interrogans
                    53       4567  Ashbya gossypii (Yeast) (Eremothecium gossypii)
                    54       4523  Burkholderia mallei ATCC 23344
                    55       4473  Erwinia carotovora (subsp. atroseptica) (Pectobacterium atrosepticum)
                    56       4423  Shigella flexneri
                    57       4402  Vibrio parahaemolyticus
                    58       4307  Mycobacterium paratuberculosis
                    59       4270  Mycobacterium tuberculosis
                    60       4209  Gloeobacter violaceus
                    61       4176  Shewanella oneidensis
                    62       4171  Photorhabdus luminescens (subsp. laumondii)
                    63       4143  Chromobacterium violaceum
                    64       4080  Methanosarcina acetivorans
                    65       4076  Salmonella typhi
                    66       4034  Vibrio vulnificus
                    67       3998  Yersinia pseudotuberculosis IP 32953
                    68       3997  Escherichia coli O6
                    69       3925  Xanthomonas axonopodis (pv. citri)
                    70       3920  Vibrio cholerae
                    71       3910  Bordetella parapertussis
                    72       3871  Oryza sativa (Rice)
                    73       3829  Corynebacterium glutamicum (Brevibacterium flavum)
                    74       3807  Plasmodium falciparum
                    75       3755  Listeria monocytogenes
                    76       3721  Xanthomonas campestris (pv. campestris)
                    77       3613  Salmonella typhimurium
                    78       3577  Leptospira interrogans (serogroup Icterohaemorrhagiae / serovar Copenhageni)
                    79       3577  Bacillus halodurans
                    80       3560  Enterococcus faecalis (Streptococcus faecalis)
                    81       3543  Bdellovibrio bacteriovorus
                    82       3438  TT virus
                    83       3422  Streptococcus pneumoniae
                    84       3421  Clostridium acetobutylicum
                    85       3412  Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
                    86       3327  Caulobacter crescentus
                    87       3312  Symbiobacterium thermophilum
                    88       3309  Geobacter sulfurreducens
                    89       3259  Acinetobacter sp. (strain ADP1)
                    90       3225  Desulfotalea psychrophila
                    91       3204  Dictyostelium discoideum (Slime mold)
                    92       3125  Oceanobacillus iheyensis
                    93       3117  Chimpanzee immunodeficiency virus (SIV(cpz)) (CIV)
                    94       3092  Streptococcus pyogenes
                    95       3080  Bordetella pertussis
                    96       2971  Methanosarcina mazei (Methanosarcina frisia)
                    97       2873  Mycobacterium bovis
                    98       2863  Brucella suis
                    99       2841  Lactobacillus plantarum
                    100       2826  Gallus gallus (Chicken)
                    
                    
                    3.3  Distribution of the sequences by sections
                    
                    Division      sequences (% of the database)
                    arc                4947 ( 0%)
                    arp               33768 ( 2%)
                    fun               60361 ( 4%)
                    hum               42112 ( 3%)
                    inv              130570 ( 9%)
                    mam               18122 ( 1%)
                    mhc               11167 ( 1%)
                    org              122210 ( 8%)
                    phg               14152 ( 1%)
                    pln              127263 ( 9%)
                    pro              167218 (12%)
                    prp              374823 (26%)
                    rod               47880 ( 3%)
                    unc                1035 ( 0%)
                    vrl              133061 ( 9%)
                    vrt               38587 ( 3%)
                    vrv              122098 ( 8%)
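
The percentages in the table above appear to be each division's share of the 1'449'374 entries, rounded to the nearest whole percent. A short sketch of that calculation, with hypothetical variable names and counts copied from the table:

```python
# Sequences per division, copied from section 3.3.
divisions = {
    "arc": 4947, "arp": 33768, "fun": 60361, "hum": 42112,
    "inv": 130570, "mam": 18122, "mhc": 11167, "org": 122210,
    "phg": 14152, "pln": 127263, "pro": 167218, "prp": 374823,
    "rod": 47880, "unc": 1035, "vrl": 133061, "vrt": 38587,
    "vrv": 122098,
}

total = sum(divisions.values())

# Share of the database per division, rounded to a whole percent.
percent = {name: round(100 * n / total) for name, n in divisions.items()}

for name, n in divisions.items():
    print(f"{name}  {n:>18} ({percent[name]:>2}%)")
```

The division counts add up to the release total, so small divisions such as arc and unc show as 0% only because of rounding.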
                    
                    
                    4.  SEQUENCE SIZE
                    
                    4.1  Repartition of the sequences by size (excluding fragments)
                    
                    From   To  Number             From   To   Number
                    1-  50   17511             1001-1100     7990
                    51- 100   86504             1101-1200     5679
                    101- 150  106065             1201-1300     4310
                    151- 200   97642             1301-1400     2774
                    201- 250   99121             1401-1500     2311
                    251- 300   91903             1501-1600     1589
                    301- 350   89727             1601-1700     1246
                    351- 400   72463             1701-1800     1113
                    401- 450   55700             1801-1900      883
                    451- 500   48422             1901-2000      732
                    501- 550   38437             2001-2100      568
                    551- 600   26652             2101-2200      668
                    601- 650   20340             2201-2300      577
                    651- 700   16042             2301-2400      456
                    701- 750   13627             2401-2500      288
                    751- 800   11127             >2500         2819
                    801- 850    9494
                    851- 900    8340
                    901- 950    6039
                    951-1000    4987
                    
                    
                    4.2  Longest and shortest sequences
                    
                    The shortest sequence is Q16047:     4 amino acids.
                    The longest sequence is  Q8WZ42: 34350 amino acids.
                    
                    
                    5.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                    2070966              1.43
                    Journal                         1310858   1081957    0.90
                    Submitted to EMBL/GenBank/DDBJ   752316    583823    0.52
                    Thesis                             4475      4423   <0.01
                    Book citation                      2836      2798   <0.01
                    Submitted to other databases        465       457   <0.01
                    Unpublished results                  10        10   <0.01
                    Unpublished observations              4         4   <0.01
                    Plant Gene Register                   1         1   <0.01
                    Patent                                1         1   <0.01
                    
                    Comments (CC)                       740803              0.51
                    SIMILARITY                       287840    284120    0.20
                    FUNCTION                         104641    104601    0.07
                    CATALYTIC ACTIVITY                92792     81468    0.06
                    SUBCELLULAR LOCATION              84858     84816    0.06
                    SUBUNIT                           57056     57045    0.04
                    CAUTION                           41764     41763    0.03
                    COFACTOR                          38459     38219    0.03
                    PATHWAY                           27807     27807    0.02
                    MISCELLANEOUS                      4489      4472   <0.01
                    DOMAIN                              336       331   <0.01
                    PTM                                 313       312   <0.01
                    TISSUE SPECIFICITY                  156       156   <0.01
                    MASS SPECTROMETRY                   122        66   <0.01
                    DEVELOPMENTAL STAGE                  58        58   <0.01
                    INDUCTION                            49        49   <0.01
                    ALTERNATIVE PRODUCTS                 45        45   <0.01
                    ENZYME REGULATION                    10        10   <0.01
                    DISEASE                               5         5   <0.01
                    POLYMORPHISM                          3         3   <0.01
                    
                    Features (FT)                       889215              0.61
                    NON_TER                          834577    492963    0.58
                    CHAIN                             38254     22864    0.03
                    SIGNAL                            12200     11990    0.01
                    NON_CONS                            949       435   <0.01
                    CARBOHYD                            590       108   <0.01
                    TRANSIT                             588       578   <0.01
                    DOMAIN                              582       185   <0.01
                    SE_CYS                              301       159   <0.01
                    TRANSMEM                            253        53   <0.01
                    CONFLICT                            177        32   <0.01
                    REPEAT                              173        24   <0.01
                    DISULFID                            103        36   <0.01
                    VARSPLIC                             90        38   <0.01
                    METAL                                65        26   <0.01
                    VARIANT                              53        13   <0.01
                    ACT_SITE                             47        33   <0.01
                    UNSURE                               33        14   <0.01
                    DNA_BIND                             30        24   <0.01
                    NP_BIND                              29        25   <0.01
                    MOD_RES                              27        16   <0.01
                    BINDING                              19        10   <0.01
                    ZN_FING                              18        10   <0.01
                    PROPEP                               15        12   <0.01
                    SITE                                 14        11   <0.01
                    MUTAGEN                              10         4   <0.01
                    LIPID                                 7         4   <0.01
                    CA_BIND                               5         4   <0.01
                    PEPTIDE                               4         4   <0.01
                    INIT_MET                              2         2   <0.01
                    
                    Cross-references (DR)             10281511              7.09
                    GO                              3015214    891190    2.08
                    InterPro                        2018619   1017975    1.39
                    EMBL                            1683762   1442914    1.16
                    Pfam                            1301521    990543    0.90
                    PROSITE                          658935    435811    0.45
                    HSSP                             301450    301172    0.21
                    PRINTS                           300132    249418    0.21
                    SMART                            241092    187553    0.17
                    PIR                              199166    163352    0.14
                    ProDom                           175640    168699    0.12
                    TIGRFAMs                         138646    128847    0.10
                    TIGR                              76496     70728    0.05
                    MGD                               26164     26162    0.02
                    Gramene                           25322     24654    0.02
                    FlyBase                           23194     22921    0.02
                    WormPep                           19048     18955    0.01
                    WormBase                          19021     18947    0.01
                    PIRSF                              9470      9460    0.01
                    MEROPS                             6446      6195   <0.01
                    ZFIN                               6323      6320   <0.01
                    IntAct                             5195      5195   <0.01
                    ListiList                          4836      4819   <0.01
                    AGD                                4503      4503   <0.01
                    PhotoList                          4332      4208   <0.01
                    TubercuList                        2500      2491   <0.01
                    PDB                                2389      1375   <0.01
                    Genew                              2310      2310   <0.01
                    GeneDB_SPombe                      2248      2233   <0.01
                    SagaList                           1840      1746   <0.01
                    SGD                                1520      1520   <0.01
                    TRANSFAC                           1091      1077   <0.01
                    Leproma                             993       991   <0.01
                    DictyBase                           950       950   <0.01
                    MypuList                            614       610   <0.01
                    REBASE                              126       121   <0.01
                    PHCI-2DPAGE                         108       108   <0.01
                    SWISS-2DPAGE                        106       106   <0.01
                    ANU-2DPAGE                           76        76   <0.01
                    OGP                                  41        40   <0.01
                    Reactome                             39        39   <0.01
                    MIM                                  14        13   <0.01
                    PhosSite                             12        12   <0.01
                    PMMA-2DPAGE                           3         3   <0.01
                    Siena-2DPAGE                          2         2   <0.01
                    RGD                                   1         1   <0.01
                    COMPLUYEAST-2DPAGE                    1         1   <0.01
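
The "Average per entry" column in the table above appears to be the total number of lines of a given type divided by the total number of entries in the release (1'449'374). The sketch below recomputes the four section totals under that assumption; the variable names are illustrative.

```python
# Total number of TrEMBL release 28.0 entries (section 1).
total_entries = 1_449_374

# Total number of lines per line type, copied from the table in section 5.
line_totals = {
    "RL": 2_070_966,
    "CC": 740_803,
    "FT": 889_215,
    "DR": 10_281_511,
}

# Average number of lines of each type per database entry.
averages = {kind: n / total_entries for kind, n in line_totals.items()}

for kind, avg in averages.items():
    print(f"{kind}: {avg:.2f}")
```

The same division applied to a subtype row, for example the GO cross-references, reproduces that row's average as well.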
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in TrEMBL: 202049
                    
                    Total number of entries encoded on a chloroplast: 36258
                    Total number of entries encoded on a mitochondrion: 85929
                    Total number of entries encoded on a plasmid: 29227
                    
                    Number of additional sequences encoded on splice variants: 66
                    
                

Submissions and Updates

We welcome feedback from our users. We would especially appreciate your notifying us if you find that sequences belonging to your field of expertise are missing from the database. We also would like to be notified about annotations to be updated, if, for example, the function of a protein has been clarified or if new information about post-translational modifications has become available.

Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml

For all queries regarding submissions to UniProt and to submit new protein sequence data, please contact:

UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail: datasubs@ebi.ac.uk


Download information

Full releases

For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest full release (updated 4 times per year) in flatfile format. The UniProt/Swiss-Prot Protein Knowledgebase is available at ftp://ftp.expasy.org/databases/Swiss-Prot/ and the UniProt/TrEMBL Protein Database is available at ftp://ftp.ebi.ac.uk/pub/databases/TrEMBL/.
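
After downloading a full release, a quick sanity check is to count the entries and compare the result with the figures quoted in the statistics above. The sketch below relies on the convention that each flatfile entry ends with a line containing only `//`; the function name `count_entries` and the two-entry sample are made up for illustration and are not real database records.

```python
import io


def count_entries(lines):
    """Count flatfile entries by their terminator line ('//')."""
    return sum(1 for line in lines if line.rstrip() == "//")


# Tiny placeholder flatfile with two dummy entries (not real records).
sample = io.StringIO(
    "ID   EXAMPLE1\nAC   X00001;\n//\n"
    "ID   EXAMPLE2\nAC   X00002;\n//\n"
)

n_entries = count_entries(sample)
print(n_entries)
```

For a real release, an opened file handle can be passed in place of `sample`, since the function reads line by line and does not load the whole file into memory.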

The UniProt Knowledgebase full release is also available on CD-ROM from the EBI.

Bi-Weekly releases

The latest data of the UniProt Knowledgebase is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data is further supplemented by two files containing the sequences of all additional splice isoforms annotated in UniProt/Swiss-Prot and UniProt/TrEMBL. These data sets are documented in the file ftp://ftp.expasy.org/databases/sp_tr_nrdb/varsplic.txt.


Contact

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address: datalib@ebi.ac.uk / swissprot@ebi.ac.uk
WWW server: http://www.ebi.ac.uk/


SIB Swiss Institute of Bioinformatics
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 702 50 50
Fax: (+41 22) 702 58 58
Electronic mail address: Swiss-Prot@expasy.org
WWW server: http://www.expasy.org/


Protein Information Resource (PIR)
Georgetown University Medical Center
3900 Reservoir Road, NW
Box 571455
Washington, DC 20057-1455
United States of America

Telephone: (+1 202) 687 1039
Fax: (+1 202) 687 0057
Electronic mail address: pirmail@georgetown.edu
WWW server: http://pir.georgetown.edu

Citation

If you want to cite UniProt in a publication please use the following reference:

Apweiler R., Bairoch A., Wu C.H., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J., Natale D.A., O'Donovan C., Redaschi N. and Yeh L.L., UniProt: the Universal Protein Knowledgebase, Nucleic Acids Res. 32: D115-D119 (2004).


Copyright

UniProt copyright (c) 2003 UniProt consortium

For non-commercial use, all databases and documents in the UniProt FTP directory may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy.

For commercial use, all databases and documents in the UniProt FTP directory, except the files ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/uniprot_sprot.dat.gz and ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/uniprot_sprot.xml.gz, may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy. More information for commercial users can be found at http://www.expasy.org/announce/sp_98.html

From January 1, 2005, all databases and documents in the UniProt FTP directory may be copied and redistributed freely by all entities, without advance permission, provided that this copyright statement is reproduced with each copy.

The above copyright notice also applies to these release notes as well as to all other UniProt Knowledgebase documents.