Release 4.0 of the UniProt Knowledgebase is composed of the UniProt/Swiss-Prot Protein Knowledgebase release 46.0 and the UniProt/TrEMBL Protein Database release 29.0.

More information on these databases can be found in the user manual, under "What is the UniProt Knowledgebase?".


UniProt/Swiss-Prot protein knowledgebase release 46.0 statistics

Release 46.0 of 01-Feb-2005 of UniProt/Swiss-Prot contains 168'297 sequence entries, comprising 61'443'278 amino acids abstracted from 124'910 references.

The growth of the database is summarized below.

Release Date Number of entries Number of amino acids
2.0 09/86 3'939 900'163
3.0 11/86 4'160 969'641
4.0 04/87 4'387 1'036'010
5.0 09/87 5'205 1'327'683
6.0 01/88 6'102 1'653'982
7.0 04/88 6'821 1'885'771
8.0 08/88 7'724 2'224'465
9.0 11/88 8'702 2'498'140
10.0 03/89 10'008 2'952'613
11.0 07/89 10'856 3'265'966
12.0 10/89 12'305 3'797'482
13.0 01/90 13'837 4'347'336
14.0 04/90 15'409 4'914'264
15.0 08/90 16'941 5'486'399
16.0 11/90 18'364 5'986'949
17.0 02/91 20'024 6'524'504
18.0 05/91 20'772 6'792'034
19.0 08/91 21'795 7'173'785
20.0 11/91 22'654 7'500'130
21.0 03/92 23'742 7'866'596
22.0 05/92 25'044 8'375'696
23.0 08/92 26'706 9'011'391
24.0 12/92 28'154 9'545'427
25.0 04/93 29'955 10'214'020
26.0 07/93 31'808 10'875'091
27.0 10/93 33'329 11'484'420
28.0 02/94 36'000 12'496'420
29.0 06/94 38'303 13'464'008
30.0 10/94 40'292 14'147'368
31.0 02/95 43'470 15'335'248
32.0 11/95 49'340 17'385'503
33.0 02/96 52'205 18'531'384
34.0 10/96 59'021 21'210'389
35.0 11/97 69'113 25'083'768
36.0 07/98 74'019 26'840'295
37.0 12/98 77'977 28'268'293
38.0 07/99 80'000 29'085'965
39.0 05/00 86'593 31'411'114
40.0 10/01 101'602 37'315'215
41.0 02/03 122'564 44'986'459
42.0 10/03 135'850 50'046'799
43.0 03/04 146'720 54'093'154
44.0 07/04 153'871 56'608'159
45.0 10/04 163'235 59'631'787
46.0 02/05 168'297 61'443'278
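
The counts in this table use an apostrophe as the thousands separator, which most numeric parsers reject. The following is a minimal Python sketch of how a row could be read into typed fields, assuming the four whitespace-separated columns shown above; the names `ReleaseRow`, `parse_count` and `parse_row` are illustrative and not part of any UniProt tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseRow:
    release: str      # release number as printed, e.g. "46.0"
    date: str         # MM/YY as printed in the table
    entries: int
    amino_acids: int


def parse_count(text: str) -> int:
    """Convert a count such as "61'443'278" to an int."""
    return int(text.replace("'", ""))


def parse_row(line: str) -> ReleaseRow:
    """Split one table row into its four columns."""
    release, date, entries, amino_acids = line.split()
    return ReleaseRow(release, date, parse_count(entries), parse_count(amino_acids))


# The last two rows of the growth table, copied verbatim.
prev, curr = (parse_row(r) for r in (
    "45.0 10/04 163'235 59'631'787",
    "46.0 02/05 168'297 61'443'278",
))

# Net change in the number of entries between the two releases.
net_new_entries = curr.entries - prev.entries
print(net_new_entries)
```

The net change computed this way is a difference of totals, so it also reflects any entries that were merged or deleted between releases.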

In rare cases, Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProt/Swiss-Prot but have since been deleted from the database.


Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

  • be as complete as possible. All sequences available at a given time should be immediately included in UniProt/Swiss-Prot. This also includes sequence corrections and updates;
  • provide a higher level of annotation;
  • provide cross-references to specialized database(s) that contain, among other data, some information about the genes that code for these proteins;
  • provide specific indexes and documents.

From our efforts to annotate human sequence entries as completely as possible arose the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms which are not covered by these two projects:

Organism        Database cross-references  Index file    Number of sequences
A.thaliana      None yet                   arath.txt                   3'110
C.albicans      None yet                   calbican.txt                  321
C.elegans       Wormpep                    celegans.txt                2'615
D.discoideum    DictyBase                  dicty.txt                     324
D.melanogaster  FlyBase                    fly.txt                     2'158
M.musculus      MGD                        mgdtosp.txt                 8'676
S.cerevisiae    SGD                        yeast.txt                   5'042
S.pombe         GeneDB_SPombe              pombe.txt                   2'712

UniProt/Swiss-Prot release statistics
                    
                    1.  INTRODUCTION
                    
                    Release 46.0 of 01-Feb-2005 of UniProt/Swiss-Prot contains 168297 sequence entries,
                    comprising 61443278 amino acids abstracted from 124910 references. 
                    
                    4537 sequences have been added since release 45, the sequence data of
                    866 existing entries has been updated, and the annotations of
                    77494 entries have been revised. This represents an increase of
                    about 3% in the number of entries.
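
The 3% figure can be checked against the totals for releases 45.0 and 46.0 given in the growth table. The sketch below assumes the percentage is a relative increase over release 45.0; `percent_increase` is a hypothetical helper written for this example.

```python
def percent_increase(old: int, new: int) -> float:
    """Relative growth from old to new, in percent."""
    return 100.0 * (new - old) / old


# Totals for releases 45.0 and 46.0, taken from the growth table.
entries_growth = percent_increase(163_235, 168_297)
residues_growth = percent_increase(59_631_787, 61_443_278)

print(f"entries:     {entries_growth:.2f} %")
print(f"amino acids: {residues_growth:.2f} %")
```

Both the entry count and the amino acid count round to 3% under this assumption.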
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 7.81   Gln (Q) 3.94   Leu (L) 9.62   Ser (S) 6.88
                    Arg (R) 5.32   Glu (E) 6.61   Lys (K) 5.93   Thr (T) 5.45
                    Asn (N) 4.20   Gly (G) 6.93   Met (M) 2.37   Trp (W) 1.15
                    Asp (D) 5.30   His (H) 2.28   Phe (F) 4.00   Tyr (Y) 3.07
                    Cys (C) 1.56   Ile (I) 5.91   Pro (P) 4.84   Val (V) 6.71
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.01
                    
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Thr, Arg, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Cys, Trp
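
The ordering in section 2.2 follows directly from the percentages in section 2.1. A minimal Python sketch, with the values copied from the composition table and the variable names chosen for illustration:

```python
# Composition in percent, copied from section 2.1 (B, Z and X omitted).
composition = {
    "Ala": 7.81, "Gln": 3.94, "Leu": 9.62, "Ser": 6.88,
    "Arg": 5.32, "Glu": 6.61, "Lys": 5.93, "Thr": 5.45,
    "Asn": 4.20, "Gly": 6.93, "Met": 2.37, "Trp": 1.15,
    "Asp": 5.30, "His": 2.28, "Phe": 4.00, "Tyr": 3.07,
    "Cys": 1.56, "Ile": 5.91, "Pro": 4.84, "Val": 6.71,
}

# Sort residue names from most to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```

No two residues share the same percentage in this release, so the sort order is unambiguous.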
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of Swiss-Prot: 8826
                    
                    The first twenty species represent 62418 sequences:  37.1 % of the total
                    number of entries.
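
The 37.1 % share can be reproduced from the twenty largest counts in the table of most represented species (section 3.2) and the total number of entries given in the introduction. A short Python sketch; the variable names are illustrative:

```python
# Entry counts of the twenty most represented species (section 3.2, ranks 1-20).
top20 = [
    11850, 8676, 5042, 4838, 4079, 3110, 2767, 2712, 2615, 2158,
    1782, 1773, 1707, 1521, 1468, 1399, 1368, 1328, 1128, 1097,
]
total_entries = 168_297  # release 46.0

top20_entries = sum(top20)
top20_share = 100.0 * top20_entries / total_entries

print(top20_entries, f"{top20_share:.1f} %")
```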
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented 1x: 4171
                                        2x: 1390
                                        3x:  699
                                        4x:  460
                                        5x:  289
                                        6x:  265
                                        7x:  195
                                        8x:  155
                                        9x:  129
                                       10x:   83
                                   11- 20x:  371
                                   21- 50x:  293
                                   51-100x:   96
                                     >100x:  230
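
As a consistency check, the frequency classes above should add up to the total number of species reported at the start of section 3. The Python sketch below performs that check and also derives the share of species represented by a single entry; the dictionary layout is an assumption made for the example.

```python
# Number of species per frequency class, copied from the table in section 3.1.
species_by_frequency = {
    "1x": 4171, "2x": 1390, "3x": 699, "4x": 460, "5x": 289,
    "6x": 265, "7x": 195, "8x": 155, "9x": 129, "10x": 83,
    "11-20x": 371, "21-50x": 293, "51-100x": 96, ">100x": 230,
}

total_species = sum(species_by_frequency.values())

# Share of species that occur in exactly one entry.
single_entry_share = 100.0 * species_by_frequency["1x"] / total_species

print(total_species, f"{single_entry_share:.1f} %")
```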
                    
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1      11850  Homo sapiens (Human)
                    2       8676  Mus musculus (Mouse)
                    3       5042  Saccharomyces cerevisiae (Baker's yeast)
                    4       4838  Escherichia coli
                    5       4079  Rattus norvegicus (Rat)
                    6       3110  Arabidopsis thaliana (Mouse-ear cress)
                    7       2767  Bacillus subtilis
                    8       2712  Schizosaccharomyces pombe (Fission yeast)
                    9       2615  Caenorhabditis elegans
                    10       2158  Drosophila melanogaster (Fruit fly)
                    11       1782  Methanococcus jannaschii
                    12       1773  Haemophilus influenzae
                    13       1707  Escherichia coli O157:H7
                    14       1521  Bos taurus (Bovine)
                    15       1468  Salmonella typhimurium
                    16       1399  Mycobacterium tuberculosis
                    17       1368  Escherichia coli O6
                    18       1328  Shigella flexneri
                    19       1128  Gallus gallus (Chicken)
                    20       1097  Mycobacterium bovis
                    21       1051  Salmonella typhi
                    22       1012  Pseudomonas aeruginosa
                    23        958  Synechocystis sp. (strain PCC 6803)
                    24        955  Archaeoglobus fulgidus
                    25        923  Sus scrofa (Pig)
                    26        908  Xenopus laevis (African clawed frog)
                    27        807  Rhizobium meliloti (Sinorhizobium meliloti)
                    28        792  Vibrio cholerae
                    29        766  Yersinia pestis
                    30        747  Oryctolagus cuniculus (Rabbit)
                    31        745  Aquifex aeolicus
                    32        687  Mycoplasma pneumoniae
                    33        681  Pasteurella multocida
                    34        629  Vibrio parahaemolyticus
                    35        628  Streptomyces coelicolor
                    36        617  Bacillus halodurans
                    37        612  Mycobacterium leprae
                    38        606  Treponema pallidum
                    39        578  Vibrio vulnificus
                    40        573  Methanobacterium thermoautotrophicum
                    41        572  Buchnera aphidicola (subsp. Acyrthosiphon pisum) 
                    42        568  Anabaena sp. (strain PCC 7120)
                    43        562  Helicobacter pylori (Campylobacter pylori)
                    44        561  Buchnera aphidicola (subsp. Schizaphis graminum)
                    45        549  Staphylococcus aureus (strain Mu50 / ATCC 700699)
                    46        547  Staphylococcus aureus (strain N315)
                    47        546  Rickettsia prowazekii
                    48        543  Helicobacter pylori J99 (Campylobacter pylori J99)
                    49        530  Staphylococcus aureus (strain MW2)
                    50        517  Lactococcus lactis (subsp. lactis) (Streptococcus lactis)
                    51        514  Pseudomonas putida (strain KT2440)
                    52        513  Zea mays (Maize)
                    53        508  Pseudomonas syringae (pv. tomato)
                    54        507  Buchnera aphidicola (subsp. Baizongia pistaciae)
                    55        499  Staphylococcus epidermidis
                    56        499  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    57        499  Ralstonia solanacearum (Pseudomonas solanacearum)
                    58        496  Listeria monocytogenes
                    59        492  Listeria innocua
                    60        486  Mycoplasma genitalium
                    61        486  Rhizobium loti (Mesorhizobium loti)
                    62        482  Xanthomonas campestris (pv. campestris)
                    63        481  Neisseria meningitidis (serogroup B)
                    64        479  Neisseria meningitidis (serogroup A)
                    65        472  Clostridium acetobutylicum
                    66        467  Bradyrhizobium japonicum
                    67        464  Bacillus anthracis
                    68        463  Caulobacter crescentus
                    69        462  Canis familiaris (Dog)
                    70        461  Thermotoga maritima
                    71        444  Xanthomonas axonopodis (pv. citri)
                    72        442  Streptococcus pneumoniae
                    73        438  Oryza sativa (Rice)
                    74        438  Xylella fastidiosa
                    75        432  Deinococcus radiodurans
                    76        428  Pyrococcus horikoshii
                    77        428  Chlamydia trachomatis
                    78        426  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
                    79        424  Pyrococcus abyssi
                    80        419  Shewanella oneidensis
                    81        417  Borrelia burgdorferi (Lyme disease spirochete)
                    82        411  Brucella melitensis
                    83        411  Brucella suis
                    84        410  Methanosarcina acetivorans
                    85        410  Chlamydia pneumoniae (Chlamydophila pneumoniae)
                    86        410  Clostridium perfringens
                    87        405  Vibrio vulnificus (strain YJ016)
                    88        403  Rhizobium sp. (strain NGR234)
                    89        400  Chlamydia muridarum
                    90        396  Corynebacterium glutamicum (Brevibacterium flavum)
                    91        395  Methanosarcina mazei (Methanosarcina frisia)
                    92        394  Halobacterium sp. (strain NRC-1 / ATCC 700922 / JCM 11081)
                    93        394  Bacillus cereus (strain ATCC 14579 / DSM 31)
                    94        393  Brachydanio rerio (Zebrafish) (Danio rerio)
                    95        384  Pyrococcus furiosus
                    96        380  Oceanobacillus iheyensis
                    97        378  Campylobacter jejuni
                    98        378  Sulfolobus solfataricus
                    99        377  Thermoanaerobacter tengcongensis
                    100        372  Photorhabdus luminescens (subsp. laumondii)
                    101        372  Neurospora crassa
                    102        371  Ovis aries (Sheep)
                    103        371  Lactobacillus plantarum
                    104        366  Nicotiana tabacum (Common tobacco)
                    105        365  Streptococcus pyogenes
                    106        360  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
                    107        359  Rickettsia conorii
                    108        348  Synechococcus elongatus (Thermosynechococcus elongatus)
                    109        344  Streptococcus mutans
                    110        335  Aeropyrum pernix
                    111        331  Chlorobium tepidum
                    112        324  Dictyostelium discoideum (Slime mold)
                    113        322  Streptococcus pyogenes (serotype M18)
                    114        321  Candida albicans (Yeast)
                    115        317  Streptococcus pyogenes (serotype M3)
                    116        314  Methanopyrus kandleri
                    117        313  Staphylococcus aureus
                    118        307  Enterococcus faecalis (Streptococcus faecalis)
                    119        304  Pan troglodytes (Chimpanzee)
                    120        303  Sulfolobus tokodaii
                    121        302  Pisum sativum (Garden pea)
                    122        293  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
                    123        292  Bordetella pertussis
                    124        290  Thermoplasma acidophilum
                    125        288  Haemophilus ducreyi
                    126        283  Corynebacterium efficiens
                    127        283  Triticum aestivum (Wheat)
                    128        282  Bordetella parapertussis
                    129        279  Streptomyces avermitilis
                    130        278  Staphylococcus aureus (strain MRSA252)
                    131        277  Staphylococcus aureus (strain MSSA476)
                    132        276  Chromobacterium violaceum
                    133        273  Fusobacterium nucleatum (subsp. nucleatum)
                    134        272  Hordeum vulgare (Barley)
                    135        268  Bacteriophage T4
                    136        266  Nitrosomonas europaea
                    137        264  Glycine max (Soybean)
                    138        261  Lycopersicon esculentum (Tomato)
                    139        261  Streptococcus agalactiae (serotype V)
                    140        259  Streptococcus agalactiae (serotype III)
                    141        258  Leptospira interrogans
                    142        257  Cavia porcellus (Guinea pig)
                    143        256  Solanum tuberosum (Potato)
                    144        255  Thermoplasma volcanium
                    145        254  Rhodobacter capsulatus (Rhodopseudomonas capsulata)
                    146        254  Vaccinia virus (strain Copenhagen) (VACV)
                    147        254  Pyrobaculum aerophilum
                    148        248  Pseudomonas putida
                    149        240  Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
                    150        238  Spinacia oleracea (Spinach)
                    151        233  Bacillus stearothermophilus
                    152        221  Clostridium tetani
                    153        221  Wigglesworthia glossinidia brevipalpis
                    154        220  Porphyra purpurea
                    155        220  Chlamydophila caviae
                    156        218  Coxiella burnetii
                    157        218  Gloeobacter violaceus
                    158        216  Synechococcus sp. (strain WH8102)
                    159        212  Kluyveromyces lactis (Yeast)
                    160        212  Chlamydomonas reinhardtii
                    161        210  Prochlorococcus marinus
                    162        210  Bacteroides thetaiotaomicron
                    163        209  Macaca mulatta (Rhesus macaque)
                    164        208  Equus caballus (Horse)
                    165        207  Prochlorococcus marinus (strain MIT 9313)
                    166        206  Klebsiella pneumoniae
                    167        204  Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
                    168        200  Vaccinia virus (strain Western Reserve / WR) (VACV)
                    
                    
                    3.3  Taxonomic distribution of the sequences
                    
                    Kingdom        sequences (% of the database)
                    Archaea            9025 (  5%)
                    Bacteria          73807 ( 44%)
                    Eukaryota         76388 ( 45%)
                    Viruses            9077 (  5%)
                    
                    
                    Within Eukaryota:
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  11850 ( 16%)           (  7%)
                    Other Mammalia         21659 ( 28%)           ( 13%)
                    Other Vertebrata        7019 (  9%)           (  4%)
                    Viridiplantae          11826 ( 15%)           (  7%)
                    Fungi                  11327 ( 15%)           (  7%)
                    Insecta                 4177 (  5%)           (  2%)
                    Nematoda                2880 (  4%)           (  2%)
                    Other                   5650 (  7%)           (  3%)
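
The percentages in section 3.3 are whole-number roundings of each count divided by the relevant total. The Python sketch below recomputes them from the counts above; `percent_of` is a hypothetical helper, not part of any UniProt tool.

```python
# Sequence counts copied from the two tables in section 3.3.
kingdoms = {"Archaea": 9025, "Bacteria": 73807, "Eukaryota": 76388, "Viruses": 9077}
eukaryota = {
    "Human": 11850, "Other Mammalia": 21659, "Other Vertebrata": 7019,
    "Viridiplantae": 11826, "Fungi": 11327, "Insecta": 4177,
    "Nematoda": 2880, "Other": 5650,
}


def percent_of(counts: dict, total: int) -> dict:
    """Round each count to a whole percentage of total."""
    return {name: round(100 * n / total) for name, n in counts.items()}


database_total = sum(kingdoms.values())

kingdom_pct = percent_of(kingdoms, database_total)
within_eukaryota_pct = percent_of(eukaryota, kingdoms["Eukaryota"])
of_database_pct = percent_of(eukaryota, database_total)

print(kingdom_pct)
print(within_eukaryota_pct)
print(of_database_pct)
```

The four kingdom counts add up to the 168297 entries of the release, and the eight categories add up to the Eukaryota count, so both tables partition their totals completely.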
                    
                    
                    4.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To  Number             From   To   Number
                       1-  50    3303             1001-1100     1432
                      51- 100   11821             1101-1200     1035
                     101- 150   17104             1201-1300      739
                     151- 200   15970             1301-1400      552
                     201- 250   16646             1401-1500      438
                     251- 300   14263             1501-1600      277
                     301- 350   15036             1601-1700      209
                     351- 400   13286             1701-1800      158
                     401- 450   10277             1801-1900      173
                     451- 500    8760             1901-2000      140
                     501- 550    6626             2001-2100       84
                     551- 600    4573             2101-2200      127
                     601- 650    3841             2201-2300      115
                     651- 700    2671             2301-2400       71
                     701- 750    2259             2401-2500       63
                     751- 800    1926             >2500          445
                     801- 850    1541
                     851- 900    1697
                     901- 950    1183
                     951-1000     999
                    
                    
                    The average sequence length in Swiss-Prot is 365 amino acids.
                    
                    The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
                    The longest sequence is  SYNE1_HUMAN (Q8NF91):  8797 amino acids.
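
The reported average length is consistent with dividing the total number of amino acids by the total number of entries, both taken from the introduction. The Python sketch below assumes that this is how the average is defined; the release notes do not state the formula.

```python
total_amino_acids = 61_443_278  # release 46.0
total_entries = 168_297         # release 46.0

# Mean sequence length over all entries, fragments included.
average_length = total_amino_acids / total_entries

print(f"{average_length:.2f}")
```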
                    
                    
                    5.  JOURNAL CITATIONS
                    
                    Note: the following citation statistics reflect the number of distinct
                    journal citations.
                    
                    Total number of journals cited in this release of Swiss-Prot: 1551
                    
                    
                    5.1 Table of the frequency of journal citations
                    
                    Journals cited 1x:  567
                                   2x:  212
                                   3x:  102
                                   4x:   68
                                   5x:   62
                                   6x:   34
                                   7x:   33
                                   8x:   30
                                   9x:   20
                                  10x:   17
                              11- 20x:  118
                              21- 50x:  123
                              51-100x:   55
                                >100x:  110
                    
                    
                    5.2  List of the most cited journals in Swiss-Prot
                    
                    Nb    Citations   Journal name
                    --    ---------   -------------------------------------------------------------
                    1        11442   Journal of Biological Chemistry
                    2         5878   Proceedings of the National Academy of Sciences of the U.S.A.
                    3         4050   Journal of Bacteriology
                    4         3813   Nucleic Acids Research
                    5         3789   Gene
                    6         3152   Biochemical and Biophysical Research Communications
                    7         3125   FEBS Letters
                    8         2802   Biochemistry
                    9         2751   European Journal of Biochemistry
                    10         2612   The EMBO Journal
                    11         2403   Nature
                    12         2358   Biochimica et Biophysica Acta
                    13         2134   Journal of Molecular Biology
                    14         2031   Genomics
                    15         1927   Molecular and Cellular Biology
                    16         1912   Cell
                    17         1542   Biochemical Journal
                    18         1422   Science
                    19         1268   Molecular Microbiology
                    20         1216   Plant Molecular Biology
                    21         1209   Molecular and General Genetics
                    22          980   Journal of Biochemistry
                    23          936   Journal of Cell Biology
                    24          914   Virology
                    25          910   Human Molecular Genetics
                    26          838   Nature Genetics
                    27          762   Genes and Development
                    28          751   Journal of Virology
                    29          722   The American Journal of Human Genetics
                    30          714   Oncogene
                    31          687   Plant Physiology
                    32          683   Human Mutation
                    33          631   Journal of Immunology
                    34          620   Infection and Immunity
                    35          612   Archives of Biochemistry and Biophysics
                    36          601   Yeast
                    37          587   Structure
                    38          553   Journal of General Virology
                    39          538   Development
                    40          529   Microbiology
                    41          505   FEMS Microbiology Letters
                    42          489   Genetics
                    43          480   Nature Structural Biology
                    44          442   Human Genetics
                    45          441   Blood
                    46          427   Current Genetics
                    47          386   Molecular and Biochemical Parasitology
                    48          375   Applied and Environmental Microbiology
                    49          361   Journal of Clinical Investigation
                    50          350   Developmental Biology
                    51          348   Mammalian Genome
                    52          346   Molecular Endocrinology
                    53          344   Protein Science
                    54          340   Cancer Research
                    55          338   Molecular Biology of the Cell
                    56          330   Immunogenetics
                    57          326   The Plant Cell
                    58          324   Acta Crystallographica, Section D
                    59          321   Mechanisms of Development
                    60          319   Neuron
                    61          314   The Journal of Experimental Medicine
                    62          312   Journal of Molecular Evolution
                    63          307   DNA and Cell Biology
                    64          306   Journal of Cell Science
                    65          282   Biological Chemistry Hoppe-Seyler
                    66          277   Journal of Neuroscience
                    67          277   The Plant Journal
                    68          276   Endocrinology
                    69          268   DNA Sequence
                    70          254   Journal of Neurochemistry
                    71          243   Molecular Cell
                    72          239   Journal of General Microbiology
                    73          237   Brain Research. Molecular Brain Research
                    74          236   Molecular Biology and Evolution
                    75          235   The Journal of Clinical Endocrinology and Metabolism
                    76          225   Toxicon
                    77          218   Current Biology
                    78          217   Bioscience, Biotechnology, and Biochemistry
                    79          214   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
                    80          212   American Journal of Physiology
                    81          210   Cytogenetics and Cell Genetics
                    82          205   Comparative Biochemistry and Physiology
                    83          186   Molecular Pharmacology
                    84          180   Antimicrobial Agents and Chemotherapy
                    85          164   Proteins
                    86          159   Journal of Investigative Dermatology
                    87          158   DNA
                    88          156   Journal of Medical Genetics
                    89          154   DNA Research
                    90          151   Peptides
                    91          149   Tissue Antigens
                    92          146   Molecular Plant-Microbe Interactions
                    93          146   Genome Research
                    94          146   Virus Research
                    95          143   American Journal of Medical Genetics
                    96          141   Biochimie
                    97          138   Bioorganicheskaia Khimiia
                    98          135   Hemoglobin
                    99          130   European Journal of Immunology
                    100          129   Molecular and Cellular Endocrinology
                    101          126   Biology of Reproduction
                    102          123   Plant and Cell Physiology
                    103          116   Agricultural and Biological Chemistry
                    104          115   Insect Biochemistry and Molecular Biology
                    105          109   Archives of Microbiology
                    106          105   General and Comparative Endocrinology
                    107          105   Annals of Neurology
                    108          103   Diabetes
                    109          101   European Journal of Human Genetics
                    110          101   Molecular Phylogenetics and Evolution
                    
                    
                    6.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some Swiss-Prot lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
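
The "Average per entry" values in this table appear to be the total number of lines divided by all 168297 entries in the release, not by the number of entries that carry such a line. The Python sketch below checks that reading against a few rows; `average_per_entry` is a hypothetical helper and the interpretation is an assumption.

```python
TOTAL_ENTRIES = 168_297  # release 46.0


def average_per_entry(total_lines: int) -> float:
    """Lines of one type per database entry, rounded to two decimals."""
    return round(total_lines / TOTAL_ENTRIES, 2)


# (total number of lines, average printed in the table)
rows = {
    "References (RL)": (331_500, 1.97),
    "Journal": (295_405, 1.76),
    "Comments (CC)": (610_556, 3.63),
    "SIMILARITY": (174_573, 1.04),
    "Features (FT)": (951_134, 5.65),
}

for name, (total, printed) in rows.items():
    print(f"{name:<16} computed {average_per_entry(total):.2f}  printed {printed:.2f}")
```

Dividing by the "Number of entries" column instead would give the average among annotated entries only, which is a different and larger figure.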
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                     331500              1.97
                    Journal                          295405    158585    1.76
                    Submitted to EMBL/GenBank/DDBJ    33350     28547    0.20
                    Submitted to Swiss-Prot             619       616   <0.01
                    Plant Gene Register                 495       484   <0.01
                    Book citation                       483       471   <0.01
                    Unpublished observations            444       440   <0.01
                    Thesis                              280       278   <0.01
                    Submitted to other databases        217       214   <0.01
                    Patent                              118       116   <0.01
                    Unpublished results                  83        81   <0.01
                    Worm Breeder's Gazette                6         6   <0.01
                    
                    Comments (CC)                       610556              3.63
                    SIMILARITY                       174573    149217    1.04
                    FUNCTION                         110731    108250    0.66
                    SUBCELLULAR LOCATION              81853     81853    0.49
                    CATALYTIC ACTIVITY                59345     55679    0.35
                    SUBUNIT                           53482     53481    0.32
                    PATHWAY                           28898     27301    0.17
                    COFACTOR                          20107     20107    0.12
                    TISSUE SPECIFICITY                18762     18762    0.11
                    PTM                               11372     10101    0.07
                    MISCELLANEOUS                      9674      8890    0.06
                    DOMAIN                             6951      6128    0.04
                    ALTERNATIVE PRODUCTS               6544      6544    0.04
                    CAUTION                            5775      5209    0.03
                    INDUCTION                          4721      4721    0.03
                    DEVELOPMENTAL STAGE                4413      4413    0.03
                    DISEASE                            2843      2087    0.02
                    INTERACTION                        2606      2606    0.02
                    ENZYME REGULATION                  2397      2397    0.01
                    MASS SPECTROMETRY                  1600      1406    0.01
                    DATABASE                           1481      1399    0.01
                    BIOPHYSICOCHEMICAL PROPERTIES       793       793   <0.01
                    POLYMORPHISM                        496       484   <0.01
                    ALLERGEN                            375       375   <0.01
                    RNA EDITING                         340       340   <0.01
                    TOXIC DOSE                          263       262   <0.01
                    BIOTECHNOLOGY                       110       110   <0.01
                    PHARMACEUTICAL                       51        51   <0.01
                    
                    Features (FT)                       951134              5.65
                    DOMAIN                           137509     42734    0.82
                    TRANSMEM                         106696     23186    0.63
                    CONFLICT                          64076     22398    0.38
                    METAL                             63755     15800    0.38
                    TURN                              62445      4663    0.37
                    STRAND                            57248      4166    0.34
                    CARBOHYD                          56975     14081    0.34
                    DISULFID                          52591     13918    0.31
                    HELIX                             45087      4520    0.27
                    ACT_SITE                          38281     22904    0.23
                    REPEAT                            36216      5152    0.22
                    VARIANT                           31599      6000    0.19
                    CHAIN                             28442     23157    0.17
                    NP_BIND                           23975     16553    0.14
                    MOD_RES                           19066     10178    0.11
                    SIGNAL                            18062     18060    0.11
                    SITE                              15265      9051    0.09
                    BINDING                           14746      9725    0.09
                    VARSPLIC                          13053      5755    0.08
                    ZN_FING                           10948      4044    0.07
                    NON_TER                           10907      8300    0.06
                    MUTAGEN                            9579      2606    0.06
                    INIT_MET                           7510      7464    0.04
                    PROPEP                             5846      4942    0.03
                    DNA_BIND                           5179      4872    0.03
                    LIPID                              5121      3374    0.03
                    PEPTIDE                            3563      1599    0.02
                    TRANSIT                            3059      3032    0.02
                    CA_BIND                            2236       902    0.01
                    NON_CONS                           1008       495    0.01
                    CROSSLNK                            517       408   <0.01
                    UNSURE                              383       156   <0.01
                    SE_CYS                              191       134   <0.01
                    
                    Cross-references (DR)              1666608              9.90
                    InterPro                         341849    151755    2.03
                    EMBL                             327282    160878    1.94
                    Pfam                             196363    144251    1.17
                    PROSITE                          150504     93796    0.89
                    PIR                               91827     84791    0.55
                    GO                                75177     21332    0.45
                    HSSP                              69476     69476    0.41
                    PRINTS                            60403     49140    0.36
                    TIGRFAMs                          52285     48770    0.31
                    HAMAP                             50708     50601    0.30
                    ProDom                            45407     43563    0.27
                    SMART                             41802     31654    0.25
                    PDB                               24775      6745    0.15
                    Ensembl                           22719     22718    0.13
                    TIGR                              16617     16155    0.10
                    Genew                             10935     10875    0.06
                    MIM                               10379      8553    0.06
                    MGD                                8327      8284    0.05
                    IntAct                             7447      7447    0.04
                    SGD                                5092      5031    0.03
                    PIRSF                              5008      5001    0.03
                    GermOnline                         4927      4877    0.03
                    EcoGene                            4225      4223    0.03
                    EchoBASE                           4159      4127    0.02
                    H-InvDB                            3677      3659    0.02
                    MEROPS                             3598      3507    0.02
                    WormPep                            2990      2612    0.02
                    RGD                                2886      2883    0.02
                    FlyBase                            2747      2723    0.02
                    GeneDB_SPombe                      2740      2710    0.02
                    TRANSFAC                           2737      2455    0.02
                    SubtiList                          2717      2716    0.02
                    WormBase                           2672      2597    0.02
                    TubercuList                        1427      1391    0.01
                    StyGene                            1420      1417    0.01
                    SWISS-2DPAGE                       1121      1121    0.01
                    ListiList                           989       966    0.01
                    Reactome                            717       717   <0.01
                    GeneFarm                            625       624   <0.01
                    Leproma                             616       612   <0.01
                    Gramene                             569       564   <0.01
                    MaizeDB                             419       414   <0.01
                    ZFIN                                387       380   <0.01
                    PhotoList                           372       372   <0.01
                    HIV                                 370       354   <0.01
                    REBASE                              366       361   <0.01
                    OGP                                 364       364   <0.01
                    ECO2DBASE                           351       299   <0.01
                    DictyBase                           325       323   <0.01
                    GlycoSuiteDB                        282       282   <0.01
                    SagaList                            260       259   <0.01
                    PHCI-2DPAGE                         239       239   <0.01
                    AGD                                 200       194   <0.01
                    MypuList                            170       170   <0.01
                    Aarhus/Ghent-2DPAGE                 128        98   <0.01
                    Siena-2DPAGE                        103       103   <0.01
                    HSC-2DPAGE                           85        85   <0.01
                    COMPLUYEAST-2DPAGE                   59        59   <0.01
                    PhosSite                             54        54   <0.01
                    PMMA-2DPAGE                          52        52   <0.01
                    Maize-2DPAGE                         39        39   <0.01
                    Rat-heart-2DPAGE                     28        28   <0.01
                    ANU-2DPAGE                           14        14   <0.01
                    
                    Number of explicitly cross-referenced databases: 64
                    Number of implicitly cross-referenced databases: 32
                    
                    
                    7.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in Swiss-Prot: 196818
                    
                    Total number of entries encoded on a chloroplast: 3804
                    Total number of entries encoded on a mitochondrion: 2971
                    Total number of entries encoded on a cyanelle: 145
                    Total number of entries encoded on a plasmid: 2902
                    
                    Number of fragments: 8457
                    Number of additional sequences encoded on splice variants: 10003
                    
                

UniProt/TrEMBL protein database release 29.0 statistics

                    
                    1.  INTRODUCTION
                    
                    Release 29.0 of 01-Feb-2005 of UniProt/TrEMBL has been produced in sync
                    with UniProt/Swiss-Prot release 46 and with EMBL/DDBJ/GenBank nucleotide
                    sequence database release 81 and its updates until 22-Jan-2005. It contains
                    1'589'670 sequence entries, comprising 497'792'130 amino acids.
                    
                    153'776 sequences have been added since release 28, and the sequence and 
                    annotation data of 115'996 entries have been updated. This represents an 
                    increase of 11.24%.
                    
                    In the document delac_tr.txt, you will find a list of all accession numbers
                    which were previously present in UniProt/TrEMBL, but which have now been
                    deleted from the database. Most deletions result from the removal of the
                    corresponding CDS in the source nucleotide sequence databases
                    EMBL-Bank/DDBJ/GenBank. In addition, some entries are recognised to be open
                    reading frames (ORFs) that have been wrongly predicted to code for proteins.
                    When there is enough evidence that these hypothetical proteins are not real,
                    we remove them from TrEMBL.
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 7.78   Gln (Q) 3.87   Leu (L) 9.74   Ser (S) 7.04
                    Arg (R) 5.32   Glu (E) 6.07   Lys (K) 5.54   Thr (T) 5.73
                    Asn (N) 4.44   Gly (G) 6.93   Met (M) 2.41   Trp (W) 1.37
                    Asp (D) 5.10   His (H) 2.27   Phe (F) 4.14   Tyr (Y) 3.14
                    Cys (C) 1.50   Ile (I) 6.01   Pro (P) 4.93   Val (V) 6.50
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.07
                    
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Ser, Gly, Val, Glu, Ile, Thr, Lys, Arg, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Cys, Trp
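
The ordering in section 2.2 follows directly from the percentages in section 2.1. As a minimal sketch (the dictionary and function names are illustrative and not part of any UniProt tool), the ranking can be reproduced by sorting the composition table in descending order:

```python
# Percent composition copied from section 2.1 (UniProt/TrEMBL release 29.0).
# The ambiguous codes Asx, Glx and Xaa are left out on purpose.
COMPOSITION = {
    "Ala": 7.78, "Gln": 3.87, "Leu": 9.74, "Ser": 7.04,
    "Arg": 5.32, "Glu": 6.07, "Lys": 5.54, "Thr": 5.73,
    "Asn": 4.44, "Gly": 6.93, "Met": 2.41, "Trp": 1.37,
    "Asp": 5.10, "His": 2.27, "Phe": 4.14, "Tyr": 3.14,
    "Cys": 1.50, "Ile": 6.01, "Pro": 4.93, "Val": 6.50,
}


def rank_by_frequency(composition):
    """Return residue names sorted from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


ranking = rank_by_frequency(COMPOSITION)
print(", ".join(ranking))
```

No two residues share the same percentage in this release, so the sort order is unambiguous.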
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of 
                    UniProt/TrEMBL: 84064
                    
                    The first twenty species represent 477233 sequences: 30 % of the
                    total number of entries.
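
The 30 % figure can be checked against the species table in section 3.2 and the entry total given in the introduction. The snippet below is only an arithmetic sketch; the variable names are illustrative.

```python
# Entry counts of the 20 most represented species, copied from table 3.2.
TOP_20_COUNTS = [
    121308, 50385, 48975, 38332, 38286, 24152, 21503, 19983, 15229, 13214,
    10987, 10842, 10664, 8177, 8088, 7810, 7578, 7429, 7194, 7097,
]
# Total number of sequence entries stated in the introduction.
TOTAL_ENTRIES = 1_589_670

top_20_total = sum(TOP_20_COUNTS)
share_percent = 100 * top_20_total / TOTAL_ENTRIES
print(top_20_total, round(share_percent, 1))
```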
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented 1x: 41727
                                        2x: 15907
                                        3x:  8040
                                        4x:  4247
                                        5x:  2466
                                        6x:  1872
                                        7x:  1230
                                        8x:  1067
                                        9x:   853
                                       10x:   642
                                   11- 20x:  2798
                                   21- 50x:  1662
                                   51-100x:   684
                                     >100x:   869
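
As a consistency check, the bins of the table above should add up to the total number of species given at the start of section 3. A short sketch, with illustrative names:

```python
# Number of species per occurrence bin, copied from table 3.1.
SPECIES_PER_BIN = {
    "1x": 41727, "2x": 15907, "3x": 8040, "4x": 4247, "5x": 2466,
    "6x": 1872, "7x": 1230, "8x": 1067, "9x": 853, "10x": 642,
    "11-20x": 2798, "21-50x": 1662, "51-100x": 684, ">100x": 869,
}

total_species = sum(SPECIES_PER_BIN.values())
# Share of species that are represented by a single entry only.
singleton_percent = 100 * SPECIES_PER_BIN["1x"] / total_species
print(total_species, round(singleton_percent, 1))
```

Roughly half of all species in this release are therefore represented by exactly one sequence.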
                    
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     121308  Human immunodeficiency virus 1
                    2      50385  Homo sapiens (Human)
                    3      48975  Oryza sativa (japonica cultivar-group)
                    4      38332  Arabidopsis thaliana (Mouse-ear cress)
                    5      38286  Mus musculus (Mouse)
                    6      24152  Drosophila melanogaster (Fruit fly)
                    7      21503  Hepatitis C virus
                    8      19983  Caenorhabditis elegans
                    9      15229  Anopheles gambiae str. PEST
                    10      13214  Caenorhabditis briggsae
                    11      10987  Neurospora crassa
                    12      10842  Brachydanio rerio (Zebrafish) (Danio rerio)
                    13      10664  Xenopus laevis (African clawed frog)
                    14       8177  Bradyrhizobium japonicum
                    15       8088  Rattus norvegicus (Rat)
                    16       7810  Plasmodium yoelii yoelii
                    17       7578  Streptomyces coelicolor
                    18       7429  Streptomyces avermitilis
                    19       7194  Rhizobium loti (Mesorhizobium loti)
                    20       7097  Rhodopirellula baltica
                    21       7015  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    22       6822  Hepatitis B virus
                    23       6494  Yarrowia lipolytica (Candida lipolytica)
                    24       6397  Giardia lamblia ATCC 50803
                    25       6369  Pseudomonas aeruginosa
                    26       6318  Bacillus anthracis
                    27       6265  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
                    28       6084  Escherichia coli
                    29       5951  uncultured bacterium
                    30       5911  Nocardia farcinica
                    31       5857  Burkholderia pseudomallei (Pseudomonas pseudomallei)
                    32       5692  Rhizobium meliloti (Sinorhizobium meliloti)
                    33       5672  Bacillus cereus (strain ATCC 10987)
                    34       5573  Anabaena sp. (strain PCC 7120)
                    35       5242  Photobacterium profundum (Photobacterium sp. (strain SS9))
                    36       5231  Plasmodium falciparum (isolate 3D7)
                    37       5229  Kluyveromyces lactis (Yeast)
                    38       5137  Candida glabrata (Yeast) (Torulopsis glabrata)
                    39       5096  Bacillus cereus (strain ZK)
                    40       5095  Helicobacter pylori (Campylobacter pylori)
                    41       5017  Bacillus thuringiensis (subsp. konkukian)
                    42       4993  Pseudomonas syringae (pv. tomato)
                    43       4941  Escherichia coli O157:H7
                    44       4847  Bacillus cereus (strain ATCC 14579 / DSM 31)
                    45       4846  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
                    46       4832  Gallus gallus (Chicken)
                    47       4824  Bacteroides fragilis
                    48       4800  Pseudomonas putida (strain KT2440)
                    49       4753  Yersinia pestis
                    50       4723  Ralstonia solanacearum (Pseudomonas solanacearum)
                    51       4689  Rhodopseudomonas palustris
                    52       4634  Bacteroides thetaiotaomicron
                    53       4628  Pongo pygmaeus (Orangutan)
                    54       4623  Leptospira interrogans
                    55       4585  Vibrio vulnificus (strain YJ016)
                    56       4526  Ashbya gossypii ATCC 10895
                    57       4515  Burkholderia mallei (Pseudomonas mallei)
                    58       4496  Azoarcus sp. (strain EbN1)
                    59       4419  Erwinia carotovora (subsp. atroseptica) (Pectobacterium atrosepticum)
                    60       4395  Vibrio parahaemolyticus
                    61       4317  Mycobacterium tuberculosis
                    62       4291  Mycobacterium paratuberculosis
                    63       4233  Silicibacter pomeroyi DSS-3
                    64       4198  Gloeobacter violaceus
                    65       4188  Photorhabdus luminescens (subsp. laumondii)
                    66       4168  Shewanella oneidensis
                    67       4158  Haloarcula marismortui (Halobacterium marismortui)
                    68       4130  Chromobacterium violaceum
                    69       4124  Yersinia pseudotuberculosis
                    70       4094  Bacillus licheniformis (strain DSM 13 / ATCC 14580)
                    71       4072  Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150
                    72       4069  Methanosarcina acetivorans
                    73       4067  Bacillus clausii (strain KSM-K16)
                    74       4060  Salmonella typhi
                    75       4029  Vibrio vulnificus
                    76       3973  Escherichia coli O6
                    77       3941  Vibrio cholerae
                    78       3920  Xanthomonas axonopodis (pv. citri)
                    79       3894  Bordetella parapertussis
                    80       3858  Plasmodium falciparum
                    81       3843  Bacillus licheniformis
                    82       3839  Corynebacterium glutamicum (Brevibacterium flavum)
                    83       3777  Salmonella typhimurium
                    84       3771  Oryza sativa (Rice)
                    85       3768  Shigella flexneri
                    86       3759  Listeria monocytogenes
                    87       3716  Xanthomonas campestris (pv. campestris)
                    88       3570  Enterococcus faecalis (Streptococcus faecalis)
                    89       3567  Bacillus halodurans
                    90       3552  Leptospira interrogans (serogroup Icterohaemorrhagiae / serovar Copenhageni)
                    91       3535  Bdellovibrio bacteriovorus
                    92       3511  Geobacillus kaustophilus HTA426
                    93       3487  TT virus
                    94       3441  Streptococcus pneumoniae
                    95       3415  Clostridium acetobutylicum
                    96       3393  Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
                    97       3325  Caulobacter crescentus
                    98       3289  Geobacter sulfurreducens
                    99       3283  Symbiobacterium thermophilum
                    100       3269  Chimpanzee immunodeficiency virus (SIV(cpz)) (CIV)
                    
                    3.3  Distribution of the sequences by sections
                    
                    Division      sequences (% of the database)
                    archaea           43134 ( 2.7%)
                    fungi             62926 ( 4%)
                    human             50385 ( 3.2%)
                    invertebrates    184252 ( 11.6%)
                    mammals           34073 ( 2.1%)
                    plants           179409 ( 11.3%)
                    bacteria         605632 ( 38.1%)
                    rodents           55021 ( 3.5%)
                    unclassified       1045 ( 0%)
                    viruses          288453 ( 18%)
                    vertebrates       85041 ( 5.3%)
                    
                    
                    4.  SEQUENCE SIZE
                    
                    4.1  Repartition of the sequences by size (excluding fragments)
                    
                    From   To  Number             From   To   Number
                    1-  50   18352             1001-1100     8773
                    51- 100   95681             1101-1200     6260
                    101- 150  118102             1201-1300     4728
                    151- 200  109209             1301-1400     3046
                    201- 250  110494             1401-1500     2514
                    251- 300  102539             1501-1600     1730
                    301- 350   99602             1601-1700     1359
                    351- 400   80912             1701-1800     1189
                    401- 450   62563             1801-1900      944
                    451- 500   54264             1901-2000      791
                    501- 550   42499             2001-2100      607
                    551- 600   29474             2101-2200      733
                    601- 650   22620             2201-2300      612
                    651- 700   17682             2301-2400      494
                    701- 750   14980             2401-2500      322
                    751- 800   12273             >2500         3046
                    801- 850   10415
                    851- 900    9233
                    901- 950    6740
                    951-1000    5475
                    
                    
                    4.2  Longest and shortest sequences
                    
                    The shortest sequence is Q16047:     4 amino acids.
                    The longest sequence is  Q8WZ42: 34350 amino acids.
                    
                    
                    5.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProt/TrEMBL 
                    lines, as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                    2220491              1.40
                    Journal                         1395141   1163227    0.88
                    Submitted to EMBL/GenBank/DDBJ   816582    627835    0.51
                    Thesis                             4582      4530   <0.01
                    Book citation                      3718      3674   <0.01
                    Submitted to other databases        452       444   <0.01
                    Unpublished results                  10        10   <0.01
                    Unpublished observations              4         4   <0.01
                    Plant Gene Register                   1         1   <0.01
                    Patent                                1         1   <0.01
                    
                    Comments (CC)                       835627              0.53
                    SIMILARITY                       222175    218793    0.14
                    FUNCTION                         143297    142581    0.09
                    CATALYTIC ACTIVITY               136440    123511    0.09
                    SUBCELLULAR LOCATION             126593    126592    0.08
                    SUBUNIT                           65266     65258    0.04
                    CAUTION                           47416     47413    0.03
                    PATHWAY                           42505     42266    0.03
                    COFACTOR                          38630     38630    0.02
                    INTERACTION                        5097      5097   <0.01
                    MISCELLANEOUS                      4142      4125   <0.01
                    DOMAIN                             3454      3262   <0.01
                    ALLERGEN                            163       163   <0.01
                    TISSUE SPECIFICITY                  138       138   <0.01
                    MASS SPECTROMETRY                   121        65   <0.01
                    DEVELOPMENTAL STAGE                  55        55   <0.01
                    INDUCTION                            45        45   <0.01
                    PTM                                  38        37   <0.01
                    ALTERNATIVE PRODUCTS                 38        38   <0.01
                    ENZYME REGULATION                     8         8   <0.01
                    POLYMORPHISM                          3         3   <0.01
                    DISEASE                               3         3   <0.01
                    
                    Features (FT)                       951302              0.60
                    NON_TER                          895245    527251    0.56
                    CHAIN                             39563     23647    0.02
                    SIGNAL                            12522     12311    0.01
                    NON_CONS                            929       432   <0.01
                    TRANSIT                             582       578   <0.01
                    CARBOHYD                            580       100   <0.01
                    DOMAIN                              520       168   <0.01
                    SE_CYS                              318       168   <0.01
                    TRANSMEM                            229        52   <0.01
                    REPEAT                              169        23   <0.01
                    CONFLICT                            164        27   <0.01
                    DISULFID                             98        34   <0.01
                    VARSPLIC                             77        31   <0.01
                    VARIANT                              53        13   <0.01
                    METAL                                43        17   <0.01
                    ACT_SITE                             43        29   <0.01
                    UNSURE                               33        14   <0.01
                    DNA_BIND                             30        24   <0.01
                    NP_BIND                              23        19   <0.01
                    MOD_RES                              22        12   <0.01
                    ZN_FING                              16         8   <0.01
                    PROPEP                               15        12   <0.01
                    SITE                                 10        10   <0.01
                    CA_BIND                               4         3   <0.01
                    PEPTIDE                               4         4   <0.01
                    BINDING                               3         3   <0.01
                    LIPID                                 3         2   <0.01
                    MUTAGEN                               3         2   <0.01
                    INIT_MET                              1         1   <0.01
                    
                    Cross-references (DR)             11393181              7.17
                    GO                              3490371   1018322    2.20
                    InterPro                        2053199   1165127    1.29
                    EMBL                            1851113   1583287    1.16
                    Pfam                            1456963   1099139    0.92
                    PROSITE                          748989    488427    0.47
                    PRINTS                           316136    262369    0.20
                    HSSP                             295204    294924    0.19
                    SMART                            273636    211019    0.17
                    PIR                              198843    163073    0.13
                    ProDom                           190432    182879    0.12
                    TIGRFAMs                         161550    149520    0.10
                    TIGR                              83793     77785    0.05
                    Ensembl                           75459     75444    0.05
                    Gramene                           45809     45808    0.03
                    MGD                               25480     25478    0.02
                    FlyBase                           23005     22734    0.01
                    WormPep                           19282     19203    0.01
                    WormBase                          19270     19203    0.01
                    PIRSF                              9497      9497    0.01
                    MEROPS                             8679      8395    0.01
                    ZFIN                               6174      6171   <0.01
                    IntAct                             5438      5438   <0.01
                    ListiList                          4826      4809   <0.01
                    AGD                                4491      4491   <0.01
                    PhotoList                          4309      4185   <0.01
                    Genew                              3568      3568   <0.01
                    PDB                                2945      1720   <0.01
                    RGD                                2594      2579   <0.01
                    TubercuList                        2497      2491   <0.01
                    GeneDB_SPombe                      2236      2221   <0.01
                    SagaList                           1834      1740   <0.01
                    SGD                                1435      1434   <0.01
                    TRANSFAC                           1042      1028   <0.01
                    Leproma                             991       989   <0.01
                    DictyBase                           980       980   <0.01
                    MypuList                            612       608   <0.01
                    REBASE                              126       121   <0.01
                    PHCI-2DPAGE                         108       108   <0.01
                    SWISS-2DPAGE                         98        98   <0.01
                    ANU-2DPAGE                           74        74   <0.01
                    Reactome                             34        34   <0.01
                    OGP                                  29        29   <0.01
                    PhosSite                             12        12   <0.01
                    MIM                                  12        11   <0.01
                    PMMA-2DPAGE                           3         3   <0.01
                    Siena-2DPAGE                          2         2   <0.01
                    COMPLUYEAST-2DPAGE                    1         1   <0.01
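
The "Average per entry" column in the table above appears to be the total number of lines divided by the total number of entries in the release (1'589'670). The sketch below reproduces a few rows under that reading. The function names are illustrative, and the rule that values rounding to 0.00 are printed as "<0.01" is an assumption inferred from the table, not a documented convention.

```python
# Total number of UniProt/TrEMBL entries in release 29.0 (from the introduction).
TOTAL_ENTRIES = 1_589_670


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Average number of lines of a given type per database entry."""
    return total_lines / total_entries


def format_average(value):
    """Format as in the table: two decimals, or '<0.01' for tiny values.

    Assumption: anything that would round to 0.00 is shown as '<0.01'.
    """
    return "<0.01" if value < 0.005 else f"{value:.2f}"


# A few line totals copied from the table above.
for label, total in [
    ("References (RL)", 2220491),
    ("Comments (CC)", 835627),
    ("PIRSF", 9497),
    ("Thesis", 4582),
]:
    print(f"{label:<20}{format_average(average_per_entry(total)):>6}")
```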
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in UniProt/TrEMBL: 205506
                    
                    Total number of entries encoded on a chloroplast: 39087
                    Total number of entries encoded on a mitochondrion: 91928
                    Total number of entries encoded on a plasmid: 32361
                    
                    Number of additional sequences encoded on splice variants: 57
                    
                

Submissions and Updates

We welcome feedback from our users. We would especially appreciate your notifying us if you find that sequences belonging to your field of expertise are missing from the database. We also would like to be notified about annotations to be updated, if, for example, the function of a protein has been clarified or if new information about post-translational modifications has become available.

Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml

For all queries regarding submissions to UniProt and to submit new protein sequence data, please contact:

UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail:


Download information

Bi-Weekly releases

The latest data of the UniProt Knowledgebase are available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data are further supplemented by two files containing the sequences of all additional splice isoforms annotated in UniProt/Swiss-Prot and UniProt/TrEMBL. These data sets are documented in the file ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/README.varsplic

Major releases

For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 4 times per year) in flatfile format. Previous UniProt/Swiss-Prot and UniProt/TrEMBL releases are archived under ftp://ftp.uniprot.org/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on CD-ROM from the EBI.


Contact

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address:
WWW server: http://www.ebi.ac.uk/


SIB Swiss Institute of Bioinformatics
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 702 50 50
Fax: (+41 22) 702 58 58
Electronic mail address:
WWW server: http://www.expasy.org/


Protein Information Resource (PIR)
Georgetown University Medical Center
3900 Reservoir Road, NW
Box 571455
Washington, DC 20057-1455
United States of America

Telephone: (+1 202) 687 1039
Fax: (+1 202) 687 0057
Electronic mail address:
WWW server: http://pir.georgetown.edu

Citation

If you want to cite UniProt in a publication, please use the following reference:

Bairoch A., Apweiler R., Wu C.H., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J., Natale D.A., O'Donovan C., Redaschi N., Yeh L.S., The Universal Protein Resource (UniProt), Nucleic Acids Res. 33: D154-D159 (2005).