Release 6.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 48.0 and the UniProtKB/TrEMBL Protein Database release 31.0.

More information on these databases can be found in the user manual section "What is the UniProt Knowledgebase?".


UniProtKB/Swiss-Prot protein knowledgebase release 48.0 statistics

Release 48.0 of 13-Sep-2005 of Swiss-Prot contains 194'317 sequence entries, comprising 70'391'852 amino acids abstracted from 133'723 references.

The growth of the database is summarized below.

Release Date Number of entries Number of amino acids
2.0 09/86 3'939 900'163
3.0 11/86 4'160 969'641
4.0 04/87 4'387 1'036'010
5.0 09/87 5'205 1'327'683
6.0 01/88 6'102 1'653'982
7.0 04/88 6'821 1'885'771
8.0 08/88 7'724 2'224'465
9.0 11/88 8'702 2'498'140
10.0 03/89 10'008 2'952'613
11.0 07/89 10'856 3'265'966
12.0 10/89 12'305 3'797'482
13.0 01/90 13'837 4'347'336
14.0 04/90 15'409 4'914'264
15.0 08/90 16'941 5'486'399
16.0 11/90 18'364 5'986'949
17.0 02/91 20'024 6'524'504
18.0 05/91 20'772 6'792'034
19.0 08/91 21'795 7'173'785
20.0 11/91 22'654 7'500'130
21.0 03/92 23'742 7'866'596
22.0 05/92 25'044 8'375'696
23.0 08/92 26'706 9'011'391
24.0 12/92 28'154 9'545'427
25.0 04/93 29'955 10'214'020
26.0 07/93 31'808 10'875'091
27.0 10/93 33'329 11'484'420
28.0 02/94 36'000 12'496'420
29.0 06/94 38'303 13'464'008
30.0 10/94 40'292 14'147'368
31.0 02/95 43'470 15'335'248
32.0 11/95 49'340 17'385'503
33.0 02/96 52'205 18'531'384
34.0 10/96 59'021 21'210'389
35.0 11/97 69'113 25'083'768
36.0 07/98 74'019 26'840'295
37.0 12/98 77'977 28'268'293
38.0 07/99 80'000 29'085'965
39.0 05/00 86'593 31'411'114
40.0 10/01 101'602 37'315'215
41.0 02/03 122'564 44'986'459
42.0 10/03 135'850 50'046'799
43.0 03/04 146'720 54'093'154
44.0 07/04 153'871 56'608'159
45.0 10/04 163'235 59'631'787
46.0 02/05 168'297 61'443'278
47.0 05/05 181'577 65'746'672
48.0 09/05 194'317 70'391'852
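
The figures in the table above use an apostrophe as the thousands separator. As a minimal sketch (the helper names `parse_count` and `relative_growth` are illustrative and not part of any UniProt tool), the following Python converts such figures to integers and computes the growth in entries between the last two releases listed:

```python
def parse_count(text: str) -> int:
    """Convert a figure such as "194'317" to the integer 194317."""
    return int(text.replace("'", ""))


def relative_growth(previous: int, current: int) -> float:
    """Return the growth from `previous` to `current` as a percentage."""
    return 100.0 * (current - previous) / previous


# Entry counts for releases 47.0 and 48.0, copied from the table above.
entries_47 = parse_count("181'577")
entries_48 = parse_count("194'317")

growth = relative_growth(entries_47, entries_48)
print(f"Entries grew by {growth:.1f}% between releases 47.0 and 48.0")
```

The same two helpers can be applied to the amino acid column, or to any other pair of releases in the table.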

In rare cases, Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.


Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

  • be as complete as possible. All sequences available at a given time should be immediately included in UniProtKB/Swiss-Prot. This also includes sequence corrections and updates;
  • provide a higher level of annotation;
  • provide cross-references to specialized database(s) that contain, among other data, some information about the genes that code for these proteins;
  • provide specific indexes and documents.

From our efforts to annotate human sequence entries as completely as possible arose the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms which are not covered by these two projects:

Organism Database cross-references Index file Number of sequences
A.thaliana TAIR arath.txt 3'609
C.albicans None yet calbican.txt 390
C.elegans Wormpep celegans.txt 2'667
D.discoideum DictyBase dicty.txt 325
D.melanogaster FlyBase fly.txt 2'273
M.musculus MGD mgdtosp.txt 9'933
S.cerevisiae SGD yeast.txt 5'139
S.pombe GeneDB_SPombe pombe.txt 2'840

UniProtKB/Swiss-Prot release statistics
                    
                    1.  INTRODUCTION
                    
                    Release 48.0 of 13-Sep-2005 of Swiss-Prot contains 194'317 sequence entries,
                    comprising 70'391'852 amino acids abstracted from 133'723 references. 
                    
                    11'963 sequences have been added since release 47, the sequence data of
                    1'095 existing entries has been updated and the annotations of
                    93'692 entries have been revised. This represents an increase of 7%.
                    
                    The growth of the database is summarized in the release table above.
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 7.83   Gln (Q) 3.94   Leu (L) 9.64   Ser (S) 6.85
                    Arg (R) 5.35   Glu (E) 6.63   Lys (K) 5.93   Thr (T) 5.42
                    Asn (N) 4.18   Gly (G) 6.94   Met (M) 2.37   Trp (W) 1.15
                    Asp (D) 5.32   His (H) 2.28   Phe (F) 4.00   Tyr (Y) 3.06
                    Cys (C) 1.53   Ile (I) 5.92   Pro (P) 4.83   Val (V) 6.72
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.01
                    
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Thr, Arg, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Cys, Trp
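
The ranking in section 2.2 follows directly from the percentages in section 2.1. A minimal sketch, assuming only the twenty standard residues are ranked (B, Z and X are left out) and using an illustrative dictionary name:

```python
# Composition in percent, copied from section 2.1.
composition = {
    "Ala": 7.83, "Gln": 3.94, "Leu": 9.64, "Ser": 6.85,
    "Arg": 5.35, "Glu": 6.63, "Lys": 5.93, "Thr": 5.42,
    "Asn": 4.18, "Gly": 6.94, "Met": 2.37, "Trp": 1.15,
    "Asp": 5.32, "His": 2.28, "Phe": 4.00, "Tyr": 3.06,
    "Cys": 1.53, "Ile": 5.92, "Pro": 4.83, "Val": 6.72,
}

# Sort residue names from most to least frequent.
ranked = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranked))
```

No two residues share the same percentage in this release, so the order does not depend on how ties are broken.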
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of Swiss-Prot: 9'479
                    
                    The first twenty species represent 66639 sequences:  34.3 % of the total
                    number of entries.
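
The share quoted above can be reproduced from the per-species counts. A minimal sketch, assuming the twenty counts are those of ranks 1 to 20 in table 3.2 and the total is the entry count of release 48.0 (variable names are illustrative):

```python
# Entry counts for the twenty most represented species (table 3.2).
top_twenty = [
    12860, 9933, 5139, 4846, 4570, 3609, 2840, 2814, 2667, 2273,
    1782, 1772, 1758, 1653, 1512, 1438, 1404, 1403, 1230, 1136,
]
total_entries = 194317  # entries in release 48.0

covered = sum(top_twenty)
share = 100.0 * covered / total_entries
print(f"{covered} sequences: {share:.1f} % of the total number of entries")
```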
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented 1x: 4552
                                        2x: 1489
                                        3x:  734
                                        4x:  476
                                        5x:  320
                                        6x:  281
                                        7x:  197
                                        8x:  156
                                        9x:  138
                                       10x:   78
                                   11- 20x:  382
                                   21- 50x:  287
                                   51-100x:  111
                                     >100x:  278
                    
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1      12860  Homo sapiens (Human)
                    2       9933  Mus musculus (Mouse)
                    3       5139  Saccharomyces cerevisiae (Baker's yeast)
                    4       4846  Escherichia coli
                    5       4570  Rattus norvegicus (Rat)
                    6       3609  Arabidopsis thaliana (Mouse-ear cress)
                    7       2840  Schizosaccharomyces pombe (Fission yeast)
                    8       2814  Bacillus subtilis
                    9       2667  Caenorhabditis elegans
                    10       2273  Drosophila melanogaster (Fruit fly)
                    11       1782  Methanococcus jannaschii
                    12       1772  Haemophilus influenzae
                    13       1758  Escherichia coli O157:H7
                    14       1653  Bos taurus (Bovine)
                    15       1512  Salmonella typhimurium
                    16       1438  Escherichia coli O6
                    17       1404  Shigella flexneri
                    18       1403  Mycobacterium tuberculosis
                    19       1230  Gallus gallus (Chicken)
                    20       1136  Mycobacterium bovis
                    21       1106  Salmonella typhi
                    22       1029  Pseudomonas aeruginosa
                    23       1001  Xenopus laevis (African clawed frog)
                    24        983  Sus scrofa (Pig)
                    25        964  Synechocystis sp. (strain PCC 6803)
                    26        964  Archaeoglobus fulgidus
                    27        823  Rhizobium meliloti (Sinorhizobium meliloti)
                    28        810  Vibrio cholerae
                    29        809  Yersinia pestis
                    30        770  Oryctolagus cuniculus (Rabbit)
                    31        746  Aquifex aeolicus
                    32        694  Pasteurella multocida
                    33        687  Mycoplasma pneumoniae
                    34        661  Pongo pygmaeus (Orangutan)
                    35        652  Vibrio parahaemolyticus
                    36        644  Streptomyces coelicolor
                    37        632  Bacillus halodurans
                    38        621  Mycobacterium leprae
                    39        608  Treponema pallidum
                    40        603  Canis familiaris (Dog)
                    41        599  Vibrio vulnificus
                    42        591  Staphylococcus aureus (strain Mu50 / ATCC 700699)
                    43        588  Staphylococcus aureus (strain N315)
                    44        587  Anabaena sp. (strain PCC 7120)
                    45        583  Methanobacterium thermoautotrophicum
                    46        578  Vibrio vulnificus (strain YJ016)
                    47        572  Buchnera aphidicola subsp. Acyrthosiphon pisum 
                    48        571  Staphylococcus aureus (strain MW2)
                    49        566  Oryza sativa (Rice)
                    50        563  Helicobacter pylori (Campylobacter pylori)
                    51        562  Buchnera aphidicola subsp. Schizaphis graminum
                    52        546  Pseudomonas putida (strain KT2440)
                    53        546  Rickettsia prowazekii
                    54        544  Helicobacter pylori J99 (Campylobacter pylori J99)
                    55        541  Pseudomonas syringae pv. tomato
                    56        531  Bacillus anthracis
                    57        528  Lactococcus lactis subsp. lactis (Streptococcus lactis)
                    58        528  Staphylococcus epidermidis
                    59        524  Bradyrhizobium japonicum
                    60        523  Brachydanio rerio (Zebrafish) (Danio rerio)
                    61        521  Zea mays (Maize)
                    62        517  Ralstonia solanacearum (Pseudomonas solanacearum)
                    63        513  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    64        512  Listeria monocytogenes
                    65        507  Buchnera aphidicola subsp. Baizongia pistaciae
                    66        506  Listeria innocua
                    67        500  Rhizobium loti (Mesorhizobium loti)
                    68        493  Xanthomonas campestris pv. campestris
                    69        493  Neisseria meningitidis serogroup B
                    70        490  Neisseria meningitidis serogroup A
                    71        488  Photorhabdus luminescens subsp. laumondii
                    72        486  Mycoplasma genitalium
                    73        485  Clostridium acetobutylicum
                    74        475  Caulobacter crescentus
                    75        467  Thermotoga maritima
                    76        462  Staphylococcus aureus (strain MRSA252)
                    77        461  Staphylococcus aureus (strain MSSA476)
                    78        459  Shewanella oneidensis
                    79        458  Bacillus cereus (strain ATCC 14579 / DSM 31)
                    80        456  Xanthomonas axonopodis pv. citri
                    81        453  Streptococcus pneumoniae
                    82        451  Pan troglodytes (Chimpanzee)
                    83        447  Xylella fastidiosa
                    84        441  Deinococcus radiodurans
                    85        440  Listeria monocytogenes serotype 4b (strain F2365)
                    86        437  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
                    87        436  Pyrococcus horikoshii
                    88        431  Chlamydia trachomatis
                    89        431  Pyrococcus abyssi
                    90        430  Methanosarcina acetivorans
                    91        426  Halobacterium salinarium (Halobacterium halobium)
                    92        423  Brucella melitensis
                    93        423  Brucella suis
                    94        422  Clostridium perfringens
                    95        421  Corynebacterium glutamicum (Brevibacterium flavum)
                    96        419  Oceanobacillus iheyensis
                    97        419  Haemophilus ducreyi
                    98        418  Borrelia burgdorferi (Lyme disease spirochete)
                    99        418  Neurospora crassa
                    100        417  Mimivirus
                    101        412  Chlamydia pneumoniae (Chlamydophila pneumoniae)
                    102        410  Methanosarcina mazei (Methanosarcina frisia)
                    103        404  Rhizobium sp. (strain NGR234)
                    104        402  Chlamydia muridarum
                    105        399  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
                    106        399  Yersinia pseudotuberculosis
                    107        398  Pyrococcus furiosus
                    108        390  Thermoanaerobacter tengcongensis
                    109        390  Candida albicans (Yeast)
                    110        389  Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
                    111        388  Lactobacillus plantarum
                    112        385  Campylobacter jejuni
                    113        384  Ovis aries (Sheep)
                    114        383  Sulfolobus solfataricus
                    115        375  Streptococcus mutans
                    116        372  Synechococcus elongatus (Thermosynechococcus elongatus)
                    117        370  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
                    118        369  Nicotiana tabacum (Common tobacco)
                    119        367  Rickettsia conorii
                    120        366  Streptococcus pyogenes serotype M1
                    121        363  Streptococcus pyogenes serotype M6
                    122        363  Bordetella pertussis
                    123        361  Enterococcus faecalis (Streptococcus faecalis)
                    124        360  Chromobacterium violaceum
                    125        360  Streptococcus pyogenes serotype M18
                    126        359  Bordetella parapertussis
                    127        359  Streptococcus pyogenes serotype M3
                    128        359  Streptomyces avermitilis
                    129        346  Chlorobium tepidum
                    130        338  Aeropyrum pernix
                    131        338  Staphylococcus aureus
                    132        332  Methanopyrus kandleri
                    133        330  Leptospira interrogans
                    134        329  Corynebacterium efficiens
                    135        329  Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
                    136        328  Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
                    137        325  Dictyostelium discoideum (Slime mold)
                    138        319  Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
                    139        313  Bacillus cereus (strain ATCC 10987)
                    140        313  Nitrosomonas europaea
                    141        313  Pisum sativum (Garden pea)
                    142        309  Staphylococcus aureus (strain COL)
                    143        309  Sulfolobus tokodaii
                    144        307  Kluyveromyces lactis (Yeast)
                    145        297  Streptococcus agalactiae serotype V
                    146        297  Streptococcus agalactiae serotype III
                    147        297  Thermoplasma acidophilum
                    148        296  Gloeobacter violaceus
                    149        294  Photobacterium profundum (Photobacterium sp. (strain SS9))
                    150        294  Ashbya gossypii (Yeast) (Eremothecium gossypii)
                    151        285  Triticum aestivum (Wheat)
                    152        280  Synechococcus sp. (strain WH8102)
                    153        280  Fusobacterium nucleatum subsp. nucleatum
                    154        279  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
                    155        278  Pseudomonas putida
                    156        273  Prochlorococcus marinus (strain MIT 9313)
                    157        273  Hordeum vulgare (Barley)
                    158        270  Lycopersicon esculentum (Tomato)
                    159        268  Cavia porcellus (Guinea pig)
                    160        268  Bacteriophage T4
                    161        268  Glycine max (Soybean)
                    162        267  Macaca mulatta (Rhesus macaque)
                    163        265  Rhodopseudomonas palustris
                    164        265  Prochlorococcus marinus
                    165        264  Pyrobaculum aerophilum
                    166        262  Coxiella burnetii
                    167        261  Thermoplasma volcanium
                    168        257  Solanum tuberosum (Potato)
                    169        257  Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
                    170        256  Clostridium tetani
                    171        254  Rhodobacter capsulatus (Rhodopseudomonas capsulata)
                    172        254  Vaccinia virus (strain Copenhagen) (VACV)
                    173        253  Candida glabrata (Yeast) (Torulopsis glabrata)
                    174        252  Acinetobacter sp. (strain ADP1)
                    175        251  Emericella nidulans (Aspergillus nidulans)
                    176        250  Bacteroides thetaiotaomicron
                    177        249  Bacillus thuringiensis subsp. konkukian
                    178        246  Salmonella paratyphi-a
                    179        245  Spinacia oleracea (Spinach)
                    180        245  Wolinella succinogenes
                    181        242  Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
                    182        239  Mycobacterium paratuberculosis
                    183        235  Bacillus stearothermophilus
                    184        233  Wigglesworthia glossinidia brevipalpis
                    185        231  Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
                    186        228  Equus caballus (Horse)
                    187        227  Chlamydophila caviae
                    188        227  Bifidobacterium longum
                    189        223  Geobacter sulfurreducens
                    190        221  Rhodopirellula baltica
                    191        220  Porphyra purpurea
                    192        219  Porphyromonas gingivalis (Bacteroides gingivalis)
                    193        219  Burkholderia pseudomallei (Pseudomonas pseudomallei)
                    194        217  Corynebacterium diphtheriae
                    195        216  Chlamydomonas reinhardtii
                    196        214  Helicobacter hepaticus
                    197        213  Methanococcus maripaludis
                    198        212  Bacillus clausii (strain KSM-K16)
                    199        211  Bacillus cereus (strain ZK)
                    200        210  Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
                    201        209  Klebsiella pneumoniae
                    202        209  Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
                    203        203  Haloarcula marismortui (Halobacterium marismortui)
                    204        202  Mannheimia succiniciproducens (strain MBEL55E)
                    205        202  Yarrowia lipolytica (Candida lipolytica)
                    206        200  Vaccinia virus (strain Western Reserve / WR) (VACV)
                    
                    
                    
                    3.3  Taxonomic distribution of the sequences
                    
                    Kingdom        sequences (% of the database)
                    Archaea            9783 (  5%)
                    Bacteria          89394 ( 46%)
                    Eukaryota         85403 ( 44%)
                    Viruses            9737 (  5%)
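
The percentages in the kingdom table are each count divided by the total number of entries, rounded to a whole percent. A minimal sketch under that assumption (the source does not state the rounding rule; the names below are illustrative):

```python
# Sequence counts per kingdom, copied from the table above.
kingdoms = {
    "Archaea": 9783,
    "Bacteria": 89394,
    "Eukaryota": 85403,
    "Viruses": 9737,
}

total = sum(kingdoms.values())

# Share of each kingdom, rounded to the nearest whole percent.
shares = {name: round(100.0 * count / total) for name, count in kingdoms.items()}

for name, count in kingdoms.items():
    print(f"{name:<10} {count:>6} ({shares[name]:>3}%)")
```

The four counts add up to the full release, so the total does not need to be entered separately.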
                    
                    
                    Within Eukaryota:
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  12861 ( 15%)           (  7%)
                    Other Mammalia         25396 ( 30%)           ( 13%)
                    Other Vertebrata        7582 (  9%)           (  4%)
                    Viridiplantae          13805 ( 16%)           (  7%)
                    Fungi                  12450 ( 15%)           (  6%)
                    Insecta                 4391 (  5%)           (  2%)
                    Nematoda                2971 (  3%)           (  2%)
                    Other                   5947 (  7%)           (  3%)
                    
                    
                    4.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To  Number             From   To   Number
                    1-  50    4028             1001-1100     1637
                    51- 100   13893             1101-1200     1126
                    101- 150   19808             1201-1300      842
                    151- 200   18963             1301-1400      639
                    201- 250   19439             1401-1500      495
                    251- 300   16591             1501-1600      309
                    301- 350   17240             1601-1700      230
                    351- 400   15608             1701-1800      177
                    401- 450   12090             1801-1900      189
                    451- 500   10072             1901-2000      154
                    501- 550    7724             2001-2100       95
                    551- 600    5156             2101-2200      148
                    601- 650    4379             2201-2300      119
                    651- 700    3103             2301-2400       84
                    701- 750    2629             2401-2500       64
                    751- 800    2190                 >2500      480
                    801- 850    1774
                    851- 900    1891
                    901- 950    1339
                    951-1000    1108
                    
                    
                    The average sequence length in Swiss-Prot is 362 amino acids.
                    
                    The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
                    The longest sequence is  SYNE1_HUMAN (Q8NF91):  8797 amino acids.
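
The average length is consistent with dividing the total number of amino acids by the number of entries. The source does not say how the figure is computed, so the following is only a sketch under that assumption:

```python
total_amino_acids = 70391852  # amino acids in release 48.0
total_entries = 194317        # sequence entries in release 48.0

# Mean sequence length, assuming every entry is included in the average.
average_length = total_amino_acids / total_entries
print(f"Average sequence length: {average_length:.0f} amino acids")
```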
                    
                    
                    5.  JOURNAL CITATIONS
                    
                    Note: the following citation statistics reflect the number of distinct
                    journal citations.
                    
                    Total number of journals cited in this release of Swiss-Prot: 1618
                    
                    
                    5.1 Table of the frequency of journal citations
                    
                    Journals cited 1x:  577
                                   2x:  226
                                   3x:  114
                                   4x:   77
                                   5x:   55
                                   6x:   36
                                   7x:   33
                                   8x:   36
                                   9x:   21
                                  10x:   15
                              11- 20x:  124
                              21- 50x:  132
                              51-100x:   56
                                >100x:  116
                    
                    
                    5.2  List of the most cited journals in Swiss-Prot
                    
                    Nb    Citations   Journal name
                    --    ---------   -------------------------------------------------------------
                    1        12470   Journal of Biological Chemistry
                    2         6211   Proceedings of the National Academy of Sciences of the U.S.A.
                    3         4207   Journal of Bacteriology
                    4         3922   Gene
                    5         3880   Nucleic Acids Research
                    6         3338   Biochemical and Biophysical Research Communications
                    7         3249   FEBS Letters
                    8         2906   Biochemistry
                    9         2799   European Journal of Biochemistry
                    10         2756   The EMBO Journal
                    11         2524   Nature
                    12         2458   Biochimica et Biophysica Acta
                    13         2228   Journal of Molecular Biology
                    14         2134   Molecular and Cellular Biology
                    15         2118   Genomics
                    16         2018   Cell
                    17         1619   Biochemical Journal
                    18         1490   Science
                    19         1337   Molecular Microbiology
                    20         1251   Plant Molecular Biology
                    21         1241   Molecular and General Genetics
                    22         1020   Journal of Cell Biology
                    23         1013   Journal of Biochemistry
                    24          963   Virology
                    25          961   Human Molecular Genetics
                    26          894   Nature Genetics
                    27          854   Journal of Virology
                    28          837   Genes and Development
                    29          778   The American Journal of Human Genetics
                    30          766   Oncogene
                    31          757   Plant Physiology
                    32          735   Human Mutation
                    33          669   Infection and Immunity
                    34          668   Journal of Immunology
                    35          638   Structure
                    36          636   Archives of Biochemistry and Biophysics
                    37          626   Yeast
                    38          625   Development
                    39          578   Journal of General Virology
                    40          561   Genetics
                    41          559   Microbiology
                    42          517   FEMS Microbiology Letters
                    43          507   Nature Structural Biology
                    44          473   Blood
                    45          457   Human Genetics
                    46          452   Current Genetics
                    47          410   Molecular Biology of the Cell
                    48          396   Applied and Environmental Microbiology
                    49          394   The Plant Cell
                    50          390   Molecular and Biochemical Parasitology
                    51          384   Journal of Clinical Investigation
                    52          383   Developmental Biology
                    53          374   Cancer Research
                    54          370   Mammalian Genome
                    55          367   Journal of Cell Science
                    56          361   Protein Science
                    57          358   Mechanisms of Development
                    58          356   Molecular Endocrinology
                    59          346   Neuron
                    60          344   Acta Crystallographica, Section D
                    61          340   Immunogenetics
                    62          331   The Journal of Experimental Medicine
                    63          327   Journal of Molecular Evolution
                    64          326   The Plant Journal
                    65          316   DNA and Cell Biology
                    66          315   Journal of Neuroscience
                    67          314   Molecular Cell
                    68          298   Endocrinology
                    69          283   Biological Chemistry Hoppe-Seyler
                    70          279   DNA Sequence
                    71          276   Journal of Neurochemistry
                    72          268   Current Biology
                    73          259   The Journal of Clinical Endocrinology and Metabolism
                    74          255   Molecular Biology and Evolution
                    75          247   Brain Research. Molecular Brain Research
                    76          240   Journal of General Microbiology
                    77          240   Bioscience, Biotechnology, and Biochemistry
                    78          239   Toxicon
                    79          238   American Journal of Physiology
                    80          227   Cytogenetics and Cell Genetics
                    81          216   Comparative Biochemistry and Physiology
                    82          214   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
                    83          198   Antimicrobial Agents and Chemotherapy
                    84          189   Molecular Pharmacology
                    85          181   Journal of Investigative Dermatology
                    86          179   Proteins
                    87          173   Journal of Medical Genetics
                    88          163   Peptides
                    89          161   DNA Research
                    90          158   DNA
                    91          157   Molecular Plant-Microbe Interactions
                    92          155   Genome Research
                    93          154   Virus Research
                    94          154   American Journal of Medical Genetics
                    95          152   Tissue Antigens
                    96          151   Plant and Cell Physiology
                    97          144   Biochimie
                    98          144   European Journal of Immunology
                    99          143   Biology of Reproduction
                    100          138   Bioorganicheskaia Khimiia
                    101          137   Molecular and Cellular Endocrinology
                    102          135   Hemoglobin
                    103          119   Archives of Microbiology
                    104          119   Insect Biochemistry and Molecular Biology
                    105          119   Molecular Phylogenetics and Evolution
                    106          118   Agricultural and Biological Chemistry
                    107          117   Experimental Cell Research
                    108          112   Journal of Human Genetics
                    109          112   Annals of Neurology
                    110          110   Nature Cell Biology
                    111          110   European Journal of Human Genetics
                    112          109   General and Comparative Endocrinology
                    113          108   Neurology
                    114          106   RNA
                    115          104   Diabetes
                    116          103   Developmental Dynamics
                    
                    
                    6.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some Swiss-Prot lines,
                    the number of entries with at least one such line, and the average
                    number of such lines per entry.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                     378807              1.95
                    Journal                          335752    181666    1.73
                    Submitted to EMBL/GenBank/DDBJ    40044     34264    0.21
                    Submitted to Swiss-Prot             671       668   <0.01
                    Plant Gene Register                 510       498   <0.01
                    Book citation                       494       482   <0.01
                    Unpublished observations            469       465   <0.01
                    Submitted to other databases        353       345   <0.01
                    Thesis                              301       299   <0.01
                    Patent                              124       122   <0.01
                    Unpublished results                  83        81   <0.01
                    Worm Breeder's Gazette                6         6   <0.01
                    
                    Comments (CC)                       732470              3.77
                    SIMILARITY                       208863    174553    1.07
                    FUNCTION                         130272    127160    0.67
                    SUBCELLULAR LOCATION              97545     97545    0.50
                    CATALYTIC ACTIVITY                69879     65211    0.36
                    SUBUNIT                           64500     64500    0.33
                    PATHWAY                           35314     32279    0.18
                    COFACTOR                          27402     24680    0.14
                    TISSUE SPECIFICITY                20576     20576    0.11
                    PTM                               13541     11921    0.07
                    MISCELLANEOUS                     12757     11828    0.07
                    DOMAIN                             9667      8549    0.05
                    ALTERNATIVE PRODUCTS               7803      7803    0.04
                    CAUTION                            7087      6298    0.04
                    INDUCTION                          5374      5374    0.03
                    INTERACTION                        4966      4966    0.03
                    DEVELOPMENTAL STAGE                4928      4928    0.03
                    DISEASE                            3066      2236    0.02
                    ENZYME REGULATION                  2770      2770    0.01
                    MASS SPECTROMETRY                  1881      1597    0.01
                    DATABASE                           1413      1322    0.01
                    BIOPHYSICOCHEMICAL PROPERTIES      1117      1117    0.01
                    POLYMORPHISM                        509       496   <0.01
                    RNA EDITING                         401       401   <0.01
                    ALLERGEN                            387       387   <0.01
                    TOXIC DOSE                          277       276   <0.01
                    BIOTECHNOLOGY                       117       117   <0.01
                    PHARMACEUTICAL                       58        58   <0.01
                    
                    Features (FT)                      1082233              5.57
                    TRANSMEM                         123597     26975    0.64
                    METAL                             77253     19387    0.40
                    CONFLICT                          71568     24920    0.37
                    TOPO_DOM                          62341     12741    0.32
                    TURN                              62287      4652    0.32
                    CARBOHYD                          61962     15580    0.32
                    STRAND                            57083      4155    0.29
                    DISULFID                          57006     15630    0.29
                    DOMAIN                            55803     29830    0.29
                    ACT_SITE                          45107     26532    0.23
                    HELIX                             44973      4509    0.23
                    REPEAT                            42352      6078    0.22
                    VARIANT                           34916      6771    0.18
                    CHAIN                             31363     25429    0.16
                    NP_BIND                           26977     18970    0.14
                    MOD_RES                           25644     13037    0.13
                    REGION                            22431     11354    0.12
                    BINDING                           20862     11717    0.11
                    SIGNAL                            19975     19973    0.10
                    COMPBIAS                          18938     10358    0.10
                    VARSPLIC                          16016      6973    0.08
                    MUTAGEN                           12686      3279    0.07
                    ZN_FING                           12442      4999    0.06
                    SITE                              12122      6874    0.06
                    NON_TER                           10993      8349    0.06
                    MOTIF                             10543      7622    0.05
                    INIT_MET                           8934      8858    0.05
                    PROPEP                             6568      5481    0.03
                    DNA_BIND                           5748      5387    0.03
                    LIPID                              5732      3787    0.03
                    COILED                             5564      3389    0.03
                    PEPTIDE                            4022      1851    0.02
                    TRANSIT                            3372      3342    0.02
                    CA_BIND                            2334       941    0.01
                    NON_CONS                           1096       526    0.01
                    CROSSLNK                           1001       761    0.01
                    UNSURE                              417       170   <0.01
                    SE_CYS                              205       139   <0.01
                    
                    Cross-references (DR)              2038749             10.49
                    InterPro                         399585    178352    2.06
                    EMBL                             371472    186572    1.91
                    Pfam                             235069    172308    1.21
                    PROSITE                          175229    108545    0.90
                    GO                                95530     27032    0.49
                    PIR                               93479     86872    0.48
                    PRINTS                            74849     58177    0.39
                    HSSP                              73924     73924    0.38
                    TIGRFAMs                          72463     67688    0.37
                    HAMAP                             66305     66201    0.34
                    ProDom                            52161     50186    0.27
                    SMART                             48269     36731    0.25
                    PANTHER                           47077     44568    0.24
                    Ensembl                           36475     36473    0.19
                    PDB                               30616      8392    0.16
                    SMR                               26242     26242    0.14
                    TIGR                              19090     18555    0.10
                    PIRSF                             15949     15699    0.08
                    HGNC                              12075     12018    0.06
                    MIM                               11065      9064    0.06
                    MGI                                9693      9659    0.05
                    IntAct                             7064      7064    0.04
                    SGD                                5192      5129    0.03
                    GermOnline                         4926      4876    0.03
                    EcoGene                            4225      4223    0.02
                    EchoBASE                           4159      4127    0.02
                    MEROPS                             3861      3746    0.02
                    H-InvDB                            3676      3658    0.02
                    TAIR                               3675      3603    0.02
                    RGD                                3231      3228    0.02
                    WormPep                            3097      2666    0.02
                    FlyBase                            2883      2852    0.01
                    GeneDB_Spombe                      2872      2838    0.01
                    TRANSFAC                           2782      2494    0.01
                    SubtiList                          2757      2756    0.01
                    WormBase                           2738      2661    0.01
                    Gramene                            1890      1883    0.01
                    StyGene                            1467      1464    0.01
                    TubercuList                        1431      1395    0.01
                    SWISS-2DPAGE                       1155      1155    0.01
                    GeneFarm                           1059      1053    0.01
                    ListiList                          1019      1011    0.01
                    Reactome                            992       992    0.01
                    Leproma                             625       621   <0.01
                    ZFIN                                516       509   <0.01
                    PhotoList                           488       488   <0.01
                    MaizeDB                             426       421   <0.01
                    HIV                                 370       365   <0.01
                    REBASE                              367       362   <0.01
                    OGP                                 367       367   <0.01
                    ECO2DBASE                           351       299   <0.01
                    DictyBase                           326       324   <0.01
                    AGD                                 300       294   <0.01
                    SagaList                            298       297   <0.01
                    LegioList                           286       286   <0.01
                    GlycoSuiteDB                        283       283   <0.01
                    PHCI-2DPAGE                         239       239   <0.01
                    MypuList                            175       175   <0.01
                    Aarhus/Ghent-2DPAGE                 128        98   <0.01
                    Siena-2DPAGE                        103       103   <0.01
                    HSC-2DPAGE                           85        85   <0.01
                    COMPLUYEAST-2DPAGE                   59        59   <0.01
                    PhosSite                             54        54   <0.01
                    PMMA-2DPAGE                          52        52   <0.01
                    Maize-2DPAGE                         39        39   <0.01
                    Rat-heart-2DPAGE                     28        28   <0.01
                    ANU-2DPAGE                           16        16   <0.01
                    
                    Number of explicitly cross-referenced databases: 69
                    Number of implicitly cross-referenced databases: 31
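
The "Average per entry" column in the table above appears to be the total
number of lines divided by the number of Swiss-Prot entries in this release
(194'317). The following is a minimal sketch of that calculation; the
function name and the rule that values rounding to 0.00 are shown as "<0.01"
are assumptions inferred from the table, not a documented procedure.

```python
# Entry count taken from the release 48.0 summary: 194'317 entries.
SWISS_PROT_ENTRIES = 194_317

def average_per_entry(total_lines, entries=SWISS_PROT_ENTRIES):
    """Format total/entries the way the table's last column appears to."""
    value = f"{total_lines / entries:.2f}"
    # Assumption: values that round to 0.00 are printed as "<0.01".
    return "<0.01" if value == "0.00" else value

# A few totals copied from the table above.
for label, total in [("References (RL)", 378807),
                     ("Cross-references (DR)", 2038749),
                     ("Thesis", 301)]:
    print(f"{label:<24}{average_per_entry(total):>6}")
```

With these inputs the sketch reproduces the table values 1.95, 10.49 and
<0.01 for the three rows shown.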
                    
                    
                    7.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in Swiss-Prot: 208469
                    
                    Total number of entries encoded on a plastid: 64  
                    Total number of entries encoded on a mitochondrion: 3334
                    Total number of entries encoded on a plasmid: 3046
                    
                    Number of fragments: 8504
                    Number of additional sequences encoded on splice variants: 12128
                    
                

UniProtKB/TrEMBL protein database release 31.0 statistics

                    
                    1.  INTRODUCTION
                    
                    Release 31.0 of UniProtKB/TrEMBL, dated 13-Sep-2005, has been produced in
                    sync with UniProtKB/Swiss-Prot release 48.0 and with release 83 of the
                    EMBL/DDBJ/GenBank nucleotide sequence databases, including updates up to
                    19-Aug-2005. It contains 2'055'517 sequence entries, comprising
                    680'464'593 amino acids.
                    
                    405'513 sequences have been added since release 30. This represents an 
                    increase of 27%.
                    
                    The document delac_tr.txt lists all accession numbers that were
                    previously present in TrEMBL but have now been deleted from the database.
                    Most deletions follow the deletion of the corresponding CDS in the source
                    nucleotide sequence databases EMBL-Bank/DDBJ/GenBank. In addition, some
                    entries are recognised as open reading frames (ORFs) that were wrongly
                    predicted to code for proteins. When there is enough evidence that these
                    hypothetical proteins are not real, we remove them from TrEMBL.
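
A local list of accession numbers can be checked against delac_tr.txt along
the following lines. This is only a sketch: it assumes the file holds one
six-character accession number per line and that any other line (headers,
blank lines) can be ignored. The function names and the sample content are
hypothetical.

```python
import re

# Assumption: classic six-character accession numbers (e.g. Q8WZ42),
# one per line; any other line (headers, blanks) is ignored.
ACCESSION = re.compile(r"^[OPQ][0-9][A-Z0-9]{3}[0-9]$")

def load_deleted(text):
    """Collect accession numbers from delac_tr.txt-style text."""
    return {line.strip() for line in text.splitlines()
            if ACCESSION.match(line.strip())}

def split_by_status(accessions, deleted):
    """Partition accessions into (still present, deleted)."""
    kept = [a for a in accessions if a not in deleted]
    gone = [a for a in accessions if a in deleted]
    return kept, gone

# Made-up sample content, not real deleted accession numbers.
sample = """List of deleted accession numbers (hypothetical header)

Q00001
Q00002
"""
deleted = load_deleted(sample)
kept, gone = split_by_status(["Q00001", "Q8WZ42"], deleted)
print("still present:", kept)
print("deleted:", gone)
```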
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 7.85   Gln (Q) 3.87   Leu (L) 9.72   Ser (S) 7.14
                    Arg (R) 5.38   Glu (E) 6.08   Lys (K) 5.51   Thr (T) 5.71
                    Asn (N) 4.48   Gly (G) 6.87   Met (M) 2.38   Trp (W) 1.35
                    Asp (D) 5.13   His (H) 2.27   Phe (F) 4.11   Tyr (Y) 3.11
                    Cys (C) 1.49   Ile (I) 5.96   Pro (P) 4.94   Val (V) 6.48
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.06
                    
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Ser, Gly, Val, Glu, Ile, Thr, Lys, Arg, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Cys, Trp
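
The ordering in section 2.2 follows directly from the percentages in section
2.1. As a small sketch (the function name is illustrative), sorting the twenty
standard residues by descending percentage gives the same list:

```python
# Percentages copied from section 2.1 (the 20 standard amino acids only).
composition = {
    "Ala": 7.85, "Gln": 3.87, "Leu": 9.72, "Ser": 7.14,
    "Arg": 5.38, "Glu": 6.08, "Lys": 5.51, "Thr": 5.71,
    "Asn": 4.48, "Gly": 6.87, "Met": 2.38, "Trp": 1.35,
    "Asp": 5.13, "His": 2.27, "Phe": 4.11, "Tyr": 3.11,
    "Cys": 1.49, "Ile": 5.96, "Pro": 4.94, "Val": 6.48,
}

def rank_by_frequency(percentages):
    """Return residue names ordered from most to least frequent."""
    return sorted(percentages, key=percentages.get, reverse=True)

ranking = rank_by_frequency(composition)
print(", ".join(ranking))
```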
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of 
                    UniProtKB/TrEMBL: 95545
                    
                    The first twenty species represent 571629 sequences: 27.1 % of the
                    total number of entries.
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented   1x: 46664
                                          2x: 18167
                                          3x:  9165
                                          4x:  4827
                                          5x:  2821
                                          6x:  2208
                                          7x:  1484
                                          8x:  1222
                                          9x:  1005
                                         10x:   763
                                     11- 20x:  3497
                                     21- 50x:  1926
                                     51-100x:   775
                                       >100x:  1021
                    
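
As a consistency check, the classes in table 3.1 should add up to the total
number of species given at the start of section 3 (95545). A minimal sketch,
with the counts copied from the table and an illustrative variable naming:

```python
# Number of species by how many sequences represent them (table 3.1).
species_per_class = {
    "1x": 46664, "2x": 18167, "3x": 9165, "4x": 4827, "5x": 2821,
    "6x": 2208, "7x": 1484, "8x": 1222, "9x": 1005, "10x": 763,
    "11-20x": 3497, "21-50x": 1926, "51-100x": 775, ">100x": 1021,
}

total_species = sum(species_per_class.values())
print(total_species)  # 95545, the total quoted in section 3

# Share of species represented by a single sequence.
single_share = species_per_class["1x"] / total_species
print(f"{single_share:.1%}")
```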
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     138508  Human immunodeficiency virus 1
                    2      58027  Homo sapiens (Human)
                    3      49342  Oryza sativa (japonica cultivar-group)
                    4      39688  Arabidopsis thaliana (Mouse-ear cress)
                    5      39144  Mus musculus (Mouse)
                    6      27998  Tetraodon nigroviridis (Green puffer)
                    7      25252  Drosophila melanogaster (Fruit fly)
                    8      25184  Hepatitis C virus
                    9      20341  Caenorhabditis elegans
                    10      20090  Trypanosoma cruzi
                    11      15223  Anopheles gambiae str. PEST
                    12      14672  Plasmodium chabaudi
                    13      14614  Dictyostelium discoideum (Slime mold)
                    14      13522  Brachydanio rerio (Zebrafish) (Danio rerio)
                    15      13197  Caenorhabditis briggsae
                    16      11765  Plasmodium berghei
                    17      11636  Gibberella zeae PH-1
                    18      11543  Xenopus laevis (African clawed frog)
                    19      11007  Magnaporthe grisea 70-15
                    20      10876  Neurospora crassa
                    21       9872  Aspergillus fumigatus Af293
                    22       9826  Rattus norvegicus (Rat)
                    23       9676  Schistosoma japonicum (Blood fluke)
                    24       9474  Aspergillus nidulans FGSC A4
                    25       9168  Candida albicans SC5314
                    26       9092  Entamoeba histolytica HM-1:IMSS
                    27       8990  Hepatitis B virus
                    28       8349  uncultured bacterium
                    29       8212  Leishmania major
                    30       8122  Bradyrhizobium japonicum
                    31       8063  Solibacter usitatus Ellin6076
                    32       7801  Plasmodium yoelii yoelii
                    33       7663  Burkholderia vietnamiensis G4
                    34       7563  Streptomyces coelicolor
                    35       7349  Streptomyces avermitilis
                    36       7236  Escherichia coli
                    37       7178  Rhizobium loti (Mesorhizobium loti)
                    38       7050  Rhodopirellula baltica
                    39       7049  Burkholderia cenocepacia HI2424
                    40       6994  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    41       6545  Pseudomonas aeruginosa
                    42       6531  Cryptococcus neoformans var. neoformans B-3501A
                    43       6498  Ustilago maydis 521
                    44       6456  Burkholderia cenocepacia AU 1054
                    45       6433  Ralstonia eutropha JMP134
                    46       6399  Yarrowia lipolytica (Candida lipolytica)
                    47       6394  Giardia lamblia ATCC 50803
                    48       6243  Bacillus anthracis
                    49       6180  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
                    50       6124  Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
                    51       5905  Bacillus cereus G9241
                    52       5848  Cryptococcus neoformans var. neoformans JEC21
                    53       5757  Nocardia farcinica
                    54       5701  Burkholderia pseudomallei (Pseudomonas pseudomallei)
                    55       5694  Rhizobium meliloti (Sinorhizobium meliloti)
                    56       5661  Crocosphaera watsonii
                    57       5644  Polaromonas sp. JS666
                    58       5556  Anabaena sp. (strain PCC 7120)
                    59       5507  Bacillus cereus (strain ATCC 10987)
                    60       5474  Gallus gallus (Chicken)
                    61       5429  Trypanosoma brucei
                    62       5421  Bacillus cereus (strain ZK)
                    63       5226  Plasmodium falciparum (isolate 3D7)
                    64       5193  Yersinia pestis
                    65       5183  Helicobacter pylori (Campylobacter pylori)
                    66       5131  Kluyveromyces lactis (Yeast)
                    67       5074  Pseudomonas syringae pv. syringae (strain B728a)
                    68       5055  Photobacterium profundum (Photobacterium sp. (strain SS9))
                    69       5043  Candida glabrata (Yeast) (Torulopsis glabrata)
                    70       5042  Pseudomonas syringae pv. phaseolicola 1448A
                    71       4959  Pseudomonas syringae pv. tomato
                    72       4938  Azotobacter vinelandii AvOP
                    73       4918  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
                    74       4872  Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
                    75       4865  Escherichia coli O157:H7
                    76       4837  Bacillus thuringiensis subsp. konkukian
                    77       4796  Bacillus licheniformis (strain DSM 13 / ATCC 14580)
                    78       4782  Bacillus cereus (strain ATCC 14579 / DSM 31)
                    79       4767  Pseudomonas putida (strain KT2440)
                    80       4747  Streptococcus pneumoniae
                    81       4744  Bacteroides fragilis
                    82       4730  Ralstonia solanacearum (Pseudomonas solanacearum)
                    83       4610  Burkholderia mallei (Pseudomonas mallei)
                    84       4593  Bacteroides thetaiotaomicron
                    85       4583  Rhodopseudomonas palustris
                    86       4563  Xanthomonas oryzae pv. oryzae
                    87       4552  Leptospira interrogans
                    88       4546  Oryza sativa (Rice)
                    89       4533  Frankia sp. CcI3
                    90       4518  Arthrobacter sp. FB24
                    91       4515  Salmonella choleraesuis
                    92       4456  Ashbya gossypii (Yeast) (Eremothecium gossypii)
                    93       4412  Vibrio vulnificus (strain YJ016)
                    94       4390  Vibrio parahaemolyticus
                    95       4381  Azoarcus sp. (strain EbN1)
                    96       4352  Mycobacterium tuberculosis
                    97       4310  Anaeromyxobacter dehalogenans 2CP-C
                    98       4237  Xanthomonas campestris pv. campestris (strain 8004)
                    99       4213  Bacteroides fragilis (strain ATCC 25285 / NCTC 9343)
                    100       4180  Mycobacterium paratuberculosis
                    101       4159  Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
                    102       4155  Dechloromonas aromatica RCB
                    103       4127  Shewanella oneidensis
                    104       4119  Silicibacter pomeroyi
                    105       4116  Gloeobacter violaceus
                    106       4106  Theileria parva
                    107       4099  Pongo pygmaeus (Orangutan)
                    108       4081  Photorhabdus luminescens subsp. laumondii
                    109       4075  Plasmodium falciparum
                    110       4056  Corynebacterium glutamicum (Brevibacterium flavum)
                    111       4051  Chromobacterium violaceum
                    112       4051  Cryptosporidium parvum
                    113       4046  Methanosarcina acetivorans
                    114       4037  Haloarcula marismortui (Halobacterium marismortui)
                    115       4027  Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
                    116       4023  Vibrio vulnificus
                    117       4007  Cryptosporidium hominis
                    118       4006  Salmonella typhi
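
The number of sequences quoted in section 3 for the first twenty species
(571629) can be reproduced by summing the first twenty rows of the table
above. A short sketch with the frequencies copied from those rows:

```python
# Frequencies of the twenty most represented species (rows 1 to 20).
top_twenty = [
    138508, 58027, 49342, 39688, 39144, 27998, 25252, 25184, 20341, 20090,
    15223, 14672, 14614, 13522, 13197, 11765, 11636, 11543, 11007, 10876,
]

top_twenty_total = sum(top_twenty)
print(top_twenty_total)  # 571629
```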
                    
                    
                    3.3  Taxonomic distribution of the sequences
                    
                    Kingdom        sequences (% of the database)
                    Archaea           50509 (  3%)
                    Bacteria         804377 ( 37%)
                    Eukaryota        914970 ( 43%)
                    Viruses          333442 ( 18%)
                    Other              1078 ( <1%)
                    
                    Within Eukaryota:
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  58027 (  6%)           (  3%)
                    Other Mammalia         98381 ( 11%)           (  5%)
                    Other Vertebrata      128012 ( 14%)           (  6%)
                    Viridiplantae         185692 ( 20%)           (  9%)
                    Fungi                 135546 ( 15%)           (  6%)
                    Insecta                89476 ( 10%)           (  4%)
                    Nematoda               36401 (  4%)           (  2%)
                    Other                 183435 ( 20%)           (  9%)
                    
                    
                    
                    4.  SEQUENCE SIZE
                    
                    4.1  Repartition of the sequences by size (excluding fragments)
                    
                    From   To  Number             From   To   Number
                    1-  50   26909             1001-1100    13065
                    51- 100  126492             1101-1200     9306
                    101- 150  159806             1201-1300     6932
                    151- 200  147708             1301-1400     4548
                    201- 250  149510             1401-1500     3797
                    251- 300  138902             1501-1600     2629
                    301- 350  135080             1601-1700     2119
                    351- 400  109439             1701-1800     1745
                    401- 450   86159             1801-1900     1340
                    451- 500   75059             1901-2000     1143
                    501- 550   58434             2001-2100      861
                    551- 600   41682             2101-2200     1010
                    601- 650   32186             2201-2300      823
                    651- 700   24992             2301-2400      645
                    701- 750   21448             2401-2500      481
                    751- 800   18053             >2500         4181
                    801- 850   14994
                    851- 900   13392
                    901- 950   10107
                    951-1000    8026
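
The size classes used in the table above are 50 residues wide up to 1000,
100 residues wide from 1001 to 2500, and a single open class beyond that.
The sketch below assigns a sequence length to its class; the function name
and the example lengths are hypothetical and are not taken from the database.

```python
from collections import Counter

def size_class(length):
    """Return the section 4.1 size class for a sequence length."""
    if length < 1:
        raise ValueError("sequence length must be positive")
    if length > 2500:
        return ">2500"
    # 50-residue classes up to 1000, 100-residue classes above.
    width = 50 if length <= 1000 else 100
    lower = (length - 1) // width * width + 1
    return f"{lower}-{lower + width - 1}"

# Hypothetical lengths, used only to show the binning.
lengths = [4, 120, 130, 1000, 1001, 34350]
histogram = Counter(size_class(n) for n in lengths)
for label, count in histogram.items():
    print(label, count)
```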
                    
                    
                    
                    4.2  Longest and shortest sequences
                    
                    The shortest sequence is Q16047_HUMAN:     4 amino acids.
                    The longest sequence is  Q8WZ42_HUMAN: 34350 amino acids.
                    
                    
                    5.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some TrEMBL lines,
                    the number of entries with at least one such line, and the average
                    number of such lines per entry.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                    2964602              1.41
                    Journal                         1700868   1464012    0.81
                    Submitted to EMBL/GenBank/DDBJ  1219962    912710    0.58
                    Thesis                             4784      4732   <0.01
                    Book citation                      4076      4032   <0.01
                    Submitted to other databases        440       432   <0.01
                    Other                             34472     20641    0.02
                    
                    Comments (CC)                      1056686              0.50
                    CAUTION                          323137    323137    0.15
                    SIMILARITY                       237377    234839    0.11
                    FUNCTION                         131897    117967    0.06
                    SUBCELLULAR LOCATION             108462    108460    0.05
                    CATALYTIC ACTIVITY               105716     91616    0.05
                    SUBUNIT                           65915     65915    0.03
                    COFACTOR                          42117     42117    0.02
                    PATHWAY                           32883     32502    0.02
                    MISCELLANEOUS                      3629      3619   <0.01
                    INTERACTION                        3468      3468   <0.01
                    DOMAIN                             1951      1592   <0.01
                    MASS SPECTROMETRY                   119        63   <0.01
                    ALLERGEN                             15        15   <0.01
                    
                    Features (FT)                      1162978              0.55
                    NON_TER                         1088124    650363    0.52
                    CHAIN                             42871     25628    0.02
                    SIGNAL                            31423     30474    0.01
                    TRANSIT                             560       556   <0.01
                    
                    Cross-references (DR)             14786754              7.02
                    GO                              4294849   1243800    2.04
                    InterPro                        2765243   1431739    1.31
                    EMBL                            2446971   2099223    1.16
                    Pfam                            1748797   1332161    0.83
                    PROSITE                         1016814    646081    0.48
                    PRINTS                           431666    357696    0.21
                    SMART                            352928    268332    0.17
                    HSSP                             286785    286508    0.14
                    SMR                              277524    277496    0.13
                    ProDom                           222934    214109    0.11
                    TIGRFAMs                         205592    190426    0.10
                    PIR                              196746    161117    0.09
                    Ensembl                          117293    117293    0.06
                    TIGR                              91544     85531    0.04
                    Gramene                           58354     58319    0.03
                    PANTHER                           53906     53896    0.03
                    PIRSF                             40276     39473    0.02
                    MGI                               35859     33668    0.02
                    FlyBase                           22134     22084    0.01
                    WormPep                           19260     19178    0.01
                    WormBase                          19250     19178    0.01
                    TAIR                              17779     17718    0.01
                    ZFIN                              10704     10700    0.01
                    MEROPS                             8295      8031   <0.01
                    IntAct                             5715      5715   <0.01
                    LegioList                          5607      5577   <0.01
                    ListiList                          4796      4779   <0.01
                    AGD                                4416      4416   <0.01
                    PhotoList                          4192      4068   <0.01
                    HGNC                               3538      3538   <0.01
                    PDB                                2968      1762   <0.01
                    TubercuList                        2557      2551   <0.01
                    RGD                                2331      2316   <0.01
                    GeneDB_Spombe                      2063      2057   <0.01
                    SagaList                           1796      1702   <0.01
                    SGD                                1323      1321   <0.01
                    TRANSFAC                            989       977   <0.01
                    Leproma                             982       981   <0.01
                    DictyBase                           979       979   <0.01
                    MypuList                            607       603   <0.01
                    REBASE                              125       120   <0.01
                    PHCI-2DPAGE                         108       108   <0.01
                    ANU-2DPAGE                           70        70   <0.01
                    SWISS-2DPAGE                         63        63   <0.01
                    Reactome                             20        20   <0.01
                    PMMA-2DPAGE                           3         3   <0.01
                    Siena-2DPAGE                          2         2   <0.01
                    COMPLUYEAST-2DPAGE                    1         1   <0.01
                    
                    
                    Number of explicitly cross-referenced databases: 69
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in TrEMBL: 216643
                    
                    Total number of entries encoded on Plastid; Chloroplast: 42172
                    Total number of entries encoded on Mitochondrion: 108892
                    Total number of entries encoded on Plastid; Cyanelle: 7
                    Total number of entries encoded on Plastid; Apicoplast: 142
                    Total number of entries encoded on Plastid; Non-photosynthetic plastid: 198
                    Total number of entries encoded on Plastid: 1833
                    Total number of entries encoded on Plasmid: 37058
                    
                    Number of fragments: 652514
                    
                    
                

Submissions and Updates

We welcome feedback from our users. We would especially appreciate being notified if you find that sequences belonging to your field of expertise are missing from the database. We would also like to hear about annotations that need updating, for example if the function of a protein has been clarified or if new information about post-translational modifications has become available.

Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml

For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:

UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail: datasubs@ebi.ac.uk


Download information

Bi-Weekly releases

The latest data of the UniProt Knowledgebase is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data is further supplemented by a file containing the sequences of all additional splice isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/README.varsplic
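For users who download the FASTA files mentioned above, the following is a minimal sketch of how such a file could be read. It assumes only the generic FASTA layout (a header line starting with ">" followed by one or more sequence lines). The function name read_fasta and the two sample records are hypothetical and are not taken from the UniProt distribution.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up records for illustration only; not real UniProtKB entries.
example = """>EXAMPLE_1 hypothetical record
MKTAYIAKQR
QISFVKSHFS
>EXAMPLE_2 hypothetical record
GSHMLEDP
"""

records = list(read_fasta(example.splitlines()))
total_residues = sum(len(seq) for _, seq in records)
print(len(records), total_residues)
```

To read a downloaded file instead of the inline sample, an open file handle can be passed in place of example.splitlines(), since the function accepts any iterable of lines.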

Major releases

For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 4 times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on CD-ROM from the EBI.


Contact

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address: datalib@ebi.ac.uk / swissprot@ebi.ac.uk
WWW server: http://www.ebi.ac.uk/


SIB Swiss Institute of Bioinformatics
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 702 50 50
Fax: (+41 22) 702 58 58
Electronic mail address: Swiss-Prot@expasy.org
WWW server: http://www.expasy.org/


Protein Information Resource (PIR)
Georgetown University Medical Center
3900 Reservoir Road, NW
Box 571455
Washington, DC 20057-1455
United States of America

Telephone: (+1 202) 687 1039
Fax: (+1 202) 687 0057
Electronic mail address: pirmail@georgetown.edu
WWW server: http://pir.georgetown.edu

Citation

If you want to cite UniProt in a publication please use the following reference:

Bairoch A., Apweiler R., Wu C.H., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J., Natale D.A., O'Donovan C., Redaschi N., Yeh L.S., The Universal Protein Resource (UniProt), Nucleic Acids Res. 33: D154-D159 (2005).