Release 8.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 50.0 and the UniProtKB/TrEMBL Protein Database release 33.0.

More information on these databases can be found in the user manual section "What is the UniProt Knowledgebase?".


UniProtKB/Swiss-Prot protein knowledgebase release 50.0 statistics

Release 50.0 of 30-May-2006 of UniProtKB/Swiss-Prot contains 222'289 sequence entries, comprising 81'585'146 amino acids abstracted from 142'438 references.

The growth of the database is summarized below.

Release Date Number of entries Number of amino acids
2.0 09/86 3'939 900'163
3.0 11/86 4'160 969'641
4.0 04/87 4'387 1'036'010
5.0 09/87 5'205 1'327'683
6.0 01/88 6'102 1'653'982
7.0 04/88 6'821 1'885'771
8.0 08/88 7'724 2'224'465
9.0 11/88 8'702 2'498'140
10.0 03/89 10'008 2'952'613
11.0 07/89 10'856 3'265'966
12.0 10/89 12'305 3'797'482
13.0 01/90 13'837 4'347'336
14.0 04/90 15'409 4'914'264
15.0 08/90 16'941 5'486'399
16.0 11/90 18'364 5'986'949
17.0 02/91 20'024 6'524'504
18.0 05/91 20'772 6'792'034
19.0 08/91 21'795 7'173'785
20.0 11/91 22'654 7'500'130
21.0 03/92 23'742 7'866'596
22.0 05/92 25'044 8'375'696
23.0 08/92 26'706 9'011'391
24.0 12/92 28'154 9'545'427
25.0 04/93 29'955 10'214'020
26.0 07/93 31'808 10'875'091
27.0 10/93 33'329 11'484'420
28.0 02/94 36'000 12'496'420
29.0 06/94 38'303 13'464'008
30.0 10/94 40'292 14'147'368
31.0 02/95 43'470 15'335'248
32.0 11/95 49'340 17'385'503
33.0 02/96 52'205 18'531'384
34.0 10/96 59'021 21'210'389
35.0 11/97 69'113 25'083'768
36.0 07/98 74'019 26'840'295
37.0 12/98 77'977 28'268'293
38.0 07/99 80'000 29'085'965
39.0 05/00 86'593 31'411'114
40.0 10/01 101'602 37'315'215
41.0 02/03 122'564 44'986'459
42.0 10/03 135'850 50'046'799
43.0 03/04 146'720 54'093'154
44.0 07/04 153'871 56'608'159
45.0 10/04 163'235 59'631'787
46.0 02/05 168'297 61'443'278
47.0 05/05 181'577 65'746'672
48.0 09/05 194'317 70'391'852
49.0 02/06 207'132 75'438'310
50.0 05/06 222'289 81'585'146
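
As a worked check of the table above, the following minimal sketch parses rows in the format shown (release, date, entries, amino acids, with apostrophes as thousands separators) and computes the change between the last two releases. The function names `parse_row` and `growth` are illustrative assumptions and are not part of any UniProt tool; the two rows are copied from the table.

```python
def parse_row(line):
    """Parse one growth-table row such as "50.0 05/06 222'289 81'585'146"."""
    release, date, entries, residues = line.split()
    # Apostrophes are used as thousands separators in this table.
    return release, date, int(entries.replace("'", "")), int(residues.replace("'", ""))


def growth(prev_row, curr_row):
    """Return (net new entries, entry growth in %, amino-acid growth in %)."""
    _, _, e0, a0 = parse_row(prev_row)
    _, _, e1, a1 = parse_row(curr_row)
    return e1 - e0, 100.0 * (e1 - e0) / e0, 100.0 * (a1 - a0) / a0


net, entry_pct, residue_pct = growth(
    "49.0 02/06 207'132 75'438'310",
    "50.0 05/06 222'289 81'585'146",
)
print(net, round(entry_pct, 1), round(residue_pct, 1))
```

The net change in entries can be smaller than the number of sequences added in a release, since entries are occasionally deleted or merged.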

In rare cases, Swiss-Prot entries are removed. Deleted entries are almost exclusively open reading frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.


Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

  • be as complete as possible. All sequences available at a given time should be immediately included in UniProtKB/Swiss-Prot. This also includes sequence corrections and updates;
  • provide a higher level of annotation;
  • provide cross-references to specialized database(s) that contain, among other data, some information about the genes that code for these proteins;
  • provide specific indexes and documents.

From our efforts to annotate human sequence entries as completely as possible arose the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms which are not covered by these two projects:

Organism Database cross-references Index file Number of sequences
A.thaliana TAIR arath.txt 4'155
C.albicans None yet calbican.txt 534
C.elegans Wormpep celegans.txt 2'850
D.discoideum DictyBase dicty.txt 325
D.melanogaster FlyBase fly.txt 2'382
M.musculus MGD mgdtosp.txt 11'030
S.cerevisiae SGD yeast.txt 5'427
S.pombe GeneDB_SPombe pombe.txt 3'005

UniProtKB/Swiss-Prot release statistics
                    
                    
                    1.  INTRODUCTION
                    
                    Release 50.0 of 30-May-2006 of UniProtKB/Swiss-Prot contains 222289 sequence entries,
                    comprising 81585146 amino acids abstracted from 142438 references.
                    
                    15220 sequences have been added since release 49.0, the sequence data of
                    953 existing entries has been updated and the annotations of
                    190604 entries have been revised. This represents an increase of 8%.
                    
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 7.87   Gln (Q) 3.95   Leu (L) 9.64   Ser (S) 6.83
                    Arg (R) 5.39   Glu (E) 6.67   Lys (K) 5.93   Thr (T) 5.41
                    Asn (N) 4.15   Gly (G) 6.95   Met (M) 2.37   Trp (W) 1.14
                    Asp (D) 5.35   His (H) 2.29   Phe (F) 3.97   Tyr (Y) 3.04
                    Cys (C) 1.50   Ile (I) 5.91   Pro (P) 4.81   Val (V) 6.73
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00
                    
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Thr, Arg, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Cys, Trp
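
The ordering in section 2.2 can be reproduced from the percentages in section 2.1. The sketch below is only an illustration: the dictionary name `composition` and the variable `ranked` are assumptions, and the values are copied from the table above.

```python
# Composition in percent, copied from section 2.1.
composition = {
    "Ala": 7.87, "Gln": 3.95, "Leu": 9.64, "Ser": 6.83,
    "Arg": 5.39, "Glu": 6.67, "Lys": 5.93, "Thr": 5.41,
    "Asn": 4.15, "Gly": 6.95, "Met": 2.37, "Trp": 1.14,
    "Asp": 5.35, "His": 2.29, "Phe": 3.97, "Tyr": 3.04,
    "Cys": 1.50, "Ile": 5.91, "Pro": 4.81, "Val": 6.73,
}

# Sort residue names from most to least frequent.
ranked = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranked))

# The rounded percentages need not add up to exactly 100.
print(round(sum(composition.values()), 2))
```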
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/Swiss-Prot: 9879
                    
                    The first twenty species represent 71712 sequences:  32.3 % of the total
                    number of entries.
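
The share quoted above follows from the twenty largest counts in the table of the most represented species (section 3.2). A minimal sketch of the arithmetic, with the variable names chosen here for illustration only:

```python
# Entry counts of the twenty most represented species (section 3.2, ranks 1-20).
top_twenty = [
    14036, 11030, 5427, 5111, 4850, 4155, 3005, 2850, 2835, 2382,
    2054, 1816, 1782, 1774, 1571, 1495, 1468, 1407, 1403, 1261,
]
total_entries = 222289  # entries in release 50.0

top_sum = sum(top_twenty)
share = 100.0 * top_sum / total_entries
print(top_sum, f"{share:.1f} %")
```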
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 4698
                                             2x: 1554
                                             3x:  759
                                             4x:  490
                                             5x:  323
                                             6x:  292
                                             7x:  198
                                             8x:  159
                                             9x:  140
                                            10x:   77
                                        11- 20x:  421
                                        21- 50x:  311
                                        51-100x:  131
                                          >100x:  326
                    
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1      14036  Homo sapiens (Human)
                    2      11030  Mus musculus (Mouse)
                    3       5427  Saccharomyces cerevisiae (Baker's yeast)
                    4       5111  Rattus norvegicus (Rat)
                    5       4850  Escherichia coli
                    6       4155  Arabidopsis thaliana (Mouse-ear cress)
                    7       3005  Schizosaccharomyces pombe (Fission yeast)
                    8       2850  Caenorhabditis elegans
                    9       2835  Bacillus subtilis
                    10       2382  Drosophila melanogaster (Fruit fly)
                    11       2054  Bos taurus (Bovine)
                    12       1816  Escherichia coli O157:H7
                    13       1782  Methanococcus jannaschii
                    14       1774  Haemophilus influenzae
                    15       1571  Salmonella typhimurium
                    16       1495  Escherichia coli O6
                    17       1468  Shigella flexneri
                    18       1407  Mycobacterium tuberculosis
                    19       1403  Gallus gallus (Chicken)
                    20       1261  Xenopus laevis (African clawed frog)
                    21       1164  Salmonella typhi
                    22       1147  Mycobacterium bovis
                    23       1106  Pongo pygmaeus (Orangutan)
                    24       1071  Pseudomonas aeruginosa
                    25       1065  Sus scrofa (Pig)
                    26        969  Synechocystis sp. (strain PCC 6803)
                    27        969  Archaeoglobus fulgidus
                    28        858  Yersinia pestis
                    29        857  Vibrio cholerae
                    30        845  Rhizobium meliloti (Sinorhizobium meliloti)
                    31        806  Oryza sativa (Rice)
                    32        792  Oryctolagus cuniculus (Rabbit)
                    33        750  Aquifex aeolicus
                    34        721  Brachydanio rerio (Zebrafish) (Danio rerio)
                    35        718  Pasteurella multocida
                    36        700  Vibrio parahaemolyticus
                    37        687  Mycoplasma pneumoniae
                    38        681  Staphylococcus aureus (strain Mu50 / ATCC 700699)
                    39        679  Staphylococcus aureus (strain N315)
                    40        666  Streptomyces coelicolor
                    41        663  Staphylococcus aureus (strain MW2)
                    42        661  Staphylococcus aureus (strain COL)
                    43        660  Staphylococcus aureus (strain MRSA252)
                    44        659  Staphylococcus aureus (strain MSSA476)
                    45        652  Bacillus halodurans
                    46        643  Vibrio vulnificus
                    47        641  Canis familiaris (Dog)
                    48        627  Mycobacterium leprae
                    49        623  Vibrio vulnificus (strain YJ016)
                    50        608  Treponema pallidum
                    51        605  Anabaena sp. (strain PCC 7120)
                    52        587  Methanobacterium thermoautotrophicum
                    53        577  Pseudomonas putida (strain KT2440)
                    54        576  Pseudomonas syringae pv. tomato
                    55        572  Buchnera aphidicola subsp. Acyrthosiphon pisum
                    56        570  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
                    57        570  Staphylococcus epidermidis (strain ATCC 12228)
                    58        569  Bacillus anthracis
                    59        567  Helicobacter pylori (Campylobacter pylori)
                    60        562  Buchnera aphidicola subsp. Schizaphis graminum
                    61        555  Photorhabdus luminescens subsp. laumondii
                    62        554  Bradyrhizobium japonicum
                    63        550  Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
                    64        550  Neurospora crassa
                    65        548  Helicobacter pylori J99 (Campylobacter pylori J99)
                    66        548  Rickettsia prowazekii
                    67        541  Ralstonia solanacearum (Pseudomonas solanacearum)
                    68        540  Lactococcus lactis subsp. lactis (Streptococcus lactis)
                    69        537  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    70        535  Zea mays (Maize)
                    71        534  Candida albicans (Yeast)
                    72        533  Listeria monocytogenes
                    73        529  Rhizobium loti (Mesorhizobium loti)
                    74        527  Listeria innocua
                    75        523  Yersinia pseudotuberculosis
                    76        518  Xanthomonas campestris pv. campestris
                    77        512  Pan troglodytes (Chimpanzee)
                    78        512  Neisseria meningitidis serogroup A
                    79        511  Neisseria meningitidis serogroup B
                    80        510  Ashbya gossypii (Yeast) (Eremothecium gossypii)
                    81        507  Buchnera aphidicola subsp. Baizongia pistaciae
                    82        505  Shewanella oneidensis
                    83        502  Clostridium acetobutylicum
                    84        497  Bacillus cereus (strain ATCC 14579 / DSM 31)
                    85        496  Caulobacter crescentus (Caulobacter vibrioides)
                    86        495  Kluyveromyces lactis (Yeast) (Candida sphaerica)
                    87        483  Mycoplasma genitalium
                    88        480  Xanthomonas axonopodis pv. citri
                    89        474  Thermotoga maritima
                    90        474  Streptococcus pneumoniae
                    91        464  Xylella fastidiosa
                    92        463  Listeria monocytogenes serotype 4b (strain F2365)
                    93        462  Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
                    94        455  Deinococcus radiodurans
                    95        455  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
                    96        449  Haemophilus ducreyi
                    97        448  Brucella melitensis
                    98        448  Oceanobacillus iheyensis
                    99        448  Brucella suis
                    100        441  Pyrococcus horikoshii
                    101        441  Candida glabrata (Yeast) (Torulopsis glabrata)
                    102        440  Mimivirus
                    103        440  Corynebacterium glutamicum (Brevibacterium flavum)
                    104        439  Methanosarcina acetivorans
                    105        438  Clostridium perfringens
                    106        436  Pyrococcus abyssi
                    107        434  Halobacterium salinarium (Halobacterium halobium)
                    108        433  Chlamydia trachomatis
                    109        422  Salmonella paratyphi-a
                    110        420  Borrelia burgdorferi (Lyme disease spirochete)
                    111        420  Methanosarcina mazei (Methanosarcina frisia)
                    112        416  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
                    113        414  Chlamydia pneumoniae (Chlamydophila pneumoniae)
                    114        411  Pyrococcus furiosus
                    115        410  Nicotiana tabacum (Common tobacco)
                    116        409  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
                    117        407  Thermoanaerobacter tengcongensis
                    118        404  Chlamydia muridarum
                    119        404  Rhizobium sp. (strain NGR234)
                    120        403  Lactobacillus plantarum
                    121        401  Chromobacterium violaceum
                    122        399  Bordetella pertussis
                    123        398  Campylobacter jejuni
                    124        397  Bordetella parapertussis
                    125        397  Ovis aries (Sheep)
                    126        397  Synechococcus elongatus (Thermosynechococcus elongatus)
                    127        395  Streptococcus mutans
                    128        394  Enterococcus faecalis (Streptococcus faecalis)
                    129        391  Sulfolobus solfataricus
                    130        390  Photobacterium profundum (Photobacterium sp. (strain SS9))
                    131        385  Streptococcus pyogenes serotype M1
                    132        384  Streptomyces avermitilis
                    133        383  Streptococcus pyogenes serotype M6
                    134        383  Bacillus cereus (strain ATCC 10987)
                    135        380  Streptococcus pyogenes serotype M18
                    136        379  Streptococcus pyogenes serotype M3
                    137        378  Staphylococcus aureus
                    138        375  Emericella nidulans (Aspergillus nidulans)
                    139        375  Rickettsia conorii
                    140        360  Chlorobium tepidum
                    141        356  Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
                    142        355  Corynebacterium efficiens
                    143        350  Yarrowia lipolytica (Candida lipolytica)
                    144        344  Aeropyrum pernix
                    145        344  Nitrosomonas europaea
                    146        342  Methanopyrus kandleri
                    147        341  Bacillus thuringiensis subsp. konkukian
                    148        340  Leptospira interrogans
                    149        338  Rhodopseudomonas palustris
                    150        338  Pisum sativum (Garden pea)
                    151        330  Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
                    152        328  Gloeobacter violaceus
                    153        326  Streptococcus agalactiae serotype III
                    154        325  Dictyostelium discoideum (Slime mold)
                    155        324  Streptococcus agalactiae serotype V
                    156        324  Acinetobacter sp. (strain ADP1)
                    157        322  Sulfolobus tokodaii
                    158        320  Salmonella choleraesuis
                    159        318  Lycopersicon esculentum (Tomato)
                    160        317  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
                    161        316  Synechococcus sp. (strain WH8102)
                    162        309  Prochlorococcus marinus (strain MIT 9313)
                    163        309  Prochlorococcus marinus
                    164        306  Thermoplasma acidophilum
                    165        305  Burkholderia pseudomallei (Pseudomonas pseudomallei)
                    166        304  Rhodopirellula baltica
                    167        303  Bacillus cereus (strain ZK / E33L)
                    168        298  Bacillus clausii (strain KSM-K16)
                    169        298  Fusobacterium nucleatum subsp. nucleatum
                    170        295  Mannheimia succiniciproducens (strain MBEL55E)
                    171        294  Triticum aestivum (Wheat)
                    172        290  Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
                    173        287  Mycobacterium paratuberculosis
                    174        287  Coxiella burnetii
                    175        287  Macaca mulatta (Rhesus macaque)
                    176        283  Burkholderia mallei (Pseudomonas mallei)
                    177        283  Glycine max (Soybean)
                    178        282  Sulfolobus acidocaldarius
                    179        281  Methylococcus capsulatus
                    180        281  Pseudomonas putida
                    181        280  Solanum tuberosum (Potato)
                    182        276  Hordeum vulgare (Barley)
                    183        275  Geobacter sulfurreducens
                    184        275  Bacteroides thetaiotaomicron
                    185        274  Cavia porcellus (Guinea pig)
                    186        274  Pyrobaculum aerophilum
                    187        274  Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
                    188        273  Bacillus licheniformis (strain DSM 13 / ATCC 14580)
                    189        273  Clostridium tetani
                    190        273  Vibrio fischeri (strain ATCC 700601 / ES114)
                    191        271  Wolinella succinogenes
                    192        270  Thermoplasma volcanium
                    193        270  Synechococcus sp. (strain PCC 6301) (Anacystis nidulans)
                    194        268  Bacteriophage T4
                    195        266  Geobacillus kaustophilus
                    196        261  Staphylococcus haemolyticus (strain JCSC1435)
                    197        261  Rhodobacter capsulatus (Rhodopseudomonas capsulata)
                    198        259  Corynebacterium diphtheriae
                    199        257  Nocardia farcinica
                    200        257  Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
                    201        255  Zymomonas mobilis
                    202        254  Vaccinia virus (strain Copenhagen) (VACV)
                    203        252  Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
                    204        251  Idiomarina loihiensis
                    205        251  Staphylococcus saprophyticus subsp. saprophyticus
                    206        250  Wigglesworthia glossinidia brevipalpis
                    207        247  Spinacia oleracea (Spinach)
                    208        246  Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
                    209        245  Bifidobacterium longum
                    210        244  Helicobacter hepaticus
                    211        242  Equus caballus (Horse)
                    212        237  Symbiobacterium thermophilum
                    213        237  Porphyromonas gingivalis (Bacteroides gingivalis)
                    214        236  Chlamydophila caviae
                    215        235  Bacillus stearothermophilus (Geobacillus stearothermophilus)
                    216        235  Haloarcula marismortui (Halobacterium marismortui)
                    217        235  Methanococcus maripaludis
                    218        234  Aspergillus fumigatus (Sartorya fumigata)
                    219        230  Leifsonia xyli subsp. xyli
                    220        224  Blochmannia floridanus
                    221        223  Legionella pneumophila subsp. pneumophila
                    222        220  Porphyra purpurea
                    223        220  Silicibacter pomeroyi
                    224        219  Legionella pneumophila (strain Paris)
                    225        217  Azoarcus sp. (strain EbN1)
                    226        217  Legionella pneumophila (strain Lens)
                    227        217  Chlamydomonas reinhardtii
                    228        214  Lactobacillus johnsonii
                    229        213  Bacteroides fragilis
                    230        211  Klebsiella pneumoniae
                    231        211  Synechococcus sp. (strain PCC 7942) (Anacystis nidulans R2)
                    232        210  Brucella abortus
                    233        207  Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
                    234        207  Xanthomonas oryzae pv. oryzae
                    235        203  Campylobacter jejuni (strain RM1221)
                    236        202  Bartonella henselae (Rochalimaea henselae)
                    237        200  Cricetulus griseus (Chinese hamster)
                    238        200  Vaccinia virus (strain Western Reserve / WR) (VACV)
                    239        200  Propionibacterium acnes
                    
                    
                    
                    3.3  Taxonomic distribution of the sequences
                    
                    Kingdom        sequences (% of the database)
                    Archaea           10342 (  5%)
                    Bacteria         106509 ( 48%)
                    Eukaryota         95502 ( 43%)
                    Viruses            9936 (  4%)
                    
                    
                    Within Eukaryota:
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  14037 ( 15%)           (  6%)
                    Other Mammalia         28693 ( 30%)           ( 13%)
                    Other Vertebrata        8503 (  9%)           (  4%)
                    Viridiplantae          15348 ( 16%)           (  7%)
                    Fungi                  14864 ( 16%)           (  7%)
                    Insecta                 4576 (  5%)           (  2%)
                    Nematoda                3226 (  3%)           (  1%)
                    Other                   6255 (  7%)           (  3%)
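
The percentages in the two tables of section 3.3 are rounded ratios of the sequence counts. The sketch below recomputes them; `percent_table` is a hypothetical helper written for this illustration, and the counts are copied from the tables above.

```python
TOTAL_ENTRIES = 222289  # entries in release 50.0

kingdoms = {"Archaea": 10342, "Bacteria": 106509, "Eukaryota": 95502, "Viruses": 9936}
eukaryota = {
    "Human": 14037, "Other Mammalia": 28693, "Other Vertebrata": 8503,
    "Viridiplantae": 15348, "Fungi": 14864, "Insecta": 4576,
    "Nematoda": 3226, "Other": 6255,
}


def percent_table(counts, denominator):
    """Return each count as a whole-number percentage of the denominator."""
    return {name: round(100 * n / denominator) for name, n in counts.items()}


kingdom_pct = percent_table(kingdoms, TOTAL_ENTRIES)
euk_pct = percent_table(eukaryota, kingdoms["Eukaryota"])
euk_db_pct = percent_table(eukaryota, TOTAL_ENTRIES)

for name in eukaryota:
    print(f"{name:<18}{eukaryota[name]:>6} ({euk_pct[name]:>3}%) ({euk_db_pct[name]:>3}%)")
```

Because each value is rounded independently, a column of percentages may not sum to exactly 100.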
                    
                    
                    4.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                     From   To  Number             From   To   Number
                       1-  50    4291             1001-1100     1929
                      51- 100   15692             1101-1200     1293
                     101- 150   22826             1201-1300     1026
                     151- 200   21659             1301-1400      836
                     201- 250   22185             1401-1500      664
                     251- 300   19011             1501-1600      356
                     301- 350   19688             1601-1700      253
                     351- 400   17719             1701-1800      205
                     401- 450   14245             1801-1900      203
                     451- 500   11517             1901-2000      165
                     501- 550    8978             2001-2100      108
                     551- 600    6102             2101-2200      161
                     601- 650    5221             2201-2300      139
                     651- 700    3659             2301-2400      101
                     701- 750    2997             2401-2500       83
                     751- 800    2488             >2500          577
                     801- 850    2104
                     851- 900    2325
                     901- 950    1709
                     951-1000    1340
                    
                    
                    The average sequence length in UniProtKB/Swiss-Prot is 367 amino acids.
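
The stated average is consistent with dividing the total number of amino acids by the total number of entries. This is an assumption about how the figure is derived, not something the report spells out; the variable names are illustrative.

```python
total_amino_acids = 81585146  # amino acids in release 50.0
total_entries = 222289        # sequence entries in release 50.0

# Assumed definition: mean length over all entries, rounded to a whole residue.
average_length = total_amino_acids / total_entries
print(round(average_length))
```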
                    
                    The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
                    The longest sequence is   DIG1_CAEEL (Q09165): 13100 amino acids.
                    
                    
                    5.  JOURNAL CITATIONS
                    
                    Note: the following citation statistics reflect the number of distinct
                    journal citations.
                    
                    Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1700
                    
                    
                    5.1 Table of the frequency of journal citations
                    
                    Journals cited      1x:  594
                                        2x:  234
                                        3x:  127
                                        4x:   80
                                        5x:   65
                                        6x:   45
                                        7x:   28
                                        8x:   38
                                        9x:   25
                                       10x:   13
                                   11- 20x:  120
                                   21- 50x:  146
                                   51-100x:   64
                                     >100x:  121
                    
                    
                    5.2  List of the most cited journals in UniProtKB/Swiss-Prot
                    
                    Nb    Citations   Journal name
                    --    ---------   -------------------------------------------------------------
                    1        13585   Journal of Biological Chemistry
                    2         6599   Proceedings of the National Academy of Sciences of the U.S.A.
                    3         4337   Journal of Bacteriology
                    4         4050   Gene
                    5         3971   Nucleic Acids Research
                    6         3535   Biochemical and Biophysical Research Communications
                    7         3391   FEBS Letters
                    8         3102   Biochemistry
                    9         3000   The EMBO Journal
                    10         2874   European Journal of Biochemistry
                    11         2663   Nature
                    12         2536   Biochimica et Biophysica Acta
                    13         2399   Molecular and Cellular Biology
                    14         2350   Journal of Molecular Biology
                    15         2199   Genomics
                    16         2127   Cell
                    17         1723   Biochemical Journal
                    18         1609   Science
                    19         1410   Molecular Microbiology
                    20         1292   Plant Molecular Biology
                    21         1263   Molecular and General Genetics
                    22         1122   Journal of Cell Biology
                    23         1048   Journal of Virology
                    24         1044   Virology
                    25         1043   Journal of Biochemistry
                    26         1025   Human Molecular Genetics
                    27          976   Nature Genetics
                    28          953   Genes and Development
                    29          855   Oncogene
                    30          854   Plant Physiology
                    31          845   The American Journal of Human Genetics
                    32          781   Human Mutation
                    33          726   Journal of Immunology
                    34          707   Infection and Immunity
                    35          686   Structure
                    36          681   Development
                    37          660   Archives of Biochemistry and Biophysics
                    38          659   Yeast
                    39          656   Genetics
                    40          631   Journal of General Virology
                    41          588   Microbiology
                    42          538   FEMS Microbiology Letters
                    43          531   Nature Structural Biology
                    44          519   Molecular Biology of the Cell
                    45          512   Blood
                    46          485   The Plant Cell
                    47          477   Human Genetics
                    48          464   Current Genetics
                    49          438   Cancer Research
                    50          424   Journal of Cell Science
                    51          418   Applied and Environmental Microbiology
                    52          418   Molecular Cell
                    53          409   Developmental Biology
                    54          405   Journal of Clinical Investigation
                    55          398   Molecular and Biochemical Parasitology
                    56          397   The Plant Journal
                    57          395   Protein Science
                    58          393   Mechanisms of Development
                    59          388   Mammalian Genome
                    60          384   Acta Crystallographica, Section D
                    61          383   Neuron
                    62          373   Molecular Endocrinology
                    63          360   The Journal of Experimental Medicine
                    64          356   Immunogenetics
                    65          350   Journal of Neuroscience
                    66          334   Journal of Molecular Evolution
                    67          332   Current Biology
                    68          328   Endocrinology
                    69          325   DNA and Cell Biology
                    70          307   Journal of Neurochemistry
                    71          290   DNA Sequence
                    72          286   The Journal of Clinical Endocrinology and Metabolism
                    73          285   Biological Chemistry Hoppe-Seyler
                    74          277   American Journal of Physiology
                    75          273   Molecular Biology and Evolution
                    76          267   Toxicon
                    77          265   Bioscience, Biotechnology, and Biochemistry
                    78          260   Brain Research. Molecular Brain Research
                    79          244   Cytogenetics and Cell Genetics
                    80          241   Journal of General Microbiology
                    81          222   Comparative Biochemistry and Physiology
                    82          222   Proteins
                    83          214   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
                    84          209   Antimicrobial Agents and Chemotherapy
                    85          203   Molecular Pharmacology
                    86          194   Journal of Medical Genetics
                    87          192   Peptides
                    88          189   Journal of Investigative Dermatology
                    89          175   DNA Research
                    90          174   Plant and Cell Physiology
                    91          171   Biology of Reproduction
                    92          167   Molecular Plant-Microbe Interactions
                    93          166   Genome Research
                    94          165   Virus Research
                    95          158   European Journal of Immunology
                    96          158   DNA
                    97          153   Tissue Antigens
                    98          151   Biochimie
                    99          146   Nature Cell Biology
                    100          145   Hemoglobin
                    101          144   Molecular and Cellular Endocrinology
                    102          142   Experimental Cell Research
                    103          140   Bioorganicheskaia Khimiia
                    104          140   American Journal of Medical Genetics
                    105          139   RNA
                    106          130   Archives of Microbiology
                    107          130   Annals of Neurology
                    108          127   Neurology
                    109          126   European Journal of Human Genetics
                    110          125   Insect Biochemistry and Molecular Biology
                    111          125   Molecular Phylogenetics and Evolution
                    112          119   Journal of Human Genetics
                    113          118   Agricultural and Biological Chemistry
                    114          118   Immunity
                    115          114   General and Comparative Endocrinology
                    116          113   Developmental Dynamics
                    117          111   Diabetes
                    118          110   Planta
                    119          108   Molecular Immunology
                    120          108   Genes to Cells
                    121          107   Molecular Reproduction and Development
                    122          100   Journal of Protein Chemistry
                    
                    
                    6.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table gives, for selected UniProtKB/Swiss-Prot line types,
                    the total number of lines, the number of entries with at least one such
                    line, and the average number of lines per entry.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                     435045              1.96
                    Journal                          383401    205296    1.72
                    Submitted to EMBL/GenBank/DDBJ    48324     41484    0.22
                    Submitted to Swiss-Prot             731       728   <0.01
                    Unpublished observations            591       586   <0.01
                    Book citation                       550       538   <0.01
                    Plant Gene Register                 524       512   <0.01
                    Submitted to other databases        451       442   <0.01
                    Thesis                              335       333   <0.01
                    Patent                              132       130   <0.01
                    Worm Breeder's Gazette                6         6   <0.01
                    
                    Comments (CC)                       881518              3.97
                    SIMILARITY                       248995    201780    1.12
                    FUNCTION                         153738    149327    0.69
                    SUBCELLULAR LOCATION             119386    119386    0.54
                    CATALYTIC ACTIVITY                83382     77265    0.38
                    SUBUNIT                           79044     79044    0.36
                    PATHWAY                           43828     37904    0.20
                    COFACTOR                          33098     29781    0.15
                    TISSUE SPECIFICITY                22576     22576    0.10
                    MISCELLANEOUS                     18666     16988    0.08
                    PTM                               16949     14335    0.08
                    DOMAIN                            12426     10848    0.06
                    ALTERNATIVE PRODUCTS               9183      9183    0.04
                    CAUTION                            8458      7515    0.04
                    INDUCTION                          6256      6256    0.03
                    DEVELOPMENTAL STAGE                5437      5437    0.02
                    INTERACTION                        5241      5241    0.02
                    DISEASE                            3333      2421    0.01
                    ENZYME REGULATION                  3297      3297    0.01
                    WEB RESOURCE                       2650      2198    0.01
                    MASS SPECTROMETRY                  2284      1924    0.01
                    BIOPHYSICOCHEMICAL PROPERTIES      1415      1415    0.01
                    POLYMORPHISM                        543       531   <0.01
                    RNA EDITING                         446       446   <0.01
                    ALLERGEN                            406       406   <0.01
                    TOXIC DOSE                          294       293   <0.01
                    BIOTECHNOLOGY                       125       125   <0.01
                    PHARMACEUTICAL                       62        62   <0.01
                    
                    Features (FT)                      1641920              7.39
                    CHAIN                            225649    219050    1.02
                    STRAND                           177477      8416    0.80
                    TRANSMEM                         141710     30894    0.64
                    TURN                             108207      8399    0.49
                    METAL                             93856     23135    0.42
                    CONFLICT                          79765     27675    0.36
                    HELIX                             75313      8160    0.34
                    TOPO_DOM                          72874     14864    0.33
                    DOMAIN                            70126     38162    0.32
                    CARBOHYD                          67286     16964    0.30
                    DISULFID                          67250     17444    0.30
                    ACT_SITE                          52788     30894    0.24
                    REPEAT                            48040      7024    0.22
                    VARIANT                           38965      7678    0.18
                    BINDING                           38428     17101    0.17
                    MOD_RES                           35039     16388    0.16
                    NP_BIND                           32065     22851    0.14
                    REGION                            30909     16320    0.14
                    SIGNAL                            21745     21735    0.10
                    COMPBIAS                          21676     12148    0.10
                    VAR_SEQ                           19563      8570    0.09
                    MUTAGEN                           16181      4020    0.07
                    ZN_FING                           15853      6166    0.07
                    MOTIF                             15342     10251    0.07
                    SITE                              13020      7384    0.06
                    NON_TER                           10833      8283    0.05
                    INIT_MET                           9526      9526    0.04
                    COILED                             7259      4591    0.03
                    PROPEP                             6979      5839    0.03
                    LIPID                              6389      4148    0.03
                    DNA_BIND                           6217      5804    0.03
                    PEPTIDE                            6007      3672    0.03
                    TRANSIT                            3809      3774    0.02
                    CA_BIND                            2453       996    0.01
                    CROSSLNK                           1533      1070    0.01
                    NON_CONS                           1128       523    0.01
                    UNSURE                              433       175   <0.01
                    SE_CYS                              227       160   <0.01
                    
                    Cross-references (DR)              2651245             11.93
                    InterPro                         533792    205926    2.40
                    EMBL                             418849    214245    1.88
                    Pfam                             283198    199672    1.27
                    PROSITE                          206361    126572    0.93
                    GenomeReviews                    123803    111797    0.56
                    GO                               107621     27601    0.48
                    PIR                               95678     89375    0.43
                    TIGRFAMs                          86567     80907    0.39
                    PRINTS                            84536     66061    0.38
                    HAMAP                             82213     82096    0.37
                    HSSP                              77318     77318    0.35
                    BioCyc                            69625     64431    0.31
                    SMART                             63133     47858    0.28
                    ProDom                            58601     56498    0.26
                    UniGene                           49298     45228    0.22
                    Ensembl                           39717     39707    0.18
                    PDB                               34828      9530    0.16
                    PANTHER                           34556     34344    0.16
                    SMR                               29394     29394    0.13
                    TIGR                              21483     20908    0.10
                    PIRSF                             18539     18286    0.08
                    LinkHub                           15918     15912    0.07
                    HGNC                              13439     13381    0.06
                    MIM                               11904      9726    0.05
                    MGI                               10866     10825    0.05
                    IntAct                             8573      8573    0.04
                    SGD                                5486      5419    0.02
                    MEROPS                             5139      4836    0.02
                    GermOnline                         4925      4879    0.02
                    RGD                                4818      4815    0.02
                    EcoGene                            4229      4226    0.02
                    TAIR                               4198      4125    0.02
                    EchoBASE                           4159      4127    0.02
                    H-InvDB                            3677      3659    0.02
                    WormPep                            3337      2847    0.02
                    WormBase                           3040      2962    0.01
                    GeneDB_Spombe                      3038      3003    0.01
                    FlyBase                            3020      2973    0.01
                    TRANSFAC                           2834      2542    0.01
                    SubtiList                          2776      2775    0.01
                    Gramene                            2449      2449    0.01
                    StyGene                            1527      1523    0.01
                    GeneFarm                           1463      1449    0.01
                    TubercuList                        1435      1399    0.01
                    SWISS-2DPAGE                       1169      1169    0.01
                    ListiList                          1061      1053   <0.01
                    Reactome                           1001      1001   <0.01
                    ZFIN                                709       702   <0.01
                    Leproma                             630       627   <0.01
                    PhotoList                           555       555   <0.01
                    AGD                                 516       510   <0.01
                    LegioList                           436       436   <0.01
                    MaizeDB                             435       430   <0.01
                    OGP                                 372       372   <0.01
                    HIV                                 370       365   <0.01
                    REBASE                              352       348   <0.01
                    ECO2DBASE                           351       299   <0.01
                    SagaList                            327       326   <0.01
                    DictyBase                           326       324   <0.01
                    GlycoSuiteDB                        282       282   <0.01
                    PHCI-2DPAGE                         241       241   <0.01
                    MypuList                            187       187   <0.01
                    Aarhus/Ghent-2DPAGE                 128        98   <0.01
                    Siena-2DPAGE                        103       103   <0.01
                    HSC-2DPAGE                           85        85   <0.01
                    PhosSite                             62        62   <0.01
                    COMPLUYEAST-2DPAGE                   59        59   <0.01
                    PMMA-2DPAGE                          52        52   <0.01
                    Rat-heart-2DPAGE                     28        28   <0.01
                    PptaseDB                             28        28   <0.01
                    ANU-2DPAGE                           20        20   <0.01
                    
                    Number of explicitly cross-referenced databases: 72
                    Number of implicitly cross-referenced databases: 27
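
The "Average per entry" column above appears to be the total number of lines divided by the total number of entries in the release (222'289 for UniProtKB/Swiss-Prot 50.0), not by the number of entries that carry the line. The sketch below reproduces a few of the printed values under that reading. The function name and the rule that values rounding to 0.00 are printed as "<0.01" are assumptions inferred from the table, not a documented procedure.

```python
# Entry count for UniProtKB/Swiss-Prot release 50.0, as stated in the release notes.
SWISSPROT_ENTRIES = 222_289


def average_per_entry(total_lines: int, entries_in_database: int = SWISSPROT_ENTRIES) -> str:
    """Format total_lines / entries_in_database with two decimals, as in the table."""
    text = f"{total_lines / entries_in_database:.2f}"
    # Assumption: a value that would print as 0.00 is shown as "<0.01" instead.
    return "<0.01" if text == "0.00" else text


# Totals copied from the table above.
for label, total in [
    ("References (RL)", 435_045),
    ("Comments (CC)", 881_518),
    ("Features (FT)", 1_641_920),
    ("Cross-references (DR)", 2_651_245),
    ("Submitted to Swiss-Prot", 731),
]:
    print(f"{label:<26}{total:>9}  {average_per_entry(total):>6}")
```

The same division, using the 2'948'323 entries of UniProtKB/TrEMBL release 33.0 as the denominator, is consistent with the averages printed in the TrEMBL line-type table (for example 4669308 reference lines give 1.58).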
                    
                    
                    7.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in UniProtKB/Swiss-Prot: 222786
                    
                    Total number of entries encoded on a Mitochondrion: 3464
                    Total number of entries encoded on a Plasmid: 3104
                    Total number of entries encoded on a Plastid: 22
                    Total number of entries encoded on a Plastid; Apicoplast: 6
                    Total number of entries encoded on a Plastid; Chloroplast: 5495
                    Total number of entries encoded on a Plastid; Cyanelle: 145
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 88
                    
                    Number of fragments: 8434
                    Number of additional sequences produced by alternative splicing, initiation or promoter usage: 14886 
                    
                

UniProtKB/TrEMBL protein database release 33.0 statistics

                    
                    1.  INTRODUCTION
                    
                    Release 33.0 of 30-May-2006 of UniProtKB/TrEMBL has been produced in sync
                    with UniProtKB/Swiss-Prot release 50 and EMBL/DDBJ/GenBank nucleotide sequence
                    database release 86 and its updates up to 19-May-2006. It contains
                    2'948'323 sequence entries comprising 953'383'047 amino acids.
                    
                    
                    In the document delac_tr.txt, you will find a list of all accession numbers
                    that were previously present in UniProtKB/TrEMBL but have now been deleted
                    from the database. Most deletions are due to the deletion of the
                    corresponding CDS in the source nucleotide sequence databases
                    EMBL-Bank/DDBJ/GenBank. In addition, some entries are recognised to be open
                    reading frames (ORFs) that have been wrongly predicted to code for proteins.
                    When there is enough evidence that these hypothetical proteins are not real,
                    we remove them from UniProtKB/TrEMBL.
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 8.18   Gln (Q) 3.96   Leu (L) 9.81   Ser (S) 6.95
                    Arg (R) 5.46   Glu (E) 6.05   Lys (K) 5.35   Thr (T) 5.64
                    Asn (N) 4.38   Gly (G) 6.96   Met (M) 2.39   Trp (W) 1.33
                    Asp (D) 5.19   His (H) 2.24   Phe (F) 4.08   Tyr (Y) 3.06
                    Cys (C) 1.41   Ile (I) 5.99   Pro (P) 4.86   Val (V) 6.56
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Lys, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Cys, Trp
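
The ordering in section 2.2 follows directly from the percentages in section 2.1. As a minimal sketch, sorting the twenty standard residues by descending percentage reproduces the list; the dictionary and variable names are illustrative only.

```python
# Percentages copied from section 2.1 (ambiguity codes B, Z and X are left out).
composition = {
    "Ala": 8.18, "Gln": 3.96, "Leu": 9.81, "Ser": 6.95,
    "Arg": 5.46, "Glu": 6.05, "Lys": 5.35, "Thr": 5.64,
    "Asn": 4.38, "Gly": 6.96, "Met": 2.39, "Trp": 1.33,
    "Asp": 5.19, "His": 2.24, "Phe": 4.08, "Tyr": 3.06,
    "Cys": 1.41, "Ile": 5.99, "Pro": 4.86, "Val": 6.56,
}

# Sort residue names from most to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```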
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of
                    UniProtKB/TrEMBL: 110673
                    
                    The first twenty species represent 637379 sequences: 21.6 % of the
                    total number of entries.
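
The species total quoted above can be cross-checked against the frequency table in section 3.1: each species falls into exactly one occurrence class, so the class counts should add up to the number of species. The sketch below does that addition; the dictionary name is illustrative, and the counts are copied from the table.

```python
# Number of species per occurrence class, copied from section 3.1.
species_per_class = {
    "1x": 51889, "2x": 20887, "3x": 10829, "4x": 5968, "5x": 3323,
    "6x": 2571, "7x": 1805, "8x": 1526, "9x": 1170, "10x": 1170,
    "11-20x": 5022, "21-50x": 2285, "51-100x": 919, ">100x": 1309,
}

total_species = sum(species_per_class.values())
print(total_species)  # 110673, the species total stated in section 3

# Share of species represented by a single sequence.
singleton_share = species_per_class["1x"] / total_species
print(f"{singleton_share:.1%}")
```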
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented 1x:51889
                                        2x:20887
                                        3x:10829
                                        4x: 5968
                                        5x: 3323
                                        6x: 2571
                                        7x: 1805
                                        8x: 1526
                                        9x: 1170
                                       10x: 1170
                                   11- 20x: 5022
                                   21- 50x: 2285
                                   51-100x:  919
                                     >100x: 1309
                    
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     154386  Human immunodeficiency virus 1
                    2      56935  Oryza sativa (japonica cultivar-group)
                    3      56171  Homo sapiens (Human)
                    4      49899  Mus musculus (Mouse)
                    5      42469  Arabidopsis thaliana (Mouse-ear cress)
                    6      29291  Hepatitis C virus
                    7      28028  Tetraodon nigroviridis (Green puffer)
                    8      27315  Tetrahymena thermophila SB210
                    9      25569  Drosophila melanogaster (Fruit fly)
                    10      20450  Caenorhabditis elegans
                    11      20133  Trypanosoma cruzi
                    12      17666  Medicago truncatula (Barrel medic)
                    13      16045  Brachydanio rerio (Zebrafish) (Danio rerio)
                    14      15087  Anopheles gambiae str. PEST
                    15      14666  Plasmodium chabaudi
                    16      13118  Caenorhabditis briggsae
                    17      13099  Dictyostelium discoideum AX4
                    18      12739  uncultured bacterium
                    19      12233  Xenopus laevis (African clawed frog)
                    20      12080  Aspergillus oryzae
                    21      11866  Hepatitis B virus (HBV)
                    22      11760  Plasmodium berghei
                    23      11690  Gibberella zeae (Fusarium graminearum)
                    24      11034  Chaetomium globosum CBS 148.51
                    25      10814  Neurospora crassa
                    26      10092  Drosophila pseudoobscura (Fruit fly)
                    27      10084  Aspergillus fumigatus (Sartorya fumigata)
                    28       9695  Schistosoma japonicum (Blood fluke)
                    29       9695  Rattus norvegicus (Rat)
                    30       9447  Trypanosoma brucei
                    31       9343  Aspergillus nidulans FGSC A4
                    32       9090  Entamoeba histolytica HM-1:IMSS
                    33       8978  Candida albicans SC5314
                    34       8182  Escherichia coli
                    35       8102  Bradyrhizobium japonicum
                    36       8063  Solibacter usitatus Ellin6076
                    37       7937  Frankia sp. EAN1pec
                    38       7796  Plasmodium yoelii yoelii
                    39       7663  Burkholderia vietnamiensis G4
                    40       7598  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    41       7543  Streptomyces coelicolor
                    42       7538  Bos taurus (Bovine)
                    43       7432  Bradyrhizobium sp. BTAi1
                    44       7325  Streptomyces avermitilis
                    45       7152  Rhizobium loti (Mesorhizobium loti)
                    46       7128  Rhizobium leguminosarum bv. viciae 3841
                    47       7097  Leishmania major
                    48       7049  Burkholderia cenocepacia HI2424
                    49       6967  Rhodopirellula baltica
                    50       6966  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    51       6717  Hahella chejuensis (strain KCTC 2396)
                    52       6688  Pseudomonas aeruginosa
                    53       6679  Psychroflexus torquis ATCC 700755
                    54       6526  Burkholderia ambifaria AMMD
                    55       6456  Burkholderia cenocepacia AU 1054
                    56       6445  Cryptococcus neoformans (Filobasidiella neoformans)
                    57       6415  Cryptococcus neoformans var. neoformans B-3501A
                    58       6410  Ustilago maydis 521
                    59       6394  Giardia lamblia ATCC 50803
                    60       6299  Ralstonia metallidurans (strain CH34)
                    61       6295  Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
                    62       6256  Yarrowia lipolytica (Candida lipolytica)
                    63       6221  Burkholderia pseudomallei (strain 1710b)
                    64       6219  Bacillus anthracis
                    65       6129  Bacillus thuringiensis serovar israelensis ATCC 35646
                    66       6028  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
                    67       5973  Mycobacterium vanbaalenii PYR-1
                    68       5959  Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
                    69       5906  Rhizobium etli (strain CFN 42 / ATCC 51251)
                    70       5904  Bacillus cereus G9241
                    71       5862  Rhizobium meliloti (Sinorhizobium meliloti)
                    72       5852  Mycobacterium sp. KMS
                    73       5689  Bacillus sp. NRRL B-14911
                    74       5687  Mycobacterium sp. JLS
                    75       5683  Nocardia farcinica
                    76       5667  Crocosphaera watsonii
                    77       5646  Polaromonas sp. JS666
                    78       5632  Burkholderia pseudomallei (Pseudomonas pseudomallei)
                    79       5605  Pseudomonas fluorescens (strain PfO-1)
                    80       5604  Gallus gallus (Chicken)
                    81       5589  Plasmodium falciparum
                    82       5580  Mycobacterium sp. MCS
                    83       5575  Chimpanzee immunodeficiency virus (SIV-cpz) 
                    84       5541  Anabaena sp. (strain PCC 7120)
                    85       5538  Photobacterium profundum 3TCK
                    86       5525  Burkholderia thailandensis (strain E264 / ATCC 700388 / DSM 13276 / CIP 106301)
                    87       5523  Bacillus weihenstephanensis KBAB4
                    88       5513  Mycobacterium flavescens PYR-GCK
                    89       5508  Anabaena variabilis (strain ATCC 29413 / PCC 7937)
                    90       5451  Bacillus cereus (strain ATCC 10987)
                    91       5335  Bacillus cereus (strain ZK / E33L)
                    92       5307  Helicobacter pylori (Campylobacter pylori)
                    93       5245  Pseudomonas putida F1
                    94       5236  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    95       5209  Plasmodium falciparum (isolate 3D7)
                    96       5198  Escherichia coli (strain UTI89 / UPEC)
                    97       5126  Streptococcus pneumoniae
                    98       5084  Paracoccus denitrificans PD1222
                    99       5053  Clostridium beijerincki NCIMB 8052
                    100       5024  Xanthobacter sp. (strain Py2)
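
The figure given at the start of section 3 (the first twenty species represent 637379 sequences, 21.6 % of all entries) can be recomputed from the first twenty rows of the table above and the release total of 2'948'323 entries. This is a minimal sketch; the variable names are illustrative.

```python
# Total number of entries in UniProtKB/TrEMBL release 33.0.
TREMBL_ENTRIES = 2_948_323

# Frequencies of the twenty most represented species, copied from section 3.2.
top20_counts = [
    154386, 56935, 56171, 49899, 42469, 29291, 28028, 27315, 25569, 20450,
    20133, 17666, 16045, 15087, 14666, 13118, 13099, 12739, 12233, 12080,
]

top20_total = sum(top20_counts)
top20_share = top20_total / TREMBL_ENTRIES
print(f"{top20_total} sequences: {top20_share:.1%} of the total number of entries")
```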
                    
                    
                    3.3  Taxonomic distribution of the sequences
                    
                    Kingdom        sequences (% of the database)
                    Archaea           64704 (  2%)
                    Bacteria        1403934 ( 48%)
                    Eukaryota       1079778 ( 37%)
                    Viruses          397199 ( 13%)
                    Other              2706 ( <1%)
                    
                    Within Eukaryota:
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  56171 (  5%)           (  2%)
                    Other Mammalia        118643 ( 11%)           (  4%)
                    Other Vertebrata      147130 ( 14%)           (  5%)
                    Viridiplantae         229637 ( 21%)           (  8%)
                    Fungi                 149949 ( 14%)           (  5%)
                    Insecta               113478 ( 11%)           (  4%)
                    Nematoda               36769 (  3%)           (  1%)
                    Other                 228001 ( 21%)           (  8%)
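
Each row of the "Within Eukaryota" table carries two percentages: one relative to the 1079778 eukaryotic sequences and one relative to the whole database. The sketch below recomputes both and checks that the categories add up to the Eukaryota total. The helper name `pct` and the rule that a value rounding to zero is printed as "<1%" are assumptions inferred from the tables.

```python
TREMBL_ENTRIES = 2_948_323      # all entries in release 33.0
EUKARYOTA_ENTRIES = 1_079_778   # Eukaryota row of the kingdom table

# Counts copied from the "Within Eukaryota" table.
eukaryota = {
    "Human": 56171, "Other Mammalia": 118643, "Other Vertebrata": 147130,
    "Viridiplantae": 229637, "Fungi": 149949, "Insecta": 113478,
    "Nematoda": 36769, "Other": 228001,
}


def pct(part: int, whole: int) -> str:
    """Whole-number percentage; assumed to print '<1%' when it would round to 0."""
    text = f"{100 * part / whole:.0f}%"
    return "<1%" if text == "0%" else text


# The categories partition Eukaryota, so their counts sum to the kingdom total.
assert sum(eukaryota.values()) == EUKARYOTA_ENTRIES

for name, count in eukaryota.items():
    print(f"{name:<18}{count:>8} ({pct(count, EUKARYOTA_ENTRIES):>4}) ({pct(count, TREMBL_ENTRIES):>4})")
```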
                    
                    
                    4.  SEQUENCE SIZE
                    
                    Repartition of the sequences by size (excluding fragments)
                    
                    From   To   Number             From   To   Number
                       1-  50    37583             1001-1100    17797
                      51- 100   191311             1101-1200    12738
                     101- 150   241118             1201-1300     9058
                     151- 200   228166             1301-1400     5997
                     201- 250   228527             1401-1500     4957
                     251- 300   215399             1501-1600     3549
                     301- 350   202986             1601-1700     2812
                     351- 400   161957             1701-1800     2427
                     401- 450   130194             1801-1900     1783
                     451- 500   111941             1901-2000     1515
                     501- 550    82265             2001-2100     1112
                     551- 600    60188             2101-2200     1213
                     601- 650    45640             2201-2300     1004
                     651- 700    35499             2301-2400      824
                     701- 750    31079             2401-2500      602
                     751- 800    27240             >2500         5521
                     801- 850    20845
                     851- 900    18276
                     901- 950    13569
                     951-1000    10654
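
The table above uses bins of 50 residues up to length 1000, bins of 100 residues from 1001 to 2500, and a single open-ended bin beyond that. As a rough sketch of that binning rule, the hypothetical helper below maps a sequence length to the label of its bin (labels are written without the table's padding).

```python
def size_bin(length: int) -> str:
    """Return the size-table bin label for a sequence of the given length."""
    if length < 1:
        raise ValueError("sequence length must be at least 1")
    if length > 2500:
        return ">2500"
    # Bins are 50 residues wide up to 1000 and 100 residues wide above that.
    width = 50 if length <= 1000 else 100
    upper = -(-length // width) * width  # round up to the next multiple of width
    lower = upper - width + 1
    return f"{lower}-{upper}"


for length in (4, 323, 1000, 1001, 36805):
    print(length, size_bin(length))
```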
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is   323 amino acids.
                    
                    The shortest sequence is Q96AT0_HUMAN:     4 amino acids.
                    The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
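
The average length quoted above is consistent with dividing the total number of amino acids by the total number of entries given in the introduction. The sketch below parses the apostrophe-grouped figures and performs that division. Whether the release notes compute the average exactly this way (for example, with or without fragments) is an assumption, and `parse_count` is a hypothetical helper.

```python
def parse_count(text: str) -> int:
    """Parse a number written with apostrophes as thousands separators."""
    return int(text.replace("'", ""))


# Figures from the introduction to the UniProtKB/TrEMBL release 33.0 statistics.
entries = parse_count("2'948'323")
amino_acids = parse_count("953'383'047")

mean_length = amino_acids / entries
print(f"{mean_length:.2f}")  # rounds to the 323 amino acids quoted above
```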
                    
                    
                    5.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table gives, for selected UniProtKB/TrEMBL line types,
                    the total number of lines, the number of entries with at least one such
                    line, and the average number of lines per entry.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                    4669308              1.58
                    Journal                         2465151   1767061    0.84
                    Submitted to EMBL/GenBank/DDBJ  2155290   1480848    0.73
                    Thesis                             7092      7049   <0.01
                    Book citation                      5703      5650   <0.01
                    Submitted to other databases        433       425   <0.01
                    Other                             35639     21853    0.01
                    
                    Comments (CC)                      1376078              0.47
                    CAUTION                          656158    656158    0.22
                    SIMILARITY                       263637    259013    0.09
                    FUNCTION                         107439    103591    0.04
                    SUBCELLULAR LOCATION              99911     99911    0.03
                    CATALYTIC ACTIVITY                84184     81162    0.03
                    SUBUNIT                           73235     73235    0.02
                    COFACTOR                          57259     57259    0.02
                    PATHWAY                           17020     14720    0.01
                    DOMAIN                             8228      6479   <0.01
                    INTERACTION                        4964      4964   <0.01
                    MISCELLANEOUS                      3941      3911   <0.01
                    MASS SPECTROMETRY                   106        60   <0.01
                    ALLERGEN                             16        16   <0.01
                    
                    Features (FT)                      1451448              0.49
                    NON_TER                         1301420    778886    0.44
                    SIGNAL                           101901     98278    0.03
                    CHAIN                             47553     28355    0.02
                    TRANSIT                             574       570   <0.01
                    
                    Cross-references (DR)             22518476              7.64
                    GO                              5573650   1583402    1.89
                    InterPro                        4697855   2145112    1.59
                    EMBL                            3357754   2939600    1.14
                    Pfam                            2693145   2002284    0.91
                    PROSITE                         1460980    943854    0.50
                    GenomeReviews                    756281    711148    0.26
                    PRINTS                           616501    512569    0.21
                    SMART                            478052    376438    0.16
                    TIGRFAMs                         382823    353844    0.13
                    SMR                              360014    359986    0.12
                    ProDom                           338676    325068    0.11
                    PANTHER                          311255    298522    0.11
                    BioCyc                           295799    276419    0.10
                    HSSP                             280639    280240    0.10
                    PIR                              195935    160405    0.07
                    TIGR                             124901    118614    0.04
                    UniGene                          118729    113983    0.04
                    Ensembl                          106727    106725    0.04
                    PIRSF                             74400     73601    0.03
                    Gramene                           74077     74076    0.03
                    MGI                               48592     45388    0.02
                    FlyBase                           27208     27170    0.01
                    TAIR                              20318     20251    0.01
                    WormPep                           18713     18632    0.01
                    WormBase                          18712     18632    0.01
                    LinkHub                           15746     15746    0.01
                    MEROPS                            12975     12530   <0.01
                    ZFIN                              12644     12639   <0.01
                    IntAct                             5800      5800   <0.01
                    LegioList                          5467      5437   <0.01
                    ListiList                          4754      4737   <0.01
                    AGD                                4200      4200   <0.01
                    PhotoList                          4125      4001   <0.01
                    HGNC                               4003      4002   <0.01
                    PDB                                3654      2197   <0.01
                    TubercuList                        2554      2548   <0.01
                    RGD                                2041      2030   <0.01
                    GeneDB_Spombe                      1951      1938   <0.01
                    SagaList                           1767      1673   <0.01
                    SGD                                1221      1203   <0.01
                    DictyBase                           978       978   <0.01
                    Leproma                             977       976   <0.01
                    TRANSFAC                            929       917   <0.01
                    MypuList                            595       591   <0.01
                    REBASE                              124       119   <0.01
                    PHCI-2DPAGE                         106       106   <0.01
                    ANU-2DPAGE                           65        65   <0.01
                    SWISS-2DPAGE                         49        49   <0.01
                    Reactome                              9         9   <0.01
                    PMMA-2DPAGE                           3         3   <0.01
                    Siena-2DPAGE                          2         2   <0.01
                    COMPLUYEAST-2DPAGE                    1         1   <0.01
                    
                    
                    Number of explicitly cross-referenced databases: 72
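
The detail rows of the table above can be read programmatically. The following is a minimal Python sketch, assuming each detail row holds a name, a total count, the number of entries concerned, and an average per entry. These column meanings and the function name `parse_row` are assumptions made for illustration, not something stated in the release notes. Rows that do not fit that shape, such as the section totals, are skipped by returning `None`.

```python
import re

# Assumed row layout (an assumption, not documented here):
#   <name>  <total count>  <entries concerned>  <average per entry>
# The average is kept as text because values such as "<0.01" are not plain numbers.
ROW = re.compile(
    r"^\s*(?P<name>\S.*?)\s+(?P<count>\d+)\s+(?P<entries>\d+)\s+(?P<avg><?\d+\.\d+)\s*$"
)


def parse_row(line):
    """Return (name, count, entries, average) for a detail row, else None."""
    match = ROW.match(line)
    if match is None:
        return None
    return (
        match["name"],
        int(match["count"]),
        int(match["entries"]),
        match["avg"],
    )


if __name__ == "__main__":
    sample = [
        "    Cross-references (DR)             22518476              7.64",
        "    GO                              5573650   1583402    1.89",
        "    Submitted to EMBL/GenBank/DDBJ  2155290   1480848    0.73",
        "    MEROPS                            12975     12530   <0.01",
    ]
    for line in sample:
        print(parse_row(line))
```

Keeping the average as a string avoids guessing a numeric value for entries reported only as `<0.01`.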
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 228833
                    
                    Total number of entries encoded on a Mitochondrion: 132715
                    Total number of entries encoded on a Plasmid: 48188
                    Total number of entries encoded on a Plastid: 2601
                    Total number of entries encoded on a Plastid; Apicoplast: 122
                    Total number of entries encoded on a Plastid; Chloroplast: 47787
                    Total number of entries encoded on a Plastid; Cyanelle: 7
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 165
                    
                    Number of fragments: 722286
                    
                

Submissions and Updates

We welcome feedback from our users. We would especially appreciate being notified if sequences belonging to your field of expertise are missing from the database. We would also like to hear about annotations that need updating, for example when the function of a protein has been clarified or new information about post-translational modifications has become available.

Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml

For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:

UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail:


Download information

Bi-Weekly releases

The latest UniProt Knowledgebase data is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data is further supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic
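
As an illustration of working with the FASTA format mentioned above, here is a minimal Python sketch. It assumes only the common FASTA convention that a record begins with a header line starting with `>` and is followed by one or more sequence lines. The function name `read_fasta` and the file name in the usage note are hypothetical, and the sketch does not interpret the contents of the header line.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    example = [">seq1 first example", "MKT", "LLV", "", ">seq2", "AAA"]
    for name, sequence in read_fasta(example):
        print(name, len(sequence))
```

For a downloaded file, the same function could be applied to an open file handle, for example `read_fasta(open("downloaded.fasta"))`, where `downloaded.fasta` stands for whatever local file name was chosen.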

Major releases

For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated three times per year) in flatfile format. Previous major releases of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on CD-ROM from the EBI.


Contact

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address: /
WWW server: http://www.ebi.ac.uk/


SIB Swiss Institute of Bioinformatics
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 379 50 50
Fax: (+41 22) 379 58 58
Electronic mail address:
WWW server: http://www.expasy.org/


Protein Information Resource (PIR)
Georgetown University Medical Center
3300 Whitehaven St., Suite 1200
Washington, DC 20008
United States of America

Telephone: (+1 202) 687 1039
Fax: (+1 202) 687 0057
Electronic mail address:
WWW server: http://pir.georgetown.edu

Citation

If you want to cite UniProt in a publication, please use the following reference:

Wu C.H., Apweiler R., Bairoch A., Natale D.A., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J., Mazumder R., O'Donovan C., Redaschi N., Suzek B. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 34: D187-D191 (2006).