Release 52.0 of 06-Mar-07 of UniProtKB/Swiss-Prot contains 261'513 sequence entries, comprising 95'638'062 amino acids abstracted from 153'035 references.

The growth of the database is summarized below.

Release  Date (MM/YY)  Number of entries  Number of amino acids
2.0 09/86 3'939 900'163
3.0 11/86 4'160 969'641
4.0 04/87 4'387 1'036'010
5.0 09/87 5'205 1'327'683
6.0 01/88 6'102 1'653'982
7.0 04/88 6'821 1'885'771
8.0 08/88 7'724 2'224'465
9.0 11/88 8'702 2'498'140
10.0 03/89 10'008 2'952'613
11.0 07/89 10'856 3'265'966
12.0 10/89 12'305 3'797'482
13.0 01/90 13'837 4'347'336
14.0 04/90 15'409 4'914'264
15.0 08/90 16'941 5'486'399
16.0 11/90 18'364 5'986'949
17.0 02/91 20'024 6'524'504
18.0 05/91 20'772 6'792'034
19.0 08/91 21'795 7'173'785
20.0 11/91 22'654 7'500'130
21.0 03/92 23'742 7'866'596
22.0 05/92 25'044 8'375'696
23.0 08/92 26'706 9'011'391
24.0 12/92 28'154 9'545'427
25.0 04/93 29'955 10'214'020
26.0 07/93 31'808 10'875'091
27.0 10/93 33'329 11'484'420
28.0 02/94 36'000 12'496'420
29.0 06/94 38'303 13'464'008
30.0 10/94 40'292 14'147'368
31.0 02/95 43'470 15'335'248
32.0 11/95 49'340 17'385'503
33.0 02/96 52'205 18'531'384
34.0 10/96 59'021 21'210'389
35.0 11/97 69'113 25'083'768
36.0 07/98 74'019 26'840'295
37.0 12/98 77'977 28'268'293
38.0 07/99 80'000 29'085'965
39.0 05/00 86'593 31'411'114
40.0 10/01 101'602 37'315'215
41.0 02/03 122'564 44'986'459
42.0 10/03 135'850 50'046'799
43.0 03/04 146'720 54'093'154
44.0 07/04 153'871 56'608'159
45.0 10/04 163'235 59'631'787
46.0 02/05 168'297 61'443'278
47.0 05/05 181'577 65'746'672
48.0 09/05 194'317 70'391'852
49.0 02/06 207'132 75'438'310
50.0 05/06 222'289 81'585'146
51.0 10/06 241'242 88'541'632
52.0 03/07 261'513 95'638'062
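
The counts in this table use an apostrophe as the thousands separator. The following is a minimal sketch of how two rows could be parsed and compared in Python; the helper names `parse_count` and `parse_row` are illustrative assumptions and are not part of any UniProt tool.

```python
def parse_count(text):
    """Convert a count such as "95'638'062" to an int."""
    return int(text.replace("'", ""))


def parse_row(line):
    """Split one table row into (release, date, entries, amino_acids)."""
    release, date, entries, amino_acids = line.split()
    return release, date, parse_count(entries), parse_count(amino_acids)


# Two rows copied verbatim from the table above.
previous = parse_row("51.0 10/06 241'242 88'541'632")
current = parse_row("52.0 03/07 261'513 95'638'062")

# Net change in entries (additions minus any removals) between the releases.
net_entries = current[2] - previous[2]
growth_pct = 100 * net_entries / previous[2]

print(f"Net change from release {previous[0]} to {current[0]}: "
      f"{net_entries} entries ({growth_pct:.1f}%)")
```

The same two helpers would apply to any other pair of rows in the table.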

In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that have been wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.


Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

  • be as complete as possible. All sequences available at a given time should be immediately included in UniProtKB/Swiss-Prot. This also includes sequence corrections and updates;
  • provide a higher level of annotation;
  • provide cross-references to specialized database(s) that contain, among other data, some information about the genes that code for these proteins;
  • provide specific indexes and documents.

Our efforts to annotate human sequence entries as completely as possible gave rise to the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms that are not covered by these two projects:

Organism        Database cross-references  Index file    Number of sequences
A.thaliana      TAIR                       arath.txt                   5'065
C.albicans      None yet                   calbican.txt                  604
C.elegans       Wormpep                    celegans.txt                3'081
D.discoideum    DictyBase                  dicty.txt                     350
D.melanogaster  FlyBase                    fly.txt                     2'588
M.musculus      MGD                        mgdtosp.txt                12'408
S.cerevisiae    SGD                        yeast.txt                   6'239
S.pombe         GeneDB_SPombe              pombe.txt                   3'217

UniProtKB/Swiss-Prot release statistics
                    
                    1.  INTRODUCTION
                    
                    Release 52.0 of 06-Mar-07 of UniProtKB/Swiss-Prot contains 261513 sequence entries,
                    comprising 95638062 amino acids abstracted from 153035 references. 
                    
                    20329 sequences have been added since release 51.0, the sequence data of
                    11364 existing entries has been updated and the annotations of
                    196464 entries have been revised.
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 7.87   Gln (Q) 3.96   Leu (L) 9.65   Ser (S) 6.84
                    Arg (R) 5.42   Glu (E) 6.66   Lys (K) 5.92   Thr (T) 5.41
                    Asn (N) 4.13   Gly (G) 6.95   Met (M) 2.39   Trp (W) 1.13
                    Asp (D) 5.34   His (H) 2.29   Phe (F) 3.95   Tyr (Y) 3.02
                    Cys (C) 1.50   Ile (I) 5.91   Pro (P) 4.82   Val (V) 6.73
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00
                    
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Arg, Thr, Asp, Pro, Asn, Gln,
                    Phe, Tyr, Met, His, Cys, Trp
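
As a small illustration, the ranking in section 2.2 can be reproduced by sorting the percentages from section 2.1 in descending order. This is only a sketch: the dictionary is transcribed by hand from the table above, and the variable names are assumptions.

```python
# Percent composition transcribed from section 2.1 (three-letter codes).
composition = {
    "Ala": 7.87, "Gln": 3.96, "Leu": 9.65, "Ser": 6.84,
    "Arg": 5.42, "Glu": 6.66, "Lys": 5.92, "Thr": 5.41,
    "Asn": 4.13, "Gly": 6.95, "Met": 2.39, "Trp": 1.13,
    "Asp": 5.34, "His": 2.29, "Phe": 3.95, "Tyr": 3.02,
    "Cys": 1.50, "Ile": 5.91, "Pro": 4.82, "Val": 6.73,
}

# Most frequent residue first; no two residues share the same percentage here.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```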
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/Swiss-Prot: 10849
                    
                    The first twenty species represent 80696 sequences:  30.9 % of the total
                    number of entries.
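
The 30.9 % figure can be checked with a short calculation. The sketch below assumes the twenty counts are the first twenty rows of the table in section 3.2 and that the total is the entry count given in the introduction; the variable names are illustrative.

```python
# Frequencies of the twenty most represented species (section 3.2, rows 1-20).
top_twenty = [
    15945, 12710, 6163, 5864, 4978, 4931, 3420, 3176, 3006, 2849,
    2485, 1883, 1782, 1780, 1774, 1665, 1626, 1585, 1550, 1524,
]
total_entries = 261513  # sequence entries in release 52.0

top_twenty_sum = sum(top_twenty)
share_pct = 100 * top_twenty_sum / total_entries
print(f"{top_twenty_sum} sequences: {share_pct:.1f} % of the total")
```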
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented      1x: 5201
                                             2x: 1664
                                             3x:  815
                                             4x:  532
                                             5x:  349
                                             6x:  318
                                             7x:  218
                                             8x:  183
                                             9x:  159
                                            10x:   82
                                        11- 20x:  433
                                        21- 50x:  325
                                        51-100x:  169
                                          >100x:  401
                    
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1      15945  Homo sapiens (Human)
                    2      12710  Mus musculus (Mouse)
                    3       6163  Saccharomyces cerevisiae (Baker's yeast)
                    4       5864  Rattus norvegicus (Rat)
                    5       4978  Arabidopsis thaliana (Mouse-ear cress)
                    6       4931  Escherichia coli
                    7       3420  Bos taurus (Bovine)
                    8       3176  Schizosaccharomyces pombe (Fission yeast)
                    9       3006  Caenorhabditis elegans
                    10       2849  Bacillus subtilis
                    11       2485  Drosophila melanogaster (Fruit fly)
                    12       1883  Escherichia coli O157:H7
                    13       1782  Methanococcus jannaschii
                    14       1780  Xenopus laevis (African clawed frog)
                    15       1774  Haemophilus influenzae
                    16       1665  Gallus gallus (Chicken)
                    17       1626  Salmonella typhimurium
                    18       1585  Pongo pygmaeus (Orangutan)
                    19       1550  Escherichia coli O6
                    20       1524  Shigella flexneri
                    21       1416  Mycobacterium tuberculosis
                    22       1222  Salmonella typhi
                    23       1160  Sus scrofa (Pig)
                    24       1158  Mycobacterium bovis
                    25       1135  Brachydanio rerio (Zebrafish) (Danio rerio)
                    26       1125  Oryza sativa (Rice)
                    27       1107  Pseudomonas aeruginosa
                    28        976  Synechocystis sp. (strain PCC 6803)
                    29        971  Archaeoglobus fulgidus
                    30        905  Yersinia pestis
                    31        887  Vibrio cholerae
                    32        884  Mimivirus
                    33        876  Rhizobium meliloti (Sinorhizobium meliloti)
                    34        829  Oryctolagus cuniculus (Rabbit)
                    35        801  Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
                    36        756  Aquifex aeolicus
                    37        748  Staphylococcus aureus (strain Mu50 / ATCC 700699)
                    38        746  Staphylococcus aureus (strain N315)
                    39        737  Pasteurella multocida
                    40        734  Vibrio parahaemolyticus
                    41        730  Staphylococcus aureus (strain MW2)
                    42        728  Staphylococcus aureus (strain COL)
                    43        726  Staphylococcus aureus (strain MSSA476)
                    44        724  Staphylococcus aureus (strain MRSA252)
                    45        687  Mycoplasma pneumoniae
                    46        686  Streptomyces coelicolor
                    47        681  Canis familiaris (Dog)
                    48        677  Vibrio vulnificus
                    49        673  Bacillus halodurans
                    50        658  Vibrio vulnificus (strain YJ016)
                    51        632  Mycobacterium leprae
                    52        629  Anabaena sp. (strain PCC 7120)
                    53        618  Staphylococcus epidermidis (strain ATCC 12228)
                    54        618  Pseudomonas syringae pv. tomato
                    55        617  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
                    56        617  Neurospora crassa
                    57        617  Yersinia pseudotuberculosis
                    58        612  Pseudomonas putida (strain KT2440)
                    59        612  Bacillus anthracis
                    60        611  Treponema pallidum
                    61        606  Candida albicans (Yeast)
                    62        605  Ashbya gossypii (Yeast) (Eremothecium gossypii)
                    63        601  Photorhabdus luminescens subsp. laumondii
                    64        587  Methanobacterium thermoautotrophicum
                    65        581  Bradyrhizobium japonicum
                    66        575  Rickettsia prowazekii
                    67        574  Helicobacter pylori (Campylobacter pylori)
                    68        572  Buchnera aphidicola subsp. Acyrthosiphon pisum 
                    69        571  Kluyveromyces lactis (Yeast) (Candida sphaerica)
                    70        570  Ralstonia solanacearum (Pseudomonas solanacearum)
                    71        568  Pan troglodytes (Chimpanzee)
                    72        568  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    73        562  Buchnera aphidicola subsp. Schizaphis graminum
                    74        561  Salmonella paratyphi-a
                    75        556  Lactococcus lactis subsp. lactis (Streptococcus lactis)
                    76        556  Rhizobium loti (Mesorhizobium loti)
                    77        556  Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
                    78        555  Helicobacter pylori J99 (Campylobacter pylori J99)
                    79        554  Zea mays (Maize)
                    80        549  Listeria monocytogenes
                    81        541  Listeria innocua
                    82        540  Xanthomonas campestris pv. campestris
                    83        537  Shewanella oneidensis
                    84        537  Bacillus cereus (strain ATCC 14579 / DSM 31)
                    85        530  Neisseria meningitidis serogroup A
                    86        530  Neisseria meningitidis serogroup B
                    87        518  Candida glabrata (Yeast) (Torulopsis glabrata)
                    88        517  Clostridium acetobutylicum
                    89        516  Caulobacter crescentus (Caulobacter vibrioides)
                    90        507  Buchnera aphidicola subsp. Baizongia pistaciae
                    91        506  Xanthomonas axonopodis pv. citri
                    92        491  Streptococcus pneumoniae
                    93        488  Thermotoga maritima
                    94        483  Mycoplasma genitalium
                    95        482  Oceanobacillus iheyensis
                    96        481  Listeria monocytogenes serotype 4b (strain F2365)
                    97        481  Xylella fastidiosa
                    98        474  Brucella suis
                    99        473  Salmonella choleraesuis
                    100        473  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    101        472  Brucella melitensis
                    102        472  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
                    103        470  Photobacterium profundum (Photobacterium sp. (strain SS9))
                    104        466  Haemophilus ducreyi
                    105        465  Deinococcus radiodurans
                    106        457  Methanosarcina acetivorans
                    107        453  Corynebacterium glutamicum (Brevibacterium flavum)
                    108        453  Clostridium perfringens
                    109        449  Rickettsia conorii
                    110        446  Pyrococcus horikoshii
                    111        445  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
                    112        441  Bacillus cereus (strain ATCC 10987)
                    113        441  Pyrococcus abyssi
                    114        440  Halobacterium salinarium (Halobacterium halobium)
                    115        440  Bordetella pertussis
                    116        437  Methanosarcina mazei (Methanosarcina frisia)
                    117        435  Chromobacterium violaceum
                    118        435  Chlamydia trachomatis
                    119        432  Bordetella parapertussis
                    120        432  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
                    121        431  Emericella nidulans (Aspergillus nidulans)
                    122        424  Borrelia burgdorferi (Lyme disease spirochete)
                    123        424  Yarrowia lipolytica (Candida lipolytica)
                    124        422  Thermoanaerobacter tengcongensis
                    125        421  Nicotiana tabacum (Common tobacco)
                    126        421  Pyrococcus furiosus
                    127        420  Lactobacillus plantarum
                    128        418  Synechococcus elongatus (Thermosynechococcus elongatus)
                    129        416  Chlamydia pneumoniae (Chlamydophila pneumoniae)
                    130        415  Streptococcus pyogenes serotype M6
                    131        414  Ovis aries (Sheep)
                    132        412  Campylobacter jejuni
                    133        412  Enterococcus faecalis (Streptococcus faecalis)
                    134        411  Streptococcus mutans
                    135        410  Streptomyces avermitilis
                    136        406  Chlamydia muridarum
                    137        406  Rhizobium sp. (strain NGR234)
                    138        404  Bacillus thuringiensis subsp. konkukian
                    139        397  Sulfolobus solfataricus
                    140        396  Streptococcus pyogenes serotype M1
                    141        394  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
                    142        394  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
                    143        391  Streptococcus pyogenes serotype M18
                    144        390  Streptococcus pyogenes serotype M3
                    145        381  Acinetobacter sp. (strain ADP1)
                    146        380  Burkholderia pseudomallei (Pseudomonas pseudomallei)
                    147        376  Shigella sonnei (strain Ss046)
                    148        373  Bacillus cereus (strain ZK / E33L)
                    149        372  Chlorobium tepidum
                    150        371  Rhodopseudomonas palustris
                    151        371  Nitrosomonas europaea
                    152        369  Corynebacterium efficiens
                    153        368  Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
                    154        367  Vibrio fischeri (strain ATCC 700601 / ES114)
                    155        366  Bacillus clausii (strain KSM-K16)
                    156        360  Rickettsia bellii (strain RML369-C)
                    157        356  Methanopyrus kandleri
                    158        356  Mannheimia succiniciproducens (strain MBEL55E)
                    159        355  Bacillus licheniformis (strain DSM 13 / ATCC 14580)
                    160        354  Staphylococcus haemolyticus (strain JCSC1435)
                    161        354  Gloeobacter violaceus
                    162        353  Burkholderia mallei (Pseudomonas mallei)
                    163        352  Shigella boydii serotype 4 (strain Sb227)
                    164        351  Leptospira interrogans
                    165        349  Staphylococcus saprophyticus subsp. saprophyticus 
                    166        349  Rickettsia felis (Rickettsia azadi)
                    167        348  Aeropyrum pernix
                    168        346  Streptococcus agalactiae serotype III
                    169        343  Streptococcus agalactiae serotype V
                    170        341  Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
                    171        340  Solanum tuberosum (Potato)
                    172        339  Shigella dysenteriae serotype 1 (strain Sd197)
                    173        339  Pisum sativum (Garden pea)
                    174        337  Methylococcus capsulatus
                    175        337  Dictyostelium discoideum (Slime mold)
                    176        337  Synechococcus sp. (strain WH8102)
                    177        334  Sulfolobus tokodaii
                    178        332  Rickettsia typhi
                    179        332  Prochlorococcus marinus (strain MIT 9313)
                    180        332  Glycine max (Soybean)
                    181        331  Prochlorococcus marinus
                    182        331  Geobacillus kaustophilus
                    183        323  Mycobacterium paratuberculosis
                    184        322  Staphylococcus aureus
                    185        320  Rhodopirellula baltica
                    186        317  Idiomarina loihiensis
                    187        316  Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
                    188        316  Geobacter sulfurreducens
                    189        316  Thermoplasma acidophilum
                    190        315  Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1) 
                    191        314  Pseudomonas syringae pv. syringae (strain B728a)
                    192        312  Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
                    193        311  Aspergillus fumigatus (Sartorya fumigata)
                    194        311  Fusobacterium nucleatum subsp. nucleatum
                    195        309  Coxiella burnetii
                    196        307  Triticum aestivum (Wheat)
                    197        303  Macaca mulatta (Rhesus macaque)
                    198        300  Brucella abortus
                    199        299  Azoarcus sp. (strain EbN1)
                    200        296  Nocardia farcinica
                    201        296  Synechococcus sp. (strain PCC 7942) (Anacystis nidulans R2)
                    202        296  Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
                    203        295  Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
                    204        294  Wolinella succinogenes
                    205        293  Zymomonas mobilis
                    206        292  Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
                    207        292  Bacteroides thetaiotaomicron
                    208        291  Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
                    209        290  Sulfolobus acidocaldarius
                    210        287  Clostridium tetani
                    211        285  Symbiobacterium thermophilum
                    212        285  Pseudomonas putida
                    213        285  Silicibacter pomeroyi
                    214        285  Pyrobaculum aerophilum
                    215        285  Legionella pneumophila subsp. pneumophila 
                    216        284  Haemophilus influenzae (strain 86-028NP)
                    217        284  Xanthomonas oryzae pv. oryzae
                    218        283  Hordeum vulgare (Barley)
                    219        282  Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
                    220        282  Cavia porcellus (Guinea pig)
                    221        281  Legionella pneumophila (strain Paris)
                    222        281  Thermoplasma volcanium
                    223        279  Legionella pneumophila (strain Lens)
                    224        279  Pseudomonas fluorescens (strain PfO-1)
                    225        277  Corynebacterium diphtheriae
                    226        273  Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
                    227        273  Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
                    228        269  Spinacia oleracea (Spinach)
                    229        268  Bacteriophage T4
                    230        267  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    231        262  Rhodobacter capsulatus (Rhodopseudomonas capsulata)
                    232        261  Helicobacter hepaticus
                    233        261  Methanococcus maripaludis
                    234        260  Wigglesworthia glossinidia brevipalpis
                    235        259  Haloarcula marismortui (Halobacterium marismortui)
                    236        259  Equus caballus (Horse)
                    237        259  Bifidobacterium longum
                    238        258  Xanthomonas campestris pv. campestris (strain 8004)
                    239        257  Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
                    240        257  Staphylococcus aureus (strain NCTC 8325)
                    241        256  Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
                    242        255  Leifsonia xyli subsp. xyli
                    243        254  Vaccinia virus (strain Copenhagen) (VACV)
                    244        253  Gluconobacter oxydans (Gluconobacter suboxydans)
                    245        252  Dechloromonas aromatica (strain RCB)
                    246        251  Anabaena variabilis (strain ATCC 29413 / PCC 7937)
                    247        251  Porphyromonas gingivalis (Bacteroides gingivalis)
                    248        249  Bartonella henselae (Rochalimaea henselae)
                    249        248  Staphylococcus aureus (strain bovine RF122)
                    250        247  Campylobacter jejuni (strain RM1221)
                    251        244  Chlamydophila caviae
                    252        243  Bacteroides fragilis
                    253        243  Desulfotalea psychrophila
                    254        240  Blochmannia floridanus
                    255        238  Lactobacillus johnsonii
                    256        237  Cryptococcus neoformans (Filobasidiella neoformans)
                    257        237  Propionibacterium acnes
                    258        236  Bartonella quintana (Rochalimaea quintana)
                    259        235  Bacillus stearothermophilus (Geobacillus stearothermophilus)
                    260        235  Burkholderia pseudomallei (strain 1710b)
                    261        234  Pseudoalteromonas haloplanktis (strain TAC 125)
                    262        232  Xanthomonas campestris pv. vesicatoria (strain 85-10)
                    263        232  Nitrosococcus oceani (strain ATCC 19707 / NCIMB 11848)
                    264        230  Thiobacillus denitrificans (strain ATCC 25259)
                    265        229  Brucella abortus (strain 2308)
                    266        225  Gorilla gorilla gorilla (Lowland gorilla)
                    267        224  Chlamydomonas reinhardtii
                    268        221  Bdellovibrio bacteriovorus
                    269        220  Francisella tularensis subsp. tularensis
                    270        220  Ustilago maydis (Smut fungus)
                    271        220  Streptococcus thermophilus (strain ATCC BAA-250 / LMG 18311)
                    272        219  Staphylococcus aureus (strain USA300)
                    273        218  Streptococcus thermophilus (strain CNRZ 1066)
                    274        217  Porphyra purpurea
                    275        217  Escherichia coli (strain UTI89 / UPEC)
                    276        213  Psychrobacter arcticum
                    277        212  Klebsiella pneumoniae
                    278        211  Felis silvestris catus (Cat)
                    279        210  Pelobacter carbinolicus (strain DSM 2380 / Gra Bd 1)
                    280        210  Nitrobacter winogradskyi (strain Nb-255 / ATCC 25391)
                    281        208  Cricetulus griseus (Chinese hamster)
                    282        208  Treponema denticola
                    283        207  Porphyra yezoensis
                    284        207  Rhodospirillum rubrum (strain ATCC 11170 / NCIB 8255)
                    285        206  Lactobacillus acidophilus
                    286        204  Caenorhabditis briggsae
                    287        201  Escherichia coli O6:K15:H31 (strain 536 / UPEC)
                    288        201  Mesocricetus auratus (Golden hamster)
                    289        200  Vaccinia virus (strain Western Reserve / WR) (VACV)
                    
                    
                    
                    3.3  Taxonomic distribution of the sequences
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea           10908 (  4%)
                    Bacteria         127559 ( 49%)
                    Eukaryota        112139 ( 43%)
                    Viruses           10907 (  4%)
                    
                    
                    Within Eukaryota:
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  15946 ( 14%)           (  6%)
                    Other Mammalia         34810 ( 31%)           ( 13%)
                    Other Vertebrata       10293 (  9%)           (  4%)
                    Viridiplantae          18768 ( 17%)           (  7%)
                    Fungi                  16974 ( 15%)           (  6%)
                    Insecta                 4794 (  4%)           (  2%)
                    Nematoda                3451 (  3%)           (  1%)
                    Other                   7103 (  6%)           (  3%)
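
The percentages in section 3.3 follow from the raw counts. As a hedged sketch (counts transcribed from the two tables above, rounding to the nearest whole percent assumed, names illustrative), they can be recomputed like this:

```python
total = 261513  # sequence entries in release 52.0

kingdoms = {
    "Archaea": 10908, "Bacteria": 127559,
    "Eukaryota": 112139, "Viruses": 10907,
}
eukaryota = {
    "Human": 15946, "Other Mammalia": 34810, "Other Vertebrata": 10293,
    "Viridiplantae": 18768, "Fungi": 16974, "Insecta": 4794,
    "Nematoda": 3451, "Other": 7103,
}


def pct(part, whole):
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


kingdom_pct = {name: pct(n, total) for name, n in kingdoms.items()}
# For each category: (% of Eukaryota, % of the complete database).
eukaryota_pct = {
    name: (pct(n, kingdoms["Eukaryota"]), pct(n, total))
    for name, n in eukaryota.items()
}

for name, (of_euk, of_db) in eukaryota_pct.items():
    print(f"{name:<18}{eukaryota[name]:>6} ({of_euk:>3}%) ({of_db:>3}%)")
```

With these inputs the kingdom counts add up to the database total, and the category counts add up to the Eukaryota count.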
                    
                    
                    4.  SEQUENCE SIZE
                    
                    Repartition of the sequences by size (excluding fragments)
                    
                    From   To   Number             From   To    Number
                       1-  50     5020             1001-1100      2190
                      51- 100    19316             1101-1200      1469
                     101- 150    27430             1201-1300      1178
                     151- 200    26015             1301-1400       973
                     201- 250    26277             1401-1500       809
                     251- 300    22407             1501-1600       399
                     301- 350    22622             1601-1700       321
                     351- 400    20653             1701-1800       276
                     401- 450    16334             1801-1900       243
                     451- 500    14148             1901-2000       205
                     501- 550    10709             2001-2100       127
                     551- 600     7271             2101-2200       180
                     601- 650     6284             2201-2300       177
                     651- 700     4277             2301-2400       113
                     701- 750     3615             2401-2500        92
                     751- 800     2907             >2500           679
                     801- 850     2438
                     851- 900     2548
                     901- 950     1920
                     951-1000     1531
                    
                    
                    The average sequence length in UniProtKB/Swiss-Prot is 365 amino acids.
                    
                    The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
                    The longest sequence is  TITIN_HUMAN (Q8WZ42): 34350 amino acids.
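
As a rough cross-check of the average length, the total number of amino acids can be divided by the number of entries, both taken from the introduction. This is a sketch only: the source does not state how the published figure was rounded or whether fragments were included, so the comparison is indicative.

```python
total_amino_acids = 95638062  # from the introduction
total_entries = 261513        # from the introduction

mean_length = total_amino_acids / total_entries
whole_residues = total_amino_acids // total_entries  # integer part only

print(f"Mean length: {mean_length:.1f} amino acids "
      f"(integer part: {whole_residues})")
```

The integer part of this quotient agrees with the 365 amino acids quoted above.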
                    
                    
                    5.  JOURNAL CITATIONS
                    
                    Note: the following citation statistics reflect the number of distinct
                    journal citations.
                    
                    Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1756
                    
                    
                    5.1 Table of the frequency of journal citations
                    
                    Journals cited      1x:  618
                                        2x:  236
                                        3x:  133
                                        4x:   89
                                        5x:   67
                                        6x:   47
                                        7x:   35
                                        8x:   35
                                        9x:   34
                                       10x:   16
                                   11- 20x:  130
                                   21- 50x:  149
                                   51-100x:   70
                                     >100x:  129
                    
                    
                    5.2  List of the most cited journals in UniProtKB/Swiss-Prot
                    
                    Nb    Citations   Journal name
                    --    ---------   -------------------------------------------------------------
                    1        14640   Journal of Biological Chemistry
                    2         7008   Proceedings of the National Academy of Sciences of the U.S.A.
                    3         4469   Journal of Bacteriology
                    4         4188   Gene
                    5         4053   Nucleic Acids Research
                    6         3779   Biochemical and Biophysical Research Communications
                    7         3520   FEBS Letters
                    8         3259   Biochemistry
                    9         3220   The EMBO Journal
                    10         2924   European Journal of Biochemistry
                    11         2785   Nature
                    12         2709   Molecular and Cellular Biology
                    13         2646   Biochimica et Biophysica Acta
                    14         2497   Journal of Molecular Biology
                    15         2292   Genomics
                    16         2256   Cell
                    17         1818   Biochemical Journal
                    18         1718   Science
                    19         1483   Molecular Microbiology
                    20         1346   Plant Molecular Biology
                    21         1286   Journal of Virology
                    22         1268   Molecular and General Genetics
                    23         1267   Journal of Cell Biology
                    24         1113   Virology
                    25         1088   Human Molecular Genetics
                    26         1064   Journal of Biochemistry
                    27         1051   Nature Genetics
                    28         1046   Genes and Development
                    29          942   Oncogene
                    30          939   Plant Physiology
                    31          909   The American Journal of Human Genetics
                    32          822   Human Mutation
                    33          781   Journal of Immunology
                    34          775   Development
                    35          765   Infection and Immunity
                    36          741   Genetics
                    37          731   Structure
                    38          691   Yeast
                    39          685   Archives of Biochemistry and Biophysics
                    40          656   Journal of General Virology
                    41          652   Molecular Biology of the Cell
                    42          619   Microbiology
                    43          576   Blood
                    44          557   FEMS Microbiology Letters
                    45          553   The Plant Cell
                    46          541   Nature Structural Biology
                    47          505   Molecular Cell
                    48          497   Human Genetics
                    49          492   Journal of Cell Science
                    50          489   Current Genetics
                    51          486   Cancer Research
                    52          475   Developmental Biology
                    53          452   Mechanisms of Development
                    54          443   The Plant Journal
                    55          434   Applied and Environmental Microbiology
                    56          426   Protein Science
                    57          422   Neuron
                    58          418   Journal of Clinical Investigation
                    59          417   Mammalian Genome
                    60          417   Acta Crystallographica, Section D
                    61          409   Current Biology
                    62          402   Molecular and Biochemical Parasitology
                    63          393   Journal of Neuroscience
                    64          390   Molecular Endocrinology
                    65          380   The Journal of Experimental Medicine
                    66          370   Immunogenetics
                    67          345   Journal of Molecular Evolution
                    68          338   DNA and Cell Biology
                    69          335   Journal of Neurochemistry
                    70          333   Endocrinology
                    71          317   DNA Sequence
                    72          315   Toxicon
                    73          302   The Journal of Clinical Endocrinology and Metabolism
                    74          300   American Journal of Physiology
                    75          291   Molecular Biology and Evolution
                    76          286   Biological Chemistry Hoppe-Seyler
                    77          285   Brain Research. Molecular Brain Research
                    78          281   Bioscience, Biotechnology, and Biochemistry
                    79          249   Cytogenetics and Cell Genetics
                    80          242   Comparative Biochemistry and Physiology
                    81          242   Journal of General Microbiology
                    82          238   Proteins
                    83          224   Journal of Medical Genetics
                    84          220   Peptides
                    85          218   Antimicrobial Agents and Chemotherapy
                    86          215   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
                    87          215   Molecular Pharmacology
                    88          202   Journal of Investigative Dermatology
                    89          194   Biology of Reproduction
                    90          191   Genome Research
                    91          189   Plant and Cell Physiology
                    92          189   Nature Cell Biology
                    93          183   DNA Research
                    94          180   Molecular Plant-Microbe Interactions
                    95          180   Virus Research
                    96          175   European Journal of Immunology
                    97          172   Experimental Cell Research
                    98          164   RNA
                    99          160   Biochimie
                    100          158   Tissue Antigens
                    101          158   DNA
                    102          156   Neurology
                    103          152   Molecular and Cellular Endocrinology
                    104          151   Developmental Dynamics
                    105          149   Molecular Phylogenetics and Evolution
                    106          149   Hemoglobin
                    107          147   American Journal of Medical Genetics
                    108          145   Bioorganicheskaia Khimiia
                    109          140   Archives of Microbiology
                    110          140   Annals of Neurology
                    111          138   Genes to Cells
                    112          138   European Journal of Human Genetics
                    113          134   Insect Biochemistry and Molecular Biology
                    114          132   Journal of Human Genetics
                    115          129   Immunity
                    116          128   Planta
                    117          123   Animal Genetics
                    118          123   Developmental Cell
                    119          121   Molecular Reproduction and Development
                    120          118   Agricultural and Biological Chemistry
                    121          118   General and Comparative Endocrinology
                    122          117   Diabetes
                    123          111   Molecular Immunology
                    124          109   Glycobiology
                    125          109   Investigative Ophthalmology and Visual Science
                    126          107   The New England Journal of Medicine
                    127          106   Journal of Protein Chemistry
                    128          102   Molecular and Cellular Neuroscience
                    129          101   Archives of Virology
                    130          100   British Journal of Haematology
                    
                    
                    6.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                     514754              1.97
                    Journal                          448577    234379    1.72
                    Submitted to EMBL/GenBank/DDBJ    62118     54649    0.24
                    Submitted to Swiss-Prot            1146      1128   <0.01
                    Submitted to other databases        640       626   <0.01
                    Unpublished observations            637       631   <0.01
                    Book citation                       578       566   <0.01
                    Plant Gene Register                 537       525   <0.01
                    Thesis                              380       378   <0.01
                    Patent                              135       133   <0.01
                    Worm Breeder's Gazette                6         6   <0.01
                    
                    Comments (CC)                      1058365              4.05
                    SIMILARITY                       297431    239032    1.14
                    FUNCTION                         184967    178423    0.71
                    SUBCELLULAR LOCATION             143499    143499    0.55
                    SUBUNIT                           98812     98812    0.38
                    CATALYTIC ACTIVITY                98305     90234    0.38
                    PATHWAY                           52267     44460    0.20
                    COFACTOR                          39248     35113    0.15
                    TISSUE SPECIFICITY                24970     24970    0.10
                    MISCELLANEOUS                     22037     19810    0.08
                    PTM                               21043     17067    0.08
                    DOMAIN                            15822     13637    0.06
                    ALTERNATIVE PRODUCTS              11342     11342    0.04
                    CAUTION                           10296      9071    0.04
                    INTERACTION                        7093      7093    0.03
                    INDUCTION                          7087      7087    0.03
                    DEVELOPMENTAL STAGE                6212      6212    0.02
                    ENZYME REGULATION                  4520      4520    0.02
                    DISEASE                            3600      2604    0.01
                    WEB RESOURCE                       3448      2894    0.01
                    MASS SPECTROMETRY                  2693      2242    0.01
                    BIOPHYSICOCHEMICAL PROPERTIES      1653      1653    0.01
                    POLYMORPHISM                        586       570   <0.01
                    RNA EDITING                         477       477   <0.01
                    ALLERGEN                            419       419   <0.01
                    TOXIC DOSE                          324       319   <0.01
                    BIOTECHNOLOGY                       141       141   <0.01
                    PHARMACEUTICAL                       73        73   <0.01
                    
                    Features (FT)                      1744500              6.67
                    CHAIN                            266207    257979    1.02
                    TRANSMEM                         167633     36823    0.64
                    METAL                            111155     27262    0.43
                    STRAND                            92324      8678    0.35
                    CONFLICT                          90312     31283    0.35
                    HELIX                             88592      9112    0.34
                    DOMAIN                            86017     47759    0.33
                    TOPO_DOM                          85194     17293    0.33
                    CARBOHYD                          76138     19104    0.29
                    DISULFID                          74123     18930    0.28
                    BINDING                           64883     22914    0.25
                    ACT_SITE                          62217     35914    0.24
                    MOD_RES                           58057     23003    0.22
                    REPEAT                            54932      8200    0.21
                    VARIANT                           44600      9160    0.17
                    NP_BIND                           40877     29321    0.16
                    REGION                            38954     20753    0.15
                    COMPBIAS                          25530     14709    0.10
                    VAR_SEQ                           24590     10681    0.09
                    SIGNAL                            24413     24403    0.09
                    TURN                              23826      7391    0.09
                    MUTAGEN                           19304      4726    0.07
                    ZN_FING                           18962      7399    0.07
                    MOTIF                             18018     11888    0.07
                    SITE                              15906      9034    0.06
                    INIT_MET                          10975     10975    0.04
                    NON_TER                           10704      8215    0.04
                    COILED                             9800      6360    0.04
                    PROPEP                             7982      6739    0.03
                    LIPID                              7548      4882    0.03
                    DNA_BIND                           6921      6424    0.03
                    PEPTIDE                            6644      4151    0.03
                    TRANSIT                            4550      4504    0.02
                    CA_BIND                            2693      1111    0.01
                    CROSSLNK                           2024      1381    0.01
                    NON_CONS                           1175       523   <0.01
                    UNSURE                              469       183   <0.01
                    SE_CYS                              251       182   <0.01
                    
                    Cross-references (DR)              3650964             13.96
                    InterPro                         617838    238799    2.36
                    EMBL                             492611    253089    1.88
                    GO                               339578    137455    1.30
                    Pfam                             325839    231011    1.25
                    PROSITE                          246106    150243    0.94
                    KEGG                             180404    162878    0.69
                    GenomeReviews                    150533    134388    0.58
                    HAMAP                            103797    103678    0.40
                    TIGRFAMs                         101433     95037    0.39
                    PIR                               98446     91898    0.38
                    PRINTS                            93227     73725    0.36
                    HSSP                              80523     80523    0.31
                    SMART                             76697     58494    0.29
                    BioCyc                            72324     66863    0.28
                    ProDom                            70565     68226    0.27
                    UniGene                           59630     54887    0.23
                    Gene3D                            59342     52618    0.23
                    Ensembl                           51212     51212    0.20
                    GermOnline                        42029     41413    0.16
                    PANTHER                           41377     41043    0.16
                    PDB                               38554     10498    0.15
                    SMR                               35989     35989    0.14
                    ArrayExpress                      35710     35710    0.14
                    RZPD-ProtExp                      27256     12772    0.10
                    TIGR                              23892     23273    0.09
                    PIRSF                             20894     20636    0.08
                    LinkHub                           17639     17639    0.07
                    HGNC                              15422     15355    0.06
                    IntAct                            13399     13399    0.05
                    MIM                               12802     10405    0.05
                    MGI                               12583     12537    0.05
                    DIP                                8824      8774    0.03
                    SGD                                6235      6148    0.02
                    CYGD                               6223      6134    0.02
                    RGD                                5692      5689    0.02
                    MEROPS                             5364      5058    0.02
                    TAIR                               5039      4947    0.02
                    EcoGene                            4311      4308    0.02
                    EchoBASE                           4158      4126    0.02
                    H-InvDB                            3677      3659    0.01
                    WormPep                            3617      3002    0.01
                    WormBase                           3270      3188    0.01
                    FlyBase                            3234      3110    0.01
                    GeneDB_Spombe                      3209      3174    0.01
                    TRANSFAC                           2878      2584    0.01
                    SubtiList                          2790      2789    0.01
                    Gramene                            2789      2789    0.01
                    Reactome                           2706      1545    0.01
                    GeneFarm                           1831      1812    0.01
                    DrugBank                           1826       502    0.01
                    StyGene                            1579      1575    0.01
                    HPA                                1486      1324    0.01
                    TubercuList                        1444      1408    0.01
                    SWISS-2DPAGE                       1179      1179   <0.01
                    ZFIN                               1120      1108   <0.01
                    ListiList                          1091      1083   <0.01
                    REPRODUCTION-2DPAGE                 829       829   <0.01
                    Leproma                             635       632   <0.01
                    AGD                                 611       605   <0.01
                    PhotoList                           601       601   <0.01
                    LegioList                           560       560   <0.01
                    MaizeGDB                            442       437   <0.01
                    OGP                                 377       376   <0.01
                    REBASE                              364       358   <0.01
                    HIV                                 361       351   <0.01
                    PeroxiBase                          361       350   <0.01
                    ECO2DBASE                           351       299   <0.01
                    SagaList                            347       346   <0.01
                    DictyBase                           340       337   <0.01
                    GlycoSuiteDB                        282       282   <0.01
                    PHCI-2DPAGE                         241       241   <0.01
                    MypuList                            192       192   <0.01
                    DOSAC-COBS-2DPAGE                   149       147   <0.01
                    Aarhus/Ghent-2DPAGE                 128        98   <0.01
                    Siena-2DPAGE                        103       103   <0.01
                    HSC-2DPAGE                           85        85   <0.01
                    PhosSite                             70        70   <0.01
                    Cornea-2DPAGE                        67        67   <0.01
                    COMPLUYEAST-2DPAGE                   59        59   <0.01
                    euHCVdb                              55        44   <0.01
                    PMMA-2DPAGE                          52        52   <0.01
                    PptaseDB                             29        29   <0.01
                    Rat-heart-2DPAGE                     28        28   <0.01
                    ANU-2DPAGE                           22        22   <0.01
                    
                    Number of explicitly cross-referenced databases: 85
                    Number of implicitly cross-referenced databases: 26
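
The "Average per entry" column in the table above appears to be the line total divided by the number of entries in the release (261'513 for Swiss-Prot 52.0). The following is a minimal sketch under that assumption; the function name and the "<0.01" cut-off are illustrative choices, not part of the original statistics tooling.

```python
# Entry count for UniProtKB/Swiss-Prot release 52.0 (from the release summary).
TOTAL_ENTRIES = 261_513

# Line totals copied from the section 6 table.
line_totals = {
    "References (RL)": 514_754,
    "Comments (CC)": 1_058_365,
    "Features (FT)": 1_744_500,
    "Cross-references (DR)": 3_650_964,
}


def average_per_entry(total_lines: int, entries: int = TOTAL_ENTRIES) -> str:
    """Format the mean number of lines per entry to two decimals."""
    avg = total_lines / entries
    # Assumption: values that would round to 0.00 are displayed as "<0.01".
    return "<0.01" if avg < 0.005 else f"{avg:.2f}"


for name, total in line_totals.items():
    print(f"{name:<24}{total:>9}  {average_per_entry(total)}")
```

Under these assumptions the four category rows come out as 1.97, 4.05, 6.67 and 13.96, matching the table.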
                    
                    
                    7.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in UniProtKB/Swiss-Prot: 237138
                    
                    Total number of entries encoded on a Mitochondrion: 4306
                    Total number of entries encoded on a Plasmid: 3295
                    Total number of entries encoded on a Plastid: 26
                    Total number of entries encoded on a Plastid; Apicoplast: 6
                    Total number of entries encoded on a Plastid; Chloroplast: 8037
                    Total number of entries encoded on a Plastid; Cyanelle: 145
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 91
                    
                    Number of fragments: 8360
                    Number of additional sequences produced by alternative splicing, initiation or promoter usage: 18319
                    
                

UniProtKB/TrEMBL protein database release 35.0 statistics

                    
                    1.  INTRODUCTION
                    
                    Release 35.0 of 06-Mar-2007 of UniProtKB/TrEMBL contains 3874166 sequence entries
                    comprising 1260291226 amino acids.
                    
                    696753 sequences have been added since release 34, the sequence data of
                    10513 existing entries has been updated and the annotations of
                    2124731 entries have been revised. This represents an increase of 23%.
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 8.37   Gln (Q) 4.00   Leu (L) 9.83   Ser (S) 6.86
                    Arg (R) 5.52   Glu (E) 6.04   Lys (K) 5.28   Thr (T) 5.62
                    Asn (N) 4.30   Gly (G) 6.98   Met (M) 2.39   Trp (W) 1.33
                    Asp (D) 5.23   His (H) 2.23   Phe (F) 4.06   Tyr (Y) 3.04
                    Cys (C) 1.37   Ile (I) 5.96   Pro (P) 4.85   Val (V) 6.60
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Lys, Asp, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Cys, Trp
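
The ordering in section 2.2 can be reproduced by sorting the percentages of section 2.1 in descending order. This is a small illustrative sketch; the dictionary is transcribed by hand from the table and the variable names are hypothetical.

```python
# Percent composition for UniProtKB/TrEMBL release 35.0 (section 2.1).
composition = {
    "Ala": 8.37, "Gln": 4.00, "Leu": 9.83, "Ser": 6.86,
    "Arg": 5.52, "Glu": 6.04, "Lys": 5.28, "Thr": 5.62,
    "Asn": 4.30, "Gly": 6.98, "Met": 2.39, "Trp": 1.33,
    "Asp": 5.23, "His": 2.23, "Phe": 4.06, "Tyr": 3.04,
    "Cys": 1.37, "Ile": 5.96, "Pro": 4.85, "Val": 6.60,
}

# Most frequent residue first; there are no ties in this release.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```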
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 127380
                    
                    The first twenty species represent 763609 sequences:  19.7 % of the
                    total number of entries.
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented 1x:58527
                    2x:23921
                    3x:12255
                    4x: 6910
                    5x: 4051
                    6x: 2999
                    7x: 2175
                    8x: 1781
                    9x: 1381
                    10x: 1525
                    11- 20x: 6496
                    21- 50x: 2672
                    51-100x: 1081
                    >100x: 1606
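
As a consistency check, the buckets of table 3.1 should add up to the total number of species given at the start of section 3. The sketch below performs that check and also derives the share of species with a single entry; the bucket labels are simplified and the variable names are illustrative.

```python
# Number of species occurring 1x, 2x, ... in UniProtKB/TrEMBL (table 3.1).
species_by_frequency = {
    "1x": 58527, "2x": 23921, "3x": 12255, "4x": 6910, "5x": 4051,
    "6x": 2999, "7x": 2175, "8x": 1781, "9x": 1381, "10x": 1525,
    "11-20x": 6496, "21-50x": 2672, "51-100x": 1081, ">100x": 1606,
}
TOTAL_SPECIES = 127_380  # stated at the start of section 3

bucket_sum = sum(species_by_frequency.values())
singleton_share = 100 * species_by_frequency["1x"] / TOTAL_SPECIES

print(bucket_sum == TOTAL_SPECIES)  # True
print(f"{singleton_share:.1f} % of species are represented by one entry")
```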
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     177135  Human immunodeficiency virus 1
                    2      71810  Oryza sativa (japonica cultivar-group)
                    3      53146  Homo sapiens (Human)
                    4      52403  Mus musculus (Mouse)
                    5      50189  Trichomonas vaginalis G3
                    6      45187  Arabidopsis thaliana (Mouse-ear cress)
                    7      39844  Paramecium tetraurelia
                    8      35448  Hepatitis C virus
                    9      28040  Tetraodon nigroviridis (Green puffer)
                    10      27313  Tetrahymena thermophila SB210
                    11      26501  Drosophila melanogaster (Fruit fly)
                    12      20214  Caenorhabditis elegans
                    13      20166  Trypanosoma cruzi
                    14      18711  Medicago truncatula (Barrel medic)
                    15      18430  Brachydanio rerio (Zebrafish) (Danio rerio)
                    16      17188  uncultured bacterium
                    17      16864  Aedes aegypti (Yellowfever mosquito)
                    18      16432  Phaeosphaeria nodorum SN15
                    19      14666  Plasmodium chabaudi
                    20      13922  Hepatitis B virus (HBV)
                    21      13557  Aspergillus niger
                    22      13415  Anopheles gambiae str. PEST
                    23      13082  Dictyostelium discoideum AX4
                    24      13074  Caenorhabditis briggsae
                    25      12674  Xenopus laevis (African clawed frog)
                    26      12032  Aspergillus oryzae
                    27      11780  Plasmodium berghei
                    28      11650  Gibberella zeae (Fusarium graminearum)
                    29      10980  Chaetomium globosum CBS 148.51
                    30      10662  Neurospora crassa
                    31      10403  Neosartorya fischeri  (Aspergillus fischerianus 
                    32      10393  Aspergillus terreus NIH2624
                    33      10278  Coccidioides immitis RS
                    34      10084  Drosophila pseudoobscura (Fruit fly)
                    35      10006  Aspergillus fumigatus (Sartorya fumigata)
                    36       9719  Schistosoma japonicum (Blood fluke)
                    37       9640  Emericella nidulans (Aspergillus nidulans)
                    38       9446  Trypanosoma brucei
                    39       9343  Candida albicans (Yeast)
                    40       9232  Rattus norvegicus (Rat)
                    41       9113  Aspergillus clavatus NRRL 1
                    42       9089  Entamoeba histolytica HM-1:IMSS
                    43       8994  Rhodococcus sp. (strain RHA1)
                    44       8811  Escherichia coli
                    45       8512  Stigmatella aurantiaca DW4/3-1
                    46       8436  Burkholderia xenovorans (strain LB400)
                    47       8249  Microscilla marina ATCC 23134
                    48       8244  Bos taurus (Bovine)
                    49       8097  Bradyrhizobium japonicum
                    50       7975  Ostreococcus tauri
                    51       7937  Frankia sp. EAN1pec
                    52       7834  Burkholderia phymatum STM815
                    53       7808  Plasmodium yoelii yoelii
                    54       7761  Solibacter usitatus (strain Ellin6076)
                    55       7663  Burkholderia vietnamiensis G4
                    56       7524  Streptomyces coelicolor
                    57       7490  Helicobacter pylori (Campylobacter pylori)
                    58       7461  Burkholderia cenocepacia MC0-3
                    59       7449  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    60       7432  Bradyrhizobium sp. BTAi1
                    61       7310  Burkholderia phytofirmans PsJN
                    62       7300  Streptomyces avermitilis
                    63       7207  Myxococcus xanthus (strain DK 1622)
                    64       7139  Rhizobium loti (Mesorhizobium loti)
                    65       7113  Leishmania major
                    66       7042  Hepatitis C virus subtype 1b
                    67       6996  Burkholderia ambifaria MC40-6
                    68       6994  Rhizobium leguminosarum bv. viciae (strain 3841)
                    69       6952  Rhodopirellula baltica
                    70       6921  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    71       6882  Burkholderia cenocepacia (strain HI2424)
                    72       6792  Pseudomonas aeruginosa
                    73       6708  Frankia alni (strain ACN14a)
                    74       6679  Psychroflexus torquis ATCC 700755
                    75       6597  Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
                    76       6592  Burkholderia cepacia (strain ATCC 53795 / AMMD)
                    77       6566  Hahella chejuensis (strain KCTC 2396)
                    78       6564  Burkholderia multivorans ATCC 17616
                    79       6511  Ralstonia eutropha  (Cupriavidus necator 
                    80       6488  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    81       6471  Ustilago maydis (Smut fungus)
                    82       6420  Plasmodium falciparum
                    83       6398  Cryptococcus neoformans (Filobasidiella neoformans)
                    84       6394  Giardia lamblia ATCC 50803
                    85       6363  Cryptococcus neoformans var. neoformans B-3501A
                    86       6337  Sinorhizobium medicae WSM419
                    87       6313  Burkholderia cenocepacia (strain AU 1054)
                    88       6272  Stappia aggregata IAM 12614
                    89       6269  Oryza sativa (Rice)
                    90       6267  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    91       6186  Yarrowia lipolytica (Candida lipolytica)
                    92       6181  Bacillus anthracis
                    93       6176  Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
                    94       6154  Ralstonia metallidurans (strain CH34 / ATCC 43123 / DSM 2839)
                    95       6129  Bacillus thuringiensis serovar israelensis ATCC 35646
                    96       6110  Lyngbya sp. PCC 8106
                    97       6095  Burkholderia pseudomallei (strain 1710b)
                    98       6003  Delftia acidovorans SPH-1
                    99       5962  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
                    100       5909  Mycobacterium vanbaalenii (strain DSM 7251 / PYR-1)
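
The statement at the start of section 3 that the first twenty species represent 763609 sequences (19.7 % of all entries) can be recomputed from the first twenty rows of the table above. A minimal sketch, with the counts transcribed by hand and the total entry count taken from the introduction:

```python
# Frequencies of the twenty most represented species (table 3.2, rows 1-20).
top20 = [
    177135, 71810, 53146, 52403, 50189, 45187, 39844, 35448, 28040, 27313,
    26501, 20214, 20166, 18711, 18430, 17188, 16864, 16432, 14666, 13922,
]
TOTAL_ENTRIES = 3_874_166  # UniProtKB/TrEMBL release 35.0

top20_sequences = sum(top20)
top20_share = 100 * top20_sequences / TOTAL_ENTRIES

print(top20_sequences)        # 763609
print(f"{top20_share:.1f} %")  # 19.7 %
```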
                    
                    3.3  Taxonomic distribution of the sequences
                    
                    Kingdom        sequences (% of the database)
                    Archaea           85628 (  2%)
                    Bacteria        1953096 ( 50%)
                    Eukaryota       1353357 ( 35%)
                    Viruses          438444 ( 12%)
                    Other              3639 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  53146 (  4%)           (  1%)
                    Other Mammalia        128207 (  9%)           (  3%)
                    Other Vertebrata      166359 ( 12%)           (  4%)
                    Viridiplantae         277650 ( 21%)           (  7%)
                    Fungi                 221421 ( 16%)           (  6%)
                    Insecta               140469 ( 10%)           (  4%)
                    Nematoda               36997 (  3%)           (  1%)
                    Other                 329108 ( 24%)           (  8%)
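
The two percentage columns of the Eukaryota breakdown seem to be each category count divided by the Eukaryota total and by the complete database, rounded to the nearest integer. The sketch below assumes exactly that; the helper name `pct` is hypothetical.

```python
EUKARYOTA_TOTAL = 1_353_357   # from the kingdom table in section 3.3
DATABASE_TOTAL = 3_874_166    # all UniProtKB/TrEMBL release 35.0 entries

categories = {
    "Human": 53146, "Other Mammalia": 128207, "Other Vertebrata": 166359,
    "Viridiplantae": 277650, "Fungi": 221421, "Insecta": 140469,
    "Nematoda": 36997, "Other": 329108,
}


def pct(count: int, total: int) -> int:
    """Share of `count` in `total`, as a whole-number percentage."""
    return round(100 * count / total)


# (% of Eukaryota, % of the complete database) for each category.
shares = {
    name: (pct(n, EUKARYOTA_TOTAL), pct(n, DATABASE_TOTAL))
    for name, n in categories.items()
}

for name, (of_euk, of_db) in shares.items():
    print(f"{name:<18}{categories[name]:>8} ({of_euk:>3}%) ({of_db:>3}%)")
```

The category counts also sum to the Eukaryota total, so the "Other" row accounts for everything not listed separately.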
                    
                    
                    
                    4.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To  Number             From   To   Number
                    1-  50   49629             1001-1100    23412
                    51- 100  256397             1101-1200    16751
                    101- 150  325180             1201-1300    11671
                    151- 200  309661             1301-1400     7811
                    201- 250  311786             1401-1500     6409
                    251- 300  298194             1501-1600     4703
                    301- 350  279147             1601-1700     3695
                    351- 400  220660             1701-1800     3063
                    401- 450  179845             1801-1900     2267
                    451- 500  153062             1901-2000     1925
                    501- 550  111613             2001-2100     1566
                    551- 600   81746             2101-2200     1583
                    601- 650   61473             2201-2300     1247
                    651- 700   47710             2301-2400     1057
                    701- 750   41866             2401-2500      846
                    751- 800   37368             >2500      7206
                    801- 850   27923
                    851- 900   24629
                    901- 950   17965
                    951-1000   14061
                    
                    
                    
                    
                    The average sequence length in UniProtKB/TrEMBL is   325 amino acids.
                    
                    The shortest sequence is Q96AT0_HUMAN:     4 amino acids.
                    The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
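The size classes used in the table above (50-residue classes up to 1000, 100-residue classes up to 2500, then a single ">2500" class) can be reproduced with a short script. The following is a minimal sketch, not the procedure used to build the release statistics; the function names `size_bin` and `summarize` are illustrative, and the caller is assumed to have already excluded fragments.

```python
from collections import Counter


def size_bin(length: int) -> str:
    """Return the size-class label (e.g. "51-100") for a sequence length."""
    if length < 1:
        raise ValueError("sequence length must be at least 1")
    if length > 2500:
        return ">2500"
    # Classes are 50 residues wide up to 1000, then 100 residues wide.
    width = 50 if length <= 1000 else 100
    low = (length - 1) // width * width + 1
    return f"{low}-{low + width - 1}"


def summarize(lengths):
    """Count sequences per size class and report average, shortest, longest."""
    counts = Counter(size_bin(n) for n in lengths)
    return {
        "bins": dict(counts),
        "average": round(sum(lengths) / len(lengths)),
        "shortest": min(lengths),
        "longest": max(lengths),
    }
```

Calling `summarize` on a list of sequence lengths yields the per-class counts together with the average, shortest and longest values of the kind reported above.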
                    
                    
                    
                    5.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                    5698366              1.47
                    Submitted to EMBL/GenBank/DDBJ  3001879   2155019    0.77
                    Journal                         2633713   2185824    0.68
                    Thesis                             6111      6059   <0.01
                    Book citation                      4173      4128   <0.01
                    Submitted to other databases        281       275   <0.01
                    Other                             52209     34730    0.01
                    
                    Comments (CC)                      1596760              0.41
                    CAUTION                          764048    764048    0.20
                    SIMILARITY                       288557    283163    0.07
                    SUBCELLULAR LOCATION             134969    134969    0.03
                    FUNCTION                         121188    115775    0.03
                    CATALYTIC ACTIVITY               110804    100343    0.03
                    SUBUNIT                           83645     83645    0.02
                    COFACTOR                          59953     59716    0.02
                    PATHWAY                           19869     16688    0.01
                    DOMAIN                             5538      5051   <0.01
                    INTERACTION                        4527      4527   <0.01
                    MISCELLANEOUS                      3652      3652   <0.01
                    ALLERGEN                              6         6   <0.01
                    MASS SPECTROMETRY                     4         4   <0.01
                    
                    Features (FT)                      1871141              0.48
                    NON_TER                         1551259    926827    0.40
                    CHAIN                            190187    160684    0.05
                    SIGNAL                           129160    129160    0.03
                    TRANSIT                             535       535   <0.01
                    
                    Cross-references (DR)             27737959              7.16
                    GO                              6176751   2117700    1.59
                    InterPro                        5183167   2359043    1.34
                    EMBL                            4421091   3866114    1.14
                    Pfam                            2961494   2202542    0.76
                    PROSITE                         1630918   1054799    0.42
                    GenomeReviews                   1149944   1105731    0.30
                    KEGG                             874421    836997    0.23
                    Gene3D                           758975    651836    0.20
                    PRINTS                           661334    551431    0.17
                    SMART                            562318    439627    0.15
                    TIGRFAMs                         417608    385499    0.11
                    SMR                              398503    398493    0.10
                    ProDom                           390620    372224    0.10
                    BioCyc                           281152    266236    0.07
                    HSSP                             272626    272224    0.07
                    UniGene                          253722    234448    0.07
                    PANTHER                          239476    237162    0.06
                    PIR                              185394    150287    0.05
                    TIGR                             153089    146665    0.04
                    RZPD-ProtExp                     114853     36208    0.03
                    ArrayExpress                     101817    101712    0.03
                    Ensembl                           93999     93997    0.02
                    PIRSF                             87590     86698    0.02
                    Gramene                           71013     71013    0.02
                    MGI                               44956     43553    0.01
                    HGNC                              38055     38004    0.01
                    euHCVdb                           30120     30120    0.01
                    FlyBase                           24842     24806    0.01
                    TAIR                              19580     19520    0.01
                    WormPep                           19308     19223   <0.01
                    WormBase                          19065     18982   <0.01
                    LinkHub                           13923     13923   <0.01
                    ZFIN                              12974     12972   <0.01
                    DictyBase                         12926     12926   <0.01
                    MEROPS                            11947     11509   <0.01
                    LegioList                          5345      5315   <0.01
                    IntAct                             5246      5246   <0.01
                    ListiList                          4724      4707   <0.01
                    PDB                                4407      2648   <0.01
                    AGD                                4096      4096   <0.01
                    PhotoList                          4081      3957   <0.01
                    RGD                                4044      3711   <0.01
                    REBASE                             3697      3672   <0.01
                    TubercuList                        2545      2539   <0.01
                    DIP                                2487      2482   <0.01
                    GeneDB_Spombe                      1779      1766   <0.01
                    SagaList                           1749      1655   <0.01
                    Leproma                             972       971   <0.01
                    PeroxiBase                          902       901   <0.01
                    TRANSFAC                            881       870   <0.01
                    MypuList                            590       586   <0.01
                    SGD                                 407       406   <0.01
                    CYGD                                133       130   <0.01
                    PHCI-2DPAGE                         106       106   <0.01
                    ANU-2DPAGE                           63        63   <0.01
                    Reactome                             49        36   <0.01
                    REPRODUCTION-2DPAGE                  40        40   <0.01
                    SWISS-2DPAGE                         39        39   <0.01
                    PMMA-2DPAGE                           3         3   <0.01
                    Siena-2DPAGE                          2         2   <0.01
                    COMPLUYEAST-2DPAGE                    1         1   <0.01
                    
                    Number of explicitly cross-referenced databases: 85
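The "Average per entry" column in the table above appears to be the total number of lines divided by the total number of entries in the database (not by the number of entries carrying such a line), rounded to two decimals, with values that round to zero shown as "<0.01". The following is a minimal sketch under that assumption; `average_per_entry` is an illustrative name, and the total entry count must be supplied by the caller because it is not listed in this table.

```python
def average_per_entry(total_lines: int, total_entries: int) -> str:
    """Format lines-per-entry the way the statistics table displays it."""
    if total_entries <= 0:
        raise ValueError("total_entries must be positive")
    rounded = round(total_lines / total_entries, 2)
    # Assumption: anything that rounds below 0.01 is shown as "<0.01".
    if rounded < 0.01:
        return "<0.01"
    return f"{rounded:.2f}"
```

For example, 147 lines spread over 100 entries would be reported as "1.47".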
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 245473
                    
                    Total number of entries encoded on a Mitochondrion: 156079
                    Total number of entries encoded on a Plasmid: 62697
                    Total number of entries encoded on a Plastid: 3559
                    Total number of entries encoded on a Plastid; Apicoplast: 179
                    Total number of entries encoded on a Plastid; Chloroplast: 53902
                    Total number of entries encoded on a Plastid; Cyanelle: 7
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 181
                    
                    Number of fragments: 929039
                    
                

Submissions and Updates

We welcome feedback from our users. We would especially appreciate being notified if sequences belonging to your field of expertise are missing from the database. We would also like to hear about annotations that need updating, for example when the function of a protein has been clarified or new information about post-translational modifications has become available.

Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml

For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:

UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail: datasubs@ebi.ac.uk


Download information

Bi-Weekly releases

The latest UniProt Knowledgebase data is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data is further supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic

Major releases

For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated three times per year) in flatfile format. Previous releases of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on DVD from the EBI.


Contact

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address: datalib@ebi.ac.uk / swissprot@ebi.ac.uk
WWW server: http://www.ebi.ac.uk/


SIB Swiss Institute of Bioinformatics
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 379 50 50
Fax: (+41 22) 379 58 58
Electronic mail address: Swiss-Prot@expasy.org
WWW server: http://www.expasy.org/


Protein Information Resource (PIR)
Georgetown University Medical Center
3300 Whitehaven St., Suite 1200
Washington, DC 20008
United States of America

Telephone: (+1 202) 687 1039
Fax: (+1 202) 687 0057
Electronic mail address: pirmail@georgetown.edu
WWW server: http://pir.georgetown.edu

Citation

If you want to cite UniProt in a publication, please use the following reference:

The UniProt Consortium
"The Universal Protein Resource (UniProt)"
Nucleic Acids Res. 35:D193-D197(2007) doi:10.1093/nar/gkl929