Release 11.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 53.0 and the UniProtKB/TrEMBL Protein Database release 36.0.

More information on these databases can be found in the user manual section "What is the UniProt Knowledgebase?".


UniProtKB/Swiss-Prot protein knowledgebase release 53.0 statistics

Release 53.0 of 29-May-07 of UniProtKB/Swiss-Prot contains 269293 sequence entries, comprising 98902758 amino acids abstracted from 156204 references.

The growth of the database is summarized below.

Release Date Number of entries Number of amino acids
2.0 09/86 3'939 900'163
3.0 11/86 4'160 969'641
4.0 04/87 4'387 1'036'010
5.0 09/87 5'205 1'327'683
6.0 01/88 6'102 1'653'982
7.0 04/88 6'821 1'885'771
8.0 08/88 7'724 2'224'465
9.0 11/88 8'702 2'498'140
10.0 03/89 10'008 2'952'613
11.0 07/89 10'856 3'265'966
12.0 10/89 12'305 3'797'482
13.0 01/90 13'837 4'347'336
14.0 04/90 15'409 4'914'264
15.0 08/90 16'941 5'486'399
16.0 11/90 18'364 5'986'949
17.0 02/91 20'024 6'524'504
18.0 05/91 20'772 6'792'034
19.0 08/91 21'795 7'173'785
20.0 11/91 22'654 7'500'130
21.0 03/92 23'742 7'866'596
22.0 05/92 25'044 8'375'696
23.0 08/92 26'706 9'011'391
24.0 12/92 28'154 9'545'427
25.0 04/93 29'955 10'214'020
26.0 07/93 31'808 10'875'091
27.0 10/93 33'329 11'484'420
28.0 02/94 36'000 12'496'420
29.0 06/94 38'303 13'464'008
30.0 10/94 40'292 14'147'368
31.0 02/95 43'470 15'335'248
32.0 11/95 49'340 17'385'503
33.0 02/96 52'205 18'531'384
34.0 10/96 59'021 21'210'389
35.0 11/97 69'113 25'083'768
36.0 07/98 74'019 26'840'295
37.0 12/98 77'977 28'268'293
38.0 07/99 80'000 29'085'965
39.0 05/00 86'593 31'411'114
40.0 10/01 101'602 37'315'215
41.0 02/03 122'564 44'986'459
42.0 10/03 135'850 50'046'799
43.0 03/04 146'720 54'093'154
44.0 07/04 153'871 56'608'159
45.0 10/04 163'235 59'631'787
46.0 02/05 168'297 61'443'278
47.0 05/05 181'577 65'746'672
48.0 09/05 194'317 70'391'852
49.0 02/06 207'132 75'438'310
50.0 05/06 222'289 81'585'146
51.0 10/06 241'242 88'541'632
52.0 03/07 261'513 95'638'062
53.0 05/07 269'293 98'902'758
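
As a rough illustration of how the growth table can be read, the sketch below computes the net change between consecutive releases from its last three rows. The names `releases` and `net_growth` are hypothetical helpers introduced for this example and are not part of any UniProt tool. A net change can be smaller than the number of newly added sequences, because entries are occasionally removed.

```python
# (release, date, entries, amino acids), copied from the last rows of the table above
releases = [
    ("51.0", "10/06", 241_242, 88_541_632),
    ("52.0", "03/07", 261_513, 95_638_062),
    ("53.0", "05/07", 269_293, 98_902_758),
]


def net_growth(rows):
    """Return (release, entry delta, amino-acid delta) for each consecutive pair."""
    return [
        (cur[0], cur[2] - prev[2], cur[3] - prev[3])
        for prev, cur in zip(rows, rows[1:])
    ]


for release, d_entries, d_residues in net_growth(releases):
    print(f"{release}: net {d_entries:+d} entries, {d_residues:+d} amino acids")
```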

In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.


Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

  • be as complete as possible. All sequences available at a given time should be immediately included in UniProtKB/Swiss-Prot. This also includes sequence corrections and updates;
  • provide a higher level of annotation;
  • provide cross-references to specialized database(s) that contain, among other data, some information about the genes that code for these proteins;
  • provide specific indexes and documents.

From our efforts to annotate human sequence entries as completely as possible arose the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms which are not covered by these two projects:

Organism        Database cross-references  Index file    Number of sequences
A.thaliana      TAIR                       arath.txt                   5'771
C.albicans      None yet                   calbican.txt                  621
C.elegans       Wormpep                    celegans.txt                3'113
D.discoideum    DictyBase                  dicty.txt                     357
D.melanogaster  FlyBase                    fly.txt                     2'659
M.musculus      MGD                        mgdtosp.txt                13'155
S.cerevisiae    SGD                        yeast.txt                   6'240
S.pombe         GeneDB_SPombe              pombe.txt                   3'232

UniProtKB/Swiss-Prot release statistics
                    
                    1.  INTRODUCTION
                    
                    Release 53.0 of 29-May-07 of UniProtKB/Swiss-Prot contains 269293 sequence entries,
                    comprising 98902758 amino acids abstracted from 156204 references. 
                    
                    9228 sequences have been added since release 52.0, the sequence data of
                    734 existing entries has been updated and the annotations of
                    210454 entries have been revised.
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 7.85   Gln (Q) 3.96   Leu (L) 9.66   Ser (S) 6.87
                    Arg (R) 5.42   Glu (E) 6.67   Lys (K) 5.93   Thr (T) 5.40
                    Asn (N) 4.12   Gly (G) 6.94   Met (M) 2.39   Trp (W) 1.13
                    Asp (D) 5.34   His (H) 2.29   Phe (F) 3.95   Tyr (Y) 3.01
                    Cys (C) 1.50   Ile (I) 5.89   Pro (P) 4.84   Val (V) 6.72
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00
                    
                    
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Arg, Thr, Asp, Pro, Asn, Gln,
                    Phe, Tyr, Met, His, Cys, Trp
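
The ordering in section 2.2 follows directly from the percentages in section 2.1. A minimal sketch, assuming the values are copied by hand into a dictionary (the names `composition` and `ranked` are illustrative only):

```python
# Percent composition from section 2.1 (standard amino acids only)
composition = {
    "Ala": 7.85, "Gln": 3.96, "Leu": 9.66, "Ser": 6.87,
    "Arg": 5.42, "Glu": 6.67, "Lys": 5.93, "Thr": 5.40,
    "Asn": 4.12, "Gly": 6.94, "Met": 2.39, "Trp": 1.13,
    "Asp": 5.34, "His": 2.29, "Phe": 3.95, "Tyr": 3.01,
    "Cys": 1.50, "Ile": 5.89, "Pro": 4.84, "Val": 6.72,
}

# Sort residue names by decreasing frequency
ranked = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranked))
```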
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/Swiss-Prot: 10917
                    
                    The first twenty species represent 84159 sequences:  31.3 % of the total
                    number of entries.
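
The share quoted above can be reproduced from the counts of the twenty most represented species (section 3.2) and the total number of entries. The sketch below is only a consistency check; the variable names are illustrative.

```python
# Frequencies of the 20 most represented species, copied from section 3.2
top20 = [
    16602, 13316, 6163, 6119, 5706, 4930, 4025, 3188, 3032, 2854,
    2545, 2008, 1885, 1782, 1774, 1762, 1752, 1636, 1552, 1528,
]
total_entries = 269_293  # entries in release 53.0

top20_total = sum(top20)
share = 100 * top20_total / total_entries
print(f"{top20_total} sequences: {share:.1f} % of the total")
```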
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented       1x: 5200
                                              2x: 1661
                                              3x:  801
                                              4x:  529
                                              5x:  359
                                              6x:  332
                                              7x:  225
                                              8x:  194
                                              9x:  168
                                             10x:   89
                                         11- 20x:  451
                                         21- 50x:  329
                                         51-100x:  170
                                           >100x:  409
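
Each species falls into exactly one frequency bin, so the bins should add up to the total number of species reported at the start of section 3. A small sketch of that check, with hypothetical variable names:

```python
# Number of species per occurrence bin, copied from table 3.1
species_bins = {
    "1x": 5200, "2x": 1661, "3x": 801, "4x": 529, "5x": 359,
    "6x": 332, "7x": 225, "8x": 194, "9x": 168, "10x": 89,
    "11-20x": 451, "21-50x": 329, "51-100x": 170, ">100x": 409,
}
total_species = 10_917  # total reported in section 3

binned_total = sum(species_bins.values())
print(binned_total == total_species)

# Fraction of species that are represented by a single sequence
single_share = 100 * species_bins["1x"] / total_species
print(f"{single_share:.1f} % of species occur once")
```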
                    
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1      16602  Homo sapiens (Human)
                    2      13316  Mus musculus (Mouse)
                    3       6163  Saccharomyces cerevisiae (Baker's yeast)
                    4       6119  Rattus norvegicus (Rat)
                    5       5706  Arabidopsis thaliana (Mouse-ear cress)
                    6       4930  Escherichia coli
                    7       4025  Bos taurus (Bovine)
                    8       3188  Schizosaccharomyces pombe (Fission yeast)
                    9       3032  Caenorhabditis elegans
                    10       2854  Bacillus subtilis
                    11       2545  Drosophila melanogaster (Fruit fly)
                    12       2008  Xenopus laevis (African clawed frog)
                    13       1885  Escherichia coli O157:H7
                    14       1782  Methanococcus jannaschii
                    15       1774  Haemophilus influenzae
                    16       1762  Pongo pygmaeus (Orangutan)
                    17       1752  Gallus gallus (Chicken)
                    18       1636  Salmonella typhimurium
                    19       1552  Escherichia coli O6
                    20       1528  Shigella flexneri
                    21       1418  Mycobacterium tuberculosis
                    22       1332  Danio rerio (Zebrafish) (Brachydanio rerio)
                    23       1232  Salmonella typhi
                    24       1223  Pseudomonas aeruginosa
                    25       1195  Sus scrofa (Pig)
                    26       1159  Mycobacterium bovis
                    27       1077  Oryza sativa subsp. japonica (Rice)
                    28        978  Synechocystis sp. (strain PCC 6803)
                    29        971  Archaeoglobus fulgidus
                    30        906  Yersinia pestis
                    31        892  Vibrio cholerae
                    32        884  Mimivirus
                    33        884  Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
                    34        879  Rhizobium meliloti (Sinorhizobium meliloti)
                    35        838  Oryctolagus cuniculus (Rabbit)
                    36        796  Staphylococcus aureus (strain Mu50 / ATCC 700699)
                    37        794  Staphylococcus aureus (strain N315)
                    38        770  Staphylococcus aureus (strain MW2)
                    39        770  Staphylococcus aureus (strain COL)
                    40        766  Staphylococcus aureus (strain MSSA476)
                    41        759  Staphylococcus aureus (strain MRSA252)
                    42        756  Aquifex aeolicus
                    43        738  Vibrio parahaemolyticus
                    44        738  Pasteurella multocida
                    45        714  Canis familiaris (Dog)
                    46        687  Streptomyces coelicolor
                    47        687  Mycoplasma pneumoniae
                    48        682  Vibrio vulnificus
                    49        674  Bacillus halodurans
                    50        663  Vibrio vulnificus (strain YJ016)
                    51        647  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
                    52        645  Staphylococcus epidermidis (strain ATCC 12228)
                    53        633  Mycobacterium leprae
                    54        631  Anabaena sp. (strain PCC 7120)
                    55        629  Neurospora crassa
                    56        621  Ashbya gossypii (Yeast) (Eremothecium gossypii)
                    57        619  Yersinia pseudotuberculosis
                    58        618  Bacillus anthracis
                    59        618  Pseudomonas syringae pv. tomato
                    60        617  Candida albicans (Yeast)
                    61        614  Pseudomonas putida (strain KT2440)
                    62        612  Treponema pallidum
                    63        611  Pan troglodytes (Chimpanzee)
                    64        602  Photorhabdus luminescens subsp. laumondii
                    65        598  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    66        591  Zea mays (Maize)
                    67        588  Methanobacterium thermoautotrophicum
                    68        588  Kluyveromyces lactis (Yeast) (Candida sphaerica)
                    69        582  Bradyrhizobium japonicum
                    70        579  Rickettsia prowazekii
                    71        577  Salmonella paratyphi-a
                    72        574  Helicobacter pylori (Campylobacter pylori)
                    73        572  Buchnera aphidicola subsp. Acyrthosiphon pisum 
                    74        572  Ralstonia solanacearum (Pseudomonas solanacearum)
                    75        571  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    76        562  Buchnera aphidicola subsp. Schizaphis graminum
                    77        559  Rhizobium loti (Mesorhizobium loti)
                    78        559  Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
                    79        555  Lactococcus lactis subsp. lactis (Streptococcus lactis)
                    80        555  Helicobacter pylori J99 (Campylobacter pylori J99)
                    81        550  Listeria monocytogenes
                    82        542  Bacillus cereus (strain ATCC 14579 / DSM 31)
                    83        542  Listeria innocua
                    84        541  Xanthomonas campestris pv. campestris
                    85        539  Shewanella oneidensis
                    86        535  Candida glabrata (Yeast) (Torulopsis glabrata)
                    87        530  Neisseria meningitidis serogroup A
                    88        530  Neisseria meningitidis serogroup B
                    89        519  Clostridium acetobutylicum
                    90        517  Caulobacter crescentus (Caulobacter vibrioides)
                    91        507  Buchnera aphidicola subsp. Baizongia pistaciae
                    92        507  Xanthomonas axonopodis pv. citri
                    93        494  Brucella suis
                    94        493  Brucella melitensis
                    95        492  Streptococcus pneumoniae
                    96        490  Salmonella choleraesuis
                    97        489  Thermotoga maritima
                    98        485  Oceanobacillus iheyensis
                    99        483  Mycoplasma genitalium
                    100        482  Listeria monocytogenes serotype 4b (strain F2365)
                    101        481  Rickettsia conorii
                    102        481  Xylella fastidiosa
                    103        472  Photobacterium profundum (Photobacterium sp. (strain SS9))
                    104        472  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
                    105        467  Deinococcus radiodurans
                    106        467  Haemophilus ducreyi
                    107        458  Methanosarcina acetivorans
                    108        456  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
                    109        454  Corynebacterium glutamicum (Brevibacterium flavum)
                    110        454  Clostridium perfringens
                    111        448  Bacillus cereus (strain ATCC 10987)
                    112        446  Pyrococcus horikoshii
                    113        443  Bordetella parapertussis
                    114        443  Emericella nidulans (Aspergillus nidulans)
                    115        442  Bordetella pertussis
                    116        441  Pyrococcus abyssi
                    117        440  Halobacterium salinarium (Halobacterium halobium)
                    118        438  Chromobacterium violaceum
                    119        437  Methanosarcina mazei (Methanosarcina frisia)
                    120        436  Yarrowia lipolytica (Candida lipolytica)
                    121        435  Chlamydia trachomatis
                    122        434  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
                    123        432  Rickettsia felis (Rickettsia azadi)
                    124        425  Borrelia burgdorferi (Lyme disease spirochete)
                    125        423  Lactobacillus plantarum
                    126        422  Thermoanaerobacter tengcongensis
                    127        421  Nicotiana tabacum (Common tobacco)
                    128        421  Pyrococcus furiosus
                    129        419  Synechococcus elongatus (Thermosynechococcus elongatus)
                    130        419  Rickettsia bellii (strain RML369-C)
                    131        417  Streptococcus pyogenes serotype M6
                    132        416  Ovis aries (Sheep)
                    133        416  Chlamydia pneumoniae (Chlamydophila pneumoniae)
                    134        414  Enterococcus faecalis (Streptococcus faecalis)
                    135        413  Streptococcus mutans
                    136        413  Bacillus thuringiensis subsp. konkukian
                    137        412  Campylobacter jejuni
                    138        412  Streptomyces avermitilis
                    139        408  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
                    140        406  Chlamydia muridarum
                    141        406  Rhizobium sp. (strain NGR234)
                    142        397  Streptococcus pyogenes serotype M1
                    143        397  Sulfolobus solfataricus
                    144        394  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
                    145        393  Streptococcus pyogenes serotype M18
                    146        391  Streptococcus pyogenes serotype M3
                    147        390  Rickettsia typhi
                    148        389  Staphylococcus haemolyticus (strain JCSC1435)
                    149        388  Shigella sonnei (strain Ss046)
                    150        383  Acinetobacter sp. (strain ADP1)
                    151        383  Burkholderia pseudomallei (Pseudomonas pseudomallei)
                    152        382  Bacillus cereus (strain ZK / E33L)
                    153        378  Staphylococcus saprophyticus subsp. saprophyticus 
                    154        374  Rhodopseudomonas palustris
                    155        373  Chlorobium tepidum
                    156        372  Nitrosomonas europaea
                    157        370  Corynebacterium efficiens
                    158        369  Vibrio fischeri (strain ATCC 700601 / ES114)
                    159        368  Bacillus clausii (strain KSM-K16)
                    160        368  Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
                    161        364  Shigella boydii serotype 4 (strain Sb227)
                    162        359  Methanopyrus kandleri
                    163        359  Bacillus licheniformis (strain DSM 13 / ATCC 14580)
                    164        357  Mannheimia succiniciproducens (strain MBEL55E)
                    165        356  Burkholderia mallei (Pseudomonas mallei)
                    166        354  Gloeobacter violaceus
                    167        351  Leptospira interrogans
                    168        349  Aeropyrum pernix
                    169        348  Shigella dysenteriae serotype 1 (strain Sd197)
                    170        348  Streptococcus agalactiae serotype III
                    171        345  Streptococcus agalactiae serotype V
                    172        344  Dictyostelium discoideum (Slime mold)
                    173        341  Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
                    174        340  Solanum tuberosum (Potato)
                    175        340  Pisum sativum (Garden pea)
                    176        338  Methylococcus capsulatus
                    177        338  Synechococcus sp. (strain WH8102)
                    178        336  Geobacillus kaustophilus
                    179        334  Sulfolobus tokodaii
                    180        332  Prochlorococcus marinus (strain MIT 9313)
                    181        332  Glycine max (Soybean)
                    182        331  Prochlorococcus marinus
                    183        325  Mycobacterium paratuberculosis
                    184        324  Staphylococcus aureus
                    185        324  Aspergillus fumigatus (Sartorya fumigata)
                    186        323  Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
                    187        321  Brucella abortus
                    188        320  Idiomarina loihiensis
                    189        320  Rhodopirellula baltica
                    190        318  Macaca mulatta (Rhesus macaque)
                    191        317  Geobacter sulfurreducens
                    192        317  Staphylococcus aureus (strain NCTC 8325)
                    193        317  Pseudomonas syringae pv. syringae (strain B728a)
                    194        317  Thermoplasma acidophilum
                    195        315  Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1) 
                    196        314  Coxiella burnetii
                    197        313  Fusobacterium nucleatum subsp. nucleatum
                    198        312  Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
                    199        310  Triticum aestivum (Wheat)
                    200        300  Azoarcus sp. (strain EbN1) (Aromatoleum aromaticum (strain EbN1))
                    201        299  Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
                    202        297  Nocardia farcinica
                    203        297  Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
                    204        297  Synechococcus sp. (strain PCC 7942) (Anacystis nidulans R2)
                    205        296  Staphylococcus aureus (strain bovine RF122)
                    206        295  Wolinella succinogenes
                    207        295  Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
                    208        293  Zymomonas mobilis
                    209        293  Bacteroides thetaiotaomicron
                    210        292  Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
                    211        291  Sulfolobus acidocaldarius
                    212        288  Clostridium tetani
                    213        287  Symbiobacterium thermophilum
                    214        287  Pseudomonas putida
                    215        287  Haemophilus influenzae (strain 86-028NP)
                    216        287  Silicibacter pomeroyi
                    217        287  Legionella pneumophila subsp. pneumophila 
                    218        287  Pseudomonas fluorescens (strain PfO-1)
                    219        286  Xanthomonas oryzae pv. oryzae
                    220        285  Pyrobaculum aerophilum
                    221        284  Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
                    222        283  Cavia porcellus (Guinea pig)
                    223        283  Legionella pneumophila (strain Paris)
                    224        283  Hordeum vulgare (Barley)
                    225        282  Thermoplasma volcanium
                    226        281  Legionella pneumophila (strain Lens)
                    227        279  Staphylococcus aureus (strain USA300)
                    228        279  Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
                    229        278  Corynebacterium diphtheriae
                    230        276  Burkholderia sp. (strain 383) (Burkholderia cepacia 
                    231        273  Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
                    232        269  Gorilla gorilla gorilla (Lowland gorilla)
                    233        269  Spinacia oleracea (Spinach)
                    234        268  Bacteriophage T4
                    235        264  Equus caballus (Horse)
                    236        263  Methanococcus maripaludis
                    237        262  Rhodobacter capsulatus (Rhodopseudomonas capsulata)
                    238        262  Xanthomonas campestris pv. campestris (strain 8004)
                    239        261  Helicobacter hepaticus
                    240        261  Bifidobacterium longum
                    241        260  Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
                    242        260  Wigglesworthia glossinidia brevipalpis
                    243        259  Haloarcula marismortui (Halobacterium marismortui)
                    244        259  Oryza sativa (Rice)
                    245        258  Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
                    246        257  Dechloromonas aromatica (strain RCB)
                    247        255  Anabaena variabilis (strain ATCC 29413 / PCC 7937)
                    248        255  Leifsonia xyli subsp. xyli
                    249        254  Vaccinia virus (strain Copenhagen) (VACV)
                    250        253  Gluconobacter oxydans (Gluconobacter suboxydans)
                    251        252  Porphyromonas gingivalis (Bacteroides gingivalis)
                    252        251  Brucella abortus (strain 2308)
                    253        250  Bartonella henselae (Rochalimaea henselae)
                    254        247  Bacteroides fragilis
                    255        247  Campylobacter jejuni (strain RM1221)
                    256        245  Cryptococcus neoformans (Filobasidiella neoformans)
                    257        244  Chlamydophila caviae
                    258        243  Desulfotalea psychrophila
                    259        242  Pseudoalteromonas haloplanktis (strain TAC 125)
                    260        241  Blochmannia floridanus
                    261        241  Burkholderia pseudomallei (strain 1710b)
                    262        240  Lactobacillus johnsonii
                    263        238  Propionibacterium acnes
                    264        237  Xanthomonas campestris pv. vesicatoria (strain 85-10)
                    265        237  Bartonella quintana (Rochalimaea quintana)
                    266        236  Nitrosococcus oceani (strain ATCC 19707 / NCIMB 11848)
                    267        235  Bacillus stearothermophilus (Geobacillus stearothermophilus)
                    268        233  Thiobacillus denitrificans (strain ATCC 25259)
                    269        232  Escherichia coli (strain UTI89 / UPEC)
                    270        227  Ustilago maydis (Smut fungus)
                    271        225  Chlamydomonas reinhardtii
                    272        224  Streptococcus thermophilus (strain ATCC BAA-250 / LMG 18311)
                    273        222  Francisella tularensis subsp. tularensis
                    274        222  Streptococcus thermophilus (strain CNRZ 1066)
                    275        221  Bdellovibrio bacteriovorus
                    276        217  Escherichia coli O6:K15:H31 (strain 536 / UPEC)
                    277        217  Porphyra purpurea
                    278        216  Psychrobacter arcticum
                    279        213  Caenorhabditis briggsae
                    280        212  Klebsiella pneumoniae
                    281        212  Pelobacter carbinolicus (strain DSM 2380 / Gra Bd 1)
                    282        212  Nitrobacter winogradskyi (strain Nb-255 / ATCC 25391)
                    283        211  Felis silvestris catus (Cat)
                    284        209  Cricetulus griseus (Chinese hamster)
                    285        209  Gibberella zeae (Fusarium graminearum)
                    286        209  Lactobacillus acidophilus
                    287        209  Treponema denticola
                    288        209  Rhodospirillum rubrum (strain ATCC 11170 / NCIB 8255)
                    289        207  Porphyra yezoensis
                    290        203  Nitrosospira multiformis (strain ATCC 25196 / NCIMB 11849)
                    291        202  Burkholderia thailandensis (strain E264 / ATCC 700388 / DSM 13276 / CIP 106301)
                    292        202  Mesocricetus auratus (Golden hamster)
                    293        200  Vaccinia virus (strain Western Reserve / WR) (VACV)
                    294        200  Thiomicrospira crunogena (strain XCL-2)
                    
                    
                    
                    3.3  Taxonomic distribution of the sequences
                    
                    
                    Kingdom        sequences (% of the database)
                    Archaea           10967 (  4%)
                    Bacteria         130080 ( 48%)
                    Eukaryota        117170 ( 44%)
                    Viruses           11076 (  4%)
                    
                    
                    Within Eukaryota:
                    
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  16603 ( 14%)           (  6%)
                    Other Mammalia         36858 ( 31%)           ( 14%)
                    Other Vertebrata       10970 (  9%)           (  4%)
                    Viridiplantae          19863 ( 17%)           (  7%)
                    Fungi                  17340 ( 15%)           (  6%)
                    Insecta                 4888 (  4%)           (  2%)
                    Nematoda                3486 (  3%)           (  1%)
                    Other                   7162 (  6%)           (  3%)
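
The percentages in section 3.3 are plain ratios rounded to whole numbers. The following sketch recomputes them from the sequence counts; `pct` is a helper defined here for illustration only.

```python
def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to a whole number."""
    return round(100 * part / whole)


kingdoms = {"Archaea": 10967, "Bacteria": 130080, "Eukaryota": 117170, "Viruses": 11076}
eukaryota = {
    "Human": 16603, "Other Mammalia": 36858, "Other Vertebrata": 10970,
    "Viridiplantae": 19863, "Fungi": 17340, "Insecta": 4888,
    "Nematoda": 3486, "Other": 7162,
}

database_total = sum(kingdoms.values())
for name, count in kingdoms.items():
    print(f"{name:<18}{count:>7} ({pct(count, database_total):>3}%)")

# Each Eukaryota category relative to Eukaryota and to the complete database
euk_total = kingdoms["Eukaryota"]
for name, count in eukaryota.items():
    print(f"{name:<18}{count:>7} ({pct(count, euk_total):>3}%) ({pct(count, database_total):>3}%)")
```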
                    
                    
                    4.  SEQUENCE SIZE
                    
                    Distribution of the sequences by size (excluding fragments)
                    
                    From   To   Number             From   To   Number
                       1-  50     5095             1001-1100     2259
                      51- 100    19534             1101-1200     1515
                     101- 150    27896             1201-1300     1220
                     151- 200    26732             1301-1400     1003
                     201- 250    26947             1401-1500      838
                     251- 300    23343             1501-1600      417
                     301- 350    23199             1601-1700      330
                     351- 400    21643             1701-1800      284
                     401- 450    17008             1801-1900      255
                     451- 500    14711             1901-2000      222
                     501- 550    11081             2001-2100      135
                     551- 600     7644             2101-2200      187
                     601- 650     6459             2201-2300      178
                     651- 700     4429             2301-2400      116
                     701- 750     3737             2401-2500       94
                     751- 800     2996             >2500          703
                     801- 850     2516
                     851- 900     2629
                     901- 950     1992
                     951-1000     1570
                    
                    
                    The average sequence length in UniProtKB/Swiss-Prot is 367 amino acids.
                    
                    The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
                    The longest sequence is  TITIN_HUMAN (Q8WZ42): 34350 amino acids.
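
The reported average length appears consistent with dividing the total number of amino acids by the total number of entries given in the introduction. A minimal check, assuming that is how the figure is derived:

```python
total_amino_acids = 98_902_758  # from the introduction
total_entries = 269_293         # from the introduction

average_length = total_amino_acids / total_entries
print(f"average sequence length: {average_length:.2f} amino acids")
print(f"rounded: {round(average_length)}")
```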
                    
                    
                    5.  JOURNAL CITATIONS
                    
                    Note: the following citation statistics reflect the number of distinct
                    journal citations.
                    
                    Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1816
                    
                    
                    5.1 Table of the frequency of journal citations
                    
                    Journals cited       1x:  631
                                         2x:  231
                                         3x:  136
                                         4x:   93
                                         5x:   68
                                         6x:   49
                                         7x:   39
                                         8x:   31
                                         9x:   32
                                        10x:   16
                                    11- 20x:  139
                                    21- 50x:  147
                                    51-100x:   73
                                      >100x:  131
                    
                    
                    5.2  List of the most cited journals in UniProtKB/Swiss-Prot
                    
                    Nb    Citations   Journal name
                    --    ---------   -------------------------------------------------------------
                    1        14911   Journal of Biological Chemistry
                    2         7108   Proceedings of the National Academy of Sciences of the U.S.A.
                    3         4518   Journal of Bacteriology
                    4         4231   Gene
                    5         4078   Nucleic Acids Research
                    6         3837   Biochemical and Biophysical Research Communications
                    7         3551   FEBS Letters
                    8         3308   Biochemistry
                    9         3269   The EMBO Journal
                    10         2943   European Journal of Biochemistry
                    11         2815   Nature
                    12         2782   Molecular and Cellular Biology
                    13         2673   Biochimica et Biophysica Acta
                    14         2530   Journal of Molecular Biology
                    15         2307   Genomics
                    16         2279   Cell
                    17         1851   Biochemical Journal
                    18         1754   Science
                    19         1500   Molecular Microbiology
                    20         1356   Journal of Virology
                    21         1353   Plant Molecular Biology
                    22         1294   Journal of Cell Biology
                    23         1271   Molecular and General Genetics
                    24         1132   Virology
                    25         1103   Human Molecular Genetics
                    26         1076   Journal of Biochemistry
                    27         1073   Genes and Development
                    28         1072   Nature Genetics
                    29          973   Plant Physiology
                    30          970   Oncogene
                    31          938   The American Journal of Human Genetics
                    32          832   Human Mutation
                    33          813   Development
                    34          789   Journal of Immunology
                    35          769   Infection and Immunity
                    36          757   Genetics
                    37          741   Structure
                    38          696   Yeast
                    39          694   Archives of Biochemistry and Biophysics
                    40          688   Molecular Biology of the Cell
                    41          679   Journal of General Virology
                    42          627   Microbiology
                    43          583   Blood
                    44          574   The Plant Cell
                    45          564   FEMS Microbiology Letters
                    46          544   Nature Structural Biology
                    47          532   Molecular Cell
                    48          510   Journal of Cell Science
                    49          504   Human Genetics
                    50          503   Developmental Biology
                    51          501   Cancer Research
                    52          493   Current Genetics
                    53          474   Mechanisms of Development
                    54          470   The Plant Journal
                    55          446   Applied and Environmental Microbiology
                    56          438   Current Biology
                    57          432   Protein Science
                    58          430   Acta Crystallographica, Section D
                    59          429   Neuron
                    60          423   Mammalian Genome
                    61          421   Journal of Clinical Investigation
                    62          403   Molecular and Biochemical Parasitology
                    63          402   Journal of Neuroscience
                    64          392   Molecular Endocrinology
                    65          384   The Journal of Experimental Medicine
                    66          372   Immunogenetics
                    67          348   Journal of Neurochemistry
                    68          347   Journal of Molecular Evolution
                    69          342   DNA and Cell Biology
                    70          339   Endocrinology
                    71          324   Toxicon
                    72          323   DNA Sequence
                    73          311   The Journal of Clinical Endocrinology and Metabolism
                    74          309   American Journal of Physiology
                    75          295   Molecular Biology and Evolution
                    76          292   Brain Research. Molecular Brain Research
                    77          286   Biological Chemistry Hoppe-Seyler
                    78          284   Bioscience, Biotechnology, and Biochemistry
                    79          252   Cytogenetics and Cell Genetics
                    80          246   Comparative Biochemistry and Physiology
                    81          244   Proteins
                    82          242   Journal of General Microbiology
                    83          242   Journal of Medical Genetics
                    84          225   Peptides
                    85          221   Molecular Pharmacology
                    86          219   Antimicrobial Agents and Chemotherapy
                    87          215   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
                    88          208   Journal of Investigative Dermatology
                    89          205   Biology of Reproduction
                    90          202   Nature Cell Biology
                    91          196   Genome Research
                    92          196   Plant and Cell Physiology
                    93          189   Virus Research
                    94          183   DNA Research
                    95          181   Molecular Plant-Microbe Interactions
                    96          177   Experimental Cell Research
                    97          176   European Journal of Immunology
                    98          169   RNA
                    99          166   Biochimie
                    100          166   Neurology
                    101          160   Developmental Dynamics
                    102          159   Tissue Antigens
                    103          158   DNA
                    104          152   Molecular and Cellular Endocrinology
                    105          149   American Journal of Medical Genetics
                    106          149   Molecular Phylogenetics and Evolution
                    107          149   Hemoglobin
                    108          145   Bioorganicheskaia Khimiia
                    109          144   European Journal of Human Genetics
                    110          143   Genes to Cells
                    111          142   Annals of Neurology
                    112          141   Archives of Microbiology
                    113          140   Planta
                    114          137   Journal of Human Genetics
                    115          135   Insect Biochemistry and Molecular Biology
                    116          131   Immunity
                    117          128   Developmental Cell
                    118          125   Animal Genetics
                    119          122   Molecular Reproduction and Development
                    120          120   Diabetes
                    121          118   Agricultural and Biological Chemistry
                    122          118   General and Comparative Endocrinology
                    123          116   Glycobiology
                    124          116   Investigative Ophthalmology and Visual Science
                    125          112   Molecular Immunology
                    126          110   The New England Journal of Medicine
                    127          106   Molecular and Cellular Neuroscience
                    128          106   Journal of Protein Chemistry
                    129          102   Eukaryotic Cell
                    130          102   Archives of Virology
                    131          101   British Journal of Haematology
                    
                    
                    6.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                     535419              1.99
                    Journal                          466313    240501    1.73
                    Submitted to EMBL/GenBank/DDBJ    64038     56191    0.24
                    Submitted to Swiss-Prot            2131      2080    0.01
                    Submitted to other databases        656       643   <0.01
                    Unpublished observations            634       628   <0.01
                    Book citation                       578       568   <0.01
                    Plant Gene Register                 537       525   <0.01
                    Thesis                              388       386   <0.01
                    Patent                              138       136   <0.01
                    Worm Breeder's Gazette                6         6   <0.01
                    
                    Comments (CC)                      1094051              4.06
                    SIMILARITY                       308814    246723    1.15
                    FUNCTION                         190370    183685    0.71
                    SUBCELLULAR LOCATION             148332    148332    0.55
                    SUBUNIT                          101315    101315    0.38
                    CATALYTIC ACTIVITY               100585     92207    0.37
                    PATHWAY                           53586     45379    0.20
                    COFACTOR                          40430     36241    0.15
                    TISSUE SPECIFICITY                25652     25652    0.10
                    MISCELLANEOUS                     22399     20186    0.08
                    PTM                               21807     17661    0.08
                    DOMAIN                            17056     14702    0.06
                    ALTERNATIVE PRODUCTS              12219     12219    0.05
                    INTERACTION                        8011      8011    0.03
                    INDUCTION                          7327      7327    0.03
                    SEQUENCE CAUTION                   6991      6991    0.03
                    DEVELOPMENTAL STAGE                6431      6431    0.02
                    ENZYME REGULATION                  4713      4713    0.02
                    WEB RESOURCE                       4182      3404    0.02
                    DISEASE                            3728      2674    0.01
                    CAUTION                            3445      3371    0.01
                    MASS SPECTROMETRY                  2868      2347    0.01
                    BIOPHYSICOCHEMICAL PROPERTIES      1736      1736    0.01
                    POLYMORPHISM                        600       584   <0.01
                    RNA EDITING                         484       484   <0.01
                    ALLERGEN                            421       421   <0.01
                    TOXIC DOSE                          326       321   <0.01
                    BIOTECHNOLOGY                       150       150   <0.01
                    PHARMACEUTICAL                       73        73   <0.01
                    
                    Features (FT)                      1810156              6.72
                    CHAIN                            274122    265689    1.02
                    TRANSMEM                         173694     38311    0.65
                    METAL                            114900     28257    0.43
                    CONFLICT                          94113     32564    0.35
                    STRAND                            92328      8673    0.34
                    DOMAIN                            89923     50410    0.33
                    HELIX                             88733      9109    0.33
                    TOPO_DOM                          87932     17895    0.33
                    CARBOHYD                          77812     19639    0.29
                    DISULFID                          75787     19361    0.28
                    BINDING                           70187     24266    0.26
                    MOD_RES                           66122     25262    0.25
                    ACT_SITE                          64096     37057    0.24
                    REPEAT                            58439      8843    0.22
                    VARIANT                           46452      9660    0.17
                    NP_BIND                           42707     30693    0.16
                    REGION                            41005     21872    0.15
                    COMPBIAS                          26848     15541    0.10
                    VAR_SEQ                           26720     11548    0.10
                    SIGNAL                            25239     25229    0.09
                    TURN                              23767      7394    0.09
                    MUTAGEN                           20170      4905    0.07
                    ZN_FING                           19737      7617    0.07
                    MOTIF                             18802     12363    0.07
                    SITE                              16382      9270    0.06
                    INIT_MET                          11205     11205    0.04
                    NON_TER                           10727      8231    0.04
                    COILED                            10588      6900    0.04
                    PROPEP                             8180      6890    0.03
                    LIPID                              7976      5129    0.03
                    DNA_BIND                           7151      6620    0.03
                    PEPTIDE                            6760      4226    0.03
                    TRANSIT                            4711      4664    0.02
                    CA_BIND                            2746      1143    0.01
                    CROSSLNK                           2069      1416    0.01
                    NON_CONS                           1297       534   <0.01
                    UNSURE                              477       185   <0.01
                    SE_CYS                              252       183   <0.01
                    
                    Cross-references (DR)              4033012             14.98
                    InterPro                         651509    247238    2.42
                    EMBL                             511792    260808    1.90
                    Pfam                             344308    240530    1.28
                    GO                               342677    138336    1.27
                    PROSITE                          254039    154834    0.94
                    Gene3D                           236303    170230    0.88
                    KEGG                             183185    165599    0.68
                    GenomeReviews                    154937    138089    0.58
                    HAMAP                            107598    107477    0.40
                    TIGRFAMs                         106041     99200    0.39
                    PANTHER                          103373     92186    0.38
                    PIR                               99319     92736    0.37
                    PRINTS                            95064     75266    0.35
                    HSSP                              81207     81207    0.30
                    SMART                             80636     61451    0.30
                    ProDom                            72914     70542    0.27
                    BioCyc                            72868     67334    0.27
                    UniGene                           64399     59174    0.24
                    Ensembl                           54123     54101    0.20
                    GermOnline                        42029     41413    0.16
                    PDB                               38670     10525    0.14
                    ArrayExpress                      37093     37093    0.14
                    PIRSF                             36473     33978    0.14
                    SMR                               36314     36314    0.13
                    RZPD-ProtExp                      28321     13310    0.11
                    TIGR                              24163     23538    0.09
                    LinkHub                           17851     17834    0.07
                    HGNC                              16065     15996    0.06
                    IntAct                            14505     14505    0.05
                    MGI                               13185     13140    0.05
                    MIM                               13142     10642    0.05
                    DIP                                8831      8781    0.03
                    SGD                                6236      6149    0.02
                    CYGD                               6224      6135    0.02
                    RGD                                5989      5985    0.02
                    TAIR                               5775      5677    0.02
                    MEROPS                             5482      5173    0.02
                    EcoGene                            4311      4308    0.02
                    EchoBASE                           4158      4126    0.02
                    H-InvDB                            3677      3659    0.01
                    WormPep                            3652      3028    0.01
                    FlyBase                            3313      3189    0.01
                    WormBase                           3304      3222    0.01
                    GeneDB_Spombe                      3221      3186    0.01
                    Gramene                            3075      3075    0.01
                    TRANSFAC                           2884      2589    0.01
                    SubtiList                          2795      2794    0.01
                    Reactome                           2707      1546    0.01
                    Orphanet                           2513      1615    0.01
                    GeneFarm                           1854      1835    0.01
                    DrugBank                           1826       502    0.01
                    StyGene                            1589      1585    0.01
                    HPA                                1486      1324    0.01
                    TubercuList                        1446      1410    0.01
                    ZFIN                               1303      1291   <0.01
                    SWISS-2DPAGE                       1181      1181   <0.01
                    PseudoCAP                          1164      1155   <0.01
                    ListiList                          1093      1085   <0.01
                    REPRODUCTION-2DPAGE                 834       834   <0.01
                    Leproma                             636       633   <0.01
                    AGD                                 627       621   <0.01
                    PhotoList                           602       602   <0.01
                    LegioList                           564       564   <0.01
                    MaizeGDB                            458       453   <0.01
                    OGP                                 379       378   <0.01
                    PeroxiBase                          377       366   <0.01
                    REBASE                              364       358   <0.01
                    HIV                                 361       351   <0.01
                    ECO2DBASE                           351       299   <0.01
                    SagaList                            349       348   <0.01
                    DictyBase                           347       344   <0.01
                    GlycoSuiteDB                        282       282   <0.01
                    PHCI-2DPAGE                         241       241   <0.01
                    MypuList                            193       193   <0.01
                    DOSAC-COBS-2DPAGE                   149       147   <0.01
                    Aarhus/Ghent-2DPAGE                 128        98   <0.01
                    Siena-2DPAGE                        103       103   <0.01
                    HSC-2DPAGE                           85        85   <0.01
                    PhosSite                             70        70   <0.01
                    Cornea-2DPAGE                        67        67   <0.01
                    COMPLUYEAST-2DPAGE                   59        59   <0.01
                    euHCVdb                              55        44   <0.01
                    PMMA-2DPAGE                          52        52   <0.01
                    PptaseDB                             29        29   <0.01
                    Rat-heart-2DPAGE                     28        28   <0.01
                    ANU-2DPAGE                           25        25   <0.01
                    BuruList                              5         5   <0.01
                    
                    Number of explicitly cross-referenced databases: 88
                    Number of implicitly cross-referenced databases: 26
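The "Average per entry" column in the table above appears to be the total number of lines divided by the number of entries in the release (269293, from the summary at the top of this document). The Python sketch below reproduces a few rows under that assumption. The rule that averages below 0.005 are printed as "<0.01" is inferred from the table, not documented in it.

```python
TOTAL_ENTRIES = 269_293  # UniProtKB/Swiss-Prot release 53.0

# Totals copied from the table above.
line_totals = {
    "References (RL)": 535_419,
    "Comments (CC)": 1_094_051,
    "Features (FT)": 1_810_156,
    "Cross-references (DR)": 4_033_012,
    "ZFIN": 1_303,
}


def per_entry(total: int, entries: int = TOTAL_ENTRIES) -> str:
    """Format the average per entry the way the table seems to (assumed rule)."""
    avg = total / entries
    return "<0.01" if avg < 0.005 else f"{avg:.2f}"


averages = {name: per_entry(total) for name, total in line_totals.items()}
for name, value in averages.items():
    print(f"{name:<24}{value:>6}")
```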
                    
                    
                    7.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in UniProtKB/Swiss-Prot: 241395
                    
                    Total number of entries encoded on a Mitochondrion: 4307
                    Total number of entries encoded on a Plasmid: 3324
                    Total number of entries encoded on a Plastid: 26
                    Total number of entries encoded on a Plastid; Apicoplast: 6
                    Total number of entries encoded on a Plastid; Chloroplast: 8139
                    Total number of entries encoded on a Plastid; Cyanelle: 145
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 91
                    
                    Number of fragments: 8376 
                    Number of additional sequences produced by alternative splicing, initiation or promoter usage: 20101
                    
                

UniProtKB/TrEMBL protein database release 36.0 statistics

                    
                    1.  INTRODUCTION
                    
                    Release 36.0 of 29-May-2007 of UniProtKB/TrEMBL contains 4377315 sequence entries
                    comprising 1418480772 amino acids.
                    
                    635321 sequences have been added since release 35, the sequence data of
                    6733 existing entries has been updated and the annotations of
                    2544163 entries have been revised. The added sequences represent an
                    increase of 17% in the number of entries.
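The 17% figure can be approximated from the numbers in this section. The Python sketch below assumes that the size of release 35 equals the current total minus the added sequences, which ignores any entries deleted or merged between releases, so it is an estimate rather than the official calculation.

```python
CURRENT_ENTRIES = 4_377_315  # release 36.0
ADDED_ENTRIES = 635_321      # sequences added since release 35

# Assumption: no entries were removed, so the previous size is current - added.
previous_entries = CURRENT_ENTRIES - ADDED_ENTRIES
growth_pct = 100 * ADDED_ENTRIES / previous_entries

print(f"estimated size of release 35: {previous_entries}")
print(f"growth: {growth_pct:.1f}%")
```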
                    
                    
                    
                    2.  AMINO ACID COMPOSITION
                    
                    2.1  Composition in percent for the complete database
                    
                    Ala (A) 8.60   Gln (Q) 3.93   Leu (L) 9.87   Ser (S) 6.80
                    Arg (R) 5.59   Glu (E) 6.03   Lys (K) 5.16   Thr (T) 5.60
                    Asn (N) 4.19   Gly (G) 7.07   Met (M) 2.40   Trp (W) 1.33
                    Asp (D) 5.26   His (H) 2.22   Phe (F) 4.03   Tyr (Y) 3.00
                    Cys (C) 1.35   Ile (I) 5.89   Pro (P) 4.86   Val (V) 6.66
                    
                    Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05
                    
                    
                    2.2  Classification of the amino acids by their frequency
                    
                    Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
                    Gln, Tyr, Met, His, Cys, Trp
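The ordering in section 2.2 follows from sorting the percentages in section 2.1 in descending order. A short Python sketch, using the values copied from the table above (ambiguity codes B, Z and X are left out):

```python
# Percent composition copied from section 2.1 (standard amino acids only).
composition = {
    "Ala": 8.60, "Gln": 3.93, "Leu": 9.87, "Ser": 6.80,
    "Arg": 5.59, "Glu": 6.03, "Lys": 5.16, "Thr": 5.60,
    "Asn": 4.19, "Gly": 7.07, "Met": 2.40, "Trp": 1.33,
    "Asp": 5.26, "His": 2.22, "Phe": 4.03, "Tyr": 3.00,
    "Cys": 1.35, "Ile": 5.89, "Pro": 4.86, "Val": 6.66,
}

# Most frequent first; all values are distinct, so no tie-breaking is needed.
ranked = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranked))
```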
                    
                    
                    3.  TAXONOMIC ORIGIN
                    
                    Total number of species represented in this release of UniProtKB/TrEMBL: 133843
                    
                    The first twenty species represent  811024 sequences:  18.5 % of the
                    total number of entries.
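Both numbers in the statement above can be recomputed from the first twenty rows of table 3.2 and the release total of 4377315 entries. A minimal Python sketch, with the counts copied from that table:

```python
TOTAL_ENTRIES = 4_377_315  # UniProtKB/TrEMBL release 36.0

# Entry counts of the twenty most represented species (table 3.2, rows 1-20).
top20_counts = [
    183943, 95950, 51267, 50189, 49978, 44092, 39844, 38479, 37042, 28036,
    26966, 22297, 20231, 20162, 19623, 18255, 16853, 16685, 16460, 14672,
]

top20_total = sum(top20_counts)
top20_share = 100 * top20_total / TOTAL_ENTRIES

print(f"first twenty species: {top20_total} sequences, {top20_share:.1f}% of entries")
```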
                    
                    
                    3.1 Table of the frequency of occurrence of species
                    
                    Species represented 1x:61175
                    2x:25028
                    3x:12878
                    4x: 7213
                    5x: 4220
                    6x: 3118
                    7x: 2264
                    8x: 1889
                    9x: 1517
                    10x: 1615
                    11- 20x: 7245
                    21- 50x: 2799
                    51-100x: 1134
                    >100x: 1748
                    
                    
                    3.2  Table of the most represented species
                    
                    ------  ---------  --------------------------------------------
                    Number  Frequency  Species
                    ------  ---------  --------------------------------------------
                    1     183943  Human immunodeficiency virus 1
                    2      95950  Oryza sativa subsp. japonica (Rice)
                    3      51267  Homo sapiens (Human)
                    4      50189  Trichomonas vaginalis G3
                    5      49978  Mus musculus (Mouse)
                    6      44092  Arabidopsis thaliana (Mouse-ear cress)
                    7      39844  Paramecium tetraurelia
                    8      38479  Oryza sativa subsp. indica (Rice)
                    9      37042  Hepatitis C virus
                    10      28036  Tetraodon nigroviridis (Green puffer)
                    11      26966  Drosophila melanogaster (Fruit fly)
                    12      22297  Medicago truncatula (Barrel medic)
                    13      20231  Caenorhabditis elegans
                    14      20162  Trypanosoma cruzi
                    15      19623  Danio rerio (Zebrafish) (Brachydanio rerio)
                    16      18255  uncultured bacterium
                    17      16853  Aedes aegypti (Yellowfever mosquito)
                    18      16685  Tetrahymena thermophila SB210
                    19      16460  Phaeosphaeria nodorum (Septoria nodorum)
                    20      14672  Plasmodium chabaudi
                    21      14311  Hepatitis B virus (HBV)
                    22      13528  Aspergillus niger
                    23      13412  Anopheles gambiae str. PEST
                    24      13071  Dictyostelium discoideum AX4
                    25      13062  Caenorhabditis briggsae
                    26      12905  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
                    27      12570  Xenopus laevis (African clawed frog)
                    28      12020  Aspergillus oryzae
                    29      11783  Plasmodium berghei
                    30      10991  Chaetomium globosum (Soil fungus)
                    31      10645  Neurospora crassa
                    32      10429  Coccidioides immitis
                    33      10370  Neosartorya fischeri (Aspergillus fischerianus)
                    34      10360  Aspergillus terreus (strain NIH 2624)
                    35      10067  Drosophila pseudoobscura (Fruit fly)
                    36       9747  Cryptococcus neoformans (Filobasidiella neoformans)
                    37       9721  Aspergillus fumigatus (Sartorya fumigata)
                    38       9720  Schistosoma japonicum (Blood fluke)
                    39       9518  Emericella nidulans (Aspergillus nidulans)
                    40       9453  Trypanosoma brucei
                    41       9332  Candida albicans (Yeast)
                    42       9080  Aspergillus clavatus
                    43       9004  Escherichia coli
                    44       8987  Rhodococcus sp. (strain RHA1)
                    45       8557  Rattus norvegicus (Rat)
                    46       8512  Stigmatella aurantiaca DW4/3-1
                    47       8424  Burkholderia xenovorans (strain LB400)
                    48       8285  Bos taurus (Bovine)
                    49       8249  Microscilla marina ATCC 23134
                    50       8123  Bradyrhizobium japonicum
                    51       8011  Leishmania infantum
                    52       7975  Ostreococcus tauri
                    53       7937  Frankia sp. EAN1pec
                    54       7880  Leishmania braziliensis
                    55       7834  Burkholderia phymatum STM815
                    56       7808  Plasmodium yoelii yoelii
                    57       7757  Solibacter usitatus (strain Ellin6076)
                    58       7659  Helicobacter pylori (Campylobacter pylori)
                    59       7522  Streptomyces coelicolor
                    60       7461  Burkholderia cenocepacia MC0-3
                    61       7439  Burkholderia sp. (strain 383) (Burkholderia cepacia)
                    62       7432  Bradyrhizobium sp. BTAi1
                    63       7409  Burkholderia vietnamiensis G4
                    64       7403  Ostreococcus lucimarinus CCE9901
                    65       7349  Burkholderia pseudomallei 305
                    66       7310  Burkholderia phytofirmans PsJN
                    67       7297  Streptomyces avermitilis
                    68       7274  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
                    69       7215  Burkholderia pseudomallei (strain 668)
                    70       7199  Myxococcus xanthus (strain DK 1622)
                    71       7161  Saccharopolyspora erythraea (strain ATCC 11635 / DSM 40517 / NRRL 2338)
                    72       7147  Burkholderia pseudomallei (strain 1106a)
                    73       7136  Rhizobium loti (Mesorhizobium loti)
                    74       7113  Hepatitis C virus subtype 1b
                    75       7113  Leishmania major
                    76       6996  Burkholderia ambifaria MC40-6
                    77       6986  Rhizobium leguminosarum bv. viciae (strain 3841)
                    78       6953  Rhodopirellula baltica
                    79       6916  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
                    80       6870  Burkholderia cenocepacia (strain HI2424)
                    81       6726  Pseudomonas aeruginosa
                    82       6711  Bradyrhizobium sp. ORS278
                    83       6704  Frankia alni (strain ACN14a)
                    84       6679  Psychroflexus torquis ATCC 700755
                    85       6595  Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
                    86       6581  Burkholderia cepacia (strain ATCC 53795 / AMMD)
                    87       6564  Burkholderia multivorans ATCC 17616
                    88       6553  Hahella chejuensis (strain KCTC 2396)
                    89       6517  Plasmodium falciparum
                    90       6501  Ralstonia eutropha (Cupriavidus necator)
                    91       6468  Ustilago maydis (Smut fungus)
                    92       6411  Cyanothece sp. CCY0110
                    93       6394  Giardia lamblia ATCC 50803
                    94       6337  Sinorhizobium medicae WSM419
                    95       6302  Burkholderia cenocepacia (strain AU 1054)
                    96       6300  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
                    97       6272  Stappia aggregata IAM 12614
                    98       6227  Oryza sativa (Rice)
                    99       6172  Bacillus anthracis
                    100       6170  Yarrowia lipolytica (Candida lipolytica)
                    
                    
                    
                    3.3  Taxonomic distribution of the sequences
                    
                    Kingdom        sequences (% of the database)
                    Archaea           97220 (  2%)
                    Bacteria        2327609 ( 53%)
                    Eukaryota       1446119 ( 33%)
                    Viruses          502656 ( 11%)
                    Other              3709 ( <1%)
                    
                    
                    
                    Within Eukaryota:
                    
                    Category            sequences (% of Eukaryota) (% of the complete database)
                    Human                  51267 (  4%)           (  1%)
                    Other Mammalia        127002 (  9%)           (  3%)
                    Other Vertebrata      174804 ( 12%)           (  4%)
                    Viridiplantae         355389 ( 25%)           (  8%)
                    Fungi                 225852 ( 16%)           (  5%)
                    Insecta               144995 ( 10%)           (  3%)
                    Nematoda               37220 (  3%)           (  1%)
                    Other                 329590 ( 23%)           (  8%)
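
The percentages in the two tables above can be reproduced from the raw counts. The following Python sketch is illustrative only. It assumes that each percentage is rounded to the nearest integer and that a value rounding to zero is printed as <1%; this convention is consistent with the figures shown but is not stated in this document. The function name share is hypothetical.

```python
# Sequence counts copied from the two tables above.
KINGDOMS = {
    "Archaea": 97220,
    "Bacteria": 2327609,
    "Eukaryota": 1446119,
    "Viruses": 502656,
    "Other": 3709,
}
EUKARYOTA = {
    "Human": 51267,
    "Other Mammalia": 127002,
    "Other Vertebrata": 174804,
    "Viridiplantae": 355389,
    "Fungi": 225852,
    "Insecta": 144995,
    "Nematoda": 37220,
    "Other": 329590,
}


def share(count, total):
    """Format count/total as an integer percentage, or '<1%' if it rounds to zero."""
    pct = int(100.0 * count / total + 0.5)  # round half up (assumed convention)
    return f"{pct}%" if pct >= 1 else "<1%"


# Total number of sequences, taken here as the sum of the kingdom counts.
total = sum(KINGDOMS.values())

for name, count in KINGDOMS.items():
    print(f"{name:<12}{count:>10} ({share(count, total):>4})")

for name, count in EUKARYOTA.items():
    of_euk = share(count, KINGDOMS["Eukaryota"])
    of_all = share(count, total)
    print(f"{name:<18}{count:>8} ({of_euk:>4}) ({of_all:>4})")
```

The eight Eukaryota categories sum exactly to the Eukaryota count of the kingdom table, so both percentage columns are computed from the same underlying numbers.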
                    
                    
                    
                    4.  SEQUENCE SIZE
                    
                    Repartition of the sequences by size (excluding fragments)
                    
                    From   To  Number             From   To   Number
                    1-  50   60636             1001-1100    26286
                    51- 100  301601             1101-1200    18518
                    101- 150  378501             1201-1300    12788
                    151- 200  360318             1301-1400     8482
                    201- 250  361834             1401-1500     6934
                    251- 300  346925             1501-1600     5095
                    301- 350  322854             1601-1700     3905
                    351- 400  254139             1701-1800     3210
                    401- 450  208459             1801-1900     2411
                    451- 500  175752             1901-2000     2043
                    501- 550  126518             2001-2100     1627
                    551- 600   93347             2101-2200     1698
                    601- 650   70033             2201-2300     1317
                    651- 700   54300             2301-2400     1093
                    701- 750   47578             2401-2500      864
                    751- 800   42478             >2500         7587
                    801- 850   31364
                    851- 900   27564
                    901- 950   20246
                    951-1000   15849
                    
                    The average sequence length in UniProtKB/TrEMBL is   324 amino acids.
                    
                    The shortest sequence is Q96AT0_HUMAN:     4 amino acids.
                    The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
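
The size classes in the table above are 50 residues wide up to 1000 amino acids and 100 residues wide from 1001 to 2500, with a single open class beyond that. The Python sketch below shows one way to assign a sequence length to its class. It assumes the lengths have already been extracted from the entries; the helper name size_bin is hypothetical, and the sample lengths are made up apart from the shortest (4) and longest (36805) values quoted above.

```python
from collections import Counter


def size_bin(length):
    """Return the size-class label used in the table for a sequence length."""
    if length < 1:
        raise ValueError("sequence length must be at least 1")
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100
    upper = -(-length // width) * width  # round up to a multiple of the class width
    lower = upper - width + 1
    return f"{lower}-{upper}"


# Illustrative lengths only; 4 and 36805 are the extremes quoted above.
lengths = [4, 50, 51, 324, 1000, 1001, 2500, 36805]
histogram = Counter(size_bin(n) for n in lengths)

for label, count in histogram.items():
    print(f"{label:>10} {count:>6}")
```

Fragments would have to be filtered out before counting, since the table excludes them.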
                    
                    
                    
                    5.  STATISTICS FOR SOME LINE TYPES
                    
                    The following table summarizes the total number of some UniProtKB/TrEMBL lines,
                    as well as the number of entries with at least one such line, and the
                    frequency of the lines.
                    
                                                       Total    Number of  Average
                    Line type / subtype                number   entries    per entry
                    ---------------------------------  -------- ---------  ---------
                    
                    References (RL)                    6233829              1.42
                    Submitted to EMBL/GenBank/DDBJ  3316771   2557101    0.76
                    Journal                         2829436   2384972    0.65
                    Thesis                             6545      6491   <0.01
                    Book citation                      4240      4195   <0.01
                    Submitted to other databases        273       267   <0.01
                    Other                             76564     44407    0.02
                    
                    Comments (CC)                      1728629              0.39
                    CAUTION                          930449    930449    0.21
                    SIMILARITY                       281294    276006    0.06
                    FUNCTION                         120986    113447    0.03
                    SUBCELLULAR LOCATION             116592    116592    0.03
                    CATALYTIC ACTIVITY               111221    100470    0.03
                    SUBUNIT                           78221     78221    0.02
                    COFACTOR                          58468     58243    0.01
                    PATHWAY                           20159     16511   <0.01
                    DOMAIN                             4879      4275   <0.01
                    MISCELLANEOUS                      3656      3656   <0.01
                    INTERACTION                        2695      2695   <0.01
                    ALLERGEN                              5         5   <0.01
                    MASS SPECTROMETRY                     4         4   <0.01
                    
                    Features (FT)                      1963260              0.45
                    NON_TER                         1627029    970945    0.37
                    CHAIN                            198930    168578    0.05
                    SIGNAL                           136762    136762    0.03
                    TRANSIT                             539       539   <0.01
                    
                    Cross-references (DR)             32364872              7.39
                    InterPro                        6595878   3152098    1.51
                    GO                              5811332   2060966    1.33
                    EMBL                            4964829   4369327    1.13
                    Pfam                            4064279   3000740    0.93
                    PROSITE                         2155488   1407184    0.49
                    GenomeReviews                   1143708   1099331    0.26
                    Gene3D                           933992    822246    0.21
                    KEGG                             869694    832105    0.20
                    PRINTS                           853606    716490    0.20
                    SMART                            761643    594461    0.17
                    TIGRFAMs                         666776    612627    0.15
                    PANTHER                          578057    551858    0.13
                    ProDom                           531615    507197    0.12
                    SMR                              395863    395863    0.09
                    BioCyc                           280518    265666    0.06
                    HSSP                             270957    270554    0.06
                    UniGene                          242685    224363    0.06
                    PIR                              184223    149145    0.04
                    TIGR                             171226    164627    0.04
                    Ensembl                          157071    157070    0.04
                    PIRSF                            154302    147467    0.04
                    RZPD-ProtExp                     108875     33477    0.02
                    ArrayExpress                      95585     95495    0.02
                    Gramene                           70734     70734    0.02
                    MGI                               42221     41755    0.01
                    HGNC                              34930     34883    0.01
                    euHCVdb                           32511     32511    0.01
                    FlyBase                           24665     24629    0.01
                    WormPep                           19286     19205   <0.01
                    TAIR                              18958     18899   <0.01
                    WormBase                          18790     18711   <0.01
                    ZFIN                              15410     15406   <0.01
                    LinkHub                           13489     13489   <0.01
                    DictyBase                         12917     12917   <0.01
                    MEROPS                            11642     11209   <0.01
                    LegioList                          5339      5309   <0.01
                    IntAct                             5322      5322   <0.01
                    ListiList                          4722      4705   <0.01
                    PseudoCAP                          4407      4404   <0.01
                    PDB                                4323      2607   <0.01
                    BuruList                           4235      4201   <0.01
                    PhotoList                          4078      3954   <0.01
                    AGD                                4073      4073   <0.01
                    RGD                                3795      3466   <0.01
                    REBASE                             3692      3667   <0.01
                    TubercuList                        2543      2537   <0.01
                    DIP                                2509      2504   <0.01
                    GeneDB_Spombe                      1770      1758   <0.01
                    SagaList                           1745      1651   <0.01
                    PeroxiBase                         1371      1368   <0.01
                    Leproma                             971       970   <0.01
                    TRANSFAC                            872       862   <0.01
                    MypuList                            589       585   <0.01
                    SGD                                 375       374   <0.01
                    PHCI-2DPAGE                         106       106   <0.01
                    CYGD                                101        98   <0.01
                    ANU-2DPAGE                           60        60   <0.01
                    Reactome                             46        33   <0.01
                    SWISS-2DPAGE                         37        37   <0.01
                    REPRODUCTION-2DPAGE                  30        30   <0.01
                    PMMA-2DPAGE                           3         3   <0.01
                    Siena-2DPAGE                          2         2   <0.01
                    COMPLUYEAST-2DPAGE                    1         1   <0.01
                    
                    Number of explicitly cross-referenced databases: 87
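
The "Average per entry" column appears to be the total number of lines of a given type divided by the total number of entries in the release. The Python sketch below reproduces a few values under two assumptions that are not stated in this document: the entry total is taken to be 4377313, the sum of the kingdom counts in section 3.3, and averages that round to 0.00 are printed as <0.01. The function name per_entry is hypothetical.

```python
# Assumption: total number of entries = sum of the kingdom counts in section 3.3.
TOTAL_ENTRIES = 4377313


def per_entry(total_lines, entries=TOTAL_ENTRIES):
    """Format the average number of lines per entry in the style of the table."""
    text = f"{total_lines / entries:.2f}"
    return "<0.01" if text == "0.00" else text


# Line totals copied from the table above.
for label, total_lines in [
    ("References (RL)", 6233829),
    ("Comments (CC)", 1728629),
    ("Features (FT)", 1963260),
    ("Cross-references (DR)", 32364872),
    ("MGI", 42221),
    ("WormPep", 19286),
]:
    print(f"{label:<24}{total_lines:>10}  {per_entry(total_lines):>6}")
```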
                    
                    
                    6.  MISCELLANEOUS STATISTICS
                    
                    Total number of distinct authors cited in UniProtKB/TrEMBL: 248496
                    
                    Total number of entries encoded on a Mitochondrion: 167124
                    Total number of entries encoded on a Plasmid: 71535
                    Total number of entries encoded on a Plastid: 3525
                    Total number of entries encoded on a Plastid; Apicoplast: 183
                    Total number of entries encoded on a Plastid; Chloroplast: 57807
                    Total number of entries encoded on a Plastid; Cyanelle: 7
                    Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 212
                    
                    Number of fragments: 973161
                    
                

Submissions and Updates

We welcome feedback from our users. We would especially appreciate your notifying us if you find that sequences belonging to your field of expertise are missing from the database. We also would like to be notified about annotations to be updated, if, for example, the function of a protein has been clarified or if new information about post-translational modifications has become available.

Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml

For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:

UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail: datasubs@ebi.ac.uk


Download information

Bi-Weekly releases

The latest data of the UniProt Knowledgebase is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data is further supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic

Major releases

For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated three times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on DVD from the EBI.


Contact

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address: datalib@ebi.ac.uk / swissprot@ebi.ac.uk
WWW server: http://www.ebi.ac.uk/


SIB Swiss Institute of Bioinformatics
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 379 50 50
Fax: (+41 22) 379 58 58
Electronic mail address: Swiss-Prot@expasy.org
WWW server: http://www.expasy.org/


Protein Information Resource (PIR)
Georgetown University Medical Center
3300 Whitehaven St., Suite 1200
Washington, DC 20008
United States of America

Telephone: (+1 202) 687 1039
Fax: (+1 202) 687 0057
Electronic mail address: pirmail@georgetown.edu
WWW server: http://pir.georgetown.edu

Citation

If you want to cite UniProt in a publication, please use the following reference:

The UniProt Consortium
"The Universal Protein Resource (UniProt)"
Nucleic Acids Res. 35:D193-D197(2007) doi:10.1093/nar/gkl929