Modified residue

Last modified February 24, 2017

This subsection of the ‘PTM / Processing’ section specifies the position and type of each modified residue excluding lipids, glycans and protein cross-links.

Common modifications include phosphorylation, methylation, acetylation, amidation, formation of pyrrolidone carboxylic acid, isomerization, hydroxylation, sulfation, flavin-binding, cysteine oxidation and nitrosylation.

We describe the chemical nature of the modified residue using a controlled vocabulary (see the document ‘Controlled vocabulary of posttranslational modifications (PTM)’).

We provide additional information concerning the modification, such as:

  1. the form of the protein which undergoes the modification; this may be either a specific isoform, a particular processed or modified form of the protein, or a specific sequence variant;
    Examples: P41500 (isoform), P84715 (processed form), P68871 (sequence variant)
  2. the enzyme which carries out the modification (‘by…’). For proteins of infectious organisms, such as viruses, phages and bacteria, we also indicate whether the modification is carried out by a host protein;
    Examples: P03279 (modifying enzyme indicated), Q53EZ4 (viral protein modified by host protein)
  3. information on the frequency of the modification or the relationship with another feature (‘partial’, ‘alternate’, ‘transient’). The term ‘partial’ indicates that not all protein molecules are modified, ‘alternate’ means that the same amino acid can be modified in more than one way, and ‘transient’ is applied to exceptional cases of otherwise stable modifications. For partial modifications, we do not propagate this comment to homologous proteins and we do not specify the fraction of proteins modified, as this may depend on the experimental conditions.
    Examples: P16157 (partial modification), Q9UKV3 (alternate modification), Q9FX54 (transient modification)
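The three kinds of additional information listed above can be pictured as fields attached to each modified residue. The following is a minimal sketch, not the actual UniProt parser: it assumes a semicolon-separated description string, and the names `ModifiedResidue` and `parse_description` as well as the example strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Qualifiers named in the text. The semicolon-separated layout of the
# description string is an assumption made for this illustration.
FREQUENCY_QUALIFIERS = {"partial", "alternate", "transient"}


@dataclass
class ModifiedResidue:
    modification: str                      # controlled-vocabulary term
    enzyme: Optional[str] = None           # text following 'by '
    qualifiers: List[str] = field(default_factory=list)
    other_notes: List[str] = field(default_factory=list)  # e.g. protein form


def parse_description(text: str) -> ModifiedResidue:
    """Split a description into the modification, enzyme and qualifiers."""
    parts = [p.strip() for p in text.split(";") if p.strip()]
    if not parts:
        raise ValueError("empty description")
    result = ModifiedResidue(modification=parts[0])
    for part in parts[1:]:
        if part.startswith("by "):
            result.enzyme = part[len("by "):]
        elif part in FREQUENCY_QUALIFIERS:
            result.qualifiers.append(part)
        else:
            result.other_notes.append(part)
    return result


if __name__ == "__main__":
    print(parse_description("Phosphoserine; by PKA; alternate"))
```

Anything that is neither an enzyme nor one of the three qualifiers (for instance a note on the isoform concerned) is kept verbatim in `other_notes` rather than interpreted.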

See also the subsection Post-translational modifications for additional information on modifications for which position-specific data is not yet available.

Data sources and propagation of modified residue annotation

We annotate experimentally determined sites of modification. These are propagated ‘By similarity’ to related orthologs provided the following criteria are met:

  1. the modification should be necessary for protein function (and therefore likely to occur in related organisms);
  2. the enzyme which performs the modification must exist in the related organism, or the same type of modification must have already been observed (this condition is mandatory);
  3. the modified residue and the surrounding region should be conserved in the orthologous protein (this condition is also mandatory).
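The criteria above can be read as a small decision procedure. The sketch below is one possible interpretation, not the curators' actual workflow: the class and function names are hypothetical, and treating an unmet criterion 1 as a case for manual review (rather than outright rejection) is an assumption, since the text flags only criteria 2 and 3 as mandatory.

```python
from dataclasses import dataclass


@dataclass
class PropagationEvidence:
    """Answers for one candidate ortholog (hypothetical structure)."""
    needed_for_function: bool   # criterion 1
    enzyme_present: bool        # criterion 2, first alternative
    same_type_observed: bool    # criterion 2, second alternative
    site_conserved: bool        # criterion 3


def propagation_decision(ev: PropagationEvidence) -> str:
    # Criteria 2 and 3 are flagged as mandatory in the text.
    if not (ev.enzyme_present or ev.same_type_observed):
        return "do not propagate"
    if not ev.site_conserved:
        return "do not propagate"
    # Criterion 1 is not flagged as mandatory; sending it to manual
    # review is an assumption of this sketch.
    if not ev.needed_for_function:
        return "needs curator review"
    return "propagate by similarity"


if __name__ == "__main__":
    print(propagation_decision(PropagationEvidence(True, True, False, True)))
```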

Propagation is generally restricted to closely related species, e.g. among mammals or bacteria from the same taxonomic group. We usually do not propagate information concerning modified residues among plants or unicellular fungi. Specific details regarding the type of modification may be generalized during propagation. For instance, a particular lysine can be subjected to mono- or dimethylation in E. coli strain K12, but this information is not propagated to the E. coli O157:H7 orthologous entry, where we simply indicate that methylation occurs.
Example: P0CE47 (source entry), P0A6N3 (orthologous entry)

Sites that are not modified

When a site is found not to be modified, and if this is of biological significance, we indicate it in the ‘Site’ subsection. This information is not propagated in related entries.
Examples: Q10471, P62152, P32457, P07173.

Unknown sites bearing unknown modifications

Here is an example of a feature where the identity of the amino acid is unknown (an X is shown at this position in the sequence) and the only information concerning the modification is that the N-terminus is blocked: P80979 (Blocked amino end (Xaa)).

1. Phosphorylation

Phosphorylation refers to the transfer of a gamma-phosphate to an amino acid. It is a key mechanism for signaling in both eukaryotic and prokaryotic cells. It can occur on a number of cytoplasmic and nuclear residues, i.e. on the hydroxyl group of serine, threonine or tyrosine, on the nitrogen of arginine, histidine or lysine, on the carboxyl group of aspartate, or on the sulfhydryl group of cysteine.

Related keyword: Phosphoprotein

Phosphorylation is frequent on serine, threonine, and tyrosine from eukaryotic proteins, serine phosphorylation being the most common. Phosphorylation of histidine and aspartate is known to occur as part of the two-component signalling in prokaryotes and has also been described in eukaryotes (mainly fungi and plants).
Example: O15350

Since phosphorylation (phosphoserine, phosphothreonine and phosphotyrosine) is a reversible modification, phosphorylation sites are never annotated as ‘partial’.
Examples: Q9RQQ9, P02662, P04859, P68433

Histidine can be phosphorylated on either of its two nitrogen atoms. We refer to ‘Pros-phosphohistidine’, when phosphorylation occurs on the nitrogen atom that is closest to the alpha-carbon and ‘Tele-phosphohistidine’, where it occurs on the most distal one. When the exact position of phosphate attachment on histidine is not known we simply use the term ‘Phosphohistidine’.
Examples: P0AA06, P39928, P16575, P26762, P0A0E2

Note that phosphorylation of aspartate follows a specific syntax:
Example: P04042

We annotate experimentally determined phosphorylation sites and transfer this information to related orthologs as described. We do not annotate predicted phosphorylation sites. When transferring annotation regarding phosphorylation, we do not usually specify the kinase responsible in the orthologous entry, except when the modification is part of a precise, well-studied transduction pathway. For phosphohistidine, we propagate the position of the phosphorylated nitrogen atom.

2. Methylation

Cytoplasmic and nuclear proteins can be enzymatically modified in several ways by the addition of methyl groups from S-adenosylmethionine. Methylation reactions occurring on carboxyl groups can be reversible and modulate the activity of the target protein, while those on nitrogen atoms at the N-terminus and on side-chains are usually irreversible.

Related keyword: Methylation

Carboxyl methylation

Carboxyl methylation can occur either on a C-terminal cysteine, leucine or lysine residue, or on the side chain of a glutamate residue (or glutamine, after deamidation). It can affect protein-protein interactions and protein function.
Examples: P67775, P02994

Cysteine carboxymethylation frequently occurs after prenylation of the CAAX sequence (Cys, two aliphatic residues, any residue) and proteolytic cleavage of the C-A bond.
Example: P34068
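As an illustration of the CAAX pattern described above, the following sketch looks for a C-terminal cysteine followed by two aliphatic residues and one further residue. The set of residues counted as aliphatic (`AVLI`) is an assumption, since the text does not define it, and the function name and test sequences are hypothetical. A match only marks a candidate motif; it says nothing about whether the protein is actually prenylated or methylated.

```python
import re
from typing import Optional

# Assumption: which residues count as "aliphatic" is not defined in the text.
ALIPHATIC = "AVLI"

# Cys, two aliphatic residues, any residue, anchored at the C-terminus.
CAAX_PATTERN = re.compile(rf"C[{ALIPHATIC}]{{2}}[A-Z]$")


def caax_cysteine_position(sequence: str) -> Optional[int]:
    """Return the 1-based position of the CAAX cysteine, or None."""
    match = CAAX_PATTERN.search(sequence.upper())
    return match.start() + 1 if match else None


if __name__ == "__main__":
    print(caax_cysteine_position("MKTACVLS"))  # made-up sequence
```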

In prokaryotes, glutamate methyl ester formation plays a major role in chemotactic signal transduction. We indicate whether the glutamate methyl ester is formed either from glutamate or from glutamine.
Example: P07018

Nitrogen methylation

Nitrogen methylation can occur on the N-terminus of a polypeptide chain on a phenylalanine, isoleucine, leucine, methionine, tyrosine, proline or alanine residue. It can also occur on the side chain of lysine, arginine or histidine residues. In eukaryotes, arginine and lysine methylation have been found mainly on histones and play an important role in signal transduction processes, nuclear transport and regulation of transcription.
Examples: P0A7N9, P00873

Histidine methylation

Histidine methylation can occur on two positions, which are specified as ‘Pros-methylhistidine’ and ‘Tele-methylhistidine’. When the exact position of the methylation is not known, we simply indicate ‘Methylhistidine’.
Examples: P31725, P62739, P24020

Lysine methylation

Lysine can be mono-, di-, or tri-methylated. We describe each of these modifications as ‘N6-methyllysine’, ‘N6,N6-dimethyllysine’ and ‘N6,N6,N6-trimethyllysine’. If the number of methyl groups is unknown, we simply indicate ‘N6-methylated lysine’. This general description is also used when propagating the information ‘By similarity’ to homologous proteins.
Example: P13538
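The naming rule for lysine methylation can be summarized as a lookup with one general fallback. This is a minimal sketch using the descriptions quoted above; the function name and its parameters are hypothetical.

```python
from typing import Optional

LYSINE_DESCRIPTIONS = {
    1: "N6-methyllysine",
    2: "N6,N6-dimethyllysine",
    3: "N6,N6,N6-trimethyllysine",
}
GENERAL_LYSINE_DESCRIPTION = "N6-methylated lysine"


def describe_lysine_methylation(n_methyl: Optional[int],
                                by_similarity: bool = False) -> str:
    """Pick the description for a methylated lysine.

    n_methyl is None when the number of methyl groups is unknown.
    """
    # The general term is used for unknown counts and for propagation.
    if by_similarity or n_methyl is None:
        return GENERAL_LYSINE_DESCRIPTION
    if n_methyl not in LYSINE_DESCRIPTIONS:
        raise ValueError("lysine carries one to three methyl groups")
    return LYSINE_DESCRIPTIONS[n_methyl]


if __name__ == "__main__":
    print(describe_lysine_methylation(2))
    print(describe_lysine_methylation(2, by_similarity=True))
```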

Lysine methylation can compete with acetylation on the same residue. In this case, the modifications are described as ‘alternate’.

Arginine methylation

Arginine can be monomethylated or dimethylated in two ways which leads to four different descriptions: ‘Omega-N-methylarginine’, ‘Dimethylated arginine’, ‘Symmetric dimethylarginine’ and ‘Asymmetric dimethylarginine’. If the residue is dimethylated, but the position of the methyl groups is unknown, it is annotated as ‘Dimethylated arginine’. If the number of methyl groups is unknown, the ambiguous description ‘Omega-N-methylated arginine’ is used. This general description is also used when propagating information to homologous proteins.
Examples: Q07666, P11940, Q60487, P45481
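Because the arginine descriptions depend on both the number of methyl groups and their arrangement, it may help to see the choice written out. The sketch below follows the wording above; the function name and the `symmetry` argument values are hypothetical.

```python
from typing import Optional


def describe_arginine_methylation(n_methyl: Optional[int],
                                  symmetry: Optional[str] = None,
                                  by_similarity: bool = False) -> str:
    """Pick the description for a methylated arginine.

    n_methyl: 1, 2, or None if the number of methyl groups is unknown.
    symmetry: "symmetric", "asymmetric", or None if unknown
              (only meaningful for dimethylation).
    """
    # General term: unknown number of groups, or propagated annotation.
    if by_similarity or n_methyl is None:
        return "Omega-N-methylated arginine"
    if n_methyl == 1:
        return "Omega-N-methylarginine"
    if n_methyl == 2:
        if symmetry is None:
            return "Dimethylated arginine"
        if symmetry == "symmetric":
            return "Symmetric dimethylarginine"
        if symmetry == "asymmetric":
            return "Asymmetric dimethylarginine"
        raise ValueError("symmetry must be 'symmetric', 'asymmetric' or None")
    raise ValueError("n_methyl must be 1, 2 or None")


if __name__ == "__main__":
    print(describe_arginine_methylation(2, "asymmetric"))
```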

Other rare examples of side chain methylation

Examples: P00318, P0A287

3. Acetylation

We annotate both N-terminal acetylation and acetylation on internal residues.

Related keyword: Acetylation

N-terminal acetylation

N-terminal acetylation is one of the most common post-translational modifications in eukaryotes, but it is rare in prokaryotes. It refers to the addition of an acetyl group from acetyl-CoA to the alpha-amino group of the first residue of a protein, often after the cleavage of the initiator methionine. The most commonly acetylated residues are glycine, alanine, serine or threonine. This reaction occurs in the cytosol. Methionine residues can also be modified if the next residue is an aspartate, glutamate, leucine, isoleucine, tryptophan, phenylalanine or asparagine residue. Note that the modified position may not correspond to the first amino acid of the displayed sequence if N-terminal acetylation occurs after proteolytic processing of the chain.

We annotate experimentally determined sites of N-terminal acetylation and this information is propagated ‘By similarity’ to homologous proteins in related species.
Examples: P23542, P68251, P01201, P41682, Q71SP7
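The residue preferences described above can be written as a rule of thumb for spotting which N-terminal residue would be the acetylation candidate. This is only a sketch of those sequence rules under simplifying assumptions: it is not a predictor, it ignores proteolytic processing beyond removal of the initiator methionine, and annotation itself relies on experimental evidence. The function name and test sequences are hypothetical.

```python
from typing import Optional, Tuple

# Residues most commonly acetylated (often after initiator Met removal).
COMMONLY_ACETYLATED = set("GAST")
# Met can itself be acetylated when followed by one of these residues.
MET_RETAINED_IF_NEXT = set("DELIWFN")


def nterminal_acetylation_candidate(sequence: str) -> Optional[Tuple[int, str]]:
    """Return (1-based position, residue) of the candidate, or None."""
    seq = sequence.upper()
    if not seq:
        return None
    if seq[0] == "M" and len(seq) > 1:
        if seq[1] in MET_RETAINED_IF_NEXT:
            return (1, "M")          # methionine itself is the candidate
        if seq[1] in COMMONLY_ACETYLATED:
            return (2, seq[1])       # candidate once Met is removed
        return None
    if seq[0] in COMMONLY_ACETYLATED:
        return (1, seq[0])           # sequence shown without initiator Met
    return None


if __name__ == "__main__":
    print(nterminal_acetylation_candidate("MSDK"))  # made-up sequence
```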

Internal acetylation

Internal acetylation is the addition of an acetyl group from acetyl-CoA to the epsilon-amino group (N6) on the side chain of a lysine residue. In eukaryotes, it generally takes place in the nucleus and affects mainly, but not exclusively, histones. It also occurs in prokaryotes.

Lysine acetylation can compete with methylation on the same residue, in which case both modifications are described as ‘alternate’.

We annotate experimentally determined sites of internal acetylation and propagate the information ‘By similarity’ to homologous proteins in related species.
Examples: P0C0S9, Q12158, Q9NHD5, Q8ZKF6, Q88EH6

4. Amidation

The C-terminus of secreted proteins is often modified by cleavage between the amino group and the alpha carbon of a C-terminal glycine, resulting in the amidation of the preceding amino acid. This modification protects the C-terminus from degradation by proteases. Amidated proteins contain a signal peptide and are often processed prior to amidation in order to expose a glycine at the C-terminus. Amidation has been observed in eukaryotes including mammals, non-mammalian vertebrates and insects, but not in plants.

Related keyword: Amidation

The sequence including the amidated position generally conforms to the consensus cleavage sequence for proconvertases, which is G[RK] or GRK. Following proconvertase cleavage, removal of the pair of basic residues is accomplished by carboxypeptidase H, while the terminal glycine is converted to a new C-terminal amide by the bifunctional peptidylglycine alpha-amidating monooxygenase (PAM).
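To make the consensus concrete, the sketch below scans a precursor sequence for a residue followed by glycine and then the basic residue(s) given above (`G[RK]` or `GRK`, encoded literally). It reports the residue that would end up amidated and the glycine that donates the amide group. The function name and test sequences are hypothetical, and a match is only a candidate site, not evidence of amidation.

```python
import re
from typing import List, Tuple

# Residue to be amidated, the donor glycine, then 'RK' or a single R/K.
# 'RK' is tried first so that the longer alternative wins.
AMIDATION_SITE = re.compile(r"(?P<amidated>[A-Z])G(?:RK|[RK])")


def candidate_amidation_sites(precursor: str) -> List[Tuple[int, str, int]]:
    """Return (amidated position, residue, glycine position), 1-based."""
    sites = []
    for match in AMIDATION_SITE.finditer(precursor.upper()):
        amidated_pos = match.start("amidated") + 1
        sites.append((amidated_pos, match.group("amidated"), amidated_pos + 1))
    return sites


if __name__ == "__main__":
    print(candidate_amidation_sites("AAFGRKSS"))  # made-up sequence
```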

We annotate experimentally determined sites of amidation, which may be propagated to homologous proteins in related species. For each amidation, we specify the position of the glycine which provides the amide group, provided it is present within the mature protein chain. We also specify the position at which the protein is cleaved, if that is known, and the limits of the peptide which is removed by cleavage. For aspartate and glutamate amides the position of the amide is also indicated as the side chains of these amino acids can theoretically be modified to amides by a different reaction.
Examples: P58913, P09859

Terminal amidation

Examples: P69148, P82387

Glutamate amidation

Example: P20481

5. Pyrrolidone carboxylic acid

The N-terminal glutamine of extracellular and multi-pass membrane proteins can be modified by cyclization of the glutamine via condensation of the alpha-amino group with the side-chain carboxamide group. Modified proteins show an increased half-life. Because their N-terminus is blocked, modified proteins of this type cannot be sequenced by the Edman method.

This modification can also occur from a glutamate residue but this seems to be extremely rare, and may be correlated with an extreme acidic context of the protein involved. We specify it by indicating ‘Pyrrolidone carboxylic acid (Glu)’ in the ‘Description’ field.

This modification has been observed in eukaryotes (including mammals, plants, insects), archaea (including halobacteria) and bacteria (including proteobacteria and actinobacteria).
Synonyms: pyrrolidone carboxylic acid (Pca), pyroglutamic acid (Pga), pyroglutamate (pyro-Glu, pGlu, Pyr).
Note: Pyro-Glu is often indicated in papers as ‘pGlu’ and sometimes, in one-letter code as “U”, although this is now used for selenocysteine. In figures of publications, it may be cited as Z, pQ or E.
Examples: P30233, P68000, P02945

Related keyword: Pyrrolidone carboxylic acid

We annotate experimentally identified pyrrolidone carboxylic acid modifications, which may be propagated to homologous proteins in related species. When the N-terminal glutamine of an extracellular protein is known to be blocked, we annotate it as a pyrrolidone carboxylic acid with the evidence ‘Curated’.
Example: P12111

6. Isomerization

In the translation process, only L-amino acids are incorporated into nascent polypeptides by the ribosomes. However, a variety of amino acids in secreted peptides are post-translationally converted to the D-form, probably via a deprotonation-reprotonation mechanism at the alpha-carbon. This modification has a strong effect on protein conformation and thus on protein activity and interactions. This modification has been found in secreted neuropeptides, toxins and hormones from molluscs, frogs, crustaceans, arachnids, and in bacterial lantibiotics. Synonyms: racemization, epimerisation, stereoinversion.
Example: P35904

Related keyword: D-amino acid

D-alanine can originate from both alanine and serine. The name of the original amino acid is given in brackets in the ‘Description’ field.
Examples: O93456, P23826

Two stereoisomers of L-isoleucine can exist: D-allo-isoleucine and D-threo-isoleucine. However, only the allo-isomer has been observed so far.
Example: P29006

When D-amino acids are found as partners of a cross-link, it is indicated in the ‘Cross-link’ subsection.
Examples: P08136, O07623

7. Hydroxylation

The modified amino acids are generally extracellular (secreted proteins, extracellular matrix proteins, multi-pass membrane proteins).

In animals, 3-hydroxyproline, 4-hydroxyproline and 5-hydroxylysine are mostly found in collagen and collagen-like domain-containing proteins. 4-hydroxyprolines have been shown to stabilize the collagen triple helix. Some of the hydroxylysines are further modified by O-glycosylation. In plants 4-hydroxyproline is mostly found in structural components of cell walls (hydroxyproline-rich glycoproteins, e.g. extensins, proline-rich proteins, arabinogalactan proteins). In most of the extensin and extensin-like domain-containing proteins, hydroxyprolines are further O-glycosylated. 4-hydroxyproline is also found in secreted neuropeptides, toxins, anti-fungal and anti-bacterial peptides of several molluscs, insects and plants.

3-hydroxyaspartate and 3-hydroxyasparagine have been found in EGF-like domain-containing proteins, e.g. blood coagulation protein factors VII, IX and X, proteins C, S, and Z, the LDL receptor and thrombomodulin.

Related keyword: Hydroxylation

Hydroxyproline

Both C3 and C4 of proline can be hydroxylated. The carbon bearing the hydroxyl group is indicated in the ‘Description’ field, when known. In the absence of precise information, we use the ambiguous description ‘Hydroxyproline’. Note that hydroxyproline can be further glycosylated.

4-hydroxyproline
Examples: Q5E9E3, P60245

3-hydroxyproline
Examples: P30754, Q9S8M0, Q16665

The information on hydroxyproline may be propagated ‘By similarity’ to closely related organisms.
Example: Q15848

Hydroxylysine

5-hydroxylysine

Example: Q3Y5Z3

Note: Hydroxylysine can be glycosylated.

4,5-dihydroxylysine, 3’,4’-dihydroxyphenylalanine (DOPA), 3,4-dihydroxyarginine
Examples: O18495, Q25460, O18496

Hydroxyasparagine

3-hydroxyasparagine, 3-hydroxyaspartate.
Examples: Q8CG16, O88278

8. Sulfation

Extracellular tyrosines of secreted and multi-pass membrane proteins can be modified by the addition of a sulfate group. Cytoplasmic serine and threonine residues can also undergo sulfation, although very rarely. Although the function of this modification has not yet been fully elucidated, it may serve to enhance protein stability and modulate protein-protein interactions. When carbohydrates attached to proteins are sulfated, we indicate this fact in the ‘Post-translational modification’ subsection. Sulfation has been observed in eukaryotes only.

Related keyword: Sulfation

Carbohydrates attached to proteins can also be sulfated. This is annotated in the ‘Post-translational modification’ subsection but does not lead to the attribution of the keyword ‘Sulfation’, which applies only to modification of the protein itself and not to attached carbohydrate groups.

Tyrosine sulfation

Example: Q7LZ52

Serine and threonine sulfation

Examples: Q01974, Q8IIJ9

We annotate not only experimentally determined sites of tyrosine sulfation (as well as that on serine and threonine residues), but also tyrosine sulfation sites predicted with the ‘Sulfinator’ tool. Sulfation prediction is taken into account only when this modification is known to occur on the protein concerned but the exact site is not known. The annotation of a predicted site is flagged with ‘Sequence Analysis’ (Sequence Model).
Example: P30443

9. Flavin-binding

Enzymes involved in electron transport, oxidation and reduction often contain a flavin group (flavin mononucleotide [FMN] or flavin adenine dinucleotide [FAD]) as a cofactor. When the cofactor is covalently bound to the protein, this is considered a post-translational modification and is annotated in the ‘Amino acid modifications’ subsection. When the cofactor is not covalently bound, the binding region is annotated in the ‘Nucleotide binding’ subsection. The flavin can be transferred to the hydroxyl group of a serine, threonine or tyrosine, to either nitrogen of a histidine, or to the sulfhydryl group of a cysteine. Flavin-binding has been observed in all organisms.
Examples: P21398, P09788, Q9UI17, Q752Y3, Q9K0M5, Q9KPS2, O34627, Q48303, P40875

Related keywords: Flavoprotein, FAD, FMN

10. Cysteine oxidation and nitrosylation

Sulfur occurs in many different oxidation states in biological systems. In response to mild oxidative stress, reactive oxygen and nitrogen species, such as peroxides, superoxide, nitric oxide or peroxynitrite, can oxidize cytoplasmic cysteines to cysteine sulfenic acid (-SOH), S-nitrosocysteine (-SNO) and sulfinic acid (-SO2H) (single, single and double oxidation state, respectively). These modifications can alter protein activity, protein-protein interactions, or protein stability.

Cysteine sulfenic acid is extremely reactive. It is generally stabilized either by the formation of a disulfide bond (in this case the residue is called a cystine) or by glutathione binding, or it is reduced by specific enzymes to cysteine. Disulfide bonds are annotated in the ‘Disulfide bond’ subsection.

In response to severe oxidative stress, cysteines are irreversibly oxidized to cysteine sulfonic acid (-SO3H), also known as cysteic acid, which generally leads to protein inactivation and degradation. We do not annotate this modification.

Sulfenic and sulfinic acid

Examples: P13448, P03122

Related keyword: Oxidation

S-nitrosocysteine

Examples: P68871, P0ACQ6

Related keyword: S-nitrosylation

Related documents

The nature of the post-translationally formed amino acid is annotated using a controlled vocabulary. The current list of controlled vocabulary terms, as well as other information such as the target amino acid, the related keyword, the taxonomic range and the subcellular location of the modification, is available in the ptmlist.txt document. Links to the RESID database are also provided to give better insight into each modification.

See also: Evidence, Post-translational modifications