UniProt release 2015_03

Published March 4, 2015


Regulation of translation initiation through folding

Many physiopathological events, such as stress or nutrient deprivation, induce rapid changes in cellular protein levels. In these cases, cells favor translational control of existing mRNAs over transcriptional control, because the latter generates a slower response. Translation can be divided into four steps (initiation, elongation, termination and ribosome recycling), but most regulation occurs at the initiation step.
In eukaryotes, translation initiation involves recruitment of the 40S ribosome to mRNA by the eukaryotic initiation factor 4F (eIF4F) complex. This complex is composed of eIF4E, which binds to the mRNA 5' cap structure; eIF4A, an RNA helicase; and eIF4G, a scaffolding protein. Availability of eIF4E is rate-limiting in this process, which makes it an important target for control. Under stress or starvation conditions, when translation has to be rapidly repressed, eIF4E binding proteins (4E-BPs) interact with eIF4E and outcompete eIF4G, thereby preventing eIF4F assembly and cap-dependent translation initiation. Three 4E-BPs have been identified in mammals, one of which is 4E-BP2 (EIF4EBP2). It is an intrinsically disordered protein (IDP) that contains several phosphorylation sites. In its unphosphorylated state, 4E-BP2 interacts with eIF4E via two domains: a YXXXXLΦ motif (residues 54 through 60) and a secondary dynamic motif (residues 78 through 82). The unphosphorylated (or minimally phosphorylated), eIF4E-binding form of 4E-BP2 is unstable and targeted for degradation via the ubiquitin-proteasome pathway. By contrast, highly phosphorylated 4E-BP2 is very stable but binds only weakly to eIF4E, so it can be outcompeted by eIF4G, allowing translation to occur.

How does phosphorylation regulate 4E-BP2 interaction with eIF4E and its stability? It has recently been shown that phosphorylation induces a widespread disorder-to-order transition that occurs in two steps. First, phosphorylation at Thr-37 and Thr-46 by MTOR induces folding of residues Pro-18 to Arg-62 into a four-stranded β-domain that sequesters the helical YXXXXLΦ motif into a partially buried β-strand, blocking accessibility to eIF4E. The folding also protects Lys-57 from ubiquitination, preventing proteasomal degradation. Second, this ordered structure is further stabilized by phosphorylation at Ser-65, Thr-70 and Ser-83. The fully phosphorylated protein has an affinity for eIF4E that is 4,000-fold lower than that of the unphosphorylated form. This observation implies that binding must be coupled to unfolding in order to free the YXXXXLΦ motif, and this is indeed what is observed experimentally. When the phosphorylated form binds eIF4E, it undergoes an order-to-disorder transition, as suggested by NMR spectra that are similar to those of the unphosphorylated form.

Although it has long been suspected that the function of IDPs may be controlled by post-translational modifications (PTMs), this is the first report showing experimentally how a PTM can fold an entire domain. These new data have been annotated in UniProtKB/Swiss-Prot and, as of this release, the updated EIF4EBP2 entry is publicly available.

UniProtKB news

New proteomics mapping files

Mappings of UniProt Knowledgebase (UniProtKB) human sequences to identified human peptides from public mass spectrometry (MS) proteomics repositories can now be found in the new dedicated ‘proteomics_mapping’ directory on the UniProt FTP site together with a description of how the mappings were generated. The mappings are based on our analysis of the content of those MS proteomics repositories that openly share with us their data and quality metrics concerning peptide identifications.

Mass spectrometry provides direct experimental evidence for the existence of proteins, and these new peptide mappings greatly increase the proportion of human sequences in UniProtKB whose existence is supported by experimental proteomics data. The human reference proteome currently contains 89383 sequences, and our analysis provides mass spectrometry evidence for 68229 of those sequences (roughly 76%).

In future UniProt releases, we expect to add data from more MS proteomics repositories and additional species. We very much welcome the feedback of the community on our efforts.

New FTP repository for reference proteomes

The UniProt Knowledgebase (UniProtKB) now provides gene-centric data sets for reference proteomes. They can be found in the new reference_proteomes directory.

As of release 2015_03, the repository covers 1933 species across Eukaryota, Archaea and Bacteria. Viruses will be added in the next release.

Removal of the cross-references to PhosSite

Cross-references to PhosSite have been removed.

Removal of the cross-references to PptaseDB

Cross-references to PptaseDB have been removed.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified disease:

Deleted diseases:

  • Amelogenesis imperfecta and gingival fibromatosis syndrome
  • Glycogen storage disease 14
  • Ichthyosis, autosomal recessive, with hypotrichosis
  • Loeys-Dietz syndrome 2A
  • Loeys-Dietz syndrome 2B
  • Leigh syndrome, X-linked
  • Mental retardation, X-linked 59

Changes to keywords

New keyword:

UniParc news

UniParc cross-references with proteome identifier and component

The UniParc XML format uses dbReference elements to represent cross-references to external database records that contain the same sequence as the UniParc record. Additional information about an external database record is provided by property child elements of different types. We have introduced two new types for cross-references to external database records from which UniProt proteomes are derived: the type "proteome_id" gives the identifier of the corresponding UniProt proteome, and the type "component" gives the genomic component that encodes the protein. As a first step, we have added this information to bacterial ENA records.


<entry dataset="uniparc">
    <dbReference type="EMBL" id="AAK44239" version_i="1" active="Y" version="1" created="2003-03-12" last="2014-11-23">
        <property type="NCBI_GI" value="13879058"/>
        <property type="NCBI_taxonomy_id" value="83331"/>
        <property type="protein_name" value="serine/threonine protein kinase"/>
        <property type="gene_name" value="MT0017"/>
        <property type="proteome_id" value="UP000001020"/>
        <property type="component" value="Chromosome"/>
    </dbReference>
    <dbReference type="EMBL" id="ABQ71734" version_i="1" active="Y" version="1" created="2007-07-09" last="2014-11-23">
        <property type="NCBI_GI" value="148503925"/>
        <property type="NCBI_taxonomy_id" value="419947"/>
        <property type="protein_name" value="serine/threonine protein kinase"/>
        <property type="gene_name" value="pknB"/>
        <property type="proteome_id" value="UP000001988"/>
        <property type="component" value="Chromosome"/>
    </dbReference>
    <dbReference type="EMBL_CON" id="EFD75652" version_i="1" active="Y" version="2" created="2011-12-05" last="2014-11-23">
        <property type="NCBI_taxonomy_id" value="537209"/>
        <property type="protein_name" value="transmembrane serine/threonine-protein kinase B pknB"/>
        <property type="gene_name" value="TBIG_00439"/>
        <property type="proteome_id" value="UP000004676"/>
        <property type="component" value="Unassembled WGS sequence"/>
    </dbReference>
</entry>

This change did not affect the UniParc XSD, but may nevertheless require code changes.
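
Code that reads UniParc XML may need to recognise the two new property types. The following is a minimal Python sketch, not an official UniProt tool: it parses a shortened copy of two records from the example (written with closing tags so that it is well-formed) and lists the cross-references that carry a "proteome_id". The function names local_name and proteome_links are hypothetical, and the namespace handling assumes that real UniParc files may declare a default XML namespace.

```python
import xml.etree.ElementTree as ET

# Shortened copy of two records from the example in the text.
SAMPLE = """<entry dataset="uniparc">
    <dbReference type="EMBL" id="AAK44239" version_i="1" active="Y" version="1" created="2003-03-12" last="2014-11-23">
        <property type="NCBI_taxonomy_id" value="83331"/>
        <property type="gene_name" value="MT0017"/>
        <property type="proteome_id" value="UP000001020"/>
        <property type="component" value="Chromosome"/>
    </dbReference>
    <dbReference type="EMBL_CON" id="EFD75652" version_i="1" active="Y" version="2" created="2011-12-05" last="2014-11-23">
        <property type="NCBI_taxonomy_id" value="537209"/>
        <property type="proteome_id" value="UP000004676"/>
        <property type="component" value="Unassembled WGS sequence"/>
    </dbReference>
</entry>"""


def local_name(tag):
    """Drop a '{namespace}' prefix, if present, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def proteome_links(xml_text):
    """Return (record id, proteome id, component) for each dbReference
    element that has a proteome_id property."""
    root = ET.fromstring(xml_text)
    links = []
    for ref in root.iter():
        if local_name(ref.tag) != "dbReference":
            continue
        # Assumption: each property type occurs at most once per record,
        # so a plain dict is sufficient here.
        props = {
            child.get("type"): child.get("value")
            for child in ref
            if local_name(child.tag) == "property"
        }
        if "proteome_id" in props:
            links.append((ref.get("id"), props["proteome_id"], props.get("component")))
    return links


for record_id, proteome_id, component in proteome_links(SAMPLE):
    print(record_id, proteome_id, component)
```

Looking properties up by type, rather than by position, keeps such a parser tolerant of further property types being added later.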

UniProt RDF news

UniProt RDF files compressed with XZ instead of gzip

The UniProt RDF distribution has been available on the UniProt FTP site as gzip compressed RDF/XML files since 2008. We have now changed the compression algorithm from gzip to XZ, which has a number of features that make it a better choice for the UniProt RDF data:

  • It reduces the file size by approximately 23%, which improves FTP download time.
  • It can be decompressed in parallel, which can give faster decompression rates on current hardware with a minimum of 6-8 CPU cores.
  • It allows random access.
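
Scripts that used to read the gzip files need to switch decompressors. As a minimal sketch, assuming Python 3 and its standard lzma module, an XZ-compressed text file can be streamed without first writing a decompressed copy to disk. The file name and content below are placeholders created only for the demonstration, not real UniProt data, and the function names are hypothetical. Note that this module reads the stream sequentially; it does not make use of the parallel decompression or random access features listed above.

```python
import lzma
import os
import tempfile


def count_lines_xz(path):
    """Stream an XZ-compressed text file and count its lines."""
    with lzma.open(path, "rt", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def write_demo_file(directory):
    """Create a tiny stand-in .xz file (placeholder content only)."""
    path = os.path.join(directory, "demo.rdf.xz")
    with lzma.open(path, "wt", encoding="utf-8") as handle:
        handle.write("<rdf:RDF>\n<!-- placeholder -->\n</rdf:RDF>\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    demo_path = write_demo_file(tmp)
    demo_line_count = count_lines_xz(demo_path)

print(demo_line_count)
```

In existing code, replacing gzip.open with lzma.open is often the only change required, since both return file-like objects.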

Replacement of UniProt RDF file go.rdf with go.owl

The UniProt RDF distribution on the UniProt FTP site previously contained a go.rdf file. This file has been replaced with a go.owl file, which contains a subset of the official go.owl distribution of the Gene Ontology consortium. The subset is taken as a snapshot that is in sync with the GO annotations in the UniProt Knowledgebase.

In practical terms this means:

  • Scripts downloading go.rdf must be changed to download go.owl instead.
  • The URL pattern has been replaced by
  • Querying for a GO term linked to a UniProt record using SPARQL can be done with:

select ?protein ?go where {?protein up:classifiedWith ?go . ?go a owl:Class}
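
The query above relies on the up: and owl: prefixes being declared. As a hedged sketch, the Python snippet below assembles the same query with explicit PREFIX declarations and builds a GET request URL in the style of the SPARQL protocol. The endpoint address is an assumption and should be checked against the current UniProt documentation; the function names and the optional LIMIT clause are illustrative additions, not part of the original query. No network request is sent.

```python
from urllib.parse import urlencode

# Namespace IRIs for the two prefixes used in the query.
PREFIXES = {
    "up": "http://purl.uniprot.org/core/",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# Assumed endpoint address; verify before use.
ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_go_query(limit=None):
    """Return the GO classification query with PREFIX declarations."""
    lines = ["PREFIX {}: <{}>".format(name, iri) for name, iri in PREFIXES.items()]
    lines.append("SELECT ?protein ?go WHERE {")
    lines.append("  ?protein up:classifiedWith ?go .")
    lines.append("  ?go a owl:Class .")
    lines.append("}")
    if limit is not None:
        # LIMIT is an illustrative addition to keep test queries small.
        lines.append("LIMIT {}".format(int(limit)))
    return "\n".join(lines)


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


full_query = build_go_query(limit=10)
request_url = build_request_url(ENDPOINT, full_query)
print(full_query)
```

The resulting URL could then be fetched with any HTTP client, or the query text pasted into a SPARQL editor.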
