Skip Header

You are using a version of browser that may not display all the features of this website. Please consider upgrading your browser.

UniProt release 2011_02

Published February 8, 2011

Headline

Automatic annotation of UniProtKB/TrEMBL using PDB-derived data

Producing manual annotations for UniProtKB/Swiss-Prot entries containing 3D-structural data has been a priority from the very start of the Swiss-Prot database in 1986, at which point the PDB archive contained a total of 213 structures. Almost 25 years later, the PDB archive had a record 7,971 structures deposited in 2010 (giving a total of 70,229 structures) of which 7,848 contained a polypeptide chain. Although not all polypeptides are mapped to UniProtKB (immunoglobulins and synthetic sequences are not mapped, for example, as these sequences are not within the scope of UniProtKB), the vast majority are. In addition, most vertebrate proteins whose 3D structure has been deposited to the PDB archive have been manually annotated in the Swiss-Prot section of UniProtKB.

Of these mapped polypeptides, about 85,000 PDB cross-references map to 25,000 UniProtKB entries. These are divided between approximately 16,000 UniProtKB/Swiss-Prot and 8,000 UniProtKB/TrEMBL entries with at least one PDB cross-reference. Only three years earlier, the UniProtKB/TrEMBL section contained close to 3,000 entries compared to UniProtKB/Swiss-Prot’s about 12,000 entries with a PDB cross-reference. The vast majority of these 8,000 TrEMBL entries are from bacteria and other microbes.

In UniProtKB/Swiss-Prot, 3D-structure data are manually annotated and integrated mainly in the ‘Sequence annotation (Features)’ section (see Q9C0B1 as an example). This enables users to find the salient structural information directly in the UniProtKB/Swiss-Prot entry. The situation is different for entries in UniProtKB/TrEMBL which have not yet benefited from the manual addition of these data. Furthermore, with the number of UniProtKB/TrEMBL entries with a PDB cross-reference more than doubling in three years, an ever-growing amount of new and potentially interesting PDB-derived data would remain difficult to access for many UniProtKB users. The new UniProt-PDB import pipeline addresses this issue and the UniProtKB/TrEMBL now contains annotations derived from the PDBe and PDBe Motif databases. These annotations include 15,000 new sequence annotations (‘Features’) where ligand interactions are shown for interactions with small molecules and metal ions (e.g. Q8U2I8, Q8U2V3, Q939U1), and 10,000 new citations in almost 8,000 entries.

The procedure produces feature annotations for about 200 types of small molecules in the PDB archive, which have been hand-picked to offer the most unambiguous biological activity. Typically, these include the most common metals, enzyme cofactors (as detailed in the CoFactor database), post-translationally modified residues, carbohydrates, flavins and nucleotide phosphates. None of this would have been possible without the UniProtKB-PDB mappings data produced and maintained in a collaborative effort between UniProt and the PDBe in the form of the SIFTS project which provides the crucial link between UniProtKB and PDB sequence coordinates down to the residue level.

By including these data, we hope to improve the accessibility to experimental, position-specific data about ligand binding sites for the scientific community.

All UniProtKB entries containing 3D-structure data can be retrieved using the keyword 3D-structure. They include UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entries.

UniProtKB news

Cross-references to neXtProt

Cross-references have been added to neXtProt, the human protein knowledge platform.

neXtProt is available at http://www.nextprot.org/.

The format of the explicit links in the flat file is:

Resource abbreviation neXtProt
Resource identifier neXtProt unique identifier.
Example P31946:
DR   neXtProt; NX_P31946; -.

Show all the entries having a cross-reference to neXtProt.

Cross-references to GeneTree

Cross-references have been added to the phylogenetic gene trees that are available at www.ensembl.org and www.ensemblgenomes.org.

The format of the explicit links in the flat file is:

Resource abbreviation GeneTree
Resource identifier GeneTree unique identifier.
Example P32234:
DR   GeneTree; EMGT00050000006238; -.

Show all the entries having a cross-reference to GeneTree.