Continuous Automated Model EvaluatiOn

CAMEO | Ligand Binding Prediction Assessment

Predicting binding sites from a protein's sequence could have a high impact on life science research, provided the predictions are specific and accurate enough to help address relevant biological questions. In CAMEO we plan to continuously assess ligand binding site predictions in order to evaluate the current state of the art of prediction methods, identify possible bottlenecks, and further stimulate the development of new methods.

In previous CASP experiments, the very low number of challenging target structures with relevant ligands was a major limitation to the assessment, as it did not allow significant conclusions to be drawn about the specific strengths and weaknesses of different prediction methods. Furthermore, the ligand binding site prediction format currently used in CASP has a number of limitations: all ligands are treated uniformly, independent of their chemical type, and all potential binding sites are treated uniformly, independent of their affinity for different ligands. Hence, in CAMEO we have modified the ligand binding site prediction format to allow more fine-grained predictions and a more detailed assessment.

  • Assessment of this category will be carried out continuously, based on the weekly PDB pre-release, in order to accumulate a sufficiently large number of prediction targets.
  • Binding sites differ chemically and structurally from each other; for example, a metal ion binding site has different characteristics from a sugar binding site. We will therefore assess ligand binding site predictions according to the chemotype category of the ligand expected to be bound.
  • Binding site residues will be predicted using continuous probability measures, as opposed to the binary prediction format used in CASP, thus reflecting the likelihood that a residue is involved in binding a ligand of a certain type.
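To illustrate what a continuous prediction retains that a binary one loses, the sketch below reduces per-residue probabilities to a binary residue list. The helper name `to_binary` and the cutoffs are illustrative assumptions, not part of the CAMEO assessment; the scores are the ion ("I") values from the ZN binding example further down this page.

```python
def to_binary(scores, threshold=0.5):
    """Hypothetical helper: residue numbers whose probability meets the cutoff."""
    return sorted(resnum for resnum, prob in scores.items() if prob >= threshold)


# Ion ("I") probabilities per residue number, taken from the ZN example (3ZTT).
ion_scores = {198: 0.000, 199: 1.000, 200: 0.000, 201: 0.513}

print(to_binary(ion_scores, 0.5))  # [199, 201]
print(to_binary(ion_scores, 0.6))  # [199]
```

The binary list changes with an arbitrary cutoff, whereas the continuous values leave that choice to the assessment.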

Format Definition

The format used by the Prediction Center during CASP9 is still accepted. In addition, a new format has been implemented that follows the suggestions from the last assessment of the ligand binding category during CASP9.

This new format consists of three sections separated by the "|" symbol:

  1. The first section is a unique identifier for a residue or atom. It has two mandatory fields, the residue name ("r") and the residue number ("n"). In addition, two optional fields can be specified, the chain name ("c") and/or the atom name ("a").
  2. The second section contains the predicted probabilities for four ligand categories: ions ("I"), organics ("O"), polynucleotides ("N") and peptides ("P"). Predictions for all four categories are mandatory. Each value represents the likelihood that the residue or atom binds a ligand of that category. All ligands in the PDB are assigned to one of these four classes based on the PDB ligand classification.
  3. The last section is optional and allows specific ligands to be named by their PDB three-letter code, each with its own predicted probability.

Format details:

  • All values must be specified as key-value pairs, where the value is separated from the key by a "=" sign. Every key-value pair must be terminated by a ";" sign. The order of the key-value pairs within each of the three sections is irrelevant, whereas the order of the three sections is fixed.
  • All predicted values must be in a range from 0.0 to 1.0.
  • To simplify the prediction format, lines can be omitted if they contain only zero values for the predictions (both for the categories and the compounds).
  • Predictions can be made at the residue and/or atom level. In the case of the latter, the atom name must be specified in the unique identifier section (key: "a").
  • Predictions for all four categories are mandatory. Additionally, specific compounds can be listed in the last section of the prediction line.
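The rules above can be sketched as a small writer. This is a minimal illustration under stated assumptions, not an official CAMEO tool: the function name `format_prediction`, its parameters, and the three-decimal output (chosen to match the examples on this page) are all hypothetical.

```python
CATEGORIES = ("I", "O", "N", "P")  # ions, organics, polynucleotides, peptides


def format_prediction(resname, resnum, probs, chain=None, atom=None, compounds=None):
    """Return one prediction line, or None when every value is zero.

    Hypothetical helper: `probs` maps the four category keys to probabilities,
    `compounds` optionally maps PDB compound codes to probabilities.
    """
    compounds = compounds or {}
    if set(probs) != set(CATEGORIES):
        raise ValueError("all four categories I, O, N, P are mandatory")
    items = list(probs.items()) + list(compounds.items())
    for key, value in items:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key}={value} is outside the range 0.0 to 1.0")
    if all(value == 0 for _, value in items):
        return None  # all-zero lines may be omitted from the prediction

    ident = f"r={resname}; n={resnum};"
    if chain is not None:
        ident += f" c={chain};"
    if atom is not None:
        ident += f" a={atom};"  # atom-level prediction
    cats = " ".join(f"{key}={probs[key]:.3f};" for key in CATEGORIES)
    comps = " ".join(f"{key}={value:.3f};" for key, value in compounds.items())
    return f"{ident} | {cats} | {comps}".rstrip()


print(format_prediction("GLU", 199, {"I": 1.0, "O": 0.0, "N": 0.0, "P": 0.0}))
# r=GLU; n=199; | I=1.000; O=0.000; N=0.000; P=0.000; |
```

Returning `None` for all-zero lines lets a caller skip them when writing a file, which is one way to use the omission rule.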

Format


r=<resname>; n=<resnum>; [c=<chainname>;] [a=<atomname>;] | I=<ion prob>; O=<org prob>; N=<nucl prob>; P=<pep prob>; | [<compound ID1>=<compound prob>;] [<compound ID2>=<compound prob>;] ...
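A line in this format could be read back as sketched below. This is a minimal parser written for illustration, not the code used by CAMEO: the function names and the returned dictionary layout are hypothetical. It also assumes that residue numbers are plain integers and that a line without the second "|" simply has no compound section; the format description does not settle either point.

```python
CATEGORY_KEYS = ("I", "O", "N", "P")  # ions, organics, polynucleotides, peptides
IDENT_KEYS = {"r", "n", "c", "a"}     # residue name, residue number, chain, atom


def parse_pairs(section):
    """Split 'k=v; k=v;' into a dict; key order within a section is irrelevant."""
    pairs = {}
    for item in section.split(";"):
        item = item.strip()
        if not item:
            continue  # nothing follows the final ";"
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def parse_probs(section):
    """Parse a section of probabilities and check the 0.0 to 1.0 range."""
    probs = {key: float(value) for key, value in parse_pairs(section).items()}
    for key, value in probs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key}={value} is outside the range 0.0 to 1.0")
    return probs


def parse_prediction(line):
    """Parse one prediction line into a dict (hypothetical layout)."""
    sections = line.split("|")
    if len(sections) == 2:
        sections.append("")  # assumption: a missing compound section is empty
    if len(sections) != 3:
        raise ValueError("expected three sections separated by '|'")
    ident = parse_pairs(sections[0])
    if not {"r", "n"} <= ident.keys() or not ident.keys() <= IDENT_KEYS:
        raise ValueError("identifier needs r and n; only c and a are optional")
    categories = parse_probs(sections[1])
    if set(categories) != set(CATEGORY_KEYS):
        raise ValueError("all four categories I, O, N, P are mandatory")
    return {
        "resname": ident["r"],
        "resnum": int(ident["n"]),  # assumption: no insertion codes
        "chain": ident.get("c"),
        "atom": ident.get("a"),
        "categories": categories,
        "compounds": parse_probs(sections[2]),
    }


record = parse_prediction(
    "r=ASN; n=171; a=OD1; | I=1.000; O=0.939; N=0.000; P=0.000; | ANP=0.939; MN=1.000;"
)
print(record["atom"], record["categories"]["I"], record["compounds"])
```

Splitting on "|" first and on ";" second mirrors the fixed order of the sections and the free order of the pairs within each section.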

Examples

1. ZN binding (3ZTT)


r=SER; n=198; | I=0.000; O=0.000; N=0.000; P=0.000; |

r=GLU; n=199; | I=1.000; O=0.000; N=0.000; P=0.000; |   

r=GLY; n=200; | I=0.000; O=0.000; N=0.000; P=0.000; |   

r=ALA; n=201; | I=0.513; O=0.000; N=0.000; P=0.000; |   

2. ATP and Mg binding (3QAM)

                              

r=GLU; n=170; a=N;   | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 

r=GLU; n=170; a=CA;  | I=0.047; O=0.412; N=0.000; P=0.000; | ANP=0.412; MN=0.047; 

r=GLU; n=170; a=C;   | I=0.337; O=0.668; N=0.000; P=0.000; | ANP=0.668; MN=0.337; 

r=GLU; n=170; a=O;   | I=0.372; O=1.000; N=0.000; P=0.000; | ANP=1.000; MN=0.372; 

r=GLU; n=170; a=CB;  | I=0.249; O=0.424; N=0.000; P=0.000; | ANP=0.424; MN=0.249; 

r=GLU; n=170; a=CG;  | I=0.000; O=0.077; N=0.000; P=0.000; | ANP=0.077; MN=0.000; 

r=GLU; n=170; a=CD;  | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 

r=GLU; n=170; a=OE1; | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 

r=GLU; n=170; a=OE2; | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 

r=ASN; n=171; a=N;   | I=0.331; O=0.353; N=0.000; P=0.000; | ANP=0.353; MN=0.331; 

r=ASN; n=171; a=CA;  | I=0.401; O=0.307; N=0.000; P=0.000; | ANP=0.307; MN=0.401; 

r=ASN; n=171; a=C;   | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 

r=ASN; n=171; a=O;   | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 

r=ASN; n=171; a=CB;  | I=0.528; O=0.251; N=0.000; P=0.000; | ANP=0.251; MN=0.528; 

r=ASN; n=171; a=CG;  | I=0.987; O=0.584; N=0.000; P=0.000; | ANP=0.584; MN=0.987; 

r=ASN; n=171; a=OD1; | I=1.000; O=0.939; N=0.000; P=0.000; | ANP=0.939; MN=1.000; 

r=ASN; n=171; a=ND2; | I=0.859; O=0.637; N=0.000; P=0.000; | ANP=0.637; MN=0.859; 

r=LEU; n=173; a=N;   | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 

r=LEU; n=173; a=CA;  | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 

r=LEU; n=173; a=C;   | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 

r=LEU; n=173; a=O;   | I=0.000; O=0.000; N=0.000; P=0.000; | ANP=0.000; MN=0.000; 

r=LEU; n=173; a=CB;  | I=0.000; O=0.175; N=0.000; P=0.000; | ANP=0.175; MN=0.000; 

r=LEU; n=173; a=CG;  | I=0.000; O=0.509; N=0.000; P=0.000; | ANP=0.509; MN=0.000; 

r=LEU; n=173; a=CD1; | I=0.000; O=0.898; N=0.000; P=0.000; | ANP=0.898; MN=0.000; 

r=LEU; n=173; a=CD2; | I=0.000; O=0.696; N=0.000; P=0.000; | ANP=0.696; MN=0.000;