Proteins


Name: Proteins
Category: Molecular Biology


Application domain: Learning Rules for Predicting Protein Secondary Structure


Type: ILP
Format: Golem
Complexity: 46 KB (tar, gzip)


Contact person(s): Muggleton, Stephen
Related group(s): University of York, Dept of CS


References

 

[Muggleton 92] Muggleton S., King R.D., and Sternberg M.J.E. (1992).
Predicting protein secondary structure using inductive logic programming.
Protein Engineering, 5:647-657.


Annotations

 

Predicting the secondary structure of proteins (local folding patterns such as spirals, turns and flat sections) from their amino acid sequence (primary structure) is widely believed to be one of the hardest unsolved problems in molecular biology. These patterns are of considerable interest to pharmaceutical companies, since a protein's shape generally determines its function as an enzyme. The dataset studied in [Muggleton 92] was created to learn rules that identify whether a position in a protein is in an alpha-helix. The positive examples state which positions of the chosen proteins are in an alpha-helix, and the negative examples identify the remaining positions.
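The split into positive and negative examples can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_examples`, the protein identifier `p1`, 1-based position numbering and the helix positions used in the test are hypothetical and are not taken from the dataset files.

```python
from typing import Iterable, List, Tuple

Example = Tuple[str, int]  # (protein identifier, position)


def split_examples(
    protein: str, length: int, helix_positions: Iterable[int]
) -> Tuple[List[Example], List[Example]]:
    """Return (positives, negatives) for one protein.

    Positions are assumed to be numbered 1..length (an assumption, not
    something stated in the dataset description).
    """
    helix = set(helix_positions)
    positives = [(protein, p) for p in range(1, length + 1) if p in helix]
    negatives = [(protein, p) for p in range(1, length + 1) if p not in helix]
    return positives, negatives
```

Every position of a protein ends up in exactly one of the two lists, which mirrors the description of the negative examples as "the remaining positions".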

The constants of the language denote the 20 standard amino acids and the values of some physical or chemical properties, such as sizes, hydrophobicities and polarities (e.g. polar0 and polar1). Background knowledge is expressed using the following predicates:

  • position(A,B,C) means "the residue of protein A at position B is C".
  • octf(A,B,C,D,E,F,G,H,I) indexes groups of nine adjacent positions in a protein (positions A to I occur in sequence).
  • alpha_triplet(A,B,C) and alpha_pair(A,B) index groups of three and two adjacent positions in a protein, respectively.
  • alpha_pair4(A,B) holds if the positions A and B are separated by 4 positions in a protein.
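As a rough illustration of how such background facts relate to a sequence, the sketch below derives position, octf and alpha_pair4 tuples from a residue string. It is not the procedure used to build the dataset. The function names, 1-based numbering, lower-case residue constants, and the reading of "separated by 4 positions" as B = A + 4 are all assumptions made for this example.

```python
from typing import List, Tuple


def position_facts(protein: str, residues: str) -> List[Tuple[str, int, str]]:
    """position(A,B,C): the residue of protein A at position B is C."""
    return [(protein, i, r.lower()) for i, r in enumerate(residues, start=1)]


def octf_facts(length: int) -> List[Tuple[int, ...]]:
    """octf(A,...,I): every run of nine consecutive positions."""
    window = 9
    return [tuple(range(s, s + window)) for s in range(1, length - window + 2)]


def alpha_pair4_facts(length: int, gap: int = 4) -> List[Tuple[int, int]]:
    """alpha_pair4(A,B), assuming B = A + gap (an interpretation, not a given)."""
    return [(a, a + gap) for a in range(1, length - gap + 1)]


def as_fact(name: str, args: tuple) -> str:
    """Render a tuple as a Prolog-style fact string, for display only."""
    return f"{name}({','.join(str(a) for a in args)})."
```

For a hypothetical protein `p1`, `as_fact("position", position_facts("p1", "MKT")[0])` renders the first fact as `position(p1,1,m).`; a protein shorter than nine residues yields no octf facts at all.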


Additional unary predicates characterize some physical and chemical properties of the individual residues (hydrophobicity, hydrophilicity, charge, size, polarity, whether a residue is aliphatic or aromatic, whether it is a hydrogen donor or acceptor, etc.). Ordering relations between some constants, e.g. less_than(polar0,polar1), are also provided.
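The ordering relations can be pictured as all pairs drawn from an ordered list of constants. The sketch below is a minimal illustration. Only polar0 and polar1 appear in the description above; the constant polar2 and the function name `less_than_facts` are hypothetical, and whether the dataset lists every pair or only neighbouring ones is not stated.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def less_than_facts(ordered_constants: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (smaller, larger) pairs for constants given in ascending order.

    combinations() preserves input order, so each pair is already
    oriented from the smaller constant to the larger one.
    """
    return list(combinations(ordered_constants, 2))
```

With the two constants named in the text, `less_than_facts(["polar0", "polar1"])` gives the single pair corresponding to less_than(polar0,polar1).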

Supported by EU project Esprit No. 29288, University of Magdeburg and GMD
© ECSC - EC - EAEC, Brussels-Luxembourg, 2000