Dataset Details - Machine Learning network Online Information Service


Proteins



	Name (abbrev)	Name (full)	Category	Last update
	Proteins		Molecular Biology	
	Application domain		Further specifications
	Learning Rules for Predicting Protein Secondary Structure
	Type	Format	Complexity
	ILP	Golem	46 KB (tar, gzip)
	WWW / FTP
	http:// http://
	Contact person(s)	Related group(s)	Optional contact address
	Muggleton, Stephen	University of York, Dept of CS
	References
	Muggleton S., King R.D., and Sternberg M.J.E. (1992). Predicting protein secondary structure using inductive logic programming. Protein Engineering, 5:647-657.
	Annotations
	Predicting the secondary structure (the local three-dimensional shape) of proteins from their amino acid sequence (primary structure) is widely believed to be one of the hardest unsolved problems in molecular biology. The amino acids can be arranged in different patterns (spirals, turns, flat sections, etc.), which are of considerable interest to pharmaceutical companies, since a protein's shape generally determines its function as an enzyme.
	The dataset studied in [Muggleton 92] was created in order to learn rules that identify whether a position in a protein is in an alpha-helix. The positive examples state which positions of the chosen proteins are in an alpha-helix; the negative examples identify the remaining positions. The constants of the language denote the 20 amino acids and the values of some physical or chemical properties, such as sizes, hydrophobicities and polarities (e.g. polar0 and polar1).
	Background knowledge is expressed using the following predicates:
	position(A,B,C) means "the residue of protein A at position B is C".
	octf(A,B,C,D,E,F,G,H,I) indexes groups of nine adjacent positions in a protein (positions A to I occur in sequence).
	alpha_triplet(A,B,C) and alpha_pair(A,B) index groups of three and two adjacent positions in a protein, respectively.
	alpha_pair4(A,B) holds if the pair of positions A, B is separated by 4 positions in a protein.
	Additional unary predicates characterize physical and chemical properties of the individual residues (hydrophobicity, hydrophilicity, charge, size, polarity, whether a residue is aliphatic or aromatic, whether it is a hydrogen donor or acceptor, etc.). Ordering relations between some constants, e.g. less_than(polar0,polar1), are also provided.
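	To make the shape of the background knowledge concrete, the following Python sketch generates facts of the kinds described above for a toy sequence. It is only an illustration and is not taken from the dataset: the protein name toy1, its sequence, the 1-based position numbering, the reading of "separated by 4 positions" as B = A + 4, and all helper function names are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy input: protein name -> one-letter amino acid sequence.
# Neither the name nor the sequence comes from the dataset.
PROTEINS: Dict[str, str] = {"toy1": "MKTAYIAKQRQ"}


def position_facts(proteins: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """position(A,B,C): the residue of protein A at position B is C.
    Positions are numbered from 1 here (an assumption)."""
    return [
        (name, i, residue)
        for name, seq in proteins.items()
        for i, residue in enumerate(seq, start=1)
    ]


def octf_facts(length: int) -> List[Tuple[int, ...]]:
    """octf(A,...,I): every window of nine adjacent positions, in order."""
    return [tuple(range(start, start + 9)) for start in range(1, length - 9 + 2)]


def alpha_pair4_facts(length: int, gap: int = 4) -> List[Tuple[int, int]]:
    """alpha_pair4(A,B), read here as B = A + 4 (an assumption; the
    description could also be read as four positions lying between A and B)."""
    return [(a, a + gap) for a in range(1, length - gap + 1)]


def as_fact(predicate: str, args: Tuple) -> str:
    """Render one tuple as a Prolog-style ground fact string."""
    return f"{predicate}({','.join(str(a).lower() for a in args)})."


if __name__ == "__main__":
    n = len(PROTEINS["toy1"])
    for fact in position_facts(PROTEINS)[:3]:
        print(as_fact("position", fact))
    for window in octf_facts(n):
        print(as_fact("octf", window))
    for pair in alpha_pair4_facts(n)[:2]:
        print(as_fact("alpha_pair4", pair))
```

	The facts are kept as plain tuples so that the indexing predicates depend only on the sequence length, while position/3 carries the residue identity; the actual files distributed with the dataset may organise this differently.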
	Comments