Linguistic methods have provided some interesting results in the recognition of complex biological signals. However, developing grammars by hand is difficult and, because it requires human expertise, expensive. Given the enormous volume of data arising from genome projects, there is therefore a need to automate the acquisition of grammars from sets of biological sequences. We propose to develop an efficient method of acquiring such grammars using Inductive Logic Programming (ILP). Our method will combine two components: efficient techniques for discovering non-terminals that are potentially pertinent to the subsequent induction of a biological grammar, and a parser that increases the speed at which an ILP system can acquire biological grammars. The speed at which ILP systems generate biological grammars has previously been a bottleneck: when learning a biological grammar, the training sequences must be parsed repeatedly during the search, so the speed of the ILP system depends on the efficiency of the parser. We tackled this speed problem by selecting Context-Free Grammar parsers, improving them with respect to biological data, and making them available to ILP systems written in Prolog. We have shown empirically that, on average, this reduces the overall induction time by an amount equal to 60% of the time that ILP systems would normally spend parsing.
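To illustrate why parsing cost dominates the search, the following is a minimal sketch in Python of a CYK recogniser for a Context-Free Grammar in Chomsky normal form, together with a coverage test of the kind an ILP system would run for each candidate grammar. It is not one of the parsers described in this abstract, which were made available to Prolog. The toy grammar (accepting strings of the form a^n t^n), the names `cyk_accepts` and `coverage`, and the example sequences are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy grammar in Chomsky normal form (an assumption, not from the source):
#   S -> A T | A X      X -> S T      A -> 'a'      T -> 't'
# It accepts a^n t^n for n >= 1, a crude stand-in for nested base pairing.
UNARY = {"a": {"A"}, "t": {"T"}}
BINARY = {("A", "T"): {"S"}, ("S", "T"): {"X"}, ("A", "X"): {"S"}}


def cyk_accepts(seq, unary, binary, start="S"):
    """Return True if `seq` is derivable from `start` (standard CYK, cubic in len(seq))."""
    n = len(seq)
    if n == 0:
        return False
    # table[i][k] holds the non-terminals deriving the substring seq[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(seq):
        table[i][0] = set(unary.get(symbol, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b in left:
                    for c in right:
                        table[i][length - 1] |= binary.get((b, c), set())
    return start in table[0][n - 1]


def coverage(sequences, unary, binary):
    """Count the training sequences accepted by one candidate grammar."""
    return sum(cyk_accepts(s, unary, binary) for s in sequences)


if __name__ == "__main__":
    training = ["at", "aatt", "ta", "aat"]
    print(coverage(training, UNARY, BINARY))
```

Because a search over candidate grammars would call something like `coverage` once per candidate, every training sequence is re-parsed many times, which is why the efficiency of the parser bounds the speed of the whole induction.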