| Malerba D. Document Understanding: A Machine Learning Approach. Technical Report, Esprit Project 5203 INTREPID, 4 March 1993.
Esposito F., Malerba D., Semeraro G., & Pazzani M. A Machine Learning Approach to Document Understanding. Proc. 2nd Int. Workshop on Multistrategy Learning, Harpers Ferry, WV, pp. 276-292, May 1993.
Esposito F., Malerba D., & Semeraro G. Learning Contextual Rules in First-Order Logic. Proc. 4th Italian Workshop on Machine Learning (GAA93), Milan, Italy, pp. 111-127, June 1993.
Esposito F., Malerba D., & Semeraro G. Automated Acquisition of Rules for Document Understanding. Proc. 2nd Int. Conf. on Document Analysis and Recognition, Tsukuba Science City, Japan, pp. 650-654, October 1993.
Semeraro G., Esposito F., & Malerba D. Learning Contextual Rules for Document Understanding. Proc. 10th IEEE Conf. on Artificial Intelligence for Applications, San Antonio, Texas, pp. 108-115, March 1994.
Esposito F., Malerba D., & Semeraro G. Multistrategy Learning for Document Recognition. Applied Artificial Intelligence, 8, pp. 33-84, 1994.
|
| The problem concerns the classification of parts of a business letter using information about the layout of a one-page document. There are five concepts to be learned, expressed as the predicates sender, receiver, logotype, reference number and date. The description language can characterize properties of individual text blocks (their width and height, their position on the page, etc.) as well as the mutual position of two blocks (e.g. aligned-only-upper-row(X,Y)). The dataset describes 30 single-page documents, providing approximately 250 training instances and 120 test instances. The problem is complicated by dependencies among the concepts and can be cast as a multiple predicate learning problem. Experimental results show that learning contextual rules, that is, rules in which concept dependencies are explicitly considered, leads to good results. The results were first published in a technical report [Malerba93]; summaries appear in [Esposito93a, Esposito93b, Esposito93c, Semeraro94]. The whole document processing system is treated in detail in [Esposito94].
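The dependency among concepts can be illustrated with a small sketch. Everything in it is hypothetical: the block attributes, the page-relative coordinate convention, the tolerance, the stand-in relation `aligned_upper` (an assumed approximation of a mutual-position relation such as aligned-only-upper-row), and both rules. None of these is the dataset's schema or a rule learned in the cited work. The sketch shows only why contextual rules need iterative application. The hypothetical rule for `sender` mentions `logotype`, so it can fire only after some block has been labelled as a logotype. For this reason the classifier repeats passes over the page until no new label is derived.

```python
from dataclasses import dataclass


# Hypothetical layout description of one text block, in page-relative units
# (0..1, y = 0 at the top of the page). Not the dataset's actual attributes.
@dataclass(frozen=True)
class Block:
    name: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def aligned_upper(a: Block, b: Block, tol: float = 0.02) -> bool:
    """Assumed stand-in for a mutual-position relation: top edges coincide."""
    return abs(a.y - b.y) <= tol


# Each rule takes (block, all blocks on the page, labels derived so far).
# Non-contextual rule: looks only at the block's own layout properties.
def is_logotype(b, blocks, labels):
    return b.y < 0.1 and b.width < 0.3


# Contextual rule: its body refers to another concept (logotype), so it
# depends on labels assigned to other blocks.
def is_sender(b, blocks, labels):
    return any(labels.get(o.name) == "logotype" and aligned_upper(b, o)
               for o in blocks)


RULES = [("logotype", is_logotype), ("sender", is_sender)]


def classify(blocks):
    """Apply the rules repeatedly until no new label is derived (a fixpoint)."""
    labels = {}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b.name in labels:
                continue
            for concept, rule in RULES:
                if rule(b, blocks, labels):
                    labels[b.name] = concept
                    changed = True
                    break
    return labels


page = [
    Block("b1", 0.40, 0.05, 0.50, 0.08),  # wide block at the top
    Block("b2", 0.05, 0.05, 0.20, 0.08),  # narrow block at the top
    Block("b3", 0.50, 0.30, 0.40, 0.10),  # block further down
]
labels = classify(page)
print(labels)  # {'b2': 'logotype', 'b1': 'sender'}
```

In the first pass `b1` cannot be classified because no logotype is known yet. It becomes `sender` only in the second pass, after `b2` has been labelled. This sketch covers only how a set of contextual rules could be applied to a page, not how such rules are learned.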
|