
Machine Learning of Information Extraction Procedures - An ILP Approach


English-language paperback
Automatic fact retrieval from text documents is becoming one of the key technologies of the Information Age. One category of Intelligent Information Systems aims to support the user in searching for and retrieving valuable information from data resources such as intranets or the World Wide Web, which contains billions of web pages and linked documents. So far, most existing systems are restricted to document retrieval tasks, and only a few hand-tailored systems allow the user to query and retrieve facts from the vast amount of online information available. In the last decade, several approaches have been developed in the Information Extraction (IE) research area that automatically construct (learn) extraction procedures, so-called wrappers. Wrappers allow documents to be interpreted and accessed like relational databases. They form one of the core components of future Intelligent Information Systems, since they allow the user to query, compare, and combine information from various textual information sources. This thesis presents a Logic Programming and Inductive Logic Programming (ILP) framework for supervised learning of wrappers from positive examples only. In contrast to existing systems, which adapt some methods from Inductive Logic Programming, a subfield of Artificial Intelligence, the machine learning approach presented here follows a purely logical bottom-up learning approach under a new IE-ILP semantics. The learning approach for multi-slot extraction programs is independent of the chosen wrapper model and document view. Three classes of Inductive Logic Programming algorithms are presented: two one-step learning algorithms, a set of iterative learning algorithms, and one algorithm that combines clustering techniques with an iterative ILP algorithm. Several extraction tasks are investigated, and a formal definition of wrapper classes is given.
Based on these wrapper classes, three wrapper models are presented using two different document representations: a sequential token representation and a DOM-related representation. The learning algorithms and wrapper models introduced are evaluated on standard test cases and compared with related methods and machine-learning-based information extraction systems. For some of the single-slot extraction tasks, the implemented methods yield better results than the best state-of-the-art systems. Learned wrappers for multi-slot extraction tasks show promising quality scores that are competitive with those of the leading extraction systems.
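To make the idea of a wrapper concrete, here is a minimal sketch of a hand-written, delimiter-based multi-slot wrapper over a sequential token view of a document. It is not the learning approach described in the book: the function names `tokenize` and `extract`, the delimiter pairs, and the toy document are all assumptions chosen for illustration. A learned wrapper would induce such extraction rules from positive examples instead of having them written by hand.

```python
import re
from typing import List, Tuple

def tokenize(document: str) -> List[str]:
    """Split a document into a flat sequence of markup tags and words."""
    return re.findall(r"<[^>]+>|[^<\s]+", document)

def extract(tokens: List[str],
            delimiters: List[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Apply a hypothetical delimiter-based wrapper.

    Each slot is described by a (left, right) pair of delimiter tokens.
    The tokens between the two delimiters form the slot value. Slots are
    filled left to right, and one tuple is emitted per complete match.
    """
    rows: List[Tuple[str, ...]] = []
    pos = 0
    while True:
        row = []
        for left, right in delimiters:
            try:
                start = tokens.index(left, pos) + 1
                end = tokens.index(right, start)
            except ValueError:
                # No further complete tuple can be extracted.
                return rows
            row.append(" ".join(tokens[start:end]))
            pos = end + 1
        rows.append(tuple(row))

# Toy document (invented data): each list item holds a title and a price.
doc = ("<li><b>Red Kettle</b> <i>19</i></li>"
       "<li><b>Blue Teapot</b> <i>24</i></li>")
rows = extract(tokenize(doc), [("<b>", "</b>"), ("<i>", "</i>")])
for title, price in rows:
    print(title, price)
```

The result is a list of tuples, so the document can be queried much like a two-column relation, which is the sense in which wrappers make documents accessible like relational databases.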

Price: 43170 lei

New

Express points: 648

Estimated price in foreign currency:
8263 8612$ 6878£

Temporarily unavailable

Specifications

ISBN-13: 9783832507916
ISBN-10: 3832507914
Pages: 267
Publisher: Logos Verlag Berlin