PANKOW
The success of the Semantic Web depends on the availability of ontologies and of web pages annotated with metadata conforming to these ontologies. Acquiring the necessary metadata by manually defining an information extraction system is laborious and requires considerable time and expert know-how. PANKOW (Pattern-based Annotation through Knowledge on the Web) is an automated method for a self-annotating Web, based on counting Google hits of instantiated linguistic patterns. It uses an unsupervised learning approach to categorize instances with regard to an ontology, combining two ideas: using linguistic patterns to identify ontological relations, and using the Web as a large corpus to overcome data sparseness.
The system scans Web pages for phrases in the HTML text that might be categorized as instances of the ontology. Candidate phrases are proper nouns, identified by a standard part-of-speech tagging procedure. Each candidate proper noun is combined with each candidate ontology concept in a set of linguistic patterns to produce hypothesis phrases. Google is then queried for each hypothesis phrase through its Web service API. Finally, the system sums the query results into a total for each instance-concept pair and assigns each candidate proper noun to its highest-ranked concept. The reported results are comparable to those of state-of-the-art systems, while the approach is simpler and more intuitive to use for annotating the Web.
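The categorization step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pattern templates, the function names `hypothesis_phrases` and `categorize`, and the hit counts in `FAKE_HITS` are all assumptions introduced for illustration, and the search engine query is replaced by a caller-supplied `hit_count` function.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Illustrative pattern templates (an assumption, not PANKOW's actual pattern set).
# Pluralization is naive: it simply appends "s" to the concept name.
PATTERNS = [
    "{concept}s such as {instance}",
    "{instance} is a {concept}",
    "the {instance} {concept}",
]


def hypothesis_phrases(instance: str, concept: str) -> List[str]:
    """Instantiate every pattern with one instance-concept pair."""
    return [p.format(instance=instance, concept=concept) for p in PATTERNS]


def categorize(
    instances: Iterable[str],
    concepts: List[str],
    hit_count: Callable[[str], int],
) -> Dict[str, Tuple[str, int]]:
    """Assign each instance to the concept with the highest summed hit count.

    `hit_count` stands in for the search engine query. Ties, including the
    case where every total is zero, go to the first concept in `concepts`.
    """
    result = {}
    for instance in instances:
        totals = {
            concept: sum(hit_count(p) for p in hypothesis_phrases(instance, concept))
            for concept in concepts
        }
        best = max(totals, key=totals.get)
        result[instance] = (best, totals[best])
    return result


# Made-up hit counts used in place of real search results.
FAKE_HITS = {
    "rivers such as Nile": 120,
    "Nile is a river": 80,
    "the Nile river": 40,
    "the Nile hotel": 15,
}


def fake_hit_count(phrase: str) -> int:
    """Look up a phrase in the made-up table; unknown phrases count as zero."""
    return FAKE_HITS.get(phrase, 0)
```

With the made-up counts, `categorize(["Nile"], ["river", "hotel"], fake_hit_count)` assigns "Nile" to "river", because the river phrases sum to 240 and the hotel phrases to 15. Passing the counting function as a parameter keeps the sketch independent of any particular search API.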
The PANKOW project was initiated at the University of Karlsruhe, Germany, in 2004.
Reference: Philipp Cimiano, Siegfried Handschuh, Steffen Staab. Towards the Self-Annotating Web. In Proceedings of the 13th WWW Conference, pp. 462-471. ACM, New York, May 2004. ISBN 1-58113-844-X
Wikimedia Foundation. 2010.