Cypher transcoder

name = Cypher Transcoder
caption = Cypher Interface (Web Services API Interface)
developer = [http://www.monrai.com Monrai Technologies]
latest_release_version = 1.2 beta
latest_release_date = June 2008
operating_system = Cross-platform
genre = Open Source and Commercial
website = [http://cypher.monrai.com/ cypher.monrai.com]

The Cypher transcoder is an NLP engine that converts plain-language statements and phrases into RDF triples and SPARQL queries.

The software has been developed by Monrai Technologies with Sherman Monroe as principal investigator and chief software architect.

History

Alicebot and A.I.M.L.

In 1999, Sherman Monroe developed an idea for a "conversation front-end to databases" after discovering and tinkering with Dr. Richard Wallace's Alicebot chatterbot on the web. Monroe's original aim was to extend Alicebot to serve this purpose. He made several contributions to the AIML specification [http://www.alicebot.org/TR/2001/WD-aiml/], but after concluding that the language itself was too limited for the task, he decided to build a completely new engine from scratch. Monroe's design called for an Alicebot-like chat interface backed by a robust NLP engine to process user input, and a knowledge representation language with enough semantic rigor to capture the content of complex sentences and phrases. While a sophomore at Morehouse College, Monroe built the first prototype in Java and XML. It produced answers to user input by searching Google and splitting the result pages into simple triples of subject, main verb, and direct object, which were then stored in a simple index. The user's natural-language (NL) query was split in the same manner, and the verb and WH-pronouns from the query were used to retrieve an answer from the index. This rudimentary engine was able to produce decent results for English questions such as "who is the wife of Bill Clinton".

Convinced of the technical feasibility and economic viability of the technology, Monroe sought out investors to develop and commercialize the engine, and Monrai Technologies was formed. In the fall of 2001, [http://talapro.com T.A. Lewis and Associates] (T.A.L.A.) became a Monrai technology partner and began funding the project. During the T.A.L.A. partnership the project was given the name Cypher, and many key advancements were made to the prototype and to its underlying theoretical framework, called transcography. In 2006, OpenLink Software became a technology partner and continued the project's funding.

Semantic Web

In early 2002, Monroe was drawn to RDF by the need for a universal knowledge representation formalism for Cypher output. The first prototype using RDF was based on the [http://openrdf.org Sesame] repository. Monroe devised a language for specifying lexical rules and their associated semantic output as RDF templates. The language used a proprietary XML specification, which was later ported to be fully RDF-compliant.

Transcography

Cypher rigorously conforms to a sub-discipline of natural language processing called transcography, which was developed by Monrai with the goal of merging the field of natural language processing with the increasingly popular Semantic Web movement. Transcography is a set of core principles for converting parsed phrases into RDF triples. More specifically, transcography is the process of parsing the phrase structure of a natural language (NL) construct and translating the resulting grammar tree into a semantic graph. Each NL construct yields three outputs:

  1. a URI representation of the NL construct
  2. a set of one or more subject-predicate-object triples involving the URI
  3. the set of all triples produced by sub-phrases

Thus, Cypher views any and all linguistic input as a URI + related triples. This notion makes the lexical component a powerful NL resource for Cypher.

As an example of transcographic output, consider the phrase "John's coach". The transcographic process produces a URI representing the phrase, for example http://john.mysite.com/MrDouglass, and a set of triples representing the statements involved in the phrase:

{http://john.mysite.com/me} jo:isCoachedBy {http://john.mysite.com/MrDouglass}
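The three-part output can be pictured as a small record. The following is a minimal sketch using the example above, not Cypher's actual data model: the class name `PhraseOutput`, its field names, and the expansion of the `jo:` prefix are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object), each written as a plain string.
Triple = tuple[str, str, str]

# Hypothetical expansion of the "jo:" prefix; the source does not give one.
JO = "http://example.org/jo#"


@dataclass(frozen=True)
class PhraseOutput:
    """Illustrative container for the output of one NL construct."""
    uri: str                      # 1. URI that stands for the construct
    triples: frozenset[Triple]    # 2. triples that involve that URI
    # 3. triples produced by sub-phrases (empty for a phrase with none)
    sub_triples: frozenset[Triple] = field(default_factory=frozenset)

    def graph(self) -> frozenset[Triple]:
        """Own triples plus the triples contributed by sub-phrases."""
        return self.triples | self.sub_triples


# Output for the phrase "John's coach", using the URIs from the example.
johns_coach = PhraseOutput(
    uri="http://john.mysite.com/MrDouglass",
    triples=frozenset({
        ("http://john.mysite.com/me",
         JO + "isCoachedBy",
         "http://john.mysite.com/MrDouglass"),
    }),
)
```

Keeping the sub-phrase triples in their own field mirrors the three-item list; a caller that only needs the merged graph can use `graph()`.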

Cypher uses these triples to create either an RDF model or a SPARQL query. The mode of output depends on whether the NL construct is a clause or description (RDF model) or a noun phrase or question (SPARQL query). The triples of sub-phrases are recursively merged to produce a root graph representing the root NL phrase or clause. For example, consider the clause "John's coach knows Martin". The URI produced will represent this clause (e.g. the URI of a reified RDF triple, or the URI of a semantic frame), together with a graph containing:

{qv:node1} foaf:knows {http://john.mysite.com/MartinCrump}

The URI qv:node1 represents a query variable of a SPARQL query that has been serialized in RDF. This is because the phrase "John's coach" is a relational noun phrase and is therefore an anaphoric reference. By reconstructing the SPARQL query for the variable (by following the links from qv:node1) and then executing the query, a program can retrieve the resource that represents John's coach at the time of the query. This technique is used because John may have a new coach by the time the query is run. Transcography stipulates that any anaphoric reference be represented by a query variable (linked to the RDF representation of the SPARQL query) until the program is ready to apply the variable's value (e.g. to present it to a human user in an interface).
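The effect of deferring resolution can be illustrated with a short sketch. It is not Cypher code: a plain in-memory pattern match stands in for SPARQL execution, and the `jo:` namespace, the helper names `match` and `resolve`, and the second coach `MsRivera` are all hypothetical.

```python
# Hypothetical namespace for the "jo:" prefix; FOAF is the standard one.
JO = "http://example.org/jo#"
FOAF = "http://xmlns.com/foaf/0.1/"

ME = "http://john.mysite.com/me"
MARTIN = "http://john.mysite.com/MartinCrump"


def match(store, pattern):
    """Return the values that can fill the single None slot of a triple pattern."""
    slot = pattern.index(None)
    return sorted(
        triple[slot]
        for triple in store
        if all(p is None or p == t for p, t in zip(pattern, triple))
    )


# "John's coach" is kept as a stored query pattern, not as a fixed resource.
variables = {"qv:node1": (ME, JO + "isCoachedBy", None)}

# "John's coach knows Martin" refers to the query variable.
clause = ("qv:node1", FOAF + "knows", MARTIN)


def resolve(store, clause, variables):
    """Replace a query variable in subject position with its current bindings."""
    subject, predicate, obj = clause
    if subject in variables:
        subjects = match(store, variables[subject])
    else:
        subjects = [subject]
    return [(s, predicate, obj) for s in subjects]


store = {(ME, JO + "isCoachedBy", "http://john.mysite.com/MrDouglass")}
before = resolve(store, clause, variables)

# John changes coach; the same stored clause now resolves to the new coach.
store = {(ME, JO + "isCoachedBy", "http://john.mysite.com/MsRivera")}
after = resolve(store, clause, variables)
```

Because the clause stores the variable rather than a resolved URI, `before` and `after` differ even though the clause itself never changes.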

The word transcography is the combination of transcode, which means "to convert media from one format to another", and -graphy which is "writing or text representation produced in a specified manner or by a specified process". Thus the literal meaning is "text transcoding". Knowledge representation frameworks used in the process include RDF and Frame Semantics.

The following six principles form the core of transcography:

Symbolic Reference

Each constituent of a phrase must resolve to a concept, referenced either by description (e.g. the blue bird) or by unique identifier (e.g. Henry Ford). A transcoder therefore produces either a URI or a BNode that represents the phrase, plus a set of triples representing the description given by the phrase.
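A minimal sketch of this choice between a URI and a blank node follows. The `http://example.org/` namespace, the `KNOWN_ENTITIES` table, and the function `refer` are hypothetical; how Cypher actually decides is not described here.

```python
import itertools

EX = "http://example.org/"          # hypothetical namespace for illustration
_bnode_ids = itertools.count()      # source of fresh blank-node labels

# Hypothetical table of names that already have a unique identifier.
KNOWN_ENTITIES = {"Henry Ford": EX + "HenryFord"}


def refer(head, modifiers=()):
    """Return (node, triples) for a phrase.

    head      -- the head word or name of the phrase
    modifiers -- (property, value) pairs taken from the phrase's description
    """
    if head in KNOWN_ENTITIES:
        # Reference by unique identifier: a URI and no extra description.
        return KNOWN_ENTITIES[head], []
    # Reference by description: a fresh blank node plus descriptive triples.
    node = f"_:b{next(_bnode_ids)}"
    triples = [(node, "rdf:type", EX + head.capitalize())]
    triples += [(node, EX + prop, value) for prop, value in modifiers]
    return node, triples


ford_node, ford_triples = refer("Henry Ford")
bird_node, bird_triples = refer("bird", [("color", "blue")])
```

Here "Henry Ford" resolves to a URI with no additional triples, while "the blue bird" yields a blank node described by its type and color.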

Node Expansion

The set of triples produced by each child node of a phrase is included in the parent phrase’s output.
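Node expansion amounts to a recursive union over the parse tree. The sketch below assumes a hypothetical tree of nested dictionaries with `triples` and `children` keys and reuses the triples from the "John's coach knows Martin" example; it is not Cypher's internal representation.

```python
def expand(node):
    """Return the triples of a parse node together with those of all its descendants."""
    triples = set(node.get("triples", ()))
    for child in node.get("children", ()):
        triples |= expand(child)   # fold each child's output into the parent's
    return triples


# Hypothetical parse tree for "John's coach knows Martin".
noun_phrase = {
    "phrase": "John's coach",
    "triples": [("http://john.mysite.com/me",
                 "jo:isCoachedBy",
                 "http://john.mysite.com/MrDouglass")],
}
clause = {
    "phrase": "John's coach knows Martin",
    "triples": [("qv:node1", "foaf:knows", "http://john.mysite.com/MartinCrump")],
    "children": [noun_phrase, {"phrase": "Martin"}],
}

root_graph = expand(clause)
```

Using a set means a triple contributed by more than one sub-phrase appears only once in the root graph.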

Subcategorization

Transcography conforms to the theory that verbs and other atomic units of meaning in a language subcategorize for their arguments, and that this information is specified in the lexicon.
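As a rough illustration of a lexicon that records subcategorization, the sketch below stores the argument roles each verb requires and binds supplied arguments to them. The `LEXICON` contents, the role names, and `bind_arguments` are assumptions for illustration only.

```python
# Hypothetical lexicon: each verb lists the argument roles it subcategorizes for.
LEXICON = {
    "meet": ("agent", "theme"),   # transitive: someone meets someone
    "sleep": ("agent",),          # intransitive: someone sleeps
}


def bind_arguments(verb, arguments):
    """Pair a verb's required roles with the supplied arguments.

    Raises ValueError when the number of arguments does not match
    what the lexicon entry for the verb requires.
    """
    roles = LEXICON[verb]
    if len(arguments) != len(roles):
        raise ValueError(
            f"{verb!r} subcategorizes for {len(roles)} argument(s), "
            f"got {len(arguments)}"
        )
    return dict(zip(roles, arguments))


frame = bind_arguments("meet", ["John", "Martin"])
```

A parse that supplies the wrong number of arguments, such as two arguments for "sleep", is rejected by the lexicon entry itself.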

Identity Transfer

The human language processor produces semantic output by consulting a dictionary, and retrieving an entry for each word encountered in the input. The entry contains the description of an anonymous entity, and this description is transferred to the instance concept.

Inference

Each phrase and clause in natural language expresses information not explicit in the phrase. The human language processor makes use of a dictionary which provides a semantic map, linking the explicit description provided by the phrase, to implicit descriptions inferred from the phrase.

Gestalt

Because of the influence of Frame Semantics, Cypher adheres to the principles of gestalt. The mind tends to see things not in isolation, but as part of a greater whole that encompasses (or rather, is encompassed by) the body of world knowledge we gather from prior experience. Thus, the phrase "a book" inherently makes reference to its author (though anonymous) and its topic (though unknown), as these are among the things brought to mind by the word "book". When such implied elements are not present in the grammatical context, the mind tends to fill in the semantic gaps with anonymous objects that fit the minimum requirements of the missing element.
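The gap-filling behavior can be sketched as follows, assuming a hypothetical `FRAMES` table that lists the elements a word evokes. The `ex:` prefix and the function `instantiate` are illustrative and are not taken from Cypher.

```python
import itertools

# Hypothetical frame table: the elements the word "book" brings to mind.
FRAMES = {"book": ("author", "topic")}
_ids = itertools.count()   # source of fresh blank-node labels


def instantiate(word, given=None):
    """Create an instance of the word's frame.

    Elements stated in the phrase are passed in `given`; every element
    that is missing is filled with a fresh anonymous (blank) node.
    """
    given = given or {}
    instance = f"_:{word}{next(_ids)}"
    triples = [(instance, "rdf:type", f"ex:{word.capitalize()}")]
    for element in FRAMES.get(word, ()):
        value = given.get(element) or f"_:{element}{next(_ids)}"
        triples.append((instance, f"ex:{element}", value))
    return instance, triples


# "a book about whales": the topic is stated, the author is not.
book, triples = instantiate("book", {"topic": "ex:Whales"})
```

The unstated author still appears in the output as an anonymous node, so later statements about the book's author have something to attach to.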

MetaLanguage Ontology (MLO)

Cypher uses explicit information about phrase structure, lexical rules, and semantic relations. This information is encapsulated in the MetaLanguage, which is an RDF ontology. An example of a lexical entry in MLO is:

meet meet V

&mlo;agent has come upon &mlo;theme as by chance or arrangement

Output Types

Cypher produces RDF triples (in various serializations, including Turtle, N3, TriX, and N-Triples) from natural language clauses, and both SPARQL and SeRQL queries from natural language noun phrases and questions. In addition, grammar parse trees are generated, encoding such information as phrase type, part of speech, morphological data, parse duration, and the lexical resource used. Cypher is also equipped with a plug-in framework for creating custom output such as Cyc microtheories.
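The split between the two output modes can be sketched in a few lines. This is an illustration under stated assumptions rather than Cypher's serializer: the function names are hypothetical, only IRI terms are handled, and the `jo:` namespace is invented.

```python
def to_ntriples(triples):
    """Serialize IRI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def to_sparql(patterns, variable="x"):
    """Build a SELECT query; terms starting with '?' are variables, the rest IRIs."""
    def term(t):
        return t if t.startswith("?") else f"<{t}>"

    body = " . ".join(f"{term(s)} {term(p)} {term(o)}" for s, p, o in patterns)
    return f"SELECT ?{variable} WHERE {{ {body} }}"


def render(kind, triples):
    """Choose the output mode from the kind of NL construct."""
    if kind in ("clause", "description"):
        return to_ntriples(triples)
    if kind in ("noun phrase", "question"):
        return to_sparql(triples)
    raise ValueError(f"unknown construct kind: {kind!r}")


# Hypothetical namespace for the "jo:" prefix.
JO = "http://example.org/jo#"

query = render("noun phrase", [("http://john.mysite.com/me", JO + "isCoachedBy", "?x")])
statement = render("clause", [("http://john.mysite.com/me", JO + "isCoachedBy",
                               "http://john.mysite.com/MrDouglass")])
```

The same triple pattern thus becomes a query when it comes from a noun phrase and an asserted statement when it comes from a clause.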

Similar Technologies

[http://www.semantra.com/ Semantra]

[http://powerset.com Powerset]

[http://trueknowledge.com Trueknowledge]

[http://hakia.com Hakia]

See Also

Context-free grammar

Semantic Frames

Semantic Web

Linked Data

External Links

[http://demo.monrai.com Online Cypher Demo]

[http://www.monrai.com/products/cypher/cypher_manual.html Cypher User Guide]

[http://myopenlink.net:8890/DAV/home/sdmonroe/LDPTalk2008.pub.pptx Harnessing Social Collaboration] - Presentation on Cypher

[http://eprints.ecs.soton.ac.uk/15735/1/CNL_Reportv7.pdf Controlled Natural Languages for the Semantic Web] - A case study on the need for better NL-based UI tools for the Semantic Web

Wikimedia Foundation. 2010.
