Content determination

Content determination is a subtask of natural language generation (NLG) which involves deciding on the information to be communicated in a generated text. It is closely related to the NLG task of document structuring.

Example

Consider an NLG system which summarises information about sick babies.[1] Suppose this system has four pieces of information it can communicate:

  1. The baby is being given morphine via an IV drip
  2. The baby's heart rate shows bradycardias (temporary drops)
  3. The baby's temperature is normal
  4. The baby is crying

Which of these bits of information should be included in the generated texts?

Issues

There are three general issues which almost always impact the content determination task, and can be illustrated with the above example.

Perhaps the most fundamental issue is the communicative goal of the text, i.e. its purpose and reader. In the above example, for instance, a doctor who wants to make a decision about medical treatment would probably be most interested in the heart rate bradycardias, while a parent who wants to know how their child is doing would probably be more interested in the fact that the baby is being given morphine and is crying.

The second issue is the size and level of detail of the generated text. For instance, a short summary sent to a doctor as a 160-character SMS text message might only mention the heart rate bradycardias, while a longer summary printed out as a multipage document might also mention the fact that the baby is on a morphine IV.

The final issue is how unusual and unexpected the information is. For example, neither doctors nor parents would place a high priority on being told that the baby's temperature was normal, if they expected this to be the case.

Regardless, content determination is very important to users. Indeed, in many cases the quality of content determination is the most important factor (from the user's perspective) in determining the overall quality of the generated text.
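The three issues above (reader, size, and unexpectedness) can be combined in a small scoring sketch. This is only an illustration under stated assumptions, not the method of any system cited here: the `Fact` class, the `select_content` function, the relevance and surprise numbers, and the multiply-then-pick-greedily strategy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    relevance: dict    # hypothetical relevance per reader type, 0..1
    unexpected: float  # hypothetical surprise level, 0..1


# Toy values invented for illustration; they are not taken from any real system.
FACTS = [
    Fact("The baby is being given morphine via an IV drip.",
         {"doctor": 0.6, "parent": 0.8}, 0.5),
    Fact("The baby's heart rate shows bradycardias.",
         {"doctor": 0.9, "parent": 0.4}, 0.8),
    Fact("The baby's temperature is normal.",
         {"doctor": 0.5, "parent": 0.5}, 0.0),
    Fact("The baby is crying.",
         {"doctor": 0.2, "parent": 0.8}, 0.5),
]


def score(fact, reader):
    """Relevance to this reader, discounted by how expected the fact is."""
    return fact.relevance[reader] * fact.unexpected


def select_content(facts, reader, max_chars):
    """Greedily keep the highest-scoring facts that fit within max_chars."""
    ranked = sorted(facts, key=lambda f: score(f, reader), reverse=True)
    chosen, used = [], 0
    for fact in ranked:
        # One extra character separates a sentence from the previous one.
        cost = len(fact.text) + (1 if chosen else 0)
        # Facts scoring zero (fully expected) are never included.
        if score(fact, reader) > 0 and used + cost <= max_chars:
            chosen.append(fact)
            used += cost
    return chosen


if __name__ == "__main__":
    for fact in select_content(FACTS, "doctor", 50):
        print(fact.text)
```

With these toy numbers, a 50-character budget for a doctor leaves room only for the bradycardia sentence, while the normal temperature is never selected for either reader because its surprise value is zero.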

Techniques

There are three basic approaches to content determination: schemas (content templates), statistical approaches, and explicit reasoning.

Schemas[2] are templates which explicitly specify the content of a generated text (as well as document structuring information). Typically they are constructed by manually analysing a corpus of human-written texts in the target genre, and extracting a content template from these texts. Schemas work well in practice in domains where content is somewhat standardised, but work less well in domains where content is more fluid (such as the medical example above).
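As a rough illustration of the schema idea, the sketch below treats a schema as an ordered list of slots and selects whatever input data fills those slots. The weather-bulletin genre, the slot names, and the `apply_schema` function are assumptions made up for this example; real schemas are typically richer (for instance, allowing optional or repeated elements).

```python
# Hypothetical schema for a standardised genre: each slot names the kind
# of information expected at that position in the text.
WEATHER_BULLETIN_SCHEMA = ["overview", "temperature", "wind", "outlook"]


def apply_schema(schema, data):
    """Return (slot, value) pairs in schema order.

    Data not named by the schema is left out of the text, and slots
    with no matching data are skipped.
    """
    return [(slot, data[slot]) for slot in schema if slot in data]


if __name__ == "__main__":
    # Toy input, invented for illustration.
    data = {
        "wind": "light south-westerly",
        "humidity": "65%",  # not in the schema, so never selected
        "overview": "dry with sunny spells",
        "temperature": "high of 18C",
    }
    for slot, value in apply_schema(WEATHER_BULLETIN_SCHEMA, data):
        print(slot, "->", value)
```

The fixed slot list is what makes this approach suit standardised content: the same schema produces the same selection and ordering regardless of what else is available.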

Statistical approaches use corpus analysis to automatically determine the content of the generated texts. Such work is in its infancy, and has mostly been applied to contexts where the communicative goal, reader, size, and level of detail are fixed, for example the generation of newswire summaries of sporting events.[3]
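A very simple statistical baseline can be sketched as follows: from a corpus pairing the record types available in the input data with those mentioned in the human-written text, estimate how often each type is mentioned, and select the types above a threshold. This is a deliberately naive frequency model, not the collective selection method of the work cited above; the function names, the threshold, and the toy corpus are all assumptions for illustration.

```python
from collections import Counter

# Toy corpus invented for illustration: each entry pairs the record types
# available in the input data with those mentioned in the written summary.
TOY_CORPUS = [
    (["final_score", "touchdown", "fumble"], ["final_score", "touchdown"]),
    (["final_score", "fumble"], ["final_score"]),
    (["final_score", "touchdown", "fumble"], ["final_score", "fumble"]),
]


def train(corpus):
    """Estimate P(mentioned | available) for each record type."""
    available, mentioned = Counter(), Counter()
    for avail, ment in corpus:
        available.update(set(avail))
        # Count a mention only if the type was actually available.
        mentioned.update(set(ment) & set(avail))
    return {t: mentioned[t] / available[t] for t in available}


def select(model, available_types, threshold=0.5):
    """Keep the types whose estimated mention rate reaches the threshold."""
    return [t for t in available_types if model.get(t, 0.0) >= threshold]


if __name__ == "__main__":
    model = train(TOY_CORPUS)
    print(select(model, ["touchdown", "fumble", "final_score"]))
```

Because each type is judged independently, this baseline ignores dependencies between content items, which is one reason more sophisticated models have been explored.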

Explicit reasoning approaches have probably attracted the most attention from researchers. The basic idea is to use AI reasoning techniques (such as knowledge-based rules,[1] planning,[4] pattern detection,[5] case-based reasoning,[6] etc.) to examine the information available to be communicated (including how unusual or unexpected it is), the communicative goal and reader, and the characteristics of the generated text (including target size), and then decide on the optimal content for the generated text. A very wide range of techniques has been explored, but there is no consensus as to which is most effective.

References

  1. ^ a b Portet F, Reiter E, Gatt A, Hunter J, Sripada S, Freer Y, Sykes C (2009). "Automatic Generation of Textual Summaries from Neonatal Intensive Care Data". Artificial Intelligence 173: 789–816. doi:10.1016/j.artint.2008.12.002. 
  2. ^ K McKeown (1985). Text Generation. Cambridge University Press
  3. ^ R Barzilay and M Lapata (2005). Collective content selection for concept-to-text generation. Proceedings of EMNLP-2005
  4. ^ J Moore and C Paris (1993). Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information. Computational Linguistics 19:651-694
  5. ^ J Yu, E Reiter, J Hunter, C Mellish (2007). Choosing the content of textual summaries of large time-series data sets. Natural Language Engineering 13:25-49
  6. ^ P Gervás, B Díaz-Agudo, F Peinado, R Hervás (2005) Story plot generation based on CBR. Knowledge-Based Systems 18:235-242

Wikimedia Foundation. 2010.
