Text analytics

The term text analytics describes a set of linguistic, lexical, pattern recognition, extraction, tagging/structuring, visualization, and predictive techniques. The term also describes processes that apply these techniques, whether independently or in conjunction with query and analysis of fielded, numerical data, to solve business problems. These techniques and processes discover and present knowledge (facts, business rules, and relationships) that is otherwise locked in textual form, impenetrable to automated processing.

A typical application is to scan a set of documents written in a natural language and either model the document set for predictive classification purposes or populate a database or search index with the information extracted. Current approaches to text analytics use natural language processing techniques that focus on specialized domains.
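As a rough illustration of the "populate a search index" case, the sketch below builds a simple inverted index over a few documents and answers an all-terms query. The function names (`tokenize`, `build_index`, `search`) and the sample documents are made up for this example; production systems use much more elaborate tokenization and extraction.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Made-up sample documents, for illustration only.
documents = {
    "doc1": "The customer reported a billing error.",
    "doc2": "Billing was corrected after the customer called.",
    "doc3": "Shipping was delayed by weather.",
}
index = build_index(documents)
print(sorted(search(index, "customer billing")))  # ['doc1', 'doc2']
```

The index here stores raw word tokens only; in a text analytics pipeline the same structure could instead be keyed on extracted entities or categories.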

Typical subtasks are:

* Named Entity Recognition: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
* Coreference: identification of chains of noun phrases that refer to the same object. For example, anaphora is a type of coreference.
* Relationship Extraction: extraction of named relationships between entities in text.
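Two of these subtasks can be sketched with simple hand-written rules. The example below is a toy, assuming hypothetical regular-expression patterns (`ORG`, `YEAR`, `ACQUIRED`) and an invented sentence with invented company names. It tags organization names and years, then extracts an "acquired" relationship between two organizations. Coreference is not covered, and real systems rely on far richer linguistic and statistical models.

```python
import re

# Hypothetical patterns for illustration only.
ORG = r"(?:[A-Z][a-z]+ )+(?:Corp|Inc|Ltd)"   # e.g. "Acme Corp"
YEAR = r"\b(?:19|20)\d{2}\b"                 # four-digit years 1900-2099
ACQUIRED = re.compile(rf"({ORG}) acquired ({ORG})")

def find_entities(text):
    """Return (label, surface string) pairs for organizations and years."""
    entities = [("ORG", m.group(0)) for m in re.finditer(ORG, text)]
    entities += [("DATE", m.group(0)) for m in re.finditer(YEAR, text)]
    return entities

def find_relations(text):
    """Return (acquirer, 'acquired', target) triples found in the text."""
    return [(m.group(1), "acquired", m.group(2))
            for m in ACQUIRED.finditer(text)]

# Invented sentence; the company names are not real entities.
text = "Acme Corp acquired Widget Inc in 2007."
print(find_entities(text))
print(find_relations(text))
```

Rules like these are brittle, which is one reason practical approaches tend to focus on specialized domains where the vocabulary and phrasing are predictable.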

See also

* Noisy text analytics
* Information extraction
* Computational linguistics
* Natural language processing
* Named entity recognition
* Text mining

Software and Applications

Commercial Software and Applications

* AeroText - provides a suite of text mining applications for content analysis. Content used can be in multiple languages.
* Alethes OpenEyes [http://www.alethes.it] - provides a complete suite for text analytics in 8 different languages, including information extraction, entity recognition, taxonomy generation, clustering, categorization, summarization, and sentiment analysis.
* Anderson Analytics - provider of text analytics and content analysis especially as it relates to consumer behavior.
* Attensity provides hosted, integrated and stand-alone text analytics software.
* Carabao Language Kit - suite of components for text analytics, categorization, sense disambiguation, idiom extraction, and named entity recognition, with tools to add a new language or edit existing one(s).
* Clarabridge is a provider of end-to-end text analytics software and solutions for Voice of the Customer, Quality Assurance, Competitive Intelligence and other application areas.
* Clearforest [http://www.clearforest.com] is a provider of solutions and software to extract structured data from unstructured texts. It was acquired by Reuters, which then merged with Thomson to form Thomson Reuters.
* [http://www.eaagle.com/index.php?go=FTM Eaagle Full Text Mapper] - a text mining software solution that uses knowledge discovery and data visualization as a basis for analyzing unstructured text.
* EpiAnalytics [http://www.EpiAnalytics.com] provides advanced operational analytics for routing, classification and business intelligence.
* IBM LanguageWare [http://www.alphaworks.ibm.com/tech/lrw] is the IBM suite for Text Analytics (Tools and Runtime).
* Ixreveal [http://www.ixreveal.com] is a commercial vendor of text mining software and patented OLAP (OnLine Analytical Processing) for text, specializing in complete solutions for structured and unstructured data using advanced analytics algorithms and techniques. Its uReveal and uReka! [http://www.ureka.info] products have been adopted by major international companies and US local and federal government agencies in areas like fraud and recovery, voice of the customer, and law enforcement.
* Infonic provides commercial sentiment analysis of financial news feeds for the Thomson Reuters RMDS trading information system. The "sentiment scores" that this software provides are used within algorithmic trading systems by several major trading banks. Infonic also develops unique document summarization and textual navigation technologies that aid in Knowledge Management.
* Island Data [http://www.islanddata.com] provides real-time text analysis for unstructured textual data sources. The text analytics engine is statistically based, which makes the algorithm equally effective for all languages. The company is managed by text mining experts including James Sanger (Chairman, Island Data Corp.), author of The Text Mining Handbook.
* Lexalytics [http://www.lexalytics.com] is a commercial provider of enterprise software solutions offering entity, theme, and quote extraction, as well as summarization and sentiment analysis of unstructured content including online news, blogs and corporate documents. The company recently merged with Infonic's Text Analytics Division.
* Leximancer [http://www.Leximancer.com] is a commercial data mining tool that can be used to analyze collections of textual documents and visually display the extracted information. It is language independent and can be used for text analysis, coding open-ended surveys, media analysis and CRM notes.
* Rapid-I is a provider of predictive analytics, data mining, and text mining software, solutions, and services.
* SPSS [http://www.spss.com] - provider of SPSS Text Analysis for Surveys, Text Mining for Clementine, LexiQuest Mine and LexiQuest Categorize, commercial text analytics software that can be used in conjunction with SPSS Predictive Analytics Solutions.
* Teezir Search Solutions designs, delivers and hosts knowledge management applications for professional services firms. Its flagship solution is Teezir Expert Finder, a search engine that identifies experts within an organization, based on all documents on the firm's networks.
* TEMIS [http://www.temis.com] - Software solution editor providing Collaborative Solutions for Analyzing and Discovering Strategic Information to serve the Information Intelligence needs of business corporations.

Open-Source Software and Applications

* RapidMiner - open-source software for data and text mining
* GATE - Open-source toolbox for text engineering and natural language processing

External links

* Automatic Content Extraction, Linguistic Data Consortium: http://projects.ldc.upenn.edu/ace/
* Automatic Content Extraction, NIST: http://www.itl.nist.gov/iad/894.01/tests/ace/
* Message Understanding Conference: http://www.itl.nist.gov/iaui/894.02/related_projects/muc/
* Seth Grimes's Text Analytics expert channel at the Business Intelligence Network: http://www.b-eye-network.com/channels/index.php?filter_channel=1394
* Text Analytics Summit: http://www.textanalyticsnews.com/
* Text Analytics Wiki: http://textanalytics.wikidot.com/start
* Text Analytics Yahoo group: http://tech.groups.yahoo.com/group/TextAnalytics/
* Text Analytics Linkedin group: http://www.linkedin.com/e/gis/22313/3A5CAF691C78

Wikimedia Foundation. 2010.
