- Stop words
Stop words, sometimes written stopwords or called noise words (as in SQL Server; see [http://technet.microsoft.com/en-us/library/ms142551.aspx Noise Words]), are words that are filtered out before or after the processing of natural language data (text).
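As a rough illustration of the filtering described above, the following Python sketch removes stop words from a tokenized string. The stop list, the `tokenize` helper, and the `remove_stop_words` function are all hypothetical names chosen for this example; real tools ship their own, much longer lists and may tokenize differently.

```python
import re

# Hypothetical, deliberately tiny stop list for illustration only;
# real stop lists are longer and differ from tool to tool.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not in the stop list, preserving order."""
    return [token for token in tokens if token not in stop_words]


tokens = tokenize("The cat is in the hat")
filtered = remove_stop_words(tokens)
print(filtered)  # ['cat', 'hat']
```

Here the filter runs before any further processing; the same function could equally be applied to the output of a later stage. Note that a query made up entirely of listed words, such as "The The", would be reduced to an empty list by this sketch.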
Hans Peter Luhn, one of the pioneers in information retrieval, is credited with coining the phrase and using the concept in his design. The list of stop words is controlled by human input and is not automated. This is sometimes seen as a negative approach to the natural articles of speech.

There is no definitive list of stop words that all natural language processing (NLP) tools incorporate, and not all NLP tools use a stoplist. Some tools specifically avoid using one in order to support phrase searching. The use of a stemming algorithm may also reduce part of the rationale for, or dependence on, a stoplist to filter out words.

Stop words can cause problems when using a search engine to search for phrases that include them, particularly in names such as 'The Who', 'The The', or 'Take That'.

See also
* Text mining
* Concept mining
* Information extraction
* Natural language processing
* Query expansion
* Stemming
* Search engine indexing
* Poison words

External links
* [http://snowball.tartarus.org/ The Snowball project] currently provides lists of stopwords for English, French, Spanish, German, Portuguese, Italian, Dutch, Swedish, Norwegian, Danish, Russian, Finnish and Hungarian as part of a software stemmer project. These lists are used in other software such as the Perl Lingua::StopWords module.
* [http://mail.sarai.net/pipermail/prc/Week-of-Mon-20080204/001656.html Hindi Stop Words]
* [http://www.solariz.de/blog/70-deutsche-stopwords German Stop Words]
* [http://pl.wikipedia.org/wiki/Wikipedia:Stopwords Polish Stop Words]

References
Wikimedia Foundation. 2010.