Adversarial information retrieval

Adversarial information retrieval (adversarial IR) is a topic in information retrieval that addresses tasks such as gathering, indexing, filtering, retrieving and ranking information from collections wherein a subset has been manipulated maliciously. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.

On the Web, the predominant form of such manipulation is search engine spamming (also known as spamdexing), which covers techniques employed to disrupt the activity of web search engines, usually for financial gain. Examples of spamdexing are link-bombing, comment or referrer spam, spam blogs (splogs), and malicious tagging. Reverse engineering of ranking algorithms, advertisement blocking, and web content filtering are related forms of adversarial manipulation that the field also covers [B. Davison, M. Najork, and T. Converse (2006), [http://www.acm.org/sigs/sigir/forum/2006D/2006d_sigirforum_davison.pdf SIGIR Workshop Report: Adversarial Information Retrieval on the Web (AIRWeb 2006)]].

The name stems from the fact that there are two sides with opposing goals. For instance, the relationship between the owner of a Web site trying to rank high on a search engine and the search engine administrator is an adversarial relationship in a zero-sum game. Every undeserved gain in ranking by the web site is a loss of precision for the search engine.
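The trade-off between an undeserved ranking gain and lost precision can be illustrated with a small calculation. The sketch below is only an illustration: the document identifiers, the two rankings, and the cutoff of 4 are invented, and precision at k (the fraction of the top k results that are relevant) is used as one common way to measure precision.

```python
def precision_at_k(ranking, relevant, k):
    """Return the fraction of the top-k results that are relevant."""
    top_k = ranking[:k]
    return sum(1 for doc in top_k if doc in relevant) / k


# Hypothetical relevance judgments for one query.
relevant = {"a", "b", "c", "d"}

# Ranking without manipulation, and the same ranking after a spam
# page has pushed itself to the first position.
honest_ranking = ["a", "b", "c", "d", "x"]
spammed_ranking = ["spam", "a", "b", "c", "d"]

print(precision_at_k(honest_ranking, relevant, 4))   # 1.0
print(precision_at_k(spammed_ranking, relevant, 4))  # 0.75
```

In this toy case the spam page gains one position in the top four and the search engine loses exactly one relevant result from the same window.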

Topics

Topics related to Web spam (spamdexing):
* Link spam
* Keyword spamming
* Cloaking
* Malicious tagging
* Spam related to blogs, including comment spam, splogs, and ping spam
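As a rough illustration of how one of these techniques, keyword spamming, might be flagged, the sketch below measures how much of a page's text is taken up by its single most frequent term. This is a deliberately naive heuristic and not a method described in the sources cited here; the function names, the sample texts, and the 0.3 threshold are all assumptions chosen for the example.

```python
import re
from collections import Counter


def top_term_share(text):
    """Return the share of all tokens taken by the most frequent token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    _, count = Counter(tokens).most_common(1)[0]
    return count / len(tokens)


def looks_keyword_stuffed(text, threshold=0.3):
    """Flag text whose most frequent term exceeds an assumed threshold."""
    return top_term_share(text) > threshold


# Invented sample texts for illustration only.
normal = "Our shop sells handmade oak tables and chairs for small homes"
stuffed = "cheap flights cheap flights cheap cheap flights book cheap flights"

print(looks_keyword_stuffed(normal))   # False
print(looks_keyword_stuffed(stuffed))  # True
```

A practical detector would need far more than a single ratio (stop-word handling, page length, link and content features), and an adversary can adapt to any fixed threshold, which is what makes the setting adversarial.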

Other topics:

* Click fraud detection
* Reverse engineering of a search engine's ranking algorithm
* Web content filtering
* Advertisement blocking
* Stealth crawling
* Malicious tagging or voting in social networks

History

The term "adversarial information retrieval" was coined in 2000 by Andrei Broder (then Chief Scientist at AltaVista) during the Web plenary session at the TREC-9 conference [D. Hawking and N. Craswell (2004), [http://es.csiro.au/pubs/trecbook_for_website.pdf Very Large Scale Retrieval and Web Search (Preprint version)]].

See also

* Spamdexing
* Information retrieval

External links

* [http://airweb.cse.lehigh.edu/ AIRWeb] : series of workshops on Adversarial Information Retrieval on the Web
* [http://webspam.lip6.fr/ Web Spam Challenge] : competition for researchers on Web Spam Detection
* [http://www.yr-bcn.es/webspam/ Web Spam Datasets] : datasets for research on Web Spam Detection


Wikimedia Foundation. 2010.
