Overcategorization

Overcategorization (also spelled overcategorisation, and sometimes called category clutter) is the assignment of too many categories, classes or index terms to a given document. Wikipedia has developed a set of principles concerning overcategorization (Wikipedia:overcategorization). The concept does not seem to appear in the literature of library and information science (LIS), although it is clearly relevant to all kinds of document classification and indexing. LIS has, however, developed some related concepts, for example exhaustivity of indexing and information overload.

Basic principles

If too many categories are assigned to a given document, the implications for users depend on how informative the links are. If the user is able to distinguish useful links from useless ones, the damage is limited: the user only wastes time selecting links. In many cases, however, the user cannot judge whether a given link will turn out to be fruitful, and must follow the link and read or skim another document. The worst case is that, even after reading the new document, the user is still unable to decide whether it would prove useful if its subject matter were thoroughly investigated.

Overcategorization has another unpleasant implication: it makes the system (for example Wikipedia) difficult to maintain consistently. In an inconsistent system, a user who browses the links in a given category will not find all the documents relevant to that category.

Basically, the problem of overcategorization should be understood from the perspective of relevance and the traditional measures of recall and precision. If too few relevant categories are assigned to a document, recall may decrease. If too many non-relevant categories are assigned, precision becomes lower. The hard part is deciding which categories will be fruitful or relevant for future use of the document.
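The trade-off between recall and precision can be illustrated with a small sketch. It is a simplification that assumes the set of truly relevant categories for one document is known in advance, which in practice is exactly the hard part. The function name `precision_recall` and the category names are hypothetical and chosen only for illustration.

```python
def precision_recall(assigned, relevant):
    """Return (precision, recall) for the categories assigned to one document.

    precision: share of assigned categories that are relevant
    recall:    share of relevant categories that were assigned
    """
    assigned, relevant = set(assigned), set(relevant)
    hits = assigned & relevant
    precision = len(hits) / len(assigned) if assigned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical "ground truth": the categories that are actually relevant.
relevant = {"Indexing", "Information retrieval", "Library science", "Classification"}

# Too few categories: everything assigned is relevant, but half are missing.
sparse = {"Indexing", "Classification"}

# Too many categories: all relevant ones are present, plus four non-relevant ones.
cluttered = relevant | {"2010 introductions", "Concepts", "Wikipedia", "Terminology"}

sparse_pr = precision_recall(sparse, relevant)        # (1.0, 0.5)
cluttered_pr = precision_recall(cluttered, relevant)  # (0.5, 1.0)
```

In this toy case the sparse assignment keeps precision at 1.0 but halves recall, while the cluttered assignment keeps recall at 1.0 but halves precision, which is the pattern described above.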

Wikimedia Foundation. 2010.
