Link rot
Link rot (or linkrot), also known as link death or link breaking, is an informal term for the process by which, on individual websites or across the Internet in general, increasing numbers of links point to web pages, servers or other resources that have become permanently unavailable. The phrase also describes the effects of failing to update out-of-date web pages that clutter search engine results. A link that no longer works is called a broken link, dead link or dangling link.
Causes
A link may become broken for several reasons. The most common symptom of a dead link is a 404 error, which indicates that the web server responded but the specific page could not be found.
Some news sites contribute to the link rot problem by keeping only recent news articles online at their original URLs, where they are freely accessible, then removing them or moving them to a paid subscription area. This causes a heavy loss of supporting links on sites that discuss newsworthy events and cite news sites as references.[citation needed]
Another type of dead link occurs when the server that hosts the target page stops working or relocates to a new domain name. In this case the browser may return a DNS error, or it may display a site unrelated to the content sought. The latter can occur when a domain name is allowed to lapse and is subsequently reregistered by another party. Domain names acquired in this manner are attractive to those who wish to exploit the stream of unsuspecting visitors, which inflates hit counters and PageRank.
A link might also be broken by some form of blocking, such as content filters or firewalls. Dead links can also originate on the authoring side, when website content is assembled, copied, or deployed without verifying the link targets, or is simply not kept up to date.
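The failure modes described above can be told apart programmatically. Below is a minimal sketch, assuming Python with the third-party requests library, that sorts a URL into rough categories; note that a 200 response alone cannot rule out a soft 404 (see Discovering below) or a lapsed domain serving unrelated content.

```python
# Minimal link-rot probe (illustrative; the URL at the bottom is a placeholder).
import requests

def classify_link(url: str, timeout: float = 10.0) -> str:
    try:
        response = requests.get(url, timeout=timeout, allow_redirects=True)
    except requests.exceptions.ConnectionError:
        # Covers DNS failures and unreachable hosts.
        return "dead: domain no longer resolves or server unreachable"
    except requests.exceptions.Timeout:
        return "dead: server did not respond in time"
    if response.status_code == 404:
        return "dead: server responded but page not found (404)"
    if response.status_code >= 400:
        return f"dead: server returned HTTP {response.status_code}"
    # 200 is necessary but not sufficient: the page may be a soft 404
    # or unrelated content on a reregistered domain.
    return "alive: HTTP 200 (content not verified)"

print(classify_link("http://www.example.com/some/old/page.html"))
```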
Prevalence
The 404 "Not Found" response is familiar to even the occasional Web user. A number of studies have examined the prevalence of link rot on the Web, in academic literature, and in digital libraries. In a 2003 experiment, Fetterly et al. discovered that about one link out of every 200 disappeared each week from the internet. McCown et al. (2005) discovered that half of the URLs cited in D-Lib Magazine articles were no longer accessible 10 years after publication, and other studies have shown link rot in academic literature to be even worse (Spinellis, 2003, Lawrence et al., 2001). Nelson and Allen (2002) examined link rot in digital libraries and found that about 3% of the objects were no longer accessible after one year.
Discovering
Detecting link rot for a given URL is difficult using automated methods. If a URL is accessed and returns an HTTP 200 (OK) response, it may be considered accessible, but the contents of the page may have changed and may no longer be relevant. Some web servers also return a soft 404: a page whose content says the resource is gone but which is served with a 200 (OK) response instead of the 404 code that would signal the URL is no longer accessible. Bar-Yossef et al. (2004) developed a heuristic for automatically discovering soft 404s.
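To illustrate the idea behind such heuristics, the sketch below probes the same host with a random, almost certainly nonexistent path: a server that answers that probe with a 200 is not issuing honest 404s, and a target page whose body closely resembles the probe's response is likely a soft 404. This is a simplified illustration only, not the actual procedure of Bar-Yossef et al.; the 0.9 similarity threshold and the use of the requests library are assumptions.

```python
# Simplified soft-404 probe (illustrative).
import difflib
import secrets
from urllib.parse import urljoin

import requests

def looks_like_soft_404(url: str, timeout: float = 10.0) -> bool:
    target = requests.get(url, timeout=timeout)
    if target.status_code != 200:
        return False  # an honest error code, not a soft 404
    # Fetch a random path on the same host that should not exist.
    probe_url = urljoin(url, "/" + secrets.token_hex(16))
    probe = requests.get(probe_url, timeout=timeout)
    if probe.status_code != 200:
        return False  # the server issues real 404s
    # The server returns 200 even for junk paths; if the target's body
    # closely matches the junk page, it is probably an error page too.
    similarity = difflib.SequenceMatcher(None, target.text, probe.text).ratio()
    return similarity > 0.9
```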
Combating
Because dead links reflect poorly on both the linking site and the site linked to, multiple solutions are available to tackle them: some work to prevent broken links in the first place, while others try to resolve them after they have occurred. Several tools have been developed to help combat link rot.
Server side
- Avoiding unmanaged hyperlink collections
- Avoiding links to pages deep in a website ("deep linking")
- Using redirection mechanisms (e.g. "301: Moved Permanently") to automatically refer browsers and crawlers to the new location of a URL (a minimal sketch follows this list)
- Content management systems may offer built-in solutions to the management of links, e.g. updating links when content is changed or moved on the site.
- WordPress guards against link rot by replacing non-canonical URLs with their canonical versions.[1]
- IBM's Peridot attempts to automatically fix broken links.
- Permalinking stops broken links by guaranteeing that the content will never move. Another form of permalinking is linking to a permalink that then redirects to the actual content, ensuring that even if the real content is moved, links pointing to it stay intact.
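As a concrete illustration of the redirection mechanism mentioned above, the sketch below uses Python's standard http.server module to answer requests for a moved page with "301 Moved Permanently" and a Location header. The path mapping is hypothetical, and real sites usually configure such redirects in the web server itself rather than in application code.

```python
# Toy redirect server (illustrative; the MOVED mapping is hypothetical).
from http.server import BaseHTTPRequestHandler, HTTPServer

MOVED = {
    "/old/article.html": "https://www.example.com/new/article",
}

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        new_url = MOVED.get(self.path)
        if new_url:
            # Browsers and crawlers follow this automatically, so old
            # links keep working after the content moves.
            self.send_response(301, "Moved Permanently")
            self.send_header("Location", new_url)
            self.end_headers()
        else:
            self.send_response(404, "Not Found")
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), RedirectHandler).serve_forever()
```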
User side
- The Linkgraph widget resolves an old broken URL to the correct page by using historical location information.
- The Google 404 Widget employs Google technology to 'guess' the correct URL, and also provides the user a Google search box to find the correct page.
- When a user receives a 404 response, the Google Toolbar attempts to assist the user in finding the missing page.[2]
- Deadurl.com[3] gathers and ranks alternate URLs for a broken link using Google Cache, the Internet Archive, and user submissions.[4] Typing deadurl.com/ to the left of a broken link in the browser's address bar and pressing Enter loads a ranked list of alternate URLs, or (depending on user preference) immediately forwards to the best one.[5]
Web archiving
To combat link rot, web archivists are actively engaged in collecting the Web, or particular portions of it, and ensuring the collection is preserved in an archive, such as an archive site, for future researchers, historians, and the public. The largest web archiving organization is the Internet Archive, which strives to maintain an archive of the entire Web by taking periodic snapshots of pages. These snapshots can be accessed for free and without registration via the Wayback Machine many years later, simply by typing in the URL, or automatically by using browser extensions.[6] National libraries, national archives and various consortia of organizations are also involved in archiving culturally important Web content.
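The Internet Archive also exposes a public availability endpoint that reports the closest archived snapshot of a given URL. The sketch below, which assumes the requests library and the endpoint's documented JSON shape, shows how a dead link might be swapped for its archived copy; it is an illustration rather than a robust client.

```python
# Look up an archived copy of a URL via the Wayback Machine
# availability endpoint (illustrative).
import requests

def find_archived_copy(url: str) -> str | None:
    response = requests.get(
        "https://archive.org/wayback/available",
        params={"url": url},
        timeout=10,
    )
    response.raise_for_status()
    closest = response.json().get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        # e.g. https://web.archive.org/web/<timestamp>/<url>
        return closest["url"]
    return None

print(find_archived_copy("http://www.example.com/"))
```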
Individuals may also use a number of tools that allow them to archive web resources that may go missing in the future:
- WebCite, a tool specifically for scholarly authors, journal editors and publishers to permanently archive cited Internet references on demand and retrieve them later (Eysenbach and Trudel, 2005).
- Archive-It, a subscription service that allows institutions to build, manage and search their own web archive
- Some social bookmarking websites, such as Furl, make private copies of web pages bookmarked by their users.
- Google keeps a text-based cache (temporary copy) of the pages it has crawled, which can be used to read the content of recently removed pages. Unlike archiving services, however, Google does not store cached pages permanently.
- The Wayback Machine, at the Internet Archive,[7] is a free website that archives old web pages. It does not archive websites whose owners have stated that they do not want their website archived.
Authors citing URLs
A number of studies have shown how widespread link rot is in academic literature (see below). Authors of scholarly publications have also developed best practices for combating link rot in their work:
- Avoiding URL citations that point to resources on a researcher's personal home page (McCown et al., 2005)
- Using Persistent Uniform Resource Locators (PURLs) and digital object identifiers (DOIs) whenever possible (a short sketch of DOI resolution follows this list)
- Using web archiving services (e.g. WebCite) to permanently archive and retrieve cited Internet references (Eysenbach and Trudel, 2005).
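As a brief illustration of why DOIs resist link rot, the sketch below asks the doi.org resolver where a DOI currently points instead of hard-coding the publisher's URL. The example DOI belongs to Spinellis (2003), cited under Further reading; the helper function itself is hypothetical.

```python
# Resolve a DOI to the article's current location (illustrative).
import requests

def resolve_doi(doi: str) -> str:
    # Ask the resolver for its redirect target without following it;
    # the Location header holds the article's current home.
    response = requests.get(
        f"https://doi.org/{doi}", allow_redirects=False, timeout=10
    )
    return response.headers["Location"]

print(resolve_doi("10.1145/602421.602422"))  # Spinellis (2003)
```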
Further reading
Link rot on the Web
- Ziv Bar-Yossef, Andrei Z. Broder, Ravi Kumar, and Andrew Tomkins (2004). "Sic transit gloria telae: towards an understanding of the Web’s decay". Proceedings of the 13th international conference on World Wide Web. pp. 328–337. doi:10.1145/988672.988716.
- Tim Berners-Lee (1998). Cool URIs don't change. http://www.w3.org/Provider/Style/URI.html. Retrieved 2010-09-14.
- Gunther Eysenbach and Mathieu Trudel (2005). "Going, going, still there: using the WebCite service to permanently archive cited web pages". Journal of Medical Internet Research 7 (5): e60. doi:10.2196/jmir.7.5.e60. PMC 1550686. PMID 16403724. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=1550686.
- Dennis Fetterly, Mark Manasse, Marc Najork, and Janet Wiener (2003). "A large-scale study of the evolution of web pages". Proceedings of the 12th international conference on World Wide Web. http://www2003.org/cdrom/papers/refereed/p097/P97%20sources/p97-fetterly.html. Retrieved 2010-09-14.
- Wallace Koehler (2004). "A longitudinal study of web pages continued: A consideration of document persistence". Information Research 9 (2). http://informationr.net/ir/9-2/paper174.html.
- John Markwell and David W. Brooks (2002). "Broken Links: The Ephemeral Nature of Educational WWW Hyperlinks". Journal of Science Education and Technology 11 (2): 105–108. doi:10.1023/A:1014627511641.
In academic literature
- Daniel Gomes, Mário J. Silva (2006). "Modelling Information Persistence on the Web". Proceedings of The 6th International Conference on Web Engineering (ICWE'06). http://xldb.di.fc.ul.pt/daniel/docs/papers/gomes06urlPersistence.pdf. Retrieved 2010-09-14.
- Robert P. Dellavalle, Eric J. Hester, Lauren F. Heilig, Amanda L. Drake, Jeff W. Kuntzman, Marla Graber, Lisa M. Schilling (2003). "Going, Going, Gone: Lost Internet References". Science 302 (5646): 787–788. doi:10.1126/science.1088234. PMID 14593153.
- Steve Lawrence, David M. Pennock, Gary William Flake, Robert Krovetz, Frans M. Coetzee, Eric Glover, Finn Arup Nielsen, Andries Kruger, C. Lee Giles (2001). "Persistence of Web References in Scientific Research". Computer 34 (2): 26–31. doi:10.1109/2.901164. http://doi.ieeecomputersociety.org/10.1109/2.901164.
- Wallace Koehler (1999). "An Analysis of Web Page and Web Site Constancy and Permanence". Journal of the American Society for Information Science 50 (2): 162–180. doi:10.1002/(SICI)1097-4571(1999)50:2<162::AID-ASI7>3.0.CO;2-B.
- Frank McCown, Sheffan Chan, Michael L. Nelson, and Johan Bollen (2005). "The Availability and Persistence of Web References in D-Lib Magazine". Proceedings of the 5th International Web Archiving Workshop and Digital Preservation (IWAW'05). http://www.iwaw.net/05/papers/iwaw05-mccown1.pdf.
- Carmine Sellitto (2005). "The impact of impermanent Web-located citations: A study of 123 scholarly conference publications". Journal of the American Society for Information Science and Technology 56 (7): 695–703. doi:10.1002/asi.20159. http://doi.wiley.com/10.1002/asi.20159.
- Diomidis Spinellis (2003). "The Decay and Failures of Web References". Communications of the ACM 46 (1): 71–77. doi:10.1145/602421.602422. http://www.spinellis.gr/pubs/jrnl/2003-CACM-URLcite/html/urlcite.html.
In digital libraries
- Michael L. Nelson and B. Danette Allen (2002). "Object Persistence and Availability in Digital Libraries". D-Lib Magazine 8 (1). doi:10.1045/january2002-nelson.
References
- ^ Rønn-Jensen, Jesper (2007-10-05). "Software Eliminates User Errors And Linkrot". Justaddwater.dk. http://justaddwater.dk/2007/10/05/blog-software-eliminates-user-errors-and-linkrot/. Retrieved 2007-10-05.
- ^ Mueller, John (2007-12-14). "FYI on Google Toolbar's Latest Features". Google Webmaster Central Blog. http://googlewebmastercentral.blogspot.com/2007/12/fyi-on-google-toolbars-latest-features.html. Retrieved 2008-07-09.
- ^ deadurl.com
- ^ "DeadURL.com". http://deadurl.com/. Retrieved 2011-03-17. "DeadURL.com gathers as many backup links as possible for each dead url, via Google cache, Archive.org, and user submissions."
- ^ "DeadURL.com". http://deadurl.com/. Retrieved 2011-03-17. "Just type deadurl.com/ in front of a link that doesn't work, and hit Enter."
- ^ 404-Error? :: Add-ons for Firefox
- ^ archive.org
External links
- Future-Proofing Your URIs
- Jakob Nielsen, "Fighting Linkrot", Jakob Nielsen's Alertbox, June 14, 1998.
- Warrick - a tool for recovering lost websites from the Internet Archive and search engine caches
- Pagefactor and UndeadLinks.com - user-contributed databases of moved URLs
- W3C Link Checker
- mod_brokenlink - Apache module that reports broken links.