Web mining

Web mining is the application of data mining techniques to discover patterns from the Web. According to the analysis target, web mining can be divided into three types: Web usage mining, Web content mining and Web structure mining.

Web usage mining

Web usage mining is the application of data mining to analyse and discover interesting patterns in users' usage data on the Web. Usage data records users' behaviour as they browse or make transactions on a web site. It is an activity that involves the automatic discovery of patterns from one or more Web servers. Organizations often generate and collect large volumes of data; most of this information is generated automatically by Web servers and collected in server logs. Analyzing such data can help these organizations determine the value of particular customers, plan cross-marketing strategies across products, and evaluate the effectiveness of promotional campaigns.
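Server logs are the raw material for usage mining. As a minimal sketch, assuming the logs use the NCSA Common Log Format that many web servers can write, the Python below parses each line and groups requests into sessions. Identifying a user by client host and starting a new session after 30 minutes of inactivity are common heuristics adopted here as assumptions, not something the text above prescribes; proxies and shared addresses make them imperfect. The function names and the sample log are hypothetical.

```python
import re
from datetime import datetime, timedelta

# One access-log line in the NCSA Common Log Format:
# host ident authuser [day/month/year:hh:mm:ss zone] "METHOD url protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Parse one log line into a dict, or return None if it is malformed."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record

def sessionize(records, timeout=timedelta(minutes=30)):
    """Split each host's requests into sessions separated by more than `timeout` of inactivity."""
    sessions = []
    open_session = {}  # host -> (time of its last request, index into `sessions`)
    for rec in sorted(records, key=lambda r: r["time"]):
        previous = open_session.get(rec["host"])
        if previous is None or rec["time"] - previous[0] > timeout:
            sessions.append([])
            index = len(sessions) - 1
        else:
            index = previous[1]
        sessions[index].append(rec["url"])
        open_session[rec["host"]] = (rec["time"], index)
    return sessions

# Hypothetical sample log (addresses and pages are made up).
LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:57:10 -0700] "GET /products.html HTTP/1.0" 200 1500',
    '10.0.0.2 - - [10/Oct/2000:14:01:00 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:15:30:00 -0700] "GET /contact.html HTTP/1.0" 200 800',
]

records = [r for r in map(parse_line, LOG) if r is not None]
print(sessionize(records))
# [['/index.html', '/products.html'], ['/index.html'], ['/contact.html']]
```

The third request from 10.0.0.1 opens a new session because it arrives more than 30 minutes after that host's previous request. Real data preparation usually also filters out image requests, robots and failed requests before this step.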

The first web analysis tools simply reported user activity as recorded by the servers. Using such tools, it was possible to determine information such as the number of accesses to the server, the times or time intervals of visits, and the domain names and URLs of users of the Web server. However, these tools generally provide little or no analysis of relationships among the accessed files and directories within the Web space. More sophisticated techniques for discovering and analysing patterns are now emerging. These tools fall into two main categories: Pattern Discovery Tools and Pattern Analysis Tools.
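The reporting offered by early tools amounts to simple counting. The sketch below assumes already-parsed records of the form (client host, timestamp, URL), with hypothetical hosts and pages, and produces the kinds of totals listed above. Like those tools, it says nothing about how the requested files relate to one another.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, already-parsed access records: (client host, timestamp, requested URL).
RECORDS = [
    ("client-a.example.com", datetime(2000, 10, 10, 13, 55), "/index.html"),
    ("client-a.example.com", datetime(2000, 10, 10, 13, 57), "/products.html"),
    ("client-b.example.org", datetime(2000, 10, 10, 14, 1), "/index.html"),
    ("client-c.example.com", datetime(2000, 10, 10, 14, 20), "/index.html"),
]

def activity_report(records):
    """Per-server totals of the kind early log-reporting tools produced."""
    return {
        "total_accesses": len(records),
        # Visits bucketed by hour of day.
        "hits_per_hour": Counter(ts.hour for _, ts, _ in records),
        # Client domain: the host name without its first label.
        "hits_per_domain": Counter(host.split(".", 1)[-1] for host, _, _ in records),
        "hits_per_url": Counter(url for _, _, url in records),
    }

report = activity_report(RECORDS)
print(report["total_accesses"], report["hits_per_url"].most_common(1))
# 4 [('/index.html', 3)]
```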

Another interesting application of Web usage mining is Web link recommendation. One of the latest trends is the online monitoring of page accesses to render personalized pages on the basis of similar visit patterns.
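One simple way to recommend links from similar visit patterns, used here as an illustrative assumption rather than the method of any particular system, is to count how often two pages are visited in the same session and suggest the pages most often co-visited with the current one. The sessions and function names below are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(sessions):
    """Count, for each ordered pair of distinct pages, the sessions containing both."""
    co = Counter()
    for session in sessions:
        pages = sorted(set(session))  # repeat visits within a session count once
        for a, b in combinations(pages, 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    return co

def recommend(current_page, co, k=2):
    """Return up to k pages most often co-visited with current_page."""
    scores = Counter({b: n for (a, b), n in co.items() if a == current_page})
    return [page for page, _ in scores.most_common(k)]

# Hypothetical sessions, e.g. produced by a sessionization step.
SESSIONS = [
    ["/index.html", "/products.html", "/cart.html"],
    ["/index.html", "/products.html"],
    ["/products.html", "/cart.html", "/checkout.html"],
    ["/index.html", "/contact.html"],
]

CO = build_cooccurrence(SESSIONS)
print(recommend("/contact.html", CO, k=1))  # ['/index.html']
```

Because the counts only ever grow, they can be updated as each new session arrives, which loosely matches the idea of monitoring page accesses online.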

Web content mining

Web content mining is the process of discovering useful information from text, image, audio or video data on the Web. Web content mining is sometimes called web text mining, because text content is the most widely researched area. The technologies normally used in web content mining are NLP (natural language processing) and IR (information retrieval).
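On the IR side, a classic starting point is to weight terms by TF-IDF and compare pages by cosine similarity. The sketch below assumes a deliberately naive tokenizer in place of real NLP preprocessing (no stemming or stop-word removal) and uses made-up page texts; `rank_pages` and the other names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case alphanumeric tokens; a naive stand-in for real NLP preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def rank_pages(query, pages):
    """Rank page indices by similarity to the query (the query joins the corpus for IDF)."""
    *page_vecs, query_vec = tfidf_vectors(pages + [query])
    scores = [cosine(query_vec, v) for v in page_vecs]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)

# Hypothetical page texts.
PAGES = [
    "Cheap flights and hotel deals for your next holiday",
    "Hotel reviews: compare hotel prices and book online",
    "Python tutorial: parsing HTML with the standard library",
]

print(rank_pages("hotel prices", PAGES))  # [1, 0, 2]
```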

Web structure mining

Web structure mining is the process of using graph theory to analyse the node and connection structure of a web site. According to the type of web structural data, web structure mining can be divided into two kinds.
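Treating pages as nodes and hyperlinks as directed edges turns a site into a graph that standard algorithms can analyse. As one well-known example, swapped in here for illustration, the sketch below computes PageRank by power iteration over a small hypothetical site. The damping factor of 0.85 and the fixed iteration count are conventional choices assumed here, not values taken from the text.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = set(links.get(page, ()))
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page with no out-links: spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "/index.html": ["/products.html", "/about.html"],
    "/products.html": ["/index.html", "/cart.html"],
    "/about.html": ["/index.html"],
    "/cart.html": ["/index.html"],
}

RANKS = pagerank(SITE)
print(max(RANKS, key=RANKS.get))  # /index.html
```

The home page ranks highest because every other page links back to it; the total rank stays 1 on every iteration.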

The first kind of web structure mining extracts patterns from hyperlinks on the Web. A hyperlink is a structural component that connects a web page to a different location. The other kind mines the document structure: it uses the tree-like structure of a page to analyse and describe the HTML (HyperText Markup Language) or XML (Extensible Markup Language) tags within it.
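For the document-structure kind, a page can be parsed into its tag tree and summarized by the root-to-element tag paths it contains; repeated paths often point to repeated records such as product listings. The sketch below uses Python's standard `html.parser` module. The `TagPathCollector` class, the void-element list it hard-codes and the sample page are assumptions for illustration, and real-world HTML may need a more forgiving parser.

```python
from collections import Counter
from html.parser import HTMLParser

# HTML void elements never get a closing tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class TagPathCollector(HTMLParser):
    """Counts the root-to-element tag path of every element in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = Counter()

    def handle_starttag(self, tag, attrs):
        self.paths["/".join(self.stack + [tag])] += 1
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating some unclosed elements.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

def tag_paths(html):
    parser = TagPathCollector()
    parser.feed(html)
    parser.close()
    return parser.paths

# Hypothetical page containing a list of similar records.
PAGE = """
<html><body>
  <ul>
    <li><a href="/p1">Product 1</a></li>
    <li><a href="/p2">Product 2</a></li>
    <li><a href="/p3">Product 3</a><br></li>
  </ul>
</body></html>
"""

for path, count in tag_paths(PAGE).most_common(2):
    print(count, path)
```

The two most frequent paths, `html/body/ul/li` and `html/body/ul/li/a`, each occur three times, exposing the repeated record structure of the list.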

Resources

Books

* Jesus Mena, "Data Mining Your Website", Digital Press, 1999
* Soumen Chakrabarti, "Mining the Web: Analysis of Hypertext and Semi Structured Data", Morgan Kaufmann, 2002
* Bing Liu, [http://www.cs.uic.edu/~liub/WebMiningBook.html "Web Data Mining: Exploring Hyperlinks, Contents and Usage Data"] , Springer, 2007
* Advances in Web Mining and Web Usage Analysis 2005 - revised papers from the 7th Workshop on Knowledge Discovery on the Web, Olfa Nasraoui, Osmar Zaiane, Myra Spiliopoulou, Bamshad Mobasher, Philip Yu, Brij Masand, Eds., Springer Lecture Notes in Artificial Intelligence, LNAI 4198, 2006
* Web Mining and Web Usage Analysis 2004 - revised papers from the 6th Workshop on Knowledge Discovery on the Web, Bamshad Mobasher, Olfa Nasraoui, Bing Liu, Brij Masand, Eds., Springer Lecture Notes in Artificial Intelligence, 2006
* Mike Thelwall, [http://linkanalysis.wlv.ac.uk/ "Link Analysis: An Information Science Approach"] , 2004, Academic Press

Software

* Web scraping software comparison

Bibliographic references

* Baraglia, R. and Silvestri, F. (2007) [http://soave.isti.cnr.it/%7Esilvestr/wp-content/uploads/2007/03/p63-baraglia.pdf "Dynamic personalization of web sites without user intervention"], Communications of the ACM, Vol. 50, No. 2, pp. 63-67
* Cooley, R., Mobasher, B. and Srivastava, J. (1997) "Web Mining: Information and Pattern Discovery on the World Wide Web", In Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence
* Cooley, R., Mobasher, B. and Srivastava, J. (1999) "Data Preparation for Mining World Wide Web Browsing Patterns", Knowledge and Information Systems, Vol. 1, No. 1, pp. 5-32
* Kohavi, R., Mason, L. and Zheng, Z. (2004) "Lessons and Challenges from Mining Retail E-commerce Data", Machine Learning, Vol. 57, pp. 83-113
* Clark, L., Ting, I.-H., Kimble, C., Wright, P. and Kudenko, D. (2006) [http://informationr.net/ir/11-2/paper249.html "Combining ethnographic and clickstream data to identify user Web browsing strategies"], Information Research, Vol. 11, No. 2, January 2006
* Eirinaki, M. and Vazirgiannis, M. (2003) "Web Mining for Web Personalization", ACM Transactions on Internet Technology, Vol. 3, No. 1, February 2003
* Mobasher, B., Cooley, R. and Srivastava, J. (2000) "Automatic Personalization Based on Web Usage Mining", Communications of the ACM, Vol. 43, No. 8, pp. 142-151
* Mobasher, B., Dai, H., Kuo, T. and Nakagawa, M. (2001) "Effective Personalization Based on Association Rule Discovery from Web Usage Data", In Proceedings of WIDM 2001, Atlanta, GA, USA, pp. 9-15
* Nasraoui, O. and Petenes, C. (2003) [http://webmining.spd.louisville.edu/Websites/PAPERS/conference/Nasraoui_WebKDD03_web_recomm.pdf "Combining Web Usage Mining and Fuzzy Inference for Website Personalization"], In Proceedings of WebKDD 2003: KDD Workshop on Web Mining as a Premise to Effective and Intelligent Web Applications, Washington, DC, August 2003, p. 37
* Nasraoui, O., Frigui, H., Joshi, A. and Krishnapuram, R. (1999) [http://webmining.spd.louisville.edu/Websites/PAPERS/conference/Nasraoui-IFSA-99-mining-web-access-logs.pdf "Mining Web Access Logs Using Relational Competitive Fuzzy Clustering"], In Proceedings of the Eighth International Fuzzy Systems Association Congress, Hsinchu, Taiwan, August 1999
* Nasraoui, O. (2005) [http://webmining.spd.louisville.edu/Websites/PAPERS/book_chapter/FINAL-Nasraoui-WWW-Personalization.htm "World Wide Web Personalization"], invited chapter in "Encyclopedia of Data Mining and Data Warehousing", J. Wang (Ed.), Idea Group
* Pierrakos, D., Paliouras, G., Papatheodorou, C. and Spyropoulos, C. D. (2003) "Web usage mining as a tool for personalization: a survey", User Modeling and User-Adapted Interaction, Vol. 13, No. 4, pp. 311-372
* Ting, I.-H., Kimble, C. and Kudenko, D. (2005) "A Pattern Restore Method for Restoring Missing Patterns in Server Side Clickstream Data"
* Ting, I.-H., Kimble, C. and Kudenko, D. (2006) [http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/wi/&toc=comp/proceedings/wi/2005/2415/00/2415toc.xml&DOI=10.1109/WI.2005.153 "UBB Mining: Finding Unexpected Browsing Behaviour in Clickstream Data to Improve a Web Site's Design"]

Related conferences

* [http://www-users.cs.york.ac.uk/~derrick/WMEE2008/ WMEE 2008]: The Second Workshop on Web Mining for E-commerce and E-Services 2008
* [http://www.cs.york.ac.uk/~derrick/WMEE2007 WMEE 2007]: Workshop on Web Mining for E-commerce and E-Services 2007
* [http://webmining.spd.louisville.edu/webkdd06/ WebKDD 2006]: SIGKDD Workshop on Web Mining and Web Usage Analysis
* [http://www.kde.cs.uni-kassel.de/ws/webmine2006/ WebMine 2006]: Workshop on Web Mining 2006
* [http://orestes.ii.uam.es/workshop/ WebConMine 2006]: Workshop on Web Content Mining 2006

External links

* [http://www.galeas.de/webmining.html Web Mining] by Patricio Galeas
* [http://webmining.spd.louisville.edu/Websites/tutorials/Chapter2-approaches-mining-web.pdf Tutorial on Web Mining] by Olfa Nasraoui, University of Louisville
* [http://webmining.spd.louisville.edu/Websites/PAPERS/book_chapter/FINAL-Nasraoui-WWW-Personalization.htm Tutorial on Web Personalization] by Olfa Nasraoui, University of Louisville
* [http://www.cs.uic.edu/~liub/ACL-07-tutorial-WCM-to-NLP.pdf Tutorial on Web Content Mining] by Bing Liu, University of Illinois at Chicago
* [http://www.thewebwatcher.com TheWebWatcher monitoring service]
* [http://www.wessex.ac.uk/news/maksym.html A fuzzy logic based Web Mining system for click stream analysis] by Maksym Rusynyk
* [http://jwebpro.sourceforge.net/ A Java-based Web Processing Toolkit]


Wikimedia Foundation. 2010.
