Structure mining

Structure mining, or structured data mining, is the process of finding and extracting useful information from semi-structured data sets. Graph mining is a special case of structured data mining[citation needed].

Description

The growing use of semi-structured data has created new opportunities for data mining, which has traditionally been concerned with tabular data sets, reflecting the strong association between data mining and relational databases. Much of the world's interesting and mineable data does not fit easily into relational databases, though a generation of software engineers has been trained to believe this was the only way to handle data, and data mining algorithms have generally been developed to cope only with tabular data.

XML, the most common way of representing semi-structured data, can represent both tabular data and arbitrary trees. Any particular representation of data to be exchanged between two applications in XML is normally described by a Schema, often written in XSD. Practical examples of such Schemata, for instance NewsML, are normally very sophisticated, containing multiple optional sub-trees used to represent special-case data. Frequently around 90% of a Schema is concerned with the definition of these optional data items and sub-trees.

Messages and data that are transmitted or encoded using XML and conform to the same Schema can therefore contain very different data, depending on what is being transmitted.

Such data presents serious problems for conventional data mining. Two messages that conform to the same Schema may have little data in common. If a training set built from such messages is formatted as tabular data for conventional data mining, large sections of the resulting tables may be empty.
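The sparsity problem can be illustrated with a minimal sketch. It assumes two hypothetical messages that conform to the same Schema, in which every child of <item> is optional; the element names and values are invented for illustration and are not taken from NewsML. Flattening the messages into one table leaves half of the cells empty.

```python
import xml.etree.ElementTree as ET

# Two hypothetical messages, assumed to conform to the same Schema,
# in which every child element of <item> is optional.
MESSAGES = [
    "<item><headline>Storm</headline><photo>a.jpg</photo></item>",
    "<item><byline>J. Doe</byline><video>b.mp4</video></item>",
]

def flatten(xml_text):
    """Map each direct child tag of the root element to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

rows = [flatten(message) for message in MESSAGES]

# The table needs one column for every tag seen in any message.
columns = sorted({tag for row in rows for tag in row})

# A missing optional element becomes None, i.e. an empty table cell.
table = [[row.get(column) for column in columns] for row in rows]

empty_cells = sum(cell is None for line in table for cell in line)
fraction_empty = empty_cells / (len(table) * len(columns))
print(columns, fraction_empty)
```

With more optional sub-trees in the Schema, the number of columns grows while each message still fills only a few of them, so the empty fraction tends to rise.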

Most data mining algorithms are designed on the tacit assumption that the data presented will be complete. Many algorithms, for instance those based on neural networks, perform badly with incomplete data sets.[citation needed]

XPath is the standard mechanism for referring to nodes and data items within XML. It resembles the path notation used to navigate directory hierarchies in operating system user interfaces. Mining both the data and the structure of XML documents of any form requires at least two extensions to conventional data mining: the ability to associate an XPath statement with any data pattern, and sub-statements with each data node in the pattern; and the ability to mine the presence and count of any node or set of nodes within the document.

As an example, if one were to represent a family tree in XML, these extensions would make it possible to create a data set containing all the individuals in the tree, data items such as name and age at death, and counts of related nodes, such as the number of children. More sophisticated searches could extract data such as grandparents' lifespans.
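The family tree example can be sketched as follows. The XML layout, the element name person, and the attributes name and age_at_death are all assumptions made for illustration. The sketch uses Python's xml.etree.ElementTree, which supports only a limited subset of XPath, so the grandparent lookup is done with an explicit parent map rather than an XPath ancestor axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical family tree; element and attribute names are assumptions.
FAMILY = """
<person name="Ada" age_at_death="80">
  <person name="Ben" age_at_death="72">
    <person name="Cy" age_at_death="65"/>
    <person name="Di" age_at_death="70"/>
  </person>
  <person name="Eve" age_at_death="90"/>
</person>
"""

def mine(root):
    """Build one record per individual: data items plus structural counts."""
    # ElementTree elements do not know their parent, so build a lookup.
    parent_of = {child: parent
                 for parent in root.iter("person") for child in parent}
    records = []
    for person in root.iter("person"):
        parent = parent_of.get(person)
        grandparent = parent_of.get(parent) if parent is not None else None
        records.append({
            "name": person.get("name"),
            "age_at_death": int(person.get("age_at_death")),
            # Path "person" selects direct children only.
            "children": len(person.findall("person")),
            # Path ".//person" selects descendants at any depth.
            "descendants": len(person.findall(".//person")),
            "grandparent_age": (int(grandparent.get("age_at_death"))
                                if grandparent is not None else None),
        })
    return records

records = mine(ET.fromstring(FAMILY))
for record in records:
    print(record)
```

Each record mixes ordinary data items (name, age at death) with counts derived purely from document structure, and the grandparent field is empty for anyone fewer than two levels below the root.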

The addition of these data types related to the structure of a document or message facilitates structure mining.

The other requirement is that the mining algorithms employed, whether supervised or unsupervised, must be able to handle sparse data. In practice, the data mining algorithms that handle sparse data best are those that process the training set into trees of related patterns. These are frequently descendants of, or take their inspiration from, Ross Quinlan's ID3 algorithm.[citation needed]
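As a rough illustration of how a tree-building algorithm can work with sparse records, the sketch below computes ID3-style information gain while treating the absence of an attribute as a value of its own. That convention is an assumption chosen for simplicity, not necessarily what ID3 or any particular descendant does, and the records and attribute names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """ID3-style gain; a missing attribute is grouped under the value None."""
    groups = {}
    for row in rows:
        groups.setdefault(row.get(attribute), []).append(row[target])
    remainder = sum(len(group) / len(rows) * entropy(group)
                    for group in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical sparse records: most optional fields are simply absent.
ROWS = [
    {"photo": "yes", "kind": "news"},
    {"photo": "yes", "kind": "news"},
    {"video": "yes", "kind": "sport"},
    {"video": "yes", "byline": "yes", "kind": "sport"},
]

gains = {name: information_gain(ROWS, name, "kind")
         for name in ("photo", "video", "byline")}
print(gains)
```

Because absence is just another branch, no row has to be discarded or padded before a split is chosen; here the presence of photo or video separates the two classes completely, while byline is much less informative.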


Wikimedia Foundation. 2010.
