Knowledge Grid
The Knowledge Grid is a software system based on a set of services for knowledge discovery on the Grid. Its main goal is to enable the collaboration of scientists and professionals who must mine data stored in different research centers, as well as executive managers and analysts who must use a knowledge management system operating on several data warehouses located in different company establishments. [The Knowledge Grid, by Mario Cannataro and Domenico Talia, Communications of the ACM, January 2003, Vol. 46, No. 1, pp. 89-93] The Knowledge Grid uses the basic Grid services and defines a set of additional layers to implement the services of distributed knowledge discovery on globally connected computers, where each node can be a sequential or a parallel machine.
The Knowledge Grid framework supports data mining on the Grid by providing mechanisms and higher-level services for
• searching resources,
• representing, creating, and managing knowledge discovery processes, and
• composing existing data services and data mining services as structured, compound services,
to allow users to design, store, document, verify, share, and re-execute their applications, as well as manage their output results.
The Knowledge Grid services are organized in two hierarchical levels: the Core K-Grid layer and the High-level K-Grid layer. The High-level K-Grid layer includes services to compose, validate, and execute a distributed knowledge discovery computation. The main services of the High-level K-Grid layer are:
• The Data Access Service (DAS), responsible for the publication and searching of data to be mined (data sources), as well as the search of inferred models (mining results).
• The Tools and Algorithms Access Service (TAAS), responsible for publishing and searching extraction tools, data mining tools, and visualization tools.
• The Execution Plan Management Service (EPMS), responsible for defining execution plans. An execution plan is represented by a graph describing interactions and data flows between data sources, extraction tools, data mining tools, and visualization tools. The EPMS lets users define the structure of an application by building the corresponding execution graph and adding a set of constraints on resources. The execution plan generated by this service is referred to as an abstract execution plan, because it may include both well-identified resources and abstract resources, i.e., resources that are defined through constraints on their features but are not known a priori.
• The Results Presentation Service (RPS) offers facilities for presenting and visualizing the extracted knowledge models (e.g., association rules, clustering models, and classifications).
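As a rough illustration of the abstract execution plan described for the EPMS, the following Python sketch models a plan as a directed graph whose nodes are data sources and tools and whose edges are data flows. A node with no concrete resource is treated as an abstract resource defined only by constraints. The class names (PlanNode, ExecutionPlan), the node names, the resource strings, and the constraint keys are all hypothetical and are not part of the Knowledge Grid itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PlanNode:
    """One step of the plan: a data source or a tool (hypothetical model)."""
    name: str
    kind: str  # e.g. "data_source", "extraction_tool", "mining_tool"
    resource: Optional[str] = None  # concrete resource, if known a priori
    constraints: Dict[str, object] = field(default_factory=dict)

    @property
    def is_abstract(self) -> bool:
        # An abstract resource is described by constraints only.
        return self.resource is None


@dataclass
class ExecutionPlan:
    """Directed graph of plan nodes connected by data flows."""
    nodes: Dict[str, PlanNode] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_node(self, node: PlanNode) -> None:
        self.nodes[node.name] = node

    def add_flow(self, src: str, dst: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the flow")
        self.edges.append((src, dst))

    def abstract_nodes(self) -> List[str]:
        return sorted(n.name for n in self.nodes.values() if n.is_abstract)

    def execution_order(self) -> List[str]:
        """Order nodes so every data flow goes forward (Kahn's algorithm)."""
        indegree = {name: 0 for name in self.nodes}
        for _, dst in self.edges:
            indegree[dst] += 1
        ready = sorted(name for name, deg in indegree.items() if deg == 0)
        order: List[str] = []
        while ready:
            current = ready.pop(0)
            order.append(current)
            for src, dst in self.edges:
                if src == current:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        if len(order) != len(self.nodes):
            raise ValueError("execution plan contains a cycle")
        return order


def build_example_plan() -> ExecutionPlan:
    """Build a small plan mixing concrete and abstract resources."""
    plan = ExecutionPlan()
    plan.add_node(PlanNode("customers", "data_source", resource="host-a:/data/customers"))
    plan.add_node(PlanNode("extractor", "extraction_tool", resource="host-a:extract"))
    plan.add_node(PlanNode("clusterer", "mining_tool", constraints={"min_memory_gb": 8}))
    plan.add_node(PlanNode("viewer", "visualization_tool", constraints={"os": "linux"}))
    plan.add_flow("customers", "extractor")
    plan.add_flow("extractor", "clusterer")
    plan.add_flow("clusterer", "viewer")
    return plan
```

In this sketch, the clusterer and viewer nodes remain abstract until some later step binds them to concrete resources, which is the role the text assigns to the Core K-Grid layer.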
The Core K-Grid layer offers basic services for the management of metadata describing features of hosts, data sources, data mining tools, and visualization tools. This layer also coordinates the application execution by attempting to fulfill the application requirements with respect to available Grid resources. The Core K-Grid layer comprises two main services:
• The Knowledge Directory Service (KDS), responsible for handling metadata describing Knowledge Grid resources. Such resources include hosts, data repositories, tools and algorithms used to extract, analyze, and manipulate data, distributed knowledge discovery execution plans, and knowledge models obtained as a result of mining processes. The metadata information is represented by XML documents stored in a Knowledge Metadata Repository (KMR).
• The Resource Allocation and Execution Management Service (RAEMS), used to find a suitable mapping between an abstract execution plan and available resources, with the goal of satisfying the constraints (e.g., CPU, storage, memory, database, and network bandwidth requirements) imposed by the execution plan. The output of this process is an instantiated execution plan, which defines the resource requests for each data mining process. Generated execution plans are stored in the Knowledge Execution Plan Repository (KEPR). After the execution plan activation, this service manages the application execution and the storing of results in the Knowledge Base Repository (KBR).
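The mapping from an abstract execution plan to an instantiated one can be sketched in a few lines of Python. The example below reads host metadata from an XML document, as the KDS description suggests, and assigns each plan step to the first host that meets its minimum requirements. The XML layout, the feature names (cpus, memory_gb), the function names, and the first-fit strategy are assumptions made for illustration only; the source does not specify the actual metadata schema or the allocation policy used by the RAEMS.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical host metadata, standing in for documents stored in the KMR.
KMR_XML = """
<hosts>
  <host name="node1"><cpus>4</cpus><memory_gb>8</memory_gb></host>
  <host name="node2"><cpus>16</cpus><memory_gb>64</memory_gb></host>
</hosts>
"""


def load_hosts(xml_text: str) -> List[dict]:
    """Parse host metadata into dictionaries of integer features."""
    root = ET.fromstring(xml_text.strip())
    hosts = []
    for element in root.findall("host"):
        features = {child.tag: int(child.text) for child in element}
        hosts.append({"name": element.get("name"), "features": features})
    return hosts


def satisfies(host: dict, constraints: Dict[str, int]) -> bool:
    """True if the host meets every minimum requirement in the constraints."""
    return all(
        host["features"].get(key, 0) >= minimum
        for key, minimum in constraints.items()
    )


def instantiate_plan(abstract_plan: Dict[str, Dict[str, int]],
                     hosts: List[dict]) -> Dict[str, str]:
    """Map each step to the first host that satisfies it (first-fit)."""
    instantiated = {}
    for step, constraints in abstract_plan.items():
        match = next((h["name"] for h in hosts if satisfies(h, constraints)), None)
        if match is None:
            raise LookupError(f"no host satisfies constraints for step {step!r}")
        instantiated[step] = match
    return instantiated


def example_mapping() -> Dict[str, str]:
    """Instantiate a two-step abstract plan against the sample metadata."""
    abstract_plan = {
        "preprocess": {"cpus": 2},
        "mine": {"cpus": 8, "memory_gb": 32},
    }
    return instantiate_plan(abstract_plan, load_hosts(KMR_XML))
```

This first-fit matching ignores contention between steps and does not model network bandwidth or database requirements; a real allocator would also need to account for resources already claimed by other steps.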
Currently, the Knowledge Grid mechanisms are being designed and implemented following the Service Oriented Architecture (SOA) model. In particular, the so-called Open Grid Services Architecture (OGSA) paradigm and the emerging Web Services Resource Framework (WSRF) family of standards are being adopted for re-implementing the Knowledge Grid services. These services will permit the design and orchestration of distributed data mining applications running on large-scale, OGSA-based Grids.
Notes
This paper is a duplicate publication!