Metadata discovery

In metadata management, metadata discovery is the process of using automated tools to discover the semantics of the data elements in a data set. The process usually ends with a set of mappings between the data source elements and a centralized metadata registry.

Metadata discovery is also known as metadata scanning.

Contents

1 Data source formats for metadata discovery

2 A taxonomy of metadata matching algorithms

2.1 Lexical Matching

2.2 Semantic Matching

2.3 Statistical Matching

3 Vendors

4 Research

5 See also

6 References

Data source formats for metadata discovery

Data sets may be in a variety of different forms including:

Relational databases

Spreadsheets

XML files

Web services

Software source code written in languages such as Fortran, JOVIAL, COBOL, assembler, RPG, PL/I, Easytrieve, Java, C# or C++, among many others

Unstructured text documents such as Microsoft Word or PDF files
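Before any matching can happen, a discovery tool has to harvest the element names from each source. The following is a minimal sketch of that step for two of the formats listed above, assuming a SQLite database stands in for a relational source. The function names, the sample table and the sample XML are illustrative assumptions, not part of any particular product.

```python
import sqlite3
import xml.etree.ElementTree as ET


def relational_element_names(conn):
    """Return 'table.column' names for every table in a SQLite database."""
    names = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            names.append(f"{table}.{row[1]}")
    return names


def xml_element_names(xml_text):
    """Return the sorted set of element tag names used in an XML document."""
    root = ET.fromstring(xml_text)
    return sorted({elem.tag for elem in root.iter()})


# Hypothetical sample sources, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (PersonBirthDate TEXT, PersonGenderCode TEXT)")
db_names = relational_element_names(conn)
xml_names = xml_element_names(
    "<Person><PersonBirthDate>1970-01-01</PersonBirthDate></Person>"
)
print(db_names)
print(xml_names)
```

The harvested names are the input to the matching algorithms described in the next section.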

A taxonomy of metadata matching algorithms

There are three distinct categories of automated metadata discovery: lexical, semantic and statistical matching.

Lexical Matching

Exact match - data element linkages are made based on the exact name of a column in a database, the name of an XML element or a label on a screen. For example, if a database column has the name "PersonBirthDate" and a data element in a metadata registry also has the name "PersonBirthDate", automated tools can infer that the database column has the same semantics (meaning) as the data element in the metadata registry.

Synonym match - the discovery tool is given not just a single name but a set of synonyms for each data element.

Pattern match - the tool is given a set of lexical patterns that it can match. For example, the tool may search for "*gender*" or "*sex*".
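The three lexical strategies above can be sketched as a single lookup that tries them from strictest to loosest. This is only an illustration: the registry contents, the synonym sets and the function name lexical_match are hypothetical, and the wildcard patterns use Python's fnmatch syntax as a stand-in for whatever pattern language a real tool supports.

```python
from fnmatch import fnmatchcase

# Hypothetical registry contents, used only for illustration.
REGISTRY = {
    "PersonBirthDate": {
        "synonyms": {"BirthDate", "DateOfBirth", "DOB"},
        "patterns": ["*birth*date*"],
    },
    "PersonGenderCode": {
        "synonyms": {"Gender", "Sex"},
        "patterns": ["*gender*", "*sex*"],
    },
}


def lexical_match(source_name):
    """Return (registry element, match kind), or None if nothing matches."""
    # 1. Exact match: the source name equals a registered element name.
    if source_name in REGISTRY:
        return source_name, "exact"
    lowered = source_name.lower()
    # 2. Synonym match: case-insensitive lookup in each element's synonym set.
    for element, rules in REGISTRY.items():
        if lowered in {s.lower() for s in rules["synonyms"]}:
            return element, "synonym"
    # 3. Pattern match: wildcard patterns applied to the lower-cased name.
    for element, rules in REGISTRY.items():
        if any(fnmatchcase(lowered, p) for p in rules["patterns"]):
            return element, "pattern"
    return None


print(lexical_match("PersonBirthDate"))
print(lexical_match("dob"))
print(lexical_match("CUST_GENDER_CD"))
print(lexical_match("OrderTotal"))
```

Pattern matching is the loosest of the three and can over-match: in this sketch a column named "CountyEssex" would also satisfy "*sex*", so pattern hits are usually treated as candidates for review rather than as confirmed mappings.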

Semantic Matching

Semantic matching uses the meaning of names, rather than their spelling, to associate target data with registered data elements.

Semantic similarity - this algorithm relies on a database of the conceptual nearness of words. For example, the WordNet system can rank how close words are conceptually to each other: the terms "Person", "Individual" and "Human" may be ranked as highly similar concepts.
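To show the idea without depending on an external lexical database, the sketch below scores conceptual nearness over a tiny hand-built "is-a" tree. The tree is an invented stand-in, not real WordNet data, and the scoring rule (one divided by one plus the number of edges between the two terms) is one common path-based measure chosen here as an assumption; a real tool would query WordNet or a similar resource.

```python
# Toy hypernym ("is-a") tree standing in for a lexical database such as WordNet.
# Every entry below is an illustrative assumption, not real WordNet data.
HYPERNYM = {
    "person": "organism",
    "individual": "person",
    "human": "person",
    "organism": "entity",
    "date": "time_period",
    "time_period": "entity",
}


def ancestors(term):
    """Return the chain [term, parent, grandparent, ...] up to the root."""
    chain = [term]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def path_similarity(a, b):
    """Score 1 / (1 + edges on the shortest path between a and b in the tree)."""
    chain_a, chain_b = ancestors(a.lower()), ancestors(b.lower())
    depth_b = {node: i for i, node in enumerate(chain_b)}
    # Walking upward from a, the first node also above b is their lowest
    # common ancestor, so i + depth_b[node] is the shortest path length.
    for i, node in enumerate(chain_a):
        if node in depth_b:
            return 1.0 / (1 + i + depth_b[node])
    return 0.0  # the terms share no ancestor (or are unknown)


print(path_similarity("Person", "Individual"))
print(path_similarity("Person", "Date"))
```

In this toy tree "Person" and "Individual" score higher than "Person" and "Date", so a tool could propose the first pair as a candidate mapping and discard the second.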

Statistical Matching

Statistical matching uses statistics about the data in the source itself to derive similarities with registered data elements.

Distinct value analysis - by analyzing all the distinct values in a column, its similarity to a registered data element can be estimated. For example, if a column has only the two distinct values 'male' and 'female', it could be mapped to 'PersonGenderCode'.

Data distribution analysis - by analyzing the distribution of values within a single column and comparing this distribution with those of known data elements, a semantic linkage can be inferred.
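Both statistical techniques can be sketched in a few lines. The example below assumes that each registered data element carries a set of permitted values, and it measures overlap with the Jaccard index and distribution difference with the total variation distance. These two measures, the registry contents and the function names are assumptions chosen for illustration; the article does not prescribe specific statistics.

```python
from collections import Counter

# Hypothetical value sets for registered data elements (illustrative only).
REGISTERED_VALUES = {
    "PersonGenderCode": {"male", "female"},
    "YesNoIndicator": {"yes", "no"},
}


def distinct_value_match(column):
    """Return (best element, Jaccard score) for a column's distinct values."""
    observed = {str(v).strip().lower() for v in column}
    best, best_score = None, 0.0
    for element, allowed in REGISTERED_VALUES.items():
        # Jaccard index: size of the intersection over size of the union.
        score = len(observed & allowed) / len(observed | allowed)
        if score > best_score:
            best, best_score = element, score
    return best, best_score


def relative_frequencies(column):
    """Map each value to its share of the column's rows."""
    counts = Counter(column)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def total_variation_distance(p, q):
    """0.0 means identical distributions, 1.0 means no overlap at all."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


column = ["Male", "female", "FEMALE", " male "]
print(distinct_value_match(column))

known = {"a": 0.5, "b": 0.5}  # hypothetical reference distribution
print(total_variation_distance(relative_frequencies(["a", "a", "a", "b"]), known))
```

A low distance or a high overlap score only suggests a mapping. Many unrelated columns share small value sets such as yes/no flags, so statistical hits are typically combined with lexical or semantic evidence.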

Vendors

The following vendors (listed in alphabetical order) provide metadata discovery and metadata mapping software and solutions:

Esquire Innovations (see [7])

IBM

InfoLibrarian Corporation (see [1])

Masai Technologies (see [2])

Revelytix (see [3])

Silver Creek Systems (see [4])

Sypherlink: Harvester (see [5])

Unicorn Systems (see [6])

Research

INDUS project at the Iowa State University (see [7])

Mercury - a distributed metadata management and data discovery system developed at the Oak Ridge National Laboratory DAAC (see [8]) [1]

See also

metadata

data mapping

data warehouse

semantic web

Defense Discovery Metadata Specification

References

[1] Devarakonda, R., Palanisamy, G., Wilson, B., and Green, J., "Mercury: reusable metadata management, data discovery and access system", Earth Science Informatics (Springer Berlin / Heidelberg) 3 (1): 87–94, doi:10.1007/s12145-010-0050-7

Massive Data Analysis Systems by San Diego Supercomputer Center June 1997

IBM Whitepaper on Enterprise Metadata Discovery

White Paper on Metadata Management - by Esquire Innovations

