Data classification (data management)
In the field of data management, data classification, as part of the Information Lifecycle Management (ILM) process, can be defined as a tool for categorizing data so that an organization can effectively answer the following questions:

  • What data types are available?
  • Where are certain data located?
  • What access levels are implemented?
  • What protection level is implemented and does it adhere to compliance regulations?

When implemented, it provides a bridge between IT professionals and process or application owners. IT staff are informed about the value of the data, while management (usually the application owners) gains a better understanding of which segments of the data centre need investment to keep operations running effectively. This can be of particular importance in risk management, legal discovery, and compliance with government regulations. Data classification is typically a manual process; however, many tools from different vendors can help gather information about the data.


How to start the data classification process

The first step is to evaluate and divide the various applications and data as follows:

  • Structured data (statistically around 15% of data)
    • Generally describes proprietary data that is accessible only through an application or application programming interface (API).
    • Applications that produce structured data are usually database applications.
    • This type of data usually requires complex procedures for data evaluation and for migration between storage tiers.
    • To ensure adequate quality standards, the classification process has to be monitored by subject matter experts.
  • Unstructured data (all other data that cannot be categorized as structured; around 85%)
    • Generally describes data files that have no physical interconnectivity (e.g. documents, pictures, multimedia files).
    • Classification is a relatively simple process of assigning criteria to the data.
    • Migration between assigned segments of predefined storage tiers is a simple process.
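The split above can be sketched in code. The following is a minimal illustration, not a production inventory tool: it buckets a file listing into structured and unstructured data by file extension, and the extension set is an assumption for the example, not an exhaustive standard.

```python
# Illustrative sketch: divide a file inventory into structured and
# unstructured data by extension. The extension set below is an
# assumed example, not a complete list of database file types.

STRUCTURED_EXTS = {".db", ".mdb", ".sqlite", ".dbf"}  # database files

def categorize(paths):
    """Return a dict mapping 'structured'/'unstructured' to path lists."""
    result = {"structured": [], "unstructured": []}
    for path in paths:
        ext = path[path.rfind("."):].lower() if "." in path else ""
        bucket = "structured" if ext in STRUCTURED_EXTS else "unstructured"
        result[bucket].append(path)
    return result

inventory = ["orders.sqlite", "logo.png", "report.docx", "crm.db"]
print(categorize(inventory))
```

In practice the structured side cannot be identified from file names alone; as the list above notes, it usually requires working through the owning application or its APIs, with subject matter experts validating the results.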

Basic criteria for unstructured data classification

  • Time criteria are the simplest and most commonly used: data are evaluated by time of creation, time of last access, time of last update, and so on.
  • Metadata criteria, such as type, name, owner, and location, can be used to create a more advanced classification policy.
  • Content criteria, which involve advanced content-classification algorithms, are the most advanced form of unstructured data classification.
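A simple policy combining the first two criteria can be sketched as follows. The tier names, age thresholds, and "archive type" extensions are assumptions made for the example; a real policy would be defined by the organization's ILM rules.

```python
from datetime import datetime, timedelta

# Hedged sketch: a rule-based policy combining the time criterion
# (last access) with a metadata criterion (file type). Tier names,
# thresholds, and the extension list are illustrative assumptions.

ARCHIVE_TYPES = {".log", ".bak"}  # assumed low-value file types

def assign_tier(last_access, ext, now):
    """Return a storage-tier name for one unstructured file."""
    if ext.lower() in ARCHIVE_TYPES:
        return "archive"                 # metadata criterion takes priority
    age = now - last_access
    if age < timedelta(days=30):
        return "online"                  # time criterion: recently used data
    if age < timedelta(days=365):
        return "nearline"
    return "archive"

now = datetime(2024, 1, 1)
print(assign_tier(now - timedelta(days=5), ".docx", now))
```

Content criteria would replace these fixed rules with algorithms that inspect the data itself, which is why they are the most advanced (and most expensive) option.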

Basic criteria for structured data classification

These criteria are usually driven by application requirements, such as:

  • Disaster recovery and Business Continuity rules
  • Data centre resources optimization and consolidation
  • Hardware performance limitations and possible improvements by reorganization
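For example, a disaster-recovery requirement such as an application's Recovery Time Objective (RTO) can drive tier placement directly. The sketch below is illustrative only: the application names, RTO values, and tier labels are assumptions, not part of any standard.

```python
# Hedged sketch: place structured data by application requirement.
# Each application declares a required RTO in hours; the policy maps
# it to a storage tier. All names and thresholds are assumed examples.

def tier_for_rto(rto_hours):
    """Map a Recovery Time Objective (hours) to an assumed tier label."""
    if rto_hours <= 1:
        return "tier1-replicated"    # near-instant recovery needed
    if rto_hours <= 24:
        return "tier2-daily-backup"
    return "tier3-tape"

apps = {"billing": 0.5, "reporting": 12, "archive-search": 72}
placement = {app: tier_for_rto(rto) for app, rto in apps.items()}
print(placement)
```

This is why structured data classification starts from the application rather than the files: the requirement belongs to the business process, and the data inherits it.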

Benefits of data classification

Effective implementation of appropriate data classification can significantly improve the ILM process and save data centre storage resources. If implemented systematically, it can generate improvements in data centre performance and utilization. Data classification can also reduce costs and administrative overhead. Even a "good enough" data classification can produce these results:

  • Data compliance and easier risk management: data are located where expected, on the predefined storage tier and at the expected "point in time".
  • Simplified data encryption, because not all data need to be encrypted. This saves valuable processor cycles and avoids the associated overhead.
  • Data indexing to improve user access times.
  • Redefined data protection with an improved Recovery Time Objective (RTO).
