Grid FileSystem

A Grid File System is a computer file system whose goal is improved reliability and availability by taking advantage of many smaller file storage areas. ["Towards a complete grid filesystem functionality", Future Generation Computer Systems, ScienceDirect, http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V06-4KKFP1T-2&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_version=1&_urlVersion=0&_userid=10&md5=b250a0c9bd05f08b4dfb63ee02bb304c, accessed 2008-08-18]

Components

Current file systems contain up to three components:
* File Table (FAT, MFT, etc.)
* File Data
* MetaData (user permissions, etc.)

A Grid File System would have similar needs:
* File Table (or search index)
* File Data
* MetaData

Comparisons

Current file systems are designed to present a single disk that a single computer manages in its entirety. A grid raises new challenges, because any single disk within the grid should be capable of handling requests for any data contained in the grid.

Features

Most file storage uses layers of redundancy to achieve a high level of data protection (that is, resistance to data loss). Current means of redundancy include replication and parity checks. Such redundancy can be implemented via a RAID array, in which multiple physical disks appear to a local computer as a single disk, and which may include data replication and/or disk partitioning.

Similarly, a Grid File System would provide some level of redundancy (either at the logical file level or at the block level, possibly including some sort of parity check) across the various disks present in the "Grid".
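The parity idea mentioned above can be illustrated with a minimal sketch. It assumes simple byte-wise XOR parity over equally sized blocks, as used in some RAID levels; the function names and sample blocks are hypothetical and not taken from any particular grid file system.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover one lost block from the surviving blocks plus the parity block."""
    return xor_parity(surviving + [parity])


# Three data blocks, imagined as living on three different disks in the grid.
data_blocks = [b"grid", b"file", b"syst"]
parity = xor_parity(data_blocks)

# Simulate losing the disk that held the second block.
recovered = rebuild_missing([data_blocks[0], data_blocks[2]], parity)
```

XOR parity of this kind tolerates the loss of only one block per parity group; surviving several simultaneous losses would need replication or a stronger erasure code.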

Framework

First and foremost, a File Table mechanism is necessary. The file table must include a mechanism for locating the target file within the grid.

Secondly, a mechanism for working with File Data must exist. This mechanism is responsible for making File Data available to requests.
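The two mechanisms described above can be sketched as follows. This is only an illustration under simple assumptions: the names `FileTable`, `DataNode` and `read_file` are hypothetical, everything lives in memory, and a node counts as offline when it is absent from the dictionary of reachable nodes.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """One file table row: where a file lives, plus its metadata."""
    path: str
    nodes: list[str]
    metadata: dict[str, str] = field(default_factory=dict)


class FileTable:
    """Hypothetical file table mapping a path to the nodes that hold it."""

    def __init__(self) -> None:
        self._entries: dict[str, FileEntry] = {}

    def register(self, path: str, nodes: list[str], **metadata: str) -> None:
        self._entries[path] = FileEntry(path, list(nodes), dict(metadata))

    def locate(self, path: str) -> list[str]:
        entry = self._entries.get(path)
        if entry is None:
            raise FileNotFoundError(path)
        return list(entry.nodes)


class DataNode:
    """Hypothetical File Data component: stores file contents by path."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blobs: dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def get(self, path: str) -> bytes:
        return self._blobs[path]


def read_file(table: FileTable, online: dict[str, DataNode], path: str) -> bytes:
    """Ask the file table where the file is, then try each reachable holder."""
    for name in table.locate(path):
        node = online.get(name)
        if node is None:
            continue  # this holder is offline, try the next one
        try:
            return node.get(path)
        except KeyError:
            continue  # the table listed this node, but it lacks the data
    raise FileNotFoundError(path)


node_a, node_b = DataNode("node-a"), DataNode("node-b")
node_a.put("/docs/readme.txt", b"hello grid")
node_b.put("/docs/readme.txt", b"hello grid")

table = FileTable()
table.register("/docs/readme.txt", ["node-a", "node-b"], owner="alice")

# node-a is treated as offline here, so the read is served by node-b.
content = read_file(table, {"node-b": node_b}, "/docs/readme.txt")
```

Keeping lookup (`FileTable`) separate from storage (`DataNode`) mirrors the split between the File Table and File Data components described in the text.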

Implementation

A parallel can be drawn between Torrent technology and a Grid File System: a torrent tracker (and search engine) would be the "File Table", and the torrent applications (transmitting the files) would be the "File Data" component.

An RSS-feed-like mechanism could be used by File Table nodes to indicate when new files are added to the table, triggering replication and other similar processes.
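A feed-like notification mechanism of this kind might look roughly like the sketch below. It assumes a polling model in which a File Table node keeps an append-only list of numbered entries and each replica remembers the last entry it has seen; the class names are hypothetical, and appending a path to a list stands in for the actual transfer of file data.

```python
class FileTableFeed:
    """Hypothetical append-only feed published by a File Table node."""

    def __init__(self) -> None:
        self._items: list[tuple[int, str]] = []

    def publish(self, path: str) -> int:
        """Announce a newly added file and return its sequence number."""
        seq = len(self._items) + 1
        self._items.append((seq, path))
        return seq

    def since(self, last_seen: int) -> list[tuple[int, str]]:
        """Return every entry newer than the given sequence number."""
        return [item for item in self._items if item[0] > last_seen]


class ReplicaNode:
    """Hypothetical grid member that replicates files announced on the feed."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.last_seen = 0
        self.replicated: list[str] = []

    def poll(self, feed: FileTableFeed) -> int:
        """Replicate everything announced since the last poll."""
        new_items = feed.since(self.last_seen)
        for seq, path in new_items:
            self.replicated.append(path)  # stand-in for fetching the file data
            self.last_seen = seq
        return len(new_items)


feed = FileTableFeed()
replica = ReplicaNode("node-c")

feed.publish("/videos/a.avi")
feed.publish("/videos/b.avi")
first_poll = replica.poll(feed)   # picks up both announcements
second_poll = replica.poll(feed)  # nothing new since the first poll
```

Because each replica tracks its own position in the feed, a node that was offline for a while can catch up simply by polling again.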

A file system which incorporates Torrent technology (distributed replication, distributed data request/fulfillment) would likely be a good starting point for a Grid File System.

If both such systems (file table and file data) could be addressed as a single entity (i.e., using virtual nodes in a cluster), then growth of such a system could be controlled simply by deciding which roles each grid member would be responsible for (File Table and file lookups, and/or File Data).

Availability

Assuming there exists some method of managing data replication (assigning quotas, etc.) autonomously within the grid, data could be configured for high availability, remaining accessible despite the loss or outage of individual grid members.
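As a rough illustration of quota-aware replication, the sketch below places a fixed number of copies on the members with the most free quota and then checks whether a file is still reachable after some members fail. The placement policy, the function names and the quota figures are all assumptions made for the example.

```python
def place_replicas(file_size: int, free_quota: dict[str, int], copies: int) -> list[str]:
    """Pick `copies` distinct nodes with enough free quota, most free space first."""
    candidates = sorted(
        (name for name, free in free_quota.items() if free >= file_size),
        key=lambda name: (-free_quota[name], name),
    )
    if len(candidates) < copies:
        raise RuntimeError("not enough nodes with free quota")
    chosen = candidates[:copies]
    for name in chosen:
        free_quota[name] -= file_size  # charge the file against each node's quota
    return chosen


def is_available(replicas: list[str], failed: set[str]) -> bool:
    """A file stays available while at least one replica holder is still up."""
    return any(name not in failed for name in replicas)


# Hypothetical free quota per grid member, in megabytes.
quota = {"node-a": 100, "node-b": 50, "node-c": 80}
holders = place_replicas(40, quota, copies=2)

one_outage = is_available(holders, failed={"node-a"})
both_down = is_available(holders, failed={"node-a", "node-c"})
```

With two copies the file survives any single outage but not the loss of both holders; raising `copies` trades storage for availability.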

Troubles

The largest problem currently revolves around distributing data updates. Torrents support minimal hierarchy (currently implemented either as metadata in the torrent tracker, or strictly as UI and basic categorization). Updating multiple nodes concurrently (assuming atomic transactions are required) introduces latency during updates and additions, usually to the point of not being feasible.

Additionally, a grid (network-based) file system breaks traditional TCP/IP paradigms, in that a file system (generally low-level, ring 0 type operations) requires complicated TCP/IP implementations, introducing layers of abstraction and complication to the process of creating such a grid file system.

Examples

Current examples of highly available data include:
* Network Load Balancing / CARP - splitting incoming requests among multiple computers, usually configured identically or as one whole
* Shared Storage Clustering / SANs - a single disk (one or more physical disks acting as a single logical disk) is presented to multiple computers which split incoming requests. This is usually used when more computing power is required than disk access.
* Data Replication / Mirroring - multiple computers may attempt to synchronize data (usually point-in-time or snapshot based). Used more often for either reporting (based on the last snapshot) or backup purposes.
* Data Partitioning - splitting data among multiple computers. In databases, data is often partitioned based on tables (certain tables exist on certain computers, or a table is split among multiple computers at certain "break points"). General files tend to be partitioned either by category (category-based folders) or by location (geographically separated).
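The "break points" approach to data partitioning can be sketched with a sorted list of boundaries and a binary search. The break points, node names and key scheme below are hypothetical; the only library call used is `bisect.bisect_right` from the Python standard library.

```python
import bisect

# Hypothetical break points: keys below "h" go to the first node,
# keys from "h" up to (but not including) "q" to the second, the rest to the third.
BREAK_POINTS = ["h", "q"]
NODES = ["node-1", "node-2", "node-3"]


def node_for_key(key: str) -> str:
    """Return the node responsible for a key under range partitioning."""
    index = bisect.bisect_right(BREAK_POINTS, key.lower())
    return NODES[index]


placement = {name: node_for_key(name) for name in ["alpha", "hadoop", "Zeta"]}
```

Range partitioning keeps neighbouring keys on the same computer, which suits range scans, but the break points may need adjusting when the data is unevenly distributed.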

Grid computing would bring the benefits from many such solutions, if it were widely adopted.

See also

Concepts & related technology
* Distributed computing
* List of distributed computing projects
* Grid Computing

Wikimedia Foundation. 2010.
