Data set (IBM mainframe)

A data set, also written dataset, is a computer file having a record organization. The term pertains to the IBM mainframe operating system line, starting with OS/360, and is still used by its successors, including the current z/OS. Those systems have historically preferred the term "data set" over "file". A dataset is typically stored on a direct access storage device (DASD) or on magnetic tape; however, unit record devices are also supported.

Datasets are not unstructured streams of bytes; they are organized in various logical record and block structures determined by the DSORG (data set organization), RECFM (record format), and other parameters. These parameters are specified when the data set is allocated (created), for example with Job Control Language (JCL) DD statements. Inside a job they are stored in the Data Control Block (DCB), a data structure used to access datasets, for example through access methods.


Dataset organization

In OS/360, the DCB's DSORG parameter specifies how the dataset is organized. It may be physically sequential ("PS"), indexed sequential ("IS"), partitioned ("PO"), or direct access ("DA"). Datasets on tape may only be DSORG=PS. The choice of organization depends on how the data is to be accessed and, in particular, how it is to be updated.

Programmers use various access methods (such as QSAM or VSAM) in programs that read and write data sets; the choice depends on the organization of the data set.
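The DSORG codes and the tape restriction described above can be captured in a small lookup. This is only an illustrative sketch: the function name `check_dsorg` and the `device` strings are hypothetical and do not correspond to any IBM interface.

```python
# Illustrative sketch only: a toy validator, not an IBM API.
DSORG_NAMES = {
    "PS": "physically sequential",
    "IS": "indexed sequential",
    "PO": "partitioned",
    "DA": "direct access",
}


def check_dsorg(dsorg: str, device: str) -> str:
    """Return the organization name, or raise ValueError for a bad combination.

    `device` is a hypothetical label ("disk" or "tape") used only here.
    """
    if dsorg not in DSORG_NAMES:
        raise ValueError(f"unknown DSORG: {dsorg}")
    # Per the text above, datasets on tape may only be DSORG=PS.
    if device == "tape" and dsorg != "PS":
        raise ValueError(f"DSORG={dsorg} is not allowed on tape; only PS is")
    return DSORG_NAMES[dsorg]
```

For instance, `check_dsorg("PO", "disk")` returns the organization name, while `check_dsorg("PO", "tape")` raises `ValueError`.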

Record format (RECFM)

Regardless of organization, the physical structure of each record is essentially the same, and is uniform throughout the dataset. This is specified in the DCB RECFM parameter. RECFM=F means that the records are of fixed length, specified via the LRECL parameter, and RECFM=V specifies variable-length records. When stored on media, V records are prefixed by a Record Descriptor Word (RDW) containing the integer length of the record in bytes. With RECFM=FB (fixed-blocked) and RECFM=VB (variable-blocked), multiple logical records are grouped together into a single physical block on tape or disk. The BLKSIZE parameter specifies the maximum length of a block. RECFM=FBS (fixed-blocked standard) can also be specified, meaning that all blocks except the last must be exactly BLKSIZE bytes long. RECFM=VBS (variable-blocked spanned) means that a logical record can span two or more blocks, with flags in the descriptor word of each record segment indicating whether the segment is continued into the next block, continued from the previous one, or both.
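As a rough illustration of variable-length records, the sketch below prefixes each record with a 4-byte descriptor and reads the records back by following the lengths. It assumes a layout in which the first two bytes hold a big-endian length that counts the descriptor itself and the remaining two bytes are zero; block-level descriptors and spanned segments are deliberately left out. The function names are hypothetical.

```python
import struct

RDW_LEN = 4  # assumed layout: 2-byte big-endian length + 2 zero bytes


def encode_v_records(records):
    """Concatenate payloads, each prefixed by a record descriptor word."""
    out = bytearray()
    for payload in records:
        # Assumption: the length field counts the RDW itself plus the data.
        out += struct.pack(">HH", RDW_LEN + len(payload), 0)
        out += payload
    return bytes(out)


def decode_v_records(data):
    """Split a byte string back into payloads by following the RDW lengths."""
    records = []
    pos = 0
    while pos < len(data):
        length, _reserved = struct.unpack_from(">HH", data, pos)
        if length < RDW_LEN or pos + length > len(data):
            raise ValueError(f"bad record length {length} at offset {pos}")
        records.append(data[pos + RDW_LEN:pos + length])
        pos += length
    return records
```

Because the reader relies only on the length prefix, the payload bytes are never inspected when locating record boundaries.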

This mechanism eliminates the need for using any "delimiter" byte value to separate records. Thus data can be of any type, including binary integers, floating point, or characters, without introducing a false end-of-record condition. The data set is an abstraction of a collection of records, in contrast to files as unstructured streams of bytes.
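A minimal sketch of fixed-blocked handling shows why no delimiter is needed: records are located purely by position, so a record may contain any byte value, including a newline. The helper names `block_records` and `deblock` are hypothetical, and the block size is assumed to be a multiple of the record length.

```python
def block_records(records, lrecl, blksize):
    """Group fixed-length records into blocks of at most `blksize` bytes."""
    if blksize % lrecl != 0:
        raise ValueError("BLKSIZE must be a multiple of LRECL for fixed-blocked data")
    for rec in records:
        if len(rec) != lrecl:
            raise ValueError(f"record length {len(rec)} does not match LRECL {lrecl}")
    per_block = blksize // lrecl
    # The last block may be shorter than BLKSIZE.
    return [b"".join(records[i:i + per_block])
            for i in range(0, len(records), per_block)]


def deblock(block, lrecl):
    """Split one block back into records by position alone."""
    if len(block) % lrecl != 0:
        raise ValueError("block length is not a multiple of LRECL")
    return [block[i:i + lrecl] for i in range(0, len(block), lrecl)]
```

A record consisting entirely of newline bytes survives a round trip through these two functions unchanged, which a delimiter-based format could not guarantee.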

Partitioned datasets

A PDS, or Partitioned Data Set, is a dataset containing multiple members, each of which holds a separate sub-data set, similar to a directory in other types of file systems. This type of dataset is often used to hold executable programs (load modules) and source program libraries (especially Assembler macro definitions). A PDS may be compared to a Zip file on microcomputers, except that the files stored in a PDS are not compressed.

A Partitioned Data Set can be allocated only on a single volume, with a maximum size of 65,535 tracks.

Besides its members, a PDS also contains a directory of them. Each member can be accessed directly using the directory structure. Once a member is located, the data stored in that member is handled in the same manner as a PS (sequential) data set.

Whenever a member is deleted, the space it occupied becomes unusable for storing other data. Likewise, if a member is rewritten, it is stored in a new spot at the back of the PDS, leaving wasted “dead” space in the middle. The only way to recover “dead” space is to compress the PDS periodically, which moves all members to the front of the data space and leaves free usable space at the back. (In modern parlance, this kind of operation might be called defragmentation or garbage collection; data compression nowadays refers to a different, more complicated concept.) A PDS can reside only on disk, not on tape, because the directory structure is needed to access individual members. PDSs are most often used for storing multiple JCL files, utility control statements, and executable modules.
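The space behavior described above can be modeled with a toy class. This is a sketch under simplifying assumptions, not a description of the real on-disk format: sizes are abstract units rather than tracks, the directory is an ordinary dictionary, and the name `ToyPDS` is invented for illustration.

```python
class ToyPDS:
    """Toy model of PDS space handling; sizes are abstract units."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.next_free = 0    # new data is always written here, at the back
        self.directory = {}   # member name -> (start, length)

    def write(self, name, length):
        if self.next_free + length > self.capacity:
            raise RuntimeError("no free space at the back; compress first")
        # Rewriting an existing member abandons its old space as dead space.
        self.directory[name] = (self.next_free, length)
        self.next_free += length

    def delete(self, name):
        # Only the directory entry goes away; the space is not reusable.
        del self.directory[name]

    def dead_space(self):
        live = sum(length for _, length in self.directory.values())
        return self.next_free - live

    def compress(self):
        """Move all live members to the front, freeing space at the back."""
        pos = 0
        ordered = sorted(self.directory.items(), key=lambda kv: kv[1][0])
        for name, (_start, length) in ordered:
            self.directory[name] = (pos, length)
            pos += length
        self.next_free = pos
```

In this model, rewriting or deleting members steadily increases `dead_space()` until a write fails, and calling `compress()` makes that space available again.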

An improvement on this scheme is the Partitioned Data Set Extended (PDSE or PDS/E, sometimes simply called a library), introduced with the MVS/XA system.

The PDS/E structure is similar to that of a PDS and is used to store the same types of data. However, PDS/E files have a better directory structure, which does not require pre-allocation of directory blocks when the PDS/E is defined (and therefore cannot run out of directory blocks because too few were specified). Also, a PDS/E automatically stores members in such a way that no compression operation is needed to reclaim "dead" space. Like a PDS, a PDS/E can reside only on disk, because the directory structure is needed to access individual members.

Wikimedia Foundation. 2010.
