Distributed operating system

A distributed operating system is the logical aggregation of operating system software over a collection of independent, networked, communicating, and spatially disseminated computational nodes.^[1] Individual system nodes each hold a discrete software subset of the global aggregate operating system. Each node-level software subset is a composition of two distinct provisioners of services.^[2]

The first is a ubiquitous minimal kernel, or microkernel, situated directly above each node’s hardware. The microkernel provides only the necessary mechanisms for a node's functionality. Second is a higher-level collection of system management components, providing all necessary policies for a node's individual and collaborative activities. This collection of management components exists immediately above the microkernel, and below any user applications or APIs that might reside at higher levels.^[3]

These two entities, the microkernel and the management components collection, work together. They support the global system’s goal of seamlessly integrating all network-connected resources and processing functionality into an efficient, available, and unified system.^[4] This seamless integration of individual nodes into a global system is referred to as transparency, or single system image; describing the illusion provided to users of the global system’s appearance as a singular and local computational entity.

1 Description
- 1.1 A system within a system
- 1.2 Distributed operating system essentials
2 Overview
3 Distributed computing models
- 3.1 The nature of distribution
- 3.2 Three basic distributions
4 Major design considerations
5 Historical perspectives
6 See also
7 References
8 External links

Description

A system within a system

Structure of monolithic kernel, microkernel and hybrid kernel-based operating systems

A Distributed operating system is an operating system. This statement may be trivial, but it is not always overt and obvious because the distributed operating system is such an integral part of the distributed system. This idea is synonymous to the consideration of a square. A square might not immediately be recognized as a rectangle. Although possessing all requisite attributes defining a rectangle, a square’s additional attributes and specific configuration provide a disguise. At its core, the distributed operating system provides only the essential services and minimal functionality required of an operating system, but its additional attributes and particular configuration make it different. The Distributed operating system fulfills its role as operating system; and does so in a manner indistinguishable from a centralized, monolithic operating system. That is, although distributed in nature, it supports a single system image through the implementation of transparency; or more simply said, the system’s appearance as a singular, local entity.

Distributed operating system essentials

A distributed operating system, illustrated in a similar fashion, would be a container suggesting minimal operating system functionality and scope. This container would completely cover all disseminated hardware resources, defining the system-level. The container would extend across the system, supporting a layer of modular software components existing in the user-level. These software components supplement the distributed operating system with a configurable set of added services, usually integrated within the monolithic operating system (and the system-level). This division of minimal system-level function from additional user-level modular services provides a “separation of mechanism and policy.” Mechanism and policy can be simply interpreted as "how something is done" versus "why something is done," respectively. Achieving this separation allows for an exceptionally loosely coupled, flexible, and scalable distributed operating system.

Overview

The kernel

The kernel is a minimal, but complete set of node-level utilities necessary for access to a node’s underlying hardware and resources. These mechanisms provide the complete set of “building-blocks” essential for node operation; mainly low-level allocation, management, and disposition of a node’s resources, processes, communication, and I/O management support functions.^[5] These functions are made possible by exposing a concise, yet comprehensive array of primitive mechanisms and services. The kernel is arguably the primary consideration in a distributed operating system; however, within the kernel, the subject of foremost importance is that of a well-structured and highly efficient communications sub-system.^[3]

In a distributed operating system, the kernel is often defined by a relatively minimal architecture. A kernel of this design is referred to as a Microkernel.^[6]^[7] The microkernel usually contains only the mechanisms and services which, if otherwise removed, would render a node or the global system functionally inoperable. The minimal nature of the microkernel strongly enhances a distributed operating system’s modular potential.^[8] It is generally the case that the microkernel is implemented directly above its node’s hardware and resources; it is also common for a kernel to be identically replicated over all nodes in a system.^[9] The combination of a microkernel’s minimal design and ubiquitous node coverage enhances the global system's extensibility, and the ability to dynamically introduce new nodes or services.^[10]

General overview of system management components that reside above the microkernel.

System management components overview

System management components

A node’s system management components are a collection of software server processes that define the policies of a system node. These components are the composite of a node’s system software not directly required within the kernel. These software services support all of the needs of the node; namely communication, process and resource management, reliability, performance, and security to mention just a few. In this capacity, system management components compare directly to the centralized operating software of a single-entity system.^[3]

However, these system management components have additional challenges with respect to supporting a node's responsibilities to the global system. In addition, the system management components accept the defensive responsibilities of reliability, availability, and persistence inherent in the distributed operating system. Quite often, any effort to realize a high level of success in a particular area incites conflict with similar efforts in other areas. Therefore, a consistent approach, balanced perspective, and a deep understanding of the overall system and its goals can help mitigate some complexity, and assist in quickly identifying potential points of diminishing returns. This is an example of why the separation of policy and mechanism is so critical.^[10]

Working together as an operating system

The architecture and design of a distributed operating system is specifically aligned with realizing both individual node and global system goals. Any architecture or design must be approached in a manner consistent with separating policy and mechanism. In doing so, a distributed operating system attempts to provide a highly efficient and reliable distributed computing framework allowing for an absolute minimal user awareness of the underlying command and control efforts.^[8]

The multi-level collaboration between a kernel and the system management components, and in turn between the distinct nodes in a distributed operating system is the functional challenge of the distributed operating system. This is the point in the system that must maintain a perfect harmony of purpose, and simultaneously maintain a complete disconnect of intent from implementation. This challenge is the distributed operating system's opportunity, to produce the foundation and framework for a reliable, efficient, available, robust, extensible, and scalable system. However, this opportunity comes at a very high cost in complexity.

The price of complexity

In a distributed operating system, the exceptional degree of inherent complexity could easily render the entire system an anathema to any user. As such, the logical price of realizing a distributed operation system must be calculated in terms of overcoming vast amounts of complexity in many areas, and on many levels. This calculation includes the depth, breadth, and range of design investment and architectural planning required in achieving even the most modest implementation.^[11]

These design and development considerations are critical and unforgiving. For instance, a deep understanding of a distributed operating system’s overall architectural and design detail is required at an exceptionally early point.^[1] There are an exhaustive array of design considerations inherent in the development of a distributed operating system. Each of these design considerations can potentially affect many of the others to a significant degree. This leads to a massive effort in balanced approach, in terms of the individual design considerations, and many of their permutations. As an aid in this effort, most rely strongly on the immense amount of documented experience and research in distributed computing which exists, and continues even today.

Perspectives: past, present, and future

Research and experimentation efforts began in earnest in the 1970s and continued through 1990s, with focused interest peaking in the late 1980s. A number of distributed operating systems were introduced during this period; however, very few of these implementations achieved even modest commercial success.

The subject of distributed operating systems however, has a much richer historical perspective. This is especially evident when considering distributed operating system design issues severally, and with respect to some of the primordial strides taken towards their realization. There are several instances of fundamental and pioneering implementations of primitive distributed operating system component concepts dating back to the early 1950s.^[12]^[13]^[14] Some of these very early individual steps were not focused directly on distributed computing, and at the time, many may not have realized their important impact. These pioneering efforts laid important groundwork, and inspired continued research in areas related to distributed computing.

Beginning in the mid-1970s, many important research efforts produced extremely important advances in distributed computing. These breakthroughs provided a solid, stable foundation for the continued efforts through the 1990s, mentioned earlier. Considering the modern distributed operating system and its future, one must look no further than the current incredible challenges of many-core and multi-processor science. The accelerating proliferation of multi-processor and multi-core processor systems research has led to a resurgence of the distributed operating system concept. Many of these research efforts are investigating interesting, exciting, and plausible paradigms impacting the future of distributed computing.

Distributed computing models

The nature of distribution

General overview of centralized node organization. Notice the distinct single-level, master-slave relationships.

Generalized organization of nodes in a centralized model.

General overview of networked node organization. Notice the multi-level relationships within distinct partitions.

Generalized organization of nodes in a networked model.

General overview of distributed node organization. Notice the open relationships, without levels or partitions.

Generalized organization of nodes in a distributed model.

Node Organization in different computing models

The unique nature of the Distributed operating system is both subtle and complex. A distributed operating system’s hardware infrastructure elements are not centralized; that is, the elements do not have a tight proximity to one another at a single location. A given distributed operating system’s structure elements could reside in various rooms within a building, or in various buildings around the world. This geographically spatial dissemination defines its decentralization; however, the distributed operating system is distributed, not simply decentralized.

This distinction is the source of the subtlety and complexity. While decentralized systems and distributed operating systems are both spatially diverse, it is the specific manner of and relative degree in linkage between the elements, or nodes in the systems that differentiate the two. In the case of these two types of operating system, these linkages are the lines of communication between the nodes of the system.

Three basic distributions

To better illustrate this point, examine three system architectures; centralized, decentralized, and distributed. In this examination, consider three tightly related aspects of their structure: organization, connection, and control. Organization will describe a system's physical arrangement characteristics, connection will involve the associations among constituent structural entities, and control will correlate the manner, necessity, and rationale of the earlier two considerations.

Organization

Firstly, the subject of organization. A centralized system is organized most simply; basically one real level of structure, where all constituent elements are highly influenced by and are ultimately dependent upon this organization. The decentralized system is a more federated structure composed of multiple levels, where subsets of a system’s entities unite. These entity subsets in turn unite at higher levels, in the direction of and ultimately culminating at the central master element. The (purely) distributed system has no discernable concept of, or indeed any necessity for levels; it is purely an autonomous collection of discrete elements.

It is important to note that all of these systems are distributed, in that they comprise separate and distinct constituent elements connected together to form a system. This is a generic idea of the distributed organization of system elements; however, a distributed system is a quite specific entity unto itself. It is this distributed system concept that will be approached in the following sub-sections.

Connection

Association linkages between elements are the second consideration. In each case, physical association is inextricably linked (or disconnected), to conceptual organization. The centralized system has its constituent members directly united to a central master entity. One could conceptualize holding a bunch of balloons -- each on a string, -- with the hand being the central figure. A decentralized system (or network system) incorporates a single-step direct, or multi-step indirect path between any given constituent element and the central entity. This can be understood by thinking of a corporate organizational chart, the first level connecting directly, and lower levels connecting indirectly through successively higher levels (no lateral “dotted” lines). Finally, the distributed operating system has no inherent pattern; direct and indirect connections are possible between any two given elements of the system. Consider the 1970s phenomena of “string art” or a spirograph drawing as a fully connected system, and the spider’s web or the Interstate Highway System between U.S. cities as examples of a partially connected system.

Control

Notice, that the centralized and decentralized systems have distinctly directed flows of connection towards the central entity, while the distributed operating system is in no way influenced specifically by virtue of its organization. This is the pivotal notion of the third consideration. What correlations exist between a system’s organization, and its associations? In all three cases, it is an extremely delicate balance between the administration of processes, and the scope and extensibility of those processes. More simply said, it is about the control required or dictated by a particular organization and its collection of connections.

Notice that in the directed systems (centralized and decentralized) there is more control, therefore easing the administration of processes, but constraining their possible scope of influence. On the other hand, the distributed operating system — without directed connections — is much more difficult to control, but is effectively limited in extensible scope only by the capabilities of its individual, autonomous, and interdependent nodes. The associations of the distributed operating system conform only to the needs imposed by its many design considerations, and not in any way by organizational limitations.

Major design considerations

Transparency

Transparency, is the quality of a distributed operating system to be seen and understood as a single-system image, and is the greatest overriding consideration in the high-level conceptual design of a distributed operating system. While a simple concept, the consideration of transparency directly effects decision making in every aspect of design of a distributed operating system. Depending on the degree to which transparency is implemented into a system, certain requirements and/or restrictions may be imposed upon the many other design considerations, and their relationships.

Transparency allows a user to accomplish a system-related objective with absolute minimal knowledge of the particular internal details related to the objective. A system or application may expose as much, or as little transparency in a given area of functionality as deemed necessary. That is to say, the degree to which transparency is implemented can vary between subsets of functionality in a system or application. There are many specific areas of a system that can benefit from transparency; access, location, performance, naming, and migration to name a few.

For example, a distributed operating system may present access to a hard drive as "C:" and access to a DVD as "G:". The user does not require any knowledge of device drivers or methods of direct memory access techniques possibly used behind-the-scenes; both devices work the same way, from the user's perspective. This example demonstrates a high-level of transparency; and displays how low-level details are made somewhat "invisible" to the user through transparency. On the other hand, if a user desires to access another system or server, a host name or IP address may be required, along with a remote-machine user login and password. This would indicate a low-degree of transparency, as there is detailed knowledge required of the user in order to accomplish this task.

Generally, transparency and user-required knowledge form an inverse relation. As transparency is designed and implemented into various areas of a system, great care must be taken not to adversely affect other areas of transparency and other basic design concerns. Transparency, as a design concept, is one of the grand challenges in design of a distributed operating system; as it is a factor in the necessity for a complete upfront understanding.

Location transparency - Location transparency comprises two distinct sub-aspects of transparency, Naming transparency and User mobility. Naming transparency requires that nothing in the physical or logical references to any system entity should expose any indication of the entities location, or its local or remote relationship to the user. User mobility requires the consistent referencing of system entities, regardless of the system location from which the reference originates. Transparency dictates that the relative location of a system entity—either local or remote—must be both invisible to, and undetectable by the user. [DCD Pradeep, pg 20]

Access transparency - Local and remote system entities must remain indistinguishable when viewed through the user interface. The distributed operating system maintains this perception through the exposure of a single access mechanism for a system entity, regardless of that entity being local or remote to the user. Transparency dictates that any differences in methods of accessing any particular system entity—either local or remote—must be both invisible to, and undetectable by the user. [TLD, pg 84]

Migration transparency - Logical resources and physical processes migrated by the system, from one location to another in an attempt to maximize efficiency, reliability, availability, security, or whatever reason, should do so automatically controlled solely by the system. There are a myriad of possible reasons for migration; in any such event, the entire process of migration—before, during, and after—should occur without user knowledge or interaction. Transparency dictates that both the need for, and the execution of any system entity migration must be both invisible to, and undetectable by the user. [DOS, pg 24]

Replication transparency - A system's elements or components may need to be copied to strategic remote points in the system in an effort to possibly increase efficiencies through better proximity, or provide for improved reliability through the duplication of a back-up. This duplication of a system entity and its subsequent movement to a remote system location may occur for any number of possible reasons; in any event, the entire process—before, during, and after—should occur without user knowledge or interaction. Transparency dictates that both the necessity and execution of replication, as well as the existence of replicated entities throughout the system must be both invisible to, and undetectable by the user. [DCP, pg 16]

Concurrency transparency - The distributed operating system allows for simultaneous use of system resources by multiple users and processes, who are kept completely unaware of the concurrent usage. Transparency dictates that both the necessity for concurrency, and the multiplexed usage of system resources must be both invisible to, and undetectable by the user. [DSC, pg 23]

Failure transparency - In the event of a partial system failure, the system is responsible for the automatic, rapid, and accurate detection and orchestration of a remedy. These measures should exhibit minimal user imposition, and should initiate and execute without user knowledge or interaction. Transparency dictates that users and processes be exposed to absolute minimal imposition as a result of partial system failure; and any system-employed techniques of detection and recovery must be both invisible to, and undetectable by the user. [DSA, pg 30]

Performance Transparency - In any event where parts of the system experience significant delay or load imbalance, the system is responsible for the automatic, rapid, and accurate detection and orchestration of a remedy. These measures should exhibit minimal user imposition, and should initiate and execute without user knowledge or interaction. While reasonable and predictable performance are important goals in these situations, there should be no expressed or implied concepts of fairness or equality among affected users or processes. Transparency dictates that users and processes be exposed to absolute minimal imposition as a result of performance delay or load imbalance; and any system-employed techniques of detection and recovery must be both invisible to, and undetectable by the user. [DCD, pg 23]

Size/Scale transparency - A system's geographic reach, number of nodes, level of node capability, or any changes therein should exists without any required user knowledge or interaction. Transparency dictates that system and node composition, quality, or changes to either must be both invisible to, and undetectable by the user. [DCD, pg 23]

Revision transparency - System occasionally have need for system-software version changes and changes to internal implementation of system infrastructure. While a user may ultimately become aware of, or discover the availability of new system functions or services, their implementation should in no way be the prompt for this discovery. Transparency dictates that the implementation of system-software version changes and changes to internal system infrastructure must be both invisible to, and undetectable by the user; except as revealed by administrators of the system. [DSA, pg 30]

Control transparency - All system information, constants, properties, configuration settings, etc. should be completely consistent in appearance, connotation, and denotation to all users and software applications aware of them. [TLD, pg 84]

Data transparency - No system data-entity should expose itself as peculiar with respect its location or purpose in the system, as a result of user interaction. [TLD, pg 85]

Parallelism transparency - Arguably the most difficult aspect of transparency, and described by Tanenbaum as the "Holy grail" for distributed system designers. A system's parallel execution of a task among various processes throughout the system should occur without any required user knowledge or interaction. Transparency dictates that both the need for, and the execution of parallel processing must be both invisible to, and undetectable by the user. [DOS, pg 24]

Inter-process communication

Inter-Process Communication (IPC) is the implementation of general communication, process interaction, and dataflow between threads and/or processes both within a system node, and between all nodes in a distributed operating system. The distributed nature of a system's nodes and the multi-level considerations of intra-node and inter-node requirements provide the base-line for high-level IPC design considerations. However, IPC in a distributed operating system is a low-level implementation. IPC is the low-level critical complement to the high-level concept of transparency. Many of the requirements and restrictions imposed on a system as a result of transparency will be accomplished directly or indirectly through IPC. In this sense, IPC is the greatest underlying concept in the low-level design considerations of a distributed operating system.

Process management

Process management provides policies and mechanisms for effective and efficient sharing of a system's distributed processing resources between that system's distributed processes. These policies and mechanisms support operations involving the allocation and de-allocation of processes and ports to processors, as well as provisions to run, suspend, migrate, halt, or resume execution of processes. While these distributed operating system resources and the operations on them can be either local or remote with respect to each other, the distributed operating system must still maintain complete state of and synchronization over all processes in the system; and do so in a manner completely consistent from the user's unified system perspective.

As an example, load balancing is a common process management function. One consideration of load balancing is which process should be moved. The kernel may have several mechanisms, one of which might be priority-based choice. This mechanism in the kernel defines what can be done; in this case, choose a process based on some priority. The system management components would have policies implementing the decision making for this context. One of these policies would define what priority means, and how it is to be used to choose a process in this instance.

Resource management

Systems resources such as memory, files, devices, etc. are distributed throughout a system, and at any given moment, any of these nodes may have light to idle workloads. Load sharing and load balancing require many policy-oriented decisions, ranging from finding idle CPUs, when to move, and which to move. Many algorithms exist to aid in these decisions; however, this calls for a second level of decision making policy in choosing the algorithm best suited for the scenario, and the conditions surrounding the scenario.

Reliability

One of the basic tenets of distributed operating systems is a high-level of reliability. This quality attribute of a distributed operating system has become a staple expectation. Reliability is most often considered from the perspectives of availability and security of a system's hardware, services, and data. Issues arising from availability failures or security violations are considered faults. Faults are physical or logical defects that can cause errors in the system. For a system to be reliable, it must somehow overcome the adverse effects of faults.

There are three general methods for dealing with faults: fault avoidance, fault tolerance, and fault detection and recovery. Fault avoidance is considered to be the proactive measures taken to minimize the occurrence of faults. These proactive measures can be in the form of transactions, replicated resources and processes, and primary back-ups of complete servers. Fault tolerance is the ability of a system to continue some meaningful level of operation in the face of a fault. In the event a fault does occur, the system should detect the fault and have the capability to respond quickly and effectively to recover full functionality. In any event, any actions taken should make every effort to preserving the single system image.

Performance

Performance is arguably the quintessential computing concern, and in the distributed operating system, it is no different. Many benchmark metrics exist for performance; throughput, job completions per unit time, system utilization, etc. Each of these benchmarks are more meaningful in describing some scenarios, and less in others. With respect to a distributed operating system, this consideration most often distills to a balance between process parallelism and IPC. Managing the task granularity of parallelism in a sensible relation to the messages required for support is extremely effective. Also, identifying when it is more beneficial to migrate a process to its data, rather than copy the data, is effective as well. Many process and resource management algorithms, and algorithms in this space work to maximize performance.

Synchronization

Cooperating concurrent processes have an inherent need for synchronization. Three basic situations that define the scope of this need:

one or more processes must synchronize at a given point for one or more other processes to continue,
one or more processes must wait for an asynchronous condition in order to continue,
or a process must establish mutual exclusive access to a shared resource.

There are a multitude of algorithms available for these scenarios, and each have many variations. Unfortunately, whenever synchronization is required the opportunity for process deadlock usually exists.

Flexibility

Flexibility in a distributed operating system is made possible through the modular characteristics of the microkernel. With the microkernel presenting an absolute minimal—but complete—set of primitives and basic functionally cohesive services, The higher-level management components can be composed in a similar cohesive manner. This capability leads to exceptional flexibility in the management components collection; but more importantly, it allows the opportunity to dynamically swap, upgrade, or install additional instances of components above the kernel.

Historical perspectives

Pioneering inspirations

With a cursory glance around the internet, or a modest perusal of pertinent writings, one could very easily gain the notion that computer operating systems were a new phenomenon in the mid-twentieth century. In fact, important research in operating systems was being conducted at this time.^[15]^[16]^[17]^[18]^[19]^[20] While early exploration into operating systems took place in the years leading to 1950; shortly afterward, highly advanced research began on new systems to conquer new problems. In the first decade of the second-half of the 20th century, many new questions were asked, many new problems were identified, many solutions were developed and working for years, in controlled production environments.

Aboriginal distributed computing

The DYSEAC^[12] (1954)

One of the first solutions to these new questions was the DYSEAC, a self-described general-purpose synchronous computer; but at this point in history, exhibited signs of being much more than general-purpose^{[clarification needed]}. In one of the earliest publications of the Association for Computing Machinery, in April 1954, a researcher at the National Bureau of Standards – now the National Institute of Standards and Technology (NIST) – presented a detailed implementation design specification of the DYSEAC. Without carefully reading the entire specification, one could be misled by summary language in the introduction, as to the nature of this machine. The initial section of the introduction advises that major emphasis will be focused upon the requirements of the intended applications, and these applications would require flexible communication. However, suggesting the external devices could be typewriters, magnetic medium, and CRTs, and with the term “input-output operation” used more than once, could quickly limit any paradigm of this system to a complex centralized “ensemble.” Seemingly, saving the best for last, the author eventually describes the true nature of the system.

Finally, the external devices could even include other full-scale computers employing the same digital language as the DYSEAC. For example, the SEAC or other computers similar to it could be harnessed to the DYSEAC and by use of coordinated programs could be made to work together in mutual cooperation on a common task… Consequently[,] the computer can be used to coordinate the diverse activities of all the external devices into an effective ensemble operation.

—ALAN L. LEINER, System Specifications for the DYSEAC

While this more detailed description elevates the perception of the system, the best that can be distilled from this is some semblance of decentralized control. The avid reader, persevering in the investigation would get to a point at which the real nature of the system is divulged.

Each member of such an interconnected group of separate computers is free at any time to initiate and dispatch special control orders to any of its partners in the system. As a consequence, the supervisory control over the common task may initially be loosely distributed throughout the system and then temporarily concentrated in one computer, or even passed rapidly from one machine to the other as the need arises. …it should be noted that the various interruption facilities which have been described are based on mutual cooperation between the computer and the external devices subsidiary to it, and do not reflect merely a simple master-slave relationship.

—ALAN L. LEINER, System Specifications for the DYSEAC

This is one of the earliest examples of a computer with distributed control. Dept. of the Army reports^[21] show it was certified reliable and passed all acceptance tests in April 1954. It was completed and delivered on time, in May 1954. In addition, was it mentioned that this was a portable computer? It was housed in a tractor-trailer, and had 2 attendant vehicles and 6 tons of refrigeration capacity.

Multi-programming abstraction

The Lincoln TX-2^[13] (1957)

Described as an input-output system of experimental nature, the Lincoln TX-2 placed a premium on flexibility in its association of simultaneously operational input-output devices. The design of the TX-2 was modular, supporting a high degree of modification and expansion, as well as flexibility in operating and programming of its devices. The system employed The Multiple-Sequence Program Technique.

This technique allowed for multiple program counters to each associate with one of 32 possible sequences of program code. These explicitly prioritized sequences could be interleaved and executed concurrently, affecting not only the computation in process, but also the control flow of sequences and switching of devices as well. Much discussion ensues related to the complexity and sophistication in the sequence capabilities of devices.

Similar to the previous system, the TX-2 discussion has a distinct decentralized theme until it is revealed that efficiencies in system operation are gained when separate programmed devices are operated simultaneously. It is also stated that the full power of the central unit can be utilized by any device; and it may be used for as long as the device's situation requires. In this, we see the TX-2 as another example of a system exhibiting distributed control, its central unit not having dedicated control.

Memory access abstraction

Intercommunicating Cells, Basis for a Distributed Logic Computer^[14] (1962)

One early memory access paradigm was Intercommunicating Cells, where a cell is composed of a collection of memory elements. A memory element was basically an electronic flip-flop or relay, capable of two possible values. Within a cell there are two types of elements, symbol and cell elements. Each cell structure stores data in a string of symbols, consisting of a name and a set of associated parameters. Consequently, a system's information is linked through various associations of cells.

Intercommunicating Cells fundamentally break from tradition in that it has no counters or any concept of addressing memory. The theory contends that addressing is a wasteful and non-valuable level of indirection. Information is accessed in two ways, direct and cross-retrieval. Direct retrieval looks to a name and returns a parameter set. Cross-retrieval projects through parameter sets and returns a set of names containing the given subset of parameters. This would be similar to a modified hash table data structure that would allow for multiple values (parameters) for each key (name).

Cellular memory would have many advantages:
		A major portion of a system's logic is distributed within the associations of information stored in the cells,
		This flow of information association is somewhat guided by the act of storing and retrieving,
		The time required for storage and retrieval is mostly constant and completely unrelated to the size and fill-factor of the memory
		Cells are logically indistinguishable, making them both flexible to use and relatively simple to extend in size

This early research into alternative memory describes a configuration ideal for the distributed operating system. The constant-time projection through memory for storing and retrieval would be inherently atomic and exclusive. The cellular memory's intrinsic distributed characteristics would be an invaluable benefit; however, the impact on the user, hardware/device, or Application programming interfaces is uncertain. It is distinctly obvious that these early researchers had a distributed system concept in mind, as they state:

We wanted to present here the basic ideas of a distributed logic system with... the macroscopic concept of logical design, away from scanning, from searching, from addressing, and from counting, is equally important. We must, at all cost, free ourselves from the burdens of detailed local problems which only befit a machine low on the evolutionary scale of machines.

—Chung-Yeol (C. Y.) Lee, Intercommunicating Cells, Basis for a Distributed Logic Computer

Foundational work

Coherent memory abstraction

Algorithms for scalable synchronization on shared-memory multiprocessors ^[22]

File System abstraction

Measurements of a distributed file system^[23]
Memory coherence in shared virtual memory systems ^[24]

Transaction abstraction

Transactions
Sagas ^[25]

Transactional Memory
Composable memory transactions^[26]
Transactional memory: architectural support for lock-free data structures ^[27]
Software transactional memory for dynamic-sized data structures^[28]
Software transactional memory^[29]

Persistence abstraction

OceanStore: an architecture for global-scale persistent storage ^[30]

Coordinator abstraction

Weighted voting for replicated data ^[31]
Consensus in the presence of partial synchrony ^[32]

Reliability abstraction

Sanity checks
The Byzantine Generals Problem ^[33]
Fail-stop processors: an approach to designing fault-tolerant computing systems ^[34]

Recoverability
Distributed snapshots: determining global states of distributed systems^[35]
Optimistic recovery in distributed systems ^[36]

Current research

Replicated model extended to a component object model

Architectural Design of E1 Distributed Operating System^[37]
The Cronus distributed operating system^[38]
Design and development of MINIX distributed operating system^[39]

Future directions

Complexity/Trust exposure through accepted responsibility

Scale and performance in the Denali isolation kernel.^[40]

Multi/Many-core focused systems

The multikernel: a new OS architecture for scalable multicore systems.^[41]

Corey: an Operating System for Many Cores.^[42]

Distributed processing over extremes in heterogeneity

Helios: heterogeneous multiprocessing with satellite kernels.^[43]

Effective and stable in multiple levels of complexity

Tessellation: Space-Time Partitioning in a Manycore Client OS.^[44]

References

^ ^a ^b Tanenbaum, Andrew S. 1993 Distributed operating systems anno 1992. What have we learned so far? Distributed Systems Engineering, 1, 1 (1993), 3-10
^ Nutt, G. J. 1992 Centralized and Distributed Operating Systems. Prentice Hall Press.
^ ^a ^b ^c Distributed Operating Systems: The Logical Design, 1st edition Goscinski, A. 1991 Distributed Operating Systems: the Logical Design. 1st. Addison-Wesley Publishing Co., Inc.
^ Fortier, P. J. 1986 Design of Distributed Operating Systems: Concepts and Technology. Intertext Publications, Inc., McGraw-Hill, Inc.
^ P. Brinch Hansen, Ed. 2000 Classic Operating Systems: from Batch Processing to Distributed Systems. Springer-Verlag New York, Inc.
^ Using LOTOS for specifying the CHORUS distributed operating system kernel Pecheur, C. 1992. Using LOTOS for specifying the CHORUS distributed operating system kernel. Comput. Commun. 15, 2 (Mar. 1992), 93-102.
^ COOL: kernel support for object-oriented environments Habert, S. and Mosseri, L. 1990. COOL: kernel support for object-oriented environments. In Proceedings of the European Conference on Object-Oriented Programming on Object-Oriented Programming Systems, Languages, and Applications (Ottawa, Canada). OOPSLA/ECOOP '90. ACM, New York, NY, 269-275.
^ ^a ^b Distributed Operating Systems: Concepts and Design Sinha, P. K. 1996 Distributed Operating Systems: Concepts and Design. 1st. Wiley-IEEE Press.
^ Distributed Operating Systems Galli, D. L. 1999 Distributed Operating Systems: Concepts and Practice. 1st. Prentice Hall PTR.
^ ^a ^b Distributed Operating Systems and Algorithms Chow, R. and Chow, Y. 1997 Distributed Operating Systems and Algorithms. Addison-Wesley Longman Publishing Co., Inc.
^ Surajbali, B., Coulson, G., Greenwood, P., and Grace, P. 2007. Augmenting reflective middleware with an aspect orientation support layer. In Proceedings of the 6th international Workshop on Adaptive and Reflective Middleware: Held At the ACM/IFIP/USENIX international Middleware Conference (Newport Beach, CA, November 26–30, 2007). ARM '07. ACM, New York, NY, 1-6.
^ ^a ^b Leiner, A. L. 1954. System Specifications for the DYSEAC. J. ACM 1, 2 (Apr. 1954), 57-81.
^ ^a ^b Forgie, J. W. 1957. The Lincoln TX-2 input-output system. In Papers Presented At the February 26–28, 1957, Western Joint Computer Conference: Techniques For Reliability (Los Angeles, California, February 26–28, 1957). IRE-AIEE-ACM '57 (Western). ACM, New York, NY, 156-160.
^ ^a ^b Lee, C. Y. 1962. Intercommunicating cells, basis for a distributed logic computer. In Proceedings of the December 4–6, 1962, Fall Joint Computer Conference (Philadelphia, Pennsylvania, December 04 - 06, 1962). AFIPS '62 (Fall).
^ Dreyfuss, P. 1958. System design of the Gamma 60. In Proceedings of the May 6–8, 1958, Western Joint Computer Conference: Contrasts in Computers (Los Angeles, California, May 06 - 08, 1958). IRE-ACM-AIEE '58 (Western). ACM, New York, NY, 130-133.
^ Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1958. Organizing a network of computers to meet deadlines. In Papers and Discussions Presented At the December 9–13, 1957, Eastern Joint Computer Conference: Computers with Deadlines To Meet (Washington, D.C., December 09 - 13, 1957). IRE-ACM-AIEE '57
^ Leiner, A. L., Smith, J. L., Notz, W. A., and Weinberger, A. 1958. PILOT, the NBS multicomputer system. In Papers and Discussions Presented At the December 3–5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03 - 05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 71-75.
^ Bauer, W. F. 1958. Computer design from the programmer's viewpoint. In Papers and Discussions Presented At the December 3–5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03 - 05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 46-51.
^ Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1959. PILOT—A New Multiple Computer System. J. ACM 6, 3 (Jul. 1959), 313-335.
^ Estrin, G. 1960. Organization of computer systems: the fixed plus variable structure computer. In Papers Presented At the May 3–5, 1960, Western Joint IRE-AIEE-ACM Computer Conference (San Francisco, California, May 03 - 05, 1960). IRE-AIEE-ACM '60 (Western). ACM, New York, NY, 33-40.
^ Martin H. Weik, "A Third Survey of Domestic Electronic Digital Computing Systems," Ballistic Research Laboratories Report No. 1115, pg. 234-5, Aberdeen Proving Ground, Maryland, March 1961
^ Mellor-Crummey, J. M. and Scott, M. L. 1991. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (Feb. 1991), 21-65.
^ Baker, M. G., Hartman, J. H., Kupfer, M. D., Shirriff, K. W., and Ousterhout, J. K. 1991. Measurements of a distributed file system. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, October 13–16, 1991). SOSP '91. ACM, New York, NY, 198-212.
^ Li, K. and Hudak, P. 1989. Memory coherence in shared virtual memory systems. ACM Trans. Comput. Syst. 7, 4 (Nov. 1989), 321-359.
^ Garcia-Molina, H. and Salem, K. 1987. Sagas. In Proceedings of the 1987 ACM SIGMOD international Conference on Management of Data (San Francisco, California, United States, May 27–29, 1987). U. Dayal, Ed. SIGMOD '87. ACM, New York, NY, 249-259.
^ Harris, T., Marlow, S., Peyton-Jones, S., and Herlihy, M. 2005. Composable memory transactions. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Chicago, IL, USA, June 15–17, 2005). PPoPP '05. ACM, New York, NY, 48-60.
^ Herlihy, M. and Moss, J. E. 1993. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th Annual international Symposium on Computer Architecture (San Diego, California, United States, May 16–19, 1993). ISCA '93. ACM, New York, NY, 289-300.
^ Herlihy, M., Luchangco, V., Moir, M., and Scherer, W. N. 2003. Software transactional memory for dynamic-sized data structures. In Proceedings of the Twenty-Second Annual Symposium on Principles of Distributed Computing (Boston, Massachusetts, July 13–16, 2003). PODC '03. ACM, New York, NY, 92-101.
^ Shavit, N. and Touitou, D. 1995. Software transactional memory. In Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing (Ottawa, Ontario, Canada, August 20–23, 1995). PODC '95. ACM, New York, NY, 204-213.
^ Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Wells, C., and Zhao, B. 2000. OceanStore: an architecture for global-scale persistent storage. In Proceedings of the Ninth international Conference on Architectural Support For Programming Languages and Operating Systems (Cambridge, Massachusetts, United States). ASPLOS-IX. ACM, New York, NY, 190-201.
^ Gifford, D. K. 1979. Weighted voting for replicated data. In Proceedings of the Seventh ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, December 10–12, 1979). SOSP '79. ACM, New York, NY, 150-162
^ Dwork, C., Lynch, N., and Stockmeyer, L. 1988. Consensus in the presence of partial synchrony. J. ACM 35, 2 (Apr. 1988), 288-323.
^ Lamport, L., Shostak, R., and Pease, M. 1982. The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst. 4, 3 (Jul. 1982), 382-401.
^ Schlichting, R. D. and Schneider, F. B. 1983. Fail-stop processors: an approach to designing fault-tolerant computing systems. ACM Trans. Comput. Syst. 1, 3 (Aug. 1983), 222-238.
^ Chandy, K. M. and Lamport, L. 1985. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst. 3, 1 (Feb. 1985), 63-75.
^ Strom, R. and Yemini, S. 1985. Optimistic recovery in distributed systems. ACM Trans. Comput. Syst. 3, 3
^ L.B. Ryzhyk, A.Y. Burtsev. Architectural design of E1 distributed operating system. System Research and Information Technologies international scientific and technical journal, October 2004, Kiev, Ukraine.
^ Vinter, S. T. and Schantz, R. E. 1986. The Cronus distributed operating system. In Proceedings of the 2nd Workshop on Making Distributed Systems Work (Amsterdam, Netherlands, September 08 - 10, 1986). EW 2. ACM, New York, NY, 1-3.
^ Ramesh, K. S. 1988. Design and development of MINIX distributed operating system. In Proceedings of the 1988 ACM Sixteenth Annual Conference on Computer Science (Atlanta, Georgia, United States). CSC '88. ACM, New York, NY, 685.
^ Whitaker, A., Shaw, M., and Gribble, S. D. 2002. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation
^ Baumann, A., Barham, P., Dagand, P., Harris, T., Isaacs, R., Peter, S., Roscoe, T., Schüpbach, A., and Singhania, A. 2009. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11–14, 2009). SOSP '09.
^ S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. Proceedings of the 2008 Symposium on Operating Systems Design and Implementation (OSDI), December 2008.
^ Nightingale, E. B., Hodson, O., McIlroy, R., Hawblitzel, C., and Hunt, G. 2009. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11–14, 2009). SOSP '09.
^ Rose Liu, Kevin Klues, and Sarah Bird, University of California at Berkeley; Steven Hofmeyr, Lawrence Berkeley National Laboratory; Krste Asanović and John Kubiatowicz, University of California at Berkeley. HotPar09.

External links

Categories:

Computer networks
History of computing
Operating systems

Wikimedia Foundation. 2010.

Игры ⚽ Нужен реферат?

Look at other dictionaries:

Amoeba distributed operating system — Amoeba is an opensource microkernel based distributed operating system developed by Andrew S. Tanenbaum and others at the Vrije Universiteit. The aim of the Amoeba project is to build a timesharing system that makes an entire network of computers … Wikipedia
Operating system — Operating systems … Wikipedia
Distributed control system — Part of a series of articles on Industry Manufacturing methods Batch production • Job production Continuous production Improvement method … Wikipedia
Distributed file system — For other uses, see Distributed file system (disambiguation). In computing, a distributed file system or network file system is any file system that allows access to files from multiple hosts sharing via a computer network.[1] This makes it… … Wikipedia
Distributed transmission system — This article is about terrestrial broadcasting. For electrical power distribution, see distributed generation. In North American digital terrestrial television broadcasting, a distributed transmission system (DTS or DTx) is a form of single… … Wikipedia
Inferno (operating system) — Infobox OS name = Inferno caption = Inferno 4th Edition developer = Bell Labs / Vita Nuova Holdings source model = Open source kernel type = Virtual machine supported platforms = x86, MIPS, XScale, ARM, PowerPC, SPARC ui = family = Unix successor … Wikipedia
LOCUS (operating system) — Infobox OS name = LOCUS developer = UCLA family = Unix source model = Closed source working state = Historic kernel type = Monolithic kernel license = Proprietary LOCUS was a distributed operating system developed at UCLA during the 1980s. It was … Wikipedia
Operating System Embedded — Enea OSE Company / developer ENEA Programmed in Assembly, C, C++ OS family Real time embedded operating system … Wikipedia
operating system — Computers. the collection of software that directs a computer s operations, controlling and scheduling the execution of other programs, and managing storage, input/output, and communication resources. Abbr.: OS [1960 65] * * * ▪ computing… … Universalium
Web operating system — This article is about the concept. For the mobile operating system developed by Palm, see webOS. See also: Web desktop In metacomputing, WebOS and Web operating system are terms that describe network services for Internet scale distributed… … Wikipedia

Academic Dictionaries and Encyclopedias

Wikipedia