Open Containers
OpenContainers (aka OC) is an open-source C++ containers library, similar to the C++ Standard Template Library (aka the C++ STL or STL) or the Boost library. OpenContainers addresses threading issues (see below) that the STL does not. The OC also has tools to make the C++ environment feel more like Python (much like some of the tools in the Boost libraries). OpenContainers is freely distributable and freely modifiable.
History
OpenContainers derives directly from the containers libraries of Midas 2k, a C++ Digital Signal Processing framework that was heavily thread-based. The interfaces of the OC are not compatible with the STL, as they predate the STL (or rather, predate the STL's ubiquity). The OC library represents a large effort in scalability work and has been used successfully in applications with hundreds of threads. Many of the OpenContainers assumptions are thread-based: see OpenContainers Assumptions below.
Threading Issues
The current C++ STL standard is silent on issues related to threads, so the STL on a particular platform can yield very poor performance in a threaded environment. For example, threads can synchronize too often, serialize behind each other, etc. See OpenContainers Assumptions below for more discussion. OpenContainers has been written specifically to be used in a thread-heavy environment. It has also been written to be portable, so that it can be used on multiple platforms.
OpenContainers Assumptions
# Portability. The OC has to work on a variety of platforms (Solaris, IRIX, Linux, Tru64, Windows): some that have the STL, some that don't. More than that, the OC has to work the same on all platforms (performance, correctness and thread assumptions; see below) so that applications port quickly and easily. The STL is more portable and prevalent today, but there are still platforms that don't support it. Also, different implementations of the STL may not make the same assumptions across platforms (see 2-6 below).
# Thread-Safety: The OC containers are NOT thread-safe: this is on purpose. There was a time when our C++ framework had thread-safety in most classes (tables and strings especially) and it simply didn't scale well: every operation would lock a mutex when (most of the time) it didn't need to. There was also the collateral damage of forcing synchronization across platforms (syncing caches). With a back-of-the-envelope calculation of 10 cycles for synchronization and a typical operation of 20 cycles, this cost us 1.5x in performance. Tables, strings, etc. are all low-level data structures, and synchronization (for performance reasons) should probably be done at a higher level [10] ("transactions" or at the component level); see the component-level locking sketch after this list.
# Re-entrancy (or "thread neutrality"). The OC containers ARE re-entrant (inasmuch as anything that uses the heap is re-entrant): two threads using different instances of the same class will not interfere with each other's performance or correctness.
# Recursion: We avoid recursion for two main reasons. The first is to avoid overrunning the stack: this is even more critical in our applications because we allocate hundreds of threads in the same address space and we have to be judicious not to waste address space on each stack. The second reason is performance: an iterative solution tends to be faster than a recursive solution (because managing stack frames can be expensive if we aren't careful); see the explicit-stack traversal sketch after this list. The only code that uses recursion directly is the Val Serialize and Deserialize; NO other classes use "stack" recursion.
# Dynamic Memory Allocation: We don't avoid using the heap, but we avoid calling "malloc/new" excessively if we can. We tend to run on shared-memory, multiple-CPU machines. On those, the heap is a shared resource used by ALL CPUs in the same address space. The pressure on the heap will kill us: if 4 CPUs try to call malloc all at once, they can get locked out or suffer performance penalties waiting to get in. In the earliest versions of malloc on most systems, there was a single lock that allowed only one thread into the heap at a time. Thus all mallocs would serialize behind each other, crippling a 4-CPU machine. Implementations of malloc got better, of course, but they mitigate the problem rather than solve it. This one factor was CRITICAL in the scalability of our framework and spawned the several techniques below.
## Use Stack space. Allocating stack space is trivial: a single add for the stack frame, with NO THREAD LOCKING at all. Compare this to "malloc", which must use some form of synchronization for allocation. This technique is very cheap and scales trivially. [This doesn't contradict the recursion point above: we only put small items in the current stack frame, as opposed to the potentially rampant stack usage of a recursive routine.]
## Use Lookaside Caches. Add another 32 bytes or so to an object and put information there instead of going to the heap. We use this technique extensively (in OCString, Val, and HashTableT). Note that this explicitly breaks rules from Herb Sutter's C++ Coding Standards for two extremely important reasons: performance (by keeping "locality of reference" and avoiding extra levels of indirection via pointers/handles) and scalability (by avoiding the heap). See the "ocval.h" code for more discussion of this technique, and the small-buffer sketch after this list.
## Use unions. With unions we can re-use memory that we already have and avoid the heap. (Val and OCString use this technique.)
## Group Allocations. If we DO have to go to the heap, allocate a bunch of items at once so that we don't have to go back to the heap anytime soon. Most OC classes allocate internal items in groups of 8 or so. Note that two different instances of any class DO NOT share these groups: that would require thread-safety. Every instance has its own local list of chunks. This technique is for both scalability (staying out of the heap) and performance. All Tables use this technique; see the chunked-allocation sketch after this list.
# Inlined Code Only: All OC code is inline. This is partially to avoid a complex "bootstrap process" (we have to build tools that we need to build tools), but inline code also avoids linkage issues (which can be problematic when building with someone else's library). Code bloat can be an issue (it wasn't much of one in our framework), but we can refactor code if necessary because we have the source. Inlining tends to be the best single optimization compilers can do.
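The following is a minimal sketch (not OC code) of the higher-level synchronization that item 2 recommends: the unsynchronized Tab is wrapped by a component that owns a single lock, so one mutex acquisition covers a whole "transaction" rather than every low-level operation. The ProtectedTable name and its methods are illustrative, Val-keyed lookup is assumed to follow the Figure 1.1 syntax, and std::mutex stands in for whatever locking primitive a given platform provides.

 // Illustrative only: component-level locking around an unsynchronized Tab.
 #include <mutex>
 #include "ocval.h"                      // OC header providing Val/Tab

 class ProtectedTable {                  // hypothetical wrapper, not OC API
   public:
     void set (const Val& key, const Val& value)
     {
       std::lock_guard<std::mutex> hold(lock_);   // one lock per "transaction"
       table_[key] = value;
     }
     Val get (const Val& key)
     {
       std::lock_guard<std::mutex> hold(lock_);
       return table_[key];
     }
   private:
     std::mutex lock_;                   // synchronization lives here, not in Tab
     Tab table_;                         // the low-level, unsynchronized container
 };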
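Below is a minimal, self-contained sketch (again not OC code) of the iterative style described under Recursion: a tree walk driven by an explicit stack instead of stack recursion, so the depth of the data never translates into runtime-stack depth. The Node and sumTree names are made up for the example.

 // Illustrative only: avoiding "stack" recursion with an explicit stack.
 #include <vector>

 struct Node { int value; Node* left; Node* right; };

 inline int sumTree (Node* root)
 {
   int total = 0;
   std::vector<Node*> pending;           // explicit stack, not the call stack
   if (root) pending.push_back(root);
   while (!pending.empty()) {
     Node* n = pending.back();
     pending.pop_back();
     total += n->value;
     if (n->left)  pending.push_back(n->left);
     if (n->right) pending.push_back(n->right);
   }
   return total;                         // same result as a recursive walk
 }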
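The next sketch shows the lookaside-cache and union techniques in their simplest form: a string class that keeps short payloads in a 32-byte cache inside the object and only touches the heap for long ones. It is a generic illustration with a hypothetical SmallString class, not OC's OCString; see the "ocval.h" code for the real thing.

 // Illustrative only: a 32-byte lookaside cache unioned with a heap pointer.
 #include <cstddef>
 #include <cstdlib>
 #include <cstring>

 class SmallString {                     // hypothetical class, not OCString
   public:
     SmallString (const char* s) : length_(std::strlen(s))
     {
       char* where = (length_ < sizeof(rep_.cache))
                       ? rep_.cache                              // fits: no heap
                       : (rep_.pointer = (char*)std::malloc(length_ + 1));
       std::memcpy(where, s, length_ + 1);                       // copy incl. '\0'
     }
     ~SmallString ()
     { if (length_ >= sizeof(rep_.cache)) std::free(rep_.pointer); }
     const char* c_str () const
     { return (length_ < sizeof(rep_.cache)) ? rep_.cache : rep_.pointer; }
   private:
     SmallString (const SmallString&);   // copying omitted for brevity
     union {                             // re-use the same bytes (unions)
       char  cache[32];                  // lookaside cache, no heap needed
       char* pointer;                    // heap storage for long strings
     } rep_;
     std::size_t length_;
 };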
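Finally, a minimal sketch (illustrative, not OC code) of the group-allocation idea: a per-instance pool that goes to the heap for 8 nodes at a time and serves later requests from its own free list, with no locking because the pool is never shared between instances. PoolNode and NodePool are hypothetical names.

 // Illustrative only: per-instance group allocation, 8 nodes per trip to the heap.
 struct PoolNode { int key; int value; PoolNode* next; };

 class NodePool {                        // hypothetical helper, not an OC class
   public:
     NodePool () : freeList_(0) { }
     PoolNode* grab ()
     {
       if (!freeList_) {                 // empty: one heap call buys 8 nodes
         PoolNode* chunk = new PoolNode[8];
         for (int i = 0; i < 8; ++i) {
           chunk[i].next = freeList_;
           freeList_ = &chunk[i];
         }
       }
       PoolNode* n = freeList_;          // usual case: no heap, no lock
       freeList_ = n->next;
       return n;
     }
     void giveBack (PoolNode* n) { n->next = freeList_; freeList_ = n; }
   private:
     // Chunk bookkeeping for the destructor is omitted for brevity.
     PoolNode* freeList_;                // local to this instance: no thread-safety needed
 };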
Python Emulation Tools
The OpenContainers library also includes tools to make the C++ experience more like the Python experience, with the feel of dynamic typing using standard datatypes and simple syntax. The key to this is the trinity of the Val, Tab and Arr datatypes. A Tab looks and feels like a Python dictionary (or an STL map), Arrs look and feel like Python lists (or an STL vector), and Vals are the recursive, heterogeneous containers that make everything work together. See the example below.
----
Figure 1.1: Example
 Tab t = "{ 'key1':1, 'key2':2 }";  // like Python dictionary
 Arr a = "[1, 2.2, 'three']";       // like Python list
 Val v = t["key1"];                 // lookup in a dictionary
 a.append(v);                       // append to end of a list

 Val any = Tab();                   // Heterogeneous container
 any[1] = t;                        // .. contains a Tab
 any[2] = "hello";                  // .. contains a string
 any["three"] = a;                  // .. contains an Arr
For more examples, see the OpenContainers documentation.
OpenContainers also has a simple tool for pickling (Python's nomenclature for serializing data to disk, sockets, etc.) that is compatible with Python, so OC and Python can communicate easily.
The Boost library also has some similar Python functionality, but sacrifices simplicity for generality.
External links
* [http://www.amalgama.us/oc.html Find the source]
* [http://www.open-std.org/jtc1/sc22/wg21/ C++ Standard Template Library Standard]
* [http://www.boost.org Boost Library]