Genezzo - Design and Architecture for Extensibility

Author: Jeffrey Cohen
Email: jcohen@genezzo.com
Project homepage: www.genezzo.com
$Revision: 2.0 $
$Date: 2007/06/08 05:03:21 $

Introduction

Genezzo is an open source SQL database which is written in Perl and runs on a variety of platforms. Genezzo is distinguished from other databases by its uniquely modular and extensible design, which is motivated by a series of operational and architectural goals.

Operational Goals

Scalable: Cheap commodity hardware
Easy Manageability
- Portable
  - single code base
  - portable data files (no byte-order or numeric data type issues)
  - Unicode
- simple installation
- self-tuning
- automatic scalability
- web console
Reliable
- Transaction support (ACID)
- Zero downtime
  - Clustering for high-availability
  - Hot upgrade and hot backup

Architectural Goals

Openness
Flexibility
Extensibility
Support for parallel and distributed operations

The initial release of Genezzo is a very simple database microkernel. While the Genezzo prototype does not satisfy all of the operational goals, the design was guided by the architectural goals of flexibility and extensibility. Because of this emphasis, new capabilities can be added to the prototype to extend the basic features to fulfill the operational goals. Developers may also write packages that extend Genezzo to suit their particular needs.

Architectural Influences

Unix: Genezzo follows the idea of a simple kernel, with additional functionality supplied as user packages.
Emacs: Like the venerable text editor, Genezzo supports "hooking" additional system functionality into the base code.
Aspect-Oriented Programming: While Genezzo isn't programmed in an aspect language, it does use declarative techniques (versus embedded code) to tie extension modules to the base code.
Subsumption Architectures: Robustness is maintained in Genezzo by enforcing simple, correct behaviors in the base code, and layering new behaviors on top.

Overview

Basic Design

Genezzo has a SQL parser and query execution mechanism plus persistent data storage. SQL tables are implemented as a hierarchy of Perl hashes which map into fixed-size blocks of data (4K bytes by default). The blocksize is fixed when the database is created. Data blocks can be stored in conventional files or on raw devices. By default, Genezzo runs in a single, fixed-size data file, but it can be configured to automatically grow the filesize and allocate additional files as necessary. Files may be grouped together as tablespaces. A SQL table or index is stored in exactly one tablespace. A table can have rows that are larger than a single block -- these rows are said to span multiple blocks.

Extensibility

The Havok subsystem is used to extend Genezzo. Havok extensions are organized into modules (similar to CPAN modules or Debian Apt-get packages) which can replace existing Genezzo functions or add new capabilities. Each Havok module has a metadata file which describes the module and its dependencies. Genezzo has a SQL function called HavokUse which uses these metadata files to load Havok modules. Currently, the database must be restarted after the module is loaded, but Havok will be extended to support dynamic load and unload for most modules.

Examples

SQL functions

Many SQL implementations allow user-defined functions, a way for users to add their own functions which can be evaluated as part of a SQL statement. In the Genezzo implementation of SQL, the only "built-in" SQL function is HavokUse. The rest of the standard SQL functions are defined in a Havok module which is loaded as part of database initialization.

Clustering

The most ambitious Havok module to date is Eric Rollins' Genezzo::Contrib::Clustered, which converts an existing single-user Genezzo database into a clustered, multi-process, multi-server database with locking and transaction support. This module is layered on top of the base code using only half a dozen "hooks" into the Genezzo buffer cache.

Space Management

All space management in Genezzo is done using fixed-size data blocks. The allocator usually returns a set of multiple, contiguous blocks which is referred to as an extent. In each file, all the extents for a particular database object, such as a table, are collectively called a segment. The base Genezzo space management is a simple, serial extent allocator which tracks a single highwater mark per segment. An extent is used until it runs out of space, at which point a new extent is allocated. Rather than replace the existing code, a new Havok module is under construction that adds more sophisticated allocation algorithms, like maintaining multiple lists of open extents for parallel updates, and more block usage statistics stored as metadata so empty extents can be re-used. This approach has several advantages:

Robustness: Since the old code is still running underneath, the new space management cannot overwrite or corrupt blocks that are still in use.
Hot Upgrade: The new space management algorithms can be applied to existing Genezzo databases, since they rely on block metadata rather than on changes to the basic block or extent formats.
Reliable: If the new algorithms are disabled, the database continues to run using the default, less-sophisticated algorithm.
Flexible: Additional algorithms can be developed and tested easily.
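
The base allocation scheme described above can be illustrated with a minimal sketch. It is not the Genezzo allocator: the function names, the four-block extent size, and the in-memory bookkeeping are all assumptions made for illustration. It only shows a serial allocator that keeps one high-water mark per segment and starts a new extent when the current one is full.

```perl
use strict;
use warnings;

# Hypothetical sketch: a serial extent allocator that keeps one
# high-water mark per segment. None of these names are Genezzo APIs.
my $EXTENT_BLOCKS = 4;    # blocks per extent (assumed value)
my $next_free     = 0;    # next unallocated block number in the file
my %segment;              # segment name => { extents => [...], hwm => n }

# Hand out the next run of contiguous blocks as a new extent.
sub allocate_extent {
    my ($seg) = @_;
    my $extent = { first => $next_free, count => $EXTENT_BLOCKS };
    $next_free += $EXTENT_BLOCKS;
    push @{ $segment{$seg}{extents} }, $extent;
    $segment{$seg}{hwm} = 0;    # nothing used in the new extent yet
    return $extent;
}

# Return the next block for a segment, allocating a new extent only
# when the current one is exhausted.
sub next_block {
    my ($seg) = @_;
    my $s      = $segment{$seg};
    my $extent = $s ? $s->{extents}[-1] : undef;
    if (!$extent || $s->{hwm} >= $extent->{count}) {
        $extent = allocate_extent($seg);
        $s      = $segment{$seg};
    }
    return $extent->{first} + $s->{hwm}++;
}

my @t1 = map { next_block('t1') } 1 .. 5;    # spills into a second extent
my @t2 = map { next_block('t2') } 1 .. 2;
print "t1: @t1\n";
print "t2: @t2\n";
```

In this sketch the fifth block requested for t1 forces a second extent, so the first extent for t2 starts after it. A more sophisticated module could layer open-extent lists on top without changing how blocks themselves are laid out.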

Special Features for Extensibility

Havok modules can take advantage of special features of Genezzo that let them modify persistent data structures, as well as dynamic data structures used by the running program.

Persistent State

Because Genezzo is a database, it is only natural that some Havok modules make changes to persistent state. Genezzo has a variety of extensible persistent storage mechanisms:

- Dictionary Tables
- File Headers
- Block Metadata
- Row Metadata

Dictionary Tables

When a Genezzo database is initialized, it creates about a dozen tables that form the data dictionary. These tables define and describe the database layout, the table definitions, etc. For example, the _pref1 table holds key/value pairs that are used for database initialization. Havok modules may add new parameters to the _pref1 table, query or modify other existing dictionary tables, or create, update, and query their own tables.

File Headers

Every Genezzo database file starts with a fileheader which contains key/value pairs that list the database version, the blocksize, and other essential information. The test Prefs1.t shows some of the APIs which are used to query and update the fileheader. The fileheader parameters are useful because they can be queried and modified before the database is started up. For settings that are only needed once the database is running, _pref1 parameters are a better choice.

Block Metadata

The basic unit of storage in Genezzo is a fixed-size database block which is defined by the RDBlock class. RDBlock is implemented as a tied hash, so the contents of the database block are manipulated using the familiar Perl hash interface. In addition to the standard interface, RDBlock supports special metadata entries. While these entries consume space in the block, they are inaccessible and invisible using the standard hash interface.
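
The idea of a tied hash with hidden metadata entries can be sketched as follows. This is not the real RDBlock: the class name BlockSketch and the set_meta and get_meta methods are assumptions, and the storage is an ordinary in-memory hash rather than a packed block. The point is only that the standard hash interface sees data rows, while metadata is reachable solely through the object.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the real RDBlock: a tied hash that keeps
# metadata entries out of sight of the ordinary hash interface.
{
    package BlockSketch;

    sub TIEHASH {
        my ($class) = @_;
        return bless { rows => {}, meta => {} }, $class;
    }
    sub STORE    { $_[0]{rows}{ $_[1] } = $_[2] }
    sub FETCH    { $_[0]{rows}{ $_[1] } }
    sub EXISTS   { exists $_[0]{rows}{ $_[1] } }
    sub DELETE   { delete $_[0]{rows}{ $_[1] } }
    sub CLEAR    { %{ $_[0]{rows} } = () }
    sub FIRSTKEY { my $reset = keys %{ $_[0]{rows} }; each %{ $_[0]{rows} } }
    sub NEXTKEY  { each %{ $_[0]{rows} } }

    # Metadata is reachable only through explicit methods (assumed names).
    sub set_meta { $_[0]{meta}{ $_[1] } = $_[2] }
    sub get_meta { $_[0]{meta}{ $_[1] } }
}

my $block = tie my %rows, 'BlockSketch';
$rows{1} = 'alice';
$rows{2} = 'bob';
$block->set_meta(txn_status => 'committed');

my @visible = sort keys %rows;
print "visible keys: @visible\n";
print "metadata: ", $block->get_meta('txn_status'), "\n";
```

Code that iterates over %rows never encounters the txn_status entry, which mirrors the way block metadata stays invisible to callers that only use the hash interface.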

Metadata rows should only be used to store information that is directly related to the set of data rows in the current block. For example, the cluster code uses metadata rows to track the transaction status of database blocks. A dictionary table is a more appropriate location for "general" metadata about the contents of a table.

B-tree indexes use the block metadata to define a block traversal order -- each block has metadata rows that point to its children and siblings in the tree.

Row Metadata

The Genezzo utilities module contains several functions for packing and unpacking SQL rows or arrays of data to and from a byte string storage format. The PackRow2 function has the provision to add a single metadata column to a row. Currently, this column is used to support rows that span multiple blocks -- the extra column is a chaining pointer that identifies the next piece of the row.
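
A rough sketch of a chaining pointer carried in a metadata column follows. The byte layout, the function names pack_row_sketch and unpack_row_sketch, and the "block 7, row 3" address string are assumptions for illustration; the actual PackRow2 format is not shown here.

```perl
use strict;
use warnings;

# Hypothetical format, not the real PackRow2 layout: a one-byte flag
# says whether a metadata column comes first, then every column is
# stored as a 4-byte length followed by its bytes.
sub pack_row_sketch {
    my ($cols, $meta) = @_;
    my @fields = defined $meta ? ($meta, @$cols) : @$cols;
    return pack('C', defined $meta ? 1 : 0)
         . join('', map { pack('N/a*', $_) } @fields);
}

sub unpack_row_sketch {
    my ($bytes) = @_;
    my $has_meta = unpack('C', $bytes);
    my @fields;
    my $pos = 1;
    while ($pos < length $bytes) {
        my $len = unpack('N', substr($bytes, $pos, 4));
        push @fields, substr($bytes, $pos + 4, $len);
        $pos += 4 + $len;
    }
    my $meta = $has_meta ? shift @fields : undef;
    return (\@fields, $meta);
}

# One oversized value stored as two row pieces; the metadata column of
# the first piece names the location of the second.
my $head  = pack_row_sketch(['first half, '], 'block 7, row 3');
my $tail  = pack_row_sketch(['second half']);
my %store = ('block 7, row 3' => $tail);

my ($cols, $next) = unpack_row_sketch($head);
my $value = $cols->[0];
while (defined $next) {
    ($cols, $next) = unpack_row_sketch($store{$next});
    $value .= $cols->[0];
}
print "$value\n";
```

The reader follows the pointer until it reaches a piece with no metadata column, at which point the full row has been reassembled.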

Dynamic State

Genezzo supports a number of extensions that allow for the execution of additional code at certain points in the program and for changes to run-time data structures:

- Command-Line Parameters
- Havok: User Functions and Hooks
- Open Classes
- MailBag

Command-Line Parameters

The line-mode tool gendba.pl supports a -define parameter which takes a key=value pair as an argument. Some valid keys are dbsize, blocksize, and force_init_db. If the user supplies parameters with unknown keys during database initialization, Genezzo will automatically add these values to the _pref1 dictionary table. If the database is already initialized, the _pref1 table is not updated, but Havok modules can use a dictionary API to view the current command-line definitions.
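
The handling of known and unknown keys can be sketched as below. This is an illustration only, not the gendba.pl option parser: the split_defines function, the my_module_mode key, and the sample values are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch of the key=value handling described above; the
# real gendba.pl option parsing may differ.
my %known = map { $_ => 1 } qw(dbsize blocksize force_init_db);

sub split_defines {
    my (@defines) = @_;
    my (%builtin, %extra);
    for my $def (@defines) {
        my ($key, $value) = split /=/, $def, 2;
        next unless defined $value;    # ignore malformed pairs
        if ($known{$key}) { $builtin{$key} = $value }
        else              { $extra{$key}   = $value }
    }
    return (\%builtin, \%extra);
}

# "my_module_mode" is an invented key standing in for a parameter that
# a Havok module might expect to find in _pref1.
my ($builtin, $extra) = split_defines('blocksize=8192', 'my_module_mode=fast');
print "built-in: $_=$builtin->{$_}\n" for sort keys %$builtin;
print "for _pref1: $_=$extra->{$_}\n" for sort keys %$extra;
```

During initialization the second group would be the candidate set of rows for _pref1; on an already initialized database it would only be held in memory for modules to inspect.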

Havok: User Functions and Hooks

The Havok subsystem loads additional modules that change and extend the behavior of Genezzo. Genezzo supplies two "top-level" Havok modules, UserFunctions.pm and SysHook.pm, which are designed to load specific subclasses of Havok modules. Currently, changes to Havok modules require a restart to take effect, but future versions of Havok will support dynamic loading when possible, so Havok capabilities can be enabled and disabled while the database is running.

The UserFunctions module lets developers add new SQL functions to Genezzo. It provides a basic mechanism to import Perl functions from other packages into the Genezzo namespace. Future versions of UserFunctions will support parse-time type-checking of function arguments. Genezzo uses this package to load all of the standard SQL functions.
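
Importing a Perl function from one package into another can be done by assigning a code reference to a symbol-table glob. The sketch below shows that general Perl technique; the package names and the import_function helper are made up, and the actual UserFunctions mechanism may work differently.

```perl
use strict;
use warnings;

# A package standing in for a developer's function library (made up).
{
    package My::MathFuncs;
    sub square { return $_[0] * $_[0] }
}

# Hypothetical helper: alias a function from one package into another
# by assigning a code reference to a symbol-table glob.
sub import_function {
    my ($from, $name, $into) = @_;
    no strict 'refs';
    my $code = \&{"${from}::${name}"};
    *{"${into}::${name}"} = $code;
    return;
}

import_function('My::MathFuncs', 'square', 'Sketch::SQLFuncs');
my $result = Sketch::SQLFuncs::square(7);
print "square(7) = $result\n";
```

After the import, callers in the target namespace can invoke the function without knowing which package originally defined it.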

The SysHook module lets developers replace existing system code or add new functions at well-defined locations in the base code. This functionality is similar to Emacs hooks or an aspect supplying advice at join points, though there are some crucial differences.

In the case of Emacs, "normal" hooks do not take arguments, they typically ignore the return status of other hooks if multiple hooks are chained, and they are indifferent to their position in the chain. Most Genezzo hooks do take arguments, and they should note the error status of other hooks in the chain to avoid propagating errors and corrupting data. Also, the execution order of hooks is likely to be very important. For example, a chain of hooks that is activated after a buffer read might decompress the buffer and then decrypt its contents. The corresponding buffer write must be preceded by a set of hooks performing complementary operations: an encryption followed by compression.

For the case of aspects, the typical notions of "cross-cutting concerns" are features like error handling or logging, which are common to multiple modules. For Genezzo SysHooks, however, a developer can construct a new module that binds hooks to several disparate, unrelated methods in multiple locations to define new functionality. The hooked routines in the base code become "friends" of a new SysHook class.
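
The ordering and error-status concerns for chained hooks can be made concrete with a small sketch. It does not use the SysHook API: run_chain and the four hook functions are assumptions, and the "encryption" (byte reversal) and "compression" (a "Z:" marker) are toy stand-ins chosen so that a wrong order fails visibly.

```perl
use strict;
use warnings;

# Toy stand-ins for real transforms (assumptions for illustration):
# "encryption" reverses the bytes, "compression" adds a "Z:" marker.
sub encrypt    { my ($buf) = @_; return (1, scalar reverse $buf) }
sub decrypt    { my ($buf) = @_; return (1, scalar reverse $buf) }
sub compress   { my ($buf) = @_; return (1, "Z:$buf") }
sub decompress {
    my ($buf) = @_;
    return (0, $buf) unless $buf =~ s/^Z://;    # marker missing: report failure
    return (1, $buf);
}

# Run hooks in order, passing the buffer along; stop at the first
# failure so a later hook never sees a bad buffer.
sub run_chain {
    my ($buf, @hooks) = @_;
    for my $hook (@hooks) {
        (my $ok, $buf) = $hook->($buf);
        return (0, $buf) unless $ok;
    }
    return (1, $buf);
}

my @before_write = (\&encrypt, \&compress);
my @after_read   = (\&decompress, \&decrypt);

my (undef, $on_disk) = run_chain('row data', @before_write);
my ($ok, $in_memory) = run_chain($on_disk, @after_read);
my ($bad_ok)         = run_chain($on_disk, reverse @after_read);

print "on disk:    $on_disk\n";
print "round trip: $in_memory\n";
print "wrong order succeeded? ", ($bad_ok ? 'yes' : 'no'), "\n";
```

The read chain is the write chain inverted, and running the read hooks in the wrong order is caught by the status check rather than silently producing a corrupt buffer.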

The current SysHook implementation has some deficiencies compared to aspect-oriented languages like AspectJ. In AspectJ, a developer can declare pre or post hooks on any function, but SysHook requires an if exists(hook) then &hook() code stub in the function. However, Damian Conway's Hook::LexWrap module does provide pre/post hook functionality on arbitrary Perl functions, so SysHook may be adapted to use this technique. Also, an aspect declaration can use a regexp to match a set of functions, while in Genezzo, each hook must explicitly declare the function name. It is feasible to extend Havok so it can examine the symbol table and use a regexp and/or SQL query-type mechanism to associate multiple functions with a hook.
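
The kind of code stub mentioned above might look like the following sketch. The %Hook registry, the read_block routine, and the post_read_block hook name are assumptions for illustration and are not taken from the Genezzo source.

```perl
use strict;
use warnings;

our %Hook;    # hook name => code reference (hypothetical registry)
my @log;

# Base routine with the kind of stub described above: the hook runs
# only if something has registered under the expected name.
sub read_block {
    my ($blocknum) = @_;
    my $buf = "contents of block $blocknum";
    if (exists $Hook{post_read_block}) {
        $buf = $Hook{post_read_block}->($blocknum, $buf);
    }
    return $buf;
}

my $plain = read_block(1);    # no hook installed yet

$Hook{post_read_block} = sub {
    my ($blocknum, $buf) = @_;
    push @log, "read $blocknum";
    return uc $buf;
};
my $hooked = read_block(2);
print "$plain\n$hooked\n";
```

The cost of this approach is that every hookable routine must contain such a stub, which is the limitation that a wrapper-based technique would remove.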

Open Classes

In a similar vein to AspectJ inter-type declarations, Genezzo developers can add new members and methods to existing classes. Adding a new method is simple -- programmers can use the standard Perl eval to create a new function in the namespace of a specified package. Since Genezzo classes are constructed from Perl hashes, adding new members dynamically is trivial, so the only challenge is to use a consistent naming scheme. Genezzo already reserves a Contrib subdirectory for community contributions, e.g., the cluster code is stored in CPAN at Genezzo::Contrib::Clustered. Similarly, a Genezzo class will maintain a Contrib entry like:
$foo->{Contrib}.
New elements should be named according to the CPAN package: e.g.
$foo->{Contrib}->{Clustered}.
An alternate style is to use the full package name as the key:
$foo->{Contrib}->{Genezzo::Contrib::Clustered},
which is mainly intended for packages that are not under Contrib. With this approach, a Havok module can store private state which is associated with a particular instantiation of a class in the Genezzo base code. This state can serve as a communication channel, a technique known as stigmergy.
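
Both halves of the open-class technique, adding a method with eval and keeping private state under the Contrib key, can be sketched together. The class Sketch::BufferCache, the lock_count method, and the locks field are assumptions; only the $foo->{Contrib}->{Clustered} naming convention comes from the text above.

```perl
use strict;
use warnings;

# Stand-in for a base-code class built on a hash (made-up name).
{
    package Sketch::BufferCache;
    sub new { my ($class) = @_; return bless { blocks => {} }, $class }
}

# Add a method to the existing package at run time with string eval.
eval q{
    package Sketch::BufferCache;
    sub lock_count {
        my ($self) = @_;
        return $self->{Contrib}{Clustered}{locks} || 0;
    }
    1;
} or die "could not add method: $@";

my $cache  = Sketch::BufferCache->new;
my $before = $cache->lock_count;            # no private state yet
$cache->{Contrib}{Clustered}{locks} = 3;    # the extension's private slot
my $after  = $cache->lock_count;
print "before=$before after=$after\n";
```

Because the state lives on the instance, each object of the base class carries its own copy, and two extensions that follow the naming convention do not collide.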

MailBag

While techniques like SysHook and Open Classes allow developers to change the overall behavior of a class, Genezzo also supports a special MailBag argument which lets developers craft changes which specifically target the class constructors in a particular control flow or dynamic scope.

Background

Perl function calls take a general list of arguments, similar to C varargs. With the exception of a limited function prototype syntax provided by the compiler, functions must perform their own checks on the validity of their function arguments. Most functions in Genezzo follow a standard Perl design pattern which uses the convention that the argument list is a set of named, position-independent values, e.g. the code
return table_func(tablename=>"foo", column=>5);
describes a call to the function table_func with the named arguments tablename and column taking the values "foo" and 5, respectively. Typically, some values are mandatory and some are optional. When it is desirable to pass values to a function which is deeply nested beneath the caller, the function may follow the convention that "extra" arguments (that is, arguments that are not from the set of known mandatory and optional arguments) are passed along to any functions beneath the current routine. This practice is quite flexible, but the resulting code is less comprehensible, since the knowledge of what the arguments are and where they are defined becomes obscured, and there is the potential for collision and ambiguity in the argument names.

To mitigate these problems, Genezzo introduces the convention of a MailBag argument, an argument that contains multiple parameters for different recipients, which is specifically intended to be passed down the function call chain. Note that the Perl Aspect module has a Wormhole package which provides somewhat equivalent functionality.
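
The "extra arguments" convention that the MailBag is meant to tidy up can be sketched as follows. The functions open_table and open_file are invented for illustration; the 4096 default simply echoes the 4K default blocksize.

```perl
use strict;
use warnings;

# Innermost routine: understands only "blocksize".
sub open_file {
    my %args = (blocksize => 4096, @_);
    return "blocksize=$args{blocksize}";
}

# Middle routine: consumes the arguments it knows and forwards the
# "extra" ones to the routine beneath it.
sub open_table {
    my %args = @_;
    my $tablename = delete $args{tablename}
        or die "tablename is mandatory";
    my $column = delete $args{column};    # optional
    return "table=$tablename " . open_file(%args);
}

my $desc = open_table(tablename => "foo", column => 5, blocksize => 8192);
print "$desc\n";
```

Nothing in open_table documents that blocksize is meaningful, which is the loss of clarity described above; a single, explicitly named carrier argument makes the pass-through visible.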

Usage

In its current usage, the MailBag is utilized by class constructors and/or initializers, so it is only propagated along function call chains where new classes are instantiated. The MailBag is "loaded" with messages which contain a named sender and intended recipient (using the Perl package names). A function can call the CheckMail method on the MailBag to see if it has any messages which match its address. The actual contents of the message and the associated recipient behaviors are open-ended, but some standards will probably evolve. The MailBag argument lets classes communicate with other classes that fall within their dynamic scope (as opposed to the more conventional notion of lexical scope).
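
A minimal sketch of this message-passing pattern follows. Only the CheckMail name comes from the description above; its signature here, the Send method, the From/To/Msg keys, and all of the Sketch:: package names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical MailBag; only the CheckMail name comes from the text,
# and its real signature may differ.
{
    package Sketch::MailBag;
    sub new { my ($class) = @_; return bless { mail => [] }, $class }

    sub Send {
        my ($self, %msg) = @_;    # expects From, To, Msg
        push @{ $self->{mail} }, { %msg };
        return $self;
    }

    # Return every message addressed to the given package name.
    sub CheckMail {
        my ($self, $address) = @_;
        return grep { $_->{To} eq $address } @{ $self->{mail} };
    }
}

{
    package Sketch::Block;

    # The constructor looks for mail addressed to its own package.
    sub new {
        my ($class, %args) = @_;
        my $self = bless { notes => [] }, $class;
        if (my $bag = $args{MailBag}) {
            $self->{notes} = [ map { $_->{Msg} } $bag->CheckMail($class) ];
        }
        return $self;
    }
}

my $bag = Sketch::MailBag->new;
$bag->Send(From => 'Sketch::Cache', To => 'Sketch::Block', Msg => 'buffer 12');
$bag->Send(From => 'Sketch::Cache', To => 'Sketch::Other', Msg => 'ignore me');

my $block = Sketch::Block->new(MailBag => $bag);
print "notes: @{ $block->{notes} }\n";
```

Intermediate routines only need to pass the MailBag argument along unchanged; they never have to know which recipients exist or what the messages contain.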

Examples

The primary motivation for MailBag was the need to coordinate the actions of space management with block metadata. The buffer cache treats database blocks as raw byte buffers, but certain Havok modules, like Genezzo::Contrib::Clustered, need to update block metadata when blocks are read, written, or updated. The MailBag argument lets the buffer cache establish an association with the instance of an RDBlock class that is created for each raw byte buffer.

Conclusion

Genezzo supports a variety of mechanisms which let developers construct novel extensions to the base functionality. The data formats are designed for long-term compatibility, while allowing the easy addition of future technologies and techniques as yet undeveloped. Its highly-adaptable design means that the basic architecture is suitable for simple, single-user installations or large clusters.