Monday, June 28, 2010

KiokuDB for DBIC Users

This is the top loaded tl;dr version of the previous post on KiokuDB+DBIC, optimized for current DBIx::Class users who are also KiokuDB non-believers ;-)

If you feel you know the answer to an <h2>, feel free to skip it.

WTF KiokuDB?

KiokuDB implements persistent object graphs. It works at the same layer as an ORM in that it maps between an in memory representation of objects and a persistent one.

Unlike an ORM, where the focus is to faithfully map between relational schemas and an object oriented representation, KiokuDB's main priority is to allow you to store objects freely with as few restrictions as possible.

KiokuDB provides a different trade-off than ORMs.

By compromising control over the precise storage details you gain the ability to easily store almost any data structure you can create in memory.[1]

Why should I care?

Here's a concrete example.

Suppose you have a web application with several types of browsable model objects (e.g. pictures, user profiles, whatever), all of which users can mark as favourites so they can quickly find them later.

In a relational schema you'd need to query a link table for each possible type, and also take care of setting these up in the schema. When marking an item as a favourite you'd need to check what type it is, and add it to the correct relationship.

Every time you add a new item type you also need to edit the favourite management code to support that new item.

On the other hand, a KiokuDB::Set of items can simply contain a mixed set of items of any type. There's no setup or configuration, and you don't have to predeclare anything. This eliminates a lot of boilerplate.

Simply add a favourite_items KiokuDB column to the user, which contains that set, and use it like this:

# mark an item as a favourite
# $object can be a DBIC row or a KiokuDB object
$user->favourite_items->insert($object);
$user->update;

# get the list of favourites:
my @favs = $user->favourite_items->members;
 
# check if an item is a favourite:
if ( $user->favourite_items->includes($object) ) {
    ...
}

As a bonus, since there's less boilerplate this code can be more generic/reusable.

How do I use it?

First off, at least skim through KiokuDB::Tutorial to familiarize yourself with the basic usage.

In the context of this article you can think of KiokuDB as a DBIC component that adds OODBMS features to your relational schema, as a sort of auxiliary data dumpster.

To start mixing KiokuDB objects into your DBIC schema, create a column that can contain these objects using DBIx::Class::Schema::KiokuDB:

package MyApp::Schema::Result::Foo;
use base qw(DBIx::Class::Core);

__PACKAGE__->load_components(qw(KiokuDB));

__PACKAGE__->kiokudb_column('object');

See the documentation for the rest of the boilerplate, including how to get the $kiokudb handle used in the examples below.

In this column you can now store an object of any class. This is like a delegation based approach to a problem typically solved using something like DBIx::Class::DynamicSubclass.

my $rs = $schema->resultset("Foo");

my $row = $rs->find($primary_key);

$row->object( SomeClass->new( ... ) );

# 'store' is a convenience method: it's like insert_or_update,
# but it also stores $row->object in KiokuDB
$row->store;

You can go the other way, too:

my $obj = SomeClass->new(
    some_delegate => $row,
);

my $id = $kiokudb->insert($obj);

And it even works for storing result sets:

use Foo;

my $rs = $schema->resultset("Foo")->search( ... );

my $obj = Foo->new(
    some_resultset => $rs,
);

my $id = $kiokudb->insert($obj);

So you can freely model ad-hoc relationships to your liking.

Mixing and matching KiokuDB and DBIC still lets you obsess over the storage details like you're used to with DBIC.

However, the key idea here is that you don't need to do that all the time.

For example, you can rapidly prototype a schema change before writing the full relational model for it in a final version.

Or maybe you need to preserve an intricate in-memory data structure (one with cycles, tied structures, or closures).

Or perhaps for some parts of the schema you simply don't need to search/sort/aggregate. You will probably discover parts of your schema are inherently a good fit for graph based storage.

KiokuDB complements DBIC well in all of those areas.

How is KiokuDB different?

There are two main things that traditional ORMs don't do easily, but that KiokuDB does.

First, collections of objects in KiokuDB can be heterogeneous.

At the representation level the lowest common denominator for any two arbitrary objects might be nothing at all. This makes it hard to store objects of different types in the same relational table.

In object oriented design it's the interface that matters, not the representation. Conversely, in a relational database only the representation (the columns) matters; database rows have no interfaces.
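To make that concrete, here's a minimal plain Perl sketch (no KiokuDB or DBIC involved). The Picture and UserProfile classes and their fields are hypothetical, made up for illustration. They share nothing at the representation level, yet they sit happily in one collection because both answer to the same title method:

```perl
use strict;
use warnings;

# Two hypothetical model classes with unrelated fields (no common "columns").
package Picture {
    sub new   { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub title { return "Picture: " . $_[0]{filename} }
}

package UserProfile {
    sub new   { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub title { return "Profile: " . $_[0]{nick} }
}

# One collection, mixed types. Only the shared interface (title) matters.
my @favourites = (
    Picture->new( filename => "cat.jpg", width => 640, height => 480 ),
    UserProfile->new( nick => "somebody" ),
);

my @titles = map { $_->title } @favourites;
print "$_\n" for @titles;
```

Mapping @favourites onto a single relational table would mean inventing columns that fit both classes, which is exactly the lowest common denominator problem.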

Second, in a graph based object database the key of an object in the database should only be associated with a single object in memory, but in an ORM this feature isn't necessarily desirable:

  • It doesn't interact well with bulk fetches (for instance suppose a SELECT query fetches a collection of objects, some of which are already in memory. Should the fetched data be ignored? Should the primary keys of the already live objects be filtered out of the query?)
  • It requires additional APIs to control this tracking behavior (KiokuDB's new_scope stuff)

In the interests of flexibility and simplicity, DBIx::Class simply stays out of the way as far as managing inflated objects goes (the one exception being prefetched and cached resultsets). Whenever a query is issued you get fresh objects every time.

KiokuDB does track references and provides a stable mapping between reference addresses and primary keys for the subset of objects that it manages.
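The idea can be sketched in a few lines of core Perl. This is a toy, not KiokuDB's actual implementation: the ToyLiveObjects class and its method names are hypothetical, and the real live object tracking (scopes, entries, cleanup) is considerably more involved. It only shows the two-way mapping between reference addresses and IDs, held with weak references so the registry doesn't keep objects alive:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical toy registry mapping IDs <-> live objects.
package ToyLiveObjects {
    use Scalar::Util qw(refaddr weaken);

    sub new { return bless { by_id => {}, by_addr => {} }, shift }

    sub register {
        my ( $self, $id, $object ) = @_;
        $self->{by_addr}{ refaddr($object) } = $id;
        $self->{by_id}{$id} = $object;
        # Weak reference: the registry must not keep the object alive.
        weaken( $self->{by_id}{$id} );
        return $object;
    }

    # A real implementation would also drop the by_addr entry when the
    # object is destroyed, since addresses can be reused. Omitted here.
    sub id_to_object { return $_[0]{by_id}{ $_[1] } }
    sub object_to_id { return $_[0]{by_addr}{ refaddr( $_[1] ) } }
}

my $live = ToyLiveObjects->new;
my $obj  = { name => "foo" };
$live->register( "some-id", $obj );

# Looking up the same ID yields the very same reference, not a copy.
my $again = $live->id_to_object("some-id");
print "same object\n" if refaddr($again) == refaddr($obj);
print "id: ", $live->object_to_id($obj), "\n";
```

With a plain ORM query there is no such registry, so two fetches of the same row give you two unrelated objects.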

What sucks about KiokuDB?

It's harder to search, sort and aggregate KiokuDB objects. But you already know a good ORM that can do those bits ;-)

By letting the storage layer in on your object representation you allow the database to help you in ways that it can't if the data is opaque.

Of course, this is precisely where it makes sense to just create a relational table, because DBIx::Class does those things very well.

Why now?

Previously you could use KiokuDB and DBIx::Class in the same application, but the data was kept separate.

Starting with KiokuDB::Backend::DBI version 1.11 you can store part of your model as relational data using DBIx::Class and the rest in KiokuDB.

[1] You still get full control over serialization if you want, using KiokuDB::TypeMap, but that is completely optional. Most of the time there's no point in doing that anyway, since you already know how to do that with other tools.

Sunday, June 27, 2010

KiokuDB 0.46

rafl and I have just uploaded KiokuDB::Backend::DBI version 1.11 and KiokuDB version 0.46.

These are major releases of both modules, and I will post at length on each of these new features in the coming days:

  • Caching live instances of immutable objects. For data models which favour immutability this should provide significant speedups with minimal code changes and no change in semantics.
  • Leak tracking is now in core. This was previously only available in Catalyst::Model::KiokuDB.
  • KiokuDB::Entry objects can be discarded after use to save memory (until now they were always kept around for as long as the object was still live)
  • Integration between KiokuDB managed objects and DBIx::Class managed rows, allowing for mixed relational/graph schemas as in this job queue example.

Friday, June 18, 2010

I hate software

A long standing bug in Directory::Transactional has finally been fixed.

Evidently, universally unique identifiers are only unique as long as the entire universe is contained within a single UNIX process, at least as far as e2fsprogs' libuuid is concerned.

These "unique" strings were used to create names for transaction work directories, so when they in fact turned out to be the same fucking strings across forks, the two processes would overwrite each other's private data.

The uuid(3) man page doesn't even contain any information on how to reseed the generator, even if I were to bother checking for forks myself.

I simply cannot fathom how a pseudorandom number generator is being used for such a library without taking forking into account. Isn't this stuff supposed to be reliable?
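For the curious, here's a minimal sketch of the failure mode. It uses Perl's built-in rand as a stand-in for the library's generator (an assumption for illustration, this is not libuuid's code), and the naive_name and fork_safe_name helpers are hypothetical, not what Directory::Transactional actually does. A PRNG seeded before the fork hands both processes the same sequence, and mixing in the PID is one simple way to keep the names apart:

```perl
use strict;
use warnings;

# Seed once in the parent, like a library with process-wide PRNG state.
srand(42);

# Hypothetical helper: a "unique" name built only from PRNG output.
sub naive_name { return sprintf "txn-%08x", int( rand( 2**32 ) ) }

# Hypothetical helper: mixing in the PID keeps names distinct across forks.
sub fork_safe_name { return sprintf "txn-%d-%08x", $$, int( rand( 2**32 ) ) }

pipe( my $reader, my $writer ) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: generate both names and report them to the parent.
    close $reader;
    print {$writer} naive_name(), "\n", fork_safe_name(), "\n";
    close $writer;
    exit 0;
}

# Parent: generate names in the same order, then read the child's.
close $writer;
my $parent_naive = naive_name();
my $parent_safe  = fork_safe_name();
chomp( my $child_naive = <$reader> );
chomp( my $child_safe  = <$reader> );
close $reader;
waitpid $pid, 0;

print "naive names collide\n"     if $parent_naive eq $child_naive;
print "PID-mixed names differ\n"  if $parent_safe ne $child_safe;
```

Reseeding in the child after the fork would also work, but only if the library gives you a way to do it.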