Saturday, October 31, 2009

Sub::Call::Tail

I've just released Sub::Call::Tail, which allows for a much more natural tail call syntax than Perl's goto builtin.

It provides a tail keyword that modifies normal invocations to behave like goto &sub, without needing the ugly @_ manipulation.

Instead of this horrible kludge:

@_ = ( $foo, $bar );
goto &foo;

You can now write:

tail foo($foo, $bar);

And much more importantly this method call emulation atrocity:

@_ = ( $object, $foo, $bar );
goto $object->can("foo");

Can now be written as:

tail $object->foo($foo, $bar);

Finally we can write infinitely tail recursive and CPS code in constant stack space, without the syntactic letdown that is goto. Lambdacamels rejoice!
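For the curious, here is a self-contained sketch (using only core Perl) of what the tail sugar compiles down to. The goto &sub form replaces the current call frame instead of pushing a new one, so the recursion runs in constant stack space:

```perl
use strict;
use warnings;

sub countdown {
    my $n = shift;
    return "done" if $n <= 0;
    @_ = ( $n - 1 );
    goto &countdown;    # with Sub::Call::Tail: tail countdown($n - 1);
}

# completes without "Deep recursion" warnings, because each goto
# reuses the current frame rather than nesting a new call
print countdown(100_000), "\n";
```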

Thanks so much to Zefram for his numerous tests and contributions.

Wednesday, October 28, 2009

Versioned site_lib

Today I wanted to install a simple module on a production machine. I used the CPAN utility, as usual. Unfortunately that also pulled in an upgraded dependency which was not backwards compatible, breaking the application.

I hate yak shaving.

But not nearly as much as I hate surprise yak shaving.

I want to fix compatibility problems in my development environment on my own time, not hastily on a live server.

I wrote a small module to address this problem. To set it up run git site-perl-init. This will initialize the .git directory and configure CPAN to wrap make install and ./Build install with a helper script.

The wrapper will invoke the installation command normally, and then commit any changes to installsitelib with the distribution name as the commit message. This will happen automatically every time CPAN tells a module to install itself.

The approach is very simplistic; it does not version manpages or the bin directory, nor does it work with local::lib or CPANPLUS (at least not yet).

It is just enough to let me run git reset --hard "master@{1 hour ago}" to instantly go back to a working setup.

Friday, October 23, 2009

Authenticated Encryption

One thing that makes me cringe is when people randomly invent their own cryptographic protocols. There's a Google Tech Talk by Nate Lawson where he explains some surprising approaches to attacking a cryptographic algorithm. It illustrates why rolling your own is probably a bad idea ;-)

Perhaps the most NIH cryptographic protocol I've seen is digitally signing as well as encrypting a message, in order to store tamper resistant data without revealing its contents. This is often done for storing sensitive data in cookies.

Obviously such a protocol can be built using HMACs and ciphers, but high level tools are already available, ones that have already been designed and analyzed by people who actually know what they're doing: authenticated encryption modes of operation.

WTF is a cipher mode?

Block ciphers are sort of like hash functions: they take a block of data and scramble it.

Simply encrypting your data blocks one by one is not a good way of securing it though. Wikipedia has a striking example, an image of Tux encrypted block by block in ECB mode:

Even though every pixel is encrypted, the data as a whole still reveals a lot.

Suffice it to say that modes of operation are wrappers that take a low level scrambling function, the block cipher, and provide a less error prone tool, one that is more difficult to misuse.
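A toy illustration of the problem (this is not a real cipher, just an 8-byte XOR standing in for one): however strong the per-block scrambling is, naive block-at-a-time (ECB style) encryption means identical plaintext blocks always produce identical ciphertext blocks, so the structure of the data leaks:

```perl
use strict;
use warnings;

my $key = "secret!!";    # 8 bytes, the same size as our toy block

# "encrypt" each 8-byte block independently, ECB style
sub toy_ecb {
    my ($plain) = @_;
    return join "", map { $_ ^ $key } unpack "(a8)*", $plain;
}

my $ciphertext = toy_ecb( "AAAAAAAA" . "AAAAAAAA" . "BBBBBBBB" );
my @blocks = unpack "(a8)*", $ciphertext;

print $blocks[0] eq $blocks[1]
    ? "first two blocks are identical - structure leaks\n"
    : "blocks differ\n";
```

Real modes like CBC, CTR, and the authenticated modes below chain or counter the blocks precisely to avoid this.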

On the CPAN

Crypt::CBC and Crypt::Ctr are implementations of some of the more classic cipher modes. But this post is ranting about people not using authenticated modes.

Crypt::GCM and Crypt::EAX implement two different AEAD modes of operation using any block cipher.

These are carefully designed and analyzed algorithms, and the CPAN implementations make use of the tests from the articles describing the algorithms, so it sure beats rolling your own.

Secondly, Crypt::Util provides a convenience layer that builds on these tools (and many others), so perhaps Crypt::Util already handles what you want.

To tamper protect a simple data structure you can do something like this:

my $cu = Crypt::Util->new( key => ... );

my $ciphertext = $cu->tamper_proof_data( data => { ... }, encrypt => 1 );

Crypt::Util will use Storable to encode the data into a string, and then use an authenticated encryption mode to produce the ciphertext.

To decrypt, simply do:

my $data = $cu->thaw_tamper_proof( string => $ciphertext );

Crypt::Util will decrypt and validate the ciphertext, and only after it's sure that the data is trusted will it start unpacking it, using Storable to deserialize the message if appropriate. All allocations based on untrusted data are limited to 64KiB.

Don't sue me

I'm not saying that the CPAN code is guaranteed to be safe. I'm saying this is a better idea than rolling your own. If your application is sensitive you have no excuse not to open up the code and audit it.

Friday, October 16, 2009

Event driven PSGI

I spent most of today and yesterday bikeshedding event driven PSGI with miyagawa on #http-engine.

We seem to have converged on something that is both fairly portable to different event driven implementations, without being too yucky for blocking backends.

For example, if you don't yet know the response code or headers and are waiting on some other event driven thing, it's sort of like continuation passing style:

$app = sub {
    my $env = shift;

    ...

    return sub {
        my $write = shift;

        $some_event_thingy->do_your_thing( when_finished => sub {
            $write->([ 200, $headers, $body ]);
        });
    };
};

A more complex example involves streaming:

$app = sub {
    my $env = shift;

    ...

    return sub {
        my $write = shift;

        my $out = $write->([ 200, $headers ]);

        $some_event_thingy->new_data(sub {
            my $data = shift;

            if ( defined $data ) {
                $out->write($data);
            } else {
                $out->close;
            }
        });
    };
};

Lastly, if you are worried about too much memory usage in the output buffer, you can provide a callback to poll_cb:

$app = sub {
    my $env = shift;

    ...

    return sub {
        my $write = shift;

        $write->([ 200, $headers ])->poll_cb(sub {
            my $out = shift;

            $out->write($some_more);

            $out->close() if $finished;
        });
    };
};

But poll_cb should only be used on event driven backends (check for it using $out->can("poll_cb")).

This lets simple streaming applications work nicely under blocking backends as well as event driven ones.

Even better, while I was busy implementing this for the AnyEvent backend, frodwith whipped up a POE implementation in no time at all.

This pretty much obsoletes my IO::Writer sketch. The only case it doesn't cover but which IO::Writer theoretically does is poll_cb based nonblocking output, combined with a non blocking data source, but without an event driven environment. This sucks because nonblocking IO without an event loop wastes a lot of CPU. I can't imagine why anyone would actually try that, so I hereby declare IO::Writer deprecated, thankfully before I actually wrote a robust implementation ;-)

Thursday, October 8, 2009

Roles and Delegates and Refactoring

Ovid writes about the distinction between responsibility and behavior, and what that means in the context of roles.

He argues that the responsibilities of a class may sometimes lie in tangent with additional behaviors it performs (and that these behaviors are often also in tangent with one another).

Since roles lend themselves to more horizontal code reuse (what multiple inheritance tries to allow but fails to do safely), he makes the case that they are more appropriate for loosely related behaviors.

I agree. However, roles only facilitate the detection of a flawed taxonomy, which under multiple inheritance seems to work. They can often validate a sensible design, but they don't provide a solution for a flawed one.

If you take a working multiple inheritance based design and change every base class into a role, it will still work. Roles will produce errors for ambiguities, but if the design makes sense there shouldn't be many of those to begin with. The fundamental structure of the code hasn't actually changed with the migration to roles.

Roles do not in their own right prevent god objects from forming. Unfortunately that has not yet been automated ;-)

Another Tool

Wikipedia defines Delegation as:

a technique where an object outwardly expresses certain behaviour but in reality delegates responsibility for implementing that behavior to an associated object

Instead of merging the behavior into the consuming class (using roles or inheritance), the class uses a helper object to implement that behavior, and doesn't worry about the details.
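To make the idea concrete, here is a hand-rolled sketch in plain Perl (the class names are invented for illustration): Builder doesn't inherit the TAP-generating behavior, it holds a TapGenerator helper and forwards to it.

```perl
use strict;
use warnings;

package TapGenerator;
sub new  { bless { count => 0 }, shift }
sub plan { my ( $self, $tests ) = @_; print "1..$tests\n" }
sub ok   { my $self = shift; printf "ok %d\n", ++$self->{count} }

package Builder;
sub new { bless { tap => TapGenerator->new }, shift }

# proxy plan and ok to the helper object, without inheriting from it
for my $method (qw(plan ok)) {
    no strict 'refs';
    *{"Builder::$method"} = sub {
        my $self = shift;
        $self->{tap}->$method(@_);
    };
}

package main;
my $builder = Builder->new;
$builder->plan(2);
$builder->ok;
$builder->ok;
```

The forwarding boilerplate is exactly what Moose's handles option (shown below) generates for you.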

Roles help you find out you have a problem, but delegates help you to fix it.

Delegation by Example

A simple but practical example of how to refactor a class that mixes two behaviors is Test::Builder:

  • It provides an API to easily generate TAP output
  • It provides a way to share a TAP generator between the various Test:: modules on the CPAN, using the singleton pattern.

Test::Builder's documentation says:

Since you only run one test per program new always returns the same Test::Builder object.

The problem is that the assumption that you will only generate one stream of TAP per program hasn't got much to do with the problem of generating valid TAP data.

That assumption makes it simpler to generate TAP output from a variety of loosely related modules designed to be run with Test::Harness, but it is limiting if you want to generate TAP in some other scenario.[1]

With a delegate based design the task of obtaining the appropriate TAP generation helper and the task of generating TAP output would be managed by two separate objects, where the TAP generator is oblivious to the way it is being used.

In this model Test::Builder is just the singletony bits, and it uses a TAP generation helper. It would still have the same API as it does now, but a hypothetical TAP::Generator object would generate the actual TAP stream.

The core idea is to separate the behaviors and responsibilities even more, not just into roles, but into different objects altogether.

Though this does make taxonomical inquiries like isa and does a little more roundabout, it allows a lot more flexibility when weaving together a complex system from simple parts, and encourages reuse and refactoring by making polymorphism and duck typing easy.

If you want to use TAP for something other than testing Perl modules, you could do this without hacking around the singleton crap.

Delegating with Moose

Moose has strong support for delegation. I love this, because it means that convincing people to use delegation is much easier than it was before, since it's no longer tedious and doesn't need to involve AUTOLOAD.

To specify a delegation, you declare an attribute and use the handles option:

has tap_generator => (
    isa => "TAP::Generator",
    is  => "ro",
    handles => [qw(plan ok done_testing ...)],
);

Roles play a key part in making delegation even easier to use. This is because roles dramatically decrease the burden of maintenance and refactoring, for all the reasons that Ovid often cites.

When refactoring role based code to use delegation, you can simply replace your use of the role with an attribute:

has tap_generator => (
    does => "TAP::Generator",
    is   => "ro",
    handles => "TAP::Generator",
);

This will automatically proxy all of the methods of the TAP::Generator role to the tap_generator attribute.[2]

Moose's handles parameter to attributes has many more features which are covered in the Delegation section of the manual.

A Metaclass Approach to ORMs

Ovid's example for roles implementing a separate behavior involves a simple ORM. It involves a Server class, which in order to behave appropriately needs some of its attributes stored persistently (there isn't much value in a server management system that can't store information permanently).

He proposes the following:

class Server does SomeORM {
   has IPAddress $.ip_address is persisted;
   has Str       $.name       is persisted;

   method restart(Bool $nice=True) {
       say $nice ?? 'yes' !! 'no';
   }
}

But I think this confuses the notion of a class level behavior with a metaclass level behavior.

The annotation is persisted is on the same level as the annotation IPAddress or Str, it is something belonging to the meta attribute.

Metaclasses as Delegates

A metaclass is an object that represents a class. In a sense it could be considered a delegate of the compiler or language runtime. In the case of Moose this is a bit of a stretch (since the metaclass is not exactly authoritative as far as Perl is concerned, the symbol table is).

Conceptually it still holds though. The metaclass is responsible for reflecting as well as specifying the definition of a single class. The clear separation of that single responsibility is the key here. The metaclass is delegated to by the sugar layer that uses it, and indirectly by the runtime that invokes methods on the class (since the metaclass is in control of the symbol table).

Furthermore, the metaclass itself delegates many of its behaviors. Accessor generation is the responsibility of the attribute meta object.

To frame the ORM example in these terms, we have several components:

  • persisted attributes, modeled by the meta attribute delegates of the metaclass object, with an additional role for persistence[3]
  • the metaclass, which must also be modified for the additional persistence functionality (to make use of the attributes' extended interface)
  • an object construction helper that knows about the class it is constructing, as well as the database handle from which to get the data, but doesn't care about the actual problem domain.[4]
  • an object that models information about the problem domain being addressed (Server)

By separating the responsibilities of the business logic from database connectivity from class definition we get decoupled components that can be reused more easily, and which are less sensitive to changes in one another.

KiokuDB

Lastly, I'd like to mention that KiokuDB can be used to solve that Server problem far more simply. I promise I'm not saying that only on account of my vanity ;-)

The reason it's a simpler solution is that the Server class does not need to know that it is being persisted at all, and therefore does not need to accommodate a persistence layer. The KiokuDB handle would be asked to persist that object, and proceed to take it apart using reflection provided by the metaclass:

$dir->lookup($server_id);
$server->name("Pluto");
$dir->update($server);

This keeps the persistence behavior completely detached from the responsibilities of the server, which is to model a physical machine.

The problem of figuring out how to store fields in a database can be delegated to a completely independent part of the program, which operates on Server via its metaclass, instead of being a tool that Server uses. The behavior or responsibility (depending on how you look at it) of storing data about servers in a database can be completely removed from the Server class, which is concerned solely with the shape of that data.

Summary

There is no silver bullet.

Roles are almost always better than multiple inheritance, but don't replace some of the uses of single inheritance.

Delegates provide even more structure than roles, and are usually best implemented using roles.

By leveraging both techniques at the class as well as the metaclass level you can often achieve dramatically simplified results.

Roles may help with code reuse, but the classes they create are still static (even runtime generated classes are still classes with a symbol table). Delegation allows components to be swapped and combined much more easily. When things get more complicated inversion of control goes even further, and the end result is usually both more flexible and simpler than only static role composition.

Secondly, and perhaps more importantly, delegates are not limited to single use[5]. You can have a list of delegates performing a responsibility together. Sartak's API Design talk at YAPC::Asia explained how Dist::Zilla uses a powerful combination of roles and plugin delegates, taking this even further.

At the bottom line, though, nothing can replace a well thought out design. Reducing your problem space is often the best way of finding a clean solution. What I like so much about delegates is that they encourage you to think about the real purpose of each and every component in the system.

Even the simple need for coming up with a name for each component can help you reach new understandings about the nature of the problem.

Delegation heavy code tends to force you to come up with many names because there are many small classes, but this shouldn't lead to Java hell. Roles can really help alleviate this (they sure beat interfaces), but even so, this is just code smell that points to an overly complex solution. If it feels too big it probably is. A bloated solution that hasn't been factored out to smaller parts is still a bloated solution.

Once you've taken the problem apart you can often figure out which parts are actually necessary. Allowing (and relying on) polymorphism should make things future proof without needing to implement everything up front. Just swap your simple delegate with a more complicated one when you need to.

[1] Fortunately Test::Builder provides an alternate constructor, create, that is precisely intended for this case.

[2] Note that this is currently flawed in Moose for several reasons: accessors are not delegated automatically, the ->does method on the delegator will not return true for roles of the delegate (specifically "RoleName" in this case, etc), but that's generally not a problem in practice (there are failing tests and no one has bothered to fix them yet).

[3] For clarity's sake we tend to call roles applied to meta objects traits, so in this case the Server class would be using the SomeORM and persistent class and attribute traits.

[4] in DBIC these responsibilities are actually carried out by the resultset, the result source, and the schema objects, a rich hierarchy of delegates in its own right.

[5] parameterized roles are a very powerful static abstraction that allows multiple compositions of a single role into a single consumer with different parameters.

Monday, October 5, 2009

Are Filehandles Objects?

Perl has a very confusing set of behaviors for treating filehandles as objects.

ADHD Summary

Globs which contain open handles can be treated as objects, even though they aren't blessed.

Always load IO::Handle and FileHandle, to allow the method syntax.

Whenever you are using filehandles, use the method syntax, regardless of whether it's a real handle or a fake one. A fake handle that works with the builtins needs to jump through some nasty hoops.

Whenever you are creating fake handles, the aforementioned hoops are that you should tie *{$self} (or return a tied glob from the *{} overload), so that the builtins will know to call your object's methods through your TIEHANDLE implementation.

The Long Story

There are several potentially overlapping data types that can be used to perform IO.

  • Unblessed globs containing IO objects
  • Blessed globs containing IO objects
  • Blessed IO objects
  • Objects resembling IO objects (i.e. $obj->can("print")), which aren't necessarily globs
  • Tied globs

and there are two ways to use these data types, but some types only support one method:

  • method calls ($fh->print("foo"))
  • builtins (print $fh "foo")

Lastly, there are a number of built in classes (which are not loaded by default, but are in core): IO::Handle, IO::File, and FileHandle.

When you open a standard filehandle:

use autodie;

open my $fh, "<", $filename;

the variable $fh contains an unblessed reference to a type glob. This type glob contains an IO reference in the IO slot, that is blessed into the class FileHandle by default.

The IO object is blessed even if FileHandle is not loaded.

If it is loaded, then both ways of using the handle are allowed:

# builtin
read $fh, my $var, 4096;

# as method
$fh->read(my $var, 4096);

These two forms behave similarly. One could even be led to believe that the first form is actually treated as indirect method syntax. This is unfortunately very far from the truth.

In the first form the read is executed directly. In the second form the read method would be invoked on the unblessed glob. However, instead of throwing the usual Can't call method "read" on unblessed reference error, Perl's method_common routine (which implements method dispatch) special cases globs with an IO slot to actually dispatch the method using the class of *{$fh}{IO}. In a way this is very similar to how autobox works.
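You can see this special-cased dispatch in action with a short self-contained example (it opens its own source file just to have something readable):

```perl
use strict;
use warnings;
use IO::Handle;                  # makes methods like getline resolvable
use Scalar::Util qw(blessed);

open my $fh, "<", __FILE__ or die $!;

print ref($fh), "\n";            # GLOB - the reference itself is unblessed
print blessed($fh) ? "handle is blessed\n" : "handle is not blessed\n";
print blessed( *{$fh}{IO} )      # but the glob's IO slot is blessed
    ? "IO slot is blessed\n"
    : "IO slot is not blessed\n";

# method dispatch succeeds anyway, via the class of the IO slot:
my $line = $fh->getline;
print defined $line ? "getline worked\n" : "no data\n";
```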

If you remembered to use FileHandle then this should result in a successful dispatch to IO::Handle::read, which actually delegates to the read builtin:

sub read {
    @_ == 3 || @_ == 4 or croak 'usage: $io->read(BUF, LEN [, OFFSET])';
    read($_[0], $_[1], $_[2], $_[3] || 0);
}

When you create nonstandard IO objects, this breaks down:

{
    package MyHandle;

    sub read {
        my ( $self, undef, $length, $offset ) = @_;

        substr($_[1], $offset, $length) = ...;
    }
}

because now $myhandle->read(...) will work as expected, but read($myhandle, ...) will not. If it is a blessed glob the error will be read() on unopened filehandle, and for other data types the error will be Not a GLOB reference.
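A runnable sketch of that asymmetry (MyHandle is a made-up class; its read ignores the OFFSET argument for brevity):

```perl
use strict;
use warnings;

package MyHandle;

sub new { bless { data => "hello world", pos => 0 }, shift }

# read(BUF, LEN): write into the caller's buffer via the @_ alias
sub read {
    my $self = shift;
    my $len  = $_[1];
    $_[0] = substr( $self->{data}, $self->{pos}, $len );
    $self->{pos} += length $_[0];
    return length $_[0];
}

package main;

my $h = MyHandle->new;

$h->read( my $buf, 5 );                        # method syntax: works
print "method: $buf\n";

# builtin syntax: dies at runtime with "Not a GLOB reference"
my $ok = eval { read( $h, my $buf2, 5 ); 1 };
print $ok ? "builtin worked\n" : "builtin failed: $@";
```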

Tied Handles

Tied handles are very similar to built in handles: they contain an IO slot with a blessed object (this time the default is the FileHandle class, a subclass of IO::Handle that has a few additional methods, like support for seeking).

The IO object is marked as a tied data structure so that the builtin opcodes will delegate to some object implementing the TIEHANDLE interface.

A method call on a tied handle will therefore invoke the method on the FileHandle class, which will delegate to the builtin, which will delegate to a method on the object.

Because of this, classes like IO::String employ a clever trick to allow the builtins to be used:

my $fh = IO::String->new($string);

# readline builtin calls getline method
while ( defined( my $line = <$fh> ) ) {
    ...
}

# print builtin calls print method
print $fh "foo";

IO::String::new ties the new handle to itself[1], and sets up the extra glue:

sub new
{
    my $class = shift;
    my $self = bless Symbol::gensym(), ref($class) || $class;
    tie *$self, $self;
    $self->open(@_);
    return $self;
}

The TIEHANDLE API is set up by aliasing symbols:

*READ   = \&read;

Effectively this reverses the way it normally works. Instead of IO::Handle methods delegating to the builtin ops, the builtin ops delegate to the IO::String methods.

IO::Handle::Util provides a io_to_glob helper function which produces a tied unblessed glob that delegates to the methods of an IO object. This function is then used to implement *{} overloading. This allows non glob handles to automatically create a working glob as necessary, without needing to implement the tie kludge manually.

Conclusion

When working with standard or nonstandard handle types, method syntax always works (provided IO::Handle and FileHandle are loaded), but the builtin syntax only works for tied handles, so when using a filehandle I prefer the method syntax.

It also makes the silly print {$fh} idiom unnecessary, since direct method syntax isn't ambiguous.

Performance is a non issue, the extra overhead is nothing compared to PerlIO indirection, ties, and making the actual system calls.

However, when creating nonstandard IO objects, you should probably provide a tie fallback so that code that doesn't use method syntax will not die with strange errors (or worse, violate the encapsulation of your handle object and just work on the internal glob structure directly).

This is one of my least favourite parts in Perl. It's such a horrible cascade of kludges. In the spirit of Postel's Law, consistently using methods is the conservative thing to do (it works in all cases), and providing a tie based fallback is a way to be liberal about what you accept.

[1] Surprisingly this doesn't leak, even though weaken(tie(*$self, $self)) is not used. I suspect there is a special case that prevents the refcount increment if the object is tied to itself. See also Tie::Util

Thursday, October 1, 2009

IO::Handle::Util

My friend miyagawa has been championing PSGI and its reference implementation, Plack. This is something we've needed for a long time: a clean and simple way to respond to HTTP requests without the cruft of CGI and %ENV.

The PSGI specification requires the body of the response to be represented using an object similar to IO::Handle.

I've released IO::Handle::Util, a convenience package designed to make working with IO::Handle like objects easier.

The main package provides a number of utility functions for creating IO handles from various data structures, and for getting the data out of IO handles.

For example, if you have a string or array of strings that you would like to just pass back, you can use the io_from_any function:

my $io = io_from_any $body;

# then you can do standard operations to get the data out:
$io->getline;
$io->read(my $str, $length);

This function will sensibly coerce other things, like paths, and let already working handles pass through as is.

If you have an iterator callback that gets more data you can also use that:

my $io = io_from_getline sub {
    ...
    return $more_data; # or undef when you're done
};

This is not automated by io_from_any because you can also use a writer callback, or a callback that returns the whole handle (remember, this is not PSGI specific).

You can also go the other way, taking IO handles and getting useful things out of them. io_to_array is pretty obvious, but you can also do something like:

use autodie;

open my $fh, ">", $file;

my $cb = io_to_write_cb $fh;

$cb->("blah\n");
$cb->("orz\n");

Many of the utility functions are based on IO::Handle::Iterator and IO::Handle::Prototype::Fallback, two classes which facilitate the creation of ad hoc filehandles.

Hopefully this will make creating and working with IO handles a little quicker and easier.