Wednesday, September 23, 2009

Testing for Race Conditions

Concurrent programming is hard for many reasons. One of them is that scheduling isn't predictable; the same code may behave differently depending on the OS's scheduling decisions. This is further complicated on single-CPU machines, where there is no real concurrency, just the illusion of concurrency through time sharing.

Because many operations can happen within a single time slice (usually around 10ms on most operating systems), a race condition may go undetected in testing unless it happens to fall on a time slice boundary.

My laptop can do several thousand I/O operations, and many more thread mutex operations, per second. If only one of these has a race condition, chances are it will take a very long time to stumble on it accidentally.

Since most concurrent code tries to minimize the duration of critical sections, the probability of a critical section interleaving with a context switch can be very low. On a single-CPU machine, all the non-blocking operations performed within a single time slice are effectively atomic from the point of view of other processes. For a computer, 10ms is a pretty long time.

Therefore, when testing for concurrency bugs it's important to introduce artificial contention to increase the chance of exposing a race condition. A fairly reliable method of doing that, which I use in Directory::Transactional's test suite, is introducing random delays into the process:

use Time::HiRes qw(sleep);
use autodie;

sub cede {
    # 20% of the time, sleep for roughly one time slice to encourage a context switch
    sleep 0.01 if rand < 0.2;
}

for ( 1 .. $j ) {       # spawn $j child processes
    fork and next;      # the parent moves on to spawn the next child

    srand $$;           # re-seed so each child gets its own random sequence

    for ( 1 .. $n ) {   # each child performs $n iterations
        # critical section
        cede();
        acquire_resource();
        cede();
        read_resource();
        cede();
        write_resource();
    }

    exit;
}

wait_all();             # reap all the children (e.g. 1 while wait() != -1)

This is very easy to add only during testing by wrapping the resource access APIs in your codebase.
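For example, one way to do that is to monkey-patch the wrappers in only when running under the test harness (Test::Harness sets HARNESS_ACTIVE). This is just a sketch; the MyApp::Storage package and function names are hypothetical:

# Only inject the random delays when running under the test harness.
if ( $ENV{HARNESS_ACTIVE} ) {
    no warnings 'redefine';
    no strict 'refs';

    for my $name (qw(acquire_resource read_resource write_resource)) {
        my $orig = \&{"MyApp::Storage::$name"};

        *{"MyApp::Storage::$name"} = sub {
            cede();                # random delay before the real operation
            return $orig->(@_);
        };
    }
}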

If you loop through the critical section concurrently for long enough, the probability of finding a race condition is much higher than it would be without the delays, since the delays encourage the OS to perform context switches even on a single-CPU machine.

This is by no means an exhaustive method of checking all possible context switch permutations. However, by increasing the probability of interleaved context switches from infinitesimally small to pretty good for a reasonable number of iterations we are much more likely to trigger any race condition (as opposed to all of them).

Furthermore, by designing volatile test transactions the chance of detecting a triggered race condition is also quite good (since an exposed race condition is more likely to cause a visible inconsistency). Specifically, in Directory::Transactional the test simulates money transfers between three accounts, where each iteration deducts a random amount from a random account and adds it to another.

The values are read before a delay and written afterwards, so any other process touching the same account would trigger a race condition if proper locking were not in place. The fact that the amounts are random also increases the chance of corruption being detectable (the accounts are much more likely to end up unbalanced when the amount is random rather than some constant).
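Roughly, each such transaction looks like this (a sketch; the account names and the txn_* / *_account helpers are illustrative, not the actual Directory::Transactional test code):

my @accounts = qw(checking savings petty_cash);

# deduct a random amount from one account and add it to a different one,
# with random delays between the reads and the writes
sub transfer_iteration {
    my $from_idx = int rand @accounts;
    my $to_idx   = ( $from_idx + 1 + int rand(@accounts - 1) ) % @accounts;
    my ( $from, $to ) = @accounts[ $from_idx, $to_idx ];
    my $amount = int rand 100;

    txn_begin();

    cede();
    my $from_balance = read_account($from);
    my $to_balance   = read_account($to);

    cede();
    write_account( $from, $from_balance - $amount );

    cede();
    write_account( $to, $to_balance + $amount );

    txn_commit();
}

# the sum of all balances is invariant, so a concurrent checker running at
# intervals can detect corruption caused by missing or incorrect locking
sub check_consistency {
    my ($expected_total) = @_;

    my $total = 0;
    $total += read_account($_) for @accounts;

    die "accounts out of balance: $total != $expected_total"
        unless $total == $expected_total;
}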

Coupled with concurrent consistency checks on the data performed at intervals, this was a pretty effective method for detecting race conditions, quickly exposing initial implementation flaws in Directory::Transactional that had previously taken many iterations to reproduce. Throwing in random kill -9 signals provided a fun test for journal replay recovery.

Friday, September 18, 2009

Method Style Callbacks

This is a small trick I like to use when defining callbacks. Instead of invoking callbacks as code references:

$callback->($data);

I always write the invocation as a method call:

$data->$callback();

or if $data is not a natural invocant for the routine, then I invoke it as a method on the object that is sending the notification:

$self->$callback($data);

This works in more cases than just plain code reference invocation:

  • If $data is an object then $callback can be a string (a method name).
  • If $callback is a code reference there is no difference between this syntax and plain code dereferencing.
  • Even if $data is not an object but $callback is a code reference, the method syntax will still work.

It's pretty nice to be able to pass method names as callbacks that are invoked on a subclass, while still having the flexibility of a code reference when appropriate. It works especially well when the callback attribute has a sensible default that resolves to a method call on $self, allowing behavior to be overridden easily without needing a full subclass.
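As a sketch (the Notifier class, attribute, and method names here are made up for illustration, using a Moose attribute):

package Notifier;
use Moose;

# the callback attribute defaults to a method name, so a subclass can just
# override handle_change, while callers may still pass in a code reference
has on_change => (
    is      => 'ro',
    default => "handle_change",
);

sub handle_change {
    my ( $self, $data ) = @_;
    warn "default handler got $data";
}

sub notify {
    my ( $self, $data ) = @_;
    my $cb = $self->on_change;

    # works whether on_change is a method name or a code reference
    $self->$cb($data);
}

package main;

# uses the default handler
Notifier->new->notify("some data");

# overrides it with a code reference, no subclass needed
Notifier->new( on_change => sub { warn "custom handler got $_[1]" } )->notify("some data");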

Tuesday, September 15, 2009

Hackathon Summary

During the YAPC::Asia::2009 hackathon I refactored a bunch of XS modules out of some other code, both rafl's and mine.

XS::Object::Magic

This module provides an alternative to the standard T_PTROBJ approach to creating C struct based objects in Perl.

The traditional way creates a blessed scalar reference, which contains an integer value that is cast to a pointer of the right type.

This is problematic for two reasons:

  • If the pointer value is modified (accidentally or maliciously) then this could easily corrupt memory or cause segfaults, as illustrated below.
  • Scalar references can't be extended with more data without using inside out objects.
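To illustrate the first point: a T_PTROBJ object is essentially a blessed reference to a scalar whose integer value is the pointer, so nothing stops Perl code from scribbling over it (Some::XS::Class and its methods are hypothetical):

my $obj = Some::XS::Class->new;   # blessed scalar ref; $$obj holds the C pointer as an integer

$$obj = 0xdeadbeef;               # nothing prevents overwriting the pointer value...

$obj->some_xs_method;             # ...so the XS code now casts garbage to a struct pointer,
                                  # typically corrupting memory or segfaulting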

This module provides a C API which uses sv_magicext to create safer and more extensible objects associated with some pointer, and which interoperates nicely with Moose, amongst other things.

Magical::Hooker::Decorate

This module lets you decorate an arbitrary SV with any other arbitrary SV using the magic hook API.

This is similar to using a fieldhash to create inside-out decorations, but it is designed to be used primarily from XS code, and as such it provides a C API. The memory footprint is also slightly smaller since there is no auxiliary storage.

B::Hooks::XSUB::CallAsOp

This module was refactored out of Continuation::Delimited and lets an XS routine trigger code that will be invoked in the context of its caller.

This allows a little more freedom in the stack manipulations you can do (which is of course very important for continuations).

These modules are all possible due to the awesome ExtUtils::Depends module. If you find yourself cargo culting XS consider refactoring it into a reusable module using ExtUtils::Depends instead.

Saturday, September 12, 2009

Git Hate

I like Git, but sometimes I also really hate it.

My friend Sartak asked if there is a better way of finding a merge base in a command than hard coding master as the merge point, and I stupidly said "of course, you just check what that branch tracks". That is the same logic that git pull would use, so in theory you can apply the same concise logic to merge or rebase your branch without running git fetch.

In principle that's correct, but in practice it's utterly worthless.

The first step is to figure out what branch you are on.

You could easily do this with the git-current-branch command. Except that it doesn't actually exist. Instead you need to resolve the symbolic ref in HEAD and strip off the prefix:

$(git symbolic-ref -q HEAD | sed -e 's/^refs\/heads\///')

Except that that's actually broken if the symbolic ref points at something not under heads.

Ignoring that, we now have a string with which we can get our merge metadata:

branch="$( git current-branch )"

git config --get "branch.$branch.remote"
git config --get "branch.$branch.merge"

The problem lies in the fact that the tracking refers to the remote, and the ref on the remote:

[branch "master"]
 remote = origin
 merge = refs/heads/master

But we want to use the local ref. Based on git config, we can see that this value should be refs/remotes/origin/master:

[remote "origin"]
 url = git://example.com/example.git
 fetch = +refs/heads/*:refs/remotes/origin/*
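The common kludge is to assume the stock refspec and do the mapping yourself, roughly like this (a Perl sketch; it falls over with non-default refspecs, which is exactly the problem):

chomp( my $branch = `git symbolic-ref -q HEAD` );
$branch =~ s{^refs/heads/}{};

chomp( my $remote = `git config --get branch.$branch.remote` );
chomp( my $merge  = `git config --get branch.$branch.merge` );

# assume the stock "+refs/heads/*:refs/remotes/$remote/*" refspec
( my $tracking = $merge ) =~ s{^refs/heads/}{refs/remotes/$remote/};

print "$tracking\n";   # e.g. refs/remotes/origin/master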

If you read the documentation for git-fetch you can find a human-readable description of the refspec format, but it's pretty feature rich.

Fortunately Git already implements this, so it shouldn't be too hard. Too bad that it is.

In builtin-fetch.c are a number of static functions that you could easily copy and paste to do this refspec evaluation in your own code.

So in short, what should have been a simple helper command has degenerated into an epic yak shave involving cargo-culted C and reimplementing commands that should have been built in to begin with.

And for what?

git merge-base $( git-merge-branch $( git current-branch ) ) HEAD
instead of
git merge-base master HEAD

In short, it's completely non-viable to do the right thing; nobody has that much spare time. Instead people just kludge it. These kludges later come back with a vengeance when you assume something does the right thing, and it turns out not to handle an edge case you didn't think about before invoking the command.

I'm not trying to propose a solution; it's been obvious for years what the solution is (a proper libgit). What makes me sad is that this is still a problem today. I just can't fathom any reason for not having a proper way to do this that would outweigh the benefit of having one.

Friday, September 11, 2009

YAPC::Tiny, YAPC::Asia

I'd like to thank everyone who made my trip to Asia so great thus far.

To list but a few: thanks to the JPA for organizing YAPC::Asia, especially Daisuke for being so accommodating when it wasn't clear if I would make it till the end, gugod for organizing YAPC::Tiny, Kenichi Ishigaki and Kato Atsushi for translating my talks (though only one was presented in the end), clkao and audreyt for being such great hosts in Taiwan, Dan Kogai for hosting us at Hotel Dan, and the Japanese and Taiwanese Perl communities in general for being so much fun to be around.

I'm so indebted to everyone involved in this conference and all of the others, it makes every one of these trips fun and memorable, totally justifying the fact that Perl takes up such an unhealthily large part of my life and income.

Tomorrow is the hackathon, always my favourite part of conferences. On my agenda:

  • Hacking with gfx and rafl on Moose's XS stuff
  • Fixing a weird corruption that happens with Continuation::Delimited, so that finally it passes all of its tests. Once that happens I can focus on adding new failing tests, which is much more fun ;-)
  • Playing with PSGI/Plack, Stardust, and other cool things I learned about during the conference and will probably discover tomorrow

On Sunday, the last day of my trip, CL and I are going to try to not get killed on Mount Myogi.

Saturday, September 5, 2009

Cache Assertions

Caching an expensive computation is a great way to improve performance, but when it goes wrong the bugs can be very subtle. It's vital to be sure that the cached result is, for all intents and purposes, the same as what the computation would have produced.

A technique I often employ is using a thunk to defer the computation:

$cache_helper->get_value( $key, sub {
   ... produce the expensive result here ...
});

If there is a cache hit then the subroutine is simply not executed:

sub get_value {
    my ( $self, $key, $thunk ) = @_;

    if ( my $cache_hit = $self->cache->get($key) ) {
        return $cache_hit;
    } else {
        my $value = $thunk->();

        $self->cache->set( $key => $value );

        return $value;
    }
}

The get_value method must be referentially transparent; when invoked with the same $key it should always produce a consistent result regardless of the state of the cache.

When adding caching to existing code I often accidentally choose a cache key that doesn't contain all the implied parameters since the details of the code I'm caching are no longer fresh in my memory.

The reason I like the thunk approach is that $cache_helper can be swapped for something else to easily check for such caching bugs.

For instance, to confirm that a reproducible problem is a cache related bug, you can disable caching temporarily using this implementation:

sub get_value {
    my ( $self, $key, $thunk ) = @_;

    return $thunk->();
}

Or better yet, use an asserting helper in your test environment, where performance isn't important:

sub get_value {
    my ( $self, $key, $thunk ) = @_;

    my $value = $thunk->();

    if ( my $cache_hit = $self->cache->get($key) ) {
        $self->check_cache_hit($key, $cache_hit, $value);

        # it's important to return $cache_hit in case check_cache_hit
        # isn't thorough enough, so that this doesn't mask any bugs
        # that would manifest only in production
        return $cache_hit;
    } else {
        $self->cache->set( $key => $value );

        return $value;
    }
}

sub check_cache_hit {
    my ( $self, $key, $cache_hit, $value ) = @_;

    confess "Bad cache hit: $key, $cache_hit != $value"
        unless ...;
}

This approach can be used on any sort of cache, from $hash{$key} ||= $thunk->() to CHI.

It's useful to have several get_value like methods, to handle the different result types more easily (check_cache_hit can be a little intricate, and adding namespacing to keys is always a good idea).

This is useful for other reasons too: for instance, to collect timing information, to measure the effectiveness of caching, or to use different cache storage for different result types.

The key to using this effectively is to make the caching object an easily overridable delegate in your application's code. For instance, if you're using Catalyst you can specify the subclass in your configuration, or even make it conditional on $c->debug.

Lastly, it's worth mentioning that my sample implementation is actually wrong if false values are valid results.
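One way to fix that (a sketch, using the same hypothetical cache helper as above) is to store the value wrapped in an array reference, so a cache hit is always a true value even when the cached result itself is false:

sub get_value {
    my ( $self, $key, $thunk ) = @_;

    if ( my $cache_hit = $self->cache->get($key) ) {
        return $cache_hit->[0];
    } else {
        my $value = $thunk->();

        $self->cache->set( $key => [ $value ] );

        return $value;
    }
}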

Tuesday, September 1, 2009

Try::Tiny

I just released Try::Tiny, yet another try { } catch { } module.

The rationale behind this module is that Perl's eval builtin requires large amounts of intricate boilerplate in order to be used correctly. Here are the problems the boilerplate must address:

  • $@ should be localized to avoid clearing previous values if the eval succeeded.
  • This localization must be done carefully, or the errors you throw with die might also be clobbered.
  • if ( $@ ) { ... } is not guaranteed to detect errors.

The first problem causes action at a distance, so if you don't address it your code is very impolite to others. The other two problems reduce the reliability of your code. The documentation contains an in depth explanation of all of these issues.

Here is my standard boilerplate for a polite, defensive eval block:

my ( $error, $failed );

{
   local $@;

   $failed = not eval {

       ...;

       return 1;
   };

   $error = $@;
}

if ( $failed ) {
   warn "got error: $error";
}

This is extremely tedious when really all I want to do is protect against potential errors in the ... part.
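With Try::Tiny the same thing becomes, roughly:

use Try::Tiny;

try {
    ...;
} catch {
    # the error is available in $_ (and is also passed as the first argument)
    warn "got error: $_";
};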

Try::Tiny does that and little else; it runs on older Perls, it works with the various exception modules from the CPAN, it has no dependencies, it doesn't invent a new catch syntax, and it doesn't rely on any mind boggling internals hacks.

If you are not comfortable using TryCatch and you don't mind missing out on all of its awesome features then you should use Try::Tiny.

If you think plain eval { } is fine then your code is potentially harmful to others, so you should also use Try::Tiny.

Despite its minimalism Try::Tiny can be pretty expressive, since it integrates well with Perl 5.10's switch statement support:

use 5.010; # or use feature 'switch'

try {
    require Foo;
} catch {
    when ( /^Can't locate Foo\.pm in \@INC/ ) { } # ignore
    default { die $_ } # all other errors are fatal
};

For backwards compatibility you can obviously just inspect $_ manually, without using the when keyword.
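For instance, the same check without the switch feature might look like:

try {
    require Foo;
} catch {
    die $_ unless /^Can't locate Foo\.pm in \@INC/;   # rethrow everything else
};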