Monday, May 25, 2009

Become a Hero Plumber

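Perl's reference counting memory management has some advantages (a value is freed as soon as its last reference goes away, so destructors run predictably), but reference counting alone can never free a cyclical structure. Cycles have to be broken by hand, usually with weak references, and it's easy to get this subtly wrong, causing memory and resource leaks that are often hard to find.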
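
To make the problem concrete, here is a minimal sketch using only core Perl. The Node class and its destruction counter are hypothetical, made up for illustration. A parent and child that point at each other are never destroyed when they go out of scope; weakening the back-reference with Scalar::Util's weaken lets both be freed.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical node class that counts how many instances have been destroyed.
package Node;
our $DESTROYED = 0;
sub new     { my ( $class, %args ) = @_; return bless {%args}, $class }
sub DESTROY { $DESTROYED++ }

package main;

# Strong cycle: parent -> child -> parent. Neither refcount ever reaches zero.
{
    my $parent = Node->new;
    my $child  = Node->new( parent => $parent );
    $parent->{child} = $child;
}
my $after_strong = $Node::DESTROYED;    # both nodes leaked, nothing destroyed

# Same shape, but the back-reference is weak, so the cycle no longer holds.
{
    my $parent = Node->new;
    my $child  = Node->new( parent => $parent );
    weaken( $child->{parent} );
    $parent->{child} = $child;
}
my $after_weak = $Node::DESTROYED - $after_strong;    # both nodes freed

print "strong cycle freed $after_strong nodes, weakened cycle freed $after_weak\n";
```

The leaked pair is only reclaimed during global destruction, which is exactly what makes such leaks painful in long-running processes.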

If you know you've got a leak and you've narrowed it down to a particular data structure, then Devel::Cycle can be used to find the cycles in it, and Test::Memory::Cycle makes it very easy to integrate these checks into unit tests.
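
As a sketch of the unit test side (assuming Test::Memory::Cycle and Devel::Cycle are installed from the CPAN), the following checks a small tree built by a hypothetical make_tree helper. memory_cycle_ok passes when no strong cycle is found; memory_cycle_exists asserts the opposite, which is handy for documenting a known cycle.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);
use Test::More;
use Test::Memory::Cycle;

# Hypothetical helper: a tiny tree whose child points back at its parent.
sub make_tree {
    my ($weak) = @_;
    my $parent = { name => 'parent' };
    my $child  = { name => 'child', parent => $parent };
    weaken( $child->{parent} ) if $weak;    # break the cycle
    $parent->{children} = [$child];
    return $parent;
}

# Devel::Cycle, which Test::Memory::Cycle uses, does not count weak references.
memory_cycle_ok( make_tree(1), 'weakened back-reference is not a cycle' );
memory_cycle_exists( make_tree(0), 'strong back-reference is a cycle' );

done_testing();
```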

Harder-to-find leaks are usually the result of combining large components. Reading through thousands of lines of dumps is pretty impractical; even eating colored mushrooms isn't going to help you much.

For instance, this is a classic way to accidentally leak the context object in Catalyst:

sub action : Local {
    my ( $self, $c ) = @_;

    my $object = $c->model("Thingies")->blah;

    $c->stash->{foo} = sub {
        $object->foo($c);
    };

    $c->forward("elsewhere");
}

That action will leak all the transient data created or loaded in every request. The cycle arises because $c is captured by a closure which $c itself refers to indirectly, through its stash. The fix is to call weaken($c) (from Scalar::Util) in the body of the action, so that the variable captured by the closure holds only a weak reference to the context.
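
The effect of that fix can be seen without Catalyst at all. This sketch replaces the context with a hypothetical FakeContext class (a blessed hash with a stash) and mimics the action body; it is a simulation of the pattern, not Catalyst code. Without weaken the context survives its "request"; with it, the context is destroyed as soon as the caller lets go.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Catalyst context: a blessed hash with a stash.
package FakeContext;
our $DESTROYED = 0;
sub new     { return bless { stash => {} }, shift }
sub stash   { return $_[0]{stash} }
sub DESTROY { $DESTROYED++ }

package main;
use Scalar::Util qw(weaken);

# Mimics the action body: a closure over $c is stored in $c's own stash.
sub action {
    my ( $c, $use_weaken ) = @_;
    weaken($c) if $use_weaken;    # the closure's $c no longer keeps the context alive
    $c->stash->{foo} = sub { return ref $c };
    return;
}

my %destroyed;
for my $use_weaken ( 0, 1 ) {
    $FakeContext::DESTROYED = 0;
    {
        my $c = FakeContext->new;    # one simulated request
        action( $c, $use_weaken );
    }                                # the request's context goes out of scope here
    $destroyed{$use_weaken} = $FakeContext::DESTROYED;
    print "weaken=$use_weaken: context destroyed $destroyed{$use_weaken} time(s)\n";
}
```

Weakening the lexical inside the action is safe here because the caller still holds a strong reference for as long as the request is running.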

This example is pretty obvious, but if the model were (arguably) cleaner and used ACCEPT_CONTEXT to parameterize itself on $c, the leak would be harder to spot, since $c would no longer appear inside the closure at all.

In order to find these trickier leaks there are a few modules on the CPAN that can be very helpful, if you know how and when to use them effectively.

The first of these is Devel::Leak. The basic principle is very simple: it makes note of all the live SVs at a given point in your program, you let some code run, and when that code has finished you check that the count hasn't grown.
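
A minimal sketch of that workflow, assuming Devel::Leak is installed. The sv_delta helper is hypothetical: it runs the code once to warm up any caches, takes a snapshot with NoteSV, runs the code again and compares with the count returned by CheckSV (which, per its documentation, also reports the new SVs on STDERR). Wrapping smaller and smaller pieces of code in such a helper is what makes the binary search practical.

```perl
use strict;
use warnings;
use Devel::Leak;

# Hypothetical helper: how many more SVs are live after running $code once more?
sub sv_delta {
    my ($code) = @_;
    $code->();                                    # warm-up run, fills caches
    my $handle;
    my $before = Devel::Leak::NoteSV($handle);    # snapshot of live SVs
    $code->();
    my $after = Devel::Leak::CheckSV($handle);    # count again, report new SVs
    return $after - $before;
}

my $clean = sv_delta( sub { my %h = ( self => 1 ); return } );
my $leaky = sv_delta(
    sub {
        for ( 1 .. 10 ) {
            my %h;
            $h{self} = \%h;                       # a hash that contains itself
        }
        return;
    }
);

print "clean: $clean new SVs, leaky: $leaky new SVs\n";
```

Exact counts vary between perl builds, so in practice you compare deltas rather than expecting a precise number.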

Devel::Leak is handy because it's fairly predictable and easy to use, so you can narrow down the source of a leak with a binary search quite easily. Unfortunately you can only narrow things down so far, especially if callbacks are involved. For instance, the Catalyst example above would be hard to analyze, since the stashed data is probably still needed by the views after the action returns. The smallest scope we can test is probably a single request.

Devel::Gladiator can be used to write your own more detailed Devel::Leak workalike. It lets you enumerate all the live values at a given point in time. Just be aware that the data structures you use to track leaks will also be reported.
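
Here is one way such a workalike might look, assuming Devel::Gladiator is installed. The class_counts and leaky_request subs and the Leaky class are hypothetical. walk_arena returns an array reference holding a reference to every live SV; counting blessed values per class before and after some code gives a per-class leak report. The snapshot array is emptied immediately so it does not keep anything alive itself.

```perl
use strict;
use warnings;
use Devel::Gladiator qw(walk_arena);
use Scalar::Util qw(blessed);

# Hypothetical helper: count live blessed values per class.
sub class_counts {
    my $all = walk_arena();
    my %count;
    for my $sv (@$all) {
        my $class = blessed($sv) or next;
        $count{$class}++;
    }
    @$all = ();    # drop the references held by the snapshot
    undef $all;
    return \%count;
}

# Hypothetical leaky code: objects that refer to themselves.
sub leaky_request {
    for ( 1 .. 5 ) {
        my $obj = bless {}, 'Leaky';
        $obj->{self} = $obj;
    }
    return;
}

my $before = class_counts();
leaky_request();
my $after = class_counts();

my %delta;
for my $class ( keys %$after ) {
    $delta{$class} = $after->{$class} - ( $before->{$class} // 0 );
}
for my $class ( sort keys %delta ) {
    print "$class: +$delta{$class}\n" if $delta{$class} > 0;
}
```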

Using Devel::Gladiator you can also find a list of suspicious objects and then analyze them with Devel::Cycle quite easily.
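
A sketch of that combination, assuming both Devel::Gladiator and Devel::Cycle are installed. The Suspect class and its leak are hypothetical. Suspicious objects are picked out of the arena by class, then find_cycle is run on each; passing a callback instead of relying on the default printed report lets the script count the cycles it finds.

```perl
use strict;
use warnings;
use Devel::Gladiator qw(walk_arena);
use Devel::Cycle;
use Scalar::Util qw(blessed);

# Hypothetical leak: an object whose child holds a strong ref back to it.
{
    my $obj = bless { name => 'suspect' }, 'Suspect';
    $obj->{child} = { owner => $obj };
}

# Pick the suspicious objects out of the arena (here: everything in 'Suspect').
my $all      = walk_arena();
my @suspects = grep { ( blessed($_) // '' ) eq 'Suspect' } @$all;
@$all = ();
undef $all;

# Ask Devel::Cycle for every cycle reachable from each suspect.
my $cycles = 0;
for my $suspect (@suspects) {
    find_cycle( $suspect, sub { $cycles++ } );    # called once per cycle found
}
print scalar(@suspects), " suspect(s), $cycles cycle(s) found\n";
```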

Sometimes the data that is leaking is not the data responsible for the leak. If you need to find the structures which are pointing to a leaked value then Devel::FindRef can be very helpful. The hardest challenge is picking the right value to track, so that the report is small enough to make sense of.
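
A small sketch of Devel::FindRef in use, assuming the module builds on your perl. The %registry hash and register sub are hypothetical: a value stays alive because a global registry still points at it. Devel::FindRef::track takes a reference to the value to track and an optional depth, and returns a human-readable description of what refers to it.

```perl
use strict;
use warnings;
use Devel::FindRef;

# Hypothetical scenario: a registry nobody remembers to clean up.
our %registry;

sub register {
    my ($thing) = @_;
    $registry{ $thing->{id} } = $thing;
    return;
}

my $leaked = { id => 42, payload => 'x' x 10 };
register($leaked);

# Who is still pointing at the hash? The second argument limits the depth.
my $report = Devel::FindRef::track( $leaked, 3 );
print $report;
```

Keeping the depth small is one way to keep the report readable when the tracked value is reachable from many places.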

Devel::Refcount and Devel::Peek can be used to check the reference count of values, but remember to take into account all the references to a given value that are also on the stack. Just because the refcount is 2 for a value that's supposed to be referred to once does not mean that it's the root of a cyclical structure.
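
To illustrate the stack caveat with core Perl only, the sketch below reads a referent's count through the B module; the refcnt and inspect helpers are hypothetical, and Devel::Refcount's refcount should report the same figure. A hash with a single owner shows one extra reference while a sub holds a lexical copy of it, even though nothing is leaking.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: reference count of the value a reference points to.
sub refcnt { return B::svref_2object( $_[0] )->REFCNT }

my $obj  = { name => 'one owner' };
my $base = refcnt($obj);

my $copy      = $obj;            # a second, long-lived reference
my $with_copy = refcnt($obj);
undef $copy;

sub inspect {
    my ($arg) = @_;              # this lexical copy counts while the sub runs
    return refcnt($arg);
}
my $inside = inspect($obj);
my $after  = refcnt($obj);

print "base=$base with_copy=$with_copy inside_sub=$inside after=$after\n";
```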

A more managed approach is using instance tracking in your leaked classes, ensuring that construction and destruction are balanced within a dynamic scope. You can do this manually for more accurate results, or you can use something like Devel::Events::Objects. I personally dislike Devel::Leak::Object because you have no control over the scope of the leak checking, but if you're writing a script then it might work for you.
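
Doing it manually can be as simple as the following core-Perl sketch. The Widget class, its %LIVE table and the widgets_leaked_by helper are hypothetical. Every constructor call increments a per-class counter and every DESTROY decrements it, so any code wrapped in the helper can be checked for a balanced count once its scope has ended.

```perl
use strict;
use warnings;

# Hypothetical tracked class: live instances are counted per class.
package Widget;
our %LIVE;
sub new        { my $class = shift; $LIVE{$class}++; return bless {@_}, $class }
sub DESTROY    { $LIVE{ ref $_[0] }-- }
sub live_count { my ($class) = @_; return $LIVE{$class} // 0 }

package main;

# Hypothetical helper: run $code and report how many Widgets it left alive.
sub widgets_leaked_by {
    my ($code) = @_;
    my $before = Widget->live_count;
    $code->();
    return Widget->live_count - $before;
}

my $clean = widgets_leaked_by( sub { my $w = Widget->new( size => 1 ); return } );
my $leaky = widgets_leaked_by(
    sub {
        my $w = Widget->new( size => 2 );
        $w->{me} = $w;    # self-reference: never destroyed
        return;
    }
);
print "clean scope leaked $clean, leaky scope leaked $leaky\n";
```

Because the check is tied to a scope you choose, you can wrap a single request, a single method call, or a whole test.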

Lastly, if you suspect you've found a leak then Data::Structure::Util is a rather blunt way of confirming that suspicion.
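
For example (assuming Data::Structure::Util is installed), has_circular_ref returns a true value when the structure it is given contains a circular reference and false otherwise. The tree below is a hypothetical suspect, with an acyclic control for comparison.

```perl
use strict;
use warnings;
use Data::Structure::Util qw(has_circular_ref);

# Hypothetical suspect: a leaf that holds a strong reference to its parent.
my $tree = { name => 'root', children => [] };
push @{ $tree->{children} }, { name => 'leaf', parent => $tree };

# Control: same shape without the back-reference.
my $acyclic = { name => 'root', children => [ { name => 'leaf' } ] };

for ( [ suspect => $tree ], [ control => $acyclic ] ) {
    my ( $label, $data ) = @$_;
    printf "%s: %s\n", $label,
      has_circular_ref($data) ? 'circular reference found' : 'no cycles';
}
```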
