Wednesday, July 7, 2010

Are we ready to ditch string errors?

I can't really figure out why I'm not in the habit of using exception objects. I seem to only reach for them when things are getting very complicated, instead of by default.

I can rationalize that they are better, but it just doesn't feel right to do this all the time.

I've been thinking about what possible reasons (perhaps based on misconceptions) are preventing me from using them more, but I'm also curious about others' opinions.

These are the trouble areas I've managed to think of:

  • Perl's built-in exceptions are strings, and everybody is already used to them.[1]
  • There is no convention for inspecting error objects. Even ->isa() is messy when the error could be a string or an object.[2]
  • Defining error classes is a significant barrier: you need to stop, create a new file, etc. Conversely, universal error objects don't provide significant advantages over strings because they can't easily capture additional data apart from the message.[3]
  • Context capture/reporting is finicky
    • There's no convention like croak for exception objects.
    • Where exception objects become useful (for discriminating between different errors), there are usually multiple contexts involved: the error construction, the initial die, and every time the error is rethrown is potentially relevant. Perl's builtin mechanism for string mangling is shitty, but at least it's well understood.
    • Exception objects sort of imply the formatting is partly the responsibility of the error catching code (i.e. full stack or not), whereas Carp and die $str leave it to the thrower to decide.
    • Using Carp::shortmess(), Devel::StackTrace->new and other caller futzery to capture full context information is perceived as slow.[4]
  • Error instantiation is slower than string concatenation, especially if a string has to be concatenated for reporting anyway.[5]

[1] I think the real problem is that most core errors worth discriminating are usually not thrown at all, but actually written to $! which can be compared as an error code (see also %! which makes this even easier, and autodie which adds an error hierarchy).
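As a minimal sketch of that point, a failed open can be classified through %! without parsing any message text. The path below is a hypothetical one, chosen only so that the call fails.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist, so that open() fails.
my $path = "/nonexistent/dir/file.txt";

my $reason;
if ( open my $fh, '<', $path ) {
    $reason = 'opened';
    close $fh;
}
elsif ( $!{ENOENT} ) {    # %! is true for the errno currently held in $!
    $reason = 'missing';
}
elsif ( $!{EACCES} ) {
    $reason = 'forbidden';
}
else {
    $reason = "other: $!";
}

print "$reason\n";
```

No exception is thrown here at all, which is why string matching never enters the picture for this class of error.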

The errors that Perl itself throws, on the other hand, are usually not worth catching (typically they are programmer errors, except for a few well known ones like Can't locate Foo.pm in @INC).

Application level errors are a whole different matter though, they might be recoverable, some might need to be silenced while others pass through, etc.

[2] Exception::Class has some precedent here, its caught method is designed to deal with unknown error values gracefully.

[3] Again, Exception::Class has an elegant solution, adhoc class declarations in the use statement go a long way.

[4] XS based stack capture could easily make this a non issue (just walk the cxstack and save pointers to the COPs of appropriate frames). Trace formatting is another matter.

[5] I wrote a small benchmark to try and put the various runtime costs in perspective.

Solutions

Here are a few ideas to address my concerns.

A die replacement

First, I see merit for an XS based error throwing module that captures a stack trace and the value of $@ using a die replacement. The error info would be recorded in SV magic and would be available via an API.

This could easily be used on any exception object (but not strings, since SV magic is not transitive), without weird globals or something like that.

It could be mixed into any exception system by exporting die, overriding a throw method or even by setting CORE::GLOBAL::die.

A simple API to get caller information from the captured COP could provide all the important information that caller would, allowing existing error formatters to be reused easily.

This would solve any performance concerns by decoupling stack trace capturing from trace formatting, which is much more complicated.

The idea is that die would not merely throw the error, but also tag it with context info, that you could then extract.

Here's a bare bones example of how this might look:

use MyAwesomeDie qw(die last_trace all_traces previous_error); # tentative
use Try::Tiny;

try {
 die [ @some_values ]; # this is not CORE::die
} catch {
 # gets data out of SV magic in $_
 my $trace = last_trace($_);

 # value of $@ just before dying
 my $prev_error = previous_error($_);

 # prints line 5 not line 15
 # $trace probably quacks like Devel::StackTrace
 die "Offending values: @$_" . $trace->as_string;
};

And of course error classes could use it on $self inside higher level methods.
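The XS module described above does not exist, but the tagging idea can be roughly approximated in pure Perl. The sketch below is only an assumption-laden stand-in: throw_tagged, last_trace and previous_error are hypothetical names, a side table keyed by reference address plays the role of SV magic, and the trace is formatted eagerly with Carp::longmess, which is exactly the cost the XS design is meant to avoid.

```perl
use strict;
use warnings;
use Carp ();
use Scalar::Util qw(refaddr);

# Side table standing in for SV magic, keyed by the error's address.
# Entries are never cleaned up here; a real implementation would have to.
my %context;

# Hypothetical stand-in for the proposed die replacement.
sub throw_tagged {
    my ($error) = @_;
    my $previous = $@;       # value of $@ just before dying
    if ( ref($error) ) {     # plain strings cannot be tagged
        $context{ refaddr($error) } = {
            trace    => Carp::longmess('thrown'),
            previous => $previous,
        };
    }
    die $error;
}

sub _context_for {
    my ($error) = @_;
    return ref($error) ? $context{ refaddr($error) } : undef;
}

sub last_trace     { my $c = _context_for(@_); return $c ? $c->{trace}    : undef }
sub previous_error { my $c = _context_for(@_); return $c ? $c->{previous} : undef }

my $caught;
eval {
    eval { die "first failure\n" };    # leaves a value in $@
    throw_tagged( [ 1, 2, 3 ] );
    1;
} or $caught = $@;

print "Offending values: @$caught\n";
print "Previous error: ", previous_error($caught);
print last_trace($caught);
```

Keying on the address means the context is lost if the error is copied into a new structure, and stale entries can collide with reused addresses. That is part of why attaching the data to the value itself is the more attractive design.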

Throwable::Error sugar

Exception::Class got many things right but a Moose based solution is just much more appropriate for this, since roles are very helpful for creating error taxonomies.

The only significant addition I would make is some sort of sugar layer to lazily build a message attribute using a simple string formatting DSL.

I previously thought MooseX::Declare would be necessary for something truly powerful, but I think that can be put on hold for a version 2.0.
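To make the lazily built message idea concrete, here is a plain Perl sketch. It is not Throwable::Error or Moose code: the class names, the template method and the {field} placeholder syntax are all assumptions made for illustration. The message is only formatted the first time the error is stringified.

```perl
use strict;
use warnings;

{
    package My::Error;    # hypothetical base class
    use overload '""' => \&message, fallback => 1;

    sub new {
        my ( $class, %args ) = @_;
        return bless {%args}, $class;
    }

    sub throw { my $class = shift; die $class->new(@_) }

    # Template with {field} placeholders; subclasses override this.
    sub template { 'error' }

    sub message {
        my $self = shift;

        # Built on first use, then cached on the object.
        return $self->{_message} //= do {
            my $text = $self->template;
            $text =~ s/\{(\w+)\}/ defined $self->{$1} ? $self->{$1} : '<undef>' /ge;
            $text;
        };
    }
}

{
    package My::Error::NotFound;    # hypothetical subclass
    our @ISA = ('My::Error');
    sub template { 'Could not find {type} with id {id}' }
}

my $ok  = eval { My::Error::NotFound->throw( type => 'user', id => 42 ); 1 };
my $err = $@;

print "$err\n";    # Could not find user with id 42
```

An error that is caught and handled without ever being printed never pays for the string formatting.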

A library for exception formatting

This hasn't got anything to do with the error message, that's the responsibility of each error class.

This would have to support all of the different styles of error printing we can have with error strings (i.e. die, croak with and without $Carp::Level futzing, confess...), but also allow recursively doing this for the whole error stack (previous values of $@).

Exposed as a role, the base API should complement Throwable::Error quite well.

Obviously the usefulness should extend beyond plain text, because dealing with all that data is a task better suited for an IDE or a web app debug screen.

Therefore, things like code snippet extraction or other goodness might be nice to have in a plugin layer of some sort, but it should be easy to do this for errors of any kind, including strings (which means parsing as much info from Carp traces as possible).
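For reference, the string based reporting styles mentioned above differ only in which context the thrower bakes into the message. A small sketch using core Carp follows; My::Lib and its subroutine names are hypothetical.

```perl
use strict;
use warnings;

{
    package My::Lib;    # hypothetical library package
    use Carp qw(croak confess);

    sub with_die     { die "bad input" }        # reports this line, inside My::Lib
    sub with_croak   { croak "bad input" }      # reports the caller's line
    sub with_confess { confess "bad input" }    # reports a full stack trace
}

my %report;
for my $style (qw(with_die with_croak with_confess)) {
    eval { My::Lib->can($style)->(); 1 } or $report{$style} = $@;
    print "--- $style\n$report{$style}\n";
}
```

In all three cases the decision is frozen into the string at throw time, so a formatting library working with exception objects would need to reproduce each of these views after the fact.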

Better facilities for inspecting objects

Check::ISA tried to make it easy to figure out what object you are dealing with.

The problem is that it's ugly: it exports an inv routine instead of a more intuitive isa. It's now possible to go with isa as long as namespace::clean is used to remove it, so it's not accidentally called as a method.

Its second problem is that it's slow, but it's very easy to make it comparable with the totally wrong UNIVERSAL::isa($obj, "foo") in performance by implementing XS acceleration.
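Until something like that exists, the usual guard can be written with core Scalar::Util. This is a minimal sketch: error_isa is a hypothetical helper name, not Check::ISA's API, and My::Error::Timeout is a made-up class.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: true only for blessed objects of the given class.
# Without the blessed() check, calling ->isa on an unblessed reference
# dies, and a plain string holding a class name would wrongly pass.
sub error_isa {
    my ( $error, $class ) = @_;
    return blessed($error) && $error->isa($class) ? 1 : 0;
}

{
    package My::Error::Timeout;    # hypothetical error class
    sub new { bless { message => $_[1] }, $_[0] }
}

my @results;
for my $thrown ( "plain string\n", [ 404, 'not found' ], My::Error::Timeout->new('too slow') ) {
    eval { die $thrown };
    push @results, error_isa( $@, 'My::Error::Timeout' ) ? 'timeout object' : 'something else';
}

print "$_\n" for @results;
```

Calling the isa method, rather than the UNIVERSAL::isa function, keeps any isa override in the error class working.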

Conclusion

It seems to me if I had those things I would have no more excuses for not using exception objects by default.

Did I miss anything?

13 comments:

Unknown said...

use Params::Util '_INSTANCE';

if ( _INSTANCE($error, 'Exception::Foo') ) {
# ...
}

The longer I own this, the more annoying that naming scheme gets...

But at least it doesn't modify the public method list for classes.

fREW said...

I would love to help you with this endeavor. The only issue I have (and maybe this is just me being a wuss or w/e) is that if you make a Moose based exceptions setup then I can't sanely use it in non-Moose (DBIC) CPAN modules, and that's just a drag...

nothingmuch said...

I intend on making stuff that can be used to improve any exception object mechanism.

My personal priority is Throwable::Error because it already exists and it's pretty nice to work with and to hack on, but the XS die thing and the related exception formatting should both be completely independent, and object introspection goes beyond exception objects.

Unknown said...

HOLY SMOKES! I want that XS catch thing sooooo much.

JT said...

Your solution looks pretty nice. I've used a number of exception objects, and yours looks better. However, lately I've just decided to throw an array ref everywhere like so:

[404, 'Could not find object', $id ]

[ error_code, error_message, error_data ]

It's easy to inspect, easy to detect, and nothing needs to be defined up front so I get the best of all worlds.

nothingmuch said...

@JT - yeah but you don't get stack traces (though the XS die thing should at least make that data available).

I actually do this too on occasion, usually using hash refs.

The biggest problem I hit with that is forgetting to catch the error and reformat it into something sane when the error is going to be fatal anyway. This often happens in unit tests. The program just prints "HASH(0xdeadbeef)" and exits, and you don't know where the error was originally thrown, or where it was caught and made fatal...

JT said...

@nothingmuch

Good point about the stack trace.

As for the things capturing the errors and printing out ARRAY(0xdeadbeef), I consider that a bug. I have a top level exception catcher in both my applications and my tests which checks whether the exception is an array and does the right thing.

One other thing. I use Moose in basically everything these days so it wouldn't bother me at all if your solution was written in Moose. However MooseX::Declare (which you mentioned for v2) is ridiculously slow. I've stopped using it everywhere due to its massive performance penalty. It seems to be anywhere from 30 to 100 times slower than Moose alone. Have you not run into this problem?

nothingmuch said...

I haven't noticed MX::D being slow, but I haven't bothered checking either, startup time rarely matters for the type of code I write, so I was never really concerned.

And the point of this exercise is not to write a brand new error system, there's a million of those already, the point is to address the deficiencies in existing ones =)

Matt S Trout (mst) said...

@JT I suspect what you mean by "ridiculously slow" is "I overused the validation of method call argument lists and then forgot that in perl it has to check at every method call".

Though MooseX::Types::Structured's checking code could still do with being rather more efficient. There was an attempt to make a start on that at YAPC::NA but I'm not sure we got enough hackers - trouble is there's the people for whom it's already fast enough, and the people who say things like "ridiculously slow" without bothering to profile and find out why. Neither tends to do anything useful towards making it faster.

*sigh*

-- mst

Matt S Trout (mst) said...

Oh, and while I think about it: you shouldn't ever be checking for the "Can't locate" error anyway. While fixing loading problems in perlbal, hachi figured out that "Can't locate" leaves %INC empty and a failure to compile leaves %INC with an undef value, so you can use that to make that determination rather than parsing an error at all.

-- mst

JT said...

@nothingmuch - Actually I'm not talking about compile time, I'm talking about run time.

@mst - We did some benchmarking at our last PM, and found a few reasons why it was slow. I just don't remember what the outcome was of that as it was a sideline discussion. I'll bring it up again at our next PM and see where we were at.

nothingmuch said...

@mst - nice

@JT - when it's slow use method foo ( $x, $y ) { } (no types) for the hot path, with comprehensive validation only on top level public APIs. That makes it as fast as sub foo { my ( $self, $x, $y ) = @_ }

Unknown said...

This post gets at the crux of why Perl and its libraries are no longer "modern." I'm fine with embracing Moose but with no consistent exception mechanism aside from "die" with a string (which is no longer consistent), Perl is headed to the dustbin. The "isa" issue is yet another black eye for Perl.