Saturday, August 15, 2009

In Defense of APIs

This is a response to Laurent Dami's post about when to prefer direct access to the hash fields of an object over an encapsulated API.

The point of the article is that encapsulation is a tradeoff, especially in Perl where you could do some things more easily or efficiently if you decide to give it up.

While I agree that this tradeoff exists, I think that it's not the only tradeoff you're allowed to make, and that the problems Laurent raises in defense of hash access can be solved in ways that don't involve breaking encapsulation and that are, in my opinion, superior.

In short, I find most of his examples are either better solved by Moose, or fall short of being substantial in my opinion.

I will attempt to refute his arguments one by one.

Good Perl idioms are no longer available

Admittedly the fact that the results of object methods or subroutine application cannot be easily interpolated into strings is a big annoyance in Perl, but that's far from a good reason to avoid them.

I might be alone on this, but personally I find the rest of these examples quite troubling. I think they miss the point of OO programming.

By refactoring these examples into a real API we no longer need the comments to explain what they're actually doing. Maybe the code implementing these methods is not as idiomatic, but we don't need to know it's there (and since we don't know, it could be that it actually is as idiomatic as he likes under the hood).

The total lines of code may be longer for a single usage, but most applications that manipulate points in a 2D space would need to do these operations many times, reducing the overall verbosity:

my $opposite = $point->symmetry_transform;

my $zoomed = $point->zoom($factor);

# I'd actually use a wrapper object for this
# (DisplayPoint with a Menu and a Point delegate)
my $point = Point->new_with_traits(
    traits => [qw(Menu)],
    menu   => { color => "green" },
    ...
);
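As a minimal sketch of what might sit behind the first two calls, here is a plain Perl Point class. Only the method names symmetry_transform and zoom come from the examples above; the constructor, the x and y accessors, and the choice to return new objects are assumptions made for illustration, and symmetry_transform is assumed to swap x and y (a reflection across the line y = x).

```perl
use strict;
use warnings;

package Point;

# Hypothetical 2D point; fields are only ever read through accessors.
sub new {
    my ( $class, %args ) = @_;
    return bless { x => $args{x}, y => $args{y} }, $class;
}

sub x { $_[0]{x} }
sub y { $_[0]{y} }

# Assumption: "symmetry" means swapping the two coordinates.
# Returns a new Point and leaves the invocant untouched.
sub symmetry_transform {
    my $self = shift;
    return ref($self)->new( x => $self->y, y => $self->x );
}

# Scales both coordinates by $factor, again returning a new Point.
sub zoom {
    my ( $self, $factor ) = @_;
    return ref($self)->new(
        x => $self->x * $factor,
        y => $self->y * $factor,
    );
}

package main;

my $point    = Point->new( x => 2, y => 3 );
my $opposite = $point->symmetry_transform;    # new Point with x = 3, y = 2
my $zoomed   = $point->zoom(10);              # new Point with x = 20, y = 30
```

Callers never see how the coordinates are stored, so the class could switch to an array, an inside-out object, or XS without any of these call sites changing.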

There is one final example:

# temporary push aside
{ local $point->{x} += $far_away;
    do_something_without_that_point_in_the_way();
} # point automatically comes back to its previous location

And though for this one there is no direct correspondence[1], the problem is that instead of passing the set of relevant points to do_something_without_that_point_in_the_way(), the point is mutated to temporarily hide its effects.

I've posted on immutability before and why I think that this approach is wrong, but suffice it to say that I would write the above using an explicit collection of points, and make the operation of creating new collections easy instead of monkey patching the data.
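A rough sketch of that alternative, using only core Perl: the routine receives the set of points it should consider, and a small helper builds that set without touching any point. The names without_point and do_something, and the tiny Point class, are hypothetical stand-ins rather than anything from Laurent's code.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

package Point;
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
sub x { $_[0]{x} }
sub y { $_[0]{y} }

package main;

# Returns a new list without $excluded (compared by identity).
# Neither the points nor the caller's list are modified.
sub without_point {
    my ( $excluded, @points ) = @_;
    return grep { Scalar::Util::refaddr($_) != Scalar::Util::refaddr($excluded) } @points;
}

# Hypothetical stand-in for do_something_without_that_point_in_the_way():
# it only ever sees the points it is handed.
sub do_something {
    my (@points) = @_;
    return scalar @points;
}

my @points     = map { Point->new( x => $_, y => 0 ) } 1 .. 3;
my $in_the_way = $points[1];

my $seen = do_something( without_point( $in_the_way, @points ) );
```

Nothing needs to be restored afterwards, because nothing was changed in the first place.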

No obvious distinction between "setter" methods and other methods

This problem can be solved by using an alternate method naming style. For instance MooseX::Policy::SemiAffordanceAccessor makes the default setter for an attribute named foo into set_foo.

Personally I feel this is overkill. If the relationship between the object and the method name is obviously that of a noun and an attribute, such as $thing->size(10), that is far from ambiguous.

For cases that are more involved, like a field manipulation that requires additional actions, I tend to have a method name that has a verb in it and keep the setter completely private. Since the method would be doing something more than just set_, that is reflected in the API.

Hard to debug

This is a deficiency in the Perl debugger. I strongly believe that the maintainability of your code shouldn't suffer because of inferior tools. Someone might fix the Perl debugger one day. Though unlikely, it's much more unlikely that they will also fix all of my code.

Since the inner workings of Catalyst are obviously not what Laurent wants to debug in the example, the problem lies in the Perl debugger's inability to handle complex (and idiomatic) Perl expressions.

Furthermore, in this particular case it shouldn't be a problem: $self is known, and you could simply set a breakpoint for that class's min_body method and continue execution instead of stepping through every operation.

Secondly, if Catalyst supported lazy body parsing, that line might run before the body length is actually known (the body may not have been fully read). In that case this operation wouldn't be a simple accessor fetch but rather a blocking read and an update of the request object.

This polymorphism in the API is precisely what makes it possible to enable such optimizations in future versions of Catalyst without breaking existing applications. If we simply used length($c->{request}{body}) directly, there would be no room for future improvements, even on an HTTP abstraction that could support that easily.
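To make the point concrete, here is a sketch of a request object whose body is read lazily. My::Request, body_reader and body_length are hypothetical names invented for this illustration; this is not Catalyst's actual API.

```perl
use strict;
use warnings;

package My::Request;    # hypothetical class, not part of Catalyst

sub new {
    my ( $class, %args ) = @_;
    # body_reader is an assumed callback that performs the
    # (possibly blocking) read of the request body.
    return bless { body_reader => $args{body_reader} }, $class;
}

# Looks like a plain accessor to callers, but the first call
# triggers the read and caches the result in the object.
sub body {
    my $self = shift;
    $self->{body} = $self->{body_reader}->() unless exists $self->{body};
    return $self->{body};
}

sub body_length {
    my $self = shift;
    return length( $self->body );
}

package main;

my $reads   = 0;
my $request = My::Request->new(
    body_reader => sub { $reads++; return "name=value" },
);

# At this point $request->{body} does not exist yet, so code reaching
# into the hash would see nothing. The method call does the right thing.
my $length = $request->body_length;    # triggers the read
$request->body_length;                 # served from the cache
```

Code that went through the method keeps working when the implementation becomes lazy; code that read the hash field directly would silently see an empty body.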

Non-scalar attributes must either copy or reinvent an API

This problem in particular is solved using MooseX::AttributeHelpers (note that this will be in core Moose soon, after receiving a big facelift).

In the more general sense, Moose deeply supports the encapsulation and packaging of patterns using its meta trait system. By applying traits to attributes we can alter their behavior, providing additional functionality in a concise and declarative way.

There's no need to reinvent or copy because of Moose's extensibility, something that is missing from other class systems, and this applies not only to non scalar data, but to many other patterns for which MooseX modules were written.
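As an illustration, here is roughly what this looks like with the native attribute traits found in core Moose (Moose::Meta::Attribute::Native), which is assumed here to be the in-core form of the MooseX::AttributeHelpers functionality. The Basket class and its method names are hypothetical, and the sketch assumes Moose is installed.

```perl
use strict;
use warnings;

package Basket;    # hypothetical class
use Moose;

has items => (
    traits  => ['Array'],
    is      => 'ro',
    isa     => 'ArrayRef[Str]',
    default => sub { [] },
    # Each key becomes a method on Basket that delegates to the
    # named array operation on the underlying attribute value.
    handles => {
        add_item   => 'push',
        item_count => 'count',
        all_items  => 'elements',
    },
);

__PACKAGE__->meta->make_immutable;

package main;

my $basket = Basket->new;
$basket->add_item($_) for qw(apple pear);

my $count = $basket->item_count;    # number of items stored so far
my @items = $basket->all_items;     # a list, not the internal array ref
```

The class exposes a small, named API for the collection without handing out the array reference and without hand-writing the delegating methods.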

No generic data traversal modules

Again, introspection is key here. KiokuDB would never be possible without generic data traversal for Moose objects. That's what sets it apart from Pixie.

I've been meaning to write a Data::Visitor::Moose for a long while but simply never got around to it. It's definitely possible, and even not that hard.

Here is a simple example demonstrating JSON integration:

method TO_JSON {
    return {
        map { $_->name => $_->get_value($self) }
            $self->meta->get_all_attributes
    };
}

But the really nice thing is that since all the attributes have extensible metadata we can be very particular about how we filter them. Unfortunately the JSON module's API doesn't allow us to specify parameters to the TO_JSON method, but if it did, we could very easily output different variants for different contexts.

From the point of view of the class being serialized (obviously the TO_JSON method could go in a role and get reused), we could decorate attributes with traits to determine their correct behavior. For instance, an attribute that is used both by an Ajax frontend displaying the data and by a REST JSON feed for this object could be declared as:

has name => (
    traits => [qw(MyApp::View MyApp::REST)],
    ...
);

The actual implementation involves two roles (MyApp::View and MyApp::REST) with no implementation, and a simple $attr->does($role) call in the TO_JSON method.
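A minimal sketch of that arrangement, assuming Moose is installed. The marker roles are the two named above; MyApp::User, its attributes, and the to_data_for helper are hypothetical. Because TO_JSON cannot take arguments, the context is passed to a separate method and TO_JSON picks one context as its default.

```perl
use strict;
use warnings;

# Marker roles with no implementation.
package MyApp::View; use Moose::Role;
package MyApp::REST; use Moose::Role;

package MyApp::User;    # hypothetical class
use Moose;

has name          => ( is => 'ro', isa => 'Str', traits => [qw(MyApp::View MyApp::REST)] );
has last_seen     => ( is => 'ro', isa => 'Str', traits => [qw(MyApp::View)] );
has password_hash => ( is => 'ro', isa => 'Str' );    # no marker, never exported

# Hypothetical helper: collects only the attributes whose
# metaobject does the given marker role.
sub to_data_for {
    my ( $self, $role ) = @_;
    my %data =
        map  { ( $_->name => $_->get_value($self) ) }
        grep { $_->does($role) }
        $self->meta->get_all_attributes;
    return \%data;
}

# The JSON module calls TO_JSON without arguments, so default to the REST view.
sub TO_JSON { $_[0]->to_data_for('MyApp::REST') }

__PACKAGE__->meta->make_immutable;

package main;

my $user = MyApp::User->new(
    name          => 'example',
    last_seen     => '2009-08-15',
    password_hash => 'not-a-real-hash',
);

my $rest = $user->TO_JSON;                       # only "name"
my $view = $user->to_data_for('MyApp::View');    # "name" and "last_seen"
```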

This is not only cleaner, but also more powerful than violating encapsulation. The difference is that the encapsulation provided by e.g. Class::Accessor or Object::Tiny is just not introspectable or extensible.

Methods are slower than direct access to attributes

I won't argue about this, but in my experience even with large (some would say gratuitous) amounts of method calls I have never been able to demonstrate that accessors or other small methods were an actual performance issue (that is, using a profiler on real code with real inputs).

My dying laptop can make about 1,000,000 method calls per second. My $20 a month virtual private hosting image does 3,000,000. The difference between 5 method calls and 500 in a single dynamic web request is still too small to actually matter, at least for the kind of apps that I write. 100x as many method calls for the same amount of code will not be 100x slower, but will probably be 10x more maintainable ;-)
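If you want to check the numbers on your own hardware, a microbenchmark along these lines, using the core Benchmark module, is one way to compare a trivial accessor against direct hash access. The Thing class is a hypothetical stand-in, and a microbenchmark like this says nothing about whether accessors matter in a real application, which is what a profiler is for.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

package Thing;    # hypothetical class with one trivial accessor
sub new  { return bless { size => 10 }, shift }
sub size { $_[0]{size} }

package main;

my $thing = Thing->new;

# Run each variant for at least one CPU second and print a
# comparison table of iterations per second.
cmpthese( -1, {
    method_call => sub { my $size = $thing->size },
    hash_access => sub { my $size = $thing->{size} },
} );
```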

That aside, the XS branch for Moose has been performing almost as fast as direct hash access in microbenchmarks, while still providing full type validation, support for lazy attributes, access control, etc. This is because XSUBs are faster than pure perl subroutines (they require no context scope, and a single opcode dispatch). And of course these accessors can be subclassed or swapped out for more involved implementations should the need arise.

The XS branch is unmerged and unfinished, but there are a number of other XS accessor implementations on the CPAN if that is actually a problem in your code.

Conclusion

Laurent concluded his post saying:

In Perl, fully encapsulated objects are sometimes the best solution, sometimes not; weight these considerations before taking strong design decisions.

An interesting design is the one of DBI : objects (handles) are totally encapsulated, yet they exploit the power of tie to expose their attributes through a conventional hashref API, instead of OO getter and setter methods. This is a very clever compromise.

As far as I am concerned, I purposedly designed DBIx::DataModel to fully exploit the dual nature of Perl objects, having both the OO API for executing methods, and the hashref API for accessing column values. I wouldn't necessarily do that everywhere, but for row objects in a database, which are very open in nature, this just seemed an appropriate solution.

Specifically in the case of DBIx::DataModel, I think that hash access is a valid approach. But this is because DB records are not objects, but tuples. You can model tuples as objects, but not the other way around.

I couldn't agree more with his sentiment that this is a tradeoff you should consider, but I think his cutoff point is very different from mine. I would need to be much more desperate to reach for "naughty" hash access when accessors would do.

Lastly, every time I've used the DBI API I really wished it didn't do that horrible hashref hack. Cleverness is bad: it's confusing to newbies, it's a red flag to experienced programmers new to this API, and, to reference his previous point about speed, tie is about 4 times as slow as normal method calls. To me that feature seems like a half-hearted attempt to allow polymorphism and encapsulation in an API where methods would break compatibility.

This complexity makes understanding DBI internals a lot harder, and by proxy makes writing DBDs harder too. This is a fact of life that we deal with because DBI has been around for longer than I've known how to program and because it's still robust and performant. It has flaws but it makes up for them. That doesn't mean the tied hash attributes were a good idea.

I feel that my decision to be an OO purist, especially in a language where impurities are so tempting, has made the job of maintenance programming much easier for me and my coworkers.

To me polymorphism is a far more important idiom than concise hash manipulation. It may not be unique to Perl, but it's damned useful.

[1] $point->localize( x => $new_value, body => sub { ... } )

8 comments:

Mithaldu said...

Something at the back of my mind is telling me that someone in this discussion is forgetting about how other languages have explicit structs in addition to classes and that they're there for a reason.

zby said...

my $opposite = $point->symmetry_transform; - what you propose here is adding a method to the Point class, but if you use a Point class from CPAN - then you can only subclass it and it depends on the public api if you can make the new method as efficient as ($point->{x}, $point->{y}) = ($point->{y}, $point->{x});.

Unknown said...

In defence of Object::Tiny, it is not an encapsulation layer, it is a convenience module.

To extend method "foo" you do the following.

Original:
use Object::Tiny qw{
foo
bar
};

Extended:
use Object::Tiny qw{
bar
};

sub foo {
whatever...
}

Aristotle said...

Thanks for writing the post I couldn’t be bothered to! I found the article just as troubling as you did.

(I honestly think that the best result of the Iron Man thing was to get you blogging. I’m enjoying that greatly so far.)

nothingmuch said...

@Christian: c.f. MooseX::Types::Structured

@zby:

but does it matter?

Point->new( x => $proto->y, y => $proto->x )

is pretty concise no matter whether it's in a subclass or Point itself, and it is much more reusable.

In the context of CPAN/opensource and patching, i think this is even more important. The new methods could be submitted upstream, or a subclass could be released to the CPAN, etc. Not breaking this process is helpful to our community.

@Adam:

I certainly didn't mean that as criticism, I see both of those modules as just a way to generate some well known code, if you need introspection you write it yourself on top of it.

Object::Tiny in particular is designed more as a bootstrap/lightweight thing, so it's unfair to expect this from it, obviously.

Laurent's criticism could apply to code that uses either though, and that's what I was addressing.

@Aristotle: Thanks =)

nothingmuch said...

@zby: also, if efficiency is a concern then that won't make a difference. Writing Point in C++ or another high performance language, preferably one that doesn't require sacrificing kittens.

dami said...

Hi Yuval,

Thanks for answering my post at such great length; I'm delighted to see that this deliberately provocative title generated a very interesting discussion.

Don't take my examples too literally; these were just invented to illustrate a general principle.

In your answer, you more or less say "oh, but we can add this and that to the class". Fine, but this assumes that the class producer and class consumer talk together to understand each other's needs (or maybe they are the same person). However, OO is about programming by contract, so my question really was about : what do you put in the contract ?

If the class producer needs to be protective, he/she will choose an encapsulated API, which is fine. All I'm saying is that in some situations, the producer can decide that there is no risk in being more open, and publicize a hashref API in the contract, and again in some situations, this could make consumers more happy.

nothingmuch said...

I suppose the real tradeoff is what users would you like to make happy. In the DBIx::DataModel case I think it was certainly appropriate, but I think that it's very unique in that respect.

Establishing a contract that allows hash access limits the implementation changes you can make in the future without breaking that contract.

A method only API is obviously no guarantee (it could be wrong), but the odds are better.

I don't really care about protecting my class from user code (if they break it, it's their fault), but I do care a lot about future proofing user code to allow for drop in replacements or refactoring.

Cheers,
Yuval