Wednesday, July 29, 2009

Reducing Scope

Jay Kuri recently made the distinction between external and internal dependencies. He makes the case that when you choose to internalize a dependency, the implementation of that dependency usually suffers as a result.

In my opinion, choosing to internalize any part of a software component is very similar to hard coding a value.

My code is often criticized for being too componentized or abstract. To use it you usually have to go through an OO API (even for simple things) and provide configuration values explicitly (even for obvious values), and it will have many CPAN dependencies. But there is a reason I code this way.

FAIL

Today I was trying to generate a password string from a large integer. Specifically, I wanted something I could paste into a password input box, that contains alphanumeric and punctuation characters, and that is derived directly from the output of Digest::HMAC.

There are many modules on the CPAN that generate passwords (Crypt::RandPasswd, String::MkPasswd, Crypt::PassGen). However, none of them can be used to solve my problem.

Generating random passwords involves two steps:

  1. Generate a random seed
  2. Encode a password based on that seed

All of these modules provide a new approach to the second step, implementing clever pronounceable passwords, interesting encoding schemes, and so on. This is the code I would like to reuse.

Unfortunately they all also internalize the dependency of generating random numbers, and the closest thing to an API to override that is this gem from Crypt::RandPasswd's documentation:

{
    local $^W; # squelch sub redef warning.
    *Crypt::RandPasswd::rng = \&my_rng;
}

So to adapt Crypt::RandPasswd to my requirements I'd have to recreate the rand API that takes an upper bound and returns a floating point number between 0 and that number.
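A minimal sketch of such an adapter might look like the following. It assumes the callback is called with an upper bound, just like rand. The name make_hmac_rng, the key and the message are all made up for illustration, and it uses hmac_sha256 from the core Digest::SHA module rather than Digest::HMAC to stay self-contained.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256);

# Hypothetical helper (not part of any CPAN module): returns a closure with
# rand()-like semantics (takes an upper bound, returns a float in [0, bound))
# that draws its bits from successive HMAC-SHA256 outputs instead of rand().
sub make_hmac_rng {
    my ( $key, $message ) = @_;
    my $counter = 0;
    my $buffer  = '';

    return sub {
        my $bound = @_ ? shift : 1;
        $bound = 1 unless $bound;    # rand(0) behaves like rand(1)

        # Refill with 32 more bytes once fewer than 4 remain.
        $buffer .= hmac_sha256( $message . pack( 'N', $counter++ ), $key )
            if length($buffer) < 4;

        # Consume 32 bits and scale them into [0, $bound).
        my $int = unpack 'N', substr( $buffer, 0, 4, '' );
        return $bound * ( $int / 2**32 );
    };
}

# Placeholder key and message, for the demo only.
my $my_rng = make_hmac_rng( 'secret key', 'example.com' );
printf "%.4f\n", $my_rng->(26);

# With Crypt::RandPasswd installed, the documented override would then be:
# { local $^W; *Crypt::RandPasswd::rng = $my_rng; }
```

Note how much of this is bookkeeping to turn bytes into a float, only for the password module to turn the float back into an index.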

This is better than what the other modules offer: at least it would let me use something like Crypt::Random::Source to make sure the passwords are truly random, and since it's documented it's unlikely to break if I rely on it. But this is hardly a solution.

I'm left with two choices: monkey patch or reinvent.

Object Oriented design

If Crypt::RandPasswd were an OO module I could perhaps subclass it instead of monkey patching, but that's not necessarily enough to be reusable.

If these modules had OO APIs that used two delegates, one to generate random data and one to encode random data into a string representation optimized for human consumption, I probably wouldn't have this problem.
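As a rough sketch of what that could look like, here is a generator that does nothing but connect two delegates. All of the My::Password::* class names and the get_bytes/encode method names are hypothetical, chosen for illustration. The fixed-data source stands in for whatever supplies the bytes, for example the output of an HMAC.

```perl
use strict;
use warnings;

# Delegate 1 (hypothetical): where the raw data comes from.
{
    package My::Password::Source::Fixed;
    sub new { my ( $class, %args ) = @_; return bless { data => $args{data} }, $class }

    # Return (and consume) the next $n bytes of the data supplied by the caller.
    sub get_bytes { my ( $self, $n ) = @_; return substr( $self->{data}, 0, $n, '' ) }
}

# Delegate 2 (hypothetical): how bytes become something a human can type.
{
    package My::Password::Encoder::Alphabet;
    sub new { my ( $class, %args ) = @_; return bless { alphabet => $args{alphabet} }, $class }

    sub encode {
        my ( $self, $bytes ) = @_;
        my @alphabet = @{ $self->{alphabet} };

        # A plain modulo is slightly biased unless the alphabet size divides
        # 256; good enough for a sketch.
        return join '', map { $alphabet[ ord($_) % @alphabet ] } split //, $bytes;
    }
}

# The generator itself only wires the two delegates together.
{
    package My::Password::Generator;
    sub new {
        my ( $class, %args ) = @_;
        return bless { source => $args{source}, encoder => $args{encoder} }, $class;
    }

    sub generate {
        my ( $self, $length ) = @_;
        return $self->{encoder}->encode( $self->{source}->get_bytes($length) );
    }
}

my $generator = My::Password::Generator->new(
    source  => My::Password::Source::Fixed->new( data => "\x00\x01\x02\x1b" ),
    encoder => My::Password::Encoder::Alphabet->new( alphabet => [ 'a' .. 'z' ] ),
);
print $generator->generate(4), "\n";    # prints "abcb"
```

Swapping in a different source or a different encoding scheme means passing a different object to the constructor. Neither delegate needs to know the other exists.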

Like Jay said, by externalizing the algorithmic dependencies here (even if they still end up in the same CPAN distribution), we're increasing the likelihood that a clean, well thought out API for each delegate would emerge. This is what makes polymorphism so powerful when used right.

My critics would argue that this would be bloated and overengineered, but if the problem of encoding passwords were separated into a standalone class, at this point the work is done.

There is no need for more features, or an extensible API. Another encoding scheme could be implemented using the same API and dropped in wherever this one fits. It wouldn't need to worry about details like generating cryptographically strong random numbers, or providing additional customizability.

This scope reduction is in my opinion fundamental to writing reusable and maintainable code. It seems like a PITA until you have a working system, and it may seem like more work when you're writing such code or installing modules that are written in this way.

Old monolithic code can obviously be cannibalized or refactored into new code during the maintenance cycle, but if its scope was properly reduced to begin with the flawed component could be replaced a lot more easily, either by dropping in a replacement if the API can remain intact, or also adjusting the dependent code to use a better API.

For an example of how not to write a proper solution to a self contained problem, take a look at Catalyst::Plugin::Session.

When I rewrote the system that predated it, I made the distinction between preserving state in HTTP and storing associated data on the server. However, I unnecessarily intertwined the implementations of these two sub problems, creating an overly complex hook based API that is still monolithic.

This code cannot be saved. At best, snippets can be copied and pasted into a new implementation, but since the design is flawed and the implementation is so large and complex, there is no hope of reuse or even a backwards compatible plugin API. At best the same API (or a subset) can be provided for applications, so that user code will not have to be rewritten.

Monoliths aren't always evil

Off the top of my head I can enumerate at least 6 different components of a session management system, with no overlap and very little interaction between them. This does not mean that you would have to configure at least 6 different Catalyst components/plugins just to get $c->session. Instead, a wrapper component that configures them for you (and does nothing but configure) would be written on top of them, to provide DWIM.
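A minimal sketch of such a wrapper, using only two of those components (HTTP state and server side storage). Every My::Session::* name here is hypothetical, and none of this is the Catalyst::Plugin::Session API.

```perl
use strict;
use warnings;

# Hypothetical component: works out which session a request belongs to.
{
    package My::Session::State::Cookie;
    sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

    sub session_id {
        my ( $self, $request ) = @_;
        return $request->{cookies}{ $self->{cookie_name} };
    }
}

# Hypothetical component: keeps the data associated with a session id.
{
    package My::Session::Store::Memory;
    sub new { my ($class) = @_; return bless { data => {} }, $class }
    sub load { my ( $self, $id ) = @_; return $self->{data}{$id} ||= {} }
}

# The core only delegates; it knows nothing about cookies or storage.
{
    package My::Session;
    sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

    sub session {
        my ( $self, $request ) = @_;
        return $self->{store}->load( $self->{state}->session_id($request) );
    }
}

# The sugar layer: does nothing but fill in defaults and configure.
{
    package My::Session::Simple;
    sub new {
        my ( $class, %args ) = @_;
        return My::Session->new(
            state => $args{state} || My::Session::State::Cookie->new( cookie_name => 'sid' ),
            store => $args{store} || My::Session::Store::Memory->new,
        );
    }
}

my $sessions = My::Session::Simple->new;
my $request  = { cookies => { sid => 'abc123' } };    # stand-in for a real request

$sessions->session($request)->{user} = 'alice';
print $sessions->session($request)->{user}, "\n";     # prints "alice"
```

The wrapper contains no logic of its own, so anyone who outgrows the defaults can construct My::Session directly with their own state and store objects.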

This is the role of a sugar layer. I've previously stated the advantage of Moose lies in its underlying componentization. However, Moose's success lies in its appearance of a monolithic system; the sugar is concise, friendly and works out of the box. When you look closer the real extensibility becomes apparent.

In this way Moose is also very future proof. We know that the sugar layer exported by Moose.pm is flawed, we've learned many new things since it was written, and we can provide a cleaner, friendlier and more powerful syntax in the future.

Reducing burdens

Some time ago I commented on a blog post asking how some of the more prolific CPAN authors manage to keep up. Scope reduction is the most fundamental part of doing that. The key is that I don't actually maintain a lot of that code: there is nothing left to do except fix minor bugs. Most of my modules will never get a new feature. Instead, that new feature would be written as a module that depends on the existing one.

It's a little more work up front. Hard coding seems easy when you're doing it, but it's a very near sighted optimization. I suspect this is what makes the quality of code in Haskell's Hackage generally higher than in other languages: the type system makes it hard to be lazy that way on the small scale, serving as a constant reminder to keep things cleanly separated.

Thanks

Catalyst::Plugin::Session makes me cringe. I am very grateful to t0m for maintaining that piece of shit. Thankfully at least it taught me a valuable lesson in software design.

5 comments:

gaal said...

Dependency injection / inversion of control is nice, certainly, but I'm not sure it really depends on OO design.

In a functional language interface, you would probably be allowed to construct your own password generator by passing the constructor a PRNG function and an encoder*. Currying makes for convenience -- and possibly, depending on your style preferences, cleaner than subclassing.


PS: One reason your modules get criticized is because your docs suck. Even if your interface is generic, you want to give a specific, concrete example near or in the synopsis so that (T_ahh - T_wtf) is minimized.


* So the generator might be implemented as, simply, (.).
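
For comparison, a minimal sketch of the same idea in Perl, where compose plays the role of (.). The $prng and $encoder closures are made-up stand-ins for illustration, and the "PRNG" here is a fixed pattern, not a real random source.

```perl
use strict;
use warnings;

# compose($f, $g) returns a function computing $f->( $g->(@args) ),
# the Perl analogue of Haskell's (.) operator.
sub compose {
    my ( $f, $g ) = @_;
    return sub { $f->( $g->(@_) ) };
}

# Hypothetical PRNG function: returns $n bytes. It is a fixed pattern so the
# example is reproducible; a real one would pull from a proper source.
my $prng = sub {
    my ($n) = @_;
    return join '', map { chr( ( $_ * 7 ) % 256 ) } 1 .. $n;
};

# Hypothetical encoder: maps each byte onto a lowercase letter.
my $encoder = sub {
    my ($bytes) = @_;
    return join '', map { chr( ord('a') + ord($_) % 26 ) } split //, $bytes;
};

# The password generator is just the composition of the two.
my $generate = compose( $encoder, $prng );
print $generate->(8), "\n";    # prints "hovcjqxe"
```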

nothingmuch said...

Yes, functional programming definitely fits into this just as well (if not better), but we don't really use it in Perl.

Nathan Kurz said...

Why do you say that the approach in Crypt::RandPasswd is 'hardly a solution'? I don't necessarily disagree, but I wonder which parts bother you. Is it the ugliness of the syntax, or that the override can't really be redistributed?

One could probably 'improve' the syntax in a general fashion by writing a module 'Override' such that 'use Override "Crypt::RandPasswd" => {rng => \&my_rng}' works. But would this be an improvement or even worse of an abomination?

For that matter, what would your ideal interface look like? Is 'use Crypt::RandPasswd rng => \&my_rng' what you are looking for? I ask because I'm running up against similar issues, and haven't yet come up with a solution I really like.

nothingmuch said...

C::RP is giving the user something sort of like a prototype object with one overridable method, 'rng'.

If instead it just provided a fine grained OO API that could easily be subclassed, the different tasks, such as generating random data, generating an alphabet, picking sequences from it, etc., could each be overridden separately.

Now, thinking about this, there is no reason whatsoever for the public API of the random data generation method to return an arbitrary floating point number with an arbitrary number of bits of precision. The data type doesn't fit, the interface becomes clumsy, etc.

If instead the method generated N bits and returned them as a packed string, or generated an integer between 0 and $n or something like that, it would have been much easier to provide an alternate random number generator. Sticking with the rand() builtin's API is just the wrong level of abstraction here.

And no, use Crypt::RandPasswd ( rng => ... ) changes nothing in the ill fitting semantics of the 'rng' callback. The only difference is that it implies the overriding is global, which is just evil (it may or may not be true, the import method could curry, instead).

It's not a question of the overriding syntax, it's a question of thinking about what it is that the random data generation api should look like when you're generating passwords.

A sensible answer is 'a sequence of randomly chosen characters', and a silly one is 'a floating point number between 0 and 1'
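
A minimal sketch of that kind of interface, assuming a byte source that hands back N bytes as a packed string. The function names are hypothetical, and the SHA-256 based source is only there to make the example reproducible.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256);

# Hypothetical byte source: a closure returning $n bytes as a packed string.
# The bytes are a deterministic SHA-256 stream over a seed, for the demo only.
sub make_byte_source {
    my ($seed) = @_;
    my ( $counter, $buffer ) = ( 0, '' );
    return sub {
        my ($n) = @_;
        $buffer .= sha256( $seed . pack( 'N', $counter++ ) )
            while length($buffer) < $n;
        return substr( $buffer, 0, $n, '' );
    };
}

# Integer in [0, $n) for 1 <= $n <= 256. Rejection sampling keeps every value
# equally likely; a plain modulo would favour the low end of the range.
sub random_index {
    my ( $bytes, $n ) = @_;
    my $limit = 256 - ( 256 % $n );    # largest multiple of $n that is <= 256
    while (1) {
        my $byte = ord $bytes->(1);
        return $byte % $n if $byte < $limit;
    }
}

# "A sequence of randomly chosen characters."
sub random_chars {
    my ( $bytes, $alphabet, $length ) = @_;
    return join '',
        map { $alphabet->[ random_index( $bytes, scalar @$alphabet ) ] } 1 .. $length;
}

my @alphabet = ( 'a' .. 'z', 'A' .. 'Z', 0 .. 9, qw(! @ $ % ^ & *) );
print random_chars( make_byte_source('demo seed'), \@alphabet, 12 ), "\n";
```

No floating point numbers are involved, and replacing the random source means replacing one closure.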

nothingmuch said...

A singleton prototype object, I should say.