Friday, June 18, 2010

I hate software

A long standing bug in Directory::Transactional has finally been fixed.

Evidently, universally unique identifiers are only unique as long as the entire universe is contained within a single UNIX process, at least as far as e2fsprogs' libuuid is concerned.

These "unique" strings were used to create names for transaction work directories, so when they in fact turned out to be the same fucking strings across forks, the two processes would overwrite each others' private data.

uuid(3) doesn't even contain any information on how to reseed it even if I would bother checking for that myself.

I simply cannot fathom how a pseudorandom number generator is being used for such a library without taking forking into account. Isn't this stuff supposed to be reliable?

6 comments:

Dotan Dimet said...

Clueless question: If you're using them to create working directories, why not use File::Temp?

Anonymous said...

Wow do you guys even know how Fork works? The issue here is that you don't understand fork and you made assumptions. Even worse you ignored the Forking warning in your beloved library. Go read it! It says don't fork (well you could do the UNIX thing and just exec again).

Do you know the cost of detecting forking? You have to compare pids, you have to do a syscall to get it. You might rely on one of the time based variables too. In many cases this is too much work. As far as I looked (not far) there is no AT_FORK event so you can't hook into it.

The simple explanation is that it is too much work to detect a fork and they already told you not to and have been since 0.04 of the lib.

nothingmuch said...

@dotan:

because the work area is not public, File::Temp would overkill in theory, and also a pain to use (configuring it to use my directory is pretty finicky). If nobody but my code writes there, UUIDs should be enough, assuming they're actually unique.


@troll:

I think I understand fork quite well, thank you very much. Suggesting that I reexec instead actually makes me think that maybe you don't understand why I chose fork in the first place =P

Since it's already getting CPU clock ticks as well as the real time clock, getpid is not really that much overhead.

My el cheapo laptop makes 200,000,000 calls to getpid per second, vs. 5,500,000 uuid_generate() calls. The overhead of getpid inside uuid_generate() is negligiable.

Anyway, I'm pretty sure getpid, getuid (which it calls anyway), etc are not even system calls, just library calls, because there's no reason they would need to make a context switch to kernel mode. It is for these purposes that computers have MMUs.

Regardless, my copy of the manpage says nothing of the sort about fork being broken. I'd be happy if you linked me to the relevant manpage.

Lastly, when I fork I could manually reseed it, but there is no way to do that in the API. That's the real complaint here. "Though shalt not fork" is just an unreasonable assumption to make for a general purpose library.

nothingmuch said...

@dotan I also just remembered, tempdir( CLEANUP => 1 ) cleans up only at END { } time, not using some sort of RAII surrogate to clean it up in a more timely fashion, so I would have to remove it manually anyway.

nothingmuch said...

@anon also, I have no idea what e2fsprogs 0.04 is, so I'll assume you meant *my* module Data::UUID::LibUUID, which I certainly did not document as such.

fREW said...

The best part is that it's software turtles most of the way down: It's a universe of fail!