<h1>Moosing to Norway</h1>
<p><em>2012-07-11</em></p>
<p>So it seems that the <a href="http://oslo.pm/">powers that be</a> have invited me to attend <a href="http://act.yapc.eu/mtmh2012/">the Moving to Moose hackathon</a> this August in Norway!</p><p>Though I haven't really been coding much over the last two years in which I've been a student, I've had a lot of time to think about the things that took up my time before my "early retirement". I think that if anything, my newfound laziness and detachment have made it a bit easier to be more objectively critical of some of the ideas I had back in the day ;-)</p><p>Fortunately, both of the hackathon themes are exactly what I still find myself thinking about even when not writing code daily.</p><p>After this round of final exams is over I hope to catch up with all I have missed thus far, so that I can make the most of the hackathon, and hopefully give back to the community that is so generously keeping me a part of it.</p><p>See you there!</p>
<h1>Ritalin for your $PS1</h1>
<p><em>2010-10-13</em></p>
<p>In <a href="http://blog.woobling.org/2010/10/headless-virtualbox.html">my last post</a> I shared my <a href="http://gist.github.com/621452">colorful but otherwise inert bash prompt</a>.
<a href="http://www.simplicidade.org/" rel="friend">Pedro Melo</a> <a href="http://bit.ly/d2bj0i">extended</a> it to integrate <a href="http://git.kernel.org/?p=git/git.git;a=blob;f=contrib/completion/git-completion.bash;hb=HEAD"><tt>__git_ps1</tt></a> with extra coloring using <tt>git status</tt>.</p>
<p>Unfortunately this can take a long while on large source trees (<tt>git status</tt> needs to scan the directory structure), and while smart bash prompts are handy, it can be frustrating if every prompt incurs a delay.</p>
<p>Of course this is annoying for any overzealous <tt>$PROMPT_COMMAND</tt>, not just <tt>git status</tt> based ones.</p>
<p><a href="https://gist.github.com/623371">My version</a> of his version has a small trick to add simple but effective throttling:</p>
<pre id="fake-gist-623449" class="fake-gist">_update_prompt () {
if [ -z "$_dumb_prompt" ]; then
# if $_dumb_prompt isn't set, do something potentially expensive, e.g.:
git status --porcelain | perl -ne 'exit(1) if /^ /; exit(2) if /^[?]/'
case "$?" in
# handle all the normal cases
...
# but also add a case for exit due to SIGINT
"130" ) _dumb_prompt=1 ;;
esac
else
# in this case the user asked the prompt to be dumbed down
...
fi
}
# helper commands to explicitly change the setting:
dumb_prompt () {
_dumb_prompt=1
}
smart_prompt () {
unset _dumb_prompt
}</pre>
<p>If the prompt is taking too long to show up I simply hit <tt>^C</tt> and my <tt>$PROMPT_COMMAND</tt> becomes a quicker, dumbed down version for the current session.</p>
<h1>Headless VirtualBox</h1>
<p><em>2010-10-12</em></p>
<p>This being the second time I've set this stuff up, I thought it was worth documenting my <a href="http://www.virtualbox.org/">VirtualBox</a> development workflow.</p>
<h2>A Decent Hacking Environment</h2>
<p>The OSX side of my laptop is working pretty smoothly. I've got my stack of tools configured to my liking, from my <a href="http://bash.org/?top">shell</a>, to my <a href="http://code.google.com/p/macvim/">editor</a>, to my <a href="http://www.google.com/support/chrome/bin/answer.py?hl=en&answer=95655">documentation browser</a>. I've spent years cargo culting all my dotfiles.</p>
<p>But pick my brain <em>any</em> day and I'll give you a mouthful about Apple and OSX. I also know that there are superior alternatives to most of my software stack.</p>
<p>That said, even if I'm not entirely happy with the setup, I'm definitely <em>content</em> with it, and I have no plans on learning anything new to gain a 3% efficiency in the way I type in text or customize the way I waste time online.</p>
<p>Part of the reason I use OSX is that there is no hope (and therefore no temptation) in trying to fix little annoyances, something that led me to sacrifice countless hours during the brief period of time when I had a fully open source desktop environment.</p>
<p>However, when it comes to installing and configuring various project dependencies (daemons, libraries, etc), OSX can be a real pain compared to a decent Linux distribution.</p>
<h2>A Decent Runtime Environment</h2>
<p>Disk space is cheap, and virtualization has come a long way in recent years, so it really makes a lot more sense to <em>run</em> my code on a <a href="http://debian.org/">superior platform</a>. One image per project also gives me brainless sandboxing, and snapshots mean I can quickly start over when I break everything.</p>
<p>Sweetening the deal even more, I always seem to be surrounded by people who know how to properly maintain a Debian environment much better than I could ever hope to, so I don't even have to think about how to get things right.</p>
<h2>Bridging the Gap</h2>
<p>In order to make it easy to use both platforms simultaneously, with cheap context switching (on my wetware, that is), I've written a script that acts as my sole entry point to the entire setup.</p>
<p>I don't use the VirtualBox management GUI, and I run the Linux environment completely headless (not just sans X11, without a virtual terminal either).</p>
<p>I hard link the following script in <tt>~/bin</tt>, once per VM. To get a
shell on a VM called <tt>blah</tt>, I just type <tt>blah</tt> into my shell
prompt and hit enter:</p>
<pre id="fake-gist-621409" class="fake-gist">#!/bin/bash
VM="$( basename "$0" )"
if [ -n "$1" ]; then
# explicit control of the VM, e.g. `blah stop`
# useful commands are 'pause', 'resume', 'stop', etc
case "$1" in
status) VBoxManage showvminfo "$VM" | grep -i state ;;
*) VBoxManage controlvm "$VM" ${1/stop/acpipowerbutton} ;; # much easier to type
esac
else
# otherwise just make sure it's up and provide a shell
# boot the virtual machine in headless mode unless it's already running
# note that there is a race condition if the machine is in the process of
# powering down
VBoxManage showvminfo --machinereadable "$VM" | grep -q 'VMState="running"' || \
VBoxManage startvm "$VM" -type vrdp;
# each VM has an SSH config like this:
# Host $VM
# Hostname localhost
# Port 2222 # VBoxManage modifyvm "$VM" --natpf1 ...
# changing ssh port forwarding doesn't require restarting the VM (whereas
# fiddling with VirtualBox port forwarding does). The following section
# should probably just be a per VM include, but for my needs it does the
# job as is.
# ControlMaster works nicely with a global 'ControlPath /tmp/%r@%h:%p' in
# my ~/.ssh/config this means the port forwarding stays up no matter how
# many shells I open and close (unlike ControlMaster auto in the config)
# this loop quietly waits till sshd is up
until nc -z localhost 3000 >/dev/null; do
echo -n "."
ssh -N -f -q \
-L 3000:localhost:3000 \
-o ConnectTimeout=1 \
-o ControlMaster=yes \
"$VM" && echo;
done
# finally, start a shell
exec ssh "$VM"
fi</pre>
<p>Once I'm in, I also have my code in a mount point under my home directory. I set up a <a href="http://www.virtualbox.org/manual/ch04.html#sharedfolders">shared folder</a> using VirtualBox's management GUI (installing the VirtualBox guest additions <a href="http://www.ithowto.ro/2009/03/howto-install-guest-additions-for-virtualbox-on-debian-lenny-50/">like this</a>). To mount it automatically I've got this in <tt>/etc/fstab</tt> on the guest OS:</p>
<pre id="fake-gist-621414" class="fake-gist"># <file system> <mount point> <type> <options> <dump> <pass>
some_dir /home/nothingmuch/some_dir vboxsf uid=nothingmuch,gid=nothingmuch 0 0
</pre>
<p>I use the same path on the OSX side and the Linux side to minimize confusion. I decided not to mount my entire home directory because I suspect most of my dotfiles aren't that portable, and I'm not really running anything but Perl and services on the debian side.</p>
<p>I use <a href="http://code.google.com/p/macvim/">all</a> <a href="http://git-scm.com">of</a> <a href="http://betterthangrep.com/">my</a> <a href="http://www.manpages.info/macosx/rm.1.html">familiar</a> <a href="http://oreilly.com/pub/h/73">tools</a> on OSX, and instantly run the code on Debian without needing to synchronize anything.</p>
<p>When I'm done, <tt>blah stop</tt> will shut down the VM cleanly.</p>
<p>Finally, as a bonus, my <a href="http://gist.github.com/621452">bash prompt</a> helps keep my confusion to a minimum when <tt>ssh</tt>ing all over the place.</p>
<h1>Hire Me</h1>
<p><em>2010-10-06</em></p>
<p>I'm starting my B.A. at Ben Gurion University (Linguistics &amp; Philosophy), and I'm looking for part time work (1-2 days a week), either telecommuting or in or around Beer Sheva or Tel Aviv.</p>
<p>If you're looking for a developer with strong and diverse technical skills, who is able to work either independently or in a team, feel free to contact me.</p>
<p>My <a href="http://nothingmuch.woobling.org/cv.pdf">CV</a> is available on my website.</p>
<p>Note that I'm not really looking for contract work unless it may lead to part time employment as a salaried employee, as the overheads of being a freelancer are quite high (both financially and temporally).</p>
<h1>Moose has won</h1>
<p><em>2010-09-19</em></p>
<p><a href="http://stevan-little.blogspot.com/">Stevan</a> has always characterized <a href="http://moose.perl.org/">Moose</a> as a <a href="http://en.wikipedia.org/wiki/Disruptive_technology">disruptive technology</a>.</p>
<p>Pre-Moose metaprogramming has a <a href="http://search.cpan.org/dist/Class-Eroot/">long history</a>, but you were pretty much stuck rolling your own metamodel back then.</p>
<p>Moose changed this by providing <em>extensible</em> class generation. It tries to create a metamodel in which several specialized metamodels can coexist and work together, even on the same class.</p>
<p>Case in point, a little over a week ago <a href="http://lumberjaph.net">Franck Cuny</a> announced his new <a href="http://lumberjaph.net/misc/2010/09/17/spore.html">SPORE</a> project.</p>
<p>SPORE aims to make using REST services much easier, by generating a lot of code to deal with the transport layer, presenting the data from the REST service using simple OO methods.</p>
<p>In context, what's interesting about SPORE is the way that it leverages Moose to do that.</p>
<p>SPORE extends Moose's metamodel objects, specifically the <a href="http://search.cpan.org/perldoc?Class::MOP::Method">object that represents methods in a class</a>, to create the bridge between the appealing sugar layer (a simple 1:1 mapping between HTTP requests and method calls) and the underlying HTTP client.</p>
<p>Take a look at the <a href="http://github.com/franckcuny/net-http-spore/blob/master/lib/Net/HTTP/Spore/Meta/Method.pm"><tt>Net::HTTP::Spore::Meta::Method</tt></a> class. This is the essence of the sugar layer, bridging the REST client with the sleek OO interface.</p>
<p>Compared with <a href="http://search.cpan.org/perldoc?SOAP::Lite"><tt>SOAP::Lite</tt></a> (not that the comparison is very fair), SPORE is a far simpler implementation that offers more (e.g. middlewares), even if you ignore the parts of <tt>SOAP::Lite</tt> that don't apply to SPORE.</p>
<p>Moose made it viable to design such projects <a href="http://en.wikipedia.org/wiki/Worse_is_better">"properly"</a>, without inflating the scope of the project. In fact, using Moose like this usually <em>reduces</em> the amount of code dramatically.</p>
<p>Before Moose, writing a REST toolkit with a similar metaclass based design would have been overengineering a simple idea to death. The project would probably never be truly finished due to the competing areas of focus (the metamodel vs. the HTTP client vs. high level REST features).</p>
<p>The alternative design approach is a hand rolled stack that does the bare minimum required for each step. This might do the job, and it probably gets finished on time, but the code is inherently brittle. It's hard to reuse the different parts because they don't stand alone. Most pre-Moose metaprogramming on the CPAN falls into this category.</p>
<p>KiokuDB is another example. Without Moose it's actually quite useless: it can't deal with more than a <a href="http://search.cpan.org/perldoc?KiokuDB::TypeMap::Default">handful of classes</a> out of the box. Sure, you could specify the appropriate serialization for every class you want to store, but at that point the design just doesn't make sense anymore; the limitations would make it unusable in practice.</p>
<p>Being able to assume that Moose introspection would be available for <em>most</em> objects stored in the database allowed me to remove <em>all</em> guesswork from the serialization, while still providing an acceptable user experience (it's very rare to need a custom typemap entry in practice).</p>
<p>This shortcut automatically reduced the scope of the project immensely, and allowed me to focus on the internals. The only thing that really separates KiokuDB from <a href="http://search.cpan.org/~flora/KiokuDB-0.49/lib/KiokuDB.pm#Prior_Art_on_the_CPAN">its predecessors</a> is that I could build on Moose.</p>
<p>I'm really glad to see how Moose has literally changed the way we approach this set of problems. The <a href="http://en.wikipedia.org/wiki/Worse_is_better">MIT approach</a> is now a sensible and pragmatic choice more often than before; or, in other words, we get a cleaner and more reusable CPAN for the same amount of effort.</p>
<h1>Are we ready to ditch string errors?</h1>
<p><em>2010-07-07</em></p>
<p>I can't really figure out why I'm not in the habit of using exception objects. I seem to only reach for them when things are getting very complicated, instead of by default.</p>
<p>I can rationalize that they are better, but it just doesn't feel right to do this all the time.</p>
<p>I've been thinking about what possible reasons (perhaps based on misconceptions) are preventing me from using them more, but I'm also curious about others' opinions.</p>
<p>These are the trouble areas I've managed to think of:</p>
<ul>
<li>Perl's built in exceptions are strings, and everybody is already used to them. <sup><a href="#28EA80AB-646C-44BA-B89A-A31A5AE88D55" name="6D6C65BA-AB4C-4951-9A53-91020CA0B4F2">[1]</a></sup></li>
<li>There is no convention for inspecting error objects. Even <tt>->isa()</tt> is messy when the error could be a string or an object.<sup><a href="#06CCB1ED-4783-4AF9-BC64-65F8CFAA0C90" name="926CE3D5-27E8-4383-93FF-B748FF2BE0E4">[2]</a></sup></li>
<li>Defining error classes is a significant barrier, you need to stop, create a new file, etc. Conversely, universal error objects don't provide significant advantages over strings because they can't easily capture additional data apart from the message.<sup><a href="#7A9EED6C-4EF1-4948-A2D8-4C20FEAA5099" name="10766D4F-51A1-4A0F-84CE-01F511203A1E">[3]</a></sup></li>
<li>Context capture/reporting is finicky
<ul>
<li>There's no convention like <tt>croak</tt> for exception objects.</li>
<li>Where exception objects become useful (for discriminating between different errors), there are usually multiple contexts involved: the error construction, the initial <tt>die</tt>, and every time the error is rethrown is potentially relevant. Perl's builtin mechanism for string mangling is shitty, but at least it's well understood.</li>
<li>Exception objects sort of imply the formatting is partly the responsibility of the error catching code (i.e. full stack or not), whereas <tt>Carp</tt> and <tt>die $str</tt> leave it to the thrower to decide.</li>
<li>Using <tt>Carp::shortmess()</tt>, <tt>Devel::StackTrace->new</tt> and other <tt>caller</tt> futzery to capture full context information is perceived as slow.<sup><a href="#26DAD4AC-7E14-45E5-B4F8-8A297FB646F4" name="AE3D33E3-2DA7-4FCE-8E43-EEBE069A55EA">[4]</a></sup></li>
</ul></li>
<li>Error instantiation is slower than string concatenation, especially if a string has to be concatenated for reporting anyway.<sup><a href="#3A24FB35-BD79-4B99-B5C2-8D4F93030199" name="429A68A7-6169-4057-BF9C-17FC6504D22C">[5]</a></sup></li>
</ul>
<small>
<p><sup><a name="28EA80AB-646C-44BA-B89A-A31A5AE88D55" href="#6D6C65BA-AB4C-4951-9A53-91020CA0B4F2">[1]</a></sup> I think the real problem is that most core errors worth discriminating are usually not thrown at all, but actually written to <tt>$!</tt> which can be compared as an error code (see also <tt>%!</tt> which makes this even easier, and <tt>autodie</tt> which adds an error hierarchy).</p>
<p>The errors that Perl itself <em>throws</em>, on the other hand, are usually not worth catching (typically they are programmer errors, except for a few well known ones like <tt>Can't locate Foo.pm in @INC</tt>).</p>
<p>Application level errors are a whole different matter though, they might be recoverable, some might need to be silenced while others pass through, etc.</p>
<p><sup><a name="06CCB1ED-4783-4AF9-BC64-65F8CFAA0C90" href="#926CE3D5-27E8-4383-93FF-B748FF2BE0E4">[2]</a></sup> <a href="http://search.cpan.org/perldoc?Exception::Class"><tt>Exception::Class</tt></a> has some precedent here, its <tt>caught</tt> method is designed to deal with unknown error values gracefully.</p>
<p><sup><a name="7A9EED6C-4EF1-4948-A2D8-4C20FEAA5099" href="#10766D4F-51A1-4A0F-84CE-01F511203A1E">[3]</a></sup> Again, <tt>Exception::Class</tt> has an elegant solution, adhoc class declarations in the <tt>use</tt> statement go a long way.</p>
<p><sup><a name="26DAD4AC-7E14-45E5-B4F8-8A297FB646F4" href="#AE3D33E3-2DA7-4FCE-8E43-EEBE069A55EA">[4]</a></sup> XS based stack capture could easily make this a non issue (just walk the <tt>cxstack</tt> and save pointers to the <tt>COP</tt>s of appropriate frames). Trace formatting is another matter.</p>
<p><sup><a name="3A24FB35-BD79-4B99-B5C2-8D4F93030199" href="#429A68A7-6169-4057-BF9C-17FC6504D22C">[5]</a></sup> I wrote a <a href="http://gist.github.com/465959">small benchmark</a> to try and put the various runtime costs in perspective.</p>
</small>
<h2>Solutions</h2>
<p>Here are a few ideas to address my concerns.</p>
<h2>A <tt>die</tt> replacement</h2>
<p>First, I see merit for an XS based error throwing module that captures a stack trace and the value of <tt>$@</tt> using a <tt>die</tt> replacement. The error info would be recorded in SV magic and would be available via an API.</p>
<p>This could easily be used on any exception object (but not strings, since SV magic is not transitive), without weird globals or something like that.</p>
<p>It could be mixed into any exception system by exporting <tt>die</tt>, overriding a <tt>throw</tt> method or even by setting <tt>CORE::GLOBAL::die</tt>.</p>
<p>A simple API to get caller information from the captured <tt>COP</tt> could provide all the important information that <tt>caller</tt> would, allowing existing error formatters to be reused easily.</p>
<p>This would solve any performance concerns by decoupling stack trace capturing from trace formatting, which is much more complicated.</p>
<p>The idea is that <tt>die</tt> would not merely throw the error, but also tag it with context info, that you could then extract.</p>
<p>Here's a bare bones example of how this might look:</p>
<pre class="fake-gist" id="fake-gist-466085">use MyAwesomeDie qw(die last_trace all_traces previous_error); # tentative
use Try::Tiny;
try {
die [ @some_values ]; # this is not CORE::die
} catch {
# gets data out of SV magic in $_
my $trace = last_trace($_);
# value of $@ just before dying
my $prev_error = previous_error($_);
# prints line 5 not line 15
# $trace probably quacks like Devel::StackTrace
die "Offending values: @$_" . $trace->as_string;
};</pre>
<p>And of course error classes could use it on <tt>$self</tt> inside higher level methods.</p>
<h2><a href="http://search.cpan.org/perldoc?Throwable::Error"><tt>Throwable::Error</tt></a> sugar</h2>
<p><tt>Exception::Class</tt> got many things right but a Moose based solution is just much more appropriate for this, since roles are very helpful for creating error taxonomies.</p>
<p>The only significant addition I would make is some sort of sugar layer to lazily build a <tt>message</tt> attribute using a simple string formatting DSL.</p>
<p>I previously thought <a href="http://search.cpan.org/perldoc?MooseX::Declare"><tt>MooseX::Declare</tt></a> would be necessary for something truly powerful, but I think that can be put on hold for a version 2.0.</p>
<h2>A library for exception formatting</h2>
<p>This hasn't got anything to do with the error <em>message</em>, that's the responsibility of each error class.</p>
<p>This would have to support all of the different styles of error printing we can have with error strings (i.e. <tt>die</tt>, <tt>croak</tt> with and without <tt>$Carp::Level</tt> futzing, <tt>confess</tt>...), but also allow recursively doing this for the whole error stack (previous values of <tt>$@</tt>).</p>
<p>Exposed as a role, the base API should complement <tt>Throwable::Error</tt> quite well.</p>
<p>Obviously the usefulness should extend beyond plain text, because dealing with all that data is a task better suited for an IDE or a web app debug screen.</p>
<p>Therefore, things like code snippet extraction or other goodness might be nice to have in a plugin layer of some sort, but it should be easy to do this for errors of any kind, including strings (which means parsing as much info from <tt>Carp</tt> traces as possible).</p>
<h2>Better facilities for inspecting objects</h2>
<p><a href="http://search.cpan.org/perldoc?Check::ISA"><tt>Check::ISA</tt></a> tried to make it easy to figure out what object you are dealing with.</p>
<p>The problem is that it's ugly: it exports an <tt>inv</tt> routine instead of a more intuitive <tt>isa</tt>. It's now possible to go with <tt>isa</tt>, as long as <a href="http://search.cpan.org/perldoc?namespace::clean"><tt>namespace::clean</tt></a> is used to remove it so it's not accidentally called as a method.</p>
<p>Its second problem is that it's slow, but it would be easy to make its performance comparable to the totally wrong <tt>UNIVERSAL::isa($obj, "foo")</tt> by implementing XS acceleration.</p>
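<p>In the meantime, a common workaround that's safe for both string and object errors is the core <tt>Scalar::Util::blessed</tt> idiom. A minimal sketch (the error class name here is made up for illustration):</p>

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Only call ->isa as a method if $err is actually a blessed object;
# plain string errors simply fail the blessed() check instead of
# accidentally being treated as class names.
sub is_my_error {
    my ($err) = @_;
    return blessed($err) && $err->isa("MyApp::Error");
}
```

Unlike calling <tt>-&gt;isa</tt> directly on <tt>$@</tt>, this never misfires when the error happens to be a string.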
<h2>Conclusion</h2>
<p>It seems to me if I had those things I would have no more excuses for not using exception objects by default.</p>
<p>Did I miss anything?</p>
<h1>KiokuDB's Leak Tracking</h1>
<p><em>2010-07-06</em></p>
<p>Perl uses <a href="http://en.wikipedia.org/wiki/Reference_counting">reference counting</a> to manage memory. This means that creating circular structures causes leaks.</p>
<p>Cycles are often avoidable in practice, but backreferences can be a huge simplification when modeling relationships between objects.</p>
<p>For this reason <a href="http://search.cpan.org/perldoc?Scalar::Util"><tt>Scalar::Util</tt></a> exports the <tt>weaken</tt> function, which can demote a reference so that it doesn't add to the reference count of the referent.</p>
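<p>A minimal illustration of the kind of cycle in question (the data and names are mine, just for demonstration):</p>

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $parent = { name => "parent" };
my $child  = { name => "child" };
$parent->{child}  = $child;
$child->{parent}  = $parent;    # cycle: parent -> child -> parent

# Demote the backreference: it no longer contributes to $parent's
# reference count, so the pair can be freed normally. Until $parent
# itself goes away, $child->{parent} still works as usual.
weaken( $child->{parent} );
```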
<p>Since cycles are very common in persisted data (because there are many potential entry points in the data), KiokuDB works hard to support them, but it can't weaken cycles for you and prevent them from leaking.</p>
<p>Apart from the waste of memory, there is another major problem.</p>
<p>When objects are leaked, they remain tracked by KiokuDB so you might see stale data in a multi worker style environment (i.e. preforked web servers).</p>
<p>The new <tt>leak_tracker</tt> attribute takes a code reference which is invoked with the list of leaked objects when the last live object scope dies.</p>
<p>This can be used to report leaks, to break cycles, or whatever.</p>
<p>The other addition, the <tt>clear_leaks</tt> attribute, allows you to work around the second problem by forcibly unregistering leaked objects.</p>
<p>This completely negates the effect of live object caching and doesn't solve the memory leak, but guarantees you'll see fresh data (without needing to call <tt>refresh</tt>).</p>
<pre class="fake-gist" id="fake-gist-465439">my $dir = KiokuDB->connect(
$dsn,
# this coerces into a new object
live_objects => {
clear_leaks => 1,
leak_tracker => sub {
my @leaked = @_;
warn "leaked " . scalar(@leaked) . " objects";
# try to mop up.
use Data::Structure::Util qw(circular_off);
circular_off($_) for @leaked;
}
}
);</pre>
<p>These options were both refactored out of <a href="http://search.cpan.org/perldoc?Catalyst::Model::KiokuDB"><tt>Catalyst::Model::KiokuDB</tt></a>.</p>
<h1>Why another caching module?</h1>
<p><em>2010-07-02</em></p>
<p>In the last post I namedropped <a href="http://search.cpan.org/perldoc?Cache::Ref"><tt>Cache::Ref</tt></a>. I should explain why I wrote yet another <tt>Cache::</tt> module.</p>
<p>On the CPAN most <a href="http://search.cpan.org/search?m=all&q=cache">caching modules</a> are concerned with caching data in a way that can be used across process boundaries (for example on subsequent invocations of the same program, or to share data between workers).</p>
<p>Persistent caching behaves more like an on disk database (a DBM, or a directory of files), whereas <tt>Cache::Ref</tt> is like an in memory hash with size limiting:</p>
<pre class="fake-gist" id="fake-gist-462562">my %cache;
sub get { $cache{$_[0]} }
sub set {
my ( $key, $value ) = @_;
if ( keys %cache > $some_limit ) {
... # delete a key from %cache
}
$cache{$key} = $value; # not a copy, just a shared reference
}</pre>
<p>The different submodules in <tt>Cache::Ref</tt> are pretty faithful implementations of algorithms originally intended for <a href="http://en.wikipedia.org/wiki/Page_replacement_algorithm">virtual memory applications</a>, and are therefore appropriate when the cache is memory resident.</p>
<p>The goal of these algorithms is to try and choose the most appropriate key to delete quickly and without storing too much information about the key, or requiring costly updates on metadata during a cache hit.</p>
<p>This also means less control, for example there is no temporal expiry (i.e. cache something for <tt>$x</tt> seconds).</p>
<p>If most of CPAN is concerned with <a href="http://en.wikipedia.org/wiki/Memory_hierarchy">L5 caching</a>, then <tt>Cache::Ref</tt> tries to address L4.</p>
<p>High level interfaces like <a href="http://search.cpan.org/perldoc?CHI"><tt>CHI</tt></a> make persistent caching easy and consistent, but seem to add memory only caching as a sort of an afterthought, with most of the abstractions being appropriate for long term, large scale storage.</p>
<p>Lastly, you can use <a href="http://search.cpan.org/perldoc?Cache::Cascade"><tt>Cache::Cascade</tt></a> to create a multi level cache hierarchy. This is similar to CHI's <tt>l1_cache</tt> attribute, but you can have multiple levels, and you can mix and match any cache implementations that share the same basic API.</p>
<h1>KiokuDB's Immutable Object Cache</h1>
<p><em>2010-07-01</em></p>
<p><a href="http://www.iinteractive.com/kiokudb">KiokuDB</a> 0.46 added integration with <a href="http://search.cpan.org/perldoc?Cache::Ref"><tt>Cache::Ref</tt></a>.</p>
<p>To enable it just cargo cult this little snippet:</p>
<pre class="fake-gist" id="fake-gist-460379">my $dir = KiokuDB->connect(
$dsn,
live_objects => {
cache => Cache::Ref::CART->new( size => 1024 ),
},
);</pre>
<p>To mark a Moose based object as cacheable, include the <a href="http://search.cpan.org/perldoc?KiokuDB::Role::Immutable::Transitive"><tt>KiokuDB::Role::Immutable::Transitive</tt></a>
role.</p>
<p>Depending on the cache's mood, some of those cacheable objects may
survive even after the last live object scope has been destroyed.</p>
<p>Immutable data has the benefit of being cacheable without needing to worry
about updates or stale data, so the data you get from <tt>lookup</tt> will
always be consistent; it just might come back faster in some cases.</p>
<p>Just make sure they don't point at any data that can't be cached (that's
treated as a leak), and you should notice significant performance improvements.</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-876358347971598886.post-31851248755705370462010-06-28T10:00:00.001+03:002010-06-28T17:29:10.886+03:00KiokuDB for DBIC Users<p>This is the top loaded tl;dr version of <a href="http://blog.woobling.org/2010/03/what-is-mixed-schema.html">the previous
post on KiokuDB+DBIC</a>, optimized for current <a href="http://search.cpan.org/perldoc?DBIx::Class"><tt>DBIx::Class</tt></a> users who are also KiokuDB non-believers ;-)</p>
<p>If you feel you know the answer to an <tt>&lt;h2&gt;</tt>, feel free to skip it.</p>
<h2>WTF KiokuDB?</h2>
<p><a href="http://www.iinteractive.com/kiokudb">KiokuDB</a> implements
persistent object graphs. It works at the same layer as an ORM in that it maps
between an in memory representation of objects and a persistent one.</p>
<p>Unlike an ORM, where the focus is to faithfully map between relational
schemas and an object oriented representation, KiokuDB's main priority is to allow you to
store objects freely with as few restrictions as possible.</p>
<p>KiokuDB provides a different trade-off than ORMs.</p>
<p>By compromising control over the precise storage
details you gain the ability to easily store almost any data structure you can
create in memory.<a href="#16B9B9BB-1561-4070-AA89-DBA2D7802426" name="6950B821-0E21-44D5-B623-263297DB89B2"><sup>[1]</sup></a>.</p>
<h2>Why should I care?</h2>
<p>Here's a concrete example.</p>
<p>Suppose you have a web application with several types of browsable model objects (e.g. pictures, user profiles, whatever), all of which
users can mark as favourites so they can quickly find them later.</p>
<p>In a relational schema you'd need to query a link table for each possible
type, and also take care of setting these up in the schema. When marking an
item as a favourite you'd need to check what type it is, and add it to the
correct relationship.</p>
<p>Every time you add a new item type you also need to edit the favourite
management code to support that new item.</p>
<p>On the other hand, a <a href="http://search.cpan.org/perldoc?KiokuDB::Set"><tt>KiokuDB::Set</tt></a>
of items can simply contain a mixed set of items of any type. There's no setup
or configuration, and you don't have to predeclare anything. This eliminates a
lot of boilerplate.</p>
<p>Simply add a <tt>favourite_items</tt> KiokuDB column to the user, which
contains that set, and use it like this:</p>
<pre class="fake-gist" id="fake-gist-455522"># mark an item as a favourite
# $object can be a DBIC row or a KiokuDB object
$user->favourite_items->insert($object);
$user->update;
# get the list of favourites:
my @favs = $user->favourite_items->members;
# check if an item is a favourite:
if ( $user->favourite_items->includes($object) ) {
...
}</pre>
<p>As a bonus, since there's less boilerplate this code can be more generic/reusable.</p>
<h2>How do I use it?</h2>
<p>First off, at least skim through <a href="http://search.cpan.org/perldoc?KiokuDB::Tutorial"><tt>KiokuDB::Tutorial</tt></a>
to familiarize yourself with the basic usage.</p>
<p>In the context of this article you can think of KiokuDB as a <a href="http://search.cpan.org/perldoc?DBIx::Class::Manual::Component">DBIC component</a>
that adds OODBMS features to your relational schema, as a sort of auxiliary data dumpster.</p>
<p>To start mixing KiokuDB objects into your DBIC schema, create a column that
can contain these objects using <a href="http://search.cpan.org/perldoc?DBIx::Class::Schema::KiokuDB"><tt>DBIx::Class::Schema::KiokuDB</tt></a>:
</p>
<pre class="fake-gist" id="fake-gist-455523">package MyApp::Schema::Result::Foo;
use base qw(DBIx::Class::Core);
__PACKAGE__->load_components(qw(KiokuDB));
__PACKAGE__->kiokudb_column('object');</pre>
<p>See the documentation for the rest of the boilerplate, including how to get
the <tt>$kiokudb</tt> handle used in the examples below.</p>
<p>In this column you can now store an object of any class. This is like a
delegation based approach to a problem typically solved using something like
<a href="http://search.cpan.org/perldoc?DBIx::Class::DynamicSubclass"><tt>DBIx::Class::DynamicSubclass</tt></a>.</p>
<pre class="fake-gist" id="fake-gist-455524">my $rs = $schema->resultset("Foo");
my $row = $rs->find($primary_key);
$row->object( SomeClass->new( ... ) );
# 'store' is a convenience method; it's like insert_or_update
# of $row->object in KiokuDB
$row->store;</pre>
<p>You can go the other way, too:</p>
<pre class="fake-gist" id="fake-gist-455525">my $obj = SomeClass->new(
some_delegate => $row,
);
my $id = $kiokudb->insert($obj);</pre>
<p>And it even works for storing result sets:</p>
<pre class="fake-gist" id="fake-gist-455526">use Foo;
my $rs = $schema->resultset("Foo")->search( ... );
my $obj = Foo->new(
some_resultset => $rs,
);
my $id = $kiokudb->insert($obj);</pre>
<p>So you can freely model ad-hoc relationships to your liking.</p>
<p>Mixing and matching KiokuDB and DBIC still lets you obsess over the storage
details like you're used to with DBIC.</p>
<p>However, the key idea here is that you don't need to do that all the time.</p>
<p>For example, you can rapidly prototype a schema change before writing the
full relational model for it in a final version.</p>
<p>Or maybe you need to preserve an intricate in-memory data structure (like
cycles, tied structures, or closures).</p>
<p>Or perhaps for some parts of the schema you simply don't need to
search/sort/aggregate. You will probably discover parts of your schema are
inherently a good fit for graph based storage.</p>
<p>KiokuDB complements DBIC well in all of those areas.</p>
<h2>How is KiokuDB different?</h2>
<p>There are two main things that traditional ORMs don't do easily, but that
KiokuDB does.</p>
<p>First, collections of objects in KiokuDB can be heterogeneous.</p>
<p>At the representation level the lowest common denominator for any two
arbitrary objects might be nothing at all. This makes it hard to store objects
of different types in the same relational table.</p>
<p>In object oriented design it's the <em>interface</em> that matters, not the
representation. Conversely, in a relational database <em>only</em> the
representation (the columns) matters; database rows have no interfaces.</p>
<p>Second, in a graph-based object database the key of an object in the database
should only be associated with a single object in memory, but in an ORM this
feature isn't necessarily desirable:</p>
<ul>
<li>It doesn't interact well with bulk fetches (for instance suppose a
<tt>SELECT</tt> query fetches a collection of objects, some of which are
already in memory. Should the fetched data be ignored? Should the primary
keys of the already live objects be filtered out of the query?)</li>
<li>It requires additional APIs to control this tracking behavior
(KiokuDB's <tt>new_scope</tt> stuff)</li>
</ul>
<p>In the interests of flexibility and simplicity, <tt>DBIx::Class</tt> simply
stays out of the way as far as managing inflated objects goes (with one
exception being prefetched and cached resultsets). Whenever a query is issued
you get fresh objects every time.</p>
<p>KiokuDB <em>does</em> track references and provides a stable mapping between
reference addresses and primary keys for the subset of objects that it
manages.</p>
<h2>What sucks about KiokuDB?</h2>
<p>It's harder to search, sort and aggregate KiokuDB objects. But you already
know a good ORM that can do those bits ;-)</p>
<p>By letting the storage layer in on your object representation you allow the
database to help you in ways that it can't if the data is opaque.</p>
<p>Of course, this is precisely where it makes sense to just create a
relational table, because <tt>DBIx::Class</tt> does those things very well.</p>
<h2>Why now?</h2>
<p>Previously you could use KiokuDB and <tt>DBIx::Class</tt> in the same
application, but the data was kept separate.</p>
<p>Starting with <a href="http://search.cpan.org/perldoc?KiokuDB::Backend::DBI"><tt>KiokuDB::Backend::DBI</tt></a>
version 1.11 you can store part of your model as relational data using
<tt>DBIx::Class</tt> and the rest in KiokuDB.</p>
<p><a name="16B9B9BB-1561-4070-AA89-DBA2D7802426" href="#6950B821-0E21-44D5-B623-263297DB89B2"><sup>[1]</sup></a> You still get
full control over serialization if you want, using <a href="http://search.cpan.org/perldoc?KiokuDB::TypeMap"><tt>KiokuDB::TypeMap</tt></a>,
but that is completely optional, and most of the time there's no point in doing
that anyway, since you already know how to do that with other tools.</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-876358347971598886.post-59552000027371729932010-06-27T19:24:00.001+03:002010-06-27T21:05:41.736+03:00KiokuDB 0.46<p><a href="http://perldition.org/">rafl</a> and I have just uploaded <a href="http://search.cpan.org/perldoc?KiokuDB::Backend::DBI"><tt>KiokuDB::Backend::DBI</tt></a> version <a href="http://search.cpan.org/dist/KiokuDB-Backend-DBI-1.11">1.11</a> and <a href="http://www.iinteractive.com/kiokudb/">KiokuDB</a> version <a href="http://search.cpan.org/dist/KiokuDB-0.46">0.46</a>.</p>
<p>These are major releases of both modules, and I will post at length on each of these new features in the coming days:</p>
<ul>
<li><a href="http://search.cpan.org/perldoc?Cache::Ref">Caching</a> live instances of <a href="http://search.cpan.org/perldoc?KiokuDB::Role::Immutable::Transitive">immutable objects</a>. For data models which favour immutability this should provide significant speedups with minimal code changes and no change in semantics.</li>
<li>Leak tracking is now in core. This was previously only available in <a href="http://search.cpan.org/perldoc?Catalyst::Model::KiokuDB"><tt>Catalyst::Model::KiokuDB</tt></a>.</li>
<li><tt>KiokuDB::Entry</tt> objects can be discarded after use to save memory (until now they were always kept around for as long as the object was still live).</li>
<li>Integration between KiokuDB managed objects and <tt>DBIx::Class</tt> managed rows, allowing for mixed relational/graph schemas as in <a href="http://cpansearch.perl.org/src/FLORA/KiokuDB-Backend-DBI-1.11/examples/job_queue.pl">this job queue example</a>.</li>
</ul>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-876358347971598886.post-48195371667690865082010-06-18T04:08:00.005+03:002010-06-18T04:35:18.615+03:00I hate software<p>A <a href="http://matrix.cpantesters.org/?dist=Directory-Transactional+0.08">long standing bug</a> in <a href="http://search.cpan.org/perldoc?Directory::Transactional"><tt>Directory::Transactional</tt></a> has finally been fixed.</p>
<p>Evidently, universally unique identifiers are only unique as long as the entire universe is contained within a single UNIX process, at least as far as <tt>e2fsprogs</tt>' <tt>libuuid</tt> is concerned.</p>
<p>These "unique" strings were used to create names for transaction work directories, so when they in fact turned out to be <strong>the same fucking strings</strong> across forks, the two processes would overwrite each other's private data.</p>
<p><a href="http://www.google.com/search?hl=en&q=man+uuid(3)&btnI=I'm+Feeling+Lucky"><tt>uuid(3)</tt></a> doesn't even document how to reseed the generator, even if I were inclined to handle that myself.</p>
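<p>The underlying failure mode isn't specific to libuuid: any PRNG state seeded before <tt>fork()</tt> is copied into the child, so both processes emit the same "random" stream unless somebody reseeds. A minimal core-Perl sketch of the same effect (using <tt>rand</tt> rather than libuuid):</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Seed the PRNG *before* forking; the state is copied into the child.
srand(42);

pipe(my $reader, my $writer) or die "pipe: $!";

my $pid = fork() // die "fork: $!";
if ($pid == 0) {    # child: report its first post-fork "random" number
    close $reader;
    print {$writer} int rand 1_000_000;
    exit 0;
}

close $writer;
my $from_child = <$reader>;
waitpid $pid, 0;

my $from_parent = int rand 1_000_000;    # parent's first post-fork number

print "parent=$from_parent child=$from_child\n";
print "identical!\n" if $from_parent == $from_child;
```

<p>Both processes print the same number, which is exactly how two transaction work directories can end up with the same "unique" name.</p>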
<p>I simply cannot fathom how a pseudorandom number generator is being used for such a library without taking forking into account. Isn't this stuff supposed to be reliable?</p>Unknownnoreply@blogger.com6tag:blogger.com,1999:blog-876358347971598886.post-1954562210794336682010-05-19T22:53:00.004+03:002010-05-22T04:04:44.747+03:00VOTE TRANSPARENT<center>
<a href="http://shadowcat.co.uk/blog/matt-s-trout/iron-man-lost?colour=transparent">
<img src="http://img.skitch.com/20100519-jy3h8chcxmpcei36bnfkdcp6fm.jpg" width="500" />
</a>
</center>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-876358347971598886.post-46817570461118759732010-04-23T03:46:00.008+03:002010-04-23T05:27:52.528+03:00Where are the open edges?<p>Forgive my cynicism, but where are the edges in the <a href="http://opengraphprotocol.org/">Open Graph Protocol</a>?</p>
<p>As far as I can tell Facebook's graph has two vertex types, people, and things. The edges go <a href="http://developers.facebook.com/docs/reference/plugins/like">between people and things</a> and between people and other people (i.e. friends).</p>
<p>Facebook rightfully requires authorization to access the other parts of the graph through their API (the data is private, after all), but what bothers me is that there's no way to describe a graph of your own, or share it with anyone else.<sup><a href="#64296E78-054B-4898-A639-A6EB77971BD3" name="AAF2FC15-7A08-4EC6-AD7C-6360DE1E6F83">[1]</a></sup></p>
<p>In more practical terms, in this supposed graph specification there's no way to link to an <tt>og:url</tt> from my homepage saying that I like that thing (or maybe dislike, or have any other connection to it).</p>
<p>As a producer of "things", if I tell those things' <tt>og:type</tt> to the internet, my customers can "Like" my <tt>og:type</tt>. And then I can contact those customers (apparently for free), and presumably later pay Facebook to tell their friends about that thing more often. And there are a few <a href="http://developers.facebook.com/plugins">other perks</a>.</p>
<p>I get that Facebook is just trying to run an advertisement business, but why sell it as some hippy Open thing? Sure, part of the data is open, but the real graphyness is in the <tt>href</tt>s, which are still proprietary.</p>
<p>Illustrating my point, they reinvent <a href="http://microformats.org/wiki/hcard">hCards</a> with a new XML namespace, instead of supporting hCards in their system. People are even less likely to adopt microformats if in order to use them they must add redundant data formats to their pages. It's not about the data being open to everyone, it's about the data being open to Facebook.</p>
<p><sup><a name="64296E78-054B-4898-A639-A6EB77971BD3" href="#AAF2FC15-7A08-4EC6-AD7C-6360DE1E6F83">[1]</a></sup> The semantic web's immense success notwithstanding.</p>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-876358347971598886.post-88023700142865571642010-03-18T15:19:00.006+02:002010-03-19T17:39:34.026+02:00What is a mixed schema?<p><a href="http://blog.woobling.org/2010/03/kiokudb-dbixclass.html">Yesterday's post</a> is a technical one that says that KiokuDB and DBIx::Class can now be used together on the same schema. What it doesn't explain is what this is actually good for.</p>
<p>Most of the application development we do involves <a href="http://en.wikipedia.org/wiki/Online_transaction_processing">OLTP</a> in one form or another. Some of the apps also do reporting on simple, highly regular data.</p>
<p><a href="http://www.iinteractive.com/kiokudb">KiokuDB</a> grew out of the need to simplify the type of task we do most often. For the reporting side we are still trying to figure out what we like best. For example <a href="http://stevan-little.blogspot.com">Stevan</a> has been experimenting with <a href="http://search.cpan.org/perldoc?Fey"><tt>Fey</tt></a> (not <a href="http://search.cpan.org/perldoc?Fey::ORM"><tt>Fey::ORM</tt></a>) for the purely relational data.</p>
<p>This approach has been far superior to what we had done before: forcing a loosely constructed, polymorphic set of objects with no reporting requirements into a normalized relational schema that's optimized for reporting applications. There is also a new, worse alternative, which is to run aggregate reports on several million data points as in-memory objects with Perl ;-)</p>
<p>However, the two-pronged approach still has a major drawback: the two data sets are completely separate. There is no way to refer to data in the two sets without embedding knowledge about the database handles into the domain, which is tedious and annoying.</p>
<p>What the new <a href="http://search.cpan.org/perldoc?DBIx::Class"><tt>DBIx::Class</tt></a> integration allows is to bridge that gap.</p>
<h2>Concrete Example #1: Mixing KiokuDB into a DBIC centric app</h2>
<p>Oftentimes I would find myself making compromises about what sort of objects I put into a relational schema.</p>
<p>There is a tension between <a href="http://en.wikipedia.org/wiki/Polymorphism_in_object-oriented_programming">polymorphic</a> graphs of objects and a <a href="http://en.wikipedia.org/wiki/Database_normalization">normalized</a> relational schema.</p>
<p>Suppose you're writing an image gallery application, and you decide to add support for <a href="http://www.youtube.com/">YouTube</a> videos. Obviously YouTube videos should be treated as image objects in the UI: they should tile with the images, and you should be able to rearrange them, add captions/tags, post comments, etc.</p>
<p>This is precisely where polymorphism makes sense: you have two types of things being used in a single context, but with completely different representations. One is probably represented by a collection of files on disk (the original image, previews, thumbnails, etc.) and a table entry of metadata. The other is represented by an opaque string ID, and most of its functionality is derived by generating calls to a web service.</p>
<p>How do you put YouTube videos into your <tt>image</tt> table? Do you add a <tt>type</tt> column? What about a <tt>resource</tt> table that has <tt>NULL</tt>able foreign keys to the <tt>image</tt> table and a <tt>NULL</tt>able <tt>video_id</tt> column? What about a <tt>blob</tt> column containing serialized information about the data?</p>
<p>With a mixed schema you could create a <tt>resource</tt> table that has a foreign key to the KiokuDB entries table. You could use the resources table for things like random selection, searches, keeping track of view counts, etc.</p>
<p>I'm going to assume that you're not really interested in running reports on which characters show up most often in the YouTube video IDs or what is the average length of image filenames, so that data can be opaque without compromising any features in your application.</p>
<p>On a technical level this is similar to using a serialized blob column approach, or some combination of <a href="http://search.cpan.org/perldoc?DBIx::Class::DynamicSubclass"><tt>DBIx::Class::DynamicSubclass</tt></a> and <a href="http://search.cpan.org/perldoc?DBIx::Class::FrozenColumns"><tt>DBIx::Class::FrozenColumns</tt></a>.</p>
<p>However, by using KiokuDB these objects become first class citizens in your schema, instead of some "extra" data that is tacked on to a data row. You get a proper API for retrieving and updating real graphs of objects, much more powerful and automatable serialization, a large number of standard modules that are supported out of the box, etc.</p>
<p>Perhaps most importantly, the encapsulation and notion of identity is maintained. You can share data between objects, and that data sharing is reflected consistently in memory. You can implement your <tt>MyApp::Resource::YouTubeVideo</tt> and <tt>MyApp::Resource::Image</tt> without worrying about mapping columns, or weird interactions with <a href="http://search.cpan.org/perldoc?Storable"><tt>Storable</tt></a>. That, to me, is the most liberating part of using KiokuDB.</p>
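<p>As a sketch of what the <tt>resource</tt> table approach could look like (all class and column names here are invented for illustration, not taken from an actual app):</p>

```perl
# Hypothetical mixed-schema sketch: the queryable bits stay relational,
# while the polymorphic resource object lives in KiokuDB.
package MyApp::Schema::Result::Resource;
use base qw(DBIx::Class::Core);

__PACKAGE__->load_components(qw(KiokuDB));
__PACKAGE__->table('resource');

# columns you want to search/sort/aggregate on stay plain columns
__PACKAGE__->add_columns(qw(id view_count));
__PACKAGE__->set_primary_key('id');

# the polymorphic part is a KiokuDB column
__PACKAGE__->kiokudb_column('object');

package MyApp::Resource::Image;
use Moose;
has path => ( is => 'ro', isa => 'Str', required => 1 );

package MyApp::Resource::YouTubeVideo;
use Moose;
# opaque ID used to generate web service calls; nothing to normalize
has video_id => ( is => 'ro', isa => 'Str', required => 1 );

1;
```

<p>A row's <tt>object</tt> accessor can then return either resource class, while <tt>view_count</tt> remains an ordinary column you can <tt>ORDER BY</tt> or aggregate.</p>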
<h2>Concrete Example #2: Mixing DBIC into a KiokuDB centric app</h2>
<p>On the other side of the spectrum (of our apps, anyway) you'll find data models that are just too complicated to put into a relational schema easily; there are mixed data types all over the place, complex networks of data (we've put trees, graphs, DAGs, and other structures, sometimes all in a single app), and other things that are incredibly useful for rapid prototyping or complicated processing.</p>
<p>This usually all works great until you need an aggregate data type at some point. That's when things fall apart. <a href="http://search.cpan.org/perldoc?Search::GIN"><tt>Search::GIN</tt></a> is not nearly as feature complete as I hoped it would be by now; in fact, it's barely a draft of a prototype. The <a href="http://search.cpan.org/perldoc?KiokuDB::Backend::DBI">DBI Backend</a>'s column extraction is a fantastically useful hack, but it's still just a hack at heart.</p>
<p>But now we can freely refer to DBIC rows and resultsets just like we can in memory, from our OO schema, to help with these tasks.</p>
<p>One of our apps used a linked list to represent a changelog of an object graph, somewhat like Git's object store. After a few months of deployment, we got a performance complaint from a client: a specific page was taking about 30 seconds to load. It turned out that normally only the last few revisions had to be queried, but in that specific case a pathological data construction meant that over a thousand revisions were loaded from the database and had their data analyzed. Since this linked list structure is opaque, this was literally hitting the database thousands of times in a single request.</p>
<p>I ended up using a crude cache to memoize some of the predicates, which let us just skip directly to the revision that had to be displayed.</p>
<p>With the new features in the DBI backend I could simply create a table of revision containers (I would still need to store revisions in KiokuDB, because there were about 6 different revision types), on which I could do the entire operation with one select statement.</p>
<p>Conceptually you can consider the DBIC result set as just an object oriented collection type. It's like any other object in KiokuDB, except that its data is backed by a much smarter representation than a serialized blob: the underlying data store understands it and can query its contents easily and efficiently. The drawback is that it requires some configuration, and it can only contain objects of the same data type, but these are very reasonable limitations; after all, we've been living with them for years.</p>
<p>It's all a bit like writing a custom typemap entry to better represent your data to the backend. In fact, this is pretty much exactly what I did to implement the feature ;-)</p>
<p>This still requires making the effort to define a relational schema, but only where you need it, and only for data that makes sense in a relational setting anyway. And it's probably less effort than writing a custom typemap to create a scalable/queryable collection type.</p>
<h2>Conclusion</h2>
<p>Though still far from perfect, I feel that this really brings KiokuDB into a new level of usefulness; you no longer need to drink the kool aid and sacrifice a powerful tool and methodology you already know.</p>
<p>Even though DBIC is not <em>everyone</em>'s tool of choice and has its own drawbacks, I feel that it is by far the most popular Perl ORM for a reason, which is why I chose to build on it. However, there's no reason why this approach can't be used for other backend types.</p>
<p>Eventually I'd like to see similar typemaps emerge for other backends. For example <a href="http://search.cpan.org/perldoc?KiokuDB::Backend::Redis">the Redis backend</a> could support <a href="http://code.google.com/p/redis/">Redis'</a> different <a href="http://code.google.com/p/redis/wiki/IntroductionToRedisDataTypes">data types</a>, while <a href="http://couchdb.apache.org/">CouchDB</a>'s design documents and views and <a href="http://riak.basho.com/">riak</a>'s MapReduce jobs and queries (<a href="http://lumberjaph.net/blog/">Franck</a>'s backend is <a href="http://github.com/franckcuny/kiokudb-backend-riak">on GitHub</a>) could all be reflected as "just objects" that can coexist with other data in a KiokuDB object graph.</p>
<p>This resolves KiokuDB's limitations with respect to sorting, aggregating and querying by letting you use <tt>DBIx::Class</tt> for those objects, while still giving you KiokuDB's flexible schema for everything else.</p>
<p>The first part of this is that you can refer to <tt>DBIx::Class</tt> row objects from the objects stored in KiokuDB:</p>
<pre class="fake-gist" id="fake-gist-335340">my $dbic_object = $resultset->find($primary_key);
$dir->insert(
some_id => Some::Object->new( some_attr => $dbic_object ),
);</pre>
<p>The second half is that relational objects managed by <tt>DBIx::Class</tt> can specify <tt>belongs_to</tt> type relationships (i.e. an inflated column) to any object in the KiokuDB <tt>entries</tt> table:</p>
<pre class="fake-gist" id="fake-gist-335341">my $row = $rs->create({ name => "blah", object => $anything });
$row->insert;
say "Inserted ID for KiokuDB object: ",
$dir->object_to_id($row->object);</pre>
<p>To set things up you need to tell <tt>DBIx::Class</tt> about KiokuDB:</p>
<pre class="fake-gist" id="fake-gist-335342">package MyApp::Schema;
use base qw(DBIx::Class::Schema);
# load the KiokuDB schema component
# which adds the extra result sources
__PACKAGE__->load_components(qw(Schema::KiokuDB));
__PACKAGE__->load_namespaces;
package MyApp::Schema::Result::Foo;
use base qw(DBIx::Class);
# load the KiokuDB component:
__PACKAGE__->load_components(qw(Core KiokuDB));
# do the normal stuff
__PACKAGE__->table('foo');
__PACKAGE__->add_columns(qw(id name object));
__PACKAGE__->set_primary_key('id');
# setup a relationship column:
__PACKAGE__->kiokudb_column('object');
# connect both together
my $dir = KiokuDB->connect(
dsn => "dbi:SQLite:dbname=blah",
schema_proto => "MyApp::Schema",
);
my $schema = $dir->backend->schema;
# then you can do some work:
$dir->txn_do( scope => 1, body => sub {
my $rs = $schema->resultset("Foo");
my $obj = $rs->find($primary_key)->object;
$obj->change_something($something_else);
$dir->update($obj);
});</pre>
<p>There are still a few missing features, and this is probably not production ready, but please try it out! <strike>A dev release will be out once I've documented it.</strike> <a href="http://search.cpan.org/dist/KiokuDB-Backend-DBI-1.11_01/">KiokuDB::Backend::DBI 1.11_01</a>.</p>
<p>In the future I hope to match all of <a href="http://search.cpan.org/perldoc?Tangram"><tt>Tangram</tt></a>'s features, enabling truly hybrid schemas. This would mean that KiokuDB could store objects in more than one table, with objects having any mixture of properly typed, normalized columns, opaque data BLOBs, or something in between (a bit like <a href="http://search.cpan.org/perldoc?DBIx::Class::DynamicSubclass"><tt>DBIx::Class::DynamicSubclass</tt></a> and <a href="http://search.cpan.org/perldoc?DBIx::Class::FrozenColumns"><tt>DBIx::Class::FrozenColumns</tt></a>, but with more flexibility and less setup).</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-876358347971598886.post-26449834276714330412010-03-14T05:31:00.009+02:002010-03-14T05:40:58.850+02:00git snapshot<span style="font-style:italic;"></span><p>I've just uploaded a new tool, <a href="http://github.com/nothingmuch/git-snapshot/">git snapshot</a>, which lets you routinely capture snapshots of your working directory, and records them in parallel to your explicitly recorded history.</p>
<p>The snapshot revisions stay out of the way for the most part, but if you need to view them you can look at them, for example using <tt><a href="http://gitx.frim.nl/">gitx</a> refs/snapshots/HEAD</tt></p>
<p>For me this is primarily useful when I'm sketching out a new project and forgetting to commit anything. When working on a large patch I use <tt>git commit -a --amend -C HEAD</tt> fairly often, which in conjunction with <tt>git <a href="http://www.gitready.com/intermediate/2009/02/09/reflog-your-safety-net.html">reflog</a></tt> provides similar safety. However, <tt>git snapshot</tt> is designed to work well in either scenario.</p>
<p>I have a crontab set up to use <a href="http://developer.apple.com/mac/library/DOCUMENTATION/Darwin/Reference/ManPages/man1/mdfind.1.html">mdfind</a> so that all directories with the red label are snapshotted once an hour.</p>
<p>This feature is <em>disabled</em> by default to avoid introducing errors to existing schemas<sup><a href="#0582E125-4992-4913-A217-C1680E067A1B" name="71CDC4C9-3810-461B-A72C-F95B42BE9D78">[1]</a></sup>. To try it out pass <tt>check_class_versions => 1</tt> to <tt>connect</tt>:</p>
<pre class="fake-gist" id="fake-gist-320734">KiokuDB->connect(
dsn => ...,
check_class_versions => 1,
);</pre>
<p>To use this feature, whenever you make an incompatible change to a class, also change its <tt>$VERSION</tt>. When KiokuDB tries to load an object that was stored before the change was made, the version mismatch is detected (versions are only compared as strings; the values themselves carry no meaning).</p>
<p>Without any configuration this mismatch will result in an error at load time, but the <a href="http://search.cpan.org/perldoc?KiokuDB::Role::Upgrade::Handlers::Table"><tt>KiokuDB::Role::Upgrade::Handlers::Table</tt></a> role allows you to declaratively add upgrade handlers to your classes:</p>
<pre class="fake-gist" id="fake-gist-320735">package Foo;
use Moose;
with qw(KiokuDB::Role::Upgrade::Handlers::Table);
use constant kiokudb_upgrade_handlers_table => {
# we can mark versions as being equivalent in terms of their
# data. 0.01 to 0.02 may have introduced an incompatible API
# change, but the stored data should be compatible
"0.01" => "0.02",
# on the other hand, after 0.02 there may have been an
# incompatible data change, so we need to convert
"0.02" => sub {
my ( $self, %args ) = @_;
return $args{entry}->derive(
class_version => our $VERSION, # up to date version
data => ..., # converted entry data
);
},
};</pre>
<p>For more details see the documentation, especially <a href="http://search.cpan.org/perldoc?KiokuDB::TypeMap::Entry::MOP"><tt>KiokuDB::TypeMap::Entry::MOP</tt></a>.</p>
<p><sup><a name="0582E125-4992-4913-A217-C1680E067A1B" href="#71CDC4C9-3810-461B-A72C-F95B42BE9D78">[1]</a></sup> In the future this might be enabled by default, but when data without any version information is found in the database it is assumed to be up to date.</p>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-876358347971598886.post-61255960332179462572010-02-01T16:27:00.002+02:002010-02-01T18:10:48.950+02:00$obj->blessed<p>I've been meaning to write about this gotcha for a long time, but somehow forgot. This was actually an undiscovered bug in Moose for several <em>years</em>:</p>
<pre class="fake-gist" id="fake-gist-291746">use strict;
use warnings;
use Test::More;
use Try::Tiny qw(try);
{
package Foo;
use Scalar::Util qw(blessed);
sub new { bless {}, $_[0] }
}
my $foo = Foo->new;
is( try { blessed($foo) }, undef );
is( try { blessed $foo }, undef );
done_testing;</pre>
<p>The first test passes. <tt>blessed</tt> hasn't been imported into <tt>main</tt>, so the code results in the error <tt>Undefined subroutine &main::blessed</tt>.</p>
<p>The second test, on the other hand, fails. This is because <tt>blessed</tt> has been invoked as a method on <tt>$foo</tt>.</p>
<p>The Moose codebase had several instances of <tt>if ( blessed $object )</tt>, in packages that did not import <tt>blessed</tt> at all. This worked for ages, because <tt>Moose::Object</tt>, the base class for most objects in the Moose ecosystem, didn't clean up that export, and therefore provided an inherited <tt>blessed</tt> method for pretty much any class written in Moose.</p>
<p>I think this example provides a very strong case for using <a href="http://search.cpan.org/perldoc?namespace::clean"><tt>namespace::clean</tt></a> or <a href="http://search.cpan.org/perldoc?namespace::autoclean"><tt>namespace::autoclean</tt></a> routinely in your classes.</p>
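<p>The gotcha can be reproduced with nothing but core modules; this sketch shows both call forms side by side:</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Foo;
    use Scalar::Util qw(blessed);    # leaks &Foo::blessed as a "method"
    sub new { bless {}, $_[0] }
}

my $foo = Foo->new;

# With parens this is a plain function call; main never imported
# blessed(), so it dies with "Undefined subroutine &main::blessed".
my $r1 = eval { blessed($foo) };
print "with parens: ", ( $@ ? "died" : $r1 ), "\n";

# Without parens Perl falls back to indirect object syntax, i.e.
# $foo->blessed, which finds the sub imported into Foo and "works".
my $r2 = eval { blessed $foo };
print "without parens: ", ( $@ ? "died" : $r2 ), "\n";
```

<p>Adding <tt>use namespace::autoclean;</tt> to <tt>Foo</tt> removes the imported sub after compilation, making the unparenthesized form die just like the first.</p>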
<p>To cover the other half of the <a href="http://www.shadowcat.co.uk/blog/matt-s-trout/indirect-but-still-fatal/">problem</a>, the <tt>no <a href="http://search.cpan.org/perldoc?indirect">indirect</a></tt> pragma allows the removal of this unfortunate feature from specific lexical scopes.</p>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-876358347971598886.post-26045054108277487722010-01-06T15:12:00.004+02:002010-01-06T15:23:25.836+02:00Importing Keywurl searches to Chrome<p>I've recently switched to using <a href="http://www.google.com/chrome">Chrome</a>. I used <a href="http://alexstaubo.github.com/keywurl/">Keywurl</a> extensively with Safari. Here's a script that imports the Keywurl searches into Chrome:</p>
<pre id="fake-gist-270267" class="fake-gist">#!/usr/bin/perl
use strict;
use warnings;
use Mac::PropertyList qw(parse_plist_file);
use DBI;
my $app_support = "$ENV{HOME}/Library/Application Support";
my $dbh = DBI->connect("dbi:SQLite:dbname=$app_support/Google/Chrome/Default/Web Data");
my $plist = parse_plist_file("$app_support/Keywurl/Keywords.plist");
my $keywords = $plist->{keywords};
$dbh->begin_work;
my $t = time;
my $sth = $dbh->prepare(qq{
INSERT INTO keywords VALUES (
NULL, -- id
?, -- name
?, -- keyword
"", -- favicon url
?, -- url
0, -- show in default list
0, -- safe for auto replace
"", -- originating URL
$t, -- date created
0, -- usage count
"", -- input encodings
"", -- suggest url
0, -- prepopulate id
0 -- autogenerate keyword
)
});
foreach my $link ( keys %$keywords ) {
my $data = $keywords->{$link};
my $url = $data->{expansion}->value;
$url =~ s/\{query\}/{searchTerms}/g;
$sth->execute(
$link, # name
$link, # keyword
$url,
);
}
$dbh->commit;
</pre>Unknownnoreply@blogger.com6tag:blogger.com,1999:blog-876358347971598886.post-37820836277939259342009-12-16T18:01:00.004+02:002009-12-16T18:08:31.123+02:00Ironman FAIL<p>Oops... I moved back to Chamonix over the weekend and completely forgot about blogging.</p>
<p>I guess I'll take a few days to get settled in and then start writing again. I'm aiming for chartreuse with alternating red and monkeyshit highlights and a fishnet, but unfortunately mst has been blogging much more consistently than me so far.</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-876358347971598886.post-30984649603589223822009-12-04T01:54:00.003+02:002009-12-08T16:23:40.876+02:00Simplifying BEGIN { } with Moose roles<p>This is a common Perl pattern:</p>
<pre class="fake-gist" id="fake-gist-248685">package MyClass;
use Moose;
use Try::Tiny;
use namespace::autoclean;
BEGIN {
if ( try { require Foo; 1 } ) {
*bar = sub {
my $self = shift;
Foo::foo($self->baz);
};
} else {
*bar = sub {
... # fallback implementation
};
}
}</pre>
<p>However, since this is a <a href="http://moose.perl.org/">Moose</a> class there is another way:</p>
<pre class="fake-gist" id="fake-gist-248687">package MyClass;
use Moose;
use Try::Tiny;
use namespace::autoclean;

with try { require Foo; 1 }
    ? "MyClass::Bar::Foo"
    : "MyClass::Bar::Fallback";</pre>
<pre class="fake-gist" id="fake-gist-248688">package MyClass::Bar::Foo;
use Moose::Role;
use Foo qw(foo);
use namespace::autoclean;

sub bar {
    my $self = shift;
    foo($self->baz);
}</pre>
<pre class="fake-gist" id="fake-gist-248689">package MyClass::Bar::Fallback;
use Moose::Role;
use namespace::autoclean;

sub bar {
    ...; # fallback implementation
}</pre>
<p>Obviously for something that simple it doesn't make sense, but if there is more than one method involved, or the fallback implementation is a little long, it really helps readability in my opinion. Going one step further, you can create an abstract role like this:</p>
<pre class="fake-gist" id="fake-gist-248690">package MyClass::Bar::API;
use Moose::Role;
use namespace::autoclean;
requires "bar";</pre>
<p>and add it to the class's <tt>with</tt> statement to validate that all the required methods are really provided by one of the roles.</p>
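<p>Put together, the composition could look like this (a sketch using the hypothetical role names from above; composing everything in a single <tt>with</tt> statement lets the implementation role satisfy the abstract role's requirements):</p>

```perl
package MyClass;
use Moose;
use Try::Tiny;
use namespace::autoclean;

# composing the abstract role and the chosen implementation role in one
# with() statement applies the "requires" validation to the combination
with "MyClass::Bar::API",
    ( try { require Foo; 1 }
        ? "MyClass::Bar::Foo"
        : "MyClass::Bar::Fallback" );
```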
<p>Role inclusion is usually thought of as something very static, but dynamism can be very handy, and it doesn't hurt the structure of the code.</p>
<small>If you want to be pedantic, the role inclusion is not done at compile time, but the loading of <tt>Foo</tt> inside the role still is (and loading <tt>Foo</tt> at compile time is usually why the code was in a <tt>BEGIN</tt> block in the first place, in most of the code I've seen).</small>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-876358347971598886.post-16753091622463707862009-11-26T23:07:00.002+02:002009-11-27T02:02:39.808+02:00The timing of values in imperative APIs<p>Option configuration is a classic example of when I prefer a purely functional approach. This post is not <a href="http://blog.woobling.org/2009/11/functional-programming-and-unreasonable.html">about broken semantics</a>, but rather about the tension between ease of implementation and ease of use.</p>
<p>Given Perl's imperative heritage, many modules default to imperative option specification. This means that the choice of one behavior over another is represented by an action (setting the option), instead of a value.</p>
<p>Actions are far more complicated than values. For starters, they are part of an ordered sequence. Secondly, it's hard to know what the complete set of choices is, and it's hard to correlate between choices. And of course the actual values must still be moved around.</p>
<p>A simple example is Perl's built in <tt>import</tt> mechanism.</p>
<p>When you <tt>use</tt> a module, you are providing a list of arguments that are passed to two optional method calls on the module being loaded: <tt>import</tt> and <tt>VERSION</tt>.</p>
<p>Most people know that this:</p>
<pre class="fake-gist" id="fake-gist-243454">use Foo;</pre>
<p>Is pretty much the same as this:</p>
<pre class="fake-gist" id="fake-gist-243455">BEGIN {
    require Foo;
    Foo->import();
}</pre>
<p>There's also a secondary syntax, which allows you to specify a version:</p>
<pre class="fake-gist" id="fake-gist-243456">use Foo 0.13 qw(foo bar);</pre>
<p>The effect is the same as:</p>
<pre class="fake-gist" id="fake-gist-243457">BEGIN {
    require Foo;
    Foo->VERSION(0.13);
    Foo->import(qw(foo bar));
}</pre>
<p><tt>UNIVERSAL::VERSION</tt> is pretty simple, it looks at the version number and compares it with <tt>$Foo::VERSION</tt> and then complains loudly if <tt>$Foo::VERSION</tt> isn't recent enough.</p>
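<p>Roughly speaking it does something like this (a simplified sketch, not the actual core code, which also handles v-strings and <tt>version</tt> objects):</p>

```perl
# a simplified sketch of what UNIVERSAL::VERSION does
sub VERSION {
    my ( $class, $want ) = @_;

    no strict 'refs';
    my $have = ${"${class}::VERSION"};

    if ( defined $want ) {
        die "$class version $want required--this is only version $have\n"
            unless defined $have && $have >= $want;
    }

    return $have;
}
```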
<p>But what if we wanted to do something more interesting, for instance adapt the exported symbols to be compatible with a certain API version?</p>
<p>This is precisely why <tt>VERSION</tt> is an overridable class method, but this flexibility is still very far from ideal.</p>
<pre class="fake-gist" id="fake-gist-243458">my $import_version;

sub VERSION {
    my ( $class, $version ) = @_;

    # first verify that we are recent enough
    $class->SUPER::VERSION($version);

    # stash the value that the user specified
    $import_version = $version;
}

sub import {
    my ( $class, @import ) = @_;

    # get the stashed value
    my $version = $import_version;

    # clear it so it doesn't affect subsequent imports
    undef $import_version;

    ... # use $version and @import to set things up correctly
}</pre>
<p>This is a shitty solution because really all we want is a simple value, but we have to juggle it around using a shared variable.</p>
<p>Since the semantics of <tt>import</tt> would have been made more complex by adding this rather esoteric feature, the API was made imperative instead, to allow things to be optional.</p>
<p>But the above code is not only ugly, it's also broken. Consider this case:</p>
<pre class="fake-gist" id="fake-gist-243459">package Evil;
use Foo 0.13 (); # require Foo; Foo->VERSION(0.13); import never called

package Innocent;
use Foo qw(foo bar); # require Foo; Foo->import(qw(foo bar));</pre>
<p>In the above code, <tt>Evil</tt> is causing <tt>$import_version</tt> to be set, but <tt>import</tt> is never called. The next invocation of <tt>import</tt> comes from a completely unrelated consumer, but <tt>$import_version</tt> never got cleared.</p>
<p>We can't use <tt>local</tt> to keep <tt>$import_version</tt> properly scoped (it'd be cleared before <tt>import</tt> is called). The best solution I can come up with is to key it in a hash by <tt>caller()</tt>, which at least prevents pollution. This is something every implementation of <tt>VERSION</tt> that wants to pass the version to <tt>import</tt> must do to be robust.</p>
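<p>Keying the stash by <tt>caller()</tt> looks something like this (a sketch of the workaround, not code lifted from any particular module):</p>

```perl
# stash versions per calling package, so unrelated consumers like
# Evil and Innocent above can no longer pollute each other
my %import_version;

sub VERSION {
    my ( $class, $version ) = @_;
    $class->SUPER::VERSION($version);

    # remember which package asked for which version
    $import_version{ (caller)[0] } = $version;
}

sub import {
    my ( $class, @import ) = @_;

    # delete (not just read) so the value can't leak into a later import
    my $version = delete $import_version{ (caller)[0] };

    ... # use $version and @import to set things up correctly
}
```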
<p>However, even if we isolate consumers from each other, the nonsensical usage <tt>use Foo 0.13 ()</tt> which asks for a versioned API and then proceeds to import nothing, still can't be detected by <tt>Foo</tt>.</p>
<p>We have 3 * 2 = 6 different code paths<a name="E7C2927D-1E8F-467D-A994-F0017688FF8B" href="#33128C52-07EF-4416-BFCA-44FCE2AFF977"><sup>[1]</sup></a> for the different variants of <tt>use Foo</tt>, one of which doesn't even make sense (<tt>VERSION</tt> but no <tt>import</tt>), two of which have an explicit stateful dependency between two parts of the code paths (<tt>VERSION</tt> followed by <tt>import</tt>, in two variants), and two of which have an implicit stateful dependency (<tt>import</tt> without <tt>VERSION</tt> should get <tt>undef</tt> in <tt>$import_version</tt>). This sort of combinatorial complexity places the burden of ensuring correctness on the implementors of the API, instead of the designer of the API.</p>
<p>It seems that the original design goal was to minimize the complexity of the most common case (<tt>use Foo</tt>, no <tt>VERSION</tt>, and <tt>import</tt> called with no arguments), but it really makes things difficult for the non default case, somewhat defeating the point of making it extensible in the first place (what good is an extensible API if nobody actually uses it to its full potential).</p>
<p>In such cases my goal is often to avoid fragmenting the data as much as possible. If the version was an argument to <tt>import</tt> which defaulted to <tt>undef</tt> people would complain, but that's just because <tt>import</tt> uses positional arguments. Unfortunately you don't really see this argument passing style in the Perl core:</p>
<pre class="fake-gist" id="fake-gist-243460">sub import {
    my ( $class, %args ) = @_;

    if ( exists $args{version} ) {
        ...
    }

    ... $args{import_list};
}</pre>
<p>This keeps the values together in both space and time. The closest thing I can recall from core Perl is something like <tt>$AUTOLOAD</tt>. <tt>$AUTOLOAD</tt> does not address space fragmentation (an argument is being passed using a variable instead of an argument), but it at least solves the fragmentation in time: the variable is <em>reliably</em> set just before the <tt>AUTOLOAD</tt> routine is invoked.</p>
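<p>For reference, the <tt>$AUTOLOAD</tt> convention looks something like this (a generic sketch):</p>

```perl
our $AUTOLOAD;

sub AUTOLOAD {
    my $self = shift;

    # $AUTOLOAD is reliably set to the fully qualified name of the
    # missing subroutine just before this routine is invoked
    ( my $name = $AUTOLOAD ) =~ s/.*:://;

    return if $name eq 'DESTROY'; # don't autoload destructors

    ... # dispatch based on $name
}
```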
<p>Note that if <tt>import</tt> worked like this it would still be far from pure, it mutates the symbol table of its caller, but the actual computation of the symbols to export can and should be side effect free, and if the version were specified in this way that would have been easier.</p>
<p>This is related to the <a href="http://c2.com/cgi/wiki?IntentionNotAlgorithm">distinction between intention and algorithm</a>. Think of it this way: when you say <tt>use Foo 0.13 qw(foo bar)</tt>, do you intend to import a specific version of the API, or do you intend to call a method to set the version of the API and then call a method to import the API? The declarative syntax has a close affinity to the intent. On the other hand, looking at it from the perspective of <tt>Foo</tt>, where the intent is to export a specific version of the API, the code structure does not reflect that at all.</p>
<p><a href="http://blogs.perl.org/users/ovid/">Ovid</a> wrote about a <a href="http://use.perl.org/~Ovid/journal/39878">similar issue with Test::Builder</a>, where a procedural approach was taken (diagnosis output is treated as "extra" stuff, not really a part of a test case's data).</p>
<p><a href="http://moose.perl.org">Moose</a> also suffers from this issue in its sugar layer. When a Moose class is declared the class definition is modified step by step, causing load time performance issues, order sensitivity (often you need to include a role after declaring an attribute for required method validation), etc.</p>
<p>Lastly, <a href="http://plackperl.org/">PSGI</a>'s raison d'etre is that the <tt>CGI</tt> interface is based on stateful values (<tt>%ENV</tt>, global filehandles). The gist of the PSGI spec is encapsulating those values into explicit arguments, without needing to <a href="http://search.cpan.org/perldoc?HTTP::Request::AsCGI">imperatively monkeypatch global state</a>.</p>
<p>I think the reason we tend to default to imperative configuration is out of a short sighted laziness<a href="#5724A963-16E7-461D-9A68-3F639EAAC278" name="6A5E5B98-5537-438E-9250-1B5193BA3152"><sup>[2]</sup></a>. It seems like it's easier to be imperative when you are thinking about usage. For instance, creating a data type to encapsulate arguments is tedious. Dealing with optional vs. required arguments manually is even more so. Simply forcing the user to specify everything is not very Perlish. This is where the tension lies.</p>
<p>The best compromise I've found is a multilayered approach. At the foundation I provide a low level, explicit API where all of the options are required all at once, and cannot be changed afterwards. This keeps the combinatorial complexity down and lets me do more complicated validation of dependent options. On top of that I can easily build a convenience layer which accumulates options from an imperative API and then provides them to the low level API <em>all at once</em>.</p>
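<p>As a sketch (every class and method name here is hypothetical, invented for illustration):</p>

```perl
# hypothetical layered design: a low level API that takes everything
# at once, plus a thin sugar layer that accumulates options for it

# low level: all options provided together, validated as a whole,
# immutable afterwards
my $engine = My::Engine->new(
    version     => 0.13,
    import_list => [qw(foo bar)],
);

# sugar layer: accumulate options imperatively, then commit all at once
my $builder = My::Engine::Builder->new;
$builder->version(0.13);
$builder->import_list(qw(foo bar));
my $engine2 = $builder->build; # delegates to My::Engine->new(%options)
```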
<p>This was not done in Moose because at the time we did not know how to detect the end of a <tt>.pm</tt> file, so we couldn't know when the declaration was finished<a name="D4C439D9-5E77-4024-98B9-DCE196EF1176" href="#12348E3B-2F94-4A48-9C3C-61EE2BFED643"><sup>[3]</sup></a>.</p>
<p>Going back to <tt>VERSION</tt> and <tt>import</tt>, this approach would involve capturing the values as best we can in a thin <tt>import</tt> (the sugar layer), and passing them onwards together to some underlying implementation that doesn't need to worry about the details of collecting those values.</p>
<p>In my opinion most of the time an API doesn't actually merit a convenience wrapper, but if it does then it's easy to develop one. Building on a more verbose but ultimately simpler foundation usually makes it much easier to write something that is correct, robust, and reusable. More importantly, the implementation is also easier to modify or even just replace (using polymorphism), since all the stateful dependencies are encapsulated by a dumb sugar layer.</p>
<p>Secondly, when the sugar layer is getting in the way, it can just be ignored. Instead of needing to hack around something, you just need to be a little more verbose.</p>
<p>Lastly, I'd also like to cite the <a href="http://en.wikipedia.org/wiki/Unix_philosophy">Unix philosophy</a>, another strong influence on Perl: do one thing, and do it well<a href="#E575E230-D706-447B-90BE-BB2A641B4C51" name="4A5F8736-E00A-4EB4-83BC-3AB5C89E24C5"><sup>[4]</sup></a>. The anti pattern is creating one thing that provides two features: a shitty convenience layer and a limited solution to the original problem. Dealing with each concern separately helps to focus on doing the important part, and of course doing it well ;-)</p>
<p>This post's subject matter is obviously related to another procedural anti-pattern (<tt>$foo->do_work; my $results = $foo->results</tt> vs <tt>my $results = $foo->do_work</tt>). I'll rant about that one in a later post.</p>
<p><a href="#E7C2927D-1E8F-467D-A994-F0017688FF8B" name="33128C52-07EF-4416-BFCA-44FCE2AFF977"><sup>[1]</sup></a></p>
<pre class="fake-gist" id="fake-gist-243461">use Foo;
use Foo 0.13;
use Foo qw(foo bar);
use Foo 0.13 qw(Foo Bar);
use Foo ();
use Foo 0.13 ();</pre>
<p>and this doesn't even account for manual invocation of those methods, e.g. from delegating <tt>import</tt> routines.</p>
<p><a name="5724A963-16E7-461D-9A68-3F639EAAC278" href="#6A5E5B98-5537-438E-9250-1B5193BA3152"><sup>[2]</sup></a> This is the wrong kind of laziness, the virtuous laziness is long term</p>
<p><a href="#D4C439D9-5E77-4024-98B9-DCE196EF1176" name="12348E3B-2F94-4A48-9C3C-61EE2BFED643"><sup>[3]</sup></a> Now we have <a href="http://search.cpan.org/perldoc?B::Hooks::EndOfScope"><tt>B::Hooks::EndOfScope</tt></a></p>
<p><a name="E575E230-D706-447B-90BE-BB2A641B4C51" href="#4A5F8736-E00A-4EB4-83BC-3AB5C89E24C5"><sup>[4]</sup></a> Perl itself does many things, but it is intended to let you write things that do one thing well (originally scripts, though nowadays I would say the CPAN is a much better example)</p>Unknownnoreply@blogger.com5tag:blogger.com,1999:blog-876358347971598886.post-71021070221956691992009-11-21T18:39:00.003+02:002009-11-21T21:47:56.787+02:00Restricted Perl<p><a href="http://perlalchemy.blogspot.com/">zby</a>'s comments on my <a href="http://blog.woobling.org/2009/11/functional-programming-and-unreasonable.html">last post</a> got me thinking. There are many features in Perl that we no longer use, or that are considered arcane or bad style, or even features we could simply live without. However, if they were removed, lots of code would break. So we keep those features, and we keep writing new code that uses them.</p>
<p>Suppose there were a pragma, similar to <tt>no <a href="http://search.cpan.org/perldoc?indirect">indirect</a></tt> in that it restricts existing language features, and similar to <a href="http://perldoc.perl.org/perllexwarn.html"><tt>strict</tt></a> in that it lets you opt out of unrelated discouraged behaviors.</p>
<p>I think this would be an interesting baby step towards solving some of the problems that plague Perl code today:</p>
<ul>
<li>Features that are often misused and need lots of <a href="http://search.cpan.org/perldoc?Perl::Critic">critique</a>.</li>
<li>Language features that are hard to change in the interpreter's implementation, limiting the revisions we can make to Perl 5.</li>
<li>Code that will be hard to translate to Perl 6, for no good reason.</li>
</ul>
<p>On top of that one could implement several different default sets of feature-restricted Perl (sort of like <a href="http://search.cpan.org/perldoc?Modern::Perl"><tt>Modern::Perl</tt></a>).</p>
<p>Instead of designing some sort of restricted subset of Perl 5 from the bottom up, several competing subsets could be developed organically, and if memory serves me right that is something we do quite well in our community =)</p>
<p>So anyway, what are some things that you could easily live without in Perl? What things would you be willing to sacrifice if it meant you could trade them off for other advantages? Which features would you rather disallow as part of a coding standard?</p>
<h2>My take</h2>
<p>Here are some ideas. They are split up into categories which are loosely related, but don't necessarily go hand in hand (some of them even contradict slightly).</p>
<p>They are all of a reasonable complexity to implement, either validating something or removing a language feature in a lexical scope.</p>
<p>It's important to remember that these can be opted out of selectively, when you need them, just like you can say <tt>no warnings 'uninitialized'</tt> when stringifying <tt>undef</tt> is something you intentionally allowed.</p>
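<p>Usage might look something like this (the pragma and category names are entirely hypothetical):</p>

```perl
# hypothetical pragma and category names, for illustration only
use restricted 'io_globals'; # disallow $/, $|, etc. in this scope

{
    # opt back out locally, just like "no warnings 'uninitialized'"
    no restricted 'io_globals';
    local $/; # slurp mode, intentionally allowed in this block
    my $content = <$fh>; # $fh assumed opened elsewhere
}
```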
<h2>Restrictions that would facilitate static modularity</h2>
<p>The first four restrictions make it possible to treat <tt>.pm</tt> files as standalone, cacheable compilation units. The fifth also allows for static linkage (no need to actually invoke <tt>import</tt> when evaluating a <tt>use</tt> statement), since the semantics of <tt>import</tt> are statically known. This could help alleviate startup time problems with Perl code, per compliant compilation unit (without needing to solve the problem as a whole by crippling the ad hoc nature of Perl's compile time <em>everywhere</em>).</p>
<ul>
<li>Disallow recursive <tt>require</tt>.</li>
<li>Disallow modification to a package's symbol table after its <tt>package</tt> declaration goes out of scope.</li>
<li>Restrict a file to only one package (which must match the <tt>.pm</tt> file name).</li>
<li>Disallow modification of packages other than the currently declared one.</li>
<li>Restrict the implementation of <tt>import</tt> to a statically known one.</li>
<li>Disallow access to external symbols that are not bound at compile time (e.g. variables from other packages, or subroutines which weren't predeclared; fully qualified names are OK).</li>
</ul>
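<p>To make the list above concrete, here are the sorts of things these restrictions would reject (hypothetical examples):</p>

```perl
# hypothetical examples of code the modularity restrictions would reject

package Foo;

sub install {
    # reaching into another package's symbol table at runtime:
    no strict 'refs';
    *{"Bar::baz"} = sub { "monkeypatched" }; # disallowed
}

# declaring a second package in the same Foo.pm file: also disallowed
package Foo::Extra;
```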
<h2>Restrictions that allow easier encapsulation of side effects</h2>
<p>These restrictions address pollution of state between unrelated bits of code that have interacting dynamic scopes.</p>
<ul>
<li>Disallow modification of any global variables that control IO behavior, such as <tt>$/</tt>, <tt>$|</tt>, etc, as well as code that depends on them. <tt>IO::Handle</tt> would have to be augmented a bit to allow per handle equivalents, but it's most of the way there.</li>
<li>Disallow such variables completely, instead requiring a trusted wrapper for <tt>open</tt> that sets them at construction time and leaves them immutable thereafter.</li>
<li>Disallow <tt>/g</tt> matches on anything other than private lexicals (sets <tt>pos</tt>)</li>
<li>Disallow <tt>$SIG{__WARN__}</tt>, <tt>$SIG{__DIE__}</tt>, and <tt>$^S</tt></li>
<li>Disallow <tt>eval</tt> (instead, use <a href="http://search.cpan.org/perldoc?Try::Tiny">trusted code that gets <tt>local $@</tt> right</a>)</li>
<li>Disallow use of global variables altogether. For instance, instead of <tt>$!</tt> you'd rely on <a href="http://search.cpan.org/perldoc?autodie"><tt>autodie</tt></a>, for <tt>@ARGV</tt> handling you'd use <a href="http://search.cpan.org/perldoc?MooseX::Getopt"><tt>MooseX::Getopt</tt></a> or <a href="http://search.cpan.org/perldoc?App::Cmd"><tt>App::Cmd</tt></a>.</li>
<li>Disallow mutation through references (only private lexical variables can be modified directly, and complex data structures are therefore immutable after being constructed). This has far reaching implications for object encapsulation, too.</li>
</ul>
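<p>The <tt>eval</tt> restriction, for example, would push code towards wrappers that already exist; <a href="http://search.cpan.org/perldoc?Try::Tiny"><tt>Try::Tiny</tt></a> (linked above) is one such:</p>

```perl
use Try::Tiny;

# instead of a raw eval {} followed by checking $@ (which destructors
# and $SIG{__DIE__} hooks can clobber), the wrapper localizes $@:
try {
    risky_operation(); # assumed to be defined elsewhere
} catch {
    warn "caught: $_"; # the error is in $_ and $_[0], not $@
};
```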
<h2>Restrictions that would encourage immutable data.</h2>
<p>These restrictions alleviate some of the mutation centric limitations of the <tt>SV</tt> structure that make lightweight concurrency impossible without protecting every variable access with a mutex. This would also allow aggressive <a href="http://en.wikipedia.org/wiki/Copy-on-write">COW</a>.</p>
<ul>
<li>Only allow assignment to a variable at its declaration site. This only applies to lexicals.</li>
<li>Allow only a single assignment to an <tt>SV</tt>, by reference or directly (once an <tt>SV</tt> is given a value it becomes readonly)</li>
<li>Disallow assignment to external variables (non lexicals, and closure captures). This is a weaker guarantee than the previous one (which is also much harder to enforce), but with similar implications (no assignment can have side effects that outlive its lexical scope)</li>
</ul>
<p>Since many of the string operations in Perl are mutating, purely functional variants should be introduced (most likely as wrappers).</p>
<p>Implicit mutations (such as the upgrading of an SV due to numification) typically result in a copy, so multithreaded access to immutable SVs could either pessimize the caching or just use a spinlock on upgrades.</p>
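<p>You can already opt in to the single assignment discipline today with CPAN modules such as <a href="http://search.cpan.org/perldoc?Readonly"><tt>Readonly</tt></a>:</p>

```perl
use Readonly;

# assignment happens once, at the declaration site; the variable is
# readonly from then on, so any later mutation dies at runtime
Readonly my $answer => 42;
Readonly my @primes => ( 2, 3, 5, 7 );

# $answer = 43; # would die at runtime with a readonly violation
```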
<h2>Restrictions that would facilitate functional programming optimizations</h2>
<p>These restrictions would allow representing simplified optrees in more advanced intermediate forms, allowing for interesting optimization transformations.</p>
<ul>
<li>Disallow void context expressions</li>
<li>...except for variable declarations (with the aforementioned single assignment restrictions, this effectively makes every <tt>my $x = ...</tt> into a let style binding)</li>
<li>Allow only a single compound statement per subroutine, apart from let bindings (that evaluates to the return value). This special cases <tt>if</tt> blocks to be treated as a compound statement due to the way implicit return values work in Perl.</li>
<li>Disallow opcodes with non local side effects (including calls to non-verified subroutines) for purely functional code.</li>
</ul>
<p>This is perhaps the most limiting set of restrictions. This essentially lets you embed lambda calculus type ASTs natively in Perl. Alternative representations for this subset of Perl could allow lisp style macros and other interesting compile time transformations, without the difficulty of making that alternative AST feature complete for all of Perl's semantics.</p>
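<p>Under these restrictions every subroutine reads like a chain of let bindings followed by a single expression (a sketch):</p>

```perl
# a sub as a chain of let style bindings followed by one expression
# that is the implicit return value; nothing runs in void context
sub hypot {
    my ( $x, $y ) = @_; # bindings, assigned only at declaration

    my $xx = $x * $x;
    my $yy = $y * $y;

    sqrt( $xx + $yy );
}
```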
<h2>Restrictions that facilitate static binding of OO code</h2>
<p>Perl's OO is always late bound, but most OO systems can actually be described statically. These restrictions would allow you to opt in for static binding of OO dispatch for a given hierarchy, in specific lexical scopes. This is a little more complicated than just lexical restrictions on features, since metadata about the classes must be recorded as well.</p>
<ul>
<li>Only allow <tt>bless</tt>ing into a class derived from the current package</li>
<li>Enforce <tt>my Class $var</tt>, including <a href="Methods::CheckNames">static validation of method calls</a></li>
<li>Disallow introduction of additional classes at runtime (per class hierarchy or altogether)</li>
<li>Based on the previous two restrictions, validate method call sites on typed variable invocants as static subroutine calls (with several target routines, instead of one)</li>
<li>Similar to the immutable references restriction above, disallow dereferencing of any blessed reference whose class is not derived from the current package.</li>
</ul>
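<p>The <tt>my Class $var</tt> syntax already exists in core Perl, and combined with the <tt>fields</tt> pragma it already catches access to undeclared fields early (a sketch):</p>

```perl
package Dog;
use fields qw(name);

sub new {
    my $class = shift;
    my Dog $self = fields::new($class);
    return $self;
}

package main;

my Dog $spot = Dog->new;
$spot->{name} = "Spot";   # ok, "name" is a declared field
# $spot->{nmae} = "oops"; # typo: dies, "nmae" is not a declared field
```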
<h2>Restrictions that are easy to opt in to in most code (opting out only as necessary)</h2>
<p>These features are subject to lots of criticism, and their usage tends to be discouraged. They're still useful, but in an ideal world they would probably be implemented as CPAN modules.</p>
<ul>
<li>Disallow formats</li>
<li>Disallow <tt>$[</tt></li>
<li>Disallow tying and usage of tied variables</li>
<li>Disallow overloading (declaration of overloads, as well as <a href="http://perldoc.perl.org/overloading.html">their use</a>)</li>
</ul>
<h2>A note about implementation</h2>
<p>Most of these features can be implemented in terms of <a href="http://search.cpan.org/perldoc?B::Hooks::OP::Check">opcheck</a> functions, possibly coupled with scrubbing triggered by an <a href="http://search.cpan.org/perldoc?B::Hooks::EndOfScope">end of scope hook</a>. Some of them are static checks at <tt>use</tt> time. A few others require more drastic measures. For related modules see <a href="http://search.cpan.org/perldoc?indirect"><tt>indirect</tt></a>, <a href="http://search.cpan.org/perldoc?Safe"><tt>Safe</tt></a>, <a href="http://search.cpan.org/perldoc?Sys::Protect"><tt>Sys::Protect</tt></a>, and <a href="http://search.cpan.org/perldoc?Devel::TypeCheck"><tt>Devel::TypeCheck</tt></a>, to name but a few.</p>
<p>I also see a niche for modules that implement <em>alternatives</em> to built in features, disabling the core feature and providing a better alternative that replaces it instead of coexisting with it. This is the next step in exploratory language evolution as led by <a href="http://search.cpan.org/perldoc?Devel::Declare"><tt>Devel::Declare</tt></a>.</p>
<p>The difficulty of modernizing Perl 5's internals is the overwhelming amount of orthogonal concerns whenever you try to implement something. Instead of trying to take care of these problems we could make it possible for the user to promise they won't be an issue. It's not ideal, but it's better than nothing at all.</p>
<h2>The distant future</h2>
<p>If this sort of opt-out framework turns out to be successful, there's no reason why <tt>use 5.20.0</tt> couldn't disable some of the more regrettable features by default, so that you have to explicitly ask for them instead. This effectively makes Perl's cost model pay-per-use, instead of always-pay.</p>
<p>This would also increase the likelihood that people stop using such features in new code, and therefore the decision making aspects of the feature deprecation process would be easier to reason about.</p>
<p>Secondly, and perhaps more importantly, it would be possible to try for alternative implementations of Perl 5 with shorter termed deliverables.</p>
<p>Compiling a restricted subset of Perl to other languages (for instance client side JavaScript, different bytecodes, adding JIT support, etc) is a much easier task than implementing the language as a whole. If more feature restricted Perl code would be written and released on the CPAN, investments in such projects would be able to produce useful results sooner, and have clearer indications of progress.</p>Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-876358347971598886.post-73047490741123519432009-11-18T23:57:00.007+02:002009-11-19T02:01:25.905+02:00Functional programming and unreasonable expectations<p><tt><record type="broken"></tt>I'm a big fan of purely functional programming<tt></record></tt>.</p>
<p>Another reason I like it so much is that purely functional software tends to be more reliable. Joe Armstrong of Erlang fame makes that point <a href="http://www.infoq.com/presentations/Systems-that-Never-Stop-Joe-Armstrong">in an excellent talk</a> much better than I could ever hope to.</p>
<p>However, one aspect he doesn't really highlight is that reliability is not only good for keeping your system running, it also makes it easier to program.</p>
<p>When a function is pure it is guaranteed to be isolated from other parts of the program. This separation makes it much easier to change the code in one place without breaking anything unrelated.</p>
<p>Embracing this style of programming has had one huge drawback though: it utterly ruined my expectations of non functional code.</p>
<p>In imperative languages it's all too easy to add unstated assumptions about global state. When violated, these assumptions then manifest in very ugly and surprising ways (typically data corruption).</p>
<p>A good example is reentrancy (or rather the lack thereof) in old style C code. Reentrant code can be freely used in multiple threads, from inside signal handlers, etc. Conversely, non-reentrant routines may only be executed once at a given point in time. Lack of foresight in early C code meant that lots of code had to be converted to be reentrant later on. Since unstated assumptions are by definition hidden this can be a difficult and error prone task.</p>
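<p>The classic C example is <tt>strtok(3)</tt>; the same anti-pattern is easy to write in Perl (a contrived sketch):</p>

```perl
# strtok(3) style non-reentrancy: hidden state shared between calls
# means two interleaved users silently corrupt each other
{
    my @tokens;

    sub next_token {
        my ($string) = @_;
        @tokens = split ' ', $string if defined $string;
        return shift @tokens;
    }
}

next_token("foo bar");  # start tokenizing one string
next_token("baz quux"); # clobbers the first tokenizer's state
```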
<p>The specific disappointment that triggered this post is Perl's regular expression engine.</p>
<p>Let's say we're parsing some digits from a string and we want to create a <tt>SomeObject</tt> with those digits. Easy peasy:</p>
<pre class="fake-gist" id="fake-gist-238323">$string =~ m/(\d+)/;
push @results, SomeObject->new( value => $1 );</pre>
<p>Encapsulating that match into a reusable regex is a little harder though. Where does the post processing code go? Which capture variable does it use? Isolation would have been nice. The following example might work, but it's totally wrong:</p>
<pre class="fake-gist" id="fake-gist-238318">my $match_digits = qr/(\d+)/;
my $other_match = qr{ ... $match_digits ... }x;
$string =~ $other_match;
push @results, SomeObject->new( value => $1 ); # FIXME makes no sense</pre>
<p>Fortunately Perl's regex engine has a pretty awesome feature that lets you run code during a match. This is very useful for constructing data from intermittent match results without having to think about nested captures, especially since the <tt>$^N</tt> variable conveniently contains the result of the last capture.</p>
<p>Not worrying about nested captures is important when you're combining arbitrary patterns into larger ones. There's no reliable way to know where the capture result ends up so it's easiest to process it as soon as it's available.</p>
<pre class="fake-gist" id="fake-gist-238319">qr{
    (\d+) # match some digits
    (?{
        # use the previous capture to produce a more useful result
        my $obj = SomeObject->new( value => $^N );

        # local allows backtracking to undo the effects of this block
        # this would have been much simpler if there was a purely
        # functional way to accumulate arbitrary values from regexes
        local @results = ( @results, $obj );
    })
}x;</pre>
<p>Even though this is pretty finicky it still goes a long way. With this feature you can create regexes that also encapsulate the necessary post processing, while still remaining reusable.</p>
<p>Here is a hypothetical definition of <tt>SomeObject</tt>:</p>
<pre class="fake-gist" id="fake-gist-238320">package SomeObject;
use Moose;

has value => (
    isa => "Int",
    is  => "ro",
);</pre>
<p>Constructing <tt>SomeObject</tt> is a purely functional operation: it has no side effects, and only returns a new object.</p>
<p>The only problem is that the above code is totally broken. It works, but only some of the time. The breakage is pretty random.</p>
<p>Did you spot the bug yet? No? But it's oh so obvious! Look inside <tt>Moose::Util::TypeConstraints::OptimizedConstraints</tt> and you will find the offending code:</p>
<pre class="fake-gist" id="fake-gist-238321">sub Int { defined($_[0]) && !ref($_[0]) && $_[0] =~ /^-?[0-9]+$/ }</pre>
<p>The constructor Moose generated for <tt>SomeObject</tt> is in fact not purely functional at all; though seemingly well behaved, in addition to returning an object it also has the side effect of shitting all over the regexp engine's internal data structures, causing random values to be occasionally assigned to <tt>$^N</tt> (but only if invoked from inside a <tt>(?{ })</tt> block during a match). You can probably imagine what a great time I had finding that bug.</p>
<p>What makes me sad is that the <tt>Int</tt> validation routine appears purely functional. It takes a value and then without modifying anything merely checks that it's defined, that it's not a reference, and that its stringified form contains only digits, returning a truth value as a result. All of the inputs and all of the outputs are clear, and therefore it seems only logical that this should be freely reusable.</p>
<p>When I came crying to <tt>#p5p</tt> it turned out that this is actually a known issue. I guess I simply shouldn't have expected the regexp engine to do such things, after all it has a very long history and these sorts of problems are somewhat typical of C code.</p>
<p>If the regexp engine were reentrant, what I tried to do would have just worked. Reentrancy guarantees one level of arbitrary combinations of code (the bit of reentrant code can be arbitrarily combined with itself). Unfortunately it seems very few people are actually in a position to fix it.</p>
<p>Purely functional code goes one step further. You can <strong>reliably</strong> mix and match any bit of code with any other bit of code, combining them in new ways, never having to expect failure. The price you have to pay is moving many more parameters around, but this is exactly what is necessary to make the boundaries well defined: all interaction between components is explicit.</p>
<p>When old code gets reused it will inevitably get prodded in ways that the original author did not think of. Functional code has a much better chance of not needing to be reimplemented, because the implementation is kept isolated from the usage context.</p>
<p>In short, every time you write dysfunctional code god kills a code reuse. Please, think of the code reuse!</p>Unknownnoreply@blogger.com14