Wednesday, August 12, 2009

reset { hack { shift { return $_[0] } } }

Lately my pet project has been implementing delimited continuations for Perl. In an effort to keep myself motivated (it's hard and frustrating), I'll try and explain what I'm actually trying to do.

Caveat lector, the repository is nasty. There is no coherent process in the versioned history, it's just git commit -a at the end of the day, so the repository will be breaking. Secondly, I don't really know what I'm doing. I do in principle, but the details are still vague and I am discovering them as I go along.

So, WTF are delimited continuations? As Perl programmers the easiest way to understand them is probably Tom Moertel's favourite explanation, by Oleg Kiselyov.

As core hackers, the easiest way to understand them is to see how they are implemented. This is what I will stick with.

But really, the best way to understand continuations is to use a language that actually has them, and play around a bit. It's a strange concept to get used to. Unlambda, the INTERCAL of functional programming uses continuations precisely because they can be difficult to understand (though technically they aren't delimited).

Continuations for Core Hackers

Delimited continuations are structured around two operations, reset and shift.

reset is the delimiter, it saves a position in the call chain for a future call to shift.

shift is what actually creates the continuation. It does that by walking up the call chain from the current position, up to the reset, and saving everything in between into the continuation. The continuation itself is reified as a function, and is passed as an argument to the block given to shift. Hold that thought, we'll get back to that in a second.

This interaction between shift and reset is much like the one between die and eval. eval delimits the stack, and die is what's called an escape continuation:

eval {
    warn "reached";
    die $value;
    warn "not reached";

return $@;

The die causes the code in the eval to be aborted, and sets the special variable $@ to $value. With shift and reset this could be expressed a little more elegantly as:

return cont_reset {
    warn "reached";
    cont_shift { return $value };
    warn "not reached";

What happens in shift is similar to die. The stack is unwound all the way up to the begining of reset. The difference is that the return value of the shift block becomes the return value of the reset block.

Scope::Upper makes this sort of abstraction possible already, as demonstrated by Continuation::Escape.

Capturing a Continuation

Semantically eval/die and delimtied continuations are actually very different. shift doesn't unwind the stack, but stashes the frames into a data structure.

Delimited.xs introduces two structs, delim_t and cont_t.

delim_t is very trivial, it contains a snapshot of the various state variables in the Perl interpreter, like the stack pointers, the current opcode, etc. When we call cont_reset a new delimiter is pushed onto a linked list.

Inside cont_shift the init_cont function will create another delimiter, and then destructively move all the stack frames between the two delimiters into a cont_t. When it's done current state of the interpreter has effectively been rewound to that of the start delimiter, but unlike die none of the data has been cleaned up, those stack frames are still "live", just not visible from the current state of the interpreter.

Reifying a Continuation

The restore_cont function takes a cont_t and appends copies of all the captured stack frames inside it to the current state of the interpreter. There are many things to fix up, and that is where the bulk of the implementation lies.

At the end of the restore_cont the next step for the interpreter is to resume execution right after the call to cont_shift, the end of the continuation capture.

The difference is when those stack frames are evaluated and we reach the end of the cont_reset block, instead of returning to the caller of cont_reset we return to the caller of restore_cont. To do this we overwrite the top level PERL_CONTEXT stack frame to use the current PL_op as the retop, instead of the one it was originally created with.

So how does restore_cont get called? We wrap the cont_t in a closure, that gets invoked as a function. Here is an example of no-op continuation usage:

my $value = cont_reset {
    return 3 + cont_shift {
        my $k = shift;

is( $value, 10 );

This is no-op because cont_shift unwinds the stack, and gets the continuation in $k, and then immediately reinvokes it, restoring everything and causing cont_shift to return a value of 7 from cont_reset. This code is functionally equivalent to:

my $value = cont_reset {
    return 3 + 7;

Which is really the same as

my $value = sub { return 3 + 7 }->();

But $k is more than that, it's literally a function that appends a bunch of things to the interpreter's stacks. You can call it multiple times:

my $add_three = cont_reset {
    return 3 + cont_shift {
        my $k = shift;
        return $k;

In this example cont_shift returns $k instead of invoking it. Since the stack has been unwound, this causes cont_reset to return $k.

So what happens when you invoke $add_three from the outside? The stack frames from reset until shift are appended, so the interpreter state is waiting for a value from shift to add 3 to.

This means that you can call $add_three->(7) and get back a value of 10, any number of times. In fact, this is what the basic sanity test does.

Gory Details

Unfortunately what goes on behind the scenes isn't that simple (almost nothing is when it comes to Perl internals).

delim_t is initialized by init_delim to capture relative offsets for PL_stack_sp, PL_markstack_ptr, and the values of cxstack_ix, PL_scopestack_ix, PL_savestack_ix and PL_tmps_ix (which are relative offsets anyway). This information captures all of the stack states.

In addition it keeps track of a number of variables (like the current opcode, PL_op, the current pad, PL_comppad, and another of other variables corresponding to the current state.

When we init_cont the start and end delim_ts are used to capture state in the cont_t using several functions.

init_cont_stack allocates buffers for the mark stack and value stack. All the stacked SVs are put into an AV, increasing their reference counts, and the marks are converted to 0 based offsets, based on the value of PL_stack_sp at the start delimiter.

init_cont_cxs captures Perl's context stack. This stack contains PERL_CONTEXT structures which contain all the information about the call chain as well as loops and other blocks requiring data.

In Perl the actual storage for lexical variables is not allocated on the context stack (though it probably should be), but instead its appended to CvPADLIST(cv), which is an array of arrays stored in the CV's data directly (this allows storage space to be cleared instead of freed, so repeated calls to the same subroutine are slightly faster).

The context stack is walked from the end to the begining, and any CXt_SUB, which denotes a subroutine call, causes the lexical pad instance of that subroutine to be popped off of CvPADLIST(cv) and stored in a pads AV in the continuation. CvDEPTH(cv) is adjusted appropriately.

The contexts also contain values like cx->blk_oldsp, previous offsets of the various stack pointers, and these, like the mark stack, are converted to be relative to the start delimiter.

init_cont_saves handles the save stack, and the associated scopestack (which is like the markstack of the savestack). The savestack is used to roll back the interpreter state when leaving a scope.

Any operation that would need to be undone is handled using the savestack, and dispatched using the ENTER and LEAVE macros.

our $foo = 1;
{ # calls ENTER
    local $foo = 3;
} # calls LEAVE

ENTER pushes PL_savestack_ix onto the scope stack. This marks the offset of the savestack.

local $foo pushes a save entry onto the savestack with the previous value of $foo, which is the IV 1.

When the scope that called local is exited the LEAVE macro pops an entry from the scope stack, and then calls leave_scope (from scope.c) to dispatch all the SAVEt_* entries between PL_savestack_ix and the value of PL_savestack_ix popped off the scope stack, causing $foo to be set back to 1.

There are many types of save entries which are crucial to properly managing Perl's data structures and runtime semantics. For instance SAVEDESTRUCTOR_X(function, pointer) can be used to automatically call function on pointer when the current scope is left, and is very useful for managing data from XS modules.

To properly handle the semantics of this stack unwinding we have to partition the savestack into entries that should be recreated every time restore_cont is called (and therefore called zero or more times), and those which should be recreated when the continuation is garbage collected (and therefore only called once).

The entries that should be recreated every time are those that keep track of the currently active lexical pad (SAVEt_COMPPAD), those that keep track of stack pointers (SAVEt_STACK_POS and SAVEt_STACK_CXPOS), and those that clear lexical variables (SAVEt_CLEARSV, pushed to the savestack by pad_sv opcodes when assigning the initial value to a lexical variable).

Localization entries will be handled in the future, but are currently deferred.

Lastly, init_cont_state resets the state of the interpreter to the start delimiter and detaches the two delimiters from the linked list.


Now that we have a detached continuation in a cont_t, appending its contents to the running interpreter is what remains.

The process of invoking a continuation is pretty much the inverse of the init_cont functions. All 0 based offsets are added to the current values of the stack counters, the cont->end delim_t is used to set the values of the interpreter variables, etc.

The main things that are done in addition is cloning of pads (and other state), and fixups at the top level context entry.

Pad cloning is done to ensure that each recreated instance of the continuation has its own copies of the lexical variables it uses. Without this the repeated invocations of the continuations will get in each others' way. Unfortunately this is pretty tricky, we have to walk the context stack top down and clone various values, keeping the addresses in a pointer table so that everything points back at the right thing. This is the most broken bit of code right now.

Similarly, SAVEt_COMPPAD entries have to be mapped as well, in order to reset the interpreter to the cloned pads, not the pads captured in cont_t.

Though not implemented, faked @_ arrays would need to be recreated with the duplicated value stack, and things like loop counters for foreach blocks also need more intricate copying.

Finally, the top level PERL_CONTEXT that was copied has to have its retop set to PL_op->op_next, where PL_op is the one making the call to the reified continuation. This causes the cont_reset block to return to where the caller invoked the continuation, properly connecting the call chain.


Tests are failing. Shit is segfaulting. Sanity is fading. But it works, sort of. This is a proof of concept, and now I have to finish off the various details to make it into a usable module.

If you'd like to help and have no core-fu, feel free to write failing tests. Anything you'd like to do with delimited continuations should be tests.

If you'd like to help and do have core-fu please review my horrible C code, point out obvious mistakes, or even implement some the missing bits.

I've been previously told that continuations aren't possible in Perl 5, but I think this has been demonstrated to be a false. Anyway, $title->().


Chip said...

That is hot.

chocolateboy said...

Wishlist: please call the subs suspend and resume (as per Oleg's definition). Having played with these in Scala, I hate having to translate the blessed-by-theory names into the get-things-done version. suspend/resume are more pragmatic, in the vein of given/when.

nothingmuch said...

chocolateboy said...

My hero!