Saturday, November 21, 2009

Restricted Perl

zby's comments on my last post got me thinking. There are many features in Perl that we no longer use, or that are considered arcane or bad style, or even features we could simply live without. However, if they were removed, lots of code would break. So we keep those features, and we keep writing new code that uses them.

Suppose there was a pragma, similar to no indirect in that it restricts existing language features, and similar strict in that it lets you opt out of unrelated discouraged behaviors.

I think this would be an interesting baby step towards solving some of the problems that plague Perl code today:

  • Features that are often misused and need lots of critique.
  • Language features that are hard to change in the interpreter's implementation, limiting the revisions we can make to Perl 5.
  • Code that will be hard to translate to Perl 6, for no good reason.

On top of that one could implement several different defaults sets of feature-restricted Perl (sort of like Modern::Perl).

Instead of designing some sort of restricted subset of Perl 5 from the bottom up, several competing subsets could be developed organically, and if memory serves me right that is something we do quite well in our community =)

So anyway, what are some things that you could easily live without in Perl? What things would you be willing to sacrifice if it meant you could trade them off for other advantages? Which features would you rather disallow as part of a coding standard?

My take

Here are some ideas. They are split up into categories which are loosely related, but don't necessarily go hand in hand (some of them even contradict slightly).

They are all of a reasonable complexity to implement, either validating something or removing a language feature in a lexical scope.

It's important to remember that these can be opted out of selectively, when you need them, just like you can say no warnings 'uninitialized' when stringifying undef is something you intentionally allowed.

Restrictions that would facilitate static modularity

The first four restrictions make it possible to treat .pm files as standalone, cacheable compilation units. The fifth also allows for static linkage (no need to actually invoke import when evaluating a use statement), since the semantics of import are statically known. This could help alleviate startup time problems with Perl code, per complicit compilation unit (without needing to solve the problem as a whole by crippling the adhoc nature of Perl's compile time everywhere).

  • Disallow recursive require.
  • Disallow modification to a package's symbol table after its package declaration goes out of scope.
  • Restrict a file to to only one package (which must match the .pm file name).
  • Disallow modification of other packages other than the currently declared one.
  • Restrict the implementation of import to a statically known one.
  • Disallow access to external symbols that are not bound at compile time (e.g. variables from other packages, subroutines which weren't predeclared (fully qualified is OK).

Restrictions that allow easier encapsulation of side effects

These restrictions address pollution of state between unrelated bits of code that have interacting dynamic scopes.

  • Disallow modification of any global variables that control IO behavior, such as $/, $|, etc, as well as code that depends on them. IO::Handle would have to be augmented a bit to allow per handle equivalents, but it's most of the way there.
  • Disallow such variables completely, instead requiring a trusted wrapper for open that sets them at construction time and leaves them immutable thereafter.
  • Disallow /g matches on anything other than private lexicals (sets pos)
  • Disallow $SIG{__WARN__}, $SIG{__DIE__}, and $^S
  • Disallow eval (instead, use trusted code that gets local $@ right)
  • Disallow use of global variables altogether. For instance, instead of $! you'd rely on autodie, for @ARGV handling you'd use MooseX::Getopt or App::Cmd.
  • Disallow mutation through references (only private lexical variables can be modified directly, and complex data structures are therefore immutable after being constructed). This has far reaching implications for object encapsulation, too.

Restrictions that would encourage immutable data.

These restrictions alleviate some of the mutation centric limitations of the SV structure, that make lightweight concurrency impossible without protecting every variable access with a mutex. This would also allow aggressive COW.

  • Only allow assignment to a variable at its declaration site. This only applies to lexicals.
  • Allow only a single assignment to an SV (by reference or directly. Once an SV is given a value it becomes readonly)
  • Disallow assignment modification of external variables (non lexicals, and closure captures). This is a weaker guarantee than the previous one (which is also much harder to enforce), but with similar implications (all assignment is guaranteed to have side effects that outlive its lexical scope)

Since many of the string operations in Perl are mutating, purely functional variants should be introduced (most likely as wrappers).

Implicit mutations (such as the upgrading of an SV due to numification) typically results in a copy, so multithreaded access to immutable SVs could either pessimize the caching or just use a spinlock on upgrades.

Restrictions that would facilitate functional programming optimizations

These restrictions would allow representing simplified optrees in more advanced intermediate forms, allowing for interesting optimization transformations.

  • Disallow void context expressions
  • ...except for variable declarations (with the afore mentioned single use restrictions, this effectively makes every my $x = ... into a let style binding)
  • Allow only a single compound statement per subroutine, apart from let bindings (that evaluates to the return value). This special cases if blocks to be treated as a compound statement due to the way implicit return values work in Perl.
  • Disallow opcodes with non local side effects (including calls to non-verified subroutines) for purely functional code.

This is perhaps the most limiting set of restrictions. This essentially lets you embed lambda calculus type ASTs natively in Perl. Alternative representations for this subset of Perl could allow lisp style macros and other interesting compile time transformations, without the difficulty of making that alternative AST feature complete for all of Perl's semantics.

Restrictions that facilitate static binding of OO code

Perl's OO is always late bound, but most OO systems can actually be described statically. These restrictions would allow you to opt in for static binding of OO dispatch for a given hierarchy, in specific lexical scopes. This is a little more complicated than just lexical restrictions on features, since metadata about the classes must be recorded as well.

  • Only allow blessing into a class derived from the current package
  • Enforce my Class $var, including static validation of method calls
  • Disallow introduction of additional classes at runtime (per class hierarchy or alltogether)
  • Based on the previous two restrictions, validate method call sites on typed variable invocants as static subroutine calls (with several target routines, instead of one)
  • Similar the immutable references restriction above, disallow dereferencing of any blessed reference whose class is not derived from the current package.

Restrictions that are easy to opt in to in most code (opting out only as necessary)

These features are subject to lots of criticism, and their usage tends to be discouraged. They're still useful, but in an ideal world they would probably be implemented as CPAN modules.

  • Disallow formats
  • Disallow $[
  • Disallow tying and usage of tied variables
  • Disallow overloading (declaration of overloads, as well as their use)

A note about implementation

Most of these features can be implemented in terms of opcheck functions possibly coupled scrubbing triggered by and end of scope hook. Some of them are static checks at use time. A few others require more drastic measures. For related modules see indirect, Safe, Sys::Protect, and Devel::TypeCheck to name but a few

I also see a niche for modules that implement alternatives to built in features, disabling the core feature and providing a better alternative that replaces it instead of coexisting with it. This is the next step in exploratory language evolution as led by Devel::Declare.

The difficulty of modernizing Perl 5's internals is the overwhelming amount of orthogonal concerns whenever you try to implement something. Instead of trying to take care of these problems we could make it possible for the user to promise they won't be an issue. It's not ideal, but it's better than nothing at all.

The distant future

If this sort of opt-out framework turns out to be successful, there's no reason why use 5.20.0 couldn't disable some of the more regrettable features by default, so that you have to explicitly ask for them instead. This effectively makes Perl's cost model per-per-use, instead of always pay.

This would also increase the likelihood that people stop using such features in new code, and therefore the decision making aspects of the feature deprecation process would be easier to reason about.

Secondly, and perhaps more importantly, it would be possible to try for alternative implementations of Perl 5 with shorter termed deliverables.

Compiling a restricted subset of Perl to other languages (for instance client side JavaScript, different bytecodes, adding JIT support, etc) is a much easier task than implementing the language as a whole. If more feature restricted Perl code would be written and released on the CPAN, investments in such projects would be able to produce useful results sooner, and have clearer indications of progress.

3 comments:

Robert said...

There should just be cycles...deprecate something and a few releases later it is gone. It is not an immediate thing so if people want to upgrade they can re-write and if they don't they stick to the version they are on but that is their choice to do so.

nothingmuch said...

That's an ongoing discussion, and it seems that not many people agree with that sentiment (though many others do).

Anyway, I'm not talking about deprecating features, many of these are legitimate, but arguably only in a limited context, just like dereferencing of strings.

Anonymous said...

Although I've had reason to violate most (all?) of the items on your "static modularity" list, and several in others, I totally agree that these can be made off by default.

I want a runtime that's stricter in using access to external symbols: I've often felt fully qualified names aren't enough; if you try to access something that's not there yet, this should fail louder that it does today (by default).

A variant on the stricter Perl theme is of course educational Perl, where more traps are resisted. This too has to have lexical scope only. One idea is to disallow symbols that differ only in sigil, so you can't have both $moose and @moose. This I don't expect to be needed in Perl 6, where sticky sigils make subscripting less confusing to beginners.