Staged Generic Programming

As with all efforts to simplify programming so that systems of greater complexity can be more easily created, there are issues to consider.

3. Staging SYB, Naively

SYB overhead. It is often observed that SYB has poor performance compared to handwritten code [Adams et al. 2015]. The causes of this inefficiency are various, but most can be traced to various forms of abstraction – that is, to delaying decisions until the last possible moment. For example,

  • Most function calls in an SYB traversal involve polymorphic overloaded functions.
  • Most function calls in an SYB traversal are indirect calls through arguments, not to statically-known functions.
  • Many SYB schemes test for type equality at each node.

Multi-stage programming can eliminate each of these sources of inefficiency, transforming poly-morphic functions into monomorphic functions, indirect calls into direct calls, and dynamic type equality checks into static code specialisers.

For example, compared to the relatively efficient code on page1 that searches for even integers in a value of type (int ˙ bool) list, the call to listify performs a great many fruitless operations, attempting to apply the generic function mkQ [] evenp to every node, including lists and pairs, and dispatching recursive calls through polymorphic functions and DATA instances.

This section shows how to narrow the performance gap between the SYB implementation (Section 2) and the hand-written code. In particular, Section 3.1 and Section 3.2 transform the inefficient SYB implementation step-by-step into a code generator and specialiser. These changes to SYB involve changing two of the three ingredients in the implementation:

The type equality code (Section 2.2) needs no changes, although it will be used statically rather than dynamically – i.e. during code generation, not during code execution.

Generic operations (Section 2.3) are specialised to particular types: gmapQ becomes a code generator (Section 3.1).

Recursive schemes (Section 2.4) are transformed, first into open-recursive functions, and then into generators that build groups of mutually-recursive monomorphic functions.

3.1   Staged Generic Operations

Staging basics. Staging involves introducing quotations and splices to a program in order to change the program’s behaviour so that rather than returning a regular value it constructs code that computes that value. Enclosing an expression e of type t in quotations:

<e>

delays its evaluation so that rather than evaluating to a value of type t it builds a code value of type t code. Conversely, splicing an expression e of type t code into a quotation:

.˜e

indicates that e should be evaluated to a code value which is then inserted into the quotation. Code generated using MetaOCaml [Kiselyov 2014] is guaranteed to be well typed; this is ensured in part by a purely generative design that provides no way to deconstruct code values.

Binding-time analysis. The first step in staging a program, known as binding-time analysis, divides its free variables into static – those whose values are available immediately – and dynamic – those whose values are not yet available. The analysis extends from variables to expressions, classifying those expressions that involve only static variables as static, and other expressions as dynamic. In multi-stage languages, binding-time analysis is typically a task for the programmer, and guides the subsequent insertion of quotations and splices into a program.

SYB has a particularly simple binding-time analysis. Values that describe type structure, passed as implicit arguments, are classified as static, and values to traverse, passed as non-implicit arguments, are classified as dynamic. Consequently, SYB functions are changed from functions that accept both type representations and values at runtime to functions that first accept type representations, which they use to generate code representing functions that accept values.

Staging DATA. Figure 2b shows the changes to DATA. Both the second argument and the result type of gmapQ acquire a code constructor, since both are classified as dynamic (line 5). The argument and result type of the query passed to gmapQ are modified similarly (line 1). However, the query itself is typically supplied directly via a scheme such as listify, and so is left as static.

The implementations of gmapQ for the Data_int and Data_list instances follow straightforwardly from the types. For Data_int the list returned by gmapQ is dynamic (line 11). In Data_list’s gmapQ, l is dynamic, and must be spliced within the quotation that builds the function body (line 17). Similarly, variables x and xs are dynamic, since they are only available when l is examined; they are passed to q as quoted code values. However both q and its implicit argument are static, and so q is called immediately, generating code to be spliced into the body quotation (line 19).

Since mkQ operates on TYPEABLE values, which are only used statically, only the type of mkQ, not its definition, needs to change to give dynamic variables type code (line 22).

The staged mkQ function examines type representations during code generation to determine how to generate code. Here is the code generated at type int → bool for the example from Section 2.3:

.; fun y evenp y

At type int list → bool, the type representations are incompatible and so the generated code is even simpler:

.      fun y false

In both cases, the generated code reveals no trace of either TYPEABLE or DATA. These generic signatures are now used only during traversal generation and are no longer needed for the traversals themselves; they can be discarded, along with the other generic function machinery and its overhead, before the call to the generated function takes place.


3.2   Staged Traversal Schemes

Staging non-recursive code such as gmapQ and mkQ is straightforward. However, the recursive schemes in SYB introduce new challenges for staging. Applying a generic scheme such as listify may involve traversing a number of mutually-recursive types, and so specialising a generic scheme involves generating a set of mutually-recursive functions (an instance of so-called polyvariant specialization.

Here is the type of listify following binding-time analysis, classifying implicit arguments static and others dynamic:

val listify         {T TYPEABLE} (T.t code bool code) T.t list genericQ

Staging listify involves deciding whether the recursion should be considered static or dynamic. Unfortunately, neither option seems to be what is needed. Since the recursion performed by SYB traverses values, which are dynamic, it is clear that the generated code must be recursive. However, if all recursion is left until the dynamic phase then listify will be unable to statically discover the type structure, which is used to generate monomorphic code.

The solution is found in the literature: replacing let rec with a fixed-point operator that traverses the static data (i.e. DATA instances) to generate recursive code. Two features of SYB constrain the required behaviour of the fixed-point operator. First, it must perform memoization to avoid non-termination, since SYB type descriptions may contain cycles; for example, the gmapQ instance for Data_list invokes q with the Data_list instance. Second, it must be able to generate arbitrary-size recursive groups, since an SYB traversal may involve any number of types.

Figure 2b lines 29-30 show the result of staging listify with a fixed-point combinator gfixQ that performs memoization and let rec insertion. The remainder of this section shows how to define that combinator.

Generic fixed-points with memoization. The following definition provides a starting point for gfixQ:


type
'a
map
Nil    'a map
| Cons   (module TYPEABLE with type t      'b) ˙ ('b        'a) ˙ map → map
val new_map                unit → 'a map ref
val lookup  {T TYPEABLE} 'a map → (T.t → 'a) option

val push      {T TYPEABLE} 'a map ref → (T.t → 'a) unit



Fig. 3. TYPEABLE-keyed maps: interface                                                                                                         

                                                    let_locus (fun ()  κ[genlet e])

val let_locus  (unit 'w code) → 'w code           {

val genlet               'a code → 'a code          . let x   e in .˜(let_locus (fun ()       κ[.
                                                        .])) .

Fig. 4a. the let insertion interface                                                       Fig. 4b. let insertion: basic operation

effect enLet             'a code → 'a code                        let let_locus body                

                                                                                          try body ()

with effect (’enLet v) k .      let x     v in .˜(continue . x           .)

Fig. 5a. ’enLet, an effect for let insertion let genlet v perform (’enLet v)       Fig. 5c. let_locus, a handler

Fig. 5b. genlet, a performer of algebraic effects

val gfixQ ('u genericQ → 'u genericQ) 'u genericQ

let rec gfixQ f {D DATA} (x D.t) f {D} (gfixQ f) x

This definition is derived from the standard fixed-point equation fix f f (fix f) by η-expanding to adapt to the call-by-value setting, then generalising over the implicit argument D.

Updating gfixQ to support memoization requires a suitable memo table. Figure 3 gives a definition: each entry in a map value is a pair of an instantiated generic scheme of type 'b → 'a and a TYPEABLE instance for 'b. The operations new_map, lookup and push define creation, resolution and extension operations for map; their implementations are straightforward and omitted.

The map type serves as the basis for a memoizing gfixQ, which interposes map lookups on every recursive call and adds an entry when lookup fails:

let gfixQ f

let m new_map ()       in

let rec h {D DATA} x                                    match lookup {D.Typeable}       with

| Some g g x

| None let g  f h {D}  in push m g g x

in h

let insertion. We take a moment to review standard techniques for let insertion in multi-stage programming as introduced by Kiselyov [2014]. Figure 4a gives the interface, with two operations: let_locus marks a point on the stack which is suitable for let-insertion, and genlet requests that its argument . . be inserted at that point. Figure 4b shows the behaviour: a call to genlet in a context

κ sends the argument e to an enclosing let_locus, which let-binds e to x and continues with x as the argument to κ, enclosing the context with let_locus. (In fact, the behaviour is more sophisticated: genlet searches the stack for the highest insertion point at which its argument is well-scoped.)

let genlet v =                                                
try perform (GenLet v) with Unhandled  v                                                                             
Fig. 6a. Improved genlet: inline when no handler is found

let let_locus body =  
try body () with
effect (GenLet v) k when is_well_scoped v    
match perform (GenLet v) with
 | v  continue k v                                                    
| exception Unhandled                                                 
.< let x = .~v in .~(continue k .< x >.)>.
Fig. 6b. Improved let_locus: insert as high as scoping permits

let genrec k =
 let r = genlet (.<ref dummy>.) in
 genlet (.<.˜r ȷ= .˜(k .< !.˜r >.) >.);
 .< !.˜r >.
Fig. 7. The genrec combinator

let gfixQ f =
 let m = new_map () in
 let rec h {D:DATA} x =
  match lookup {D.Typeable} !m with
  | Some g  .< g x >.
  | None  let g = genrec ␣␣ fun j 
                    push m j;
                    .<fun y  .˜(f h .<y>.)>.
            in .< g x >.
  in h

Fig. 8. The staged gfixQ fixed-point combinator

As the use of contexts suggests, the implementation of these operations typically involves requires some form of delimited control, such as the delimcc or algebraic effects, although implementations based on state are also possible. Figure 5a, Figure 5b and Figure 5c give a simple implementation of let insertion in terms of algebraic effects: the genlet function (Figure 5b) performs the effect ’enLet (Figure 5a) to transfer control to the handler in the body of let_locus (Figure 5c), which builds a let quotation, and passes the bound variable . . via the continuation k back to the context surrounding genlet.

Figures 6a and 6b give a more sophisticated implementation. If no handler is in scope then the genlet of Figure 6a fails gracefully, returning its argument directly to the calling context instead. The handler in let_locus (Figure 6b) is more sophisticated, too: it checks whether the code passed by getlet would be well-scoped in the current context; if so, it first tries to forward the request to the surrounding handler, only generating a let-binding if forwarding fails.

In each of these implementations, the captured continuation k that appears in an effect case within the body of a match or try handler extends to include the handler itself – that is, OCaml implements so-called deep handlers.

let rec insertion. It remains to stage gfixQ and add support for let rec insertion. Staging gfixQ requires a straightforward modification to map and its operations to give the stored functions type ('b → 'a) code.

MetaOCaml does not support generating recursive binding groups of arbitrary size. However, Landin’s observation that such groups may be simulated without recursion using mutable references suggests an encoding in terms of let.

Figure 9

Fig. 9. Naively staged SYB Performance: standard benchmarks

Figure 7 defines a function genrec, which adds a binding to a recursive group by inserting two let bindings: one to introduce a reference r, and a second to assign a value to r. The function k that constructs the right-hand side has access to r, making it possible to build recursive functions: 

Finally, Figure 8 gives a new definition of gfixQ that combines staging, heterogeneous memoization and let rec-insertion. The argument to genrec begins by adding an entry to the table for the fresh binding so that if the table is consulted during the call to f the binding will be available. The full implementation of the library provides a function instantiate (not shown here) that combines the instantiation of a generic function with a call to let_locus, ensuring that bindings are correctly grouped. 

This definition of gfixQ supports both the staged listify (Figure 2b), and the other generic schemes in the SYB library.


3.3 Performance of the Naive Staging

Figure 9 summarises the performance of the naively-staged SYB on a set of benchmarks described by Adams et al. [2015]. For each benchmark the graph records the running time of three implementations of a function on the same data: a hand-written implementation, an unstaged SYB implementation (Section 2), and an implementation generated by staged SYB (Section 3). Each benchmark has a straightforward implementation as a hand-written recursive function, and a succinct implementation as a generic function. For example, here is the hand-written function for SelectInt, which finds and sums all the weights in a weighted tree:

let rec selectInt t = match t with
| Leaf x x
| Fork (l˛ r) selectInt l ¸ selectInt r

| WithWeight (t˛ x) selectInt t ¸ x



And here is an SYB implementation of the same function:

As Figure 9 shows, the SYB implementations are between 14 and 19 times slower than equivalent handwritten functions on the test data, and the staged SYB implementations eliminate the majority of this overhead. Yallop [2016] gives further details. 

Yallop goes on to identify three remaining sources of overhead. First, there is an unnecessary gmapT call for each weight in the generated code for rmWeights. Second, the generated code for selectInts builds an intermediate list, unlike the handwritten code. Third, recursive calls through references are slower than direct calls.


Figure 10. Improved code with simpler recursion