Staged Generic Programming

As with all efforts to simplify programming so that systems of greater complexity can be more easily created, there are issues to consider.

2. Scrap Your Boilerplate

2.1 SYB Basics
SYB is designed for writing generic traversals over data, with specialised behaviour at certain types. There are three key ingredients in the SYB design. First, a run-time type equality test supports examining the type of data during a traversal (Section 2.2). Second, a small collection of shallow traversals supports traversing individual nodes (Section 2.3). Third, a collection of generic łschemesž builds on these traversals to traverse entire values (Section 2.4).

2.2  The First SYB Ingredient: Type Equality Tests

Type equality tests play a central role in SYB. For example, listify p compares the type of each node with the argument type of the predicate p.

The TYPEABLE interface (Figure 1a) supports type equality tests2. Each TYPEABLE instance has three members: the type t, instantiated to a concrete type such as int or bool list; the value tyrep, which represents the type t; and the function eqty, which determines whether tyrep is equal to some other type representation.

The type tyrep, used for type representations, is an open data type [Löh and Hinze 2006]. An instance of TYPEABLE for a type t extends tyrep with a new constructor, which takes as many arguments as t has type parameters. For example, the TYPEABLE instance for int extends TYPEABLE with a nullary constructor Int, and the instance for list adds a unary constructor List (Figure 1b).

Both tyrep and eql (used in the return type of eqty) are generalized algebraic data types (GADTs). A GADT value transports information about type equalities around a program. A match expression that examines a value of type t tyrep to find Int also discovers that the t is equal to int, which appears in the return type of Int; similarly, a match expression that examines a (s˛ t) eql value to reveal Refl also reveals that the types s and t are equal.

In place of GADTs the original SYB uses a general coercion function gcast built on an unsafe coercion. The eqty function can be used to write gcast without unsafe features.

Figure 1b also makes use of modular implicits, an OCaml extension for overloading. The implicit keyword makes a module available for implicit instantiation. Enclosing braces, as in {A TYPEABLE}, mark a parameter as implicit. No corresponding argument need be passed; instead, the compiler instantiates the argument automatically. Implicit arguments can also be supplied explicitly using braces: f {Typeable_int} x. Modular implicits are not essential to the implementation – explicit dictionaries could serve in place of implicit arguments – but significantly improve usability.

type _ tyrep = ..

module type TYPEABLE =

sig

type t

val tyrep ȷ t tyrep lazy_t

val eqty ȷ 's tyrep → (t˛ 's) eql option

end

Fig. 1a. The TYPEABLE interface for type equality tests

type _ tyrep ¸ Int                                  int tyrep                    type _ tyrep ¸ List               'a tyrep               → 'a list tyrep

implicit module Typeable_int             implicit module Typeable_list {A TYPEABLE}

struct                                                            struct

type t int                                                     type t  A.t list

let eqty  function                                     let eqty   function

| Int Some Refl                                    | List a (match A.eqty a with

| _ None                                                                            | Some Refl Some Refl

let tyrep  lazy Int                                                               | None None)

end                                                                          | _ None

let tyrep  lazy (List (force A.tyrep))

end

Fig. 1b. TYPEABLE instances for int and list


2.3   The Second SYB Ingredient: Generic Operations

The second ingredient of SYB is a small set of shallow traversals over data. The function gmapQ is a representative example: it accepts a function q and a value v, applies q to the immediate sub-values of v, and returns a list of the results:

gmapQ q (C (x1 ˛ x2 ˛ . . .˛ x N )) [q x 1   q x 2      . . . q x N ]

Figure 2a lines 1-6 give the definition of the DATA signature3, which includes the operations of TYPEABLE and the gmapQ operation. In the full SYB library DATA supports a few additional traversals. However, to simplify the exposition this paper focuses on gmapQ, since the staging techniques presented apply to the other traversals without significant modification.

The gmapQ function takes two arguments: a generic query of type genericQ (defined on Figure 2a line 1), and a value of type t. The type genericQ is sufficiently general that the query can be applied to any value with a DATA instance; however, the type 'u returned by the query is the same in each case.

Figure 2a lines 8-20 define two implicit instances of the DATA signature. The first,Data_int, defines gmapQ to return an empty list, since an int value has no sub-values. The second, Data_list, defines gmapQ to apply the argument function q to each sub-node and collect the results. The generic type of q allows it to be applied to any value for which a suitable implicit argument is in scope; in particular, it can be applied to x, which has type A.t and to xs, which has type t, since the implicit modules A and Data_list(A) are in scope.

type 'u genericQ = {D:DATA} → D.t → 'u	                1    type 'u genericQ = {D:DATA} → D.t code → 'u
module type rec DATA = sig	                        2    module type rec DATA = sig
  type t	                                        3      type t
  module Typeable : TYPEABLE with type t = t	        4      module Typeable : TYPEABLE with type t = t
  val gmapQ : 'u genericQ → t → 'u list	                5      val gmapQ : 'u genericQ → t code → 'u list code
end	                                                6    end
	                                                7
implicit module Data_int = struct	                8    implicit module Data_int = struct
  type t = int	                                        9      type t = int
  module Typeable = Typeable_int	                10     module Typeable = Typeable_int
  let gmapQ _ _ = []	                                11     let gmapQ _ _ = .<[]>.
end	                                                12   end
                                                        13
implicit module rec Data_list {A:DATA} = struct	        14  implicit module rec Data_list {A:DATA} = struct
  type t = A.t list	                                15    type t = A.t list
  module Typeable = Typeable_list{A.Typeable}	        16    module Typeable = Typeable_list{A.Typeable}
  let gmapQ q l = match l with	                        17    let gmapQ q l = .< match l with
    | []  []	                                        18        []  []
    | x :: xs  [q x; q xs]	                        19      | x :: xs→ [.˜(q <x>).); .˜(q ..)] >
end	                                                20  end
                                                        21
let mkQ {T:TYPEABLE} u (g: T.t  u) {D:DATA} x =	22  let mkQ {T:TYPEABLE} u (g: T.t code  u code) {D:DATA} x =
  match D.Typeable.eqty (force T.tyrep) with	        23    match D.Typeable.eqty (force T.tyrep) with
  | Some Refl  g x	                                24    | Some Refl  g x
  | _  u	                                        25    | _  u
                                                        26
let single p x = if p x then [x] else []	        27  let single p x = ..
                                                        28
let rec listify {T:TYPEABLE} p {D:DATA} x =	        29  let listify {T:TYPEABLE} p = gfixQ (fun self {D:DATA} x 
  mkQ [] (single p) x @ concat (D.gmapQ (listify p) x)	30    .<.˜(mkQ .<[]>. (single p) x) @ concat .˜(D.gmapQ self x)>.)

Fig. 2a. SYB: a cross-section                                                        Fig. 2b. Naively staged SYB: a cross-section

The mkQ function (Figure 2a lines 22-25) builds a generic query from a monomorphic function g and a default value u. The result is a generic query which accepts a further argument x; if x’s type representation is equal to g’s argument type representation then g is applied; otherwise, u is returned.

Here is an example of gmapQ in action:

# gmapQ (mkQ false evenp) [10 ‚

bool list [ true false]

Since gmapQ applies q only to immediate sub-values of v – here the head of the list, 10, and the tail, [20 – only the result for evenp 10 is returned. In place of the tail, gmapQ returns false, i.e. the default passed to mkQ.


2.4   The Third SYB Ingredient: Generic Schemes

The final ingredient of SYB is a set of recursive schemes built atop gmapQ and other traversals. For example, Figure 2a lines 29-30 give a definition of listify, which use an auxiliary function single (line 27), the standard list concatenation function concat, and the generic functions mkQ and gmapQ. The definition of listify may be read as follows: apply p to the current node, if the current node is suitably typed, and append the result of applying listify p to the sub-values. Thus, whereas gmapQ only applies to immediate sub-values, listify recursively traverses entire structures. This technique of building a recursive function such as listify from a shallow traversal such as gmapQ Section 2.3 is sometimes referred to as tying the knot.

Writing generic traversals in this open-recursive style allows considerable flexibility in the form of the traversal, and SYB supports a wide range of traversals besides listify, including gsize (compute the size of a value), everywhere (apply a function to every sub-value), synthesize (build values from the bottom up), and many more.

Dramatis Personñ. We pause to summarise the elements introduced so far. SYB provides interfaces for type equality (Section 2.2) and shallow traversal (Section 2.3), along with recursive schemes such as listify. Library authors may provide instances for the data types they define, following the pattern of Typeable_list and Data_list4. SYB users combine schemes and instances by calling generic schemes with suitable instances in scope. Users are, of course, also free to define their own generic schemes.