Staged Generic Programming

As with all efforts to simplify programming so that systems of greater complexity can be more easily created, there are issues to consider.

7. Related Work

That generic programming libraries often suffer from poor performance is well known, and there have been several investigations into ways to make them more efficient.

Boulytchev and Mechtaev explore how to implement SYB efficiently in OCaml. Their implementation preceded the introduction of modular implicits and GADTs, so they use a type-passing implementation together with a type equality based on an unsafe cast. Instead of language-supported staging, they carefully refactor the SYB code to eliminate inefficiencies, translating to CPS and traversing the type structure in advance to build efficient closure-based traversals. They achieve performance fairly close to hand-written code by combining these transformations with an additional optimisation whose effects are similar to the selective traversal optimisation described in Section 4.5 and Section 4.6.

The work of Adams are comparable to the earlier attempt to stage SYB described in Section 3 improve the performance of the Scrap Your Boilerplate library by means of a domain-specific optimisation,

implemented first as a Hermit script [Farmer et al. 2012], and then as a GHC optimisation. The optimisation seeks to eliminate expressions of łundesirable typesž – that is, expressions corresponding to the dictionaries for the Data and Typeable classes, expressions of type TypeRep, and some associated newtypes – from code that uses SYB by various transformations on the intermediate language. The resulting improvements are impressive, bringing the performance of the SYB benchmarks in Section 3.3 in line with handwritten code. (However, the additional benchmarks introduced in Section 6, which do not visit every node, have no direct counterpart in the work of Adams, and the critical optimizations that eliminate unnecessary traversals (Sections 4.2 and 4.6) are not supported by their implementation.)

The work described in this paper improves on the work of Adams et al. in several ways. First, as the examples in this paper demonstrate, focusing on values of łundesirablež type is not always sufficient to achieve reasonable performance; in particular, it does not help with avoiding fruitless traversals of sub-values, such as searching for integers within the list of strings in the listify example of Section 1. Second, staging avoids the need to go outside the language to improve performance – indeed, the semantics of the language stipulate precisely what code should be generated by the staged SYB library – and so the behaviour of our implementation is not vulnerable to changes in the details of optimization passes or other internal compiler issues. As Adams et al. note, the success of their optimizations depends critically on the details of GHC’s inlining behaviour, and so optimizations that are performed successfully with one version of GHC are found to fail with a later version. Finally, MetaOCaml’s type system justifies a degree of confidence in the correctness of the staged code that is not available in a compiler pass. The types of the staged SYB library are fully integrated with the rest of the program; in contrast, it is easy in a compiler optimization pass to inadvertently generate ill-typed code.

The treatment of implicit arguments as static data in a partial evaluation goes back to Jones, who applies it to the more general case of specializing overloaded functions associated with arbitrary type classes.

Magalhães applies local rewrite rules to another generic programming library for Haskell, generic-deriving, and with careful tuning achieves results equivalent to handwritten code. These results are encouraging, particularly since no compiler modifications are needed. Nonetheless, relying on extra-lingual annotations cannot provide strong guarantees that optimisations will continue to work with future versions of the compiler.

The staged SYB implementation in this paper can be seen as an kind of active library that is, a library which interacts with the compiler in some way to improve performance. Active libraries are most commonly used in scientific programming domains where performance is critical. The implicit thesis of this paper is that the active library approach also has a role to play in significantly improving the performance of very high-level libraries such as SYB, bringing them to a point where they do not suffer significant disadvantages over hand-written code.

Finally, there is an increasing body of evidence that staging can significantly improve the performance of elegant but inefficient libraries such as SYB. Two recent examples are given by Jonnalagedda et al., who present a staging transformation of high-level parser combinators using Scala’s Lightweight Modular Staging, and Kiselyov et al., who use staging techniques to implement a stream library with a high-level interface that generates low-level code with strong performance guarantees. The latter paper implements equivalent staging transformations in two languages (Scala LMS and MetaOCaml); we expect the techniques developed in the present paper to be similarly transferable to a variety of systems, including LMS and the forthcoming Typed Template Haskell, which adds MetaOCaml-style typed quotations to the currently untyped Template Haskell system.