Generic Programming

Site: Saylor Academy
Course: CS102: Introduction to Computer Science II
Book: Generic Programming
Printed by: Guest user
Date: Thursday, May 2, 2024, 10:27 PM

Description

Read this text, which discusses the basics of generic programming and relates it to different languages.

1. Introduction

Generic programming is a style of computer programming in which algorithms are written in terms of types to-be-specified-later that are then instantiated when needed for specific types provided as parameters. This approach, pioneered by the ML programming language in 1973, permits writing common functions or types that differ only in the set of types on which they operate when used, thus reducing duplication. Such software entities are known as generics in Python, Ada, C#, Delphi, Eiffel, F#, Java, Nim, Rust, Swift, TypeScript and Visual Basic. NET. They are known as parametric polymorphism in ML , Scala , Julia , and Haskell (the Haskell community also uses the term "generic" for a related but somewhat different concept); templates in C++ and D; and parameterized types in the influential 1994 book Design Patterns.


Source: Wikipedia, https://en.wikipedia.org/wiki/Generic_programming
Creative Commons License This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.

2. Stepanov–Musser and other generic programming paradigms

Generic programming is defined in Musser & Stepanov (1989) as follows,

Generic programming centers around the idea of abstracting from concrete, efficient algorithms to obtain generic algorithms that can be combined with different data representations to produce a wide variety of useful software. 

­–­  Musser, David R.; Stepanov, Alexander A., Generic Programming 

Generic programming paradigm is an approach to software decomposition whereby fundamental requirements on types are abstracted from across concrete examples of algorithms and data structures and formalized as concepts, analogously to the abstraction of algebraic theories in abstract algebra. Early examples of this programming approach were implemented in Scheme and Ada, although the best known example is the Standard Template Library (STL), which developed a theory of iterators that is used to decouple sequence data structures and the algorithms operating on them.

For example, given N sequence data structures, e.g. singly linked list, vector etc., and M algorithms to operate on them, e.g. find, sort etc., a direct approach would implement each algorithm specifically for each data structure, giving N × M combinations to implement. However, in the generic programming approach, each data structure returns a model of an iterator concept (a simple value type that can be dereferenced to retrieve the current value, or changed to point to another value in the sequence) and each algorithm is instead written generically with arguments of such iterators, e.g. a pair of iterators pointing to the beginning and end of the subsequence or range to process. Thus, only N + M data structure-algorithm combinations need be implemented. Several iterator concepts are specified in the STL, each a refinement of more restrictive concepts e.g. forward iterators only provide movement to the next value in a sequence (e.g. suitable for a singly linked list or a stream of input data), whereas a random-access iterator also provides direct constant-time access to any element of the sequence (e.g. suitable for a vector). An important point is that a data structure will return a model of the most general concept that can be implemented efficiently ­–­ computational complexity requirements are explicitly part of the concept definition. This limits the data structures a given algorithm can be applied to and such complexity requirements are a major determinant of data structure choice. Generic programming similarly has been applied in other domains, e.g. graph algorithms. 

Note that although this approach often utilizes language features of compile-time genericity/templates, it is in fact independent of particular language-technical details. 

Generic programming pioneer Alexander Stepanov wrote, Generic programming is about abstracting and classifying algorithms and data structures. It gets its inspiration from Knuth and not from type theory. Its goal is the incremental construction of systematic catalogs of useful, efficient and abstract algorithms and data structures. Such an undertaking is still a dream. 

­–­  Alexander Stepanov, Short History of STL 

I believe that iterator theories are as central to Computer Science as theories of rings or Banach spaces are central to Mathematics. 

­–­  Alexander Stepanov, An Interview with A. Stepanov 

Bjarne Stroustrup noted, 

Following Stepanov, we can define generic programming without mentioning language features: Lift algorithms and data structures from concrete examples to their most general and abstract form. 

­–­  Bjarne Stroustrup, Evolving a language in and for the real world: C++ 1991-2006 

Other programming paradigms that have been described as generic programming include Datatype generic programming as described in "Generic Programming ­–­ an Introduction". The Scrap your boilerplate approach is a lightweight generic programming approach for Haskell. In this article we distinguish the high-level programming paradigms of generic programming, above, from the lower-level programming language genericity mechanisms used to implement them (see Programming language support for genericity). For further discussion and comparison of generic programming paradigms.


3. Programming language support for genericity

Genericity facilities have existed in high-level languages since at least the 1970s in languages such as ML, CLU and Ada, and were subsequently adopted by many object-based and object-oriented languages, including BETA, C++, D, Eiffel, Java, and DEC's now defunct Trellis-Owl language.

Genericity is implemented and supported differently in various programming languages; the term "generic" has also been used differently in various programming contexts. For example, in Forth the compiler can execute code while compiling and one can create new compiler keywords and new implementations for those words on the fly. It has few words that expose the compiler behaviour and therefore naturally offers genericity capacities that, however, are not referred to as such in most Forth texts. Similarly, dynamically typed languages, especially interpreted ones, usually offer genericity by default as both passing values to functions and value assignment are type-indifferent and such behavior is often utilized for abstraction or code terseness, however this is not typically labeled genericity as it's a direct consequence of dynamic typing system employed by the language. The term has been used in functional programming, specifically in Haskell-like languages, which use a structural type system where types are always parametric and the actual code on those types is generic. These usages still serve a similar purpose of code-saving and the rendering of an abstraction.

Arrays and structs can be viewed as predefined generic types. Every usage of an array or struct type instantiates a new concrete type or reuses a previous instantiated type. Array element types and struct element types are parameterized types, which are used to instantiate the corresponding generic type. All this is usually built-in in the compiler and the syntax differs from other generic constructs. Some extensible programming languages try to unify built-in and user defined generic types.

A broad survey of genericity mechanisms in programming languages follows. For a specific survey comparing suitability of mechanisms for generic programming.


3.1. In object-oriented languages

When creating container classes in statically typed languages, it is inconvenient to write specific implementations for each datatype contained, especially if the code for each datatype is virtually identical. For example, in C++, this duplication of code can be circumvented by defining a class template:

template <typename T>

class List {

  // Class contents.

};

List<Animal> list_of_animals;

List<Car> list_of_cars;

Above, T is a placeholder for whatever type is specified when the list is created. These "containers-of-type-T", commonly called templates, allow a class to be reused with different datatypes as long as certain contracts such as subtypes and signature are kept. This genericity mechanism should not be confused with inclusion polymorphism, which is the algorithmic usage of exchangeable sub-classes: for instance, a list of objects of type Moving_Object containing objects of type Animal and Car. Templates can also be used for type-independent functions as in the Swap example below:

// "&" passes parameters by reference

template <typename T>

void Swap(T& a, T& b) {

  T temp = b;

  b = a;

  a = temp;

}

std::string hello = "World!";

std::string world = "Hello, ";

Swap(world, hello);

std::cout << hello << world << std::endl;  // Output is "Hello, World!".

The C++ template construct used above is widely cited as the genericity construct that popularized the notion among programmers and language designers and supports many generic programming idioms. The D programming language also offers fully generic-capable templates based on the C++ precedent but with a simplified syntax. The Java programming language has provided genericity facilities syntactically based on C++'s since the introduction of J2SE 5.0.

C# 2.0, Oxygene 1.5 (also known as Chrome) and Visual Basic .NET 2005 have constructs that take advantage of the support for generics present in the Microsoft .NET Framework since version 2.0.

3.2. Generics in Ada

Ada has had generics since it was first designed in 1977–1980. The standard library uses generics to provide many services. Ada 2005 adds a comprehensive generic container library to the standard library, which was inspired by C++'s standard template library.

A generic unit is a package or a subprogram that takes one or more generic formal parameters.

A generic formal parameter is a value, a variable, a constant, a type, a subprogram, or even an instance of another, designated, generic unit. For generic formal types, the syntax distinguishes between discrete, floating-point, fixed-point, access (pointer) types, etc. Some formal parameters can have default values.

To instantiate a generic unit, the programmer passes actual parameters for each formal. The generic instance then behaves just like any other unit. It is possible to instantiate generic units at run-time, for example inside a loop.

Example

The specification of a generic package:

 generic
    Max_Size : Natural; -- a generic formal value
    type Element_Type is private; -- a generic formal type; accepts any nonlimited type
 package Stacks is
    type Size_Type is range 0 .. Max_Size;
    type Stack is limited private;
    procedure Create (S : out Stack;
                      Initial_Size : in Size_Type := Max_Size);
    procedure Push (Into : in out Stack; Element : in Element_Type);
    procedure Pop (From : in out Stack; Element : out Element_Type);
    Overflow : exception;
    Underflow : exception;
 private
    subtype Index_Type is Size_Type range 1 .. Max_Size;
    type Vector is array (Index_Type range <>) of Element_Type;
    type Stack (Allocated_Size : Size_Type := 0) is record
       Top : Index_Type;
       Storage : Vector (1 .. Allocated_Size);
    end record;
 end Stacks;

Instantiating the generic package:

 type Bookmark_Type is new Natural;
 -- records a location in the text document we are editing

 package Bookmark_Stacks is new Stacks (Max_Size => 20,
                                        Element_Type => Bookmark_Type);
 -- Allows the user to jump between recorded locations in a document

Using an instance of a generic package:

 type Document_Type is record
    Contents : Ada.Strings.Unbounded.Unbounded_String;
    Bookmarks : Bookmark_Stacks.Stack;
 end record;

 procedure Edit (Document_Name : in String) is
   Document : Document_Type;
 begin
   -- Initialise the stack of bookmarks:
   Bookmark_Stacks.Create (S => Document.Bookmarks, Initial_Size => 10);
   -- Now, open the file Document_Name and read it in...
 end Edit;

3.3. Advantages and limitations

The language syntax allows precise specification of constraints on generic formal parameters. For example, it is possible to specify that a generic formal type will only accept a modular type as the actual. It is also possible to express constraints between generic formal parameters; for example:

 generic

    type Index_Type is (<>); -- must be a discrete type

    type Element_Type is private; -- can be any nonlimited type

    type Array_Type is array (Index_Type range <>) of Element_Type;

In this example, Array_Type is constrained by both Index_Type and Element_Type. When instantiating the unit, the programmer must pass an actual array type that satisfies these constraints.

The disadvantage of this fine-grained control is a complicated syntax, but, because all generic formal parameters are completely defined in the specification, the compiler can instantiate generics without looking at the body of the generic.

Unlike C++, Ada does not allow specialised generic instances, and requires that all generics be instantiated explicitly. These rules have several consequences:

  • the compiler can implement shared generics: the object code for a generic unit can be shared between all instances (unless the programmer requests inlining of subprograms, of course). As further consequences:
    • there is no possibility of code bloat (code bloat is common in C++ and requires special care, as explained below).
    • it is possible to instantiate generics at run-time, as well as at compile time, since no new object code is required for a new instance.
    • actual objects corresponding to a generic formal object are always considered to be non-static inside the generic; see Generic formal objects in the Wikibook for details and consequences.
  • all instances of a generic being exactly the same, it is easier to review and understand programs written by others; there are no "special cases" to take into account.
  • all instantiations being explicit, there are no hidden instantiations that might make it difficult to understand the program.
  • Ada does not permit "template metaprogramming", because it does not allow specialisations.

3.4. Templates in C++

C++ uses templates to enable generic programming techniques. The C++ Standard Library includes the Standard Template Library or STL that provides a framework of templates for common data structures and algorithms. Templates in C++ may also be used for template metaprogramming, which is a way of pre-evaluating some of the code at compile-time rather than run-time. Using template specialization, C++ Templates are considered Turing complete.


3.5. Technical overview

There are two kinds of templates: function templates and class templates. A function template is a pattern for creating ordinary functions based upon the parameterizing types supplied when instantiated. For example, the C++ Standard Template Library contains the function template max(x, y) that creates functions that return either x or y, whichever is larger. max() could be defined like this:

template <typename T>

T max(T x, T y) {

  return x < y ? y : x;

}

Specializations of this function template, instantiations with specific types, can be called just like an ordinary function:

std::cout << max(3, 7);  // Outputs 7.

The compiler examines the arguments used to call max and determines that this is a call to max(int, int). It then instantiates a version of the function where the parameterizing type T is int, making the equivalent of the following function:

int max(int x, int y) {

  return x < y ? y : x;

}

This works whether the arguments x and y are integers, strings, or any other type for which the expression x < y is sensible, or more specifically, for any type for which operator< is defined. Common inheritance is not needed for the set of types that can be used, and so it is very similar to duck typing. A program defining a custom data type can use operator overloading to define the meaning of < for that type, thus allowing its use with the max() function template. While this may seem a minor benefit in this isolated example, in the context of a comprehensive library like the STL it allows the programmer to get extensive functionality for a new data type, just by defining a few operators for it. Merely defining < allows a type to be used with the standard sort(), stable_sort(), and binary_search() algorithms or to be put inside data structures such as sets, heaps, and associative arrays.

C++ templates are completely type safe at compile time. As a demonstration, the standard type complex does not define the < operator, because there is no strict order on complex numbers. Therefore, max(x, y) will fail with a compile error, if x and y are complex values. Likewise, other templates that rely on < cannot be applied to complex data unless a comparison (in the form of a functor or function) is provided. E.g.: A complex cannot be used as key for a map unless a comparison is provided. Unfortunately, compilers historically generate somewhat esoteric, long, and unhelpful error messages for this sort of error. Ensuring that a certain object adheres to a method protocol can alleviate this issue. Languages which use compare instead of < can also use complex values as keys.

The second kind of template, a class template, extends the same concept to classes. A class template specialization is a class. Class templates are often used to make generic containers. For example, the STL has a linked list container. To make a linked list of integers, one writes list<int>. A list of strings is denoted list<string>. A list has a set of standard functions associated with it, that work for any compatible parameterizing types.

3.6. Template specialization

A powerful feature of C++'s templates is template specialization. This allows alternative implementations to be provided based on certain characteristics of the parameterized type that is being instantiated. Template specialization has two purposes: to allow certain forms of optimization, and to reduce code bloat.

For example, consider a sort() template function. One of the primary activities that such a function does is to swap or exchange the values in two of the container's positions. If the values are large (in terms of the number of bytes it takes to store each of them), then it is often quicker to first build a separate list of pointers to the objects, sort those pointers, and then build the final sorted sequence. If the values are quite small however it is usually fastest to just swap the values in-place as needed. Furthermore, if the parameterized type is already of some pointer-type, then there is no need to build a separate pointer array. Template specialization allows the template creator to write different implementations and to specify the characteristics that the parameterized type(s) must have for each implementation to be used.

Unlike function templates, class templates can be partially specialized. That means that an alternate version of the class template code can be provided when some of the template parameters are known, while leaving other template parameters generic. This can be used, for example, to create a default implementation (the primary specialization) that assumes that copying a parameterizing type is expensive and then create partial specializations for types that are cheap to copy, thus increasing overall efficiency. Clients of such a class template just use specializations of it without needing to know whether the compiler used the primary specialization or some partial specialization in each case. Class templates can also be fully specialized, which means that an alternate implementation can be provided when all of the parameterizing types are known.

3.7. Advantages and disadvantages

Some uses of templates, such as the max() function, were previously filled by function-like preprocessor macros (a legacy of the C programming language). For example, here is a possible max() macro:

#define max(a,b) ((a) < (b) ? (b) : (a))

Macros are expanded by preprocessor, before compilation proper; templates are expanded at compile time. Macros are always expanded inline; templates can also be expanded as inline functions when the compiler deems it appropriate. Thus both function-like macros and function templates have no run-time overhead.

However, templates are generally considered an improvement over macros for these purposes. Templates are type-safe. Templates avoid some of the common errors found in code that makes heavy use of function-like macros, such as evaluating parameters with side effects twice. Perhaps most importantly, templates were designed to be applicable to much larger problems than macros.

There are four primary drawbacks to the use of templates: supported features, compiler support, poor error messages, and code bloat:

  1. Templates in C++ lack many features, which makes implementing them and using them in a straightforward way often impossible. Instead programmers have to rely on complicated tricks which leads to bloated, hard to understand and hard to maintain code. Current developments in the C++ standards exacerbate this problem by making heavy use of these tricks and building a lot of new features for templates on them or with them in mind.
  2. Many compilers historically have poor support for templates, thus the use of templates can make code somewhat less portable. Support may also be poor when a C++ compiler is being used with a linker that is not C++-aware, or when attempting to use templates across shared library boundaries. Most modern compilers however now have fairly robust and standard template support, and the new C++ standard, C++11, further addresses these issues.
  3. Almost all compilers produce confusing, long, or sometimes unhelpful error messages when errors are detected in code that uses templates. This can make templates difficult to develop.
  4. Finally, the use of templates requires the compiler to generate a separate instance of the templated class or function for every permutation of type parameters used with it. (This is necessary because types in C++ are not all the same size, and the sizes of data fields are important to how classes work). So, the indiscriminate use of templates can lead to code bloat, resulting in excessively large executables. However, judicious use of template specialization and derivation can dramatically reduce such code bloat in some cases:

So, can derivation be used to reduce the problem of code replicated because templates are used? This would involve deriving a template from an ordinary class. This technique proved successful in curbing code bloat in real use. People who do not use a technique like this have found that replicated code can cost megabytes of code space even in moderate size programs.

­– Bjarne Stroustrup, The Design and Evolution of C++, 1994

The extra instantiations generated by templates can also cause debuggers to have difficulty working gracefully with templates. For example, setting a debug breakpoint within a template from a source file may either miss setting the breakpoint in the actual instantiation desired or may set a breakpoint in every place the template is instantiated.

Also, because the compiler needs to perform macro-like expansions of templates and generate different instances of them at compile time, the implementation source code for the templated class or function must be available (e.g. included in a header) to the code using it. Templated classes or functions, including much of the Standard Template Library (STL), if not included in header files, cannot be compiled. (This is in contrast to non-templated code, which may be compiled to binary, providing only a declarations header file for code using it). This may be a disadvantage by exposing the implementing code, which removes some abstractions, and could restrict its use in closed-source projects.

3.8. Generics in Java

Support for the generics, or "containers-of-type-T" was added to the Java programming language in 2004 as part of J2SE 5.0. In Java, generics are only checked at compile time for type correctness. The generic type information is then removed via a process called type erasure, to maintain compatibility with old JVM implementations, making it unavailable at runtime. For example, a List<String> is converted to the raw type List. The compiler inserts type casts to convert the elements to the String type when they are retrieved from the list, reducing performance compared to other implementations such as C++ templates.