http://www.spinellis.gr/pubs/jrnl/2000-JSS-DSLPatterns/html/dslpat.html This is an HTML rendering of a working paper draft that led to a publication. The publication should always be cited in preference to this draft using the following reference:
|
Keywords: design patterns; domain-specific languages.
The remainder of this paper is structured as follows: in section 2 we introduce DSLs and outline their differences from executable specification and general purpose languages while in section 3 we present the formalism of design patterns that we use for describing the DSL realisation strategies in section 4. Finally, section 5 concludes this paper with a discussion of the relationships between the outlined design patterns and directions of future research.
DSLs are, by definition, special purpose languages. Any system architecture encompassing one or more DSLs is typically structured as a confederation of modules; some implemented in one of the DSLs and the rest implemented using a general purpose programming language (Figure 1). As a design choice for implementing software systems DSLs present a number distinct advantages over a ``hard-coded'' program logic:
Although the DSL concept bears similarity to executable specification languages [Som89], [TM87] such as OOSPEC [PH95] the DSL approach exhibits some important advantages:
The implementation of a DSL differs from the implementation of a general purpose language. Compilers for general purpose languages are typically structured as a lexical analyser, a parser, a semantic analyser, an optimiser, and a target code generator. In contrast, the limited scope of a DSL allows and requires different implementation strategies. The lexical, syntactic, and semantic simplicity of DSLs often obviate the need for some elements that would be required by a general purpose language compiler; for example instead of using a parser front-end DSL implementations often process the source language using regular expressions. In addition, the often-limited user population of a DSL does not justify a large implementation effort forcing DSL implementers to choose the most economical realisation strategies; as an example compilation into assembly code of the target machine is rarely a practical proposition. Finally, as DSLs are often part of the development process of a larger system, schedule pressures drive DSL builders towards implementation methods that can rapidly deliver results. The aim of this paper is to provide, in the form of a pattern language, a repertoire of methods often used in the implementation of a DSL.
``Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice.'' - [AIS+77]
Twenty years later Gamma et al. [GHJV95] cross-pollinated these ideas into the field of reusable object-oriented software design. Design patterns offer a convenient way to capture, document, organise, and disseminate existing knowledge from a given area in a consistent and accessible format. Patterns differ from algorithms and data structures in that the concepts they describe can not be coded and used as a subroutine or an object class. Patterns also differ from frameworks as they do not describe the structure of a complete system: interrelated patterns are typically used together to solve a general design problem in a given context.
In this paper we describe eight recurring patterns that we have identified as being used for DSL design and implementation. The description of these patterns provides the DSL designers with a clear view of the available DSL realisation strategies, the forces that will guide them towards the selection of a specific pattern, the consequences of that decision, examples of similar uses, and the available implementation alternatives. In our description of the patterns we followed - in free text form - the format and classification used by Gamma et al. [GHJV95]. We classify each pattern as creational if it involves the creation of a DSL, structural if it describes the structure of a system involving a DSL, and behavioural if it describes DSL interactions.
The piggyback structural pattern (Figure 2) uses the capabilities of an existing language as a hosting base for a new DSL. Often a DSL needs standardised support for common linguistic elements such as expression handling, variables, subroutines, or compilation. By designing the DSL on top of an existing language the needed linguistic support is provided ``for free''. The piggyback pattern can be used whenever the DSL shares common elements with an existing language. Typically, the DSL language processor passes the linguistic elements that are expressed in the existing language to the language processor of the existing language. Where the DSL is implemented as a compiled language a typical implementation compiles the DSL code into the base language: DSL code is compiled as needed, while embedded base-language elements are emitted unmodified. Consequently, the resulting output of the compilation consists entirely of the base language. If the DSL is implemented as an interpreter, a similar strategy can be applied if the base language provides a facility for calling its interpreter with suitable arguments from within the DSL interpreter.
Typical examples of this approach are the yacc [Joh75] and lex [Les75] processors. While the specifications of the input grammar (in the case of yacc) and the input strings (in the case of lex) are expressed in a DSL, the resulting actions for recognised grammar rules and tokens are specified in C which is also the processors' output language. Yacc uses the piggyback approach more aggressively as it introduces special variables (denoted by the $ sign) to the C constructs used for specifying the actions.
The piggyback approach resembles in structure compiler front-ends that generate an intermediate language. However, the structure we propose uses an existing, human-readable, and typically general-purpose language, as the compilation target rather than a specialised, machine-readable intermediate language. The effort of translating the DSL into an existing human-readable language instead of implementing an alternative compiler front-end is substantially lower. In addition, the process of this translation is relatively straightforward and can be implemented as a simple source-to-source transformation - often merely using lexical processing constructs as described in section 4.3. In contrast, the implementation of a compiler front-end requires detailed knowledge of the intermediate language, and often intimate knowledge of a specific compiler implementation.
The pipeline behavioural pattern (Figure 3) solves a problem of DSL composition. Often a system can best be described using a family of DSLs. The prototypical example for such an application is the composition of diverse mark-up languages in text processing systems. Different such languages can be used to specify tables [Les79b], mathematical equations [KC74], chemical formulas [BJK87], pictures [Ker82], graphs [BK86], and organic element chemical structures [BJK87]. In cases where a number of DSLs are needed to express the intended operations, their composition can be designed and implemented using a pipeline. Typically, all DSLs are organised as a series of communicating elements. Each DSL handles its own language elements and passes the rest down to the others. Sometimes, the output of one DSL can be expressed in terms of the input expected by another DSL further down the pipeline chain [Ben86]. The use of the pipeline pattern encourages the division of responsibility among small specialised DSLs and discourages bloated feature-rich language designs. The DSL-based system can be built in a stepwise fashion, adding components as needed with new components utilising existing ones.
As suggested by its name, the pattern can often be implemented using a pipeline of independent communicating system processes. Many modern operating systems provide facilities for setting up such a pipeline, while the Unix shells also provide a supporting built-in notation. The pipeline approach has been used by the troff [Oss79] family of text processing tools. Elements of a troff-based text processing pipeline can include eqn [KC74] for processing equations, tbl [Les79b] for processing tables, pic [Ker82] for processing pictures, grap [BK86] for drawing statistical displays, dag [GNV88] for typesetting directed graphs, chem [BJK87] for typesetting chemical structures, and refer [Les79a] for processing references. A similar structure has also been used to produce algorithm animations [BK91]. In addition, if one considers the command line arguments passed to typical Unix commands as a mini-DSL, then typical pipelines of Unix tool invocations can also be considered as an application of this pattern. This mode of use allows the implementation of sophisticated applications such as spell checkers or complicated operations on images and sound using families of tools such as the system's text processing tools, the pbm [P+93] portable bitmap collection, and the sox sound tools.
The lexical processing creational pattern (Figure 4) offers an efficient way to design and implement DSLs. Due to their - by definition - limited field of applicability DSLs impose severe restrictions to the effort that can be used for their design and implementation. Many DSLs can be designed in a form suitable for processing by techniques of simple lexical substitution; without tree-based syntax analysis. The design of the DSL is geared towards lexical translation by utilising a notation based on lexical hints such as the specification of language elements (e.g. variables) using special prefix or suffix characters. The form of input for this family of DSLs is often line-oriented, rather than free form and delimited by character tokens. This design pattern can be used together with the piggyback pattern in cases where, after some lexical processing, the output of the DSL processor can be passed to the processor of the base language.
The utilisation of this pattern lowers the implementation cost for DSLs making them a practical proposition for applications where the cost of a full parser-based translation would not be justified. As translators based on lexical structure are often implemented using interpreted or rapid prototyping languages, the DSL design and implementation can gracefully evolve together in a combined iterative process. Examples of this application include numerous DSLs implemented using tools such as sed [McM79], awk [AKW88], Perl [WS90], Python [Lut96], m4 [KR79], and the C pre-processor. Most of these tools offer a rich set of lexical processing and substitution facilities - often expressed in terms of extended regular expressions - that can be used to implement a complete DSL in tens of lines of code.
The language extension creational pattern (Figure 5) is used to add new features to an existing language. Often an existing language can effectively serve a new need with the addition of a few new features to its core functionality. In this case, a DSL can be designed and implemented as an extension of the base language. The language extension pattern differs from the piggyback pattern by the roles played by the two languages: the piggyback pattern uses an existing language as an implementation vehicle for a DSL, whereas the extension pattern is used when an existing language is extended within its syntactic and semantic framework to form a DSL. The design of a DSL using this pattern involves the addition of new language elements to an existing base language. These elements can include new data types, language block interaction mechanisms, semantic elements, or syntactic sugar. Typically the DSL inherits all syntax and semantics of the base language used, while adding its own extensions. An object-oriented class hierarchy can thus be formed with a number of DSLs being derived from base languages and forming base languages for other DSLs.
The use of the language extension pattern frees the DSL designer from the burden of designing a full featured language. In addition, where the pattern is used to design a non-trivial DSL hierarchy, the pattern offers a clear way of organising the language relationships and interactions [SDE95]. Compiled-language implementations of the extension pattern are often structured in the form of a pre-processor which transforms the DSL into the base language. Alternatively, source-to-source transformations [CHHP91], code composition [SG97b], or intentional programming [Sim95] techniques can be used to augment the language using high level operators. One of the earliest examples of this pattern is the ``rational FORTRAN'' (Ratfor) compiler [Ker75] which provided a structured version of FORTRAN. The implementation of the original C++ compiler (cfront) also used this technique [Str84]. A current effort using the extension pattern involves the addition of generic types to the Java programming language [BOSW98]. Extensions of interpreted languages can also benefit from this design pattern by implementing the language extension using a meta-interpreter [SS86] or a metacircular evaluator [ASS90]. Examples include the examination of abstract syntax trees using an interpreter of Prolog extended with an ambient current object [Cre97] and the extension of ML for graph drawing [KH97].
Language specialisation (Figure 6) is a creational pattern that removes features of a base language to form a DSL. In some cases the full power of an existing language may prevent its adoption for a specialised purpose. A representative case arises when requirements related to the safety or security aspects of a system can be satisfied only be removing some ``unsafe'' aspects (such as dynamic memory allocation, unbounded pointers, or threads) from a language [Mot94]. In such cases a DSL may be designed and implemented as a subset of an existing language. Whenever some specific features of an existing language render it unsuitable for a given application, the design of a DSL following the specialisation pattern can result in a mature language that satisfies the given requirements. The design of the DSL involves the removal from the base language of the unwanted syntactic or semantic features. Since the DSL is effectively a subset of the base language the removal can be guaranteed by a language processor that checks the DSL conformance. In a limited number of cases additional run-time checks may be required. Examples of DSLs designed following the specialisation pattern are Javalight [NvO98] which is a type-safe subset of Java, the educational subsets of Pascal used for a stepwise introduction to the language [Sav95], the HTML [BLC95] application of SGML [ISO86], and the automotive `safer-subset' of C [ER97].
The source to source transformation creational pattern allows the efficient implementation of DSL translators. As outlined in section 2, the resources available for implementing a DSL are often severely constrained. Source to source transformation can be used to ease the burden of implementation. When the DSL can not be designed as a language extension, specialisation, or using the piggyback pattern it is often possible to leverage the facilities provided by existing language tools using a source to source transformation technique. The DSL source code is transformed via a suitable shallow or deep translation process into the source code of an existing language. The tools available for the existing language are then used to host - compile or interpret - the code generated by the transformation process.
When using this pattern one capitalises on the existing language processor infrastructure. This can include optimising compilers, linkers, and native code instruction schedulers. In addition, in some circumstances, even tools that rely on mappings between the source code and the machine code (such as profilers, execution tracers, and symbolic debuggers) can be used. In particular, some candidate host languages such as C offer a mechanism for specifying the file and source line of the DSL code that generated a particular sequence of host code instructions. The use of this pattern makes it also relatively easy to troubleshoot the DSL compilation process, because the resulting code will often be easy to read and reason about. One other possibility involves the translation of the DSL code into the intermediate language used by existing language compilers. The pattern can be implemented using a traditional lexical analysis, parsing, and host code generation process. In addition, a number of tools such as TXL [CHHP91] and KHEPERA [FNP97] can be used to speed-up the implementation process.
The data structure representation creational pattern (Figure 8) allows the declarative and domain-specific specification of complex data. Data-driven code [KP78] relies on initialised data structures whose complexity can often make them difficult to write and maintain. Complicated structures are better expressed using a language rather than their underlying representation (e.g. a graph adjacency list may be easier expressed as a list of path connections). Designing a DSL to represent the data offers an attractive solution to the problem. The pattern is of use whenever a non-trivial data structure (anything other than rectangular arrays) needs to be initialised with data. It is particularly applicable to the initialisation of data structures whose elements are interrelated such as trees, graphs, arrays of pointers to statically initialised structure elements, arrays of pointers to functions, and multilingual text elements.
The DSL typically defines a user-friendly, alternative but isomorphic, representation of the underlying data structure elements. The DSL compiler can then parse the alternative data representation and transform the data elements to the structure needed for the internal representation. The adoption of this pattern minimises the chances of initialising data structures with wrong or inconsistent data, as the DSL compiler can perform such checks when compiling the data into the internal format. In addition, the data can be generated in the most efficient internal representation using tools such as the perfect hash function generator gperf [Sch90]. The pattern is most often implemented as a DSL compiler from the external to the internal representation. Where runtime efficiency is not a major constraint the DSL can be directly coded within the system's hosting language source code utilising the user-friendly alternative data structure and suitably interpreted at runtime. Such strategies are often employed in systems written in interpreted declarative languages such as Prolog or Lisp. A representative example of a DSL based on this pattern is FIDO [KS97] which is designed to concisely express regular sets of strings or trees. Other cases of DSL-based data specifications are the table initialisations generated by yacc [Joh75] and lex [Les75] for the table-driven parsing and lexical analysis automata they create.
The configuration and adaptation of a system can often be relegated to a DSL front-end (Figure 9). Complicated software systems offer hundreds of configuration options, while their users require ever-increasing adaptation possibilities. Adding more features and configuration options can enlarge and complicate the system with diminishing returns on real functionality and user-friendliness. Making the system programmable by means of the DSL front-end structural pattern provides its users with a declarative, maintainable, organised, and open-ended mechanism for configuring and adapting it. Systems with more than a few configuration options, and systems whose operation can not be adequately specified by means of some arguments or a graphical user interface typically benefit from the addition of a DSL front-end.
Using this strategy the system's configuration parameters and internal functionality are exposed as elements of the DSL - e.g. as variables and functions respectively. At this point it is often advantageous to remove from the system all elements that can be specified by means of the DSL, and code them in terms of it, thus simplifying its structure.
Often the addition of a DSL to a system can reveal synergistic effects by enabling its communication with other systems, allowing for the automatic generation of DSL programs with increased functionality, establishing a common language among its user base, providing a mechanism for optimising or checking the system's configuration, and opening a market for third-party add-on applications. The pattern is most often implemented as an interpreted language embedded within the target system. A number of existing interpretative languages have been used or targeted explicitly for this purpose. Lisp-like languages have been used by systems such as the Emacs [Sta84] editor and the AutoCAD package [RH98], while languages such as Tcl [Ous94], Perl [WS90], and Microsoft's Application Basic [Boc99] provide explicit support for system embedding.
We have described a DSL pattern language consisting of eight DSL design patterns. Our pattern language does not include the design of a DSL using traditional programming language design and implementation techniques (lexical analysis, parsing, code generation), as the aspects of those are extensively covered in the existing literature. The relationships of the patterns we described are depicted in Figure 10. The interrelationships between the patterns are both interesting, and typical of a pattern language focused on a specific domain. Throughout our literature research for drafting this work we were impressed by the multitude of DSL designs, implementation strategies, and resulting systems, and the scarcity of supporting design frameworks and methodologies. We hope that the pattern language we presented can be used as a building block for a systematic view of the software development process involving DSLs.