Better Programming

Advice for programmers.

Follow publication

How to Write Syntax Tree-Based Domain-Specific Languages in Go

The power of AST-based DSLs in representing recursive structures

Victor Brun
Better Programming
Published in
7 min readJul 6, 2022

--

Photo by Jeremy Bishop on Unsplash

For about a year now I have been writing Go, and for an equally long I have been implementing syntax trees. It’s going OK now, but when I started out it was a real struggle due to me not finding any beginner-friendly content covering the subject. So, this is me doing my younger self, and hopefully some of you, a favour.

One of the useful features of syntax tree-based domain-specific languages is their recursive nature, which makes them excellent at representing mathematical areas such as logic and algebra. As a working example, we will thus write a program that can symbolically represent basic algebraic expressions as well as manipulate and evaluate them. If you’re not a maths nerd like me; don’t worry! I will mainly focus on the implementation details of our program, and not the maths.

Prerequisites

Domain-Specific Language (DSL)

You do not have to be an expert on domain-specific languages (DSLs) to follow along. In fact, you don’t really have to know anything about it more than that it is a language with a smaller scope than a general-purpose language (GPR), such as Go. This means that a DSL is optimised to solve problems in one specific area very efficiently (both in terms of computation and implementation) and it may not even have support for doing anything outside that area.

Since DSLs have very limited functionality they are often implemented in other languages. Consequently, a package for building SQL queries could be considered a DSL, where the domain is SQL queries. In the case of our working example, the domain will be algebraic expressions and implemented as a Go package.

Abstract Syntax Tree (AST)

Figure 1: Tree representation of the expression 4+6x.

An abstract syntax tree (AST) is a data structure used to represent the abstract syntactic structure of some data. What this means for us is that, as the name suggests, an AST represents the structure and meaning of some data as a tree. The type of data that is easily modelled by ASTs is usually recursive, e.g. algebraic expressions or source code. ASTs are commonly used in compilers since they make it possible to represent the meaning.

For our intents and purposes, it suffices to realise that an algebraic expression, e.g. 4 + 6x which is presented in figure 1, nicely fits the tree structure. More precisely, each operator (+, -, *, /, Const, x) is represented by a node and its arguments, if it has any, are represented by child nodes.

Combining DSL and AST

Figure 2: Visualisation of evaluation process; going from AST to a number.

With the notion of what a DSL and an AST are we can piece it together to form the idea behind our working example of a syntax tree-based domain-specific languages for simple algebraic expressions.

We want to be able to represent any algebraic expression of one variable containing any, or all, of the basic arithmetic operators (+, -, *, /) and we want some way of evaluating a given expression at some specified value. All this is exemplified in figure 2 where the expression 4 + 6x is represented as an AST and the process of evaluating it at x = 10, resulting in 4 + 6*10 = 64, is visualised.

Implementation

Tree Modelling Using Structs

The most important tool when creating this syntax-based DSL is Go’s type system. However, before starting to construct the tree structure of our AST, we will look at each node in figure 1 and realise that there seem to be two major classes of different nodes; those with children and those without. The former consists of the basic arithmetic operations (+, -, *, /). They all take two arguments, which in our tree structure corresponds to its children. The nodes without children are the leaves of our tree structure and can be seen to consist of constants and variables.

Next, we realise that each of the different node types can all, individually or connected to form a tree, be considered to be algebraic expressions. We therefore define the interface as:

To make the algebraic expressions print nicely, we let this interface implement the String-method.

Now, with the different types of nodes identified and with an interface that captures their similarities, we can with the help of type embedding define all the different types of nodes as structs:

Here, the type embedding refers to the unnamed struct field of type Expr at the top of each struct. This is basically Go’s analogue to inheritance and says that each node type is also a Expr. This makes it possible to define the children, or arguments, to the nodes to be of any type that embeds the interface Expr.

Here are the details of the String-method implementation:

With all this implemented we can construct the expression shown in figure 1 and print it with below program.

> go run main.go
( 4 ) + ( ( 6 ) * ( X ))

The Eval-function

Being able to represent an algebraic expression in Go using structs and type embedding is cool, but we would also like to evaluate it for a given value of x. To do this we extend the interface Expr to also include a function called Eval:

This function’s behaviour is visualised in figure 2; it takes a syntax tree and a value for x and returns the expression evaluated at that value. To implement this we utilise recursion, giving each node type its own Eval implementation with the two types of leaf nodes (Const and x) being the base-cases for the recursion since these have no child nodes. In the case of the non-leaf nodes, the Eval function replaces the syntactical operation with the one actually used in Go, e.g. Add becomes +, and then recursively does the same to the node’s children. In practise this looks like below:

Apart from representing an algebraic expression and printing it to the terminal we can now also evaluate it at a specific value. An example of how to do this is shown in the program below:

> go run main.go
( 4 ) + ( ( 6 ) * ( X )) = 64

Transforming the Syntax Tree

OK, we can represent an algebraic expression as a tree data structure in our program and evaluate it for a given value, so what? Why don’t we just write the expression as a lambda function directly, removing the hassle of trees and recursion? I was hoping you were going to ask, because we have not yet seen the most amazing thing about syntactical representations.

Let’s say, for some reason, that you need to calculate the derivative of our example expression, 4 + 6x. If we would have represented this expression as a lambda function we could only approximate the derivative for a given value of x, but with our AST-based DSL we can actually perform the differentiation according to all the rules we have learned in maths class, and thus, get the exact derivative which we then can evaluate at any point.

To make this possible we, again, need to extend our Expr interface with the differentiation function D:

Then, just like for the Eval-function, we can recursively implement the differentiation rules for each of the different operators:

We can then differentiate our expression by running the below simple program:

> go run main.go
D(( 4 ) + ( ( 6 ) * ( X ) )) = ( 0 ) + ( ( ( 0 ) * ( X ) ) + ( ( 6 ) * ( 1 ) ) )

The output has a lot of redundant terms, which is due to the fact that we never simplify our expression, but it becomes what we expect; D(4 + 6x) = 0 + 0*x + 6*1 = 6.

Conclusion

We have now seen how useful AST-based DSLs can be to represent recursive structures, specifically algebraic expressions, and how to implement them in Go. We have also seen how useful syntactical representations can be when doing symbolic transformations, like differentiation.

If you have found this interesting, you are in luck, because there is a lot more things you can do with ASTs and DSLs. I first came across this topic in the course Domain-Specific Languages of Mathematics at the Chalmers University of Technology, which is an open source course and its content can be found in this Github repo. However, the course’s content is quite maths-heavy but the References in the repo’s README contains some literature that covers the topic without as much maths.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

Victor Brun
Victor Brun

Written by Victor Brun

Engineering Mathematics and Computational Science student. I do math stuff, programming and (mathematical) finance.

Responses (1)

Write a response