Code Generation – a Taboo?

I spend most of my days developing object-oriented .NET solutions, doing my best to adhere to best practices like the SOLID and DRY principles. Every once in a while, though, I find myself writing repetitive code. I don't mean the kind of code you write in a hurry because of a tight schedule, but repetitive code enforced by the framework or other external conditions.

Enforced Redundancy

One example is custom Exception classes. The interesting bits of a custom Exception class are really only the class name, the base class and any additional data associated with it. Nevertheless, since constructors are not inherited in C#, I must always remember to define a handful of constructors that merely pass their arguments on to the base class, and make sure the class is serializable. The result is a collection of classes that follow a redundant pattern of boilerplate code, just because my programming language offers no way to generalize this kind of redundancy.

To avoid having to write this code by hand each and every time, it is tempting to define a code snippet in Visual Studio that generates the skeleton for an Exception class. Then, I would only have to fill in the custom bits like the class name and base class. Problem solved! Or is it?

What if I make a change to my code snippet? Maybe I want to format the code differently, or I want to override a method. These changes would naturally not propagate to the code generated with my old snippet. To avoid inconsistency, I now face the tedious task of updating all the existing code, crossing my fingers that further changes will not be required.

What if changes to the snippet template could automatically update all previously generated code?

Code Generation

This is where code generation enters the picture. Since the DRY principle is about maintainability, it only applies to code that has to be maintained by hand. As long as the template adheres to the DRY principle, it does not really matter if the generated code is repetitive.

For .NET developers, T4 is the most accessible tool for code generation. T4 is short for Text Template Transformation Toolkit and is built into Visual Studio. It allows me to define some source data and a template, which together produce a text file, typically a source code file. The resulting file is added to the project as a sub-item of the template. Saving changes to the T4 template regenerates the entire output file.


Let us consider custom Exception classes again from this new point of view. With T4, I can simply create a template that defines which classes I want and how I want them generated. Such a T4 template can look like this. The portion of the file you typically maintain is the list of exception definitions at the top, above the first dashed comment line:

<#@ template language="C#" #>
<#@ output extension=".cs" #>
<#
var exceptions = new []
{
	DefineException("Message"),
	DefineException("BadResponse").DerivedFrom("Message"),
	DefineException("InvalidState")
};
//----------------------------------------------------------------------------------
#>
using System;
using System.Runtime.Serialization;

namespace MyNamespace
{
<# foreach(var exception in exceptions) { #>
	[Serializable]
	public partial class <#= exception.ClassName #> : <#= exception.BaseClassName #>
	{
		public <#= exception.ClassName #> () : base () {}
		public <#= exception.ClassName #> (string message) : base (message) {}
		public <#= exception.ClassName #> (string message, Exception inner) : base (message, inner) {}
		protected <#= exception.ClassName #> (SerializationInfo info, StreamingContext context) : base (info, context) {}
	}

<# } #>
}
<#+ 
//----------------------------------------------------------------------------------
ExceptionDefinition DefineException(string name)
{
	return new ExceptionDefinition { Name = name, BaseName = "" };
}

class ExceptionDefinition
{
	public string Name;
	public string BaseName;

	public string ClassName { get { return Name + "Exception"; } }
	public string BaseClassName { get { return BaseName + "Exception"; } }

	public ExceptionDefinition DerivedFrom(string baseName) { BaseName = baseName; return this; }
}
#>

The code generated from this template looks like this:

using System;
using System.Runtime.Serialization;

namespace MyNamespace
{
	[Serializable]
	public partial class MessageException : Exception
	{
		public MessageException () : base () {}
		public MessageException (string message) : base (message) {}
		public MessageException (string message, Exception inner) : base (message, inner) {}
		protected MessageException (SerializationInfo info, StreamingContext context) : base (info, context) {}
	}

	[Serializable]
	public partial class BadResponseException : MessageException
	{
		public BadResponseException () : base () {}
		public BadResponseException (string message) : base (message) {}
		public BadResponseException (string message, Exception inner) : base (message, inner) {}
		protected BadResponseException (SerializationInfo info, StreamingContext context) : base (info, context) {}
	}

	[Serializable]
	public partial class InvalidStateException : Exception
	{
		public InvalidStateException () : base () {}
		public InvalidStateException (string message) : base (message) {}
		public InvalidStateException (string message, Exception inner) : base (message, inner) {}
		protected InvalidStateException (SerializationInfo info, StreamingContext context) : base (info, context) {}
	}

}

Notice that the generated classes are declared as partial classes. Visual Studio regenerates the output file whenever the template changes, so any manual edits to the generated file would be overwritten. Partial classes give us a way of augmenting the generated types in a separate file, without modifying the generated one:

namespace MyNamespace
{
	public partial class InvalidStateException
	{
		public int StatusCode { get; set; }
	}
}

Code generation has an undeservedly bad reputation, mainly due to many examples of abuse. Don't get me wrong: code generation must not become your golden hammer. Used right, however, code generation can drastically improve the maintainability of a code base. It can also make debugging and troubleshooting easier, as generated code typically has fewer abstractions.

The best developers are those who manage to approach problems from multiple angles, looking for the best solution. Next time you want to create a code snippet, consider if code generation might be a suitable solution.


Picking the Right Tool for the Job

When your only tool is a hammer, every problem looks like a nail.
- Abraham Maslow

I recently organized a coding dojo where we solved the bowling kata. In short, the bowling kata is about programming a score-keeper for a game of ten-pin bowling. At any given time during the game, the score-keeper must be able to yield the current score for all players. Additionally, the program must be able to tell which player is the current player, in order to assign scores correctly.

I began solving the kata in my programming language of choice, C#. The solution naturally converged to an imperative state machine, incrementing scores as the game progressed. This led to entangled code with many special cases, struggling to keep track of arbitrary strikes and spares.

Then I realized that the problem is in fact two-fold. One part of the problem is to keep track of which player knocks over which pins, while the other part is the actual calculation of the scores. Given a sequence of numbers representing the number of pins knocked over by each roll, the score can be calculated by a relatively simple function. At this point, I reached for my .NET toolbox and picked the tool best suited for writing functional code, F#.

module BowlingCalculator

[<CompiledNameAttribute("CalculateScore")>]
let calcScore pins =

    let rec calcScore pins frame =

        match pins with

        // Past the tenth frame: any remaining rolls are bonus balls,
        // which have already been counted as part of frame 10
        | _ when frame > 10 -> 0

        // Strike with determined bonus
        | 10 :: y :: z :: rest -> 10 + y + z + calcScore (y :: z :: rest) (frame + 1)

        // Strike -without- determined bonus
        | 10 :: y :: [] -> 0

        // Spare with determined bonus
        | x :: y :: z :: rest when x + y = 10 -> 10 + z + calcScore (z :: rest) (frame + 1)

        // Spare -without- determined bonus
        | x :: y :: [] when x + y = 10 -> 0

        // Open frame
        | x :: y :: rest -> x + y + calcScore rest (frame + 1)

        // Otherwise (no rolls left, or an unfinished frame)
        | _ -> 0

    calcScore pins 1

If you are familiar with functional programming and pattern matching, the code above should be pretty obvious. I will not go into much depth explaining it, but suffice it to say that it is a recursive function traversing the list of pins knocked over, aggregating the score as it goes.
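For readers who want to cross-check the F# function from C#, the following is a rough sketch of the same scoring rules written as a plain loop over frames. It is not part of the original solution. The class name BowlingScore and the method Calculate are hypothetical. Like the F# function, the sketch stops scoring at the first frame whose rolls or bonus rolls have not been thrown yet.

```csharp
using System.Collections.Generic;

public static class BowlingScore
{
    // Scores all completed frames, including their bonus rolls.
    // Scoring stops at the first unfinished frame, as in the F# function.
    public static int Calculate(IReadOnlyList<int> pins)
    {
        int score = 0;
        int i = 0; // index of the first roll in the current frame

        for (int frame = 1; frame <= 10; frame++)
        {
            bool hasTwo = i + 1 < pins.Count;
            bool hasThree = i + 2 < pins.Count;

            if (i < pins.Count && pins[i] == 10)
            {
                if (!hasThree) break;                  // strike bonus not yet known
                score += 10 + pins[i + 1] + pins[i + 2];
                i += 1;                                // a strike frame has one roll
            }
            else if (hasTwo && pins[i] + pins[i + 1] == 10)
            {
                if (!hasThree) break;                  // spare bonus not yet known
                score += 10 + pins[i + 2];
                i += 2;
            }
            else if (hasTwo)
            {
                score += pins[i] + pins[i + 1];        // open frame
                i += 2;
            }
            else
            {
                break;                                 // frame not finished yet
            }
        }

        // Rolls after the tenth frame are bonus balls, already counted above.
        return score;
    }
}
```

The loop walks frame by frame with an index into the roll list. This stops the tenth-frame bonus rolls from being scored as extra frames, which is the job the frame counter does in the F# version.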

The rest of the program, responsible for keeping track of state, was kept in C#. After adding a reference to the F# module, calling into the calculating function is as simple as:

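using System.Collections.Generic;
using Microsoft.FSharp.Collections; // ListModule, from FSharp.Core

public class Player
{
    private readonly List<int> pinsKnockedOver;
    
    // snip...
    
    public int CalculateScore()
    {
        // Convert the mutable List<int> into an immutable F# list
        var pins = ListModule.OfSeq(pinsKnockedOver);
        return BowlingCalculator.CalculateScore(pins);
    }
}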

Since both are first-class .NET citizens, interoperability between C# and F# is a breeze. The only hitch at this point was that my F# function required an F# list as its argument, while the Player class uses a regular List<T> to keep track of the pins knocked over. ListModule.OfSeq() converts any IEnumerable<T> into an F# list, solving that problem with ease.

The complete source code is available on GitHub at https://github.com/tormodfj/katas/tree/master/mixed/Bowling.

In my opinion, this solution takes the best from two worlds, using the imperative C# for state tracking and the functional F# for calculations. Learning the functional paradigm is like acquiring a new tool for your toolbox, enabling you to approach problems from new angles.

Converting an IList<T> to an FSharpList<T>

When calling F# functions from other .NET languages, you may encounter situations where you need to pass parameters of type 'T list. F# lists are immutable linked lists, appearing as the type FSharpList<T> in other .NET languages. Hence, passing a typical IList<T> is not possible. Luckily, converting an IList<T> to an FSharpList<T> is easily accomplished by recursively calling FSharpList<T>.Cons, passing each element of the source list. I keep the following code around for those occasions:

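using System.Collections.Generic;
using Microsoft.FSharp.Collections; // FSharpList<T>, from FSharp.Core

public static class Interop
{
	public static FSharpList<T> ToFSharpList<T>(this IList<T> input)
	{
		return CreateFSharpList(input, 0);
	}

	// Builds the F# list recursively: input[index] becomes the head, and the
	// list built from the remaining elements becomes the tail.
	private static FSharpList<T> CreateFSharpList<T>(IList<T> input, int index)
	{
		if(index >= input.Count)
		{
			return FSharpList<T>.Empty;
		}
		else
		{
			return FSharpList<T>.Cons(input[index], CreateFSharpList(input, index + 1));
		}
	}
}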

Note how F# lists are terminated using FSharpList<T>.Empty. Using this piece of code is as simple as:

var list = new List<int> { 1, 2, 3, 4 };
var fsharpList = list.ToFSharpList();

Update: @rickasaurus made me aware of the List.ofSeq<'T> function in the F# core library. This function solves the same issue. And, unlike my solution, its implementation is not prone to stack overflows when the input list grows large. In C#, this function is called like this:

var list = new List<int> { 1, 2, 3, 4 };
var fsharpList = ListModule.OfSeq(list);

Simple but Useful Extension Methods

In my previous post, I gave a fairly quick introduction to extension methods in C#. This post will present two examples to illustrate how readability can be improved by means of very simple extension methods.

One of the most common checks you perform on a string is whether it has any value. The string type has a static IsNullOrEmpty method intended for this purpose. The method is static because an instance method could never check for null. Calling it on a null reference would throw a NullReferenceException before the method body even ran. Consider this extension method, however.

public static class Extensions
{
	public static bool IsNullOrEmpty(this string value)
	{
		return string.IsNullOrEmpty(value);
	}
}

Being static, this method can be invoked even when value is null. And because it is defined as an extension method, you can invoke it using instance method syntax, improving readability.

string foo = null;
if(foo.IsNullOrEmpty())
{
	// Do something
}

Another common scenario is parsing string values into corresponding enumeration values. Again, .NET provides a static method for this purpose. The Enum type has a static Parse method, which takes a Type parameter and a string parameter. It returns an object, which then has to be cast to the specified type.

string day = "Monday";
DayOfWeek dayOfWeek = (DayOfWeek)Enum.Parse(typeof(DayOfWeek), day);

The signal-to-noise ratio of that second line of code is rather poor. Consider the following generic extension method.

public static class Extensions
{
	public static T ToEnum<T>(this string value)
	{
		return (T)Enum.Parse(typeof(T), value);
	}
}

Notice how this method does exactly the same as the concrete DayOfWeek example above. With this extension method in place, however, each parse operation can now be reduced to the following.

string day = "Monday";
DayOfWeek dayOfWeek = day.ToEnum<DayOfWeek>();

Again, the major benefit is with the readability.
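Enum.Parse throws an exception when the string does not match any member. If that is undesirable, a variant built on Enum.TryParse<TEnum> (available since .NET 4) can return null instead. The name ToEnumOrNull is made up for this sketch. Enum.TryParse requires a struct constraint on the type parameter. It also accepts numeric strings such as "8", even when no member has that value.

```csharp
using System;

public static class EnumParsingExtensions
{
    // Hypothetical helper: returns null instead of throwing when the string
    // matches no enum member. The struct constraint is required by Enum.TryParse<TEnum>.
    public static T? ToEnumOrNull<T>(this string value) where T : struct
    {
        T result;
        if (Enum.TryParse(value, true, out result)) // true: ignore case
        {
            return result;
        }
        return null;
    }
}
```

With this in place, "Someday".ToEnumOrNull<DayOfWeek>() yields null rather than an exception. The caller can then decide on a fallback, for example with the ?? operator.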

The examples in this post are extremely simple, but they illustrate how easily you can improve readability by simply wrapping existing functionality in reasonably named extension methods. For more handy extension methods, I recommend browsing through this StackOverflow thread:
http://stackoverflow.com/questions/271398/post-your-extension-goodies-for-c-net

Extension Methods in C#

Extension methods were introduced as a C# feature in version 3. An extension method is really nothing but a plain old static method. The difference is how you can invoke that static method. Conventionally, a static method is called by explicitly telling the compiler which class contains the method.

int absoluteValue = Math.Abs(-5);

Here, the static method Abs is called on the Math class with the argument -5. If Abs were declared as an extension method, the first argument could instead be passed using instance method invocation syntax.

int absoluteValue = (-5).Abs(); // Not valid, since Math.Abs is not an extension method
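As a hypothetical illustration, wrapping Math.Abs in an extension method named Abs would make the instance syntax legal. One subtlety is worth knowing. Member access binds more tightly than unary minus, so -5.Abs() is parsed as -(5.Abs()), and a negative literal needs parentheses. The class names below are invented for this sketch.

```csharp
using System;

// Hypothetical extension method wrapping Math.Abs; not part of the .NET libraries.
public static class IntMathExtensions
{
    public static int Abs(this int value)
    {
        return Math.Abs(value);
    }
}

public static class AbsDemo
{
    // Parenthesized: the extension method receives -5 and returns 5.
    public static int Parenthesized() { return (-5).Abs(); }

    // Unparenthesized: parsed as -(5.Abs()), which evaluates to -5.
    public static int Unparenthesized() { return -5.Abs(); }
}
```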

The most obvious place to find extension methods in .NET is in the LINQ namespaces. One such extension method is Enumerable.Where. Consider these two invocations of this method.

var values = new int[]{ 1, 2, 3, 4, 5 };

var filtered1 = values.Where(i => i < 3);
var filtered2 = Enumerable.Where(values, i => i < 3);

The two calls to Where are equivalent. In fact, the C# compiler simply rewrites the former syntax into the latter during compilation. If no applicable instance method named Where exists, the compiler looks for a matching extension method among the static classes in the namespaces brought into scope, for example with using System.Linq. This lookup happens entirely at compile time, so it has no runtime cost.
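One consequence of this lookup order is that an extension method can never override a method the type already has. The compiler only considers extension methods when no applicable instance method is found. The following sketch, using made-up Greeter types, shows the effect.

```csharp
public class Greeter
{
    public string Greet() { return "instance"; }
}

public static class GreeterExtensions
{
    // Never selected by greeter.Greet(): the instance method wins.
    public static string Greet(this Greeter greeter) { return "extension"; }

    // Selected by greeter.Wave(), since Greeter has no Wave instance method.
    public static string Wave(this Greeter greeter) { return "wave"; }
}
```

The shadowed extension method can still be reached by calling it explicitly as a static method, GreeterExtensions.Greet(greeter).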

Creating your own extension methods is very easy. The method must be static and declared in a non-generic, non-nested static class, and its first parameter must be prefixed with the this keyword. Consider this example:

public static class IntExtensions
{
	public static bool IsEven(this int value)
	{
		return value % 2 == 0;
	}

	public static bool IsOdd(this int value)
	{
		return !value.IsEven();
	}
}

These two extension methods will seemingly augment all ints with the two methods IsEven and IsOdd, making the following code compile.

if(2.IsEven() && 3.IsOdd())
{
	Console.WriteLine("All is good");
}

Extension methods can drastically improve readability, especially when performing multiple operations. Consider these two lines of code.

Utils.DoSomethingElse(Utils.DoSomething(Utils.Transform(x, "arg1"), "arg2"));
x.Transform("arg1").DoSomething("arg2").DoSomethingElse();

The second line is arguably much easier to read than the first. Extension methods make such a “chaining” syntax of static method calls possible.
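To make the comparison concrete, here is a small sketch that invokes three string extension methods both ways. Squash, Shout and Exclaim are invented for this example.

```csharp
using System;

public static class TextExtensions
{
    // Collapses runs of whitespace into single spaces and trims the ends
    // (a null separator array makes Split use whitespace as the delimiter).
    public static string Squash(this string s)
    {
        return string.Join(" ", s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
    }

    public static string Shout(this string s)
    {
        return s.ToUpperInvariant();
    }

    public static string Exclaim(this string s, int count)
    {
        return s + new string('!', count);
    }
}

public static class ChainingDemo
{
    // Static call syntax: reads inside-out.
    public static string Nested(string input)
    {
        return TextExtensions.Exclaim(TextExtensions.Shout(TextExtensions.Squash(input)), 2);
    }

    // Extension method syntax: reads left to right, in execution order.
    public static string Chained(string input)
    {
        return input.Squash().Shout().Exclaim(2);
    }
}
```

Both methods compile to the same static calls. Only the order in which a reader encounters the operations differs.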

Before you go bananas and convert all your static utility methods to extension methods, however, consider this warning from MSDN:

Extension methods are less discoverable and more limited in functionality than instance methods. For those reasons, it is recommended that extension methods be used sparingly and only in situations where instance methods are not feasible or possible.

In my next post, I will present a couple of simple but handy extension methods which can be useful in almost any project.

Tail Recursion in C# and F#

For those of you who are unfamiliar with the notion of tail recursion, let me quote Wikipedia’s definition.

In computer science, tail recursion (or tail-end recursion) is a special case of recursion in which the last operation of the function, the tail call, is a recursive call

Tail recursion is essential in functional languages like F#, where iterative solutions are often implemented using recursion. If the recursion gets too deep, a stack overflow occurs, and your program crashes brutally. Tail call optimization relies on one observation. If the recursive call is the last operation of the function, the stack frame of the current invocation can be discarded before the recursive invocation is made, so the stack does not grow with the recursion depth.
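The difference between a tail call and an ordinary recursive call can be subtle. In the following C# sketch (the method names are made up for illustration), both methods compute 1 + 2 + … + n, but only the second is tail recursive. The first still has an addition to perform after the recursive call returns. The second carries the running total in an accumulator parameter. Whether the second actually runs in constant stack space in C# is up to the JIT compiler.

```csharp
public static class RecursionSketch
{
    // Not tail recursive: after SumTo(n - 1) returns, n must still be added,
    // so every pending call keeps its stack frame alive.
    public static long SumTo(int n)
    {
        if (n == 0) return 0;
        return n + SumTo(n - 1);
    }

    // Tail recursive: the recursive call is the very last operation, and the
    // running total is passed along in the accumulator parameter.
    public static long SumToAcc(int n, long acc)
    {
        if (n == 0) return acc;
        return SumToAcc(n - 1, acc + n);
    }
}
```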

Rather than spending too much time discussing programming theory, let me present two equivalent programs, both containing tail recursion.

C#

using System;

class Program
{
	static void Countdown(int n)
	{
		if (n == 0) return;
		Countdown(n - 1); // tail call: nothing remains to be done after it returns
	}

	static void Main(string[] args)
	{
		Countdown(1000000);
		Console.WriteLine("Done");
	}
}

F#

let n = 1000000

let rec countdown n =
    match n with
    | 0 -> ()
    | _ -> countdown (n-1)

countdown n
printfn "Done"

These two programs are semantically equivalent. They both use tail recursion to count down from 1,000,000 to zero, before writing “Done” to the console.

Let us first look at the F# solution. Apart from being concise and easy to comprehend, it actually works. In fact, the F# compiler is smart enough to turn the countdown function into a simple while loop, producing MSIL equivalent to the following C# code:

public static void countdown(int n)
{
    while (true)
    {
        switch (n)
        {
            case 0:
                return;
        }
        n--;
    }
}

But what about the tail recursive C# solution? While tail recursion optimization has been proposed to Microsoft, the current C# compiler does nothing of the kind. Hence, the resulting MSIL contains a recursive Countdown method. The question is then: “Will the C# solution result in a stack overflow?” Interestingly, the answer is: “It depends.”

It turns out that if you compile the C# code with “Platform target: Any CPU” and run it on a 64-bit version of the Microsoft .NET runtime, the JIT compiler will actually perform tail call optimization on the MSIL itself, resulting in a working program. This requires JIT optimizations to be enabled, as in a Release build. If, however, you compile with “Platform target: x86” or run the program on a 32-bit version of the Microsoft .NET runtime, a stack overflow occurs. This behavior is described in the blog post “Tail call JIT conditions” by David Broman. Basically, the feature sets of the 64-bit and 32-bit versions of the JIT compiler differ.

So, unless you are 100% certain that your C# application will run on the 64-bit runtime, do not rely on tail recursion to prevent stack overflows. Then again, if you are writing imperative C# code, tail recursion will probably not cross your mind as the best solution to any of your problems.

Fibonacci Fun with F#

As you probably know, the Fibonacci sequence is the infinite sequence of integers where each element is the sum of the previous two (the first two elements being 0 and 1). Recently, I was inspired by a blog post, Ruby vs. Haskell – project Euler #25 deathmatch. In particular, I enjoyed the Haskell solution for its simplicity and declarativeness.

I decided to try and solve the same problem using F#, the functional programming language that becomes a first-class .NET citizen for the first time with Visual Studio 2010. If you have never seen F# code before, the snippets included in this post may be difficult to comprehend, especially if you are used to reading code written in imperative languages like C# or Java.

To declaratively create infinite sequences in F#, the Seq module provides the unfold function. This function takes two parameters, a generator function and an initial state. The generator function must take a state parameter and produce an option containing a tuple of a sequence element and a new state. In F# notation, the unfold function has the signature Seq.unfold : ('State -> ('T * 'State) option) -> 'State -> seq<'T>. Note that if the generator function always returns Some(_) and never None, the resulting sequence will be infinite.

An example of Seq.unfold in action is shown in the following one-liner, producing an infinite sequence of all positive integers.

let positiveIntegers = Seq.unfold (fun x -> Some(x, x + 1)) 1

In this example, the generator function takes an integer state as input. The sequence element produced by this function is the current state, while the next state is calculated by incrementing the current state. Thus, each time the generator function is called, the input integer state is one higher than the previous time. The initial state, 1, is passed as the final parameter to Seq.unfold. The result is the sequence “1, 2, 3, …, ∞”. Strictly speaking, this only holds as far as 32-bit integers go: beyond Int32.MaxValue, the values silently wrap around to negative numbers.
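For comparison with C#, a rough counterpart of Seq.unfold can be written as an iterator. This is only a sketch: the name Unfold is hypothetical, and a null Tuple stands in for F#'s None.

```csharp
using System;
using System.Collections.Generic;

public static class SeqSketch
{
    // Rough C# counterpart of Seq.unfold: the generator maps a state to
    // (element, next state), and returning null ends the sequence, like None.
    // Being an iterator, the generator only runs as elements are requested.
    public static IEnumerable<T> Unfold<TState, T>(TState state, Func<TState, Tuple<T, TState>> generator)
    {
        while (true)
        {
            Tuple<T, TState> result = generator(state);
            if (result == null)
            {
                yield break;
            }
            yield return result.Item1;
            state = result.Item2;
        }
    }
}
```

With this in place, SeqSketch.Unfold(1, x => Tuple.Create(x, x + 1)) yields the same lazily evaluated sequence of positive integers as the F# one-liner above.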

So, how do we go from this sequence to the Fibonacci sequence? First of all, since each element of the Fibonacci sequence is the sum of the previous two, the state cannot consist only of a single integer. Rather, the state has to be a tuple of two integers. By choosing (0, 1) as the initial state, the generator function can use the first tuple element as sequence output and construct the next state as ([next], [current] + [next]), where [current] and [next] are the first and second element, respectively, from the current state tuple.

Translated into F# code, this yields the following definition of the Fibonacci sequence.

let fibonacci =
    Seq.unfold
        (fun (current, next) -> Some(current, (next, current + next)))
        (0, 1)

When enumerating this sequence, however, one problem becomes apparent. Element number 48 is a negative number. This is definitely erroneous behavior, as the Fibonacci sequence contains no negative numbers. The error is an overflow caused by the limited value space of 32-bit integers. To circumvent this problem, we can use the BigInteger type, which can represent integers of arbitrary size. The only change we need to make to our original Fibonacci definition is to change the initial state tuple to contain BigInteger values. The F# type inference system handles the rest.
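The overflow is easy to reproduce in C#, whose int arithmetic is also unchecked by default. The sketch below (the class and method names are hypothetical) computes the first elements with 32-bit integers. Element 48, at index 47, comes out as -1323752223 instead of 2971215073.

```csharp
public static class Int32Fibonacci
{
    // Returns the first count Fibonacci numbers computed with 32-bit ints.
    // unchecked makes the wrap-around explicit (it is also C#'s default).
    public static int[] First(int count)
    {
        var result = new int[count];
        int current = 0, next = 1;
        for (int i = 0; i < count; i++)
        {
            result[i] = current;
            int sum = unchecked(current + next);
            current = next;
            next = sum;
        }
        return result;
    }
}
```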

open System.Numerics

let fibonacci =
    Seq.unfold
        (fun (current, next) -> Some(current, (next, current + next)))
        (BigInteger 0, BigInteger 1)

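Now to the actual solution to Project Euler #25, which asks for the index of the first term in the Fibonacci sequence to contain 1000 digits. Modelled after the previously mentioned Haskell solution, my solution also counts the number of elements in the Fibonacci sequence having a value less than 10^999, the smallest 1000-digit number. Because the sequence starts at 0, this count equals the index of the first 1000-digit term when counting from F1 = 1, as Project Euler does.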

Again, translating this into F# results in the following code.

open System.Numerics

let limit = BigInteger.Pow(BigInteger 10, 999)

let fibonacci =
    Seq.unfold
        (fun (current, next) -> Some(current, (next, current + next)))
        (BigInteger 0, BigInteger 1)

let term =
    fibonacci
    |> Seq.takeWhile (fun n -> n < limit)
    |> Seq.length

printfn "%d" term

I am intrigued by how this functional solution focuses on what the Fibonacci sequence is, rather than how it is calculated. Constructing an infinite Fibonacci sequence in C# would typically require an iterator consisting of an infinite loop, representing state with two local variables. Counting elements having a value less than 10^999, however, could easily have been accomplished in a functional manner using LINQ.
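Here is a sketch of that C# alternative (the class and method names are hypothetical). An infinite iterator holds the state in two local variables, while LINQ's TakeWhile and Count perform the counting. It assumes C# 7 or later for the tuple assignment and a reference to System.Numerics for BigInteger.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

public static class FibonacciSketch
{
    // Infinite iterator: the two locals play the role of the F# state tuple.
    public static IEnumerable<BigInteger> Fibonacci()
    {
        BigInteger current = 0, next = 1;
        while (true)
        {
            yield return current;
            (current, next) = (next, current + next);
        }
    }

    // Counts the Fibonacci numbers below the limit, starting from 0.
    public static int CountBelow(BigInteger limit)
    {
        return Fibonacci().TakeWhile(n => n < limit).Count();
    }
}
```

Calling FibonacciSketch.CountBelow(BigInteger.Pow(10, 999)) should yield the same answer as the F# program. The iterator expresses how the sequence is produced, while the LINQ query stays as declarative as the F# pipeline.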