Monday, May 24, 2010

The coolest code

At [the appropriately-numbered] revision 42 in a certain source control repository lies, without a doubt, the biggest masterpiece of software ever written since the advent of the parentheses:
public static IEnumerable<T> PreOrder<T>(this T startingPoint, Func<T, IEnumerable<T>> children)
{
    yield return startingPoint;
    foreach (var child in children(startingPoint))
    {
        var preOrderedChildren = PreOrder(child, children);
        foreach (var preOrderedChild in preOrderedChildren)
        {
            yield return preOrderedChild;
        }
    }
}
"What is it?" you say? It's a generic, recursive generator, implemented as an extension method with a functor.

"Uhh, so... what does it do?" you counter? It traverses a tree of items (of type T) by yielding them, starting at the provided startingPoint and obtaining the children of a given instance of T using the provided children function object.

"I am from Missouri. You have got to show me." Sure thing! Suppose we have this Node class:
public class Node : IEnumerable<Node>
{
    private readonly IList<Node> _children = new List<Node>();
    public IEnumerable<Node> Children { get { return _children; } }

    private readonly string _name;
    public string Name { get { return _name; } }

    public Node(string name)
    {
        _name = name;
    }

    public Node Add(string nodeName)
    {
        return Add(new Node(nodeName));
    }

    public Node Add(Node node)
    {
        _children.Add(node);
        return node;
    }

    #region IEnumerable<Node> Members
    public IEnumerator<Node> GetEnumerator()
    {
        return Children.GetEnumerator();
    }
    #endregion

    #region IEnumerable Members
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return Children.GetEnumerator();
    }
    #endregion
}
...we can then use the Node class to represent its own high-level parse tree:
var compilationUnit = new Node("Node.cs")
{
    new Node("namespace Test")
    {
        new Node("public class Node : IEnumerable<Node>")
        {
            new Node("public IEnumerable<Node> Children")
            {
                new Node("get;")
            },
            new Node("public string Name")
            {
                new Node("get;")
            },
            new Node("public Node(string name);"),
            new Node("public Node Add(string nodeName);"),
            new Node("public Node Add(Node node);"),
            new Node("#region IEnumerable<Node> Members")
            {
                new Node("public IEnumerator<Node> GetEnumerator();"),
            },
            new Node("#region IEnumerable Members")
            {
                new Node("System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator();"),
            },
        },
    },
};
Notice how each Node constructor call takes a regular argument (in this case, a string representing the node's Name) and is then followed by a braced list of Node instances? That feature -- collection initializers -- is made available to classes that implement IEnumerable and have an accessible Add method.

The PreOrder method can then be called on the compilationUnit instance, as follows, due to the extension method feature:
var sequenceOfNodes = compilationUnit.PreOrder(n => n.Children);
The second argument to the method is a lambda expression that, given an instance of Node, returns an IEnumerable<Node>. In other words, it tells PreOrder how to get a sequence of Node instances given a single Node instance. In our case, it is rather simple, as the Node class has the Children property for that purpose (and it could have been even simpler than that: since Node implements IEnumerable<Node>, the second parameter could have been written as n => n).

Anyway, calling the PreOrder method looked like it did nothing, and indeed it did almost nothing, which is the point of a generator: until you start pulling on the IEnumerable, no work is performed and no items are generated. All that calling PreOrder did was set up the generator instance in the sequenceOfNodes variable. Let's actually start generating (a.k.a. yielding):
foreach (var node in sequenceOfNodes)
{
    Console.WriteLine(node.Name);
}
...when that loop starts executing, the code in PreOrder kicks in and the first item yielded is the startingPoint, which is our compilationUnit, so its Name is printed to the console. The children of startingPoint are obtained by calling the children functor on startingPoint itself. As you will remember, that's simply the Children property. The process repeats recursively behind the scenes, yielding a node, then its children, while our loop doesn't need to worry about all of that. The loop will end up printing a flat list version of the original tree.
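For readers more at home on the JVM, here is a rough Java analogue of the same idea, offered as a sketch rather than a port: Java has no yield return, so I am assuming a Stream with flatMap is a fair stand-in for the generator. The Node class, the PreOrderSketch wrapper and the sample tree are all made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PreOrderSketch {

    // Hypothetical minimal node type, standing in for the C# Node class.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        // Returns this (not the child) so that trees can be built by chaining.
        Node add(Node child) { children.add(child); return this; }
    }

    // Describes the traversal: the starting point first, then each child's
    // subtree in order. Nothing is visited until a terminal operation runs.
    static <T> Stream<T> preOrder(T startingPoint, Function<T, List<T>> children) {
        return Stream.of(startingPoint).flatMap(node ->
            Stream.concat(
                Stream.of(node),
                children.apply(node).stream()
                        .flatMap(child -> preOrder(child, children))));
    }

    // Builds a small made-up tree and flattens it into a list of names.
    static List<String> sampleNames() {
        Node root = new Node("root")
            .add(new Node("a").add(new Node("a1")))
            .add(new Node("b"));
        return preOrder(root, n -> n.children)
            .map(n -> n.name)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Pulling on the stream is what actually walks the tree.
        sampleNames().forEach(System.out::println);
    }
}
```

As with the C# version, calling preOrder only sets things up; the walk happens when something consumes the stream.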

Conclusion (a.k.a. Too long; didn't read)

In 12 lines of code, I made use of the following groovy C# 3.0 compiler features:
  1. Extension methods: extend a closed type with a static method that appears like an instance method
  2. Lambda expressions: inline, anonymous methods that replace private classes that implement an interface AND can operate on local variables
  3. Generators (a.k.a. Iterators): the yield return keyword in methods that return IEnumerable (strictly speaking, this one has been around since C# 2.0)
  4. Implicitly-typed local variables: the var keyword, to avoid repeating yourself

Bonus

If this modest display of mad skillz hasn't convinced you to switch to .NET 3.5, well, you don't even need to! You can compile all this code with the C# 3.0 compiler (the one that ships with .NET 3.5) but still target the .NET 2.0 runtime or even the JVM!

Targeting the .NET 2.0 runtime

Not only can you make use of the new compiler features, you can also make use of the new IDE features, such as call hierarchy and reference highlighting.
  1. Open Visual Studio (this should work in Visual Studio 2008 and 2010)
  2. File > New > Project...
  3. Select .NET Framework 2.0 from the drop-down list on the right.
  4. Create the ExtensionAttribute replacement by adding a file called ExtensionAttribute.cs in your project with the following contents:
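    namespace System.Runtime.CompilerServices
    {
        // Stand-in for the attribute the compiler needs in order to emit
        // extension methods; the .NET 2.0 class library does not include it.
        [AttributeUsage(AttributeTargets.Assembly | AttributeTargets.Class | AttributeTargets.Method)]
        public sealed class ExtensionAttribute : Attribute { }
    }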
    
  5. Create a Delegates.cs file that contains the missing Action and Func delegates from the System namespace (you probably only need up to 4 arguments)
  6. Start writing cool code!

Targeting the JVM

This one is more complicated, but the tool you need to download is Mainsoft Grasshopper. You'll need to add the missing attribute and delegates as above, but then you should be fine.

Wednesday, May 19, 2010

Arrange, Act, Assert

There's a pattern in unit test writing that I noticed a few years back, but it wasn't until recently that I discovered this pattern actually has a name: Arrange, Act, Assert. These represent the three distinct phases of a good unit test: the first part prepares the necessary conditions that simulate a scenario or use-case (arrange), the second invokes the functionality being tested (act) and the third checks that some post-conditions hold (assert).

Here's some Java code I was writing at the time I first noticed the pattern (November 2007):
/**
 * Tests the <i>shuffle</i> method against the Collections.shuffle(List<?>,
 * Random) implementation from which it was derived.
 */
@Test
public void shuffle_AgainstReference ( )
{
    // { initialization
    Random randomSource;
    int length = 20;
    double[] sourceArray = new double[length];
    List<Double> expectedList = new ArrayList<Double> ( length );
    for ( int i = 0; i < length; i++ )
    {
        sourceArray[i] = i;
        expectedList.add ( (double) i );
    }
    // }

    // { double-check
    for ( int i = 0; i < length; i++ )
    {
        String message = "Source array is different at index [" + i + "]";
        assertEquals ( message, expectedList.get ( i ), sourceArray[i] );
    }
    // }

    randomSource = new Random ( 42 );
    ArrayUtil.shuffle ( sourceArray, randomSource );

    randomSource = new Random ( 42 );
    Collections.shuffle ( expectedList, randomSource );

    // { validation
    for ( int i = 0; i < length; i++ )
    {
        String message = "Shuffled array is different at index [" + i + "]";
        assertEquals ( message, expectedList.get ( i ), sourceArray[i] );
    }
    // }
}
That code tests that my implementation of ArrayUtil.shuffle() on an array of doubles works just like the implementation of Collections.shuffle(). One will notice that I called the first block or phase "initialization" and the last one "validation" (which, come to think of it, should have been called "verification" -- more on this at Wikipedia). The block labeled "double-check" should probably have been taken out into its own test.
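Here is a rough sketch of how that test might look with the "double-check" pulled out into its own test and the three phases labeled. To keep it runnable without JUnit, I am assuming plain static methods and a hand-rolled assertSameContents helper. The shuffle method is a hypothetical stand-in for ArrayUtil.shuffle (not the original implementation) that mirrors the backwards swap loop described in the Collections.shuffle documentation. The fixture builders are made up for this example, too.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ShuffleAaaSketch {

    // Hypothetical stand-in for ArrayUtil.shuffle: walks the array backwards,
    // swapping a randomly selected earlier element into the current position.
    static void shuffle(double[] array, Random random) {
        for (int i = array.length; i > 1; i--) {
            int j = random.nextInt(i);
            double swapped = array[i - 1];
            array[i - 1] = array[j];
            array[j] = swapped;
        }
    }

    // Fixture builders (made up for this sketch): the values 0..length-1.
    static double[] newSourceArray(int length) {
        double[] array = new double[length];
        for (int i = 0; i < length; i++) {
            array[i] = i;
        }
        return array;
    }

    static List<Double> newExpectedList(int length) {
        List<Double> list = new ArrayList<Double>(length);
        for (int i = 0; i < length; i++) {
            list.add((double) i);
        }
        return list;
    }

    // Hand-rolled assertion so the sketch runs without a test framework.
    static void assertSameContents(String label, List<Double> expected, double[] actual) {
        if (expected.size() != actual.length) {
            throw new AssertionError(label + " has a different length");
        }
        for (int i = 0; i < actual.length; i++) {
            if (expected.get(i).doubleValue() != actual[i]) {
                throw new AssertionError(label + " is different at index [" + i + "]");
            }
        }
    }

    // The former "double-check" block, now a test of its own.
    static void fixtures_StartOutIdentical() {
        // arrange
        int length = 20;

        // act
        double[] sourceArray = newSourceArray(length);
        List<Double> expectedList = newExpectedList(length);

        // assert
        assertSameContents("Source array", expectedList, sourceArray);
    }

    static void shuffle_AgainstReference() {
        // arrange
        double[] sourceArray = newSourceArray(20);
        List<Double> expectedList = newExpectedList(20);
        Collections.shuffle(expectedList, new Random(42));

        // act
        shuffle(sourceArray, new Random(42));

        // assert
        assertSameContents("Shuffled array", expectedList, sourceArray);
    }

    public static void main(String[] args) {
        fixtures_StartOutIdentical();
        shuffle_AgainstReference();
        System.out.println("Both checks passed");
    }
}
```

In a real JUnit suite, the two static methods would be @Test methods and the helper would give way to the framework's own assertions.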

I recently noticed the more formalized use in the NamedStringFormatSolution.zip project (more about this project at Phil Haack's Named Formats Redux blog post), where the 3 phases of unit testing were explicitly called out by comments in the code:
[Fact]
public void Eval_WithNamedExpressionAndFormat_EvalsPropertyOfExpression()
{
    //arrange
    var expr = new FormatExpression("{foo:#.##}");

    //act
    string result = expr.Eval(new { foo = 1.23456 });

    //assert
    Assert.Equal("1.23", result);
}
The arrange phase is sometimes so trivial that its contents are folded into the act phase. Unless a value is repeated in so many tests that it becomes cleaner or less error-prone to extract it into a constant, it should remain in the test for maximum clarity.

So there you have it: the next time you write a test, design it to execute in three distinct phases of arrangement, acting and assertion. Your test will be better designed and easier to read, and other maintainers will thank you for it.