2017-06-01

C# Optimization of Switch Statement with Strings

The C# compiler does some interesting things when you have a switch statement comparing a lot of strings. Suppose you have this:

    switch (input)
    {
        case "AAAA":
            Console.WriteLine("AAAA branch");
            break;

        case "BBBB":
            Console.WriteLine("BBBB branch");
            break;

        default:
            Console.WriteLine("default branch");
            break;
    }

    Console.WriteLine("Complete");


When you look at the IL (intermediate language) that it compiles into, it is essentially the same as a series of if and else if statements. Converted back into C# code, it is as if you wrote this:

    if (input == "AAAA")
    {
        Console.WriteLine("AAAA branch");
    }
    else if (input == "BBBB")
    {
        Console.WriteLine("BBBB branch");
    }
    else
    {
        Console.WriteLine("default branch");
    }

    Console.WriteLine("Complete");

However, if you continue to add case statements, this becomes inefficient: each string comparison is relatively expensive. At a certain point, as you add cases, the compiler uses an entirely different technique to handle them. It effectively builds a hash table of the strings. Converted back into C# code, the IL looks like this (assume there are more case statements than shown):

    string s = input;

    switch (ComputeStringHash(s))
    {
        case 0x25bfaac5:
            if (s == "BBBB")
            {
                Console.WriteLine("BBBB branch");
                goto Label_0186;
            }

            break;

        case 0xff323f9:
            if (s == "AAAA")
            {
                Console.WriteLine("AAAA branch");
                goto Label_0186;
            }

            break;
    }
    Console.WriteLine("default branch");
Label_0186:
    Console.WriteLine("Complete");

The ComputeStringHash method is a pretty simple hash function that looks like this:

    internal static uint ComputeStringHash(string s)
    {
        uint num = 0;

        if (s != null)
        {
            num = 0x811c9dc5;                               // FNV-1a 32-bit offset basis
            for (int i = 0; i < s.Length; i++)
            {
                num = unchecked((s[i] ^ num) * 0x1000193);  // xor the character, then multiply by the FNV-1a 32-bit prime
            }
        }

        return num;
    }

This is a version of the FNV-1a hashing algorithm.

The change to hashing seems to occur at about eight string case statements. The advantage is that there will be, on average, just one string comparison; the other comparisons are all on uint values. Computing the hash has some overhead, which is why the compiler doesn't use it for a small number of case statements.
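
If you want to see where the change happens for yourself, one approach is to compile a small program like the sketch below and inspect the generated IL with ildasm or ILSpy. The method name is made up, and the exact threshold is a compiler implementation detail that may vary between versions, but with this many cases the hash-based form should appear:

    using System;

    internal static class SwitchSample
    {
        // Eight string cases: roughly the point where the compiler
        // reportedly stops emitting chained comparisons and hashes instead.
        internal static void Dispatch(string input)
        {
            switch (input)
            {
                case "AAAA": Console.WriteLine("AAAA branch"); break;
                case "BBBB": Console.WriteLine("BBBB branch"); break;
                case "CCCC": Console.WriteLine("CCCC branch"); break;
                case "DDDD": Console.WriteLine("DDDD branch"); break;
                case "EEEE": Console.WriteLine("EEEE branch"); break;
                case "FFFF": Console.WriteLine("FFFF branch"); break;
                case "GGGG": Console.WriteLine("GGGG branch"); break;
                case "HHHH": Console.WriteLine("HHHH branch"); break;
                default:     Console.WriteLine("default branch"); break;
            }
        }
    }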

This actually becomes important when you are trying to write unit tests for the code. If you are trying to cover all of the branches, you will need a test input that hashes to 0xff323f9 but is not "AAAA", so that the branch where the hash matches but the string comparison fails gets covered. Your chances of finding a string that hashes to the same value as your legitimate "AAAA" string without being "AAAA" are slim unless you are specifically trying to produce a hash collision. This means that your code coverage will show branches as not covered even though you test every case statement in the switch statement. Your branch coverage statistics will show a failure (usually only around 60% covered) even though your unit tests are actually adequate.
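
If you do want to exercise those hidden branches, one option is to brute-force a colliding input. Below is a rough sketch, not a recommended practice: it assumes you have copied the ComputeStringHash implementation above into your test project, and the method name and candidate scheme are invented for illustration. Because the hash is only 32 bits, expect on the order of a few billion candidates before a collision turns up:

    internal static string FindCollision(string target)
    {
        uint wanted = ComputeStringHash(target);

        for (long i = 0; ; i++)
        {
            // Arbitrary candidate scheme; anything that generates lots of
            // distinct strings will do.
            string candidate = "collide" + i;

            if (candidate != target && ComputeStringHash(candidate) == wanted)
            {
                // Feeding this string into the switch exercises the branch
                // where the hash matches but the string comparison fails.
                return candidate;
            }
        }
    }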

I have been working with the AxoCover and OpenCover programmers to try to get the coverage statistics for branches to be meaningful, but there may be no way to handle this correctly.

Addendum: The logic of the optimized switch statement is slightly more complicated than what is presented above. The C# compiler actually performs a binary search on the hash values, rather than searching through them linearly, before getting to the string comparison. Adding hash-collision inputs will raise your coverage to more than 90%, but it still will not go through all of the code for the binary search.
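
Conceptually (this is not the literal generated code, and the split point below is a placeholder), the dispatch over the hash values ends up shaped something like this, which is why collision inputs alone still leave parts of the comparison tree unvisited:

    uint hash = ComputeStringHash(s);

    if (hash <= 0x7fffffff)              // binary search: test against a midpoint first
    {
        if (hash == 0xff323f9)           // lower half of the sorted hash values
        {
            if (s == "AAAA") { /* AAAA branch */ }
        }
        else if (hash == 0x25bfaac5)
        {
            if (s == "BBBB") { /* BBBB branch */ }
        }
    }
    else
    {
        // comparisons against the upper half of the sorted hash values...
    }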

2017-05-29

Code Coverage with AxoCover

When performing unit tests on code, how do you know if your unit tests are covering all of the code? The answer is a code coverage tool. I first used a tool like this when I was on the Microsoft Access 1.0 programming team, where we got weekly reports on how well the unit tests were covering the code. Microsoft has a code coverage tool in Visual Studio 2017, but only in the Enterprise edition. Unit tests and code coverage go together, and for Microsoft to include unit testing in the Community and Professional editions but not code coverage is kind of dumb.

Fortunately, there is a pretty competent, free, third-party choice called AxoCover. You can download it from the Extensions and Updates item on the Tools menu in Visual Studio. It adds an AxoCover item to the Tools menu that brings up a window to control it. Build your solution once so that AxoCover can figure out what there is to cover. Then, in the AxoCover window, click the Run button at the top. This is what it will look like:

AxoCover window in Visual Studio 2017

The critical information is on the Report tab on the left. After running the tests, AxoCover shows how much of the code was covered. There are two numbers: the percentage of lines covered and the percentage of branches covered. If there is an "if" statement, a test that executes the line counts it as covered, but only if both the true and false conditions are tested will the branches show 100%.
AxoCover report tab
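
To make the line-versus-branch distinction concrete, here is a minimal, hypothetical example (the class, method, and test names are invented, and the test uses MSTest attributes purely for illustration). The single test executes every line of Discount, so line coverage is 100%, but it only ever takes the true side of the "if", so branch coverage is 50%:

    using Microsoft.VisualStudio.TestTools.UnitTesting;

    public static class Pricing
    {
        public static decimal Discount(decimal amount)
        {
            decimal rate = 0m;

            if (amount > 100m)       // only the true side is exercised below
            {
                rate = 0.1m;
            }

            return amount * rate;
        }
    }

    [TestClass]
    public class PricingTests
    {
        [TestMethod]
        public void Discount_LargeAmount_GetsTenPercent()
        {
            Assert.AreEqual(15m, Pricing.Discount(150m));
        }
    }
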
The goal here is to get the branches to over 90%. Why not 100%? In production code, there is sometimes error-handling code that runs only in the rarest of conditions. For example, there may be code to handle the fixed hard drive that you are writing to failing while the program is running. Trying to create a unit test for these conditions is sometimes difficult or impossible. So in general, greater than 90% is considered covered. Obviously, higher is better, but if all of your code is at 90%, you are doing pretty well. In the example above, the coverage is 85.7%, which is good, but not good enough.

After running the tests, you can right-click on a procedure in the report window and select "Show source". The source appears with indicators in the left border showing what was covered and what wasn't.
Code window after running AxoCover
The green bars show lines that were covered by the unit tests. The red bars show lines that were not. The little circles show whether branches were covered; in an "if" statement, you want both circles filled in. In the example above, there needs to be a unit test that covers the case "a" branch of the switch statement. After adding a unit test that covers that statement, go back to the Tests tab, then build and run the tests again. The Report tab then looks like this:

AxoCover report after adding new test case
Here, the coverage is perfect.
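
The code under test here is only visible in the screenshot, so purely as an illustration of the kind of test that fills in a missing switch branch (the class, method, and expected value are invented):

    [TestMethod]
    public void Process_CaseA_IsHandled()
    {
        // Drives the previously uncovered case "a" branch of the switch.
        string result = Processor.Process("a");

        Assert.AreEqual("handled a", result);
    }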

If there is one thing lacking in AxoCover, it's documentation. The Settings tab contains pretty much all of the documentation that exists.

AxoCover is really just a UI that integrates into Visual Studio. The engine that drives the code coverage is a separate open source project called OpenCover, which has a command-line-only interface. OpenCover is installed automatically when you install AxoCover.