F# vs C# Performance

This article compares F# and C# performance for the Black-Scholes equation and finds the F# implementation to be significantly faster, mainly due to F#’s pown operator, which outperforms Math.Pow when raising to integer powers. So, given that the F# version can be called from C#, why not use it instead of Math.Pow? Time to run a quick test…

However, it turns out that calling Microsoft.FSharp.Core.Operators.PowInteger directly from C# is over 5x slower than calling System.Math.Pow.

But all is not lost. Instead, I tried defining an F# function:

let pow (x:double) y = pown x y

and then calling that from C#. This was over 5x faster than Math.Pow. Go figure…
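For anyone who wants to try this, the F# function compiles to an ordinary static method, so the C# call site is a one-liner. As a sketch (assuming the function is defined in an F# module named MathUtils; the module name is illustrative):

// pown takes an int exponent, so the compiled signature is pow(double, int)
double result = MathUtils.pow(1.1, 10);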

The tests were carried out on Windows 7 Professional 64-bit SP1 running on a Core i5 M460 2.53GHz with 8GB of RAM.

The VS 11 Beta Solution containing all the code is here.

The Two Things About Software Engineering

Oliver Burkeman’s column in today’s Guardian is about the idea that:

For every subject, there are only two things you need to know. Everything else is the application of those two things, or just not important.

The economist Glen Whitman wrote about it and gave some examples from a range of subjects, including software engineering. I thought I’d add my own effort:

  1. Don’t over-engineer your solution: nobody knows what the future may hold.
  2. The future will punish you for your poor design decisions.

What do you think? What is the essence of software engineering?

The Leprechauns of Software Engineering

I recently finished reading a draft of “The Leprechauns of Software Engineering”, a self-published book by Laurent Bossavit that investigates an important issue in software engineering:

The software profession has a problem, widely recognized but which nobody seems willing to do anything about. You can think of this problem as a variant of the well known “telephone game”, where some trivial rumor is repeated from one person to the next until it has become distorted beyond recognition and blown up out of all proportion.

Unfortunately, the objects of this telephone game are generally considered cornerstone truths of the discipline, to the point that their acceptance now seems to hinder further progress.

Some of the “cornerstone truths” that the author investigates include the “10x variation in programmer productivity”, the “cost of change curve” and the “cone of uncertainty”. In all cases he finds little hard evidence to back the claims: it is leprechauns all the way down.

The author has done a great job of presenting the information in a clear and engaging manner. It is to his credit that it never comes across as dry, dusty academic material, despite the painstaking research that has obviously gone into it.

Now, if I were being as diligent as the author, I would want to personally verify all of his claims. However, I’m not made of such stern stuff! This highlights the lack of an accepted peer-review mechanism for software engineering publications: most people don’t have the time and/or inclination to verify each and every claim, but neither can we afford to succumb to confirmation bias and cherry-pick from all the information that comes our way. To address this, the author ends the book with “two modest proposals for publications on software development”. I hope these gain traction as they will only improve the quality of our profession.

If you believe any of these “cornerstone truths” then you should read this book and be prepared to re-evaluate your beliefs in the light of the evidence; if you are affected by processes that are derived from these “truths” then you should read this book in order to better assess their validity.

And the next time that you hear someone making one of these claims, please remember to ask them: “Can you show me the data?”

My Last Post, Only Better

Shortly after publishing my last post I discovered this video of a talk by Glenn Vanderburg at the Scottish Ruby Conference 2011 where he covered the same ground (and much more). In it he proposes a definition for Software Engineering that recognises the differences from traditional engineering disciplines, and he then argues that Agile practices meet this definition. I highly recommend taking an hour to watch the video.

I may still write some more on the implications of “Code as Design”, but I’ll need to do a more thorough search before posting!

An Overextended Analogy?

Douglas Hofstadter believes that analogy lies at the core of all cognition. Even if you don’t subscribe to this unorthodox opinion, it is hard to argue against the value of a good analogy when getting to grips with a new subject. The danger, of course, is in overextending the analogy; conversely, the key to its effective use is knowing when to stop.

Is the term “Software Engineering” an analogy that has been overextended? It was originally coined by the organisers of the NATO Software Engineering Conferences:

In the fall of 1968 and again in the fall of 1969, NATO hosted a conference devoted to the subject of software engineering. Although the term was not in general use at that time, its adoption for the titles of these conferences was deliberately provocative. As a result, the conferences played a major role in gaining general acceptance, perhaps even premature, for the term.

It is worth reading the thoughts of Brian Randell, who was one of the editors of the proceedings of the two conferences. For me, the key statement is:

Unlike the first conference, at which it was fully accepted that the term software engineering expressed a need rather than a reality, in Rome there was already a slight tendency to talk as if the subject already existed. And it became clear during the conference that the organizers had a hidden agenda, namely that of persuading NATO to fund the setting up of an International Software Engineering Institute. However things did not go according to their plan. The discussion sessions which were meant to provide evidence of strong and extensive support for this proposal were instead marked by considerable scepticism, and led one of the participants, Tom Simpson of IBM, to write a splendid short satire on “Masterpiece Engineering”.

And so the bandwagon started rolling. I think it was reasonable to adopt the term “Software Engineering” to indicate that producing large software systems is analogous to the established engineering fields in terms of some of the discipline required in order to be successful. However, that does not imply that it is similar to more traditional engineering disciplines in all ways: in particular, it does not imply that the “design then build” construction analogy is valid.

This was best explained by Jack Reeves in three articles on Code as Design. I think these articles should be mandatory reading for anyone involved in software development projects, so please take the time to read them for yourself. His key point is that:

The overwhelming problem with software development is that everything is part of the design process. Coding is design, testing and debugging are part of design, and what we typically call software design is still part of design.

Almost 20 years ago he had great insights into the nature of software development, and yet our industry still hasn’t fully grasped the implications. Design is an inherently creative, unpredictable process; construction is an inherently rote (though often very skillful), predictable process. They require different kinds of people and different approaches to project delivery.

It is too late to consign the term “Software Engineering” to the history books, but we need to be aware of the limitations of the analogy. So, if we accept Jack Reeves’ premise, what should we be doing differently?

Where Angels Fear To Tread

I should have known better than to choose performance optimisation as the topic for one of my first posts: any attempts to draw general conclusions are almost inevitably doomed to failure. I’ve had no joy trying to explain the performance difference for the matrix multiplication example. To dig further, I tried the following code in C#:

private static double MySum(double[] A)
{
	// Simple sequential sum over the array
	var sum = 0.0;

	for (int i = 0; i < A.Length; i++)
	{
		sum += A[i];
	}

	return sum;
}

I didn’t bother using unsafe code as this example is one of the cases where the array bounds check is eliminated by the JIT compiler.
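For completeness, the timing was done along the following lines (a minimal sketch; the fixed seed and Stopwatch harness are illustrative rather than the exact test code):

private static void TimeSum()
{
	var rand = new Random(42);		// fixed seed, purely illustrative
	var A = new double[100000000];	// 1e8 elements, as in the test below

	for (int i = 0; i < A.Length; i++)
	{
		A[i] = rand.NextDouble();	// random doubles in [0, 1)
	}

	MySum(A);	// warm-up call so that JIT compilation is not measured

	var sw = System.Diagnostics.Stopwatch.StartNew();
	var sum = MySum(A);
	sw.Stop();

	Console.WriteLine("Sum = {0}, elapsed = {1} ms", sum, sw.ElapsedMilliseconds);
}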

I compared the performance to the following C code, which I called from C# via P/Invoke:

__declspec(dllexport) double Sum(double A[], int n)
{
    double sum = 0.0;

    for (int i = 0; i < n; i++)
    {
        sum += A[i];
    }

    return sum;
}
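The corresponding P/Invoke declaration on the C# side looks something like this (the DLL name is illustrative):

using System.Runtime.InteropServices;

// double[] is blittable, so the array is pinned and passed as a raw pointer.
// Cdecl matches the calling convention of the exported C function.
[DllImport("NativeSum.dll", CallingConvention = CallingConvention.Cdecl)]
private static extern double Sum(double[] A, int n);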

For an array of 1e8 random numbers in the range [0, 1), the C# code took about 2% longer to run than the C code. The result was the same both when running as a 32-bit application (under WOW64 on Windows 7 64-bit) and when running as a 64-bit application. Great, so C# is just as good as C…until I changed the loop body to:

sum += A[i] * A[i];

Now C# is still 2% slower than C on 32-bit, but 25% slower than C on 64-bit. The difference is due to C# performance deteriorating from 32-bit to 64-bit rather than the C code improving (the C code is actually 2-3% faster on 32-bit than on 64-bit).

For a final twist I tried changing the loop body to the following (with exp(A[i]) in the C version):

sum += Math.Exp(A[i]);

This gave similar results to the multiplication example, i.e. C# almost on par with C on 32-bit, but noticeably worse on 64-bit. At first I thought that the slower 64-bit performance of C# might account for the performance differences found in Peter Sestoft’s paper, but it turns out that his figures were produced on a 32-bit OS.

I'm going to admit defeat on this one. Can anyone shed some light?

Seven Languages in Seven Weeks

I recently finished reading Seven Languages in Seven Weeks and highly recommend it to anyone who wants to get out of the comfort zone of their own favourite language and gain an understanding of the strengths and weaknesses of different programming models. The book covers Ruby, Io, Prolog, Scala, Erlang, Clojure and Haskell. Clearly such breadth means that the author cannot go into the minutiae of each language, but he does a great job of explaining the philosophy behind each language, together with the key features that support that philosophy.

He also manages to avoid it being a purely theoretical book, providing examples of each language in action together with problems for the reader to work through in their own time. If you work through all of the problems then a week per language seems like a reasonable pace. However, the author does such a good job of explaining the examples that I tried the different approach of skimming through the whole book in a little over a week, selectively trying out a few of the problems, and still came away with an appreciation of the strengths of each language. For example, I didn’t even download Erlang, but just reading the chapter and following the examples made it clear how the “let it crash” philosophy together with language support simplifies the process of writing robust applications.

[Image: “Unknown Unknowns” (courtesy of Ami Clarke)]

There are many technologies out there, and even within the mainstream ones such as Java and .NET there are vast libraries to be mastered; clearly we have to be selective about where we focus our attention. However, I feel that sometimes the truly simple solutions require switching paradigms, and if we aren’t even aware of the possibilities then we are more likely to make poor choices. This book helps fill that knowledge gap in a very accessible manner.

C# Numeric Performance

I’ve been looking into the performance of C# for floating point calculations, and have come across some interesting articles.

Firstly, this article compared the performance of C and C# for a Fast Fourier Transform, and found no significant difference in performance between the two languages for this CPU-intensive task. Then, this paper (PDF) compared the performance of C, C# (both Microsoft and Mono) and Java for four different numerical calculations: matrix multiplication; a division-intensive loop; polynomial evaluation; and distribution function evaluation. The paper as a whole is definitely worth reading, but I’m going to focus on the C# vs C comparison.

For the division-intensive loop and polynomial evaluation, again there was no significant difference in performance between C and C#. However, for the distribution function evaluation, C# was about 20% slower, which seems to contradict the earlier results. And finally, for matrix multiplication, C was over 4.5 times faster than C#. But simply using a little unsafe code to avoid array bounds checks improved the C# performance to about 60% slower than C. This result surprised me; I was aware of unsafe code but had never had cause to use it (or so I thought).
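To give a flavour of the kind of change involved (my own sketch rather than the code from the paper), an inner loop can use fixed pointers so that the JIT compiler emits no bounds checks:

// Dot product of a row and column stored as arrays; requires compiling
// with the /unsafe switch. Names and data layout are illustrative.
private static unsafe double RowColDot(double[] row, double[] col, int n)
{
	double sum = 0.0;

	fixed (double* pRow = row, pCol = col)
	{
		for (int i = 0; i < n; i++)
		{
			sum += pRow[i] * pCol[i];	// no bounds check on pointer access
		}
	}

	return sum;
}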

So what does it all mean? Clearly there is a category of floating point tasks for which C# can match C. So what is causing the performance degradation in the remaining cases? I’m hypothesising that loop evaluation is slower in C#, which is why the tight loops in the matrix multiplication example cause it to perform significantly slower than C. As for the distribution function, I’m guessing that the exponential function is the culprit, as the rest of the code is mostly floating point arithmetic, which is very similar to the cases where C# matched C performance.

I’m going to investigate further to see if I can test these hunches. In the meantime, if you do most of your development in .NET, I think there is a strong case for staying in your language of choice rather than dropping down to unmanaged code for performance reasons:

  1. The debugging experience is far nicer in managed code, allowing you to produce correct code more quickly.
  2. The performance might well be better than you expect!
  3. A little use of unsafe code might provide impressive gains in performance, whilst still allowing your assemblies to target AnyCPU, rather than having to manage separate 32-bit and 64-bit versions of unmanaged DLLs.

public static void Main() {}

My colleague Richard Brown started blogging last year and has kept posting a steady stream of articles ever since. Inspired by his efforts I’ve decided to try blogging about the software development topics that currently interest me. This will be mainly around .NET development (as that is the focus of my day job) with occasional forays into more general topics.