I have assembled the Triforce

Recently I put in the final piece of a mini project I’d been thinking about for a while.  I made a change to an Agent Ralph source file and committed it to my Mercurial repo.  Within about a minute I was browsing the MbUnit test results.  Now that may not sound very exciting, but it is.  It is because all of this was conducted online, through my browser, on a PC that didn’t have Mercurial, Visual Studio, or any development tools whatsoever.  I was coding in the cloud, baby.

Now with Agent Ralph I have boiled my unit tests down to the point where they are just code snippets.  Adding a new test case is as simple as dropping a C# file into a directory.  It is automatically picked up and fed into a test harness, which parses the code and gets all Ralphy on it.  Here’s a sample, right out of the repo:

using System;
namespace AgentRalph.CloneCandidateDetectionTestData
{
    public class CloneInForeachBlock
    {
        void Target()
        {
            foreach (int i in new int[] { })
            {
                /* BEGIN */
                Console.WriteLine(7);
                /* END */
            }
        }

        private void Expected()
        {
            Console.WriteLine(7);
        }
    }
}

This sample comes from the CloneCandidateDetectionTestData folder.  Any file in that folder is assumed to hold a class containing at least two methods, Target and Expected.  Target is scanned for clones, and the test passes if a clone is found that matches Expected AND consists of all the code between the BEGIN and END ‘markup’ embedded in the comments.  Thanks to the magic of MbUnit’s generative tests each code file appears as its own test, as if each were a method with its own [Test] attribute.  So you see, whipping up new tests is crazy easy.
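For the curious, the harness might look roughly like the sketch below.  I’m recalling MbUnit v3’s static test factory support from memory, so the [StaticTestFactory], Test, and TestCase names may not be exact, and AssertCloneBetweenMarkers is a hypothetical stand-in for the real Agent Ralph call.

using System;
using System.Collections.Generic;
using System.IO;
using Gallio.Framework;     // assumed home of Test/TestCase in MbUnit v3
using MbUnit.Framework;

public class CloneCandidateDetectionTests
{
    // One generated test per .cs file dropped into the test data folder.
    [StaticTestFactory]
    public static IEnumerable<Test> SnippetTests()
    {
        foreach (string file in Directory.GetFiles("CloneCandidateDetectionTestData", "*.cs"))
        {
            string path = file;   // capture for the closure
            yield return new TestCase(Path.GetFileNameWithoutExtension(path), () =>
            {
                // Hypothetical stand-in for the real harness: parse the snippet,
                // scan Target for clones of Expected, and require the reported
                // clone to span exactly the BEGIN/END comment markers.
                AssertCloneBetweenMarkers(File.ReadAllText(path));
            });
        }
    }

    static void AssertCloneBetweenMarkers(string code)
    {
        /* parse, detect, assert */
    }
}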

Occasionally an idea for a test case will come to me, and it’s usually when I’m at a place where I have no access to Agent Ralph code.  Like work.  Typically I’d make some notes in an email and code it up when I got home.  It got me thinking: I don’t really need Visual Studio and the whole dev setup to create these test cases.  They’re just simple code snippets, scraped out of a directory.  I could be scraping them from anywhere, even off a wiki.  The next thought of course was to run that test automatically.  How could I put this all together?

Building and running tests is easy; any continuous integration server would do.  I chose TeamCity, which we’ve been using at work and is just great.  I can’t say enough nice things about it.  For this mini-project, its easy third-party report integration and build artifact downloading features were exactly what I wanted.  A little dyndns magic and I had it online.  My MbUnit tests look pretty nice I think.

At some point I stumbled across Bespin.  Bespin is a Mozilla project that “proposes an open extensible web-based framework for code editing”.  It’s a web-based, code-centric text editor, and more.  One part of the more is version control integration.  Mercurial is supported, which is what I use for Agent Ralph.  I can pull down the code, hack on it, and push changes back out to the Google Code hosted repository, all right in the browser.  There’s the missing piece, my editor.

And of course, Google Code is the glue that brings them together.

And there you have it, coding in the cloud!  This graphic ought to help you fully grasp the awesomeness.

[Image: MyForce]

The Oscillating Shrinking Window

I talked previously about how Agent Ralph identifies functionally equivalent methods.  It also can find clones embedded within a method, as shown at the end of my last post.  This time I’ll talk a bit about how that works.

Agent Ralph’s preferred unit of comparison is the method.  The comparison implementation accepts two methods and walks their ASTs, comparing each node.  To find clones embedded within a method, then, it must somehow convert a subset of a method’s statements into a new method.  This provisional method need only exist for the lifetime of the comparison.  If the provisional method matches some other existing method, the set of statements under consideration is a clone.

So, how to create that provisional method?  If you guessed the Extract Method refactoring then indeed you are correct.  Agent Ralph will scan a method from top to bottom, choosing subsets of contiguous statements to perform an Extract Method on.  If the Extract Method operation is successful and the new method matches some other method, we’ve found a clone.

A “window” is a set of contiguous statements.  We start with the largest possible window: all the statements except the last.  (Using all the statements would simply create an identical function.)  Next, perform an Extract Method operation, which creates the provisional method, and compare.  Then proceed to the next window and repeat.  The next window is determined by shifting the window down one statement.  IOW, we now include the last statement and exclude the first statement.  Extract Method, compare, continue.  At this point we’ve run out of windows at this size, so we shrink the window size by one and start at the top again.  When the window size reaches 1, that is the last iteration.

I call this algorithm the Oscillating Shrinking Window.  An illustrative graphic is in order, and nothing says cutting edge technology like an animated gif:

Each yellow block encompasses a set of statements that will have an Extract Method performed on them.
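In code, the window enumeration might look something like this minimal sketch; the extract-and-compare step it feeds is left as a comment.

using System;
using System.Collections.Generic;

static class WindowEnumerator
{
    // Enumerate (start, length) windows over a method's top-level statements:
    // begin with the largest window (all statements but one), slide it from
    // top to bottom, then shrink the size by one and start over, down to 1.
    public static IEnumerable<Tuple<int, int>> Windows(int statementCount)
    {
        for (int length = statementCount - 1; length >= 1; length--)
            for (int start = 0; start + length <= statementCount; start++)
                yield return Tuple.Create(start, length);
    }

    // Each window is then handed to Extract Method to build the provisional
    // method, which is compared against the other methods in scope.
}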

Some statements can of course contain sets of statements as children.  If statements, for loops, etc.  This requires recursing into the children of said statements and repeating the algorithm.  Another graphic:

As you might guess, this is very computationally expensive.  Indeed, the algorithmic complexity is at least O(n^3) where n is the number of AST nodes.  The “at least” qualifier is on there because I didn’t prove it for all cases.  I chose a single particular case and analyzed that (with some help[1]).  The case was a tree of n nodes where the root node has n-1 child nodes.  (Think very wide and very shallow.)  Intuitively, there are on the order of n^2 windows and each Extract Method plus comparison touches on the order of n nodes, which is where the cube comes from.  Heuristics will play a large part in making this actually usable.

[1]    I brought in a consultant.  Sean helped me with the math in exchange for one large Franco Cajun Pride and a Diet Dr. Pepper.

Agent Ralph In Action

I’ve been yack yack yacking about clone detection and Agent Ralph.  It’s time to put up or shut up.  This post is some screen shots of Agent Ralph in action.

Agent Ralph‘s front end is a Resharper plug-in.  Any clones detected are passed up to the plug-in which presents them to the user as highlights and quick fixes.  This is how we achieve the automated repair that a modern clone tool needs.  The backend scans source files handed to it by the front end, using the techniques I’ve been blogging about.  Specifically, clones are identified by comparing abstract syntax trees of methods.  The ASTs may be modified by the application of safe refactorings (refactorings that do not change the inputs, outputs, or side effects).  If an AST can be safely coerced until it matches another then we can consider the originals functionally equivalent clones.  This technique will detect clones that would otherwise be overlooked by text based clone finders.

So, here’s the basic case.  Two identical methods:

[Screenshot: identicalmethodshighlight-cropped]

Note the Resharper squigglies telling us something is up.  Passing the mouse over either method name brings up a tooltip identifying the method as a clone of the other.

Placing the cursor on the method name prompts you with a quick fix…

[Screenshot: identicalmethodsquickfix-cropped]

…and invoking it…

[Screenshot: identicalmethodsquickfixapplied-cropped]

…replaces the body of the clone with a call to the original.  That’s automated clone repair!  An inline method applied to Test1 will complete the removal.

The next methods are identical, but only if a rename local variable refactoring is applied.  And indeed you can see that it is, indicated by the highlighting and quick fix offering.

[Screenshot: clonewithrenamelocal-cropped]

The last example is one I am particularly proud of.   Here we are detecting a clone that is a block within a larger method.  Methods EmbeddedClone1 and EmbeddedClone2 both contain clones of Test2.

[Screenshot: embeddedclonequickfix-cropped]

Thus far I’ve restricted myself to using methods as the only unit of comparison.  Doing so made it easier to reason about and implement as I worked through ideas.  At some point I realized that I could use an Extract Method refactoring to create provisional methods from arbitrary code blocks on the fly.  If the provisional method is a clone then it follows that the original code block is a clone.  In this way I can continue to think and code in terms of methods, yet rely on the Extract Method refactoring to apply my algorithms to sub-units of a method (aka, arbitrary blocks and statements).

An Idea For Robust Clone Detection Using Abstract Syntax Trees

My last post concluded with the promise to go into detail on some implementation ideas of my clone analyzer.

As I argued previously, a text based matching tool is not good enough; it’s simply too easy to fool. What we want is a matching tool that considers the full syntax of the language being analyzed. That leads us to a solution based on the analysis of abstract syntax trees (ASTs).

ASTs can be generated easily from a partial compilation of the files under analysis. Basic comparison is easy too. It’s a straightforward tree walking algorithm. There are many ways to do it, and I’ll go into my implementation later. Where things get interesting is when we consider ASTs that are functionally equivalent but not identical. That is, ASTs that differ in unimportant ways. The initial impulse is to begin ‘relaxing’ the tree walking comparison. I.e., ignore things like local variable names, parameter order, and other obvious irrelevancies to the concern of functional equivalence. Instead I proposed that we attempt to refactor one tree and see if we can transform it into an AST that does match. If so, we can conclude the original ASTs match. We don’t actually need to perform these refactorings ‘for real’. Knowing the safe transform exists is enough for a clone repair tool to replace one method with the other.

Now, my theory here can be divided into two distinct parts. First, we need to be able to tell when any two methods are completely identical. Second, if we can take two non-matching methods and automatically apply a series of safe refactorings that converts one into an identical match of the other, then we can conclude that the original methods are clones.

Part I – Matching Identical ASTs

The initial match algorithm is a careful tree traversal with a node for node comparison which exits at the first mismatch. For this part of the project I relied on the open source and very nice NRefactory project. It includes a C# parser, among other useful stuff. Thanks to the availability of source and decent examples I was able to get up and running very quickly.

The first step is to get an AST by passing the class file to a Parse() function. One caveat of my implementation is that it will not work on code that does not compile. When Parse() encounters a syntax error it returns null. In practice, I don’t anticipate this limitation having much effect on usefulness.
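For reference, the parse step looks roughly like this with the old (SharpDevelop-era) NRefactory API; the type and member names here are from memory, so treat it as a sketch.

using System.IO;
using ICSharpCode.NRefactory;
using ICSharpCode.NRefactory.Ast;

static class AstLoader
{
    // Returns the root of the AST, or null if the code doesn't compile.
    public static CompilationUnit Parse(string source)
    {
        IParser parser = ParserFactory.CreateParser(
            SupportedLanguage.CSharp, new StringReader(source));
        parser.Parse();
        if (parser.Errors.Count > 0)
            return null;    // syntax errors: bail out, per the caveat above
        return parser.CompilationUnit;
    }
}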

The AST generated from this method…

int Foo() {
    return 7 + 8 * (4 - 6);
}

…looks like this[1]:
[Image: syntax_tree]
There are a couple of things to note about these ASTs. Each node has a distinct type like MethodDeclaration, ReturnStatement, Operator, Literal, etc. Some nodes also have other properties. For example, Operators have an Op property that is (in this example) one of +, -, or *. Literals hold the literal value in the property named Val (‘7’, ‘8’, ‘4’, and ‘6’ here), and the type of that value, called Type.

We now need to compare the ASTs. Let’s call the function to do this Compare(left_tree, right_tree):bool. Starting at the root node (in this case, a MethodDeclaration node) of the left hand tree we begin walking that tree. At each left hand node we compare to the corresponding right hand node. The individual node comparison first checks that the node types match (both are Operators, both are Literals, etc.). Then it compares the values of each of the node properties. At this point we have confirmed the node matches and we can proceed to its children.
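In sketch form, the entry point just kicks off that lockstep walk (Failed is a hypothetical name for whatever flag SetFailure() sets):

// Minimal sketch of the entry point; ComparisonVisitor is shown further down.
public static bool Compare(INode leftTree, INode rightTree)
{
    var visitor = new ComparisonVisitor();
    leftTree.Accept(visitor, rightTree);   // walk both trees in lockstep
    return !visitor.Failed;                // set by SetFailure() on any mismatch
}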

The actual implementation is based on a slightly modified Visitor pattern. Each of the Visitor class’s Visit methods takes a second parameter of type INode (the base class of all AST nodes), in addition to the normal strictly typed first parameter. The second parameter is there because we need to drag the right hand node along on each Visit call, and then pass corresponding right hand child node(s) to each Accept call. Here’s the partial IVisitor definition:

public interface IVisitor {
    void Visit(MethodDeclaration left, INode right);
    void Visit(ReturnStatement left, INode right);
    void Visit(Operator left, INode right);
    void Visit(Literal left, INode right);
}

Here’s the modified INode.Accept method interface. Note the inclusion of the second parameter, right, which will hold the right hand tree node that corresponds to the left hand AST node. The left node is of course ‘this’.

interface INode {
    void Accept(IVisitor v, INode right);
    ...
}

And all Accept implementations look pretty much identical. Here’s Operator’s. Notice it is dutifully passing that right parameter on to the Visit call?

public class Operator : INode {
    public void Accept(IVisitor v, INode right) {
        v.Visit(this, right);
    }

    public string Op;        // the operator symbol: +, -, *, etc.
    public INode LeftExpr;
    public INode RightExpr;
}

So far it’s been pretty boilerplate Visitor pattern stuff, with the inclusion of the extra INode parameter named right. Now let’s look at the concrete Visitor implementation, which is where the good stuff happens.

If you recall, I said earlier that the actual comparison of two single INodes involves these steps:
1. Confirm the nodes’ types match.
2. Compare each of the node specific property values. Since they are of the same type, they have the same property sets.
3. Recursively call Accept() on the left hand node’s children, passing the right hand node’s children as the second parameter.

In standard Visitor pattern fashion there is an IVisitor.Visit method for every non-abstract INode subclass. Operator’s Visit looks like this[2]:

public class ComparisonVisitor : IVisitor {
    public void Visit(Operator left, INode right) {
        // 1.    Confirm nodes' types match.
        Operator right_operator = right as Operator;
        if(right_operator == null) {
            SetFailure();
            return;
        }

        // 2.    Compare each of the node specific property values.
        if(left.Op != right_operator.Op) {
            SetFailure();
            return;
        }

        // 3.    Recursively call Accept on the left children, passing the right children.
        left.LeftExpr.Accept(this, right_operator.LeftExpr);
        left.RightExpr.Accept(this, right_operator.RightExpr);
    }
    ... // Repeat for the remaining IVisitor implementations
}

This does require adding a new Accept(left,right) method overload on each node. That is, I had to go back and modify the NRefactory AST implementation to make this work. It was one of those moments where I realize, again, how much open source rocks.

Further

This is surprisingly trivial to implement. In fact it was so easy that I got bored doing it and ended up autogenerating all of the tree walking and much of the comparison code. NRefactory has its own generator that creates all of the INode subclasses (Operator, MethodDeclaration, etc.). It lays out the node specific properties, default ctors, and Accept implementations. It even generates some premade concrete Visitor implementations of its own. I hijacked this and hooked in the additional generation of my modified Visitor to the concrete INode subclasses. It also generates the vast majority of my ComparisonVisitor as a partial class. The parts that remain are the node specific property matching (step 2), which lives in node type specific Match(left,right) functions. An example is bool Match(Operator left, Operator right), and it is called right there in the conditional of the step 2 if. I wrote just a handful of those so that I could have a decent subset of C# to carry on with.

As I wrote this blog post it occurred to me that I might be able to auto generate the Match functions too. Clearly enough info is delivered to the generation routines so that they can lay out the properties on the INode subclasses. I can use the same info to autogenerate the Match method of step 2.
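For example, the generated Match functions would amount to little more than this (hand-written here to show what the generator needs to emit):

// Node-specific property comparison (step 2 only); the child comparisons of
// step 3 still happen in the Visit methods.
bool Match(Operator left, Operator right)
{
    return left.Op == right.Op;
}

bool Match(Literal left, Literal right)
{
    return left.Type == right.Type && Equals(left.Val, right.Val);
}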

Part II – Applying refactorings

There you have it, the basic, core clone detection algorithm. It’s so basic, in fact, that it’s going to do no better than a text based match which ignores whitespace. It will not detect a clone like this:

public double Area(double radius) {
   double PI = 3.14;
   return PI*Math.Pow(radius, 2);
}

public double CircleArea(double radius) {
   double pi = 3.14;
   return pi*Math.Pow(radius, 2);
}

The difference in the clones is the name of a local variable. At the beginning I stated that we were starting with the case of identical method clones. Because we are working from that precondition it made our Compare() function extremely easy to implement. But it’s no more productive a match function than a basic text based clone finder. A more useful AST comparison implementation might ignore the names of automatic variables. This is not how I go about it.

The way I do it is to transform the AST in a way that does not change its functionality, yet creates a new AST that can be recompared. We apply so called ‘safe’ AST transformations – transformations that produce a new AST for a method that still takes the same inputs, produces the same outputs, and has the same side effects. The ASTs under consideration can be considered clones if there is a safe transformation – or even a series of safe transformations – that would convert one tree into the other. These “safe AST transformations” are simply common code refactorings. Going back to the example above, we could apply a rename local variable refactoring directly to the AST. A recomparison would show them as equivalent, and that is enough to deem the original methods clones. Each refactoring can be coded and applied in isolation. That will help keep the implementation complexity low.
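A sketch of the shape this takes; the interface and class names here are hypothetical, not the actual Agent Ralph types:

// Hypothetical shape of a 'safe' transformation: it never mutates its input,
// and the tree it returns describes a method with the same inputs, outputs,
// and side effects as the original.
interface ISafeTransformation
{
    // Returns a transformed copy of the method's AST, or null if the
    // transformation does not apply to this method.
    INode Apply(INode method);
}

// For the Area/CircleArea example, a rename-local-variable transformation
// applied to the left tree makes a recomparison succeed:
//   INode candidate = new RenameLocalVariable("PI", "pi").Apply(leftMethod);
//   bool isClone = candidate != null && Compare(candidate, rightMethod);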

Summary

And that’s all there is to it. Apply refactorings to one abstract syntax tree until it matches the other abstract syntax tree, or not. My algorithm at this point is really just brute force. I try all combinations of available refactorings (exactly one at the time of this writing) by applying them to one of the trees until I get a match or exhaust the possibilities. One of my next steps is looking for heuristics that will allow us to reduce the number of refactorings that get performed during the search. For example, if the compare fails due to a name mismatch on a local variable, that fact can be stored and used later to select a candidate refactoring like ‘rename local variable’.
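The search loop is about as naive as it sounds; roughly this, using the hypothetical ISafeTransformation shape sketched above:

// Brute force: try each available refactoring on the left tree until one
// makes it match the right tree, or the possibilities are exhausted.
bool AreClones(INode left, INode right, IEnumerable<ISafeTransformation> refactorings)
{
    if (Compare(left, right))
        return true;

    foreach (ISafeTransformation refactoring in refactorings)
    {
        INode transformed = refactoring.Apply(left);
        if (transformed != null && Compare(transformed, right))
            return true;
    }

    // A smarter version would chain transformations and use hints from the
    // failed comparison (e.g. a local variable name mismatch) to pick them.
    return false;
}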

In the next post I’ll show how I used Extract Method to deal with the methods-only limitation of the tool so far. And, I’ll be writing more refactoring operations and I’m sure I’ll learn some things worth writing about then as well.

Special thanks to my friend Sean for all the proofreading, feedback, skepticism, and challenging questions.

[1] I produced the graph with this nifty tool, using the phrase [MethodDeclaration(Name=Foo,ReturnType=int) [ReturnStatement [Operator(Op=+) Literal(Type=int,Val=7) [Operator(Op=*) [Literal(Type=int,Val=8)][Operator(Op=-) [Literal(Type=int,Val=4)][Literal(Type=int,Val=6)]]]]]].

[2] Why don’t I return bools from the IVisitor methods instead of calling SetFailure() and returning anyway? – Because I reserve the right to continue analyzing in the event of a mismatch. This might be useful when choosing heuristics for later refactoring application.

Code Clones

Code clones are code constructs or bits of functionality that are repeated throughout a system.  It’s a well documented problem.  In short, the issue is duplication of logic.  Take this example:

double Area(double radius)
{
    return 3.14*Math.Pow(radius, 2);
}

double ComputeArea(double radius)
{
    return 3.14*Math.Pow(radius, 2);
}

Two functions, identical in every way except name.  The cost of future modifications to whatever system they reside in increases.  Suffice it to say that when the maintenance programmer decides to increase the precision of the area calculation, he must append digits to the 3.14 constant in at least two places.  When this kind of duplication is allowed it leads to unmaintainable systems.  It is yet another form of unnecessary complexity, and perhaps even the worst.  Take any single instance of unnecessary complexity and it can likely be brushed aside as poor form that is easily corrected.  While that is true, it is the repetition of poor form that ultimately renders our systems unmaintainable.


We usually think of clones as being textually equivalent sections of code (think cut and paste coding), but they can also be functionally equivalent. I define textually equivalent to mean two code sections which are line for line identical (comments and whitespace optionally ignored). Two code blocks are functionally equivalent when their logic is the same even though the text may differ.  Building on our earlier example, here pi is called out explicitly:

double Area(double radius)
{
     const double PI = 3.14;
     return PI*Math.Pow(radius, 2);
}

Here pi is a literal, defined inline.

double ComputeArea(double radius)
{
    return 3.14*Math.Pow(radius, 2);
}

Clearly these are functionally equivalent. For all inputs they produce the same output. They are not textually equivalent. The magic number has been replaced with a constant. As another example, two functions that differ only by local variable naming, where all types and operations are identical, are functionally equivalent.


My general term for correcting code clones is collapsing. Collapsing is the process of safely merging code clones into, or replacing code clones with, a common implementation. The ‘safely’ part implies that a collapsing operation does not modify the expected inputs, outputs, and side effects, as perceived by the clients of the former clone. It is a form of refactoring specifically targeted at the elimination of code duplication. For the last example, the collapsing operation consists of replacing each call to ComputeArea with a call to Area.  Once ComputeArea is no longer used anywhere it can be deleted.
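One safe way to stage that collapse is to first make the clone a thin wrapper over the survivor, then inline it away once every caller has been updated.  For instance:

// Step 1: the clone delegates to the surviving implementation.
double ComputeArea(double radius)
{
    return Area(radius);
}

// Step 2: apply Inline Method (or update call sites by hand) so every caller
// uses Area directly, then delete ComputeArea.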


We do have tools available to us for identifying clones. I have used three good ones: Simian, Clone Detective, and TeamCity. These tools are lacking in two distinct ways. First, they seem to treat clone detection as a text matching problem. Second, they do not provide clone relationship analysis and automated correction assistance.


Text matching clone finders are easily defeated by differences in variable names, whitespace, comments, brace/delimiter placement, and literals.  Differences of these types are inconsequential in most languages and will cause primitive clone detectors to return fewer results, or fragmented result sets (many smaller clones instead of fewer large clones).  Member order within cloned classes also causes result fragmentation.  I will say that all the tools I’ve mentioned do have options that take these inhibitors into account.  However, those issues are relatively easy problems to solve.  Not so easy are things like parameter order, which screws up matches at both the cloned parameter definitions and cloned call sites.


Ultimately, text based clone detectors do not provide the rich result sets that we need to create powerful automated clone collapsing tools.  The alternative to text based matchers that I propose is clone detectors that operate directly on parsed syntax tree representations of source code.  Analyzers operating against syntax structures will facilitate the elimination of the match noise in clean, language specific ways.


The poor state of clone identification being what it is, simple detection is not enough. A proper clone analysis tool should analyze the clone and its context and then present the user with automated options for collapsing and removal. Unfortunately, it is not always as simple as doing a couple of extract method operations with some find and replace.  Consider the clones in these two classes (the lines above and below the calculation of the area variable in each method):

class Cylinder
{
     public double Volume()
     {
          double height = GetHeight();
          double scaling = GetScalingFactor();
          double area = 3.14 * Math.Pow(this.radius, 2);
          double volume =  height * area;
          return volume * scaling;
     }
     ...
}
class Box
{
     public double ComputeVolume()
     {
          double height = GetHeight();
          double scaling = GetScalingFactor();
          double area = this.length * this.width;
          double volume =  height * area;
          return volume * scaling;
     }
     ...
}

The ideal collapse result is to isolate the differing logic into two distinct functional units.  That is, perform an extract method on the unique logic between the clones.  Now it becomes clear that the two classes are collapsible into a single class where the unique logic can be inserted via polymorphism or dependency injection.  Like so:

abstract class Shape
{
     public abstract double ComputeArea();

     public double Volume()
     {
          double height = GetHeight();
          double scaling = GetScalingFactor();
          double area = ComputeArea();
          double volume =  height * area;
          return volume * scaling;
     }
     ...
}
class Cylinder : Shape
{
     public override double ComputeArea()
     {
          return 3.14 * Math.Pow(this.radius, 2);
     }
     ...
}
class Box : Shape
{
     public override double ComputeArea()
     {
           return this.length * this.width;
     }
     ...
}

The contextual analysis of the clones’ relationship to surrounding code and other clones led us to a better result.  Otherwise, the most likely path would have been to jump in with extract methods against the clones themselves.  That would have left us with a number of identical methods instead of blocks, but would not have provided any clear direction on the best way to proceed with collapsing.


It’s worth noting that sometimes collapsing will result in more lines of code.  This does not diminish the value of collapsing.  The extra lines are boilerplate like class, interface, and function definitions.  The value is derived from the reduction and isolation of the formerly duplicated logic.  Additionally, as we increase the number of functional units (methods and classes) we make the code more malleable, because these units provide the boundaries that our existing code manipulation tools tend to operate on.


In my next post I will present some implementation ideas for the rich clone analyzer I have outlined here. In further posts I will discuss how else we might put such a tool to use.

Unnecessary Complexity Case Study #1: Untyped Enum Comparisons

This post is the first in a series of posts on specific examples of unnecessary complexity.  

Consider this code:

enum MyEnum  { First, Second }
public void Match1(MyEnum e)
{
    if (e.ToString() == "First") { }
}

The problem lies in the condition of the if.  The developer is converting an enum instance to a string for the purposes of a comparison, circumventing type safety.  The primary problem is that the statement can no longer safely participate in automated refactorings.  For example, if MyEnum.First is renamed, this code will break such that the condition is always false and the body will never execute.  A Find Usages executed on MyEnum.First would not identify this site.

It also performs worse than the alternatives.  I hesitate to even mention that though because, like most micro performance optimizations, it is only going to matter in uncommon situations like a tight loop or heavy load.

The correct code is this:

if(e == MyEnum.First) { }

The code is now typesafe, fast (it’s a comparison of integers), and refactoring friendly.  And it’s a simple enough fix.  The exact steps are:

  1. Scan the enum values list for a member whose name exactly matches the string literal.
  2. If found, replace the string literal with a reference to the qualified enum value.
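Stripped of the Resharper machinery, the decision logic behind that fix is roughly this sketch (the real quick fix works on the code model inside Resharper rather than building strings, and the helper name here is just for illustration):

using System;
using System.Linq;

static class EnumComparisonFix
{
    // Does the string literal name an actual member of the enum? If so, return
    // the typesafe replacement expression, e.g. "e == MyEnum.First".
    public static string TryBuildTypesafeComparison(Type enumType, string literal, string enumExpression)
    {
        string member = Enum.GetNames(enumType).FirstOrDefault(n => n == literal);
        if (member == null)
            return null;   // no exact match; leave the code alone

        return enumExpression + " == " + enumType.Name + "." + member;
    }
}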

There are some special cases to consider as well.

Here the string is being manipulated before comparing, by calling ToLower():

if(e.ToString().ToLower() == "first") { }

When comparing against the result of a ToLower() or ToUpper() call, the string constant is written in a single case, likely with the intention of reducing typos.  I suspect the programmer considered this defensive programming.
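Handling those variants only changes the lookup: when the comparison goes through ToLower() or ToUpper(), the literal should be matched against the enum member names case-insensitively, while the properly cased member is still what gets emitted into the fixed code.  For example (a sketch, reusing the hypothetical helper above):

using System;
using System.Linq;

// Case-insensitive variant of the lookup from the earlier sketch.
static string FindMemberIgnoringCase(Type enumType, string literal)
{
    return Enum.GetNames(enumType)
               .FirstOrDefault(n => string.Equals(n, literal, StringComparison.OrdinalIgnoreCase));
}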

Unnecessary complexity begets bugs. When your code base is littered with untyped enum comparisons, this mistake is inevitable:

if(e.ToString().ToLower() == "First") { }

The string constant actually matches the declared enum casing, yet due to the ToLower() call the condition is always false, and the branch is never executed.  This permutation is particularly dangerous as the fix revives a branch that hasn’t been executing. You are eliminating a side effect, and a danger of eliminating side effects is breaking other parts of the system that could be dependent on them.

I have created a tool for automating the correction of untyped enum comparisons.  It’s a Resharper plug-in based quick fix and is available as part of my Agent Ralph project.

[Image: Make Enum Comparison Typesafe]

Using Agent Ralph to automate a Make Enum Comparison Typesafe code correction.

Currently it only handles the simple case, pictured above.  I hope to correct that deficiency soon.

Thanks to Mark for the great suggestion on a WordPress friendly code syntax highlighter.

Unnecessary Complexity

A lot of the work I’ve done recently has involved substantial improvements to largish production code bases.  Currently I’m working on a system of .Net apps that seems to have experienced exponential LOC growth in the few years of its development.  Before that I was doing maintenance and upgrade work on an app that started life as some Excel spreadsheets wired together with VB, and over 10+ years (and many programmers) grew into a large Asp.Net web app with an Oracle back end.  Maybe after hearing all that you’re ready to write me off as another jaded maintenance programmer, but hang on, I’m going somewhere with this.  These projects are very different but share some common threads.  They tend to be underperforming.  New feature work is very expensive or nonexistent.  As I spelunk the code bases trying to grok these systems, what I see again and again is what I have come to call unnecessary complexity.

Unnecessary complexity is a code smell, as the refactoring proponents use the term.  I haven’t formed a definition but I do have some examples.  A chunk of code is unnecessarily complex when it is making more method invocations or using more logic constructs than are necessary to complete its intended purpose.  A system is unnecessarily complex when it has half implemented features that are not being further developed.  A component is unnecessarily complex when it has features that are thought to be working and even in active use, yet actually contain a bug or bugs that invalidate the feature’s implementation.  A prime example of this is buggy code wrapped in overzealous exception suppressing try/catch blocks.

How does code get unnecessarily complex?  One reason I have found is developer ignorance of the language/platform and its associated APIs.  This is most pronounced when a new development project is undertaken by developers who are also new to the chosen language and/or platform, even if they are otherwise seasoned developers.  Another reason is code rot introduced during periods of maintenance, particularly when there has long been pressure to complete changes quickly.

So often I see that the two problems I mentioned earlier, under-performance and slow feature development, can be solved with a two step process.  First, reduce the complexity in a series of micro steps.  Keep going until you’ve transformed it into simple, understandable code.  Eventually it becomes obvious where to target your feature or performance changes.  I know this sounds like plain ole refactoring – improving the design of existing code – but it is subtly different.  Is modifying try/catch blocks or reducing if/else nesting really design?  Maybe it’s design in the small.

I’ll be doing a series of posts on specific examples of unnecessary complexity.  I don’t want to create my own little daily wtf, but rather engage in a productive analysis of examples inspired by my daily work, both past and present.  I hope to eventually identify some pseudo-formal patterns and anti-patterns inherent in unnecessarily complex code, but certainly will suggest my own improvements, best practices, and complexity reduction transformations as I go.  Most examples will be at the micro level: statements, blocks, and functions.