<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>blog of josh &#187; Uncategorized</title>
	<atom:link href="http://landofjosh.com/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://landofjosh.com</link>
	<description>software development under the big arch</description>
	<lastBuildDate>Thu, 22 Oct 2009 06:26:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>I have assembled the Triforce</title>
		<link>http://landofjosh.com/2009/10/i-have-assembled-the-triforce/</link>
		<comments>http://landofjosh.com/2009/10/i-have-assembled-the-triforce/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 06:26:24 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bespin]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[mercurial]]></category>
		<category><![CDATA[teamcity]]></category>

		<guid isPermaLink="false">http://landofjosh.com/?p=213</guid>
		<description><![CDATA[Recently I put in the final piece of a mini project I&#8217;d been thinking about for a while.  I made a change to an Agent Ralph source file and committed it to my Mercurial repo.  Within about a minute I was browsing the MbUnit test results.  Now that may not sound very exciting, but it [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I put in the final piece of a mini project I&#8217;d been thinking about for a while.  I made a change to an <a href="http://agentralphplugin.googlecode.com/"><span style="color: #000000;"><span style="text-decoration: none;">Agent Ralph</span></span></a> source file and committed it to my <a href="http://mercurial.selenic.com/wiki/"><span style="color: #000000;"><span style="text-decoration: none;">Mercurial</span></span></a> repo.  Within about a minute I was browsing the <a href="http://www.mbunit.com/"><span style="color: #000000;"><span style="text-decoration: none;">MbUnit</span></span></a> test results.  Now that may not sound very exciting, but it is.  It is because all of this was conducted online, through my browser, on a pc that didn&#8217;t have Mercurial, Visual Studio, or any development tools whatsoever.  I was coding in the cloud, baby.</p>
<div style="margin-top: 0px; margin-bottom: 0px; text-align: left;">
<p>Now with Agent Ralph I have boiled my unit tests down to the point where they are just code snippets.  Adding a new test case is as simple as dropping a C# file into a directory.  It is automatically picked up and fed into a test harness, which parses the code and gets all Ralphy on it.  Here&#8217;s a sample, right out of the repo:</p>
<pre name="code" class="csharp">using System;
namespace AgentRalph.CloneCandidateDetectionTestData
{
    public class CloneInForeachBlock
    {
        void Target()
        {
            foreach (int i in new int[] { })
            {
                /* BEGIN */
                Console.WriteLine(7);
                /* END */
            }
        }

        private void Expected()
        {
            Console.WriteLine(7);
        }
    }
}</pre>
<p>This sample comes from the <a href="http://code.google.com/p/agentralphplugin/source/browse/#hg/Ralph.NRefactory/CloneCandidateDetectionTestData%3Fstate%3Dclosed">CloneCandidateDetectionTestData</a> folder.  Any file in that folder is assumed to hold a class containing at least two methods, Target and Expected.  Target is scanned for clones, and the test passes if a clone is found that matches Expected AND consists of all the code between the BEGIN and END &#8216;markup&#8217; embedded in the comments.  Thanks to the magic of MbUnit&#8217;s generative tests, each code file appears as its own test, as if each were a method with its own [Test] attribute.  So you see, whipping up new tests is crazy easy.</p>
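<p>To make the markup convention concrete, here is a minimal sketch of pulling out the code between the two comment markers.  It is only an illustration: the real harness works on parsed code rather than raw text, and the names MarkerDemo and ExtractMarkedRegion are made up for this example, not taken from the Agent Ralph source.</p>
<pre name="code" class="csharp">using System;

public static class MarkerDemo
{
    // Hypothetical helper: returns the text between the BEGIN and END comment markers.
    public static string ExtractMarkedRegion(string source)
    {
        const string begin = "/* BEGIN */";
        const string end = "/* END */";

        int start = source.IndexOf(begin, StringComparison.Ordinal);
        if (start == -1) throw new ArgumentException("No BEGIN marker found.");
        start += begin.Length;

        // Look for END only after the BEGIN marker.
        int stop = source.IndexOf(end, start, StringComparison.Ordinal);
        if (stop == -1) throw new ArgumentException("No END marker found.");

        return source.Substring(start, stop - start).Trim();
    }

    public static void Main()
    {
        string source = "foreach (int i in new int[] { }) { /* BEGIN */ Console.WriteLine(7); /* END */ }";
        Console.WriteLine(ExtractMarkedRegion(source)); // prints: Console.WriteLine(7);
    }
}</pre>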
<p>Occasionally an idea for a test case will come to me, and it&#8217;s usually when I&#8217;m at a place where I have no access to Agent Ralph code.  Like work.  Typically I&#8217;d make some notes in an email and code it up when I got home.  It got me thinking, I don&#8217;t really need Visual Studio and the whole dev setup to create these test cases.  They&#8217;re just simple code snippets, scraped out of a directory.  I could be scraping them from anywhere, like off a wiki even.  The next thought of course was to run that test automatically.  How could I put this all together?</p>
<div style="margin-top: 0px; margin-bottom: 0px;">
<p>Building and running tests is easy; any continuous integration server would do.  I chose <a href="http://www.jetbrains.com/teamcity/">TeamCity</a>, which we&#8217;ve been using at work and is just great.  I can&#8217;t say enough nice things about it.  For this mini-project, its easy third-party report integration and build artifact downloading features were exactly what I wanted.  A little dyndns magic and I had it <a href="http://jbuedel.isa-geek.net">online</a>.  My MbUnit <a href="http://jbuedel.isa-geek.net/viewLog.html?buildId=54&amp;buildTypeId=bt2&amp;tab=Gallio_Test_Report">tests</a> look pretty nice I think.</p>
<p>At some point I stumbled across Bespin.  <a href="https://bespin.mozilla.com/">Bespin</a> is a Mozilla project that &#8220;proposes an open extensible web-based framework for code editing&#8221;.  It&#8217;s a web based, code centric text editor, and more.  One part of the more is version control integration.  Mercurial is supported, which is what I use for Agent Ralph.  I can pull down the code, hack on it, and push changes back out to the Google code hosted repository, all right in the browser.  There&#8217;s the missing piece, my editor.</p>
<p>And of course, <a href="http://code.google.com/">Google Code</a> is the glue that brings them together.</p></div>
<p>And there you have it, coding in the cloud!  This graphic ought to help you fully grasp the awesomeness.</p>
<p><img class="aligncenter size-full wp-image-214" title="MyForce" src="http://landofjosh.com/wp-content/uploads/2009/10/MyForce.png" alt="MyForce" width="300" height="300" /></p></div>
]]></content:encoded>
			<wfw:commentRss>http://landofjosh.com/2009/10/i-have-assembled-the-triforce/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Oscillating Shrinking Window</title>
		<link>http://landofjosh.com/2009/08/the-oscillating-shrinking-window/</link>
		<comments>http://landofjosh.com/2009/08/the-oscillating-shrinking-window/#comments</comments>
		<pubDate>Sat, 29 Aug 2009 19:08:12 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://landofjosh.com/?p=199</guid>
		<description><![CDATA[I talked previously about how Agent Ralph identifies functionally equivalent methods.  It also can find clones embedded within a method, as shown at the end of my last post.  This time I&#8217;ll talk a bit about how that works. Agent Ralph&#8217;s preferred unit of comparison is the method.  The comparison implementation accepts two methods and walks their [...]]]></description>
			<content:encoded><![CDATA[<p>I talked <a id="dz5c" title="previously" href="http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/">previously</a> about how Agent Ralph identifies functionally equivalent methods.  It also can find clones embedded within a method, as shown at the end of my <a href="http://landofjosh.com/2009/08/agent-ralph-in-action/">last post</a>.  This time I&#8217;ll talk a bit about how that works.</p>
<p>Agent Ralph&#8217;s preferred unit of comparison is the method.  The comparison implementation accepts two methods and walks their ASTs, comparing each node.  To find clones embedded within a method, it must somehow convert a subset of the method&#8217;s statements into a new method.  This provisional method need only exist for the lifetime of the comparison.  If the provisional method matches some other existing method then the set of statements under consideration is a clone.</p>
<p>So, how to create that provisional method?  If you guessed the Extract Method refactoring then indeed you are correct.  Agent Ralph will scan a method from top to bottom, choosing subsets of contiguous statements to perform an Extract Method on.  If the Extract Method operation is successful and the new method matches some other method, we&#8217;ve found a clone.</p>
<p>A &#8220;window&#8221; is a set of contiguous statements.  We start with the largest possible window, which consists of all the statements except the last.  (Using all the statements would simply create an identical function.)  Next, perform an Extract Method operation, which creates the provisional method, and compare.  Then proceed to the next window and repeat.  The next window is determined by shifting the window down one statement.  In other words, we now include the last statement and exclude the first statement.  Extract Method, compare, continue.  At this point we&#8217;ve run out of windows at this size, so we shrink the window size by one and start at the top again.  The pass with a window size of 1 is the last iteration.</p>
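<p>The order in which windows get visited can be sketched in a few lines.  This is a simplified illustration rather than the Agent Ralph implementation: it only prints statement index ranges where the real tool would run Extract Method and compare, and the names WindowDemo and EnumerateWindows are invented for the example.</p>
<pre name="code" class="csharp">using System;

public static class WindowDemo
{
    // Visits every window for a method with statementCount statements and
    // returns how many windows were visited.  A window is printed as the
    // 0-based, inclusive range of statement indices it covers.
    public static int EnumerateWindows(int statementCount)
    {
        int visited = 0;

        // Largest window first: every statement except one.
        for (int size = statementCount - 1; size >= 1; size--)
        {
            // Slide the window down one statement at a time while it still fits.
            for (int first = 0; statementCount >= first + size; first++)
            {
                int last = first + size - 1;
                // The real tool would run Extract Method on this range and compare.
                Console.WriteLine("size " + size + ": statements " + first + ".." + last);
                visited++;
            }
        }
        return visited;
    }

    public static void Main()
    {
        // Four statements give 2 + 3 + 4 = 9 windows.
        Console.WriteLine(EnumerateWindows(4) + " windows");
    }
}</pre>
<p>For a flat list of n statements this visits n(n+1)/2 - 1 windows, so the count grows quadratically even before the cost of each Extract Method and comparison is added in.</p>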
<p>I call this algorithm the Oscillating Shrinking Window.  An illustrative graphic is in order, and nothing says cutting edge technology like an animated gif:</p>
<div>
<div id="u1ze"><img src="http://docs.google.com/File?id=ajctmzgpsdbh_118fzzc2vcs_b" alt="" /></div>
</div>
<div></div>
<p>Each yellow block encompasses a set of statements that will have an Extract Method performed on them.</p>
<p>Some statements can of course contain sets of statements as children: if statements, for loops, etc.  This requires recursing into the children of said statements and repeating the algorithm.  Another graphic:</p>
<div>
<div id="oqy0"><img src="http://docs.google.com/File?id=ajctmzgpsdbh_115dcx39tg4_b" alt="" /></div>
</div>
<p>As you might guess, this is very computationally expensive.  Indeed, the algorithmic complexity is <em>at least</em> O(n^3) where n is the number of AST nodes.  The <em>at least</em> qualifier is on there because I didn&#8217;t prove it for all cases.  I chose a single particular case and analyzed that (with some help[1]).  The case was a tree of n nodes where the root node has n-1 child nodes.  (Think very wide and very shallow.)  Heuristics will play a large part in making this actually usable.</p>
<p>[1]    I brought in a consultant.  <a id="x4q4" title="Sean" href="http://seanfoy.blogspot.com/">Sean</a> helped me with the math in exchange for one large <a id="ve8:" title="Franco Cajun Pride" href="http://www.joaniespizzeria.com/pizza_page.htm">Franco Cajun Pride</a> and a Diet Dr. Pepper.</p>
]]></content:encoded>
			<wfw:commentRss>http://landofjosh.com/2009/08/the-oscillating-shrinking-window/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Agent Ralph In Action</title>
		<link>http://landofjosh.com/2009/08/agent-ralph-in-action/</link>
		<comments>http://landofjosh.com/2009/08/agent-ralph-in-action/#comments</comments>
		<pubDate>Wed, 05 Aug 2009 05:15:51 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[agent ralph]]></category>
		<category><![CDATA[code-clones]]></category>
		<category><![CDATA[resharper]]></category>

		<guid isPermaLink="false">http://landofjosh.com/?p=169</guid>
		<description><![CDATA[I&#8217;ve been yack yack yacking about clone detection and Agent Ralph.  It&#8217;s time to put up or shut up.  This post is some screen shots of Agent Ralph in action. Agent Ralph&#8216;s front end is a Resharper plug-in.  Any clones detected are passed up to the plug-in which presents them to the user as highlights and [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been yack yack yacking about clone detection and Agent Ralph.  It&#8217;s time to put up or shut up.  This post is some screen shots of Agent Ralph in action.</p>
<p><a title="Agent Ralph Project" href="http://code.google.com/p/agentralphplugin/">Agent Ralph</a>&#8216;s front end is a <a title="JetBrain's Resharper" href="http://www.jetbrains.com/resharper/">Resharper</a> plug-in.  Any clones detected are passed up to the plug-in which presents them to the user as highlights and quick fixes.  This is how we achieve the automated repair that a modern clone tool needs.  The backend scans source files handed to it by the front end, using the techniques I&#8217;ve been <a href="http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/">blogging about</a>.  Specifically, clones are identified by comparing abstract syntax trees of methods.  The ASTs may be modified by the application of safe refactorings (refactorings that do not change the inputs, outputs, or side effects).  If an AST can be safely coerced until it matches another then we can consider the originals functionally equivalent clones.  This technique will detect clones that would otherwise be overlooked by text based clone finders.</p>
<p>So, here&#8217;s the basic case.  Two identical methods:</p>
<p><img class="size-full wp-image-170 alignnone" title="identicalmethodshighlight-cropped" src="http://landofjosh.com/wp-content/uploads/2009/08/identicalmethodshighlight-cropped.png" alt="identicalmethodshighlight-cropped" width="436" height="195" /></p>
<p>Note the Resharper squigglies telling us something is up.  Passing the mouse over either method name brings up a tooltip identifying the method as a clone of the other.</p>
<p>Placing the cursor on the method name prompts you with a <a title="Resharper Quick Fixes" href="http://www.jetbrains.com/resharper/features/code_analysis.html#Quick-Fixes">quick fix</a>&#8230;</p>
<p><img class="alignnone size-full wp-image-173" title="identicalmethodsquickfix-cropped" src="http://landofjosh.com/wp-content/uploads/2009/08/identicalmethodsquickfix-cropped.png" alt="identicalmethodsquickfix-cropped" width="471" height="229" /></p>
<p>&#8230;and invoking it&#8230;</p>
<p><img class="alignnone size-full wp-image-172" title="identicalmethodsquickfixapplied-cropped" src="http://landofjosh.com/wp-content/uploads/2009/08/identicalmethodsquickfixapplied-cropped.png" alt="identicalmethodsquickfixapplied-cropped" width="455" height="198" /></p>
<p>&#8230;replaces the body of the clone with a call to the original.  That&#8217;s automated clone repair!  An inline method applied to Test1 will complete the removal.</p>
<p>The next methods are identical, but only if a rename local variable refactoring is applied.  And indeed you can see that it is, indicated by the highlighting and quickfix offering.</p>
<p><img class="size-full wp-image-175 alignnone" title="clonewithrenamelocal-cropped" src="http://landofjosh.com/wp-content/uploads/2009/08/clonewithrenamelocal-cropped.png" alt="clonewithrenamelocal-cropped" width="458" height="265" /></p>
<p>The last example is one I am particularly proud of.   Here we are detecting a clone that is a block within a larger method.  Methods EmbeddedClone1 and EmbeddedClone2 both contain clones of Test2.</p>
<p><img class="alignnone size-full wp-image-179" title="embeddedclonequickfix-cropped" src="http://landofjosh.com/wp-content/uploads/2009/08/embeddedclonequickfix-cropped.png" alt="embeddedclonequickfix-cropped" width="470" height="415" /></p>
<p>Thus far I&#8217;ve restricted myself to using methods as the only unit of comparison.  Doing so made it easier to reason about and implement things as I worked through ideas.  At some point I realized that I could use an extract method refactoring to create provisional methods from arbitrary code blocks on the fly.  If the provisional method is a clone then it follows that the original code block is a clone.  In this way I can continue to think and code in terms of methods, yet rely on the extract method refactoring to apply my algorithms to sub-units of a method (aka arbitrary blocks and statements).</p>
]]></content:encoded>
			<wfw:commentRss>http://landofjosh.com/2009/08/agent-ralph-in-action/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>An Idea For Robust Clone Detection Using Abstract Syntax Trees</title>
		<link>http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/</link>
		<comments>http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/#comments</comments>
		<pubDate>Sun, 19 Jul 2009 20:09:50 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[code-clones]]></category>

		<guid isPermaLink="false">http://landofjosh.com/?p=145</guid>
		<description><![CDATA[My last post concluded with the promise to go into detail on some implementation ideas of my clone analyzer. As I argued previously, a text based matching tool is not good enough, it&#8217;s simply too easy to fool. What we want is a matching tool that considers the full syntax of the language being analyzed. [...]]]></description>
			<content:encoded><![CDATA[<p>My last <a href="http://landofjosh.com/?p=77">post</a> concluded with the promise to go into detail on some implementation ideas of my clone analyzer.</p>
<p>As I argued previously, a text based matching tool is not good enough; it&#8217;s simply too easy to fool. What we want is a matching tool that considers the full syntax of the language being analyzed.   That leads us to a solution based on the analysis of abstract syntax trees (ASTs).</p>
<p>ASTs can be generated easily from a partial compilation of the files under analysis.  Basic comparison is easy too.  It&#8217;s a straightforward tree walking algorithm.  There are many ways to do it, and I&#8217;ll go into my implementation later.  Where things get interesting is when we consider ASTs that are functionally equivalent but not identical.  That is, ASTs that differ in unimportant ways.  The initial impulse is to begin &#8216;relaxing&#8217; the tree walking comparison.  I.e., ignore things like local variable names, parameter order, and other obvious irrelevancies to the concern of functional equivalence.  Instead I propose that we attempt to refactor one tree and see if we can transform it into an AST that does match. If so, we can conclude the original ASTs match.  We don&#8217;t actually need to perform these refactorings &#8216;for real&#8217;.  Knowing the safe transform exists is enough for a clone repair tool to replace one method with the other.</p>
<p>Now, my theory here can be divided into two distinct parts.  First, we need to be able to tell when any two methods are completely identical.  Second, if we can take two non-matching methods and automatically apply a series of safe refactorings that will convert one into an identical match of the other, then we can conclude that the original methods are clones.</p>
<p><strong>Part I &#8211; Matching Identical ASTs</strong></p>
<p>The initial match algorithm is a careful tree traversal with a node for node comparison which exits at the first mismatch.  For this part of the project I relied on the open source and very nice NRefactory project.  It includes a C# parser, among other useful stuff.  Thanks to the availability of source and decent examples I was able to get up and running very quickly.</p>
<p>The first step is to get an AST by passing the class file to a Parse() function.  One caveat of my implementation is that it will not work on code that does not compile.  When Parse() encounters a syntax error it returns null.  In practice, I don&#8217;t anticipate this limitation having much effect on usefulness.</p>
<p>The AST generated from this method&#8230;</p>
<pre name="code" class="csharp">
int Foo() {
    return 7 + 8 * (4 - 6);
}
</pre>
<p>&#8230;looks like this[1]:<br />
<img src="http://landofjosh.com/wp-content/uploads/2009/07/syntax_tree.png" alt="syntax_tree" title="syntax_tree" width="554" height="208" class="aligncenter size-full wp-image-160" /><br />
There are a couple of things to note about these ASTs.  Each node has a distinct type like MethodDeclaration, ReturnStatement, Operator, Literal, etc.  Some nodes also have other properties.  For example, Operators have an Op property that is (in this example) one of +, -, or *.  Literals hold the literal value in the property named Val (&#8216;7&#8217;, &#8216;8&#8217;, &#8216;4&#8217;, and &#8216;6&#8217; here), and the type of that value, called Type.</p>
<p>We now need to compare the ASTs.  Let&#8217;s call the function to do this Compare(left_tree, right_tree):bool.  Starting at the root node (in this case, a MethodDeclaration node) of the left hand tree we begin walking that tree.  At each left hand node we compare to the corresponding right hand node.  The individual node comparison first checks that the node types match (both are Operators, both are Literals, etc.).  Then it compares the values of each of the node properties.  At this point we have confirmed the node matches and we can proceed to its children.</p>
<p>The actual implementation is based on a slightly modified <a href="http://en.wikipedia.org/wiki/Visitor_pattern">Visitor</a> pattern.  Each of the Visitor class&#8217;s Visit methods takes a second parameter of type INode (base class of all AST nodes), in addition to the normal strictly typed first parameter.  The second parameter is there because we need to drag the right hand node along on each Visit call, and then pass corresponding right hand child node(s) to each Accept call.  Here&#8217;s the partial IVisitor definition:</p>
<pre  name="code" class="csharp">
public interface IVisitor {
    void Visit(MethodDeclaration left, INode right);
    void Visit(ReturnStatement left, INode right);
    void Visit(Operator left, INode right);
    void Visit(Literal left, INode right);
}
</pre>
<p>Here&#8217;s the modified INode.Accept method interface.  Note the inclusion of the second parameter, right, which will hold the right hand tree node that corresponds to the left hand AST node.  The left node is of course &#8216;this&#8217;.</p>
<pre name="code" class="csharp">
interface INode {
    void Accept(IVisitor v, INode right);
    ...
}
</pre>
<p>And all Accept implementations look pretty much identical.  Here&#8217;s Operator&#8217;s.  Notice it is dutifully passing that right parameter on to the Visit call?</p>
<pre name="code" class="csharp">
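public class Operator : INode {
    // Must be public to implement the INode interface method.
    public void Accept(IVisitor v, INode right) {
        v.Visit(this, right);
    }

    public string Op;        // the operator itself; string is an assumed type here
    public INode LeftExpr;   // left operand
    public INode RightExpr;  // right operand
}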
</pre>
<p>So far it&#8217;s been pretty boiler plate Visitor pattern stuff, with the inclusion of the extra INode parameter named right.  Now let&#8217;s look at the concrete Visitor implementation which is where the good stuff happens.</p>
<p>If you recall, I said earlier that the actual comparison of two single INodes involves these steps:<br />
1.    Confirm the nodes&#8217; types match.<br />
2.    Compare each of the node specific property values.  Since they are of the same type, they have the same property sets.<br />
3.    Recursively call Accept() on the left hand node&#8217;s children, passing the right hand node&#8217;s children as the second parameter.</p>
<p>In standard Visitor pattern fashion there is an IVisitor.Visit method for every non-abstract INode subclass.  Operator&#8217;s Visit looks like this[2]:</p>
<pre name="code" class="csharp">
public class ComparisonVisitor : IVisitor {
    public void Visit(Operator left, INode right) {
        // 1.    Confirm nodes' types match.
        Operator right_operator = right as Operator;
        if(right_operator == null) {
            SetFailure();
            return;
        }

        // 2.    Compare each of the node specific property values.
        if(left.Op != right_operator.Op) {
            SetFailure();
            return;
        }

        // 3.    Recursively call Accept on the left children, passing the right children.
        left.LeftExpr.Accept(this, right_operator.LeftExpr);
        left.RightExpr.Accept(this, right_operator.RightExpr);
    }
    ... // Repeat for the remaining IVisitor implementations
}
</pre>
<p>This does require adding a new Accept(left,right) method overload on each node.  That is, I had to go back and modify the NRefactory AST implementation to make this work.  It was one of those moments where I realize, again, how much open source rocks. </p>
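<p>For anyone who wants to see the whole thing run end to end, here is a stripped down, self-contained sketch.  The node classes are tiny stand-ins written for this example (not NRefactory&#8217;s real types), the string fields are assumptions, and the Compare driver is just one plausible way to wire the visitor up.</p>
<pre name="code" class="csharp">using System;

public interface INode { void Accept(IVisitor v, INode right); }

public interface IVisitor
{
    void Visit(Operator left, INode right);
    void Visit(Literal left, INode right);
}

// Simplified stand-in for a literal node.
public class Literal : INode
{
    public string Val;
    public Literal(string val) { Val = val; }
    public void Accept(IVisitor v, INode right) { v.Visit(this, right); }
}

// Simplified stand-in for a binary operator node.
public class Operator : INode
{
    public string Op;
    public INode LeftExpr;
    public INode RightExpr;
    public Operator(string op, INode leftExpr, INode rightExpr)
    {
        Op = op; LeftExpr = leftExpr; RightExpr = rightExpr;
    }
    public void Accept(IVisitor v, INode right) { v.Visit(this, right); }
}

public class ComparisonVisitor : IVisitor
{
    public bool Failed;
    void SetFailure() { Failed = true; }

    public void Visit(Literal left, INode right)
    {
        Literal rightLiteral = right as Literal;                 // step 1: same node type?
        if (rightLiteral == null || left.Val != rightLiteral.Val) // step 2: same property values?
            SetFailure();
    }

    public void Visit(Operator left, INode right)
    {
        Operator rightOperator = right as Operator;              // step 1
        if (rightOperator == null || left.Op != rightOperator.Op) // step 2
        {
            SetFailure();
            return;
        }
        left.LeftExpr.Accept(this, rightOperator.LeftExpr);      // step 3: recurse into children
        left.RightExpr.Accept(this, rightOperator.RightExpr);
    }
}

public static class CompareDemo
{
    // Walks the left tree, dragging the right tree along, and reports whether they match.
    public static bool Compare(INode left, INode right)
    {
        ComparisonVisitor visitor = new ComparisonVisitor();
        left.Accept(visitor, right);
        return !visitor.Failed;
    }

    public static void Main()
    {
        INode a = new Operator("+", new Literal("7"), new Literal("8"));
        INode b = new Operator("+", new Literal("7"), new Literal("8"));
        INode c = new Operator("*", new Literal("7"), new Literal("8"));
        Console.WriteLine(Compare(a, b)); // True
        Console.WriteLine(Compare(a, c)); // False
    }
}</pre>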
<p><strong>Further</strong></p>
<p>This is surprisingly trivial to implement.  In fact it was so easy that I got bored doing it and ended up autogenerating all of the tree walking and much of the comparison code.  NRefactory has its own generator that creates all of the INode subclasses (Operator, MethodDeclaration, etc.). It lays out the node specific properties, default ctors, and Accept implementations.  It even generates some premade concrete Visitor implementations of its own.  I hijacked this and hooked in the additional generation of my modified Visitor to the concrete INode subclasses.  It also generates the vast majority of my ComparisonVisitor as a partial class.  The parts that remain are from the node specific property matching (step 2) which lives in node type specific Match(left,right) functions.  An example is bool Match(Operator left, Operator right), and it is called right there in the conditional of the step 2 if.  I wrote just a handful of those so that I could have a decent subset of C# to carry on with.</p>
<p>As I wrote this blog post it occurred to me that I might be able to auto generate the Match functions too.  Clearly enough info is delivered to the generation routines so that they can lay out the properties on the INode subclasses.  I can use the same info to autogenerate the Match method of step 2.</p>
<p><strong>Part II &#8211; Applying refactorings</strong></p>
<p>There you have it, the basic, core clone detection algorithm.  It&#8217;s so basic in fact, that it&#8217;s going to do no better than a text based match which ignores whitespace.  It will not detect a clone like this:</p>
<pre  name="code" class="csharp">public double Area(double radius) {
   double PI = 3.14;
   return PI*Math.Pow(radius, 2);
}

public double CircleArea(double radius) {
   double pi = 3.14;
   return pi*Math.Pow(radius, 2);
}</pre>
<p>The difference in the clones is the name of a local variable.  At the beginning I stated that we were starting with the case of identical method clones.  Because we are working from that precondition it made our Compare() function extremely easy to implement.  But it is actually no more productive a match function than a basic text-based clone finder.  A more useful AST comparison implementation might ignore the names of automatic variables.  This is not how I go about it.</p>
<p>The way I do it is to transform the AST in a way that does not change its functionality, yet creates a new AST that can be recompared.  We apply so called &#8216;safe&#8217; AST transformations &#8211; transformations that produce new ASTs yet the methods take the same inputs, produce the same outputs, and have not had their side effects modified.  The ASTs under consideration can be considered clones if there is a safe transformation &#8211; or even a series of safe transformations &#8211; that would convert one tree into the other.  These &#8220;safe AST transformations&#8221; are simply common code refactorings.  Going back to the example above, we could apply a rename local variable refactoring directly to the AST.  A recomparison would show them as equivalent, and that is enough to deem the original methods clones.  Each refactoring can be coded and applied in isolation.  That will help keep the implementation complexity low.</p>
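<p>Here is a rough sketch of the transform-then-recompare idea applied to the two area methods.  To keep it short, a flat array of tokens stands in for the AST, which is a big simplification of what the tool really does.  RenameDemo, RenameLocal and SameTokens are hypothetical names used only for this illustration.</p>
<pre name="code" class="csharp">using System;

public static class RenameDemo
{
    // Stand-in for a rename local variable refactoring: returns a copy of the
    // tokens with every occurrence of oldName replaced by newName.
    public static string[] RenameLocal(string[] tokens, string oldName, string newName)
    {
        // The rename is only safe if the new name is not already in use.
        if (Array.IndexOf(tokens, newName) != -1)
            throw new InvalidOperationException("Rename would collide with an existing name.");

        string[] result = (string[])tokens.Clone();
        for (int i = 0; i != result.Length; i++)
        {
            if (result[i] == oldName) result[i] = newName;
        }
        return result;
    }

    // Stand-in for Compare(): an exact, token for token match.
    public static bool SameTokens(string[] a, string[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i != a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    public static void Main()
    {
        string[] area = { "double", "PI", "=", "3.14", ";", "return", "PI", "*", "Math.Pow", "(", "radius", ",", "2", ")", ";" };
        string[] circleArea = { "double", "pi", "=", "3.14", ";", "return", "pi", "*", "Math.Pow", "(", "radius", ",", "2", ")", ";" };

        Console.WriteLine(SameTokens(area, circleArea));                          // False
        Console.WriteLine(SameTokens(RenameLocal(area, "PI", "pi"), circleArea)); // True
    }
}</pre>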
<p><strong>Summary</strong></p>
<p>And that&#8217;s all there is to it.  Apply refactorings to one abstract syntax tree until it matches the other abstract syntax tree, or not.  My algorithm at this point is really just brute force.  I try all combinations of available refactorings (exactly one at the time of this writing) by applying them to one of the trees until I get a match or exhaust the possibilities.  One of my next steps is looking for heuristics that will allow us to reduce the number of refactorings that get performed during the search.  For example, if the compare fails due to a name mismatch on a local variable then it can store that fact for use in later selecting a candidate refactoring like &#8216;rename local variable&#8217;. </p>
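<p>The brute force search might look something like the sketch below.  Everything in it is an assumption made for illustration: code is modeled as a plain string instead of an AST, the two toy refactorings are naive string replacements, and the names Refactoring, SearchDemo and IsClone do not come from the Agent Ralph source.</p>
<pre name="code" class="csharp">using System;

// A refactoring is modeled as a function from code to code (a stand-in for AST to AST).
public delegate string Refactoring(string code);

public static class SearchDemo
{
    // Tries every sequence of at most maxSteps refactorings, depth first,
    // until the candidate matches the target or the possibilities run out.
    public static bool IsClone(string candidate, string target, Refactoring[] refactorings, int maxSteps)
    {
        if (candidate == target) return true;
        if (maxSteps == 0) return false;

        foreach (Refactoring refactor in refactorings)
        {
            string next = refactor(candidate);
            if (next == candidate) continue; // refactoring did not apply, nothing new to explore
            if (IsClone(next, target, refactorings, maxSteps - 1)) return true;
        }
        return false;
    }

    public static void Main()
    {
        // Toy refactorings on strings, purely for illustration.
        Refactoring[] toys = {
            code => code.Replace("PI", "pi"), // "rename local variable"
            code => code.Replace("(r)", "r")  // "remove redundant parentheses"
        };

        Console.WriteLine(IsClone("return PI * (r);", "return pi * r;", toys, 2)); // True
        Console.WriteLine(IsClone("return PI * (r);", "return pi * r;", toys, 1)); // False
    }
}</pre>
<p>The maxSteps bound is what keeps the search finite.  A heuristic like the one described above would reorder or prune the refactorings array before each recursive call.</p>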
<p>In the next post I&#8217;ll show how I used Extract Method to deal with the methods-only limitation of the tool so far.  And, I&#8217;ll be writing more refactoring operations and I&#8217;m sure I&#8217;ll learn some things worth writing about then as well.</p>
<p><em>Special thanks to my friend <a href="http://seanfoy.blogspot.com/">Sean</a> for all the proofreading, feedback, skepticism, and challenging questions.</em></p>
<p>[1]   I produced the graph with <a href="http://ironcreek.net/phpsyntaxtree/">this nifty tool</a>, using the phrase [MethodDeclaration(Name=Foo,ReturnType=int) [ReturnStatement [Operator(Op=+) Literal(Type=int,Val=7) [Operator(Op=*) [Literal(Type=int,Val=8)][Operator(Op=-) [Literal(Type=int,Val=4)][Literal(Type=int,Val=6)]]]]]].</p>
<p>[2]  Why don&#8217;t I return bools from the IVisitor methods instead of calling SetFailure() and returning anyway?  &#8211; Because I reserve the right to continue analyzing in the event of a mismatch.  This might be useful when choosing heuristics for later refactoring application.</p>
]]></content:encoded>
			<wfw:commentRss>http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
