<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: An Idea For Robust Clone Detection Using Abstract Syntax Trees</title>
	<atom:link href="http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/feed/" rel="self" type="application/rss+xml" />
	<link>http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/</link>
	<description>software development under the big arch</description>
	<lastBuildDate>Sat, 30 Jan 2010 20:20:08 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: josh</title>
		<link>http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/comment-page-1/#comment-91</link>
		<dc:creator>josh</dc:creator>
		<pubDate>Sat, 30 Jan 2010 20:20:08 +0000</pubDate>
		<guid isPermaLink="false">http://landofjosh.com/?p=145#comment-91</guid>
		<description>Ira,

I didn&#039;t realize that there was an exact match following the hashed match.  I agree, my false positive concern is misplaced.

I am looking forward to reading your papers.

Josh</description>
		<content:encoded><![CDATA[<p>Ira,</p>
<p>I didn&#8217;t realize that there was an exact match following the hashed match.  I agree, my false positive concern is misplaced.</p>
<p>I am looking forward to reading your papers.</p>
<p>Josh</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ira Baxter</title>
		<link>http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/comment-page-1/#comment-90</link>
		<dc:creator>Ira Baxter</dc:creator>
		<pubDate>Thu, 28 Jan 2010 04:33:46 +0000</pubDate>
		<guid isPermaLink="false">http://landofjosh.com/?p=145#comment-90</guid>
		<description>Well, the algorithm only uses hashes to find possible matches, and then compares them exactly.  So it detects exact clones without error.
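
A minimal sketch of that two-step idea, in Python. Everything here is an assumption made for illustration: the tuple-based toy AST, the deliberately weak hash, and the names subtrees, weak_hash, and find_exact_clones. It is not the actual implementation, only the general pattern of hashing to find candidates and then comparing them exactly.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy AST: a node is a tuple (label, child, child, ...);
# leaf values such as names and literals are plain strings.

def subtrees(tree):
    # Yield the tree itself and every nested subtree.
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from subtrees(child)

def weak_hash(tree):
    # Deliberately weak hash (label plus child count), so collisions
    # happen and the exact comparison below has real work to do.
    return hash((tree[0], len(tree) - 1))

def find_exact_clones(trees):
    # Step 1: bucket every subtree by its hash.
    buckets = defaultdict(list)
    for root_index, root in enumerate(trees):
        for sub in subtrees(root):
            buckets[weak_hash(sub)].append((root_index, sub))
    # Step 2: compare exactly, but only inside each bucket.
    clones = []
    for candidates in buckets.values():
        for (i, a), (j, b) in combinations(candidates, 2):
            if a == b:
                clones.append((i, j, a))
    return clones

f1 = ("add", ("num", "1"), ("var", "x"))
f2 = ("mul", ("num", "2"), ("add", ("num", "1"), ("var", "x")))
clones = find_exact_clones([f1, f2])
```

Because every reported pair has passed the exact comparison, a hash collision can cost time but cannot produce a false positive. Note that this sketch also reports the subtrees of a larger clone as clones in their own right.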

It also detects &quot;near miss&quot; clones, which you can think of as parameterized code, e.g., if you made a macro out of a block of code and replaced well-formed sections, you&#039;d end up with a parameterized clone.
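
To illustrate what a parameterized clone looks like, here is a small Python sketch that walks two toy trees in parallel and replaces each differing part with a numbered placeholder. The tuple-based tree and the function name parameterize are hypothetical, and this shows only the resulting template, not how near-miss clones are detected at scale.

```python
def parameterize(a, b, params):
    # Return a shared template for trees a and b. Wherever they differ,
    # emit a numbered placeholder and record the differing pair in params.
    if a == b:
        return a
    same_shape = (
        isinstance(a, tuple) and isinstance(b, tuple)
        and a[0] == b[0] and len(a) == len(b)
    )
    if not same_shape:
        params.append((a, b))
        return "?" + str(len(params))
    # Same label and arity: keep the label and recurse into the children.
    children = tuple(parameterize(x, y, params) for x, y in zip(a[1:], b[1:]))
    return (a[0],) + children

a = ("assign", ("var", "total"), ("add", ("var", "total"), ("num", "1")))
b = ("assign", ("var", "count"), ("add", ("var", "count"), ("num", "1")))
params = []
template = parameterize(a, b, params)
```

Each difference gets its own placeholder; this sketch does not notice when the same substitution repeats, so the example above records the same pair twice.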

The present (2010) implementation operates somewhat differently than the 1998 paper, but the basic ideas are the same.  A 2004 study (published in IEEE Transactions on Software Engineering) by Steve Bellon compared several detectors, and concluded that ours produces the smallest number of false positives, so I think your worry is misplaced.

You&#039;ve observed that matching exact trees is &quot;easy&quot;.  In fact, I agree. What isn&#039;t easy is matching inexact trees to produce the near miss clones, and making this work at the 2 million line scale for multiple programming languages.

You can find a number of examples of clone detection runs on different languages at the website.

-- IDB</description>
		<content:encoded><![CDATA[<p>Well, the algorithm only uses hashes to find possible matches, and then compares them exactly.  So it detects exact clones without error.</p>
<p>It also detects &#8220;near miss&#8221; clones, which you can think of as parameterized code, e.g., if you made a macro out of a block of code and replaced well-formed sections, you&#8217;d end up with a parameterized clone.</p>
<p>The present (2010) implementation operates somewhat differently than the 1998 paper, but the basic ideas are the same.  A 2004 study (published in IEEE Transactions on Software Engineering) by Steve Bellon compared several detectors, and concluded that ours produces the smallest number of false positives, so I think your worry is misplaced.</p>
<p>You&#8217;ve observed that matching exact trees is &#8220;easy&#8221;.  In fact, I agree. What isn&#8217;t easy is matching inexact trees to produce the near miss clones, and making this work at the 2 million line scale for multiple programming languages.</p>
<p>You can find a number of examples of clone detection runs on different languages at the website.</p>
<p>&#8211; IDB</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: josh</title>
		<link>http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/comment-page-1/#comment-89</link>
		<dc:creator>josh</dc:creator>
		<pubDate>Mon, 25 Jan 2010 05:59:00 +0000</pubDate>
		<guid isPermaLink="false">http://landofjosh.com/?p=145#comment-89</guid>
		<description>Hello Ira, 

I read your paper, though I have not tried out your implementation.  If I remember correctly, the high-level view of the algorithm was to compute a hash on the ASTs and sub-ASTs, and then compare hashes for clone detection.  I think that one advantage to my approach is that when a clone is reported there is a higher degree of certainty that it is valid.  (I think like 100% certainty, though I haven&#039;t proven that or anything.)  The &#039;fuzzy hash&#039; idea, where the hash calculation handles some constructs differently in an effort to detect near miss clones (like ignoring small subtrees, for example), seems like it would generate false positives.  For my tool, false positives aren&#039;t acceptable as I want to automate the clone repair.  Of course, a hash-based implementation would be a lot faster.  I am dealing with the kind of poor algorithmic complexity you mention in your paper, like O(n^3) and worse sometimes.

I would like to read more about how you automated the repair of the clones.  Do you talk about that in one of the other papers?  I&#039;ve only read the one.

Thanks for the comment,
Josh</description>
		<content:encoded><![CDATA[<p>Hello Ira, </p>
<p>I read your paper, though I have not tried out your implementation.  If I remember correctly, the high-level view of the algorithm was to compute a hash on the ASTs and sub-ASTs, and then compare hashes for clone detection.  I think that one advantage to my approach is that when a clone is reported there is a higher degree of certainty that it is valid.  (I think like 100% certainty, though I haven&#8217;t proven that or anything.)  The &#8216;fuzzy hash&#8217; idea, where the hash calculation handles some constructs differently in an effort to detect near miss clones (like ignoring small subtrees, for example), seems like it would generate false positives.  For my tool, false positives aren&#8217;t acceptable as I want to automate the clone repair.  Of course, a hash-based implementation would be a lot faster.  I am dealing with the kind of poor algorithmic complexity you mention in your paper, like O(n^3) and worse sometimes.</p>
<p>I would like to read more about how you automated the repair of the clones.  Do you talk about that in one of the other papers?  I&#8217;ve only read the one.</p>
<p>Thanks for the comment,<br />
Josh</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ira Baxter</title>
		<link>http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/comment-page-1/#comment-88</link>
		<dc:creator>Ira Baxter</dc:creator>
		<pubDate>Sat, 23 Jan 2010 23:00:51 +0000</pubDate>
		<guid isPermaLink="false">http://landofjosh.com/?p=145#comment-88</guid>
		<description>I implemented and wrote a technical paper on a clone detector based on AST tree matching back in 1998.  Check out the web site for discussion, a link to the technical paper, and sample clone analysis reports for several languages.</description>
		<content:encoded><![CDATA[<p>I implemented and wrote a technical paper on a clone detector based on AST tree matching back in 1998.  Check out the web site for discussion, a link to the technical paper, and sample clone analysis reports for several languages.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: The Oscillating Shrinking Window &#124; blog of josh</title>
		<link>http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/comment-page-1/#comment-59</link>
		<dc:creator>The Oscillating Shrinking Window &#124; blog of josh</dc:creator>
		<pubDate>Sat, 29 Aug 2009 19:08:18 +0000</pubDate>
		<guid isPermaLink="false">http://landofjosh.com/?p=145#comment-59</guid>
		<description>[...] talked previously about how Agent Ralph identifies functionally equivalent methods.  It also can find clones [...]</description>
		<content:encoded><![CDATA[<p>[...] talked previously about how Agent Ralph identifies functionally equivalent methods.  It also can find clones [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Agent Ralph In Action &#124; blog of josh</title>
		<link>http://landofjosh.com/2009/07/an-idea-for-robust-clone-detection-using-abstract-syntax-trees/comment-page-1/#comment-45</link>
		<dc:creator>Agent Ralph In Action &#124; blog of josh</dc:creator>
		<pubDate>Wed, 05 Aug 2009 05:15:56 +0000</pubDate>
		<guid isPermaLink="false">http://landofjosh.com/?p=145#comment-45</guid>
		<description>[...] backend scans source files handed to it by the front end, using the techniques I&#8217;ve been blogging about.  Specifically, clones are identified by comparing abstract syntax trees of methods.  The ASTs [...]</description>
		<content:encoded><![CDATA[<p>[...] backend scans source files handed to it by the front end, using the techniques I&#8217;ve been blogging about.  Specifically, clones are identified by comparing abstract syntax trees of methods.  The ASTs [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
