The source code for comparator can be downloaded from this page.
The source trees get sliced into overlapping three-line shreds. The shreds then get turned into a list of 128-bit signatures (32 hexadecimal characters each) by a process called MD5 hashing; each signature is stored along with its file and line-number range.

"If the MD5 signatures are different, then the shreds that they were made from are different. When they match, it is almost certain that the two shreds they were made from are the same, to within odds of eighteen quadrillion to one. MD5 is normally used for making unforgeable digital signatures, but the side effect I'm exploiting is that it gives you a fast way to compare texts for equality," Raymond told eWEEK on Monday.
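
The shred-and-compare idea described above can be illustrated with a minimal Python sketch. This is not comparator's actual source; the function names `shred_signatures` and `common_shreds`, and the choice to represent each source tree as a dictionary mapping file names to file contents, are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict

SHRED_SIZE = 3  # lines per shred, as described in the article


def shred_signatures(name, text, size=SHRED_SIZE):
    """Yield (md5_hex, name, first_line, last_line) for each overlapping shred.

    Hypothetical helper: line numbers are 1-based and inclusive.
    """
    lines = text.splitlines()
    for start in range(len(lines) - size + 1):
        shred = "\n".join(lines[start:start + size])
        digest = hashlib.md5(shred.encode("utf-8")).hexdigest()
        yield digest, name, start + 1, start + size


def common_shreds(tree_a, tree_b):
    """Return (location_a, location_b) pairs whose MD5 signatures match.

    Each tree is assumed to be a dict of {file name: file contents}.
    """
    # Index every signature from the first tree by its digest.
    index = defaultdict(list)
    for name, text in tree_a.items():
        for digest, fname, first, last in shred_signatures(name, text):
            index[digest].append((fname, first, last))

    # Look up each signature from the second tree in that index.
    matches = []
    for name, text in tree_b.items():
        for digest, fname, first, last in shred_signatures(name, text):
            for location in index.get(digest, []):
                matches.append((location, (fname, first, last)))
    return matches
```

Comparing fixed-length digests instead of the shreds themselves means each lookup is a single dictionary probe, which is the "fast way to compare texts for equality" that Raymond describes. A real tool would likely also normalize whitespace and merge adjacent matching shreds into longer ranges; this sketch does neither.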
