Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Two programs with same logic

Reply
Thread Tools

Two programs with same logic

 
 
karthikbalaguru
Guest
Posts: n/a
 
      12-20-2009
Hi,
Two set of codes can be compared by using
beyond compare or other equivalent software
to determine the areas of differences and the
areas of similarities. But, there are scenarios
in which the same logic would be used by 2 set of
programs/softwares but that the variable names
might be different. So, how to determine this
without executing the program ? Is there any tool
that will help in identifying the existence of similar
logics in two different programs ? Any ideas ?

Thx in advans,
Karthik Balaguru
 
Reply With Quote
 
 
 
 
Stefan Ram
Guest
Posts: n/a
 
      12-20-2009
karthikbalaguru <(E-Mail Removed)> writes:
>in which the same logic would be used by 2 set of


Determining whether two programs behave the same for
every input should in general be equivalent to the

http://en.wikipedia.org/wiki/Halting_problem

.

For special conditions, it should still be possible:
For example, when variables are only renamed, one
can replace both name sequences by generated names.

(I have removed other groups from the »Newsgroups:«
header.)

 
Reply With Quote
 
 
 
 
Keith Thompson
Guest
Posts: n/a
 
      12-20-2009
http://www.velocityreviews.com/forums/(E-Mail Removed)-berlin.de (Stefan Ram) writes:
> karthikbalaguru <(E-Mail Removed)> writes:
>>in which the same logic would be used by 2 set of

>
> Determining whether two programs behave the same for
> every input should in general be equivalent to the
>
> http://en.wikipedia.org/wiki/Halting_problem
>
> .
>
> For special conditions, it should still be possible:
> For example, when variables are only renamed, one
> can replace both name sequences by generated names.
>
> (I have removed other groups from the »Newsgroups:«
> header.)


The original article was cross-posted to
comp.os.linux.development.apps, comp.programming,
comp.unix.programmer, and comp.lang.c. Why did you remove everything
but comp.lang.c? The question isn't specific to C or to any other
language; surely comp.programming would be more appropriate.

--
Keith Thompson (The_Other_Keith) http://www.velocityreviews.com/forums/(E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Reply With Quote
 
Stefan Ram
Guest
Posts: n/a
 
      12-20-2009
Keith Thompson <(E-Mail Removed)> writes:
>The question isn't specific to C or to any other language;
>surely comp.programming would be more appropriate.


Since it was posted to comp.lang.c, I assumed that the
OP was referring to a C program. This assumption might
have been wrong, but I had it at the time I was reading
his OP.

>Why did you remove everything but comp.lang.c?


Since I had this assumption, I deemed the other groups
to be not appropriate. (I might have been wrong with
this assessment.)

It is my own posting style not to crosspost to multiple
groups without sufficient reason. This was another reason
for me to trim the Newgroup header.

 
Reply With Quote
 
Antoninus Twink
Guest
Posts: n/a
 
      12-20-2009
On 20 Dec 2009 at 15:50, Stefan Ram wrote:
> karthikbalaguru <(E-Mail Removed)> writes:
>>in which the same logic would be used by 2 set of

>
> Determining whether two programs behave the same for every input
> should in general be equivalent to the
> http://en.wikipedia.org/wiki/Halting_problem


A gold medal there for exceeding even the customary level of uselessness
in a clc response.

It's obvious that the OP doesn't care about whether two programs behave
the same for every input. He's interested in the question:
heuristically, given two programs, are they "very similar"?

Notice that there are plenty of practical reasons for wanting to answer
this question:
* if the programs are given as source code, an instructor wants to check
for copying-with-superficial-changes; or a lawyer wants to check for a
GPL infringement
* if the programs are binary, anti-virus vendors might be interested in
this

Compare that with the halting problem, which has no practical
application and is of purely theoretical interest.

 
Reply With Quote
 
spinoza1111
Guest
Posts: n/a
 
      12-22-2009
On Dec 20, 11:42*pm, karthikbalaguru <(E-Mail Removed)>
wrote:
> Hi,
> Two set of codes can be compared by using
> beyond compare or other equivalent software
> to determine the areas of differences and the
> areas of similarities. But, there are scenarios
> in which the same logic would be used by 2 set of
> programs/softwares but that the variable names
> might be different. So, how to determine this
> without executing the program ? Is there any tool
> that will help in identifying the existence of similar
> logics in two different programs ? Any ideas ?
>
> Thx in advans,
> Karthik Balaguru


As others have pointed out, the general problem is equivalent to the
Halting problem and is not solvable by an automated tool.

However, there are useful levels of similarity, where "similarity" is
a generalized identity:

(1) Similarity after compiler lexical analysis: "printf( \"Hello world\
\n\" );" as a fragment is similar in this way to "printf(\"Hello world\
\n\");"

(2) Similarity after parsing: "int a; a = 1+1;" is similar in this
sense to "int b; b = 1+1;"

(3) Similarity after code motion, elimination and optimization that
preserves intent: "int a; int b; a = 0; b = 0;" is similar to "int b;
int a; b = 0; a = 0;". "if (b || -1) c();" is similar in this sense to
"c();"

Factoring compilers better and providing an open architecture would
allow us to use compilers for far more than merely compiling. Instead
of shipping compilers, we should ship components. But, it's much more
amusing here to engage in the politics of personal destruction.
 
Reply With Quote
 
Joe Beanfish
Guest
Posts: n/a
 
      12-22-2009
karthikbalaguru wrote:
> Hi,
> Two set of codes can be compared by using
> beyond compare or other equivalent software
> to determine the areas of differences and the
> areas of similarities. But, there are scenarios
> in which the same logic would be used by 2 set of
> programs/softwares but that the variable names
> might be different. So, how to determine this
> without executing the program ? Is there any tool
> that will help in identifying the existence of similar
> logics in two different programs ? Any ideas ?


As others say, it would be difficult at best to find dups. But a
couple approaches that could help. One might be to run both codes
through an obfuscator which would replace the programmer's meaningful
variable names with generated names. Then compare. But there's still
a lot of room for programming style to mess you up.

A second approach would be to compile both to assembler and compare
the assembler code. That should remove a fair amount of programmer
style problems but not all.
 
Reply With Quote
 
Kaz Kylheku
Guest
Posts: n/a
 
      12-22-2009
On 2009-12-20, karthikbalaguru <(E-Mail Removed)> wrote:
> Hi,
> Two set of codes can be compared by using
> beyond compare or other equivalent software
> to determine the areas of differences and the
> areas of similarities. But, there are scenarios
> in which the same logic would be used by 2 set of
> programs/softwares but that the variable names
> might be different. So, how to determine this
> without executing the program ? Is there any tool
> that will help in identifying the existence of similar
> logics in two different programs ? Any ideas ?


The structural similarity you are hinting at can be easily found if you parse
the code into a nice abstract syntax tree representation. Then it's just a
matter of writing an equality function which compares two pieces of code.

This is easiest to understand and illustrate through the Lisp language.

The comparison is similar to a routine that compares two trees, except that it
has to understand all ``special forms'' in the language, and their semantics.
That is to say, the comparison has to behave as a ``code walker'', with respect
to two trees at the same time. It has to recognize all special syntactic
constructs, so that it can apply the right kind of comparison.

It also must be capable of performing variable renaming.

So for instance we might want these two to be found equivalent:

(let ((x 1)) (+ x x))

(let ((y 1)) (+ y y))

For thaat we have to recognize that we have two matching constructs
(both are ``let''). They both bind the same number of variables,
using the same initialization values. Next, we have to normalize
the bodies. We can do that by renaming the variables on each
side: e.g. both x and y get renamed to t0 (temporary variable #0).
Each time we rename a variable we generate a new temporary.
When we rename x to t0 and substitute, we end up with:

(let ((t0 1)) (+ t0 t0))

And when we do the same substitution on the variable y on the second form we
of course also end up with:

(let ((t0 1)) (+ t0 t0))

So now having done this substituion over the let, we can recursively process
the body of each let and continue comparing: we now end up comparing (+ t0 t0)
to (+ t0 t0). We have a positive match on the same symbol +, and we know that
it's a function call. The two functions calls have the same number of
arguments, which we can compare one by one, finding an exact match in
ech position: the expression t0 trivially matches the expression t0.
Thus having reached bottom, we conclude that the two constructs are equivalent
(modulo variable naming).

This applies to the comparison of C programs too because for the C language,
you can pick this kind of tree representation with symbols and atoms.
Parse the code to that, and write a comparison which applies the right
kind of rules.

The question is the semantic depth that you want in the comparison.

Should the expression !(!P || !Q) match P && Q, by the application of
De Morgan's law?

Does your equivalence function fold constant expressions, so that
2 + 2 matches 1 + 3?

There is a huge range of things you can do in between the most naive
static comparison (in which even differences in variable names do matter)
to running the program and trying to identify that it does the same things
to all inputs (running into the halting problem).


Now about implementing this. Turns out, the annoying part of parsing C to
structure is already done. For instance, there is this Japanese gentleman's
project called SC:

http://super.para.media.kyoto-u.ac.j...c/index-e.html

With this software, you would be able to just concentrate on writing the
equivalence function.
 
Reply With Quote
 
karthikbalaguru
Guest
Posts: n/a
 
      12-27-2009
On Dec 23, 12:05*am, Joe Beanfish <(E-Mail Removed)> wrote:
> karthikbalaguru wrote:
> > Hi,
> > Two set of codes can be compared by using
> > beyond compare or other equivalent software
> > to determine the areas of differences and the
> > areas of similarities. But, there are scenarios
> > in which the same logic would be used by 2 set of
> > programs/softwares but that the variable names
> > might be different. So, how to determine this
> > without executing the program ? Is there any tool
> > that will help in identifying the existence of similar
> > logics in two different programs ? Any ideas ?

>
> As others say, it would be difficult at best to find dups. But a
> couple approaches that could help. One might be to run both codes
> through an obfuscator which would replace the programmer's meaningful
> variable names with generated names. Then compare. But there's still
> a lot of room for programming style to mess you up.
>


Agreed !

> A second approach would be to compile both to assembler and compare
> the assembler code. That should remove a fair amount of programmer
> style problems but not all.


Interesting approach ! I think, this is better than the earlier
approach. But, Need to know the kind of programmer style
problems that will not get covered by this second approach.
Any ideas ?

Thx in advans,
Karthik Balaguru
 
Reply With Quote
 
karthikbalaguru
Guest
Posts: n/a
 
      12-27-2009
On Dec 23, 12:53*am, Kaz Kylheku <(E-Mail Removed)> wrote:
> On 2009-12-20, karthikbalaguru <(E-Mail Removed)> wrote:
>
> > Hi,
> > Two set of codes can be compared by using
> > beyond compare or other equivalent software
> > to determine the areas of differences and the
> > areas of similarities. But, there are scenarios
> > in which the same logic would be used by 2 set of
> > programs/softwares but that the variable names
> > might be different. So, how to determine this
> > without executing the program ? Is there any tool
> > that will help in identifying the existence of similar
> > logics in two different programs ? Any ideas ?

>
> The structural similarity you are hinting at can be easily found if you parse
> the code into a nice abstract syntax tree representation. Then it's just a
> matter of writing an equality function which compares two pieces of code.
>
> This is easiest to understand and illustrate through the Lisp language.
>
> The comparison is similar to a routine that compares two trees, except that it
> has to understand all ``special forms'' in the language, and their semantics.
> That is to say, the comparison has to behave as a ``code walker'', with respect
> to two trees at the same time. It has to recognize all special syntactic
> constructs, so that it can apply the right kind of comparison.
>
> It also must be capable of performing variable renaming.
>
> So for instance we might want these two to be found equivalent:
>
> * (let ((x 1)) (+ x x))
>
> * (let ((y 1)) (+ y y))
>
> For thaat we have to recognize that we have two matching constructs
> (both are ``let''). They both bind the same number of variables,
> using the same initialization values. Next, we have to normalize
> the bodies. We can do that by renaming the variables on each
> side: e.g. both x and y get renamed to t0 (temporary variable #0).
> Each time we rename a variable we generate a new temporary.
> When we rename x to t0 and substitute, we end up with:
>
> * (let ((t0 1)) (+ t0 t0))
>
> And when we do the same substitution on the variable y on the second form we
> of course also end up with:
>
> * (let ((t0 1)) (+ t0 t0))
>
> So now having done this substituion over the let, we can recursively process
> the body of each let and continue comparing: we now end up comparing (+ t0 t0)
> to (+ t0 t0). We have a positive match on the same symbol +, and we know that
> it's a function call. *The two functions calls have the same number of
> arguments, which we can compare one by one, finding an exact match in
> ech position: the expression t0 trivially matches the expression t0.
> Thus having reached bottom, we conclude that the two constructs are equivalent
> (modulo variable naming).
>
> This applies to the comparison of C programs too because for the C language,
> you can pick this kind of tree representation with symbols and atoms.
> Parse the code to that, and write a comparison which applies the right
> kind of rules.
>


Interesting. Agreed !

> The question is the semantic depth that you want in the comparison.
>


True !

> Should the expression *!(!P || !Q) match P && Q, by the application of
> De Morgan's law?
>
> Does your equivalence function fold constant expressions, so that
> 2 + 2 matches 1 + 3?
>
> There is a huge range of things you can do in between the most naive
> static comparison (in which even differences in variable names do matter)
> to running the program and trying to identify that it does the same things
> to all inputs (running into the halting problem).


Yeah. True !

> Now about implementing this. Turns out, the annoying part of parsing C to
> structure is already done. For instance, there is this Japanese gentleman's
> project called SC:
>
> http://super.para.media.kyoto-u.ac.j...c/index-e.html
>
> With this software, you would be able to just concentrate on writing the
> equivalence function.


Okay, I will look into this !

Thx,
Karthik Balaguru
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Python Logic Map/Logic Flow Chart. (Example Provided) spike Python 8 02-09-2010 12:31 PM
Asynchronous Logic Gates and Analog Logic Gates Jyoti Ballabh Software 3 11-26-2009 06:48 PM
Prevent two users from accessing the same file at the same time Shawn ASP .Net 2 02-19-2006 03:11 AM
Two programs accessing the same array greyham433@hotmail.com C++ 3 02-01-2005 01:11 PM
Two PIX on same subnet with same gateway? This Old Man Cisco 4 10-20-2003 07:27 PM



Advertisments