Velocity Reviews - Computer Hardware Reviews

tests

 
 
nikolay marinov
Guest
Posts: n/a
 
      08-09-2007
Hi, everyone. Does anybody have an idea how I can test two xls files for
equality with Python?
 
 
 
 
 
kyosohma@gmail.com
Guest
Posts: n/a
 
      08-09-2007
On Aug 9, 8:21 am, nikolay marinov <(E-Mail Removed)>
wrote:
> Hi, everyone. Does anybody have an idea how I can test two xls files for
> equality with Python?


You should be able to read chunks of each file in binary mode and do a
compare to check for equality. Some kind of loop should do the trick.
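
A minimal sketch of the chunked binary comparison described above. The function name `files_equal` and the 64 KiB chunk size are illustrative assumptions, not something from the thread:

```python
def files_equal(path_a, path_b, chunk_size=64 * 1024):
    """Return True if both files contain exactly the same bytes."""
    # Binary mode matters: .xls files are not text.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # first differing chunk, or one file is shorter
            if not chunk_a:
                return True   # both files ended at the same point
```

Reading in chunks keeps memory use flat for large files and stops at the first difference.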

Mike

 
 
 
 
 
brad
Guest
Posts: n/a
 
      08-09-2007
(E-Mail Removed) wrote:

> You should be able to read chunks of each file in binary mode and do a
> compare to check for equality. Some kind of loop should do the trick.


Why not a simple md5 or sha with the hash library?
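
A rough sketch of the hashing approach, using the standard `hashlib` module. The helper name `file_digest` and the SHA-256 default are assumptions made for illustration; passing "md5" or "sha1" as the algorithm should work the same way where those are available:

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=64 * 1024):
    """Return the hex digest of a file's bytes, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files are then treated as equal when `file_digest("file1.xls") == file_digest("file2.xls")`. A hash pays off mainly when the digest is stored and reused; for a one-off comparison of two files, a direct byte comparison does less work.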
 
 
dijkstra.arjen@gmail.com
Guest
Posts: n/a
 
      08-09-2007
On Aug 9, 4:04 pm, brad <(E-Mail Removed)> wrote:
> (E-Mail Removed) wrote:
> > You should be able to read chunks of each file in binary mode and do a
> > compare to check for equality. Some kind of loop should do the trick.

>
> Why not a simple md5 or sha with the hash library?


Or even:

http://docs.python.org/lib/module-filecmp.html

 
 
special_dragonfly
Guest
Posts: n/a
 
      08-09-2007

<(E-Mail Removed)> wrote in message
news:(E-Mail Removed) oups.com...
> On Aug 9, 4:04 pm, brad <(E-Mail Removed)> wrote:
>> (E-Mail Removed) wrote:
>> > You should be able to read chunks of each file in binary mode and do a
>> > compare to check for equality. Some kind of loop should do the trick.

>>
>> Why not a simple md5 or sha with the hash library?

>
> Or even:
>
> http://docs.python.org/lib/module-filecmp.html
>


My understanding of reading that is that it only looks at the file names
themselves and not their contents. So whether filename1=filename2 and in the
case of the function below it, whether one directory has files which are in
the other.
Correct me if I'm wrong.
Dom

P.S. md5 or sha hash is what I'd go for, short of doing:

# Compare the bytes; distinct file objects never compare equal.
with open("file1.xls", "rb") as first, open("file2.xls", "rb") as second:
    if first.read() == second.read():
        print("True")

although this won't tell you where they're different, just that they are...


 
 
Jason
Guest
Posts: n/a
 
      08-09-2007
On Aug 9, 8:46 am, "special_dragonfly" <(E-Mail Removed)>
wrote:
> <(E-Mail Removed)> wrote in message
> >http://docs.python.org/lib/module-filecmp.html

>
> My understanding of reading that is that it only looks at the file names
> themselves and not their contents. So whether filename1=filename2 and in the
> case of the function below it, whether one directory has files which are in
> the other.
> Correct me if I'm wrong.
> Dom
>
> P.S. md5 or sha hash is what I'd go for, short of doing:
>
> # Compare the bytes; distinct file objects never compare equal.
> with open("file1.xls", "rb") as first, open("file2.xls", "rb") as second:
>     if first.read() == second.read():
>         print("True")
>
> although this won't tell you where they're different, just that they are...


You're incorrect. If the shallow flag is not given or is true, the
results of os.stat are used to compare the two files, so if they have
the same file type, size, and modification time, they're considered
the same.

If the shallow flag is given and is false, their contents are
compared. In either case, the results are cached for efficiency's
sake.

--Jason


The documentation for filecmp.cmp is:
cmp( f1, f2[, shallow])
Compare the files named f1 and f2, returning True if they seem
equal, False otherwise.

Unless shallow is given and is false, files with identical
os.stat() signatures are taken to be equal.

Files that were compared using this function will not be
compared again unless their os.stat() signature changes.

Note that no external programs are called from this function,
giving it portability and efficiency.
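
A short sketch of the content comparison described in the documentation above, using the standard-library `filecmp.cmp`. The wrapper name `same_contents` is an assumption for illustration:

```python
import filecmp


def same_contents(path_a, path_b):
    """Return True only if the two files are byte-for-byte identical."""
    # shallow=False makes filecmp read and compare the contents instead of
    # accepting files whose os.stat() signatures happen to match.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

With the default `shallow=True`, two different files that share the same size and modification time would be reported as equal, so passing `shallow=False` is the safer choice when the contents are what matter.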

 
 
Steve Holden
Guest
Posts: n/a
 
      08-09-2007
Jason wrote:
> On Aug 9, 8:46 am, "special_dragonfly" <(E-Mail Removed)>
> wrote:
>> <(E-Mail Removed)> wrote in message
>>> http://docs.python.org/lib/module-filecmp.html

>> My understanding of reading that is that it only looks at the file names
>> themselves and not their contents. So whether filename1=filename2 and in the
>> case of the function below it, whether one directory has files which are in
>> the other.
>> Correct me if I'm wrong.
>> Dom
>>
>> P.S. md5 or sha hash is what I'd go for, short of doing:
>>
>> # Compare the bytes; distinct file objects never compare equal.
>> with open("file1.xls", "rb") as first, open("file2.xls", "rb") as second:
>>     if first.read() == second.read():
>>         print("True")
>>
>> although this won't tell you where they're different, just that they are...

>
> You're incorrect. If the shallow flag is not given or is true, the
> results of os.stat are used to compare the two files, so if they have
> the same file type, size, and modification time, they're considered
> the same.
>
> If the shallow flag is given and is false, their contents are
> compared. In either case, the results are cached for efficiency's
> sake.
>
> --Jason
>
>
> The documentation for filecmp.cmp is:
> cmp( f1, f2[, shallow])
> Compare the files named f1 and f2, returning True if they seem
> equal, False otherwise.
>
> Unless shallow is given and is false, files with identical
> os.stat() signatures are taken to be equal.
>
> Files that were compared using this function will not be
> compared again unless their os.stat() signature changes.
>
> Note that no external programs are called from this function,
> giving it portability and efficiency.
>


This discussion seems to assume that Excel spreadsheets are stored in
some canonical form so that two spreadsheets with the same functionality are
always identical on disk to the last bit. I very much doubt this is true
(consider as an example the file properties that can be set).

So really you need to define "equality". So far the tests discussed have
concentrated on identifying identical files.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden

 
 
Jay Loden
Guest
Posts: n/a
 
      08-09-2007
Steve Holden wrote:
> This discussion seems to assume that Excel spreadsheets are stored in
> some canonical form so that two spreadsheets with the same functionality are
> always identical on disk to the last bit. I very much doubt this is true
> (consider as an example the file properties that can be set).
>
> So really you need to define "equality". So far the tests discussed have
> concentrated on identifying identical files.
>
> regards
> Steve


I was wondering myself if the OP was actually interested in binary identical
files, or just duplicated content. If just duplicated content, perhaps this
could be used as a starting point:

http://aspn.activestate.com/ASPN/Coo.../Recipe/440661

so that the actual data can be compared.
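
One possible way to compare the data rather than the bytes is sketched below. It is not the linked recipe: it assumes the third-party xlrd package for reading .xls files, and the names `load_xls` and `diff_workbooks` are made up for illustration. Here "equality" means every sheet has the same cell values, ignoring formatting and file properties:

```python
def load_xls(path):
    """Return {sheet_name: list of rows}. Assumes xlrd is installed."""
    import xlrd  # third-party, not in the standard library (assumption)
    book = xlrd.open_workbook(path)
    return {
        sheet.name: [sheet.row_values(r) for r in range(sheet.nrows)]
        for sheet in book.sheets()
    }


def diff_workbooks(a, b):
    """Yield (sheet, row, col, value_a, value_b) for each differing cell."""
    for name in sorted(set(a) | set(b)):
        rows_a, rows_b = a.get(name, []), b.get(name, [])
        for r in range(max(len(rows_a), len(rows_b))):
            row_a = rows_a[r] if r < len(rows_a) else []
            row_b = rows_b[r] if r < len(rows_b) else []
            for c in range(max(len(row_a), len(row_b))):
                # None marks a cell that exists on one side only.
                va = row_a[c] if c < len(row_a) else None
                vb = row_b[c] if c < len(row_b) else None
                if va != vb:
                    yield (name, r, c, va, vb)
```

Under these assumptions, `list(diff_workbooks(load_xls("file1.xls"), load_xls("file2.xls")))` is empty when the cell data match, and otherwise lists where the workbooks differ. Row and column indices are zero-based.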

-Jay
 