Re: How to check two text files have the same content or not?

 
 
Tom Anderson
 
      09-03-2008
On Wed, 3 Sep 2008, www wrote:

> I need a method to check whether or not two text files have the same
> content. The files are small, about 10 lines. But I don't want to compare
> them line by line or jam all lines into one line and compare that line.
> I thought there should be some easier way.


Then you have another think coming.

There's no way to do this without reading in both files in their entirety.
If you're going to do that, you might as well just compare them
line-by-line or byte-by-byte. What don't you like about doing that?

> I googled and have written the following method. But it does not work
> correctly. Even when the two files have different content, the method
> returns true.
>
> Thank you very much for your help.
>
> /**
>  * Check two files' content are the same or not. The two files are
>  * allowed to be in different locations. If their
>  * content inside are the same, return true; otherwise, return false.
>  * <p>
>  * Note: is possible that the replacement does not alter the checksum?
>  */
> public static boolean checkTwoFilesEqual(final File fileA, final File fileB) throws IOException
> {
>     final CheckedInputStream cisA = new CheckedInputStream(new FileInputStream(fileA), new CRC32());
>     final CheckedInputStream cisB = new CheckedInputStream(new FileInputStream(fileB), new CRC32());
>
>     if(cisA.getChecksum().getValue() == cisB.getChecksum().getValue())
>     {
>         return true;
>     }
>
>     //not equal
>     return false;
> }


A CheckedInputStream only checksums bytes that are read through it. In the
above code, you haven't read any bytes through the streams, so they're
both in their initial states, and so have the same checksum.
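To see this for yourself, here is a minimal sketch. The class and method names are made up for illustration, and it uses in-memory byte arrays rather than files so it runs anywhere. The first method asks for the checksum without reading anything; the second drains the stream first.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

final class ChecksumDemo {
    // Nothing is read through the stream, so the CRC32 is still in its
    // initial state no matter what the data is.
    static long checksumWithoutReading(byte[] data) {
        CheckedInputStream in =
                new CheckedInputStream(new ByteArrayInputStream(data), new CRC32());
        return in.getChecksum().getValue();
    }

    // Every byte is pulled through the stream before asking for the
    // checksum, so the value now depends on the content.
    static long checksumAfterReading(byte[] data) throws IOException {
        try (CheckedInputStream in =
                new CheckedInputStream(new ByteArrayInputStream(data), new CRC32())) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // discard the bytes; only the checksum side effect matters here
            }
            return in.getChecksum().getValue();
        }
    }
}
```

Calling checksumWithoutReading on two different arrays gives the same value both times, which is exactly why the original method always returned true.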

You can just do something like this:

InputStream a = new BufferedInputStream(new FileInputStream(fileA)) ;
InputStream b = new BufferedInputStream(new FileInputStream(fileB)) ;
int byteA = 0 ;
int byteB = 0 ;
// read() returns -1 at end of stream, so stop once either file runs out
while ((byteA | byteB) >= 0) {
byteA = a.read() ;
byteB = b.read() ;
if (byteA != byteB) return false ;
}
// both streams ended on the same read, so the contents matched
return byteA == byteB ;

tom

--
To a great extent, it's about 'Am I going to go crazy and smash things
as I'm trying to fix them?' -- Bill Jemas, on the Hulk
 
 
 
 
 
John B. Matthews
 
      09-03-2008
In article <g9mitg$e2i$>, www wrote:

> Tom Anderson wrote:
>
> > You can just do something like this:
> >
> > InputStream a = new BufferedInputStream(new FileInputStream(fileA)) ;
> > InputStream b = new BufferedInputStream(new FileInputStream(fileB)) ;
> > int byteA = 0 ;
> > int byteB = 0 ;
> > while ((byteA | byteB) >= 0) {
> > byteA = a.read() ;
> > byteB = b.read() ;
> > if (byteA != byteB) return false ;
> > }
> > return byteA == byteB ;
> >
> > tom
> >

>
> Thank you. That is interesting. I don't understand the line:
>
> while ((byteA | byteB) >= 0)
>
> Can you explain a little more?


Look at what BufferedInputStream.read() returns:

<http://java.sun.com/javase/6/docs/api/java/io/BufferedInputStream.html>

Then consider the possible values returned by the bitwise OR operator:

<http://java.sun.com/docs/books/tutorial/java/nutsandbolts/op3.html>

--
John B. Matthews
trashgod at gmail dot com
home dot woh dot rr dot com slash jbmatthews
 
 
 
 
 
Joshua Cranmer
 
      09-03-2008
www wrote:
> Thank you. That is interesting. I don't understand the line:
>
> while ((byteA | byteB) >= 0)
>
> Can you explain a little more?


The typical way of terminating a read is:
while ((input = stream.read()) >= 0) /* handle input */

With two bytes, this is:
while (byteA >= 0 && byteB >= 0)

Which is essentially "while the high bit of a and the high bit of b are
both 0"... or "while the high bit of (a | b) is 0":
while ((byteA | byteB) >= 0)
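
If you want to convince yourself that the two loop conditions really agree, here is a small sketch (the class and method names are only for illustration). It checks both tests against every value read() can return, which is -1 for end of stream or 0 to 255 for a byte.

```java
final class SignBitDemo {
    // The bit-twiddling form: the OR is negative only if a or b is negative.
    static boolean orTest(int a, int b) {
        return (a | b) >= 0;
    }

    // The direct form: two separate comparisons joined with &&.
    static boolean andTest(int a, int b) {
        return a >= 0 && b >= 0;
    }

    // Exhaustively compares the two tests over everything read() can return.
    static boolean agreeOnReadRange() {
        for (int a = -1; a <= 255; a++) {
            for (int b = -1; b <= 255; b++) {
                if (orTest(a, b) != andTest(a, b)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The trick works because -1 is all one bits in two's complement, so OR-ing it with anything leaves the sign bit set.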

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
 
 
Tom Anderson
 
      09-03-2008
On Wed, 3 Sep 2008, Joshua Cranmer wrote:

> www wrote:
>> Thank you. That is interesting. I don't understand the line:
>>
>> while ((byteA | byteB) >= 0)
>>
>> Can you explain a little more?

>
> The typical way of terminating a read is:
> while ((input = stream.read()) >= 0) /* handle input */
>
> With two bytes, this is:
> while (byteA >= 0 && byteB >= 0)
>
> Which is essentially "while the high bit of a and the high bit of b are both
> 0"... or "while the high bit of (a | b) is 0":
> while ((byteA | byteB) >= 0)


Well put. It's some entirely unnecessary clever bit-twiddling - sorry, i
couldn't resist. Joshua's first version, anding two separate comparisons,
is a more direct way of expressing it, and would be much better from a
readability and maintainability perspective.

tom

--
The literature is filled with bizarre occurrences for which we have
no explanation
 
 
Tom Anderson
 
      11-19-2008
On Wed, 19 Nov 2008, rmcog wrote:

[to me directly - an error, i assume]

> There is an open source utility called md5. It will return the checksum
> based on the file content.


It involves at least as much IO as comparing the files directly (it will
do more when they differ, because it doesn't fail fast), and isn't
guaranteed to work - two different files can have the same MD5 hash. It's
no better.

tom

--
It's a surprising finding, but that's science all over: the results
are often counterintuitive. And that's exactly why you do scientific
research, to check your assumptions. Otherwise it wouldn't be called
"science", it would be called "assuming", or "guessing", or "making it
up as you go along". -- Ben Goldacre
 
 
Roedy Green
 
      11-22-2008
On Wed, 19 Nov 2008 15:49:16 +0000, Tom Anderson wrote, quoted or
indirectly quoted someone who said:

>> There is an open source utility called md5. It will return the checksum
>> based on the file content.

>
>It involves at least as much IO as comparing the files directly (it will
>do more when they differ, because it doesn't fail fast), and isn't
>guaranteed to work - two different files can have the same MD5 hash. It's
>no better.


You can fail fast by comparing file length. If they don't match you
are done. You don't even need to read either file for that.

For the usual case, I would read each file into a byte array with a
single unbuffered read, then loop to compare corresponding indexes. For
long files, you must compare in large chunks, reusing the same buffers
to avoid swamping the GC.
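
A rough sketch of that approach follows. The class name, the fill helper and the 64 kB chunk size are my own picks for illustration, not anything prescribed above. It checks the lengths first, then compares chunk by chunk with two buffers that are allocated once and reused.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class ChunkedFileCompare {
    private static final int CHUNK = 64 * 1024; // assumed chunk size

    static boolean sameContent(Path fileA, Path fileB) throws IOException {
        // Fail fast: different lengths mean different content, no reading needed.
        if (Files.size(fileA) != Files.size(fileB)) {
            return false;
        }
        // Two buffers allocated once and reused for every chunk.
        byte[] bufA = new byte[CHUNK];
        byte[] bufB = new byte[CHUNK];
        try (InputStream a = Files.newInputStream(fileA);
             InputStream b = Files.newInputStream(fileB)) {
            while (true) {
                int nA = fill(a, bufA);
                int nB = fill(b, bufB);
                if (nA != nB) {
                    return false;
                }
                for (int i = 0; i < nA; i++) {
                    if (bufA[i] != bufB[i]) {
                        return false;
                    }
                }
                if (nA < CHUNK) {
                    return true; // a short chunk means both streams hit the end
                }
            }
        }
    }

    // Reads until buf is full or the stream ends; returns the bytes read.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

A small file fits in the first chunk, so it is handled in one pass through the loop; a large one takes as many passes as it needs without allocating anything new.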
--
Roedy Green Canadian Mind Products
http://mindprod.com
Your old road is
Rapidly agin'.
Please get out of the new one
If you can't lend your hand
For the times they are a-changin'.
 
 
Tom Anderson
 
      11-22-2008
On Fri, 21 Nov 2008, Roedy Green wrote:

> On Wed, 19 Nov 2008 15:49:16 +0000, Tom Anderson wrote, quoted or
> indirectly quoted someone who said:
>
>>> There is an open source utility called md5. It will return the checksum
>>> based on the file content.

>>
>> It involves at least as much IO as comparing the files directly (it will
>> do more when they differ, because it doesn't fail fast), and isn't
>> guaranteed to work - two different files can have the same MD5 hash. It's
>> no better.

>
> You can fail fast by comparing file length. If they don't match you
> are done. You don't even need to read either file for that.


True. But if they're the same size and differ in the first byte, comparing
hashes is a really losing strategy.
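
To put a number on it, here is a sketch that counts how many bytes each strategy pulls in. CountingInputStream is a made-up helper for this illustration, and the data lives in byte arrays so no real files are needed.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class FailFastDemo {
    // Hypothetical helper: counts the bytes pulled through it.
    static final class CountingInputStream extends FilterInputStream {
        long count = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    // Byte-by-byte comparison: stops at the first mismatch or at end of stream.
    static long bytesReadByDirectCompare(byte[] x, byte[] y) throws IOException {
        CountingInputStream a = new CountingInputStream(new ByteArrayInputStream(x));
        CountingInputStream b = new CountingInputStream(new ByteArrayInputStream(y));
        int byteA;
        int byteB;
        do {
            byteA = a.read();
            byteB = b.read();
        } while (byteA == byteB && byteA >= 0);
        return a.count + b.count;
    }

    // MD5 of each input: every byte has to go through the digest.
    static long bytesReadByHashing(byte[] x, byte[] y)
            throws IOException, NoSuchAlgorithmException {
        long total = 0;
        for (byte[] data : new byte[][] { x, y }) {
            CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data));
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                md5.update(buf, 0, n);
            }
            md5.digest();
            total += in.count;
        }
        return total;
    }
}
```

For two 1000-byte inputs that differ in the first byte, the direct comparison reads 2 bytes in total and the hashing route reads all 2000.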

> For the usual case, I would just do a single unbuffered byte read of
> each file, then a loop to compare corresponding indexes. For long
> files, you must compare in long chunks, reusing the same buffers to
> avoid swamping the GC.


Sounds about right. Rather than dealing with small and large files
separately, i'd do both through a BufferedInputStream, with a buffer set
to something sensibly large - a few tens of kB, perhaps. Small files will
be slurped into the buffer in one go, and large ones will be handled in
chunks.
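
Something like the following sketch, where the class name and the 32 kB figure are just assumptions to make it concrete. It is the same byte-by-byte loop as before, written with the plain comparisons instead of the OR trick, and with the streams closed properly.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BufferedFileCompare {
    private static final int BUFFER_SIZE = 32 * 1024; // "a few tens of kB"; assumed value

    static boolean sameContent(File fileA, File fileB) throws IOException {
        // try-with-resources closes both streams even on an early return
        try (InputStream a = new BufferedInputStream(new FileInputStream(fileA), BUFFER_SIZE);
             InputStream b = new BufferedInputStream(new FileInputStream(fileB), BUFFER_SIZE)) {
            int byteA;
            int byteB;
            do {
                byteA = a.read();
                byteB = b.read();
            } while (byteA == byteB && byteA >= 0);
            // Equal only if both streams reached end of file on the same read.
            return byteA == byteB;
        }
    }
}
```

The loop only ever asks for one byte at a time; the BufferedInputStream does the chunking behind the scenes.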

tom

--
He's taking towel fandom to a whole other bad level. -- applez,
of coalescent
 
 
 
 