Velocity Reviews - Computer Hardware Reviews

IO#sysread on windows

 
 
pihentagy
      06-14-2006
Hi!

I tried to write a file dupe finder. For this to work, I created an
improved File::Stat, like this:

require 'digest/sha1'

class File::StatWithSha < File::Stat
  attr_reader :filename, :read
  def initialize fn
    @filename = File.expand_path fn
    @read = 0
    super fn
  end
  def sha1sum
    # Return the cached digest if it was already computed.
    return @sha1sum if @sha1sum ||= nil
    warn "Calculating sha1sum for #@filename"
    chunk = nil
    fs = 0
    d = Digest::SHA1.new
    File.open(filename) {|f|
      begin
        # sysread raises EOFError at end of file, which ends the loop.
        while chunk = f.sysread(1048576)
          fs += chunk.length
          d.update(chunk)
        end
      rescue EOFError
        warn "\nResult is #{d} #{fs} <=> #{self.size}"
        return @sha1sum = d
      rescue => e
        warn "Holy ****! #{e}"
      end
    }
    warn "Oh my god!"
    exit
  end
  def inspect; @filename; end
end


Under Windows it fails with both Ruby 1.8.2 and Ruby 1.8.4: the number of
bytes read (2113) does not match the file size (2210).

irb(main):006:0> fws.sha1sum
Calculating sha1sum for F:/private/prg/ruby/g2.rb
Chunk is 2113

Result is c75de1a39ce389e7e198c97345ffad52b074e5e9 2113 <=> 2210
=> c75de1a39ce389e7e198c97345ffad52b074e5e9

Under Linux it works fine.
Anyway, how should I calculate the sha1sum of a BIG file using only
Ruby?

 
Tim Hunter
      06-14-2006
pihentagy wrote:
> Under linux it works fine.



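You should probably open the files with "rb" instead of letting the mode
default to "r". On Windows, text mode translates \r\n to \n, so fewer
bytes are read than File::Stat#size reports.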
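
A minimal sketch of that suggestion, which also covers the big-file case by hashing in fixed-size chunks. The helper name sha1_of_file and its chunk_size parameter are hypothetical and not from the thread; it uses IO#read, which returns nil at end of file, instead of sysread.

```ruby
require 'digest/sha1'

# Hypothetical helper (not from the thread): hash a file chunk by chunk.
# Returns the hex digest and the number of bytes actually read.
def sha1_of_file(path, chunk_size = 1 << 20)
  digest = Digest::SHA1.new
  bytes_read = 0
  # "rb" disables newline translation on Windows, so every byte is hashed.
  File.open(path, "rb") do |f|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = f.read(chunk_size))
      bytes_read += chunk.bytesize
      digest.update(chunk)
    end
  end
  [digest.hexdigest, bytes_read]
end
```

With the file opened in binary mode, the byte count returned here should match File.size(path) on Windows as well as on Linux.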

> Anyway, how should I calculate the sha1sum of a BIG file, just using
> ruby?
>


For finding dups, I wonder if it's useful to compare checksums unless
you've already computed them in advance. I notice that Ruby's own
FileUtils.install checks filea == fileb by simply comparing the files
until it finds a difference or gets to EOF.
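
As a rough illustration of the direct-comparison approach, the sketch below uses FileUtils.compare_file from the standard library. The wrapper name same_content? is hypothetical, and the size check in front is an added shortcut rather than something described above.

```ruby
require 'fileutils'

# Hypothetical helper: compare two files without computing any checksum.
def same_content?(path_a, path_b)
  # Files of different sizes cannot be identical, so skip reading them.
  return false unless File.size(path_a) == File.size(path_b)
  # Compares the contents and returns true only if they are identical.
  FileUtils.compare_file(path_a, path_b)
end
```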
 
Robert Klemme
      06-14-2006
Tim Hunter wrote:
> For finding dups, I wonder if it's useful to compare checksums unless
> you've already computed them in advance. I notice that Ruby's own
> FileUtils.install checks filea == fileb by simply comparing the files
> until it finds a difference or gets to EOF.


It depends. If you want to find duplicates in a set of files then using
the digest as hash key can make finding duplicates much faster. OTOH if
you can detect candidates by looking at other attributes (size,
mtime...) then the additional overhead for the checksum calculation
might slow things down. It depends - as always.

Btw, I don't see a reason to use sysread in this scenario. read will do.

Kind regards

robert
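
A small sketch of the "digest as hash key" idea, assuming a hypothetical helper named duplicates_by_digest that is not part of the thread. It relies on Digest::SHA1.file, which reads the file in binary mode.

```ruby
require 'digest/sha1'

# Hypothetical helper: group paths by SHA-1 digest and return only the
# groups that contain more than one file.
def duplicates_by_digest(paths)
  groups = Hash.new { |hash, key| hash[key] = [] }
  # The hex digest is the hash key; files with equal content share a key.
  paths.each { |path| groups[Digest::SHA1.file(path).hexdigest] << path }
  groups.values.select { |group| group.size > 1 }
end
```

Every file is read exactly once here, which is the overhead mentioned above when size or mtime could have ruled out most candidates first.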
 
pihentagy
      06-14-2006
Tim Hunter wrote:
> Probably you should open the files with "rb" instead of letting it
> default to "r".

Holy s**t! Since I tried it on text files and it failed, I didn't see
why the mode should matter.
Ah, that damned \r\n to \n translation, I guess.

> For finding dups, I wonder if it's useful to compare checksums unless
> you've already computed them in advance. I notice that Ruby's own
> FileUtils.install checks filea == fileb by simply comparing the files
> until it finds a difference or gets to EOF.

Well, first I'd like to partition the files by size, and only then
compare them.
If more than 2 files have the same size, it's better to calculate the
sha1sum once for each file involved. And if you'd like to be on the safe
side, you can compare the contents of the files that have the same
sha1sum.
You can improve this further by caching the sha1sums (say, in a file in
every directory).
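
The three stages above (size, then sha1sum, then content) could be sketched roughly like this. The method name find_duplicates is hypothetical, caching is left out, and the final check compares each file only against the first member of its group.

```ruby
require 'digest/sha1'
require 'fileutils'

# Hypothetical helper: returns an array of groups of identical files.
def find_duplicates(paths)
  result = []
  # Stage 1: partition by size; a file with a unique size has no duplicate.
  paths.group_by { |path| File.size(path) }.each_value do |same_size|
    next if same_size.size < 2
    # Stage 2: hash each remaining file once and group by digest.
    by_sha = same_size.group_by { |path| Digest::SHA1.file(path).hexdigest }
    by_sha.each_value do |same_sha|
      next if same_sha.size < 2
      # Stage 3: to be on the safe side, confirm by comparing contents.
      first, *rest = same_sha
      confirmed = [first] + rest.select { |p| FileUtils.compare_file(first, p) }
      result << confirmed if confirmed.size > 1
    end
  end
  result
end
```

For exactly two files of the same size, skipping stage 2 and going straight to the content comparison would avoid computing any digest.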

 