Finding new / dupe files on large file system

 
 
Anonymous Coward
Guest
09-17-2004
I would appreciate some help with the following challenge.

I'm looking for a way to compare two rather large file systems. They
contain hundreds of directories nested multiple levels deep, more than
10,000 files, and long file names.

For simplicity's sake, let's call them "FS_A" and "FS_B". I'm trying
to obtain an inventory (dumped out into a file that I can manipulate)
of all files in "FS_B" with the following validation against "FS_A":

Minimum:
List each file in "FS_B" for which there is no corresponding entry in
"FS_A".

Gravy:
In addition to "minimum": for each file in "FS_B" for which a
corresponding entry in "FS_A" is found, list the path and filename of
both entries.

In the above context, "corresponding" means a file with identical
content (even if folder, filename and / or date stamp don't match).

Here are some of the assumptions about the two filesystems and the
files contained therein:

- The folder / directory structures are not identical.
- File names and date stamps may have been altered without changing
the actual file content.

Also, I'm not sure if I picked the right newsgroup(s) to post this to.
If there is a better choice, I would appreciate it if someone could
point me in the right direction.

Thank you.
 
 
 
 
 
Anonymous Coward
Guest
09-21-2004
Well, I didn't find the type of "off the shelf" product I was looking
for. Ultimately, I decided to cut my losses and go the
"do-it-yourself" route.

In short, I used MD5Deep (http://md5deep.sourceforge.net/) to obtain
MD5 hashes for the files in the filesystems in question. MD5Deep has
a handy recursive directory processing feature which allowed me to
scan each tree with a single command. I captured the output for each
tree and compared the filesystems based on the MD5 hashes.
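
In case it helps anyone else, here's roughly what the comparison step
looks like. This is a sketch in Python rather than the exact code I
used, and it assumes two listing files (illustratively named fs_a.md5
and fs_b.md5) produced with md5deep's -r (recursive) switch, where each
output line is an MD5 hash followed by the file's full path:

# compare_hashes.py -- list files in FS_B with no content match in FS_A,
# plus matched pairs. Assumes listings made with something like:
#   md5deep -r /path/to/FS_A > fs_a.md5
#   md5deep -r /path/to/FS_B > fs_b.md5
from collections import defaultdict

def load_hashes(listing_path):
    """Map each MD5 hash to the list of paths that share that content."""
    hashes = defaultdict(list)
    with open(listing_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.strip().split(None, 1)  # hash, then path
            if len(parts) == 2:
                hashes[parts[0]].append(parts[1])
    return hashes

fs_a = load_hashes("fs_a.md5")
fs_b = load_hashes("fs_b.md5")

# Minimum: files in FS_B whose content appears nowhere in FS_A.
for md5, paths in sorted(fs_b.items()):
    if md5 not in fs_a:
        for p in paths:
            print("NEW\t%s" % p)

# Gravy: files in FS_B whose content also exists in FS_A
# (folder, filename and date stamp may differ).
for md5, paths in sorted(fs_b.items()):
    if md5 in fs_a:
        for p in paths:
            for q in fs_a[md5]:
                print("DUPE\t%s\t%s" % (p, q))

The output is tab-separated, so it can be sorted, grepped, or pulled
into a spreadsheet for further massaging.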

Unfortunately, I only found out about MD5Deep after I'd already burned
time writing code around some Windows ports of md5sum, which ran into
problems with some of the long filenames in the filesystems in question. If I
had known about MD5Deep up front, I could have saved myself a lot of
time. But then again, there's always a next time...


On Fri, 17 Sep 2004 00:54:39 GMT, Anonymous Coward
<(E-Mail Removed)> wrote:

>I would appreciate some help with the following challenge.
>
>I'm looking for a way to compare two rather large file systems.

<snip>

 
 
 
 
 
Terry Pinnell
Guest
09-25-2004
Anonymous Coward <(E-Mail Removed)> wrote:

>Well, I didn't find the type of "off the shelf" product I was looking
>for. Ultimately, I decided to cut my losses and go the
>"do-it-yourself" route.

<snip>

It's plainly too late now, but for future reference you might check
out a utility I saw called FileSync, from
http://www.fileware.com/index.htm

--
Terry, West Sussex, UK

 