Velocity Reviews - Computer Hardware Reviews


Trouble with embedded whitespace in filenames using File::Find

 
 
Clint O
 
      01-21-2013
I wrote the following program to find duplicate files. The problem is that some of my files have names containing whitespace or potentially other special characters:

#!/opt/local/bin/perl

use Digest::MD5;
use File::Find;
use Data::Dumper;

use strict;
use warnings;

my %results = ();

sub do_file;

my @files = @ARGV;

exit 1 if !@files;

find(sub { do_file(\%results) }, @files );

for (keys %results) {
    my @f = @{$results{$_}};

    if (scalar @f > 1) {
        print "$f[0] => $f[1]\n";
    }
}

sub do_file {
    my ($hash) = @_;
    return if -d $_;

    open(my $fh, $_) or die "Can't open '$File::Find::name': $!";
    binmode $fh;

    my $digest;

    $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;

    push @{$hash->{$digest}}, $File::Find::name;
}

0;

If I create a test directory:

$ mkdir test_dir
$ cd test_dir
$ touch " my file"
$ ./dupcheck testdir
Can't open 'testdir/ my file': No such file or directory at ./dupcheck line 32.

I can't be the first one who has run into this problem, and I'm sure there's a reasonable explanation for how to cope with it, but I haven't been able to find anything by searching the web.

Thanks,

-Clint
 
 
Clint O
 
      01-21-2013
On Monday, January 21, 2013 1:15:19 PM UTC-8, Henry Law wrote:
> > Can't open 'testdir/ my file': No such file or directory at ./dupcheck line 32.
>
> You created
> "test_dir/my file"
> ^
> and you're trying to open
> "testdir/ my file".
> ^
>
> It's not there, so the program complains.


Well, that "test_dir" is clearly a typo. This program would never have generated this output with a non-existent directory:

$ ./dupcheck /asfasfasdfasdf
Can't stat /asfasfasdfasdf: No such file or directory
at ./dupcheck line 18

Anyway, my issue still stands. I cannot open a local file with embedded whitespace.

Thanks,

-Clint
 
 
Rainer Weikusat
 
      01-21-2013
Clint O <(E-Mail Removed)> writes:

[...]

> The following program I wrote I'm using to find duplicate files. The
> problem is that I have files with whitespace or potentially other
> special characters:


[...]

> open(my $fh, $_) or die "Can't open '$File::Find::name': $!";


Since you didn't specify an explicit open mode, perl parses $_ in
order to look for one and it skips leading whitespace, cf

The filename passed to 2-argument (or 1-argument) form of
open() will have leading and trailing whitespace deleted, and
the normal redirection characters honored.
[perldoc -f open]

Using open($fh, '<', $_) instead works.
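A minimal sketch of the difference, assuming a Unix-like system and a scratch directory made with File::Temp. The helper name can_open and the file name are only for illustration. File::Find changes into each directory by default, so $_ is the bare name ' my file', leading space included, and that is what the sketch reproduces:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Hypothetical helper: try one form of open() for reading and
# return 1 on success, 0 on failure.
sub can_open {
    my ($form, $name) = @_;
    my $fh;
    my $ok = $form eq '2-arg'
        ? open($fh, $name)          # mode is parsed out of $name itself
        : open($fh, '<', $name);    # mode is explicit, $name is taken literally
    return $ok ? 1 : 0;
}

# Work inside a scratch directory, as File::Find does by default.
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "Can't chdir to '$dir': $!";

# Create a file whose name starts with a space.
my $name = ' my file';
open(my $out, '>', $name) or die "Can't create '$name': $!";
close $out;

my $two_arg   = can_open('2-arg', $name);   # looks for "my file" instead
my $three_arg = can_open('3-arg', $name);   # looks for " my file"

chdir $start or die "Can't chdir back to '$start': $!";

print "2-arg open: ", ($two_arg   ? "ok" : "failed"), "\n";
print "3-arg open: ", ($three_arg ? "ok" : "failed"), "\n";
```

In the original script the equivalent change is to the single open() call in do_file; nothing else needs to move.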

BTW: Assuming you're running this as root, someone who doesn't like
you could create a file named |rm -rf `printf "\x2f"` and you probably
wouldn't like the result of trying to open that.

NB: DO NOT TRY THIS. Except if I made an error, this will execute rm
-rf / with the privileges of the invoker.

More harmless: td/|ls `printf "..\x2f"`. This will list the contents
of the directory above td.
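The same effect can be seen with a deliberately harmless name. This is only a sketch, assuming a Unix-like system with an echo program on the PATH; the file name 'echo injected|' is made up for the demonstration and stands in for a hostile one:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "Can't chdir to '$dir': $!";

# Harmless stand-in for a hostile file name: the trailing '|' is what matters.
my $name = 'echo injected|';
open(my $out, '>', $name) or die "Can't create '$name': $!";
print {$out} "real file contents\n";
close $out;

# 2-argument open: the trailing '|' makes perl run "echo injected"
# and return a pipe from that command instead of opening the file.
open(my $pipe, $name) or die "2-arg open failed: $!";
my $from_two_arg = <$pipe>;
close $pipe;

# 3-argument open: the name is only a name, so the file itself is read.
open(my $fh, '<', $name) or die "Can't open '$name': $!";
my $from_three_arg = <$fh>;
close $fh;

chdir $start or die "Can't chdir back to '$start': $!";

print "2-arg read: $from_two_arg";
print "3-arg read: $from_three_arg";
```

With the 2-argument form the line read back comes from the command, not from the file, which is exactly what makes such names dangerous when the script runs as root.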
 
Clint O
 
      01-21-2013
On Monday, January 21, 2013 1:24:28 PM UTC-8, Rainer Weikusat wrote:
> Since you didn't specify an explicit open mode, perl parses $_ in
> order to look for one and it skips leading whitespace, cf
>
> The filename passed to 2-argument (or 1-argument) form of
> open() will have leading and trailing whitespace deleted, and
> the normal redirection characters honored.
> [perldoc -f open]
>
> Using open($fh, '<', $_) instead works.
>
> BTW: Assuming you're running this as root, someone who doesn't like
> you could create a file named |rm -rf `printf "\x2f"` and you probably
> wouldn't like the result of trying to open that.
>
> NB: DO NOT TRY THIS. Except if I made an error, this will execute rm
> -rf / with the privileges of the invoker.
>
> More harmless: td/|ls `printf "..\x2f"`. This will list the contents
> of the directory above td.


Ok, thanks for the tip and the heads-up. I am running the program as root on a NAS, and the files are created by my family, but just as a good FYI, are there ways I can protect myself against malicious code? Running as root ensures I can read all the files w/o question. I've used Safe before, but I'm not sure whether it's necessary or appropriate for this application.

Thanks,

-Clint
 
Jürgen Exner
 
      01-21-2013
Clint O <(E-Mail Removed)> wrote:
>On Monday, January 21, 2013 1:15:19 PM UTC-8, Henry Law wrote:
>>
>> > Can't open 'testdir/ my file': No such file or directory at ./dupcheck line 32.

>>
>> You created
>> "test_dir/my file"
>> ^
>> and you're trying to open
>> "testdir/ my file".
>> ^
>>
>> It's not there, so the program complains.

>
>Well, that "test_dir" is clearly a typo.


So, you should be thankful that Henry found that typo and pointed it out
to you, right?

>Anyway, my issue still stands. I cannot open a local file with embedded whitespace.


Well, nobody claimed that there is only one issue in your program.

jue
 
Clint O
 
      01-21-2013
On Monday, January 21, 2013 2:21:26 PM UTC-8, Jürgen Exner wrote:
> Clint O wrote:
> >On Monday, January 21, 2013 1:15:19 PM UTC-8, Henry Law wrote:
> >> > Can't open 'testdir/ my file': No such file or directory at ./dupcheck line 32.
> >>
> >> You created
> >> "test_dir/my file"
> >> ^
> >> and you're trying to open
> >> "testdir/ my file".
> >> ^
> >>
> >> It's not there, so the program complains.
> >
> >Well, that "test_dir" is clearly a typo.
>
> So, you should be thankful that Henry found that typo and pointed it out
> to you, right?
>
> >Anyway, my issue still stands. I cannot open a local file with embedded whitespace.
>
> Well, nobody claimed that there is only one issue in your program.


Well, if you're going to critique my program and bother to post a reply, at least make it relevant. People request that you post entire scripts so that the problem can be seen by others. I did due diligence by posting the script and made a mistake in the testcase.

-Clint
 
Jürgen Exner
 
      01-21-2013
Clint O <(E-Mail Removed)> wrote:
[Fullquote to prove my point]
>On Monday, January 21, 2013 2:21:26 PM UTC-8, Jürgen Exner wrote:
>> Clint O wrote:
>>
>> >On Monday, January 21, 2013 1:15:19 PM UTC-8, Henry Law wrote:

>>
>> >>

>>
>> >> > Can't open 'testdir/ my file': No such file or directory at ./dupcheck line 32.

>>
>> >>

>>
>> >> You created

>>
>> >> "test_dir/my file"

>>
>> >> ^

>>
>> >> and you're trying to open

>>
>> >> "testdir/ my file".

>>
>> >> ^

>>
>> >>

>>
>> >> It's not there, so the program complains.

>>
>> >

>>
>> >Well, that "test_dir" is clearly a typo.

>>
>>
>>
>> So, you should be thankful that Henry found that typo and pointed it out
>>
>> to you, right?
>>
>>
>>
>> >Anyway, my issue still stands. I cannot open a local file with embedded whitespace.

>>
>>
>>
>> Well, nobody claimed that there is only one issue in your program.

>
>Well, if you're going to critique my program and bother to post a reply, at least make it relevant. People request that you post entire scripts so that the problem can be seen by others. I did due diligence by posting the script and made a mistake in the testcase.


Ok, because you explicitly asked for it:
- is there a specific reason why you are adding an empty line after
every line you quote? That doesn't improve readability one bit and makes
quoting your post rather tedious.
- Is there a specific reason why your lines are longer than the usual
70-75 characters?

jue
 
Clint O
 
      01-21-2013
On Monday, January 21, 2013 3:08:35 PM UTC-8, Jürgen Exner wrote:
> Ok, because you explicitly asked for it:
> - is there a specific reason why you are adding an empty line after
> every line you quote? That doesn't improve readability one bit and makes
> quoting your post rather tedious.
> - Is there a specific reason why your lines are longer than the usual
> 70-75 characters?


I'm guessing these might be artifacts of the Google Groups web interface. That's what I'm using to read the group. It's hard(er) to control the formatting of my responses. Coming from a hard-nosed slrn background, I agree that it is annoying, and if I can figure it out I will fix it.

Thanks,

-Clint
 
Rainer Weikusat
 
      01-22-2013
Ben Morrow <(E-Mail Removed)> writes:
> Quoth Clint O <(E-Mail Removed)>:
>> On Monday, January 21, 2013 1:24:28 PM UTC-8, Rainer Weikusat wrote:
>> > BTW: Assuming you're running this as root, someone who doesn't like
>> > you could create a file named |rm -rf `printf "\x2f"` and you probably
>> > wouldn't like the result of trying to open that.
>>
>> Ok, thanks for the tip and the heads-up. I am running the program as
>> root on a NAS, and the files are created by my family, but just as a
>> good FYI, are there ways I can protect myself against malicious code?
>> Running as root ensures I can read all the files w/o question.


[...]

> If you must do this as root, I would seriously consider using find(1),
> xargs(1) and md5(1) instead, assuming your find and xargs support the
> -print0 and -0 arguments. You're much less likely to make a serious
> mistake using preexisting utilities than trying to write your own.


Sorry to be so blunt, but this is a really stupid suggestion. Not only
are a lot of characters that are valid in filenames of syntactic
relevance to the shell, the shell will also perform multiple passes of
textual substitution on a complete input line and happily execute
whatever the combined result happens to be. IOW, the shell does not
genuinely distinguish between 'script text from a file' and 'text
produced as the result of an operation performed by the script', which
makes it an extremely poor choice for writing code supposed to run in
a hostile environment. perl is much better in this respect: it doesn't
execute data 'by default' (only when explicitly asked to), and it can
also be made to complain about a lot of potentially unsafe 'data
flows'; see 'Taint mode' in perlsec. These checks can be onerous at
times, but they should catch a lot of accidental errors (such as the
2-arg open of a string which came from the file system).

 
Mike Scott
 
      01-23-2013
On 21/01/13 21:39, Clint O wrote:
.....
>
> Ok, thanks for the tip and the heads-up. I am running the program as
> root on a NAS, and the files are created by my family, but just as a
> good FYI, are there ways I can protect myself against malicious code?
> Running as root ensures I can read all the files w/o question. I've
> used Safe before, but I'm not sure whether it's necessary or
> appropriate for this application.
>


If I may ask a naive question.... Why are you writing a duplicate-file
finder from scratch when programs such as fdupes already exist and
presumably have such issues already resolved?

fdupes "searches the given path for duplicate files. Such files are
found by comparing file sizes and MD5 signatures, followed by a
byte-by-byte comparison". That last bit is important.
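For comparison, the three stages quoted above (size, then MD5, then byte-by-byte) can be sketched in Perl roughly as follows. This is not fdupes' actual implementation, only an illustration of the idea; the names find_duplicates and write_file and the demo files are made up, and it assumes a File::Compare recent enough to open file names literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use File::Compare qw(compare);
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: return [first, duplicate] pairs found under @roots.
sub find_duplicates {
    my @roots = @_;

    # Stage 1: group regular files by size (no file contents are read).
    my %by_size;
    find({
        no_chdir => 1,    # $_ is the full path, so no name is ever relative
        wanted   => sub {
            return if -l $_ or !-f $_;
            my $size = -s _;
            push @{ $by_size{$size} }, $_;
        },
    }, @roots);

    my @pairs;
    for my $same_size (grep { @$_ > 1 } values %by_size) {
        # Stage 2: within one size class, group by MD5 digest.
        my %by_md5;
        for my $path (@$same_size) {
            my $fh;
            unless (open($fh, '<', $path)) {    # 3-arg open: name taken literally
                warn "Can't open '$path': $!\n";
                next;
            }
            binmode $fh;
            my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
            push @{ $by_md5{$digest} }, $path;
        }

        # Stage 3: confirm each candidate byte by byte against the first one.
        for my $group (grep { @$_ > 1 } values %by_md5) {
            my ($first, @rest) = sort @$group;
            for my $other (@rest) {
                push @pairs, [$first, $other] if compare($first, $other) == 0;
            }
        }
    }
    return @pairs;
}

# Hypothetical helper for the demo: write $content to $path.
sub write_file {
    my ($path, $content) = @_;
    open(my $fh, '>', $path) or die "Can't create '$path': $!";
    print {$fh} $content;
    close $fh or die "Can't close '$path': $!";
}

# Demo data: two identical files (one with a leading space in its name),
# one file of the same size but different content, one of a different size.
my $dir = tempdir(CLEANUP => 1);
write_file("$dir/a.txt",      "same bytes\n");
write_file("$dir/ copy of a", "same bytes\n");
write_file("$dir/c.txt",      "other data\n");
write_file("$dir/d.txt",      "short\n");

my @pairs = find_duplicates($dir);
print "$_->[0] => $_->[1]\n" for @pairs;
```

The final comparison is what guards against two different files that happen to share an MD5 digest, which is why that last stage matters.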

--
Mike Scott (unet2 <at> [deletethis] scottsonline.org.uk)
Harlow Essex England
 