Velocity Reviews

Craig Manley 04-27-2004 04:19 PM

Which characters are really unsafe to use in Linux filenames (from Perl)?
 
Hi,

From testing (using Perl + Slackware Linux) I've found that the only
characters I can't use in a directory/file name are the 0 byte and the path
separator /. Below is my test script and a function that makes tainted strings
safe to use as directory/file names. Because a mistake or wrong assumption here
can open a huge security hole, I'd like to know whether others think this is
really correct and whether it holds for all *nix variants. My
goal is to create a validator for file names uploaded through HTML forms
that is as unrestrictive as possible (yet safe).

Another question for those of you who know much about MSWin32: which characters
can't be used in an MSWin32 directory/file name (I think it's many more than
on Linux)?

Another question: are these single byte character file systems?

-Craig Manley.

#!/usr/bin/perl -w
use strict;
use bytes;

sub safe {
    my $s = shift;
    # replace path separators
    $s =~ s|/|_|g;
    # replace 0 bytes
    $s =~ s|\000|_|g;
    # keep length <= 255 bytes ("use bytes" is in effect)
    return substr($s, 0, 255);
}

my $backslash = '\\';

# these all work
#mkdir('hoi' . $backslash . 'nbla') || warn $!;
#mkdir('hoi..bla') || warn $!;
#mkdir('hoi' . $backslash . 'bla') || warn $!;
#mkdir('..hoi') || warn $!;

# these don't work
#mkdir($backslash . '/hoi') || warn $!;
#mkdir($backslash . '../hoi') || warn $!;

# try all possible byte values (0..255) in a single name
# note: safe() truncates to 255 bytes, so the last byte (255) is dropped
my $s = join('', map { chr } 0 .. 255);

if (mkdir(safe($s))) {
    # dump the long directory entries so the created name can be inspected
    opendir(my $dh, '.') or die $!;
    my @entries = grep { length($_) >= 20 } readdir($dh);
    closedir($dh);
    open(my $out, '>', 't.bin') or die $!;
    binmode $out;
    print $out join("\n\n", @entries);
    close($out);
}
else {
    warn($!);
}
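
A minimal sketch of a slightly stricter variant of safe(), assuming the name arrives as a byte string and that 255 bytes is the per-component limit (common on Linux filesystems, but an assumption here). The name safe_component is hypothetical. Besides the two forbidden bytes, it also handles the empty string, "." and "..", which contain neither forbidden byte but cannot be created as new entries.

```perl
use strict;
use warnings;

# Hypothetical stricter variant of safe(); the name is illustrative only.
sub safe_component {
    my ($name) = @_;
    # Replace the two bytes that cannot appear in a single path component.
    $name =~ tr{/\000}{_};
    # Keep at most 255 bytes (assumes $name holds bytes, not decoded characters).
    $name = substr($name, 0, 255);
    # "", "." and ".." pass the byte test but are not usable as new names.
    $name = '_' if $name eq '' or $name eq '.' or $name eq '..';
    return $name;
}
```

This only makes a name acceptable to the filesystem; whether the result collides with an existing file still has to be checked separately.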



Juha Laiho 04-27-2004 06:27 PM

Re: Which characters are really unsafe to use in Linux filenames (from Perl)?
 
"Craig Manley" <recycle@bin.com> said:
>From testing (using Perl + Slackware Linux) I've found that the only
>characters I can't use in a directory/file name are the 0 byte and path
>separator /.


Correct, in the strictly technical sense. '/' is forbidden because it is
the directory separator character; allowing it inside a name would make
path names ambiguous. For example, is "/tmp/x" a file named "tmp/x"
in the root directory, or a file named "x" in the /tmp directory? \0 is
forbidden because it is the string terminator in the C language.
No such reasons exist for any of the other possible byte values, so they
are allowed.

Words of warning, though: there are tools that have problems properly
understanding anything except printable US-ASCII (byte values 32 to 126
inclusive), and even within this range there are some characters that
I'd consider ill-advised. Space (32) is perhaps the hardest one: there
are tools that emit/expect lists of file names using whitespace as the
separator, and for them whitespace within a file name is a problem that
cannot be overcome. The most commonly seen pair of such tools is "find"
and "xargs" ("cpio" is another tool with this problem). There
are implementations (GNU) of these tools that have workarounds for this
problem, but the workarounds are not generally applicable (as the
availability of the GNU tools cannot be universally assumed).
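
To illustrate the kind of workaround meant here: GNU find can emit names terminated by a 0 byte (its -print0 option), which is unambiguous because 0 can never occur inside a name. A minimal sketch of consuming such a list from Perl follows; the function name split_nul_list is hypothetical, and it assumes the whole list fits in memory.

```perl
use strict;
use warnings;

# Split a NUL-delimited buffer (e.g. the output of `find . -print0`)
# into individual file names. Spaces and newlines inside names survive.
sub split_nul_list {
    my ($buffer) = @_;
    # split drops trailing empty fields; grep removes any other empty ones.
    return grep { length } split /\0/, $buffer;
}
```

Typical use would be something like my @names = split_nul_list(scalar `find . -print0`); on a system where find supports that option.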

Other characters that I would consider problematic are
!, ", ', `, *, ?, $, {, }, [, ], (, ), ~, <, >, |, #, &, ; and \,
as these have special meanings in various shells and tools.

So, complementing this would leave the characters
-, _, ,, ., :, ^, =, +, %, 0-9, a-z and A-Z as the safe ones.
"-" and "." are ill-advised as the first character of a file name.

Then, lately I've heard some reports of the XFS filesystem on Linux having
trouble coping with UTF-8 byte sequences; it apparently is trying to
do something smart with non-US-ASCII file names and failing miserably.

>Another question: are these single byte character file systems?


Unix filesystems tend to be; until I heard about the XFS issues, I had
assumed all Unix filesystems to be purely byte-oriented.

What you might do to allow "any" character in names at application level,
though, is to encode known problematic characters -- something like URL
encoding (%xx, where xx is the two-digit hexadecimal value of the
character) should be usable. Just remember that when using this, %
becomes an unsafe character, so it needs to be encoded, too.
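
A minimal sketch of that %xx scheme, assuming the name is a byte string. The function names and the exact set of bytes left unencoded (A-Z, a-z, 0-9, "_", "." and "-") are assumptions for illustration, not a fixed rule; "%" is outside that set and therefore always gets encoded.

```perl
use strict;
use warnings;

# Hypothetical helpers; the set of bytes left as-is is an assumption.
sub encode_name {
    my ($name) = @_;
    # Encode every byte outside the conservative set, "%" included.
    $name =~ s/([^A-Za-z0-9_.-])/sprintf('%%%02X', ord $1)/ge;
    # Also encode a leading "." or "-", which are ill-advised first characters.
    $name =~ s/^([.-])/sprintf('%%%02X', ord $1)/e;
    return $name;
}

sub decode_name {
    my ($encoded) = @_;
    # Turn each %xx back into the byte it stands for.
    $encoded =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $encoded;
}
```

An encoded name can be up to three times as long as the original, so any length limit has to be checked after encoding, not before.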
--
Wolf a.k.a. Juha Laiho Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)

Tom 04-27-2004 06:43 PM

Re: Which characters are really unsafe to use in Linux filenames(from Perl)?
 
Craig Manley wrote...
> Another question for those of you who know much about MSWin32: which
> characters can't be used in an MSWin32 directory/file name (I think
> it's many more than on Linux)?



\/:*?"<>|


You shouldn't use a leading space or a leading dot.
You also need to avoid reserved device names like:
CON
PRN
AUX
NUL
COM1
LPT1

...etc.

http://support.microsoft.com/default...b;EN-US;120716
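
A minimal sketch of a check built from the list above. The function name win32_name_problem is hypothetical. Beyond what is listed in this post, it assumes the commonly documented Windows rules that control characters (0 to 31) are forbidden, that a name must not end in a space or a dot, and that the device names are CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9, with or without an extension.

```perl
use strict;
use warnings;

# Hypothetical check: returns a short reason string, or undef if no
# problem was found. The rule set is an assumption, not an official API.
sub win32_name_problem {
    my ($name) = @_;
    return 'empty name' if $name eq '';
    # The nine listed characters plus control characters 0..31.
    return 'forbidden character' if $name =~ m{[\\/:*?"<>|\x00-\x1F]};
    return 'leading space or dot' if $name =~ /\A[ .]/;
    return 'trailing space or dot' if $name =~ /[ .]\z/;
    # Device names are matched case-insensitively, e.g. "nul.txt".
    return 'reserved device name'
        if $name =~ /\A(?:CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(?:\..*)?\z/is;
    return;
}
```

A caller would treat a defined return value as a rejection, for example: my $why = win32_name_problem($upload); die $why if defined $why;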




Randal L. Schwartz 04-27-2004 09:57 PM

Re: Which characters are really unsafe to use in Linux filenames (from Perl)?
 
>>>>> "Craig" == Craig Manley <recycle@bin.com> writes:

Craig> From testing (using Perl + Slackware Linux) I've found that the only
Craig> characters I can't use in a directory/file name are the 0 byte and path
Craig> separator /.

This has been true for every version of Unix I've used since 1977.
Can't say what it was before Unix V6 though... didn't get to use
those. :)

Gets fun when you permit \n in a filename. Lots of programs
don't expect that, and break. But those are broken programs, I say.
Not a broken filename.

print "Just another Perl hacker,"; # the first!
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Bryan Castillo 04-28-2004 10:32 PM

Re: Which characters are really unsafe to use in Linux filenames (from Perl)?
 
merlyn@stonehenge.com (Randal L. Schwartz) wrote in message news:<f0a0aaf57b25e959de4fdc159c6c75f2@news.teranews.com>...
> >>>>> "Craig" == Craig Manley <recycle@bin.com> writes:

>
> Craig> From testing (using Perl + Slackware Linux) I've found that the only
> Craig> characters I can't use in a directory/file name are the 0 byte and path
> Craig> separator /.
>
> This has been true for every version of Unix I've used since 1977.
> Can't say what it was before Unix V6 though... didn't get to use
> those. :)
>
> Gets fun when you permit \n in a filename. Lots of programs
> don't expect that, and break. But those are broken programs, I say.
> Not a broken filename.
>


I like putting "\x07" in file names. It's great when listing a
directory makes noise.

> print "Just another Perl hacker,"; # the first!



All times are GMT.