Which characters are really unsafe to use in Linux filenames (from Perl)?

 
 
Craig Manley
      04-27-2004
Hi,

From testing (using Perl + Slackware Linux) I've found that the only
characters I can't use in a directory/file name are the 0 byte and the path
separator /. Below is my test script and a function that makes tainted strings
safe to use as directory/file names. Because a mistake or wrong assumption here
can open a huge security hole, I'd like to know whether this is really correct
in the opinions of others and whether it holds for all *nix variants. My
goal is to create a filename validator for HTML form uploaded file names
that is as unrestrictive as possible (yet safe).

Another question for those of you who know much about MSWin32: which characters
can't be used in an MSWin32 directory/filename (I think it's many more than on
Linux)?

Another question: are these single byte character file systems?

-Craig Manley.

#!/usr/bin/perl -w
use strict;
use bytes;

sub safe {
    my $s = shift;
    # replace path separators
    $s =~ s|/|_|g;
    # replace 0 bytes
    $s =~ s|\000|_|g;
    # keep length <= 255 bytes
    return substr($s, 0, 255);
}

my $backslash = '\\';

# these all work
#mkdir('hoi' . $backslash . 'nbla') || warn $!;
#mkdir('hoi..bla') || warn $!;
#mkdir('hoi' . $backslash . 'bla') || warn $!;
#mkdir('..hoi') || warn $!;

# these don't work
#mkdir($backslash . '/hoi') || warn $!;
#mkdir($backslash . '../hoi') || warn $!;

# try all possible bytes
my $s = join('', map { chr($_) } 0 .. 255);

if (mkdir(safe($s))) {
    # collect entries whose names contain a run of 20+ non-newline bytes
    opendir(my $dh, '.') or die $!;
    my @entries = grep { /.{20,}/ } readdir($dh);
    closedir($dh);
    # dump the raw names to a file for inspection
    open(my $fh, '>', 't.bin') or die $!;
    binmode $fh;
    print $fh join("\n\n", @entries);
    close($fh);
}
else {
    warn($!);
}


 
Juha Laiho
      04-27-2004
"Craig Manley" <(E-Mail Removed)> said:
>From testing (using Perl + Slackware Linux) I've found that the only
>characters I can't use in a directory/file name are the 0 byte and path
>separator /.


Correct, in the strictly technical sense. The reason to forbid '/' is that
it is the directory separator character, so allowing it inside a name
would make path names ambiguous: e.g. is "/tmp/x" a file named "tmp/x"
in the root directory, or a file named "x" in the /tmp directory? The reason
to forbid \0 is its use as the string terminator in the C language.
No such reasons exist for any of the other possible byte values, so they
are allowed.
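
As a minimal sketch of the rule above (not the original poster's code), here is a check that one path component is acceptable. The name valid_component is made up for illustration. The 255-byte limit is taken from the script at the top of the thread and is an assumption about the target filesystem. The names "." and ".." are rejected as well, since they are reserved directory entries rather than names you can create.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name could be used as one path component.
# Assumes $name is already a byte string (not decoded characters).
sub valid_component {
    my ($name) = @_;
    return 0 if !defined $name || $name eq '';
    return 0 if $name eq '.' || $name eq '..';   # reserved entries
    return 0 if $name =~ m{[/\0]};               # the two forbidden bytes
    return 0 if length($name) > 255;             # assumed name length limit
    return 1;
}

print valid_component("hoi\\bla") ? "ok\n" : "rejected\n";   # backslash is allowed
print valid_component("../hoi")   ? "ok\n" : "rejected\n";   # contains '/'
```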

Words of warning, though: there are tools that have problems properly
understanding anything except printable US-ASCII (byte values from 32 to 126
inclusive), and even within this range there are some characters that
I'd consider ill-advised. Space (32) is perhaps the hardest one: there
are tools that emit/expect lists of file names using whitespace as the
separator, and for them whitespace within a file name is a problem that
cannot be overcome. The most commonly seen pair of such tools is "find"
and "xargs" ("cpio" being yet another tool with this problem). The GNU
implementations of these tools have workarounds for this problem
(NUL-separated lists, e.g. "find -print0" with "xargs -0"), but the
workarounds are not generally applicable, as the availability of the
GNU tools cannot be universally assumed.
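
The GNU workaround separates names with a NUL byte, which can never occur inside a name. A rough sketch of consuming such a list from Perl follows. read_nul_list is a hypothetical helper, and the in-memory string only stands in for the output of something like "find . -print0".

```perl
use strict;
use warnings;

# Hypothetical helper: read NUL-terminated file names from a filehandle.
sub read_nul_list {
    my ($fh) = @_;
    local $/ = "\0";    # record separator is NUL instead of newline
    my @names = <$fh>;
    chomp @names;       # chomp strips the trailing $/ (the NUL)
    return @names;
}

# Stand-in for real NUL-separated output: names with a space and a newline.
my $data = "plain.txt\0with space.txt\0with\nnewline.txt\0";
open(my $fh, '<', \$data) or die $!;
my @names = read_nul_list($fh);
close($fh);
print scalar(@names), " names read\n";
```

Names containing spaces or newlines come through intact because only NUL ends a record.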

Other characters that I would consider problematic are
!, ", ', `, *, ?, $, {, }, [, ], (, ), ~, <, >, |, #, & and \,
as these have special meanings in various shells and tools.

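So, taking the complement of this leaves the characters
-, _, ,, ., ;, :, ^, =, +, %, 0-9, a-z and A-Z as the safe ones.
"-" and "." are ill-advised as the first character of a file name ("-" can be
mistaken for a command-line option, and a leading "." hides the file from
default directory listings).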

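One possible way to apply the conservative list above is a whitelist check. This is a sketch only: conservative_name is a hypothetical name, and the character class simply transcribes the list from this post, with "-" and "." refused in first position.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name uses only the characters listed above
# and does not start with "-" or ".".
sub conservative_name {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z0-9_,;:^=+%][A-Za-z0-9_,.;:^=+%-]*\z/ ? 1 : 0;
}

for my $candidate ('report_2004.txt', '-rf', '.profile', 'my file.txt') {
    printf "%-16s %s\n", $candidate,
        conservative_name($candidate) ? 'accepted' : 'rejected';
}
```
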
Lately I've also heard some reports of the XFS filesystem on Linux having
trouble coping with UTF-8 byte sequences; it is apparently trying to
do something smart with non-US-ASCII file names and failing miserably.

>Another question: are these single byte character file systems?


Unix filesystems tend to be; until I heard about the XFS issues, I had
assumed all Unix filesystems to be purely byte-oriented.
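
Since the limit on a name is counted in bytes rather than characters, cutting a multi-byte name at a fixed byte offset can split a character in half. A hedged sketch, assuming names are stored as UTF-8 and the limit is 255 bytes as in the script at the top of the thread: truncate_to_bytes is a made-up helper that drops whole characters until the encoded form fits.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 bytes of $text, shortened by whole
# characters so that the result is at most $max_bytes long.
sub truncate_to_bytes {
    my ($text, $max_bytes) = @_;
    my $bytes = $text;
    utf8::encode($bytes);           # characters -> UTF-8 bytes, in place
    while (length($bytes) > $max_bytes) {
        chop $text;                 # drop one whole character
        $bytes = $text;
        utf8::encode($bytes);
    }
    return $bytes;
}

my $name = "\x{e9}" x 200;          # 200 characters, 2 bytes each in UTF-8
print length(truncate_to_bytes($name, 255)), " bytes kept\n";
```

The loop is quadratic in the worst case, which should not matter for names of a few hundred bytes.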

What you might do to allow "any" character in names at application level,
though, is to encode known problematic characters. Something like URL
encoding (%xx, where xx is the two-digit hexadecimal value of the
character) should be usable. Just remember that when using this, %
itself becomes an unsafe character, so it needs to be encoded, too.
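
A minimal sketch of that encoding idea follows. The helper names encode_name and decode_name, and the choice of which characters to leave unencoded (letters, digits, "_", "." and "-"), are assumptions made for illustration. A leading "." or "-" is encoded as well, which also keeps "." and ".." from ever being produced.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode every byte outside a small safe set.
sub encode_name {
    my ($name) = @_;
    utf8::encode($name) if utf8::is_utf8($name);    # work on bytes
    $name =~ s/([^A-Za-z0-9_.-])/sprintf('%%%02X', ord $1)/ge;
    $name =~ s/\A([.-])/sprintf('%%%02X', ord $1)/e;    # no leading "." or "-"
    return $name;
}

# Hypothetical helper: reverse of encode_name, returns the original bytes.
sub decode_name {
    my ($encoded) = @_;
    $encoded =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $encoded;
}

my $stored = encode_name("a/b c%");
print "$stored\n";
print decode_name($stored), "\n";
```

Note that encoding can triple the length of a name, so any length limit has to be checked on the encoded form, and an empty name still needs separate handling.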
--
Wolf a.k.a. Juha Laiho Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)
 
Tom
      04-27-2004
Craig Manley wrote...
<>
> Another question for those of you know much about MSWin32: which
> characters can't be used in a MSWin32 directory/filename (I think
> it's much more than Linux)?



\/:*?"<>|


You shouldn't use a leading space or a leading dot.
You also need to avoid reserved device names like:
CON
COM1
LPT1
PRN
AUX
NUL

...etc.

http://support.microsoft.com/default...b;EN-US;120716
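
A sketch of how these Windows rules might be checked from Perl. The helper name windows_name_problem is hypothetical. The full reserved list used here (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9, with or without an extension) and the ban on control characters 0 to 31 come from Microsoft's file naming rules rather than from this post, so treat them as assumptions to verify against that documentation.

```perl
use strict;
use warnings;

# Reserved device names (assumed list, see the note above).
my %reserved = map { ($_ => 1) }
    (qw(CON PRN AUX NUL), map { ("COM$_", "LPT$_") } 1 .. 9);

# Hypothetical helper: returns a short description of the first problem
# found, or undef if the name passes these checks.
sub windows_name_problem {
    my ($name) = @_;
    return 'empty name'           if $name eq '';
    return 'forbidden character'  if $name =~ m{[\\/:*?"<>|\x00-\x1F]};
    return 'leading space or dot' if $name =~ /\A[ .]/;
    (my $base = uc $name) =~ s/\..*\z//s;    # strip extension(s)
    return 'reserved device name' if $reserved{$base};
    return;
}

for my $candidate ('report.txt', 'a:b', 'nul.txt', '.hidden') {
    my $problem = windows_name_problem($candidate);
    print "$candidate: ", (defined $problem ? $problem : 'ok'), "\n";
}
```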



 
Randal L. Schwartz
      04-27-2004
>>>>> "Craig" == Craig Manley <(E-Mail Removed)> writes:

Craig> From testing (using Perl + Slackware Linux) I've found that the only
Craig> characters I can't use in a directory/file name are the 0 byte and path
Craig> separator /.

This has been true for every version of Unix I've used since 1977.
Can't say what it was before Unix V6 though... didn't get to use
those.

Gets fun when you permit \n in a filename. Lots of programs
don't expect that, and break. But those are broken programs, I say.
Not a broken filename.
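
Programs that display or log file names can at least make such names visible instead of breaking on them. A small sketch, with printable_name as a hypothetical helper that escapes control bytes (and the backslash itself, so the result stays unambiguous):

```perl
use strict;
use warnings;

# Hypothetical helper: replace control bytes and backslashes with \xNN
# escapes so a file name can be shown on a single line.
sub printable_name {
    my ($name) = @_;
    $name =~ s/([\x00-\x1F\x7F\\])/sprintf('\\x%02X', ord $1)/ge;
    return $name;
}

print printable_name("two\nlines"), "\n";
print printable_name("beep\x07"), "\n";
```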

print "Just another Perl hacker,"; # the first!
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<(E-Mail Removed)> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
 
Bryan Castillo
      04-28-2004
(E-Mail Removed) (Randal L. Schwartz) wrote in message news:<(E-Mail Removed)>...
> >>>>> "Craig" == Craig Manley <(E-Mail Removed)> writes:

>
> Craig> From testing (using Perl + Slackware Linux) I've found that the only
> Craig> characters I can't use in a directory/file name are the 0 byte and path
> Craig> separator /.
>
> This has been true for every version of Unix I've used since 1977.
> Can't say what it was before Unix V6 though... didn't get to use
> those.
>
> Gets fun when you permit \n in a filename. Lots of programs
> don't expect that, and break. But those are broken programs, I say.
> Not a broken filename.
>


I like putting "\x07" in file names. It's great when listing a
directory makes noise.

> print "Just another Perl hacker,"; # the first!

 