Parse a filename (this SHOULD be easy, right?)

 
 
usenet@DavidFilmer.com
      12-15-2005
This sounds easy, but it has puzzled me. I want to parse a filename
into "path", "basename", and "suffix", where the terms are defined thus:
PATH - everything up to the last (or only) forward slash.
SUFFIX - anything after the last (or only) dot in the final path component.
BASENAME - everything between the path and the dot before the suffix.
Discard the trailing slash in the path (unless the path is just '/') and
the dot before the suffix.
It may be assumed that the parser will only process names of plain
files (so '/foo' is a file named 'foo' in '/', not a directory).

The parser should work if the filename has no path and/or no suffix
(those values should resolve to undef if not present in the filename).
I don't know what suffixes it might encounter (so using File::Basename
is not so obvious, though I've tried a qr// without much success,
because I can't figure out how to curb the greediness of the
expressions in this context).

I've been playing around with some code within this test framework
using multiple possible styles of filenames:

#!/usr/bin/perl
use strict;
use File::Basename;

foreach my $file (<DATA>) {
    chomp $file;
    #my ($path, $name, $suffix) = ($file =~ m!^(?:(.*)/)?(.*)(?:\.(.*))?$!);
    #my ($name, $path, $suffix) = fileparse($file, qr{\..*});
    #Gotta come up with SOMETHING here!!!
    printf("%-19s%-7s%-9s%-4s\n", $file, $path, $name, $suffix);
}

__DATA__
/PATH/NAME.SUFFIX
/foo
/foo.txt
/tmp.xyz/foo.txt
tmp.xyz/foo.bar.txt
/tmp.xyz/foo
tmp/foo.txt
../tmp/foo.bar.txt
/tmp/foo
foo.txt
foo.bar.txt
foo

###### DESIRED OUTPUT #########################

/PATH/NAME.SUFFIX    /PATH     NAME     SUFFIX
/foo                 /         foo
/foo.txt             /         foo      txt
/tmp.xyz/foo.txt     /tmp.xyz  foo      txt
tmp.xyz/foo.bar.txt  tmp.xyz   foo.bar  txt
/tmp.xyz/foo         /tmp.xyz  foo
tmp/foo.txt          tmp       foo      txt
../tmp/foo.bar.txt   ../tmp    foo.bar  txt
/tmp/foo             /tmp      foo
foo.txt                        foo      txt
foo.bar.txt                    foo.bar  txt
foo                            foo

I can ALMOST get it to work, but not quite... if I fix one test case, I
break another. I appreciate any suggestions...
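
For illustration, here is a minimal sketch of where the greediness bites, assuming the commented-out regex above was meant to read as in greedy_split below. The sub names greedy_split and lazy_split are hypothetical. With a greedy middle (.*), the optional suffix group never gets a chance to match; making the name lazy and restricting the suffix to dot-free characters is one possible way around that.

```perl
use strict;
use warnings;

# Assumed reading of the regex attempt: the middle (.*) is greedy,
# so it swallows the suffix and the optional last group matches nothing.
sub greedy_split {
    my ($file) = @_;
    return $file =~ m{^(?:(.*)/)?(.*)(?:\.(.*))?$};
}

# Hypothetical variant: a lazy name (.*?) and a suffix without dots.
sub lazy_split {
    my ($file) = @_;
    return $file =~ m{^(?:(.*)/)?(.*?)(?:\.([^.]*))?$};
}

for my $file ('/tmp.xyz/foo.bar.txt', 'foo') {
    for my $split (\&greedy_split, \&lazy_split) {
        # Show unmatched groups as the literal word "undef"
        my ($path, $name, $suffix) =
            map { defined $_ ? $_ : 'undef' } $split->($file);
        printf "%-22s%-10s%-13s%s\n", $file, $path, $name, $suffix;
    }
}
```

Even the second variant yields an empty path for '/foo' rather than '/', so the root directory would still need separate handling.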

--
http://DavidFilmer.com

 
Big and Blue
      12-15-2005
wrote:
>
> This sounds easy, but it has puzzled me.


> foreach my $file(<DATA>) {chomp $file;
> #my ($path, $name, $suffix) = ($file =~ m!^(?:(.*)/)?(.*)(?:\.(.*))?$!);
> #my ($name,$path,$suffix) = fileparse($file, qr{\..*});


my ($name,$path,$suffix) = fileparse("./$file", qr{\.[^.]*});
$path = substr($path, 2);

> #Gotta come up with SOMETHING here!!!
> printf ("%-19s%-7s%-9s%-4s\n", $file, $path, $name, $suffix);
> }


Seems to work. Note the corrected regex (so that it matches only the last
component) and the "./" prepended for processing, which is then removed by
the substr.
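
A minimal sketch of the two points above, assuming a Unix-like system and the core File::Basename module. The helper name split_with is hypothetical. It runs fileparse with each suffix pattern so the difference between qr{\..*} (everything from the first dot) and qr{\.[^.]*} (only the last dot-component) is visible, and it shows what the "./" prefix plus substr does to the path.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: parse with a given suffix pattern, using the
# "./" prefix trick so that every path starts with two known characters.
sub split_with {
    my ($file, $suffix_re) = @_;
    my ($name, $path, $suffix) = fileparse("./$file", $suffix_re);
    return (substr($path, 2), $name, $suffix);   # strip the "./" again
}

for my $file ('foo.bar.txt', '/foo', 'tmp.xyz/foo') {
    for my $re (qr{\..*}, qr{\.[^.]*}) {
        # Columns: input, pattern, path, name, suffix
        printf "%-14s%-14s%-10s%-9s%s\n", $file, $re, split_with($file, $re);
    }
}
```

The path still carries its trailing slash and the suffix its leading dot here, and a missing suffix comes back as an empty string rather than undef.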

--
Just because I've written it doesn't mean that
either you or I have to believe it.
 
usenet@DavidFilmer.com
      12-15-2005
wrote:
>>> some code


That works perfectly, thanks. But it does seem like a whole lot of
code to throw at what seems like a fairly simple problem.

If I get close to my screen and take a whiff, it smells like a problem
that needs to be sprayed with a regex. But the only bottle of regex
skills that I have is not very full...

--
http://DavidFilmer.com

 
attn.steven.kuo@gmail.com
      12-15-2005
wrote:
> wrote:
> >>> some code

>
> That works perfectly, thanks. But it does seem like a whole lot of
> code to throw at what seems like a fairly simple problem.
>
> If I get close to my screen and take a whiff, it smells like a problem
> that needs to be sprayed with a regex. But the only bottle of regex
> skills that I have is not very full...
>



Actually, now that I'm looking at the solution posted by
"Big and Blue", I prefer his regular expression
to mine.

--
Regards,
Steven

 
usenet@DavidFilmer.com
      12-15-2005
wrote:
> Actually, now that I'm looking at the solution posted by
> "Big and Blue", I prefer his regular expression to mine.


B&B's solution is good, but it doesn't discard the trailing slash on
the path or the leading dot on the extension. That could easily be done
with a couple of extra statements, but that also seems to be getting
unwieldy for what seems like a simple task.

 
attn.steven.kuo@gmail.com
      12-15-2005
wrote:
> wrote:
> >>> some code

>
> That works perfectly, thanks. But it does seem like a whole lot of
> code to throw at what seems like a fairly simple problem.
>
> If I get close to my screen and take a whiff, it smells like a problem
> that needs to be sprayed with a regex. But the only bottle of regex
> skills that I have is not very full...
>
> --
> http://DavidFilmer.com



If you prefer a solution using
regular expressions, then
how about:


while (<DATA>)
{
    chomp;

    my @matches = map
        defined $_ ? $_ : '',
        m#(?:(.+)/|^(/)|)([^/]+?)(?:\.([^.]*))?$#;

    # check for successful match omitted ...

    my $path = join '', splice(@matches, 0, 2);
    my ($file, $suffix) = @matches;

    printf("%-21s%-17s%-9s%-4s\n", $_, $path, $file, $suffix);
}

--
Regards,
Steven

 
Big and Blue
      12-15-2005
wrote:
>
> B&B's solution is good, but it doesn't discard the trailing slash on
> the path or the leading dot on the extension.


Hmmmm - I'm sure it handled these at some point....

> It could be easily done
> with a couple of extra statements, but that also seems to be getting
> unwieldy for what seems like a simple task.


It becomes 4 lines.

my ($name,$path,$suffix) = fileparse("//$file", qr{\.[^.]*});
$suffix = substr($suffix || ' ', 1);
$path = substr($path, 2);
do {local $/='/'; chomp $path unless ($path eq '/')};

The oddity on the substr of the suffix is to avoid warnings when there
isn't one.
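
As a sketch only, the four lines above could be wrapped in a sub that also returns undef for a missing path or suffix, as the original question asked. The sub name parse_filename is hypothetical, and two details are swapped in for illustration: a substitution replaces the local $/ and chomp step, and another substitution replaces the substr on the suffix. It assumes a Unix-like system.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical wrapper: returns (path, basename, suffix), with undef
# for a missing path or suffix.
sub parse_filename {
    my ($file) = @_;
    my ($name, $path, $suffix) = fileparse("//$file", qr{\.[^.]*});
    $path = substr($path, 2);                # drop the "//" that was prepended
    $path =~ s{/\z}{} unless $path eq '/';   # drop trailing slash, keep a bare "/"
    $suffix =~ s{\A\.}{};                    # drop the leading dot, if any
    return (
        length($path)   ? $path   : undef,
        $name,
        length($suffix) ? $suffix : undef,
    );
}

for my $file ('/foo', 'tmp.xyz/foo.bar.txt', '../tmp/foo.bar.txt', 'foo') {
    # Print undef fields as the literal word "undef"
    printf "%-21s%-10s%-9s%s\n",
        map { defined $_ ? $_ : 'undef' } $file, parse_filename($file);
}
```

Because the suffix is only ever an empty string or starts with a dot, the substitution avoids the out-of-range substr warning without needing the "|| ' '" workaround.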

--
Just because I've written it doesn't mean that
either you or I have to believe it.
 
Ilya Zakharevich
      12-16-2005
[A complimentary Cc of this posting was sent to

<>], who wrote in article < .com>:
> wrote:
> >>> some code

>
> That works perfectly, thanks. But it does seem like a whole lot of
> code to throw at what seems like a fairly simple problem.


This puzzles me too - for more than 10 years now... I have no idea
why File::Basename has so lousy an API.

On the other hand, if it were critical, somebody would have fixed
it...

Hope this helps,
Ilya
 