Splitting a filename

 
 
Noel Sant
      03-07-2007
I want to split a filename into the name itself and the extension. This:

($name, $extension) = split /\./, $input_file;

works fine, providing there's only one dot in the filename, but if there are
more I just get the first two bits of name. I really want to get the last
bit into $extension and all the rest, including dots, into $name.

I suppose I could use an array on the left-hand side, find out how many
elements there are and just build up $name from all the elements bar the last,
but this seems long-winded. I tried using "split /\.$/, ..." but then I got
everything in $name, and $extension was undefined. As though it's just
looking at the end of the string and saying "Nope! no dot there" and not
going any further back. Obviously I don't understand what $ does.

How do I say "just match on the last dot", please?


 
DJ Stunks
      03-07-2007
On Mar 7, 10:10 am, "Noel Sant" <(E-Mail Removed)> wrote:
> I want to split a filename into the name itself and the extension


perldoc File::Basename

-jp

 
Paul Lalli
      03-07-2007
On Mar 7, 1:10 pm, "Noel Sant" <(E-Mail Removed)> wrote:
> I want to split a filename into the name itself and the extension. This:
>
> ($name, $extension) = split /\./, $input_file;
>
> works fine, providing there's only one dot in the filename, but if there are
> more I just get the first two bits of name. I really want to get the last
> bit into $extension and all the rest, including dots, into $name.
>
> I suppose I could use an array on the left-hand side, find out how many
> elements there are and just build up $name from all the elements bar the last,
> but this seems long-winded. I tried using "split /\.$/, ..." but then I got
> everything in $name, and $extension was undefined. As though it's just
> looking at the end of the string and saying "Nope! no dot there" and not
> going any further back. Obviously I don't understand what $ does.
>
> How do I say "just match on the last dot", please?


The answer to your actual question is to only split on a dot that's
not followed by anything which includes dots:

my ($name, $ext) = split /\.(?!.*\.)/, $file;
(read about lookaheads in `perldoc perlre`)
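
To see how the lookahead split behaves, here is a minimal sketch. The helper name split_last_dot and the sample filenames are made up for illustration; a name without any dot leaves the extension undefined.

```perl
use strict;
use warnings;

# Hypothetical helper: split only on a dot that is not followed by
# any other dot, i.e. the last dot in the string.
sub split_last_dot {
    my ($file) = @_;
    my ($name, $ext) = split /\.(?!.*\.)/, $file;
    return ($name, $ext);
}

for my $file (qw(archive.tar.gz notes.txt README)) {
    my ($name, $ext) = split_last_dot($file);
    printf "%-15s name=%s ext=%s\n",
        $file, $name, defined $ext ? $ext : '(none)';
}
```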

However, the correct answer to the problem you're actually trying to
solve is to stop reinventing the wheel:
perldoc File::Basename
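
For reference, a short sketch of the module route, using fileparse with the suffix pattern shown in the File::Basename documentation. The path is invented for the example, and note that the suffix comes back with its leading dot.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# qr/\.[^.]*/ treats the last dot and everything after it as the suffix.
my ($name, $dirs, $suffix) = fileparse('/tmp/archive.tar.gz', qr/\.[^.]*/);

# The suffix includes the dot; strip it if only the bare extension is wanted.
(my $ext = $suffix) =~ s/^\.//;

print "name=$name dirs=$dirs suffix=$suffix ext=$ext\n";
```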

Paul Lalli

 
Mirco Wahab
      03-07-2007
Noel Sant wrote:
> I want to split a filename into the name itself and the extension. This:
>
> ($name, $extension) = split /\./, $input_file;
>
> works fine, providing there's only one dot in the filename, but if there are
> more I just get the first two bits of name. I really want to get the last
> bit into $extension and all the rest, including dots, into $name.
>
> I suppose I could use an array on the left-hand side, find out how many
> elements there are and just build up $name from all the elements bar the last,
> but this seems long-winded. I tried using "split /\.$/, ..." but then I got
> everything in $name, and $extension was undefined. As though it's just
> looking at the end of the string and saying "Nope! no dot there" and not
> going any further back. Obviously I don't understand what $ does.
>
> How do I say "just match on the last dot", please?


There's a module for it, as Paul and DJS said,
so try to use it if possible.

But, for learning purposes, you could split
simple filenames with a simple regular
expression.

Suppose we have these four "splendid variants":

my @names = qw'
fi.le.ext
file.file.
file
.ext
';

(note the dots). Then we can "split them apart"
by, eg.

my $rg = qr/ (.*) \. (.*) $ | (.*) /x;


In this case, the 'filename component' is in $1 or in $3
($3 => if there was no dot at all), so let's extract that:

print
  map +($_->[0] || $_->[2] || '(undef)') ."\t". ($_->[1] || '(undef)') ."\n",
  map [ /$rg/g ],
  @names;

The second (short) map expression applies the
regular expression and collects ($1,$2,$3) into
an array reference.
The "complicated looking" first map expression
only serves to give some fancy
output, in our case:

fi.le ext
file.file (undef)
file (undef)
(undef) ext

We want to see which parts are set and which
are not (with ||, an empty string also shows up as "(undef)").
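
If the difference between undef and an empty string matters, a variation along these lines might help. It is only a sketch: name_and_ext and show are hypothetical helper names, and the regex is the $rg from above with explicit anchors added.

```perl
use strict;
use warnings;

# Same idea as $rg above, anchored at both ends.
my $rg = qr/ ^ (.*) \. (.*) $ | ^ (.*) $ /x;

# Hypothetical helper: returns (name, extension); the extension is undef
# when the filename contains no dot at all.
sub name_and_ext {
    my ($file) = @_;
    my ($n1, $ext, $n2) = $file =~ $rg;
    return (defined $n1 ? $n1 : $n2, $ext);
}

# Hypothetical helper: label undef and '' differently for display.
sub show {
    my ($v) = @_;
    return !defined $v ? '(undef)' : $v eq '' ? '(empty)' : $v;
}

for my $file (qw(fi.le.ext file.file. file .ext)) {
    print join("\t", map { show($_) } name_and_ext($file)), "\n";
}
```

Here "file.file." yields an empty extension, while "file" yields an undefined one.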

Regards

M.
 
Uri Guttman
      03-08-2007
>>>>> "PV" == Petr Vileta <(E-Mail Removed)> writes:

PV> I use my own function. Maybe stupid but 100% functional

very stupid!

PV> sub parsename {
PV> my $filename = shift;
PV> my ($name, $ext) = ('') x2;

why the initialization of both? $name is ALWAYS set to something below.

PV> if($filename =~ m/\./) {$filename =~ s/^(.*)(\..*)$/$1$2/; $name =

are you allowing an empty file name with just a .suffix? what about a
name with a dot but no suffix?

PV> $1;

why the destruction and rebuilding of $filename with s///? $filename is
never used again inside that block. you can use the m// to get the same
$1 and $2 as the s///.

PV> $ext = $2; }

your $ext always has the . which is not typical when breaking up a
filename and extension.

why save $1 and $2 when you can just return them?

PV> else {$name = $filename;}

your indenting is either very bad or your usenet program ruined it.

PV> return ($name, $ext);
PV> }

this is almost just (untested) this one line sub:

return $_[0] =~ /^(.*)(\..*)$/ ;

other than making sure both parts are '' if not matched. that could be
fixed easily too.
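
a sketch of one way that fix might look (an assumption, not the code from the thread): on no match it returns the whole name and '' as the original else branch did, and it leaves the dot out of the extension.

```perl
use strict;
use warnings;

# Sketch only: same sub name as in the quoted code.
sub parsename {
    my ($filename) = @_;
    # Greedy (.*) backtracks to the last dot; [^.]* keeps the dot
    # out of the captured extension.
    return ($1, $2) if $filename =~ /^(.*)\.([^.]*)$/;
    # No dot at all: whole name, empty extension.
    return ($filename, '');
}

print join('|', parsename($_)), "\n" for qw(fi.le.ext file);
```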

stick with the module.

uri

--
Uri Guttman ------ (E-Mail Removed) -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
 
Noel Sant
      03-08-2007
Wow!

As you say, I'll stop trying to re-invent the wheel, and use fileparse.

In answer to the query, I do use strict in programs, but not for that
example.

Many thanks.


 
anno4000@radom.zrz.tu-berlin.de
      03-08-2007
Michele Dondi <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> On Thu, 8 Mar 2007 01:19:54 +0100, "Petr Vileta"
> <(E-Mail Removed)> wrote:


[...]

> Just a rewrite of your code with the same semantics but a saner
> syntax:
>
> sub parsename {
> local $_ = shift;
> return($_, '') unless /\./;
> /^(.*)(\..*)$/;
> }


As a side note, I'd avoid localizing $_ if possible. There's a bug
lurking that bites when $_ is aliased to a value in a tied hash (or
something exotic like that). Ask Brian McCauley about it, he's
the resident expert on that bug.

Sometimes a one-shot "for" can do the trick:

sub parsename {
    for ( shift ) {
        return($_, '') unless /\./;
        return /^(.*)(\..*)$/;
    }
}

Anno
 
anno4000@radom.zrz.tu-berlin.de
      03-10-2007
Michele Dondi <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> On Thu, 08 Mar 2007 21:46:42 +0100, Michele Dondi
> <(E-Mail Removed)> wrote:
>
> >> sub parsename {
> >> for ( shift ) {
> >> return($_, '') unless /\./;
> >> return /^(.*)(\..*)$/;
> >> }
> >> }

> >
> >Oh, I'm a big fan of one shot C<for>s. But then I heard about lexical

>
> And of course that could even be cast in the form of a single
> statement:
>
> sub parsename { return /\./ ? /^(.*)(\..*)$/ : $_, '' for shift }
>
> Although just as obviously I would call that an *abuse*.
>


It's not entirely equivalent, however. The sub body is parsed as

return ( /\./ ? /^(.*)(\..*)$/ : $_), '' for shift;

so it returns a spurious third value (an empty string) when an extension
is present. This would do:

return /\./ ? /^(.*)(\..*)$/ : ( $_, '') for shift;
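
A quick way to see the difference is to count the returned values. In this sketch the sub names with_bug and fixed are made up; the bodies are the two statements above.

```perl
use strict;
use warnings;

# ?: binds tighter than the comma, so '' is appended in both cases.
sub with_bug { return /\./ ? /^(.*)(\..*)$/ : $_, '' for shift }

# Parentheses keep '' inside the "no dot" branch only.
sub fixed    { return /\./ ? /^(.*)(\..*)$/ : ( $_, '' ) for shift }

my @buggy = with_bug('file.ext');
my @good  = fixed('file.ext');
print scalar(@buggy), " values vs ", scalar(@good), " values\n";
```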

It's probably better to re-write the regex to capture the right stuff
in both cases. Also, it shouldn't include the "." in the extension,
but that's secondary.

[an embarrassing amount of time passes]

This isn't as easy as I thought. I haven't found a single regex
that captures first the name, and then the extension or an empty
string if there is none, in all cases. I'll leave the solution as
an exercise.

It is possible to use two regexes only one of which ever matches,
but that's no improvement.

sub parsename { return ( /(.*)\.(.*)/, /^([^.]*)()$/) for shift }
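
For what it's worth, one possible single-regex answer to the exercise uses the branch reset construct (?|...), which needs Perl 5.10 or later and so was not available when this was written. Treat it as a sketch under that assumption:

```perl
use strict;
use warnings;

# Branch reset (?|...) numbers the groups in each alternative from 1,
# so either branch fills exactly $1 (name) and $2 (extension).
sub parsename {
    my ($file) = @_;
    return $file =~ /^(?|(.*)\.([^.]*)|(.*)())$/s;
}

for my $file (qw(fi.le.ext file.file. file .ext)) {
    my ($name, $ext) = parsename($file);
    print "'$name'\t'$ext'\n";
}
```

The first alternative handles names with a dot; the second captures the whole name plus an empty string, so the extension is never undefined.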

Anno


 
Michele Dondi
      03-10-2007
On 10 Mar 2007 00:02:42 GMT, anno4000@radom.zrz.tu-berlin.de wrote:

>It's not entirely equivalent, however. The sub body is parsed as
>
> return ( /\./ ? /^(.*)(\..*)$/ : $_), '' for shift;


That's what I thought, too. Then, to avoid embarrassing myself
after posting, I ran some tests *beforehand*: funnily enough I checked
with

my ($name, $ext) = parsename $_;

and it worked as expected, simply because the list assignment threw away
the spurious ''. Thus, somewhat to my own surprise, I wrongly concluded
that it did the right thing altogether. Had I checked with
-MO=Deparse,-p I would have known better.


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
 