Regex substitute w/ match variables

 
 
Gary sCHENK
      05-05-2005
I am self-taught at Perl. I use Perl a few times a year, mostly to
process text files. I'm trying to rename files in a directory. My
skills are quite rudimentary.

The files are currently named like this: SR-01-234-5.jpg
I want to rename them like this: SR-01-234-0005.jpg

I have a couple of thousand of these. I've already written several
variations of the following script to get them to this stage,
but adding the extra zeros has me stumped. This is the script:
================================================================================
#!perl -w

opendir( DH, "d:\\temp" ) or die "couldn't open d:\\temp: $! ";
while ( defined ( my $filename = readdir( DH ) ) ) {
    my $foo = $filename;
    if ($foo =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/ ) {
        if ( length( $2 ) == 1 ) {
            $foo =~ s/$1$2$3/$1000$2$3/;
            rename( $filename, $foo );
            #print "$1\n";
        }
    }
}
closedir( DH );

================================================================================

The print statement is an attempt at debugging. When I comment out the
substitution and the call to rename and just print $1, the output is
what I expect. When I run this script as shown above, however, files
come up missing, or the zeros are added in the wrong place.

Is it possible to use match variables in substitutions? The llama book
shows match variables being used outside of regular expression
operations, but not in this fashion.

And why are the files being deleted? I'm really stumped, and would
appreciate any and all help.

All the best,
Gary Schenk

 
A. Sinan Unur
      05-05-2005
"Gary sCHENK" <(E-Mail Removed)> wrote in
news:(E-Mail Removed) oups.com:

> I am self-taught at Perl. I use Perl a few times a year, mostly to
> process text files. I'm trying to rename files in a directory. My
> skills are quite rudimentary.
>
> The files are currently named like this: SR-01-234-5.jpg
> I want to rename them like this: SR-01-234-0005.jpg
>
> I have a couple of thousand of these. I've already written several
> variations of the following script to get them to this stage,
> but adding the extra zeros has me stumped. This is the script:


> #!perl -w


use warnings;

is better because it allows you to selectively turn warnings on/off. See

perldoc warnings

> opendir( DH, "d:\\temp" ) or die "couldn't open d:\\temp: $! ";


Good.

> while ( defined ( my $filename = readdir( DH ) ) ) {
> my $foo = $filename;


Completely unnecessary.

> if ($foo =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/ ) {


I think this is better written as:

if ($foo =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {

> if ( length( $2 ) == 1 ) {
> $foo =~ s/$1$2$3/$1000$2$3/;


sprintf will work very nicely here:

my $new = sprintf "$1%4.4d$3", $2;

> And why are the files being deleted?


From perldoc -f rename:

rename OLDNAME,NEWNAME
Changes the name of a file; an existing file NEWNAME will be
clobbered.

I would suggest skipping the rename if the new name is the same as the
old name.

Also, note perldoc -f readdir:

If you're planning to filetest the return values out of a
"readdir", you'd better prepend the directory in question.
Otherwise, because we didn't "chdir" there, it would have been
testing the wrong file.

So, you should either chdir to the working directory, or prepend the
directory name to each file name.

Putting all of this together, here is a revised version of your script:

#! /usr/bin/perl

use strict;
use warnings;

use File::Spec::Functions 'catfile';

my $dir = shift || $ENV{TMP};

opendir my $dh, $dir
    or die "Error opening directory $dir: $! ";

while( my $old = readdir $dh ) {
    if ($old =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {
        my $new = sprintf "$1%4.4d$3", $2;

        if($new eq $old) {
            print "Skipping $old\n";
            next;
        }

        $old = catfile $dir, $old;
        $new = catfile $dir, $new;

        print "$old => $new\n";

        # rename $old, $new
        #     or warn "Error renaming $old to $new: $!";
    }
}

closedir $dh or die "Error closing directory $dir: $!";

Sinan


--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 
Anno Siegel
      05-05-2005
Gary sCHENK <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> I am self-taught at Perl. I use Perl a few times a year, mostly to
> process text files. I'm trying to rename files in a directory. My
> skills are quite rudimentary.
>
> The files are currently named like this: SR-01-234-5.jpg
> I want to rename them like this: SR-01-234-0005.jpg
>
> I have a couple of thousand of these. I've already written several
> variations of the following script to get them to this stage,
> but adding the extra zeros has me stumped. This is the script:
> ================================================================================
> #!perl -w


Why not strict? Your program seems to be written for it.

> opendir( DH, "d:\\temp" ) or die "couldn't open d:\\temp: $! ";
> while ( defined ( my $filename = readdir( DH ) ) ) {
> my $foo = $filename;
> if ($foo =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/ ) {


Your regex is fine though slightly more general than your example. However,
substitution with s/// isn't always the best way to turn a string into
another. For formatting numbers, there is sprintf.

> if ( length( $2 ) == 1 ) {
> $foo =~ s/$1$2$3/$1000$2$3/;
> rename( $filename, $foo );
> #print "$1\n";
> }
> }
> }
> closedir( DH );
>
> ================================================================================
>
> The print statement is an attempt at debugging. When I comment out the
> substitution and the call to rename and just print $1, the output is
> what I expect. When I run this script as shown above, however, files
> come up missing, or the zeros are added in the wrong place.


So why didn't you print out $foo for debugging? That way you'd have known
what you are trying to rename your files to. You are probably renaming
many files all to the same name. That's the same as deleting all but one
of them.
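
The collision described above can be caught before anything is renamed. What follows is only a minimal sketch, not code from the thread: `plan_renames` is a hypothetical helper that takes a list of file names, works out the zero-padded target for each, and refuses any target that already exists or has already been claimed. The file names in the usage lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): compute the zero-padded name
# for each matching file and refuse any target that is already taken.
sub plan_renames {
    my @names = @_;
    my %taken = map { $_ => 1 } @names;    # names that already exist
    my ( @plan, @skipped );
    for my $old (@names) {
        my ( $pre, $num, $suf ) =
            $old =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/
            or next;                        # not one of the SR files
        my $new = sprintf '%s%04d%s', $pre, $num, $suf;
        next if $new eq $old;               # already padded
        if ( $taken{$new}++ ) {             # would clobber another file
            push @skipped, $old;
            next;
        }
        push @plan, [ $old, $new ];
    }
    return ( \@plan, \@skipped );
}

my ( $plan, $skipped ) = plan_renames(
    'SR-01-234-5.jpg', 'SR-01-234-05.jpg', 'SR-01-234-7a.jpg', 'notes.txt',
);
print "$_->[0] => $_->[1]\n"          for @$plan;
print "skipped (target taken): $_\n"  for @$skipped;
```

Only the pairs in the plan would then be handed to rename; the skipped names need a human decision.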

> Is it possible to use match variables in substitutions? The llama book
> shows match variables being used outside of regular expression
> operations, but not in this fashion.


It's using them inside *another* regex that's problematic. Every regex
evaluation resets them. You can assign the matches to named variables
that don't have that problem (see below).

Here's how I would do it (your regex is unchanged):

my $filename = 'SR-01-234-5.jpg';
my ( $pre, $num, $suf) =
    $filename =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/;
my $foo = sprintf "%s%04d%s", $pre, $num, $suf;
print "$filename -> $foo\n";

Anno
 
Anno Siegel
      05-05-2005
A. Sinan Unur <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> "Gary sCHENK" <(E-Mail Removed)> wrote in
> news:(E-Mail Removed) oups.com:


[Good advice]

> if ($old =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {
> my $new = sprintf "$1%4.4d$3", $2;


Just one note. It is generally a bad idea to put variable strings into
a sprintf format. They could decide to contain a "%" one day. I realize
the regex doesn't allow this in this case, but on principle I'd do

sprintf '%s%4.4d%s', $1, $2, $3;
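
A small sketch of the hazard, assuming made-up values (the thread's regex would never let a "%" through, so the prefix here is purely illustrative). It builds the format string that interpolation would produce, without passing it to sprintf, and compares it with the constant-format call:

```perl
use strict;
use warnings;

# Made-up values: a prefix that happens to contain a percent sign.
my ( $pre, $num, $suf ) = ( '100%-', 5, '.jpg' );

# Safe: the format is a constant and the data arrive as arguments.
my $safe = sprintf '%s%04d%s', $pre, $num, $suf;
print "safe result:       $safe\n";

# Risky: interpolating the data turns it into part of the format.
# The string is only built and printed here, never handed to sprintf.
my $risky_format = "$pre%04d$suf";
print "interpolated form: $risky_format\n";
```

The second line printed shows a stray "%" sitting in front of the intended conversion, which sprintf would try to interpret as a conversion of its own.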

Anno
 
Damian James
      05-05-2005
On 5 May 2005 14:06:26 -0700, Gary sCHENK said:
> #!perl -w
>
> opendir( DH, "d:\\temp" ) or die "couldn't open d:\\temp: $! ";
> while ( defined ( my $filename = readdir( DH ) ) ) {
> my $foo = $filename;
> if ($foo =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/ ) {
> if ( length( $2 ) == 1 ) {
> $foo =~ s/$1$2$3/$1000$2$3/;
> rename( $filename, $foo );
> #print "$1\n";
> }
> }
> }
> closedir( DH );
> ...
> Is it possible to use match variables in substitutions? The llama book
> shows match variables being used outside of regular expression
> operations, but not in this fashion.


That substitution in the inner loop is doing something rather different
from what you appear to be expecting. Looking at it...

$foo =~ s/$1$2$3/$1000$2$3/;

First, the pattern you are matching will be the contents of the
matched strings from the previous pattern, not the pattern itself,
and NOT including the parentheses. So taking those strings, concatenated
together, as a pattern, you are not in fact assigning anything to $1, $2 and
$3 the second time. This does mean that they retain their previous values.
The string you are substituting, however, starts with the variable $1000,
which is not populated. Doing "${1}000" instead should help, but I don't
understand why you are using a substitution here at all. Why not just
assign the result?

Have you tried printing $foo? Try replacing the substitution with:

$foo = "${1}000$2$3";
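
As a runnable sketch of that plain assignment, using the sample file name from the original post: the braces end the variable name after the 1, so the three zeros are literal text. Like the original script's length check, it only suits single-digit numbers.

```perl
use strict;
use warnings;

my $filename = 'SR-01-234-5.jpg';
my $padded;

if ( $filename =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {
    # "$1000" would be read as capture group number 1000, which was never set.
    # "${1}000" is capture group 1 followed by the literal text "000".
    $padded = "${1}000$2$3";
    print "$filename -> $padded\n";
}
```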

> And why are the files being deleted? I'm really stumped, and would
> appreciate any and all help.


Well, $1000 is empty, thus "5a.jpg" or something like it
has been the resulting string several times, so you're renaming
multiple files to the same name? Couldn't say for sure without
seeing your directory listing.

NB, if I were doing this I'd probably have used glob() rather
than opendir(). Also, perl even on win32 can understand normal
slashes, so there's no need for the double backwhacks. I'd still
only put the path in once, though:

my $path = 'd:/temp';
my @files = glob( "$path/SR*.jpg" );

...or somesuch
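
A sketch of the glob approach that can be run without touching real files. It assumes a scratch directory from File::Temp and made-up file names, none of which come from the thread:

```perl
use strict;
use warnings;
use File::Basename 'basename';
use File::Temp 'tempdir';

# Scratch directory with made-up names, removed again at exit.
my $path = tempdir( CLEANUP => 1 );
for my $name ( 'SR-01-234-5.jpg', 'SR-01-234-12a.jpg', 'other.txt' ) {
    open my $fh, '>', "$path/$name" or die "Cannot create $path/$name: $!";
    close $fh;
}

# glob returns each match with the directory already attached, so nothing
# has to be prepended before the names are passed to rename.
my @files = glob("$path/SR*.jpg");
print basename($_), "\n" for @files;
```

One caveat: glob splits its argument on whitespace, so a directory name containing spaces would need extra quoting.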

Hope this helps
--damian


 
A. Sinan Unur
      05-05-2005
(E-Mail Removed)-berlin.de (Anno Siegel) wrote in
news:d5e4ub$j6s$(E-Mail Removed)-Berlin.DE:

> A. Sinan Unur <(E-Mail Removed)> wrote in comp.lang.perl.misc:
>> "Gary sCHENK" <(E-Mail Removed)> wrote in
>> news:(E-Mail Removed) oups.com:

>
> [Good advice]
>
>> if ($old =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {
>> my $new = sprintf "$1%4.4d$3", $2;

>
> Just one note. It is generally a bad idea to put variable strings
> into a sprintf format. They could decide to contain a "%" one day. I
> realize the regex doesn't allow this in this case, but on principle
> I'd do
>
> sprintf '%s%4.4d%s', $1, $2, $3;


Definitely, that was on my list of things to add, but forgot. Thanks for
catching it.

Sinan.

 
A. Sinan Unur
      05-05-2005
"A. Sinan Unur" <(E-Mail Removed)> wrote in
news:Xns964DB3D428E50asu1cornelledu@127.0.0.1:

Important correction:

> while( my $old = readdir $dh ) {


I edited out the crucial test for defined when I was changing things. This
line should have been, as it was in the original post,

while( defined (my $old = readdir $dh) ) {

Sorry.

Sinan.
 
Damian James
      05-05-2005
On 5 May 2005 21:49:17 GMT, Anno Siegel said:
> ...
> It's using them inside *another* regex that's problematic. Every regex
> evaluation resets them. You can assign the matches to named variables
> that don't have that problem (see below).


Reset? My understanding was, previous matches are retained (which makes
what the OP was trying to do more confusing, because sometimes it may
have succeeded). From perlre:

NOTE: failed matches in Perl do not reset the match variables, which
makes it easier to write code that tests for a series of more specific
cases and remembers the best match.

> Here's how I would do it (your regex is unchanged):
>
> my $filename = 'SR-01-234-5.jpg';
> my ( $pre, $num, $suf) =
> $filename =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/;
> my $foo = sprintf "%s%04d%s", $pre, $num, $suf;
> print "$filename -> $foo\n";


Indeed.

--damian
 
Anno Siegel
      05-05-2005
Damian James <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> On 5 May 2005 21:49:17 GMT, Anno Siegel said:
> > ...
> > It's using them inside *another* regex that's problematic. Every regex
> > evaluation resets them. You can assign the matches to named variables
> > that don't have that problem (see below).

>
> Reset? My understanding was, previous matches are retained (which makes
> what the OP was trying to do more confusing, because sometimes it may
> have succeeded). From perlre:
>
> NOTE: failed matches in Perl do not reset the match variables, which
> makes it easier to write code that tests for a series of more specific
> cases and remembers the best match.


Yes, *failed* matches retain the values. A successful match resets
them (even if it doesn't capture anything itself). Since the pattern
/$1$2$3/ would match the original string ("." matching itself), at
the time of substitution $1, $2 and $3 would be undefined.

my $filename = 'SR-01-234-5.jpg';
$filename =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/;
{
    no warnings 'uninitialized';
    $filename =~ s/$1$2$3/$1$2$3/;
}
print "*$filename*\n";
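
The two cases under discussion can also be put side by side. This is a sketch with made-up patterns, not code from the thread: one match that fails and one that succeeds without capturing anything, each checked against the $1 left by an earlier match.

```perl
use strict;
use warnings;

my $name = 'SR-01-234-5.jpg';

# A successful match with a capture group sets $1.
my $first_ok    = ( $name =~ /^(SR)-\d{2}/ );
my $after_first = $1;

# A failed match leaves the captures from the last success alone.
my $failed_ok    = ( $name =~ /(no-such-text)/ );
my $after_failed = $1;

# A successful match with no capture groups clears them.
my $plain_ok      = ( $name =~ /jpg/ );
my $after_success = $1;

for my $pair (
    [ 'after first match'     => $after_first ],
    [ 'after failed match'    => $after_failed ],
    [ 'after capture-free hit' => $after_success ],
) {
    my ( $label, $value ) = @$pair;
    printf "%-24s %s\n", $label, defined $value ? "'$value'" : 'undef';
}
```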

Anno
 
Tad McClellan
      05-06-2005
A. Sinan Unur <(E-Mail Removed)> wrote:
> "A. Sinan Unur" <(E-Mail Removed)> wrote in
> news:Xns964DB3D428E50asu1cornelledu@127.0.0.1:
>
> Important correction:
>
>> while( my $old = readdir $dh ) {

>
> I edited out the crucial test for defined when I was changing things.



It actually isn't crucial at all.


> This
> line should have been, as it was in the original post,
>
> while( defined (my $old = readdir $dh) ) {



perl -MO=Deparse -e 'while( my $old = readdir $dh ) { }'

and

perl -MO=Deparse -e 'while( defined (my $old = readdir $dh) ) { }'

produce the same output.


If you leave out the defined(), perl will put it in for you.
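
Why the test matters at all shows up with a file literally named "0", which is a false value in Perl. The sketch below assumes a scratch directory from File::Temp and made-up file names; it uses the shorter loop form and checks that the "0" entry is still seen, which is what the implicit defined() described above would lead one to expect.

```perl
use strict;
use warnings;
use File::Temp 'tempdir';

# Scratch directory holding a file whose name is the false value "0".
my $dir = tempdir( CLEANUP => 1 );
for my $name ( '0', 'a.jpg' ) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

opendir my $dh, $dir or die "Error opening directory $dir: $!";
my @seen;
while ( my $entry = readdir $dh ) {    # no explicit defined() here
    push @seen, $entry;
}
closedir $dh or die "Error closing directory $dir: $!";

my $saw_zero = grep { $_ eq '0' } @seen;
print $saw_zero ? "saw the file named 0\n" : "missed the file named 0\n";
```

Writing the defined() explicitly remains harmless and makes the intent obvious to the next reader.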


--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 