Perl RegExp question

 
 
Keith
      04-19-2011
All:
I am having a problem with Perl's regular expressions.

I am trying to change this --
King of the Forest Rangers S.O.S. Ranger Chapter 9.AVI

To this --
King of the Forest Rangers Ch 9 S.O.S. Ranger.avi

I tried to use this regexp but it didn't work as expected --
s/^(.*)(\s*)(.*)(\s*)(Chapter)(\s*)(\d*).AVI/$1 Ch $7 $3.avi/g

All I got was this --
King of the Forest Rangers S.O.S. Ranger Ch 9.avi

How do I make a Perl regexp that works with any number of strings at the beginning (the title of the file),
any number of strings in the middle (the title of that file's episode) and the Chapter number?


 
ccc31807
      04-19-2011
On Apr 19, 8:04 am, Keith <(E-Mail Removed)> wrote:
> I am trying to change this --
> King of the Forest Rangers S.O.S. Ranger Chapter 9.AVI
>
> To this --
> King of the Forest Rangers Ch 9 S.O.S. Ranger.avi


The Perl mavens will tar and feather me for saying this, but I'll say
it anyway. And BTW, this is true not only for Perl but also for
anything that uses REs (like vi, for example).

Build up your RE piece by piece. This is somewhat difficult to do
without a top-level loop (like Lisp's), but you can still do it. Match
your original term by term, using the $1, $2, etc., variables to see
what you get. The prematch and postmatch variables ($`, $', and kin)
are eye-opening as well. Look at perlvar.

My strategy, which isn't the best by any means, is to start from the
front and match token by token, until I can capture everything I want
in the numbered variables, and then the job is as good as done. I find
it much harder to compose a RE in one stroke -- it's better to have
one that does ten percent of what it's supposed to do than to have one
that does 100 percent of nothing.

CC.
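
A minimal sketch of that piece-by-piece approach, using the file name from the first post. The sub names (original_captures, split_at_chapter) are made up for illustration, and the final pattern is only one possible next step, not a finished solution. Dumping the groups suggests why the original substitution did nothing useful: the first greedy (.*) takes everything up to "Chapter", so $3 ends up empty.

```perl
use strict;
use warnings;

my $name = 'King of the Forest Rangers S.O.S. Ranger Chapter 9.AVI';

# Step 1: match only the fixed landmark and look at what surrounds it.
if ($name =~ /Chapter\s+\d+/) {
    printf "pre:   [%s]\nmatch: [%s]\npost:  [%s]\n", $`, $&, $';
}

# Step 2: dump the numbered groups of the pattern from the original post.
sub original_captures {
    my ($filename) = @_;
    return $filename =~ /^(.*)(\s*)(.*)(\s*)(Chapter)(\s*)(\d*).AVI/;
}
my @groups = original_captures($name);
printf "\$%d: [%s]\n", $_, $groups[$_ - 1] for 1 .. @groups;

# Step 3: grow the pattern outward from the landmark. Everything before
# "Chapter" comes back as one string; nothing in the data says where the
# show title ends and the episode title begins.
sub split_at_chapter {
    my ($filename) = @_;
    return $filename =~ /^(.*?)\s+Chapter\s+(\d+)\.AVI$/i;
}
my ($front, $chapter) = split_at_chapter($name);
print "front: [$front]  chapter: [$chapter]\n";
```

Printing each capture in brackets makes stray whitespace and empty groups visible, which is usually where a pattern like this goes wrong.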
 
Keith
      04-19-2011
CC:
But how does one match a title that in one example starts off with "King of the Royal Mounties" and
another that has as its title "King's Row"? Is there a global RE that takes care of such titles, or is each
controlled by the number of words in each file's title?


Keith
 
ccc31807
      04-19-2011
On Apr 19, 11:21 am, Keith <(E-Mail Removed)> wrote:
> But how does one match a title that in one example starts off with "King of the Royal Mounties" and
> another that has as its title "King's Row"? Is there a global RE that takes care of such titles, or is each
> controlled by the number of words in each file's title?


That's not what you have. You have a sequence of tokens separated by
white space, most composed of alphabetical characters, some of numeric
characters, some both, and some with non-alphanumeric characters. It's
possible to have a non-alphanumeric character in a title, like 'King's
Row'.

You can't match what you don't have, and you don't have a book title.
If you want to consider everything before the literal string 'Chapter'
as the title, you can do that, but that's a characteristic you impose
on the data, not something that's inherent in the data.

Of course, in a case like this one, where you are reading in a mass of
character data, you should normalize the data in some way -- something
that Perl excels at. You should also assume that you will get error
lines, and write those out to an error file.

It might be helpful if you could post several dozen lines of your
input data.

CC.
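
A rough sketch of the normalize-and-reject idea above. It assumes only the trailing "Chapter <n>.avi" landmark; the sub name sort_names and the sample list are hypothetical, and the rejects are printed rather than written to an error file to keep the example short.

```perl
use strict;
use warnings;

# Sort file names into parsed records and rejects.
# Everything before "Chapter" is kept as one unsplit string.
sub sort_names {
    my @names = @_;
    my (@parsed, @errors);
    for my $raw (@names) {
        (my $clean = $raw) =~ s/\s+/ /g;    # collapse runs of whitespace
        $clean =~ s/^ | $//g;               # trim both ends
        if ($clean =~ /^(.+?) Chapter (\d+)\.avi$/i) {
            push @parsed, { front => $1, chapter => $2 };
        }
        else {
            push @errors, $raw;             # keep the original text for the error log
        }
    }
    return (\@parsed, \@errors);
}

my ($parsed, $errors) = sort_names(
    "King of Royal Mounted Murderer's Row  Chapter 2.AVI",
    "King's Row Saps at Sea Chapter 11.AVI",
    'README.txt',
);
print "ok:  $_->{front} / $_->{chapter}\n" for @$parsed;
print "bad: $_\n" for @$errors;
```

Keeping the untouched original in the reject list makes it easier to see later why a line failed.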
 
Keith
      04-19-2011
CC:
OK, I understand now. Here are some more examples of input and wanted output.

Title of show -- "King of Royal Mounted" or "King's Row"
Title of episode -- "Murderer's Row" or "Saps at Sea"
Number of episode -- 1 through 13

Input Examples:
King of Royal Mounted Murderer's Row Chapter 2.AVI

King's Row Saps at Sea Chapter 11.AVI


Again, how do I make some generic regexp in Perl in order to change the above to the following output?

Output Examples:
King of Royal Mounted Ch 2 Murderer's Row.avi

King's Row Ch 11 Saps at Sea.avi
 
Ted Zlatanov
      04-19-2011
On Tue, 19 Apr 2011 16:46:37 +0000 (UTC) Keith <(E-Mail Removed)> wrote:

K> OK, I understand now. Here are some more examples of input and wanted output.

K> Title of show -- "King of Royal Mounted" or "King's Row"
K> Title of episode -- "Murderer's Row" or "Saps at Sea"
K> Number of episode -- 1 through 13

K> Input Examples:
K> King of Royal Mounted Murderer's Row Chapter 2.AVI

K> King's Row Saps at Sea Chapter 11.AVI


K> Again, how do I make some generic regexp in Perl in order to change the above to the following output?

K> Output Examples:
K> King of Royal Mounted Ch 2 Murderer's Row.avi

K> King's Row Ch 11 Saps at Sea.avi

Build a list of possible show names. Untested:

my @shows = ('King of Royal Mounted', "King's Row");
foreach my $show (@shows)
{
    # \Q...\E quotes any regex metacharacters in the show name;
    # note the i modifier to match AVI, avi, etc.
    $name =~ s/^\Q$show\E\s+(.*)\s+Chapter\s+(\d+)\.avi$/$show Ch $2 $1.avi/i;
}

Ted
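
A slightly fuller sketch of the show-list idea, folded into a single precompiled pattern. The sub name rename_episode and the variable $pattern are illustrative, and the list has "King of the Forest Rangers" added from the first post; a real list would have to come from whoever knows the shows.

```perl
use strict;
use warnings;

# Hypothetical list of known show titles; extend as needed.
my @shows = ('King of the Forest Rangers', 'King of Royal Mounted', "King's Row");

# Longest titles first, so a short title cannot shadow a longer one
# that begins with the same words.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @shows;
my $pattern = qr/^($alternation)\s+(.+?)\s+Chapter\s+(\d+)\.avi$/i;

# Returns the new file name, or undef when no known show matches.
sub rename_episode {
    my ($name) = @_;
    return $name =~ $pattern ? "$1 Ch $3 $2.avi" : undef;
}

for my $name (
    "King of Royal Mounted Murderer's Row Chapter 2.AVI",
    "King's Row Saps at Sea Chapter 11.AVI",
    'Unknown Show Pilot Chapter 1.avi',
) {
    my $new = rename_episode($name);
    print defined $new ? "$new\n" : "unmatched: $name\n";
}
```

Names that match no listed show come back as undef, so they can be reported instead of being renamed wrongly.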
 
Keith
      04-19-2011
Ted:
What if you don't know what the title is exactly in an .avi file? That is, you know that it's the first word(s)
of the file name but nothing more?


Keith Lee
 
azrazer
      04-20-2011
> Ted:
> What if you don't know what the title is exactly in an .avi file? That is, you know that it's the first word(s)
> of the file name but nothing more?
> Keith Lee

hulo,
you may try to build a big database (an array in this case will be
sufficient) containing all the show names (extracted from the internet)....
Otherwise, you cannot expect a computer to be "intelligent" and know
what is a show and what is not... this part of the program must come
from the programmer

best,
azra
 
Keith
      04-20-2011
Azra:
Yes, I have learned about the limitations of Perl RegExp or RegExp in general. Thank you.

Keith
 
ccc31807
      04-20-2011
On Apr 20, 8:44 am, Keith <(E-Mail Removed)> wrote:
> Azra:
> Yes, I have learned about the limitations of Perl RegExp or RegExp in general. Thank you.
>
> Keith


This isn't a limitation of regular expressions. A regular expression
is a pattern, and the regular expression syntax is a pattern-matching
language, like Prolog, for example. In order to use it, you must have
patterns.

To illustrate, HTML is also a 'pattern matching language' in a sense.
Look at examples (a) and (b):
(a)
<html>
<head>
<title>My Home Page</title>
</head>
<body>
<h1>Keith's Home Page</h1>
<p>How do you like it?</p>
</body>
</html>
(b)
My Home Page Keith's Home Page How do you like it?

Now, suppose you had (b). Would you say that it's valid HTML? Of
course not! The same is true for your data: YOU DON'T HAVE PATTERNS TO
MATCH.

At the risk of being a little insulting (you may have earned it)
you've been slow on the uptake. GIGO. Your input is garbage, so your
output will be garbage.

I'm not saying that your data is invalid necessarily -- I can't
determine that and make no judgment on that point. What I'm saying is
that you do not have valid input to feed to a regular expression to
generate the kind of output that you want.

If you want my advice, I would input the data with some kind of
delimited file format, and then use split() or related to break it
apart.

my $input = q(King of Royal Mounted:Murderer's Row:Chapter 2.AVI);
my ($show, $episode, $avi) = split(/:/, $input);
# Strip the extension; the dot is escaped so it matches only a literal '.'.
my ($chapter) = $avi =~ /^(.+)\.AVI$/i;
my $vid = "$episode.AVI";
print qq(show: $show\nepisode: $episode\navi: $avi\nchapter: $chapter\nvid: $vid\n);

outputs this:
show: King of Royal Mounted
episode: Murderer's Row
avi: Chapter 2.AVI
chapter: Chapter 2
vid: Murderer's Row.AVI

CC.
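
Carrying the split() idea through to the file name asked for at the start of the thread might look like the sketch below. It assumes each record is "show:episode:Chapter N.AVI" and that ':' never appears inside a title; both are assumptions about the input, and the sub name name_from_record is hypothetical.

```perl
use strict;
use warnings;

# Build "<show> Ch <n> <episode>.avi" from one delimited record.
# Returns undef (in scalar context) when the record is malformed.
sub name_from_record {
    my ($record) = @_;
    my ($show, $episode, $file) = split /:/, $record;
    return unless defined $file;                       # too few fields
    my ($number) = $file =~ /^Chapter\s+(\d+)\.avi$/i
        or return;                                     # malformed last field
    return "$show Ch $number $episode.avi";
}

for my $record (
    q(King of Royal Mounted:Murderer's Row:Chapter 2.AVI),
    q(King's Row:Saps at Sea:Chapter 11.AVI),
) {
    my $new = name_from_record($record);
    print defined $new ? "$new\n" : "bad record: $record\n";
}
```

With an explicit delimiter the boundary between show and episode is part of the data, so no list of show names is needed.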
 