Elegant equivalent to this regex?

 
 
sherifffruitfly
Guest
Posts: n/a
 
      01-04-2007
Hi all,

I'm new to regex, and hacked this one together. It seems awfully
redundant to me, but it does have the virtue at least of wearing its
meaning on its sleeve.

The task:

(1) match all quoted-comma'd numbers consisting of either 2 or 3
"sections". That is, critters of either form:

" "ddd,ddd,ddd" "
or
" "ddd,ddd" "

(2) Capture all of the digits, leaving the quotes and commas for the
garbage man. For example:

" "123, 456, 789" "
should in some fashion capture
"123456789"

Here's the regex I came up with:

(?<whole>\"(?<one>\d{1,3}),(?<two>\d{1,3}),(?<three>\d{1,3})\"|\"(?<one>\d{1,3}),(?<two>\d{1,3})\")

This works fine for me, and getting the desired complete "clean" number
from it is a
triviality.

But I get the feeling that this is the regex-equivalent of baby-talk.
I'd like to know if there's a simpler, more elegant regex matching the
same class of strings, and capturing essentially the same substrings.


Thanks for any insights,

cdj
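
One way to remove the duplication, sketched here rather than offered as the definitive answer, is to make the third section optional instead of spelling out both alternatives. This assumes Perl 5.10 or later, where named captures are read through %+; the sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;
use 5.010;    # named captures, %+ and // need Perl 5.10 or later

# One pattern for both forms: the third section is an optional group
# instead of a second alternative.
my $re = qr/"(?<one>\d{1,3}),(?<two>\d{1,3})(?:,(?<three>\d{1,3}))?"/;

my @numbers;
for my $field ('"1,573,958"', '"135,191"') {
    if ($field =~ $re) {
        # <three> is undefined when only two sections matched
        push @numbers, join '', $+{one}, $+{two}, $+{three} // '';
    }
}
print "$_\n" for @numbers;
```

The group names match the ones in the post, so code that already reads one, two and three should not need to change.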

 
usenet@DavidFilmer.com
Guest
Posts: n/a
 
      01-04-2007
sherifffruitfly wrote:
> (2) Capture all of the digits, leaving the quotes and commas for the
> garbage man.


If you just want to strip out the non-numerics why futz with regexps?
Why not just use s/// to get rid of the non-numerics?

my $original = " 123, 456, 789 ' ";
(my $numbers = $original) =~ s/\D//g ;
print $numbers;


--
The best way to get a good answer is to ask a good question.
David Filmer (http://DavidFilmer.com)

 
sherifffruitfly
Guest
Posts: n/a
 
      01-04-2007

(E-Mail Removed) wrote:
> sherifffruitfly wrote:
> > (2) Capture all of the digits, leaving the quotes and commas for the
> > garbage man.

>
> If you just want to strip out the non-numerics why futz with regexps?
> Why not just use s/// to get rid of the non-numerics?
>
> my $original = " 123, 456, 789 ' ";
> (my $numbers = $original) =~ s/\D//g ;
> print $numbers;


Because I want to use regex, please.

If you don't wish to help me with my question, but prefer to answer
only your own instead, that's perfectly fine, of course.

 
J. Gleixner
Guest
Posts: n/a
 
      01-04-2007
sherifffruitfly wrote:
> (E-Mail Removed) wrote:
>> sherifffruitfly wrote:
>>> (2) Capture all of the digits, leaving the quotes and commas for the
>>> garbage man.

>> If you just want to strip out the non-numerics why futz with regexps?
>> Why not just use s/// to get rid of the non-numerics?
>>
>> my $original = " 123, 456, 789 ' ";
>> (my $numbers = $original) =~ s/\D//g ;
>> print $numbers;

>
> Because I want to use regex, please.
>
> If you don't wish to help me with my question, but prefer to answer
> only your own instead, that's perfectly fine, of course.


Provide examples of your data, expected results, and what you've
tried. If you post a short example with all of those, it'll
be much easier to help.

The requirements and regular expression you posted don't coincide.
 
sherifffruitfly
Guest
Posts: n/a
 
      01-04-2007
J. Gleixner wrote:

> Provide examples of your data, expected results, and what you've
> tried. If you post a short example with all of those, it'll
> be much easier to help.


Just in case I typo'd my regex, here it is again, copy/pasted straight
from Expresso (regex testing/analysis tool):

\"(?<one>\d{1,3}),(?<two>\d{1,3}),(?<three>\d{1,3})\"|\"(?<one>\d{1,3}),(?<two>\d{1,3})\"

This is intended for use against csv files - there may be stuff in the
regex that exploits this assumption - I forget.

Sample text:

Oct
2005,6.02,211.9,"1,573,958",31.9,"135,191",722.8676,67.3,19.1,18.1,19.2,18.4,18.4,,
Nov
2005,6.02,212.8,"1,573,958",32.2,"135,191",722.8676,67.3,19.2,18.2,19.2,15.7,15.7,,
Dec
2005,6.02,213.6,"1,570,805",32.5,"136,005",723.228,66.2,19.2,18.2,19.3,13.3,13.3,,
Jan
2006,6.02,215.6,"1,573,483",32.9,"137,032",723.36,67.1,18.9,17.9,19.2,9.9,9.9,,
Feb
2006,6.02,216.7,"1,577,319",33.2,"137,413",723.4165,67.0,18.8,17.9,19.2,10.3,10.3,,
Mar
2006,6.02,217.2,"1,579,222",33.5,"137,519",723.5606,66.9,18.6,17.6,19.2,12.9,12.9,,
Apr
2006,6.02,218.6,"1,579,587",33.8,"138,393",723.63,66.8,18.5,17.5,19.2,10.8,10.8,,
May
2006,6.02,218.9,"1,578,357",34.2,"138,669",723.687,66.8,18.4,17.3,19.3,12.0,12.0,,
Jun
2006,6.02,218.5,"1,572,273",34.5,"138,963",725.0399,66.7,18.4,17.2,19.3,11.8,11.8,,
Jul
2006,6.03,218.0,"1,563,849",35.1,"139,379",725.1364,66.6,18.3,17.2,19.3,10.1,10.1,,
Aug
2006,6.03,217.3,"1,557,949",35.4,"139,467",725.205,66.5,18.4,17.1,19.4,11.2,11.2,,
Sep
2006,6.04,216.7,"1,549,354",35.8,"139,867",725.3182,66.3,18.4,17.2,19.4,9.5,9.5,,
Oct
2006,6.04,215.6,"1,541,800",35.8,"139,826",725.3855,66.2,18.5,17.2,19.4,10.6,10.6,,

Representative sample of matches/captures:

match        <whole>      <one>  <two>  <three>
"1,573,958"  "1,573,958"  1      573    958
"135,191"    "135,191"    135    191    (empty)
etc.

In my *ideal* regex, there would be just 1 capture from a given match:
the concatenation of the numerically named capture group values
(1573958 or 135191, in the above). Achieving the effect of this ideal
result is trivial from what my regex *does* provide, however.


> The requirements and regular expression you posted don't coincide.


I included "qualifier-words" (e.g., "essentially") in my OP that were
intended to make your statement false. I may have failed. Also I didn't
explicitly state an obvious limitation of my regex: that it only
"works" for quoted-comma'd numbers consisting of either 2 or 3 blocks
(e.g., it fails for numbers in the billions); that suffices for my
needs.

A regex that satisfies my ideal situation would be great. But if that's
not possible, simply a more elegant more-or-less-equivalent of my own
would be greatly appreciated. There are many possible ways to
de-pretty-print - I just want an elegant one.

Does that help?

Thanks for responding,

cdj
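
A regex cannot by itself hand back the concatenation of several groups, but a single capture combined with a substitution gets close to the "ideal" described above. What follows is only a sketch: it assumes Perl 5.14 or later (for the /r flag on tr///), it assumes every section after the first has exactly three digits, as in the sample data, and the shortened sample row and variable names are made up for illustration.

```perl
use strict;
use warnings;
use 5.014;    # tr///r (non-destructive transliteration) needs Perl 5.14

my $line = '2005,6.02,211.9,"1,573,958",31.9,"135,191",722.8676';

# Capture the whole grouped number once, then delete its commas.
# /e evaluates the replacement as code; /g handles every quoted field.
(my $clean = $line) =~ s/"(\d{1,3}(?:,\d{3}){1,2})"/ $1 =~ tr{,}{}dr /ge;

print "$clean\n";
```

The {1,2} quantifier keeps the same limit as the original pattern (two or three sections); loosening it to + would also cover numbers in the billions.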

 
sherifffruitfly
Guest
Posts: n/a
 
      01-04-2007
Mirco Wahab wrote:

> You didn't specify how *exact* is your matching requirement,
> eg. if you have data like this:


Yah - the reason I left it vague is because what I *want* is best
described in English as
"de-pretty-printing-numerical-csv-file-entries-that-shouldn't-have-been-pretty-printed-in-the-first-place".

As I expect there to be many ways to skin that particular cat, I didn't
want to unnecessarily lock-in one particular approach or whatever. Does
that make sense?

Thanks,

cdj

 
Mumia W. (on aioe)
Guest
Posts: n/a
 
      01-04-2007
On 01/04/2007 03:47 PM, sherifffruitfly wrote:
> [...]
> (2) Capture all of the digits, leaving the quotes and commas for the
> garbage man. For example:
>
> " "123, 456, 789" "
> should in some fashion capture
> "123456789"
> [...]


my $string = "123, 456, 789";
my $num = join('',$string =~ /\d+/g);
print "num = $num\n";


--
(E-Mail Removed)
Windows Vista and your freedom in conflict:
http://www.badvista.org/
 
J. Gleixner
Guest
Posts: n/a
 
      01-05-2007
sherifffruitfly wrote:
> J. Gleixner wrote:
>
>> Provide examples of your data, expected results, and what you've
>> tried. If you post a short example with all of those, it'll
>> be much easier to help.

>
> Just in case I typo'd my regex, here it is again, copy/pasted straight
> from Expresso (regex testing/analysis tool):
>
> \"(?<one>\d{1,3}),(?<two>\d{1,3}),(?<three>\d{1,3})\"|\"(?<one>\d{1,3}),(?<two>\d{1,3})\"


OK.. so how are you using it?? Show some actual code.

>
> This is intended for use against csv files - there may be stuff in the
> regex that exploits this assumption - I forget.
>
> Sample text:
>
> Oct
> 2005,6.02,211.9,"1,573,958",31.9,"135,191",722.8676,67.3,19.1,18.1,19.2,18.4,18.4,,


> Representative sample of matches/captures:
>
> match        <whole>      <one>  <two>  <three>
> "1,573,958"  "1,573,958"  1      573    958
> "135,191"    "135,191"    135    191    (empty)
> etc.
>
> In my *ideal* regex, there would be just 1 capture from a given match:
> the concatenation of the numerically named capture group values
> (1573958 or 135191, in the above). Achieving the effect of this ideal
> result is trivial from what my regex *does* provide, however.
>
>
>> The requirements and regular expression you posted don't coincide.

>
> I included "qualifier-words" (e.g., "essentially") in my OP that were
> intended to make your statement false. I may have failed. Also I didn't
> explicitly state an obvious limitation of my regex: that it only
> "works" for quoted-comma'd numbers consisting of either 2 or 3 blocks
> (e.g., it fails for numbers in the billions); that suffices for my
> needs
>
> A regex that satisfies my ideal situation would be great. But if that's
> not possible, simply a more elegant more-or-less-equivalent of my own
> would be greatly appreciated. There are many possible ways to
> de-pretty-print - I just want an elegant one.


Maybe you're simply after what's captured by ()??

$_ = q{"123,456,789"};
if ( /^"(\d{1,3}),(\d{1,3}),?(\d{1,3})?"$/ )
{
print "$1$2$3\n";
}

>
> Does that help?


No.. where's the short script??

To be safe, use one of the CSV modules available from CPAN. (e.g.
Text::CSV::Simple)
Parse data.
Iterate through data, looking at each item/cell.
If the item contains a ',', remove it and if the value is > 999, then
do whatever you want with it.


if ( /,/ )
{
my $entry = $_;
$entry =~ tr/,//d;
print $entry, "\n" if $entry > 999;
}

or if the item matches above expression, then do whatever you want
with $1, $2, and $3.

if ( /^"(\d{1,3}),(\d{1,3}),?(\d{1,3})?"$/ )
{
print "$1$2$3\n";
}
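
A caveat on the pattern above: it leaves $3 undefined for two-section numbers (a warning under "use warnings"), and because the comma and the third group are optional independently of each other, it also accepts strings such as "1,23456". A possible tightening, offered as a sketch rather than as what the poster intended, ties the comma to the third group. The helper name join_sections is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the digits of a quoted two- or
# three-section number, or undef if the cell has another shape.
sub join_sections {
    my ($cell) = @_;
    # The comma now belongs to the optional third group, so a third
    # section can only appear after its own comma.
    return unless $cell =~ /^"(\d{1,3}),(\d{1,3})(?:,(\d{1,3}))?"$/;
    return join '', grep { defined $_ } $1, $2, $3;
}

for my $cell ('"123,456,789"', '"135,191"', '"1,23456"') {
    my $digits = join_sections($cell);
    print defined $digits ? "$cell -> $digits\n" : "$cell -> no match\n";
}
```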

Or, if you're really sure of your data...

my $str =<<EOT;
Oct
2005,6.02,211.9,"1,573,958",31.9,"135,191",722.8676,67.3,19.1,18.1,19.2,18.4,18.4,,
Nov
2005,6.02,212.8,"1,573,958",32.2,"135,191",722.8676,67.3,19.2,18.2,19.2,15.7,15.7,,
Dec
2005,6.02,213.6,"1,570,805",32.5,"136,005",723.228,66.2,19.2,18.2,19.3,13.3,13.3,,
EOT
$str =~ s/,"(\d{1,3}),(\d{1,3}),?(\d{1,3})?",/,$1$2$3,/g;
print $str;

Oct
2005,6.02,211.9,1573958,31.9,135191,722.8676,67.3,19.1,18.1,19.2,18.4,18.4,,
Nov
2005,6.02,212.8,1573958,32.2,135191,722.8676,67.3,19.2,18.2,19.2,15.7,15.7,,
Dec
2005,6.02,213.6,1570805,32.5,136005,723.228,66.2,19.2,18.2,19.3,13.3,13.3,,
 
DJ Stunks
Guest
Posts: n/a
 
      01-05-2007

sherifffruitfly wrote:
> (E-Mail Removed) wrote:
> > sherifffruitfly wrote:
> > > (2) Capture all of the digits, leaving the quotes and commas for the
> > > garbage man.

> >
> > If you just want to strip out the non-numerics why futz with regexps?
> > Why not just use s/// to get rid of the non-numerics?
> >
> > my $original = " 123, 456, 789 ' ";
> > (my $numbers = $original) =~ s/\D//g ;
> > print $numbers;

>
> Because I want to use regex, please.


that DOES use a regular expression, but why would you insist on a
particular method anyway? is this a homework assignment? do you want
a solution or don't you?

> If you don't wish to help me with my question, but prefer to answer
> only your own instead, that's perfectly fine, of course.


no need to get snotty.

-jp

 
John Bokma
Guest
Posts: n/a
 
      01-05-2007
"sherifffruitfly" <(E-Mail Removed)> wrote:

> Mirco Wahab wrote:
>
>> You didn't specify how *exact* is your matching requirement,
>> eg. if you have data like this:

>
> Yah - the reason I left it vague is because what I *want* is best
> described in English as
> "de-pretty-printing-numerical-csv-file-entries-that-shouldn't-have-been
> -pretty-printed-in-the-first-place".


Use a module that reads and parses a CSV file
for each column in a row that contains a pretty-printed number turn it
into a non-pretty printed number.

In fact there was no need to mention a CSV file at all, you should be
smart enough to find the right module for that one. Your problem could be
reduced to:

I have a number that can be written as:

example(s)

How can I turn this into a normal number.

s/,// seems to be the right answer.
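
Note that s/,// without the /g flag removes only the first comma, so a three-section number such as 1,573,958 would keep its second one. A minimal sketch of the per-cell step, assuming the cells have already been split by a CSV module and that sections after the first always have three digits (the @cells list and the strip_grouping name are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: remove grouping commas from a cell that looks
# like a pretty-printed integer; leave every other cell untouched.
sub strip_grouping {
    my ($cell) = @_;
    return $cell unless $cell =~ /^\d{1,3}(?:,\d{3})+$/;
    (my $plain = $cell) =~ tr/,//d;    # deletes every comma, like s/,//g
    return $plain;
}

# Cells as a CSV parser would hand them over, quotes already removed.
my @cells = ('6.02', '1,573,958', '31.9', '135,191', '');
my @clean = map { strip_grouping($_) } @cells;
print join('|', @clean), "\n";
```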

--
John Experienced Perl programmer: http://castleamber.com/

Perl help, tutorials, and examples: http://johnbokma.com/perl/
 