Velocity Reviews - Computer Hardware Reviews


A regex to search for numeric ranges...

 
 
Mr P
 
      04-19-2011
I read up on this on the www and I found ideas like

if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...

which is pretty indecipherable at a glance and, in general, not
elegant in any sense.

I generally do something like

if ( /(\d+)/ && $1 > 256 && $1 < 1024 )


Which to me is a lot more readable at a glance but, like the example
above, not overly elegant.

But what I'd REALLY like to do is, similar to the trick for numeric
sort, a way to do it in the regex like

/[256-1024]/ # but force it to be numeric, not literal, perhaps with a switch

Thoughts, Masters?
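
For reference, the capture-and-compare idea from the post can be wrapped up so it checks every number in a string rather than only the first one. This is only a sketch: the sub name numbers_between and the sample text are made up for illustration, and the bounds are exclusive, as in the post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return every whole number in
# $text that lies strictly between $low and $high.
sub numbers_between {
    my ($text, $low, $high) = @_;
    # In list context, a //g match returns all captured digit runs.
    return grep { $_ > $low && $_ < $high } $text =~ /\b(\d+)\b/g;
}

my @hits = numbers_between('ports 80, 443 and 8080; 512 spare', 256, 1024);
print "@hits\n";    # prints "443 512"
```

The regex only finds the digit runs; the numeric comparison stays in plain Perl, which keeps the range easy to read and change.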
 
Mr P
 
      04-19-2011
On Apr 19, 3:57 pm, Eli the Bearded <*(E-Mail Removed)> wrote:
> In comp.lang.perl.misc, Mr P <(E-Mail Removed)> wrote:
>
> > I read up on this on the www and I found ideas like
> >
> > if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...
> >
> > which is pretty uncipherable at a glance and just in general not
> > elegant in any sense.
>
> True. That's why it's much better to not use regexps for numerical
> ranges.
>
> > I generally do something like
> >
> > if ( /(\d+)/ && $1 > 256 && $1 < 1024 )
>
> I'd write that as
>
>    if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
>
> because I like to make sure things operate in the order I want them
> to.
>
> > Which to me is a lot more readable at a glance, but like the example
> > above not overly elegant..
> >
> > But what I'd REALLY like to do is, similar to the trick for numeric
> > sort, a way to do it in the regex like
> >
> > /[256-1024]/ # but force it to be numeric, not literal perhaps with a
> > switch
>
> sub mknumre($$) {
>   my $low = shift;
>   my $hi  = shift;
>
>   my $set = join('|', ($low .. $hi));
>
>   return qr/($set)/;
>
> }
>
> > Thoughts, Masters?
>
> Why does this have to be a regular expression? Use the right tool
> for the job.


I guess my answer to that question is that my 1-line regex is a lot
easier to read and much shorter than your 9-line monster!
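
One thing worth noting about the quoted mknumre(): the alternation it builds is not anchored, so it would also match "512" inside "5120". A possible variant is sketched below. The name mk_range_re is made up here, and the word boundaries are an addition that is not in the quoted code.

```perl
use strict;
use warnings;

# Hypothetical variant of the quoted mknumre(): same alternation idea,
# but wrapped in \b...\b so it only matches whole numbers.
sub mk_range_re {
    my ($low, $high) = @_;
    my $set = join '|', $low .. $high;
    return qr/\b(?:$set)\b/;
}

my $re = mk_range_re(257, 1023);
for my $text ('512 widgets', '5120 widgets', '1024 widgets') {
    printf "%-14s %s\n", $text, $text =~ $re ? 'match' : 'no match';
}
```

Only '512 widgets' matches. The pattern grows with the size of the range, so this is probably only reasonable for small ranges.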
 
Mr P
 
      04-19-2011

> > I generally do something like
> >
> > if ( /(\d+)/ && $1 > 256 && $1 < 1024 )
>
> I'd write that as
>
>    if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
>
> because I like to make sure things operate in the order I want them
> to.


There is no ambiguity in the order of my example; study operator
precedence. Mine is just less syntax-intensive.
 
 
sln@netherlands.com
 
      04-20-2011
On Tue, 19 Apr 2011 12:35:56 -0700 (PDT), Mr P <(E-Mail Removed)> wrote:

>I read up on this on the www and I found ideas like
>
>if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...
>
>which is pretty uncipherable at a glance and just in general not
>elegant in any sense.
>
>I generally do something like
>
> if ( /(\d+)/ && $1 > 256 && $1 < 1024 )
>
>
>Which to me is a lot more readable at a glance, but like the example
>above not overly elegant..
>
>But what I'd REALLY like to do is, similar to the trick for numeric
>sort, a way to do it in the regex like
>
>/[256-1024]/ # but force it to be numeric, not literal perhaps with a
>switch
>
>Thoughts, Masters?



/[256-1024]/ is generally possible.
It has limitations that affect the surrounding expressions, but it
could be worked around and functionally generalized (again within
specific limitations).

-sln

-----------------------

use strict;
use warnings;

my $str = '0001023 widgets';

# Inline code is going to be a thing of the future and is definitely
# going to happen (see Perl 6 regexes).
# It allows parameter checking and is useful when the source
# has extended data to be analyzed by a single regex.

if ($str =~ / \b (\d+) \b
              (?(?{$^N > 256 && $^N < 1024})  # is this number between 256-1024?
                                              # yes, continue processing
                |
                  (*FAIL)                     # no, fail outright
              )
              # more expressions here ..
              \s*
              (.+)
            /x )
{
    print "Number: '$1', Type: '$2'\n";
}
else {
    print "failed\n";
}

print "\n";

# This does a source conversion of \d+ to a single utf8 character.
# It then allows checking it in a HEX numeric range character class.
# Even though the source is decimal, '1023', when it is magically assumed
# to be hex and converted to a utf8 char like "\x{1023}", its code point
# will be correctly matched within a regex character class range.
# Example: "\x{1023}" =~ /[\x{257}-\x{1023}]/ will match.
# And only "\x{N}" where N is between 257-1023 will match.

for (0 .. 4096)
{
    # Construct a fake string using the current counter.
    # In reality, you have to parse the source string and do the conversion
    # so that you end up doing something like this:
    #   $src =~ /^(.*?)\b(\d+)\b(.*?)$/
    #   eval "\$temp_src = \"$1\\x{$2}$3\" ";
    # Then use the $temp_src in place of the $str below.

    my $padded_string = "000$_";    # the extra '000' padding is just a test
    eval "\$str = \"\\x{$padded_string} widgets\" ";

    if ( $str =~ /^ ([\x{257}-\x{1023}])
                    \s*
                    (.+)
                  /x )
    {
        print "Number: '$padded_string', Type: '$2'\n";
    }
}
__END__

Output
------------

Number: '0001023', Type: 'widgets'

Number: '000257', Type: 'widgets'
Number: '000258', Type: 'widgets'
Number: '000259', Type: 'widgets'
Number: '000260', Type: 'widgets'
Number: '000261', Type: 'widgets'
Number: '000262', Type: 'widgets'
Number: '000263', Type: 'widgets'
Number: '000264', Type: 'widgets'
Number: '000265', Type: 'widgets'
Number: '000266', Type: 'widgets'
Number: '000267', Type: 'widgets'
...
...
Number: '0001012', Type: 'widgets'
Number: '0001013', Type: 'widgets'
Number: '0001014', Type: 'widgets'
Number: '0001015', Type: 'widgets'
Number: '0001016', Type: 'widgets'
Number: '0001017', Type: 'widgets'
Number: '0001018', Type: 'widgets'
Number: '0001019', Type: 'widgets'
Number: '0001020', Type: 'widgets'
Number: '0001021', Type: 'widgets'
Number: '0001022', Type: 'widgets'
Number: '0001023', Type: 'widgets'

 
 
Uri Guttman
 
      04-21-2011
>>>>> "s" == sln <(E-Mail Removed)> writes:

s> /[256-1024]/ is generally possible.

s> It has limitations that affect the surrounding expressions, but it
s> could be worked around and functionally generalized (again within
s> specific limitations).

limitations? it is just wrong. that is a char class of all those digits
(and i am not even sure what [6-1] will generate).
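
For anyone who wants to check what a literal /[256-1024]/ does, a small sketch follows. The pattern is compiled at run time inside an eval so that a compile failure can be caught, and the variable names are just for illustration. On the perls I know of, the 6-1 part is rejected as an invalid range instead of silently matching something.

```perl
use strict;
use warnings;

# Compile the pattern at run time so the failure can be caught.
my $pattern = '[256-1024]';
my $re = eval { qr/$pattern/ };
print defined($re) ? "compiled\n" : "compile failed: $@";

# A valid class such as [1-6] still matches one character at a time,
# never a multi-digit value.
if ('999 and 42' =~ /([1-6])/) {
    print "matched '$1'\n";    # prints "matched '4'"
}
```

Either way, a bracketed class describes a set of single characters, so it cannot express a numeric interval by itself.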

uri

--
Uri Guttman ------ (E-Mail Removed) -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
 
 
Ilya Zakharevich
 
      04-24-2011
On 2011-04-21, Eli the Bearded <*@eli.users.panix.com> wrote:
> I'm sure. The second one, mapping integer sequences to characters to
> then use a Unicode character class has all the workings of a brilliant
> bit of obfuscation. I suspect it doesn't scale well, say 2^16 or
> 2^32, but I don't really know how Perl handles Unicode internally.


When I worked on this (a long time ago), there were no compilers with
128-bit IVs sitting around (are there now?). Hence the support I
implemented was intended to work "up to the maximal number
representable by a UV", but it is actually coded with the limitation "not
higher than 64 bits". I doubt anybody has expanded it further than
this (the "hooks" for expansion are there, just probably not implemented)...

Hope this helps,
Ilya
 
 
Keith Thompson
 
      04-27-2011
Eli the Bearded <*@eli.users.panix.com> writes:
> In comp.lang.perl.misc, Mr P <(E-Mail Removed)> wrote:
>> I read up on this on the www and I found ideas like
>>
>> if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...
>>
>> which is pretty uncipherable at a glance and just in general not
>> elegant in any sense.

>
> True. That's why it's much better to not use regexps for numerical
> ranges.
>
>> I generally do something like
>>
>> if ( /(\d+)/ && $1 > 256 && $1 < 1024 )

>
> I'd write that as
>
> if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
>
> because I like to make sure things operate in the order I want them
> to.


Really?

First off, I hope you're aware that both forms are exactly
equivalent, since "<" binds more tightly than "&&", and "&&"
imposes a left-to-right evaluation with or without the parentheses.

An argument for using the extra parentheses would be that they make
it clearer. They don't for me personally; in this particular case,
the precedence is carved deeply enough into my brain that it's clear
enough without the parentheses. But YMMV. Obviously, different
people have different levels of comfort with the precedence levels
of the various operators.

But I'd write it as:

if (/(\d+)/ and $1 > 256 and $1 < 1024)

I usually prefer "and" and "or" over "&&" and "||". On the other
hand, I have been bitten a few times by the *low* precedence of
"and" and "or"; I've mistakenly written things like

return $this and $that;

which never evaluates $that.
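
A minimal sketch of that pitfall, with made-up sub names. The 'syntax' warning category is switched off inside the first sub only because newer perls warn about exactly this construct.

```perl
use strict;
use warnings;

sub with_and {
    my ($this, $that) = @_;
    no warnings 'syntax';     # newer perls warn about exactly this mistake
    return $this and $that;   # parsed as: (return $this) and $that
}

sub with_ampersands {
    my ($this, $that) = @_;
    return $this && $that;    # parsed as: return ($this && $that)
}

print 'and: ', with_and(1, 0), "\n";           # prints "and: 1"
print '&&:  ', with_ampersands(1, 0), "\n";    # prints "&&:  0"
```

Writing return ($this and $that) with explicit parentheses would behave like the "&&" version.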

(And none of these are equivalent to the original regexp, which
checks for values from 0 to 255.)

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
Uri Guttman
 
      04-27-2011
>>>>> "KT" == Keith Thompson <(E-Mail Removed)> writes:

KT> Eli the Bearded <*@eli.users.panix.com> writes:
>> I'd write that as
>>
>> if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
>>
>> because I like to make sure things operate in the order I want them
>> to.


KT> First off, I hope you're aware that both forms are exactly
KT> equivalent, since "<" binds more tightly than "&&", and "&&"
KT> imposes a left-to-right evaluation with or without the parentheses.

KT> An argument for using the extra parentheses would be that they make
KT> it clearer. They don't for me personally; in this particular case,
KT> the precedence is carved deeply enough into my brain that it's clear
KT> enough without the parentheses. But YMMV. Obviously, different
KT> people have different levels of comfort with the precedence levels
KT> of the various operators.

i agree with the dropping of unneeded parens. one place i do use extra
parens is with ?:. i find parens around the conditional part helps given
the usually longer total expression. it highlights that as the
conditional part. not critical but a little style thing i do. and it is
especially helpful when doing nested ?: ops.

uri

--
Uri Guttman ------ (E-Mail Removed) -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
 
 
Jim Gibson
 
      04-28-2011
In article <(E-Mail Removed)>, Keith Thompson
<(E-Mail Removed)> wrote:

> Eli the Bearded <*@eli.users.panix.com> writes:
> > In comp.lang.perl.misc, Mr P <(E-Mail Removed)> wrote:
> >> I read up on this on the www and I found ideas like
> >>
> >> if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...
> >>
> >> which is pretty uncipherable at a glance and just in general not
> >> elegant in any sense.

> >
> > True. That's why it's much better to not use regexps for numerical
> > ranges.
> >
> >> I generally do something like
> >>
> >> if ( /(\d+)/ && $1 > 256 && $1 < 1024 )

> >
> > I'd write that as
> >
> > if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
> >
> > because I like to make sure things operate in the order I want them
> > to.

>
> Really?
>
> First off, I hope you're aware that both forms are exactly
> equivalent, since "<" binds more tightly than "&&", and "&&"
> imposes a left-to-right evaluation with or without the parentheses.
>
> An argument for using the extra parentheses would be that they make
> it clearer. They don't for me personally; in this particular case,
> the precedence is carved deeply enough into my brain that it's clear
> enough without the parentheses. But YMMV. Obviously, different
> people have different levels of comfort with the precedence levels
> of the various operators.


Another argument for using the extra, redundant parentheses is that it
will work without regard to precedence. I always use the parentheses.
That way I don't have to remember what the operator precedence is and
can worry about other things.

To quote Sherlock Holmes:

"You see," he explained, "I consider that a man's brain originally is
like a little empty attic, and you have to stock it with such furniture
as you choose. A fool takes in all the lumber of every sort that he
comes across, so that the knowledge which might be useful to him gets
crowded out, or at best is jumbled up with a lot of other things so
that he has a difficulty in laying his hands upon it. Now the skilful
workman is very careful indeed as to what he takes into his
brain-attic. He will have nothing but the tools which may help him in
doing his work, but of these he has a large assortment, and all in the
most perfect order. It is a mistake to think that that little room has
elastic walls and can distend to any extent. Depend upon it there comes
a time when for every addition of knowledge you forget something that
you knew before. It is of the highest importance, therefore, not to
have useless facts elbowing out the useful ones."

-- /A Study in Scarlet/, A. C. Doyle.

--
Jim Gibson
 
 
Justin C
 
      04-28-2011
On 2011-04-28, Jim Gibson <(E-Mail Removed)> wrote:
>
> To quote Sherlock Holmes:
>
> "You see," he explained, "I consider that a man's brain originally is
> like a little empty attic, and you have to stock it with such furniture
> as you choose. A fool takes in all the lumber of every sort that he
> comes across, so that the knowledge which might be useful to him gets
> crowded out, or at best is jumbled up with a lot of other things so
> that he has a difficulty in laying his hands upon it. Now the skilful
> workman is very careful indeed as to what he takes into his
> brain-attic. He will have nothing but the tools which may help him in
> doing his work, but of these he has a large assortment, and all in the
> most perfect order. It is a mistake to think that that little room has
> elastic walls and can distend to any extent. Depend upon it there comes
> a time when for every addition of knowledge you forget something that
> you knew before. It is of the highest importance, therefore, not to
> have useless facts elbowing out the useful ones."


Now we know where Matt Groening got Homer's quote "...every time I learn
something new it pushes some old stuff out of my brain".

I should read more... but then I'd probably forget stuff I want to
remember.

Justin.

--
Justin C, by the sea.
 