regexp for matching a string with mandatory underscores

 
 
David Filmer
      12-27-2011
I want to be able to match the string foo1_bar2_baz3 as having
multiple underscore characters (with no intervening whitespace), but
not match foo1_bar2 which has only one underscore. I want to ignore
one match, but not two or more.

This would be easy if \w did not ALSO match underscores. But it
does. There does not seem to be a character class for alphanumeric
ONLY.

How can I match continuous alphanumeric strings which contain more
than one underscore?

Thanks!
 
Ilya Zakharevich
      12-27-2011
On 2011-12-27, David Filmer <(E-Mail Removed)> wrote:
> This would be easy if \w did not ALSO match underscores. But it
> does. There does not seem to be a character class for alphanumeric
> ONLY.


??? [^\W_]
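
A minimal sketch of how that class could be applied to the original question, assuming the whole string is tested and that underscores must sit between alphanumeric runs. The anchors and the helper name has_multiple_underscores are illustrative assumptions, not something stated in the thread:

```perl
use strict;
use warnings;

# [^\W_] means "a word character that is not an underscore",
# i.e. an alphanumeric-only class.
# Pattern: one alphanumeric run, then at least two "_ + alphanumeric run" groups.
sub has_multiple_underscores {
    my ($str) = @_;
    return $str =~ /\A[^\W_]+(?:_[^\W_]+){2,}\z/ ? 1 : 0;
}

for my $s (qw(foo1_bar2_baz3 foo1_bar2 plain)) {
    printf "%-16s %s\n", $s, has_multiple_underscores($s) ? 'match' : 'no match';
}
```

Unlike the looser patterns, this form rejects leading, trailing, or doubled underscores, which may or may not be what is wanted.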

Ilya
 
Tim McDaniel
      12-27-2011
In article
<(E-Mail Removed)>,
David Filmer <(E-Mail Removed)> wrote:
>How can I match continuous alphanumeric strings which contain more
>than one underscore?


Is it OK to use more than one regexp? If so, I might try
/^\w+$/ && /_.*_/
It's a bit brute-force, but it's also very clear. The second could be
optimized to /_[^_]*_/, but unless you're evaluating it lots of times,
"micro-optimizations leads to micro-results".
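
A rough sketch of the two-regexp test in use, assuming each candidate is checked as a whole string (the sub name word_with_two_underscores and the sample strings are only for illustration):

```perl
use strict;
use warnings;

sub word_with_two_underscores {
    my ($str) = @_;
    # First test: nothing but word characters, so no whitespace.
    # Second test: an underscore, anything, then another underscore.
    return ($str =~ /^\w+$/ && $str =~ /_.*_/) ? 1 : 0;
}

for my $s ('foo1_bar2_baz3', 'foo1_bar2', 'foo_bar baz_quux') {
    print "$s: ", (word_with_two_underscores($s) ? 'match' : 'no match'), "\n";
}
```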

--
Tim McDaniel, (E-Mail Removed)
 
David Filmer
      12-28-2011
On Dec 27, 3:04 pm, Tad McClellan <(E-Mail Removed)> wrote:

> /_.*_/ is a _clear_ way to say "more than one underscore" ?


Yes, but it also would match "foo_bar baz_quux" which contains an
intervening whitespace. This would not satisfy the original
requirements, which stipulate finding multiple underscores within
continuous alphanumeric characters with no intervening whitespace.
 
Willem
      12-28-2011
David Filmer wrote:
) I want to be able to match the string foo1_bar2_baz3 as having
) multiple underscore characters (with no intervening whitespace), but
) not match foo1_bar2 which has only one underscore. I want to ignore
) one match, but not two or more.
)
) This would be easy if \w did not ALSO match underscores. But it
) does. There does not seem to be a character class for alphanumeric
) ONLY.

How would that make it easy?

Doesn't the following work? m/\w*_\w*_\w*/
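
A small sketch of what that unanchored pattern picks out of a longer line, assuming the goal is to extract qualifying tokens from running text (the sample text is made up for illustration):

```perl
use strict;
use warnings;

my $text = 'skip foo1_bar2 but keep foo1_bar2_baz3 and a_b_c_d';

# \w cannot cross whitespace, so both underscores must fall inside one token.
# Note that the pattern would also accept a bare "__".
my @hits = $text =~ /(\w*_\w*_\w*)/g;

print "$_\n" for @hits;
```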


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Tim McDaniel
      12-28-2011
In article <(E-Mail Removed)>,
David Filmer <(E-Mail Removed)> wrote:
>On Dec 27, 3:04 pm, Tad McClellan <(E-Mail Removed)> wrote:
>
>> /_.*_/ is a _clear_ way to say "more than one underscore" ?


Very clear to me, at least.

>Yes, but it also would match "foo_bar baz_quux" which contains an
>intervening whitespace. This would not satisfy the original
>requirements, which stipulate finding multiple underscores within
>continuous alphanumeric characters with no intervening whitespace.


Which is why I wrote
>>> /^\w+$/ && /_.*_/


--
Tim McDaniel, (E-Mail Removed)
 
Tim McDaniel
      12-28-2011
In article <(E-Mail Removed)>,
Tad McClellan <(E-Mail Removed)> wrote:
>The way


There are times to apply the phrase "The way" to Perl, but I don't
know yet that this is one of them.

>to count characters is with tr///, not regexes:
>
> /^\w+$/ && tr/_// > 1


What are your reasons to think one better than the other?

Unless the expression is being evaluated many times, efficiency isn't
so important.

How many people are familiar with tr/// versus plain m//? I rarely
use tr///. I don't remember ever using the return value of tr///.
I've never used the empty RHS feature except with /d (indeed, I had to
check the man page to see that you hadn't trashed $_).
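
For anyone else who has to look it up, here is a minimal sketch of the tr/// version used as a filter. It assumes the candidates are whole strings aliased through $_, and the sample data is invented:

```perl
use strict;
use warnings;

my @candidates = ('foo1_bar2_baz3', 'foo1_bar2', 'foo_bar baz_quux');
my @kept;

for (@candidates) {
    my $before = $_;
    # With an empty replacement list and no /d, tr/_// only counts
    # the underscores in $_; it does not change the string.
    push @kept, $_ if /^\w+$/ && tr/_// > 1;
    die "tr/// modified \$_ unexpectedly" if $_ ne $before;
}

print "kept: @kept\n";
```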

--
Tim McDaniel, (E-Mail Removed)
 
Rainer Weikusat
      12-28-2011
(E-Mail Removed) (Tim McDaniel) writes:
> In article <(E-Mail Removed)>,
> Tad McClellan <(E-Mail Removed)> wrote:
>>The way


[...]

>>to count characters is with tr///, not regexes:
>>
>> /^\w+$/ && tr/_// > 1

>
> What are your reasons to think one better than the other?
>
> Unless the expression is being evaluated many times, efficiency isn't
> so important.


A subroutine I encountered in the past in some script written by
someone else was

sub mod($$) { return $_[0] - $_[1] * int($_[0] / $_[1]); }

Actually, it wasn't a subroutine but an inline calculation. Provided
the language provides a more direct way to achieve the same result
(the % operator), the question is not 'why should the built-in way be
preferred' but 'why should something other than the built-in way be
used' and ...

> How many people are familiar with tr/// versus plain m//?


... "But I didn't know about it!" is only a suitable justification
until this problem has been remedied.
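
As a side note, a short sketch comparing the hand-rolled calculation with the % operator. The name manual_mod and the operands are made up for illustration. The two agree for non-negative operands, but int() truncates toward zero while % with a positive right operand never returns a negative result, so they part ways once the left operand is negative:

```perl
use strict;
use warnings;

# The calculation quoted above, wrapped in a sub with an illustrative name.
sub manual_mod { return $_[0] - $_[1] * int($_[0] / $_[1]); }

for my $pair ([7, 3], [-7, 3]) {
    my ($m, $n) = @$pair;
    printf "%3d mod %d: manual=%d builtin=%d\n",
        $m, $n, manual_mod($m, $n), $m % $n;
}
```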



 
Tim McDaniel
      12-29-2011
In article <(E-Mail Removed)>,
Rainer Weikusat <(E-Mail Removed)> wrote:
>Provided the language provides a more direct way to achieve the same
>result ..., the question is not 'why should the built-in way be
>preferred' but 'why should something other than the built-in way be
>used'


In the current case, it's
    /_.*_/
versus
    tr/_// > 1
They both use builtins pretty directly and they are both short.
Personally, I find the former to be clearer than the latter, which
uses an operator that usually causes side effects but doesn't in this
case, and I still don't know how many people know its details.

>> How many people are familiar with tr/// versus plain m//?

>
>... "But I didn't know about it!" is only a suitable justification
>until this problem has been remedied.


To some extent I agree, but if someone is coding for other people, one
of the factors that the coder should consider is what is
comprehensible at a glance, in addition to other factors like brevity,
efficiency where needed, robustness, and such. For example, for my
own programs I have no problems with
    my %key_lookup;
    @key_lookup{@keys} = (1) x @keys;
But since I don't know how many people know that idiom, I might
hesitate to use it when coding for others, and if I did I would likely
comment it clearly.
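
A commented sketch of that idiom, with hypothetical sample data in @keys:

```perl
use strict;
use warnings;

my @keys = qw(apple banana cherry);   # hypothetical sample data

my %key_lookup;
# Hash slice assignment: every key in @keys is given the value 1.
# (1) x @keys builds a list of 1s with as many elements as @keys.
@key_lookup{@keys} = (1) x @keys;

print "banana is known\n"   if $key_lookup{banana};
print "durian is unknown\n" unless $key_lookup{durian};
```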

--
Tim McDaniel, (E-Mail Removed)
 
John W. Krahn
      12-29-2011
Tim McDaniel wrote:
> In article <(E-Mail Removed)>,
> Rainer Weikusat <(E-Mail Removed)> wrote:
>> Provided the language provides a more direct way to achieve the same
>> result ..., the question is not 'why should the built-in way be
>> preferred' but 'why should something other than the built-in way be
>> used'

>
> In the current case, it's
>     /_.*_/
> versus
>     tr/_// > 1
> They both use builtins pretty directly and they are both short.
> Personally, I find the former to be clearer than the latter, which
> uses an operator that usually causes side effects but doesn't in this
> case, and I still don't know how many people know its details.


tr/_// is pretty simple. It is actually short for tr/_/_/ which
replaces every '_' character with a '_' character and returns the number
of replacements made. It has the advantages that it doesn't interpolate
and it only does one thing, and does it well.
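
A quick sketch of both points, using made-up sample strings. The second half assumes the reader might expect $chars to be expanded as it would be in a regex; the variable is declared only to show that tr/// never looks at it:

```perl
use strict;
use warnings;

my $str = 'foo1_bar2_baz3';

my $count_short = ($str =~ tr/_//);    # empty replacement list
my $count_long  = ($str =~ tr/_/_/);   # explicit identity replacement
print "short=$count_short long=$count_long str=$str\n";

# tr/// does not interpolate: the search list below is the six literal
# characters '$', 'c', 'h', 'a', 'r', 's', not the contents of $chars.
my $chars  = '_';
my $sample = 'a_b$c';
my $literal_hits = ($sample =~ tr/$chars//);
print "literal hits: $literal_hits\n";
```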



John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein
 