Velocity Reviews - Computer Hardware Reviews

regexp for matching a string with mandatory underscores

 
 
C.DeRykus
 
      12-29-2011
On Dec 27, 2:04 am, David Filmer <(E-Mail Removed)> wrote:
> I want to be able to match the string foo1_bar2_baz3 as having
> multiple underscore characters (with no intervening whitespace), but
> not match foo1_bar2 which has only one underscore. I want to ignore
> one match, but not two or more.
>
> This would be easy if \w did not ALSO match underscores. But it
> does. There does not seem to be a character class for alphanumeric
> ONLY.
>
> How can I match continuous alphanumeric strings which contain more
> than one underscore?
>



Maybe,

print '>1' if (()= /\G [[:alnum:]]+ _/gx) > 1;
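
A minimal sketch of how that test behaves on the strings from the question. The helper name underscore_runs is only for illustration and does not come from the post.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): counts adjacent
# "alphanumeric run followed by _" chunks from the start of the string,
# using the pattern from the post.
sub underscore_runs {
    my ($str) = @_;
    # \G makes each match begin where the previous one ended, so the
    # chunks must be contiguous; ()= forces list context and the outer
    # scalar assignment receives the number of matches.
    my $count = () = $str =~ /\G [[:alnum:]]+ _/gx;
    return $count;
}

for my $s (qw(foo1_bar2_baz3 foo1_bar2 foo1)) {
    my $n = underscore_runs($s);
    printf "%-15s %d %s\n", $s, $n, $n > 1 ? '>1' : '';
}
```

Note that the pattern only counts the leading chunks; it does not check what follows the last underscore.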

--
Charles DeRykus
 
Tim McDaniel
 
      12-29-2011
In article <(E-Mail Removed)>,
C.DeRykus <(E-Mail Removed)> wrote:
>On Dec 27, 2:04 am, David Filmer <(E-Mail Removed)> wrote:
>> How can I match continuous alphanumeric strings which contain more
>> than one underscore?

>
>Maybe,
>
>print '>1' if (()= /\G [[:alnum:]]+ _/gx) > 1;


For anyone else who is wondering about the use of ()=, please see "man
perldata".

List assignment in scalar context returns the number of elements
produced by the expression on the right side of the assignment:

$x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2
$x = (($foo,$bar) = f()); # set $x to f()'s return count

This is handy when you want to do a list assignment in a Boolean
context, because most list functions return a null list when
finished, which when assigned produces a 0, which is interpreted
as FALSE.

It's also the source of a useful idiom for executing a function or
performing an operation in list context and then counting the
number of return values, by assigning to an empty list and then
using that assignment in scalar context. For example, this code:

$count = () = $string =~ /\d+/g;

will place into $count the number of digit groups found in
$string. This happens because the pattern match is in list
context (since it is being assigned to the empty list), and will
therefore return a list of all matching parts of the string. The
list assignment in scalar context will translate that into the
number of elements (here, the number of times the pattern matched)
and assign that to $count. Note that simply using

$count = $string =~ /\d+/g;

would not have worked, since a pattern match in scalar context
will only return true or false, rather than a count of matches.
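
A small sketch of the difference described above, using a made-up sample string:

```perl
use strict;
use warnings;

my $string = 'a1 b22 c333';   # sample data, assumed for illustration

# ()= puts the match in list context; the list assignment itself is
# evaluated in scalar context, which yields the number of matches.
my $count = () = $string =~ /\d+/g;

# Plain scalar context: the match only reports success or failure.
my $found = $string =~ /\d+/g;

print "count: $count\n";
print "found: ", ($found ? 'true' : 'false'), "\n";
```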

--
Tim McDaniel, (E-Mail Removed)
 
Ilya Zakharevich
 
      12-31-2011
On 2011-12-28, Tim McDaniel <(E-Mail Removed)> wrote:
>>> /_.*_/ is a _clear_ way to say "more than one underscore" ?

>
> Very clear to me, at least.
>
>>Yes, but it also would match "foo_bar baz_quux" which contains an
>>intervening whitespace. This would not satisfy the original
>>requirements, which stipulate finding multiple underscores within
>>continuous alphanumeric characters with no intervening whitespace.

>
> Which is why I wrote
>>>> /^\w+$/ && /_.*_/


The first one is not completely equivalent to !/\W/, but when ANDed
with the second one it is (ignoring the issue with trailing \n, of
course). Is it more clear? I'm not sure...
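
A rough sketch of where the two tests differ, on a few sample strings chosen for illustration. The helper name checks is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns three 0/1 flags for a string:
#   /^\w+$/ alone, !/\W/ alone, and /^\w+$/ && /_.*_/ combined.
sub checks {
    my ($s) = @_;
    my $anchored = $s =~ /^\w+$/ ? 1 : 0;
    my $no_nonw  = $s !~ /\W/    ? 1 : 0;
    my $combined = ($s =~ /^\w+$/ && $s =~ /_.*_/) ? 1 : 0;
    return ($anchored, $no_nonw, $combined);
}

for my $s ('foo1_bar2_baz3', 'foo1_bar2', 'foo_bar baz_quux', '', "a_b_c\n") {
    (my $shown = $s) =~ s/\n/\\n/g;          # make the newline visible
    printf "%-20s %s\n", "'$shown'", join ' ', checks($s);
}
```

The empty string passes !/\W/ but not /^\w+$/, and a string with a trailing newline does the opposite, because $ may match just before a final newline.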

Ilya
 
 
Ilya Zakharevich
 
      12-31-2011
On 2011-12-29, John W. Krahn <(E-Mail Removed)> wrote:
>> In the current case, it's
>> /_.*_/
>> versus
>> tr/_// > 1
>> They both use builtins pretty directly and they are both short.
>> Personally, I find the former to be clearer than the latter, which
>> uses an operator that usually causes side effects but doesn't in this
>> case, and I still don't know how many know its details.

>
> tr/_// is pretty simple.


tr is extremely complicated.

> It is actually short for tr/_/_/ which
> replaces every '_' character with a '_' character and returns the number
> of replacements made. It has the advantages that it doesn't interpolate
> and it only does one thing, and does it well.


For which value of "well"? If it is applied to 2GB string, would it
make a copy of it? If the string is tied to a database entry, would
it cause a database update? If the string is shared between fork()ed
processes, would it become unshared after the operation?

In short: Do you know what you are talking about?

Best wishes for the new year,
Ilya
 
 
Rainer Weikusat
 
      01-01-2012
Ilya Zakharevich <(E-Mail Removed)> writes:
> On 2011-12-29, John W. Krahn <(E-Mail Removed)> wrote:


[...]

>> It is actually short for tr/_/_/ which
>> replaces every '_' character with a '_' character and returns the number
>> of replacements made. It has the advantages that it doesn't interpolate
>> and it only does one thing, and does it well.

>
> For which value of "well"? If it is applied to 2GB string, would it
> make a copy of it?


Not when counting or replacing characters in a non-UTF8 string.
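
For reference, a minimal sketch of the counting form under discussion. It only shows that the visible value of the string is unchanged afterwards; it says nothing about internal copying or tie behaviour.

```perl
use strict;
use warnings;

my $str    = 'foo1_bar2_baz3';   # sample string from the original question
my $before = $str;

# With an empty replacement list and no /d modifier, tr/// counts the
# characters from the search list and returns that count.
my $n = ($str =~ tr/_//);

print "underscores: $n\n";
print "more than one\n"    if $n > 1;
print "string unchanged\n" if $str eq $before;
```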

> If the string is tied to a database entry, would
> it cause a database update?


Maybe, maybe not. That would depend on the implementation of the tying mechanism.

> If the string is shared between fork()ed
> processes, would it become unshared after the operation?


Strings are not shared between forked processes, memory pages are. As
soon as any process tries to write to a shared page, it will get its
own copy for usual COW-implementations.
 
 
Tim McDaniel
 
      01-03-2012
In article <(E-Mail Removed) >,
Rainer Weikusat <(E-Mail Removed)> wrote:
>Ilya Zakharevich <(E-Mail Removed)> writes:
>> On 2011-12-29, John W. Krahn <(E-Mail Removed)> wrote:

>
>[...]
>
>>> It is actually short for tr/_/_/ which replaces every '_'
>>> character with a '_' character and returns the number of
>>> replacements made. It has the advantages that it doesn't
>>> interpolate and it only does one thing, and does it well.

>>
>> For which value of "well"? If it is applied to 2GB string, would
>> it make a copy of it?

>
>Not when counting or replacing characters in a non-UTF8 string.
>
>> If the string is tied to a database entry, would
>> it cause a database update?

>
>Maybe, maybe not. That would depend on the implementation of the tying
>mechanism.
>
> [and a forking question]


I remember Dennis Ritchie's use of the phrase "unwarranted chumminess
with the C implementation" (in a far more dubious situation). I'm
hesitant to depend on implementation details unless they're guaranteed
in the documentation. Particularly with Perl: systems I'm on have
versions variously between 5.8 and 5.14, so I wonder which versions
have which optimizations, or indeed if they are done at all.

On the other hand, when you write the scripts yourself (I do that a
lot with Perl), you can know whether it does ties, large strings, or
other unusual cases.

--
Tim McDaniel, (E-Mail Removed)
 
 
Rainer Weikusat
 
      01-03-2012
(E-Mail Removed) (Tim McDaniel) writes:
> In article <(E-Mail Removed) >,
> Rainer Weikusat <(E-Mail Removed)> wrote:
>>Ilya Zakharevich <(E-Mail Removed)> writes:
>>> On 2011-12-29, John W. Krahn <(E-Mail Removed)> wrote:

>>
>>[...]
>>
>>>> It is actually short for tr/_/_/ which replaces every '_'
>>>> character with a '_' character and returns the number of
>>>> replacements made. It has the advantages that it doesn't
>>>> interpolate and it only does one thing, and does it well.
>>>
>>> For which value of "well"? If it is applied to 2GB string, would
>>> it make a copy of it?

>>
>>Not when counting or replacing characters in a non-UTF8 string.
>>
>>> If the string is tied to a database entry, would
>>> it cause a database update?

>>
>>Maybe, maybe not. That would depend on the implementation of the tying
>>mechanism.
>>
>> [and a forking question]

>
> I remember Dennis Ritchie's use of the phrase "unwarranted chumminess
> with the C implementation" (in a far more dubious situation). I'm
> hesitant to depend on implementation details unless they're guaranteed
> in the documentation.


What is guaranteed in the documentation today will be 'accidentally
still in the documentation' tomorrow and 'a deprecated feature which
must not be used under any circumstances' (on threat of immediate
excommunication from the universe of all the just and beautiful
people) two days later, so that doesn't really buy you anything :->.

OTOH, it is sensible to assume that - usually - the people who wrote
the implementation will have tried to make it behave sensibly and in
this case, that tr/// will neither copy nor modify the string except
if this is necessary to perform the requested operation.

Re: tied scalars

What will happen when an operation is performed on a scalar tied to
something depends on the class/module used to provide the tied
semantics, and this can be anything, so the question didn't really make
sense: this class or module may well cause 'a database update' even
though perl didn't modify the data.
 
 
sln@netherlands.com
 
      01-04-2012
On Tue, 27 Dec 2011 12:40:04 +0000, Ben Morrow <(E-Mail Removed)> wrote:

>
>Quoth David Filmer <(E-Mail Removed)>:
>> I want to be able to match the string foo1_bar2_baz3 as having
>> multiple underscore characters (with no intervening whitespace), but
>> not match foo1_bar2 which has only one underscore. I want to ignore
>> one match, but not two or more.
>>
>> This would be easy if \w did not ALSO match underscores. But it
>> does. There does not seem to be a character class for alphanumeric
>> ONLY.

>


If I understand you correctly from your example, this may work.
/^[^\W_]+(?:_[^\W_]+){2,}$/
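
A quick sketch that runs this pattern against a few sample strings. The strings beyond the two from the original question are made up for illustration.

```perl
use strict;
use warnings;

# [^\W_] is "a word character that is not an underscore"; the group
# (?:_[^\W_]+){2,} then requires at least two underscore-separated parts.
my $re = qr/^[^\W_]+(?:_[^\W_]+){2,}$/;

my @samples = ('foo1_bar2_baz3', 'foo1_bar2', 'foo_bar baz_quux',
               'a_b_c_d', '_a_b', 'a__b');

for my $s (@samples) {
    printf "%-20s %s\n", $s, ($s =~ $re ? 'match' : 'no match');
}
```

Leading underscores and doubled underscores are rejected because every underscore must sit between two alphanumeric runs.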

-sln
 
 
Ilya Zakharevich
 
      01-10-2012
On 2012-01-03, Rainer Weikusat <(E-Mail Removed)> wrote:
>>>> For which value of "well"? If it is applied to 2GB string, would
>>>> it make a copy of it?
>>>
>>>Not when counting or replacing characters in a non-UTF8 string.


So I read it as: "it will" (with certain exceptions).

>>>> If the string is tied to a database entry, would
>>>> it cause a database update?
>>>
>>>Maybe, maybe not. That would depend on the implementation of the tying
>>>mechanism.


Again...

> OTOH, it is sensible to assume that - usually - the people who wrote
> the implementation will have tried to make it behave sensibly and in
> this case, that tr/// will neither copy nor modify the string except
> if this is necessary to perform the requested operation.


Not applicable to Perl (in general). A lot of stuff is majorly pessimized.

> Re: tied scalars
>
> What will happen when an operation is performed on a scalar tied to
> something depends on the class/module used to provide the tied
> semantics, and this can be anything, so the question didn't really make
> sense: this class or module may well cause 'a database update' even
> though perl didn't modify the data.


This is true "literally", but AFAIK, not applicable to any situation I
know.

Essentially, for me all this boils down to: do not use tr/// unless
you can't avoid it, or know EXACTLY how and when your code is going to
be used...

Ilya
 
 
Rainer Weikusat
 
      01-10-2012
Ben Morrow <(E-Mail Removed)> writes:

[...]

> Everyone now knows that using UTF-8 was a mistake,


That's not something "everyone knows" and in fact, some people were so
convinced that UTF-8 would be a sensible choice that they implemented
complete operating systems based on using UTF-8 as native character
encoding (that would be "Plan 9"). This should rather be "every member
of some small group of people (people currently working on Perl
Unicode support?) is strongly convinced that choosing UTF-8 was a
mistake" (and I'd wager a bet that the base reason for this is "that's
not what Microsoft did and consequently, it must be WRONG !!1").
 