Regex: Any character in character class

 
 
Arne Vajhøj
      02-01-2013
On 2/1/2013 5:06 PM, markspace wrote:
> On 2/1/2013 1:47 PM, Arne Vajhøj wrote:
>
>> [.]+:[.|\n]+

>
>
> Watch out for this. +, being greedy, will match a : in the selection
> expression (the 2nd part) if : is allowed in the second part.
>
> The reluctant modifier might be a better idea here:
>
> .+?:[.|\n]+
>
> Note that I don't think the initial brackets [] were needed. Also we're
> yet again starting to see the problem with regex: it always evolves into
> something that looks like your cat walked across the keyboard.


You are absolutely right.

Non greedy.

No square brackets for first part.

And also round brackets for the last part.

.+?:(.|\n)+

I think I must have set a new world record. 3 bugs in 12 characters.



Arne
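
To make the three fixes concrete, here is a minimal sketch that runs the original and corrected patterns against one input. The class name, the helper method, the sample input, and the capturing group around the first part are assumptions added for illustration; they are not from the posts above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {

    /** Returns group 1 if the whole input matches the regex, otherwise null. */
    static String protocol(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: the part after the first colon contains
        // another colon and a line break.
        String input = "SCA:a:b\nc";

        // Inside [] the dot and the pipe are literals, so [.] and [.|\n]
        // only accept '.', '|' or a newline. The input is rejected.
        System.out.println(protocol("([.]+):[.|\\n]+", input));   // null

        // Greedy .+ takes as much of the first line as it can, so the
        // separator becomes the last colon on that line.
        System.out.println(protocol("(.+):(?:.|\\n)+", input));   // SCA:a

        // Reluctant .+? stops at the first colon.
        System.out.println(protocol("(.+?):(?:.|\\n)+", input));  // SCA
    }
}
```

The second and third patterns both match the whole input; they only disagree about which colon is the separator, which is the case markspace warns about.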


 
Robert Klemme
      02-01-2013
On 01.02.2013 21:14, Sebastian wrote:
> On 31.01.2013 04:27, Arne Vajhøj wrote:
>> On 1/30/2013 4:34 AM, Sebastian wrote:
>>> I want to match any sequence of characters, including line breaks, in a
>>> suffix of a multi-line string.
>>>
>>> I do not want to use Pattern.DOTALL, because line breaks are not
>>> permissible everywhere. I cannot write [.]* because dot loses its
>>> special meaning inside a character class.
>>>
>>> I have come up with [\S\s]*
>>> as meaning any sequence of non-whitespace or whitespace (incl.
>>> line-breaks). Is there a better way?


Yes.

>> Do you always want to accept line breaks or not? If not then when?


> The string I want to match basically has two parts (a "protocol" and a
> "selection expression"). I want to allow line breaks anywhere in the
> selection expression, but not in the protocol.


Of course you can use DOTALL - as an embedded flag:

package rx;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Dotty {

    // (?s:...) switches DOTALL on for that group only, so the dot matches
    // line breaks after "sel" but not in the "proto" part.
    private static final Pattern PAT =
        Pattern.compile("proto.*(?s:sel.*)");

    public static void main(String[] args) {
        test("protoPselS");        // no line break: match
        test("protoPPselS\nS");    // line break in the second part: match
        test("protoP\nPselS\nS");  // line break in the first part: mismatch
    }

    public static void test(final CharSequence cs) {
        System.out.println("cs=\"" + cs + "\"");
        final Matcher m = PAT.matcher(cs);

        if (m.matches()) {
            System.out.println("Match: \"" + m.group() + "\"");
        } else {
            System.out.println("Mismatch");
        }

        System.out.println();
    }

}

Kind regards

robert


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

 
Sebastian
      02-02-2013
On 01.02.2013 23:13, Arne Vajhøj wrote:
[snip]
> And also round brackets for the last part.
>
> .+?:(.|\n)+
>
> I think I must have set a new world record. 3 bugs in 12 characters.
>
>
>
> Arne
>

Here's a concrete example:

SCA:LIST, select[werks_s:default_plant],values[bukrs:bukrs,
company:company]


The second part is everything after the first comma. I was using
(.+?),[\s\S]+

Arne's suggestion modified for my needs (comma as separator, and I only
want to capture the first part as a group) will work fine as well:
(.+?),(?:.|\n)+

Can't say though that I find anything to prefer the one to the other.
Perhaps the second looks even more like the result of a cat walk...

-- Sebastian
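
One difference between the two forms that may matter in practice: without DOTALL, Java's dot does not match any line terminator, and that includes \r. So (?:.|\n) stops at a carriage return, while [\s\S] accepts every character. A minimal sketch, assuming the concrete example above and a hypothetical copy of it with Windows-style \r\n line endings; the class and field names are made up for illustration:

```java
import java.util.regex.Pattern;

public class SuffixMatchers {

    // The two patterns from the post above.
    static final Pattern CLASS_FORM = Pattern.compile("(.+?),[\\s\\S]+");
    static final Pattern ALTERNATION_FORM = Pattern.compile("(.+?),(?:.|\\n)+");

    static boolean matches(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        String unix = "SCA:LIST, select[werks_s:default_plant],values[bukrs:bukrs,\n"
                + "company:company]";
        // Assumption for illustration: the same text with \r\n line endings.
        String windows = unix.replace("\n", "\r\n");

        System.out.println(matches(CLASS_FORM, unix));          // true
        System.out.println(matches(ALTERNATION_FORM, unix));    // true
        System.out.println(matches(CLASS_FORM, windows));       // true
        // '\r' is matched neither by the dot nor by \n, so this one fails.
        System.out.println(matches(ALTERNATION_FORM, windows)); // false
    }
}
```

If the input can contain \r\n, [\s\S]+ (or the embedded (?s:...) flag) is the safer choice; widening the alternation to (?:.|\r|\n)+ would also work.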


 
markspace
      02-02-2013
On 2/2/2013 11:45 AM, Sebastian wrote:
> SCA:LIST, select[werks_s:default_plant],values[bukrs:bukrs,
> company:company]


For something this simple you might want to consider just String::split().

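// needs: import java.util.Arrays;
String test =
    "SCA:LIST,select[werks_s:default_plant],values[bukrs:bukrs,company:company]";
// limit 2: split only at the first comma, later commas stay in the second part
String[] parse = test.split( ",\\s*", 2 );
System.out.println( Arrays.toString( parse ) );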

This could be faster since the second half of the regex, (?:.|\n)+,
doesn't have to execute.


 
Arne Vajhøj
      02-02-2013
On 2/2/2013 2:45 PM, Sebastian wrote:
> On 01.02.2013 23:13, Arne Vajhøj wrote:
> [snip]
>> And also round brackets for the last part.
>>
>> .+?:(.|\n)+
>>
>> I think I must have set a new world record. 3 bugs in 12 characters.
>>
>>
>>

> Here's a concrete example:
>
> SCA:LIST, select[werks_s:default_plant],values[bukrs:bukrs,
> company:company]
>
>
> The second part is everything after the first comma. I was using
> (.+?),[\s\S]+
>
> Arne's suggestion modified for my needs (comma as separator, and I only
> want to capture the first part as a group) will work fine as well:
> (.+?),(?:.|\n)+
>
> Can't say though that I find anything to prefer the one to the other.
> Perhaps the second looks even more like the result of a cat walk...


It is not unusual that there is more than one regex that
does the job.

Arne


 
Lew
      02-02-2013
Arne Vajhøj wrote:
> Sebastian wrote:
>> Arne Vajhøj wrote:
>> [snip]
>>> And also round brackets for the last part.
>>>
>>> .+?:(.|\n)+
>>>
>>> I think I must have set a new world record. 3 bugs in 12 characters.

>
>>>

>
>> Here's a concrete example:
>>
>> SCA:LIST, select[werks_s:default_plant],values[bukrs:bukrs,
>> company:company]

>
>> The second part is everything after the first comma. I was using


You mean 'expression.substring(expression.indexOf(',') + 1)'?
(modulo the usual error checks, of course)

>> (.+?),[\s\S]+


>> Arne's suggestion modified for my needs (comma as separator, and I only
>> want to capture the first part as a group) will work fine as well:


You mean 'expression.substring(0, expression.indexOf(','))'?

>> (.+?),(?:.|\n)+

>
>> Can't say though that I find anything to prefer the one to the other.
>> Perhaps the second looks even more like the result of a cat walk...


If all you need to do is split a string on a comma, why use regexes at all?

> It is not unusual that there is more than one regex that
> does the job.


It is not unusual that there is more than one non-regex that does the job.

--
Lew
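
For reference, the indexOf approach with "the usual error checks" spelled out might look like the sketch below. The class and method names are hypothetical, and the checks (both parts non-empty, no line break in the first part) are an assumption based on what the regexes earlier in the thread enforce.

```java
public class CommaSplit {

    /**
     * Splits at the first comma and returns {protocol, selection}.
     * Throws if there is no comma, if either part is empty, or if the
     * protocol part contains a line break.
     */
    static String[] splitAtFirstComma(String expression) {
        int comma = expression.indexOf(',');
        // -1: no comma; 0: empty protocol; last index: empty selection.
        if (comma <= 0 || comma == expression.length() - 1) {
            throw new IllegalArgumentException(
                    "expected \"protocol,selection\": " + expression);
        }
        String protocol = expression.substring(0, comma);
        if (protocol.indexOf('\n') >= 0 || protocol.indexOf('\r') >= 0) {
            throw new IllegalArgumentException(
                    "line break in protocol: " + expression);
        }
        String selection = expression.substring(comma + 1);
        return new String[] { protocol, selection };
    }

    public static void main(String[] args) {
        String[] parts = splitAtFirstComma(
                "SCA:LIST, select[werks_s:default_plant],values[bukrs:bukrs,\n"
                + "company:company]");
        System.out.println("protocol  = \"" + parts[0] + "\"");
        System.out.println("selection = \"" + parts[1] + "\"");
    }
}
```

Note that the selection keeps its leading space here, as it does with the regexes; call trim() on it if that is unwanted.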
 
Arne Vajhøj
      02-03-2013
On 2/2/2013 4:23 PM, Lew wrote:
> Arne Vajhøj wrote:
>> Sebastian wrote:
>>> Can't say though that I find anything to prefer the one to the other.
>>> Perhaps the second looks even more like the result of a cat walk...

>
> If all you need to do is split a string on a comma, why use regexes at all?
>
>> It is not unusual that there is more than one regex that
>> does the job.

>
> It is not unusual that there is more than one non-regex that does the job.


True.

But less surprising.

Arne


 
Gene Wirchenko
      02-04-2013
On Fri, 01 Feb 2013 17:13:54 -0500, Arne Vajhøj wrote:

[snip]

>I think I must have set a new world record. 3 bugs in 12 characters.
>
>


I may be able to save your honour. <G>

IBM had bugs in a one-instruction program just two bytes long. The
program was IEFBR14, and you can read about it on Wikipedia. There
was a series of corrections which resulted in a program several times
larger.

Sincerely,

Gene Wirchenko
 