Regex syntax

 
 
-
Guest
Posts: n/a
 
      08-08-2005
I have managed to form the regex for the following two:

CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>

String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";

LWS = [CRLF] 1*( SP | HT )

String LWS_REGEX = "((\r\n)??( |\\x09)+?)";


However, the following stumped me for hours.

TEXT = <any OCTET except CTLs, but including LWS>


String TEXT_REGEX = ...... // help me out please.
 
 
 
 
 
-
Guest
Posts: n/a
 
      08-08-2005
- wrote:
> I have managed to form the regex for the following two:
>
> CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
>
> String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";
>
> LWS = [CRLF] 1*( SP | HT )
>
> String LWS_REGEX = "((\r\n)??( |\\x09)+?)";
>
>
> However, the following stumped me for hours.
>
> TEXT = <any OCTET except CTLs, but including LWS>
>
>
> String TEXT_REGEX = ...... // help me out please.


Kindly disregard.
 
 
 
 
 
Lasse Reichstein Nielsen
Guest
Posts: n/a
 
      08-08-2005
- <> writes:

> String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";


Too many square brackets. Just use "[\\x00-\\x1f\\x7f]"
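
A minimal sketch of the flat character class in use, in case it helps. The class name CtlDemo and the helper isCtl are made up for illustration and are not from the thread.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name CtlDemo does not come from the thread.
final class CtlDemo {
    // Flat character class: octets 0-31 plus DEL (127).
    static final Pattern FLAT = Pattern.compile("[\\x00-\\x1f\\x7f]");

    // True when the single character c is a CTL.
    static boolean isCtl(char c) {
        return FLAT.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCtl('\t'));     // true: HT is octet 9
        System.out.println(isCtl('\u007f')); // true: DEL
        System.out.println(isCtl('A'));      // false
    }
}
```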

> LWS = [CRLF] 1*( SP | HT )


I'm not absolutely sure how to read this notation, so I'm guessing
it means one carriage return/line feed pair followed by one or more
spaces or horizontal tabs.

> String LWS_REGEX = "((\r\n)??( |\\x09)+?)";


Why two question marks? And the backslashes might want to be escaped
too. It should look more like
"\\r\\n[\\x20\\x09]+"

(Is it a mail header format or something like that?)
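
The quoted rules appear to match the basic rules of the HTTP/1.1 specification (RFC 2616), where square brackets in the grammar mark an optional element. Under that assumption the CRLF may be absent, and a sketch of LWS could look like the following. The class name LwsDemo and the helper isLws are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; assumes [CRLF] means "optional CRLF".
final class LwsDemo {
    // Optional CRLF, then one or more spaces or horizontal tabs.
    static final Pattern LWS = Pattern.compile("(?:\\r\\n)?[ \\t]+");

    // True when the whole string is a single LWS sequence.
    static boolean isLws(String s) {
        return LWS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLws("\r\n \t")); // true: CRLF, then SP and HT
        System.out.println(isLws("  "));      // true: CRLF is optional
        System.out.println(isLws("\r\n"));    // false: at least one SP or HT is required
    }
}
```

If the CRLF really is mandatory in your grammar, dropping the "?" after the group gives the stricter form.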

> However, the following stumped me for hours.
>
> TEXT = <any OCTET except CTLs, but including LWS>


LWS is not an octet, so how much do you want to match?

How about:
"[^\\x00-\\x1f\\x7f]|\\r\\n[\\x20\\x09]+"

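To show how much that alternation matches, here is a small sketch that repeats it to cover a whole string. It keeps the reading of LWS used in this reply, where the CRLF is required. The class name TextDemo and the helper isTextRun are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class built around the pattern suggested above.
final class TextDemo {
    // One TEXT unit: a non-CTL character, or CRLF followed by spaces/tabs.
    static final String TEXT_UNIT = "[^\\x00-\\x1f\\x7f]|\\r\\n[\\x20\\x09]+";

    // A run of one or more TEXT units.
    static final Pattern TEXT_RUN = Pattern.compile("(?:" + TEXT_UNIT + ")+");

    // True when the whole string consists only of TEXT units.
    static boolean isTextRun(String s) {
        return TEXT_RUN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTextRun("hello world")); // true
        System.out.println(isTextRun("a\r\n b"));     // true: folded line
        System.out.println(isTextRun("a\r\nb"));      // false: CRLF not followed by SP or HT
    }
}
```

Note that with this reading a tab that is not preceded by CRLF is rejected, because HT is itself a CTL.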
/L
--
Lasse Reichstein Nielsen -
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
 
 
-
Guest
Posts: n/a
 
      08-08-2005
Lasse Reichstein Nielsen wrote:
> - <> writes:
>
>> String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";
>
> Too many square brackets. Just use "[\\x00-\\x1f\\x7f]"
>
>> LWS = [CRLF] 1*( SP | HT )
>
> I'm not absolutely sure how to read this notation, so I'm guessing
> it means one carriage return/line feed pair followed by one or more
> spaces or horizontal tabs.
>
>> String LWS_REGEX = "((\r\n)??( |\\x09)+?)";
>
> Why two question marks? And the backslashes might want to be escaped
> too. It should look more like
> "\\r\\n[\\x20\\x09]+"
>
> (Is it a mail header format or something like that?)
>
>> However, the following stumped me for hours.
>>
>> TEXT = <any OCTET except CTLs, but including LWS>
>
> LWS is not an octet, so how much do you want to match?
>
> How about:
> "[^\\x00-\\x1f\\x7f]|\\r\\n[\\x20\\x09]+"
>
> /L


Thanks... One more question:

token = 1*<any CHAR except CTLs>


As corrected, CTL is ([\\x00-\\x1f\\x7f])

CHAR = <any US-ASCII character (octets 0 - 127)>

So it's CHAR = "([\\x00-\\x7F])";

I tried

String regex = "[([\\x00-\\x7F])&&[^([\\x00-\\x1f\\x7f])]]";

and then test for "\u007f".matches(regex) and it returns true which is
obviously wrong.
 
 
Lasse Reichstein Nielsen
Guest
Posts: n/a
 
      08-08-2005
- <> writes:

> I tried
>
> String regex = "[([\\x00-\\x7F])&&[^([\\x00-\\x1f\\x7f])]]";


You are guessing blindly now. Good thing it didn't appear to work.
Do read up on the format of regular expressions before trying that
again:
<URL:http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html>

CHAR except CTL would be the characters 0x20-0x7e, which is most easily
written directly:
"[\\x20-\\x7e]+"

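A small sketch comparing the direct range with the intersection form that java.util.regex supports ("&&" inside a character class), in case you want to keep the "CHAR except CTL" wording visible in the pattern. The class name TokenDemo, the helper names and the sample string "Content-Type" are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; follows the token rule as quoted in this thread.
final class TokenDemo {
    // Printable US-ASCII written directly.
    static final Pattern DIRECT = Pattern.compile("[\\x20-\\x7e]+");

    // Same set as an intersection: CHAR (0-127) and not CTL.
    static final Pattern INTERSECT =
            Pattern.compile("[\\x00-\\x7f&&[^\\x00-\\x1f\\x7f]]+");

    // True when the whole string is one or more non-CTL CHARs.
    static boolean isToken(String s) {
        return DIRECT.matcher(s).matches();
    }

    // Checks that both patterns agree on every single US-ASCII character.
    static boolean patternsAgree() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (DIRECT.matcher(s).matches() != INTERSECT.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isToken("Content-Type")); // true
        System.out.println(isToken("\u007f"));       // false: DEL is a CTL
        System.out.println(patternsAgree());
    }
}
```

Note there are no parentheses inside the brackets: within a character class they would be literal characters, not grouping.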
> and then test for "\u007f".matches(regex) and it returns true which is
> obviously wrong.


It's what you asked for, although I'm surprised that it gave "true".
Inside a character class the parentheses are literal characters, not
grouping operators, so the string does compile, just not into the
pattern you meant.

/L
 
 
Roedy Green
Guest
Posts: n/a
 
      08-14-2005
On Mon, 08 Aug 2005 14:07:02 +0800, - <> wrote or quoted:

>I have managed to form the regex for the following two:


My regex cheat sheet might help you. See
http://mindprod.com/jgloss/regex.html
 
 
 
 