Forum: Perl Misc

Help simplify complex regexp needing positive lookahead and reluctant quantifiers

david.karr@wamu.net
03-20-2005
My code is in Java, but my problem is a complicated regexp.
Ironically, I think I'm more likely to get a better response in here
than elsewhere. It's too bad there's no "regular expressions"
newsgroup (that I can find).

My sample data is the following (abstracted from real data):
--------------
*XXXlkjsflkw34lkjsfd
2XXXlkjsdfojsfjoimf344
3XXXabcdef9999999
4XXX9f9f9f9f9f9f9f9f
5XXXg8g8g8g8g8g8g8g
6XXXe6e6e6e6e6e6e6e6e
YYY=D/23333333
-xxxxxxxxxxxx
-yyyyyyyyyyyy
ZZZ=gggggggggggg
AAA=hhhhhhhhhh
-jjjjjjjjjjj
-kkkkkkkkkkk
/XXX 2
--------------

The important elements are "XXX", "YYY", "ZZZ", and "AAA". The
"YYY", "ZZZ", and "AAA" fields can appear in any order, some can be
missing, and other fields of the same form can be added. What I'd like
to build is a regexp that captures each of "YYY", "ZZZ", and "AAA",
along with its "associated data", in groups, where the data runs up to
either the next "[A-Z]{3}=" or the closing "/XXX". If I can get the
"associated data" into group values, I can use other regexps to parse
the details of each group value.

The regexp that I've built so far comes close to solving this, but not
quite. This is what I have so far (translated from Java string syntax
to Perl):

--------------
"(?sm)\\*.{3}.*\n" .
"2.{3}.*\n" .
"3.{3}.*\n" .
"4.{3}.*\n" .
"5.{3}.*\n" .
"6.{3}.*\n" .
" ([A-Z]{3}=)(.*?)(?= [A-Z]{3}=|/[A-Z]{3})" .
" ([A-Z]{3}=)(.*?)(?= [A-Z]{3}=|/[A-Z]{3})" .
" ([A-Z]{3}=)(.*?)(?= [A-Z]{3}=|/[A-Z]{3})" .
"/[A-Z]{3}.*"
--------------

You can ignore for now the fact that I'm not verifying that all the
places that require "XXX" actually contain "XXX". The problem area is
the "[A-Z]{3}=" groups. This regexp works for my sample data, but I
wasn't able to collapse those three repeated lines into a single
expression that would handle any number of fields. I tried the
following as a replacement for those three lines:

"( ([A-Z]{3}=)(.*?)(?= [A-Z]{3}=|/[A-Z]{3}))*"

but that didn't seem to work, and I'm not sure why.
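One likely reason the single-expression attempt falls short: in both Java and Perl, a capturing group inside a quantified group does not collect every iteration. The number of groups is fixed by the pattern text, and each group keeps only the text from its last iteration. The minimal Java sketch below shows this on a hypothetical, simplified one-line input ("KEY=value;" pairs with no newlines) rather than the real message format; the class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // A (key)(value) pair wrapped in a non-capturing group and repeated with *.
    // Hypothetical simplified syntax: KEY=value; with no newlines.
    static final Pattern PAIRS = Pattern.compile("(?:([A-Z]{3}=)(\\w+);)*");

    // Returns {groupCount, group(1), group(2)} after matching the whole input.
    static String[] inspect(String input) {
        Matcher m = PAIRS.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match: " + input);
        }
        return new String[] {
            String.valueOf(m.groupCount()), m.group(1), m.group(2)
        };
    }

    public static void main(String[] args) {
        String[] r = inspect("YYY=a;ZZZ=b;AAA=c;");
        System.out.println("groupCount = " + r[0]); // fixed by the pattern: 2
        System.out.println("group(1)   = " + r[1]); // only the last key: AAA=
        System.out.println("group(2)   = " + r[2]); // only the last value: c
    }
}
```

All three pairs are matched, but only "AAA=" and "c" survive in the groups. Because the group count cannot grow with the input, one match cannot hand back a separate group pair per field; matching one field at a time is one way around that.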

The following is the output from my Java program, using the working
regexp, where it iterated through the found groups. I provide this
just as another view of what I'm trying to capture:

--------------
group[YYY=]
group[D/23333333
-xxxxxxxxxxxx
-yyyyyyyyyyyy
]
group[ZZZ=]
group[gggggggggggg
]
group[AAA=]
group[hhhhhhhhhh
-jjjjjjjjjjj
-kkkkkkkkkkk
]
--------------
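One alternative, not proposed in this thread: instead of a single pattern with a fixed number of groups, apply a per-field pattern repeatedly with Matcher.find(), so each call yields one key and its data no matter how many fields there are. This minimal sketch assumes no header line starts with "[A-Z]{3}=", and it tolerates optional leading spaces before each key, since the original regex expects a space before each key that does not appear in the sample as posted. It does not validate the numbered header lines or the "/XXX" trailer. The class name FieldScanner is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldScanner {
    // One KEY= field plus its data, up to the next line that starts with
    // KEY= or up to a /XXX-style trailer. MULTILINE lets ^ match at each
    // line start; DOTALL lets the reluctant (.*?) cross newlines.
    // The optional leading spaces (" *") are an assumption about real data.
    private static final Pattern FIELD = Pattern.compile(
            "^ *([A-Z]{3}=)(.*?)(?=^ *[A-Z]{3}=|/[A-Z]{3})",
            Pattern.MULTILINE | Pattern.DOTALL);

    // Sample message from the post, one line per element.
    static final String SAMPLE = String.join("\n",
            "*XXXlkjsflkw34lkjsfd",
            "2XXXlkjsdfojsfjoimf344",
            "3XXXabcdef9999999",
            "4XXX9f9f9f9f9f9f9f9f",
            "5XXXg8g8g8g8g8g8g8g",
            "6XXXe6e6e6e6e6e6e6e6e",
            "YYY=D/23333333",
            "-xxxxxxxxxxxx",
            "-yyyyyyyyyyyy",
            "ZZZ=gggggggggggg",
            "AAA=hhhhhhhhhh",
            "-jjjjjjjjjjj",
            "-kkkkkkkkkkk",
            "/XXX 2") + "\n";

    // Returns alternating key and data entries, one pair per field found.
    static List<String> scan(String message) {
        List<String> groups = new ArrayList<>();
        Matcher m = FIELD.matcher(message);
        while (m.find()) { // each find() resumes where the previous match ended
            groups.add(m.group(1));
            groups.add(m.group(2));
        }
        return groups;
    }

    public static void main(String[] args) {
        for (String g : scan(SAMPLE)) {
            System.out.println("group[" + g + "]");
        }
    }
}
```

On the sample data this prints the same six group[...] entries as the listing above. The lookahead does not consume the next key, so the following find() starts exactly at it.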

Sherm Pendley
03-20-2005
wrote:

> My code is in Java, but my problem is a complicated regexp.
> Ironically, I think I'm more likely to get a better response in here
> than elsewhere. It's too bad there's no "regular expressions"
> newsgroup (that I can find).


No, but there is definitely a Java group.

I'm not just being snide - implementations of regular expressions vary. An
answer you get here may not apply to Java, and answers you get here or in a
Java group may not apply to sed, and so forth. You'd be far better off
asking your question in a group that's focused on the particular
implementation that you're using.

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org