Velocity Reviews - Computer Hardware Reviews

Regex Black Magic... how to stop matching if char?

Jon
03-30-2007
I'm trying to translate a strange derivative of xml into valid xml. Here
is an example line:

<SUBEVENTSTATUS
1:2><OPERATIONNAME></OPERATIONNAME>gofast<OPERATIONSTATUS>stopped</OPERATIONSTATUS><TARGETOBJECTNAME>name</TARGETOBJECTNAME><TARGETOBJECTVALUE>val</TARGETOBJECTVALUE></SUBEVENTSTATUS
1:1><SUBEVENTSTATUS 2:2><......and on

REXML pukes on the <SUBEVENTSTATUS 1:2> tag... which it should. There
should be some kind of attribute declaration instead. I want to
translate it to something like this: <SUBEVENTSTATUS no="1" of="2">

I'm trying to make a regex to detect the funny tags. Here is what I have
so far:

xml_fix=/<(\S+)\s+(\d+):(\d+)>/

This is great, but it will match this:

<Request><code_set_list 1:2>

instead of just this:

<code_set_list 1:2>

...because there is no guaranteed whitespace between tags. Basically, I
need to stop matching if a ">" is found. I've never had to deal with
anything quite like this in my regex experience. Any help or thoughts of
a better way to do things is much appreciated!
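For reference, here is a short Ruby snippet showing exactly what the pattern above captures. It assumes the pattern reads /<(\S+)\s+(\d+):(\d+)>/ and uses the sample string from the post. Because \S matches any non-whitespace character, including '<' and '>', the match begins at the very first '<' on the line.

```ruby
# The pattern from the post. \S+ matches any run of non-whitespace,
# including '<' and '>', so it can span several tags.
xml_fix = /<(\S+)\s+(\d+):(\d+)>/

line = "<Request><code_set_list 1:2>"
m = line.match(xml_fix)

puts m[0]  # => <Request><code_set_list 1:2>
puts m[1]  # => Request><code_set_list
```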

--
Posted via http://www.ruby-forum.com/.

Robert Klemme
03-30-2007
On 30.03.2007 17:34, Jon wrote:
> I'm trying to translate a strange derivative of xml into valid xml. Here
> is an example line:
>
> <SUBEVENTSTATUS
> 1:2><OPERATIONNAME></OPERATIONNAME>gofast<OPERATIONSTATUS>stopped</OPERATIONSTATUS><TARGETOBJECTNAME>name</TARGETOBJECTNAME><TARGETOBJECTVALUE>val</TARGETOBJECTVALUE></SUBEVENTSTATUS
> 1:1><SUBEVENTSTATUS 2:2><......and on
>
> REXML pukes on the <SUBEVENTSTATUS 1:2> tag... which it should. There
> should be some kind of attribute declaration instead. I want to
> translate it to something like this: <SUBEVENTSTATUS no="1" of="2">
>
> I'm trying to make a regex to detect the funny tags. Here is what I have
> so far:
>
> xml_fix=/<(\S+)\s+(\d+):(\d+)>/
>
> This is great, but it will match this:
>
> <Request><code_set_list 1:2>
>
> instead of just this:
>
> <code_set_list 1:2>
>
> ..because there is no guaranteed whitespace between tags. Basically, I
> need to stop matching if a ">" is found. I've never had to deal with
> anything quite like this in my regex experience. Any help or thoughts of
> a better way to do things is much appreciated!


I can think of several solutions:

/<([^>\s]+)\s+(\d+):(\d+)>/

Or even a two-phase approach:

/<[^>]+>/

and then, on each match:
/(\d+):(\d+)>\z/

HTH

robert
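Here is a minimal Ruby sketch of both suggestions, applied to the rewrite the original post asked for (N:M becomes no="N" of="M"). The sample line, the variable names, and the block form of gsub are illustrative choices, not part of Robert's post.

```ruby
line = "<Request><code_set_list 1:2><SUBEVENTSTATUS 2:2>"

# Approach 1: the tag name may contain neither '>' nor whitespace,
# so no match can start at "<Request>" and run into the next tag.
one_pass = line.gsub(/<([^>\s]+)\s+(\d+):(\d+)>/) do
  %(<#{$1} no="#{$2}" of="#{$3}">)
end

# Approach 2: first isolate each complete tag with /<[^>]+>/, then
# rewrite only the tags that end in "N:M>". Other tags come back unchanged.
two_pass = line.gsub(/<[^>]+>/) do |tag|
  tag.sub(/(\d+):(\d+)>\z/) { %(no="#{$1}" of="#{$2}">) }
end

puts one_pass  # => <Request><code_set_list no="1" of="2"><SUBEVENTSTATUS no="2" of="2">
puts two_pass  # => same as above
```

The two-phase version costs an extra pass over the text. In exchange, the second pattern only ever sees a single tag, which makes it easy to reason about.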
 
F. Senault
03-30-2007
On 30 March at 17:34, Jon wrote:

> ..because there is no guaranteed whitespace between tags. Basically, I
> need to stop matching if a ">" is found. I've never had to deal with
> anything quite like this in my regex experience. Any help or thoughts of
> a better way to do things is much appreciated!


I'd simply use /<[^>]+\s+(\d+):(\d+)>/ (untested, but you get my
drift)...
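Since this pattern is marked untested, here is a quick check in Ruby on the sample line from the original post. Because [^>]+ cannot cross a '>', the match can only begin at the '<' of the tag that actually ends in N:M. Note that this pattern captures just the two numbers, not the tag name.

```ruby
fred = /<[^>]+\s+(\d+):(\d+)>/

line = "<Request><code_set_list 1:2>"
m = line.match(fred)

puts m[0]  # => <code_set_list 1:2>
puts m[1]  # => 1
puts m[2]  # => 2
```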

Fred
--
> Microsoft sucks, sucks, sucks.

Which wouldn't be such a bad thing, if it were cuter, didn't use its
teeth at inopportune moments, didn't hog the bed, cooked well, and had
good taste in films. Sadly, that's not the case. (Dan Birchall, SDM)
 
Jon Fi
03-30-2007
Robert Klemme wrote:
> On 30.03.2007 17:34, Jon wrote:
>>
>>
>> <code_set_list 1:2>
>>
>> ..because there is no guaranteed whitespace between tags. Basically, I
>> need to stop matching if a ">" is found. I've never had to deal with
>> anything quite like this in my regex experience. Any help or thoughts of
>> a better way to do things is much appreciated!

>
> I can think of several solutions:
>
> /<([^>\s]+)\s+(\d+):(\d+)>/
>
> Or even a two phased approach
>
> /<[^>]+>/
>
> and then with the match
> /(\d+):(\d+)>\z/
>
> HTH
>
> robert



Awesome, and thank you! But for my benefit, could you explain why that
works? I thought ^ meant start of line?

Rob Biedenharn
03-30-2007

On Mar 30, 2007, at 11:43 AM, Jon Fi wrote:

> Robert Klemme wrote:
>> On 30.03.2007 17:34, Jon wrote:
>>>
>>>
>>> <code_set_list 1:2>
>>>
>>> ..because there is no guaranteed whitespace between tags.
>>> Basically, I
>>> need to stop matching if a ">" is found. I've never had to deal with
>>> anything quite like this in my regex experience. Any help or
>>> thoughts of
>>> a better way to do things is much appreciated!

>>
>> I can think of several solutions:
>>
>> /<([^>\s]+)\s+(\d+):(\d+)>/
>>
>> Or even a two phased approach
>>
>> /<[^>]+>/
>>
>> and then with the match
>> /(\d+):(\d+)>\z/
>>
>> HTH
>>
>> robert

>
>
> Awesome, and thank you! But for my benefit, could you explain why that
> works? I thought ^ meant start of line?


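Within a character class (square brackets), a leading ^ inverts the
selection, so [^>] matches any single character that is NOT a '>'.
Outside brackets, ^ is the start-of-line anchor you were thinking of.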
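To make the two meanings of ^ concrete, here is a small Ruby sketch using made-up strings. Outside brackets, ^ anchors at the start of a line (in Ruby, at the start of every line in a multi-line string). As the first character inside brackets, it negates the class. The last two lines show why [^>] is the "stop at >" tool the original question was after.

```ruby
# Outside a character class, ^ is an anchor. In Ruby it matches at the
# start of every line, not only at the start of the string.
p "line1\nline2".scan(/^\w+/)   # => ["line1", "line2"]

# As the first character inside [...], ^ negates the class:
# [^>] is any single character except '>'.
p "a>b".scan(/[^>]/)            # => ["a", "b"]

# A greedy .* runs on to the last '>' it can find...
p "<Request><code>"[/<.*>/]     # => "<Request><code>"
# ...while [^>]* cannot cross the first '>'.
p "<Request><code>"[/<[^>]*>/]  # => "<Request>"
```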

My solution is: .gsub(/<([^>]*?\b\s+)(\d+):(\d+)>/, '<\1no="\2" of="\3">')

-Rob

Rob Biedenharn http://agileconsultingllc.com
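Here is Rob's one-liner wrapped in a small script for trying it out, on a sample line assumed for illustration. Group 1 captures the tag name together with the whitespace after it, which is why the replacement has no space between \1 and no=. Because [^>] cannot match '>', tags without an N:M suffix, such as <Request> and <OPERATIONNAME>, are left untouched.

```ruby
line = "<Request><code_set_list 1:2><OPERATIONNAME></OPERATIONNAME><SUBEVENTSTATUS 2:2>"

# \1 = tag name plus trailing whitespace, \2 = N, \3 = M.
# Single quotes keep \1..\3 as backslash sequences, which gsub
# then interprets as backreferences.
fixed = line.gsub(/<([^>]*?\b\s+)(\d+):(\d+)>/, '<\1no="\2" of="\3">')

puts fixed
# => <Request><code_set_list no="1" of="2"><OPERATIONNAME></OPERATIONNAME><SUBEVENTSTATUS no="2" of="2">
```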




 
Brian Candler
03-31-2007
On Sat, Mar 31, 2007 at 12:34:25AM +0900, Jon wrote:
> <SUBEVENTSTATUS
> 1:2><OPERATIONNAME></OPERATIONNAME>gofast<OPERATIONSTATUS>stopped</OPERATIONSTATUS><TARGETOBJECTNAME>name</TARGETOBJECTNAME><TARGETOBJECTVALUE>val</TARGETOBJECTVALUE></SUBEVENTSTATUS
> 1:1><SUBEVENTSTATUS 2:2><......and on
>
> REXML pukes on the <SUBEVENTSTATUS 1:2> tag... which it should. There
> should be some kind of attribute declaration instead. I want to
> translate it to something like this: <SUBEVENTSTATUS no="1" of="2">
>
> I'm trying to make a regex to detect the funny tags. Here is what I have
> so far:
>
> xml_fix=/<(\S+)\s+(\d+):(\d+)>/
>
> This is great, but it will match this:
>
> <Request><code_set_list 1:2>
>
> instead of just this:
>
> <code_set_list 1:2>


Try (\w+) instead of (\S+)
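Here is a quick check of this suggestion, using the pattern from the original post with \w+ swapped in for \S+. In Ruby, \w matches only [a-zA-Z0-9_], so the name capture cannot swallow '<' or '>'. One caveat: XML names may also contain '-' and '.', which \w does not match. A tag such as the hypothetical <code-set 1:2> below is therefore skipped entirely.

```ruby
xml_fix = /<(\w+)\s+(\d+):(\d+)>/

m = "<Request><code_set_list 1:2>".match(xml_fix)
puts m[1]  # => code_set_list

# Caveat: '-' is legal in XML names but is not a \w character,
# so this tag is not matched at all.
p "<code-set 1:2>".match(xml_fix)  # => nil
```

If such names occur in the data, a negated class like [^>\s]+ avoids the problem.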

 