iterating over sub-matches using std::tr1::regex?

 
 
DomoChan@gmail.com
      08-13-2008
Given a repeatable group expression

([abc])+

and given the input

cab

the match contains nested sub-groups, which in 'rad software regular
expression tester' look like this:

Match 'cab'
  - Group 1
    - c at pos 0 length 1
    - a at pos 1 length 1
    - b at pos 2 length 1

I'd like to use the regex classes found in std::tr1 to iterate over all
the matches in Group 1.

I'm using regex_search to fill an smatch object. I need to go one more
step to iterate over the matches found in Group 1. Can anyone tell me
what I need to do to iterate over the sub-matches?

I've tried the following, but it doesn't seem to work:

// note: initialResults is successfully filled with a single group match
regex_search( "cab", initialResults, "([abc])+" );

for ( size_t ii = 1; ii < initialResults.size(); ++ii )
{
    ssub_match groupResults;
    // note: groupResults.matched is false. groupResults.first is NULL,
    // as is groupResults.second
    groupResults.compare( initialResults[ ii ] );
}

Thanks for any assistance!
-Velik

 
DomoChan@gmail.com
      08-13-2008
On Aug 12, 11:13 pm, "Alf P. Steinbach" <(E-Mail Removed)> wrote:
> Not sure exactly what you're talking about, but if I understand it correctly you
> want all possible matches of a single character from a specific set of chars.
>
> Then why not use ([abc])?
>
> All possible matches of ([abc])+, if I read it correctly as 1 or more successive
> characters drawn from the set {a, b, c}, for a string of length N consisting
> of those characters only, well that's N + (N-1) + ... + 1 = (N^2 + N)/2
> matches, and surely you don't want that, or do you?
>
> Cheers, & hth.,
>
> - Alf
>
> --
> A: Because it messes up the order in which people normally read text.
> Q: Why is it such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing on usenet and in e-mail?


You seem to understand the expression clearly, but perhaps I didn't use
an accurate example to explain my situation.

If I changed my input string to "cab bat mac", the results would then
contain:

Match 'cab'
  - Group 1
    - c at pos 0 length 1
    - a at pos 1 length 1
    - b at pos 2 length 1
Match 'ba'
  - Group 1
    - b at pos 4 length 1
    - a at pos 5 length 1
Match 'ac'
  - Group 1
    - a at pos 9 length 1
    - c at pos 10 length 1

so 'cab', 'ba', and 'ac' are stored in initialResults, and I can
iterate over those easily using a for loop and the 'smatch'
indexer. However, I'm interested in the individual results within each
group, so from the first match 'cab' I want to be able to iterate over
that group and read [0] = 'c', [1] = 'a', [2] = 'b'. That's what
I'm trying to use 'ssub_match' for, but I'm sure I'm not using it
correctly.
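
For context, here is a minimal sketch of what the 'smatch' indexer returns
for this pattern. It assumes std::regex from <regex> (default ECMAScript
grammar) behaves like std::tr1::regex here. The helper name sub_expressions
is made up for illustration. Index 0 is the whole first match and index 1 is
Group 1, which for a repeated group holds only the last repetition rather
than a list of them.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every sub-expression of the FIRST match of
// `pattern` in `text`. Index 0 is the whole match, index 1 the first
// parenthesized group, and so on.
std::vector<std::string> sub_expressions(const std::string& text,
                                         const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch results;
    std::vector<std::string> out;
    if (std::regex_search(text, results, re)) {
        // results.size() is 1 + the number of groups in the pattern,
        // not the number of times a group repeated.
        for (std::size_t i = 0; i < results.size(); ++i)
            out.push_back(results[i].str());
    }
    return out;
}
```

Calling sub_expressions("cab bat mac", "([abc])+") should give two entries,
"cab" and "b". The 'c' and 'a' repetitions are not kept, and 'ba' and 'ac'
are separate matches that need another search or an iterator.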

Let me know if I'm still vague.

Thanks again!
 
DomoChan@gmail.com
      08-13-2008
On Aug 13, 2:46 am, "Alf P. Steinbach" <(E-Mail Removed)> wrote:
>
> Please don't quote signatures.
>
> > so 'cab', 'ba', and 'ac' are stored in initialResults, and I can
> > iterate over those easily using a for loop and the 'smatch'
> > indexer.

>
> Can you? I don't see how, if you're using the code shown earlier. Didn't work
> for me.
>
> > However, I'm interested in the individual results within each
> > group, so from the first match 'cab' I want to be able to iterate over
> > that group and read [0] = 'c', [1] = 'a', [2] = 'b'. That's what
> > I'm trying to use 'ssub_match' for, but I'm sure I'm not using it
> > correctly.

>
> > Let me know if I'm still vague.

>
> No, it seems pretty clear.
>
> I reproduced the output shown above by using a sregex_iterator to iterate over
> the matches for "([abc])+", and an inner loop with sregex_iterator to iterate
> over the "([abc])" matches in each match (as suggested in my previous reply). It
> seems there is also capture functionality that can do this more directly, but
> requires recompilation of the regex library with certain switches, and affects
> efficiency in general, i.e. not just when it's used. I didn't try that.
>
> Since this might be a school homework assignment, or an exercise you're doing in
> order to learn from the experience of doing it, I'm not enclosing the code, but
> yes, with this simple expression it's not only possible but simple, as
> described, and I'm too lazy to think about whether a more complex expression
> might present problems. I did use some time on it though: building the regex
> library (never used) and checking the docs. But well used time, learned some!
>
> Cheers, & hth.,
>
> - Alf
>


> Can you? I don't see how, if you're using the code shown earlier. Didn't work
> for me.


yes... you can. See "http://en.wikipedia.org/wiki/C%2B%2B0x#Regular_expressions"

> (as suggested in my previous reply)


which reply was that?

> I reproduced the output ... using sregex_iterator


This is not an assignment, unless you consider it an assignment to
myself, in which case I hold no rules against cheating : ) Kidding
aside, this is just syntax, not really a logic issue, and I'm way past
getting any personal gratification from the experience, given the
amount of hair I've lost over this issue. At any rate, I'm writing a
simple XML parser. See:

Cmn_XmlReader::Cmn_XmlReader( string xml )
{
    Cmn_String::StringToList( xml, m_original, "\r\n", true );
    Cmn_String::StringToList( xml, m_workingCopy, "\r\n", true );

    m_desc[Header] = "Header";
    m_desc[SplitTag] = "SplitTag";
    m_desc[CombinedTag] = "CombinedTag";
    m_desc[CloseTag] = "CloseTag";
    m_desc[OpenTag] = "OpenTag";

    m_regexDefs[Header] = "(<[\\?].+[\\?]>){1}";
    m_regexDefs[SplitTag] = "<(\\w+)\\s*(\\w+=['\"].+?['\"]\\s*)*\\s*>(.+?)</\\1>";
    m_regexDefs[CombinedTag] = "<(\\w+)\\s*(\\w+=['\"].+?['\"]\\s*)*\\s*/>";

    m_regexDefs[CloseTag] = "</(\\w+)>";
    m_regexDefs[OpenTag] = "<(\\w+)\\s*(\\w+=['\"].+?['\"]\\s*)*\\s*>";

    m_patternDefs[Header] = new regex( m_regexDefs[Header] );
    m_patternDefs[SplitTag] = new regex( m_regexDefs[SplitTag] );
    m_patternDefs[CombinedTag] = new regex( m_regexDefs[CombinedTag] );
    m_patternDefs[CloseTag] = new regex( m_regexDefs[CloseTag] );
    m_patternDefs[OpenTag] = new regex( m_regexDefs[OpenTag] );

    ValidateHeader();
}
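
The attribute part of the tag patterns above is itself a repeated group, so
it runs into the same problem: group 2 would hold only the last attribute.
One possible workaround, sketched here with std::regex from <regex> standing
in for std::tr1::regex (an assumption), is a second pass over the matched
tag with a one-attribute pattern. The helper tag_attributes and the
attribute pattern are hypothetical and not part of the parser shown above.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (name, value) pairs for every attribute of
// the first opening tag found in `xml`.
std::vector<std::pair<std::string, std::string>>
tag_attributes(const std::string& xml)
{
    // Same pattern string as the OpenTag definition.
    static const std::regex open_tag(
        "<(\\w+)\\s*(\\w+=['\"].+?['\"]\\s*)*\\s*>");
    // Assumed one-attribute pattern: group 1 is the name, group 2 the value.
    static const std::regex attribute("(\\w+)=['\"](.+?)['\"]");

    std::vector<std::pair<std::string, std::string>> out;
    std::smatch tag;
    if (!std::regex_search(xml, tag, open_tag))
        return out;

    // Scan only the text between the tag name and the end of the tag.
    // tag[2] on its own would give just the last attribute.
    const std::sregex_iterator end;
    for (std::sregex_iterator a(tag[1].second, tag[0].second, attribute);
         a != end; ++a)
        out.emplace_back((*a)[1].str(), (*a)[2].str());
    return out;
}
```

For an input such as <item id="1" name='x'> this should return the pairs
(id, 1) and (name, x), in that order.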

Boost seems to make it more obvious how to access repeated captures,
via smatch.captures(), which doesn't exist in tr1:

#include <iostream>
#include <string>
#include <boost/regex.hpp>

// Note: captures() and boost::match_extra are only available when
// Boost.Regex is built with BOOST_REGEX_MATCH_EXTRA defined.
void print_captures(const std::string& regx, const std::string& text)
{
    boost::regex e(regx);
    boost::smatch what;
    std::cout << "Expression: \"" << regx << "\"\n";
    std::cout << "Text: \"" << text << "\"\n";
    if(boost::regex_match(text, what, e, boost::match_extra))
    {
        unsigned i, j;
        std::cout << "** Match found **\n Sub-Expressions:\n";
        for(i = 0; i < what.size(); ++i)
            std::cout << " $" << i << " = \"" << what[i] << "\"\n";
        std::cout << " Captures:\n";
        for(i = 0; i < what.size(); ++i)
        {
            std::cout << " $" << i << " = {";
            // every repetition recorded for sub-expression i
            for(j = 0; j < what.captures(i).size(); ++j)
            {
                if(j)
                    std::cout << ", ";
                else
                    std::cout << " ";
                std::cout << "\"" << what.captures(i)[j] << "\"";
            }
            std::cout << " }\n";
        }
    }
    else
    {
        std::cout << "** No Match found **\n";
    }
}

To make matters more difficult, IntelliSense has not worked for any
tr1 objects, so viewing methods and properties involves browsing the
lengthy and cluttered regex header file or waiting until I start the
debugger to see what's what.

So, if anyone knows how to access repeated subgroups, please divulge
your knowledge and make the forums a better place ^_^
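
For reference, the nested sregex_iterator approach from the quoted reply
could look roughly like the sketch below. It is not the code that reply
refers to (which was never posted). It uses std::regex from <regex> on the
assumption that std::tr1::regex exposes the same iterators, and the names
Capture and collect_captures are invented for illustration. The outer
iterator walks the matches of "([abc])+", and the inner iterator walks the
single-character matches of "[abc]" inside each of them.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// One repetition of the group: its text and its offset in the input.
struct Capture {
    std::string text;
    std::size_t pos;
};

// For each match of `outer` in `text`, collect every match of `inner`
// that lies inside that match.
std::vector<std::vector<Capture>> collect_captures(const std::string& text,
                                                   const std::regex& outer,
                                                   const std::regex& inner)
{
    std::vector<std::vector<Capture>> result;
    const std::sregex_iterator end;
    for (std::sregex_iterator m(text.begin(), text.end(), outer);
         m != end; ++m) {
        const std::ssub_match whole = (*m)[0];  // e.g. 'cab'
        std::vector<Capture> caps;
        // Restrict the inner search to the range of the outer match.
        for (std::sregex_iterator c(whole.first, whole.second, inner);
             c != end; ++c) {
            const std::ssub_match piece = (*c)[0];
            caps.push_back({piece.str(),
                            static_cast<std::size_t>(
                                std::distance(text.begin(), piece.first))});
        }
        result.push_back(caps);
    }
    return result;
}
```

With the text "cab bat mac", std::regex("([abc])+") as the outer pattern and
std::regex("[abc])") replaced by std::regex("[abc]") as the inner one, this
should give three matches, the first holding c, a, b at positions 0, 1, 2.
This only works when each repetition can be described by its own pattern,
which is the case for the simple expression here.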
 
DomoChan@gmail.com
      08-13-2008
OK, this thread has been kicked dead. Please don't reply to any more
of my threads.

Thanks
 