Velocity Reviews - Computer Hardware Reviews


Matching chars in a std::string

 
 
tech
 
      06-23-2008
Hi, I need a function that lets me specify a match pattern containing
wildcard characters (as described below) to find chars in a std::string.

The match pattern can contain the wildcard characters "*" and "?",
where "*" matches zero or more consecutive occurrences of any
character and "?" matches a single occurrence of any character.
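
For reference, the two wildcards as defined above can also be matched without any library, which sidesteps the code-size question entirely. This is a minimal sketch, not taken from Boost or any other library: the name wildcard_match is hypothetical, it matches the whole string (not a substring), and it assumes there is no escape character for a literal '*' or '?'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from any library): returns true if the WHOLE of
// `text` matches `pattern`, where '*' matches zero or more characters and
// '?' matches exactly one. No escape character is supported.
bool wildcard_match(const std::string& pattern, const std::string& text)
{
    std::size_t p = 0, t = 0;
    std::size_t star = std::string::npos; // index of the last '*' seen in pattern
    std::size_t mark = 0;                 // text index where that '*' started

    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;                   // first let '*' match nothing
            mark = t;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || pattern[p] == text[t])) {
            ++p;                          // '?' or a literal consumes one char
            ++t;
        } else if (star != std::string::npos) {
            p = star + 1;                 // backtrack: the last '*' absorbs
            t = ++mark;                   // one more character of text
        } else {
            return false;                 // mismatch and no '*' to fall back on
        }
    }
    while (p < pattern.size() && pattern[p] == '*')
        ++p;                              // trailing '*'s can match the empty rest
    return p == pattern.size();
}
```

Remembering only the most recent '*' is enough for these two wildcards. For example, wildcard_match("*.txt", "notes.txt") is true, while wildcard_match("file?.c", "file10.c") is false.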

Does boost or some other library have this capability? If boost does
have this, do I need to include an entire Boost library or just the
bit I want? How much extra code size would result from using just a
single utility function from the library?

Thanks
 
 
 
 
 
Mirco Wahab
 
      06-23-2008
tech wrote:
> Hi, I need a function to specify a match pattern including using
> wildcard characters as below
> to find chars in a std::string.


Use a regular expression library.

> The match pattern can contain the wildcard characters "*" and "?",
> where "*" matches zero or more consecutive occurrences of any
> character and "?" matches a single occurrence of any character.


Example:
using namespace boost;
...
regex reg("^ \\s* .*? (\\d+) [^\\n\\r]* \\d? [\\n\\r]+", regex::mod_x);

> Does boost or some other library have this capability?


Yes, it's called boost_regex
http://www.boost.org/doc/libs/1_35_0...tml/index.html

> If boost does have this, do i need to include an entire
> boost library or just the bit i want. How much extra code size would
> result from just using a single
> utility function from the library?


On my Linux system, the size of the shared library

/usr/lib/libboost_regex.so.1.34.1

is 768320 bytes.

Regards

M.


 
 
 
 
 
Mirco Wahab
 
      06-23-2008
Mirco Wahab wrote:
> tech wrote:
>> If boost does have this, do i need to include an entire
>> boost library or just the bit i want. How much extra code size would
>> result from just using a single
>> utility function from the library?

>
> On my (Linux-)System, the size of the shared library
>
> /usr/lib/libboost_regex.so.1.34.1
>
> is 768320 bytes.


I verified (sort of) my claim with a Boost 1.34.1
installation on SuSE Linux.

The application needed libboost_regex.so,
which in 1.34.1 is 768320 bytes, but the
Boost library itself links against the Unicode
library (libicu*), which here includes at least:

libicudata.so - 11363116 bytes
libicui18n.so - 1412764 bytes
libicuuc.so - 1215688 bytes

So, to get a program that uses only boost_regex
to run, I had to copy all of the above files
to a "clean" location.

By "clean" I mean: depending on the actual
configuration of the target machine, other
libraries may need to be installed as well.
Running ldd on libboost_regex.so shows:
linux-gate.so.1 => (0xffffe000)
libicui18n.so.38 => /usr/lib/libicui18n.so.38 (0xb7d12000)
libicuuc.so.38 => /usr/lib/libicuuc.so.38 (0xb7bea000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7afa000)
libm.so.6 => /lib/libm.so.6 (0xb7ac6000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7ab7000)
libc.so.6 => /lib/libc.so.6 (0xb7974000)
libicudata.so.38 => /usr/lib/libicudata.so.38 (0xb6e9d000)
libpthread.so.0 => /lib/libpthread.so.0 (0xb6e85000)
/lib/ld-linux.so.2 (0xb7f27000)

So it might be better to install static Boost libraries
on the developer machine and hand out statically linked
"big-block" executables with all bolts contained. I can't
test this here because my distribution doesn't ship static
Boost libraries, and I'm too lazy to bother with that.

Regards

Mirco
 
 
James Kanze
 
      06-23-2008
On Jun 23, 12:41 pm, Mirco Wahab wrote:
> tech wrote:
> > Hi, I need a function to specify a match pattern including
> > using wildcard characters as below to find chars in a
> > std::string.


> Use a Regular expression library.


Yes, but...

> > The match pattern can contain the wildcard characters "*" and "?",
> > where "*" matches zero or more consecutive occurrences of any
> > character and "?" matches a single occurrence of any character.


> Example:
> using namespace boost;
> ...
> regex reg("^ \\s* .*? (\\d+) [^\\n\\r]* \\d? [\\n\\r]+", regex::mod_x);


This is a joke, right? You need code to convert a match pattern
to a regular expression; you have to convert "*" to something like
"[^/]*", for example (under Unix; under Windows, the equivalent
mapping would be "[^/\\]*"; and under Unix, at least, if it is
the first thing in a filename, you also have to exclude a leading
"."). And you have to escape the regular expression meta-characters
as well.

It's still easier to use a regular expression class than to do
it all by hand, but you do need some extra code to generate
the regular expression from the initial pattern.
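
To illustrate the kind of "extra code" meant here, a minimal sketch of the translation step. It assumes C++11's std::regex, which did not exist when this thread was written (boost::regex offers regex_match and regex_search in the same shape), and the helper names glob_to_regex, glob_match and glob_search are hypothetical. It implements the plain-text reading of the original question, where '*' and '?' match any character; since '.' in the ECMAScript grammar does not match line terminators, it assumes single-line input.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: translate a wildcard pattern ('*' = any run of
// characters, '?' = any one character) into an ECMAScript regex string.
std::string glob_to_regex(const std::string& glob)
{
    static const std::string meta = "\\^$.|?*+()[]{}"; // regex metacharacters
    std::string re;
    for (char c : glob) {
        if (c == '*') {
            re += ".*";            // '.' does not match line terminators here
        } else if (c == '?') {
            re += ".";
        } else {
            if (meta.find(c) != std::string::npos)
                re += '\\';        // escape so the character matches literally
            re += c;
        }
    }
    return re;
}

// The whole string must match, as with a shell wildcard.
bool glob_match(const std::string& glob, const std::string& text)
{
    return std::regex_match(text, std::regex(glob_to_regex(glob)));
}

// The pattern may match anywhere inside the string.
bool glob_search(const std::string& glob, const std::string& text)
{
    return std::regex_search(text, std::regex(glob_to_regex(glob)));
}
```

glob_match requires the whole string to match, while glob_search finds the pattern anywhere inside it, which may be closer to "find chars in a std::string". Building a std::regex is relatively expensive, so real code would construct it once and reuse it.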

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 
 
Mirco Wahab
 
      06-23-2008
James Kanze wrote:
> On Jun 23, 12:41 pm, Mirco Wahab wrote:
>> Use a Regular expression library.

>
> Yes, but...
>
>> Example:
>> using namespace boost;
>> ...
>> regex reg("^ \\s* .*? (\\d+) [^\\n\\r]* \\d? [\\n\\r]+", regex::mod_x);

>
> This is a joke, right. You need code to convert a match pattern
> to a regular expression; you have to convert "*" to something like
> "[^/]*", for example (under Unix---under Windows, the equivalent
> mapping would be "[^/\\]*"---and under Unix, at least, if it is
> the first thing in a filename, you also have to exclude .). And
> you have to escape the regular expression meta-characters as
> well.


What are you talking about? There's no 'filename' mentioned
anywhere. It's plain text processing with regular expressions
(if I'm not completely off track).

> It's still easier to use a regular expression class than to do
> it all by hand, but you do need some extra code to generate
> the regular expression from the initial pattern.


Not at all. The above would be a valid regular expression (OK, I
made this up; it's a pseudo-expression). Another (maybe related)
example: find all links in a web page:

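#include <boost/regex.hpp>
#include <fstream>
#include <iostream>
#include <string>

int linkparser(const char* htmlname)
{
    // (?isx-m: ...) enables case-insensitive, extended (whitespace-ignoring)
    // syntax; group 1 captures what follows "scheme://" in an <a href=...>
    boost::regex reg(
        "(?isx-m: \
           < \\s* A [^>]* href \\s* = \
           [\"\\s]* \
           \\w+:// ([^\"\\s]*) \
        )"
    );

    std::string line;            // read lines and perform one match/search per line
    int linecount = 0;           // count lines (nice)
    std::ifstream fin(htmlname); // open saved .html file
    if (!fin) {
        std::cerr << "cannot open " << htmlname << std::endl;
        return -1;
    }

    std::cout << "trying to find links in " << htmlname << std::endl;
    while (std::getline(fin, line)) {
        ++linecount;
        boost::smatch match;     // instantiate match variable
        if (boost::regex_search(line, match, reg))
            std::cout << linecount << "\t" << match[1] << std::endl;
    }
    // ... (rest of the function not shown in the original post)
    return linecount;            // assumption: report the number of lines read
}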

What part of the above expression exactly would you consider
when saying:

> you do need some extra code to generate the regular expression


Maybe we speak of different things?

Regards

Mirco
 
 
James Kanze
 
      06-23-2008
On Jun 23, 7:21 pm, Mirco Wahab wrote:
> James Kanze wrote:
> > On Jun 23, 12:41 pm, Mirco Wahab wrote:
> >> Use a Regular expression library.


> > Yes, but...


> >> Example:
> >> using namespace boost;
> >> ...
> >> regex reg("^ \\s* .*? (\\d+) [^\\n\\r]* \\d? [\\n\\r]+", regex::mod_x);


> > This is a joke, right. You need code to convert a match pattern
> > to a regular expression; you have to convert "*" to something like
> > "[^/]*", for example (under Unix---under Windows, the equivalent
> > mapping would be "[^/\\]*"---and under Unix, at least, if it is
> > the first thing in a filename, you also have to exclude .). And
> > you have to escape the regular expression meta-characters as
> > well.


> What are you talking about? There's no 'filename' mentioned
> nowhere. It's plain text processing with regular expressions
> (if I'm not completely off the road).


The pattern matching he described was wildcard matching of
filenames, not regular expression evaluation. The conventions
are different (but it is possible to map the wildcard matching
to regular expressions, sort of).

> > It's still easier to use a regular expression class than to
> > do it all by hand, but you do need some extra code to
> > generate the regular expression from the initial pattern.


> Not at all. The above would be (OK I made this up, its
> a pseudo expression) a valid regular expression.


Yes, but it's not what he asked for. What he asked for was that
``"*" matches zero or more consecutive occurrences of any
character and "?" matches a single occurrence of any
character.'' A subset of the classical filename globbing
patterns.

[...]
> What part of the above expression exactly would you consider
> when saying:


> > you do need some extra code to generate the regular expression


> Maybe we speak of different things?


I was talking about what the original poster asked for. You can
do it with regular expressions (I have code which translates a
Unix globbing pattern into a regular expression), but it takes
some pre-processing.
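
For the filename-globbing reading, the translation differs from a plain-text one. Following the Unix mapping described earlier in the thread, '*' becomes "[^/]*", '?' becomes "[^/]", and a wildcard at the start of a path component must not match a leading '.'. This is not the translation code mentioned above, just one possible sketch; the names filename_glob_to_regex and filename_glob_match are hypothetical, std::regex (C++11) again stands in for Boost, and the leading-dot rule uses an ECMAScript negative lookahead. On Windows the character classes would also need to exclude the backslash.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: translate a Unix-style filename glob into an
// ECMAScript regex. '*' and '?' never match '/', and a wildcard at the
// start of a path component never matches a leading '.' (hidden files).
std::string filename_glob_to_regex(const std::string& glob)
{
    static const std::string meta = "\\^$.|?*+()[]{}"; // regex metacharacters
    std::string re;
    bool component_start = true;          // at pattern start or just after '/'
    for (char c : glob) {
        if ((c == '*' || c == '?') && component_start)
            re += "(?!\\.)";              // negative lookahead: no leading '.'
        if (c == '*') {
            re += "[^/]*";
        } else if (c == '?') {
            re += "[^/]";
        } else {
            if (meta.find(c) != std::string::npos)
                re += '\\';               // escape so it matches literally
            re += c;
        }
        component_start = (c == '/');
    }
    return re;
}

// The whole path must match the glob.
bool filename_glob_match(const std::string& glob, const std::string& path)
{
    return std::regex_match(path, std::regex(filename_glob_to_regex(glob)));
}
```

With this mapping "*.cpp" matches "main.cpp" but not "src/main.cpp", and "*" does not match ".profile" while ".*" does.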

 
 
Mirco Wahab
 
      06-23-2008
James Kanze wrote:
>> Maybe we speak of different things?

>
> I was talking about what the original poster asked for. You can
> do it with regular expressions (I have code which translates a
> Unix globbing pattern into a regular expression), but it takes
> some pre-processing.

[...]
> The pattern matching he described was wildcard matching of
> filenames, not regular expression evaluation. The conventions
> are different (but it is possible to map the wildcard matching
> to regular expressions, sort of).


This is the OP's question:
|[Subject: Matching chars in a std::string]
| Hi, I need a function to specify a match pattern including using
| wildcard characters as below to find chars in a std::string. The
| match pattern can contain the wildcard characters "*" and "?",
| where "*" matches zero or more consecutive occurrences of any
| character and "?" matches a single occurrence of any character.

I fail to see anything here that
you mentioned in your two
preceding posts.

Regards

Mirco
 
 
James Kanze
 
      06-24-2008
On Jun 23, 9:46 pm, Mirco Wahab wrote:
> James Kanze wrote:
> >> Maybe we speak of different things?


> > I was talking about what the original poster asked for. You can
> > do it with regular expressions (I have code which translates a
> > Unix globbing pattern into a regular expression), but it takes
> > some pre-processing.

> [...]
> > The pattern matching he described was wildcard matching of
> > filenames, not regular expression evaluation. The conventions
> > are different (but it is possible to map the wildcard matching
> > to regular expressions, sort of).


> This is the OP's question:
> |[Subject: Matching chars in a std::string]
> | Hi, I need a function to specify a match pattern including using
> | wildcard characters as below to find chars in a std::string. The
> | match pattern can contain the wildcard characters "*" and "?",
> | where "*" matches zero or more consecutive occurrences of any
> | character and "?" matches a single occurrence of any character.


> I fail to see anything here you mentioned in your two
> preceding posts.


Really? You don't see any mention of "wildcard"? You don't see
a definition of "*" which says it matches zero or more
consecutive occurrences of any character? You don't see a
definition of "?" which matches a single occurrence of any
character?

 
 
Mirco Wahab
 
      06-24-2008
James Kanze wrote:
> On Jun 23, 9:46 pm, Mirco Wahab wrote:
>> This is the OP's question:
>> |[Subject: Matching chars in a std::string]
>> | Hi, I need a function to specify a match pattern including using
>> | wildcard characters as below to find chars in a std::string. The
>> | match pattern can contain the wildcard characters "*" and "?",
>> | where "*" matches zero or more consecutive occurrences of any
>> | character and "?" matches a single occurrence of any character.


[...]

> Really? You don't see any mention of "wildcard"? You don't see
> a definition of "*" which says it matches zero or more
> consecutive occurrences of any character? You don't see a
> definition of "?" which matches a single occurrence of any
> character?


OK, I'm sorry, my mistake. When I read your post saying:

>>> The pattern matching he described was wildcard matching of
>>> filenames, not regular expression evaluation. The conventions
>>> are different (but it is possible to map the wildcard matching
>>> to regular expressions, sort of).


I understood it more like:

| The pattern matching he described was wildcard matching of
| filenames, not regular expression evaluation. The conventions
| are different (but it is possible to map the wildcard matching
| to regular expressions, sort of).

So you didn't really mean:
"/... matching of filenames, not regular expression evaluation .../"

but rather meant exactly what the OP wanted to know. Sorry
for not being able to deduce that from it (I'm new to c.l.c++).

Regards & Thanks for clearing this up

Mirco
 
 
Nick Keighley
 
      06-24-2008
On 24 Jun, 10:12, Mirco Wahab wrote:
> James Kanze wrote:
> > On Jun 23, 9:46 pm, Mirco Wahab wrote:



> >> This is the OP's question:
> >> |[Subject: Matching chars in a std::string]
> >> | Hi, I need a function to specify a match pattern including using
> >> | wildcard characters as below to find chars in a std::string. The
> >> | match pattern can contain the wildcard characters "*" and "?",
> >> | where "*" matches zero or more consecutive occurrences of any
> >> | character and "?" matches a single occurrence of any character.


I think you're both mind-reading. You're translating what the
user asked for into what you think he wants.

<snip>

> >>> The pattern matching he described was wildcard matching of
> >>> filenames, not regular expression evaluation.


No... I wonder if he wants general pattern matching and has only seen
file globbing. He may not *know* he wants reg-exprs. I think the
*, ? was possibly only an example.

> >>> The conventions
> >>> are different (but it is possible to map the wildcard matching
> >>> to regular expressions, sort of).

>
> I understood it more like:
>
> | The pattern matching he described was wildcard matching of
> | filenames, not regular expression evaluation. The conventions
> | are different (but it is possible to map the wildcard matching
> | to regular expressions, sort of).
>
> So you didn't really mean:
> "/... matching of filenames, not regular expression evaluation .../"
>
> but rather meant exactly what the OP wanted to know. Sorry
> for not being able to deduce that from it (I'm new to c.l.c++).


Well, it confused me too. I also thought James Kanze was insisting
that the OP was matching file names.

Perhaps the OP could give more info?


--
Nick Keighley





 
 
 
 

