Using std::lexicographical_compare with ignore case equality doesn't always work

 
 
Alex Buell
      12-28-2008
The short snippet below demonstrates the problem I'm having with
std::lexicographical_compare() in that it does not reliably work!

#include <iostream>
#include <vector>
#include <ctype.h>

bool compare_ignore_case_equals(char c1, char c2)
{
    return toupper(c1) == toupper(c2);
}

bool compare_ignore_case_less(char c1, char c2)
{
    return toupper(c1) < toupper(c2);
}

int main(int argc, char *argv[])
{
    std::vector<std::string> args(argv + 1, argv + argc);
    const char *words[] =
    {
        "add", "del", "new", "help"
    };

    std::vector<std::string> list(words, words + (sizeof words / sizeof words[0]));
    std::vector<std::string>::iterator word = list.begin();
    while (word != list.end())
    {
        std::cout << "Testing " << *word << " = " << args[0];
        if (std::lexicographical_compare(
                word->begin(), word->end(),
                args[0].begin(), args[0].end(),
                compare_ignore_case_equals))
        {
            std::cout << " found!\n";
            break;
        }

        std::cout << "\n";
        word++;
    }
}

Here's an example:

./quick new
Testing add = new
Testing del = new found!

That simply cannot be correct. What have I done wrong? Thanks
--
http://www.munted.org.uk

Fearsome grindings.

 
Alex Buell
      12-28-2008
On Sun, 28 Dec 2008 09:09:32 -0500, I waved a wand and this message
magically appears in front of Pete Becker:

> > if (std::lexicographical_compare(
> > word->begin(), word->end(),
> > args[0].begin(), args[0].end(),
> > compare_ignore_case_equals))

>
> First, remove compare_ignore_case_equals and try again. You'll get
> similar problems. Then read about lexicographical_compare and what
> its return value means.
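
For reference, a minimal sketch of what the return value means, assuming the usual element-by-element behaviour of the algorithm: it answers "does the first range sort before the second?", not "are the ranges equal?". The helper names (equals_ignore_case, sorts_before, misused_compare) are illustrative and do not appear in the thread.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Equality predicate in the spirit of the original post. The unsigned char
// cast is an addition to keep toupper's argument in its valid range.
bool equals_ignore_case(char c1, char c2)
{
    return std::toupper(static_cast<unsigned char>(c1)) ==
           std::toupper(static_cast<unsigned char>(c2));
}

// Default predicate (operator<): true only when a sorts strictly before b.
bool sorts_before(const std::string& a, const std::string& b)
{
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end());
}

// What the original program effectively asks. With an equality predicate
// in the "less than" slot, the algorithm reports true at the first
// position where the two characters match.
bool misused_compare(const std::string& a, const std::string& b)
{
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end(),
                                        equals_ignore_case);
}
```

Under these assumptions sorts_before("del", "new") is true while sorts_before("new", "new") is false, and misused_compare("del", "new") is true because the 'e' in the second position matches, which is consistent with the "found!" output shown in the original post.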


I've now switched to using this:

#include <string.h>
#include <string>

inline int strcasecmp(const std::string& s1, const std::string& s2)
{
    return strcasecmp(s1.c_str(), s2.c_str());
}

This leverages C++'s ability to overload functions and works better.

stricmp() isn't standard whilst strcasecmp() is standard ANSI/ISO. Some
posters have mentioned using stricmp() instead of strcasecmp(), which
happens not to be the correct answer. Why?
--
http://www.munted.org.uk

Fearsome grindings.

 
Alex Buell
      12-28-2008
On Sun, 28 Dec 2008 12:51:36 -0500, I waved a wand and this message
magically appears in front of Pete Becker:

> > This leverages C++'s ability to overload functions and works better.
> >
> > stricmp() isn't standard whilst strcasecmp() is standard ANSI/ISO.

>
> No, it's not. It's Unix, if I remember correctly. But I think I didn't
> make my point clearly enough. The problem isn't fundamentally in the
> predicate. So drop the predicate and use the default predicate until
> you understand what lexicographical_compare does.


strcasecmp() is actually defined in the POSIX standards. But I will
look again at std::lexicographical_compare() when I get some time. The
program works well enough with strcasecmp().
--
http://www.munted.org.uk

Fearsome grindings.

 
James Kanze
      12-29-2008
On Dec 28, 4:12 pm, Alex Buell <(E-Mail Removed)> wrote:
> On Sun, 28 Dec 2008 09:09:32 -0500, I waved a wand and this message
> magically appears in front of Pete Becker:
>
> > > if (std::lexicographical_compare(
> > > word->begin(), word->end(),
> > > args[0].begin(), args[0].end(),
> > > compare_ignore_case_equals))


> > First, remove compare_ignore_case_equals and try again.
> > You'll get similar problems. Then read about
> > lexicographical_compare and what its return value means.


> I've now switched to using this:


> #include <string.h>
> #include <string>


> inline int strcasecmp(const std::string& s1, const std::string& s2)
> {
> return strcasecmp(s1.c_str(), s2.c_str());
> }


> This leverages C++'s ability to overload functions and works
> better.


> stricmp() isn't standard whilst strcasecmp() is standard
> ANSI/ISO.


It's not present in any version of the standard I have handy
(C++98, C99, and the latest C++ draft). The standard C++
functional object for comparing strings in a locale dependent
way is std::locale (which has an operator() which does exactly
what is needed for lexicographical_compare). And as any
comparisons involving case are locale sensitive, it's really what
you need, e.g.:

if ( std::lexicographical_compare(
         word->begin(), word->end(),
         args[ 0 ].begin(), args[ 0 ].end(),
         std::locale() ) ) {...}

(or std::locale( "xxx" ), with whatever locale you want).

> Some posters have mentioned using stricmp() instead of
> strcasecmp(), which happens not to be the correct answer.
> Why?


Neither is the correct answer, since neither is standard
C/C++. (strcasecmp is defined in Posix, but not very well: "In
the POSIX locale, [...]. The results are unspecified in other
locales." So unless you happen to live in POSIX, it's not very
useful.)

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 
James Kanze
      12-29-2008
On Dec 28, 2:44 pm, Alex Buell <(E-Mail Removed)> wrote:
> The short snippet below demonstrates the problem I'm having with
> std::lexicographical_compare() in that it does not reliably work!
>
> #include <iostream>
> #include <vector>
> #include <ctype.h>
>
> bool compare_ignore_case_equals(char c1, char c2)
> {
> return toupper(c1) == toupper(c2);


Just a reminder, but this is, of course, undefined behavior.

> }


> bool compare_ignore_case_less(char c1, char c2)
> {
> return toupper(c1) < toupper(c2);


As is this.
> }


(I've addressed the other issues in another posting.)
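
The undefined behavior comes from passing a plain char to toupper: the function takes an int whose value must be representable as unsigned char (or be EOF), and where char is signed, characters outside the basic set can be negative. A minimal sketch of the usual workaround, casting through unsigned char first; the function names are illustrative, not from the thread:

```cpp
#include <cctype>

// Convert via unsigned char so the value handed to toupper is never
// negative, then compare the resulting int codes.
bool equals_ignore_case(char c1, char c2)
{
    return std::toupper(static_cast<unsigned char>(c1)) ==
           std::toupper(static_cast<unsigned char>(c2));
}

bool less_ignore_case(char c1, char c2)
{
    return std::toupper(static_cast<unsigned char>(c1)) <
           std::toupper(static_cast<unsigned char>(c2));
}
```

These still depend on the current C locale, so they only address the undefined behavior, not the locale questions raised elsewhere in the thread.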

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 
Alex Buell
      12-29-2008
On Sun, 28 Dec 2008 18:51:23 -0600, I waved a wand and this message
magically appears in front of blargg:

> > This leverages C++'s ability to overload functions and works
> > better.

>
> Actually, it looks more like it leverages C++'s ability to cause a
> stack overflow due to infinite recursion. strcasecmp isn't part of
> ISO C++, so on plenty of compilers, this function will simply call
> itself.


As this snippet below shows, you're actually correct.

#include <iostream>
#include <string>

int hahaha(const std::string& s1, const std::string& s2)
{
    return hahaha(s1.c_str(), s2.c_str());
}

int main()
{
    std::string s1 = "hahaha";
    std::string s2 = "HAHAHA";

    if (hahaha(s1, s2) == 0)
        std::cout << "Equal!\n";

    return 0;
}
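
The recursion happens because a const char* converts implicitly to std::string, so when no const char* overload is visible the wrapper is the only candidate and calls itself. A small sketch of the overload resolution involved, using made-up function names (only_string, both) that return a label instead of recursing:

```cpp
#include <string>

// Only a std::string overload exists: a const char* argument is converted
// to a temporary std::string and this function is selected.
const char* only_string(const std::string&)
{
    return "std::string overload";
}

// Both overloads exist: the const char* version is an exact match and wins.
const char* both(const char*)
{
    return "const char* overload";
}

const char* both(const std::string&)
{
    return "std::string overload";
}
```

With std::string s, only_string(s.c_str()) picks the std::string overload, which is the situation in the snippet above, while both(s.c_str()) picks the const char* overload, which is what happens when a C function such as strcasecmp has actually been declared.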

> > stricmp() isn't standard whilst strcasecmp() is standard ANSI/ISO.
> > Some posters have mentioned using stricmp() instead of
> > strcasecmp(), which happens not to be the correct answer. Why?

>
> As far as I can tell, neither are part of standard C++.


Yes, at some point in time I'm going to have to change to
std::lexicographical_compare, or is there anything else I can try for
case insensitive compares on std::string objects?
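
One possibility, sketched under the assumption that single-byte characters and the current C locale are good enough: a hand-written three-way comparison with the same return convention as strcasecmp. The name compare_ignore_case is hypothetical, not a standard or POSIX function.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Returns a negative value, zero, or a positive value when s1 sorts
// before, equal to, or after s2, ignoring case.
int compare_ignore_case(const std::string& s1, const std::string& s2)
{
    const std::size_t n = std::min(s1.size(), s2.size());
    for (std::size_t i = 0; i < n; ++i) {
        // Cast through unsigned char so tolower never sees a negative value.
        const int c1 = std::tolower(static_cast<unsigned char>(s1[i]));
        const int c2 = std::tolower(static_cast<unsigned char>(s2[i]));
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
    }
    // Common prefix is equal: the shorter string sorts first.
    if (s1.size() == s2.size())
        return 0;
    return s1.size() < s2.size() ? -1 : 1;
}
```

Because it has its own name, it cannot end up calling itself the way the overloaded wrapper can.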
--
http://www.munted.org.uk

Fearsome grindings.

 
Thomas J. Gritzan
      12-29-2008
James Kanze wrote:
> On Dec 28, 4:12 pm, Alex Buell <(E-Mail Removed)> wrote:
>> stricmp() isn't standard whilst strcasecmp() is standard
>> ANSI/ISO.

>
> It's not present in any version of the standard I have handy
> (C++98, C99, and the latest C++ draft). The standard C++
> functional object for comparing strings in a locale dependent
> way is std::locale (which has an operator() which does exactly
> what is needed for lexicographical_compare). And as any
> comparisons involving case are locale sensitive, it's really what
> you need, e.g.:
>
> if ( std::lexicographical_compare(
> word->begin(), word->end(),
> args[ 0 ].begin(), args[ 0 ].end(),
> std::locale() ) ) {...}
>
> (or std::locale( "xxx" ), with whatever locale you want).


operator() of std::locale works on strings by itself. You could use
operator() directly:

/* true, if *word < args[0] */
if ( std::locale()(*word, args[0]) ) {...}

But does std::locale()() really compare case-insensitively?

--
Thomas
 
James Kanze
      12-29-2008
On Dec 29, 3:10 pm, "Thomas J. Gritzan" <(E-Mail Removed)>
wrote:
> James Kanze wrote:
> > On Dec 28, 4:12 pm, Alex Buell <(E-Mail Removed)> wrote:
> >> stricmp() isn't standard whilst strcasecmp() is standard
> >> ANSI/ISO.


> > It's not present in any version of the standard I have handy
> > (C++98, C99, and the latest C++ draft). The standard C++
> > functional object for comparing strings in a locale
> > dependent way is std::locale (which has an operator() which
> > does exactly what is needed for lexicographical_compare).
> > And as any comparisons involving case are locale sensitive,
> > it's really what you need, e.g.:


> > if ( std::lexicographical_compare(
> > word->begin(), word->end(),
> > args[ 0 ].begin(), args[ 0 ].end(),
> > std::locale() ) ) {...}


> > (or std::locale( "xxx" ), with whatever locale you want).


> operator() of std::locale works on strings by itself. You could use
> operator() directly:


> /* true, if *word < args[0] */
> if ( std::locale()(*word, args[0]) ) {...}


> But does std::locale()() really compare case insensitive?


The answer to that is a definite maybe. It does (or it should)
in locales where case insensitive comparison makes sense. And
it does so correctly, matching "Straße" and "STRASSE" (or
"ändern" and "Aendern", in Switzerland, but not in Germany).
And "I" and "i" won't compare equal in a Turkish locale. Since
the "C" locale is designed for parsing C code, and the POSIX
locale for working in a Posix environment (including the file
systems and filenames), the comparison in those locales will NOT
be case insensitive.
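
A small sketch of that last point, assuming the classic "C" locale, whose collation for char strings is, as far as I can tell, a plain character-by-character comparison. The helper collate_equivalent is a made-up name; it treats two strings as equivalent when neither orders before the other.

```cpp
#include <locale>
#include <string>

// Two strings are equivalent under loc when neither collates before the
// other. std::locale::operator() compares whole strings, not characters.
bool collate_equivalent(const std::string& a, const std::string& b,
                        const std::locale& loc = std::locale::classic())
{
    return !loc(a, b) && !loc(b, a);
}
```

In the classic locale collate_equivalent("new", "new") holds but collate_equivalent("new", "NEW") does not, since "NEW" simply sorts before "new" there.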

And of course, you can always define your own locale. (At
least, that's what it says. In practice, it takes a pretty high
level of C++ competence to do it reliably. More than I have, at
any rate.)

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 
Thomas J. Gritzan
      12-29-2008
James Kanze wrote:
> On Dec 28, 2:44 pm, Alex Buell <(E-Mail Removed)> wrote:
>> The short snippet below demonstrates the problem I'm having with
>> std::lexicographical_compare() in that it does not reliably work!
>>
>> #include <iostream>
>> #include <vector>
>> #include <ctype.h>
>>
>> bool compare_ignore_case_equals(char c1, char c2)
>> {
>> return toupper(c1) == toupper(c2);

>
> Just a reminder, but this is, of course, undefined behavior.


#include <locale>

struct compare_ignore_case_equals
{
    compare_ignore_case_equals(const std::locale& loc_ = std::locale())
        : loc(loc_) {}

    bool operator()(char c1, char c2) const
    {
        return std::tolower(c1, loc) == std::tolower(c2, loc);
    }

private:
    std::locale loc;
};

How about this? It doesn't depend on the user's locale, you can provide
your own locale, and it isn't UB.
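
A sketch of how a functor like this might be used for whole-string equality, pairing it with std::equal and a length check rather than with lexicographical_compare. The names char_equals_ignore_case and equals_ignore_case are illustrative, and defaulting to std::locale::classic() is my assumption, not something the post specifies.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Per-character predicate, same idea as the functor above.
struct char_equals_ignore_case
{
    explicit char_equals_ignore_case(
        const std::locale& loc_ = std::locale::classic())
        : loc(loc_) {}

    bool operator()(char c1, char c2) const
    {
        // The <locale> overload of tolower takes the char directly.
        return std::tolower(c1, loc) == std::tolower(c2, loc);
    }

private:
    std::locale loc;
};

// Equal length first, then pairwise comparison of every character.
bool equals_ignore_case(const std::string& a, const std::string& b,
                        const std::locale& loc = std::locale::classic())
{
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      char_equals_ignore_case(loc));
}
```

The length check matters because the three-iterator form of std::equal only walks the first range and would otherwise read past the end of a shorter second string.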

Why does ::toupper actually take an int?

--
Thomas
 
Thomas J. Gritzan
      12-29-2008
James Kanze wrote:
> On Dec 29, 3:10 pm, "Thomas J. Gritzan" <(E-Mail Removed)>

[...]
>> But does std::locale()() really compare case insensitive?

>
> The answer to that is a definite maybe. [...]


If you want to parse commands case insensitively, like in a shell, script
interpreter or text based protocol, a maybe isn't enough.

> And of course, you can always define your own locale. (At
> least, that's what it says. In practice, it takes a pretty high
> level of C++ competence to do it reliably. More than I have, at
> any rate.)


Then it would be easier to build a comparison predicate with
std::toupper/tolower as I showed elsewhere in the thread.

What do people do for multibyte encodings like UTF-8?

--
Thomas
 