regex_replace()

 
 
Friedel Jantzen
      05-12-2011
Am Wed, 11 May 2011 02:09:34 -0700 (PDT) schrieb Michael Doubez:
....
> You could try toupper/tolower with your locale and see if it works on the
> umlaut (and the eszett ß).


Thank you for this hint.

cout << "User locale: " << locale("").name() << endl;//German_Germany.1252
setlocale(LC_ALL, "");

toupper() result:
toupper('ö', locale("")) == 'Ö'
(but toupper('ö') != 'Ö')

Replacing:
regex::flag_type rxFlags = regex::icase | regex::ECMAScript;
string rxStr = "ö";
string replStr = "oe";
string input("Schönes Österreich");
regex rx;
rx.imbue(locale(""));
rx.assign(rxStr, rxFlags);
string output = regex_replace(input, rx, replStr);
// output == "Schoenes Österreich" --> capital Ö NOT replaced

I wonder if it works on e.g. a French system with sth. like é and É ?

Regards,
Friedel
 
Ralf Goertz
      05-12-2011
Friedel Jantzen wrote:

> Am Wed, 11 May 2011 02:09:34 -0700 (PDT) schrieb Michael Doubez:
> ...
>> You could try toupper/tolower with your locale and see if it works on the
>> umlaut (and the eszett ß).

>
> Thank you for this hint.
>
> cout << "User locale: " << locale("").name() << endl;//German_Germany.1252
> setlocale(LC_ALL, "");
>
> toupper() result:
> toupper('ö', locale("")) == 'Ö'
> (but toupper('ö') != 'Ö')
>
> Replacing:
> regex::flag_type rxFlags = regex::icase | regex::ECMAScript;
> string rxStr = "ö";
> string replStr = "oe";
> string input("Schönes Österreich");
> regex rx;
> rx.imbue(locale(""));
> rx.assign(rxStr, rxFlags);
> string output = regex_replace(input, rx, replStr);
> // output == "Schoenes Österreich" --> capital Ö NOT replaced
>
> I wonder if it works on e.g. a French system with sth. like é and É ?


If you use wstrings it should work (except for the toupper without
locale specification). Here I used Boost under Linux:


#include <clocale>
#include <iostream>
#include <locale>
#include <string>
#include <boost/regex.hpp>

using namespace std;
using namespace boost;

int main() {
    ios::sync_with_stdio(false);
    cout << "User locale: " << locale("").name() << endl;
    setlocale(LC_ALL, "");
    wcout.imbue(locale(""));

    // The unescaped "" ends one literal and starts the next,
    // so this label prints as locale() in the output.
    wcout << L"toupper('ö', locale("")) == 'Ö': " << boolalpha
          << (toupper(L'ö', locale("")) == L'Ö') << endl;
    wcout << L"toupper('ö')==Ö: " << boolalpha << (toupper(L'ö') == L'Ö') << endl;

    regex::flag_type rxFlags = regex::icase | regex::ECMAScript;
    wstring rxStr = L"ö";
    wstring replStr = L"oe";
    wstring input(L"Schönes Österreich");
    wregex rx;
    rx.imbue(locale(""));       // imbue before assign so the pattern is compiled with this locale
    rx.assign(rxStr, rxFlags);
    wstring output = regex_replace(input, rx, replStr);
    wcout << input << L" -> " << output << endl;
}

output:

User locale: de_DE.UTF-8
toupper('ö', locale()) == 'Ö': true
toupper('ö')==Ö: false
Schönes Österreich -> Schoenes oesterreich


 
Michael Doubez
      05-12-2011
On 12 mai, 07:39, Friedel Jantzen <(E-Mail Removed)> wrote:
> Am Wed, 11 May 2011 02:09:34 -0700 (PDT) schrieb Michael Doubez:
> ...
>
> > You could try toupper/tolower with your locale and see if it works on the
> > umlaut (and the eszett ß).
>
> Thank you for this hint.
>
> cout << "User locale: " << locale("").name() << endl;//German_Germany.1252
> setlocale(LC_ALL, "");
>
> toupper() result:
> toupper('ö', locale("")) == 'Ö'
> (but toupper('ö') != 'Ö')
>
> Replacing:
> regex::flag_type rxFlags = regex::icase | regex::ECMAScript;
> string rxStr = "ö";
> string replStr = "oe";
> string input("Schönes Österreich");
> regex rx;
> rx.imbue(locale(""));
> rx.assign(rxStr, rxFlags);
> string output = regex_replace(input, rx, replStr);
> // output == "Schoenes Österreich" --> capital Ö NOT replaced
>
> I wonder if it works on e.g. a French system with sth. like é and É ?


It works well enough on gcc version 4.3.3:

std::locale loc("");
std::cout<<"User locale: " << loc.name() << std::endl;
char const str[] = "";
std::cout<<str<<std::endl;
for( char const * it = str; *it ; ++it )
{
std::cout<<toupper(*it, loc);
}
std::cout<<std::endl;

Output:
User locale: fr_FR



Deutsch locale is not installed on my system and I couldn't try it.

--
Michael
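
A missing locale, as in the post above, shows up in code as an exception: the std::locale(const char*) constructor throws std::runtime_error when the name is not supported. Below is a minimal sketch of a fallback plus a locale-aware upper-casing helper. The names locale_or_classic and to_upper are made up for illustration, and what the facet does with non-ASCII characters still depends on the platform's locale data.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: return the named locale, or the classic "C" locale
// when the name is not installed/supported on this system.
std::locale locale_or_classic(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: upper-case a wide string with the ctype facet of the
// given locale (the same facet the two-argument std::toupper uses).
std::wstring to_upper(const std::wstring& s, const std::locale& loc) {
    const std::ctype<wchar_t>& facet = std::use_facet<std::ctype<wchar_t>>(loc);
    std::wstring out(s);
    for (wchar_t& ch : out) {
        ch = facet.toupper(ch);
    }
    return out;
}
```

For example, to_upper(text, locale_or_classic("de_DE.UTF-8")) uses the German locale where it exists and degrades to ASCII-only case mapping where it does not.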

 
Michael Doubez
      05-12-2011
On 12 mai, 11:53, Michael Doubez <(E-Mail Removed)> wrote:
> On 12 mai, 07:39, Friedel Jantzen <(E-Mail Removed)> wrote:
> > Am Wed, 11 May 2011 02:09:34 -0700 (PDT) schrieb Michael Doubez:
> > ...
> >
> > > You could try toupper/tolower with your locale and see if it works on the
> > > umlaut (and the eszett ß).
> >
> > Thank you for this hint.
> >
> > cout << "User locale: " << locale("").name() << endl;//German_Germany.1252
> > setlocale(LC_ALL, "");
> >
> > toupper() result:
> > toupper('ö', locale("")) == 'Ö'
> > (but toupper('ö') != 'Ö')
> >
> > Replacing:
> > regex::flag_type rxFlags = regex::icase | regex::ECMAScript;
> > string rxStr = "ö";
> > string replStr = "oe";
> > string input("Schönes Österreich");
> > regex rx;
> > rx.imbue(locale(""));
> > rx.assign(rxStr, rxFlags);
> > string output = regex_replace(input, rx, replStr);
> > // output == "Schoenes Österreich" --> capital Ö NOT replaced
> >
> > I wonder if it works on e.g. a French system with sth. like é and É ?
>
> It works well enough on gcc version 4.3.3:

[snip]

Oops, you were talking about regex. Well, I don't have a recent
compiler on this machine (and no admin rights), so I cannot test it
right now.

--
Michael
 
Friedel Jantzen
      05-13-2011
Thank you for testing.

Am Thu, 12 May 2011 10:12:20 +0200 schrieb Ralf Goertz:
> ...
> If you use wstrings it should work (except for the toupper without
> locale specification). Here I used boost under linux:
> ...
>
> output:
>
> User locale: de_DE.UTF-8
> toupper('ö', locale()) == 'Ö': true
> toupper('ö')==Ö: false
> Schönes Österreich -> Schoenes oesterreich


Compiled with MS VS2008, on Windows Vista, the output is:

User locale: German_Germany.1252
toupper('ö', locale("")) == 'Ö': true
toupper('ö')==Ö: true
Schönes Österreich -> Schoenes Österreich

It looks like icase does not work with wstring either in this regex
implementation (AFAIK MS licensed it from Dinkumware).
Interestingly, toupper('ö')==Ö is true here.

Regards,
Friedel
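
Since the icase result differs between the two implementations tested above, one way to sidestep it is to list both code points in a character class and not use icase at all. This is only a sketch under that assumption: it uses std::wregex from C++11 rather than the Boost or VS2008 libraries from the thread, and replace_o_umlaut is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace both U+00F6 (ö) and U+00D6 (Ö) with "oe".
// The character class names both code points explicitly, so the result does
// not depend on regex::icase or on the locale imbued into the regex.
std::wstring replace_o_umlaut(const std::wstring& input) {
    static const std::wregex rx(L"[\u00F6\u00D6]");
    return std::regex_replace(input, rx, std::wstring(L"oe"));
}
```

Wide strings are used so that each umlaut is a single character; with narrow UTF-8 strings ö occupies two bytes and a bracket expression would match the bytes individually.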
 
Jorgen Grahn
      05-13-2011
On Tue, 2011-05-10, Michael Doubez wrote:
> On 10 mai, 09:23, Friedel Jantzen <(E-Mail Removed)> wrote:

....
>> If a regex_error is thrown, how can I get the position of the error in the
>> regular expression string?

>
> AFAIS you cannot; and POSIX regcomp doesn't give more information
> either.
> You will need a regex format validator.


POSIX gives you *something* using regerror(3); I assume it's more than
"your regexp is broken" but less than "the problem is the backslash in
position 42".

Regexps are best used hard-coded anyway, rather than generated on the
fly or (worse) generated from user input. So this is usually not a
major problem.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
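
On the C++ side, std::regex_error exposes an error category through code() and a message through what(), but no position in the pattern string. A minimal sketch of validating a pattern up front follows; RegexCheck and check_regex are hypothetical names, and the text returned by what() is implementation-defined.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for a pattern check.
struct RegexCheck {
    bool ok;                                  // true if the pattern compiled
    std::regex_constants::error_type code;    // meaningful only when !ok
    std::string message;                      // implementation-defined text
};

// Try to compile the pattern (ECMAScript grammar by default) and report the
// error category on failure. No error position is available from the
// standard interface.
RegexCheck check_regex(const std::string& pattern) {
    try {
        std::regex rx(pattern);
        (void)rx;
        return {true, std::regex_constants::error_type{}, std::string()};
    } catch (const std::regex_error& e) {
        return {false, e.code(), e.what()};
    }
}
```

A caller can compare the returned code against constants such as std::regex_constants::error_paren or error_brack to give the user a rough hint about what is wrong.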
 
James Kanze
      05-15-2011
On May 12, 6:39 am, Friedel Jantzen <(E-Mail Removed)> wrote:
> Am Wed, 11 May 2011 02:09:34 -0700 (PDT) schrieb Michael Doubez:
> ...
>
> > You could try toupper/tolower with your locale and see if it works on the
> > umlaut (and the eszett ß).

> Thank you for this hint.

> cout << "User locale: " << locale("").name() << endl;//German_Germany.1252
> setlocale(LC_ALL, "");

> toupper() result:
> toupper('ö', locale("")) == 'Ö'
> (but toupper('ö') != 'Ö')

> Replacing:
> regex::flag_type rxFlags = regex::icase | regex::ECMAScript;
> string rxStr = "ö";
> string replStr = "oe";
> string input("Schönes Österreich");
> regex rx;
> rx.imbue(locale(""));
> rx.assign(rxStr, rxFlags);
> string output = regex_replace(input, rx, replStr);
> // output == "Schoenes Österreich" --> capital Ö NOT replaced


This one's tricky. It's why Unicode introduced title case: if
you ever really wanted to do this, what you'd want to get would
be "Schoenes Oesterreich". I'm not sure what that might mean in
the context of regular expressions, however; you'd probably want
a flag stating whether substitution should use a) the case of
the original, b) title case if the original was upper case, or
c) context-sensitive title case.
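
Option (b) can be approximated by hand, since regex_replace takes a format string and has no per-match case flag. The sketch below assumes C++11 std::wregex and a hypothetical helper name, transliterate_o_umlaut. It walks the matches and picks "Oe" or "oe" depending on the matched character.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace ö with "oe" and Ö with "Oe" (title case),
// choosing the replacement per match.
std::wstring transliterate_o_umlaut(const std::wstring& input) {
    static const std::wregex rx(L"[\u00F6\u00D6]");
    std::wstring out;
    std::wstring::const_iterator last = input.cbegin();
    const std::wsregex_iterator end;
    for (std::wsregex_iterator it(input.cbegin(), input.cend(), rx); it != end; ++it) {
        const std::wsmatch& m = *it;
        out.append(last, m[0].first);                        // text before the match
        out += (*m[0].first == L'\u00D6') ? L"Oe" : L"oe";   // case-aware replacement
        last = m[0].second;
    }
    out.append(last, input.cend());                          // text after the last match
    return out;
}
```

This does not cover option (c): an all-caps word such as "SCHÖN" would come out as "SCHOeN", so a context-sensitive variant would also have to look at the neighbouring characters.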

> I wonder if it works on e.g. a French system with sth. like é and É ?

 