Velocity Reviews - Computer Hardware Reviews


Manipulation of strings: upper/lower case

 
 
Joe Wright
 
      01-16-2005
Eric Sosman wrote:
> Joe Wright wrote:
>
>> [...]
>>
>>> Lew Pitcher wrote:
>>>
>>> Caution is necessary here. The behaviours of islower and toupper
>>> are undefined if they are passed a value that is neither EOF nor
>>> representable as an unsigned char. It is good practice, therefore,
>>> to cast *string to unsigned char. (No need to cast it back to
>>> int afterwards, since the normal promotion rules handle that.)
>>> [...]

>>
>>
>> There is no need to cast the argument to toupper() to unsigned char.

>
>
> Didn't we just do this a week or so ago? Perhaps it's
> a candidate for the FAQ; it seems at any rate to be FA.
>

Yes we did. It remains to be seen whether I can learn enough from
one beating to avoid the next one.

>> We assume that st points to a valid string. All characters of such a
>> string are within the range 0..CHAR_MAX by definition.

>
>
> No, they are in the range CHAR_MIN through CHAR_MAX.
> Since `char' may be a signed type (it's the implementation's
> choice), CHAR_MIN can be negative. It's true that all the
> characters mandated by the Standard are required to be non-
> negative, but the Standard allows the implementation to define
> additional characters, too -- and some of these may have
> negative codes.
>

Yes, and I truly missed that until just now. Thank you.

>> CHAR_MAX is within UCHAR_MAX by definition.

>
>
> True, but CHAR_MIN can be negative, hence outside the
> range of `unsigned char'.
>

Yes, but I never mentioned CHAR_MIN.

>> If st points to something not a valid string, and toupper() is
>> presented with something out of range, (-20 for example) it may
>> SEGFAULT. And why not? It might tell you where your error is.

>
>
> Except that the "error" isn't the presence of a -20 in
> the string (in one widely-used scheme, -20 is "Latin small
> i with grave accent"). The real error is the failure to
> use the cast that Lew recommends.
>

It didn't occur to me that é (130, binary 10000010) is negative as a
signed char and becomes -126 when promoted to int.

I apologize to you and the group for my noise. I'll get it right
next time, I promise. :=)

--
Joe Wright (E-Mail Removed)
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
 
 
 
 
 
Mysidia
 
      01-16-2005
> char test [] = "Hello" "\xf0" "World";
>
> ...then your function causes undefined behavior on an implementation
> with CHAR_BIT 8 and signed char, because you will pass an invalid
> value to tolower() or toupper().



But checking islower() or isupper() does not protect against this,
because islower() and isupper() have the same fundamental requirement.

From ISO/IEC 9899:1999 (E):

"The header <ctype.h> declares several functions useful for classifying
and mapping characters.166) In all cases the argument is an int, the
value of which shall be representable as an unsigned char or shall
equal the value of the macro EOF. If the argument has any other value,
the behavior is undefined."
On such an implementation, isupper(test[5]) is just as undefined as
toupper(test[5]) is.

 
 
 
 
 
Joe Wright
 
      01-16-2005
Chris Torek wrote:
> In article <(E-Mail Removed)>
> Joe Wright <(E-Mail Removed)> wrote:
>
>>The islower() call is unnecessary.

>
>
> Indeed.
>
>
>>char *upper(char *st) {
>> char *s = st;
>> while ((*s = toupper(*s))) ++s;
>> return st;
>>}
>>
>>There is no need to cast the argument to toupper() to unsigned char.
>>We assume that st points to a valid string.

>
>
> And someone whose name is "Pól" has a name that is an "invalid
> string"?
>
>
>>All characters of such a string are within the range 0..CHAR_MAX
>>by definition. CHAR_MAX is within UCHAR_MAX by definition.

>
>
> If you use ISO-Latin-1, and have signed characters -- and both of
> these are quite commonly true today -- you *will* have characters
> whose value is outside the [0..CHAR_MAX] range. For instance, the
> o-with-accent-acute above is 0xf3 or -13.
>

It looks something like ó (162) at my house. As a signed char, 10100010
is -94, but your point is taken: I didn't consider negative char values
valid.
>
>>If st points to something not a valid string, and toupper() is
>>presented with something out of range, (-20 for example) it may
>>SEGFAULT. And why not? It might tell you where your error is.

>
>
> Or it may change the guy's name from Pól (the Celtic form of
> the name "Paul") to PzL, which might just annoy him. If he happens
> to have a large sword, this could be a bad strategy.


I'll try to stay away from that sword. I'm sorry to have muddied the
water. I'll get it wright next time, I promise.

--
Joe Wright (E-Mail Removed)
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
 
 
S.Tobias
 
      01-16-2005
Jack Klein <(E-Mail Removed)> wrote:
> On Sat, 15 Jan 2005 19:00:59 GMT, CBFalconer <(E-Mail Removed)>
> wrote in comp.lang.c:


> > #include <ctype.h>
> >
> > void flipcase(char *s)
> > {
> > unsigned char ch;
> >
> > if (s) /* assuming you want to protect against NULL */
> > while (ch = *s) {
> > if (isupper(ch) *s = tolower(ch);


> Completely unnecessary conditional test.


> > else if (islower(ch) *s = toupper(ch);


> Completely unnecessary conditional test.


Why completely unnecessary? This is a case-*toggling* function, so at
least one test must remain (note the "else").

> > s++;
> > }
> > } /* flipcase, untested */
> >


--
Stan Tobias
mailx `echo http://www.velocityreviews.com/forums/(E-Mail Removed)LID | sed s/[[:upper:]]//g`
 
 
infobahn
 
      01-16-2005
Eric Sosman wrote:
>
> Except that the "error" isn't the presence of a -20 in
> the string (in one widely-used scheme, -20 is "Latin small
> i with grave accent"). The real error is the failure to
> use the cast that Lew recommends.


Ahem. That /Lew/ recommends? Am I invisible all of a sudden?
 
 
CBFalconer
 
      01-16-2005
Jack Klein wrote:
> CBFalconer <(E-Mail Removed)>
>> Pierre wrote:
>>>
>>> I've been looking for a portable means of changing the case of a
>>> string but I've found nothing so far. Does it exist? I guess (and
>>> hope) it does.

>>
>> Unusual to want to simply change the case, but try something like:
>>
>> #include <ctype.h>
>>
>> void flipcase(char *s)
>> {
>> unsigned char ch;
>>
>> if (s) /* assuming you want to protect against NULL */
>> while (ch = *s) {
>> if (isupper(ch) *s = tolower(ch);

>
> Completely unnecessary conditional test.
>
>> else if (islower(ch) *s = toupper(ch);

>
> Completely unnecessary conditional test.
>
>> s++;
>> }
>> } /* flipcase, untested */
>>
>> which allows for the fact that some chars do not have an upper or
>> lower case to be flipped.

>

.... snip ...
>
> So the tests are totally unnecessary.
>
> But suppose:
>
> char test [] = "Hello" "\xf0" "World";
>
> ...then your function causes undefined behavior on an implementation
> with CHAR_BIT 8 and signed char, because you will pass an invalid
> value to tolower() or toupper().


If you examine my function you will find that isupper/lower and
toupper/lower are always operating on an unsigned char. The tests
are necessary, to decide whether to upshift or downshift, although
the second can probably be eliminated. However that would leave
the action somewhat unclear, as it is no longer obvious that some
characters are never transformed.

While busily charging off in all directions you failed to even read
the verbiage I attached, and missed the fact that the conditional
expressions lacked a closing parenthesis, and thus were syntax
errors.

The function will convert test[] to "hELLO" "\xf0" "wORLD".

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson

 
 
Eric Sosman
 
      01-16-2005
infobahn wrote:
> Eric Sosman wrote:
>
>> Except that the "error" isn't the presence of a -20 in
>>the string (in one widely-used scheme, -20 is "Latin small
>>i with grave accent"). The real error is the failure to
>>use the cast that Lew recommends.

>
>
> Ahem. That /Lew/ recommends? Am I invisible all of a sudden?


My apologies; I mistook >>> for >> (or maybe the
other way around) in the attrisnipbutions.

--
Eric Sosman
(E-Mail Removed)lid
 
 
Eric Sosman
 
      01-16-2005
Jack Klein wrote:

> On Sat, 15 Jan 2005 19:00:59 GMT, CBFalconer <(E-Mail Removed)>
> wrote in comp.lang.c:
>
>>Unusual to want to simply change the case, but try something like:
>>
>>#include <ctype.h>
>>
>>void flipcase(char *s)
>>{
>> unsigned char ch;
>>
>> if (s) /* assuming you want to protect against NULL */
>> while (ch = *s) {
>> if (isupper(ch) *s = tolower(ch);

> [...]
> But suppose:
>
> char test [] = "Hello" "\xf0" "World";
>
> ...then your function causes undefined behavior on an implementation
> with CHAR_BIT 8 and signed char, because you will pass an invalid
> value to tolower() or toupper().


No: The argument is always in the range of `unsigned char'
as required by the Standard. You'll see why this must be so
if you examine the type of the variable `ch' ...

--
Eric Sosman
(E-Mail Removed)lid

 
 
Giorgos Keramidas
 
      01-16-2005
On 2005-01-15 19:00, CBFalconer wrote:
> Pierre wrote:
>> I've been looking for a portable means of changing the case of a
>> string but I've found nothing so far. Does it exist? I guess (and
>> hope) it does.

>
> Unusual to want to simply change the case, but try something like:
>
> #include <ctype.h>
>
> void flipcase(char *s)
> {
> unsigned char ch;
>
> if (s) /* assuming you want to protect against NULL */
> while (ch = *s) {
> if (isupper(ch) *s = tolower(ch);
> else if (islower(ch) *s = toupper(ch);
> s++;
> }
> } /* flipcase, untested */


Missing closing parentheses in both conditionals.
 
 
Jack Klein
 
      01-16-2005
On Sun, 16 Jan 2005 05:16:37 GMT, CBFalconer <(E-Mail Removed)>
wrote in comp.lang.c:

> > CBFalconer <(E-Mail Removed)>

>
> If you examine my function you will find that isupper/lower and
> toupper/lower are always operating on an unsigned char. The tests
> are necessary, to decide whether to upshift or downshift, although
> the second can probably be eliminated. However that would leave
> the action somewhat unclear, as it is no longer obvious that some
> characters are never transformed.
>
> While busily charging off in all directions you failed to even read
> the verbiage I attached, and missed the fact that the conditional
> expressions lacked a closing parenthesis, and thus were syntax
> errors.
>
> The function will convert test[] to "hELLO" "\xf0" "wORLD".


Sorry, need to have my meds adjusted again, I guess. Please disregard
my previous post.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
 
 
 
 


Similar Threads
Thread Thread Starter Forum Replies Last Post
case insensitive find on case sensitive stl map benhoefer@gmail.com C++ 1 04-06-2007 08:42 PM
lower case to upper case Janice C Programming 17 12-14-2004 02:35 PM
how to case select with case-insensitive string ? Tee ASP .Net 3 06-23-2004 07:40 PM
Possible to turn on/off cookieless sessions dynamically on a case by case basis at run-time? Steve Franks ASP .Net 2 06-10-2004 02:04 PM
Scorsese Collection: Keep case vs Snap case Ray DVD Video 0 05-30-2004 04:04 AM


