Velocity Reviews - Computer Hardware Reviews

Problem with gcc

 
 
Eric Sosman
 
      11-14-2009
jacob navia wrote:
> Alan Curry a écrit :
>> In article <hdkpl5$8aq$(E-Mail Removed)>, jacob navia <(E-Mail Removed)> wrote:
>>> I do not want to have negative characters!

>>
>> Can you give an example of something that doesn't work with plain char
>> specifically because some characters are negative? I think that can only
>> happen if you are making bad assumptions.

>
> I assume characters are codes from one to 255. This is a bad assumption
> maybe, in some kind of weird logic when you assign a sign to a character
> code.


It's certainly a bad assumption on machines where `char'
runs from -128 to 127 ...
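Whether plain char runs from -128 to 127 or from 0 to 255 is up to the implementation, and <limits.h> reports which: CHAR_MIN is 0 exactly when char is unsigned. A minimal sketch, assuming a hosted C99 compiler; the function name plain_char_is_signed is made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char is a signed type on this implementation,
   0 if it is unsigned. */
static int
plain_char_is_signed (void)
{
    int is_signed = CHAR_MIN < 0;

    /* Cross-check: (char) -1 stays negative only if char is signed;
       otherwise it becomes UCHAR_MAX. */
    assert (is_signed == ((char) -1 < 0));
    return is_signed;
}
```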

> There is a well established confusion in C between characters (that are
> encoded as integers) and integer VALUES.


A character -- loosely, a glyph like 'A' -- is not something
computers nowadays can represent directly in their memories.
Unable to store an actual 'A', they instead store a number like
65 or 193, and say "When thought of as a character, the value
refers to the 65th/193rd entry in a list of glyphs." The members
of that list and the order in which they appear are a matter of
convention, nothing more.

It's not really different from the convention that "zero is
false, anything else is true." Some other languages use other
conventions, like "even values are false, odds are true." Neither
scheme is inherently more "right" or "wrong" than the other; it's
just a matter of convention, of a correspondence between the
notions one wants to represent and the numbers that are all the
computer can store internally.

What I'm getting at is that there is (or need be) no confusion
between storing a character and storing a number: The computer always
does the latter and never does the former. When we talk about
"storing a character," it's just a convenient verbal shorthand for
"storing the number that represents a character." And the data type
C uses for this purpose is `char'. Some awkwardnesses stem from this
choice, mostly having to do with the library, and getting the library
to work nicely sometimes involves converting the numbers to and
from other types -- see getchar() or isalpha(), for instance. But
when you want to store character codes, use `char'. Use `unsigned
char' or `signed char' when you want to store small numbers that
are *not* to be thought of as characters.
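The getchar() case can be sketched like this (count_alpha is a hypothetical name). getc() returns either EOF or a character code already converted to unsigned char, so keeping the result in an int lets the loop tell EOF apart from real data and lets the value go straight to isalpha() without a cast.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Count alphabetic characters in a stream. */
static long
count_alpha (FILE *fp)
{
    long n = 0;
    int c;                      /* int, not char, so EOF stays distinguishable */

    assert (fp != NULL);
    while ((c = getc (fp)) != EOF) {
        if (isalpha (c)) {      /* c is EOF-free and in 0..UCHAR_MAX here */
            ++n;
        }
    }
    return n;
}
```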

> I prefer not to use any sign in the characters, and treat 152 as character
> code 152 and not as -104. Stupid me, I know.


152 is not a character; it is a number. In one popular
encoding scheme it corresponds to the character 'q', by virtue
of one of those conventional correspondences. If you want a 'q',
use a `char' and store 'q' in it. If you want the number 152
in a small space, use an `unsigned char' -- but don't think of
it as a character, because it isn't one.

> Besides, when I convert it into a bigger type, I would like to get
> 152, and not 4294967192.


Much depends on the type to which you are converting, and
on why you are performing the conversion.
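A small sketch of the two conversions being argued about, assuming 8-bit char, 32-bit unsigned int, and a signed plain char that stores 152 as -104 (which is what gcc does on such targets; the conversion is implementation-defined). The helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Widen a plain char directly. When char is signed and c is negative,
   the conversion to unsigned int wraps modulo UINT_MAX + 1, so -104
   becomes 4294967192 with a 32-bit unsigned int. */
static unsigned int
widen_plain (char c)
{
    return (unsigned int) c;
}

/* Convert to unsigned char first: the result is always the
   non-negative character code, 0 .. UCHAR_MAX. */
static unsigned int
widen_code (char c)
{
    unsigned int code = (unsigned char) c;

    assert (code <= UCHAR_MAX);
    return code;
}
```

Going through unsigned char gives the same answer on signed-char and unsigned-char targets, which is usually what "the character code" is meant to be.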

> Since size_t is unsigned, converting to unsigned is a fairly common
> operation.


It sounds very much as if you are dealing with "raw" numbers,
not with numbers that correspond to characters. If so, it's
quite strange that you are using strcmp() on assemblages of these
numbers, because strcmp() isn't well-suited to the task.
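For what it's worth, the C standard specifies that strcmp() decides the sign of its result by comparing the first differing characters as unsigned char, so a byte with value 152 sorts after 'a' even where plain char is signed. A sketch that checks this on one-byte strings; cmp_bytes is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Returns -1, 0 or 1: the sign of strcmp() on the one-byte
   strings {a} and {b}. */
static int
cmp_bytes (unsigned char a, unsigned char b)
{
    char s1[2], s2[2];
    int r;

    assert (a != 0 && b != 0);  /* a zero byte would end the string */
    s1[0] = (char) a;
    s1[1] = '\0';
    s2[0] = (char) b;
    s2[1] = '\0';
    r = strcmp (s1, s2);
    return (r > 0) - (r < 0);
}
```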

> Writing software is difficult enough without having to bother with the
> sign of characters or the sex of angels, or the number of demons you can
> fit in a pin's head.


A little thought about the artificiality of number-to-glyph
correspondences will remove much of the difficulty.

--
Eric Sosman
http://www.velocityreviews.com/forums/(E-Mail Removed)lid
 
 
 
 
 
Flash Gordon
 
      11-14-2009
bartc wrote:
>
> "jacob navia" <(E-Mail Removed)> wrote in message
> news:hdkt69$c1k$(E-Mail Removed)...
>> Alan Curry a écrit :
>>> In article <hdkpl5$8aq$(E-Mail Removed)>, jacob navia <(E-Mail Removed)> wrote:
>>>> I do not want to have negative characters!
>>>
>>> Can you give an example of something that doesn't work with plain char
>>> specifically because some characters are negative? I think that can only
>>> happen if you are making bad assumptions.
>>>

>>
>> I assume characters are codes from one to 255. This is a bad assumption
>> maybe, in some kind of weird logic when you assign a sign to a character
>> code.
>>
>> There is a well established confusion in C between characters (that are
>> encoded as integers) and integer VALUES.
>>
>> One of the reasons is that we have "signed" and "unsigned" characters.

>
> Yes, there should have been signed and unsigned byte. And a separate char
> type equivalent to (or a synonym for) unsigned byte.


I disagree. Ideally char should be a separate type which has *nothing* to
do with integer types. So to assign a char to an integer type you would have
to cast it to that type (just as with pointers).

> It really is exasperating when most people in this group insist that signed
> character codes are perfectly normal and sensible!


Insisting that they are perfectly normal is *not* the same as saying
that it is sensible.

> Apparently chars are signed because on the PDP11 or some such machine,
> sign-extending byte values was faster than zero-extending them. A bit
> shortsighted. (If it had been the other way around, they would of course
> have been singing the praises of unsigned char codes; except they would
> have been justified this time..)


Ah, but the people you are complaining about would probably accept that
char being unsigned is *also* perfectly normal.

>> I prefer not to use any sign in the characters, and treat 152 as
>> character code 152 and not as -104. Stupid me, I know.

>
> As I understand it, you can easily choose to use unsigned char type for
> such codes. The problem being when passing these to library functions
> where char is signed and this triggers a warning?


More to the point, why does he actually care whether a given character
value happens to be positive or negative? The only time it matters that
I can see is when using certain specific functions in the C library, and
unfortunately then you need a cast.

Of course, with gcc you can (on many architectures) select whether char
is signed or unsigned (-fsigned-char or -funsigned-char), but it is still
a distinct type.
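The "still a distinct type" point can be demonstrated with C11 _Generic, which gives plain char its own association separate from signed char and unsigned char. A sketch, assuming a C11 compiler (recent gcc defaults to a C11 or later dialect); CHAR_KIND and check_char_kinds are made-up names.

```c
#include <assert.h>

/* Map the type of an expression to a small number. _Generic does not
   apply integer promotions, so a char operand selects "char". */
#define CHAR_KIND(x) _Generic ((x),   \
    char:          1,                 \
    signed char:   2,                 \
    unsigned char: 3,                 \
    default:       0)

/* These hold whichever of -fsigned-char / -funsigned-char was used. */
static void
check_char_kinds (void)
{
    assert (CHAR_KIND ((char) 0) == 1);
    assert (CHAR_KIND ((signed char) 0) == 2);
    assert (CHAR_KIND ((unsigned char) 0) == 3);
    assert (CHAR_KIND ('a') == 0);   /* a character constant has type int in C */
}
```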

>> Besides, when I convert it into a bigger type, I would like to get
>> 152, and not 4294967192.

>
> Why doesn't widening a signed value into an unsigned one itself trigger a
> warning?


Why should it? In any case, as others mentioned, a cast will fix this.
I do have to wonder why the char is being assigned to a larger unsigned
integer type in the first place; it seems an odd thing to do.
--
Flash Gordon
 
 
 
 
 
bartc
 
      11-14-2009
"Eric Sosman" <(E-Mail Removed)> wrote in message
news:hdmbe3$2hd$(E-Mail Removed)-september.org...
> jacob navia wrote:



>> Writing software is difficult enough without having to bother with the
>> sign of characters or the sex of angels, or the number of demons you can
>> fit in a pin's head.

>
> A little thought about the artificiality of number-to-glyph
> correspondences will remove much of the difficulty.


Making char types always positive would remove all the difficulties.

And there are difficulties because this issue keeps coming up.

--
Bartc

 
 
Ben Bacarisse
 
      11-14-2009
John Kelly <(E-Mail Removed)> writes:

> On Sat, 14 Nov 2009 01:04:35 +0000, Ben Bacarisse <(E-Mail Removed)>
> wrote:

<snip>
>>Technically, a cast is needed to be portable:
>>
>> char *cp = ...;
>> ...
>> if (isdigit((unsigned char)*cp)) ...

>
> And if testing in a loop, you may want to cast separately from the test.
> Like in this trim function:
>
>
> static void
> trim (char **ts)
> {
> unsigned char *exam;
> unsigned char *keep;
>
> exam = (unsigned char *) *ts;
> while (*exam && isspace (*exam)) {


You can remove the *exam test.

> ++exam;
> }
> *ts = (char *) exam;
> if (!*exam) {
> return;
> }
> keep = exam;
> while (*++exam) {
> if (!isspace (*exam)) {
> keep = exam;
> }
> }
> if (*++keep) {
> *keep = '\0';
> }


And here you could replace the whole 'if' with 'keep[1] = 0;'.
Neither of them is wrong, of course, but every test makes the reader
wonder why it is there.

> }


--
Ben.
 
 
John Kelly
 
      11-14-2009
On Sat, 14 Nov 2009 15:27:03 +0000, Ben Bacarisse <(E-Mail Removed)>
wrote:

>> static void
>> trim (char **ts)
>> {
>> unsigned char *exam;
>> unsigned char *keep;
>>
>> exam = (unsigned char *) *ts;
>> while (*exam && isspace (*exam)) {

>
>You can remove the *exam test.


But then you're testing whether '\0' is a space or not. Perhaps it
improves performance, but is it good programming?


>> ++exam;
>> }
>> *ts = (char *) exam;
>> if (!*exam) {
>> return;
>> }
>> keep = exam;
>> while (*++exam) {
>> if (!isspace (*exam)) {
>> keep = exam;
>> }
>> }
>> if (*++keep) {
>> *keep = '\0';
>> }

>
>And here you could replace the whole 'if' with 'keep[1] = 0;'.
>Neither of them is wrong, of course, but every test makes the reader
>wonder why it is there.


But then you replace '\0' with '\0'. Which is worse, one extra test, or
a redundant action?


--
Webmail for Dialup Users
http://www.isp2dial.com/freeaccounts.html

 
 
Eric Sosman
 
      11-14-2009
John Kelly wrote:
> On Sat, 14 Nov 2009 15:27:03 +0000, Ben Bacarisse <(E-Mail Removed)>
> wrote:
>
>>> static void
>>> trim (char **ts)
>>> {
>>> unsigned char *exam;
>>> unsigned char *keep;
>>>
>>> exam = (unsigned char *) *ts;
>>> while (*exam && isspace (*exam)) {

>> You can remove the *exam test.

>
> But then you're testing whether '\0' is a space or not. Perhaps it
> improves performance, but is it good programming?


The test yields "false," so what's wrong with it?
Or, to turn it around, what would your response be to

while (*exam && *exam != '#' && *exam != 'X' && isspace(*exam))

?

--
Eric Sosman
(E-Mail Removed)lid
 
 
John Kelly
 
      11-14-2009
On Sat, 14 Nov 2009 12:01:16 -0500, Eric Sosman
<(E-Mail Removed)> wrote:

>John Kelly wrote:
>> On Sat, 14 Nov 2009 15:27:03 +0000, Ben Bacarisse <(E-Mail Removed)>
>> wrote:
>>
>>>> static void
>>>> trim (char **ts)
>>>> {
>>>> unsigned char *exam;
>>>> unsigned char *keep;
>>>>
>>>> exam = (unsigned char *) *ts;
>>>> while (*exam && isspace (*exam)) {
>>> You can remove the *exam test.

>>
>> But then you're testing whether '\0' is a space or not. Perhaps it
>> improves performance, but is it good programming?

>
> The test yields "false," so what's wrong with it?


'\0' is not part of the string, it's a pseudo length specifier, and
conceptually, should not be treated as part of the string. You can get
away with it in this case, but it's a bad programming habit to rely on
environmental assumptions.

With real length specifiers, you wouldn't test one position beyond the
end of the string, so why do it with NUL terminated strings? It's just
a stupid C trick for some dubious performance gain. For my use of that
code, the performance gain doesn't amount to a drop in a bucket.
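To illustrate the counted-string style described here, a hypothetical trim_span() that works on a pointer and a length and never reads a terminator. It reports an offset and a length instead of modifying the buffer; that is a design choice of this sketch, not something from the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Trim white space from the span s[0] .. s[len - 1]. Every access stays
   inside the span. On return, *start is the offset of the first kept
   byte; the result is the number of bytes kept. */
static size_t
trim_span (const char *s, size_t len, size_t *start)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t lo = 0, hi = len;

    assert (s != NULL || len == 0);
    assert (start != NULL);
    while (lo < hi && isspace (p[lo])) {
        ++lo;
    }
    while (hi > lo && isspace (p[hi - 1])) {
        --hi;
    }
    *start = lo;
    return hi - lo;
}
```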

I would rather think portably, as in from one language to another. I
may use tricks when performance really matters, but then I would include
some remark about my choice and why.



--
Webmail for Dialup Users
http://www.isp2dial.com/freeaccounts.html

 
 
Seebs
 
      11-14-2009
On 2009-11-14, John Kelly <(E-Mail Removed)> wrote:
> '\0' is not part of the string, it's a pseudo length specifier, and
> conceptually, should not be treated as part of the string. You can get
> away with it in this case, but it's a bad programming habit to rely on
> environmental assumptions.


The nul terminator is part of the string in C. It's not an environmental
assumption, it's a definition.
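A quick check of that definition, assuming nothing beyond standard C: the array initialized by a string literal includes the terminating null character, while strlen() counts only the characters before it. The names are illustrative.

```c
#include <assert.h>
#include <string.h>

static const char word[] = "abc";

static void
check_terminator (void)
{
    assert (sizeof word == 4);        /* 'a', 'b', 'c', '\0' */
    assert (strlen (word) == 3);      /* characters before the '\0' */
    assert (word[3] == '\0');
}
```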

> I would rather think portably, as in from one language to another. I
> may use tricks when performance really matters, but then I would include
> some remark about my choice and why.


You can't meaningfully "think portably" about C strings, because they're
not really analogous to things in other languages.

-s
--
Copyright 2009, all wrongs reversed. Peter Seebach / (E-Mail Removed)
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
 
 
jacob navia
 
      11-14-2009
Eric Sosman a écrit :
>> I prefer not to use any sign in the characters, and treat 152 as
>> character code 152 and not as -104. Stupid me, I know.

>
> 152 is not a character; it is a number. In one popular
> encoding scheme it corresponds to the character 'q', by virtue
> of one of those conventional correspondences. If you want a 'q',
> use a `char' and store 'q' in it. If you want the number 152
> in a small space, use an `unsigned char' -- but don't think of
> it as a character, because it isn't one.
>


The letter 'é' is 130. Why should I have it as -126???
The problem is that you ignore foreign languages and all their special
characters like é or è or à or £ or...

>> Besides, when I convert it into a bigger type, I would like to get
>> 152, and not 4294967192.

>
> Much depends on the type to which you are converting, and
> on why you are performing the conversion.
>


Most of the conversions are indirect, or happen because some operation
on characters is done after promotion, etc.

>> Since size_t is unsigned, converting to unsigned is a fairly common
>> operation.

>
> It sounds very much as if you are dealing with "raw" numbers,
> not with numbers that correspond to characters. If so, it's
> quite strange that you are using strcmp() on assemblages of these
> numbers, because strcmp() isn't well-suited to the task.
>


Sure, if we accept that 'é' is not a character THEN obviously
"strcmp is not well suited to the task."

What function should I use then?

>> Writing software is difficult enough without having to bother with the
>> sign of characters or the sex of angels, or the number of demons you can
>> fit in a pin's head.

>
> A little thought about the artificiality of number-to-glyph
> correspondences will remove much of the difficulty.
>


No. A little thought will make you use unsigned chars everywhere.
UNLESS you want signed small integers!
 
 
lawrence.jones@siemens.com
 
      11-14-2009
Ben Bacarisse <(E-Mail Removed)> wrote:
>
> The most annoying is using the character class tests isxxxx.
> Technically, a cast is needed to be portable:
>
> char *cp = ...;
> ...
> if (isdigit((unsigned char)*cp)) ...


Which has the potential to misbehave on ones' complement machines if
*cp is -0 (you might get 0 rather than UCHAR_MAX), so it's better to
cast the pointer:

if (isdigit(*(unsigned char *)cp)) ...
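A sketch of the pointer-cast style applied to a whole string; count_digits is a made-up example function. Reading each byte through const unsigned char * means isdigit() always receives a value in 0..UCHAR_MAX, as <ctype.h> requires, on signed-char and unsigned-char targets alike.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count decimal digits in a string. */
static size_t
count_digits (const char *s)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t n = 0;

    assert (s != NULL);
    for (; *p != '\0'; ++p) {
        if (isdigit (*p)) {
            ++n;
        }
    }
    return n;
}
```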
--
Larry Jones

It's like SOMEthing... I just can't think of it. -- Calvin
 
 
 
 