3-byte ints

 
 
Keith Thompson
 
      09-27-2003
Jack Klein <(E-Mail Removed)> writes:
[...]
> These are pretty much all free-standing environments, it is not really
> possible to provide all the features of a hosted environment on a
> platform where char and int have the same representation. It is
> impossible to provide a getchar() function which complies with the
> standard, namely that it returns all possible values of char and also
> EOF, which is an int different from any possible char value.


I don't see where the standard requires that EOF has to be different
from any possible char value.

If EOF is a valid char value, you could just check the feof()
function. For example, the following program should copy stdin to
stdout on such an implementation:

#include <stdio.h>
int main(void)
{
    int c;
    while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {
        putchar(c);
    }
    return 0;
}

The comparison to EOF could be omitted, but it might save the overhead
of some function calls.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
 
 
Barry Schwarz
 
      09-27-2003
On Fri, 26 Sep 2003 09:59:59 GMT, Kevin Easton
<(E-Mail Removed)> wrote:

>Jack Klein <(E-Mail Removed)> wrote:
>[...]
>> Mind you, you won't find these sort of architectures anywhere else but
>> on DSPs anymore, but a lot of DSP programming is being done in C and
>> even C++ these days.
>>
>> These are pretty much all free-standing environments, it is not really
>> possible to provide all the features of a hosted environment on a
>> platform where char and int have the same representation. It is
>> impossible to provide a getchar() function which complies with the
>> standard, namely that it returns all possible values of char and also
>> EOF, which is an int different from any possible char value.

>
>(It's actually an unsigned char converted to int, not plain char).
>
>However, are you sure it has to be able to return all possible unsigned
>chars? Isn't it possible for unsigned char to have 65536 possible
>values, but there be only, say, 140 distinct _characters_ which the
>string, input and output functions deal with? Does every possible
>unsigned char value have to represent a character?
>

Obviously not, since in the ASCII character set the values from 0x00
to 0x1f don't.


<<Remove the del for email>>
 
 
Kevin Easton
 
      09-27-2003
Keith Thompson <(E-Mail Removed)> wrote:
> Jack Klein <(E-Mail Removed)> writes:
> [...]
>> These are pretty much all free-standing environments, it is not really
>> possible to provide all the features of a hosted environment on a
>> platform where char and int have the same representation. It is
>> impossible to provide a getchar() function which complies with the
>> standard, namely that it returns all possible values of char and also
>> EOF, which is an int different from any possible char value.

>
> I don't see where the standard requires that EOF has to be different
> from any possible char value.
>
> If EOF is a valid char value, you could just check the feof()
> function. For example, the following program should copy stdin to
> stdout on such an implementation:
>
> #include <stdio.h>
> int main(void)
> {
> int c;
> while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {


ITYM

c != EOF || (!feof(stdin) && !ferror(stdin))

The real problem seems to be that getchar() is supposed to return an int
with a value in the range of unsigned char, or EOF. Returning any
negative non-EOF value is clearly out (not in the range of unsigned
char), so it'd have to map any characters with values between INT_MAX + 1
and UCHAR_MAX to some value between 0 and INT_MAX inclusive, which are
all already taken by other character values.

So it's implementable, but only in a way that loses information about
which character was actually read. Not really what you'd call
a practical way to write an input function.
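
The corrected condition can be sketched as follows. This is only an illustration, assuming (as in the discussion above) a hypothetical implementation where a valid character may compare equal to EOF; the helper names keep_reading and copy_stream are made up for this sketch and are not from the standard library.

```c
#include <stdio.h>

/* Decide whether the copy loop should continue after reading c.
 * A value other than EOF is always a character.  A value equal to EOF
 * counts as a character only if neither stream indicator is set. */
int keep_reading(int c, int at_eof, int had_error)
{
    return c != EOF || (!at_eof && !had_error);
}

/* Copy in to out one character at a time; returns the number copied. */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    int c;

    while (c = fgetc(in), keep_reading(c, feof(in), ferror(in))) {
        fputc(c, out);
        copied++;
    }
    return copied;
}
```

Unlike the one-line condition, this version calls feof() and ferror() for every character; writing the test directly in the while condition restores the short-circuit saving.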

- kevin.

 
 
Keith Thompson
 
      09-27-2003
Kevin Easton <(E-Mail Removed)> writes:
> Keith Thompson <(E-Mail Removed)> wrote:

[...]
> > #include <stdio.h>
> > int main(void)
> > {
> > int c;
> > while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {

>
> ITYM
>
> c != EOF || (!feof(stdin) && !ferror(stdin))


Right.

> The real problem seems to be that getchar() is supposed to return an int
> with a value in the range of unsigned char, or EOF. Returning any
> negative non-EOF value is clearly out (not in the range of unsigned
> char), so it'd have to map any characters with values between INT_MAX + 1
> and UCHAR_MAX to some value between 0 and INT_MAX inclusive, which are
> all already taken by other character values.
>
> So it's implementable, but only in a way that loses information about
> which character was actually read. Not really what you'd call
> a practical way to write an input function.


What's wrong with getchar() returning a negative non-EOF value?

getchar() is equivalent to getc() with the argument stdin; getc() is
equivalent to fgetc(), except that if it's a macro it can evaluate its
argument more than once.

The description of fgetc() says:

If the end-of-file indicator for the input stream pointed to by
stream is not set and a next character is present, the fgetc
function obtains that character as an unsigned char converted to
an int and advances the associated file position indicator for the
stream (if defined).

Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
has the value, say, 60000, it's converted to the int value -5536 and
returned.

Having sizeof(int)==1 breaks the common "while ((c=getchar()) != EOF)"
idiom, but I don't see that it breaks anything else -- which argues
that the common idiom is non-portable.
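
The arithmetic behind the 60000 example can be sketched like this, assuming the common two's-complement wraparound (which the standard does not guarantee for this conversion). The function name wrap_to_int16 is hypothetical, and wider arithmetic is used so the result does not depend on the host's own conversion rules.

```c
/* Model the usual two's-complement wraparound of a 16-bit unsigned
 * value into a 16-bit signed int, using long arithmetic throughout. */
long wrap_to_int16(unsigned long value)
{
    value &= 0xFFFFUL;                  /* keep the low 16 bits */
    if (value > 32767UL)                /* above the 16-bit INT_MAX */
        return (long)value - 65536L;    /* e.g. 60000 - 65536 */
    return (long)value;
}
```

Under the same assumption the character value 65535 maps to -1, which is the value many implementations use for EOF; that is the collision the idiom cannot detect.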

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
 
 
Peter Nilsson
 
      09-27-2003
"Keith Thompson" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Kevin Easton <(E-Mail Removed)> writes:
> > Keith Thompson <(E-Mail Removed)> wrote:

> [...]
> > > #include <stdio.h>
> > > int main(void)
> > > {
> > > int c;
> > > while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {
> >
> > ITYM
> >
> > c != EOF || (!feof(stdin) && !ferror(stdin))

>
> Right.
>
> > The real problem seems to be that getchar() is supposed to return an int
> > with a value in the range of unsigned char, or EOF. Returning any
> > negative non-EOF value is clearly out (not in the range of unsigned
> > char), so it'd have to map any characters with values between INT_MAX + 1
> > and UCHAR_MAX to some value between 0 and INT_MAX inclusive, which are
> > all already taken by other character values.
> >
> > So it's implementable, but only in a way that loses information about
> > which character was actually read. Not really what you'd call
> > a practical way to write an input function.

>
> What's wrong with getchar() returning a negative non-EOF value?
>
> getchar() is equivalent to getc() with the argument stdin; getc() is
> equivalent to fgetc(), except that if it's a macro it can evaluate its
> argument more than once.
>
> The description of fgetc() says:
>
> If the end-of-file indicator for the input stream pointed to by
> stream is not set and a next character is present, the fgetc
> function obtains that character as an unsigned char converted to
> an int and advances the associated file position indicator for the
> stream (if defined).
>
> Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
> has the value, say, 60000, it's converted to the int value -5536 and
> returned.
>
> Having sizeof(int)==1 breaks the common "while ((c=getchar()) != EOF)"
> idiom, but I don't see that it breaks anything else -- which argues
> that the common idiom is non-portable.


And it always has been. [Under C99 it is even worse, since the conversion of
an unsigned char to signed int can theoretically raise an
implementation-defined signal, thus reducing getc to the level of gets.]

The unwritten assumption about hosted implementations is naturally that
UCHAR_MAX <= INT_MAX. Why the standards never made this normative seems a
mystery to lesser minds like my own.
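
Code that relies on the idiom can at least check the assumption itself. A minimal sketch, with a hypothetical function name; a stricter variant could use #error in the #else branch to refuse to compile:

```c
#include <limits.h>

/* Returns 1 when every unsigned char value fits in a nonnegative int,
 * so no character returned by getchar() can equal the negative EOF. */
int eof_idiom_is_safe(void)
{
#if UCHAR_MAX <= INT_MAX
    return 1;
#else
    return 0;
#endif
}
```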

--
Peter


 
 
Barry Schwarz
 
      09-27-2003
On Sat, 27 Sep 2003 01:56:02 GMT, Keith Thompson <(E-Mail Removed)> wrote:

>Jack Klein <(E-Mail Removed)> writes:
>[...]
>> These are pretty much all free-standing environments, it is not really
>> possible to provide all the features of a hosted environment on a
>> platform where char and int have the same representation. It is
>> impossible to provide a getchar() function which complies with the
>> standard, namely that it returns all possible values of char and also
>> EOF, which is an int different from any possible char value.

>
>I don't see where the standard requires that EOF has to be different
>from any possible char value.


EOF must have type int and be negative. On those systems where char
is unsigned, it obviously cannot be a char value.

It could be a valid char on a system where char is signed. But, as
explained below, none of the normal character I/O functions can return
any negative value other than for end of file or I/O error.

>
>If EOF is a valid char value, you could just check the feof()
>function. For example, the following program should copy stdin to
>stdout on such an implementation:
>
> #include <stdio.h>
> int main(void)
> {
> int c;
> while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {


Coding problem here:

If c == EOF, then the remaining two expressions following the
first && will never be evaluated due to && short circuit. The while
condition will evaluate to false and the loop terminates immediately,
regardless of the status of feof and ferror. Consequently, you don't know
if you have hit the real EOF or merely a character that looks like it.

If c != EOF, you are pretty much guaranteed that !feof(stdin) and
!ferror(stdin) will both be true also.

Therefore, the expression c != EOF defeats the purpose of what
you want the expression after the comma to do.
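
The short-circuit behaviour described above can be observed with a small sketch. The counter and the stand-in function stream_ok are hypothetical; stream_ok takes the place of !feof(stdin) && !ferror(stdin) and always reports a healthy stream.

```c
#include <stdio.h>

int status_checks;   /* counts how often the status check runs */

/* Stand-in for the stream status test; always reports "stream OK". */
int stream_ok(void)
{
    status_checks++;
    return 1;
}

/* The condition as originally posted, with the stream test factored out. */
int posted_condition(int c)
{
    return c != EOF && stream_ok();
}
```

Calling posted_condition(EOF) returns 0 without ever running the status check, which is why the loop cannot tell a real end of file from a character that compares equal to EOF.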

Logic problem also:

getchar "returns the next character of [stdin] as an unsigned
char (converted to an int), or an EOF if end of file or error occurs"
(from K&R2, B1.4). Since an unsigned char cannot be negative and EOF
has to be, getchar cannot return EOF for a normal character.

> putchar(c);
> }
> return 0;
> }
>
>The comparison to EOF could be omitted, but it might save the overhead
>of some function calls.



<<Remove the del for email>>
 
 
Kevin Easton
 
      09-27-2003
Keith Thompson <(E-Mail Removed)> wrote:
> Kevin Easton <(E-Mail Removed)> writes:
>> Keith Thompson <(E-Mail Removed)> wrote:

> [...]
>> > #include <stdio.h>
>> > int main(void)
>> > {
>> > int c;
>> > while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {

>>
>> ITYM
>>
>> c != EOF || (!feof(stdin) && !ferror(stdin))

>
> Right.
>
>> The real problem seems to be that getchar() is supposed to return an int
>> with a value in the range of unsigned char, or EOF. Returning any
>> negative non-EOF value is clearly out (not in the range of unsigned
>> char), so it'd have to map any characters with values between INT_MAX + 1
>> and UCHAR_MAX to some value between 0 and INT_MAX inclusive, which are
>> all already taken by other character values.
>>
>> So it's implementable, but only in a way that loses information about
>> which character was actually read. Not really what you'd call
>> a practical way to write an input function.

>
> What's wrong with getchar() returning a negative non-EOF value?
>
> getchar() is equivalent to getc() with the argument stdin; getc() is
> equivalent to fgetc(), except that if it's a macro it can evaluate its
> argument more than once.
>
> The description of fgetc() says:
>
> If the end-of-file indicator for the input stream pointed to by
> stream is not set and a next character is present, the fgetc
> function obtains that character as an unsigned char converted to
> an int and advances the associated file position indicator for the
> stream (if defined).


OK, you're right - it just has to be converted to an int.

> Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
> has the value, say, 60000, it's converted to the int value -5536 and
> returned.


...but the conversion to int that takes place is in no way defined (it
just says "as an unsigned char converted to int") - so you don't know
how it'll be converted. It doesn't say it has to be a reversible
conversion, or even a stable one.

Perhaps you could read the requirement that anything written to a binary
stream will compare equal to the original value when it's read back as
meaning that the unsigned char / int conversions mentioned in the
character reading and writing functions have to be stable, reversible
and the inverse of each other.

You still break ungetc() if a valid character maps to EOF, since you
couldn't ungetc that character:

4 If the value of c equals that of the macro EOF, the operation fails
and the input stream is unchanged.
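
The quoted rule is easy to check on any implementation. A minimal sketch, with a hypothetical helper name:

```c
#include <stdio.h>

/* Returns 1 if the implementation refuses to push back EOF, as the
 * quoted paragraph requires; ungetc reports failure by returning EOF. */
int ungetc_rejects_eof(FILE *stream)
{
    return ungetc(EOF, stream) == EOF;
}
```

On an implementation where some valid character converted to an int equal to EOF, the same call with that character would fail in exactly the same way.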

- Kevin.
 
 
Keith Thompson
 
      09-27-2003
Kevin Easton <(E-Mail Removed)> writes:
> Keith Thompson <(E-Mail Removed)> wrote:

[...]
> > Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
> > has the value, say, 60000, it's converted to the int value -5536 and
> > returned.

>
> ...but the conversion to int that takes place is in no way defined (it
> just says "as an unsigned char converted to int") - so you don't know
> how it'll be converted. It doesn't say it has to be a reversible
> conversion, or even a stable one.


Thank you, that's the point I was missing. I had assumed (because I
didn't bother to check) that the conversion from unsigned char to int
was well-defined.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
 
 
Dave Thompson
 
      09-29-2003
On Tue, 23 Sep 2003 15:56:31 +0100, Kevin Bracey
<(E-Mail Removed)> wrote:

> In message <bkpkq1$(E-Mail Removed)>
> Ed Morton <(E-Mail Removed)> wrote:
>
> > I have 2 counters - one is required to be a 2-byte variable while the
> > other is required to be 3 bytes (not my choice, but I'm stuck with it!).
> > I've declared them as:
> >
> > unsigned short small;
> > unsigned long large: 24;
> >

(Within a struct, shown later.)

> > First question - is that the best way to declare the "large" one to
> > ensure it's 3 bytes?

>
> Pretty much, assuming it's in a structure. The only things I'd say are:
>

It will do exactly 24-bit arithmetic, which is 3 bytes IF a byte is 8
bits, as is very common but not required. It, or rather the
"allocation unit" containing it, is very likely to occupy 32 bits, or 4
ordinary bytes (octets). This difference matters only if you write the
containing struct out to a file or over a network etc. (since you
can't form or use a pointer to a bitfield member), or if you need
to care about the actual memory/bus accesses performed by the
compiled (object) form of your code when executed.
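
The difference between 24-bit arithmetic and the storage actually used can be sketched as below. The struct tag and function names are hypothetical; the member names follow the thread. The sketch uses unsigned int rather than unsigned long for the bitfield and assumes int is at least 24 bits wide.

```c
/* Hypothetical struct holding the two counters from the thread. */
struct counters {
    unsigned short small;
    unsigned int   large : 24;
};

/* Increment the 24-bit counter from its maximum value; storing into an
 * unsigned bitfield wraps modulo 2^24, so the result is 0. */
unsigned long increment_from_max(void)
{
    struct counters c;
    c.small = 0;
    c.large = 0xFFFFFFu;
    c.large++;
    return c.large;
}

/* Storage size is implementation-defined and is often larger than the
 * 2 + 3 bytes the counters nominally need. */
unsigned long counters_size(void)
{
    return (unsigned long)sizeof(struct counters);
}
```

If exactly three bytes must go to a file or over a network, one common alternative is to copy the value byte by byte into an unsigned char array rather than writing the struct itself.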

> 1) C90 doesn't allow anything other than "int" and "unsigned int" for
> bitfield types. C99 does allow implementations to offer other types
> like "unsigned long"; presumably your implementation does - it's
> a common extension.
>

C90 allows (explicitly) signed int, unsigned int, or "plain" int.
Unlike plain int elsewhere, a plain int bitfield is not automatically
signed; whether it is signed or unsigned is implementation-defined.
C99 also standardly allows _Bool (or bool with stdbool.h).

<snip>
> > It's complaining about the "cntr" argument in the line:
> >
> > printf("%lu -> %lu\n",_tmp,(unsigned long)(cntr));
> >
> > Third question - why is the compiler apparently ignoring my cast and
> > complaining that "(unsigned long)(cntr)" is an unsigned int?

>

Plus _tmp already had type unsigned long.

> Because it's buggy? Your code looks fine to me.
>

Unless perhaps the OP (or someone) did <GACK!> #define long int </>.
Since you are using gcc, check the preprocessor output with -E.


- David.Thompson1 at worldnet.att.net
 