Velocity Reviews - Computer Hardware Reviews

Reading from files and range of char and friends

Spiros Bousbouras
      03-10-2011
If you are reading from a file by successively calling fgetc() is there
any point in storing what you read in anything other than unsigned
char ? If you try to store it in char or signed char then it's possible
that what you read may fall outside the range of the type in which case
you get implementation-defined behavior according to 6.3.1.3 p. 3. So
why doesn't fgets() take an unsigned char * as its first argument ? It
would make life simpler for the user and possibly for the implementor
too.

--
Pain makes believers.
Wally Jay
 
Angel
      03-10-2011
On 2011-03-10, Spiros Bousbouras <(E-Mail Removed)> wrote:
> If you are reading from a file by successively calling fgetc() is there
> any point in storing what you read in anything other than unsigned
> char ?


Yes, when you read EOF which is not an unsigned char.

"fgetc() reads the next character from stream and returns
it as an unsigned char cast to an int, or EOF on end of file or
error."
(From the Linux man pages.)
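A minimal sketch of why the result of fgetc() belongs in an int until it has been checked against EOF. The function name count_bytes is hypothetical. If c were a plain char instead, then where char is signed a 0xFF byte would typically convert to -1 and compare equal to EOF (usually -1), ending the loop early; where char is unsigned, EOF would be converted to a non-negative value and the loop would never see it.

```c
#include <stdio.h>

/* Count the bytes in a stream.  Each fgetc() result is kept in an int,
   because it must hold every value 0..UCHAR_MAX plus EOF, which is negative. */
long count_bytes(FILE *f)
{
    long n = 0;
    int c;

    while ((c = fgetc(f)) != EOF)
        n++;
    /* EOF also signals a read error; callers can tell them apart with ferror(f). */
    return n;
}
```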


--
The natural state of a spammer's website is a smoking crater.
 
Spiros Bousbouras
      03-10-2011
On 10 Mar 2011 16:49:57 GMT
Angel <(E-Mail Removed)> wrote:
> On 2011-03-10, Spiros Bousbouras <(E-Mail Removed)> wrote:
> > If you are reading from a file by successively calling fgetc() is there
> > any point in storing what you read in anything other than unsigned
> > char ?

>
> Yes, when you read EOF which is not an unsigned char.


In my mind I was making a distinction between storing and temporarily
assigning but I guess it wasn't clear. What I had in mind was something
like:

unsigned char arr[some_size] ;
int a ;

while ( (a = fgetc(f)) != EOF) arr[position++] = a ;

Would there be any reason for arr to be something other than
unsigned char ?
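A runnable version of the loop above, as a sketch: read_bytes, buf and cap are hypothetical names, and a capacity check is added so the array cannot overflow. Converting a non-EOF fgetc() result to unsigned char is always well defined, because the value is already in the range 0..UCHAR_MAX.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: store at most cap bytes from f into buf and return
   how many were stored.  The int temporary is needed to recognise EOF. */
size_t read_bytes(FILE *f, unsigned char *buf, size_t cap)
{
    size_t position = 0;
    int a;

    /* Check capacity first so no byte is consumed without room to store it. */
    while (position < cap && (a = fgetc(f)) != EOF)
        buf[position++] = (unsigned char)a; /* a is 0..UCHAR_MAX here: well defined */
    return position;
}
```

Testing position < cap before calling fgetc() means a full buffer leaves the next byte unread in the stream rather than silently dropping it.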
 
Angel
      03-10-2011
On 2011-03-10, Spiros Bousbouras <(E-Mail Removed)> wrote:
> On 10 Mar 2011 16:49:57 GMT
> Angel <(E-Mail Removed)> wrote:
>> On 2011-03-10, Spiros Bousbouras <(E-Mail Removed)> wrote:
>> > If you are reading from a file by successively calling fgetc() is there
>> > any point in storing what you read in anything other than unsigned
>> > char ?

>>
>> Yes, when you read EOF which is not an unsigned char.

>
> In my mind I was making a distinction between storing and temporarily
> assigning but I guess it wasn't clear. What I had in mind was something
> like:
>
> unsigned char arr[some_size] ;
> int a ;
>
> while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
>
> Would there be any reason for arr to be something other than
> unsigned char ?


No, although you may want a cast there; without one, some compilers warn
about the narrowing conversion, since unsigned char is likely to have
fewer bits than int.

fgetc() returns an int because EOF has to have a value that cannot
normally be read from a file. Once you've determined that the read value
is not EOF, it's safe to store it as an unsigned char.

And in C there is no difference between "storing" and "temporarily
assigning". Every assignment lasts until overwritten.


--
The natural state of a spammer's website is a smoking crater.
 
Paul N
      03-10-2011
On Mar 10, 5:05 pm, Spiros Bousbouras <(E-Mail Removed)> wrote:
> On 10 Mar 2011 16:49:57 GMT
>
> Angel <(E-Mail Removed)> wrote:
> > On 2011-03-10, Spiros Bousbouras <(E-Mail Removed)> wrote:
> > > If you are reading from a file by successively calling fgetc() is there
> > > any point in storing what you read in anything other than unsigned
> > > char ?

>
> > Yes, when you read EOF which is not an unsigned char.

>
> In my mind I was making a distinction between storing and temporarily
> assigning but I guess it wasn't clear. What I had in mind was something
> like:
>
> unsigned char arr[some_size] ;
> int a ;
>
> while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
>
> Would there be any reason for arr to be something other than
> unsigned char ?


char is normally used for storing characters, and I think that is what
it was designed for. So it seems a bit odd not to use it. If you're
going to use the str* functions to manipulate what you've read in,
then storing it as char seems sensible, and not doing so is likely to
require some nasty casts.

In my view anyway...
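To illustrate the casts Paul mentions, here is a sketch (read_line_len is a hypothetical helper) that collects one line into an unsigned char buffer and then must cast before handing it to strlen(). The cast itself is harmless, since char and unsigned char have the same size and any object may be accessed through a character type, but it has to be repeated at every str* call.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line (without the newline) into an
   unsigned char buffer and return its length as seen by strlen(). */
size_t read_line_len(FILE *f, unsigned char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    if (cap == 0)
        return 0;
    while (n + 1 < cap && (c = fgetc(f)) != EOF && c != '\n')
        buf[n++] = (unsigned char)c;
    buf[n] = '\0';
    /* The str* functions take char *, so an unsigned char buffer needs a cast. */
    return strlen((const char *)buf);
}
```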

 
Spiros Bousbouras
      03-10-2011
On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
Paul N <(E-Mail Removed)> wrote:
> On Mar 10, 5:05 pm, Spiros Bousbouras <(E-Mail Removed)> wrote:
> > On 10 Mar 2011 16:49:57 GMT
> >
> > Angel <(E-Mail Removed)> wrote:
> > > On 2011-03-10, Spiros Bousbouras <(E-Mail Removed)> wrote:
> > > > If you are reading from a file by successively calling fgetc() is there
> > > > any point in storing what you read in anything other than unsigned
> > > > char ?

> >
> > > Yes, when you read EOF which is not an unsigned char.

> >
> > In my mind I was making a distinction between storing and temporarily
> > assigning but I guess it wasn't clear. What I had in mind was something
> > like:
> >
> > unsigned char arr[some_size] ;
> > int a ;
> >
> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
> >
> > Would there be any reason for arr to be something other than
> > unsigned char ?

>
> char is normally used for storing characters, and I think that is what
> it was designed for. So it seems a bit odd not to use it.


But if arr[] is char how do you avoid the implementation defined
behavior when doing arr[position++] = a ?
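One workaround, sketched here under the assumption that arr is an ordinary char array: write each byte through an unsigned char lvalue, so the int-to-char conversion never happens. The names read_text, arr and cap are hypothetical. Reading arr[i] back as plain char then yields whatever value that bit pattern has in char (for 0xA9, typically -87 where char is signed).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: fill the plain char array arr with up to cap bytes
   from f.  Each byte is stored through an unsigned char lvalue, which is
   valid for any object, so no int-to-char conversion takes place. */
size_t read_text(FILE *f, char *arr, size_t cap)
{
    unsigned char *bytes = (unsigned char *)arr;
    size_t position = 0;
    int a;

    while (position < cap && (a = fgetc(f)) != EOF)
        bytes[position++] = (unsigned char)a;
    return position;
}
```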
 
Angel
      03-10-2011
On 2011-03-10, Spiros Bousbouras <(E-Mail Removed)> wrote:
> On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
> Paul N <(E-Mail Removed)> wrote:
>> >
>> > In my mind I was making a distinction between storing and temporarily
>> > assigning but I guess it wasn't clear. What I had in mind was something
>> > like:
>> >
>> > unsigned char arr[some_size] ;
>> > int a ;
>> >
>> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
>> >
>> > Would there be any reason for arr to be something other than
>> > unsigned char ?

>>
>> char is normally used for storing characters, and I think that is what
>> it was designed for. So it seems a bit odd not to use it.

>
> But if arr[] is char how do you avoid the implementation defined
> behavior when doing arr[position++] = a ?


Depends on what exactly you are reading. If it's a normal text file
encoded in ASCII, converting the values read by fgetc() should be safe
because ASCII values are only 7 bits and will fit into a char.

If it's a binary file though, you'll have to use unsigned char, and
you should consider using fread instead.
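A sketch of the fread() approach for binary data, assuming the stream was opened in binary mode ("rb"). count_non_ascii is a hypothetical example that reads in 4096-byte chunks and counts bytes outside 7-bit ASCII.

```c
#include <stddef.h>
#include <stdio.h>

/* Read a binary stream in chunks with fread() and count bytes above 0x7F,
   i.e. bytes that are not 7-bit ASCII.  Returns -1 on a read error. */
long count_non_ascii(FILE *f)
{
    unsigned char chunk[4096];
    size_t got, i;
    long count = 0;

    while ((got = fread(chunk, 1, sizeof chunk, f)) > 0) {
        for (i = 0; i < got; i++) {
            if (chunk[i] > 0x7F)
                count++;
        }
    }
    return ferror(f) ? -1 : count;
}
```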


--
The natural state of a spammer's website is a smoking crater.
 
Spiros Bousbouras
      03-10-2011
On 10 Mar 2011 22:49:52 GMT
Angel <(E-Mail Removed)> wrote:
> On 2011-03-10, Spiros Bousbouras <(E-Mail Removed)> wrote:
> > On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
> > Paul N <(E-Mail Removed)> wrote:
> >> >
> >> > In my mind I was making a distinction between storing and temporarily
> >> > assigning but I guess it wasn't clear. What I had in mind was something
> >> > like:
> >> >
> >> > unsigned char arr[some_size] ;
> >> > int a ;
> >> >
> >> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
> >> >
> >> > Would there be any reason for arr to be something other than
> >> > unsigned char ?
> >>
> >> char is normally used for storing characters, and I think that is what
> >> it was designed for. So it seems a bit odd not to use it.

> >
> > But if arr[] is char how do you avoid the implementation defined
> > behavior when doing arr[position++] = a ?

>
> Depends on what exactly you are reading. If it's a normal text file
> encoded in ASCII, converting the values read by fgetc() should be safe
> because ASCII values are only 7 bits and will fit into a char.
>
> If it's a binary file though, you'll have to use unsigned char, and
> you should consider using fread instead.


And what if it's a non ASCII text file ? It could be ISO-8859-1 or
UTF-8. An extra complication is that you may have to read some of the
file in order to determine what kind of information it contains.
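As a sketch of reading part of a file to decide what it contains, the hypothetical function starts_with_utf8_bom below checks for a UTF-8 byte order mark (EF BB BF) and then rewinds. This is only a heuristic: the BOM is optional in UTF-8, so a file without one may still be UTF-8, and rewind() only helps on seekable streams.

```c
#include <stdio.h>
#include <string.h>

/* Peek at the first bytes of a stream, then rewind so the caller can read
   it from the start.  Only the optional UTF-8 byte order mark is checked. */
int starts_with_utf8_bom(FILE *f)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char head[3];
    size_t got = fread(head, 1, sizeof head, f);

    rewind(f); /* resets position and clears EOF; not usable on pipes */
    return got == sizeof head && memcmp(head, bom, sizeof bom) == 0;
}
```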
 
Angel
      03-10-2011
On 2011-03-10, Spiros Bousbouras <(E-Mail Removed)> wrote:
> On 10 Mar 2011 22:49:52 GMT
> Angel <(E-Mail Removed)> wrote:
>> On 2011-03-10, Spiros Bousbouras <(E-Mail Removed)> wrote:
>> > On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
>> > Paul N <(E-Mail Removed)> wrote:
>> >> >
>> >> > In my mind I was making a distinction between storing and temporarily
>> >> > assigning but I guess it wasn't clear. What I had in mind was something
>> >> > like:
>> >> >
>> >> > unsigned char arr[some_size] ;
>> >> > int a ;
>> >> >
>> >> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
>> >> >
>> >> > Would there be any reason for arr to be something other than
>> >> > unsigned char ?
>> >>
>> >> char is normally used for storing characters, and I think that is what
>> >> it was designed for. So it seems a bit odd not to use it.
>> >
>> > But if arr[] is char how do you avoid the implementation defined
>> > behavior when doing arr[position++] = a ?

>>
>> Depends on what exactly you are reading. If it's a normal text file
>> encoded in ASCII, converting the values read by fgetc() should be safe
>> because ASCII values are only 7 bits and will fit into a char.
>>
>> If it's a binary file though, you'll have to use unsigned char, and
>> you should consider using fread instead.

>
> And what if it's a non ASCII text file ? It could be ISO-8859-1 or
> UTF-8. An extra complication is that you may have to read some of the
> file in order to determine what kind of information it contains.


fgetc() is guaranteed to return either an unsigned char value (converted
to int) or EOF, so that always works. Interpreting the read data is up to
your program and will depend on what exactly you are trying to accomplish.

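UTF-8 is encoded in 8-bit units (one character may take one to four
bytes), so each byte fits in an unsigned char. You can also store it in a
signed char, but values >127 do not fit; the conversion is
implementation-defined and typically yields a negative value. The same
holds for ISO-8859-1. For character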
encodings with more bits, there is fgetwc().
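To make the multi-byte point concrete, a sketch (utf8_count_chars is hypothetical and assumes the buffer holds valid UTF-8) that counts characters by skipping continuation bytes, whose top two bits are 10. For example, 'é' is encoded as C3 A9 and '€' as E2 82 AC.

```c
#include <stddef.h>

/* Count code points in a buffer assumed to hold valid UTF-8.  Every byte
   except a continuation byte (bit pattern 10xxxxxx) starts a new character. */
size_t utf8_count_chars(const unsigned char *s, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the six bytes 41 C3 A9 E2 82 AC ("Aé€") this counts three characters.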


--
The natural state of a spammer's website is a smoking crater.
 
Keith Thompson
      03-10-2011
Spiros Bousbouras <(E-Mail Removed)> writes:
> On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
> Paul N <(E-Mail Removed)> wrote:
>> On Mar 10, 5:05 pm, Spiros Bousbouras <(E-Mail Removed)> wrote:
>> > On 10 Mar 2011 16:49:57 GMT
>> >
>> > Angel <(E-Mail Removed)> wrote:
>> > > On 2011-03-10, Spiros Bousbouras <(E-Mail Removed)> wrote:
>> > > > If you are reading from a file by successively calling fgetc() is there
>> > > > any point in storing what you read in anything other than unsigned
>> > > > char ?
>> >
>> > > Yes, when you read EOF which is not an unsigned char.
>> >
>> > In my mind I was making a distinction between storing and temporarily
>> > assigning but I guess it wasn't clear. What I had in mind was something
>> > like:
>> >
>> > unsigned char arr[some_size] ;
>> > int a ;
>> >
>> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
>> >
>> > Would there be any reason for arr to be something other than
>> > unsigned char ?

>>
>> char is normally used for storing characters, and I think that is what
>> it was designed for. So it seems a bit odd not to use it.

>
> But if arr[] is char how do you avoid the implementation defined
> behavior when doing arr[position++] = a ?


Typically by ignoring the issue. (Well, this doesn't avoid
the implementation defined behavior; it just assumes it's
ok.) On any system where this is a sensible thing to do, the
implementation-defined behavior is almost certain to be what you
want. Assigning a value exceeding CHAR_MAX to a char (assuming
plain char is signed) *could* give you a strange result, or even
raise an implementation-defined signal, but any implementation that
chose to do such a thing would break a lot of existing code.
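A sketch of how a program could at least detect when that implementation-defined conversion is involved: CHAR_MIN and CHAR_MAX from <limits.h> give the range of plain char, so a non-EOF fgetc() result can be tested before it is assigned. The function name fits_in_plain_char is hypothetical.

```c
#include <limits.h>

/* Returns nonzero if the int c (e.g. a non-EOF fgetc() result) can be
   assigned to a plain char without an implementation-defined conversion. */
int fits_in_plain_char(int c)
{
    return c >= CHAR_MIN && c <= CHAR_MAX;
}
```

Where plain char is a signed 8-bit type, CHAR_MAX is 127, so a byte such as 0xA9 fails the test; where plain char is unsigned, every fgetc() byte passes.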

C uses plain char (which may be signed) for strings, but it reads
characters from files as unsigned char values. IMHO this is a flaw
in the language. A byte read from a file with a representation
of 10101001 (0xa9) is far more likely to mean 169 than -87 (it's
a copyright symbol in Latin-1, 'z' in EBCDIC).

One solution might be to require plain char to be unsigned, but that
causes inefficient code for some operations -- which was more of an
issue in the PDP-11 days than it is now, but it's probably still
significant.

Another might be to have fgetc() return an int representing either
a *plain* char value or EOF, but it's too late to change that.

I'm usually a strong advocate for writing code as portably as possible,
but in this case I suspect that working around the unsigned char vs.
plain char mismatch would be more effort than it's worth.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 