Velocity Reviews - Computer Hardware Reviews


Reading from files and range of char and friends

 
 
Spiros Bousbouras
Guest
Posts: n/a
 
      03-11-2011
On Fri, 11 Mar 2011 11:53:57 -0800
Keith Thompson <kst-> wrote:
> Spiros Bousbouras <> writes:
> > On Thu, 10 Mar 2011 20:37:09 -0500
> > Eric Sosman <> wrote:
> >> On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
> >> > If you are reading from a file by successively calling fgetc() is there
> >> > any point in storing what you read in anything other than unsigned
> >> > char ?
> >>
> >> Sure. To see one reason in action, try
> >>
> >> unsigned char uchar_password[SIZE];
> >> ...
> >> if (strcmp(uchar_password, "SuperSecret") == 0) ...

> >
> > Just to be clear , the only thing that can go wrong with this example
> > is that strcmp() may try to convert the elements of uchar_password to
> > char thereby causing the implementation defined behavior. The same
> > issue could arise with any other str* function. Or is there something
> > specific about your example that I'm missing ?

>
> The call to strcmp() violates a constraint. strcmp() expects const
> char* (a non-const char* is also ok), but uchar_password, after
> the implicit conversion is of type unsigned char*. Types char*
> and unsigned char* are not compatible, and there is no implicit
> conversion from one to the other.


I see. I assumed that the implicit conversion would be ok because
paragraph 27 of 6.2.5 says "A pointer to void shall have the same
representation and alignment requirements as a pointer to a character
type.39)" and footnote 39 says "The same representation and alignment
requirements are meant to imply interchangeability as arguments to
functions, return values from functions, and members of unions." I
assumed that the relation "same representation and alignment
requirements" is transitive.

On the other hand footnote 35 of paragraph 15 says that char is not
compatible with signed or unsigned char and in 6.7.5.1 we read that
pointers to types are compatible only if the types are compatible. We
must conclude then that the relation "same representation and alignment
requirements" is not transitive. That's a damn poor choice of
terminology then.

> If you use an explicit cast, it will *probably* work as expected,
> but without the cast the compiler is permitted to reject it.


> > If getc() read int's from files instead of unsigned char's would it be
> > realistically possible that reading from a file would return a negative
> > zero ? That would be one strange file.

>
> What would be so strange about it? If a file contains a sequence of
> ints, stored as binary, and the implementation has a distinct
> representation for negative zero, then the file could certainly contain
> negative zeros.


Ok , I guess it could happen. But then I have a different objection. Eric said

(The situation is particularly bad for systems with
signed-magnitude or ones' complement notations, where the
sign of zero is obliterated on conversion to unsigned char
and thus cannot be recovered again after getc().)

It seems to me that an implementation can easily ensure that the sign
of zero does not get obliterated. If by using fgetc() an unsigned char
gets the bit pattern which corresponds to negative zero then the
implementation can assign the negative zero when converting to int .
The standard allows this.

--
Metadiscussion is evil !
 
 
 
 
 
lawrence.jones@siemens.com
Guest
Posts: n/a
 
      03-11-2011
Tim Rentsch <> wrote:
>
> A call to getc() cannot return negative zero. The reason is,
> getc() is defined in terms of fgetc(), which returns an
> 'unsigned char' converted to an 'int', and such conversions
> cannot produce negative zeros.


They can if char and int are the same size.
--
Larry Jones

I always send Grandma a thank-you note right away. ...Ever since she
sent me that empty box with the sarcastic note saying she was just
checking to see if the Postal Service was still working. -- Calvin
 
 
 
 
 
Spiros Bousbouras
Guest
Posts: n/a
 
      03-11-2011
On Fri, 11 Mar 2011 13:08:00 -0800
Tim Rentsch <> wrote:
> Spiros Bousbouras <> writes:
>
> > If getc() read int's from files instead of unsigned char's would it be
> > realistically possible that reading from a file would return a negative
> > zero ?

>
> A call to getc() cannot return negative zero. The reason is,
> getc() is defined in terms of fgetc(), which returns an
> 'unsigned char' converted to an 'int', and such conversions
> cannot produce negative zeros.


When I said "getc() read int's from files" I meant that also fgetc()
reads int's from files i.e. we're talking about an alternative C where
we don't have the intermediate unsigned char step.

Apart from that , in post

<>
http://groups.google.com/group/comp....1?dmode=source

you say

Do you mean to say that if a file has a byte with a bit
pattern corresponding to a 'char' negative-zero, and
that byte is read (in binary mode) with getc(), the
result of getc() will be zero? If that's what you're
saying I believe that is wrong.

Assuming actual C (i.e. not the alternative C from above) is it not
possible in the scenario you're describing that int will get negative
zero ?

--
Metadiscussion is evil !
 
 
Spiros Bousbouras
Guest
Posts: n/a
 
      03-11-2011
On Fri, 11 Mar 2011 20:44:02 GMT
Spiros Bousbouras <> wrote:
> On Thu, 10 Mar 2011 15:37:38 -0800
> Keith Thompson <kst-> wrote:
> > One solution might be to require plain char to be unsigned, but that
> > causes inefficient code for some operations -- which was more of an
> > issue in the PDP-11 days than it is now, but it's probably still
> > significant.
> >
> > Another might be to have fgetc() return an int representing either
> > a *plain* char value or EOF, but it's too late to change that.

>
> The standard could say that if an implementation offers stdio.h then
> the following function
>
> int foo(unsigned char a) {
>     char b = a ;
>     unsigned char c = b ;
>     return a == c ;
> }
>
> always returns 1. This I think would be sufficient to be able to assign
> the return value of fgetc() to char (after checking for EOF) without
> worries. But does it leave any existing implementations out ? And while
> I'm at it , how do existing implementations handle conversion to a
> signed integer type if the value doesn't fit ? Does anyone have any
> unusual examples ?
>
> Another approach would be to have a macro __WBUC2CA (well behaved
> unsigned char to char assignment) which will have the value 1 or 0 and
> if it has the value 1 then foo() above will be guaranteed to return 1.


A better name would be __WBUC2CC for well behaved unsigned char to char
conversion.
 
 
Spiros Bousbouras
Guest
Posts: n/a
 
      03-11-2011
On 10 Mar 2011 20:36:11 GMT
Angel <angel+> wrote:
> On 2011-03-10, Spiros Bousbouras <> wrote:
> > assigning but I guess it wasn't clear. What I had in mind was something
> > like:
> >
> > unsigned char arr[some_size] ;
> > int a ;
> >
> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
> >
> > Would there be any reason for arr to be something other than
> > unsigned char ?

>
> No, but you should use a cast there or your compiler might balk because
> unsigned char is likely to have fewer bits than int.


A cast wouldn't buy you anything in this case because according to
paragraph 2 of 6.5.16.1 a conversion will happen anyway.
 
 
Spiros Bousbouras
Guest
Posts: n/a
 
      03-12-2011
On Fri, 11 Mar 2011 18:35:10 -0500
Joe Wright <> wrote:

> Pardon me for jumping in so late. I got interested when someone earlier
> thought to store the EOF character. Of course the EOF is a status and need
> not be stored.


I don't recall anyone in the thread saying that.

> The return type of fgetc() is int so as to allow the full 0..255 value of a
> byte AND a value EOF.


A byte in C can have values greater than 255 depending on the
implementation.

> When you assign int to char, the char takes the lower
> eight bits of the int without change.


Where do you get this from ? In the OP I mentioned paragraph 3 of
6.3.1.3 .Here's what it says:

Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined
or an implementation-defined signal is raised.

And you do realise that a char is permitted to have more than 8 bits ,
yes ?

> Try this:
>
> #include <stdio.h>
> int main(void) {
>     char c;
>     unsigned char u;
>     int i = 240;
>     c = i;
>     u = c;
>     printf("%d, %d, %d\n", i, c, u);
>     return 0;
> }
>
> I get: 240, -16, 240 as I expected.


That is one data point among the hundreds or thousands of C
implementations. Even if a char always had 8 bits and even if the
assignment int to char was guaranteed to copy the lower 8 bits , the
middle number could still be -112 if the implementation uses "sign and
magnitude" to represent negative numbers.

> The value of fgetc() being int and being assigned to char is not a problem
> and not a 'defect' of the language.


If only.

--
A recent statistic has showed the every 10 minutes
someone somewhere is insulting Seamus MacRae.
 
 
Keith Thompson
Guest
Posts: n/a
 
      03-12-2011
Spiros Bousbouras <> writes:
> On Fri, 11 Mar 2011 13:08:00 -0800
> Tim Rentsch <> wrote:
>> Spiros Bousbouras <> writes:
>>
>> > If getc() read int's from files instead of unsigned char's would it be
>> > realistically possible that reading from a file would return a negative
>> > zero ?

>>
>> A call to getc() cannot return negative zero. The reason is,
>> getc() is defined in terms of fgetc(), which returns an
>> 'unsigned char' converted to an 'int', and such conversions
>> cannot produce negative zeros.

>
> When I said "getc() read int's from files" I meant that also fgetc()
> reads int's from files i.e. we're talking about an alternative C where
> we don't have the intermediate unsigned char step.


I'm afraid I'm not following you here.

I initially assumed you meant getc and fgetc would be reading
int-sized chunks from the file, rather than (as C currently
specifies) reading bytes, interpreting them as unsigned char,
and converting that to int.

Without the intermediate step, how is the int value determined?

Perhaps you mean getc and fgetc read a byte from the file, interpret
it as *plain* char, and then convert the result to int.

If so, and if plain char is signed and has a distinct representation
for negative zero (this excludes 2's-complement systems), then
could getc() return a negative zero?

I'd say no. Converting a negative zero from char to int does not
yield a negative zero int; 6.2.6.2p3 specifies the operations that
might generate a negative zero, and conversions aren't in the list.

Which means that getc() and fgetc() would be unable to distinguish
between a positive and negative zero in a byte read from a file.
Which is probably part of the reason why the standard specifies
that the value is treated as an unsigned char.

Or the standard could have said specifically that getc and fgetc do
return a negative zero in these cases, but dealing with that in code
would be nasty (and, since most current systems don't have negative
zeros, most programmers wouldn't bother).

(As I've said before, requiring plain char to be unsigned would
avoid a lot of this confusion, but might have other bad effects.)

--
Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
Keith Thompson
Guest
Posts: n/a
 
      03-12-2011
Joe Wright <> writes:
> On 3/11/2011 16:29, Tim Rentsch wrote:
>> Angel<angel+> writes:
>>
>>> [snip]
>>>
>>> UTF-8, as the name implies, is 8 bits wide and will fit in an unsigned
>>> char (it will fit in a signed char too,

>>
>> It will on most implementations but the Standard does not
>> require that.
>>
>>> but values>127 will be converted to negative values),

>>
>> Again true on most implementations but not Standard-guaranteed.

>
> I must be missing your point. What does UTF-8 have to do with the Standard?


Somebody upthread suggested that the plain char vs. unsigned char
mismatch isn't a problem, because ASCII characters are all in the
range 0-127. UTF-8 is one example of a character encoding where
bytes in a text file can have values exceeding 127. (Latin-1 and
EBCDIC are other examples.)

--
Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
Eric Sosman
Guest
Posts: n/a
 
      03-12-2011
On 3/11/2011 2:39 PM, Spiros Bousbouras wrote:
> On Thu, 10 Mar 2011 20:37:09 -0500
> Eric Sosman<> wrote:
>> On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
>>> If you are reading from a file by successively calling fgetc() is there
>>> any point in storing what you read in anything other than unsigned
>>> char ?

>>
>> Sure. To see one reason in action, try
>>
>> unsigned char uchar_password[SIZE];
>> ...
>> if (strcmp(uchar_password, "SuperSecret") == 0) ...

>
> Just to be clear , the only thing that can go wrong with this example
> is that strcmp() may try to convert the elements of uchar_password to
> char thereby causing the implementation defined behavior.


True: After issuing the required diagnostic, the implementation
may accept the faulty translation unit anyhow, and may assign it any
meaning it's inclined to, and that meaning may be implementation-
defined.

Alternatively, the implementation may issue the diagnostic and
spit the sorry source back in your face.

> The same
> issue could arise with any other str* function. Or is there something
> specific about your example that I'm missing ?


The required diagnostic, I think. 6.5.2.2p2, plus 6.3.2.3's
omission of any description of the necessary conversion.

>> Yes. This is, IMHO, a weakness in the library design, a weakness
>> inherited from the pre-Standard days that also gave us gets(). The
>> practical consequence is that the implementation must define the
>> behavior "usefully" in order to make the library work as desired.
>> (The situation is particularly bad for systems with signed-magnitude
>> or ones' complement notations, where the sign of zero is obliterated
>> on conversion to unsigned char and thus cannot be recovered again
>> after getc().)

>
> If getc() read int's from files instead of unsigned char's would it be
> realistically possible that reading from a file would return a negative
> zero ? That would be one strange file.


One strange text file, yes. But not so strange for a binary
file, where any bit pattern at all might appear. If a char that looks
like minus zero appears somewhere in the middle of a double, and you
fwrite() that double to a binary stream, the underlying fputc() calls
(a direct requirement; not even an "as if") convert each byte in turn
from unsigned char to int. I think the conversion allows the bits to
be diddled irreversibly -- although on reconsideration it may happen
only when sizeof(int)==1 as well.

>> In-band signaling works well in some situations -- NULL for a
>> failed malloc() or strchr() or getenv(), for example -- but C has
>> used it in situations where the benefits are not so clear. getc()
>> is one of those, strtoxxx() is another, and no doubt there are other
>> situations where the "error return" can be confused with a perfectly
>> valid value.

>
> I don't see how this can happen with getc().


When sizeof(int)==1, there will exist a perfectly valid unsigned
char value whose conversion to int yields EOF. (Or else there will
exist two or more distinct unsigned char values that convert to the
same int value, which is even worse and violates 7.19.2p3.) So
checking the value of getc() against EOF isn't quite enough: Having
found EOF, you also need to call feof() and ferror() before concluding
that it's "condition" rather than "data." More information is being
forced through the return-value channel than the unaided channel
can accommodate.

--
Eric Sosman
 
 
Eric Sosman
Guest
Posts: n/a
 
      03-12-2011
On 3/11/2011 4:55 PM, Spiros Bousbouras wrote:
> [...]
> Ok , I guess it could happen. But then I have a different objection. Eric said
>
> (The situation is particularly bad for systems with
> signed-magnitude or ones' complement notations, where the
> sign of zero is obliterated on conversion to unsigned char
> and thus cannot be recovered again after getc().)
>
> It seems to me that an implementation can easily ensure that the sign
> of zero does not get obliterated. If by using fgetc() an unsigned char
> gets the bit pattern which corresponds to negative zero then the
> implementation can assign the negative zero when converting to int .
> The standard allows this.


Could you indicate where? I'm looking at 6.2.6.2p3, which lists
the operations that can generate a minus zero, and does not list
"conversion" among them.

--
Eric Sosman
 
 
 
 