fgetc - end of line - where is 0xD?

 
 
Keith Thompson
 
      12-07-2008
Sri Harsha Dandibhotla <(E-Mail Removed)> writes:
[...]
> He meant to test for -1 and not '-1'.
> Though, he should rather test for EOF instead.
>
> I have read that EOF doesn't always have the value of -1. Can someone
> please list a few implementations where the value differs from -1?


I don't know of any, and it's entirely possible that there are no C
implementations where EOF has a value other than -1.

Nevertheless, you should never write -1 where EOF would be
appropriate. For one thing, your code could break if some future
implementation, or some present implementation I don't know about,
uses a value other than -1 for EOF (which would be perfectly legal).
For another, writing EOF rather than -1 makes your intent much clearer
to anyone reading your code.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
 
 
 
Keith Thompson
 
      12-07-2008
Harald van Dijk <(E-Mail Removed)> writes:
> On Sat, 06 Dec 2008 20:25:19 -0500, CBFalconer wrote:
>> Sri Harsha Dandibhotla wrote:
>>>

>> ... snip ...
>>>
>>> I have read that EOF doesn't always have the value of -1. Can someone
>>> please list a few implementations where the value differs from -1?

>>
>> No. That is why you should always use the macro EOF, which is defined
>> in the standard includes.

>
> Non sequitur. If every implementation in the world defines EOF as -1,
> there is little benefit in using the macro. If some implementation gives
> it a different value, you have a definite need to use the macro for your
> code to work.


Yes, there is a benefit: clarity.

0 is a valid null pointer constant on every C implementation in the
world, but I still prefer to use NULL.

 
 
 
 
 
Keith Thompson
 
      12-07-2008
"Jujitsu Lizard" <(E-Mail Removed)> writes:
> "Zero" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>>
>> When I open this file in binary code,
>> the end of the first line is 0xD 0xA.

>
> We're talking Windows here. Unix ends lines with a 0xA only.
>
> The safest approach (for portability given the universe of two
> conventions) is probably to open every file in binary mode, then have
> your code contain an automaton that treats 13-10 and 10 the same way.
> I believe a common approach is to consider only the 10's.

[...]

No, the safest approach is to open text files in text mode, so you
don't have to worry about how line endings are represented. That's
what text mode is for.

(If you have to deal with text files in a format not native to the
operating system you're running on, that's a different matter. If
possible, the best approach is usually to convert such files to native
format.)

 
 
Harald van Dijk
 
      12-07-2008
On Sat, 06 Dec 2008 23:52:23 -0800, Keith Thompson wrote:
> Harald van Dijk <(E-Mail Removed)> writes:
>> On Sat, 06 Dec 2008 20:25:19 -0500, CBFalconer wrote:
>>> Sri Harsha Dandibhotla wrote:
>>>>
>>> ... snip ...
>>>>
>>>> I have read that EOF doesn't always have the value of -1. Can someone
>>>> please list a few implementations where the value differs from -1?
>>>
>>> No. That is why you should always use the macro EOF, which is defined
>>> in the standard includes.

>>
>> Non sequitur. If every implementation in the world defines EOF as -1,
>> there is little benefit in using the macro. If some implementation
>> gives it a different value, you have a definite need to use the macro
>> for your code to work.


To clarify, by "little" I did not mean "no", I meant "significantly
smaller".

> Yes, there is a benefit: clarity.


if ((c = getchar()) == -1)

seems almost equally straightforward to me, given that all successful
results are nonnegative. If you include non-standard functions, there are
plenty more that return a fixed negative value to indicate an error.

I do agree that EOF is more readable, but I think it's a relatively small
point when compared to a concrete implementation where EOF != -1.

> 0 is a valid null pointer constant on every C implementation in the
> world, but I still prefer to use NULL.


But I imagine you have no problems reading code by others that uses 0 to
initialise pointers. If so, here too the benefit is there, but it is not
great (to me).
 
 
Harald van Dijk
 
      12-07-2008
On Sun, 07 Dec 2008 08:44:28 +0000, Richard Heathfield wrote:
> Keith Thompson said:
>> Harald van Dijk <(E-Mail Removed)> writes:
>>> [...] If every implementation in the world defines EOF as -1,

>> [...]

>[...]
> I conclude that either the ANSI C Committee were fruitcakes or there
> really were portability concerns with -1.


Well, I don't know if there are, but according to K&R, there were. It
describes two common conventions: end of file is indicated by -1, or by 0.
The latter was later disallowed by ANSI C, and I have no idea if those
implementations that used it have been changed, and if so, what value for
EOF they have changed to.
 
 
Bartc
 
      12-07-2008
Keith Thompson wrote:
> "Jujitsu Lizard" <(E-Mail Removed)> writes:
>> "Zero" <(E-Mail Removed)> wrote in message
>> news:(E-Mail Removed)...
>>>
>>> When I open this file in binary code,
>>> the end of the first line is 0xD 0xA.

>>
>> We're talking Windows here. Unix ends lines with a 0xA only.
>>
>> The safest approach (for portability given the universe of two
>> conventions) is probably to open every file in binary mode, then have
>> your code contain an automaton that treats 13-10 and 10 the same way.
>> I believe a common approach is to consider only the 10's.

> [...]
>
> No, the safest approach is to open text files in text mode, so you
> don't have to worry about how line endings are represented. That's
> what text mode is for.
>
> (If you have to deal with text files in a format not native to the
> operating system you're running on, that's a different matter. If
> possible, the best approach is usually to convert such files to native
> format.)


This is exactly the problem. C's text mode /assumes/ a native format, and
might go wrong on anything else. In that case you might as well work in
binary and sort out the CR/LF combinations yourself.

(Possibly related: if I execute printf("Hello World\n") under Windows, and
redirect the output to a file, as in hello >output, I get CR CR LF at the
end. I've forgotten the reason for this; anyone known why?)

--
Bartc

 
 
Ben Bacarisse
 
      12-07-2008
"Bartc" <(E-Mail Removed)> writes:

> Keith Thompson wrote:
>> "Jujitsu Lizard" <(E-Mail Removed)> writes:
>>> "Zero" <(E-Mail Removed)> wrote in message
>>> news:(E-Mail Removed)...
>>>>
>>>> When I open this file in binary code,
>>>> the end of the first line is 0xD 0xA.
>>>
>>> We're talking Windows here. Unix ends lines with a 0xA only.
>>>
>>> The safest approach (for portability given the universe of two
>>> conventions) is probably to open every file in binary mode, then have
>>> your code contain an automaton that treats 13-10 and 10 the same way.
>>> I believe a common approach is to consider only the 10's.

>> [...]
>>
>> No, the safest approach is to open text files in text mode, so you
>> don't have to worry about how line endings are represented. That's
>> what text mode is for.
>>
>> (If you have to deal with text files in a format not native to the
>> operating system you're running on, that's a different matter. If
>> possible, the best approach is usually to convert such files to native
>> format.)

>
> This is exactly the problem. C's text mode /assumes/ a native format,
> and might go wrong on anything else. In that case you might as well
> work in binary and sort out the CR/LF combinations yourself.


If you have to deal with files from various systems you simply have a
general program design problem. Every choice you make will involve a
set of compromises between convenience for you and your users and the
formats that your program can handle. There is very little general
advice one can give.

In the dark ages, this was less of a problem. There were so many
kinds of file that any software that moved data between systems had to
know what to do with them all. You could move a file from a
record-oriented EBCDIC machine to a Unix one and the right things
would be done. The problem you see is partly caused by the similarity
between formats, rather than the differences, and partly by the fact
that the data gets moved between systems without regard to the data's
"type".

> (Possibly related: if I execute printf("Hello World\n") under Windows,
> and redirect the output to a file, as in hello >output, I get CR CR LF
> at the end. I've forgotten the reason for this; anyone know why?)


Name and shame the compiler and (more likely "or") the library. It
helps to know what to avoid. I've not seen that behaviour and I would
want to avoid it as far as possible.

--
Ben.
 
 
Bartc
 
      12-07-2008
"Ben Bacarisse" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> "Bartc" <(E-Mail Removed)> writes:


>> (Possibly related: if I execute printf("Hello World\n") under Windows,
>> and redirect the output to a file, as in hello >output, I get CR CR LF
>> at the end. I've forgotten the reason for this; anyone know why?)

>
> Name and shame the compiler and (more likely "or") the library. It
> helps to know what to avoid. I've not seen that behaviour and I would
> want to avoid it as far as possible.


I've just remembered the reason: I was calling C's printf() from a language
that expanded "\n" to CR,LF actually in the string literal.

Because printf writes to stdout and stdout is in text mode, the LF results
in an extra expansion. But the CR,CR,LF is only seen when directed to a
file.

So not a C problem other than stdout being awkward to set to binary mode.

--
Bartc

 
 
nick_keighley_nospam@hotmail.com
 
      12-07-2008
On Dec 7, 8:44 am, Richard Heathfield <(E-Mail Removed)> wrote:
> Keith Thompson said:
> > Harald van Dijk <(E-Mail Removed)> writes:


> >> [...] If every implementation in the world defines EOF as -1,

>
> [all: note the conditional - Harald is not making this claim, merely
> reasoning about it. I don't know (or care) whether the claim is true.]
>
> >> there is little benefit in using the macro. If some implementation gives
> >> it a different value, you have a definite need to use the macro for your
> >> code to work.

>
> > Yes, there is a benefit: clarity.

>
> Yes. He said "little benefit", not "no benefit". If EOF did not exist (as
> this sense that we know and love so well), it would hardly be necessary to
> invent it unless there really were portability issues with -1.


I disagree; I think the clarity point is important.

> The reason
> you give is a tiny reason. Lots of Unix people hard-code -1s into their
> code knowing full well that they will be understood as failure tests by
> lots of other Unix people.


a bad idea I think

> I conclude that either the ANSI C Committee were fruitcakes or there really
> were portability concerns with -1.
>
> > 0 is a valid null pointer constant on every C implementation in the
> > world, but I still prefer to use NULL.

>
> Yes. It's a small thing, though - if NULL didn't exist and everybody used
> 0, we'd all know what it meant, right?


but again it would be a bad idea. After all, we *know* that 0 will
work on all implementations, but many programmers (including me)
use the NULL macro.

The main reason to use macros like these is semantic clarity.
There are two lesser reasons (or beneficial side effects).

1. If a value changes, it can be changed in one place.
2. If a value changes, you don't have to worry that
a global substitution will change unexpected values:

#define MAX_BASE_STATIONS 9
#define RESET_COMMAND 9
#define HEADER_SIZE 9
#define SEEK_FIELD_OFFSET -1


--
Nick Keighley

"Initialize constants with DATA statements or INITIAL attributes;
initialize variables with executable code."
Kernighan and Plauger "The Elements of Programming Style"
 
 
James Kuyper
 
      12-07-2008
Bartc wrote:
....
> This is exactly the problem. C's text mode /assumes/ a native format,
> and might go wrong on anything else. In that case you might as well work
> in binary and sort out the CR/LF combinations yourself.


If there were only a few possible choices, that would make sense. But
what about, for instance, files from systems where end-of-line is
indicated by padding to a fixed block length with '\0'? That's just
one of several real-world options that involve neither CR nor LF.
 
 
 
 