Velocity Reviews

Eric Sosman 01-02-2011 03:46 PM

Re: position of \0 while using fgets()
 
On 1/2/2011 10:11 AM, Cross wrote:
>
> If I have a file containing "I am a boy.\r\n" and I am using fgets() to
> read the file, what shall be the contents of the string:
>
> 1. "I am a boy.\r\n\0"
> 2. "I am a boy.\r\0"
> 3. "I am a boy.\n\0"
> 4. "I am a boy."
>
> I think it shall depend upon the definition of the EOF character on the
> platform used.


There is no "EOF character" in C or its library. There is an EOF
macro that yields a negative value, and some C functions return EOF to
indicate that an end-of-input or error condition prevents them from
reading characters. But EOF is not a character; in fact, its value is
chosen so that it can be easily distinguished from actual character
values (on most platforms).
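A minimal C sketch of what this means in practice. The result of getc() has to be kept in an int, because it is either a character value (an unsigned char converted to int) or the negative EOF. The function name count_chars is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: count the characters remaining in a stream.
 * c must be an int, not a char: getc() returns each character as an
 * unsigned char converted to int, and returns the negative value EOF
 * only when it cannot deliver a character. */
long count_chars(FILE *fp)
{
    long n = 0;
    int c;

    while ((c = getc(fp)) != EOF)
        n++;

    /* EOF alone does not say why reading stopped; ask the stream. */
    if (ferror(fp))
        return -1;
    return n;
}
```

If c were declared as plain char on a platform where char is signed, a 0xFF byte in the input would typically convert to -1, compare equal to EOF (commonly -1), and end the loop early.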

But you probably meant "EOL character," and here the situation is
a little different. C's libraries use the convention that '\n' marks
the end of a line -- but that's the "internal" representation, and has
no necessary relation to how end-of-line is represented by the file
system(s) the program communicates with. Unix-derived systems usually
record an explicit '\n' as a data character in the file; some others
use the character pair '\r' '\n', some use '\n' '\r'. Some use other
methods entirely, like preceding the line's payload with a count.
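To see which convention a particular file actually uses, one option is to read it through a stream opened in binary mode ("rb"), where no translation happens, and look at the raw bytes. The sketch below is a hypothetical heuristic (detect_eol and enum eol are invented names): it classifies only the first '\r' or '\n' it meets, and it knows nothing about count-prefixed record formats.

```c
#include <stddef.h>

/* Hypothetical classification of end-of-line conventions. */
enum eol { EOL_NONE, EOL_LF, EOL_CRLF, EOL_LFCR, EOL_CR };

/* Heuristic: guess the convention used by n raw bytes (for example a
 * block read with fread() from a stream opened with "rb") by looking
 * at the first '\r' or '\n' and the byte that follows it. */
enum eol detect_eol(const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int has_next = i + 1 < n;

        if (buf[i] == '\r')
            return (has_next && buf[i + 1] == '\n') ? EOL_CRLF : EOL_CR;
        if (buf[i] == '\n')
            return (has_next && buf[i + 1] == '\r') ? EOL_LFCR : EOL_LF;
    }
    return EOL_NONE;   /* no terminator seen (or not a line-oriented file) */
}
```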

Whatever scheme the host system employs for dividing a file into
lines, it is the job of C's I/O functions to translate between that
scheme and C's internal representation of '\n'-terminated lines. So
if you read a text file that's properly formed according to the host
system's conventions, your C program will see a '\n' at the end of each
line (except possibly the very last). That '\n' may or may not resemble
what's actually recorded in the file; it's just the library function
telling you "The line ended here" without telling you how it discovered
that fact.
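A sketch of a line-counting loop written against that contract, assuming only standard fgets() behavior (count_lines is a hypothetical name). A line longer than the buffer arrives in several pieces, and only the piece ending in '\n' finishes it; a last line without '\n' is still counted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count lines in a text stream with fgets(). */
long count_lines(FILE *fp)
{
    char buf[8];            /* deliberately small, so long lines split */
    long lines = 0;
    int partial = 0;        /* nonzero: a line has started but not ended */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);

        if (len > 0 && buf[len - 1] == '\n') {
            lines++;        /* this piece finished a line */
            partial = 0;
        } else {
            partial = 1;    /* buffer filled, or last line lacks '\n' */
        }
    }
    if (ferror(fp))
        return -1;
    return lines + partial; /* count a final unterminated line, if any */
}
```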

So, back to your question. If we take "I am a boy.\r\n" to mean
that the file as recorded contains those thirteen characters (I'm
assuming you don't intend these quoted strings as C source literals,
which would automatically append a '\0' as a fourteenth character),
then the significance, if any, of the '\r' and '\n' is whatever the
host system says it is. If the host system uses the '\r' '\n' pair as
an end-of-line marker, then your case (3) is the right one: The pair
marks the preceding eleven characters as the line's content, the
library synthesizes a '\n', and fgets() always writes a '\0'. If the
host system uses a single '\n' to mark end-of-line, then case (1) is
right: The first twelve characters are the content, the library appends
an '\n', and fgets() appends a '\0'. If the host system uses some other
means of separating lines -- well, you haven't told us what it is, so
we can't tell what C will make of it.
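One way to observe case (1) directly, as a sketch: tmpfile() opens its file in binary mode ("wb+"), and binary streams do no end-of-line translation, so reading the thirteen bytes back behaves as on a host that ends lines with a single '\n'. read_sample_line is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo: store the thirteen bytes "I am a boy.\r\n" in a
 * binary-mode temporary file and read them back with fgets().  Returns
 * the number of characters before the '\0', or -1 on failure. */
long read_sample_line(char *buf, int size)
{
    static const char bytes[] = "I am a boy.\r\n";
    FILE *fp = tmpfile();
    long len = -1;

    if (fp == NULL)
        return -1;
    if (fwrite(bytes, 1, sizeof bytes - 1, fp) == sizeof bytes - 1) {
        rewind(fp);   /* reposition before switching from writing to reading */
        if (fgets(buf, size, fp) != NULL)
            len = (long)strlen(buf);
    }
    fclose(fp);
    return len;
}
```

With a large enough buffer this yields "I am a boy.\r\n\0", which is case (1). On a host whose C library translates '\r' '\n' to '\n' for text streams (the Microsoft C runtime does this for files opened in text mode), reading the same bytes through a stream opened with "r" would be expected to give case (3) instead.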

Case (2) could only be right if you used a short buffer. If there's
room, fgets() will always read an entire line and place '\n' at its end
(except possibly at the end of the whole file, and in your example we
know that more data follows the '\r').
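A sketch of the short-buffer case, again using a binary-mode tmpfile() so no translation is involved (read_in_pieces is a hypothetical name). fgets(buf, n, fp) stores at most n - 1 characters before the '\0', so with n = 13 the first call stops right after the '\r', producing case (2), and the '\n' is left for the next call.

```c
#include <stdio.h>

/* Hypothetical demo: read "I am a boy.\r\n" with a 13-byte buffer.
 * Returns nonzero if both fgets() calls succeed. */
int read_in_pieces(char first[13], char second[13])
{
    FILE *fp = tmpfile();
    int ok = 0;

    if (fp == NULL)
        return 0;
    if (fputs("I am a boy.\r\n", fp) != EOF) {
        rewind(fp);
        ok = fgets(first, 13, fp) != NULL    /* "I am a boy.\r" */
          && fgets(second, 13, fp) != NULL;  /* "\n" */
    }
    fclose(fp);
    return ok;
}
```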

Case (4) cannot be right, ever, because fgets() always writes a '\0'
at the end of the string, whether that string represents a complete or
a partial line. (Note again that I'm assuming your quoted strings do
not indicate C source literals.)
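A sketch that checks the terminator claim (terminated_after_fgets is a hypothetical name). The buffer is pre-filled with 'X' so that a missing '\0' would be detectable, and both a partial line and a final line without '\n' can be tried.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo: write data to a temporary file, read it back with
 * one fgets() call, and report whether the call succeeded and left a
 * '\0' somewhere in the buffer. */
int terminated_after_fgets(const char *data, int size, char *buf)
{
    FILE *fp = tmpfile();
    char *r = NULL;

    if (fp == NULL)
        return 0;
    memset(buf, 'X', (size_t)size);   /* no '\0' unless fgets() writes one */
    if (fputs(data, fp) != EOF) {
        rewind(fp);
        r = fgets(buf, size, fp);
    }
    fclose(fp);
    return r != NULL && memchr(buf, '\0', (size_t)size) != NULL;
}
```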

--
Eric Sosman
esosman@ieee-dot-org.invalid

Ben Bacarisse 01-02-2011 04:35 PM

Re: position of \0 while using fgets()
 
Eric Sosman <esosman@ieee-dot-org.invalid> writes:

> On 1/2/2011 10:11 AM, Cross wrote:
>>
>> If I have a file containing "I am a boy.\r\n" and I am using fgets() to
>> read the file, what shall be the contents of the string:
>>
>> 1. "I am a boy.\r\n\0"
>> 2. "I am a boy.\r\0"
>> 3. "I am a boy.\n\0"
>> 4. "I am a boy."


<snip>
> So, back to your question. If we take "I am a boy.\r\n" to mean
> that the file as recorded contains those thirteen characters (I'm
> assuming you don't intend these quoted strings as C source literals,
> which would automatically append a '\0' as a fourteenth character),
> then the significance, if any, of the '\r' and '\n' is whatever the
> host system says it is. If the host system uses the '\r' '\n' pair as
> an end-of-line marker, then your case (3) is the right one: The pair
> marks the preceding eleven characters as the line's content, the
> library synthesizes a '\n', and fgets() always writes a '\0'.


It's probably also worth noting that case (3) is the right one if the
system uses '\r' as the end-of-line marker.
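Whichever of cases (1) and (3) a given host produces, a program that only wants the line's content can normalize it. A hypothetical helper, strip_eol, sketched under the assumption that a '\r' at the very end of the line is never meaningful data:

```c
#include <string.h>

/* Hypothetical helper: remove the line terminator fgets() left in s.
 * Strips a trailing '\n', then one trailing '\r', so "I am a boy.\r\n"
 * (case 1) and "I am a boy.\n" (case 3) both become "I am a boy.".
 * Returns the new length. */
size_t strip_eol(char *s)
{
    size_t len = strlen(s);

    if (len > 0 && s[len - 1] == '\n')
        s[--len] = '\0';
    if (len > 0 && s[len - 1] == '\r')
        s[--len] = '\0';
    return len;
}
```

It also removes a lone trailing '\r', so a piece cut off by a short buffer, as in case (2), loses its '\r' too; whether that is acceptable depends on the data.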

> If the
> host system uses a single '\n' to mark end-of-line, then case (1) is
> right: The first twelve characters are the content, the library appends
> an '\n', and fgets() appends a '\0'. If the host system uses some other
> means of separating lines -- well, you haven't told us what it is, so
> we can't tell what C will make of it.
>
> Case (2) could only be right if you used a short buffer. If there's
> room, fgets() will always read an entire line and place '\n' at its end
> (except possibly at the end of the whole file, and in your example we
> know that more data follows the '\r').
>
> Case (4) cannot be right, ever, because fgets() always writes a '\0'
> at the end of the string, whether that string represents a complete or
> a partial line. (Note again that I'm assuming your quoted strings do
> not indicate C source literals.)


Very much a nit (and I would not have picked it if I weren't already
posting), but if there is a read error, the buffer contents are
indeterminate, so all these cases (and lots of others) become possible.
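In practice this means checking the return value of fgets() before looking at the buffer. A sketch (read_line_checked is a hypothetical name) that uses ferror() to tell a read error from an ordinary end of file:

```c
#include <stdio.h>

/* Hypothetical wrapper: returns 1 if buf holds a line (or a piece of
 * one), 0 at a clean end of file, -1 on a read error.  After a NULL
 * from fgets() the buffer must not be used: if end of file came before
 * any character was read it is unchanged, and after a read error its
 * contents are indeterminate. */
int read_line_checked(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) != NULL)
        return 1;
    return ferror(fp) ? -1 : 0;
}
```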

--
Ben.


All times are GMT.