Re: position of \0 while using fgets()
On 1/2/2011 10:11 AM, Cross wrote:
> If I have a file containing "I am a boy.\r\n" and I am using fgets() to
> read the file, what shall be the contents of the string:
>
> 1. "I am a boy.\r\n\0"
> 2. "I am a boy.\r\0"
> 3. "I am a boy.\n\0"
> 4. "I am a boy."
>
> I think it shall depend upon the definition of the EOF character on the
> platform used.

There is no "EOF character" in C or its library. There is an EOF macro that yields a negative value, and some C functions return EOF to indicate that an end-of-input or error condition prevents them from reading characters. But EOF is not a character; in fact, its value is chosen so that it can be easily distinguished from actual character values (on most platforms).

But you probably meant "EOL character," and here the situation is a little different. C's libraries use the convention that '\n' marks the end of a line -- but that's the "internal" representation, and has no necessary relation to how end-of-line is represented by the file system(s) the program communicates with. Unix-derived systems usually record an explicit '\n' as a data character in the file; some others use the character pair '\r' '\n', some use '\n' '\r'. Some use other methods entirely, like preceding the line's payload with a count.

Whatever scheme the host system employs for dividing a file into lines, it is the job of C's I/O functions to translate between that scheme and C's internal representation of '\n'-terminated lines. So if you read a text file that's properly formed according to the host system's conventions, your C program will see a '\n' at the end of each line (except possibly the very last). That '\n' may or may not resemble what's actually recorded in the file; it's just the library function telling you "The line ended here" without telling you how it discovered that fact. So, back to your question.
If we take "I am a boy.\r\n" to mean that the file as recorded contains those thirteen characters (I'm assuming you don't intend these quoted strings as C source literals, which would automatically append a '\0' as a fourteenth character), then the significance, if any, of the '\r' and '\n' is whatever the host system says it is. If the host system uses the '\r' '\n' pair as an end-of-line marker, then your case (3) is the right one: The pair marks the preceding eleven characters as the line's content, the library synthesizes a '\n', and fgets() always writes a '\0'. If the host system uses a single '\n' to mark end-of-line, then case (1) is right: The first twelve characters are the content, the library appends an '\n', and fgets() appends a '\0'. If the host system uses some other means of separating lines -- well, you haven't told us what it is, so we can't tell what C will make of it.

Case (2) could only be right if you used a short buffer. If there's room, fgets() will always read an entire line and place '\n' at its end (except possibly at the end of the whole file, and in your example we know that more data follows the '\r').

Case (4) cannot be right, ever, because fgets() always writes a '\0' at the end of the string, whether that string represents a complete or a partial line. (Note again that I'm assuming your quoted strings do not indicate C source literals.)

--
Eric Sosman
esosman@ieee-dot-org.invalid
Re: position of \0 while using fgets()
Eric Sosman <esosman@ieee-dot-org.invalid> writes:
> On 1/2/2011 10:11 AM, Cross wrote:
>>
>> If I have a file containing "I am a boy.\r\n" and I am using fgets() to
>> read the file, what shall be the contents of the string:
>>
>> 1. "I am a boy.\r\n\0"
>> 2. "I am a boy.\r\0"
>> 3. "I am a boy.\n\0"
>> 4. "I am a boy."
<snip>
> So, back to your question. If we take "I am a boy.\r\n" to mean
> that the file as recorded contains those thirteen characters (I'm
> assuming you don't intend these quoted strings as C source literals,
> which would automatically append a '\0' as a fourteenth character),
> then the significance, if any, of the '\r' and '\n' is whatever the
> host system says it is. If the host system uses the '\r' '\n' pair as
> an end-of-line marker, then your case (3) is the right one: The pair
> marks the preceding eleven characters as the line's content, the
> library synthesizes a '\n', and fgets() always writes a '\0'.

It's probably also worth noting that case (3) is the right one if the
system uses '\r' as the end-of-line marker.

> If the
> host system uses a single '\n' to mark end-of-line, then case (1) is
> right: The first twelve characters are the content, the library appends
> an '\n', and fgets() appends a '\0'. If the host system uses some other
> means of separating lines -- well, you haven't told us what it is, so
> we can't tell what C will make of it.
>
> Case (2) could only be right if you used a short buffer. If there's
> room, fgets() will always read an entire line and place '\n' at its end
> (except possibly at the end of the whole file, and in your example we
> know that more data follows the '\r').
>
> Case (4) cannot be right, ever, because fgets() always writes a '\0'
> at the end of the string, whether that string represents a complete or
> a partial line. (Note again that I'm assuming your quoted strings do
> not indicate C source literals.)
Very much a nit (and I would not have picked it if I weren't already posting), but if there is a read error, the buffer contents are indeterminate, so all these cases (and lots of others) become possible.

--
Ben.