fgetc - end of line - where is 0xD?

 
 
Zero
 
      12-06-2008
Hi there,

I have the following file:

--------------------
Hello
world
--------------------

When I open this file in binary code,
the end of the first line is 0xD 0xA.

When I read this file with fgetc like
while( (c = fgetc(pFilePointer)) != '-1')
{
printf("\n%d", c);
}

I only get the 0xA at the end of line, not 0xD.

Does anybody know what is happening?

Zeh Mau
 
Richard Tobin
 
      12-06-2008
In article <(E-Mail Removed)>,
Zero <(E-Mail Removed)> wrote:

>When I open this file in binary code,
>the end of the first line is 0xD 0xA.


You're probably using a Microsoft operating system. If you were using
Unix, you'd see a single 0xA byte (a linefeed character). If you were
using an old Mac operating system, you'd see a single 0xD byte (a
carriage return). If you were using some ancient mainframe system,
you'd see lots of nulls padding it to 80 characters.

>When I read this file with fgetc like
>while( (c = fgetc(pFilePointer)) != '-1')
>{
> printf("\n%d", c);
>}
>
>I only get the 0xA at the end of line, not 0xD.


It would be inconvenient if you had to know how lines end on every
different operating system that your program might run on, so C
converts line ends to a single linefeed character.

If you want to see the actual bytes in the file, open it in binary mode.

-- Richard
--
Please remember to mention me / in tapes you leave behind.
 
Zero
 
      12-06-2008
> You're probably using a Microsoft operating system.

Yes, I am.

> so C converts line ends to a single linefeed character.

Does that mean the 0xD is there but the fgetc function simply ignores
it?
As I said, when I view the file in binary, both 0xD and 0xA are shown.

Zeh Mau
 
Sri Harsha Dandibhotla
 
      12-06-2008
On Dec 6, 2:17 pm, Zero <(E-Mail Removed)> wrote:
> Hi there,
>
> I have the following file:
>
> --------------------
> Hello
> world
> --------------------
>
> When I open this file in binary code,
> the end of the first line is 0xD 0xA.
>
> When I read this file with fgetc like
> while( (c = fgetc(pFilePointer)) != '-1')
> {
>    printf("\n%d", c);
> }
>
> I only get the 0xA at the end of line, not 0xD.
>
> Does anybody know, what happens?
>
> Zeh Mau



This is what I found on Wikipedia (http://en.wikipedia.org/wiki/Newline...ming_languages).

See point 2:
When writing a file in text mode, '\n' is transparently translated to
the native newline sequence used by the system, which may be longer
than one character. (Note that a C implementation is allowed to not
store newline characters in files. For example, the lines of a text
file could be stored as rows of a SQL table or as fixed-length
records.) When reading in text mode, the native newline sequence is
translated back to '\n'. In binary mode, the second mode of I/O
supported by the C library, no translation is performed, and the
internal representation of any escape sequence is output directly.

The native newline sequence on Windows is \r\n, which binary mode
passes through as is, without converting it to a single \n character.
 
viza
 
      12-06-2008
On Sat, 06 Dec 2008 01:35:24 -0800, Zero wrote:

>> You're probably using a Microsoft operating system.

> Yes I do.
>
>> so C converts line ends to a single linefeed character.

> Does it mean, the 0xD is there but the fgetc-functions simply ignores
> it?
> As I said, in binary code, both 0xD and 0xA are shown.


In practice, it ignores it when it comes at the end of a line (before a
0xa), and it shouldn't appear elsewhere. In theory, the input file on
disk is converted into an abstract series of lines, and then the
lines are separated by newline characters, and in US-ASCII a newline is
0xa.
 
Martin Ambuhl
 
      12-06-2008
Zero wrote, asking a frequently asked question (FAQ) about end of lines.

Using the two-line input file containing
Hello
world
we run the following.
Notice the difference between reading in text mode ("r"), which sees
each system-specific end-of-line marker as a single newline character,
and reading in binary mode ("rb"), which sees the actual bytes:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main(void)
{
    int c;
    FILE *f;
    const char fname[] = "inputdata";

    printf("Opening \"%s\" for input in text mode.\n", fname);
    if (!(f = fopen(fname, "r"))) {
        fputs("fopen failed. Quitting.\n", stderr);
        exit(EXIT_FAILURE);
    }
    printf("The characters read from the file are (in text mode):\n");
    while ((c = fgetc(f)) != EOF) {
        printf("%#04x %#05o %03d ", (unsigned) c, (unsigned) c, c);
        if (iscntrl(c))
            printf(" (a control character)\n");
        else if (isspace(c))
            printf(" (whitespace)\n");
        else if (!isgraph(c))
            printf(" (other non-graphic)\n");
        else
            printf("'%c'\n", c);
    }

    putchar('\n');

    printf("Reopening \"%s\" for input in binary mode.\n", fname);
    if (!(f = freopen(fname, "rb", f))) {
        fputs("freopen failed. Quitting.\n", stderr);
        exit(EXIT_FAILURE);
    }
    printf("The characters read from the file are (in binary mode):\n");
    while ((c = fgetc(f)) != EOF) {
        printf("%#04x %#05o %03d ", (unsigned) c, (unsigned) c, c);
        if (iscntrl(c))
            printf(" (a control character)\n");
        else if (isspace(c))
            printf(" (whitespace)\n");
        else if (!isgraph(c))
            printf(" (other non-graphic)\n");
        else
            printf("'%c'\n", c);
    }
    fclose(f);
    return 0;
}

[output on a Windows system]
Opening "inputdata" for input in text mode.
The characters read from the file are (in text mode):
0x48 00110 072 'H'
0x65 00145 101 'e'
0x6c 00154 108 'l'
0x6c 00154 108 'l'
0x6f 00157 111 'o'
0x0a 00012 010 (a control character)
0x57 00127 087 'W'
0x6f 00157 111 'o'
0x72 00162 114 'r'
0x6c 00154 108 'l'
0x64 00144 100 'd'
0x0a 00012 010 (a control character)

Reopening "inputdata" for input in binary mode.
The characters read from the file are (in binary mode):
0x48 00110 072 'H'
0x65 00145 101 'e'
0x6c 00154 108 'l'
0x6c 00154 108 'l'
0x6f 00157 111 'o'
0x0d 00015 013 (a control character)
0x0a 00012 010 (a control character)
0x57 00127 087 'W'
0x6f 00157 111 'o'
0x72 00162 114 'r'
0x6c 00154 108 'l'
0x64 00144 100 'd'
0x0d 00015 013 (a control character)
0x0a 00012 010 (a control character)
 
Sri Harsha Dandibhotla
 
      12-06-2008
On Dec 6, 11:13 pm, (E-Mail Removed) (blargg) wrote:
> Zero wrote:
> > I have the following file:
> >
> > --------------------
> > Hello
> > world
> > --------------------
> >
> > When I open this file in binary code,
> > the end of the first line is 0xD 0xA.
> >
> > When I read this file with fgetc like
> > while( (c = fgetc(pFilePointer)) != '-1')
> > {
> >    printf("\n%d", c);
> > }
>
> Why are you comparing the result of fgetc with the multi-character literal
> '-1'? I'm surprised that loop ever terminates. Actually, I imagine the
> real answer is that the above code is NOT the actual code you ran.


He meant to test for -1, not '-1'.
Even so, he should test for EOF rather than -1.

I have read that EOF doesn't always have the value of -1. Can someone
please list a few implementations where the value differs from -1?
Thanks
 
George
 
      12-06-2008
On Sat, 06 Dec 2008 05:56:29 -0500, Martin Ambuhl wrote:

> ... snip ...


Many of Martin's posts are short, error-free, legible programs one can copy
and adapt easily. I get the same output he does for a different data set
on the same platform:

Opening "george.txt" for input in text mode.
The characters read from the file are (in text mode):
0x31 00061 049 '1'
0x20 00040 032 (whitespace)
0x20 00040 032 (whitespace)
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x31 00061 049 '1'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x31 00061 049 '1'
0x0a 00012 010 (a control character)
....
Reopening "george.txt" for input in binary mode.
The characters read from the file are (in binary mode):
0x31 00061 049 '1'
0x20 00040 032 (whitespace)
0x20 00040 032 (whitespace)
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x31 00061 049 '1'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x31 00061 049 '1'
0x0d 00015 013 (a control character)
0x0a 00012 010 (a control character)

The tool that I have found very helpful for this type of work is od.exe
found here:

http://downloads.sourceforge.net/unxutils/UnxUtils.zip

Copy the .exe to a convenient directory and invoke it using the batch file
dump.bat. Dump.bat contains:

od -tx1 -Ax -v %1

-t == how to display data
x1 == one hex byte
-A == how to display address (offset from start of file)
x == hex
-v == show all data, including runs of duplicates

%1 first argument to .bat file

For example, if I have a file "chars.dat", then the appropriate command is:

C:\Users\epc\temp>dump chars.dat
--
George

When you turn your heart and your life over to Christ, when you accept
Christ as the savior, it changes your heart.
George W. Bush

Picture of the Day http://apod.nasa.gov/apod/
 
CBFalconer
 
      12-07-2008
Sri Harsha Dandibhotla wrote:
>

.... snip ...
>
> I have read that EOF doesn't always have the value of -1.
> Can someone please list a few implementations where the
> value differs from -1?


No. That is why you should always use the macro EOF, which is
defined in the standard includes. See the C standard.

Some useful references about C:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://c-faq.com/> (C-faq)
<http://benpfaff.org/writings/clc/off-topic.html>
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf> (C99)
<http://cbfalconer.home.att.net/download/n869_txt.bz2> (pre-C99)
<http://www.dinkumware.com/c99.aspx> (C-library)
<http://gcc.gnu.org/onlinedocs/> (GNU docs)
<http://clc-wiki.net/wiki/C_community:comp.lang.c:Introduction>

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
 
Harald van Dijk
 
      12-07-2008
On Sat, 06 Dec 2008 20:25:19 -0500, CBFalconer wrote:
> Sri Harsha Dandibhotla wrote:
>>

> ... snip ...
>>
>> I have read that EOF doesn't always have the value of -1. Can someone
>> please list a few implementations where the value differs from -1?

>
> No. That is why you should always use the macro EOF, which is defined
> in the standard includes.


Non sequitur. If every implementation in the world defines EOF as -1,
there is little benefit in using the macro. If some implementation gives
it a different value, you have a definite need to use the macro for your
code to work.

I don't have an example of an implementation where EOF is anything other
than -1.
 