EOF (novice)

 
 
Hallvard B Furuseth
 
      12-02-2003
James Hu wrote:

> Of course not. "We don't know nor should we care" clearly spells that
> out. You claimed that EOF was *not* stored on the physical medium.


No, I claimed that _C's_ EOF is not. At least that was what I meant to
say. Anyway, maybe we have just been saying the same thing in different
ways.

--
Hallvard
 
 
 
 
 
Arthur J. O'Dwyer
 
      12-03-2003

On Tue, 2 Dec 2003, Dan Pop wrote:
>
> In <eMadnYwOVs0CWFGiRVn-> James Hu writes:
> >On 2003-12-02, Dan Pop <> wrote:
> >> In <IM-dndlzzvLK9VaiRVn-> James Hu writes:
> >>
> >>>It is not inconceivable for a system to buffer read a file and scan
> >>>the bytes for a sizeof(int) bytes long bit pattern before determining
> >>>whether it should return the next char as input or return EOF instead.
> >>>Of course, in such a scheme, the system would need to be able to escape
> >>>the bit pattern if it wanted to be able to scan the bytes literally.
> >>>(We don't know how it is done, nor should we care.)


> >If you want a practical example, consider a stdio interface implemented
> >over a compressed filesystem.

>
> I still don't get it. Each and every byte combination is still valid
> in a binary file, therefore it *cannot* be used as eof marker.


A trivial example would be an MS-DOS-like hybrid system on which the
byte 0xA1 would indicate the end of each file (text or binary). [Not
a typo; I specifically changed it from 0x1A so that EOF could be
#defined to be 0xA1A1 on this hypothetical 16-bit system.]
"But then how does a program represent the literal byte 0xA1 on
the disk?" you ask. Simple -- escape codes. For example, the EOF
code could be 0xA1A1, and the escape code for the literal byte 0xA1
could be 0xA100 (big-endian). This would satisfy all the requirements
of the C standard on file systems (i.e., precious few), while being
technically possible.
Heck, you could even Huffman-encode every single file on the system
to save space, and use some rare codon to indicate EOF. That's getting
closer to what I think James means by "a compressed filesystem."
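
To make the escape scheme above concrete, here is a minimal sketch in C of a reader for it, assuming exactly the encoding described: 0xA1 0xA1 ends the file and 0xA1 0x00 stands for a literal 0xA1. The names ESC and decode_next are invented for illustration, and treating any other byte after 0xA1 as end of file is my own assumption. Note that the function signals the end with the host compiler's real EOF macro, not with 0xA1A1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* for the real EOF macro */

#define ESC 0xA1u    /* hypothetical marker byte from the post */

/* Decode one logical byte from an encoded buffer.
   Returns the byte value (0..255), or EOF at the 0xA1 0xA1 marker,
   at the physical end of the buffer, or on a malformed escape.
   On EOF the position is left unchanged, so repeated calls keep
   returning EOF. */
static int decode_next(const unsigned char *buf, size_t len, size_t *pos)
{
    assert(buf != NULL && pos != NULL);
    if (*pos >= len)
        return EOF;                    /* ran off the physical data */
    if (buf[*pos] != ESC)
        return buf[(*pos)++];          /* ordinary byte, stored as-is */
    if (*pos + 1 < len && buf[*pos + 1] == 0x00) {
        *pos += 2;                     /* 0xA1 0x00: escaped literal 0xA1 */
        return (int)ESC;
    }
    return EOF;                        /* 0xA1 0xA1, or a malformed escape */
}
```

Fed the physical bytes 41 A1 00 42 A1 A1 43, successive calls yield 0x41, 0xA1, 0x42, and then EOF; the trailing 0x43 is never reached, which is the behaviour Dan objects to only if those bytes were meant to be data.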


> >> For text files, a single eof character is enough to mark the end of the
> >> text file, even if the physical file is larger (up to the end of the
> >> logical disk block). This is the well known scheme used by CP/M-80.

> >
> >A single character is sufficient but not necessary. My multi-byte EOF
> >system is hypothetical.

>
> That's true and irrelevant in the case of text files. My point is that
> your scheme simply does not work for binary files.


[In case Dan hasn't already thought of this: fseek() is not required
to run in constant time. Binary files don't have to be random-access
in their "natural state"; it just happens that all existing systems
do it that way.]

> Furthermore, EOF
> is a C macro having no connection with whatever mechanism the
> implementation uses to detect the end of a file. All we know about it
> is that it expands to a negative integer value.


Correct, of course. But I just gave a possible implementation
on which the system's EOF marker, 0xA1A1, is exactly the same value
as the C compiler's EOF value. So James' scenario is not impossible,
merely implausible. Heck, for all I know it might be *common* on
some highly esoteric platforms!
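
A small sketch of the two facts in play here, with invented helper names (count_bytes, as_int16). The first function shows the usual idiom: getc() returns an int so that EOF, a negative value, can never collide with a byte read from the file. The second shows why 0xA1A1 could serve as EOF on the hypothetical 16-bit system: read as a 16-bit two's-complement int, that bit pattern is negative.

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes in a stream. EOF is only getc()'s return signal;
   nothing here depends on how the system marks the end of a file. */
static long count_bytes(FILE *fp)
{
    assert(fp != NULL);
    long n = 0;
    int c;                              /* int, not char: must hold EOF too */
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}

/* Value of a 16-bit pattern when read as a two's-complement int.
   Computed arithmetically so the sketch does not depend on the
   width of int on the machine compiling it. */
static long as_int16(unsigned long bits)
{
    bits &= 0xFFFFul;
    return bits >= 0x8000ul ? (long)bits - 65536L : (long)bits;
}
```

Here as_int16(0xA1A1) evaluates to -24159, so the value does meet the one requirement the standard places on EOF, namely that it be a negative int.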

-Arthur

 
 
 
 
 
Dan Pop
 
      12-03-2003
In <Pine.LNX.4.58-> "Arthur J. O'Dwyer" <> writes:


>On Tue, 2 Dec 2003, Dan Pop wrote:
>>
>> In <eMadnYwOVs0CWFGiRVn-> James Hu writes:
>> >On 2003-12-02, Dan Pop <> wrote:
>> >> In <IM-dndlzzvLK9VaiRVn-> James Hu writes:
>> >>
>> >>>It is not inconceivable for a system to buffer read a file and scan
>> >>>the bytes for a sizeof(int) bytes long bit pattern before determining
>> >>>whether it should return the next char as input or return EOF instead.
>> >>>Of course, in such a scheme, the system would need to be able to escape
>> >>>the bit pattern if it wanted to be able to scan the bytes literally.
>> >>>(We don't know how it is done, nor should we care.)

>
>> >If you want a practical example, consider a stdio interface implemented
>> >over a compressed filesystem.

>>
>> I still don't get it. Each and every byte combination is still valid
>> in a binary file, therefore it *cannot* be used as eof marker.

>
> A trivial example would be an MS-DOS-like hybrid system on which the
>byte 0xA1 would indicate the end of each file (text or binary). [Not
>a typo; I specifically changed it from 0x1A so that EOF could be
>#defined to be 0xA1A1 on this hypothetical 16-bit system.]
> "But then how does a program represent the literal byte 0xA1 on
>the disk?" you ask. Simple -- escape codes. For example, the EOF
>code could be 0xA1A1, and the escape code for the literal byte 0xA1
>could be 0xA100 (big-endian). This would satisfy all the requirements
>of the C standard on file systems (i.e., precious few), while being
>technically possible.


The semantics of fseek and ftell on binary streams render this scheme
painful to implement: the byte offsets used by the program or reported
to the program are not the real byte offsets inside the file. But this
is only the tip of the iceberg. Imagine that I want to overwrite a
sequence of ordinary bytes with a sequence of 0xA1 bytes. Not only would
the whole remainder of the file have to be rewritten on the disk, but
the physical size of the file would also increase, creating problems if
there is no more room on the disk (from the user's POV the file has the
same size, but it suddenly no longer fits on the disk). I'm afraid no
one would want to use your implementation.
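
Both objections can be illustrated with a rough sketch, assuming the escape encoding quoted above (0xA1 stored as 0xA1 0x00, plus a two-byte 0xA1 0xA1 end marker). The function names are hypothetical; each takes the logical contents of a file and reports what the disk would have to hold.

```c
#include <assert.h>
#include <stddef.h>

#define ESC 0xA1u   /* hypothetical marker byte from the quoted scheme */

/* Physical bytes needed to store n logical bytes, end marker included. */
static size_t encoded_size(const unsigned char *data, size_t n)
{
    assert(data != NULL || n == 0);
    size_t phys = 2;                          /* 0xA1 0xA1 end marker */
    for (size_t i = 0; i < n; i++)
        phys += (data[i] == ESC) ? 2 : 1;     /* escaped bytes cost two */
    return phys;
}

/* Physical offset that corresponds to a logical offset. There is no
   closed form: every preceding byte has to be examined. */
static size_t physical_offset(const unsigned char *data, size_t n,
                              size_t logical)
{
    assert(data != NULL && logical <= n);
    size_t phys = 0;
    for (size_t i = 0; i < logical; i++)
        phys += (data[i] == ESC) ? 2 : 1;
    return phys;
}
```

Four ordinary bytes occupy 6 physical bytes under this scheme; overwrite them with four 0xA1 bytes and the same logical file needs 10. Logical offset 3 moves from physical offset 3 to physical offset 6, so everything after the overwritten span has to shift.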

Dan
--
Dan Pop
DESY Zeuthen, RZ group
 
 
glen herrmannsfeldt
 
      12-05-2003
Arthur J. O'Dwyer wrote:

> [In case Dan hasn't already thought of this: fseek() is not required
> to run in constant time. Binary files don't have to be random-access
> in their "natural state"; it just happens that all existing systems
> do it that way.]


The file system used on some IBM mainframes does not make fseek() easy.

For files with fixed-length records, they are normally stored on disk in
fixed-length blocks, except for the last block. If an existing file is
appended to, it can have a short block that is not at the end, which
makes random access difficult. If the library routines keep track of
the block sizes the first time through, though, it would be easy to
fseek() to any previously seen position.

For files with variable-length records (V or VB), the only way would be
to keep track of the block lengths in the file.
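
The bookkeeping described above might look something like the following sketch. The struct, the function names, and the fixed MAX_BLOCKS limit are all assumptions made for illustration, not any real mainframe library's interface. The idea is simply to record each block's length as it is read sequentially, then binary-search the running totals to turn a logical offset into a block number and an offset within that block.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 64   /* arbitrary limit for this sketch */

struct block_index {
    size_t count;                   /* blocks seen so far */
    long   start[MAX_BLOCKS + 1];   /* start[i] = logical offset of block i;
                                       start[count] = total bytes seen */
};

static void index_init(struct block_index *ix)
{
    assert(ix != NULL);
    ix->count = 0;
    ix->start[0] = 0;
}

/* Record the length of the next block read. Returns 0, or -1 if full. */
static int index_add(struct block_index *ix, long len)
{
    assert(ix != NULL && len >= 0);
    if (ix->count >= MAX_BLOCKS)
        return -1;
    ix->start[ix->count + 1] = ix->start[ix->count] + len;
    ix->count++;
    return 0;
}

/* Map a logical offset to (block, offset within block).
   Returns 0, or -1 if the offset lies beyond what has been seen. */
static int index_locate(const struct block_index *ix, long off,
                        size_t *block, long *within)
{
    assert(ix != NULL && block != NULL && within != NULL);
    if (off < 0 || off >= ix->start[ix->count])
        return -1;
    size_t lo = 0, hi = ix->count;  /* start[lo] <= off < start[hi] */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (ix->start[mid] <= off)
            lo = mid;
        else
            hi = mid;
    }
    *block = lo;
    *within = off - ix->start[lo];
    return 0;
}
```

With blocks of 80, 80, 30 and 80 bytes (a short block in the middle, as after an append), logical offset 170 lands in block 2 at offset 10, and offset 190 is the first byte of block 3. An offset past the last recorded block is refused, since the library would have to read forward to find it.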

-- glen

 