On Tue, 2 Dec 2003, Dan Pop wrote:
>
> In <eMadnYwOVs0CWFGiRVn-> James Hu writes:
> >On 2003-12-02, Dan Pop <> wrote:
> >> In <IM-dndlzzvLK9VaiRVn-> James Hu writes:
> >>
> >>>It is not inconceivable for a system to buffer read a file and scan
> >>>the bytes for a sizeof(int) bytes long bit pattern before determining
> >>>whether it should return the next char as input or return EOF instead.
> >>>Of course, in such a scheme, the system would need to be able to escape
> >>>the bit pattern if it wanted to be able to scan the bytes literally.
> >>>(We don't know how it is done, nor should we care.)
> >If you want a practical example, consider a stdio interface implemented
> >over a compressed filesystem.
>
> I still don't get it. Each and every byte combination is still valid
> in a binary file, therefore it *cannot* be used as eof marker.
A trivial example would be an MS-DOS-like hybrid system on which the
byte 0xA1 would indicate the end of each file (text or binary). [Not
a typo; I specifically changed it from 0x1A so that EOF could be
#defined as the negative value whose bit pattern is 0xA1A1 in this
hypothetical system's 16-bit int.]
"But then how does a program represent the literal byte 0xA1 on
the disk?" you ask. Simple -- escape codes. For example, the EOF
code could be 0xA1A1, and the escape code for the literal byte 0xA1
could be 0xA100 (big-endian). This would satisfy all the requirements
of the C standard on file systems (i.e., precious few), while being
technically possible.
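
For the curious, here is a minimal sketch of that escaping scheme. The
names a1_encode and a1_decode and the in-memory buffers are made up for
illustration; a real implementation would bury this inside its stdio
layer, and nothing here is taken from any actual system.

```c
#include <assert.h>
#include <stddef.h>

#define A1_MARK 0xA1u   /* hypothetical on-disk marker byte */

/* Encode n data bytes into out, which must hold at least 2*n + 2 bytes.
 * A literal 0xA1 is stored as 0xA1 0x00; the file ends with 0xA1 0xA1.
 * Returns the number of bytes written. */
size_t a1_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t j = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[j++] = in[i];
        if (in[i] == A1_MARK)
            out[j++] = 0x00;        /* escape: 0xA1 0x00 is a literal 0xA1 */
    }
    out[j++] = A1_MARK;             /* end-of-file marker: 0xA1 0xA1 */
    out[j++] = A1_MARK;
    return j;
}

/* Decode up to the end-of-file marker. Returns the number of data bytes
 * recovered, or (size_t)-1 if the image is malformed or has no marker. */
size_t a1_decode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t j = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i] != A1_MARK) {
            out[j++] = in[i];
            continue;
        }
        if (i + 1 >= n)
            return (size_t)-1;      /* lone marker byte at the very end */
        i++;
        if (in[i] == A1_MARK)
            return j;               /* 0xA1 0xA1: end of file */
        if (in[i] != 0x00)
            return (size_t)-1;      /* unknown escape code */
        out[j++] = A1_MARK;         /* 0xA1 0x00: literal 0xA1 */
    }
    return (size_t)-1;              /* ran off the end without a marker */
}
```

Every byte value, 0xA1 included, survives the round trip, so the
scheme works for binary files as well as text files.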
Heck, you could even Huffman-encode every single file on the system
to save space, and use some rare code word to indicate EOF. That's getting
closer to what I think James means by "a compressed filesystem."
> >> For text files, a single eof character is enough to mark the end of the
> >> text file, even if the physical file is larger (up to the end of the
> >> logical disk block). This is the well known scheme used by CP/M-80.
> >
> >A single character is sufficient but not necessary. My multi-byte EOF
> >system is hypothetical.
>
> That's true and irrelevant in the case of text files. My point is that
> your scheme simply does not work for binary files.
[In case Dan hasn't already thought of this: fseek() is not required
to run in constant time. Binary files don't have to be random-access
in their "natural state"; it just happens that all existing systems
do it that way.]
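
To make that concrete, here is a rough sketch of how a seek could work
on the escaped layout described above (0xA1 0x00 for a literal 0xA1,
0xA1 0xA1 for end-of-file). The function name a1_physical_offset is
invented for this post, and I'm assuming the whole file image sits in
memory; the point is only that the mapping takes a linear scan.

```c
#include <assert.h>
#include <stddef.h>

/* Map a logical byte offset to a physical offset in an escaped image.
 * Seeking to the logical end of file yields the position of the
 * end-of-file marker. Returns (size_t)-1 if the offset is past the
 * end of the file or the image has no marker. */
size_t a1_physical_offset(const unsigned char *disk, size_t n, size_t logical)
{
    size_t phys = 0;
    assert(disk != NULL);
    for (size_t seen = 0; phys < n; seen++) {
        /* 0xA1 0xA1 is the end-of-file marker. */
        if (disk[phys] == 0xA1u && phys + 1 < n && disk[phys + 1] == 0xA1u)
            return seen == logical ? phys : (size_t)-1;
        if (seen == logical)
            return phys;
        /* An escaped 0xA1 occupies two physical bytes; anything else one. */
        phys += (disk[phys] == 0xA1u) ? 2 : 1;
    }
    return (size_t)-1;
}
```

The cost grows with the target offset, which the Standard does not
forbid.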
> Furthermore, EOF
> is a C macro having no connection with whatever mechanism the
> implementation uses to detect the end of a file. All we know about it
> is that it expands to a negative integer value.
Correct, of course. But I just gave a possible implementation
on which the system's EOF marker, 0xA1A1, is exactly the same value
as the C compiler's EOF value. So James' scenario is not impossible,
merely implausible. Heck, for all I know it might be *common* on
some highly esoteric platforms!
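
A quick check of the arithmetic, as a sketch: the helper as_int16 below
is a made-up name, and it assumes the hypothetical system uses 16-bit
two's-complement ints. It computes the value a 16-bit pattern would
have there without relying on implementation-defined conversions.

```c
#include <assert.h>

/* Value of a 16-bit pattern read as a two's-complement integer.
 * Assumption: pattern fits in 16 bits. */
long as_int16(unsigned pattern)
{
    assert(pattern <= 0xFFFFu);
    return (pattern & 0x8000u) ? (long)pattern - 0x10000L : (long)pattern;
}
```

Under that assumption the pattern 0xA1A1 has its top bit set and comes
out negative, as the EOF macro requires, whereas 0x1A1A would be
positive. Hence the switch from 0x1A to 0xA1.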
-Arthur