Velocity Reviews - Computer Hardware Reviews

difference between binary file and ascii file

 
 
Herbert Rosenau
 
      05-17-2006
On Tue, 16 May 2006 08:11:04 UTC, Flash Gordon
<(E-Mail Removed)> wrote:

> Herbert Rosenau wrote:
> > On Mon, 15 May 2006 13:58:28 UTC, Flash Gordon
> > <(E-Mail Removed)> wrote:
> >
> >> Without text streams how can you produce a C source file that is
> >> guaranteed to produce a valid text file on whatever system you run the
> >> program on? Historically systems have used rather more schemes than just
> >> terminating lines with CR, CRLF or LF,

> >
> > It is simple. A stream is an abstract form of data I/O. There is

>
> <snip>
>
> I think you are in violent agreement with me. I was responding to a
> question about why C has text streams as well as binary streams with an
> explanation of the problems if it did not. You are explaining why C
> programs see an abstraction (e.g. text and binary streams) with the
> system specifics handled at a lower level.


Yes - but for the question at hand it helps nothing. Some years ago I had
the job of writing a program that had to read text files and reformat them
from line mode to stream mode (meaning a paragraph becomes a single line,
independent of how many lines it occupied in the source). Problem: the
files to convert on a single machine arrived as native text from several
origins:
- originating from DOS/WIN, OS/2, FTP text: \r\n
- originating from 370 FTP binary mode: \n
- originating from 370 virtual console: \r
All were found mixed up in a single directory tree on local disk.
Some of them were created with a strange program using 0x8d as soft
line feed.
Reading anything as text failed to produce clean output.

So the files were read in binary mode, interpreting
\r\n\r\n as paragraph separator
\r\r as paragraph separator
\n\n as paragraph separator
convert 0x8d to either nothing or a single space
convert (\r)\n\f to \n\n
convert \f to nothing or a single space
\t as a single space - except in tables
\t as a sequence of spaces in tables to fill up the columns
remove any syllable break (that is, make a single word of the split one) but
leave hyphens intact

and then reformatting to 80-column fixed font, leaving tables intact.

No problem so far, but the different newline separators made it
impossible to read the files as text; the only way to tell the
different text conventions apart was to read them as a binary stream.

A custom myungetc() and mygetc() were needed to unget multiple chars.
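
The multi-character pushback described above might look roughly like the following. This is only a sketch, not the original program: the struct reader type, the PUSHBACK_MAX limit and the get_normalized() helper are assumptions made for illustration, and only the three line-ending conventions listed in the post are folded.

```c
#include <stdio.h>

#define PUSHBACK_MAX 8   /* assumed limit, chosen for illustration */

/* Hypothetical reader state: a binary stream plus a small pushback stack. */
struct reader {
    FILE *fp;
    int stack[PUSHBACK_MAX];
    int depth;
};

/* Return the most recently pushed-back char, or the next byte of the stream. */
static int mygetc(struct reader *r)
{
    if (r->depth > 0)
        return r->stack[--r->depth];
    return getc(r->fp);
}

/* Push a char back. Unlike ungetc(), several pushbacks in a row are
   guaranteed to work. Returns c, or EOF if c is EOF or the stack is full. */
static int myungetc(int c, struct reader *r)
{
    if (c == EOF || r->depth == PUSHBACK_MAX)
        return EOF;
    r->stack[r->depth++] = c;
    return c;
}

/* Read one char, folding "\r\n", a lone '\r' and a lone '\n' into '\n'. */
static int get_normalized(struct reader *r)
{
    int c = mygetc(r);
    if (c == '\r') {
        int next = mygetc(r);
        if (next != '\n')
            myungetc(next, r);   /* not part of the line ending: put it back */
        return '\n';
    }
    return c;
}
```

The stream behind fp would be opened with "rb", so that the C library performs no newline translation of its own before these functions see the bytes.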

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 Deutsch ist da!
 
S.Tobias
 
      05-20-2006
P.J. Plauger <(E-Mail Removed)> wrote:
> "S.Tobias" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...


>> Why is there the text mode in the first place? All operations valid
>> for text streams seem to be valid for binary ones, too. Text streams
>> are more difficult to handle (eg. you can't calculate offsets, there's
>> some extra undefinededness). Apart from system compatibility, is there
>> any advantage to opening files in text mode?

>
> System compatibility is a damned important reason.


All right. But besides that, is there any advantage that text files/mode
offer that binary files/mode don't have?

Suppose I'm serializing data into a textual representation to be read on
another system (with the same charset). Does it matter whether I open
the file in text or binary mode?

>Whitesmiths,
> Ltd. introduced the text/binary dichotomy in 1978 when porting C to
> dozens of non-Unix systems, and other companies did much the same thing
> in the coming years. It was a slam dunk to put it in the draft C Standard
> begun in 1983.


I've often heard in c.l.c. some systems (Mainframes) had complicated
internal representation of text files. Why was it that way? What did
it solve? Why couldn't they be replaced with simple "binary" files with
'\n' as record separator?

IMHO how text is represented could be viewed as a per-application
convention rather than system-wide. `Sendmail' doesn't have to read
`inetd' configuration files and v.v., so there is no reason why they
should follow the same text representation convention. It means that
on a system several (or even unlimited) conventions might be present.
Why is there in the C language room only for one type of text stream?

There is only one "binary" file (bytes are stored in the file exactly as
written to; no translation is done). Why isn't the default open mode
(such as "r+") binary? It seems to me more natural to have settled it
this way.

--
Stan Tobias
mailx `echo (E-Mail Removed)LID | sed s/[[:upper:]]//g`
 
Keith Thompson
 
      05-20-2006
"S.Tobias" <(E-Mail Removed)> writes:
> P.J. Plauger <(E-Mail Removed)> wrote:
>> "S.Tobias" <(E-Mail Removed)> wrote in message
>> news:(E-Mail Removed)...

>
>>> Why is there the text mode in the first place? All operations valid
>>> for text streams seem to be valid for binary ones, too. Text streams
>>> are more difficult to handle (eg. you can't calculate offsets, there's
>>> some extra undefinededness). Apart from system compatibility, is there
>>> any advantage to opening files in text mode?

>>
>> System compatibility is a damned important reason.

>
> All right. But besides that, is there any advantage that text files/mode
> offer that binary files/mode don't have?


Um, yes. Text files represent text.

> Suppose I'm serializing data into a textual representation to be read on
> another system (with the same charset). Does it matter whether I open
> the file in text or binary mode?


Absolutely. For example, as I'm sure you know, Windows represents an
end-of-line by two characters, a CR followed by an LF ('\r' followed
by '\n'). If you write a "text" file on one Windows system and read
it on another in binary mode, there are two possibilities: either the
program that reads the file has to explicitly discard the '\r'
characters, or the file won't be a valid Windows text file, and you
won't be able to process it with other tools, such as ordinary text
editors.
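
One way to observe the translation described above is to write the same lines in text and in binary mode and then count the raw bytes. This is a sketch only; the helper names write_lines() and count_bytes() are made up for illustration.

```c
#include <stdio.h>

/* Write two lines to `path` using the given mode ("w" or "wb").
   Returns 0 on success. */
static int write_lines(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    fputs("first\nsecond\n", fp);
    return fclose(fp);
}

/* Count the raw bytes in `path` by reading it in binary mode.
   Returns -1 if the file cannot be opened. */
static long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;
    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

After write_lines(path, "wb"), count_bytes(path) gives 13 on ordinary byte-stream file systems. After write_lines(path, "w") it is still 13 on Unix-like systems, but typically 15 on Windows, where each '\n' is stored as "\r\n".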

>>Whitesmiths,
>> Ltd. introduced the text/binary dichotomy in 1978 when porting C to
>> dozens of non-Unix systems, and other companies did much the same thing
>> in the coming years. It was a slam dunk to put it in the draft C Standard
>> begun in 1983.

>
> I've often heard in c.l.c. some systems (Mainframes) had complicated
> internal representation of text files. Why was it that way? What did
> it solve? Why couldn't they be replaced with simple "binary" files with
> '\n' as record separator?


They could have. They weren't.

Historically, files on mainframes were typically stacks of 80-column
punch cards. The complex internal representations of text files were
based on that. (I'm not very familiar with this, so I could be
mistaken.) Changing to a Unix-style format would break compatibility.

> IMHO how text is represented could be viewed as a per-application
> convention rather than system-wide. `Sendmail' doesn't have to read
> `inetd' configuration files and v.v., so there is no reason why they
> should follow the same text representation convention. It means that
> on a system several (or even unlimited) conventions might be present.


That sounds like a nightmare. Do I have to have one version of vi or
emacs to read sendmail config files and another to read indentd config
files?

> Why is there in the C language room only for one type of text stream?


Why does there need to be more than one?

> There is only one "binary" file (bytes are stored in the file exactly as
> written to; no translation is done). Why isn't the default open mode
> (such as "r+") binary? It seems to me more natural to have settled it
> this way.


There is no default open mode. You always have to specify whether
you're opening the file in text or binary mode. You specify binary
mode by including a 'b' in the mode argument; you specify text mode by
not including a 'b' in the mode argument.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
 
 
Dik T. Winter
 
      05-21-2006
In article <(E-Mail Removed)> "S.Tobias" <(E-Mail Removed)> writes:
....
> I've often heard in c.l.c. some systems (Mainframes) had complicated
> internal representation of text files. Why was it that way? What did
> it solve? Why couldn't they be replaced with simple "binary" files with
> '\n' as record separator?


Because they also did not have simple "binary" files. A "binary" file
consisted of (for instance) fixed length records of (say) 80 bytes
(whatever the size of a byte). This conformed to the Fortran and
Cobol models (also for text files), with only an implicit record
separator. And if there were variable length records available, they
were either represented by a length preceding the content or (on the
CDC Cyber) as a sequence of words, each containing 10 6-bit bytes or
5 12-bit bytes, where the last word in the sequence contained 12 zero
bits in the low order part. The ultimate reason was that I/O was
record oriented, because of speed.
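
To illustrate the "length preceding the content" style of variable-length record, here is a sketch of a reader. The 2-byte big-endian length prefix is a hypothetical layout chosen for the example, not the format of any particular mainframe system, and read_record() is an invented name.

```c
#include <stdio.h>

/* Read one record with a hypothetical 2-byte big-endian length prefix
   into buf. Returns the payload length, or -1 at end of file, on a
   truncated record, or if the record does not fit in buf. */
static long read_record(FILE *fp, unsigned char *buf, size_t bufsize)
{
    int hi = getc(fp);
    int lo;
    size_t len;

    if (hi == EOF)
        return -1;
    lo = getc(fp);
    if (lo == EOF)
        return -1;
    len = ((size_t)hi << 8) | (size_t)lo;
    if (len > bufsize)
        return -1;
    if (fread(buf, 1, len, fp) != len)
        return -1;
    return (long)len;
}
```

Note that no byte value is reserved as a separator: a record may contain '\n' or any other byte, which is one reason such files do not map directly onto a newline-delimited text stream.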
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
 
 
Ben Bacarisse
 
      05-21-2006
On Sat, 20 May 2006 22:53:22 +0000, S.Tobias wrote:

> I've often heard in c.l.c. some systems (Mainframes) had complicated
> internal representation of text files. Why was it that way? What did
> it solve? Why couldn't they be replaced with simple "binary" files with
> '\n' as record separator?


If I can be informatively flippant: the problem was not a complicated
*internal* representation, but a dominant *external* one -- punched
cards. In a world of cards, why would one waste one of the precious 72
character spaces (the last 8 were often reserved for sequence numbering)
on a marker to show the end of something as obvious as the end of
the card? It ended -- the computer got a signal the card was done. What
was the point of a marker?

In that world, \n (and \r and \0 used to pad the output) were seen as
control characters sent only to a printer so that it would advance the
paper and re-position the head (if it had one!).

Another consequence was that spaces did not really exist (or more
precisely that there were a lot of implicit ones). Most card punches
(if I remember right) punched nothing where there was a space so you could
not tell where the "line" ended, except for the obvious: after 80
characters (or 72 if you were stripping sequence numbers).
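
A text-mode layer on such a system has to turn a card image into what a C program sees as a line. The following is a minimal sketch of that mapping, assuming space-padded 80-column records; card_to_line() and CARD_COLUMNS are names invented for the example.

```c
#include <stddef.h>
#include <string.h>

#define CARD_COLUMNS 80

/* Copy the first `columns` characters of an 80-column card image into
   `line`, drop trailing blanks, and terminate with '\n' and '\0'.
   `line` must have room for columns + 2 chars. Returns the length of
   the text without the newline. */
static size_t card_to_line(const char card[CARD_COLUMNS], size_t columns,
                           char *line)
{
    size_t len = columns > CARD_COLUMNS ? CARD_COLUMNS : columns;

    /* Trailing blanks cannot be told apart from "nothing punched". */
    while (len > 0 && card[len - 1] == ' ')
        len--;
    memcpy(line, card, len);
    line[len] = '\n';
    line[len + 1] = '\0';
    return len;
}
```

Passing 72 for columns strips the sequence-number field; passing 80 keeps it. Either way, any trailing spaces that were meant to be part of the line are lost.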

--
Ben.
 
 
Richard Heathfield
 
      05-21-2006
Keith Thompson said:

> Do I have to have one version of vi or
> emacs to read sendmail config files and another to read indentd config
> files?


The indentd daemon (after just a quick pint of config down at the /etc)
faithfully chunters along in the background, waiting and watching for that
momentous occasion when the user decides to run indent, a thin client which
opens a connection to the daemon, hurls the C code down it, and says
"whaddya mek o' that, then, laddie?"

Nothing daunted, indentd bravely catches the code, and turns it from IOCCC
material into something approximately approaching readability. Handing it
back to the client with a smart salute and a "have a nice day", indentd
awaits the next urgent case of mangled layout, knowing that every readable
program it produces is another victory for God, Queen, and country.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
 
 
Richard Bos
 
      05-22-2006
CBFalconer <(E-Mail Removed)> wrote:

> "P.J. Plauger" wrote:
> > If you try to live with just binary mode, then every program either has
> > to map text files for itself or tolerate a broad assortment of rules for
> > delimiting text lines. There's precedent for the latter approach too
> > (see, for example, Java), but Unix gives a powerful precedent for
> > having a uniform internal convention for representing text streams.

>
> However the user should be aware that everything breaks down if the
> input system tries to handle a file as text when that file doesn't
> adhere to the conventions for text on the system.


So, as the doctor said to the man who complained that his arm hurt when
he hit his elbow against the wall, Don't Do That, Then. That's why we
have FTP in A mode.

Richard
 
 
S.Tobias
 
      05-23-2006
Keith Thompson <(E-Mail Removed)> wrote:
> "S.Tobias" <(E-Mail Removed)> writes:
>> P.J. Plauger <(E-Mail Removed)> wrote:
>>> "S.Tobias" <(E-Mail Removed)> wrote in message
>>> news:(E-Mail Removed)...


....
>> All right. But besides that, is there any advantage that text files/mode
>> offer that binary files/mode don't have?

>
> Um, yes. Text files represent text.
>

Well, binary files can contain text, too.

>> Suppose I'm serializing data into a textual representation to be read on
>> another system (with the same charset). Does it matter whether I open
>> the file in text or binary mode?

>
> Absolutely. For example, as I'm sure you know, Windows represents an
> end-of-line by two characters, a CR followed by an LF ('\r' followed

[...]
>

I wasn't clear enough, I have to restate the problem. Suppose the file
is not meant for interaction with other system tools (editors), but is
a means of transferring the results to another instance of a similar
program that will continue calculations (ie. the writing mode is known).

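#include <stdio.h>

/* Assumed to be defined elsewhere. */
double calculate(void);
void cont_calculation(double value);

int main(void)
{
    double result = calculate();
    FILE *fp = fopen("results.txt", "w" BINM);
    fprintf(fp, "%f\n", result);
    fclose(fp);

    fp = fopen("results.txt", "r" BINM);
    fscanf(fp, "%lf", &result);   /* %lf, not %f: result is a double */
    fclose(fp);
    cont_calculation(result);
    return 0;
}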

Will it matter if BINM is #defined as "b" or as nothing?
Can binary mode replace the text mode in this way? I'm reading from
the Standard that nul characters may be appended to a binary stream;
could this cause problems if I want to handle binary stream in text
manner, like in the above sketch? (Will append mode + multiple closing
and opening work correctly?)
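
A small harness for trying the question on a given implementation could look like this. It is a sketch only: roundtrip() is a hypothetical helper, the mode is passed at run time instead of through the BINM macro, and a successful run shows only what one implementation does, not what the Standard guarantees.

```c
#include <stdio.h>

/* Write `value` to `path` and read it back with the same kind of stream.
   `binm` is "b" for binary mode or "" for text mode.
   Returns 0 on success and stores the value read in *out. */
static int roundtrip(const char *path, const char *binm,
                     double value, double *out)
{
    char mode[4];
    FILE *fp;
    int ok;

    snprintf(mode, sizeof mode, "w%s", binm);
    fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    ok = fprintf(fp, "%f\n", value) > 0;
    if (fclose(fp) != 0 || !ok)
        return -1;

    snprintf(mode, sizeof mode, "r%s", binm);
    fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    ok = fscanf(fp, "%lf", out) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

Because fscanf() stops at the first character that cannot be part of the number, whatever follows the digits (a '\n', a "\r\n" pair, or padding) is not examined in this single-value case. Note also that "%f" keeps only six digits after the decimal point, so values are not in general reproduced exactly.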

>>>Whitesmiths,
>>> Ltd. introduced the text/binary dichotomy in 1978 when porting C to
>>> dozens of non-Unix systems, and other companies did much the same thing
>>> in the coming years. It was a slam dunk to put it in the draft C Standard
>>> begun in 1983.

....
>> IMHO how text is represented could be viewed as a per-application
>> convention rather than system-wide. `Sendmail' doesn't have to read
>> `inetd' configuration files and v.v., so there is no reason why they
>> should follow the same text representation convention. It means that
>> on a system several (or even unlimited) conventions might be present.

>
> That sounds like a nightmare. Do I have to have one version of vi or
> emacs to read sendmail config files and another to read indentd config
> files?
>

Or the editors would have to be able to read multiple text formats.
IIRC, Windows XP WordPad and Notepad can save to plain text, RTF, Unicode
and UTF-8 formats. Can't we consider all these formats as text formats?

It's not that uncommon that special configuration files have dedicated
editors, eg.: vipw, vigr, visudo (however, for different reasons than
text file format convention).

>> Why is there in the C language room only for one type of text stream?


I was wrong here, actually C does not preclude multiple text modes.
One could specify one like this, as an extension: "r+OStext".



>> There is only one "binary" file (bytes are stored in the file exactly as
>> written to; no translation is done). Why isn't the default open mode
>> (such as "r+") binary? It seems to me more natural to have settled it
>> this way.

>
> There is no default open mode. You always have to specify whether
> you're opening the file in text or binary mode. You specify binary
> mode by including a 'b' in the mode argument; you specify text mode by
> not including a 'b' in the mode argument.
>

For me the main difference between binary and text modes is that the
first is untranslated and the other is translated. You can "not do"
something only in one way, therefore I feel it would have been more
logical to have the default (not including mode spec in the argument)
binary.

--
Stan Tobias
mailx `echo (E-Mail Removed)LID | sed s/[[:upper:]]//g`
 
 
ena8t8si@yahoo.com
 
      05-27-2006

Richard Heathfield wrote:
> osmium said:
>
> > "P.J. Plauger" writes:
> >
> >>> I'll be damned! In Note 2, they defined byte very precisely as a word
> >>> that simply means a collection of contiguous bits. They took a widely
> >>> used word, that meant something to hundreds of thousands of people and
> >>> redefined it to mean something entirely different.
> >>>
> >>> There are about 30 definitions of byte that make the cut on google, and
> >>> the *vast* majority say a byte is eight bits.
> >>
> >> We forgot to do a web search before we chose that terminology in 1983.

> >
> > I appreciate your sarcasm and have no desire to argue with anyone - and
> > most certainly not with you.
> >
> > But wasn't the word byte pretty much introduced into the world by the IBM
> > 360 in 1964 or thereabouts?

>
> Knuth says that the 8-bit "standardisation" happened in around 1975 or so.
> By then, C was already well under way, and dmr was almost certainly
> accustomed to using the word in its non-"standard" sense.


Speaking as someone who worked on System/360's, and other computers,
during the 1960's, the word byte was already established as meaning an
8-bit quantity during that time. For sure, there were machines with
other byte sizes, but those were explicitly qualified - "seven-bit byte",
or whatever. In the absence of any indication otherwise, byte always
meant 8 bits, even when C was growing up.
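
For reference, in C itself the width of a byte is whatever the implementation reports in CHAR_BIT from <limits.h>, which the Standard requires to be at least 8. A minimal sketch, where bits_in_object() is a name made up for the example:

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits occupied by an object of the given size in C bytes.
   sizeof counts in bytes, and each byte holds CHAR_BIT bits. */
static size_t bits_in_object(size_t size_in_bytes)
{
    return size_in_bytes * CHAR_BIT;
}
```

For example, bits_in_object(sizeof(char)) is CHAR_BIT by definition, which is 8 on most current systems but may be larger on others.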

 