Velocity Reviews


Eric 05-27-2004 01:23 AM

[Q] Text vs Binary Files
 
Assume that disk space is not an issue
(the files will be small < 5k in general for the purpose of storing
preferences)

Assume that transportation to another OS may never occur.


Are there any solid reasons to prefer text files over binary files?

Some of the reasons I can think of are:

-- should transportation to another OS become useful or needed,
the text files would be far easier to work with

-- tolerant of basic data type size changes (enumerated types have been
known to change in size from one version of a compiler to the next)

-- if a file becomes corrupted, it would be easier to find and repair
the problem potentially avoiding the annoying case of just
throwing it out

I would like to begin using XML for the storage of application
preferences, but I need to convince others who are convinced that binary
files are the superior method that text files really are the way to go.

Thoughts? Comments?

Arthur J. O'Dwyer 05-27-2004 02:03 AM

Re: [Q] Text vs Binary Files
 

On Thu, 27 May 2004, Eric wrote:
>
> Assume that disk space is not an issue [...]
> Assume that transportation to another OS may never occur.
> Are there any solid reasons to prefer text files over binary files?
>
> Some of the reasons I can think of are:
>
> -- should transportation to another OS become useful or needed,
> the text files would be far easier to work with


I would guess this is wrong, in general. Think of the difference
between a DOS/Win32 text file, a MacOS text file, and a *nix text
file (hint: linefeeds and carriage returns). Now think of the
difference between the same systems' binary files (hint: nothing).
There do exist many free tools to deal with line-ending troubles,
though, so this isn't really a disadvantage; just a counter to your
claim.
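The line-ending difference described above can be absorbed once, at load time. This is a minimal C sketch that assumes the whole file has already been read into a NUL-terminated buffer. The function name `normalize_newlines` is hypothetical and does not come from any library. It rewrites CRLF (DOS/Win32) and a lone CR (classic MacOS) to LF (*nix) in place:

```c
#include <stddef.h>

/* Hypothetical helper: convert CRLF and lone CR to LF in place.
   Returns the new length of the string. */
size_t normalize_newlines(char *s)
{
    size_t r = 0, w = 0;            /* read and write positions */
    while (s[r] != '\0') {
        if (s[r] == '\r') {
            s[w++] = '\n';          /* CR becomes LF */
            r++;
            if (s[r] == '\n')
                r++;                /* CRLF: skip the LF that followed the CR */
        } else {
            s[w++] = s[r++];
        }
    }
    s[w] = '\0';
    return w;
}
```

The output is never longer than the input, so rewriting in place is safe. A conforming XML parser performs the same end-of-line normalization on input, which is the point Eric makes in his reply.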

> -- tolerant of basic data type size changes (enumerated types have been
> known to change in size from one version of a compiler to the next)


It's about five minutes' work to write portable binary I/O functions
in most languages, if you're worried about the size of 'int' on your
next computer or something. Check out any file-format standard for
ideas, and Google "network byte order." If you're coming from a C
background, then you'll understand when I tell you that 'fwrite' should
never, ever be applied to anything but buffers of 'unsigned char'! :)
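The "five minutes" of portable binary I/O might look something like the following. This is a minimal C sketch that assumes a C99 compiler for `uint32_t`, and the function names are hypothetical. Values are written in network byte order (most significant byte first), and `fwrite` only ever sees a buffer of `unsigned char`. As a result, neither the host's byte order nor the size of `int` leaks into the file:

```c
#include <stdint.h>
#include <stdio.h>

/* Store v in network byte order: most significant byte first. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Rebuild the value from four bytes, whatever the host's byte order. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* fwrite only ever sees a buffer of unsigned char. Returns 0 on success. */
int write_u32_be(FILE *fp, uint32_t v)
{
    unsigned char buf[4];
    put_u32_be(buf, v);
    return fwrite(buf, 1, sizeof buf, fp) == sizeof buf ? 0 : -1;
}
```

Reading back is the mirror image: `fread` four bytes into an `unsigned char` buffer and pass it to `get_u32_be`.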

> -- if a file becomes corrupted, it would be easier to find and repair
> the problem potentially avoiding the annoying case of just
> throwing it out


Yes, definitely. Also, it's much easier to tell if text has been
corrupted in transmission --- it won't look like text anymore!
Binary always looks like binary; you need explicit checksums and
guards against corruption there. (Again, see file-format standards,
especially my favorite, the PNG image standard.)
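An explicit check of this kind can be as simple as a CRC stored next to the data. The sketch below is a bitwise CRC-32 that uses the reflected polynomial 0xEDB88320. This is the same CRC the PNG specification uses for its chunk checksums. The function name is hypothetical, and a table-driven version would be faster:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
   final XOR 0xFFFFFFFF), the checksum PNG uses for its chunks. */
uint32_t crc32_bitwise(const unsigned char *buf, size_t len)
{
    uint32_t c = 0xFFFFFFFFu;
    size_t i;
    int k;
    for (i = 0; i < len; i++) {
        c ^= buf[i];
        for (k = 0; k < 8; k++)
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
    }
    return c ^ 0xFFFFFFFFu;
}
```

A writer stores the 32-bit result alongside the data. PNG, for example, stores it most significant byte first after each chunk. A reader recomputes the CRC and treats a mismatch as corruption.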

> I would like to begin using XML for the storage of application
> preferences, but I need to convince others who are convinced that binary
> files are the superior method that text files really are the way to go.


One major advantage of plain text is that it can be sent over HTTP
and other Web protocols without "armoring." You can put plain text
in the body of a POST request, for example, where I doubt arbitrary
bytes would be accepted. (I dunno, though.)
Along the same lines, you can email your data files back and forth
in the body of an email message, rather than mucking about with
attachments.
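For comparison, the usual way to "armor" binary data for a text-only channel such as an email body is Base64, as used by MIME. Here is a minimal encoder sketch in C. The name `base64_encode` is hypothetical, and the output is not wrapped into the 76-character lines that MIME expects:

```c
#include <stddef.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes into out, which must hold at least
   4 * ((len + 2) / 3) + 1 chars. Returns the number of chars written,
   not counting the terminating NUL. */
size_t base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i, o = 0;
    for (i = 0; i + 2 < len; i += 3) {          /* full 3-byte groups */
        unsigned long n = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        out[o++] = b64_alphabet[(n >> 18) & 63];
        out[o++] = b64_alphabet[(n >> 12) & 63];
        out[o++] = b64_alphabet[(n >> 6) & 63];
        out[o++] = b64_alphabet[n & 63];
    }
    if (i < len) {                              /* 1 or 2 leftover bytes */
        unsigned long n = (unsigned long)in[i] << 16;
        if (i + 1 < len)
            n |= (unsigned long)in[i + 1] << 8;
        out[o++] = b64_alphabet[(n >> 18) & 63];
        out[o++] = b64_alphabet[(n >> 12) & 63];
        out[o++] = (i + 1 < len) ? b64_alphabet[(n >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}
```

Every three input bytes become four output characters. That roughly one-third size increase is the price of sending binary data through a text channel.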

The disadvantage is size; but you don't seem worried about that.
Another possible disadvantage would be that text is easily read and
reverse-engineered, if you're worried about that (e.g., proprietary
config files or savefiles for a game) --- but then you can always
encrypt whatever you don't want read immediately. [Whatever you
don't want read *ever*, you simply don't give to your users, because
they'll crack anything given enough time.]

HTH,
-Arthur



Eric 05-27-2004 03:55 AM

Re: [Q] Text vs Binary Files
 
Arthur J. O'Dwyer <ajo@nospam.andrew.cmu.edu> wrote:

> > -- should transportation to another OS become useful or needed,
> > the text files would be far easier to work with

>
> I would guess this is wrong, in general. Think of the difference
> between a DOS/Win32 text file, a MacOS text file, and a *nix text
> file (hint: linefeeds and carriage returns).


Which is why I mentioned at the end using a solid XML parser to deal
with such issues transparently. I likely wouldn't consider using a text
file if something like XML and solid parsers weren't available and free.

> Now think of the
> difference between the same systems' binary files (hint: nothing).


Well, you say 'same systems'...so, yes, in general, reading & writing a
binary file that will never be moved to another OS shouldn't present any
serious issues. (or am I wrong here?)

However, the point was that it could be moved, in which case dealing
with big/little endian issues would become important.
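Suppose a raw binary file did move between a little-endian machine (such as an x86 PC) and a big-endian one (such as a PowerPC Mac of the time). The classic fix is to detect the host byte order and swap bytes when needed. This is a C99 sketch with hypothetical helper names, and it assumes the host is plainly big- or little-endian. The shift-based serialization Arthur recommends avoids needing this check at all:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);    /* look at the lowest-addressed byte */
    return first == 1;
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap_u32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Convert a value that was copied raw out of a big-endian file
   into the host's native order. */
uint32_t be_to_host_u32(uint32_t raw)
{
    return host_is_little_endian() ? swap_u32(raw) : raw;
}
```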

> > -- tolerant of basic data type size changes (enumerated types have
> > been known to change in size from one version of a compiler to
> > the next)

>
> It's about five minutes' work to write portable binary I/O functions
> in most languages


Ah, but it's five minutes I don't want to spend, especially since the
time would need to be spent every time something changed. I believe in
fixing a problem once.

Plus, the potential for spending time attempting to figure out why the
@#$%@$ isn't being read properly isn't accounted for here.

> Another possible disadvantage would be that text is easily read and
> reverse-engineered


In my case, this is a benefit.

--
== Eric Gorr ========= http://www.ericgorr.net ========= ICQ:9293199 ===
"Therefore the considerations of the intelligent always include both
benefit and harm." - Sun Tzu
== Insults, like violence, are the last refuge of the incompetent... ===

gswork 05-27-2004 08:01 AM

Re: [Q] Text vs Binary Files
 
egDfAusenetE5fz@verizon.net (Eric) wrote in message news:<1geew2n.10d70ck1mpdeeN%egDfAusenetE5fz@verizon.net>...
> Assume that disk space is not an issue
> (the files will be small < 5k in general for the purpose of storing
> preferences)
>
> Assume that transportation to another OS may never occur.
>
>
> Are there any solid reasons to prefer text files over binary files?
>
> Some of the reasons I can think of are:
>
> -- should transportation to another OS become useful or needed,
> the text files would be far easier to work with
>
> -- tolerant of basic data type size changes (enumerated types have been
> known to change in size from one version of a compiler to the next)
>
> -- if a file becomes corrupted, it would be easier to find and repair
> the problem potentially avoiding the annoying case of just
> throwing it out


All good reasons...

> I would like to begin using XML for the storage of application
> preferences, but I need to convince others who are convinced that binary
> files are the superior method that text files really are the way to go.
>
> Thoughts? Comments?


For your application I think you have it right. Preferences in an XML
text file are more flexible for the user/admin (can be edited by hand
as last resort) and also for you as developer, a text file can have
entries listed 'out of order' and with the right tags and parsing it
won't really matter. For the same reasons they can also be easier to
change and add to over time.
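The following C sketch illustrates the order-independence point. It looks up a value by name in a hypothetical format with one `key=value` entry per line. This format stands in for XML only to keep the example short; a real XML preferences file would go through a proper parser. Because lookup is by name, entries can appear in any order, and unknown entries are simply ignored:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup in a "key=value per line" text. Copies the value
   for key into out and returns 0, or returns -1 if the key is absent
   or the value does not fit in outsz bytes. */
int find_pref(const char *text, const char *key, char *out, size_t outsz)
{
    size_t klen = strlen(key);
    const char *line = text;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        if (len > klen && strncmp(line, key, klen) == 0 && line[klen] == '=') {
            size_t vlen = len - klen - 1;
            if (vlen >= outsz)
                return -1;                      /* value too long */
            memcpy(out, line + klen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        if (end == NULL)
            break;
        line = end + 1;                         /* next line */
    }
    return -1;  /* not found: the caller falls back to a default */
}
```

A missing key returns -1. This lets a newer program reading an older preferences file fall back to a default instead of failing.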

The main reasons for using binary files to store preferences are:

-security (but they're crackable, and text files can be encrypted
anyway)
-programming ease: it can be easier to just have a preference
structure than to attempt robust parsing of a given set of text
items (the text could be messed with, after all)
-size, relevant if they need to be shuttled around a network a lot or
will take up lots of disk space

It sounds like they don't apply in your case.

Arthur J. O'Dwyer 05-27-2004 02:07 PM

Re: [Q] Text vs Binary Files
 

On Thu, 27 May 2004, Eric wrote:
>
> Arthur J. O'Dwyer <ajo@nospam.andrew.cmu.edu> wrote:
> [Eric wrote]
> > > -- should transportation to another OS become useful or needed,
> > > the text files would be far easier to work with

> >
> > I would guess this is wrong, in general. Think of the difference
> > between a DOS/Win32 text file, a MacOS text file, and a *nix text
> > file (hint: linefeeds and carriage returns).

>
> Which is why I mentioned at the end using a solid XML parser to deal
> with such issues transparently. I likely wouldn't consider using a text
> file if something like XML and solid parsers weren't available and free.


Ah, but what do you do when the XML standard changes? :) Seriously,
this is something you really need to consider IMHO. (Of course, this
is cross-posted to an XML group, and I don't know much about XML, so
don't take my word about anything...) There are XML Version Foo parsers
available now, but when XML Version Bar comes out, there'll be lag time.
Think of the messes with HTML 4.0 [about which I know little] and C'99
[about which I know much].
Free parsers *are* nice, though, no dispute there. :)

> > Now think of the
> > difference between the same systems' binary files (hint: nothing).

>
> Well, you say 'same systems'...so, yes, in general, reading & writing a
> binary file that will never be moved to another OS shouldn't present any
> serious issues. (or am I wrong here?)


Misunderstood. By "the same systems," I meant the systems I just
mentioned: DOS/Win32, Unix, and MacOS. Their binary data formats are
identical.

> > > -- tolerant of basic data type size changes (enumerated types have
> > > been known to change in size from one version of a compiler to
> > > the next)

> >
> > It's about five minutes' work to write portable binary I/O functions
> > in most languages

>
> Ah, but it's five minutes I don't want to spend,


Versus five minutes trying to make your free XML parser compile?
I'd take five minutes with binary files any day. ;-)

> especially since the
> time would need to be spent every time something changed. I believe in
> fixing a problem once.


So do I. That's why you spend the five minutes writing your portable
binary I/O functions. Then you never need to write them again. For
a not-so-hot-but-portable-across-aforementioned-systems example, see
http://www.contrib.andrew.cmu.edu/~a...re/ImageFmtc.c,
functions 'fread_endian' and 'bwrite_endian'. Write once, use many
times.
The number of bits in a 32-bit integer is *never* going to change.
The number of bits in a machine word is *definitely* going to change.
This is why all existing file-format standards explicitly state that
they are dealing with 32-bit integers, not machine words: so the
file-format code never has to change, no matter where it runs.
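The following is a reader for one 32-bit big-endian field, in the same spirit as the 'fread_endian' routine mentioned above. That routine's actual code is not reproduced here; this is a hypothetical C99 sketch. It pulls bytes one at a time with `fgetc`, so a truncated file is reported as an error instead of silently yielding garbage:

```c
#include <stdint.h>
#include <stdio.h>

/* Read a 32-bit value stored most significant byte first.
   Returns 0 on success, -1 if the stream ends before 4 bytes. */
int read_u32_be(FILE *fp, uint32_t *out)
{
    uint32_t v = 0;
    int i;
    for (i = 0; i < 4; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            return -1;                       /* truncated or read error */
        v = (v << 8) | (uint32_t)(unsigned char)c;
    }
    *out = v;
    return 0;
}
```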

> Plus, the potential for spending time attempting to figure out why the
> @#$%@$ isn't being read properly isn't accounted for here.


Of course not. I/O is trivial. It's your *algorithms* that are
going to be broken; and they'd be broken no matter what output format
you used.

> > Another possible disadvantage would be that text is easily read and
> > reverse-engineered

>
> In my case, this is a benefit.


Good. :)

-Arthur

Eric 05-27-2004 02:40 PM

Re: [Q] Text vs Binary Files
 
Arthur J. O'Dwyer <ajo@nospam.andrew.cmu.edu> wrote:

> On Thu, 27 May 2004, Eric wrote:
> >
> > Arthur J. O'Dwyer <ajo@nospam.andrew.cmu.edu> wrote:
> > [Eric wrote]
> > > > -- should transportation to another OS become useful or needed,
> > > > the text files would be far easier to work with
> > >
> > > I would guess this is wrong, in general. Think of the difference
> > > between a DOS/Win32 text file, a MacOS text file, and a *nix text
> > > file (hint: linefeeds and carriage returns).

> >
> > Which is why I mentioned at the end using a solid XML parser to deal
> > with such issues transparently. I likely wouldn't consider using a text
> > file if something like XML and solid parsers weren't available and free.

>
> Ah, but what do you do when the XML standard changes? :)


Please correct me if I am wrong, but the design of XML already takes
this into account. In other words, the idea that it can and will change
is a part of the design - this is one reason why XML is such a nifty
technology.

> Misunderstood. By "the same systems," I meant the systems I just
> mentioned: DOS/Win32, Unix, and MacOS. Their binary data formats are
> identical.


What do you mean by 'their binary data formats are identical'?...this
would seem to imply that big/little endian issues are a thing of the
past...?

> > > > -- tolerant of basic data type size changes (enumerated types have
> > > > been known to change in size from one version of a compiler to
> > > > the next)
> > >
> > > It's about five minutes' work to write portable binary I/O functions
> > > in most languages

> >
> > Ah, but it's five minutes I don't want to spend,

>
> Versus five minutes trying to make your free XML parser compile?


Binaries of the better parsers are available, so this is a non-issue.
:-)

> > Plus, the potential for spending time attempting to figure out why the
> > @#$%@$ isn't being read properly isn't accounted for here.

>
> Of course not. I/O is trivial.


Once you track down the problem...however, it would not be uncommon to
think the problem lies elsewhere first and spend hours before finding
the trivial fix.

> It's your *algorithms* that are
> going to be broken; and they'd be broken no matter what output format
> you used.


With XML, the risk of this (if it really exists at all) is far less, as
long as you're not changing the tag names or what they mean.

Arthur J. O'Dwyer 05-27-2004 04:11 PM

Re: [Q] Text vs Binary Files
 

On Thu, 27 May 2004, Eric wrote:
>
> Arthur J. O'Dwyer <ajo@nospam.andrew.cmu.edu> wrote:
> >
> > Ah, but what do you do when the XML standard changes? :)

>
> Please correct me if I am wrong, but the design of XML already takes
> this into account. In other words, the idea that it can and will change
> is a part of the design - this is one reason why XML is such a nifty
> technology.


Probably true. I don't know much about XML's namespacing rules
(by which I mean the rules that say that <foo> is an okay tag for
a user to create, but <bar> could be given special meaning by
future standards). [If anyone wants to give me a lecture, that's
fine; otherwise, I'll just look it up when I need to know. ;) ]

> > Misunderstood. By "the same systems," I meant the systems I just
> > mentioned: DOS/Win32, Unix, and MacOS. Their binary data formats are
> > identical.

>
> What do you mean by 'their binary data formats are identical'?...this
> would seem to imply that big/little endian issues are a thing of the
> past...?


Yup. The vast majority of computers these days use eight-bit
byte-oriented transmission and storage protocols. Whatever bit-ordering
problems there are have moved "downstream" to those people involved in
the construction of hardware that has to choose whether to transmit
bit 0 or bit 7 first (and I'm sure they have their own relevant
standards in those fields, too).
Again, I refer you to standards like RFCs 1950, 1951, and 1952
(Google "RFC 1950"). Note the utter lack of concern with the vagaries
of the machine. We have indeed moved past big/little-endian wars;
now, whoever's[1] writing the relevant standard simply says, "All eggs
distributed according to the Fred protocol must be broken at the
big end," and that's the end of *that!*
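RFC 1950 (the zlib format) is a convenient example of this. It defines the Adler-32 checksum and states that multi-byte numbers are stored most significant byte first, so the code never asks which end the host breaks its eggs on. Below is a C sketch of the checksum and its storage. The function names are hypothetical; the modulus 65521 (the largest prime smaller than 65536) is the one the RFC specifies:

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u   /* largest prime smaller than 65536 */

/* Adler-32 as described in RFC 1950. */
uint32_t adler32(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_BASE;
        b = (b + a) % ADLER_BASE;
    }
    return (b << 16) | a;
}

/* The format fixes the byte order: most significant byte first,
   regardless of the machine doing the writing. */
void store_adler32(unsigned char out[4], uint32_t sum)
{
    out[0] = (unsigned char)(sum >> 24);
    out[1] = (unsigned char)(sum >> 16);
    out[2] = (unsigned char)(sum >> 8);
    out[3] = (unsigned char)sum;
}
```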


> > > Plus, the potential for spending time attempting to figure out why the
> > > @#$%@$ isn't being read properly isn't accounted for here.

> >
> > Of course not. I/O is trivial.

>
> Once you track down the problem...however, it would not be uncommon to
> think the problem lies elsewhere first and spend hours before finding
> the trivial fix.


You misunderstand me. I/O is trivial; thus, after the first five
minutes spent making sure the trivial code is correct (which is trivial
to prove), you never need to touch it or look at it again. If you
never touch it, you can't possibly introduce bugs into it. And if it
starts out bugfree (trivially proven), and never has any bugs introduced
into it (because it's never modified), then it will remain bugfree
forever. (And thus you never need to fix it, trivially or not.)

I'm completely serious and not using hyperbole at all when I say
I/O is trivial. It really is.

-Arthur

[1] - In speech I'd say "who'sever writing...," but that looks
awful no matter how I spell it. Whosever? Whos'ever? Who's-ever?
Yuck. :(

Darrell Grainger 05-27-2004 07:29 PM

Re: [Q] Text vs Binary Files
 
On Thu, 27 May 2004, Eric wrote:

> Are there any solid reasons to prefer text files over binary files?
> [...]


In favour of binary: if a customer has access to the file, they will be
more likely to muck with a text file than with a binary file.

In favour of text, will you ever need to diff the files (old version
against new version)? Will you need to source control and/or merge the
files? Easier to do as text.

--
Send e-mail to: darrell at cs dot toronto dot edu
Don't send e-mail to vice.president@whitehouse.gov

Ben Measures 05-27-2004 10:20 PM

Re: [Q] Text vs Binary Files
 
Arthur J. O'Dwyer wrote:
> On Thu, 27 May 2004, Eric wrote:
>>Which is why I mentioned at the end using a solid XML parser to deal
>>with such issues transparently. I likely wouldn't consider using a text
>>file if something like XML and solid parsers weren't available and free.

>
> Ah, but what do you do when the XML standard changes? :) Seriously,
> this is something you really need to consider IMHO. (Of course, this
> is cross-posted to an XML group, and I don't know much about XML, so
> don't take my word about anything...) There are XML Version Foo parsers
> available now, but when XML Version Bar comes out, there'll be lag time.
> Think of the messes with HTML 4.0 [about which I know little] and C'99
> [about which I know much].
> Free parsers *are* nice, though, no dispute there. :)


XML was created to solve the problem of the HTML version mess. The
specification itself is very flexible (yet precise) with the result that
the language can be extended without needing a change to the
specification (or parsers based on the specification).

It's so good it's almost magical.

> The number of bits in a 32-bit integer is *never* going to change.
> The number of bits in a machine word is *definitely* going to change.
> This is why all existing file-format standards explicitly state that
> they are dealing with 32-bit integers, not machine words: so the
> file-format code never has to change, no matter where it runs.


IIRC in C++ (and I'm sure C) there is no such guarantee of a "32-bit
integer" - the int type can be more than 32 bits.

>>Plus, the potential for spending time attempting to figure out why the
>>@#$%@$ isn't being read properly isn't accounted for here.

>
> Of course not. I/O is trivial. It's your *algorithms* that are
> going to be broken; and they'd be broken no matter what output format
> you used.


Unless you're using somebody else's parser, which may not be broken.
Such as libxml2 which is *very* unlikely to be broken.

--
Ben M.

Arthur J. O'Dwyer 05-28-2004 02:05 PM

Re: [Q] Text vs Binary Files
 

On Thu, 27 May 2004, Ben Measures wrote:
>
> XML was created to solve the problem of the HTML version mess. The
> specification itself is very flexible (yet precise) with the result that
> the language can be extended without needing a change to the
> specification (or parsers based on the specification).
>
> It's so good it's almost magical.


Okay, I'm convinced, then. :)


> > The number of bits in a 32-bit integer is *never* going to change.
> > The number of bits in a machine word is *definitely* going to change.
> > This is why all existing file-format standards explicitly state that
> > they are dealing with 32-bit integers, not machine words: so the
> > file-format code never has to change, no matter where it runs.

>
> IIRC in C++ (and I'm sure C) there is no such guarantee of a "32-bit
> integer" - the int type can be more than 32 bits.


More is better. A 33-bit integer can hold all the values that a
32-bit integer can, and then some. If the particular algorithms in
question are defined not to use the "and then some" part of the integer,
that's fine. (The at-least-32-bit type in C and C++ is 'long int'.
When I use the word 'integer', I'm using it in the same sense as the
C standard: to mean "any integral type," not to mean "'int' type."
Just in case that was confusing you.)
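The point about 'long int' can be made concrete for pre-C99 compilers that lack `uint32_t`. In the following sketch (the names are hypothetical), `unsigned long` is used because it is guaranteed to hold at least 32 bits. Any bits beyond 32 are masked off and never reach the file. The byte order here is least significant first, purely to show that a format may pick either order as long as it states which one:

```c
/* C89-compatible: unsigned long holds at least 32 bits, possibly more. */
void put_u32_le(unsigned char out[4], unsigned long v)
{
    v &= 0xFFFFFFFFUL;   /* only the low 32 bits belong to the format */
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Same value back, whether unsigned long is 32 or 64 bits wide. */
unsigned long get_u32_le(const unsigned char in[4])
{
    return (unsigned long)in[0] |
           ((unsigned long)in[1] << 8) |
           ((unsigned long)in[2] << 16) |
           ((unsigned long)in[3] << 24);
}
```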

*Again* I urge the consultation of the RFCs defining any standard
binary file format, and the notice of the complete lack of regard
for big-endian/little-endian/19-bit-int/37-bit-int issues. At the
byte level, these things simply never come up.


> >>Plus, the potential for spending time attempting to figure out why the
> >>@#$%@$ isn't being read properly isn't accounted for here.

> >
> > Of course not. I/O is trivial. It's your *algorithms* that are
> > going to be broken; and they'd be broken no matter what output format
> > you used.

>
> Unless you're using somebody else's parser, which may not be broken.
> Such as libxml2 which is *very* unlikely to be broken.


I don't see the connection between my statement and your reply.
What is the antecedent of your "Unless"? (Literally, you're saying
that if you use libxml2 for I/O, then your non-I/O-related algorithms
will have no bugs. This is what used to be called "spooky action at a
distance," and I don't think it applies to code. :)

-Arthur


All times are GMT.