Java and huge XML file to be parsed

 
 
Jezuch
 
      06-22-2004
Roedy Green wrote:
> You are missing my point. I believe that both XML and HTML, the thing
> actually posted should be binary formats. No one would ever read or
> edit them directly, guaranteed to meet the spec, preparsed. Anything
> hand-coded with notepad is guaranteed to have some errors.(..)


But then you'd never see the WWW as it is today. Heck, you'd never see the WWW at all.
--
Ecce Jezuch
"I went in killing the sun
I once one" - T. Meeks
 
 
 
 
 
Roedy Green
 
      06-22-2004
On Wed, 23 Jun 2004 00:35:17 +0200, Jezuch <(E-Mail Removed)> wrote
or quoted :

>But then you'd never see WWW as it is today. Heck, you'd never see WWW at all


I disagree. The tools just clean up. Think how many times you go to a
website and the code does not work with your browser.

Had we used a binary format:

1. web browsing would be at least twice as fast.

2. you would have far fewer problems with browsers not rendering as
expected.

It is just that people would have used more appropriate tools to create the
web content.

Note how often people are now moving to PDF. Part of that is for fine
control, but much of it is simply to avoid all the variation in
HTML and the syntax errors.

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
 
 
 
 
 
Christophe Vanfleteren
 
      06-22-2004
Roedy Green wrote:

> On Tue, 22 Jun 2004 17:04:30 +0000 (UTC), Dimitri Maziuk
> <dima@127.0.0.1> wrote or quoted :
>
>>
>>LOL. Roedy, either you've never looked at output of any "HTML processor",
>>or you're posting from a parallel universe.

>
> You are missing my point. I believe that both XML and HTML, the thing
> actually posted should be binary formats. No one would ever read or
> edit them directly, guaranteed to meet the spec, preparsed. Anything
> hand-coded with notepad is guaranteed to have some errors. Even
> though I validate my HTML daily, you will always find some HTML errors
> in there, and also some quasi errors that I tell the verifier to
> ignore. My site is very clean compared with most.


I'm afraid that doesn't make much sense. Validation is a binary property.
Either something validates, or it doesn't.

Why are you so afraid of someone putting out non-validating HTML/XML? If
all browsers had started out with strict parsers (and if the WYSIWYG
programs created valid HTML), we wouldn't have the problems with HTML we
have today. Browsers were way too liberal in what they accepted, and that
got us into the mess we are in today. If the web had started out with an XML
format, no non-validating pages would be found, since no browser would let
you view them (and I suspect that even the most ignorant FrontPage monkey
tests their pages at least once, albeit just in the very latest IE
version).
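
To make the "strict parser" point concrete, here is a minimal sketch using the JAXP parser that ships with the JDK. It only checks well-formedness, not validity against a DTD, and the class and method names (StrictParseDemo, isWellFormed) are made up for illustration. Unlike a forgiving browser, the parser refuses the second snippet outright because of the unclosed tag.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StrictParseDemo {

    /** Hypothetical helper: true only if the text is well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Any well-formedness violation ends up here; nothing is "repaired".
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O trouble, not a markup problem.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p><b>bold</b> text</p>")); // true
        System.out.println(isWellFormed("<p><b>bold text</p>"));     // false: <b> is never closed
    }
}
```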

>
> See http://mindprod.com/jgloss/xml.html and
> http://mindprod.com/projects/htmlcompactor.html for the sort of
> formats I had in mind.
>
>
> When you want to view the HTML/XML you use a viewer or editor.
> Traditionalists could fluff it up to something like conventional HTML or
> XML for viewing. I would prefer something more graphic like a JTree or
> WYSIWYG.
>
> How many of you are old enough to remember WordStar? It was
> conceptually easy to understand because you embedded visible tags in
> your text. Then Word came along and hid the tags, and just let you
> think in terms of the final outcome. It drove everyone mad at first
> since Word did such a bad job of the internal tags, but in the long
> run the impossibility of getting invalid or unbalanced tags won out.
>
> XML is just about data, so you don't have that same problem. With
> HTML it would be a lot easier to collapse and clean up a preparsed tree.


There are HTML/XML editors out there that let you view your page as a tree.
So I guess it is even possible without a binary format.
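
As a rough illustration of a tree view over ordinary text XML, here is a small sketch that parses a string with the JDK's DOM parser and prints the elements as an indented outline. XmlTreeView and render are names invented for this example; a real editor would feed the same DOM into something like a JTree rather than a string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlTreeView {

    /** Hypothetical helper: renders the document as an indented outline. */
    static String render(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            append(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    private static void append(Node node, int depth, StringBuilder out) {
        String label;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            label = node.getNodeName();
        } else if (node.getNodeType() == Node.TEXT_NODE
                && !node.getNodeValue().trim().isEmpty()) {
            label = "\"" + node.getNodeValue().trim() + "\"";
        } else {
            return; // skip whitespace-only text, comments and so on
        }
        for (int i = 0; i < depth; i++) {
            out.append("  "); // two spaces per nesting level
        }
        out.append(label).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            append(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(render("<a><b>hi</b><c/></a>"));
    }
}
```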

--
Kind regards,
Christophe Vanfleteren
 
 
Jezuch
 
      06-22-2004
Roedy Green wrote:
> It is just people would have used more appropriate tools to create the
> web content.


This one is *the* problem. People are lazy. Imagine what would happen if you
developed something like this and said to them "it's all fine, but you have
to use THIS tool". I presume that no one would bother to get it...
--
Ecce Jezuch
"I went in killing the sun
I once one" - T. Meeks
 
 
Roedy Green
 
      06-22-2004
On Wed, 23 Jun 2004 01:31:17 +0200, Jezuch <(E-Mail Removed)> wrote
or quoted :

>This one is *the* problem. People are lazy. Imagine what would happen if you
>developed something like this and said to them "it's all fine, but you have
>to use THIS tool". I presume that no one would bother to get it...


IF XML and HTML were binary formats there would be MORE tools to
choose from because it is so much easier to work with a binary format
than one you have to parse and that is CRAM FULL OF SYNTAX ERRORS.



--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
 
 
iksrazal
 
      06-22-2004
Sudsy <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>...
> Roedy Green wrote:
> <snip>
> > You don't use notepad to edit your Oracle files. <snip>

>
> You're right: I use vi.


Doesn't everybody?

iksrazal
 
 
Roedy Green
 
      06-22-2004
On Tue, 22 Jun 2004 22:47:57 GMT, Christophe Vanfleteren
<(E-Mail Removed)> wrote or quoted :

>I'm afraid that doesn't make much sense. Validation is a binary property.


In an ideal world that would be true, but it most certainly is not in
the world of HTML.

Have a look at the hundreds of option switches on HTMLValidator.
See http://mindprod.com/jgloss/htmlvalidator.html

Look at how many official W3C validation HTML standards there are at
http://mindprod.com/jgloss/htmlcheat.html#DOCTYPE

There are so many to allow for varying degrees of anal retentiveness.

If HTML were a binary format this would not be a concern to anyone but
tool writers.

When you go for a human readable, human editable format, you
necessarily introduce tolerance for error, variation and general
sloppiness. With a binary format, you can be like Mussolini and make
the trains run on time, without anyone feeling the internal Fascism.

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
 
 
Stefan Ram
 
      06-23-2004
Roedy Green <(E-Mail Removed)> writes:
>If HTML were a binary format this would not be a concern to
>anyone but tool writers.


On a digital computing engine or storage system, an HTML or
XML document indeed usually is stored as a sequence of binary
digits.

What does "binary format" mean to you?


 
 
Roedy Green
 
      06-23-2004
On 23 Jun 2004 00:12:59 GMT, (E-Mail Removed) (Stefan Ram) wrote
or quoted :

> On a digital computing engine or storage system, an HTML or
> XML document indeed usually is stored as a sequence of binary
> digits.
>
> What does "binary format" mean to you?


See the two essays I referred to earlier in this thread.

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
 
 
Roedy Green
 
      06-23-2004
On Wed, 23 Jun 2004 00:21:40 GMT, Roedy Green
<(E-Mail Removed)> wrote or quoted :

>> What does "binary format" mean to you?

>
>see the two essays I referred to earlier in this thread.


Let me sell you the idea in stages.

What if XML were not considered valid unless it contained a signature
by a tool, including a checksum, asserting that the file conformed to
the DTD and to XML in general? The tool would identify itself as part of
the signature.

An XML-generating library or editor would provide this.

The next stage would be certification of such verifiers.
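
A rough sketch of what the first stage might look like, under several assumptions: the trailer format (a trailing XML comment), the class name SignedXml and the methods sign and verify are all invented here, and the check is well-formedness only rather than conformance to a DTD. Note also that a bare SHA-256 checksum only detects edits made after checking; anyone can recompute it, so it is not a cryptographic signature.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SignedXml {
    // Hypothetical trailer: a comment after the root element keeps the file well-formed.
    private static final String MARK = "\n<!-- checked-by=";

    /** Checks the document, then appends the tool name and a checksum of the body. */
    static String sign(String xml, String tool) {
        requireWellFormed(xml);
        return xml + MARK + tool + " sha256=" + sha256(xml) + " -->";
    }

    /** True if the trailer is present and its checksum matches the body. */
    static boolean verify(String signed) {
        int at = signed.lastIndexOf(MARK);
        if (at < 0) {
            return false; // no trailer at all
        }
        String body = signed.substring(0, at);
        return signed.endsWith(" sha256=" + sha256(body) + " -->");
    }

    private static void requireWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow, do not print
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("refusing to sign malformed XML", e);
        }
    }

    private static String sha256(String text) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String signed = sign("<a>1</a>", "demo-tool");
        System.out.println(verify(signed));                                  // true
        System.out.println(verify(signed.replace("<a>1</a>", "<a>2</a>"))); // false: body changed
    }
}
```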

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
 
 
 
 