Need help on File parsing

 
 
David Resnick
 
      03-23-2011
On Mar 22, 4:13 pm, Maxx wrote:
> On Mar 21, 8:41 pm, Nobody wrote:
>
> > On Mon, 21 Mar 2011 13:35:01 -0700, Maxx wrote:
> > > I'm writing a C program which would parse an XML file as its input and
> > > perform specific operations...
> > > Now what I have in mind is that I should declare a two-dimensional
> > > array and store the XML file in it.
> > > My question is... is there any better way to do this, i.e. is there any
> > > better way to store the XML input?

>
> > Yes. In fact, it would be hard to imagine a worse way.

>
> > First, I wouldn't recommend trying to actually parse the XML yourself, as
> > you're practically bound to get it wrong. Use an XML parsing library
> > instead.

>
> > XML parsing libraries come in two main flavours: DOM and SAX. DOM
> > constructs a parse tree for the entire file, which the application can
> > then query. SAX generates events (reported via callbacks) as it parses the
> > file; it's up to the application to actually store the data.

>
> > Which flavour to use and exactly how to do it depend upon the details of
> > the application.

>
> Actually, the XML file that I was going to give the program will
> always have a predefined format, like the example I gave above. It
> will always parse the same format, simply extract the values from
> the fields, and write another XML file using the same template... so I
> was looking for the easiest way to solve it, without having to
> call extensive library functions...


Note that it always starts this way. It is easy to hand parse the XML
if it is in a truly fixed format, so why use a real parser? But then
there are modifications, extensions, etc. People hand edit the file and
add whitespace, which won't confuse a parser but messes up your less
flexible hand parse. People write a mixture of <element></element>
and <element/>, which should parse as equivalent and somehow
don't when hand parsing. People suddenly want validation, and so on.
Going with a real parser is very much the way to go in a real
application: much more future friendly, even if not apparently needed
up front...
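
To make that failure mode concrete, here is a minimal sketch in C of the kind of fixed-format extraction being warned against. The helper name naive_extract and the <name> element are hypothetical, made up for illustration; nothing like it was posted in the thread. It searches for the exact byte strings <tag> and </tag>, which is roughly what a quick hand parse tends to do:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy the text found between
 * the exact strings "<tag>" and "</tag>" into out.
 * Returns 1 on success, 0 if that exact byte pattern is not present. */
static int naive_extract(const char *xml, const char *tag,
                         char *out, size_t outsz)
{
    char open_tag[64], close_tag[64];
    const char *start, *end;
    size_t len;

    /* "</" + tag + ">" + NUL must fit in 64 bytes. */
    if (outsz == 0 || strlen(tag) > sizeof close_tag - 4)
        return 0;
    snprintf(open_tag, sizeof open_tag, "<%s>", tag);
    snprintf(close_tag, sizeof close_tag, "</%s>", tag);

    start = strstr(xml, open_tag);
    if (start == NULL)
        return 0;               /* also the result for <tag/> or <tag > */
    start += strlen(open_tag);

    end = strstr(start, close_tag);
    if (end == NULL)
        return 0;

    len = (size_t)(end - start);
    if (len >= outsz)
        len = outsz - 1;        /* truncate rather than overflow */
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

Under these assumptions, naive_extract("<name>Maxx</name>", "name", buf, sizeof buf) succeeds, but the equally well-formed inputs "<name >Maxx</name>" and "<name/>" both return 0, while "<name></name>" succeeds with an empty string. A real parser reports the last two as the same empty element.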
 
John Bode
 
      03-23-2011
On Mar 23, 1:45 pm, David Resnick wrote:
> Note that it always starts this way. It is easy to hand parse the XML
> if it is in a truly fixed format, so why use a real parser? But then
> there are modifications, extensions, etc. People hand edit the file and
> add whitespace, which won't confuse a parser but messes up your less
> flexible hand parse. People write a mixture of <element></element>
> and <element/>, which should parse as equivalent and somehow
> don't when hand parsing. People suddenly want validation, and so on.
> Going with a real parser is very much the way to go in a real
> application: much more future friendly, even if not apparently needed
> up front...


Not to mention it's code that *you* don't have to write or test.

Figuring out how to use the library in your code will take less time
than writing a robust parser from scratch. Yes, you can hand-hack a
minimal, non-validating, less-than-totally-robust XML parser in an
afternoon (I've done it), but you'll be tweaking that sucker
*constantly* (which I did as well).
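
As a rough illustration of why the library route means less code to own, here is a sketch of the callback (SAX-style) model Nobody described earlier in the thread, written in plain C. Everything in it is an assumption for illustration: toy_scan, struct handlers and the capture callbacks are hypothetical names, not the API of any real XML library, and the scanner understands only bare <name> and </name> tags. With a real library you would write only the handler half and leave the scanning to code that is already tested.

```c
#include <string.h>

/* Hypothetical handler table, loosely modelled on the SAX idea: the
 * scanner reports events and the application decides what to keep. */
struct handlers {
    void (*start_element)(void *user, const char *name);
    void (*end_element)(void *user, const char *name);
    void (*text)(void *user, const char *chars, size_t len);
};

/* Toy scanner: recognises only <name>, </name> and character data.
 * No attributes, entities, comments, CDATA or error reporting, so it
 * is NOT an XML parser; it exists only to show the callback flow. */
static void toy_scan(const char *doc, const struct handlers *h, void *user)
{
    const char *p = doc;

    while (*p != '\0') {
        if (*p == '<') {
            const char *gt = strchr(p, '>');
            char name[64];
            int closing;
            size_t len;

            if (gt == NULL)
                return;                 /* unterminated tag: give up */
            closing = (p[1] == '/');
            p += closing ? 2 : 1;       /* skip "<" or "</" */
            len = (size_t)(gt - p);
            if (len >= sizeof name)
                len = sizeof name - 1;
            memcpy(name, p, len);
            name[len] = '\0';
            if (closing)
                h->end_element(user, name);
            else
                h->start_element(user, name);
            p = gt + 1;
        } else {
            const char *lt = strchr(p, '<');
            size_t len = lt ? (size_t)(lt - p) : strlen(p);

            h->text(user, p, len);
            p += len;
        }
    }
}

/* Application side: remember the text of the first <name> element. */
struct capture {
    int in_name;
    char value[64];
};

static void on_start(void *user, const char *name)
{
    struct capture *c = user;
    if (strcmp(name, "name") == 0)
        c->in_name = 1;
}

static void on_end(void *user, const char *name)
{
    struct capture *c = user;
    if (strcmp(name, "name") == 0)
        c->in_name = 0;
}

static void on_text(void *user, const char *chars, size_t len)
{
    struct capture *c = user;
    if (c->in_name && c->value[0] == '\0') {
        if (len >= sizeof c->value)
            len = sizeof c->value - 1;
        memcpy(c->value, chars, len);
        c->value[len] = '\0';
    }
}
```

To use it, fill a struct handlers with on_start, on_end and on_text, zero a struct capture, and pass both to toy_scan along with the document text; the captured value is then in the capture's value field. A DOM-style library would instead hand back a whole tree to query after parsing, at the cost of holding the entire document in memory.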
 
Michael Press
 
      03-24-2011
David Resnick wrote:

> Note that it always starts this way. It is easy to hand parse the XML
> if it is in a truly fixed format, so why use a real parser? But then
> there are modifications, extensions, etc. People hand edit the file and
> add whitespace, which won't confuse a parser but messes up your less
> flexible hand parse. People write a mixture of <element></element>
> and <element/>, which should parse as equivalent and somehow
> don't when hand parsing. People suddenly want validation, and so on.
> Going with a real parser is very much the way to go in a real
> application: much more future friendly, even if not apparently needed
> up front...


XML is the same as csh. Every time somebody raises a
problem with XML somebody else steps in and presents an
easy workaround. Eventually you are told not even to
try writing a parser. It is the death of a thousand
cuts. And for what?

XML gives PHBs the illusion that they know about
programming; and adventurers a cozy berth. XML is a scam.

Has XML gotten to the point a universal Turing machine
could be written in XML, or is it still singing "Daisy"?

--
Michael Press
 
David Resnick
 
      03-24-2011
On Mar 24, 4:45 am, Michael Press wrote:
> XML is the same as csh. Every time somebody raises a
> problem with XML somebody else steps in and presents an
> easy workaround. Eventually you are told not even to
> try writing a parser. It is the death of a thousand
> cuts. And for what?
>
> XML gives PHBs the illusion that they know about
> programming; and adventurers a cozy berth. XML is a scam.
>
> Has XML gotten to the point a universal Turing machine
> could be written in XML, or is it still singing "Daisy"?
>


XML is great in its place. I'm not a PHB, and I don't believe
it to be a scam. I love it for flat files that need
structured information and flexibility. It is easy to extend
and easy (with XPath queries, say) to get stuff out of.
It is standard: everyone knows what it means, how to add
to it, and how to parse and validate it. Does it solve
all problems in the world? Of course not...

-David



 
Nobody
 
      03-25-2011
On Wed, 23 Mar 2011 11:45:47 -0700, David Resnick wrote:

> Note that it always starts this way. It is easy to hand parse the XML
> if it is in a truly fixed format,


If you restrict the application to reading a subset of XML, that defeats
the purpose of using XML in the first place.

You can find a wide range of tools which can process XML, but the range of
tools which can process a particular custom subset of XML is likely to be
much smaller (i.e. those tools which you write yourself).

If you think that you only need to support files written by a particular
program, you're likely to end up only supporting files which were directly
written by that program and not post-processed in any way. This often
makes your program less useful than you had originally assumed.

 
Malcolm McLean
 
      03-25-2011
On Mar 24, 12:57 am, John Bode wrote:
> Figuring out how to use the library in your code will take less time
> than writing a robust parser from scratch. Yes, you can hand-hack a
> minimal, non-validating, less-than-totally-robust XML parser in an
> afternoon (I've done it), but you'll be tweaking that sucker
> *constantly* (which I did as well).

The problem is that it becomes harder to distribute the program. Even
if you have source to the library, it's often in messy files that are
hard to integrate and distract the reader from the actual logical core
of the program.

 
David Resnick
 
      03-25-2011
On Mar 25, 3:40 am, Nobody wrote:
> On Wed, 23 Mar 2011 11:45:47 -0700, David Resnick wrote:
> > Note that it always starts this way. It is easy to hand parse the XML
> > if it is in a truly fixed format,

>
> If you restrict the application to reading a subset of XML, that defeats
> the purpose of using XML in the first place.
>
> You can find a wide range of tools which can process XML, but the range of
> tools which can process a particular custom subset of XML is likely to be
> much smaller (i.e. those tools which you write yourself).
>
> If you think that you only need to support files written by a particular
> program, you're likely to end up only supporting files which were directly
> written by that program and not post-processed in any way. This often
> makes your program less useful than you had originally assumed.


Holy out of context quotes, Batman. Your reply misses the entire
point of mine, which is that hand parsing is a bad idea. Did you read
the rest of the post or just answer after the first 2 lines?

-David
 
Nobody
 
      03-25-2011
On Fri, 25 Mar 2011 04:31:43 -0700, David Resnick wrote:

> Holy out of context quotes, Batman. Your reply misses the entire
> point of mine, which is that hand parsing is a bad idea. Did you read the
> rest of the post or just answer after the first 2 lines?


I wasn't "replying" to your comments. I elaborated on your reply,
providing more reasons why it's a bad idea to assume that you only need
to handle a subset.

 
David Resnick
 
      03-25-2011
On Mar 25, 2:15 pm, Nobody wrote:
> On Fri, 25 Mar 2011 04:31:43 -0700, David Resnick wrote:
> > Holy out of context quotes, Batman. Your reply misses the entire
> > point of mine, which is that hand parsing is a bad idea. Did you read the
> > rest of the post or just answer after the first 2 lines?

>
> I wasn't "replying" to your comments. I elaborated on your reply,
> providing more reasons why it's a bad idea to assume that you only need
> to handle a subset.


It just seemed to be replying to my comments, as that was the only quoted
text being addressed. My mistake.

-David
 
Maxx
 
      03-25-2011
On Mar 23, 11:45 am, David Resnick wrote:
> Note that it always starts this way. It is easy to hand parse the XML
> if it is in a truly fixed format, so why use a real parser? But then
> there are modifications, extensions, etc. People hand edit the file and
> add whitespace, which won't confuse a parser but messes up your less
> flexible hand parse. People write a mixture of <element></element>
> and <element/>, which should parse as equivalent and somehow
> don't when hand parsing. People suddenly want validation, and so on.
> Going with a real parser is very much the way to go in a real
> application: much more future friendly, even if not apparently needed
> up front...


I'm using the parser so that I can extract the necessary values from
specific fields... Anyway, I have decided to go with a real parser, as
it's becoming too cumbersome.


Thanks
 