Velocity Reviews - Computer Hardware Reviews

"Functional"

 
 
Keith Thompson
      09-28-2013
Malcolm McLean <(E-Mail Removed)> writes:
> On Saturday, September 28, 2013 8:34:19 PM UTC+1, blmblm @ myrealbox. com wrote:
>> In article <(E-Mail Removed)>,
>> Well, yes, but .... If the data you're working on is arranged in
>> some recursive form, it seems to me to be natural to process it using
>> recursion. Examples that come to my mind are filesystems and XML.
>>

> XML has a recursive definition, but the files themselves seldom have
> nesting of arbitrary depth. So you can parse them slowly using a
> general-purpose recursive descent parser, like the vanilla XML parser
> (advertising feature, on my website). Or you can write a
> format-specific parser, which will probably execute faster and in less memory
> but which won't be recursive.


Wouldn't that break as soon as you feed it an XML document that happens
to be one level deeper than what you anticipated?

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Malcolm McLean
      09-28-2013
On Saturday, September 28, 2013 9:19:37 PM UTC+1, Keith Thompson wrote:
> Malcolm McLean <(E-Mail Removed)> writes:
>
>
> Wouldn't that break as soon as you feed it an XML document that happens
> to be one level deeper than what you anticipated?
>

No, because typically it goes like this:

<company>
<site>
<other highly nested gubbins>

<employeelist>
<employee>
<name> Fred </name>
<salary> 2p </salary>
<id> 12345 </id>
</employee>
<employee>
<name> Bill Gates </name>
<salary> 1000000 million </salary>
<id> 12346 </id>
</employee>
</employeelist>
....

So you read lines until you hit employeelist, then read off the employees.
You just ignore the deeply nested stuff around it. Presumably your program
can't handle those anyway, all it understands is lists of employees.
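
A minimal C sketch of that format-specific approach, assuming the whole document is already in memory as a NUL-terminated string, that tags carry no attributes, and that there are no comments, CDATA sections or entity references. The names `read_employee_names` and `NAME_LEN` are hypothetical, not taken from any library. It never descends into the surrounding markup: it jumps to `<employeelist>` with `strstr` and then reads `<name>` elements until `</employeelist>`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 64   /* hypothetical limit on a stored name */

/* Copy [begin,end) into out, trimming surrounding whitespace and
   truncating to NAME_LEN-1 characters. */
static void copy_trimmed(char *out, const char *begin, const char *end)
{
    while (begin < end && isspace((unsigned char)*begin))
        begin++;
    while (end > begin && isspace((unsigned char)end[-1]))
        end--;
    size_t n = (size_t)(end - begin);
    if (n >= NAME_LEN)
        n = NAME_LEN - 1;
    memcpy(out, begin, n);
    out[n] = '\0';
}

/* Non-recursive scan: skip everything up to <employeelist>, then collect
   the text of each <name> element until </employeelist>.
   Returns the number of names stored (at most max), or -1 if there is
   no complete employee list. */
int read_employee_names(const char *doc, char names[][NAME_LEN], int max)
{
    assert(doc != NULL && names != NULL);
    const char *p = strstr(doc, "<employeelist>");
    if (p == NULL)
        return -1;
    const char *list_end = strstr(p, "</employeelist>");
    if (list_end == NULL)
        return -1;

    int count = 0;
    while (count < max) {
        const char *open = strstr(p, "<name>");
        if (open == NULL || open > list_end)
            break;
        const char *text = open + strlen("<name>");
        const char *close = strstr(text, "</name>");
        if (close == NULL || close > list_end)
            break;
        copy_trimmed(names[count++], text, close);
        p = close + strlen("</name>");
    }
    return count;
}
```

However deep the markup before the list is, it costs nothing here beyond the `strstr` that skips over it.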
 
glen herrmannsfeldt
      09-28-2013
Keith Thompson <(E-Mail Removed)> wrote:
> Malcolm McLean <(E-Mail Removed)> writes:


(snip)
>> XML has a recursive definition, but the files themselves seldom have
>> nesting of arbitrary depth. So you can parse them slowly using a
>> general-purpose recursive descent parser, like the vanilla XML parser
>> (advertising feature, on my website). Or you can write a
>> format-specific parser, which will probably execute faster and in less memory
>> but which won't be recursive.


> Wouldn't that break as soon as you feed it an XML document that happens
> to be one level deeper than what you anticipated?


Well, for one, when I program for a "known" small size I usually
round up, a lot. So if I expected a depth of three or four, I might
program for 10 or more.
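
One way to apply that rounding-up idea without recursion is a fixed-size stack of open tag names: the limit is generous, and exceeding it is reported as an error rather than overflowing a buffer. This is a sketch under simplifying assumptions (plain `<tag>` and `</tag>` only; no attributes, self-closing tags, comments or processing instructions), and `check_nesting`, `MAX_DEPTH` and `MAX_TAG` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 16   /* "round up, a lot": expected depth is 3 or 4 */
#define MAX_TAG   32   /* longest tag name accepted */

enum nest_result { NEST_OK, NEST_TOO_DEEP, NEST_MISMATCH, NEST_BAD_TAG };

/* Checks that open and close tags match, using a fixed-size stack of
   tag names instead of recursion. */
enum nest_result check_nesting(const char *doc)
{
    char stack[MAX_DEPTH][MAX_TAG];
    int depth = 0;

    assert(doc != NULL);
    for (const char *p = strchr(doc, '<'); p != NULL; p = strchr(p, '<')) {
        int closing = (p[1] == '/');
        const char *name = p + 1 + closing;
        const char *gt = strchr(name, '>');
        size_t len;

        if (gt == NULL)
            return NEST_BAD_TAG;
        len = (size_t)(gt - name);
        if (len == 0 || len >= MAX_TAG)
            return NEST_BAD_TAG;

        if (!closing) {
            if (depth == MAX_DEPTH)
                return NEST_TOO_DEEP;      /* fail cleanly, never overflow */
            memcpy(stack[depth], name, len);
            stack[depth][len] = '\0';
            depth++;
        } else {
            /* the close tag must name the most recently opened element */
            if (depth == 0 || strlen(stack[depth - 1]) != len ||
                memcmp(stack[depth - 1], name, len) != 0)
                return NEST_MISMATCH;
            depth--;
        }
        p = gt + 1;
    }
    return depth == 0 ? NEST_OK : NEST_MISMATCH;
}
```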

However, one of the features of XML is that a parser should ignore
tags that it doesn't recognize. That allows for forward compatibility.
(New features can be added without invalidating old programs.)

The new tags might have more depth, but if you properly ignore them,
you can parse them, and ignore them, without any problems.
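
Properly ignoring an unrecognised element means skipping to its matching close tag. That needs only an integer depth counter, so there is neither a fixed nesting limit nor recursion. A hedged sketch, assuming well-formed input with no `>` inside attribute values and no comments or CDATA; `skip_element` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* p points at the '<' of an opening tag.  Returns a pointer just past
   the element's matching close tag, or NULL if the input ends first.
   Nesting is tracked with a counter, so any depth is handled. */
const char *skip_element(const char *p)
{
    long depth = 0;

    assert(p != NULL && *p == '<');
    while ((p = strchr(p, '<')) != NULL) {
        const char *gt = strchr(p, '>');
        if (gt == NULL)
            return NULL;
        if (p[1] == '/')
            depth--;                 /* </tag> */
        else if (gt[-1] != '/')
            depth++;                 /* <tag>; <tag/> leaves depth unchanged */
        p = gt + 1;
        if (depth == 0)
            return p;
    }
    return NULL;
}
```

An old program meeting a new, more deeply nested tag would call this and carry on from the returned pointer.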

-- glen

 
Malcolm McLean
      09-29-2013
On Saturday, September 28, 2013 11:42:42 PM UTC+1, glen herrmannsfeldt wrote:
> Keith Thompson <(E-Mail Removed)> wrote:
>
> > Malcolm McLean <(E-Mail Removed)> writes:

>
>
> >> XML has a recursive definition, but the files themselves seldom have
> >> nesting of arbitrary depth. So you can parse them slowly using a
> >> general-purpose recursive descent parser, like the vanilla XML parser
> >> (advertising feature, on my website). Or you can write a
> >> format-specific parser, which will probably execute faster and in less memory
> >> but which won't be recursive.

>
> > Wouldn't that break as soon as you feed it an XML document that happens
> > to be one level deeper than what you anticipated?

>
> Well, for one, when I program for a "known" small size I usually
> round up, a lot. So if I expected a depth of three or four, I might
> program for 10 or more.
>
> However, one of the features of XML is that a parser should ignore
> tags that it doesn't recognize. That allows for forward compatibility.
> (New features can be added without invalidating old programs.)
>
> The new tags might have more depth, but if you properly ignore them,
> you can parse them, and ignore them, without any problems.
>

The question is whether you need to parse recursively.

Say we've got xml like this:

<employee> Obama
<subordinate> Joe </subordinate>
<subordinate> Hilary
<subordinate> Bill
<subordinate> Monica </subordinate>
<subordinate> Socks </subordinate>
</subordinate>
</subordinate>
</employee>

It's hard to parse that non-recursively, because subordinates can have
sub-subordinates in a whole hierarchy, and we need to match the opening and closing tags.
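
For comparison, a recursive descent parser handles the nested layout naturally: each call parses one element (its leading text, then its children), and the call stack does the matching of opening and closing tags. The sketch assumes simple tags with no attributes and renders the tree as `Name(child,child)` so the result is easy to check. `parse_person`, `hierarchy_to_string` and `struct parser` are hypothetical names, and this is not the vanilla XML parser mentioned earlier in the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct parser {
    const char *p;     /* current input position */
    char *out;         /* output buffer, always NUL-terminated */
    size_t len, cap;   /* bytes used / capacity of out */
};

static void skip_space(struct parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
}

/* Appends n bytes to the output; returns 0 if the buffer is full. */
static int emit(struct parser *ps, const char *s, size_t n)
{
    if (ps->len + n >= ps->cap)
        return 0;
    memcpy(ps->out + ps->len, s, n);
    ps->len += n;
    ps->out[ps->len] = '\0';
    return 1;
}

/* Parses "<tag> Name child* </tag>" and writes "Name(child,child)".
   Returns 1 on success, 0 on malformed input or a full buffer. */
static int parse_person(struct parser *ps)
{
    char tag[32];
    const char *start, *gt;
    size_t n;
    int first = 1;

    skip_space(ps);
    if (*ps->p != '<' || ps->p[1] == '/')
        return 0;
    gt = strchr(ps->p, '>');
    if (gt == NULL || (size_t)(gt - ps->p - 1) >= sizeof tag)
        return 0;
    n = (size_t)(gt - ps->p - 1);
    memcpy(tag, ps->p + 1, n);
    tag[n] = '\0';
    ps->p = gt + 1;

    /* The person's name is the text up to the next '<', trimmed. */
    skip_space(ps);
    start = ps->p;
    while (*ps->p != '\0' && *ps->p != '<')
        ps->p++;
    n = (size_t)(ps->p - start);
    while (n > 0 && isspace((unsigned char)start[n - 1]))
        n--;
    if (!emit(ps, start, n))
        return 0;

    /* Child elements: recurse until we reach a closing tag. */
    for (;;) {
        skip_space(ps);
        if (*ps->p != '<')
            return 0;
        if (ps->p[1] == '/')
            break;
        if (!emit(ps, first ? "(" : ",", 1) || !parse_person(ps))
            return 0;
        first = 0;
    }
    if (!first && !emit(ps, ")", 1))
        return 0;

    /* The closing tag must match the opening one. */
    n = strlen(tag);
    if (strncmp(ps->p + 2, tag, n) != 0 || ps->p[2 + n] != '>')
        return 0;
    ps->p += n + 3;
    return 1;
}

int hierarchy_to_string(const char *doc, char *out, size_t cap)
{
    struct parser ps = { doc, out, 0, cap };
    assert(doc != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    return parse_person(&ps);
}
```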

But that's rare. Normally we'll want to present the data like this:

<employee> Obama
<subordinates> Joe, Hilary </subordinates>
</employee>
<employee> Hilary
<subordinates> Bill </subordinates>
</employee>
<employee> Bill
<subordinates> Monica, Socks </subordinates>
</employee>

Given this, and given that our program understands employee names but not
hierarchies, we can just chug through ignoring the subordinate tags. We
don't need to keep track of anything except the closing tag.
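
A sketch of that flat scan, assuming the second layout above: plain-text names at the start of each `<employee>` element and no attributes. Everything after the name, including `<subordinates>`, is skipped by jumping straight to `</employee>`; this only works because `<employee>` elements never nest in that layout, which is why no stack or recursion is needed. `list_employees` and `NAME_LEN` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 64   /* hypothetical limit on a stored name */

/* For each <employee> element, keep the leading text (the name) and
   skip to </employee>, ignoring whatever child tags it contains.
   Returns the number of names stored, or -1 on an unterminated element. */
int list_employees(const char *doc, char names[][NAME_LEN], int max)
{
    static const char open_tag[] = "<employee>";
    static const char close_tag[] = "</employee>";
    const char *p = doc;
    int count = 0;

    assert(doc != NULL && names != NULL);
    while (count < max && (p = strstr(p, open_tag)) != NULL) {
        const char *text = p + strlen(open_tag);
        const char *end = strstr(text, close_tag);
        const char *stop;
        size_t n;

        if (end == NULL)
            return -1;
        /* The name runs up to the first child tag (or the close tag),
           so this strchr cannot return NULL. */
        stop = strchr(text, '<');
        while (text < stop && isspace((unsigned char)*text))
            text++;
        while (stop > text && isspace((unsigned char)stop[-1]))
            stop--;
        n = (size_t)(stop - text);
        if (n >= NAME_LEN)
            n = NAME_LEN - 1;
        memcpy(names[count], text, n);
        names[count][n] = '\0';
        count++;
        p = end + strlen(close_tag);   /* the only tag we track */
    }
    return count;
}
```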


 
Keith Thompson
      09-29-2013
Malcolm McLean <(E-Mail Removed)> writes:
[...]
> The question is whether you need to parse recursively.
>
> Say we've got xml like this:
>
> <employee> Obama
> <subordinate> Joe </subordinate>
> <subordinate> Hilary
> <subordinate> Bill
> <subordinate> Monica </subordinate>
> <subordinate> Socks </subordinate>
> </subordinate>
> </subordinate>
> </employee>
>
> It's hard to parse that non-recursively, because subordinates can have
> sub-subordinates in a whole hierarchy, and we need to match the opening and closing tags.
>
> But that's rare. Normally we'll want to present the data like this:
>
> <employee> Obama
> <subordinates> Joe, Hilary </subordinates>
> </employee>
> <employee> Hilary
> <subordinates> Bill </subordinates>
> </employee>
> <employee> Bill
> <subordinates> Monica, Socks </subordinates>
> </employee>
>
> Given this, and given that our program understands employee names but not
> hierarchies, we can just chug through ignoring the subordinate tags. We
> don't need to keep track of anything except the closing tag.


If your data is not inherently recursive, perhaps XML is not the best
way to represent it.

In any case, it seems wasteful to write a custom parser that handles a
*subset* of XML when there are so many open source full XML parsers out
there. A custom parser might be smaller in terms of total code size,
but a full parser is much smaller in terms of code *that you have to
write and maintain*.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Malcolm McLean
      09-30-2013
On Sunday, September 29, 2013 9:45:47 PM UTC+1, Keith Thompson wrote:
> Malcolm McLean <(E-Mail Removed)> writes:
>
> If your data is not inherently recursive, perhaps XML is not the best
> way to represent it.
>

XML is a sort of accepted standard for information exchange. I
use it for the Baby X resource compiler, not because the data is
complex or recursive - essentially it's just a list of files to
pack into a compilable C source - but because it's easier than
defining a custom convention.
Whilst you can represent trees or other recursive structures in
XML directly, most XML files aren't like that.
>
> In any case, it seems wasteful to write a custom parser that handles a
> *subset* of XML when there are so many open source full XML parsers out
> there. A custom parser might be smaller in terms of total code size,
> but a full parser is much smaller in terms of code *that you have to
> write and maintain*.
>

I wrote a vanilla xml parser (http://www.malcolmmclean.site11.com/www).
There were other parsers available on the web, but I didn't like
any of them. The snag is that to support the full xml spec you need
lots of complex code, and then to load in a massive file you have to
devise a complex system for keeping most of it on disk. The XML people
call it a "pull parser". This is all overkill if you just want a crossword
file containing a 15x15 grid and maybe thirty one-line clues, or a
list of maybe a dozen files.
If you use a library then no-one can compile your code unless they
have that library installed, which means that no-one will compile
your code unless very motivated to use it. If you ship pages and
pages of source then the program becomes hard to understand. You also
potentially have legal problems, for instance Microsoft won't permit
their free compilers to be used on anything with a "viral" licence.

On the other hand it is better to do things properly.
 
Ian Collins
      09-30-2013
Malcolm McLean wrote:
> On Sunday, September 29, 2013 9:45:47 PM UTC+1, Keith Thompson wrote:
>> Malcolm McLean <(E-Mail Removed)> writes:
>>
>> If your data is not inherently recursive, perhaps XML is not the best
>> way to represent it.
>>

> XML is a sort of accepted standard for information exchange. I
> use it for the Baby X resource compiler, not because the data is
> complex or recursive - essentially it's just a list of files to
> pack into a compilable C source - but because it's easier than
> defining a custom convention.
> Whilst you can represent trees or other recursive structures in
> XML directly, most XML files aren't like that.


Most XML documents these days are web service requests or (Open)Office
documents. Both of these use XML namespaces, which inevitably leads to
a more complex parser.

>> In any case, it seems wasteful to write a custom parser that handles a
>> *subset* of XML when there are so many open source full XML parsers out
>> there. A custom parser might be smaller in terms of total code size,
>> but a full parser is much smaller in terms of code *that you have to
>> write and maintain*.
>>

> I wrote a vanilla xml parser (http://www.malcolmmclean.site11.com/www).
> There were other parsers available on the web, but I didn't like
> any of them. The snag is that to support the full xml spec you need
> lots of complex code, and then to load in a massive file you have to
> devise a complex system for keeping most of it on disk. The XML people
> call it a "pull parser". This is all overkill if you just want a crossword
> file containing a 15x15 grid and maybe thirty one-line clues, or a
> list of maybe a dozen files.


LibXML is pretty universal. Any platform with PHP or (I think) Python
installed will already have it.

It sounds like a simple SAX parser is all you need for your implementation.
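
To illustrate what SAX style means here: the parser keeps no tree and instead calls user-supplied functions for each start tag, end tag and run of text, so memory use does not grow with the size of the document. This is a deliberately tiny sketch, not libxml2's `xmlSAXHandler` interface; `struct sax_handler`, `sax_parse` and the example `count_tag` handler are hypothetical, and attributes, comments and entities are not supported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table; any member may be NULL. */
struct sax_handler {
    void (*start)(void *ctx, const char *name, size_t len);
    void (*end)(void *ctx, const char *name, size_t len);
    void (*text)(void *ctx, const char *s, size_t len);
    void *ctx;
};

/* Handles only <tag>, </tag> and <tag/>.  Returns 0 on success, -1 on an
   unterminated tag.  Matching of start and end tags is left to the
   callbacks. */
int sax_parse(const char *doc, const struct sax_handler *h)
{
    const char *p = doc;

    assert(doc != NULL && h != NULL);
    while (*p != '\0') {
        if (*p != '<') {                      /* run of character data */
            const char *lt = strchr(p, '<');
            size_t n = lt ? (size_t)(lt - p) : strlen(p);
            if (h->text)
                h->text(h->ctx, p, n);
            p += n;
            continue;
        }
        const char *gt = strchr(p, '>');
        if (gt == NULL)
            return -1;
        if (p[1] == '/') {                    /* </tag> */
            if (h->end)
                h->end(h->ctx, p + 2, (size_t)(gt - p - 2));
        } else if (gt[-1] == '/') {           /* <tag/>: start then end */
            size_t n = (size_t)(gt - p - 2);
            if (h->start)
                h->start(h->ctx, p + 1, n);
            if (h->end)
                h->end(h->ctx, p + 1, n);
        } else if (h->start) {                /* <tag> */
            h->start(h->ctx, p + 1, (size_t)(gt - p - 1));
        }
        p = gt + 1;
    }
    return 0;
}

/* Example handler: counts start tags whose name equals tc->want. */
struct tag_count { const char *want; int n; };

static void count_tag(void *ctx, const char *name, size_t len)
{
    struct tag_count *tc = ctx;
    if (strlen(tc->want) == len && memcmp(tc->want, name, len) == 0)
        tc->n++;
}
```

Usage: fill a `struct sax_handler` with `count_tag` as the start callback and a `struct tag_count` as the context, then call `sax_parse` on the document.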

--
Ian Collins
 
Malcolm McLean
      09-30-2013
On Monday, September 30, 2013 9:54:40 AM UTC+1, Ian Collins wrote:
> Malcolm McLean wrote:
>
>
> LibXML is pretty universal. Any platform with PHP or (I think) Python
> installed will already have it.
>

You can't give a program that depends on LibXML to a Windows user
and expect them to compile it. The Windows machine I'm typing this
on doesn't have it. In half an hour I'll turn on a Linux machine, and
then it's as simple as typing "apt-get". But the world doesn't use
Linux or corporate development Windows machines with all the bits
and bats installed.
 
Ian Collins
      09-30-2013
Malcolm McLean wrote:
> On Monday, September 30, 2013 9:54:40 AM UTC+1, Ian Collins wrote:
>> Malcolm McLean wrote:
>>
>>
>> LibXML is pretty universal. Any platform with PHP or (I think) Python
>> installed will already have it.
>>

> You can't give a program that depends on LibXML to a Windows user
> and expect them to compile it. The Windows machine I'm typing this
> on doesn't have it. In half an hour I'll turn on a Linux machine, and
> then it's as simple as typing "apt-get". But the world doesn't use
> Linux or corporate development Windows machines with all the bits
> and bats installed.


So you expect Windows users to be using your X toolkit?

Seriously, if every piece of software that uses XML included its own
parser, half the programming world would be writing XML parsers. That's
why we have libraries.

--
Ian Collins
 
Malcolm McLean
      09-30-2013
On Monday, September 30, 2013 11:41:39 AM UTC+1, Ian Collins wrote:
> Malcolm McLean wrote:
>
> So you expect Windows users to be using your X toolkit?
>
> Seriously, if every piece of software that uses XML included its own
> parser, half the programming world would be writing XML parsers. That's
> why we have libraries.
>

Windows users might want the resource compiler. Probably not for Windows
programs, because it doesn't do anything that the MS resource compiler
won't. But if you want to embed a font in a commandline program, for example,
you might use it.

Unfortunately this problem hasn't been solved. When some standards body
or similar organisation announces a new file format, it should release
reference parsers which most people can incorporate into their programming
languages without any fuss. We simply don't have the technical and social
infrastructure to do that yet, though we're close.

If you make your program dependent on a library, it's a thundering headache
for anyone who doesn't have the same development setup as you. Similarly, if you
ship masses and masses of source, it makes the program hard to
understand. When the motive is to parse a trivial 1K file containing a
list of images, it just becomes silly.

So the vanilla xml parser is the best answer when 1) you want to make the
source available outside of the immediate development environment, 2) you
don't have to handle arbitrary xml generated by processes you can't
control, 3) you don't need Unicode (I might extend it to support Unicode),
4) the file is relatively small in relation to the computer's speed and
memory.

( http://www.malcolmmclean.site11.com/...XMLParser.html )

 