#1
Hi Folks,
This is my first post to this group, and I am not sure whether this is the right group for my question. If it's not an appropriate question for this group, please correct me and guide me to the right place.

The thing is, I have been asked to design an XML parser in C. I have done some study on XML so far, and I know that I should have a design before I start coding. Since I am new to parsers, I am confused about what the components of my parser should be. All I know now is that I need a validating component that validates the XML file, which should then pass the XML file on to the parsing component for parsing. My confusion lies on the parsing component. It's like I can't decide what should be the sub-components of the parsing component. Would some of you be kind enough to enlighten me on this issue?

Thanks in advance.
Mahesh.
mahesh.kanakaraj@gmail.com

#2
wrote:
> validating component that validates the XML file, which should then
> pass the XML file on to the parsing component for parsing.

It's usually done the other way around -- write a nonvalidating parser to deal with the syntactic issues, then attach the validator to that. (That isn't the only solution, or always the best solution, just the easiest way to think about the problem.)

> My confusion lies on the parsing component. It's like I can't decide
> what should be the sub-components of the parsing component.

For a basic implementation, read any good book on parser design and/or feed the XML grammar into any standard parser generator tool (e.g. the YACC/LEX set).

Strong suggestion that -- unless this is a class assignment or you believe you have a new approach that has significant advantages -- you consider instead using one of the many parsers already available. (And I assume that if the latter applied, you wouldn't have posted this vague a question.) Reinventing wheels is sometimes useful; reimplementing existing wheels is generally a waste of resources.

--
() ASCII Ribbon Campaign  | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
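A minimal sketch of the layering described in this post, under heavy assumptions: the lower layer only recognises `<name>` and `</name>` tags (no attributes, comments, entities or declarations) and checks nesting, while the validator is attached on top as a callback. Every name here (`xml_handlers`, `parse_tags`, `validator`, `validate_start`) is hypothetical and is not the API of Expat, libxml or any other library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table; the parser reports events through it. */
typedef struct {
    void *user;
    void (*start_element)(void *user, const char *name, size_t len);
    void (*end_element)(void *user, const char *name, size_t len);
} xml_handlers;

#define MAX_DEPTH 32

/* Nonvalidating layer: handles only <name> and </name>, skips character
 * data, and checks that tags nest properly.
 * Returns 1 if the input is well formed in this limited sense, else 0. */
int parse_tags(const char *doc, const xml_handlers *h)
{
    const char *stack[MAX_DEPTH];   /* names of the currently open elements */
    size_t stack_len[MAX_DEPTH];
    int depth = 0;

    assert(doc != NULL && h != NULL);

    for (const char *p = doc; *p != '\0'; ) {
        if (*p != '<') { p++; continue; }          /* character data */

        int closing = (p[1] == '/');
        const char *name = p + 1 + closing;
        const char *end = strchr(name, '>');
        if (end == NULL || end == name) return 0;  /* unterminated or empty tag */
        size_t len = (size_t)(end - name);

        if (!closing) {
            if (depth == MAX_DEPTH) return 0;      /* nesting too deep for this sketch */
            stack[depth] = name;
            stack_len[depth] = len;
            depth++;
            if (h->start_element) h->start_element(h->user, name, len);
        } else {
            if (depth == 0) return 0;              /* end tag without start tag */
            depth--;
            if (stack_len[depth] != len || memcmp(stack[depth], name, len) != 0)
                return 0;                          /* mismatched end tag */
            if (h->end_element) h->end_element(h->user, name, len);
        }
        p = end + 1;
    }
    return depth == 0;                             /* every element must be closed */
}

/* Validating layer, attached as a handler: it only counts element names
 * that are missing from a NULL-terminated list of allowed names. */
typedef struct {
    const char *const *allowed;
    int errors;
} validator;

void validate_start(void *user, const char *name, size_t len)
{
    validator *v = user;
    for (const char *const *a = v->allowed; *a != NULL; a++)
        if (strlen(*a) == len && memcmp(*a, name, len) == 0)
            return;
    v->errors++;
}
```

The point of the split is that `parse_tags` knows nothing about what is allowed; a caller that does not need validation simply passes handlers without `validate_start`.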

#3
Joe Kesselman wrote:
> Strong suggestion that -- unless this is a class assignment or you
> believe you have a new approach that has significant advantages -- you
> consider instead using one of the many parsers already available. (And I

Joe is right. If you really think that you should write your own parser, be prepared to deal with all the details of Unicode. For example, have you ever heard of the BOM at the beginning of an XML file? Will your parser be able to deal with UTF-7 as well as UTF-32?

Use Expat or libxml:

http://expat.sourceforge.net/
http://xmlsoft.org/
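To make the BOM question concrete, here is a rough sketch of how a hand-written parser might sniff a byte order mark before parsing. The enum and the function `detect_bom` are hypothetical names, and the sketch covers only the BOM itself; a real parser would also have to look at the encoding named in the XML declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names: neither the enum nor the function comes from a library. */
typedef enum {
    ENC_UNKNOWN,    /* no BOM; caller falls back to the XML declaration or UTF-8 */
    ENC_UTF8,
    ENC_UTF16_BE,
    ENC_UTF16_LE,
    ENC_UTF32_BE,
    ENC_UTF32_LE
} xml_encoding;

/* Inspect the first bytes of a document and report the encoding implied by
 * a byte order mark. *bom_len receives the number of bytes to skip. */
xml_encoding detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    static const unsigned char utf32be[] = {0x00, 0x00, 0xFE, 0xFF};
    static const unsigned char utf32le[] = {0xFF, 0xFE, 0x00, 0x00};
    static const unsigned char utf8[]    = {0xEF, 0xBB, 0xBF};
    static const unsigned char utf16be[] = {0xFE, 0xFF};
    static const unsigned char utf16le[] = {0xFF, 0xFE};

    assert(buf != NULL && bom_len != NULL);
    *bom_len = 0;

    /* UTF-32 is tested before UTF-16 because FF FE 00 00 begins with FF FE.
     * XML does not allow U+0000, so the longer match is the safe reading. */
    if (len >= 4 && memcmp(buf, utf32be, 4) == 0) { *bom_len = 4; return ENC_UTF32_BE; }
    if (len >= 4 && memcmp(buf, utf32le, 4) == 0) { *bom_len = 4; return ENC_UTF32_LE; }
    if (len >= 3 && memcmp(buf, utf8, 3) == 0)    { *bom_len = 3; return ENC_UTF8; }
    if (len >= 2 && memcmp(buf, utf16be, 2) == 0) { *bom_len = 2; return ENC_UTF16_BE; }
    if (len >= 2 && memcmp(buf, utf16le, 2) == 0) { *bom_len = 2; return ENC_UTF16_LE; }
    return ENC_UNKNOWN;
}
```

A caller would read the first few bytes of the file, call `detect_bom`, skip `bom_len` bytes, and then pick a decoder (or refuse the document) based on the result.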

#4
Jürgen Kahrs wrote:
> Joe is right. If you really think that you should
> write your own parser, be prepared to deal with all
> the details of Unicode.

Well, one can start with an I/O library that handles Unicode; those exist too. And sometimes it does make sense to have an implementation that only supports a limited set of encodings, if you are certain that those are all your application is ever going to see.

But there are lots of details in XML itself, especially if you want a modern XML environment that supports namespaces, validation against schemas, the standard XML APIs (DOM and/or SAX)...

A basic XML parser is a reasonable term project. A practical, efficient, robust, validating XML parser is rather more. So unless this is a class assignment (or equivalent), I'd definitely go back to whoever said "write one" and ask them why they want you to do that.

--
() ASCII Ribbon Campaign  | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry

#5
Jürgen Kahrs wrote:
> Joe Kesselman wrote:
>
> > Strong suggestion that -- unless this is a class assignment or you
> > believe you have a new approach that has significant advantages -- you
> > consider instead using one of the many parsers already available. (And I
>
> Joe is right. If you really think that you should
> write your own parser, be prepared to deal with all
> the details of Unicode. For example, have you ever
> heard of the BOM at the beginning of an XML file ?
> Will your parser be able to deal with UTF-7 as well
> as UTF-32 ?

My parser needs to worry only about UTF-8, which, I think, is not that difficult to deal with compared to what you were asking about (the other UTFs).

>
> Use Expat or libxml:
>
> http://expat.sourceforge.net/
> http://xmlsoft.org/

#6
wrote:
>> Will your parser be able to deal with UTF-7 as well
>> as UTF-32 ?
>
> My parser needs to worry only about UTF-8, which, I think, is not that
> difficult to deal with compared to what you were asking about (the other UTFs).

Even UTF-8 data may contain a Byte Order Mark (BOM). Be prepared to read up to 4 bytes per "character" and be prepared to read them in any byte-order.

But (as Joe suggested), there are libraries that do the conversion for you. Use libiconv, which implements the POSIX iconv interface (see "man iconv").
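For the UTF-8-only case, a rough sketch of decoding one character of up to 4 bytes might look like the following. The function name `utf8_decode` is hypothetical. Note that UTF-8 itself has a fixed byte order: its BOM is just the three bytes EF BB BF, which decode to U+FEFF, and byte order only becomes a question for UTF-16 and UTF-32 input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from buf[0..len).
 * On success, store the code point in *cp and return the number of bytes
 * consumed (1 to 4). Return 0 on malformed or truncated input. */
size_t utf8_decode(const unsigned char *buf, size_t len, uint32_t *cp)
{
    assert(buf != NULL && cp != NULL);
    if (len == 0) return 0;

    unsigned char b0 = buf[0];
    size_t need;          /* total length of the sequence */
    uint32_t value;       /* code point accumulated so far */
    uint32_t min;         /* smallest value allowed for this length */

    if (b0 < 0x80) { *cp = b0; return 1; }                                   /* ASCII */
    else if ((b0 & 0xE0) == 0xC0) { need = 2; value = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { need = 3; value = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { need = 4; value = b0 & 0x07; min = 0x10000; }
    else return 0;        /* stray continuation byte or invalid lead byte */

    if (len < need) return 0;                        /* truncated sequence */

    for (size_t i = 1; i < need; i++) {
        if ((buf[i] & 0xC0) != 0x80) return 0;       /* not a continuation byte */
        value = (value << 6) | (uint32_t)(buf[i] & 0x3F);
    }

    if (value < min) return 0;                       /* overlong encoding */
    if (value > 0x10FFFF) return 0;                  /* beyond the Unicode range */
    if (value >= 0xD800 && value <= 0xDFFF) return 0; /* surrogate code points */

    *cp = value;
    return need;
}
```

A parser could call this once at the start of the input and skip the result if the code point is U+FEFF, then keep calling it in a loop to feed code points to the tokenizer.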

#7
Jürgen Kahrs wrote:
> wrote:
>
> >> Will your parser be able to deal with UTF-7 as well
> >> as UTF-32 ?
> >
> > My parser needs to worry only about UTF-8, which, I think, is not that
> > difficult to deal with compared to what you were asking about (the other UTFs).
>
> Even UTF-8 data may contain a Byte Order Mark (BOM).
> Be prepared to read up to 4 bytes per "character"
> and be prepared to read them in any byte-order.

I shall make sure to handle the BOM.

>
> But (as Joe suggested), there are libraries that
> do the conversion for you. Use libiconv, which
> implements the POSIX iconv interface (see "man iconv").

I will certainly look into libiconv. And I thank all of you who have given suggestions.