Velocity Reviews - Computer Hardware Reviews

Newbie q: Parsing vendor-data into uniform XML

 
 
Casper B
      01-09-2005
I have 3-4 vendor-specific ASCII (non-XML) data sheets. Each forms a table
of simple types (int, float, string) with a space as the delimiter. The data
is simple from a grammar point of view, yet not as simple as a 2D array or
recordset. Example:

1234567894 00000100 50 10400
01330002 003 0000213337 10400
01330025 002 0000066887 10400
01330027 000 0000033841 10400
01330029 001 0000061182 10400
01330030 004 0000047411 10400
9999999998 0001165422- 10400
1234567894 00000100 50 10400
01330003 001 0000033671- 10400
01330004 001 0000116653- 10400
....looped data!

Normally I would parse this and do the transformation using a
compiler-compiler. This is, however, a very static approach (a new format
would require recompilation, etc.) and certainly not suited for database
integration.

Can I somehow use XML or any of its features (DTD, XPath...) to
parse/validate vendor-specific ASCII/non-XML data sheets and transform
them into a standard XML format?

The goal is, of course, to be able to receive vendor data in a new
proprietary ASCII format and still be able to read it, provided an
associated grammar has been created for the new format. Unfortunately, I
have no way of requiring the vendor to provide or follow a schema/XML
format.

Thanks in advance for any feedback!

Casper Bang

 
Andy Fish
      01-10-2005

"Casper B" wrote:
> If I have 3-4 specific ASCII/non-XML vendor-specific data-sheets, forming
> tables of simple types (int, float, string) with space as delimiter. The
> data is simple (from a grammar point of view) yet not as simple as a
> 2D-array/recordset. Example:
>
> 1234567894 00000100 50 10400
> 01330002 003 0000213337 10400
> 01330025 002 0000066887 10400
> 01330027 000 0000033841 10400
> 01330029 001 0000061182 10400
> 01330030 004 0000047411 10400
> 9999999998 0001165422- 10400
> 1234567894 00000100 50 10400
> 01330003 001 0000033671- 10400
> 01330004 001 0000116653- 10400
> ...looped data!
>
> Normally I would parse this and do transformation using a
> Compiler-Compiler. This is however, a very static approach (new format
> would require recompilation etc) and certainly not suited for database
> integration.
>
> Can I somehow use XML or any features hereof (DTD, Xpath...) to
> parse/validate vendor-specific ASCII/non-XML data-sheets and transform
> this into a standard XML format.
>


You might be able to process these files using XML tools, but it certainly
wouldn't help with the job of parsing them.

In XML, any data between tags is represented as text nodes, so all you would
end up with would be either a single text node or a sequence of text nodes.
You would still have to use substring()- or instr()-type operations to locate
the individual fields. This would be more complicated in, say, XSLT code
than it would be in a conventional 3GL.

I think you need to treat the parsing of the incoming non-XML data as a
separate process. Once you have done that, you can certainly build XML
structures and use XML tools to process and output the data.
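
As a rough illustration of that two-step split, here is a minimal Python
sketch: plain string splitting does the parsing, and only afterwards does the
standard library's xml.etree.ElementTree build the XML. The record layout is a
guess from the sample (a line starting with 1234567894 opens a batch, a line
starting with 9999999998 closes it, and a trailing "-" marks a negative
amount). The element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
1234567894 00000100 50 10400
01330002 003 0000213337 10400
01330025 002 0000066887 10400
9999999998 0001165422- 10400
1234567894 00000100 50 10400
01330003 001 0000033671- 10400
"""

HEADER_ID = "1234567894"   # assumed: marks the start of a batch
TRAILER_ID = "9999999998"  # assumed: marks the end of a batch


def signed(text):
    """Convert '0000033671-' style fields (assumed trailing sign) to int."""
    return -int(text[:-1]) if text.endswith("-") else int(text)


def parse(text):
    """Step 1: plain parsing, no XML involved. Returns a list of batches."""
    batches = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == HEADER_ID:
            current = {"header": fields[1:], "rows": [], "total": None}
            batches.append(current)
        elif current is None:
            raise ValueError(f"data line before any header: {line!r}")
        elif fields[0] == TRAILER_ID:
            current["total"] = signed(fields[1])
        else:
            current["rows"].append((fields[0], fields[1], signed(fields[2])))
    return batches


def to_xml(batches):
    """Step 2: build an XML tree from the already-parsed data."""
    root = ET.Element("vendor-data")
    for batch in batches:
        node = ET.SubElement(root, "batch")
        if batch["total"] is not None:
            node.set("total", str(batch["total"]))
        for item, code, amount in batch["rows"]:
            ET.SubElement(node, "row", item=item, code=code, amount=str(amount))
    return root


if __name__ == "__main__":
    print(ET.tostring(to_xml(parse(SAMPLE)), encoding="unicode"))
```

Once the data is in a tree like this, XPath, XSLT or a schema validator can
take over; none of them had to understand the vendor's line format.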


> The goal is of course, to be able to receive vendor-data in a new
> propriatary ASCII format and still be able to read the data provided an
> associated grammar has been created for this new format. Unfortunately I
> have no way of requireing the vendor to provide/follow a schema/XML
> format.
>
> Thanks in advance for any feedback!
>
> Casper Bang
>
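
One way to picture the "associated grammar" goal quoted above, without
recompiling for each new format, is to keep the format description as data and
load it at run time. The sketch below is only an assumption about how that
could look: the JSON layout, the line kinds (header, trailer, detail) and the
field names are all invented, and the patterns are guessed from the sample
lines.

```python
import json
import re

# Hypothetical format description. In practice it would live in its own
# file per vendor, so supporting a new vendor means adding a file, not code.
FORMAT_JSON = r"""
{
  "lines": [
    {"kind": "header",
     "pattern": "^1234567894\\s+(?P<col2>\\d+)\\s+(?P<col3>\\d+)\\s+(?P<col4>\\d+)$"},
    {"kind": "trailer",
     "pattern": "^9999999998\\s+(?P<total>\\d+-?)\\s+(?P<col3>\\d+)$"},
    {"kind": "detail",
     "pattern": "^(?P<item>\\d{8})\\s+(?P<code>\\d{3})\\s+(?P<amount>\\d+-?)\\s+(?P<col4>\\d+)$"}
  ]
}
"""


def load_format(text):
    """Compile the per-line rules from a JSON format description."""
    spec = json.loads(text)
    return [(rule["kind"], re.compile(rule["pattern"])) for rule in spec["lines"]]


def classify(line, rules):
    """Return (kind, fields) for the first rule that matches, else raise."""
    stripped = line.strip()
    for kind, pattern in rules:
        match = pattern.match(stripped)
        if match:
            return kind, match.groupdict()
    raise ValueError(f"line matches no rule: {line!r}")


if __name__ == "__main__":
    rules = load_format(FORMAT_JSON)
    for sample in ("1234567894 00000100 50 10400",
                   "01330003 001 0000033671- 10400",
                   "9999999998 0001165422- 10400"):
        print(classify(sample, rules))
```

A line that matches no rule raises an error, which gives a crude form of
validation. Anything beyond line-by-line patterns (nesting, counts, totals
that must add up) would still need real parsing code.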



 
Casper B
      01-10-2005
Thought so, thanks for the clarification!

Casper
 
eranb
      01-13-2005
Hi,
To handle the parsing side, I would recommend taking a look at
ContentMaster, ItemField's file-parsing solution. Using its parser studio,
a parsing solution for the scenario you have just described can be
created in minutes.

http://www.itemfield.com

ContentMaster is a complete multi-format (EDI, Excel, Word, RTF, custom
formats, etc.) text-parsing solution that comes with a dedicated visual
authoring environment for the creation of parsing scripts, and a parsing
engine that seamlessly integrates into any environment.

Regards,

Eran Berkowitz

 