XML - Docs to XML conversion & read the XML files

 
09-11-2006, 07:56 AM   #1


I am creating an application that needs to convert document files into
XML and then read the XML files to find specific words in a specific
format. I am using Microsoft.Office.Interop to convert the document files
to XML. The files are generated, but they contain a lot of formatting
information, which makes them very large.

I need help writing code that can shrink the XML files by removing the
unwanted document formatting, while still being able to preserve it
when required.


Thanks in advance.



msinghindia@gmail.com
09-11-2006, 01:10 PM   #2
Joe Kesselman
 
Re: Docs to XML conversion & read the XML files

wrote:
> I need an help to write a code which can reduce the xml files by
> removing the unwanted document formating. Or can be preserved if
> required.


That sounds like a straight programming problem. First, you need to
analyse the files to create rules for recognizing the "unwanted" markup.
Then you need to write code that either filters that markup out during
the conversion process, or postprocesses the XML file by reading it in,
applying those rules to alter it, and writing it back out.
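A minimal sketch of the postprocessing route, in Python with only the standard library's `xml.etree.ElementTree`. It assumes the documents were saved as Word 2003 XML (WordprocessingML, namespace `http://schemas.microsoft.com/office/word/2003/wordml`); the `UNWANTED` tag set and the names `strip_markup` and `postprocess` are hypothetical, chosen for illustration. Every element listed in `UNWANTED` is removed together with its subtree, and anything not listed survives, so a kind of formatting is "preserved if required" simply by leaving it out of the set.

```python
import xml.etree.ElementTree as ET

# Assumption: input is Word 2003 XML (WordprocessingML).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
W = "{" + W_NS + "}"
ET.register_namespace("w", W_NS)  # keep the "w:" prefix when writing out

# Hypothetical rule set: elements treated as "unwanted" formatting.
# Remove a name from this set to preserve that kind of markup.
UNWANTED = {
    W + "rPr",     # run (character) properties: fonts, sizes, bold, ...
    W + "pPr",     # paragraph properties: alignment, spacing, ...
    W + "fonts",   # font table
    W + "styles",  # style definitions
}

def strip_markup(root, unwanted=UNWANTED):
    """Remove every element whose tag is in `unwanted`, with its subtree."""
    # Snapshot the tree first so removals don't disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in unwanted:
                parent.remove(child)
    return root

def postprocess(in_path, out_path, unwanted=UNWANTED):
    """Read an XML file, apply the rules, and write the result back out."""
    tree = ET.parse(in_path)
    strip_markup(tree.getroot(), unwanted)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    sample = (
        '<w:wordDocument xmlns:w="' + W_NS + '">'
        '<w:fonts><w:font w:name="Arial"/></w:fonts>'
        '<w:body><w:p><w:pPr><w:jc w:val="center"/></w:pPr>'
        '<w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r></w:p></w:body>'
        '</w:wordDocument>'
    )
    root = strip_markup(ET.fromstring(sample))
    print(ET.tostring(root, encoding="unicode"))
```

Filtering during conversion would avoid writing the bulky file at all, but postprocessing keeps the Interop step untouched and lets the rules be changed without re-converting.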

Pick your programming language and have fun. If you take the
postprocessing approach, you could probably do this in XSLT... but
whether that's the best approach depends in part on the nature of the
rules you're trying to apply.
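To illustrate how the nature of the rules matters, here is a hypothetical case under the same WordprocessingML assumption: suppose the "specific words in a specific format" are words set in bold. A blanket rule such as "drop all run properties" would destroy exactly the information being searched for, so the rule has to be context-sensitive. The sketch below keeps only `<w:b>` inside each `<w:rPr>` and then collects the words of bold runs. The names `KEEP_IN_RPR`, `slim_run_properties`, `is_bold` and `bold_words` are made up for this example, and bold inherited from a style is deliberately not handled.

```python
import xml.etree.ElementTree as ET

# Assumption: input is Word 2003 XML (WordprocessingML).
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"

# Hypothetical rule: bold is the only character formatting we still need.
KEEP_IN_RPR = {W + "b"}

def slim_run_properties(root, keep=KEEP_IN_RPR):
    """Inside every <w:rPr>, drop all children except those listed in `keep`."""
    for rpr in list(root.iter(W + "rPr")):
        for child in list(rpr):
            if child.tag not in keep:
                rpr.remove(child)
    return root

def is_bold(run):
    """True if a <w:r> run carries direct bold formatting.

    Simplification: bold inherited from a paragraph or character style
    is not detected; <w:b w:val="off"/> is treated as not bold.
    """
    b = run.find(W + "rPr/" + W + "b")
    return b is not None and b.get(W + "val", "on") not in ("off", "0", "false")

def bold_words(root):
    """Return the words found in bold runs, in document order."""
    words = []
    for run in root.iter(W + "r"):
        if is_bold(run):
            for t in run.iter(W + "t"):
                words.extend((t.text or "").split())
    return words

if __name__ == "__main__":
    sample = (
        '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">'
        '<w:body><w:p>'
        '<w:r><w:rPr><w:rFonts w:ascii="Arial"/><w:b/><w:sz w:val="24"/></w:rPr>'
        '<w:t>Invoice number</w:t></w:r>'
        '<w:r><w:t> is 42</w:t></w:r>'
        '</w:p></w:body></w:wordDocument>'
    )
    root = slim_run_properties(ET.fromstring(sample))
    print(bold_words(root))  # ['Invoice', 'number']
```

A rule like this, which looks inside an element before deciding what to drop, is also expressible in XSLT as an identity transform plus a few overriding templates; which is easier depends on how many such rules there are.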

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry