
Word + win32ole - how to find formatting of a word?

 
 
Mohit Sindhwani
 
      10-25-2008
Hi! I'm trying to use Ruby and win32ole to parse a Word document. So
far, I'm able to extract the style and text of each paragraph. That
works well for converting each paragraph into its own div (in the HTML/CSS sense).

Now, inside the paragraphs, certain words have special
formatting (e.g. the name of a command set in monospace). I'm
trying to find out how to extract those special cases. Does anyone know how
to achieve that?

Appreciate your help - thanks!

Cheers,
Mohit.
10/25/2008 | 4:33 PM.


 
 
 
 
 
Axel Etzold
 
      10-25-2008


Dear Mohit,

You could save the Word file as HTML and then extract the relevant information from it.
I did that using OpenOffice and got a file containing the font information in the following form:


<BODY LANG="en-US" DIR="LTR">
<P STYLE="margin-bottom: 0in">A command in <FONT FACE="Linux Libertine">Linux
Libertine</FONT></P>
<P STYLE="margin-bottom: 0in">A text in <FONT FACE="Bitstream Charter, serif">Bitstream
Charter</FONT></P>
</BODY>

If you read in the text of that file as a String, you can then find the relevant bits using regexps.
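
As a rough illustration of that regexp approach, here is a minimal sketch that runs against the sample HTML above. The variable names (`html`, `font_span_re`, `font_spans`) are made up for this example, and the HTML is inlined; in practice it would be read from the exported file. A regexp like this is fragile (it does not handle nested FONT tags), so treat it as a starting point only.

```ruby
# Sample HTML inlined so the sketch is self-contained; normally this string
# would come from reading the exported file.
html = <<~HTML
  <BODY LANG="en-US" DIR="LTR">
  <P STYLE="margin-bottom: 0in">A command in <FONT FACE="Linux Libertine">Linux
  Libertine</FONT></P>
  <P STYLE="margin-bottom: 0in">A text in <FONT FACE="Bitstream Charter, serif">Bitstream
  Charter</FONT></P>
  </BODY>
HTML

# /m lets "." match the line break inside a wrapped <FONT> element,
# /i tolerates lower-case tags, and the non-greedy (.*?) stops at the
# first closing </FONT>.
font_span_re = /<FONT\s+FACE="([^"]*)"[^>]*>(.*?)<\/FONT>/mi

# scan returns one [face, text] pair per match because the regexp has two groups.
font_spans = html.scan(font_span_re).map do |face, text|
  [face, text.gsub(/\s+/, " ").strip] # collapse the line break inside the span
end

font_spans.each { |face, text| puts "#{face}: #{text}" }
```

Each pair holds the FACE attribute and the text it wraps, so the specially formatted words can be picked out by font name.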

Best regards,

Axel


 
 
 
 
 
Mohit Sindhwani
 
      10-26-2008


Hi Axel

Thanks for replying! Converting to HTML and working with that is
actually my last resort. For a well-structured document, I found that
asking Word for each paragraph's style information is a lot less work
and relatively easy to handle. I guess it's time to consider your
suggestion!

Cheers,
Mohit.
10/26/2008 | 5:44 PM.


 
 
Mohit Sindhwani
 
      10-26-2008

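Actually, after digging around, I found that this gets me most of the way there:
require 'win32ole'
# doc is the Word document already opened through win32ole
ftHash = {}  # collects every font name seen in the document
index = 0    # counts the words visited
words = doc.Words
words.each {|w|
  index += 1
  ft = w.Font.Name
  ftHash[ft] = 1
}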
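
That loop only collects the font names. To recover which words carry the special formatting, one possible next step is to group consecutive words that share a font. The following is a minimal sketch, not tested against Word itself: `font_runs` is a hypothetical helper, and `fake_font` / `fake_word` are stand-in structs so the code runs without Word. It assumes each word object responds to `Font.Name` and `Text`, as the items of `doc.Words` do.

```ruby
# Collapse a sequence of words into runs of [font_name, text] where
# neighbouring words share the same font.
def font_runs(words)
  runs = []
  words.each do |w|
    name = w.Font.Name
    if runs.last && runs.last[0] == name
      runs.last[1] << w.Text      # same font as the previous word: extend the run
    else
      runs << [name, w.Text.dup]  # font changed: start a new run
    end
  end
  runs
end

# Stand-ins (hypothetical) that mimic the two properties used above.
fake_font = Struct.new(:Name)
fake_word = Struct.new(:Text, :Font)

body = fake_font.new("Times New Roman")
mono = fake_font.new("Courier New")
sample = [
  fake_word.new("Use ", body),
  fake_word.new("git ", mono),
  fake_word.new("status ", mono),
  fake_word.new("now", body)
]

runs = font_runs(sample)
runs.each { |font, text| puts "#{font}: #{text.inspect}" }
```

With a real document, the same helper could be called per paragraph (for example on `paragraph.Range.Words`), and any run whose font differs from the paragraph's main font would be a candidate for a span in the HTML output.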

Thanks for your help!

Cheers,
Mohit.
10/26/2008 | 9:14 PM.



 
 
Axel Etzold
 
      10-26-2008



Dear Mohit,

You're welcome.
It's always nice to answer one's own questions, isn't it? Thanks for the info!

Best regards,

Axel


 
 
Mohit Sindhwani
 
      10-27-2008
Axel Etzold wrote:
> You're welcome.
> It's always nice to answer one's own questions, isn't it? Thanks for the info!
>

Thanks for your reply again! Yes, it's good to find the answer yourself
and then share it.

I find that win32ole is quite powerful; it just takes a little
digging around to learn how to work with it.

Cheers,
Mohit.
10/27/2008 | 11:19 AM.


 