
Scan Microsoft Office files

 
 
Will Fawcett
      09-03-2004
I am trying to put together a script that will scan Microsoft Office
files and store "keywords" for each file, so that the files are
searchable by content and not just by title.

If you open a Word file with Perl and look at the raw contents, it is
basically text with a bunch of non-text formatting data mixed in. I was
hoping someone here might know of a module that can strip out that
extra data and keep just the plain-text words. Or is a regex my only
option?

-Will
 
wfsp
      09-03-2004

"Will Fawcett" wrote:
> I am trying to put together a script that will scan Microsoft Office
> files and store "keywords" for each file, so that the files are
> searchable by content and not just by title.
>
> If you open a Word file with Perl and look at the raw contents, it is
> basically text with a bunch of non-text formatting data mixed in. I was
> hoping someone here might know of a module that can strip out that
> extra data and keep just the plain-text words. Or is a regex my only
> option?
>
> -Will


An example:

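#!/usr/bin/perl
use strict;
use warnings;
use Win32::OLE qw(in);    # 'in' must be imported to iterate over OLE collections

# Attach to an already running instance of Word (undef if none is running).
my $w = Win32::OLE->GetActiveObject('Word.Application')
    or die "Word is not running\n";
my $d = $w->ActiveDocument
    or die "No document is open in Word\n";
my $paras = $d->Paragraphs;

# Print each paragraph's style name and text, separated by a tab.
foreach my $para ( in $paras ) {
    my $style = $para->Style->{NameLocal};
    my $text  = $para->Range->{Text};
    print "$style\t$text\n";
}
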
This assumes Word is running and has a document open. The VBA help files
list all of the methods and properties. A search for Win32::OLE will bring
up many tutorials and references.
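
For the "keywords" part of the original question, here is a rough sketch of one way to turn the extracted paragraph text into a word count. It is plain Perl and does not touch Word at all; the sub names extract_keywords and top_keywords, the stop-word list, and the minimum word length of 3 are all made up for illustration, so adjust them to your own files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stop-word list; extend it to suit your documents.
my %STOP = map { $_ => 1 }
    qw(a an and are as at be by for from in is it of on or that the this to with);

# Return a hash ref mapping each keyword to its number of occurrences.
# $min_len is an assumed cut-off (default 3), not something Word provides.
sub extract_keywords {
    my ( $text, $min_len ) = @_;
    $min_len = 3 unless defined $min_len;
    my %count;
    # Matching runs of letters also skips the \r that ends each Word paragraph.
    for my $word ( lc($text) =~ /[a-z]+/g ) {
        next if length($word) < $min_len or $STOP{$word};
        $count{$word}++;
    }
    return \%count;
}

# Keywords ordered by descending frequency, ties broken alphabetically.
sub top_keywords {
    my ( $count, $n ) = @_;
    my @sorted =
        sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
    $n = @sorted if !defined $n or $n > @sorted;
    return @sorted[ 0 .. $n - 1 ];
}

# Sample text standing in for what $para->Range->{Text} would return.
my $sample = "Quarterly budget report\rThe budget for the quarter is attached.\r";
my $kw     = extract_keywords($sample);
print join( ", ", top_keywords( $kw, 3 ) ), "\n";
```

In the loop above you would call extract_keywords on each paragraph's text (or on the whole document's text joined together) and store the result against the file name.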


 