Velocity Reviews - Computer Hardware Reviews

[ANN] OOoExtract v0.1

 
 
Daniel Carrera
10-28-2003
Greetings,

I'd like to announce the immediate availability of "OOoExtract":

http://www.math.umd.edu/~dcarrera/op...o_extract.html

This is a command-line program, inspired by 'grep', to extract data from
OpenOffice.org files according to certain regular expressions.

This program is really cool and I'm very happy with it. Because it understands the XML
structure of OOo files, it can make smarter and more complex matches than a simple 'grep' could.

OpenOffice.org has a concept of "styles". It has some pre-defined styles, and you
can define your own. For example, if you have a list of poems, you can define a
"Poem" style and a "PoemAuthor" style. You can then give each of them a particular
appearance. This allows you to give your document a logical structure.

OOoExtract can make use of this information to match not only text content, but also
styles. For example:

$ ruby ooo_extract.rb --style="PoemAuthor" poems.sxw
Robert Frost
Ernest Hemingway
Robert Frost
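
For readers wondering where the style information lives: an OOo 1.x Writer file (.sxw) is a zip archive, and the document body is stored in content.xml, where each paragraph is a text:p element carrying a text:style-name attribute. The sketch below is not taken from ooo_extract.rb; it is a minimal illustration, assuming content.xml has already been pulled out of the zip, of how a style match like --style="PoemAuthor" could work. The helper names (paragraphs_with_style, element_text) and the namespace URI constant are assumptions for this example. Note that real documents often give directly formatted paragraphs an automatic style (e.g. "P1") whose parent is the named style; this sketch ignores that case.

```ruby
require "rexml/document"

# Assumption: the text namespace URI used by OpenOffice.org 1.x (.sxw) files.
TEXT_NS = "http://openoffice.org/2000/text"

# Hypothetical helper: concatenate all character data inside an element,
# descending into nested elements such as <text:span>.
def element_text(el)
  el.children.map do |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then element_text(child)
    else ""
    end
  end.join
end

# Hypothetical helper (not from ooo_extract.rb): return the text of every
# <text:p> whose text:style-name attribute equals +style+.
def paragraphs_with_style(content_xml, style)
  doc = REXML::Document.new(content_xml)
  matches = []
  doc.root.each_recursive do |el|
    next unless el.name == "p" && el.namespace == TEXT_NS
    attr = el.attribute("style-name", TEXT_NS)
    matches << element_text(el) if attr && attr.value == style
  end
  matches
end

# Hypothetical, heavily trimmed content.xml for a poems document.
SAMPLE_XML = <<~XML
  <office:document-content
      xmlns:office="http://openoffice.org/2000/office"
      xmlns:text="http://openoffice.org/2000/text">
    <office:body>
      <text:p text:style-name="Poem">A line of verse.</text:p>
      <text:p text:style-name="PoemAuthor">Robert <text:span text:style-name="T1">Frost</text:span></text:p>
    </office:body>
  </office:document-content>
XML

p paragraphs_with_style(SAMPLE_XML, "PoemAuthor") # prints ["Robert Frost"]
```

Walking the parsed tree rather than grepping raw lines is what lets the match follow the document's logical structure, e.g. a name split across a formatting span still comes back as one paragraph.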


OOoExtract can also apply boolean operators to the search.

$ ruby ooo_extract.rb --style="PoemAuthor" --text="R" file.sxw
Robert Frost
Robert Frost
$
$ ruby ooo_extract.rb --style="PoemAuthor" --or --text="R" file.sxw
Robert Frost
Ernest Hemingway
Robert Frost
Richard M. Stallman
$
$ ruby ooo_extract.rb --style="PoemAuthor" --xor --text="R" file.sxw
Ernest Hemingway
Richard M. Stallman
$
$ ruby ooo_extract.rb --style="PoemAuthor" --xor --text="R" \
--ignore-case file.sxw
Richard M. Stallman
$
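
The boolean options above can be read as: test each paragraph against the style and against the text pattern separately, then combine the two results. Here is a minimal sketch of that logic, assuming --text is treated as a regular expression and --ignore-case makes it case-insensitive. The extract helper, the Paragraph struct, and the "Quote" style given to Richard M. Stallman are hypothetical; the data is only modelled on the file.sxw output shown above.

```ruby
# Hypothetical data modelled on the file.sxw example; the "Quote" style
# for Richard M. Stallman is an assumption.
Paragraph = Struct.new(:style, :text)

PARAGRAPHS = [
  Paragraph.new("PoemAuthor", "Robert Frost"),
  Paragraph.new("PoemAuthor", "Ernest Hemingway"),
  Paragraph.new("PoemAuthor", "Robert Frost"),
  Paragraph.new("Quote", "Richard M. Stallman")
].freeze

# Hypothetical helper (not from ooo_extract.rb): test each paragraph's style
# and text separately, then combine the two results with AND, OR or XOR.
def extract(paragraphs, style:, text:, op: :and, ignore_case: false)
  pattern = Regexp.new(text, ignore_case ? Regexp::IGNORECASE : 0)
  paragraphs.select do |para|
    style_hit = para.style == style
    text_hit = pattern.match?(para.text)
    case op
    when :and then style_hit && text_hit
    when :or  then style_hit || text_hit
    when :xor then style_hit ^ text_hit
    else raise ArgumentError, "unknown operator: #{op.inspect}"
    end
  end.map(&:text)
end

p extract(PARAGRAPHS, style: "PoemAuthor", text: "R", op: :xor)
# prints ["Ernest Hemingway", "Richard M. Stallman"]
p extract(PARAGRAPHS, style: "PoemAuthor", text: "R", op: :xor, ignore_case: true)
# prints ["Richard M. Stallman"]
```

The last call shows why --ignore-case changes the XOR result: "Ernest" contains a lowercase "r", so Ernest Hemingway now matches both the style and the text and drops out.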


This program should be considered beta. OpenOffice.org files are very complex and I
have only tested it in very simple scenarios. I have not tested it on files with
tables or lists, and I have not tested it on anything but word processor documents
(Writer).

Let me know what you think.

Cheers,
--
Daniel Carrera | OpenPGP KeyID: 9AF77A88
PhD grad student. |
Mathematics Dept. | "To understand recursion, you must first
UMD, College Park | understand recursion".

 
Harry Ohlsen
10-29-2003
Hi Daniel,

Daniel Carrera wrote:

> I'd like to announce the immediate availability of "OOoExtract" :
>
> http://www.math.umd.edu/~dcarrera/op...o_extract.html


What's all that embedded "binary" at the end of the script?

Harry O.



 
Daniel Carrera
10-29-2003
On Wed, Oct 29, 2003 at 09:01:36AM +0900, Harry Ohlsen wrote:

> >I'd like to announce the immediate availability of "OOoExtract" :
> >
> >http://www.math.umd.edu/~dcarrera/op...o_extract.html

>
> What's all that embedded "binary" at the end of the script?


It's a tar archive. The script is a self-extracting archive. I made it with
Erik's Tar2RubyScript:

http://www.erikveen.dds.nl/tar2rubyscript/

This program takes a directory containing a Ruby program and any number of other files and packs
them all together into one single script. The idea is that this makes the program easier
to distribute, because it is a single, self-contained file.

If you download the tar.gz file under the "Download Source" link, extract it, and
run tar2rubyscript.rb on the extracted directory, you will get the "binary" file under the "Download
Program" link.
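
To make the "one self-contained script" idea concrete, here is a minimal sketch of a packer that turns a directory into a single Ruby script which unpacks itself into a temporary directory and runs an entry file. It is NOT how Tar2RubyScript works internally: Tar2RubyScript embeds a tar archive, while this sketch swaps in a Marshal-serialised hash of file contents encoded as Base64. The pack_directory name and the "init.rb" entry-point default are assumptions for this example.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical packer: serialise every file under +dir+ with Marshal and
# embed the result as a Base64 string literal in a generated Ruby script.
def pack_directory(dir, entry = "init.rb")
  files = {}
  Dir.glob("**/*", base: dir).sort.each do |rel|
    path = File.join(dir, rel)
    files[rel] = File.binread(path) if File.file?(path)
  end
  payload = [Marshal.dump(files)].pack("m0") # strict Base64, no newlines

  # The generated script restores the files into a temporary directory
  # and then loads the entry point from there.
  <<~RUBY
    require "tmpdir"
    require "fileutils"
    files = Marshal.load(#{payload.inspect}.unpack1("m0"))
    Dir.mktmpdir do |tmp|
      files.each do |rel, data|
        target = File.join(tmp, rel)
        FileUtils.mkdir_p(File.dirname(target))
        File.binwrite(target, data)
      end
      load File.join(tmp, #{entry.inspect})
    end
  RUBY
end

# Example: pack a one-file "application" and run the generated script.
Dir.mktmpdir do |app|
  File.write(File.join(app, "init.rb"), %(puts "hello from the packed app"\n))
  Dir.mktmpdir do |out|
    script = File.join(out, "app.rb")
    File.write(script, pack_directory(app))
    load script # prints "hello from the packed app"
  end
end
```

Embedding the payload as a string literal keeps the generated file valid Ruby from top to bottom; the trade-off is that Base64 makes the payload roughly a third larger than the raw files.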

Cheers,

 
Harry Ohlsen
10-29-2003
Daniel Carrera wrote:

>>What's all that embedded "binary" at the end of the script?

>
>
> It's a tar archive. The script is a self-extracting archive. I made it with
> Erik's Tar2RubyScript:
>
> http://www.erikveen.dds.nl/tar2rubyscript/
>
> This program takes a directory with a ruby program and any number of files and packs
> them all together into one single script. The idea being, that this makes it easier
> to distribute, because it is only one single, self-contained file.


Brilliant!

Cheers,

H.



 