
OLAP and pivot tables

 
 
George Sakkis
 
      05-26-2006
After a brief search, I didn't find any Python package related to OLAP
and pivot tables. Did I miss anything? To be more precise, I'm not so
interested in a full-blown OLAP server with an RDBMS backend, but
rather a Pythonic API for constructing datacubes in memory, slicing and
dicing them, drilling down or up dimensions and exposing them in some
suitable form to a presentation layer. I've hacked a first cut of a
pivot table implementation and an XHTML generator that produces
hierarchical HTML tables, but it's not particularly general or easily
extensible so far. Is there any interest at all in a Pythonic version
of something like JOLAP or XMLA?

George
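
For readers unfamiliar with the terms, here is a minimal sketch of what an in-memory pivot could look like. It is not George's implementation (which was not posted); the function name `pivot`, its parameters and the `sales` records are all made up for illustration. Drilling down is modelled as adding a dimension to the row key, and slicing as filtering the input rows before pivoting.

```python
from collections import defaultdict


def pivot(rows, row_dims, col_dims, measure, agg=sum):
    """Aggregate `measure` over dict records, keyed by (row key, column key).

    Hypothetical helper: `row_dims` and `col_dims` are lists of field names.
    """
    cells = defaultdict(list)
    for record in rows:
        row_key = tuple(record[d] for d in row_dims)
        col_key = tuple(record[d] for d in col_dims)
        cells[row_key, col_key].append(record[measure])
    return {key: agg(values) for key, values in cells.items()}


# Made-up fact table: one dict per record.
sales = [
    {"region": "EU", "year": 2005, "product": "A", "amount": 10},
    {"region": "EU", "year": 2005, "product": "B", "amount": 5},
    {"region": "EU", "year": 2006, "product": "A", "amount": 7},
    {"region": "US", "year": 2005, "product": "A", "amount": 3},
]

# Region by year totals.
by_region_year = pivot(sales, ["region"], ["year"], "amount")

# Drill down: split each region by product as well.
by_region_product_year = pivot(sales, ["region", "product"], ["year"], "amount")

# Slice: restrict to one region, then pivot what is left.
eu_rows = [r for r in sales if r["region"] == "EU"]
eu_by_product_year = pivot(eu_rows, ["product"], ["year"], "amount")
```

Keeping the result as a plain dict of `(row key, column key)` pairs leaves the presentation layer (an HTML table generator, say) free to decide how to nest the headers.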

 
 
 
 
 
Ben Stroud
 
      05-26-2006
George Sakkis wrote:

>After a brief search, I didn't find any python package related to OLAP
>and pivot tables. Did I miss anything ? To be more precise, I'm not so
>interested in a full-blown OLAP server with an RDBMS backend, but
>rather a pythonic API for constructing datacubes in memory, slicing and
>dicing them, drilling down or up dimensions and exposing them in some
>suitable form to a presentation layer. I've hacked a first cut of a
>pivot table implementation and an XHTML generator that produces
>hierarchical html tables but it's not particularly general or easily
>extensible so far. Is there any interest at all on a pythonic version
>of something like JOLAP or XMLA ?
>
>George
>
>
>

I'd be interested as well. I posted a similar question to the Ruby
mailing list a few months ago to no avail. Ideally, someone much more
talented than myself would create an open OLAP library in C that could be
interfaced with dynamic languages easily (I ordered some OLAP books and
started in on this, and decided I was in over my head for now). As far
as free software goes, all I've been able to find is the Java-based Mondrian.
Maybe it could serve as a reference implementation for someone.

Cheers,
Ben
 
 
 
 
 
Duncan Smith
 
      05-26-2006
George Sakkis wrote:
> After a brief search, I didn't find any python package related to OLAP
> and pivot tables. Did I miss anything ? To be more precise, I'm not so
> interested in a full-blown OLAP server with an RDBMS backend, but
> rather a pythonic API for constructing datacubes in memory, slicing and
> dicing them, drilling down or up dimensions and exposing them in some
> suitable form to a presentation layer. I've hacked a first cut of a
> pivot table implementation and an XHTML generator that produces
> hierarchical html tables but it's not particularly general or easily
> extensible so far. Is there any interest at all on a pythonic version
> of something like JOLAP or XMLA ?
>
> George
>


I have a few applications that require the generation of large numbers
of contingency tables from a higher-dimensional base table. The
approaches I've tried (Numeric arrays / dictionary-based sparse arrays /
various caching schemes / searches on subset lattices for previously
generated 'super'-tables that can be marginalised from etc.) still
represent major bottlenecks. So, I guess I would be interested.

Duncan
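
To make the approach Duncan describes concrete, here is a rough sketch of a dictionary-based sparse table that is marginalised from the smallest previously generated 'super'-table. It is an illustration only, not Duncan's code: `marginalise`, `MarginalCache` and the example dimensions are hypothetical names, and a linear scan over the cache stands in for a proper search on the subset lattice.

```python
from collections import Counter


def marginalise(table, dims, keep):
    """Sum a sparse contingency table over every dimension not in `keep`.

    `table` maps cell tuples (ordered as in `dims`) to counts.
    """
    positions = [dims.index(d) for d in keep]
    out = Counter()
    for cell, count in table.items():
        out[tuple(cell[i] for i in positions)] += count
    return out


class MarginalCache:
    """Caches marginal tables and derives new ones from the smallest usable one."""

    def __init__(self, dims, base):
        self.tables = {tuple(dims): base}
        self.derived_from = {}  # records which cached table each marginal came from

    def get(self, keep):
        keep = tuple(keep)
        if keep not in self.tables:
            # Any cached table whose dimensions include `keep` is a usable super-table.
            usable = [d for d in self.tables if set(keep) <= set(d)]
            source = min(usable, key=lambda d: len(self.tables[d]))
            self.tables[keep] = marginalise(self.tables[source], source, keep)
            self.derived_from[keep] = source
        return self.tables[keep]


# Made-up three-way base table of counts.
dims = ("sex", "age", "smoker")
base = Counter({
    ("F", "young", "yes"): 2,
    ("F", "young", "no"): 3,
    ("F", "old", "no"): 1,
    ("M", "old", "yes"): 4,
    ("M", "young", "no"): 5,
})

cache = MarginalCache(dims, base)
sex_by_age = cache.get(("sex", "age"))  # computed from the base table
sex_only = cache.get(("sex",))          # computed from the smaller sex-by-age table
```

Each marginalisation is still a full pass over its source table, so this only reduces the work per table; it does not remove the bottleneck when a very large number of tables is needed.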
 
 
Tim Churches
 
      05-26-2006
Ben Stroud wrote:
> George Sakkis wrote:
>
>> After a brief search, I didn't find any python package related to OLAP
>> and pivot tables. Did I miss anything ? To be more precise, I'm not so
>> interested in a full-blown OLAP server with an RDBMS backend, but
>> rather a pythonic API for constructing datacubes in memory, slicing and
>> dicing them, drilling down or up dimensions and exposing them in some
>> suitable form to a presentation layer. I've hacked a first cut of a
>> pivot table implementation and an XHTML generator that produces
>> hierarchical html tables but it's not particularly general or easily
>> extensible so far. Is there any interest at all on a pythonic version
>> of something like JOLAP or XMLA ?
>>

> I'd be interested as well. I posted a similar question to the ruby
> mailing list a few months ago to no avail. Ideally, someone much more
> talented than myself would create a open OLAP library in C that could be
> interfaced with dynamic languages easily (I ordered some OLAP books and
> started in on this, and decided I was in over my head for now). As far
> as free software, all I've been able to find is java-based Mondrian.
> Maybe it could serve as a reference implementation for someone.


The NetEpi Analysis project (see http://sourceforge.net/projects/netepi),
although not strictly an OLAP or datacube engine, might offer some of
the things you are looking for. It is intended for exploratory
epidemiological analysis of (potentially large) health-related datasets,
but should work with most types of data for which an OLAP engine would
be useful. Underneath there is a vertically-disaggregated,
ordinally-mapped, set-theoretic data selection and summarisation engine,
which is a pompous way of saying that it holds data column-wise in
memory-mapped Numpy (Numeric Python) arrays, and uses some fast
(custom-written) set functions on inverted indexes on the ordinal
positions of column values to select and summarise data (entirely at
run-time, cf most OLAP engines, which rely on a degree of
pre-summarisation along pre-chosen dimensions). It is all Python and
thus has a Python(ic) API, including an SQL-like WHERE clause parser
for data selection (OK, SQL is not Pythonic, but that's just for data
subsetting). It includes quite a few statistical functions and nice
graphics courtesy of R (http://www.r-project.org) (which is embedded via
RPy - http://rpy.sourceforge.net/). Full support for missing values and
weighted datasets is provided (but not full support for survey data with
complex sample designs - that's forthcoming). Currently it works well
with datasets in the 5-10 million row range, but the basic design lends
itself easily to parallelisation if you have bigger datasets, and
preliminary work indicates good speed improvements - something we want
to pursue given all these multi-core CPUs which are now available at
reasonable cost. Be warned that NetEpi Analysis is currently only of
beta quality and is a bit of a pig to install; at present it runs on
Linux/Unix/Mac OS X only. We hope to have a production-ready Version
1.0 out by the end of 2006, possibly with MS-Windows support as well.
However, the core data summarisation/subsetting engine is thought to be
sound (and there are some unit tests to attest to that).

Probably not quite what you were after but I thought it worth a mention.
Please post follow-ups, if any, to the NetEpi mailing list:
http://sourceforge.net/mail/?group_id=123700

Tim C
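
The idea of holding data column-wise and selecting rows through inverted indexes on ordinal positions can be sketched in a few lines. This is not NetEpi Analysis code and does not use its API: `ColumnStore`, `where` and `summarise` are hypothetical names, and plain Python lists and sets stand in for the memory-mapped arrays and custom set functions Tim mentions.

```python
from collections import defaultdict


class ColumnStore:
    """Toy column-wise store with an inverted index per column."""

    def __init__(self, columns):
        self.columns = columns  # column name -> list of values
        self.n_rows = len(next(iter(columns.values())))
        # For each column, map every distinct value to the set of row
        # positions (ordinals) at which it occurs.
        self.index = {}
        for name, values in columns.items():
            inverted = defaultdict(set)
            for position, value in enumerate(values):
                inverted[value].add(position)
            self.index[name] = dict(inverted)

    def where(self, **conditions):
        """Return the row positions matching every column=value condition."""
        rows = set(range(self.n_rows))
        for name, value in conditions.items():
            rows &= self.index[name].get(value, set())
        return rows

    def summarise(self, by, measure, rows=None):
        """Sum `measure` per distinct value of `by`, computed at call time."""
        totals = {}
        for value, positions in self.index[by].items():
            selected = positions if rows is None else positions & rows
            if selected:
                totals[value] = sum(self.columns[measure][p] for p in selected)
        return totals


# Made-up dataset held column by column.
store = ColumnStore({
    "sex": ["F", "M", "F", "M", "F"],
    "region": ["N", "N", "S", "S", "N"],
    "visits": [1, 2, 3, 4, 5],
})

females = store.where(sex="F")
visits_by_region_f = store.summarise("region", "visits", rows=females)
visits_by_sex = store.summarise("sex", "visits")
```

Nothing is pre-aggregated here: every summary is a set intersection followed by a sum, which is why the choice of grouping column can be left until run-time.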







 