Velocity Reviews - Python
Re: Fast forward-backward (write-read)
(http://www.velocityreviews.com/forums/t953795-re-fast-forward-backward-write-read.html)

David Hutto 10-23-2012 09:50 PM

Re: Fast forward-backward (write-read)
 
On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <vs@it.uu.se> wrote:
> I am working with some rather large data files (>100GB) that contain time
> series data. The data (t_k,y(t_k)), k = 0,1,...,N are stored in ASCII
> format. I perform various types of processing on these data (e.g. moving
> median, moving average, and Kalman-filter, Kalman-smoother) in a sequential
> manner and only a small number of these data need be stored in RAM when
> being processed. When performing Kalman-filtering (forward in time pass, k =
> 0,1,...,N) I need to save to an external file several variables (e.g. 11*32
> bytes) for each (t_k, y(t_k)). These are inputs to the Kalman-smoother
> (backward in time pass, k = N,N-1,...,0). Thus, I will need to input these
> variables saved to an external file from the forward pass, in reverse order
> --- from last written to first written.
>
> Finally, to my question --- What is a fast way to write these variables to
> an external file and then read them in backwards?


Don't forget to use timeit to average out variations in OS utilization.

I'd suggest two list comprehensions for now, until I've reviewed it some more:

forward = ["%i = %s" % (i,chr(i)) for i in range(33,126)]
backward = ["%i = %s" % (i,chr(i)) for i in range(126,32,-1)]

for var in forward:
print var

for var in backward:
print var

You could also use a dict: iterate through a single loop that assigns
front and back values to something like dict_one = {0: [0.100], 1: [1.99]},
then iterate again and pull the first or second of the dict's value lists
for the forward or backward pass.
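
A minimal sketch of that idea (the keys and values here are illustrative,
not the OP's data):

# Key 0 holds the values in forward order, key 1 in backward order.
dict_one = {0: [0.100, 0.200, 0.300],
            1: [0.300, 0.200, 0.100]}

for v in dict_one[0]:  # forward pass
    print v

for v in dict_one[1]:  # backward pass
    print v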


But there might be faster implementations, depending on how other
functions make use of certain lower-level calls.
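
As a sketch of one lower-level route (assuming the forward pass emits a
fixed number of values per step; the 11-doubles record layout below is an
assumption, not the OP's actual format), fixed-size binary records make
the backward read a pure seek-and-read loop:

import struct

RECORD = struct.Struct("11d")  # 11 doubles per time step (assumed layout)

def write_forward(path, records):
    # Forward pass: append one fixed-size binary record per step.
    f = open(path, "wb")
    try:
        for rec in records:
            f.write(RECORD.pack(*rec))
    finally:
        f.close()

def read_backward(path):
    # Backward pass: step the file position back one record at a time.
    f = open(path, "rb")
    try:
        f.seek(0, 2)               # seek to the end of the file
        pos = f.tell()
        while pos >= RECORD.size:
            pos -= RECORD.size
            f.seek(pos)
            yield RECORD.unpack(f.read(RECORD.size))
    finally:
        f.close()

Since every record is the same size, reading backwards never parses
anything; it only moves the file pointer, which should be far faster
than re-scanning ASCII text.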


--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com

Steven D'Aprano 10-23-2012 10:53 PM

Re: Fast forward-backward (write-read)
 
On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:

> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <vs@it.uu.se> wrote:
>> I am working with some rather large data files (>100GB)

[...]
>> Finally, to my question --- What is a fast way to write these variables
>> to an external file and then read them in backwards?

>
> Don't forget to use timeit to average out variations in OS utilization.


Given that the data files are larger than 100 gigabytes, the time
required to process each file is likely to be in hours, not microseconds.
That being the case, timeit is the wrong tool for the job: it is
optimized for timing tiny code snippets. You could use it, of course,
but the added inconvenience doesn't gain you any added accuracy.

Here's a neat context manager that makes timing long-running code simple:


http://code.activestate.com/recipes/577896
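
A minimal stand-in with the same shape as that recipe (a sketch only,
not the recipe itself; run_filter_pass is a hypothetical placeholder
for the real work):

import time

class Timer(object):
    # Time a long-running block of code with a context manager.
    def __enter__(self):
        self.start = time.time()
        return self
    def __exit__(self, *exc_info):
        self.elapsed = time.time() - self.start
        print "elapsed: %.1f seconds" % self.elapsed

# with Timer():
#     run_filter_pass()  # hours-long job; prints elapsed time at the end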



> I'd suggest two list comprehensions for now, until I've reviewed it some
> more:


I would be very surprised if the poster will be able to fit 100 gigabytes
of data into even a single list comprehension, let alone two.

This is a classic example of why the old external processing algorithms
of the 1960s and 70s will never be obsolete. No matter how much memory
you have, there will always be times when you want to process more data
than you can fit into memory.



--
Steven

Demian Brecht 10-23-2012 10:57 PM

Re: Fast forward-backward (write-read)
 
> This is a classic example of why the old external processing algorithms
> of the 1960s and 70s will never be obsolete. No matter how much memory
> you have, there will always be times when you want to process more data
> than you can fit into memory.



But surely nobody will *ever* need more than 640k…

Right?

Demian Brecht
@demianbrecht
http://demianbrecht.github.com





David Hutto 10-23-2012 11:34 PM

Re: Fast forward-backward (write-read)
 
On Tue, Oct 23, 2012 at 6:53 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:
>
>> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <vs@it.uu.se> wrote:
>>> I am working with some rather large data files (>100GB)

> [...]
>>> Finally, to my question --- What is a fast way to write these variables
>>> to an external file and then read them in backwards?

>>
>> Don't forget to use timeit to average out variations in OS utilization.

>
> Given that the data files are larger than 100 gigabytes, the time
> required to process each file is likely to be in hours, not microseconds.
> That being the case, timeit is the wrong tool for the job: it is
> optimized for timing tiny code snippets. You could use it, of course,
> but the added inconvenience doesn't gain you any added accuracy.


It depends on the end result. If the iterations themselves take about
the same time, you can time just a segment of them and scale the result
up; a full timed run might still be worth it if you have a second
computer available to run the optimization.

>
> Here's a neat context manager that makes timing long-running code simple:
>
>
> http://code.activestate.com/recipes/577896



I'll test this out for its big-O behavior later. For the OP:

http://en.wikipedia.org/wiki/Big_O_notation





>
>
>
>> I'd suggest two list comprehensions for now, until I've reviewed it some
>> more:

>
> I would be very surprised if the poster will be able to fit 100 gigabytes
> of data into even a single list comprehension, let alone two.

Again, these can be scaled depending on the operations of the function
in question and the average running time of the aforementioned function(s).

>
> This is a classic example of why the old external processing algorithms
> of the 1960s and 70s will never be obsolete. No matter how much memory
> you have, there will always be times when you want to process more data
> than you can fit into memory


This is a common misconception. You can engineer a device that
accommodates this if it's a direct experimental necessity.
>


--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com

Virgil Stokes 10-24-2012 07:17 AM

Re: Fast forward-backward (write-read)
 
On 24-Oct-2012 00:57, Demian Brecht wrote:
>> This is a classic example of why the old external processing algorithms
>> of the 1960s and 70s will never be obsolete. No matter how much memory
>> you have, there will always be times when you want to process more data
>> than you can fit into memory.

>
> But surely nobody will *ever* need more than 640k…
>
> Right?
>
> Demian Brecht
> @demianbrecht
> http://demianbrecht.github.com
>
>
>
>

Yes, I can still remember such quotes --- thanks for jogging my memory, Demian :-)

Virgil Stokes 10-24-2012 07:19 AM

Re: Fast forward-backward (write-read)
 
On 24-Oct-2012 00:53, Steven D'Aprano wrote:
> On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:
>
>> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <vs@it.uu.se> wrote:
>>> I am working with some rather large data files (>100GB)

> [...]
>>> Finally, to my question --- What is a fast way to write these variables
>>> to an external file and then read them in backwards?

>> Don't forget to use timeit to average out variations in OS utilization.

> Given that the data files are larger than 100 gigabytes, the time
> required to process each file is likely to be in hours, not microseconds.
> That being the case, timeit is the wrong tool for the job: it is
> optimized for timing tiny code snippets. You could use it, of course,
> but the added inconvenience doesn't gain you any added accuracy.
>
> Here's a neat context manager that makes timing long-running code simple:
>
>
> http://code.activestate.com/recipes/577896

Thanks for this link.
>
>
>
>> I'd suggest two list comprehensions for now, until I've reviewed it some
>> more:

> I would be very surprised if the poster will be able to fit 100 gigabytes
> of data into even a single list comprehension, let alone two.

You are correct and I have been looking at working with blocks that are sized to
the RAM available for processing.
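
A minimal sketch of that block-wise idea, reading the file backwards one
RAM-sized block at a time (block_size is a tunable assumption, and records
that straddle block boundaries still need to be stitched together):

def read_blocks_reversed(path, block_size=64 * 1024 * 1024):
    # Yield the file's contents block by block, last block first.
    f = open(path, "rb")
    try:
        f.seek(0, 2)             # jump to the end of the file
        pos = f.tell()
        while pos > 0:
            size = min(block_size, pos)
            pos -= size
            f.seek(pos)
            yield f.read(size)
    finally:
        f.close()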
>
> This is a classic example of why the old external processing algorithms
> of the 1960s and 70s will never be obsolete. No matter how much memory
> you have, there will always be times when you want to process more data
> than you can fit into memory.
>
>
>

Thanks for your insights :-)

David Hutto 10-24-2012 07:26 AM

Re: Fast forward-backward (write-read)
 
On Wed, Oct 24, 2012 at 3:17 AM, Virgil Stokes <vs@it.uu.se> wrote:
> On 24-Oct-2012 00:57, Demian Brecht wrote:
>>>
>>> This is a classic example of why the old external processing algorithms
>>> of the 1960s and 70s will never be obsolete. No matter how much memory
>>> you have, there will always be times when you want to process more data
>>> than you can fit into memory.

>>
>>
>> But surely nobody will *ever* need more than 640k…
>>
>> Right?
>>
>> Demian Brecht
>> @demianbrecht
>> http://demianbrecht.github.com
>>
>>
>>
>>

> Yes, I can still remember such quotes --- thanks for jogging my memory,
> Demian :-)



That's only true on equipment designed by others; otherwise, you could
engineer the hardware yourself to perform just certain functions for you
(RISC), and pass the results back to the CISC side (from a PCB design).


--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com

Grant Edwards 10-24-2012 01:56 PM

Re: Fast forward-backward (write-read)
 
On 2012-10-23, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> I would be very surprised if the poster will be able to fit 100
> gigabytes of data into even a single list comprehension, let alone
> two.
>
> This is a classic example of why the old external processing
> algorithms of the 1960s and 70s will never be obsolete. No matter how
> much memory you have, there will always be times when you want to
> process more data than you can fit into memory.


Too true. One of the projects I did in grad school about 20 years ago
was a plugin for some fancy data visualization software (I think it
was DX: http://www.research.ibm.com/dx/). My plugin would subsample
"on the fly" a selected section of a huge 2D array of data in a file.
IBM and SGI had all sorts of widgets you could use to sample,
transform and visualize data, but they all assumed that the input data
would fit into virtual memory.

--
Grant Edwards   grant.b.edwards at gmail.com   Yow! I Know A Joke!!

