Velocity Reviews

castironpi 09-09-2008 09:59 PM

dynamic allocation file buffer
 
I will try my idea again. I want to talk to people about a module I
want to write and I will take the time to explain it. I think it's a
"cool idea" that a lot of people, forgiving the slang, could benefit
from. What are its flaws?

A user has a file he is using either 1/ to persist binary data after
the run of a single program (persistence) or 2/ share binary data
between concurrently running programs (IPC / shared memory). The data
are records of variable types and lengths that can change over time.
He wants to change a record that's already present in the file. Here
are two examples.

Use Case 1: Hierarchical ElementTree-style data

A user has an XML file like the one shown here.

<a>
  <b>
    <c>Foo</c>
  </b>
  ...

He wants to change "Foo" to "Foobar".

<a>
  <b>
    <c>Foobar</c>
  </b>
  ...

The change he wants to make is at the beginning of a 4GB file, and
recopying the remainder is an unacceptable resource drain.

Use Case 2: Web session logger

A tutor application has written a plugin to a webbrowser that records
the order of a user's mouse and keyboard activity during a browsing
session, and makes them concurrently available to other applications
in a suite, which are written in varying languages. The user takes
some action, such as surfing to a site or clicking on a link. The
browser plugin records that sequence into shared memory, where it is
marked as acknowledged by the listener programs, and recycled back
into an unused block. URLs, user inputs, and link text can be of any
length, so truncating them to fit a fixed length is not an option.

Existing Solutions

- Shelve - A Python Standard Library shelf object can store a random
access dictionary mapping strings to pickled objects. It does not
provide for hierarchical data stores, and objects must be unpickled
before they can be examined.
- Relational Database - Separate tables of nodes, attributes, and
text, and the relations between them, are slow and unwieldy for
reproducing the contents of a dynamic structure. The VARCHAR data type
still carries a maximum size, so it is no more flexible than
fixed-length records.
- POSH - Python Object Sharing - A module currently in its alpha stage
promises to make it possible to store Python objects directly in
shared memory. In its current form, its only entry point is 'fork'
and does not offer persistence, only sharing. See:
http://poshmodule.sourceforge.net/

Dynamic Allocation

The traditional solution, dynamic memory allocation, is to maintain a
metadata list of "free blocks" that are available to write to. See:
http://en.wikipedia.org/wiki/Dynamic_memory_allocation
http://en.wikipedia.org/wiki/Malloc
http://en.wikipedia.org/wiki/Mmap
http://en.wikipedia.org/wiki/Memory_leak
The catch, and the crux of the proposal, is that the metadata must be
stored in shared memory along with the data themselves. Assuming they
are, a program can acquire the offset of an unused block of a
sufficient size for its data, then write it to the file at that
offset. The metadata can maintain the offset of one root member, to
serve as a 'table of contents' or header for the remainder of the
file. It can be grown and reassigned as needed.

An acquaintance writes: It could be quite useful for highly concurrent
systems: the overhead involved with interprocess communication can be
overwhelming, and something more flexible than normal object
persistence to disk might be worth having.

Python Applicability

The usual problems with data persistence and sharing apply. The
format of the external data is only established conventionally, and
conversions between Python objects and raw memory bytes take the usual
overhead. 'struct.Struct', 'ctypes.Structure', and 'pickle.Pickler'
currently offer this functionality, and the buffer offset obtained
from 'alloc' can be used with all three.

Ex 1.
    s= struct.Struct( 'III' )
    x= alloc( s.size )
    s.pack_into( mem, x, 2, 4, 6 )
Struct in its current form does not permit random access into
structure contents; a user must read or write the entire converted
structure in order to update one field. Alternative:
    s= struct.Struct( 'I' )
    x1, x2, x3= alloc( s.size ), alloc( s.size ), alloc( s.size )
    s.pack_into( mem, x1, 2 )
    s.pack_into( mem, x2, 4 )
    s.pack_into( mem, x3, 6 )
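
A possible middle ground, offered as a sketch rather than as part of the proposal: when the layout is fixed and unpadded, a single field can be overwritten in place by packing just that field at its computed offset, so one allocation still suffices. Here 'mem' is a plain bytearray and 'x' a made-up offset, both standing in for the shared buffer and the value returned by 'alloc'.

```python
import struct

record = struct.Struct('<III')      # three unsigned 32-bit fields, no padding
field = struct.Struct('<I')         # a single field of the same type

mem = bytearray(64)                 # stand-in for the shared buffer
x = 16                              # stand-in for an offset returned by alloc

record.pack_into(mem, x, 2, 4, 6)

# Overwrite only the second field: it starts one field width past x.
field.pack_into(mem, x + 1 * field.size, 5)

print(record.unpack_from(mem, x))   # (2, 5, 6)
```

The cost is that the caller has to know the field offsets, which is the bookkeeping 'ctypes.Structure' does automatically.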

Ex 2.
    class Items( ctypes.Structure ):
        _fields_= [
            ( 'x1', ctypes.c_float ),
            ( 'y1', ctypes.c_float ) ]
    x= alloc( ctypes.sizeof( Items ) )
    c= ctypes.cast( mem+ x, ctypes.POINTER( Items ) ).contents
    c.x1, c.y1= 2, 4
The 'mem' variable is obtained from a call to PyObject_AsWriteBuffer.
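
For readers who want to try this without the C API, here is a sketch of the same idea using 'from_buffer', which ctypes provides for overlaying a structure on a writable buffer at a given offset. The bytearray and the offset 8 are stand-ins for the shared buffer and for the value 'alloc' would return; a writable mmap is expected to work the same way.

```python
import ctypes

class Items(ctypes.Structure):
    _fields_ = [
        ('x1', ctypes.c_float),
        ('y1', ctypes.c_float),
    ]

mem = bytearray(64)                      # stand-in for the shared buffer
x = 8                                    # stand-in for an offset returned by alloc

c = Items.from_buffer(mem, x)            # a view onto mem, not a copy
c.x1, c.y1 = 2, 4                        # each assignment writes one field in place

# Read the bytes back independently to confirm they landed in mem.
copy = Items.from_buffer_copy(mem, x)
print(copy.x1, copy.y1)                  # 2.0 4.0
```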

Ex 3.
    s= pickle.dumps( ( 2, 4, 6 ) )
    x= alloc( len( s ) )
    mem[ x: x+ len( s ) ]= s
'dumps' is still slow, and it does not permit random access into the
contents either.
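
To read the object back, a reader needs to know where the pickle ends. One way, shown here only as a sketch, is to store a length word in front of it. The one-word prefix is an assumption, not part of the proposal, and 'mem' and 'x' are again stand-ins for the shared buffer and an offset from 'alloc' (which would need to reserve the prefix as well).

```python
import pickle
import struct

LENGTH = struct.Struct('<I')         # assumed one-word length prefix

mem = bytearray(128)                 # stand-in for the shared buffer
x = 16                               # stand-in for an offset returned by alloc

payload = pickle.dumps((2, 4, 6))
LENGTH.pack_into(mem, x, len(payload))
start = x + LENGTH.size
mem[start:start + len(payload)] = payload

# Reader side: recover the length, then unpickle exactly that many bytes.
(n,) = LENGTH.unpack_from(mem, x)
value = pickle.loads(bytes(mem[start:start + n]))
print(value)                         # (2, 4, 6)
```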

Use Cases Revisited

Use Case 1: Hierarchical ElementTree-style data
Solution: Dynamically allocate the tree and its elements.

Node: tag: a
Node: tag: b
Node: tag: c
Node: text: Foo

The user wants to change "Foo" to "Foobar".

Node: tag: a
Node: tag: b
Node: tag: c
Node: text: Foobar

Deallocate 'Node: text: Foo', allocate 'Node: text: Foobar', and store
the new offset into 'Node: tag: c'. Total writes: the 6 bytes of 'Foobar',
a one-word offset, and a metadata update of approximately 5 to 10 words.
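
The update can be sketched as follows. The node layout (one word holding the offset of a length-prefixed text block) is invented for the example, and 'alloc' here is a throwaway bump allocator that never reuses space, so the old 'Foo' block is simply abandoned where a real allocator would free it.

```python
import struct

WORD = struct.Struct('<I')

mem = bytearray(256)               # stand-in for the mapped file
next_free = 0                      # bump pointer for the stand-in allocator

def alloc(size):
    """Hypothetical stand-in for the proposed alloc: hand out the next unused bytes."""
    global next_free
    off = next_free
    if off + size > len(mem):
        raise MemoryError('buffer exhausted')
    next_free = off + size
    return off

def write_text(text):
    """Store a length-prefixed byte string and return its offset."""
    data = text.encode('utf-8')
    off = alloc(WORD.size + len(data))
    WORD.pack_into(mem, off, len(data))
    mem[off + WORD.size:off + WORD.size + len(data)] = data
    return off

def read_text(off):
    (n,) = WORD.unpack_from(mem, off)
    return bytes(mem[off + WORD.size:off + WORD.size + n]).decode('utf-8')

# 'Node: tag: c' is modelled as one word holding the offset of its text block.
node_c = alloc(WORD.size)
WORD.pack_into(mem, node_c, write_text('Foo'))

# The edit: allocate the new text, then repoint the node. Nothing else moves.
WORD.pack_into(mem, node_c, write_text('Foobar'))

print(read_text(WORD.unpack_from(mem, node_c)[0]))   # Foobar
```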

Use Case 2: Web session logger
Dynamically allocate a linked list of data points.

Data: 'friendster.com'
Data: 'My Account'

Allocate one block for each string, adding it to a linked list. As
listeners acknowledge each data point, remove it from the linked
list. Keep the head node in the 'root offset' metadata field.
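
A sketch of that list, under assumed layouts: the root-offset field sits at offset 0, and each node is a 'next' word, a length word, then the bytes. The bump allocator is a stand-in that never recycles blocks; with the proposed module, 'acknowledge' would also free the node it unlinks.

```python
import struct

WORD = struct.Struct('<I')
NODE = struct.Struct('<II')        # assumed node header: [next offset][payload length]
ROOT = 0                           # the root-offset field lives at offset 0
NULL = 0                           # offset 0 is the root field, so 0 can mean "none"

mem = bytearray(256)               # stand-in for shared memory
next_free = WORD.size              # bump pointer, starting just past the root field

def alloc(size):
    """Hypothetical stand-in allocator: hand out the next unused bytes."""
    global next_free
    off = next_free
    if off + size > len(mem):
        raise MemoryError('buffer exhausted')
    next_free = off + size
    return off

def append(data):
    """Allocate a node for `data` and link it at the end of the list."""
    node = alloc(NODE.size + len(data))
    NODE.pack_into(mem, node, NULL, len(data))
    mem[node + NODE.size:node + NODE.size + len(data)] = data
    link = ROOT                                   # location of the link being followed
    while WORD.unpack_from(mem, link)[0] != NULL:
        link = WORD.unpack_from(mem, link)[0]     # a node's first word is its link
    WORD.pack_into(mem, link, node)

def acknowledge():
    """Unlink the head node and return its bytes, or None if the list is empty."""
    head = WORD.unpack_from(mem, ROOT)[0]
    if head == NULL:
        return None
    nxt, n = NODE.unpack_from(mem, head)
    data = bytes(mem[head + NODE.size:head + NODE.size + n])
    WORD.pack_into(mem, ROOT, nxt)
    return data

append(b'friendster.com')
append(b'My Account')
first = acknowledge()
print(first)                       # b'friendster.com'
```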

Restrictions

It is not possible for persistent memory to refer to live memory. Any
objects it refers to must also be located in the file, and references
to them must be stored as offsets into the file, never as mapped
addresses. However, live references to persistent memory are eminently
possible.
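
The reason for storing offsets can be shown with two successive mappings of the same file. This is a sketch with a made-up file name, record offset, and root field at offset 0; the second mapping stands in for a later run of the program.

```python
import mmap
import os
import struct
import tempfile

WORD = struct.Struct('<I')
SIZE = 4096

path = os.path.join(tempfile.mkdtemp(), 'store.bin')
with open(path, 'wb') as f:
    f.write(bytes(SIZE))               # the file must have its full size before mapping

# First "run": write a record and save its offset in the root field at offset 0.
with open(path, 'r+b') as f, mmap.mmap(f.fileno(), SIZE) as mem:
    record = 64
    mem[record:record + 6] = b'Foobar'
    WORD.pack_into(mem, 0, record)
    mem.flush()

# Second "run": a fresh mapping may land at a different address,
# but the stored offset still locates the record.
with open(path, 'r+b') as f, mmap.mmap(f.fileno(), SIZE) as mem:
    (record,) = WORD.unpack_from(mem, 0)
    text = mem[record:record + 6]

print(text)                            # b'Foobar'
```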

Current Status

A pure Python alloc-free implementation based on the GNU PAVL tree
library is on Google Code. It is only in proof-of-concept form and
not commented, but does contain a first-pass test suite. See:
http://code.google.com/p/pymmapstruc...wse/#svn/trunk
The ctypes solution for access is advised.

Larry Bates 09-09-2008 10:44 PM

Re: dynamic allocation file buffer
 
castironpi wrote:
> I will try my idea again. I want to talk to people about a module I
> want to write and I will take the time to explain it. I think it's a
> "cool idea" that a lot of people, forgiving the slang, could benefit
> from. What are its flaws?


You should review Zope's ZODB and/or memcached before putting in too much effort.

-Larry

Steven D'Aprano 09-09-2008 10:58 PM

Re: dynamic allocation file buffer
 
On Tue, 09 Sep 2008 14:59:19 -0700, castironpi wrote:

> I will try my idea again. I want to talk to people about a module I
> want to write and I will take the time to explain it. I think it's a
> "cool idea" that a lot of people, forgiving the slang, could benefit
> from. What are its flaws?


[snip long description with not-very-credible use-cases]

You've created a solution to a problem which (probably) only affects a
very small number of people, at least judging by your use-cases. Who has
a 4GB XML file, and how much crack did they smoke?

Castironpi, what do *you* use this proof-of-concept module for? Don't
bother telling us what you think *we* should use it for. Tell us what you're
using it for, or at least what somebody else is using it for. If this is
just a module that you think will be cool, I don't like your chances of
people caring. There is no shortage of "cool" software that isn't useful
for anything, and unlike eye-candy, nobody is going to use your module
just because they like the algorithm.

If you don't have an existing application for the software, then explain
what it does (not how) and give some idea of the performance ("it's alpha
and written in Python and really slow, but I will re-write it in C and
expect it to make a billion random accesses in a 10GB file per
millisecond", or whatever). You might be lucky and have somebody say
"Hey, that's just the tool I need to solve my problem!".


--
Steven

castironpi 09-10-2008 12:28 AM

Re: dynamic allocation file buffer
 
On Sep 9, 5:44 pm, Larry Bates <larry.ba...@vitalEsafe.com> wrote:
> castironpi wrote:
> > I will try my idea again. I want to talk to people about a module I
> > want to write and I will take the time to explain it. I think it's a
> > "cool idea" that a lot of people, forgiving the slang, could benefit
> > from. What are its flaws?
>
> You should review Zope's ZODB and/or memcached before putting in too much effort.
>
> -Larry


Larry,

I'd love to say they were exactly what I was looking for. They're
not. I confess, I stopped reading ZODB when I got to the "uses
pickles" part, and 'memcached' when I got to the awkward and unwieldy
"SELECT FROM" part. I'm aware of both of those, and my solution does
something neither of them does.

castironpi 09-10-2008 12:48 AM

Re: dynamic allocation file buffer
 
On Sep 9, 5:58 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> On Tue, 09 Sep 2008 14:59:19 -0700, castironpi wrote:
> > I will try my idea again. I want to talk to people about a module I
> > want to write and I will take the time to explain it. I think it's a
> > "cool idea" that a lot of people, forgiving the slang, could benefit
> > from. What are its flaws?
>
> [snip long description with not-very-credible use-cases]


Steven,

> You've created a solution to a problem which (probably) only affects a
> very small number of people, at least judging by your use-cases. Who has
> a 4GB XML file, and how much crack did they smoke?


I judge from the existence of 'shelve' and 'pickle' modules, and
relational database packages, that the problem I am addressing is not
rare. It could be the millionaire investor across the street, the
venture capitalist down the hall, or the guy with a huge CD catalog.

> Castironpi, what do *you* use this proof-of-concept module for?


Honestly, nothing yet. I just wrote it. My user community and
customer base are very small. Originally, I wanted to store
variable-length strings in a file, where shelves and databases were overkill.
I created it for its beauty, sorry to disappoint.

> Don't
> bother telling us what you think *we* should use it for. Tell us what you're
> using it for, or at least what somebody else is using it for. If this is
> just a module that you think will be cool, I don't like your chances of
> people caring. There is no shortage of "cool" software that isn't useful
> for anything, and unlike eye-candy, nobody is going to use your module
> just because they like the algorithm.


Unfortunately, nobody is going to care about most of the uses I have
for it 'til I have a job. I'm goofing around with a laptop,
remembering when my databases professor kept dropping the ball on
VARCHARs. If you want a sound bite, think, "imagine programming
without 'new' and 'malloc'."

> If you don't have an existing application for the software, then explain
> what it does (not how) and give some idea of the performance ("it's alpha
> and written in Python and really slow, but I will re-write it in C and
> expect it to make a billion random accesses in a 10GB file per
> millisecond", or whatever). You might be lucky and have somebody say
> "Hey, that's just the tool I need to solve my problem!".


I wrote a Rope implementation just to test drive it. It exceeded the
native immutable string type at 2 megs. It used 'struct' instead of
'ctypes', so that number could conceivably come down. I am intending
to leave it in pure Python, so there.

> --
> Steven


Pleasure chatting as always sir.

George Sakkis 09-10-2008 03:03 AM

Re: dynamic allocation file buffer
 
On Sep 9, 5:59 pm, castironpi <castiro...@gmail.com> wrote:

> I will try my idea again. I want to talk to people about a
> module I want to write and I will take the time to explain it.
> I think it's a "cool idea" that a lot of people, forgiving the
> slang, could benefit from.
>
> (snipped)
>
> A pure Python alloc-free implementation based on the GNU PAVL
> tree library is on Google Code. It is only in proof-of-concept
> form and not commented, but does contain a first-pass test
> suite. See:
> http://code.google.com/p/pymmapstruc...wse/#svn/trunk


So at best (i.e. if it actually makes any sense; I didn't read it),
this is an ANNouncement of a pre-alpha piece of code. ANN posts rarely
attract replies, even when they are about production/stable software.
Thankfully, most people don't expect (let alone "require") readers to
share their interest or enthusiasm by replying to the ANN. Given your
past semi-coherent and incoherent posts, expecting people to jump on
such a thread is a rather tall order.

George

Fredrik Lundh 09-10-2008 07:26 AM

Re: dynamic allocation file buffer
 
Steven D'Aprano wrote:

> You've created a solution to a problem which (probably) only affects a
> very small number of people, at least judging by your use-cases. Who has
> a 4GB XML file


Getting 4GB XML files from, say, logging processes or databases that can
render their output as XML is not that uncommon. They're usually
record-oriented, and are intended to be processed as streams. And given
the right tools, doing that is no harder than doing the same to a 4GB
text file.
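
As an illustration of stream processing, here is a sketch using the standard library's 'ElementTree.iterparse'. The choice of tool is an assumption, since the post does not name one, and the element names and the in-memory sample are invented. Each record is handled as soon as its end tag is seen and then cleared, so the whole document is never held at once.

```python
import io
import xml.etree.ElementTree as ET

def count_records(source, tag):
    """Stream through an XML document, counting <tag> elements as they complete."""
    count = 0
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == tag:
            count += 1
            elem.clear()               # drop the record's text and children once handled
    return count

# A tiny in-memory stand-in for a large record-oriented file.
sample = io.BytesIO(b'<log><rec>a</rec><rec>b</rec><rec>c</rec></log>')
total = count_records(sample, 'rec')
print(total)                           # 3
```

Cleared elements stay attached to the root as empty shells, so for very large inputs some further pruning may be needed.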

</F>


Steven D'Aprano 09-10-2008 10:24 AM

Re: dynamic allocation file buffer
 
On Wed, 10 Sep 2008 09:26:20 +0200, Fredrik Lundh wrote:

> Steven D'Aprano wrote:
>
>> You've created a solution to a problem which (probably) only affects a
>> very small number of people, at least judging by your use-cases. Who
>> has a 4GB XML file

>
> Getting 4GB XML files from, say, logging processes or databases that can
> render their output as XML is not that uncommon. They're usually
> record-oriented, and are intended to be processed as streams. And given
> the right tools, doing that is no harder than doing the same to a 4GB
> text file.



Fair enough, that's a good point.

But would you expect random access to a 4GB XML file? If I've understood
what Castironpi is trying for, his primary use case was for people
wanting exactly that.


--
Steven

Aaron "Castironpi" Brady 09-10-2008 06:59 PM

Re: dynamic allocation file buffer
 
On Sep 10, 5:24 am, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> wrote:
> On Wed, 10 Sep 2008 09:26:20 +0200, Fredrik Lundh wrote:
> > Steven D'Aprano wrote:
>
> >> You've created a solution to a problem which (probably) only affects a
> >> very small number of people, at least judging by your use-cases. Who
> >> has a 4GB XML file
>
> > Getting 4GB XML files from, say, logging processes or databases that can
> > render their output as XML is not that uncommon. They're usually
> > record-oriented, and are intended to be processed as streams. And given
> > the right tools, doing that is no harder than doing the same to a 4GB
> > text file.
>
> Fair enough, that's a good point.
>
> But would you expect random access to a 4GB XML file? If I've understood
> what Castironpi is trying for, his primary use case was for people
> wanting exactly that.
>
> --
> Steven


Steven,

Are you claiming that sequential storage is sufficient for small
amounts of data, and relational databases are necessary for large amounts?
It's possible that there is only the fringe exception, in which case
'alloc/free' aren't useful in the majority of cases, and will never
win customers away from the more mature competition.

Regardless, it is an elegant solution to the problem of storing
variable-length strings, with hardly any practical value. Perfect for
grad school.

Aaron "Castironpi" Brady 09-10-2008 07:13 PM

Re: dynamic allocation file buffer
 
On Sep 9, 10:03 pm, George Sakkis <george.sak...@gmail.com> wrote:
> On Sep 9, 5:59 pm, castironpi <castiro...@gmail.com> wrote:
>
> > I will try my idea again. I want to talk to people about a
> > module I want to write and I will take the time to explain it.
> > I think it's a "cool idea" that a lot of people, forgiving the
> > slang, could benefit from.
>
> > (snipped)
>
> > A pure Python alloc-free implementation based on the GNU PAVL
> > tree library is on Google Code. It is only in proof-of-concept
> > form and not commented, but does contain a first-pass test
> > suite. See:
> > http://code.google.com/p/pymmapstruc...wse/#svn/trunk
>
> So at best (i.e. if it actually makes any sense; I didn't read it),
> this is an ANNouncement of a pre-alpha piece of code. ANN posts rarely
> attract replies, even when they are about production/stable software.
> Thankfully, most people don't expect (let alone "require") readers to
> share their interest or enthusiasm by replying to the ANN. Given your
> past semi-coherent and incoherent posts, expecting people to jump on
> such a thread is a rather tall order.
>
> George


No, I'm just excited about it and want to share. I definitely think
that discouragement of new ideas, successes, and personal expressions
is a weakness that society has, and something that c-l-py is missing
as well. I want to encourage them in other people so I will do it
myself.

As for the practicality of this module, I am definitely receiving
skepticism from the group. Further, its feasibility is in question.

For instance, no one has pointed out, and I only came across it last
night, that IPC synchronization is non-trivial and possibly
platform-dependent. Of course it's a prerequisite for the advance of any
IPC mod, so I'm glad I did not release an ANN.


