Negative array indicies and slice()

 
 
Chris Kaynor
10-30-2012
On Mon, Oct 29, 2012 at 11:00 AM, Andrew Robinson
<(E-Mail Removed)> wrote:
>
> Let's look at the source code rather than the web notes -- the source must
> be the true answer anyhow.
>
> I downloaded the source code for python 3.3.0, as the tbz;
> In the directory "Python-3.3.0/Python", look at Python-ast.c, line 2089 &
> ff.
>
> Clearly a slice is malloced for a slice_ty type.
> It has four elements: kind, lower, upper, and step.
>
> So, tracing it back to the struct definition...
>
> "Include/Python-ast.h" has "typedef struct _slice *slice_ty;"
>
> And, here's the answer!:
>
> enum _slice_kind {Slice_kind=1, ExtSlice_kind=2, Index_kind=3};
> struct _slice {
>     enum _slice_kind kind;
>     union {
>         struct {
>             expr_ty lower;
>             expr_ty upper;
>             expr_ty step;
>         } Slice;
>
>         struct {
>             asdl_seq *dims;
>         } ExtSlice;
>
>         struct {
>             expr_ty value;
>         } Index;
>
>     } v;
> };
>
>
> So, slice() does indeed have arbitrary python types included in it; contrary
> to what I read elsewhere.
> expr_ty is a pointer to an arbitrary expression, so the actual structure is
> 4 pointers, at 32 bits each = 16 bytes.
> The size of the structure itself, given in an earlier post, is 20 bytes --
> which means one more pointer is involved, perhaps the one pointing to the
> slice structure itself.
>
> Hmm...!
>
> An empty tuple gives sys.getsizeof( () ) = 24.
>
> But, I would expect a tuple to be merely a list of object pointers; hence I
> would expect 4 bytes for len(), and then a head pointer 4 bytes, and then a
> pointer for each object.
> 3 objects gives 12 bytes, + 8 = 16 bytes.
>
> Then we need one more pointer so Python knows where the struct is...
> So a Tuple of 3 objects ought to fit nicely into 20 bytes; the same size as
> slice() --
>
> but it's 24, even when empty...
> And 36 when initialized...
> What are the extra 16 bytes for?


Every Python object carries two pieces of data, both pointer-sized (one is a
pointer, one an integer the size of a pointer): a pointer to the object's
type, and the object's reference count.

A tuple does not actually need a head pointer: its items live at a fixed
offset from the start of the tuple object itself. So a tuple has only a ref
count, a type pointer, an item count, and the pointers to its contents.

A slice has the same type pointer and reference count, followed by three
pointers to the start, stop, and step objects. That makes a slice object the
same size as a two-item tuple: the tuple needs an item count, while a slice's
count is fixed at 3 (though some of the three may simply be set to None).

NOTE: The above is taken from reading the source code for Python 2.6.
For some odd reason, I am getting that an empty tuple consists of 6
pointer-sized objects (48 bytes on x64), rather than the expected 3
pointer-sized (24 bytes on x64). Slices are showing up as the expected
5 pointer-sized (40 bytes on x64), and tuples grow at the expected 1
pointer (8 bytes on x64) per item. I imagine I am missing something,
but cannot figure out what that would be.
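
For anyone who wants to reproduce the numbers, a quick check looks like this
(a sketch only; the exact figures depend on the Python version, build
options, and pointer width):

import sys

# Observed sizes vary by interpreter version and platform; these calls
# simply report what this particular build allocates.
print(sys.getsizeof(()))              # empty tuple
print(sys.getsizeof((1, 2, 3)))       # three-item tuple
print(sys.getsizeof(slice(1, 2, 3)))  # slice object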

>
> All I see is:
> typedef struct { object** whatever } PyTupleObject;
>

 
Michael Torrie
10-30-2012
On 10/29/2012 01:34 PM, Andrew Robinson wrote:
> No, I don't think it big and complicated. I do think it has timing
> implications which are undesirable because of how *much* slices are used.
> In an embedded target -- I have to optimize; and I will have to reject
> certain parts of Python to make it fit and run fast enough to be useful.


Since you can't port the full Python system to your embedded machine
anyway, why not just port a subset of Python and modify it to suit your
needs right there in the C code? It would be a fork, yes, but any port
to this target will be a fork anyway. No matter how you cut it, it
won't be easy at all, and it won't be easy to maintain. You'll basically
be writing your own implementation of Python (that's what
python-on-a-chip is, and that's why it's closer to Python 2.x than
Python 3). That's totally fine, though. I get the impression you think
you will be able to port CPython as-is to your target. Without a libc,
an MMU on the CPU, and a kernel, it's not going to just compile and run.

Anyway, the only solution, given your constraints, is to implement your
own Python interpreter to handle a subset of Python, and modify it to
suit your tastes. The slicing behavior changes you want have no place in
the normal CPython implementation, for a lot of reasons. The main one is
that it is already possible to implement what you are talking about in
your own Python class, which is a fine solution for a normal computer
with memory and CPU power available.
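
For example, here is a minimal sketch of that subclass approach (not code
from this thread; the class name and the wraparound rule are only
illustrative):

class WrapList(list):
    """A list whose slices may wrap from a negative start to a positive stop."""
    def __getitem__(self, index):
        if (isinstance(index, slice)
                and index.start is not None and index.stop is not None
                and index.start < 0 <= index.stop):
            # Wrap around the end of the list instead of returning [].
            step = index.step or 1
            return [self[i % len(self)]
                    for i in range(index.start, index.stop, step)]
        return list.__getitem__(self, index)

data = WrapList([0, 1, 2, 3, 4])
print(data[-2:3])  # [3, 4, 0, 1, 2] -- wraps around
print(data[1:3])   # [1, 2] -- ordinary slicing is untouched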
 

Ian Kelly
10-30-2012
On Mon, Oct 29, 2012 at 12:00 PM, Andrew Robinson
<(E-Mail Removed)> wrote:
> I downloaded the source code for python 3.3.0, as the tbz;
> In the directory "Python-3.3.0/Python", look at Python-ast.c, line 2089 &
> ff.


Python-ast.c is part of the compiler code. That's not the struct used
to represent the object at runtime, but the struct used to represent
the AST node while compiling.

For the runtime definition of slices, look at sliceobject.h and
sliceobject.c. Slices are represented as:

typedef struct {
    PyObject_HEAD
    PyObject *start, *stop, *step;  /* not NULL */
} PySliceObject;

"PyObject_HEAD" is a macro that incorporates the object type pointer
and the reference count. Hence, 5 words.
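
A rough cross-check from the Python side (pointer size and the exact total
differ between builds, so treat this as a sketch):

import struct
import sys

word = struct.calcsize("P")  # size of a pointer on this build
# PyObject_HEAD (type pointer + refcount) plus start/stop/step = 5 words.
print(5 * word, sys.getsizeof(slice(1, 2, 3)))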
 

Ian Kelly
10-30-2012
On Mon, Oct 29, 2012 at 4:39 PM, Andrew Robinson
<(E-Mail Removed)> wrote:
> In addition to those items you mention, of which the reference count is not
> even *inside* the struct -- there is additional debugging information not
> mentioned. Built in objects contain a "line number", a "column number", and
> a "context" pointer. These each require a full word of storage.
>
> Also, built in types appear to have a "kind" field which indicates the
> object "type" but is not a pointer. That suggests two "object" type
> indicators, a generic pointer (probably pointing to "builtin"? somewhere
> outside the struct) and a specific one (an enum) inside the "C" struct.
>
> Inside the tuple struct, I count 4 undocumented words of information.
> Over all, there is a length, the list of pointers, a "kind", "line", "col"
> and "context"; making 6 pieces in total.
>
> Although your comment says the head pointer is not required; I found in
> 3.3.0 that it is a true head pointer; The Tuple() function on line 2069 of
> Python-ast.c, (3.3 version) -- is passed in a pointer called *elts. That
> pointer is copied into the Tuple struct.


As above, you're looking at the compiler code, which is why you're
finding things like "line" and "column". The tuple struct is defined
in tupleobject.h and stores tuple elements in a tail array.

> How ironic, slices don't have debugging info, that's the main reason they
> are smaller.
> When I do slice(3,0,2), surprisingly "Slice()" is NOT called.
> But when I do a[1:2:3] it *IS* called.


Because compiling the latter involves parsing slicing syntax, and
compiling the former does not.
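
That difference is easy to see in the bytecode: slicing syntax compiles to a
BUILD_SLICE opcode, while slice(3, 0, 2) is just a name lookup and an
ordinary call (opcode names vary a little between CPython versions):

import dis

dis.dis(compile("a[1:2:3]", "<example>", "eval"))        # uses BUILD_SLICE
dis.dis(compile("slice(3, 0, 2)", "<example>", "eval"))  # ordinary function call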
 

Ian Kelly
10-30-2012
On Mon, Oct 29, 2012 at 7:49 PM, Chris Kaynor <(E-Mail Removed)> wrote:
> NOTE: The above is taken from reading the source code for Python 2.6.
> For some odd reason, I am getting that an empty tuple consists of 6
> pointer-sized objects (48 bytes on x64), rather than the expected 3
> pointer-sized (24 bytes on x64). Slices are showing up as the expected
> 5 pointer-sized (40 bytes on x64), and tuples grow at the expected 1
> pointer (8 bytes on x64) per item. I imagine I am missing something,
> but cannot figure out what that would be.


I'm likewise seeing 4 extra words in tuples in 32-bit Python 3.3.
What I've found is that for tuples and other collection objects, the
garbage collector tacks on an extra header before the object in
memory. That header looks like this:

typedef union _gc_head {
    struct {
        union _gc_head *gc_next;
        union _gc_head *gc_prev;
        Py_ssize_t gc_refs;
    } gc;
    long double dummy;  /* force worst-case alignment */
} PyGC_Head;

gc_next and gc_prev implement a doubly-linked list that the garbage
collector uses to explicitly track this object. gc_refs is used for
counting references during a garbage collection and stores the GC
state of the object otherwise.

I'm not entirely certain why collection objects get this special
treatment, but there you have it.
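
This is easy to observe from Python with gc.is_tracked() (a sketch; details
such as the untracking of all-immutable tuples vary by version):

import gc

print(gc.is_tracked((1, [])))         # True  -- container, GC header present
print(gc.is_tracked(slice(1, 2, 3)))  # False -- no GC header
print(gc.is_tracked(42))              # False -- atomic type, refcounting suffices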
 

Ian Kelly
10-30-2012
On Mon, Oct 29, 2012 at 6:17 PM, Andrew Robinson
<(E-Mail Removed)> wrote:
> If you re-check my post to chris, I listed the struct you mention.
> The C code is what is actually run (by GDB breakpoint test) when a tuple is
> instantiated.


When you were running GDB, were you debugging the interactive
interpreter or a precompiled script? The interactive interpreter does
a compilation step for every line entered.

> If the tuple were stripped of the extra data -- then it ought to be as small
> as slice().
> But it's not as small -- so either the sys.getsizeof() is lying -- or the
> struct you mention is not complete.


As just explained, the extra 16 bytes are added by the garbage collector.
 

Ian Kelly
10-30-2012
On Mon, Oct 29, 2012 at 5:54 PM, Andrew Robinson
<(E-Mail Removed)> wrote:
>> I don't know of a reason why one might need to use a negative start
>> with a positive stop, though.

>
> I've already given several examples; and another poster did too


I meant that I don't know of a reason to do that given the existing
semantics, which is what you were asking for.

I understand and agree that there are potential applications for
allowing slices to wrap around from negative to positive. What I'm
not convinced of is that these applications need or should be handled
by the slicing operation -- which is more commonly understood as
producing subsequences -- especially since they already can be handled
relatively easily by iteration. I think that more users would find
the behavior surprising than useful.
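
For instance, a wraparound subsequence can already be spelled with ordinary
slicing and concatenation, without changing what a slice means (purely
illustrative):

from itertools import chain

seq = [0, 1, 2, 3, 4]
# Elements from index -2 up to (but not including) index 3, wrapping around:
wrapped = list(chain(seq[-2:], seq[:3]))
print(wrapped)  # [3, 4, 0, 1, 2]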
 

Steven D'Aprano
10-30-2012
By the way Andrew, the timestamps on your emails appear to be off, or
possibly the time zone. Your posts are allegedly arriving before the
posts you reply to, at least according to my news client.


On Mon, 29 Oct 2012 12:34:24 -0700, Andrew Robinson wrote:

> On 10/29/2012 05:02 PM, Steven D'Aprano wrote:
>> On Mon, 29 Oct 2012 08:42:39 -0700, Andrew Robinson wrote:
>>
>>>>> But, why can't I just overload the existing __getitem__ for lists
>>>>> and not bother writing an entire class?

>> You say that as if writing "an entire class" was a big complicated
>> effort. It isn't. It is trivially simple, a single line:
>>
>> class MyList(list):
>>     ...

> No, I don't think it big and complicated. I do think it has timing
> implications which are undesirable because of how *much* slices are
> used. In an embedded target -- I have to optimize; and I will have to
> reject certain parts of Python to make it fit and run fast enough to be
> useful.


Then I look forward to seeing your profiling results that show that the
overhead of subclassing list is the bottleneck in your application.

Until then, you are making the classic blunder of the premature optimizer:

"More computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason — including
blind stupidity." — W.A. Wulf


I am not impressed by performance arguments when you have (apparently)
neither identified the bottlenecks in your code, nor even measured the
performance. You are essentially *guessing* where the bottlenecks are,
and *hoping* that some suggested change will be an optimization rather
than a pessimization.

Of course I may be wrong, and you have profiled your code and determined
that the overhead of inheritance is a problem. If so, that's a different
ball game. But your posts so far suggest to me that you're trying to
predict performance optimizations rather than measure them.


>>>> You can just overload that one method in a subclass of list. Being
>>>> able to monkey-patch __getitem__ for the list class itself would not
>>>> be advisable, as it would affect all list slicing anywhere in your
>>>> program and possibly lead to some unexpected behaviors.
>>> That's what I am curious about.
>>> What unexpected behaviors would a "monkey patch" typically cause?

>> What part of "unexpected" is unclear?
>>

> Ahh -- The I don't know approach! It's only unexpected if one is a bad
> programmer...!


No, it is unexpected because once you start relying on monkey-patching,
and the libraries you install are monkey-patching, you have a
combinational explosion of interactions. Any piece of code, anywhere,
could monkey-patch any other piece of code -- it is a form of action-at-a-
distance coding, like GOTOs and global variables. Use it with caution, in
moderation.


>> Let me see if I can illustrate a flavour of the sort of things that can
>> happen if monkey-patching built-ins were allowed.

[...]
> Right, which means that people developing the libraries made
> contradictory assumptions.


Not necessarily. Not only can monkey-patches conflict, but they can
combine in bad ways. It isn't just that Fred assumes X and Barney assumes
not-X, but also that Fred assumes X and Barney assumes Y and *nobody*
imagined that there was some interaction between X and Y.



[...]
>> Ruby allows monkey-patching of everything. And the result was
>> predictable:
>>
>> http://devblog.avdi.org/2008/02/23/w...is-destroying-ruby/

> I read that post carefully; and the author purposely notes that he is
> exaggerating.


Not carefully enough. He notes that he was using a provocative title and
that he doesn't actually think that Ruby is being destroyed. But the
actual harm he describes is real, e.g. bugs that take months to track
down.


> What you are talking about is namespace preservation;


I haven't mentioned namespaces. Nothing I have said has anything to do
with namespaces. I remember Apple monkey-patching routines in ROM back in
the mid 1980s, long before there was anything corresponding to namespaces
in Apple's programming model.


> and I am thinking
> about it. I can preserve it -- but only if I disallow true Python
> primitives in my own interpreter; I can't provide two sets in the memory
> footprint I am using.


If you want to write a language that is not Python, go right ahead.


> If someone had a clear explanation of the disadvantages of allowing an
> iterator, or a tuple -- in place of a slice() -- I would have no qualms
> dropping the subject. However, I am not finding that yet. I am finding
> very small optimization issues...


Python has certain public interfaces. If your language does not support
those public interfaces, then it might be an awesome language, but it is
not Python.

Python slices have certain features:

* they can be used repeatedly;

* they have public attributes start, step and stop;

* The stop attribute can be None, and the slice will default to the
length of the thing being sliced, which is not known until call-time.

* Slices have a public indices method.

These are known characteristics of slices, and there is Python code that
relies on it. If your language does not support these, then it is not
Python.
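
Those features are all visible from ordinary Python code; for instance
(illustrative only):

s = slice(None, 10, 2)
print(s.start, s.stop, s.step)    # public attributes: None 10 2
print(s.indices(6))               # resolved against a length of 6: (0, 6, 2)
print("abcdefghij"[s], "abc"[s])  # the same slice object can be reused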

Iterators cannot replace slices, because once you have looked up an
iterator value, that value is gone, never to be seen again.

Tuples cannot replace slices, because tuples do not have start, step and
stop attributes; most tuples have no need of them, and if you care about
memory footprint, the last thing you want is to give every tuple three
unused or meaningless named attributes. Besides, what values should they
take in an empty tuple?

xrange cannot replace slices, because there is no way to create an xrange
object with an empty or missing end; besides, xrange objects have no
public start/stop/step attributes.

And none of them have a public indices method.


> The actual *need* for a slice() object still hasn't been demonstrated.


Hell, the actual *need* for slicing hasn't been demonstrated! Or Python!
Since C doesn't have slicing, nobody needs it! Am I right?

Needed or not, that is what Python has, so if your language doesn't have
it, it isn't Python.



> I
> am thinking that the implementation of __getitem__() is very poor
> probably because of legacy issues.


You haven't demonstrated that it is very poor.


> A tuple can also hold None, so ( 1, None, 2 ) is still a valid Tuple.


Sure. And a tuple can also be () or (1, "x", [], None, None, None, None).
Now your list.__getitem__ code has to deal with those instead.


> Alternately: An iterator, like xrange(), could be made which takes None
> as a parameter, or a special value like 'inf'.


So you want to get rid of slices because you want to save every byte of
memory you can, and your way to do that is to introduce a new, *heavier*
type that does more than slices?

Oh man, that's funny.


--
Steven
 

Ian Kelly
10-30-2012
On Tue, Oct 30, 2012 at 1:21 AM, Ian Kelly <(E-Mail Removed)> wrote:
> I'm not entirely certain why collection objects get this special
> treatment, but there you have it.


Thinking about it some more, this makes sense. The GC header is there
to support garbage collection for the object. Atomic types like ints
do not need this header because they do not reference other objects
and so cannot be involved in reference cycles. For those types,
reference counting is sufficient. For types like collections that do
reference other objects, garbage collection is needed.

Expanding on this, I suspect it is actually a bug that slice objects
are not tracked by the garbage collector. The following code appears
to result in a memory leak:

import gc
gc.disable()
while True:
    for i in range(100):
        l = []
        s = slice(l)
        l.append(s)
        del s, l
    _ = gc.collect()

Try running that and watch your Python memory usage climb and climb.
For contrast, replace the slice with a list and observe that memory
usage does *not* climb. On each iteration, the code constructs a
reference cycle between a slice and a list. It seems that because
slices are not tracked by the garbage collector, it is unable to break
these cycles.
 

Andrew Robinson
10-30-2012
On 10/30/2012 11:02 AM, Ian Kelly wrote:
> On Tue, Oct 30, 2012 at 10:14 AM, Ethan Furman<(E-Mail Removed)> wrote:
>> File a bug report?

> Looks like it's already been wontfixed back in 2006:
>
> http://bugs.python.org/issue1501180

Thanks, Ian, you've answered the first of my questions and have been a
great help.
(And yes, I was debugging interactive mode... I took a nap after writing
that post, as I realized I had reached my one really bad post for the day...)

At least I finally know why Python chooses to implement slice() as a
separate object from tuple, even if I don't like the implications.

I think there are three main consequences of the present implementation
of slice():

1) The interpreter code size is made larger with no substantial
improvement in functionality, which increases debugging effort.
2) There is no protection against perverse and surprising (are you
surprised?! I am) memory behavior.
3) There is a memory saving from not carrying the garbage-collection
overhead.

D'Aprano mentioned the named values start, stop, and step in a slice(),
which are an API and legacy issue; these three names must also be stored
in the interpreter someplace. Since slice is defined at the "C" level as a
struct, have you already found these names in the source code
(hard-coded), or are they part of a .py file associated with the interface
to the "C" code?

 