Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Python (http://www.velocityreviews.com/forums/f43-python.html)
-   -   Multi-dimensional list initialization (http://www.velocityreviews.com/forums/t954205-multi-dimensional-list-initialization.html)

 Demian Brecht 11-05-2012 06:27 AM

Multi-dimensional list initialization

So, here I was thinking "oh, this is a nice, easy way to initialize a 4D matrix" (running 2.7.3, non-core libs not allowed):

m = [[None] * 4] * 4

The way to get what I was after was:

m = [[None] * 4, [None] * 4, [None] * 4, [None * 4]]

(Obviously, I could have just hardcoded the initialization, but I'm too lazy to type all that out ;))

The behaviour I encountered seems a little contradictory to me. [None] * 4 creates four distinct elements in a single array while [[None] * 4] * 4 creates one distinct array of four distinct elements, with three references to it:

>>> a = [None] * 4
>>> a[0] = 'a'
>>> a

['a', None, None, None]

>>> m = [[None] * 4] * 4
>>> m[0][0] = 'm'
>>> m

[['m', None, None, None], ['m', None, None, None], ['m', None, None, None], ['m', None, None, None]]

Is this expected behaviour and if so, why? In my mind either result makes sense, but the inconsistency is what throws me off.

Demian Brecht
@demianbrecht
http://demianbrecht.github.com

 Hans Mulder 11-05-2012 09:13 AM

Re: Multi-dimensional list initialization

On 5/11/12 07:27:52, Demian Brecht wrote:
> So, here I was thinking "oh, this is a nice, easy way to initialize a 4D matrix"
> (running 2.7.3, non-core libs not allowed):
>
> m = [[None] * 4] * 4
>
> The way to get what I was after was:
>
> m = [[None] * 4, [None] * 4, [None] * 4, [None * 4]]

Or alternateively:

m = [[None] * 4 for _ in range(4)]

> (Obviously, I could have just hardcoded the initialization, but I'm too
> lazy to type all that out ;))
>
> The behaviour I encountered seems a little contradictory to me.
> [None] * 4 creates four distinct elements in a single array

Actually, it creates a list with four references to the same object.
But then, this object is immutable, so you won't notice that it's the
same object.

> while [[None] * 4] * 4 creates one distinct array of four distinct
> elements, with three references to it:

We usually phrase that as "a list with four references to the
same list". The first reference is not special in any way.

>>>> a = [None] * 4
>>>> a[0] = 'a'
>>>> a

> ['a', None, None, None]
>
>>>> m = [[None] * 4] * 4
>>>> m[0][0] = 'm'
>>>> m

> [['m', None, None, None], ['m', None, None, None], ['m', None, None, None], ['m', None, None, None]]
>
> Is this expected behaviour

Yes.

> and if so, why? In my mind either result makes sense, but the
> inconsistency is what throws me off.

There's no inconsistency: in both cases you get a list with four
references to the same object. The only difference is that in the
fist case, the references are to an immutable object, so the fact
that it's the same object won't hurt you.

Hope this helps,

-- HansM

 wxjmfauth@gmail.com 11-05-2012 09:55 AM

Re: Multi-dimensional list initialization

Le lundi 5 novembre 2012 07:28:00 UTC+1, Demian Brecht a écrit*:
> So, here I was thinking "oh, this is a nice, easy way to initialize a 4D matrix" (running 2.7.3, non-core libs not allowed):
>
>
>
> m = [[None] * 4] * 4
>
>
>
> The way to get what I was after was:
>
>
>
> m = [[None] * 4, [None] * 4, [None] * 4, [None * 4]]
>
>
>
> (Obviously, I could have just hardcoded the initialization, but I'm too lazy to type all that out ;))
>
>
>
> The behaviour I encountered seems a little contradictory to me. [None] * 4 creates four distinct elements in a single array while [[None] * 4] * 4 creates one distinct array of four distinct elements, with three references to it:
>
>
>
> >>> a = [None] * 4

>
> >>> a[0] = 'a'

>
> >>> a

>
> ['a', None, None, None]
>
>
>
> >>> m = [[None] * 4] * 4

>
> >>> m[0][0] = 'm'

>
> >>> m

>
> [['m', None, None, None], ['m', None, None, None], ['m', None, None, None], ['m', None, None, None]]
>
>
>
> Is this expected behaviour and if so, why? In my mind either result makessense, but the inconsistency is what throws me off.
>
>
>
> Demian Brecht
>
> @demianbrecht
>
> http://demianbrecht.github.com

----------

You probably mean a two-dimensional matrix not a 4D matrix.

>>> def DefMatrix(nrow, ncol, val):

.... return [[val] * ncol for i in range(nrow)]
....
>>> aa = DefMatrix(2, 3, 1.0)
>>> aa
>>> aa = DefMatrix(2, 3, 1.0)
>>> aa

[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
>>> aa[0][0] = 3.14
>>> aa[1][2] = 2.718
>>> aa

[[3.14, 1.0, 1.0], [1.0, 1.0, 2.718]]
>>>
>>> bb = DefMatrix(2, 3, None)
>>> bb

[[None, None, None], [None, None, None]]
>>> bb[0][0] = 3.14
>>> bb[1][2] = 2.718
>>> bb

[[3.14, None, None], [None, None, 2.718]]

jmf

 Oscar Benjamin 11-06-2012 01:32 AM

Re: Multi-dimensional list initialization

On 5 November 2012 09:13, Hans Mulder <hansmu@xs4all.nl> wrote:
> On 5/11/12 07:27:52, Demian Brecht wrote:
>> So, here I was thinking "oh, this is a nice, easy way to initialize a 4D matrix"
>> (running 2.7.3, non-core libs not allowed):
>>
>> m = [[None] * 4] * 4
>>
>> The way to get what I was after was:
>>
>> m = [[None] * 4, [None] * 4, [None] * 4, [None * 4]]

>
> Or alternateively:
>
> m = [[None] * 4 for _ in range(4)]

That's the way to do it.

I've seen this question many times between here and the python-tutor
list. It does seem to be a common gotcha.

I was just thinking to myself that it would be a hard thing to change
because the list would need to know how to instantiate copies of all
the different types of the elements in the list. Then I realised it
doesn't. It is simply a case of how the list multiplication operator
is implemented and whether it chooses to use a reference to the same
list or make a copy of that list. Since all of this is implemented
within the same list type it is a relatively easy change to make
(ignoring backward compatibility concerns).

I don't see this non-copying list multiplication behaviour as
contradictory but has anyone ever actually found a use for it?

Oscar

 Chris Angelico 11-06-2012 02:01 AM

Re: Multi-dimensional list initialization

On Tue, Nov 6, 2012 at 12:32 PM, Oscar Benjamin
<oscar.j.benjamin@gmail.com> wrote:
> I was just thinking to myself that it would be a hard thing to change
> because the list would need to know how to instantiate copies of all
> the different types of the elements in the list. Then I realised it
> doesn't. It is simply a case of how the list multiplication operator
> is implemented and whether it chooses to use a reference to the same
> list or make a copy of that list. Since all of this is implemented
> within the same list type it is a relatively easy change to make
> (ignoring backward compatibility concerns).
>
> I don't see this non-copying list multiplication behaviour as
> contradictory but has anyone ever actually found a use for it?

Stupid example of why it can't copy:

How do you clone something that isn't Plain Old Data? Ultimately,
that's where the problem comes from. It's easy enough to clone
something that's all scalars (strings, integers, None, etc) and
non-recursive lists/dicts of scalars, but anything more complicated
than that is rather harder.

If you want a deep copy and are prepared to handle any issues that
might result, you can do this:

>>> import copy
>>> a=[[2,3,4]]
>>> a.extend(copy.deepcopy(a))
>>> a[0][1]=10
>>> a

[[2, 10, 4], [2, 3, 4]]

And some things just won't work:

Traceback (most recent call last):
File "<pyshell#17>", line 1, in <module>
File "C:\Python32\lib\copy.py", line 147, in deepcopy
y = copier(x, memo)
File "C:\Python32\lib\copy.py", line 209, in _deepcopy_list
y.append(deepcopy(a, memo))
File "C:\Python32\lib\copy.py", line 166, in deepcopy
rv = reductor(2)
TypeError: cannot serialize '_io.TextIOWrapper' object

The default behaviour is safe and reliable. When you want something
other than the default, there are ways of doing it.

ChrisA

 Oscar Benjamin 11-06-2012 02:30 AM

Re: Multi-dimensional list initialization

On 6 November 2012 02:01, Chris Angelico <rosuav@gmail.com> wrote:
> On Tue, Nov 6, 2012 at 12:32 PM, Oscar Benjamin
> <oscar.j.benjamin@gmail.com> wrote:
>> I was just thinking to myself that it would be a hard thing to change
>> because the list would need to know how to instantiate copies of all
>> the different types of the elements in the list. Then I realised it
>> doesn't. It is simply a case of how the list multiplication operator
>> is implemented and whether it chooses to use a reference to the same
>> list or make a copy of that list. Since all of this is implemented
>> within the same list type it is a relatively easy change to make
>> (ignoring backward compatibility concerns).
>>
>> I don't see this non-copying list multiplication behaviour as
>> contradictory but has anyone ever actually found a use for it?

>
> Stupid example of why it can't copy:
>
> bad = [open("test_file")] * 4
>
> How do you clone something that isn't Plain Old Data? Ultimately,
> that's where the problem comes from. It's easy enough to clone
> something that's all scalars (strings, integers, None, etc) and
> non-recursive lists/dicts of scalars, but anything more complicated
> than that is rather harder.

That's not what I meant. But now you've made me realise that I was
wrong about what I did mean. In the case of

stuff = [[obj] * n] * m

I thought that the multiplication of the inner list ([obj] * n) by m
could create a new list of lists using copies. On closer inspection I
see that the list being multiplied is in fact [[obj] * n] and that
this list can only know that it is a list of lists by inspecting its
element(s) which makes things more complicated.

I retract my claim that this change would be easy to implement.

Oscar

 Andrew Robinson 11-06-2012 05:51 AM

Re: Multi-dimensional list initialization

On 11/05/2012 06:30 PM, Oscar Benjamin wrote:
> On 6 November 2012 02:01, Chris Angelico<rosuav@gmail.com> wrote:
>> On Tue, Nov 6, 2012 at 12:32 PM, Oscar Benjamin
>> <oscar.j.benjamin@gmail.com> wrote:
>>> I was just thinking to myself that it would be a hard thing to change
>>> because the list would need to know how to instantiate copies of all
>>> the different types of the elements in the list. Then I realised it
>>> doesn't. It is simply a case of how the list multiplication operator
>>> is implemented and whether it chooses to use a reference to the same
>>> list or make a copy of that list. Since all of this is implemented
>>> within the same list type it is a relatively easy change to make
>>> (ignoring backward compatibility concerns).
>>>
>>> I don't see this non-copying list multiplication behaviour as
>>> contradictory but has anyone ever actually found a use for it?

>> Stupid example of why it can't copy:
>>
>> bad = [open("test_file")] * 4
>>
>> How do you clone something that isn't Plain Old Data? Ultimately,
>> that's where the problem comes from. It's easy enough to clone
>> something that's all scalars (strings, integers, None, etc) and
>> non-recursive lists/dicts of scalars, but anything more complicated
>> than that is rather harder.

> That's not what I meant. But now you've made me realise that I was
> wrong about what I did mean. In the case of
>
> stuff = [[obj] * n] * m
>
> I thought that the multiplication of the inner list ([obj] * n) by m
> could create a new list of lists using copies. On closer inspection I
> see that the list being multiplied is in fact [[obj] * n] and that
> this list can only know that it is a list of lists by inspecting its
> element(s) which makes things more complicated.
>
> I retract my claim that this change would be easy to implement.
>
>
> Oscar

Hi Oscar,

In general, people don't use element multiplication (that I have *ever*
seen) to make lists where all elements of the outer most list point to
the same sub-*list* by reference. The most common use of the
multiplication is to fill an array with a constant, or short list of
constants; Hence, almost everyone has to work around the issue as the
initial poster did by using a much longer construction.

The most compact notation in programming really ought to reflect the
most *commonly* desired operation. Otherwise, we're really just making
people do extra typing for no reason.

Further, list comprehensions take quite a bit longer to run than low
level copies; by a factor of roughly 10. SO, it really would be worth
implementing the underlying logic -- even if it wasn't super easy.

I really don't think doing a shallow copy of lists would break anyone's
program.
The non-list elements, whatever they are, can be left as reference
copies -- but any element which is a list ought to be shallow copied.
The behavior observed in the opening post where modifying one element of
a sub-list, modifies all elements of all sub-lists is never desired as
far as I have ever witnessed.

The underlying implementation of Python can check an object type
trivially, and the only routine needed is a shallow list copy. So, no
it really isn't a complicated operation to do shallow copies of lists.

:)

 Chris Angelico 11-06-2012 06:07 AM

Re: Multi-dimensional list initialization

On Tue, Nov 6, 2012 at 4:51 PM, Andrew Robinson
<andrew3@r3dsolutions.com> wrote:
> I really don't think doing a shallow copy of lists would break anyone's
> program.

Well, it's a change, a semantic change. It's almost certainly going to
break _something_. But for the sake of argument, we can suppose that
the change could be made. Would it be the right thing to do?

Shallow copying by default would result in extremely weird behaviour.
All the same confusion would result, only instead of comparing
[None]*4 with [[None]]*4, there'd be confusion over the difference
between [[None]]*4 and [[[None]]]*4.

I don't think it would help anything, and it'd result in a lot more
work for no benefit.

ChrisA

 Andrew Robinson 11-06-2012 08:21 AM

Re: Multi-dimensional list initialization

On 11/05/2012 10:07 PM, Chris Angelico wrote:
> On Tue, Nov 6, 2012 at 4:51 PM, Andrew Robinson
> <andrew3@r3dsolutions.com> wrote:
>> I really don't think doing a shallow copy of lists would break anyone's
>> program.

> Well, it's a change, a semantic change. It's almost certainly going to
> break _something_. But for the sake of argument, we can suppose that
> the change could be made. Would it be the right thing to do?
>
> Shallow copying by default would result in extremely weird behaviour.
> All the same confusion would result, only instead of comparing
> [None]*4 with [[None]]*4, there'd be confusion over the difference
> between [[None]]*4 and [[[None]]]*4.
>
> I don't think it would help anything, and it'd result in a lot more
> work for no benefit.
>
> ChrisA

I don't follow.
a=[ None ]*4 would give a=[ None, None, None, None ] as usual.
All four None's would be the same object, but there are automatically 4
different pointers to it.
Hence,
a[0]=1 would give a=[ 1, None, None, None ] as usual.

a=[ [None] ]*4 would give a=[ [None], [None], [None], [None] ] as usual
BUT:
a[0][0] = 1 would no longer give a=[ [1],[1],[1],[1] ] *Rather* it would
give
a=[ [1].[None].[None],[None] ]

The None objects are all still the same one, BUT the lists themselves
are different.

Again, a=[ ["alpha","beta"] * 4 ] would give:
a=[ ["alpha","beta"], ["alpha","beta"], ["alpha","beta"], ["alpha","beta"] ]

All four strings, "alpha", are the same object -- but there are 5
different lists; The pointers inside the initial list are copied four
times -- not the string objects;
But the *lists* themselves are created new for each replication.

If you nest it another time;
[[[None]]]*4, the same would happen; all lists would be independent --
but the objects which aren't lists would be refrenced-- not copied.

a=[[["alpha","beta"]]]*4 would yield:
a=[[['alpha', 'beta']], [['alpha', 'beta']], [['alpha', 'beta']],
[['alpha', 'beta']]]
and a[0][0]=1 would give [[1],[['alpha', 'beta']], [['alpha', 'beta']],
[['alpha', 'beta']]]]
rather than a=[[1], [1], [1], [1]]

Or at another level down: a[0][0][0]=1 would give: a=[[[1, 'beta']],
[['alpha', 'beta']], [['alpha', 'beta']], [['alpha', 'beta']] ]
rather than a=[[[1, 'beta']], [[1, 'beta']], [[1, 'beta']], [[1, 'beta']]]

The point is, there would be no difference at all noticed in what data
is found where in the array;
the *only* thing that would change is that replacing an item by
assignment would only affect the *location* assigned to -- all other
locations would not be affected.

That really is what people *generally* want.
If the entire list is meant to be read only -- the change would affect
*nothing* at all.

See if you can find *any* python program where people desired the
multiplication to have the die effect that changing an object in one of
the sub lists -- changes all the objects in the other sub lists.

I'm sure you're not going to find it -- and even if you do, it's going
to be 1 program in 1000's.

 Steven D'Aprano 11-06-2012 09:04 AM

Re: Multi-dimensional list initialization

On Mon, 05 Nov 2012 21:51:24 -0800, Andrew Robinson wrote:

> The most compact notation in programming really ought to reflect the
> most *commonly* desired operation. Otherwise, we're really just making
> people do extra typing for no reason.

There are many reasons not to put minimizing of typing ahead of all other
values:

* Typically, code is written once and read many times. Minimizing
typing might save you a second or two once, and then cost you many
seconds every time you read the code. That's why we tell people to
choose meaningful variable names, instead of naming everything "a"
and "b".

* Consistency of semantics is better than a plethora of special
cases. Python has a very simple and useful rule: objects should
not be copied unless explicitly requested to be copied. This is
much better than having to remember whether this operation or
that operation makes a copy. The answer is consistent:

(pardon me for belabouring the point here)

Q: Does [0]*10 make ten copies of the integer object?
A: No, list multiplication doesn't make copies of elements.

A: No, the elements are never copied.

Q: What if I use a singleton? Does [None]*10 try to copy?
A: No, the elements are never copied.

Q: How about things like file objects that can't be copied?
A: No, the elements are never copied.

A: No, the elements are never copied.

Q: How about if the elements are subclasses of list?
A: No, the elements are never copied.

Q: What about other mutable objects like sets or dicts?
A: No, the elements are never copied.

Q: What about instances of custom classes?
A: No, the elements are never copied.

Q: What if they are old-style Classic classes?
A: No, the elements are never copied.

Q: What if I do some funny tricks with the metaclass?
A: No, the elements are never copied.

Q: How about on Tuesdays? I bet they're copied on Tuesdays.
A: No, the elements are never copied.

Your proposal throws away consistency for a trivial benefit on a rare use-
case, and replaces it with a bunch of special cases:

A: Oh yeah, I forgot about lists, they're copied.

Q: How about if the elements are subclasses of list?
A: Hmmm, that's a good one, I'm not actually sure.

Q: How about if I use delegation to proxy a list?
A: Oh no, they definitely won't be copied.

Q: What about other mutable objects like sets or dicts?
A: No, definitely not. Unless people complain enough.

Q: What about instances of custom classes?
A: That's a definite maybe.

Q: How about on Tuesdays? I bet they're copied on Tuesdays.
A: Only if you're in Belgium.

Losing consistency in favour of saving a few characters for something as
uncommon as list multiplication is a poor tradeoff. That's why this
proposal has been rejected again and again and again every time it has
been suggested.

List multiplication [x]*n is conceptually equivalent to:

newlist = []
for i in range(n):
newlist.append(x)

or if you prefer a list comp:

[x for i in range(n)]

This is nice and simple and efficient. Some objects cannot be copied at
all. Copying other objects is slow and inefficient. Keeping list
multiplication consistent, and fast, is MUCH more important than making
it work as expected for the rare case of 2D arrays:

[[0]*n]*m

where there are other alternatives.

> Further, list comprehensions take quite a bit longer to run than low
> level copies; by a factor of roughly 10. SO, it really would be worth
> implementing the underlying logic -- even if it wasn't super easy.

It is true that list multiplication can be much faster than a list comp.
But that's because the list multiply doesn't have to inspect the
elements, copy them, or engage the iteration machinery. Avoiding copying
gives you a big saving:

[steve@ando ~]\$ python3.3 -m timeit -s "x = range(1000)"
"[x for _ in range(100)]" # not copied
100000 loops, best of 3: 11.9 usec per loop

[steve@ando utilities]\$ python3.3 -m timeit -s "x = range(1000)"
"[x[:] for _ in range(100)]" # copied
10000 loops, best of 3: 103 usec per loop

So there's a factor of ten difference right there. If list multiplication
had to make copies, it would lose much of its speed advantage. For large
enough lists, or complicated enough objects, it would become slower than
a list comprehension.

It would be even slower if list multiplication had to inspect each
element first and decide whether or not to copy.

> I really don't think doing a shallow copy of lists would break anyone's
> program.

Anyone who is currently using list multiplication with mutable objects is
expecting that they will be the same object, and relying on that fact.
Otherwise they wouldn't be using list multiplication.

You're suggesting a semantic change. Therefore they will be expecting
something different from what actually happens. Result: broken code.

It's not just mutable objects. It's also objects that can't be copied.
Result: mylist*3 used to work, now it raises an exception. And
performance issues: what used to be fast is now slow.

Even if this change was allowed, it would have to go through a multi-year
process. Python 3.3 is too late -- the absolute earliest would be Python
3.4, which is scheduled for about 18 months from now. So in Python 3.4
you could write:

from __future__ import list_multiplication_copying

to get the behaviour you want, and then in Python 3.5 it would become the
default. That's three years until it becomes the standard. Meanwhile,
there will still be millions of people using Python 2.7 or 3.2, and their
code will behave differently from your code.

Conservatively, if you write code to support three previous releases,
that means you can't use this feature until Python 3.7. So that's about
six years before it can be used widely.

If the problem being solved was big enough, this would be worth doing.
But it's not.

> The non-list elements, whatever they are, can be left as reference
> copies -- but any element which is a list ought to be shallow copied.

That's even worse than "list multiplication always copies". At least that
is simple and consistent, even if it isn't consistent with the rest of
the language, at least it is self-consistent. You are proposing something
much worse: special cases to remember. "Objects aren't copied, except for
lists, which are copied."

And then people will wonder why sets aren't copied, and dicts. People
will make a 2D array like so:

[[0]*5]*10

and it will work. Then they'll write this:

[{}]*5

and wonder why it doesn't work the way they expect. Consistency is *much*
more valuable than ad hoc DWIM semantics. Languages that try to Do What I
Mean somehow end up Doing What Somebody Else Meant, But Not Me.

--
Steven

All times are GMT. The time now is 09:59 AM.