Velocity Reviews > Break up list into groups

# Break up list into groups

danmcleran@yahoo.com
Guest
Posts: n/a

 07-16-2007
All,

I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this but I was wondering if some
combination of itertools functions could do the job faster?

Here's what I've done and the expected output of the algorithm:

def splitIntoGroups(data):
groups = []
local = []

for value in data:
if 0x80 & value:
if len(local) > 0:
groups.append(local)

local = []
local.append(value)
else:
local.append(value)

if len(local) > 0:
groups.append(local)

return groups

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

print splitIntoGroups(l)

Desired result:

[[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
13], [240, 14], [241, 15]]

Thanks,

Dan McLeran

marduk
Guest
Posts: n/a

 07-16-2007
On Mon, 2007-07-16 at 14:11 -0700, http://www.velocityreviews.com/forums/(E-Mail Removed) wrote:
> I can't seem to find an answer to this question anywhere, but I'm
> still looking. My problem is I have a list of values like this:
>
> l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
> 13, 0xF0, 14, 0xF1, 15]
>
> A value with bit 0x80 set delineates the start of a new packet of
> information. What I want to do is to group the packets so that 1, 2, 3
> go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
> tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
> length of the data associated with each tag can vary. I've already
> written an algorithm to do this but I was wondering if some
> combination of itertools functions could do the job faster?
>
> Here's what I've done and the expected output of the algorithm:
>
> def splitIntoGroups(data):
> groups = []
> local = []
>
> for value in data:
> if 0x80 & value:
> if len(local) > 0:
> groups.append(local)
>
> local = []
> local.append(value)
> else:
> local.append(value)
>
> if len(local) > 0:
> groups.append(local)
>
> return groups
>
> l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
> 13, 0xF0, 14, 0xF1, 15]
>
> print splitIntoGroups(l)
>
> Desired result:
>
> [[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
> 13], [240, 14], [241, 15]]

Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
>=240 starts a new group? If so:

groups = []
current = [] # probably not necessary, but as a safety
for i in l:
if i >= 240:
current = []
groups.append(current)
current.append(i)

marduk
Guest
Posts: n/a

 07-16-2007
On Mon, 2007-07-16 at 16:31 -0500, marduk wrote:
> Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
> >=240 starts a new group? If so:

>
> groups = []
> current = [] # probably not necessary, but as a safety
> for i in l:
> if i >= 240:
> current = []
> groups.append(current)
> current.append(i)
>
>

Misunderstood... actually that should have read

groups = []
current = [] # probably not necessary, but as a safety
for i in l:
if 240 & i:
current = []
groups.append(current)
current.append(i)

James Stroud
Guest
Posts: n/a

 07-16-2007
(E-Mail Removed) wrote:
> All,
>
> I can't seem to find an answer to this question anywhere, but I'm
> still looking. My problem is I have a list of values like this:
>
> l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
> 13, 0xF0, 14, 0xF1, 15]
>
> A value with bit 0x80 set delineates the start of a new packet of
> information. What I want to do is to group the packets so that 1, 2, 3
> go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
> tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
> length of the data associated with each tag can vary. I've already
> written an algorithm to do this but I was wondering if some
> combination of itertools functions could do the job faster?
>
> Here's what I've done and the expected output of the algorithm:
>
> def splitIntoGroups(data):
> groups = []
> local = []
>
> for value in data:
> if 0x80 & value:
> if len(local) > 0:
> groups.append(local)
>
> local = []
> local.append(value)
> else:
> local.append(value)
>
> if len(local) > 0:
> groups.append(local)
>
> return groups
>
> l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
> 13, 0xF0, 14, 0xF1, 15]
>
> print splitIntoGroups(l)
>
> Desired result:
>
> [[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
> 13], [240, 14], [241, 15]]
>
> Thanks,
>
> Dan McLeran
>

Here's how I *would* do it:

py> def doit(alist):
.... ary = []
.... for i in alist:
.... if 0xf0 & i:
.... ary.append([i])
.... else:
.... ary[-1].append(i)
.... return [x for x in ary if x]
....
py> doit(alist)

[[240, 1, 2, 3],
[240, 4, 5, 6],
[241, 7, 8],
[242, 9, 10, 11, 12, 13],
[240, 14],
[241, 15]]

Here's an ugly way I wouldn't recommended (using itertools groupby):

py> from itertools import groupby
py> alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
.... 0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
.... 0xF0, 14, 0xF1, 15]
py> def doit(alist):
.... i = (list(g) for k,g in groupby(alist, lambda x: 0xf0&x))
.... return [k for k in [j + i.next() for j in i] if len(k)>1]
....
py> doit(alist)

[[240, 1, 2, 3],
[240, 4, 5, 6],
[241, 7, 8],
[242, 9, 10, 11, 12, 13],
[240, 14],
[241, 15]]

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

James Stroud
Guest
Posts: n/a

 07-16-2007
James Stroud wrote:
> Here's how I *would* do it:
>
> py> def doit(alist):
> ... ary = []
> ... for i in alist:
> ... if 0xf0 & i:
> ... ary.append([i])
> ... else:
> ... ary[-1].append(i)
> ... return [x for x in ary if x]
> ...

To be absolutely compliant with the specifications, it should be

return [x for x in ary if len(x)>1]

in the recommended way. This does not affect the example given, though.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

Matimus
Guest
Posts: n/a

 07-16-2007
This is a perfect place for a generator:

<code>
seq = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

def gengroups(seq):
group = []
for val in seq:
if val & 0x80 and group:
yield group
group = []
group.append(val)
yield group

if __name__ == "__main__":
print list(gengroups(seq))
</code>

The above assumes that the first value in the input sequence will have
0x80 set. Your implementation seems to makes the same assumption
though.

Also, just a note...
<code>
if len(local) > 0:
...
</code>

is better written

<code>
if local:
...
</code>

Paul Rubin
Guest
Posts: n/a

 07-16-2007
"(E-Mail Removed)" <(E-Mail Removed)> writes:
> I can't seem to find an answer to this question anywhere, but I'm
> still looking. My problem is I have a list of values like this:
>
> l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
> 13, 0xF0, 14, 0xF1, 15]
>
> A value with bit 0x80 set delineates the start of a new packet of
> information. What I want to do is to group the packets so that 1, 2, 3
> go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
> tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
> length of the data associated with each tag can vary. I've already
> written an algorithm to do this but I was wondering if some
> combination of itertools functions could do the job faster?

See:

danmcleran@yahoo.com
Guest
Posts: n/a

 07-16-2007
On Jul 16, 3:56 pm, marduk <(E-Mail Removed)> wrote:
> On Mon, 2007-07-16 at 16:31 -0500, marduk wrote:
> > Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
> > >=240 starts a new group? If so:

No, I meant 0x80. 0x80 is only the most significant bit of the 0xF0
value. Every time this bit is set indicates the start of a new group.
Anyway, I like this algorithm better than mine, less code. I've
modified yours to make the fn a parameter:

def splitIntoGroups(data, fn):
groups = []
current = []
for i in data:
if fn(i):
current = []
groups.append(current)
current.append(i)

return groups

print splitIntoGroups(l, lambda x : x & 0x80)

danmcleran@yahoo.com
Guest
Posts: n/a

 07-16-2007
> Here's an ugly way I wouldn't recommended (using itertools groupby):
>
> py> from itertools import groupby
> py> alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
> ... 0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
> ... 0xF0, 14, 0xF1, 15]
> py> def doit(alist):
> ... i = (list(g) for k,g in groupby(alist, lambda x: 0xf0&x))
> ... return [k for k in [j + i.next() for j in i] if len(k)>1]
> ...
> py> doit(alist)
>
> [[240, 1, 2, 3],
> [240, 4, 5, 6],
> [241, 7, 8],
> [242, 9, 10, 11, 12, 13],
> [240, 14],
> [241, 15]]
>
> James

Wow that is ugly. I think I like the simpler way better.

Thanks

James Stroud
Guest
Posts: n/a

 07-16-2007
Paul Rubin wrote:
> See:
>

Groupby is damn slow as far as I can tell (the Bates numbering in the
above link assumes more than the OP intended, I assume). It looks like
the author's original algorithm is the fastest python way as it bypasses
a lot of lookup, etc.

Here's the output from the script below (doit2 => groupby way):

doit
11.96 usec/pass
doit2
87.14 usec/pass

James

# timer script
from itertools import groupby
from timeit import Timer

alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
0xF0, 14, 0xF1, 15]

def doit(alist):
ary = []
for i in alist:
if 0xf0 & i:
ary.append([i])
else:
ary[-1].append(i)
return [x for x in ary if x]

def c(x):
return 0xf0 & x

def doit2(alist):
i = (list(g) for k,g in groupby(alist, c))
return [k for k in [j + i.next() for j in i] if len(k)>1]

print doit(alist)

print 'doit'
t = Timer('doit(alist)',
'from __main__ import groupby, doit, alist, c')
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

print 'doit2'
t = Timer('doit2(alist)',
'from __main__ import groupby, doit2, alist, c')
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/