Break up list into groups

danmcleran · Jul 16, 2007

All,

I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this but I was wondering if some
combination of itertools functions could do the job faster?

Here's what I've done and the expected output of the algorithm:

def splitIntoGroups(data):
groups = []
local = []

for value in data:
if 0x80 & value:
if len(local) > 0:
groups.append(local)

local = []
local.append(value)
else:
local.append(value)

if len(local) > 0:
groups.append(local)

return groups

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

print splitIntoGroups(l)

Desired result:

[[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
13], [240, 14], [241, 15]]

Thanks,

Dan McLeran

marduk · Jul 16, 2007

I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this but I was wondering if some
combination of itertools functions could do the job faster?

Here's what I've done and the expected output of the algorithm:

def splitIntoGroups(data):
groups = []
local = []

for value in data:
if 0x80 & value:
if len(local) > 0:
groups.append(local)

local = []
local.append(value)
else:
local.append(value)

if len(local) > 0:
groups.append(local)

return groups

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

print splitIntoGroups(l)

Desired result:

[[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
13], [240, 14], [241, 15]]

Assuming you meant '0xF0' instead of '0x80'.... do you mean any value

=240 starts a new group? If so:

groups = []
current = [] # probably not necessary, but as a safety
for i in l:
if i >= 240:
current = []
groups.append(current)
current.append(i)

marduk · Jul 16, 2007

Assuming you meant '0xF0' instead of '0x80'.... do you mean any value

=240 starts a new group? If so:

Click to expand...

groups = []
current = [] # probably not necessary, but as a safety
for i in l:
if i >= 240:
current = []
groups.append(current)
current.append(i)

Misunderstood... actually that should have read

groups = []
current = [] # probably not necessary, but as a safety
for i in l:
if 240 & i:
current = []
groups.append(current)
current.append(i)

James Stroud · Jul 16, 2007

All,

I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this but I was wondering if some
combination of itertools functions could do the job faster?

Here's what I've done and the expected output of the algorithm:

def splitIntoGroups(data):
groups = []
local = []

for value in data:
if 0x80 & value:
if len(local) > 0:
groups.append(local)

local = []
local.append(value)
else:
local.append(value)

if len(local) > 0:
groups.append(local)

return groups

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

print splitIntoGroups(l)

Desired result:

[[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
13], [240, 14], [241, 15]]

Thanks,

Dan McLeran

Here's how I *would* do it:

py> def doit(alist):
.... ary = []
.... for i in alist:
.... if 0xf0 & i:
.... ary.append()
.... else:
.... ary[-1].append(i)
.... return [x for x in ary if x]
....
py> doit(alist)

[[240, 1, 2, 3],
[240, 4, 5, 6],
[241, 7, 8],
[242, 9, 10, 11, 12, 13],
[240, 14],
[241, 15]]

Here's an ugly way I wouldn't recommended (using itertools groupby):

py> from itertools import groupby
py> alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
.... 0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
.... 0xF0, 14, 0xF1, 15]
py> def doit(alist):
.... i = (list(g) for k,g in groupby(alist, lambda x: 0xf0&x))
.... return [k for k in [j + i.next() for j in i] if len(k)>1]
....
py> doit(alist)

[[240, 1, 2, 3],
[240, 4, 5, 6],
[241, 7, 8],
[242, 9, 10, 11, 12, 13],
[240, 14],
[241, 15]]

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

James Stroud · Jul 16, 2007

James said:
Here's how I *would* do it:

py> def doit(alist):
... ary = []
... for i in alist:
... if 0xf0 & i:
... ary.append()
... else:
... ary[-1].append(i)
... return [x for x in ary if x]
...

To be absolutely compliant with the specifications, it should be

return [x for x in ary if len(x)>1]

in the recommended way. This does not affect the example given, though.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

Matimus · Jul 16, 2007

This is a perfect place for a generator:

<code>
seq = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

def gengroups(seq):
group = []
for val in seq:
if val & 0x80 and group:
yield group
group = []
group.append(val)
yield group

if __name__ == "__main__":
print list(gengroups(seq))
</code>

The above assumes that the first value in the input sequence will have
0x80 set. Your implementation seems to makes the same assumption
though.

Also, just a note...
<code>
if len(local) > 0:
...
</code>

is better written

<code>
if local:
...
</code>

Paul Rubin · Jul 16, 2007

[email protected] said:
I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this but I was wondering if some
combination of itertools functions could do the job faster?

See:

http://groups.google.com/group/comp.lang.python/msg/2410c95c7f3b3654

danmcleran · Jul 16, 2007

No, I meant 0x80. 0x80 is only the most significant bit of the 0xF0
value. Every time this bit is set indicates the start of a new group.
Anyway, I like this algorithm better than mine, less code. I've
modified yours to make the fn a parameter:

def splitIntoGroups(data, fn):
groups = []
current = []
for i in data:
if fn(i):
current = []
groups.append(current)
current.append(i)

return groups

print splitIntoGroups(l, lambda x : x & 0x80)

danmcleran · Jul 16, 2007

Here's an ugly way I wouldn't recommended (using itertools groupby):

py> from itertools import groupby
py> alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
... 0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
... 0xF0, 14, 0xF1, 15]
py> def doit(alist):
... i = (list(g) for k,g in groupby(alist, lambda x: 0xf0&x))
... return [k for k in [j + i.next() for j in i] if len(k)>1]
...
py> doit(alist)

[[240, 1, 2, 3],
[240, 4, 5, 6],
[241, 7, 8],
[242, 9, 10, 11, 12, 13],
[240, 14],
[241, 15]]

James

Wow that is ugly. I think I like the simpler way better.

Thanks

James Stroud · Jul 17, 2007

Paul said:
See:

http://groups.google.com/group/comp.lang.python/msg/2410c95c7f3b3654

Groupby is damn slow as far as I can tell (the Bates numbering in the
above link assumes more than the OP intended, I assume). It looks like
the author's original algorithm is the fastest python way as it bypasses
a lot of lookup, etc.

Here's the output from the script below (doit2 => groupby way):

doit
11.96 usec/pass
doit2
87.14 usec/pass

James

# timer script
from itertools import groupby
from timeit import Timer

alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
0xF0, 14, 0xF1, 15]

def doit(alist):
ary = []
for i in alist:
if 0xf0 & i:
ary.append()
else:
ary[-1].append(i)
return [x for x in ary if x]

def c(x):
return 0xf0 & x

def doit2(alist):
i = (list(g) for k,g in groupby(alist, c))
return [k for k in [j + i.next() for j in i] if len(k)>1]

print doit(alist)

print 'doit'
t = Timer('doit(alist)',
'from __main__ import groupby, doit, alist, c')
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

print 'doit2'
t = Timer('doit2(alist)',
'from __main__ import groupby, doit2, alist, c')
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

Matimus · Jul 17, 2007

I did some more experimenting and came up with the code below. It
shows several methods. When run, the script tests the robustness of
each method (roughly), and profiles it using timeit. The results from
running on my laptop are shown below the code.

<code>
seqs = [# Original:
[0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11,
12, 13,
0xF0, 14, 0xF1, 15],
# Single entry:
[0xF0, 1, 2, 3],
# empty
[],
# No values with 0x80 set
[1, 2, 3, 14, 15],
# Does not start with a value that has 0x80 set
[1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]

expected = [# Original:
[[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8], [0xF2, 9,
10, 11, 12, 13],
[0xF0, 14], [0xF1, 15]],
# Single entry:
[[0xF0, 1, 2, 3]],
# empty
[],
# No values with 0x80 set
[],
# Does not start with a value that has 0x80 set
[[0xF0, 1, 2, 3]]]

def gengroups0(seq):
group = None
for val in seq:
if val & 0x80:
if group: yield group
group = []
try:
group.append(val)
except AttributeError:
pass
if group: yield group

def getgroups0(seq):
groups = []
group = None
for val in seq:
if val & 0x80:
if group:
groups.append(group)
group = []
try:
group.append(val)
except AttributeError:
pass
if group:
groups.append(group)
return groups

def gengroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
for i,j in zip(idxs,idxs[1:]+[None]):
yield seq[i:j]

def getgroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]

# Similar to the partition method on strings
def partition(seq,f=None):
if f is None:
f = lambda x:x
for i,v in enumerate(seq):
if f(v):
return seq[:i],[seq],seq[i+1:]
return seq,[],[]

def rpartition(seq, f=None):
if f is None:
f = lambda x:x
for i,v in zip(range(len(seq)-1,-1,-1),seq[::-1]):
if f(v):
return seq[:i],[seq],seq[i+1:]
return ([],[],seq)

def gengroups2(seq):
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val: yield key+val

def getgroups2(seq):
groups = []
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val:
groups.append(key+val)
return groups

def getgroups3(seq):
groups = []
for i in seq:
if 0x80 & i:
groups.append()
else:
groups[-1].append(i)
return [x for x in groups if x]

seq = seqs[0]
if __name__ == "__main__":
from timeit import Timer
import __main__
for i in range(4):
fname = "getgroups"+str(i)
f = getattr(__main__,fname)
print fname
for i,(s,e) in enumerate(zip(seqs,expected)):
print "test %d:"%i,
try:
if f(s) == e:
print "pass"
else:
print "fail"
except:
print "error"

t = Timer(fname+'(seq)',
'from __main__ import seq,'+fname)
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/
100000)
print
</code>

Output from running the above:
getgroups0
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
14.85 usec/pass

getgroups1
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
13.81 usec/pass

getgroups2
test 0: fail
test 1: pass
test 2: pass
test 3: pass
test 4: pass
56.38 usec/pass

getgroups3
test 0: pass
test 1: pass
test 2: pass
test 3: error
test 4: error
16.23 usec/pass

`getgropus2' fails test 0 because it produces a reversed list. That
can easily be fixed by re-reversing the output before returning. But,
since it is by far the slowest method, I didn't bother.

`getgroups3' is a method I got from another post in this thread, just
for comparison.

From my benchmarks it looks like getgroups1 is the winner. I didn't

Click to expand...

scour the thread to test all the methods however.

Ron Adam · Jul 17, 2007

Matimus said:
I did some more experimenting and came up with the code below. It
shows several methods. When run, the script tests the robustness of
each method (roughly), and profiles it using timeit. The results from
running on my laptop are shown below the code.

Try this one...

def groupdata(data, fn):
start = 0
for n in range(1, len(data)):
if fn(data[n]):
yield data[start:n]
start = n
yield data[start:]

print list(groupdata(l, lambda x: x & 0x80))

Cheers ;-)
Ron

<code>
seqs = [# Original:
[0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11,
12, 13,
0xF0, 14, 0xF1, 15],
# Single entry:
[0xF0, 1, 2, 3],
# empty
[],
# No values with 0x80 set
[1, 2, 3, 14, 15],
# Does not start with a value that has 0x80 set
[1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]

expected = [# Original:
[[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8], [0xF2, 9,
10, 11, 12, 13],
[0xF0, 14], [0xF1, 15]],
# Single entry:
[[0xF0, 1, 2, 3]],
# empty
[],
# No values with 0x80 set
[],
# Does not start with a value that has 0x80 set
[[0xF0, 1, 2, 3]]]

def gengroups0(seq):
group = None
for val in seq:
if val & 0x80:
if group: yield group
group = []
try:
group.append(val)
except AttributeError:
pass
if group: yield group

def getgroups0(seq):
groups = []
group = None
for val in seq:
if val & 0x80:
if group:
groups.append(group)
group = []
try:
group.append(val)
except AttributeError:
pass
if group:
groups.append(group)
return groups

def gengroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
for i,j in zip(idxs,idxs[1:]+[None]):
yield seq[i:j]

def getgroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]

# Similar to the partition method on strings
def partition(seq,f=None):
if f is None:
f = lambda x:x
for i,v in enumerate(seq):
if f(v):
return seq[:i],[seq],seq[i+1:]
return seq,[],[]

def rpartition(seq, f=None):
if f is None:
f = lambda x:x
for i,v in zip(range(len(seq)-1,-1,-1),seq[::-1]):
if f(v):
return seq[:i],[seq],seq[i+1:]
return ([],[],seq)

def gengroups2(seq):
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val: yield key+val

def getgroups2(seq):
groups = []
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val:
groups.append(key+val)
return groups

def getgroups3(seq):
groups = []
for i in seq:
if 0x80 & i:
groups.append()
else:
groups[-1].append(i)
return [x for x in groups if x]

seq = seqs[0]
if __name__ == "__main__":
from timeit import Timer
import __main__
for i in range(4):
fname = "getgroups"+str(i)
f = getattr(__main__,fname)
print fname
for i,(s,e) in enumerate(zip(seqs,expected)):
print "test %d:"%i,
try:
if f(s) == e:
print "pass"
else:
print "fail"
except:
print "error"

t = Timer(fname+'(seq)',
'from __main__ import seq,'+fname)
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/
100000)
print
</code>

Output from running the above:
getgroups0
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
14.85 usec/pass

getgroups1
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
13.81 usec/pass

getgroups2
test 0: fail
test 1: pass
test 2: pass
test 3: pass
test 4: pass
56.38 usec/pass

getgroups3
test 0: pass
test 1: pass
test 2: pass
test 3: error
test 4: error
16.23 usec/pass

`getgropus2' fails test 0 because it produces a reversed list. That
can easily be fixed by re-reversing the output before returning. But,
since it is by far the slowest method, I didn't bother.

`getgroups3' is a method I got from another post in this thread, just
for comparison.

From my benchmarks it looks like getgroups1 is the winner. I didn't

Click to expand...

scour the thread to test all the methods however.

Ron Adam · Jul 17, 2007

Matimus said:
I did some more experimenting and came up with the code below. It
shows several methods. When run, the script tests the robustness of
each method (roughly), and profiles it using timeit. The results from
running on my laptop are shown below the code.

Try this one...

def groupdata(data, fn):
start = 0
for n in range(1, len(data)):
if fn(data[n]):
yield data[start:n]
start = n
yield data[start:]

print list(groupdata(l, lambda x: x & 0x80))

Cheers ;-)
Ron

<code>
seqs = [# Original:
[0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11,
12, 13,
0xF0, 14, 0xF1, 15],
# Single entry:
[0xF0, 1, 2, 3],
# empty
[],
# No values with 0x80 set
[1, 2, 3, 14, 15],
# Does not start with a value that has 0x80 set
[1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]

expected = [# Original:
[[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8], [0xF2, 9,
10, 11, 12, 13],
[0xF0, 14], [0xF1, 15]],
# Single entry:
[[0xF0, 1, 2, 3]],
# empty
[],
# No values with 0x80 set
[],
# Does not start with a value that has 0x80 set
[[0xF0, 1, 2, 3]]]

def gengroups0(seq):
group = None
for val in seq:
if val & 0x80:
if group: yield group
group = []
try:
group.append(val)
except AttributeError:
pass
if group: yield group

def getgroups0(seq):
groups = []
group = None
for val in seq:
if val & 0x80:
if group:
groups.append(group)
group = []
try:
group.append(val)
except AttributeError:
pass
if group:
groups.append(group)
return groups

def gengroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
for i,j in zip(idxs,idxs[1:]+[None]):
yield seq[i:j]

def getgroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]

# Similar to the partition method on strings
def partition(seq,f=None):
if f is None:
f = lambda x:x
for i,v in enumerate(seq):
if f(v):
return seq[:i],[seq],seq[i+1:]
return seq,[],[]

def rpartition(seq, f=None):
if f is None:
f = lambda x:x
for i,v in zip(range(len(seq)-1,-1,-1),seq[::-1]):
if f(v):
return seq[:i],[seq],seq[i+1:]
return ([],[],seq)

def gengroups2(seq):
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val: yield key+val

def getgroups2(seq):
groups = []
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val:
groups.append(key+val)
return groups

def getgroups3(seq):
groups = []
for i in seq:
if 0x80 & i:
groups.append()
else:
groups[-1].append(i)
return [x for x in groups if x]

seq = seqs[0]
if __name__ == "__main__":
from timeit import Timer
import __main__
for i in range(4):
fname = "getgroups"+str(i)
f = getattr(__main__,fname)
print fname
for i,(s,e) in enumerate(zip(seqs,expected)):
print "test %d:"%i,
try:
if f(s) == e:
print "pass"
else:
print "fail"
except:
print "error"

t = Timer(fname+'(seq)',
'from __main__ import seq,'+fname)
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/
100000)
print
</code>

Output from running the above:
getgroups0
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
14.85 usec/pass

getgroups1
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
13.81 usec/pass

getgroups2
test 0: fail
test 1: pass
test 2: pass
test 3: pass
test 4: pass
56.38 usec/pass

getgroups3
test 0: pass
test 1: pass
test 2: pass
test 3: error
test 4: error
16.23 usec/pass

`getgropus2' fails test 0 because it produces a reversed list. That
can easily be fixed by re-reversing the output before returning. But,
since it is by far the slowest method, I didn't bother.

`getgroups3' is a method I got from another post in this thread, just
for comparison.

From my benchmarks it looks like getgroups1 is the winner. I didn't

Click to expand...

scour the thread to test all the methods however.

Ron Adam · Jul 18, 2007

Matt said:
That certainly is fast, unfortunately it doesn't pass all of the tests.
I came up with those tests so I don't know how important they are to the
original poster. I modified it and came up with a generator and a
non-generator version based (roughly) on your algorithm, that are almost
as quick, and pass all of the tests. Some of the modifications were done
just to make it quicker, so it would be fair when comparing against the
other methods. I hard-coded the comparison instead of using a function
and created a function that directly generates and returns a list
instead of a generator. I would probably use the generator version in my
code, but wrapping `list' around a generator adds about 4us (on my
machine). Anyway, getgroups7 passes all of the tests I mentioned and it
was timed at 10.37usec/pass. The down side: the code doesn't seem nearly
as elegant.

Matt

In most cases you wouldn't wrap the generator version in a list(), but use
it directly as a loop iterator.

A little renaming of variables helps it be a bit more elegant I think ...

def getgroups8(seq):
groups = []
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
groups.append(seq[start:stop])
start = stop
groups.append(seq[start:])
return groups

This passes all the tests and runs about the same speed.

Cheers,
Ron

<code>
def gengroups7(seq):
iseq = iter(xrange(len(seq)))
start = 0
for i in iseq:
if seq&0x80:
start = i
break
else:
return
for i in iseq:
if seq&0x80:
yield seq[start:i]
start = i
yield seq[start:]

def getgroups7(seq):
groups = []
iseq = iter(xrange(len(seq)))
start = 0
for i in iseq:
if seq&0x80:
start = i
break
else:
return groups
for i in iseq:
if seq&0x80:
groups.append(seq[start:i])
start = i
groups.append(seq[start:])
return groups

</code>

Ron Adam · Jul 18, 2007

Matt said:
That certainly is fast, unfortunately it doesn't pass all of the tests.
I came up with those tests so I don't know how important they are to the
original poster. I modified it and came up with a generator and a
non-generator version based (roughly) on your algorithm, that are almost
as quick, and pass all of the tests. Some of the modifications were done
just to make it quicker, so it would be fair when comparing against the
other methods. I hard-coded the comparison instead of using a function
and created a function that directly generates and returns a list
instead of a generator. I would probably use the generator version in my
code, but wrapping `list' around a generator adds about 4us (on my
machine). Anyway, getgroups7 passes all of the tests I mentioned and it
was timed at 10.37usec/pass. The down side: the code doesn't seem nearly
as elegant.

Matt

In most cases you wouldn't wrap the generator version in a list(), but use
it directly as a loop iterator.

A little renaming of variables helps it be a bit more elegant I think ...

def getgroups8(seq):
groups = []
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
groups.append(seq[start:stop])
start = stop
groups.append(seq[start:])
return groups

This passes all the tests and runs about the same speed.

Cheers,
Ron

<code>
def gengroups7(seq):
iseq = iter(xrange(len(seq)))
start = 0
for i in iseq:
if seq&0x80:
start = i
break
else:
return
for i in iseq:
if seq&0x80:
yield seq[start:i]
start = i
yield seq[start:]

def getgroups7(seq):
groups = []
iseq = iter(xrange(len(seq)))
start = 0
for i in iseq:
if seq&0x80:
start = i
break
else:
return groups
for i in iseq:
if seq&0x80:
groups.append(seq[start:i])
start = i
groups.append(seq[start:])
return groups

</code>

Miki · Jul 18, 2007

Hello Dan,

Yet another option (using itertools.groupby):

from itertools import groupby

class GrouperToggler:
def __init__(self):
self.group = 1

def __call__(self, value):
# New packet, toggle group
if value & 0x80:
self.group = 1 - self.group
return self.group

def group(items):
for group, items in groupby(items, GrouperToggler()):
# groupby return [key, group_iterator]
yield [item for item in items]

i = [
0xF0, 1, 2, 3,
0xF0, 4, 5, 6,
0xF1, 7, 8,
0xF2, 9, 10, 11, 12, 13,
0xF0, 14,
0xF1, 15
]

for g in group(i):
print g

HTH,

George Sakkis · Jul 19, 2007

I did some more experimenting and came up with the code below. It
shows several methods. When run, the script tests the robustness of
each method (roughly), and profiles it using timeit. The results from
running on my laptop are shown below the code.

<code>
seqs = [# Original:
[0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11,
12, 13,
0xF0, 14, 0xF1, 15],
# Single entry:
[0xF0, 1, 2, 3],
# empty
[],
# No values with 0x80 set
[1, 2, 3, 14, 15],
# Does not start with a value that has 0x80 set
[1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]

expected = [# Original:
[[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8], [0xF2, 9,
10, 11, 12, 13],
[0xF0, 14], [0xF1, 15]],
# Single entry:
[[0xF0, 1, 2, 3]],
# empty
[],
# No values with 0x80 set
[],
# Does not start with a value that has 0x80 set
[[0xF0, 1, 2, 3]]]

def gengroups0(seq):
group = None
for val in seq:
if val & 0x80:
if group: yield group
group = []
try:
group.append(val)
except AttributeError:
pass
if group: yield group

def getgroups0(seq):
groups = []
group = None
for val in seq:
if val & 0x80:
if group:
groups.append(group)
group = []
try:
group.append(val)
except AttributeError:
pass
if group:
groups.append(group)
return groups

def gengroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
for i,j in zip(idxs,idxs[1:]+[None]):
yield seq[i:j]

def getgroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]

# Similar to the partition method on strings
def partition(seq,f=None):
if f is None:
f = lambda x:x
for i,v in enumerate(seq):
if f(v):
return seq[:i],[seq],seq[i+1:]
return seq,[],[]

def rpartition(seq, f=None):
if f is None:
f = lambda x:x
for i,v in zip(range(len(seq)-1,-1,-1),seq[::-1]):
if f(v):
return seq[:i],[seq],seq[i+1:]
return ([],[],seq)

def gengroups2(seq):
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val: yield key+val

def getgroups2(seq):
groups = []
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val:
groups.append(key+val)
return groups

def getgroups3(seq):
groups = []
for i in seq:
if 0x80 & i:
groups.append()
else:
groups[-1].append(i)
return [x for x in groups if x]

seq = seqs[0]
if __name__ == "__main__":
from timeit import Timer
import __main__
for i in range(4):
fname = "getgroups"+str(i)
f = getattr(__main__,fname)
print fname
for i,(s,e) in enumerate(zip(seqs,expected)):
print "test %d:"%i,
try:
if f(s) == e:
print "pass"
else:
print "fail"
except:
print "error"

t = Timer(fname+'(seq)',
'from __main__ import seq,'+fname)
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/
100000)
print
</code>

Output from running the above:
getgroups0
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
14.85 usec/pass

getgroups1
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
13.81 usec/pass

getgroups2
test 0: fail
test 1: pass
test 2: pass
test 3: pass
test 4: pass
56.38 usec/pass

getgroups3
test 0: pass
test 1: pass
test 2: pass
test 3: error
test 4: error
16.23 usec/pass

`getgropus2' fails test 0 because it produces a reversed list. That
can easily be fixed by re-reversing the output before returning. But,
since it is by far the slowest method, I didn't bother.

`getgroups3' is a method I got from another post in this thread, just
for comparison.

From my benchmarks it looks like getgroups1 is the winner. I didn't

Click to expand...

scour the thread to test all the methods however.

Here's a small improvement of getgroups1, both in time and memory:

from itertools import islice

def getgroups4(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
idxs.append(None)
return [seq[i:j] for i,j in izip(idxs, islice(idxs,1,None))]

George

Matimus · Jul 19, 2007

A little renaming of variables helps it be a bit more elegant I think ...

def getgroups8(seq):
groups = []
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
groups.append(seq[start:stop])
start = stop
groups.append(seq[start:])
return groups

Excellent work! One more modification and I get below 10us/pass:

def getgroups(seq):
groups = []
push = groups.append
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
push(seq[start:stop])
start = stop
push(seq[start:])
return groups

-Matt

Ron Adam · Jul 20, 2007

Matimus said:
Excellent work! One more modification and I get below 10us/pass:

def getgroups(seq):
groups = []
push = groups.append
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
push(seq[start:stop])
start = stop
push(seq[start:])
return groups

-Matt

Looks good to me.

So a generator versions would be...

(not tested)

def getgroups(seq):
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
yield seq[start:stop]
start = stop
yield seq[start:]

(I also wanted to compare this to Georges solution, maybe later.)

Now if there is some way to generalize this so it can be used in a broader
range of situations without loosing too much of it's efficiency. Of course
then maybe group by would be better. <shrug>

Cheers,
Ron

Ron Adam · Jul 20, 2007

Matimus said:
Excellent work! One more modification and I get below 10us/pass:

def getgroups(seq):
groups = []
push = groups.append
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
push(seq[start:stop])
start = stop
push(seq[start:])
return groups

-Matt

Looks good to me.

So a generator versions would be...

(not tested)

def getgroups(seq):
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
yield seq[start:stop]
start = stop
yield seq[start:]

(I also wanted to compare this to Georges solution, maybe later.)

Now if there is some way to generalize this so it can be used in a broader
range of situations without loosing too much of it's efficiency. Of course
then maybe group by would be better. <shrug>

Cheers,
Ron

Machine Learning.. Endless Struggle	3	Feb 16, 2023
Minimum Total Difficulty	0	Nov 15, 2023
Append to python List	15	May 9, 2013
On-air script	46	Jul 23, 2023
Range / empty list issues??	1	Dec 11, 2023
Strange behavior for a 2D list	1	Apr 18, 2013
New baby here	2	Jan 3, 2020
pprint defaultdict one record per line	0	Mar 17, 2013

Break up list into groups

danmcleran

marduk

marduk

James Stroud

James Stroud

Matimus

Paul Rubin

danmcleran

danmcleran

James Stroud

Matimus

Ron Adam

Ron Adam

Ron Adam

Ron Adam

Miki

George Sakkis

Matimus

Ron Adam

Ron Adam

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads