Break up list into groups

D

danmcleran

All,

I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this but I was wondering if some
combination of itertools functions could do the job faster?

Here's what I've done and the expected output of the algorithm:

def splitIntoGroups(data):
groups = []
local = []

for value in data:
if 0x80 & value:
if len(local) > 0:
groups.append(local)

local = []
local.append(value)
else:
local.append(value)

if len(local) > 0:
groups.append(local)

return groups

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

print splitIntoGroups(l)

Desired result:

[[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
13], [240, 14], [241, 15]]

Thanks,

Dan McLeran
 
M

marduk

I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this but I was wondering if some
combination of itertools functions could do the job faster?

Here's what I've done and the expected output of the algorithm:

def splitIntoGroups(data):
groups = []
local = []

for value in data:
if 0x80 & value:
if len(local) > 0:
groups.append(local)

local = []
local.append(value)
else:
local.append(value)

if len(local) > 0:
groups.append(local)

return groups

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

print splitIntoGroups(l)

Desired result:

[[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
13], [240, 14], [241, 15]]

Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
=240 starts a new group? If so:

groups = []
current = [] # probably not necessary, but as a safety
for i in l:
if i >= 240:
current = []
groups.append(current)
current.append(i)
 
M

marduk

Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
=240 starts a new group? If so:

groups = []
current = [] # probably not necessary, but as a safety
for i in l:
if i >= 240:
current = []
groups.append(current)
current.append(i)

Misunderstood... actually that should have read

groups = []
current = [] # probably not necessary, but as a safety
for i in l:
if 240 & i:
current = []
groups.append(current)
current.append(i)
 
J

James Stroud

All,

I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this but I was wondering if some
combination of itertools functions could do the job faster?

Here's what I've done and the expected output of the algorithm:

def splitIntoGroups(data):
groups = []
local = []

for value in data:
if 0x80 & value:
if len(local) > 0:
groups.append(local)

local = []
local.append(value)
else:
local.append(value)

if len(local) > 0:
groups.append(local)

return groups

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

print splitIntoGroups(l)

Desired result:

[[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
13], [240, 14], [241, 15]]

Thanks,

Dan McLeran

Here's how I *would* do it:

py> def doit(alist):
.... ary = []
.... for i in alist:
.... if 0xf0 & i:
.... ary.append()
.... else:
.... ary[-1].append(i)
.... return [x for x in ary if x]
....
py> doit(alist)

[[240, 1, 2, 3],
[240, 4, 5, 6],
[241, 7, 8],
[242, 9, 10, 11, 12, 13],
[240, 14],
[241, 15]]


Here's an ugly way I wouldn't recommended (using itertools groupby):

py> from itertools import groupby
py> alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
.... 0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
.... 0xF0, 14, 0xF1, 15]
py> def doit(alist):
.... i = (list(g) for k,g in groupby(alist, lambda x: 0xf0&x))
.... return [k for k in [j + i.next() for j in i] if len(k)>1]
....
py> doit(alist)

[[240, 1, 2, 3],
[240, 4, 5, 6],
[241, 7, 8],
[242, 9, 10, 11, 12, 13],
[240, 14],
[241, 15]]


James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
 
J

James Stroud

James said:
Here's how I *would* do it:

py> def doit(alist):
... ary = []
... for i in alist:
... if 0xf0 & i:
... ary.append()
... else:
... ary[-1].append(i)
... return [x for x in ary if x]
...



To be absolutely compliant with the specifications, it should be

return [x for x in ary if len(x)>1]

in the recommended way. This does not affect the example given, though.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
 
M

Matimus

This is a perfect place for a generator:

<code>
seq = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

def gengroups(seq):
group = []
for val in seq:
if val & 0x80 and group:
yield group
group = []
group.append(val)
yield group

if __name__ == "__main__":
print list(gengroups(seq))
</code>

The above assumes that the first value in the input sequence will have
0x80 set. Your implementation seems to makes the same assumption
though.

Also, just a note...
<code>
if len(local) > 0:
...
</code>

is better written

<code>
if local:
...
</code>
 
P

Paul Rubin

I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this but I was wondering if some
combination of itertools functions could do the job faster?

See:

http://groups.google.com/group/comp.lang.python/msg/2410c95c7f3b3654
 
D

danmcleran

No, I meant 0x80. 0x80 is only the most significant bit of the 0xF0
value. Every time this bit is set indicates the start of a new group.
Anyway, I like this algorithm better than mine, less code. I've
modified yours to make the fn a parameter:

def splitIntoGroups(data, fn):
groups = []
current = []
for i in data:
if fn(i):
current = []
groups.append(current)
current.append(i)

return groups

print splitIntoGroups(l, lambda x : x & 0x80)
 
D

danmcleran

Here's an ugly way I wouldn't recommended (using itertools groupby):
py> from itertools import groupby
py> alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
... 0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
... 0xF0, 14, 0xF1, 15]
py> def doit(alist):
... i = (list(g) for k,g in groupby(alist, lambda x: 0xf0&x))
... return [k for k in [j + i.next() for j in i] if len(k)>1]
...
py> doit(alist)

[[240, 1, 2, 3],
[240, 4, 5, 6],
[241, 7, 8],
[242, 9, 10, 11, 12, 13],
[240, 14],
[241, 15]]

James

Wow that is ugly. I think I like the simpler way better.

Thanks
 
J

James Stroud

Paul said:

Groupby is damn slow as far as I can tell (the Bates numbering in the
above link assumes more than the OP intended, I assume). It looks like
the author's original algorithm is the fastest python way as it bypasses
a lot of lookup, etc.

Here's the output from the script below (doit2 => groupby way):

doit
11.96 usec/pass
doit2
87.14 usec/pass


James

# timer script
from itertools import groupby
from timeit import Timer

alist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
0xF0, 14, 0xF1, 15]

def doit(alist):
ary = []
for i in alist:
if 0xf0 & i:
ary.append()
else:
ary[-1].append(i)
return [x for x in ary if x]

def c(x):
return 0xf0 & x

def doit2(alist):
i = (list(g) for k,g in groupby(alist, c))
return [k for k in [j + i.next() for j in i] if len(k)>1]

print doit(alist)

print 'doit'
t = Timer('doit(alist)',
'from __main__ import groupby, doit, alist, c')
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)

print 'doit2'
t = Timer('doit2(alist)',
'from __main__ import groupby, doit2, alist, c')
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)


--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
 
M

Matimus

I did some more experimenting and came up with the code below. It
shows several methods. When run, the script tests the robustness of
each method (roughly), and profiles it using timeit. The results from
running on my laptop are shown below the code.


<code>
seqs = [# Original:
[0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11,
12, 13,
0xF0, 14, 0xF1, 15],
# Single entry:
[0xF0, 1, 2, 3],
# empty
[],
# No values with 0x80 set
[1, 2, 3, 14, 15],
# Does not start with a value that has 0x80 set
[1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]

expected = [# Original:
[[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8], [0xF2, 9,
10, 11, 12, 13],
[0xF0, 14], [0xF1, 15]],
# Single entry:
[[0xF0, 1, 2, 3]],
# empty
[],
# No values with 0x80 set
[],
# Does not start with a value that has 0x80 set
[[0xF0, 1, 2, 3]]]

def gengroups0(seq):
group = None
for val in seq:
if val & 0x80:
if group: yield group
group = []
try:
group.append(val)
except AttributeError:
pass
if group: yield group

def getgroups0(seq):
groups = []
group = None
for val in seq:
if val & 0x80:
if group:
groups.append(group)
group = []
try:
group.append(val)
except AttributeError:
pass
if group:
groups.append(group)
return groups

def gengroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
for i,j in zip(idxs,idxs[1:]+[None]):
yield seq[i:j]

def getgroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]

# Similar to the partition method on strings
def partition(seq,f=None):
if f is None:
f = lambda x:x
for i,v in enumerate(seq):
if f(v):
return seq[:i],[seq],seq[i+1:]
return seq,[],[]

def rpartition(seq, f=None):
if f is None:
f = lambda x:x
for i,v in zip(range(len(seq)-1,-1,-1),seq[::-1]):
if f(v):
return seq[:i],[seq],seq[i+1:]
return ([],[],seq)

def gengroups2(seq):
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val: yield key+val

def getgroups2(seq):
groups = []
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val:
groups.append(key+val)
return groups

def getgroups3(seq):
groups = []
for i in seq:
if 0x80 & i:
groups.append()
else:
groups[-1].append(i)
return [x for x in groups if x]

seq = seqs[0]
if __name__ == "__main__":
from timeit import Timer
import __main__
for i in range(4):
fname = "getgroups"+str(i)
f = getattr(__main__,fname)
print fname
for i,(s,e) in enumerate(zip(seqs,expected)):
print "test %d:"%i,
try:
if f(s) == e:
print "pass"
else:
print "fail"
except:
print "error"

t = Timer(fname+'(seq)',
'from __main__ import seq,'+fname)
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/
100000)
print
</code>

Output from running the above:
getgroups0
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
14.85 usec/pass

getgroups1
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
13.81 usec/pass

getgroups2
test 0: fail
test 1: pass
test 2: pass
test 3: pass
test 4: pass
56.38 usec/pass

getgroups3
test 0: pass
test 1: pass
test 2: pass
test 3: error
test 4: error
16.23 usec/pass

`getgropus2' fails test 0 because it produces a reversed list. That
can easily be fixed by re-reversing the output before returning. But,
since it is by far the slowest method, I didn't bother.

`getgroups3' is a method I got from another post in this thread, just
for comparison.
From my benchmarks it looks like getgroups1 is the winner. I didn't
scour the thread to test all the methods however.
 
R

Ron Adam

Matimus said:
I did some more experimenting and came up with the code below. It
shows several methods. When run, the script tests the robustness of
each method (roughly), and profiles it using timeit. The results from
running on my laptop are shown below the code.

Try this one...

def groupdata(data, fn):
start = 0
for n in range(1, len(data)):
if fn(data[n]):
yield data[start:n]
start = n
yield data[start:]

print list(groupdata(l, lambda x: x & 0x80))


Cheers ;-)
Ron


<code>
seqs = [# Original:
[0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11,
12, 13,
0xF0, 14, 0xF1, 15],
# Single entry:
[0xF0, 1, 2, 3],
# empty
[],
# No values with 0x80 set
[1, 2, 3, 14, 15],
# Does not start with a value that has 0x80 set
[1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]

expected = [# Original:
[[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8], [0xF2, 9,
10, 11, 12, 13],
[0xF0, 14], [0xF1, 15]],
# Single entry:
[[0xF0, 1, 2, 3]],
# empty
[],
# No values with 0x80 set
[],
# Does not start with a value that has 0x80 set
[[0xF0, 1, 2, 3]]]

def gengroups0(seq):
group = None
for val in seq:
if val & 0x80:
if group: yield group
group = []
try:
group.append(val)
except AttributeError:
pass
if group: yield group

def getgroups0(seq):
groups = []
group = None
for val in seq:
if val & 0x80:
if group:
groups.append(group)
group = []
try:
group.append(val)
except AttributeError:
pass
if group:
groups.append(group)
return groups

def gengroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
for i,j in zip(idxs,idxs[1:]+[None]):
yield seq[i:j]

def getgroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]

# Similar to the partition method on strings
def partition(seq,f=None):
if f is None:
f = lambda x:x
for i,v in enumerate(seq):
if f(v):
return seq[:i],[seq],seq[i+1:]
return seq,[],[]

def rpartition(seq, f=None):
if f is None:
f = lambda x:x
for i,v in zip(range(len(seq)-1,-1,-1),seq[::-1]):
if f(v):
return seq[:i],[seq],seq[i+1:]
return ([],[],seq)

def gengroups2(seq):
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val: yield key+val

def getgroups2(seq):
groups = []
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val:
groups.append(key+val)
return groups

def getgroups3(seq):
groups = []
for i in seq:
if 0x80 & i:
groups.append()
else:
groups[-1].append(i)
return [x for x in groups if x]

seq = seqs[0]
if __name__ == "__main__":
from timeit import Timer
import __main__
for i in range(4):
fname = "getgroups"+str(i)
f = getattr(__main__,fname)
print fname
for i,(s,e) in enumerate(zip(seqs,expected)):
print "test %d:"%i,
try:
if f(s) == e:
print "pass"
else:
print "fail"
except:
print "error"

t = Timer(fname+'(seq)',
'from __main__ import seq,'+fname)
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/
100000)
print
</code>

Output from running the above:
getgroups0
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
14.85 usec/pass

getgroups1
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
13.81 usec/pass

getgroups2
test 0: fail
test 1: pass
test 2: pass
test 3: pass
test 4: pass
56.38 usec/pass

getgroups3
test 0: pass
test 1: pass
test 2: pass
test 3: error
test 4: error
16.23 usec/pass

`getgropus2' fails test 0 because it produces a reversed list. That
can easily be fixed by re-reversing the output before returning. But,
since it is by far the slowest method, I didn't bother.

`getgroups3' is a method I got from another post in this thread, just
for comparison.
From my benchmarks it looks like getgroups1 is the winner. I didn't
scour the thread to test all the methods however.
 
R

Ron Adam

Matimus said:
I did some more experimenting and came up with the code below. It
shows several methods. When run, the script tests the robustness of
each method (roughly), and profiles it using timeit. The results from
running on my laptop are shown below the code.

Try this one...

def groupdata(data, fn):
start = 0
for n in range(1, len(data)):
if fn(data[n]):
yield data[start:n]
start = n
yield data[start:]

print list(groupdata(l, lambda x: x & 0x80))


Cheers ;-)
Ron


<code>
seqs = [# Original:
[0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11,
12, 13,
0xF0, 14, 0xF1, 15],
# Single entry:
[0xF0, 1, 2, 3],
# empty
[],
# No values with 0x80 set
[1, 2, 3, 14, 15],
# Does not start with a value that has 0x80 set
[1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]

expected = [# Original:
[[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8], [0xF2, 9,
10, 11, 12, 13],
[0xF0, 14], [0xF1, 15]],
# Single entry:
[[0xF0, 1, 2, 3]],
# empty
[],
# No values with 0x80 set
[],
# Does not start with a value that has 0x80 set
[[0xF0, 1, 2, 3]]]

def gengroups0(seq):
group = None
for val in seq:
if val & 0x80:
if group: yield group
group = []
try:
group.append(val)
except AttributeError:
pass
if group: yield group

def getgroups0(seq):
groups = []
group = None
for val in seq:
if val & 0x80:
if group:
groups.append(group)
group = []
try:
group.append(val)
except AttributeError:
pass
if group:
groups.append(group)
return groups

def gengroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
for i,j in zip(idxs,idxs[1:]+[None]):
yield seq[i:j]

def getgroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]

# Similar to the partition method on strings
def partition(seq,f=None):
if f is None:
f = lambda x:x
for i,v in enumerate(seq):
if f(v):
return seq[:i],[seq],seq[i+1:]
return seq,[],[]

def rpartition(seq, f=None):
if f is None:
f = lambda x:x
for i,v in zip(range(len(seq)-1,-1,-1),seq[::-1]):
if f(v):
return seq[:i],[seq],seq[i+1:]
return ([],[],seq)

def gengroups2(seq):
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val: yield key+val

def getgroups2(seq):
groups = []
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val:
groups.append(key+val)
return groups

def getgroups3(seq):
groups = []
for i in seq:
if 0x80 & i:
groups.append()
else:
groups[-1].append(i)
return [x for x in groups if x]

seq = seqs[0]
if __name__ == "__main__":
from timeit import Timer
import __main__
for i in range(4):
fname = "getgroups"+str(i)
f = getattr(__main__,fname)
print fname
for i,(s,e) in enumerate(zip(seqs,expected)):
print "test %d:"%i,
try:
if f(s) == e:
print "pass"
else:
print "fail"
except:
print "error"

t = Timer(fname+'(seq)',
'from __main__ import seq,'+fname)
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/
100000)
print
</code>

Output from running the above:
getgroups0
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
14.85 usec/pass

getgroups1
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
13.81 usec/pass

getgroups2
test 0: fail
test 1: pass
test 2: pass
test 3: pass
test 4: pass
56.38 usec/pass

getgroups3
test 0: pass
test 1: pass
test 2: pass
test 3: error
test 4: error
16.23 usec/pass

`getgropus2' fails test 0 because it produces a reversed list. That
can easily be fixed by re-reversing the output before returning. But,
since it is by far the slowest method, I didn't bother.

`getgroups3' is a method I got from another post in this thread, just
for comparison.
From my benchmarks it looks like getgroups1 is the winner. I didn't
scour the thread to test all the methods however.
 
R

Ron Adam

Matt said:
That certainly is fast, unfortunately it doesn't pass all of the tests.
I came up with those tests so I don't know how important they are to the
original poster. I modified it and came up with a generator and a
non-generator version based (roughly) on your algorithm, that are almost
as quick, and pass all of the tests. Some of the modifications were done
just to make it quicker, so it would be fair when comparing against the
other methods. I hard-coded the comparison instead of using a function
and created a function that directly generates and returns a list
instead of a generator. I would probably use the generator version in my
code, but wrapping `list' around a generator adds about 4us (on my
machine). Anyway, getgroups7 passes all of the tests I mentioned and it
was timed at 10.37usec/pass. The down side: the code doesn't seem nearly
as elegant.

Matt

In most cases you wouldn't wrap the generator version in a list(), but use
it directly as a loop iterator.


A little renaming of variables helps it be a bit more elegant I think ...

def getgroups8(seq):
groups = []
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
groups.append(seq[start:stop])
start = stop
groups.append(seq[start:])
return groups

This passes all the tests and runs about the same speed.


Cheers,
Ron




<code>
def gengroups7(seq):
iseq = iter(xrange(len(seq)))
start = 0
for i in iseq:
if seq&0x80:
start = i
break
else:
return
for i in iseq:
if seq&0x80:
yield seq[start:i]
start = i
yield seq[start:]


def getgroups7(seq):
groups = []
iseq = iter(xrange(len(seq)))
start = 0
for i in iseq:
if seq&0x80:
start = i
break
else:
return groups
for i in iseq:
if seq&0x80:
groups.append(seq[start:i])
start = i
groups.append(seq[start:])
return groups

</code>
 
R

Ron Adam

Matt said:
That certainly is fast, unfortunately it doesn't pass all of the tests.
I came up with those tests so I don't know how important they are to the
original poster. I modified it and came up with a generator and a
non-generator version based (roughly) on your algorithm, that are almost
as quick, and pass all of the tests. Some of the modifications were done
just to make it quicker, so it would be fair when comparing against the
other methods. I hard-coded the comparison instead of using a function
and created a function that directly generates and returns a list
instead of a generator. I would probably use the generator version in my
code, but wrapping `list' around a generator adds about 4us (on my
machine). Anyway, getgroups7 passes all of the tests I mentioned and it
was timed at 10.37usec/pass. The down side: the code doesn't seem nearly
as elegant.

Matt

In most cases you wouldn't wrap the generator version in a list(), but use
it directly as a loop iterator.


A little renaming of variables helps it be a bit more elegant I think ...

def getgroups8(seq):
groups = []
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
groups.append(seq[start:stop])
start = stop
groups.append(seq[start:])
return groups

This passes all the tests and runs about the same speed.


Cheers,
Ron




<code>
def gengroups7(seq):
iseq = iter(xrange(len(seq)))
start = 0
for i in iseq:
if seq&0x80:
start = i
break
else:
return
for i in iseq:
if seq&0x80:
yield seq[start:i]
start = i
yield seq[start:]


def getgroups7(seq):
groups = []
iseq = iter(xrange(len(seq)))
start = 0
for i in iseq:
if seq&0x80:
start = i
break
else:
return groups
for i in iseq:
if seq&0x80:
groups.append(seq[start:i])
start = i
groups.append(seq[start:])
return groups

</code>
 
M

Miki

Hello Dan,

Yet another option (using itertools.groupby):

from itertools import groupby

class GrouperToggler:
def __init__(self):
self.group = 1

def __call__(self, value):
# New packet, toggle group
if value & 0x80:
self.group = 1 - self.group
return self.group

def group(items):
for group, items in groupby(items, GrouperToggler()):
# groupby return [key, group_iterator]
yield [item for item in items]

i = [
0xF0, 1, 2, 3,
0xF0, 4, 5, 6,
0xF1, 7, 8,
0xF2, 9, 10, 11, 12, 13,
0xF0, 14,
0xF1, 15
]

for g in group(i):
print g

HTH,
 
G

George Sakkis

I did some more experimenting and came up with the code below. It
shows several methods. When run, the script tests the robustness of
each method (roughly), and profiles it using timeit. The results from
running on my laptop are shown below the code.

<code>
seqs = [# Original:
[0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11,
12, 13,
0xF0, 14, 0xF1, 15],
# Single entry:
[0xF0, 1, 2, 3],
# empty
[],
# No values with 0x80 set
[1, 2, 3, 14, 15],
# Does not start with a value that has 0x80 set
[1, 2, 3, 14, 15, 0xF0, 1, 2, 3]]

expected = [# Original:
[[0xF0, 1, 2, 3], [0xF0, 4, 5, 6], [0xF1, 7, 8], [0xF2, 9,
10, 11, 12, 13],
[0xF0, 14], [0xF1, 15]],
# Single entry:
[[0xF0, 1, 2, 3]],
# empty
[],
# No values with 0x80 set
[],
# Does not start with a value that has 0x80 set
[[0xF0, 1, 2, 3]]]

def gengroups0(seq):
group = None
for val in seq:
if val & 0x80:
if group: yield group
group = []
try:
group.append(val)
except AttributeError:
pass
if group: yield group

def getgroups0(seq):
groups = []
group = None
for val in seq:
if val & 0x80:
if group:
groups.append(group)
group = []
try:
group.append(val)
except AttributeError:
pass
if group:
groups.append(group)
return groups

def gengroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
for i,j in zip(idxs,idxs[1:]+[None]):
yield seq[i:j]

def getgroups1(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
return [seq[i:j] for i,j in zip(idxs,idxs[1:]+[None])]

# Similar to the partition method on strings
def partition(seq,f=None):
if f is None:
f = lambda x:x
for i,v in enumerate(seq):
if f(v):
return seq[:i],[seq],seq[i+1:]
return seq,[],[]

def rpartition(seq, f=None):
if f is None:
f = lambda x:x
for i,v in zip(range(len(seq)-1,-1,-1),seq[::-1]):
if f(v):
return seq[:i],[seq],seq[i+1:]
return ([],[],seq)

def gengroups2(seq):
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val: yield key+val

def getgroups2(seq):
groups = []
while seq:
seq, key, val = rpartition(seq, lambda x: x&0x80)
if key and val:
groups.append(key+val)
return groups

def getgroups3(seq):
groups = []
for i in seq:
if 0x80 & i:
groups.append()
else:
groups[-1].append(i)
return [x for x in groups if x]

seq = seqs[0]
if __name__ == "__main__":
from timeit import Timer
import __main__
for i in range(4):
fname = "getgroups"+str(i)
f = getattr(__main__,fname)
print fname
for i,(s,e) in enumerate(zip(seqs,expected)):
print "test %d:"%i,
try:
if f(s) == e:
print "pass"
else:
print "fail"
except:
print "error"

t = Timer(fname+'(seq)',
'from __main__ import seq,'+fname)
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/
100000)
print
</code>

Output from running the above:
getgroups0
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
14.85 usec/pass

getgroups1
test 0: pass
test 1: pass
test 2: pass
test 3: pass
test 4: pass
13.81 usec/pass

getgroups2
test 0: fail
test 1: pass
test 2: pass
test 3: pass
test 4: pass
56.38 usec/pass

getgroups3
test 0: pass
test 1: pass
test 2: pass
test 3: error
test 4: error
16.23 usec/pass

`getgropus2' fails test 0 because it produces a reversed list. That
can easily be fixed by re-reversing the output before returning. But,
since it is by far the slowest method, I didn't bother.

`getgroups3' is a method I got from another post in this thread, just
for comparison.
From my benchmarks it looks like getgroups1 is the winner. I didn't

scour the thread to test all the methods however.


Here's a small improvement of getgroups1, both in time and memory:

from itertools import islice

def getgroups4(seq):
idxs = [i for i,v in enumerate(seq) if v&0x80]
idxs.append(None)
return [seq[i:j] for i,j in izip(idxs, islice(idxs,1,None))]


George
 
M

Matimus

A little renaming of variables helps it be a bit more elegant I think ...
def getgroups8(seq):
groups = []
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
groups.append(seq[start:stop])
start = stop
groups.append(seq[start:])
return groups

Excellent work! One more modification and I get below 10us/pass:

def getgroups(seq):
groups = []
push = groups.append
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
push(seq[start:stop])
start = stop
push(seq[start:])
return groups

-Matt
 
R

Ron Adam

Matimus said:
Excellent work! One more modification and I get below 10us/pass:

def getgroups(seq):
groups = []
push = groups.append
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
push(seq[start:stop])
start = stop
push(seq[start:])
return groups

-Matt


Looks good to me. :)

So a generator versions would be...

(not tested)

def getgroups(seq):
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
yield seq[start:stop]
start = stop
yield seq[start:]

(I also wanted to compare this to Georges solution, maybe later.)

Now if there is some way to generalize this so it can be used in a broader
range of situations without loosing too much of it's efficiency. Of course
then maybe group by would be better. <shrug>

Cheers,
Ron
 
R

Ron Adam

Matimus said:
Excellent work! One more modification and I get below 10us/pass:

def getgroups(seq):
groups = []
push = groups.append
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
push(seq[start:stop])
start = stop
push(seq[start:])
return groups

-Matt


Looks good to me. :)

So a generator versions would be...

(not tested)

def getgroups(seq):
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
yield seq[start:stop]
start = stop
yield seq[start:]

(I also wanted to compare this to Georges solution, maybe later.)

Now if there is some way to generalize this so it can be used in a broader
range of situations without loosing too much of it's efficiency. Of course
then maybe group by would be better. <shrug>

Cheers,
Ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,185
Members
46,736
Latest member
AdolphBig6

Latest Threads

Top