grouping a flat list of number by range

J

joh12005

hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

best regards,

J.
 
P

Peter Otten

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

Don't hold back your code. Would it work to replace each occurrence of

result_list.append([start, stop])

with

yield [start, stop]

?

Peter
 
J

Jim Segrave

hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

There are probably better ways, but this works
best regards,

J.

class IterInterval(object):
"""Create an iterator which, given a list of integers,
for each run of consecutive integers i...j, yields a two
element list [i, j + 1]

Singleton lists return [i, i + 1]
Empty lists return None
"""
def __init__(self, seq):
self.seq = seq
self.firstval = None
self.index = 0

def __iter__(self):
# firstval = the start of a run of consecutive integers
# lastval = the last value found in the run
# nextval = the most recent value taken from the input list
if not self.firstval:
# set up the first iteration
if self.index >= len(self.seq):
# empty list, return
raise StopIteration
self.firstval = lastval = int(self.seq[self.index])
self.index += 1
while True:
if self.index >= len(self.seq):
# list exhausted, output the last value read
yield [self.firstval, lastval + 1]
raise StopIteration
nextval = int(self.seq[self.index])
self.index += 1
if nextval == lastval + 1:
lastval = nextval
continue
else:
# end of run - output the run, reset for next call
yield [self.firstval, lastval + 1]
self.firstval = lastval = nextval
continue

if __name__ == '__main__':
for l in [[3, 6, 7, 8, 12, 13, 15], [2], []]:
print l, "=>", [lst for lst in IterInterval(l)]

/usr/home/jes% python interval.py
[3, 6, 7, 8, 12, 13, 15] => [[3, 4], [6, 9], [12, 14], [15, 16]]
[3] => [[3, 4]]
[] => []
 
B

Ben Cartwright

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

Sure:

def group_intervals(it):
it = iter(it)
val = it.next()
run = [val, val+1]
for val in it:
if val == run[1]:
run[1] += 1
else:
yield run
run = [val, val+1]
yield run

--Ben
 
P

Paddy

hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

best regards,

J.

I did a proceedural version, then re-read your post and did a generator
based version ;-)

=== interv1 ===
inlist = [3, 6, 7, 8, 12, 13, 15]
tmp = []
for i,val in enumerate(inlist):
.... if i==0:
.... tmp.append(val)
.... elif val != valinc:
.... tmp += [valinc, val]
.... valinc = val+1
....
tmp.append(valinc)
tmp [3, 4, 6, 9, 12, 14, 15, 16]
tmp[0::2] [3, 6, 12, 15]
tmp[1::2] [4, 9, 14, 16]
zip(tmp[0::2], tmp[1::2]) [(3, 4), (6, 9), (12, 14), (15, 16)]

=== END interv1 ===

=== interv2 ===.... for i,val in enumerate(inlist):
.... if i==0:
.... tmp = val
.... elif val != valinc:
.... yield [tmp, valinc]
.... tmp = val
.... valinc = val+1
.... yield [tmp, valinc]
....[[3, 4], [6, 9], [12, 14], [15, 16]]

=== END interv2 ===

- Paddy.
 
P

Paddy

I did a little re-arranging of the generator version:

def interv3(inlist):
tmp = inlist[0]
valinc = tmp+1
for val in inlist[1:]:
if val != valinc:
yield [tmp, valinc];
tmp = val
valinc = val+1
yield [tmp, valinc]
 
J

Jim Segrave

=== interv2 ===... for i,val in enumerate(inlist):
... if i==0:
... tmp = val
... elif val != valinc:
... yield [tmp, valinc]
... tmp = val
... valinc = val+1
... yield [tmp, valinc]
...[[3, 4], [6, 9], [12, 14], [15, 16]]

=== END interv2 ===

This doesn't actually run, changing it to make it do so:

def interv2(inlist):
tmp = valinc = 0
for i,val in enumerate(inlist):
if i==0:
tmp = val
valinc = val + 1
elif val != valinc:
yield [tmp, valinc]
tmp = val
valinc = val+1
yield [tmp, valinc]

it now works, but returns [0, 0] when passed an empty list, when it
should return nothing at all
 
P

Paddy

Jim said:
=== interv2 ===
def interv2(inlist):
... for i,val in enumerate(inlist):
... if i==0:
... tmp = val
... elif val != valinc:
... yield [tmp, valinc]
... tmp = val
... valinc = val+1
... yield [tmp, valinc]
...
list(interv2(inlist))
[[3, 4], [6, 9], [12, 14], [15, 16]]

=== END interv2 ===

This doesn't actually run, changing it to make it do so:

def interv2(inlist):
tmp = valinc = 0
for i,val in enumerate(inlist):
if i==0:
tmp = val
valinc = val + 1
elif val != valinc:
yield [tmp, valinc]
tmp = val
valinc = val+1
yield [tmp, valinc]

it now works, but returns [0, 0] when passed an empty list, when it
should return nothing at all
Jim, I had tabs/spaces indent problems when cut-n-pasting.

What I ran was more like the version below, but i did a quick
separation of the line that has the ';' in it and goofed.
.... for i,val in enumerate(inlist):
.... if i==0:
.... tmp = val
.... elif val != valinc:
.... yield [tmp, valinc]; tmp = val
.... valinc = val+1
.... yield [tmp, valinc]
.... [[3, 4], [6, 9], [12, 14], [15, 16]]
 
J

Jim Segrave

I did a little re-arranging of the generator version:

def interv3(inlist):
tmp = inlist[0]
valinc = tmp+1
for val in inlist[1:]:
if val != valinc:
yield [tmp, valinc];
tmp = val
valinc = val+1
yield [tmp, valinc]

Still fails when passed an empty list, the initial assignment to tmp
is an IndexError
 
J

Jim Segrave

What I ran was more like the version below, but i did a quick
separation of the line that has the ';' in it and goofed.
... for i,val in enumerate(inlist):
... if i==0:
... tmp = val
... elif val != valinc:
... yield [tmp, valinc]; tmp = val
... valinc = val+1
... yield [tmp, valinc]
... [[3, 4], [6, 9], [12, 14], [15, 16]]

Fails on an empty list, as tmp is not defined when it hits the yield
 
J

John Machin

hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

best regards,

J.

I did a proceedural version, then re-read your post and did a generator
based version ;-)

=== interv1 ===
inlist = [3, 6, 7, 8, 12, 13, 15]
tmp = []
for i,val in enumerate(inlist):
... if i==0:
... tmp.append(val)
... elif val != valinc:
... tmp += [valinc, val]
... valinc = val+1
...
tmp.append(valinc)
tmp [3, 4, 6, 9, 12, 14, 15, 16]
tmp[0::2] [3, 6, 12, 15]
tmp[1::2] [4, 9, 14, 16]
zip(tmp[0::2], tmp[1::2])
[(3, 4), (6, 9), (12, 14), (15, 16)]

=== END interv1 ===

=== interv2 ===... for i,val in enumerate(inlist):
... if i==0:
... tmp = val
... elif val != valinc:
... yield [tmp, valinc]
... tmp = val
... valinc = val+1
... yield [tmp, valinc]
...[[3, 4], [6, 9], [12, 14], [15, 16]]

=== END interv2 ===

- Paddy.

Oh the siren call of every new feature in the language!
enumerate() just to get a first-time test, and then botch it??

Read the following; the replacement version uses a simple old-fashioned
inelegant flag, works with an empty sequence, and doesn't depend on the
input being sliceable or indexable.

HTH,
John

C:\junk>type interval.py
def interv2(inlist):
for i,val in enumerate(inlist):
if i==0:
tmp = val
elif val != valinc:
yield [tmp, valinc]
tmp = val
valinc = val+1
yield [tmp, valinc]

def interv3(inseq):
first = True
for val in inseq:
if first:
tmp = val
first = False
elif val != valinc:
yield [tmp, valinc]
tmp = val
valinc = val + 1
if not first:
yield [tmp, valinc]

tests = [
[3, 4, 6, 9, 12, 14, 15, 16],
[3, 3, 3],
[],
]

for func in (interv3, interv2):
for test in tests:
print "%s(%s) ->" % (func.__name__, test)
result = list(func(test))
print "\t%r" % result

C:\junk>interval.py
interv3([3, 4, 6, 9, 12, 14, 15, 16]) ->
[[3, 5], [6, 7], [9, 10], [12, 13], [14, 17]]
interv3([3, 3, 3]) ->
[[3, 4], [3, 4], [3, 4]]
interv3([]) ->
[]
interv2([3, 4, 6, 9, 12, 14, 15, 16]) ->
[[3, 5], [6, 7], [9, 10], [12, 13], [14, 17]]
interv2([3, 3, 3]) ->
[[3, 4], [3, 4], [3, 4]]
interv2([]) ->
Traceback (most recent call last):
File "C:\junk\interval.py", line 33, in ?
result = list(func(test))
File "C:\junk\interval.py", line 9, in interv2
yield [tmp, valinc]
UnboundLocalError: local variable 'tmp' referenced before assignment

C:\junk>
 
G

Gerard Flanagan

hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

best regards,

a list comprehension/itertools version (this won't work with an empty
list):

from itertools import groupby

a = [3, 6, 7, 8, 12, 13, 15]

result = [[3, 4], [6,9], [12, 14], [15, 16]]

b = [ list(g)[0] for k,g in groupby(range(a[0],a[-1]+2), lambda x:
x in a)]

c = [ b[i:i+2] for i in range(0,len(a),2) ]

assert c == result


Gerard
 
G

Gerard Flanagan

Gerard said:
hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

best regards,

a list comprehension/itertools version (this won't work with an empty
list):

from itertools import groupby

a = [3, 6, 7, 8, 12, 13, 15]

result = [[3, 4], [6,9], [12, 14], [15, 16]]

b = [ list(g)[0] for k,g in groupby(range(a[0],a[-1]+2), lambda x:
x in a)]

c = [ b[i:i+2] for i in range(0,len(a),2) ]

assert c == result


Gerard

change len(a) to len(b) in case a has duplicates like
[3,3,3,6,7,8,12,13,15]

Gerard
 
P

Paddy

Jim said:
What I ran was more like the version below, but i did a quick
separation of the line that has the ';' in it and goofed.
def interv2(inlist):
... for i,val in enumerate(inlist):
... if i==0:
... tmp = val
... elif val != valinc:
... yield [tmp, valinc]; tmp = val
... valinc = val+1
... yield [tmp, valinc]
...
list(interv2(inlist))
[[3, 4], [6, 9], [12, 14], [15, 16]]

Fails on an empty list, as tmp is not defined when it hits the yield

Yep, I still, purposfully, decided not to put any guard checks in
because:
Its very easy to add.
It detracts from the algorithm.
It might never be called with an empty list in-situ.
It is correct for the original posters testcases.
Yah gotta leave-em something to do ;-)

But you-too are right. Don't just cut-n-paste newsgroup solutions. they
need testing for your particular use. I doubt anyone on C.L.P. would be
malicious, but your actual use of a function having a wider scope to
inputs than mentioned might be fairy common.

(Damn, I wanted to use 'caveat emptor', but it does not apply as
nothing is being bought. Oh well :)

-- Pad.
 
P

Paddy

John said:
On 2/06/2006 8:36 AM, Paddy wrote:
Oh the siren call of every new feature in the language!
enumerate() just to get a first-time test, and then botch it??

Read the following; the replacement version uses a simple old-fashioned
inelegant flag, works with an empty sequence, and doesn't depend on the
input being sliceable or indexable.

HTH,
John

Thanks, un-botchable John. What if the poster doesn't send a null list?
What if the correct result for a null list is "spam"; or an exception?
What if.... ... we toned down the language?

Cheers, Pad. :)
 
J

Jim Segrave

Thanks, un-botchable John. What if the poster doesn't send a null list?
What if the correct result for a null list is "spam"; or an exception?
What if.... ... we toned down the language?

I'd still rate Ben Cartwright's solution as the best. The one I posted
tried to do the same, but I didn't see the right way to iterate over
the supplied list inside the function, wheras he did. It's short,
simple and correct.

If you really wanted to return 'spam' or an exception in his solution,
it's a simple if statement at the start of the function.
 
S

Steven Bethard

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

Know your itertools. From the examples section[1]:

"""
# Find runs of consecutive numbers using groupby. The key to the
# solution is differencing with a range so that consecutive numbers all
# appear in same group.
>>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
>>> for k, g in groupby(enumerate(data), lambda (i,x):i-x):
.... print map(operator.itemgetter(1), g)
....
[1]
[4, 5, 6]
[10]
[15, 16, 17, 18]
[22]
[25, 26, 27, 28]
"""


So I think something like this should work:
.... def key((i, n)):
.... return i - n
.... for key, group in itertools.groupby(enumerate(numbers), key):
.... group = list(group)
.... _, first_n = group[0]
.... _, last_n = group[-1]
.... yield first_n, last_n + 1
....
>>> list(intervals([3, 6, 7, 8, 12, 13, 15]))
[(3, 4), (6, 9), (12, 14), (15, 16)]

If you really need lists instead of tuples, just put brackets around the
terms in the yield statement.

[1] http://docs.python.org/lib/itertools-example.html

STeVe
 
J

joh12005

thanks to all !

my version was far less clever/long thant the one you posted, i'm going
to examine all these with much interest, and learn...

best regards.
 
G

Gerard Flanagan

hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)


Just another 'itertools.groupby' variation. It considers the whole
range of numbers spanned by the dataset - eg. in your example,
range(3,17) - so possibly not very efficient if the range is large and
the data sparse within the range.

def get_intervals(data):
intervals = [ [], [] ]
for k,g in groupby(range(data[0],data[-1]+2), lambda x:x in data):
intervals[k].append( list(g)[0] ) #k is 0 or 1
return zip(intervals[1],intervals[0])

a = [3, 6, 7, 8, 12, 13, 15]

assert get_intervals(a) == [(3,4),(6, 9),(12,14),(15,16)]


Gerard
 
S

Steven Bethard

Gerard said:
hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

Just another 'itertools.groupby' variation. It considers the whole
range of numbers spanned by the dataset - eg. in your example,
range(3,17) - so possibly not very efficient if the range is large and
the data sparse within the range.

def get_intervals(data):
intervals = [ [], [] ]
for k,g in groupby(range(data[0],data[-1]+2), lambda x:x in data):
intervals[k].append( list(g)[0] ) #k is 0 or 1
return zip(intervals[1],intervals[0])

If you're gonna go this route, you should definitely use a set to check
containment; ``x in data`` is going to be costly for long lists.
Defining your key something like the following would be better::

data_set = set(data)
def key(x):
return x in data_set

STeVe
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,297
Messages
2,571,536
Members
48,284
Latest member
alphabetsalphabets

Latest Threads

Top