Overlap in python

B

Bearophile

Albert van der Horst:
That is an algorithmic question and has little to do with Python.<

Yes, but comp.lang.python isn't comp.lang.c, that kind of questions
are perfectly fine here. They help keep this place from becoming
boring.

Bye,
bearophile
 
M

Marcus Wanner

parts = [(5, 9, "a"), (7, 10, "b"), (3, 6, "c"), (15, 20, "d"), (18, 23, "e")]
parts.sort()
parts [(3, 6, 'c'), (5, 9, 'a'), (7, 10, 'b'), (15, 20, 'd'), (18, 23, 'e')]
# Merge overlapping intervals.
pos = 1
while pos < len(parts):
# Merge the pair in parts[pos - 1 : pos + 1] if they overlap.
p, q = parts[pos - 1 : pos + 1]
if p[1] >= q[0]:
parts[pos - 1 : pos + 1] = [(p[0], max(p[1], q[1]), p[2]
+ "." + q[2])]
else:
# They don't overlap, so try the next pair.
pos += 1

[(3, 10, 'c.a.b'), (15, 23, 'd.e')]
That's the best solution I've seen so far. It even has input/output
formatted as close as is reasonably possible to the format specified.

As we would say in googlecode, +1.

Marcus
 
N

nn

 >>> parts = [(5, 9, "a"), (7, 10, "b"), (3, 6, "c"), (15, 20, "d"),
(18, 23, "e")]
 >>> parts.sort()
 >>> parts
[(3, 6, 'c'), (5, 9, 'a'), (7, 10, 'b'), (15, 20, 'd'), (18, 23, 'e')]
 >>> # Merge overlapping intervals.
 >>> pos = 1
 >>> while pos < len(parts):
        # Merge the pair in parts[pos - 1 : pos + 1] if they overlap.
        p, q = parts[pos - 1 : pos + 1]
        if p[1] >= q[0]:
                parts[pos - 1 : pos + 1] = [(p[0], max(p[1], q[1]), p[2]
+ "." + q[2])]
        else:
                # They don't overlap, so try the next pair.
                pos += 1
 >>> parts
[(3, 10, 'c.a.b'), (15, 23, 'd.e')]

That's the best solution I've seen so far. It even has input/output
formatted as close as is reasonably possible to the format specified.

As we would say in googlecode, +1.

Marcus

How does it compare to this one?

http://groups.google.com/group/comp...ed9d05d11d0/56684b795fc527cc#56684b795fc527cc
 
M

Mark Lawrence

Jay said:
Hi everyone,

I wanted to thank you all for your help and *excellent* discussion. I
was able to utilize and embed the script by Grigor Lingl in the 6th
post of this discussion to get my program to work very quickly (I had
to do about 20 comparisons per data bin, with over 40K bins in
total). I am involved in genomic analysis research and this problem
comes up a lot and I was surprised to not have been able to find a
clear way to solve it. I will also look through all the tips in this
thread, I have a feeling they may come in handy for future use!

Thank you again,
Jay
I don't know if this is relevant, but http://planet.python.org/ has an
entry dated this morning which points here
http://www.logarithmic.net/pfh/blog/01249470842.

HTH.
 
M

Marcus Wanner

parts = [(5, 9, "a"), (7, 10, "b"), (3, 6, "c"), (15, 20, "d"),
(18, 23, "e")]
parts.sort()
parts
[(3, 6, 'c'), (5, 9, 'a'), (7, 10, 'b'), (15, 20, 'd'), (18, 23, 'e')]
# Merge overlapping intervals.
pos = 1
while pos < len(parts):
# Merge the pair in parts[pos - 1 : pos + 1] if they overlap.
p, q = parts[pos - 1 : pos + 1]
if p[1] >= q[0]:
parts[pos - 1 : pos + 1] = [(p[0], max(p[1], q[1]), p[2]
+ "." + q[2])]
else:
# They don't overlap, so try the next pair.
pos += 1
parts
[(3, 10, 'c.a.b'), (15, 23, 'd.e')]
That's the best solution I've seen so far. It even has input/output
formatted as close as is reasonably possible to the format specified.

As we would say in googlecode, +1.

Marcus

How does it compare to this one?

http://groups.google.com/group/comp...ed9d05d11d0/56684b795fc527cc#56684b795fc527cc
That is a different problem, and the solution is more complex.
I am not going to try to judge which is better.

Marcus

--
print ''.join([chr(((ord(z)+(ord("I'M/THE"[3])+sum(
[ord(x)for x in 'CRYPTOR'])))%(4*ord('8')+ord(
' ')))) for z in ''.join(([(('\xca\x10\x03\t'+
'\x01\xff\xe6\xbe\x0c\r\x06\x12\x17\xee\xbe'+
'\x10\x03\x06\x12\r\x0c\xdf\xbe\x12\x11\x13'+
'\xe8')[13*2-y]) for y in range(int(6.5*4)+1)]
))])
 
J

John Ladasky

Hi everyone,

I wanted to thank you all for your help and *excellent* discussion.  I
was able to utilize and embed the script by Grigor Lingl in the 6th
post of this discussion to get my program to work very quickly (I had
to do about 20 comparisons per data bin, with over 40K bins in
total).  I am involved in genomic analysis research and this problem
comes up a lot and I was surprised to not have been able to find a
clear way to solve it.  I will also look through all the tips in this
thread, I have a feeling they may come in handy for future use!

Thank you again,
Jay

Hi Jay,

I know this is a bit off-topic, but how does this pertain to genomic
analysis? Are you counting the lengths of microsatellite repeats or
something?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,274
Messages
2,571,366
Members
48,055
Latest member
RacheleCar

Latest Threads

Top