find overlapping lines & output times observed

  • Thread starter Linsey Raaijmakers
  • Start date
L

Linsey Raaijmakers

Hello,

I have a file like this:
action start end
50 5321 5321
7 5323 5347
12 5339 5351
45 5373 5373
45 5420 5420
25 5425 5425
26 5425 5425
50 5451 5451
45 5452 5452
14 5497 5503
26 5513 5513
25 5517 5517
45 5533 5533
50 5533 5533
5 5537 5540
25 5580 5580
45 5586 5586
26 5595 5595
45 5603 5603
50 5634 5634
45 5645 5645
7 5657 5689
25 5682 5682
26 5682 5690
26 5708 5708
45 5717 5717
50 5740 5740
45 5777 5777
45 5804 5804
7 5805 5845

and want to find how many times combinations occur in a time frame (between column 2 and 3 ). This can be multiple combinations, which is my problem now.
I have no problems finding overlap between 2 actions.
I want to start with the first line, action 50. and check for all lines in the rest of the file if there are lines that overlap this action.
So the first line has no overlap, but action 25 and 26 would be a combination that overlaps, and 45 and 50 (5533 -5533). but 7,25,26 would be a combination of 3(5682-5682 & 5682-5690 & 5657-5689 because these three overlap each other.

I have a script now that identifies overlap between two actions (see bottom page), but how can I change this so that it outputs all possible combinations?

My desired output would be:

action times observed apex
50 5 5321, 5451, 5533, 5634, 5740
50,45 1 5533;5533
7 4 5347, 5689, 5688, 5845
7,25 2 5347;5425, 5689;5682
7,25,26 1 5689;5682;5690

CODE:

from collections import Counter
f = open('and.txt','r');

action_list = []
onset_list = []
apex_list = []
offset_list = []
action_counter = 0
combination_list = []


for line in f:
fields = line.split("\t")
for col in fields:
action = fields[0]
onset = fields[1]
apex = fields[2]
offset = fields[3]

action_list.append(action)
onset_list.append(onset)
apex_list.append(apex)
offset_list.append(offset)

action_cnvrt = map(int, action_list)
c = Counter(action_cnvrt)

filtered = list(set(action_list))
filtered_cnvrt = map(int, filtered)

for a in filtered_cnvrt:
action_count = str(a)+"\t"+str(c[a])
print action_count

for i in range (0,len(onset_list)):
combination_list.append(action_list)
for j in range(0,len(apex_list)):
if i != j:
if onset_list[j]>= onset_list and apex_list[j] <= apex_list:
print action_list[j]+","+action_list+'\t'+onset_list[j]+'\t'+apex_list[j]+'\t'+onset_list+'\t'+apex_list


I hope somebody can help me :)
 
O

Oscar Benjamin

I have a file like this:
action start end
50 5321 5321
7 5323 5347
12 5339 5351
45 5373 5373
45 5420 5420
25 5425 5425
[snip]

your code below suggests that your file also has an "apex" column. If
that's correct what do you what is it and what do you want to do with
it?

[snip]
I have a script now that identifies overlap between two actions (see bottom page), but how can I change this so that it outputs all possible combinations?

I looked at that code and I don't think it does what you describe. Are
you sure that it works?
My desired output would be:

action times observed apex
50 5 5321, 5451, 5533, 5634, 5740
50,45 1 5533;5533
7 4 5347, 5689, 5688, 5845
7,25 2 5347;5425, 5689;5682
7,25,26 1 5689;5682;5690

CODE:

from collections import Counter
f = open('and.txt','r');

action_list = []
onset_list = []
apex_list = []
offset_list = []
action_counter = 0
combination_list = []


for line in f:
fields = line.split("\t")
for col in fields:
action = fields[0]
onset = fields[1]
apex = fields[2]
offset = fields[3]

The above code suggests that the file has four columns. Also since
you're not actually using the loop variable "col" you can just delete
the "for" line and unindent the rest. In fact the easiest way to do
all of this is just:

action, onset, apex, offset = line.split()
action_list.append(action)
onset_list.append(onset)
apex_list.append(apex)
offset_list.append(offset)

action_cnvrt = map(int, action_list)
c = Counter(action_cnvrt)

filtered = list(set(action_list))

There's no need to convert this back to a list if you're just going to
iterate over it again with map.
filtered_cnvrt = map(int, filtered)

for a in filtered_cnvrt:
action_count = str(a)+"\t"+str(c[a])
print action_count

The above counts the number of times each event occurs which is one
part of your desired output.
for i in range (0,len(onset_list)):
combination_list.append(action_list)
for j in range(0,len(apex_list)):
if i != j:
if onset_list[j]>= onset_list and apex_list[j] <= apex_list:
print action_list[j]+","+action_list+'\t'+onset_list[j]+'\t'+apex_list[j]+'\t'+onset_list+'\t'+apex_list


What is combination_list for? It should just end up containing the
same thing as action_list if I understand correctly.

It's generally better in Python to loop directly over things rather
than using indices so, instead of something like:

for i in range(len(onset_list)):
print offset_list - onset_list

you should do something like

for offset, onset in zip(offset_list, onset_list):
print offset - onset

and your code will be a lot clearer.

The algorithm you are using is to loop over all events and then to
loop over all other events comparing all pairs of events. This will
not scale very well if you want to look at a large file or to compare
simultaneous occurrences of more than two events.

It looks as if your input data is ordered by the onset column. Is that
the case? If so then you can use an algorithm that just loops once
over all the events. The way the algorithm works is that you store
which events are currently active and loop through the events keeping
track of the start time of the most recently added event adding and
removing events as they start and stop. In pseudocode:

now = start of first event
active = [first event]
for next_starting in events (not including first):
next_ending = event that ends soonest from active
while start of next_starting > end of next_ending:
report active events from now to end of next_ending
now = end of next_ending
remove next_ending from active
report active events from now until start of next_starting
now = start of next_starting
add next_starting to active

And some more code to deal with what happens when you get to the end
of the list of events...

The report steps probably mean adding to a Counter or dict to remember
that the currently active events were active during each particular
time window.


Oscar
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,994
Messages
2,570,223
Members
46,813
Latest member
lawrwtwinkle111

Latest Threads

Top