P
pruebauno
I am currently working on a tricky problem at work. I googled around a
bit, but "time intervals" did not come up with anything useful.
Although I have some rough idea of how I could solve it, I still would
value some input.
I have information of (It has only couple dozen entries.)
ServiceNum, DollarCost
and a input database table in the form (several GBytes):
ClientNumber (CN), BeginDate(BD), EndDate(ED),
ServiceNumber_Purchased(SN)
--Date intervals are always [closed, closed]
The output is:
ClientNumber(CN), BeginDate(BD), Enddate(ED), DollarCost(DC)
With the following requirements:
1) The input dates can be overlapping dates.
The output has to be non-overlapping "broken up" dates
Example: (assuming SN 1=$10 and SN 2=$20)
input (CN, BD, ED ,SN):
10, 1/1/2001,1/1/2005,1
10, 1/1/2002,1/1/2004,2
should result in:
output (CN, BD, ED, DC):
10, 1/1/2001, 12/31/2001, $10
10, 1/1/2002, 1/1/2004, $30
10, 1/2,2004, 1/1/2005, $10
2) if the same service is purchased twice for an interval it should be
billed only once
Example:
input:
10, 1/1/2001, 1/1/2005, 1
10, 1/1/2004, 1/1/2007, 1
output:
10, 1/1/2001, 1/1/2007, $10
Here are my thoughts about a solution so far:
1. fetch the input data sorted by clientNumber, Begindate, Enddate and
ServiceNumber
2. read the row, store as temporary good interval, then read another
row
3. if new begin-date is bigger than previous good interval end-date (or
previously saved end-date), output previous good interval, start new
good interval
4. else create new good interval entry with good interval begin-date
and current row begin-date, store good interval end-date into a list
with bisect module or something (so we always compare against smallest
end-date).
5. Use "bitwise or" on a service bitfield to add services and caluate
the total
As you can see my algorithm is a bit scetchy right now, but at this
point I am more interested to determine if the bisect module would be
the best way to approach the date interval problem or if I should use
some other data structure instead (stack, queue, set,...). I am working
on a Python proof of concept now. It is likely that the company will
want a C++ version of it later (yikes).
nes
bit, but "time intervals" did not come up with anything useful.
Although I have some rough idea of how I could solve it, I still would
value some input.
I have information of (It has only couple dozen entries.)
ServiceNum, DollarCost
and a input database table in the form (several GBytes):
ClientNumber (CN), BeginDate(BD), EndDate(ED),
ServiceNumber_Purchased(SN)
--Date intervals are always [closed, closed]
The output is:
ClientNumber(CN), BeginDate(BD), Enddate(ED), DollarCost(DC)
With the following requirements:
1) The input dates can be overlapping dates.
The output has to be non-overlapping "broken up" dates
Example: (assuming SN 1=$10 and SN 2=$20)
input (CN, BD, ED ,SN):
10, 1/1/2001,1/1/2005,1
10, 1/1/2002,1/1/2004,2
should result in:
output (CN, BD, ED, DC):
10, 1/1/2001, 12/31/2001, $10
10, 1/1/2002, 1/1/2004, $30
10, 1/2,2004, 1/1/2005, $10
2) if the same service is purchased twice for an interval it should be
billed only once
Example:
input:
10, 1/1/2001, 1/1/2005, 1
10, 1/1/2004, 1/1/2007, 1
output:
10, 1/1/2001, 1/1/2007, $10
Here are my thoughts about a solution so far:
1. fetch the input data sorted by clientNumber, Begindate, Enddate and
ServiceNumber
2. read the row, store as temporary good interval, then read another
row
3. if new begin-date is bigger than previous good interval end-date (or
previously saved end-date), output previous good interval, start new
good interval
4. else create new good interval entry with good interval begin-date
and current row begin-date, store good interval end-date into a list
with bisect module or something (so we always compare against smallest
end-date).
5. Use "bitwise or" on a service bitfield to add services and caluate
the total
As you can see my algorithm is a bit scetchy right now, but at this
point I am more interested to determine if the bisect module would be
the best way to approach the date interval problem or if I should use
some other data structure instead (stack, queue, set,...). I am working
on a Python proof of concept now. It is likely that the company will
want a C++ version of it later (yikes).
nes