tricky time interval billing problem

P

pruebauno

I am currently working on a tricky problem at work. I googled around a
bit, but "time intervals" did not come up with anything useful.
Although I have some rough idea of how I could solve it, I still would
value some input.

I have information of (It has only couple dozen entries.)
ServiceNum, DollarCost

and a input database table in the form (several GBytes):
ClientNumber (CN), BeginDate(BD), EndDate(ED),
ServiceNumber_Purchased(SN)
--Date intervals are always [closed, closed]

The output is:
ClientNumber(CN), BeginDate(BD), Enddate(ED), DollarCost(DC)

With the following requirements:
1) The input dates can be overlapping dates.
The output has to be non-overlapping "broken up" dates

Example: (assuming SN 1=$10 and SN 2=$20)
input (CN, BD, ED ,SN):
10, 1/1/2001,1/1/2005,1
10, 1/1/2002,1/1/2004,2

should result in:
output (CN, BD, ED, DC):
10, 1/1/2001, 12/31/2001, $10
10, 1/1/2002, 1/1/2004, $30
10, 1/2,2004, 1/1/2005, $10

2) if the same service is purchased twice for an interval it should be
billed only once
Example:
input:
10, 1/1/2001, 1/1/2005, 1
10, 1/1/2004, 1/1/2007, 1

output:
10, 1/1/2001, 1/1/2007, $10

Here are my thoughts about a solution so far:

1. fetch the input data sorted by clientNumber, Begindate, Enddate and
ServiceNumber

2. read the row, store as temporary good interval, then read another
row

3. if new begin-date is bigger than previous good interval end-date (or
previously saved end-date), output previous good interval, start new
good interval

4. else create new good interval entry with good interval begin-date
and current row begin-date, store good interval end-date into a list
with bisect module or something (so we always compare against smallest
end-date).

5. Use "bitwise or" on a service bitfield to add services and caluate
the total

As you can see my algorithm is a bit scetchy right now, but at this
point I am more interested to determine if the bisect module would be
the best way to approach the date interval problem or if I should use
some other data structure instead (stack, queue, set,...). I am working
on a Python proof of concept now. It is likely that the company will
want a C++ version of it later (yikes).

nes
 
S

Steve Holden

I am currently working on a tricky problem at work. I googled around a
bit, but "time intervals" did not come up with anything useful.
Although I have some rough idea of how I could solve it, I still would
value some input.

I have information of (It has only couple dozen entries.)
ServiceNum, DollarCost

and a input database table in the form (several GBytes):
ClientNumber (CN), BeginDate(BD), EndDate(ED),
ServiceNumber_Purchased(SN)
--Date intervals are always [closed, closed]

The output is:
ClientNumber(CN), BeginDate(BD), Enddate(ED), DollarCost(DC)

With the following requirements:
1) The input dates can be overlapping dates.
The output has to be non-overlapping "broken up" dates

Example: (assuming SN 1=$10 and SN 2=$20)
input (CN, BD, ED ,SN):
10, 1/1/2001,1/1/2005,1
10, 1/1/2002,1/1/2004,2

should result in:
output (CN, BD, ED, DC):
10, 1/1/2001, 12/31/2001, $10
10, 1/1/2002, 1/1/2004, $30
10, 1/2,2004, 1/1/2005, $10

2) if the same service is purchased twice for an interval it should be
billed only once
Example:
input:
10, 1/1/2001, 1/1/2005, 1
10, 1/1/2004, 1/1/2007, 1

output:
10, 1/1/2001, 1/1/2007, $10

Here are my thoughts about a solution so far:

1. fetch the input data sorted by clientNumber, Begindate, Enddate and
ServiceNumber

2. read the row, store as temporary good interval, then read another
row

3. if new begin-date is bigger than previous good interval end-date (or
previously saved end-date), output previous good interval, start new
good interval

4. else create new good interval entry with good interval begin-date
and current row begin-date, store good interval end-date into a list
with bisect module or something (so we always compare against smallest
end-date).

5. Use "bitwise or" on a service bitfield to add services and caluate
the total

As you can see my algorithm is a bit scetchy right now, but at this
point I am more interested to determine if the bisect module would be
the best way to approach the date interval problem or if I should use
some other data structure instead (stack, queue, set,...). I am working
on a Python proof of concept now. It is likely that the company will
want a C++ version of it later (yikes).

nes
First of all, you need to use ordering to ensure that the database gives
you the most convenient order for processing, as this will make your
computations much easier. So I'd suggest sorting by clientNumber,
ServiceNumber, Begindate and Enddate. That way you can consider each
service separately.

Then all you need to do is accumulate the entries where the clientNumber
and ServiceNumber are the same and the start date of the second is less
than or equal to the end date of the first, repeatedly until either
there's no date overlap or a new service and/or client is started.

This shouldn't need any intermediate storage of results: if the row
you've just read can be used to extend the current range then extend it,
otherwise emit the current range and replace it with the new row.

Or is there something I've failed to understand about how you want to
process the data?

regards
Steve
 
P

pruebauno

First of all, you need to use ordering to ensure that the database gives
you the most convenient order for processing, as this will make your
computations much easier. So I'd suggest sorting by clientNumber,
ServiceNumber, Begindate and Enddate. That way you can consider each
service separately.

Then all you need to do is accumulate the entries where the clientNumber
and ServiceNumber are the same and the start date of the second is less
than or equal to the end date of the first, repeatedly until either
there's no date overlap or a new service and/or client is started.

This shouldn't need any intermediate storage of results: if the row
you've just read can be used to extend the current range then extend it,
otherwise emit the current range and replace it with the new row.

Or is there something I've failed to understand about how you want to
process the data?

regards
Steve

Hi Steve, (btw I am the one that sent you the quotes at Pycon)

Hm, that would be an algorithm for requirement number 2. I do not
understand how it would help with respect to requirement 1. Notice that
by coincidence, in my examples the input data is already sorted as you
specified.

The real data is of course more messy than my analogy and I will post
it here so that nobody can accuse me of "feature creep". I hope I don't
totally confuse you now. Feel free to ignore it. The real output is not
100% correct either (that is why I am rewriting the program). Some of
the database in all its gory, ..eh glory:

INPUT
43756352|D|01/01/1999|09/30/2003|DCUD2B00|DCUD2B00|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756353|D|01/01/1999|09/30/2003|DCUD2B00|DCUD2B00|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756351|D|01/01/1999|09/02/2002|DCUD2B00|DCUD2B00|Y|A|43756350|M|83516
|00374 |9048327561|0001|
43756354|D|04/02/1999|09/30/2003|DCUD2B00|DCUD2B00|Y|A|43756350|W|83516
|00374 |9048327561|0001|
43756351|M|01/01/1999|03/31/1999|MARTPPG2|MARTPPG2|Y|A|43756350|M|83516
|00374 |9048327561|0001|
43756352|M|01/01/1999|03/31/1999|MARTPPG2|MARTPPG2|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756353|M|01/01/1999|03/31/1999|MARTPPG2|MARTPPG2|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756351|M|04/01/1999|07/31/2002|MBCPG002|MBCPG002|Y|A|43756350|M|83516
|00374 |9048327561|0001|
43756352|M|04/01/1999|07/31/2002|MBCPG002|MBCPG002|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756353|M|04/01/1999|07/31/2002|MBCPG002|MBCPG002|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756354|M|04/02/1999|07/31/2002|MBCPG002|MBCPG002|Y|A|43756350|W|83516
|00374 |9048327561|0001|
43756352|M|08/01/2002|09/30/2003|MBP07305|MBP07305|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756353|M|08/01/2002|09/30/2003|MBP07305|MBP07305|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756354|M|08/01/2002|09/30/2003|MBP07305|MBP07305|Y|A|43756350|W|83516
|00374 |9048327561|0001|
43756351|M|08/01/2002|09/02/2002|MBP07305|MBP07305|Y|A|43756350|M|83516
|00374 |9048327561|0001|

OUTPUT
43756350|9048327561|DCUD2B00|D|A|01/01/1999|04/01/1999|83516 |00374
|0001|Y|A|DCUD2B00|
43756350|9048327561|DCUD2B00|D|E|04/02/1999|09/30/2003|83516 |00374
|0001|Y|A|DCUD2B00|
43756350|9048327561|MARTPPG2|M|A|01/01/1999|03/31/1999|83516 |00374
|0001|Y|A|MARTPPG2|
43756350|9048327561|MBCPG002|M|A|04/01/1999|07/31/2002|83516 |00374
|0001|Y|A|MBCPG002|
43756350|9048327561|MBP07305|M|A|08/01/2002|09/02/2002|83516 |00374
|0001|Y|A|MBP07305|
43756350|9048327561|MBP07305|M|E|09/03/2002|09/30/2003|83516 |00374
|0001|Y|A|MBP07305|

CHEAT SHEET:
| (M) | (H,W) | (D,O,S) ||
+==============================
| - | - | - || O |
+-----------------------------------------+
| - | - | X || G |
+-----------------------------------------+
| - | X | - || F |
+-----------------------------------------+
| - | X | X || E |
+-----------------------------------------+
| X | - | - || C |
+-----------------------------------------+
| X | - | X || D |
+-----------------------------------------+
| X | X | - || B |
+-----------------------------------------+
| X | X | X || A |
+-----------------------------------------+

regards,
Nes
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,994
Messages
2,570,223
Members
46,812
Latest member
GracielaWa

Latest Threads

Top