Complex Nested Dictionaries

T. Earle

To list,

I'm trying to figure out the best approach to the following problem:

I have four variables:
1) headlines
2) times
3) states
4) zones

At this time, I'm thinking of creating a dictionary, headlinesDB, that
stores different headlines and their associated time(s), state(s), and
zone(s). The complexity is that each headline can have one or more times,
one or more states, and one or more zones. However, there can only be 1
zone per time, and 1 zone per state. What is the best way to tackle this
particular problem?

Here's an example of the complexity:

Let's say we have a "High Wind Warning" for our headline or hazard. In
addition, there are currently two "High Wind Warnings" in effect. The first
goes from Tonight through Friday morning (i.e., I'll probably store the
begin/end times in seconds from 1/1/1970). It affects three counties all in
the state of Oregon: ORZ047, ORZ048, and ORZ049. The second High Wind
Warning is in effect from Friday at Noon through Friday evening. It affects
two counties in two separate states: ORZ044 in Oregon and WAZ028 in
Washington. Here's the flow chart:

High Wind Warning --> time1 --> state1 --> zone1, zone2, zone3
                  |
                  --> time2 --> state1 --> zone4
                            --> state2 --> zone5

Keep in mind, each headline or hazard can have multiple times. Each time
will have one or more states with each state containing one or more zones.
Is there a better way than a dictionary? As mentioned above, the headline
or hazard is the key I'll be extracting all the information from.

Thanks in advance,

Tom
 
omission9

I'd recommend the mx.DateTime package for storing the times instead of
seconds. That module includes many useful functions you may need, so give
it a look:
http://www.egenix.com/files/python/mxDateTime.html
Secondly, although I am not 100% sure about the stated problem, I would
recommend that instead of nested dictionaries you use a tuple as a
key, say, headlines[(time, state, zone)] = someValue
From what you say above, it would seem that this would create a unique
key for all the mentioned situations.
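
A minimal sketch of the tuple-key idea, using the zones from the original example. The begin/end times are hypothetical values in seconds since 1/1/1970, and storing the headline name as the value is an assumption about what someValue would be.

```python
# Hypothetical (begin, end) times in seconds since 1/1/1970.
time1 = (1078178400, 1078214400)
time2 = (1078228800, 1078257600)

headlines = {}
for time, state, zone in [
    (time1, "OR", "ORZ047"),
    (time1, "OR", "ORZ048"),
    (time1, "OR", "ORZ049"),
    (time2, "OR", "ORZ044"),
    (time2, "WA", "WAZ028"),
]:
    # The whole (time, state, zone) tuple is the dictionary key.
    headlines[(time, state, zone)] = "High Wind Warning"

# Membership tests need the complete tuple.
print((time2, "WA", "WAZ028") in headlines)

# Looking things up by a single field means scanning all the keys.
oregon_zones = sorted(zone for (time, state, zone) in headlines if state == "OR")
print(oregon_zones)
```

One thing to watch with this layout: two different headlines covering the same time, state, and zone would overwrite each other unless the headline is made part of the key as well.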
 
Russell E. Owen

"T. Earle" <[email protected]> said:
...
High Wind Warning --> time1 --> state1 --> zone1, zone2, zone3
                  |
                  --> time2 --> state1 --> zone4
                            --> state2 --> zone5

Keep in mind, each headline or hazard can have multiple times. Each time
will have one or more states with each state containing one or more zones.
Is there a better way than a dictionary? As mentioned above, the headline
or hazard is the key I'll be extracting all the information from.

If you really only want to look up data by headline, then a dictionary
of dictionaries or nested lists or some other kind of collection is easy
and should suffice. For instance:
warndict["High Wind Warning"] = (
    (time1, {
        state1: (zone1, zone2, zone3),
        state2: (zone1, zone3),
    }),
    (time2, {...}),
)

However, I suspect you will also want to be able to locate data by
state, time or zone. If that is true, I really think you should consider
storing the data in a relational database. It sounds like a perfect
match to your problem. Python has some nice interfaces to various
databases (including PostgreSQL and MySQL).
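
As a rough sketch of the relational approach, here is a version using the standard-library sqlite3 module with an in-memory database. SQLite is an assumption made to keep the example self-contained (only PostgreSQL and MySQL are named above), and the table name, column names, and time values are hypothetical.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warnings ("
    " headline TEXT, start_time INTEGER, end_time INTEGER,"
    " state TEXT, zone TEXT)"
)

# One row per (headline, time range, state, zone) combination.
rows = [
    ("High Wind Warning", 1078178400, 1078214400, "OR", "ORZ047"),
    ("High Wind Warning", 1078178400, 1078214400, "OR", "ORZ048"),
    ("High Wind Warning", 1078178400, 1078214400, "OR", "ORZ049"),
    ("High Wind Warning", 1078228800, 1078257600, "OR", "ORZ044"),
    ("High Wind Warning", 1078228800, 1078257600, "WA", "WAZ028"),
]
conn.executemany("INSERT INTO warnings VALUES (?, ?, ?, ?, ?)", rows)


def zones_for_state(state):
    """Return the distinct zones under any warning in the given state."""
    cur = conn.execute(
        "SELECT DISTINCT zone FROM warnings WHERE state = ? ORDER BY zone",
        (state,),
    )
    return [row[0] for row in cur]


def headlines_for_zone(zone):
    """Return the distinct headlines in effect for the given zone."""
    cur = conn.execute(
        "SELECT DISTINCT headline FROM warnings WHERE zone = ? ORDER BY headline",
        (zone,),
    )
    return [row[0] for row in cur]


print(zones_for_state("OR"))
print(headlines_for_zone("WAZ028"))
```

The point of the flat table is that lookups by state, zone, or time become one-line queries instead of walks through nested containers.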

-- Russell

P.S. if you do go with the dictionary, note that it is very easy to make
a variant dictionary that defines
a[key] = foo
to mean "if list a[key] exists, then append foo to that list, otherwise
create a new list with foo as its only element" (in fact my RO package
contains just such a class: RO.Alg.MultiDict -- see <http://www.astro.washington.edu/owen/ROPython.html>)
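
A minimal sketch of such a variant dictionary, for illustration only. AppendingDict is a hypothetical name; this is not the RO package's implementation, just one way the described behavior could be written.

```python
class AppendingDict(dict):
    """Hypothetical dict variant: d[key] = value appends to a list at key."""

    def __setitem__(self, key, value):
        if key in self:
            # __getitem__ is not overridden, so self[key] is the stored list.
            self[key].append(value)
        else:
            # First value for this key: start a new one-element list.
            dict.__setitem__(self, key, [value])


a = AppendingDict()
a["High Wind Warning"] = "time1"
a["High Wind Warning"] = "time2"
a["Tornado Warning"] = "time3"
print(a)
```

Note that only item assignment is changed here; values passed to the constructor or to update() are stored as given, without being wrapped in a list.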
 
T. Earle

Russell,

Russell E. Owen said:
If you really only want to look up data by headline, then a dictionary
of dictionaries or nested lists or some other kind of collection is easy
and should suffice. For instance:
warndict["High Wind Warning"] = (
    (time1, {
        state1: (zone1, zone2, zone3),
        state2: (zone1, zone3),
    }),
    (time2, {...}),
)

This definitely seems to be the structure I've been looking for, or at least
have in mind. Since I'm no expert, could you offer some code examples on how
to create this structure on the fly?

Russell E. Owen said:
However, I suspect you will also want to be able to locate data by
state, time or zone. If that is true, I really think you should consider
storing the data in a relational database. It sounds like a perfect
match to your problem. Python has some nice interfaces to various
databases (including PostgreSQL and MySQL).

My first inclination was to go with a database; however, I thought about it
and concluded there may be too much variability each time the program is
executed. For example, there will be times when there are no headlines;
other times, there will be numerous headlines. Because of this variability,
the database would have to be created from scratch each time the program is
run. As a result, would a database still be the right choice?

I really appreciate your help and suggestions

T. Earle
 
Scott David Daniels

T. Earle said:
Russell,

If you really only want to look up data by headline, then a dictionary
of dictionaries or nested lists or some other kind of collection is easy
and should suffice. For instance:
warndict["High Wind Warning"] = (
    (time1, {
        state1: (zone1, zone2, zone3),
        state2: (zone1, zone3),
    }),
    (time2, {...}),
)
This definitely seems to be the structure I've been looking for, or at least
have in mind. Since I'm no expert, could you offer some code examples on how
to create this structure on the fly?

For something very much (but not quite) like the above:

warndict['High Wind Warning'] = {
    time1: {
        state1: [zone1, zone2, zone3],
        state2: [zone1, zone3]},
    time2: {...},
    ...}

can be built with something like:
warndict = {}
for headline, time, state, zone in somesource:
    timedict = warndict.setdefault(headline, {})
    statedict = timedict.setdefault(time, {})
    stateentry = statedict.setdefault(state, [])
    stateentry.append(zone)
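
As a rough illustration, the loop above can be run on the two warnings from the original post. The contents of somesource (a list of (headline, time, state, zone) tuples) and the string labels used for the times are assumptions.

```python
# Hypothetical input: one (headline, time, state, zone) tuple per zone.
somesource = [
    ("High Wind Warning", "time1", "OR", "ORZ047"),
    ("High Wind Warning", "time1", "OR", "ORZ048"),
    ("High Wind Warning", "time1", "OR", "ORZ049"),
    ("High Wind Warning", "time2", "OR", "ORZ044"),
    ("High Wind Warning", "time2", "WA", "WAZ028"),
]

warndict = {}
for headline, time, state, zone in somesource:
    # setdefault returns the existing value, or stores and returns the default.
    timedict = warndict.setdefault(headline, {})
    statedict = timedict.setdefault(time, {})
    stateentry = statedict.setdefault(state, [])
    stateentry.append(zone)

print(warndict["High Wind Warning"]["time1"])
print(warndict["High Wind Warning"]["time2"])
```

Each setdefault call creates the missing level on first use, so no separate "does this key exist yet" check is needed.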

T. Earle said:
My first inclination was to go with a database; however, I thought about it
and concluded there may be too much variability each time the program is
executed. For example, there will be times when there are no headlines;
other times, there will be numerous headlines. Because of this variability,
the database would have to be created from scratch each time the program is
run. As a result, would a database still be the right choice?

It really depends on the volume of data and the kinds of searches.
Anything under a thousand or so entries will be searchable by simple
brute force in reasonable time, so internal data structures may well
be the way to go.
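
A sketch of what brute-force searching the nested structure might look like. The function names find_by_zone and find_by_state are hypothetical, and the sample data uses string labels in place of real times.

```python
warndict = {
    "High Wind Warning": {
        "time1": {"OR": ["ORZ047", "ORZ048", "ORZ049"]},
        "time2": {"OR": ["ORZ044"], "WA": ["WAZ028"]},
    }
}


def find_by_zone(warndict, wanted_zone):
    """Return (headline, time, state) for every entry listing wanted_zone."""
    return [
        (headline, time, state)
        for headline, timedict in warndict.items()
        for time, statedict in timedict.items()
        for state, zones in statedict.items()
        if wanted_zone in zones
    ]


def find_by_state(warndict, wanted_state):
    """Return (headline, time, zones) for every entry under wanted_state."""
    return [
        (headline, time, statedict[wanted_state])
        for headline, timedict in warndict.items()
        for time, statedict in timedict.items()
        if wanted_state in statedict
    ]


print(find_by_zone(warndict, "WAZ028"))
print(find_by_state(warndict, "OR"))
```

Every search visits the whole structure, which is fine at the sizes discussed above but is the cost that a database index would avoid.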
 
T. Earle

Scott,

I really appreciate your help and code. It really helps me to understand
the underlying solution to my problem. I have another question though,
what's the best way to test if the headline already exists? If it does not,
I need to create it along with the required associated data; however, if it
already exists, I need to test to ensure I'm not already adding data that's
already there (e.g., time and/or state already exists). Basically, I
envision, if the state already exists all I need to do is add the new zone.
I probably should check to make sure the zone doesn't already exist too.
Any help would be greatly appreciated. I believe it would be similar to
what Russell mentioned in his previous response:

"if list a[key] exists, then append; otherwise, create a new list"

Would it be possible to supply a code snippet of this logic to get me
started? What are state and time? Is it possible to use the "key" keyword
on these variables to test for their existence? I apologize for my lack of
knowledge in this particular realm of programming in Python. Nested
dictionaries have always given me trouble.

Thanks,

T. Earle
 
Edward C. Jones

Check out "MultiDict.py" at "http://members.tripod.com/~edcjones/".

Ed Jones
 
Irmen de Jong

T. Earle said:
"if list a[key] exists, then append; otherwise, create a new list"

When a is a dict, key is the required key, and object is the value you want
to insert:

a.setdefault(key,[]).append(object)
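
A sketch of how the same idiom might be extended to the nested case while skipping zones that are already present. The helper name add_zone is hypothetical. Python has no "key" keyword; the "in" operator is what tests whether a key exists in a dict or a value exists in a list.

```python
def add_zone(warndict, headline, time, state, zone):
    """Add zone under headline/time/state, creating levels as needed.

    Returns True if the zone was added, False if it was already there.
    """
    zones = (
        warndict.setdefault(headline, {})
        .setdefault(time, {})
        .setdefault(state, [])
    )
    if zone in zones:
        return False
    zones.append(zone)
    return True


warndict = {}
add_zone(warndict, "High Wind Warning", "time1", "OR", "ORZ047")
add_zone(warndict, "High Wind Warning", "time1", "OR", "ORZ047")  # duplicate, skipped
add_zone(warndict, "High Wind Warning", "time1", "OR", "ORZ048")

print("High Wind Warning" in warndict)  # existence test on the headline key
print(warndict)
```

Storing the zones in a set instead of a list would make the duplicate check implicit, at the cost of losing insertion order.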

--Irmen
 
has

T. Earle said:
To list,

I'm trying to figure out the best approach to the following problem:

I have four variables:
1) headlines
2) times
3) states
4) zones

At this time, I'm thinking of creating a dictionary, headlinesDB, that
stores different headlines and their associated time(s), state(s), and
zone(s). The complexity is that each headline can have one or more times,
one or more states, and one or more zones. However, there can only be 1
zone per time, and 1 zone per state. What is the best way to tackle this
particular problem?

Shake out non-essential complexity first. Not really up on relational
DBs and stuff, so take my attempts at table design with a pinch of
salt, but think I'd break your problem down something like this:


- Hazard Type Table
TYPE
High Wind Warning
Tornado Warning
Blizzard Warning

- Hazard Event Table
ID  TYPE               START                END                  ZONES
1   High Wind Warning  2004-03-01-22-00-00  2004-03-02-08-00-00  [ORZ047, ORZ048, ORZ049]
2   High Wind Warning  2004-03-02-12-00-00  2004-03-02-20-00-00  [ORZ044, WAZ028]

- Zone Table
ZONE STATE
ORZ044 Oregon
ORZ047 Oregon
ORZ048 Oregon
ORZ049 Oregon
WAZ028 Washington


Note this organises around individual hazard 'events', rather than
hazard types, making it easier to see what's going on. Also,
because Zones already identify their States, there's no need to put
State info into hazard events. (State names, if you need them, can be
looked up separately.)

How you actually implement it - as a relational DB/a list of
HazardEvent instances stuffed into a list and brute-force searched via
list comprehensions/nested dicts and lists - really depends on how
you're going to manipulate it, how much flexibility/simplicity you
need, etc.
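
A minimal sketch of the "list of HazardEvent instances searched with list comprehensions" option, using the rows from the tables above. The class layout, attribute names, and the ZONE_STATE dict standing in for the Zone Table are assumptions.

```python
# Stand-in for the Zone Table: each zone identifies its state.
ZONE_STATE = {
    "ORZ044": "Oregon",
    "ORZ047": "Oregon",
    "ORZ048": "Oregon",
    "ORZ049": "Oregon",
    "WAZ028": "Washington",
}


class HazardEvent:
    """One row of the Hazard Event Table."""

    def __init__(self, hazard_type, start, end, zones):
        self.hazard_type = hazard_type
        self.start = start
        self.end = end
        self.zones = list(zones)

    def states(self):
        """Look up the states from the zones instead of storing them."""
        return sorted({ZONE_STATE[zone] for zone in self.zones})


events = [
    HazardEvent("High Wind Warning", "2004-03-01-22-00-00",
                "2004-03-02-08-00-00", ["ORZ047", "ORZ048", "ORZ049"]),
    HazardEvent("High Wind Warning", "2004-03-02-12-00-00",
                "2004-03-02-20-00-00", ["ORZ044", "WAZ028"]),
]

# Brute-force searches via list comprehensions.
washington_events = [e for e in events if "Washington" in e.states()]
events_for_zone = [e for e in events if "ORZ047" in e.zones]

print([e.start for e in washington_events])
print([e.start for e in events_for_zone])
```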

HTH

has
 
Russell E. Owen

T. Earle said:
If you really only want to look up data by headline, then a dictionary
of dictionaries or nested lists or some other kind of collection is easy
and should suffice. For instance:
warndict["High Wind Warning"] = (
    (time1, {
        state1: (zone1, zone2, zone3),
        state2: (zone1, zone3),
    }),
    (time2, {...}),
)

This definitely seems to be the structure I've been looking for, or at least
have in mind. Since I'm no expert, could you offer some code examples on how
to create this structure on the fly?

However, I suspect you will also want to be able to locate data by
state, time or zone. If that is true, I really think you should consider
storing the data in a relational database. It sounds like a perfect
match to your problem. Python has some nice interfaces to various
databases (including PostgreSQL and MySQL).

My first inclination was to go with a database; however, I thought about it
and concluded there may be too much variability each time the program is
executed. For example, there will be times when there are no headlines;
other times, there will be numerous headlines. Because of this variability,
the database would have to be created from scratch each time the program is
run. As a result, would a database still be the right choice?

I really appreciate your help and suggestions

Regarding a database: if you are mainly interested in fairly current
events (rather than being able to go back and search for old events) and
you don't have a huge # of events, then a database does seem "overkill".

However, if you have a lot of events or want to do a lot of searching,
it may be worth keeping a database around. If you use a database, I
recommend creating only one of them. Just add new events, and
occasionally purge old data if you don't care about it anymore.

Here is some sample code (untested) to create the structure shown above.
I assume a simple (for me) structure for the input data; modify
addHeadline accordingly if your data needs more massaging first.

This code exposes the internal data, because the class is itself the
dictionary of data. Whether or not this is a good idea depends on how
you want to search for data. If the built in dict methods are of
interest, then you are all set. If not, I would make HeadDict *contain*
a dict instead of *being* a dict, then write your own methods to
retrieve data.

Download the RO package from http://astro.washington.edu/owen and install
it in site-packages or anywhere on your PYTHONPATH. RO includes
RO.Alg.ListDict, which supports a dictionary whose values are lists and for
which the expression md[key] = value appends "value" to the list associated
with "key", creating a new list if "key" doesn't already exist.

import RO.Alg

class HeadDict(RO.Alg.ListDict):
    def addHeadline(self, headline, time, stateZoneList):
        """Add a headline for a given time. stateZoneList is of the form:
        ((state1, zones_for_state1), (state2, zones_for_state2), ...)
        """
        stateZoneDict = dict(stateZoneList)
        # ListDict appends this (time, dict) pair to the list for headline
        self[headline] = (time, stateZoneDict)

warnDict = HeadDict()
warnDict.addHeadline("High Wind Warning", time1, stateZoneList1)
warnDict.addHeadline("High Wind Warning", time2, stateZoneList2)
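
For comparison, a sketch of the "contain a dict instead of being a dict" alternative mentioned above, written against the standard library only. HeadlineStore, getEvents, and zonesFor are hypothetical names, collections.defaultdict stands in for RO.Alg.ListDict, and the sample times are string labels.

```python
from collections import defaultdict


class HeadlineStore:
    """Holds the data privately and exposes only its own retrieval methods."""

    def __init__(self):
        # headline -> list of (time, {state: zones}) pairs
        self._data = defaultdict(list)

    def addHeadline(self, headline, time, stateZoneList):
        """stateZoneList has the form ((state1, zones1), (state2, zones2), ...)."""
        self._data[headline].append((time, dict(stateZoneList)))

    def getEvents(self, headline):
        # .get avoids creating an empty entry for an unknown headline.
        return list(self._data.get(headline, []))

    def zonesFor(self, headline):
        """Return every zone listed under the headline, sorted, no duplicates."""
        return sorted({
            zone
            for _time, stateZones in self._data.get(headline, [])
            for zones in stateZones.values()
            for zone in zones
        })


store = HeadlineStore()
store.addHeadline("High Wind Warning", "time1",
                  (("OR", ("ORZ047", "ORZ048", "ORZ049")),))
store.addHeadline("High Wind Warning", "time2",
                  (("OR", ("ORZ044",)), ("WA", ("WAZ028",))))

print(store.getEvents("High Wind Warning"))
print(store.zonesFor("High Wind Warning"))
```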

-- Russell
 
