braver
Greetings: I wonder how one uses single-name variables to refer
to nested subhashes (subdictionaries). Here's an example:
In [41]: orig = { 'abra':{'foo':7, 'bar':9}, 'ca':{}, 'dabra':{'baz':4} }
In [42]: orig
Out[42]: {'abra': {'bar': 9, 'foo': 7}, 'ca': {}, 'dabra': {'baz': 4}}
In [43]: h = orig['ca']
In [44]: h = { 'adanac':69 }
In [45]: h
Out[45]: {'adanac': 69}
In [46]: orig
Out[46]: {'abra': {'bar': 9, 'foo': 7}, 'ca': {}, 'dabra': {'baz': 4}}
I want to change orig['ca'], which is determined somewhere else in the
program's logic, where subhashes are referred to as h (e.g., in a loop
for x in orig: ...). But assigning to h doesn't change orig.
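A minimal sketch of what is going on, mirroring the session above: h = orig['ca'] makes h a second name for the same dict object, and a plain assignment to h only rebinds that name. Mutating the dict through h, or assigning through the parent dict and key, does change orig. The names parent and key are illustrative only.

```python
# Rebinding vs. mutating, using the same data as the IPython session.
orig = {'abra': {'foo': 7, 'bar': 9}, 'ca': {}, 'dabra': {'baz': 4}}

h = orig['ca']            # h and orig['ca'] now name the same dict object
h = {'adanac': 69}        # rebinding: h now names a brand-new dict
print(orig['ca'])         # {} (orig is untouched)

h = orig['ca']
h['adanac'] = 69          # mutation: changes the shared dict in place
print(orig['ca'])         # {'adanac': 69}

# To swap in a different object, assign through the parent dict and key.
parent, key = orig, 'ca'
parent[key] = {'replaced': 1}
print(orig['ca'])         # {'replaced': 1}
```

In short, keeping a reference to the containing dict plus the key is what lets code elsewhere replace a subhash; a bare name cannot.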
The real-life motivation for this is n-gram counting. Say you want to
maintain a hash for bigrams. For each pair of consecutive words a, b in a
text, you do
bigram_count[a][b] += 1
Note that you do want nested subhashes here, as they reduce memory
usage dramatically.
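For the two-level case, a sketch using collections.defaultdict (my choice, not something from the post), where counts[a][b] is the number of times b follows a. The line counts[a][b] += 1 works because counts[a] returns the inner dict object itself, and += on a subscript stores the result back into that dict rather than rebinding a name. count_bigrams is a hypothetical helper name.

```python
from collections import defaultdict

def count_bigrams(words):
    """Count bigrams as a two-level dict: counts[a][b] -> times b follows a."""
    # Missing first words get a fresh inner dict whose counts default to 0.
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

text = "the cat sat on the mat the cat ran".split()
bigrams = count_bigrams(text)
print(bigrams['the']['cat'])  # 2
```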
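To generalize this to N-grams, you want to do something
like this:
h = bigram_count
N = len(ngram)
# iterating over i, not word, to notice the last i
for i in range(N):
    word = ngram[i]
    if word not in h:
        if i < N - 1:
            h[word] = {}   # inner levels are dicts
        else:
            h[word] = 0    # the last level holds the count
    h = h[word]
h += 1  # rebinds the local name h only; the count stored in the dict is unchanged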
This doesn't work and is just a sketch. Also, if at any level we have to
create a new, empty subhash, we can short-circuit: vivify all remaining
levels at once and add 1 at the lowest (count) level.
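One common way around the rebinding problem, offered as a sketch rather than as the poster's method: walk down only to the parent of the last level, then increment through that parent dict and the final key, since subscript assignment mutates the dict while h += 1 on a bare name does not. dict.setdefault creates missing levels on the way down, which covers the "vivify the remaining levels" case without a separate branch. add_ngram is a hypothetical helper name.

```python
def add_ngram(counts, ngram):
    """Increment the count of ngram (a sequence of N >= 1 words) in a nested dict.

    Levels 0..N-2 are dicts keyed by word; the last level maps the final word to an int.
    """
    h = counts
    for word in ngram[:-1]:
        # setdefault returns the existing subdict, or inserts and returns a new empty one.
        h = h.setdefault(word, {})
    last = ngram[-1]
    # Update through the parent dict and key, so the stored count really changes.
    h[last] = h.get(last, 0) + 1

counts = {}
words = "a rose is a rose is a rose".split()
N = 3
for i in range(len(words) - N + 1):
    add_ngram(counts, words[i:i + N])
print(counts['a']['rose']['is'])  # 2
```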
Yet since Python names are not exactly references, something else is
needed for a generalized multi-level n-gram counting hash. What?
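Another option, again only a sketch: let collections.defaultdict do the autovivification, with a fixed depth N (an assumption) so that the innermost level can hold plain int counts. nested_counter and increment are hypothetical helper names. functools.reduce with operator.getitem walks to the innermost dict, creating missing levels on the way, and the final += goes through that dict and key.

```python
from collections import defaultdict
from functools import partial, reduce
import operator

def nested_counter(depth):
    """Return a depth-level autovivifying dict whose innermost values default to 0."""
    if depth == 1:
        return defaultdict(int)
    # Each missing key creates a fresh counter one level shallower.
    return defaultdict(partial(nested_counter, depth - 1))

def increment(counts, ngram):
    """Add 1 to the count for ngram (its length must equal the counter's depth)."""
    # Walk to the innermost dict; missing levels are created on the way down.
    parent = reduce(operator.getitem, ngram[:-1], counts)
    # Subscript assignment updates the dict itself, not a local name.
    parent[ngram[-1]] += 1

N = 3
counts = nested_counter(N)
words = "to be or not to be or not".split()
for i in range(len(words) - N + 1):
    increment(counts, words[i:i + N])
print(counts['to']['be']['or'])  # 2
```

One design caveat: merely reading a missing n-gram from a defaultdict inserts empty levels, so read-only lookups should test membership with in, level by level.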
Cheers,
Alexy