save dictionary to a file without brackets.

  • Thread starter giuseppe.amatulli
  • Start date
C

Chris Angelico

O(n) for all other entries in the dict which suffer a hash collision
with the searched entry.

True, a sensible choice of hash function will reduce n to 1 in common
cases, but it becomes an important consideration for larger datasets.

I'm glad you're wrong for CPython's dictionaries. The only time the
lookup would degenerate to O[n] would be if the hash table had only one
slot. CPython sensibly increases the hash table size when it becomes
too small for efficiency.

Where have you seen dictionaries so poorly implemented?

In vanilla CPython up to version (I think) 3.3, where it's possible to
DoS the hash generator. Hash collisions are always possible, just
ridiculously unlikely unless deliberately exploited.

(And yes, I know an option was added to older versions to randomize
the hashes there too. It's not active by default, so "vanilla CPython"
is still vulnerable.)

ChrisA
 
A

Andrew Cooper

Sligtly off topic, but looking up a value in a dictionary is actually
O(n) for all other entries in the dict which suffer a hash collision
with the searched entry.

True, a sensible choice of hash function will reduce n to 1 in common
cases, but it becomes an important consideration for larger datasets.

~Andrew

I'm glad you're wrong for CPython's dictionaries. The only time the
lookup would degenerate to O[n] would be if the hash table had only one
slot. CPython sensibly increases the hash table size when it becomes
too small for efficiency.


Where have you seen dictionaries so poorly implemented?

Different n, which I should have made more clear. I was using it for
consistency with O() notation. My statement was O(n) where n is the
number of hash collisions.

The choice of hash algorithm (or several depending on the
implementation) should specifically be chosen to reduce collisions to
aid in efficient space utilisation and lookup times, but any
implementation must allow for collisions. There are certainly runtime
methods of improving efficiency using amortized operations.

As for poor implementations,

class Foo(object):

...

def __hash__(self):
return 0

I seriously found that in some older code I had the misfortune of
reading. It didn't remain in that state for long.

~Andrew
 
C

Chris Angelico

On 08/09/2012 06:03 PM, Andrew Cooper wrote:
I'm glad you're wrong for CPython's dictionaries. The only time the
lookup would degenerate to O[n] would be if the hash table had only one
slot. CPython sensibly increases the hash table size when it becomes
too small for efficiency.

Where have you seen dictionaries so poorly implemented?

PHP?

http://www.phpclasses.org/blog/post/171-PHP-Vulnerability-May-Halt-Millions-of-Servers.html

That's the same hash collision attack that I alluded to above, and it
strikes *many* language implementations. Most released a patch fairly
quickly and quietly (Pike, Lua, V8 (JavaScript/ECMAScript), PHP), but
CPython dared not, on account of various applications depending on
hash order (at least for tests). It's not (for once) an indictment of
PHP (maybe that should be an "inarrayment"?), it's a consequence of a
hashing algorithm that favored simplicity over cryptographic
qualities.

(It feels weird to be defending PHP...)

ChrisA
 
R

Roy Smith

Andrew Cooper said:
As for poor implementations,

class Foo(object):
def __hash__(self):
return 0

I seriously found that in some older code I had the misfortune of
reading.

Python assumes you are a consenting adult. If you wish to engage in
activities which are hazardous to your health, so be it. But then
again, you could commit this particular stupidity just as easily in C++
or any other language which lets you define your own hash() function.
 
C

Chris Angelico

Python assumes you are a consenting adult. If you wish to engage in
activities which are hazardous to your health, so be it.

.... you mean, Python lets you make a hash of it?

*ducks for cover*

ChrisA
 
M

Mark Lawrence

Only if you order it with spam, spam, spam, spam, spam, spam, and spam.

Now now gentlemen we're getting slightly off topic here and wouldn't
want to upset the people who insist on staying on topic. Or would we? :)
 
D

Dave Angel

On 09/08/2012 22:34, Roman Vashkevich wrote:
Actually, they are different.
Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
Dict uses hashing to get a value from the dict and this is why it's O(1).

Sligtly off topic, but looking up a value in a dictionary is actually
O(n) for all other entries in the dict which suffer a hash collision
with the searched entry.

True, a sensible choice of hash function will reduce n to 1 in common
cases, but it becomes an important consideration for larger datasets.

~Andrew
I'm glad you're wrong for CPython's dictionaries. The only time the
lookup would degenerate to O[n] would be if the hash table had only one
slot. CPython sensibly increases the hash table size when it becomes
too small for efficiency.


Where have you seen dictionaries so poorly implemented?
Different n, which I should have made more clear. I was using it for
consistency with O() notation. My statement was O(n) where n is the
number of hash collisions.
That's a little like doing a survey, and reporting the results as
showing that 100% of the women hit their husbands, among the population
of women who hit their husbands.

In your original message, you already stated the assumption that a
proper hash algorithm would be chosen, then went on to apparently claim
that large datasets would still have an order n problem. That last is
what I was challenging.

The rest of your message here refers to client code, not the system.
 
D

Dave Angel

O(n) for all other entries in the dict which suffer a hash collision
with the searched entry.

True, a sensible choice of hash function will reduce n to 1 in common
cases, but it becomes an important consideration for larger datasets.
I'm glad you're wrong for CPython's dictionaries. The only time the
lookup would degenerate to O[n] would be if the hash table had only one
slot. CPython sensibly increases the hash table size when it becomes
too small for efficiency.

Where have you seen dictionaries so poorly implemented?
In vanilla CPython up to version (I think) 3.3, where it's possible to
DoS the hash generator. Hash collisions are always possible, just
ridiculously unlikely unless deliberately exploited.

(And yes, I know an option was added to older versions to randomize
the hashes there too. It's not active by default, so "vanilla CPython"
is still vulnerable.)

ChrisA

Thank you to you and others, who have corrected my over-general
response. I was not intending to claim anything about either a
deliberate DOS, nor a foolishly chosen hash function. But the message I
was replying to seemed to me to claim that for large datasets, even a
good hash algorithm would end up giving O(n) performance.
 
T

Tim Chase

Now now gentlemen we're getting slightly off topic here and wouldn't
want to upset the people who insist on staying on topic. Or would we? :)

We apologise for the off-topicness in the thread. Those responsible
have been sacked...

-tkc
 
C

Chris Angelico

We apologise for the off-topicness in the thread. Those responsible
have been sacked...

So if you take every mapping variable in your program and name them
"dFoo", "dBar", "dQuux", etc, for "dict"... would that be a dirty
Hungarian dictionary?

Excuse me, I'll go and sack myself now.

ChrisA
 
8

88888 Dihedral

Andrew Cooperæ–¼ 2012å¹´8月10日星期五UTC+8上åˆ6時03分26秒寫é“:
Sligtly off topic, but looking up a value in a dictionary is actually

O(n) for all other entries in the dict which suffer a hash collision

with the searched entry.



True, a sensible choice of hash function will reduce n to 1 in common

cases, but it becomes an important consideration for larger datasets.



~Andrew

This is the classical problem of storing the hash collision items one by one.
Those items should be stored by some order.
 
M

Mark Lawrence

Sacked? They were beaten to death with a large halibut!

Well whatever you do *DON'T* mention Cython. I mentioned it just now but
I think I've got away with it.
 
8

88888 Dihedral

Dave Angelæ–¼ 2012å¹´8月10日星期五UTC+8上åˆ5時47分45秒寫é“:
Actually, they are different.
Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
Dict uses hashing to get a value from the dict and this is why it's O(1).



Sure, that's why



for key in dict:

print key[0], key[1], dict[key]



is probably slower than



for (edge1, edge2), cost in d.iteritems(): # or .items()

print edge1, edge2, cost





So, the latter is both faster and easier to read. Why are you arguing against it?



Also, please stop top-posting. It's impolite here, and makes it much harder to figure out who is saying what, in what order.







--



DaveA

OK, lets estimate the hash colision rate first.

For those items hashed to the same key, I'll store a sorted list with a
known lenth m to be accessed in O(LOG(M)).

Of couse another hash can be attatched.
 
8

88888 Dihedral

Dave Angelæ–¼ 2012å¹´8月10日星期五UTC+8上åˆ5時47分45秒寫é“:
Actually, they are different.
Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
Dict uses hashing to get a value from the dict and this is why it's O(1).



Sure, that's why



for key in dict:

print key[0], key[1], dict[key]



is probably slower than



for (edge1, edge2), cost in d.iteritems(): # or .items()

print edge1, edge2, cost





So, the latter is both faster and easier to read. Why are you arguing against it?



Also, please stop top-posting. It's impolite here, and makes it much harder to figure out who is saying what, in what order.







--



DaveA

OK, lets estimate the hash colision rate first.

For those items hashed to the same key, I'll store a sorted list with a
known lenth m to be accessed in O(LOG(M)).

Of couse another hash can be attatched.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,146
Messages
2,570,831
Members
47,374
Latest member
anuragag27

Latest Threads

Top