Ramchandra Apte
I interpret this as meaning that "a == True" should be special-cased by
the interpreter as "a is True" instead of calling a.__eq__.
Steven you are right.
the is statement could be made into a function
Seeing this thread, I think the is statement should be removed. It has a replacement syntax of id(x) == id(y).
A terrible idea.
Because "is" is a keyword, it is implemented as a fast object comparison
directly in C (for CPython) or Java (for Jython). In the C implementation
"x is y" is *extremely* fast because it is just a pointer comparison
performed directly by the interpreter.
Because id() is a function, it is much slower. And because it is not a
keyword, Python needs to do a name look-up for it, then push the argument
on the stack, call the function (which may not even be the built-in id()
any more!) and then pop back to the caller.
And worst, *it doesn't even do what you think it does*. In some Python
implementations, IDs can be reused. That leads to code like this, from
CPython 2.7:
py> id("spam ham"[1:]) == id("foo bar"[1:])
True
You *cannot* replace is with id() except when the objects are guaranteed
to both be alive at the same time, and even then you *shouldn't* replace
is with id() because that is a pessimation (the opposite of an
optimization -- something that makes code run slower, not faster).
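The point that id is an ordinary name rather than a keyword can be shown with a small sketch. The function name compare_both_ways and the careless local rebinding of id are made up for illustration; nothing here comes from the interpreter source.

```python
def compare_both_ways(a, b):
    """Return (a is b, id(a) == id(b)) with the name id rebound locally."""
    # Hypothetical careless rebinding: "id" is an ordinary name, not a keyword,
    # so nothing stops code from pointing it at something else.
    id = lambda obj: 0
    return (a is b), (id(a) == id(b))


first = [1, 2, 3]
second = [1, 2, 3]  # equal contents, but a separate list object

print(compare_both_ways(first, second))  # (False, True)
print(compare_both_ways(first, first))   # (True, True)
```

The "is" operator cannot be rebound this way, so its result does not depend on what the surrounding code has done to the name id.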
and "a==True" should be automatically changed into memory comparison.
Absolutely not. That would be a backward-incompatible change that would
break existing programs:
py> 1.0 == True
True
py> from decimal import Decimal
py> Decimal("1.0000") == True
True
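To make the difference concrete, here is a minimal sketch that evaluates both comparisons for a few values. The helper name equal_and_identical is an assumption for illustration only.

```python
from decimal import Decimal


def equal_and_identical(value):
    """Return (value == True, value is True) for a given object."""
    flag = True
    return (value == flag), (value is flag)


for value in (True, 1, 1.0, Decimal("1.0000")):
    print(repr(value), equal_and_identical(value))
```

Only True itself is identical to True; the other values merely compare equal to it, which is exactly the behaviour that rewriting "==" into "is" would change.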
the is statement could be made into a function
It is however perfectly possible for one object to be at two or more memory
addresses at the same time.
Steven D'Aprano said:
I interpret this as meaning that "a == True" should be special-cased by
the interpreter as "a is True" instead of calling a.__eq__.
That would break classes which provide their own __eq__() method.
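A sketch of the kind of class that would break. Switch is a hypothetical class invented for this example, not something from the thread or the standard library.

```python
class Switch:
    """Hypothetical class whose instances compare equal to booleans."""

    def __init__(self, on):
        self.on = bool(on)

    def __bool__(self):
        return self.on

    def __eq__(self, other):
        # Compare by truth value against bools and other Switch objects.
        if isinstance(other, (bool, Switch)):
            return self.on == bool(other)
        return NotImplemented


switch = Switch(True)
print(switch == True)  # True, because Switch.__eq__ is called
print(switch is True)  # False, because switch is not the True singleton
```

If the interpreter silently turned the first comparison into the second, code relying on __eq__ would start getting False.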
Yes.
Keep in mind, though, that in some implementations (e.g. Jython), the
physical address may change during the lifetime of an object.
It's usually phrased as "a and b are the same object". If the object
is mutable, then changing a will also change b. If a and b aren't
mutable, then it doesn't really matter whether they share a physical
address.
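A short illustration of "the same object" for a mutable value, using a list as an arbitrary example:

```python
a = [1, 2, 3]
b = a          # b names the same list object as a
c = list(a)    # c is a separate list with equal contents

a.append(4)

print(b)               # [1, 2, 3, 4]
print(c)               # [1, 2, 3]
print(a is b, a is c)  # True False
```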
[got some free time, catching up to threads two months old]
That last sentence is not quite true. intern() is used to ensure that
strings share a physical address to save memory.
Hans Mulder said:
On 5/09/12 15:19:47, Franck Ditter wrote:
- I should have said that I work with Python 3. Does that matter? -
May I reformulate the question: "a is b" and "id(a) == id(b)"
both mean: "a and b share the same physical address". Is that True?
Yes.
Keep in mind, though, that in some implementations (e.g. Jython), the
physical address may change during the lifetime of an object.
It's usually phrased as "a and b are the same object". If the object
is mutable, then changing a will also change b. If a and b aren't
mutable, then it doesn't really matter whether they share a physical
address.
That last sentence is not quite true. intern() is used to ensure that
strings share a physical address to save memory.
That's a matter of perspective: in my book, the primary advantage of
working with interned strings is that I can use 'is' rather than '==' to
test for equality if I know my strings are interned. The space savings
are minor; the time savings may be significant.
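For reference, a minimal sketch of that usage in Python 3, where the built-in intern() lives at sys.intern. The helper build is made up here; it only ensures the strings are created at run time rather than as compile-time constants.

```python
import sys


def build(parts):
    """Build a string at run time so it is not a compile-time constant."""
    return " ".join(parts)


a = sys.intern(build(["spam", "ham"]))
b = sys.intern(build(["spam", "ham"]))

print(a == b)  # True
print(a is b)  # True, both names refer to the single interned string
```

The identity test is only safe when every string involved is known to have gone through sys.intern.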
On Sat, 03 Nov 2012 22:49:07 +0100, Hans Mulder wrote:
That's a matter of perspective: in my book, the primary advantage of
working with interned strings is that I can use 'is' rather than '=='
to test for equality if I know my strings are interned. The space
savings are minor; the time savings may be significant.
Actually, for many applications, the space "savings" may actually be
*costs*, since interning forces Python to hold onto strings even after
they would normally be garbage collected. CPython interns strings that
look like identifiers. It really wouldn't be a good idea for it to
automatically intern every string.
You can make your own intern system with a simple dict:
interned_strings = {}
Then, for every string you care about, do:
s = interned_strings.setdefault(s, s)
to ensure you are always working with a single string object for each
unique value. In some applications that will save time at the expense of
space.
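Put together as a runnable sketch, with the dict method spelled setdefault. The wrapper name intern_string is an assumption for illustration.

```python
interned_strings = {}


def intern_string(s):
    """Return the canonical stored object for the value of s."""
    # setdefault stores s on first sight and returns the stored object after that.
    return interned_strings.setdefault(s, s)


a = intern_string(" ".join(["spam", "ham"]))
b = intern_string(" ".join(["spam", "ham"]))

print(a is b)                 # True
print(len(interned_strings))  # 1
```

Note that the dict keeps every string alive for as long as the dict itself exists, which is the space cost mentioned above.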
And there is no need to write "is" instead of "==", because string
equality already optimizes the "strings are identical" case. By using ==,
you don't get into bad habits, you defend against the odd un-interned
string sneaking in, and you still have high speed equality tests.
This one I haven't checked the source for, but ISTR discussions on
this list about comparison of two unequal interned strings not being
optimized, so they'll end up being compared char-for-char. Using 'is'
guarantees that the check stops with identity. This may or may not be
significant, and as you say, defending against an uninterned string
slipping through is potentially critical.
The source is here (and it shows what you suggest):
http://hg.python.org/cpython/file/6c639a1ff53d/Objects/unicodeobject.c#l6128
Comparing strings char for char is really not that big a deal though.
This has been discussed before: you don't need to compare very many
characters to conclude that strings are unequal (if I remember
correctly you were part of that discussion).
I can imagine cases where I might consider using intern on lots of
strings to speed up comparisons but I would have to be involved in
some seriously heavy and obscure string processing problem before I
considered using 'is' to compare those interned strings. That is
confusing to anyone who reads the code, prone to bugs and unlikely to
achieve the desired outcome of speeding things up (noticeably).
/* Shortcut for empty or interned objects */
if (v == u) {
    Py_DECREF(u);
    Py_DECREF(v);
    return 0;
}
result = unicode_compare(u, v);
where v and u are pointers to the unicode object.
There's a shortcut if they're the same. There's no shortcut if they're
both interned and have different pointers, which is a guarantee that
they're distinct strings. They'll still be compared char-for-char
until there's a difference.
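A simplified Python model of that behaviour may help. This is not the CPython implementation, only a sketch of the logic described: an identity shortcut, then a character-by-character scan that stops at the first difference. The name equal_with_count is made up.

```python
def equal_with_count(u, v):
    """Return (strings_equal, characters_examined) for two strings."""
    if u is v:
        # Shortcut: the very same object needs no character comparison.
        return True, 0
    examined = 0
    for cu, cv in zip(u, v):
        examined += 1
        if cu != cv:
            return False, examined
    # One string may be a prefix of the other, so lengths decide.
    return len(u) == len(v), examined


s = "spam and eggs"
print(equal_with_count(s, s))              # (True, 0)
print(equal_with_count("xspam", "yspam"))  # (False, 1)
print(equal_with_count("spam", "spat"))    # (False, 4)
```

Unequal strings usually differ early, so the scan tends to be short, which is why the char-for-char comparison is rarely a big deal.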
The function id(x) might not be implemented
True. In principle, some day there might be a version of Python that runs
on some exotic quantum computer where the very concept of "physical
address" is meaningless. Or some sort of peptide or DNA computer, where
the calculations are performed via molecular interactions rather than by
flipping bits in fixed memory locations.
But less exotically, Franck isn't entirely wrong. With current day
computers, it is reasonable to say that any object has exactly one
physical location at any time. In Jython, objects can move around; in
CPython, they can't. But at any moment, any object has a specific
location, and no other object can have that same location. Two objects
cannot both be at the same memory address at the same time.
So, for current day computers at least, it is reasonable to say that
"a is b" implies that a and b are the same object at a single location.
The second half of the question is more complex:
"id(a) == id(b)" *only* implies that a and b are the same object at the
same location if they exist at the same time. If they don't exist at the
same time, then you can't conclude anything.
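A small sketch of the "exist at the same time" condition. The variable names are arbitrary, and the second count is implementation-dependent, so no expected value is given for it.

```python
# Objects kept alive together: every id must be distinct.
live = [object() for _ in range(1000)]
live_ids = {id(obj) for obj in live}
print(len(live_ids))  # 1000

# Objects that are discarded right after id() is taken: ids may be reused,
# so the number of distinct values can be far smaller than 1000.
transient_ids = {id(object()) for _ in range(1000)}
print(len(transient_ids))
```

In the second case an equal id says nothing about whether two objects were the same, because the earlier object no longer exists when the later one is created.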
Without looking at the code, I'm pretty sure there's a hash check first.