is implemented with id ?

Ramchandra Apte

I interpret this as meaning that "a == True" should be special-cased by
the interpreter as "a is True" instead of calling a.__eq__.

Steven, you are right.
 
Ramchandra Apte

Seeing this thread, I think the is statement should be removed. It has a
replacement syntax of id(x) == id(y).



A terrible idea.

Because "is" is a keyword, it is implemented as a fast object comparison
directly in C (for CPython) or Java (for Jython). In the C implementation
"x is y" is *extremely* fast because it is just a pointer comparison
performed directly by the interpreter.

Because id() is a function, it is much slower. And because it is not a
keyword, Python needs to do a name look-up for it, then push the argument
on the stack, call the function (which may not even be the built-in id()
any more!) and then pop back to the caller.
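
One way to see the extra work described above is to look at the bytecode. This is only a minimal sketch, assuming a recent CPython 3; the helper name bytecode_ops is made up for illustration, and the exact opcode names and counts vary between versions.

```python
import dis

def bytecode_ops(source):
    """Return (opname, argval) pairs for an expression compiled in eval mode."""
    code = compile(source, "<example>", "eval")
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(code)]

is_ops = bytecode_ops("x is y")
id_ops = bytecode_ops("id(x) == id(y)")

# The id() version has to look up the name "id" at run time (twice) and
# make two calls; the "is" version never mentions "id" at all.
print(len(is_ops), "instructions for: x is y")
print(len(id_ops), "instructions for: id(x) == id(y)")
print([op for op in id_ops if op[1] == "id"])
```

Because "id" is looked up by name each time, rebinding it anywhere in scope silently changes what the second expression does.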



And worst, *it doesn't even do what you think it does*. In some Python
implementations, IDs can be reused. That leads to code like this, from
CPython 2.7:

py> id("spam ham"[1:]) == id("foo bar"[1:])
True



You *cannot* replace is with id() except when the objects are guaranteed
to both be alive at the same time, and even then you *shouldn't* replace
is with id() because that is a pessimation (the opposite of an
optimization -- something that makes code run slower, not faster).
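
The "alive at the same time" condition can be sketched like this. The function and variable names are invented for illustration, and the result of the first comparison is deliberately not stated because it depends on the implementation and on what the allocator happens to do.

```python
def ids_of_temporaries_match():
    # Each object() becomes garbage as soon as id() returns, so the second
    # one may be created at the address the first just vacated. Whether this
    # returns True is implementation-dependent; CPython often says True.
    return id(object()) == id(object())

first = object()
second = object()

# While both objects are alive, their ids are guaranteed to differ.
both_alive_distinct = id(first) != id(second)

print(ids_of_temporaries_match())
print(both_alive_distinct)
```

Only the second comparison tells you anything reliable, and "first is not second" says the same thing more directly.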




and "a==True" should be automatically changed into memory comparison.



Absolutely not. That would be a backward-incompatible change that would
break existing programs:

py> 1.0 == True
True
py> from decimal import Decimal
py> Decimal("1.0000") == True
True

the is statement could be made into a function
 
Chris Angelico

the is statement could be made into a function

It's not a statement, it's an operator; and functions have far more
overhead than direct operators. There's little benefit in making 'is'
into a function, and high cost; unlike 'print', whose cost is
dominated by the cost of producing output to a console or similar
device, 'is' would be dominated by the cost of name lookups and
function call overhead.

ChrisA
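
For what it's worth, a function form already exists in the standard library as operator.is_, so nothing is gained by turning the operator itself into a function. A small sketch (the variable names are just for illustration):

```python
import operator

a = [1, 2, 3]
b = a          # same object as a
c = list(a)    # equal value, but a different object

# operator.is_(x, y) gives the same answer as "x is y", but pays for a
# name lookup and a function call every time it is used.
print(operator.is_(a, b), a is b)
print(operator.is_(a, c), a is c)
```

The function form is handy when you need to pass the comparison around as a callable; for everything else the operator is the cheaper spelling.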
 
Chris Angelico

It is however perfectly possible for one object to be at two or more memory
addresses at the same time.

And of course, memory addresses have to be taken as per-process, since
it's entirely possible for two processes to reuse addresses. But I
think all these considerations of object identity are made with the
assumption that we're working within a single Python process.

ChrisA
 
Roy Smith

Steven D'Aprano said:
I interpret this as meaning that "a == True" should be special-cased by
the interpreter as "a is True" instead of calling a.__eq__.

That would break classes which provide their own __eq__() method.
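
A sketch of the kind of breakage meant here. Switch is a hypothetical class made up for this example; the other values are the ones already mentioned earlier in the thread.

```python
from decimal import Decimal

class Switch:
    """Hypothetical class that defines its own notion of equality with bools."""

    def __init__(self, on):
        self.on = on

    def __eq__(self, other):
        if isinstance(other, bool):
            return self.on == other
        return NotImplemented

candidates = [True, 1, 1.0, Decimal("1.0000"), Switch(True)]

# "==" asks each value (via __eq__); "is" only asks whether the value
# is the True singleton itself.
equal_to_true = [value == True for value in candidates]
identical_to_true = [value is True for value in candidates]

print(equal_to_true)
print(identical_to_true)
```

Every entry compares equal to True, but only the first one is the True object, so rewriting "==" as "is" would change the result for the other four.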
 
Aahz

[got some free time, catching up to threads two months old]

Yes.

Keep in mind, though, that in some implementations (e.g. Jython), the
physical address may change during the lifetime of an object.

It's usually phrased as "a and b are the same object". If the object
is mutable, then changing a will also change b. If a and b aren't
mutable, then it doesn't really matter whether they share a physical
address.

That last sentence is not quite true. intern() is used to ensure that
strings share a physical address to save memory.
 
Hans Mulder

[got some free time, catching up to threads two months old]

Yes.

Keep in mind, though, that in some implementations (e.g. Jython), the
physical address may change during the lifetime of an object.

It's usually phrased as "a and b are the same object". If the object
is mutable, then changing a will also change b. If a and b aren't
mutable, then it doesn't really matter whether they share a physical
address.

That last sentence is not quite true. intern() is used to ensure that
strings share a physical address to save memory.

That's a matter of perspective: in my book, the primary advantage of
working with interned strings is that I can use 'is' rather than '=='
to test for equality if I know my strings are interned. The space
savings are minor; the time savings may be significant.

-- HansM
 
Steven D'Aprano

[got some free time, catching up to threads two months old]

Hans Mulder said:
On 5/09/12 15:19:47, Franck Ditter wrote:

- I should have said that I work with Python 3. Does that matter? -
May I reformulate the question: "a is b" and "id(a) == id(b)"
both mean: "a and b share the same physical address". Is that true?

Yes.

Keep in mind, though, that in some implementations (e.g. Jython), the
physical address may change during the lifetime of an object.

It's usually phrased as "a and b are the same object". If the object
is mutable, then changing a will also change b. If a and b aren't
mutable, then it doesn't really matter whether they share a physical
address.

That last sentence is not quite true. intern() is used to ensure that
strings share a physical address to save memory.

That's a matter of perspective: in my book, the primary advantage of
working with interned strings is that I can use 'is' rather than '==' to
test for equality if I know my strings are interned. The space savings
are minor; the time savings may be significant.

Actually, for many applications, the space "savings" may actually be
*costs*, since interning forces Python to hold onto strings even after
they would normally be garbage collected. CPython interns strings that
look like identifiers. It really wouldn't be a good idea for it to
automatically intern every string.

You can make your own intern system with a simple dict:

interned_strings = {}

Then, for every string you care about, do:

s = interned_strings.setdefault(s, s)

to ensure you are always working with a single string object for each
unique value. In some applications that will save time at the expense of
space.

And there is no need to write "is" instead of "==", because string
equality already optimizes the "strings are identical" case. By using ==,
you don't get into bad habits, you defend against the odd un-interned
string sneaking in, and you still have high speed equality tests.
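
A runnable sketch of the dict-based approach described above, using dict.setdefault. The helper name canonical is made up for illustration.

```python
interned_strings = {}

def canonical(s):
    """Return the one stored string object equal to s, storing s if it is new."""
    return interned_strings.setdefault(s, s)

# Build two equal strings at run time so they start out as separate objects
# (on CPython; the language itself does not promise that).
a = "".join(["spam", " ", "ham"])
b = "".join(["spam", " ", "ham"])

a = canonical(a)
b = canonical(b)

print(a is b)
print(len(interned_strings))
```

After both names have been passed through canonical() they refer to the same object, and the dict holds one entry for that value.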
 
Roy Smith

Hans Mulder said:
That's a matter of perspective: in my book, the primary advantage of
working with interned strings is that I can use 'is' rather than '=='
to test for equality if I know my strings are interned. The space
savings are minor; the time savings may be significant.

Depending on your problem domain, the space savings may be considerable.
 
Chris Angelico

On Sat, 03 Nov 2012 22:49:07 +0100, Hans Mulder wrote:
Actually, for many applications, the space "savings" may actually be
*costs*, since interning forces Python to hold onto strings even after
they would normally be garbage collected. CPython interns strings that
look like identifiers. It really wouldn't be a good idea for it to
automatically intern every string.

I don't know about that.

/* This dictionary holds all interned unicode strings. Note that references
to strings in this dictionary are *not* counted in the string's ob_refcnt.
When the interned string reaches a refcnt of 0 the string deallocation
function will delete the reference from this dictionary.

Another way to look at this is that to say that the actual reference
count of a string is: s->ob_refcnt + (s->state ? 2 : 0)
*/
static PyObject *interned;

Empirical testing (on a Linux build of Python 3.3a0 that I had lying around) showed
the process's memory usage drop, but I closed the terminal before
copying and pasting (oops). Attempting to recreate in IDLE on 3.2 on
Windows.
--> MemoryError. Blah. This is what I get for only having a gig and a
half in this laptop. And I was working with 1024*1024*1024 on the
other box. Start over...

Memory usage (according to Task Mangler) goes up to ~512MB when I
create a new string (like c), then back down to ~256MB when I intern
it. So far so good.

Memory usage has dropped to 12MB. Unnecessarily-interned strings don't
cost anything. (The source does refer to immortal interned strings,
but AFAIK you can't create them in user-level code. At least, I didn't
find it in help(sys.intern), which is the obvious place to look.)

You can make your own intern system with a simple dict:

interned_strings = {}

Then, for every string you care about, do:

s = interned_strings.setdefault(s, s)

to ensure you are always working with a single string object for each
unique value. In some applications that will save time at the expense of
space.

Doing it manually like this _will_ leak like that, though, unless you
periodically check sys.getrefcount and dispose of unreferenced
entries.

And there is no need to write "is" instead of "==", because string
equality already optimizes the "strings are identical" case. By using ==,
you don't get into bad habits, you defend against the odd un-interned
string sneaking in, and you still have high speed equality tests.

This one I haven't checked the source for, but ISTR discussions on
this list about comparison of two unequal interned strings not being
optimized, so they'll end up being compared char-for-char. Using 'is'
guarantees that the check stops with identity. This may or may not be
significant, and as you say, defending against an uninterned string
slipping through is potentially critical.

ChrisA
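
For comparison with the hand-rolled dict, the built-in route in Python 3 is sys.intern. A minimal sketch, with made-up variable names:

```python
import sys

words = ["alpha", "beta"]

# Build equal strings at run time; on CPython these are typically
# distinct objects until they are interned.
x = "-".join(words)
y = "-".join(words)

ix = sys.intern(x)
iy = sys.intern(y)

print(x == y)
print(ix is iy)
```

sys.intern returns the same object for equal strings, so the identity test on the interned results succeeds, while "==" works whether or not interning has happened.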
 
Oscar Benjamin

This one I haven't checked the source for, but ISTR discussions on
this list about comparison of two unequal interned strings not being
optimized, so they'll end up being compared char-for-char. Using 'is'
guarantees that the check stops with identity. This may or may not be
significant, and as you say, defending against an uninterned string
slipping through is potentially critical.

The source is here (and it shows what you suggest):
http://hg.python.org/cpython/file/6c639a1ff53d/Objects/unicodeobject.c#l6128

Comparing strings char for char is really not that big a deal though.
This has been discussed before: you don't need to compare very many
characters to conclude that strings are unequal (if I remember
correctly you were part of that discussion).

I can imagine cases where I might consider using intern on lots of
strings to speed up comparisons but I would have to be involved in
some seriously heavy and obscure string processing problem before I
considered using 'is' to compare those interned strings. That is
confusing to anyone who reads the code, prone to bugs and unlikely to
achieve the desired outcome of speeding things up (noticeably).


Oscar
 
Chris Angelico

The source is here (and it shows what you suggest):
http://hg.python.org/cpython/file/6c639a1ff53d/Objects/unicodeobject.c#l6128

Comparing strings char for char is really not that big a deal though.
This has been discussed before: you don't need to compare very many
characters to conclude that strings are unequal (if I remember
correctly you were part of that discussion).

Yes, and a quite wide-ranging discussion it was too! What color did we
end up whitewashing that bikeshed? *whistles innocently*

I can imagine cases where I might consider using intern on lots of
strings to speed up comparisons but I would have to be involved in
some seriously heavy and obscure string processing problem before I
considered using 'is' to compare those interned strings. That is
confusing to anyone who reads the code, prone to bugs and unlikely to
achieve the desired outcome of speeding things up (noticeably).

Good point. It's still true that 'is' will be faster, it's just not worth it.

ChrisA
 
Steven D'Aprano

The source is here (and it shows what you suggest):
http://hg.python.org/cpython/file/6c639a1ff53d/Objects/unicodeobject.c#l6128

I don't think it does, although I could be wrong, I find reading C to be
quite difficult.

The unicode_compare function compares character by character, true, but
it doesn't get called directly. The public interface is
PyUnicode_Compare, which includes this test before calling
unicode_compare:

/* Shortcut for empty or interned objects */
if (v == u) {
    Py_DECREF(u);
    Py_DECREF(v);
    return 0;
}
result = unicode_compare(u, v);

where v and u are pointers to the unicode object.

So it appears that the test for strings being of equal length has been
dropped, but the identity test is still present.

Comparing strings char for char is really not that big a deal though.

Depends on how big the strings are and where the first difference is.

This has been discussed before: you don't need to compare very many
characters to conclude that strings are unequal (if I remember correctly
you were part of that discussion).

On average. Worst case, you have to look at every character.
 
Chris Angelico

/* Shortcut for empty or interned objects */
if (v == u) {
    Py_DECREF(u);
    Py_DECREF(v);
    return 0;
}
result = unicode_compare(u, v);

where v and u are pointers to the unicode object.

There's a shortcut if they're the same. There's no shortcut if they're
both interned and have different pointers, which is a guarantee that
they're distinct strings. They'll still be compared char-for-char
until there's a difference.

But it probably isn't enough of a performance penalty to be concerned
with. It's enough to technically prove the point that 'is' is faster
than '==' and is still safe if both strings are interned; it's not
enough to make 'is' better than '==', except in very specific
situations.

ChrisA
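
To make the point concrete, here is a rough pure-Python model of that logic. It is only a sketch of the behaviour described above, not CPython's actual code; model_compare and its character counter are invented for illustration.

```python
def model_compare(u, v):
    """Return (result, chars_examined) for a comparison with an identity shortcut."""
    if u is v:
        # Same object: no characters need to be looked at.
        return 0, 0
    examined = 0
    for cu, cv in zip(u, v):
        examined += 1
        if cu != cv:
            return (-1 if cu < cv else 1), examined
    # One string is a prefix of the other (or they are equal): decide by length.
    return (len(u) > len(v)) - (len(u) < len(v)), examined

s = "x" * 1000 + "a"
t = "x" * 1000 + "b"

print(model_compare(s, s))
print(model_compare(s, t))
```

The first call stops at the identity check; the second has to walk the whole common prefix before it finds the difference, which is the worst case being discussed.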
 
Aahz

Actually, for many applications, the space "savings" may actually be
*costs*, since interning forces Python to hold onto strings even after
they would normally be garbage collected.

That's old news, fixed in 2.5 or 2.6 IIRC -- interned strings now get
collected by refcounting like everything else.
 
Aahz

There's a shortcut if they're the same. There's no shortcut if they're
both interned and have different pointers, which is a guarantee that
they're distinct strings. They'll still be compared char-for-char
until there's a difference.

Without looking at the code, I'm pretty sure there's a hash check first.
 
Aahz

That's a matter of perspective: in my book, the primary advantage of
working with interned strings is that I can use 'is' rather than '=='
to test for equality if I know my strings are interned. The space
savings are minor; the time savings may be significant.

As others have pointed out, using ``is`` with strings is a Bad Habit
likely leading to nasty, hard-to-find bugs.

intern() costs time, but saves considerable space in any application
with lots of duplicate computed strings (hundreds of megabytes in some
cases).
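
A small sketch of where the saving comes from, counting objects rather than bytes. The "user-N" strings are made-up sample data, and how many separate objects exist before interning is a CPython detail, not a language guarantee.

```python
import sys

# 1000 strings computed at run time, but only 10 distinct values.
computed = ["user-{}".format(i % 10) for i in range(1000)]
interned = [sys.intern(s) for s in computed]

# Every string is still referenced from a list, so ids are not reused
# here and counting distinct ids counts distinct live objects.
objects_before = len({id(s) for s in computed})
objects_after = len({id(s) for s in interned})

print(objects_before, objects_after)
```

After interning there is one object per distinct value; once the un-interned list is dropped, the duplicates can be freed.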
 
88888 Dihedral

True. In principle, some day there might be a version of Python that runs
on some exotic quantum computer where the very concept of "physical
address" is meaningless. Or some sort of peptide or DNA computer, where
the calculations are performed via molecular interactions rather than by
flipping bits in fixed memory locations.

But less exotically, Frank isn't entirely wrong. With current day
computers, it is reasonable to say that any object has exactly one
physical location at any time. In Jython, objects can move around; in
CPython, they can't. But at any moment, any object has a specific
location, and no other object can have that same location. Two objects
cannot both be at the same memory address at the same time.

So, for current day computers at least, it is reasonable to say that
"a is b" implies that a and b are the same object at a single location.



The second half of the question is more complex:

"id(a) == id(b)" *only* implies that a and b are the same object at the
same location if they exist at the same time. If they don't exist at the
same time, then you can't conclude anything.

The function id(x) might not be implemented
as an address in user space.

Do we need to distinguish archived objects from
objects in memory?
 
Hans Mulder

Without looking at the code, I'm pretty sure there's a hash check first.

In 3.3, there is no such check.

It was recently proposed on python-dev to add such a check,
but AFAIK, no action was taken.

-- HansM
 
