PEP 3147 - new .pyc format


Benjamin Peterson

Sean DiZazzo said:
Does "magic" really need to be used? Why not just use the revision
number?

Because magic is easier and otherwise CPython developers would have to rebuild
their pycs every time their working copy was updated.
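For readers who haven't looked inside a .pyc: CPython writes a 4-byte magic number at the start of every compiled file and refuses to use the file if that number doesn't match its own. This is what lets an interpreter recognise usable bytecode without caring which source revision built it. The following is a minimal sketch, assuming Python 3.4+ (for `importlib.util.MAGIC_NUMBER`); the helper names are made up for illustration.

```python
import importlib.util
import os
import py_compile
import tempfile


def pyc_magic(pyc_path):
    """Return the 4-byte magic number stored at the start of a .pyc file."""
    with open(pyc_path, "rb") as f:
        return f.read(4)


def written_by_this_interpreter(pyc_path):
    """True if the .pyc uses the running interpreter's bytecode format."""
    return pyc_magic(pyc_path) == importlib.util.MAGIC_NUMBER


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "example.py")
    with open(src, "w") as f:
        f.write("x = 1\n")
    pyc = py_compile.compile(src, doraise=True)  # returns the .pyc's path
    print(pyc_magic(pyc).hex(), written_by_this_interpreter(pyc))
```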
 

Steven D'Aprano

Based on the magic numbers I've seen so far it looks like that's not an
option. They increment with every minor change.

They increment with every *incompatible* change to the marshal format,
not every change to the compiler.
So to me, at this moment
(and maybe it's my ignorance) it looks like a made-up example to justify
what to me still looks like a bad decision.

Of course it's a made-up example. But with Python now entering a period
where there is a moratorium on changes to the language, is it really so
difficult to imagine that the marshal format will settle down for a while
even as the standard library goes through upgrades?
 

Steven D'Aprano

Ugh... That would mean that for an application using, say, 20 files,
one now has 20 subdirectories for what, in a lot of cases, will contain
just one file each (and since I doubt older Pythons will be modified to
support this scheme, it will only be applicable to 3.x, and maybe
2.7?)

If you only use one version of Python, then don't run it with the -R
switch.

Have you read the PEP? It is quite explicit that the default behaviour
of .pyc files will remain unchanged, that to get the proposed behaviour
you have to specifically ask for it.
 

Daniel Fetchinson

PEP 3147 has just been posted, proposing that, beginning in release
3.2 (and possibly 2.7) compiled .pyc and .pyo files be placed in a
directory with a .pyr extension. The reason is so that compiled
versions of a program can coexist, which isn't possible now.

Frankly, I think this is a really good idea, although I've got a few
comments.

1. Apple's Mac OS X should be mentioned, since 10.5 (and presumably
10.6) ships with both Python releases 2.3 and 2.5 installed.

2. I think the proposed logic is too complex. If this is installed in
3.2, then that release should simply store its .pyc file in the .pyr
directory, without the need for either a command line switch or an
environment variable (both are mentioned in the PEP.)

3. Tool support. There are tools that look for the .pyc files; these
need to be upgraded somehow. The ones that ship with Python should, of
course, be fixed with the PEP, but there are others.

4. I'm in favor of putting the source in the .pyr directory as well,
but that's got a couple more issues. One is tool support, which is
likely to be worse for source, and the other is some kind of algorithm
for identifying which source goes with which object.
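For illustration only, here is a hypothetical helper for the `.pyr` layout described above, with one compiled file per interpreter inside a per-module directory. The file naming inside the `.pyr` directory (the magic number as hex) is an assumption made for this sketch, not taken from the PEP text. For comparison, the scheme CPython eventually shipped in 3.2 uses a shared `__pycache__` directory, whose paths `importlib.util.cache_from_source` (Python 3.4+) computes.

```python
import importlib.util
import os


def hypothetical_pyr_path(source_path):
    """HYPOTHETICAL naming: foo.py -> foo.pyr/<magic-as-hex>.pyc."""
    base, _ext = os.path.splitext(source_path)
    magic_hex = importlib.util.MAGIC_NUMBER.hex()
    return os.path.join(base + ".pyr", magic_hex + ".pyc")


def shipped_cache_path(source_path):
    """Layout CPython actually adopted in 3.2: __pycache__/foo.<tag>.pyc."""
    return importlib.util.cache_from_source(source_path)


src = os.path.join("project", "foo.py")
print(hypothetical_pyr_path(src))
print(shipped_cache_path(src))
```

Under default settings the second line prints something like `project/__pycache__/foo.cpython-312.pyc`, with the tag depending on the interpreter.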

I also think the PEP is a great idea and proposes a solution to a real
problem. But I also hear the 'directory clutter' argument and I'm
really concerned too, having all these extra directories around (and
quite a large number of them indeed!). How about this scheme:

1. install python source files to a shared (among python
installations) location /this/is/shared
2. when python X.Y imports a source file from /this/is/shared it will
create pyc files in its private area /usr/lib/pythonX.Y/site-packages/
Time comparison would be between /this/is/shared/x.py and
/usr/lib/pythonX.Y/site-packages/x.pyc, for instance.

Obviously pythonX.Y needs to know the path to /this/is/shared so it
can import modules from there, but this can be controlled by an
environment variable. There would be only .py files in
/this/is/shared.

Linux distro packagers would only offer a single python-myapp to
install and it would only contain python source, and the
version-specific pyc files would be created the first time the
application is used by python. In /usr/lib/pythonX.Y/site-packages
there would be only pyc files with magic number matching python X.Y.

So, basically nothing would change; only the location of py and pyc
files would be different from the current behavior, but the same algorithm
would be run to determine which one to load, when to create a pyc
file, when to ignore the old one, etc.
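The compiling half of this scheme is easy to sketch, because `py_compile.compile` accepts an explicit `cfile` destination. In the sketch below, `SHARED` and `PRIVATE` are hypothetical stand-ins for `/this/is/shared` and `/usr/lib/pythonX.Y/site-packages/` (temporary directories here), and `private_pyc_for` is a hypothetical helper that maps `x.py` to `x.pyc` by file name, as in the example above.

```python
import os
import py_compile
import tempfile


def private_pyc_for(shared_source, private_dir):
    """Hypothetical mapping from the thread: /shared/x.py -> <private>/x.pyc."""
    name = os.path.splitext(os.path.basename(shared_source))[0]
    return os.path.join(private_dir, name + ".pyc")


def compile_into_private_area(shared_source, private_dir):
    """Byte-compile a shared source file into a version-specific directory."""
    target = private_pyc_for(shared_source, private_dir)
    # py_compile writes wherever cfile points; the source dir is untouched.
    return py_compile.compile(shared_source, cfile=target, doraise=True)


SHARED = tempfile.mkdtemp()   # stands in for /this/is/shared
PRIVATE = tempfile.mkdtemp()  # stands in for /usr/lib/pythonX.Y/site-packages
src = os.path.join(SHARED, "x.py")
with open(src, "w") as f:
    f.write("answer = 42\n")
print(compile_into_private_area(src, PRIVATE))
```

Writing the file is the easy part. The standard importer has no notion of pairing a .py found in one directory with a .pyc stored in another, so the lookup half would need new import machinery.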

What would be wrong with this setup?

Cheers,
Daniel
 

Steven D'Aprano

I also think the PEP is a great idea and proposes a solution to a real
problem. But I also hear the 'directory clutter' argument and I'm really
concerned too, having all these extra directories around (and quite a
large number of them indeed!).

Keep in mind that if you don't explicitly ask for the proposed feature,
you won't see any change at all. You need to run Python with the -R
switch, or set an environment variable. The average developer won't see
any clutter at all unless she is explicitly supporting multiple versions.


How about this scheme:

1. install python source files to a shared (among python installations)
location /this/is/shared
2. when python X.Y imports a source file from /this/is/shared it will
create pyc files in its private area /usr/lib/pythonX.Y/site-packages/

$ touch /usr/lib/python2.5/site-packages/STEVEN
touch: cannot touch `/usr/lib/python2.5/site-packages/STEVEN': Permission denied

There's your first problem: most users don't have write-access to the
private area. When you install a package, you normally do so as root, and
it all works. When you import a module and it gets compiled as a .pyc
file, you're generally running as a regular user.
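Here is a rough sketch of the check a "private area" scheme would need before trying to write there, using `os.access`. The helper name is hypothetical. For comparison, CPython's importer doesn't fail in this situation at all: if it can't write the .pyc it silently carries on with the code it compiled in memory. That means the module is recompiled every time the program starts.

```python
import os
import tempfile


def can_cache_bytecode_in(directory):
    """Return True if the current user could create a .pyc in `directory`.

    os.access() uses the real uid/gid and can be wrong under ACLs or on some
    network filesystems, so treat the answer as a hint, not a guarantee.
    """
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


print(can_cache_bytecode_in(tempfile.gettempdir()))
print(can_cache_bytecode_in("/usr/lib"))  # usually False for a normal user
```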

Time comparison would be between /this/is/shared/x.py and
/usr/lib/pythonX.Y/site-packages/x.pyc, for instance.

I don't quite understand what you mean by "time comparison".


[...]
In /usr/lib/pythonX.Y/site-packages there would be only pyc files with
magic number matching python X.Y.

Personally, I think it is a terrible idea to keep the source file and
byte code file in such radically different places. They should be kept
together. What you call "clutter" I call having the files that belong
together kept together.


So, basically nothing would change; only the location of py and pyc files
would be different from the current behavior, but the same algorithm would
be run to determine which one to load, when to create a pyc file, when
to ignore the old one, etc.

What happens when there is a .pyc file in the same location as the .py
file? Because it *will* happen. Does it get ignored, or does it take
precedence over the site specific file?

Given:

./module.pyc
/usr/lib/pythonX.Y/site-packages/module.pyc

and you execute "import module", which gets used? Note that in this
situation, there may or may not be a module.py file.
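Under today's single-location scheme, `sys.path` order decides the answer. The path-based finder walks the entries in order and the first directory holding a matching module wins, so a stray copy earlier on the path shadows the real one. The sketch below demonstrates this with `importlib.util.find_spec` (Python 3.4+); `demo_mod` and the helper are hypothetical.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def make_module(directory, name, body):
    """Write <directory>/<name>.py containing `body`; return its path."""
    path = os.path.join(directory, name + ".py")
    with open(path, "w") as f:
        f.write(body)
    return path


first, second = tempfile.mkdtemp(), tempfile.mkdtemp()
make_module(first, "demo_mod", "where = 'first'\n")
make_module(second, "demo_mod", "where = 'second'\n")

sys.path[:0] = [first, second]  # `first` is searched before `second`
importlib.invalidate_caches()   # make the path finders notice the new files
spec = importlib.util.find_spec("demo_mod")
print(spec.origin)              # a path inside `first`
```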

What would be wrong with this setup?

Consider:

./module.py
./package/module.py

Under your suggestion, both of these will compile to

/usr/lib/pythonX.Y/site-packages/module.pyc
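The collision is easy to reproduce with a flat, name-only mapping. One way around it, shown here as an illustrative assumption rather than anything proposed in the thread, is to mirror each file's path relative to the source root inside the private area. Both helper names and the `/src` and `/private` roots are hypothetical (POSIX-style paths assumed).

```python
import os


def flat_pyc_path(source, private_root):
    """Name-only mapping: every module.py lands on <private_root>/module.pyc."""
    name = os.path.splitext(os.path.basename(source))[0]
    return os.path.join(private_root, name + ".pyc")


def mirrored_pyc_path(source, source_root, private_root):
    """Keep the path relative to source_root so package structure survives."""
    rel = os.path.relpath(source, source_root)
    return os.path.join(private_root, os.path.splitext(rel)[0] + ".pyc")


root, private = "/src", "/private"
a = os.path.join(root, "module.py")
b = os.path.join(root, "package", "module.py")
print(flat_pyc_path(a, private) == flat_pyc_path(b, private))  # True: collision
print(mirrored_pyc_path(a, root, private), mirrored_pyc_path(b, root, private))
```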
 

Daniel Fetchinson

Keep in mind that if you don't explicitly ask for the proposed feature,
you won't see any change at all. You need to run Python with the -R
switch, or set an environment variable. The average developer won't see
any clutter at all unless she is explicitly supporting multiple versions.




$ touch /usr/lib/python2.5/site-packages/STEVEN
touch: cannot touch `/usr/lib/python2.5/site-packages/STEVEN': Permission denied

There's your first problem: most users don't have write-access to the
private area.

True, I hadn't thought about that (I should have, though).
When you install a package, you normally do so as root, and
it all works. When you import a module and it gets compiled as a .pyc
file, you're generally running as a regular user.



I don't quite understand what you mean by "time comparison".

I meant the comparison of timestamps on .py and .pyc files in order to
determine which is newer and if a recompilation should take place or
not.
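Here is a sketch of that check, modeled on the header CPython 3.7+ writes for timestamp-based .pyc files (PEP 552): 4 bytes of magic, a 4-byte flags word, then the source file's modification time and size as little-endian 32-bit integers. CPython records the source mtime *inside* the .pyc and treats any mismatch as stale, rather than comparing the two files' own timestamps. `read_pyc_header` and `is_stale` are hypothetical helper names.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def read_pyc_header(pyc_path):
    """Parse the 16-byte header of a CPython 3.7+ .pyc file."""
    with open(pyc_path, "rb") as f:
        magic, flags, mtime, size = struct.unpack("<4sIII", f.read(16))
    return magic, flags, mtime, size


def is_stale(source_path, pyc_path):
    """True if the .pyc no longer matches the source it was compiled from."""
    magic, flags, mtime, size = read_pyc_header(pyc_path)
    if magic != importlib.util.MAGIC_NUMBER:
        return True  # written by an interpreter with a different bytecode format
    if flags & 0b1:
        raise ValueError("hash-based .pyc; the timestamp check does not apply")
    st = os.stat(source_path)
    return (mtime != (int(st.st_mtime) & 0xFFFFFFFF)
            or size != (st.st_size & 0xFFFFFFFF))


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "mod.py")
    with open(src, "w") as f:
        f.write("value = 1\n")
    pyc = py_compile.compile(
        src, doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
    print(is_stale(src, pyc))  # False: freshly compiled
    t = os.stat(src).st_mtime + 10
    os.utime(src, (t, t))      # pretend the source was edited later
    print(is_stale(src, pyc))  # True
```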
[...]
In /usr/lib/pythonX.Y/site-packages there would be only pyc files with
magic number matching python X.Y.

Personally, I think it is a terrible idea to keep the source file and
byte code file in such radically different places. They should be kept
together. What you call "clutter" I call having the files that belong
together kept together.

I see why you think so, and it's reasonable; however, there is a compelling
argument, I think, for the opposite view: namely to keep things
separate. An average developer definitely wants easy access to .py
files. However I see no good reason for having access to .pyc files. I
for one have never inspected a .pyc file. Why would you want to have a
.pyc file at hand?

If we don't really want to have .pyc files in convenient locations
because we (almost) never want to access them really, then I'd say
it's a good idea to keep them totally separate and so make sure they
don't get in the way.

What happens when there is a .pyc file in the same location as the .py
file? Because it *will* happen. Does it get ignored, or does it take
precedence over the site specific file?

Given:

./module.pyc
/usr/lib/pythonX.Y/site-packages/module.pyc

and you execute "import module", which gets used? Note that in this
situation, there may or may not be a module.py file.



Consider:

./module.py
./package/module.py

Under your suggestion, both of these will compile to

/usr/lib/pythonX.Y/site-packages/module.pyc

I see the problems with my suggestion. However it would be great if in
some other way the .pyc files could be kept out of the way. Granted, I
don't have a good proposal for this.

Cheers,
Daniel
 

Steven D'Aprano

I see why you think so, and it's reasonable; however, there is a compelling
argument, I think, for the opposite view: namely to keep things
separate. An average developer definitely wants easy access to .py
files. However I see no good reason for having access to .pyc files. I
for one have never inspected a .pyc file. Why would you want to have a
.pyc file at hand?

If you don't care about access to .pyc files, why do you care where they
are? If they are in a subdirectory module.pyr, then shrug and ignore the
subdirectory.

If you (generic you) are one of those developers who don't care
about .pyc files, then when you are browsing your source directory and
see this:


module.py
module.pyc

you just ignore the .pyc file. Or delete it, and Python will re-create it
as needed. So if you see

module.pyr/

just ignore that as well.


If we don't really want to have .pyc files in convenient locations
because we (almost) never want to access them really, then I'd say it's
a good idea to keep them totally separate and so make sure they don't get
in the way.

I like seeing them in the same place as the source file, because when I
start developing a module, I often end up renaming it multiple times
before it settles on a final name. When I rename or move it, I delete
the .pyc file, and that ensures that if I miss changing an import, and
try to import the old name, it will fail.

By hiding the .pyc file elsewhere, it is easy to miss deleting one, and
then the import won't fail, it will succeed, but use the old, obsolete
byte code.
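Steven's failure mode can be reproduced with the importer itself. In this sketch (assuming Python 3.6 or later), a "legacy" .pyc sitting next to where the source used to be is still importable after the .py is deleted. By contrast, an orphaned file inside `__pycache__` (the layout PEP 3147 finally adopted) is ignored once its source is gone. The module names `old_name` and `other_name` and the helper are hypothetical.

```python
import importlib
import os
import py_compile
import sys
import tempfile


def try_import(name):
    """Import `name`, returning the module, or None if it cannot be found."""
    importlib.invalidate_caches()
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        return None


work = tempfile.mkdtemp()
sys.path.insert(0, work)

# 1. Legacy layout: old_name.pyc written right next to old_name.py.
legacy_src = os.path.join(work, "old_name.py")
with open(legacy_src, "w") as f:
    f.write("msg = 'stale bytecode'\n")
py_compile.compile(legacy_src, cfile=os.path.join(work, "old_name.pyc"),
                   doraise=True)
os.remove(legacy_src)            # "rename" the module away
print(try_import("old_name"))    # still imports, from old_name.pyc

# 2. PEP 3147 layout: bytecode under __pycache__, then the source is deleted.
cached_src = os.path.join(work, "other_name.py")
with open(cached_src, "w") as f:
    f.write("msg = 'stale bytecode'\n")
py_compile.compile(cached_src, doraise=True)  # default cfile is in __pycache__
os.remove(cached_src)
print(try_import("other_name"))  # None: the orphaned cache file is ignored
```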
 

Daniel Fetchinson

If you don't care about access to .pyc files, why do you care where they
are? If they are in a subdirectory module.pyr, then shrug and ignore the
subdirectory.

If you (generic you) are one of those developers who don't care
about .pyc files, then when you are browsing your source directory and
see this:


module.py
module.pyc

you just ignore the .pyc file. Or delete it, and Python will re-create it
as needed. So if you see

module.pyr/

just ignore that as well.




I like seeing them in the same place as the source file, because when I
start developing a module, I often end up renaming it multiple times
before it settles on a final name. When I rename or move it, I delete
the .pyc file, and that ensures that if I miss changing an import, and
try to import the old name, it will fail.

By hiding the .pyc file elsewhere, it is easy to miss deleting one, and
then the import won't fail, it will succeed, but use the old, obsolete
byte code.


Okay, I see your point, but I think your argument about importing shows
that python is doing something suboptimal because I have to worry
about .pyc files. Ideally, I would only need to worry about python
source files. There is now a chance to 'fix' (quotation marks because
maybe there is nothing to fix, according to some) this issue and make
all pyc files go away and have python magically do the right
thing. A central pyc repository would be something I was thinking
about, but I admit it's a half-baked or not even that, probably
quarter-baked idea.

Cheers,
Daniel
 

Steven D'Aprano

Okay, I see your point but I think your argument about importing shows
that python is doing something suboptimal because I have to worry about
.pyc files. Ideally, I only would need to worry about python source
files.

That's no different from any language that is compiled: you have to worry
about keeping the compiled code (byte code or machine language) in sync
with the source code.

Python does most of that for you: it automatically recompiles the source
whenever the source code's last modified date stamp is newer than that of
the byte code. So to a first approximation you can forget all about
the .pyc files and just care about the source.

But that's only a first approximation. You might care about the .pyc
files if:

(1) you want to distribute your application in a non-human readable
format;

(2) if you care about clutter in your file system;

(3) if you suspect a bug in the compiler;

(4) if you are working with byte-code hacks;

(5) if the clock on your PC is wonky;

(6) if you leave random .pyc files floating around earlier in the
PYTHONPATH than your source files;

etc.
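For item (1), the standard library's `compileall` module can byte-compile a whole tree. With `legacy=True` (Python 3.2+), it writes each `.pyc` beside its `.py`. That is the classic layout, which can still be imported after the sources are removed. This is a minimal sketch; the `build` directory and the `app` package are hypothetical.

```python
import compileall
import os
import tempfile

build = tempfile.mkdtemp()        # hypothetical build tree
pkg = os.path.join(build, "app")  # hypothetical package name
os.makedirs(pkg)
for name in ("__init__.py", "core.py"):
    with open(os.path.join(pkg, name), "w") as f:
        f.write("VALUE = 1\n")

# legacy=True writes app/core.pyc instead of app/__pycache__/core.<tag>.pyc.
ok = compileall.compile_dir(build, quiet=1, legacy=True)

# Ship bytecode only: remove the sources once compilation has succeeded.
if ok:
    for name in ("__init__.py", "core.py"):
        os.remove(os.path.join(pkg, name))
print(sorted(os.listdir(pkg)))
```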



There is now a chance to 'fix' (quotation marks because maybe
there is nothing to fix, according to some) this issue and make all pyc
files go away and have python magically do the right thing.

Famous last words...

The only ways I can see to have Python magically do the right thing in
all cases would be:

(1) Forget about byte-code compiling, and just treat Python as a purely
interpreted language. If you think Python is slow now...

(2) Compile as we do now, but only keep the byte code in memory. This
would avoid all worries about scattered .pyc files, but would slow Python
down significantly *and* reduce functionality (e.g. losing the ability to
distribute non-source files).

Neither of these is seriously an option.
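Option (2) already exists as an opt-in. The `-B` switch, the `PYTHONDONTWRITEBYTECODE` environment variable, or setting `sys.dont_write_bytecode` each tells CPython to compile imported modules in memory without writing .pyc files. The cost is recompiling every time the program starts. This sketch demonstrates the effect with a hypothetical module `nocache_demo`:

```python
import importlib
import os
import sys
import tempfile

work = tempfile.mkdtemp()
with open(os.path.join(work, "nocache_demo.py"), "w") as f:
    f.write("X = 1\n")

sys.path.insert(0, work)
previous = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # same effect as running python -B
try:
    importlib.invalidate_caches()
    importlib.import_module("nocache_demo")
finally:
    sys.dont_write_bytecode = previous

print(os.listdir(work))         # ['nocache_demo.py'], no __pycache__
```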

A central pyc repository would be something I was thinking about, but I
admit it's a half-baked or not even that, probably quarter-baked idea.

A central .pyc repository doesn't eliminate the issues developers may
have with byte code files, it just puts them somewhere else, out of
sight, where they are more likely to bite.
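For what it's worth, a form of the central repository did later land in CPython. Since Python 3.8, `PYTHONPYCACHEPREFIX`, `-X pycache_prefix=PATH`, or `sys.pycache_prefix` makes the importer write .pyc files into a mirror of the source tree under a single root. The mirror is keyed by each source file's absolute path, which avoids two same-named modules landing on one file. This sketch assumes POSIX-style paths, and the directories are hypothetical.

```python
import importlib.util
import sys

previous = sys.pycache_prefix
sys.pycache_prefix = "/tmp/pyc-cache"  # hypothetical central location
try:
    cached = importlib.util.cache_from_source("/home/user/proj/mod.py")
finally:
    sys.pycache_prefix = previous

print(cached)  # e.g. /tmp/pyc-cache/home/user/proj/mod.cpython-312.pyc
```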
 

Daniel Fetchinson

That's no different from any language that is compiled: you have to worry
about keeping the compiled code (byte code or machine language) in sync
with the source code.

True.

Python does most of that for you: it automatically recompiles the source
whenever the source code's last modified date stamp is newer than that of
the byte code. So to a first approximation you can forget all about
the .pyc files and just care about the source.

True, but the .pyc file is lying around and I always have to do
'ls -al | grep -v pyc' in my python source directory.

But that's only a first approximation. You might care about the .pyc
files if:

(1) you want to distribute your application in a non-human readable
format;

Sure, I do care about pyc files, of course, I just would prefer to
have them at a separate location.

(2) if you care about clutter in your file system;

You mean having an extra directory structure for the pyc files? This I
think would be better than having the pyc files in the source
directory, but we are getting into 'gut feelings' territory :)

(3) if you suspect a bug in the compiler;

If the pyc files are somewhere else you can still inspect them if you want.

(4) if you are working with byte-code hacks;

Again, just because they are somewhere else doesn't mean you can't get to them.

(5) if the clock on your PC is wonky;

Same as above.

(6) if you leave random .pyc files floating around earlier in the
PYTHONPATH than your source files;

etc.





Famous last words...

The only ways I can see to have Python magically do the right thing in
all cases would be:

(1) Forget about byte-code compiling, and just treat Python as a purely
interpreted language. If you think Python is slow now...

I'm not advocating this option, naturally.

(2) Compile as we do now, but only keep the byte code in memory. This
would avoid all worries about scattered .pyc files, but would slow Python
down significantly *and* reduce functionality (e.g. losing the ability to
distribute non-source files).

I'm not advocating this option either.

Neither of these is seriously an option.

Agreed.


A central .pyc repository doesn't eliminate the issues developers may
have with byte code files, it just puts them somewhere else, out of
sight, where they are more likely to bite.

Here is an example: shared object files. If your code needs them, you
can use them easily, you can access them easily if you want to, but
they are not in the directory where you keep your C files. They are
somewhere in /usr/lib for example, where they are conveniently
collected, you can inspect them, look at them, distribute them, do
basically whatever you want, but they are out of the way, and 99% of
the time while you develop your code, you don't need them. In the 1%
of cases you can easily get at them in the centralized location,
/usr/lib in our example.

Of course the relationship between C source files and shared objects
is not parallel to the relationship between python source files and the
created pyc files; please don't nitpick on this point. The analogy is
in the sense that your project inevitably needs for whatever reason
some binary files which are rarely needed at hand, only the
linker/compiler/interpreter/etc needs to know where they are. These
files can be stored separately, but at a location where one can
inspect them if needed (which rarely happens).

Cheers,
Daniel
 

Steven D'Aprano

On Wed, 03 Feb 2010 11:55:57 +0100, Daniel Fetchinson wrote:

[...]
True, but the .pyc file is lying around and I always have to do
'ls -al | grep -v pyc' in my python source directory.


So alias a one-word name to that :)


[...]
Here is an example: shared object files. If your code needs them, you
can use them easily, you can access them easily if you want to, but they
are not in the directory where you keep your C files. They are somewhere
in /usr/lib for example, where they are conveniently collected, you can
inspect them, look at them, distribute them, do basically whatever you
want, but they are out of the way, and 99% of the time while you develop
your code, you don't need them. In the 1% of cases you can easily get
at them in the centralized location, /usr/lib in our example.

Of course the relationship between C source files and shared objects is
not parallel to the relationship between python source files and the created
pyc files; please don't nitpick on this point. The analogy is in the
sense that your project inevitably needs for whatever reason some binary
files which are rarely needed at hand, only the
linker/compiler/interpreter/etc needs to know where they are. These
files can be stored separately, but at a location where one can inspect
them if needed (which rarely happens).

I'll try not to nit-pick :)

When an object file is in /usr/lib, you're dealing with it as a user.
You, or likely someone else, have almost certainly compiled it in a
different directory and then used make to drop it in place. It's now a
library, you're a user of that library, and you don't care where the
object file is so long as your app can find it (until you have a
conflict, and then you do).

While you are actively developing the library, on the other hand, the
compiler typically puts the object file in the same directory as the
source file. (There may be an option to gcc to do otherwise, but surely
most people don't use it often.) While the library is still being
actively developed, the last thing you want is for the object file to be
placed somewhere other than in your working directory. A potentially
unstable or broken library could end up in /usr/lib and stomp all over a
working version. Even if it doesn't, it means you have to be flipping
backwards and forwards between two locations to get anything done.

Python development is much the same; the only(?) differences are that we
have a lower threshold between "in production" and "in development", and
that we typically install both the source and the binary instead of just
the binary.

When you are *using* a library/script/module, you don't care whether
import uses the .py file or the .pyc, and you don't care where they are,
so long as they are in your PYTHONPATH (and there are no conflicts). But
I would argue that while you are *developing* the module, it would be more
nuisance than help to have the .pyc file anywhere other than immediately
next to the .py file (either in the same directory, or in a clearly named
sub-directory).
 

Daniel Fetchinson

True, but the .pyc file is lying around and I always have to do
'ls -al | grep -v pyc' in my python source directory.


So alias a one-word name to that :)


[...]
Here is an example: shared object files. If your code needs them, you
can use them easily, you can access them easily if you want to, but they
are not in the directory where you keep your C files. They are somewhere
in /usr/lib for example, where they are conveniently collected, you can
inspect them, look at them, distribute them, do basically whatever you
want, but they are out of the way, and 99% of the time while you develop
your code, you don't need them. In the 1% of cases you can easily get
at them in the centralized location, /usr/lib in our example.

Of course the relationship between C source files and shared objects is
not parallel to the relationship between python source files and the created
pyc files; please don't nitpick on this point. The analogy is in the
sense that your project inevitably needs for whatever reason some binary
files which are rarely needed at hand, only the
linker/compiler/interpreter/etc needs to know where they are. These
files can be stored separately, but at a location where one can inspect
them if needed (which rarely happens).

I'll try not to nit-pick :)

When an object file is in /usr/lib, you're dealing with it as a user.
You, or likely someone else, have almost certainly compiled it in a
different directory and then used make to drop it in place. It's now a
library, you're a user of that library, and you don't care where the
object file is so long as your app can find it (until you have a
conflict, and then you do).

While you are actively developing the library, on the other hand, the
compiler typically puts the object file in the same directory as the
source file. (There may be an option to gcc to do otherwise, but surely
most people don't use it often.) While the library is still being
actively developed, the last thing you want is for the object file to be
placed somewhere other than in your working directory. A potentially
unstable or broken library could end up in /usr/lib and stomp all over a
working version. Even if it doesn't, it means you have to be flipping
backwards and forwards between two locations to get anything done.

Python development is much the same; the only(?) differences are that we
have a lower threshold between "in production" and "in development", and
that we typically install both the source and the binary instead of just
the binary.

When you are *using* a library/script/module, you don't care whether
import uses the .py file or the .pyc, and you don't care where they are,
so long as they are in your PYTHONPATH (and there are no conflicts). But
I would argue that while you are *developing* the module, it would be more
nuisance than help to have the .pyc file anywhere other than immediately
next to the .py file (either in the same directory, or in a clearly named
sub-directory).

Okay, I think we got to a point where it's more about rationalizing
gut feelings than factual stuff. But that's okay,
system/language/architecture design is oftentimes more about gut
feelings than facts, so nothing to be too surprised about :)

Cheers,
Daniel
 
