improvement for copy.deepcopy: no memo for immutable types

Inquisitive Scientist

I am having problems with running copy.deepcopy on very large data
structures containing lots of numeric data:

1. copy.deepcopy can be very slow
2. copy.deepcopy can cause memory errors even when I have plenty of
memory

I think the problem is that the current implementation keeps a memo
for everything it copies, even immutable types. In addition to being
slow, this makes the memo dict grow very large when there is lots of
simple numeric data to be copied. For long-running programs, large
memo dicts seem to cause memory fragmentation and result in memory
errors.
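
For example, here is a quick way to see what I mean by passing in your
own memo dict (a throwaway example; the exact count you get will depend
on which Python build you run it on):

import copy

# A purely numeric payload: nothing in it actually needs to be copied.
data = [float(i) for i in range(100000)]

memo = {}
copy.deepcopy(data, memo)

# On the interpreter I am using, the memo ends up holding roughly one
# entry per float object, even though every element is immutable.
print(len(data), len(memo))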

It seems like this could be easily fixed by adding the following lines
at the very start of the deepcopy function:

if isinstance(x, (type(None), int, long, float, bool, str)):
    return x
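
To try the idea out without touching copy.py itself, here is a rough,
self-contained toy copier that applies the same shortcut for lists,
dicts and tuples (only a sketch: toy_deepcopy and the sample data are
made up for this post, and the timings conflate the simplified dispatch
with the shortcut itself, so treat the numbers as a rough indication
only):

import copy
import timeit

# Immutable scalar types from the proposal (on Python 3, long is folded
# into int, so it is omitted here).
_ATOMIC = (type(None), int, float, bool, str)

def toy_deepcopy(x, memo=None):
    # The proposed early return: atomic values are handed back as-is
    # and never touch the memo dict.
    if isinstance(x, _ATOMIC):
        return x
    if memo is None:
        memo = {}
    d = id(x)
    if d in memo:
        return memo[d]
    if isinstance(x, list):
        y = memo[d] = []
        y.extend(toy_deepcopy(v, memo) for v in x)
    elif isinstance(x, dict):
        y = memo[d] = {}
        for k, v in x.items():
            y[toy_deepcopy(k, memo)] = toy_deepcopy(v, memo)
    elif isinstance(x, tuple):
        y = tuple(toy_deepcopy(v, memo) for v in x)
        memo[d] = y
    else:
        y = copy.deepcopy(x, memo)  # anything else: defer to the stdlib
    return y

if __name__ == "__main__":
    data = {i: [float(j) for j in range(100)] for i in range(2000)}
    print("copy.deepcopy :", timeit.timeit(lambda: copy.deepcopy(data), number=3))
    print("toy + shortcut:", timeit.timeit(lambda: toy_deepcopy(data), number=3))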

This seems perfectly safe, should speed things up, keep the memo dict
smaller, and be easy to add. Can someone add this to copy.py or point
me to the proper procedure for requesting this change in copy.py?

Thanks,
-I.S.
 

Stefan Behnel

Inquisitive Scientist, 16.07.2010 14:45:
I am having problems with running copy.deepcopy on very large data
structures containing lots of numeric data: [...]

This seems perfectly safe, should speed things up, keep the memo dict
smaller, and be easy to add.

and - have you tried it?

Stefan
 

Steven D'Aprano

I am having problems with running copy.deepcopy on very large data
structures containing lots of numeric data: [...]
This seems perfectly safe, should speed things up, keep the memo dict
smaller, and be easy to add. Can someone add this to copy.py or point me
to the proper procedure for requesting this change in copy.py?

These are the minimum steps you can take:

(1) Go to the Python bug tracker: http://bugs.python.org/

(2) If you don't already have one, create an account.

(3) Create a new bug report, explaining why you think deepcopy is buggy,
the nature of the bug, and your suggested fix.

If you do so, it might be a good idea to post a link to the bug here, for
interested people to follow up.

However, doing the minimum isn't likely to be very useful. Python is
maintained by volunteers, and there are more bugs than person-hours
available to fix them. Consequently, unless a bug is serious,
high-profile, or affects a developer personally, it is likely to be
ignored. Sometimes for years. Sad but true.

You can improve the odds of having the bug (assuming you are right that
it is a bug) fixed by doing more than the minimum. The more of these you
can do, the better the chances:

(4) Create a test that fails with the current code, following the
examples in the standard library tests (a rough sketch of such a test
follows after this list). Confirm that it fails with the existing
module.

(5) Patch the copy module to fix the bug. Confirm that the new test
passes with your patch, and that you don't cause any regressions (failed
tests).

(6) Create a patch file containing both the new test and the fix, and
upload it to the bug tracker.
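
To give a concrete idea of step (4), a test along these lines might do
(only a sketch: the class name, the data size and the threshold in the
assertion are mine, and the real thing should be modelled on the
existing tests in Lib/test/test_copy.py):

import copy
import unittest

class DeepcopyAtomicMemoTest(unittest.TestCase):
    def test_memo_not_filled_for_atomic_values(self):
        # Deep-copying a container of immutable scalars should not
        # leave one memo entry per scalar behind.
        data = [float(i) for i in range(1000)]
        memo = {}
        copy.deepcopy(data, memo)
        # A couple of entries for the list itself and bookkeeping are
        # fine; a thousand extra entries for the floats are not.
        self.assertLess(len(memo), 10)

if __name__ == "__main__":
    unittest.main()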

There's no point in writing the patch for Python 2.5 or 3.0; don't
waste your time. Version 2.6 *might* be accepted; 2.7 and/or 3.1 should
be, provided people agree that it is a bug.

If you do all these things -- demonstrate successfully that this is a
genuine bug, create a test for it, and fix the bug without breaking
anything else -- then you have a good chance of having the fix
accepted.

Good luck! Your first patch is always the hardest.
 

Mark Lawrence

On 16/07/2010 14:59, Steven D'Aprano wrote:

[snip]
However, doing the minimum isn't likely to be very useful. Python is
maintained by volunteers, and there are more bugs than person-hours
available to fix them. Consequently, unless a bug is serious,
high-profile, or affects a developer personally, it is likely to be
ignored. Sometimes for years. Sad but true.

To give people an idea, here's the weekly Summary of Python tracker
Issues posted to python-dev, timestamped 17:07 today.

"
2807 open (+44) / 18285 closed (+18) / 21092 total (+62)

Open issues with patches: 1144

Average duration of open issues: 703 days.
Median duration of open issues: 497 days.

Open Issues Breakdown
open 2765 (+42)
languishing 14 ( +0)
pending 27 ( +2)

Issues Created Or Reopened (64)
"

I've spent a lot of time helping out on the issue tracker over the
last few weeks. The oldest open issue I've come across was dated 2001,
and there could be older ones. Unless more volunteers come forward,
particularly to do patch reviews or similar, the situation as I see it
can only get worse.

Kindest regards.

Mark Lawrence.
 
