MemoryError in data conversion

M

Mok-Kong Shen

The code attached below produces in one of the two IMHO similar cases
(excepting the sizes of the lists involved) MemoryError. Could experts
kindly tell why that's so and whether there is any work-around feasible.

Thanks in advances.

M. K. Shen

-----------------------------------------------------------------

import ast

def buildhuffmantree(slist,flist):
item=slist[:]
freq=flist[:]
while len(item)>2:
mn=min(freq)
id=freq.index(mn)
u=item[id]
del item[id]
del freq[id]
mn1=min(freq)
id=freq.index(mn1)
v=item[id]
del item[id]
del freq[id]
item.append([u,v])
freq.append(mn+mn1)
return(item)

def processing(slist,flist):
bintree=buildhuffmantree(slist,flist)
print(bintree)
byarray=bytearray(str(bintree),"latin-1")
bintree1=ast.literal_eval(byarray.decode("latin-1"))
print(bintree1)
print(bintree==bintree1)

slist1=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 'eof']

flist1=[18, 16, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, -1]

slist2=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110,
111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123,
124, 125, 126, 127, 'eof']

flist2=[2, 2, 0, 2, 0, 0, 1, 2, 1, 0, 2, 0, 0, 1, 1, 0, 2, 0, 0, 0, 1,
0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 2, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, -1]

processing(slist1,flist1) ### This works fine
print()

processing(slist2,flist2) ### This leads to MemoryError
 
D

dieter

Mok-Kong Shen said:
The code attached below produces in one of the two IMHO similar cases
(excepting the sizes of the lists involved) MemoryError. Could experts
kindly tell why that's so and whether there is any work-around feasible.

"MemoryError" means: the Python process wants more memory from the
operating system than this can give.

Your options:

* increase the memory resources (RAM, swap space) of your system

* check the memory related configuration of your operating system
(there may be a limit for memory allocated to processes -
try to increase this)

* change your algorithm such that less memory is needed
 
P

Peter Otten

Mok-Kong Shen said:
The code attached below produces in one of the two IMHO similar cases
(excepting the sizes of the lists involved) MemoryError. Could experts
kindly tell why that's so and whether there is any work-around feasible.

Here's a simpler way to reproduce the error:
.... return "[" * n + "42" + "]" * n
....
ast.literal_eval(nested_list_literal(98)) [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[42]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
ast.literal_eval(nested_list_literal(99))
s_push: parser stack overflow
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.3/ast.py", line 47, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/usr/lib/python3.3/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
MemoryError

You ran into a limitation of the compiler. For us to suggest a workaround
you'd have to explain why you want to convert the list returned from
buildhuffmantree() into python source code and back.
Thanks in advances.

M. K. Shen

-----------------------------------------------------------------

import ast

def buildhuffmantree(slist,flist):
item=slist[:]
freq=flist[:]
while len(item)>2:
mn=min(freq)
id=freq.index(mn)
u=item[id]
del item[id]
del freq[id]
mn1=min(freq)
id=freq.index(mn1)
v=item[id]
del item[id]
del freq[id]
item.append([u,v])
freq.append(mn+mn1)
return(item)

def processing(slist,flist):
bintree=buildhuffmantree(slist,flist)
print(bintree)
byarray=bytearray(str(bintree),"latin-1")
bintree1=ast.literal_eval(byarray.decode("latin-1"))
print(bintree1)
print(bintree==bintree1)

slist1=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 'eof']

flist1=[18, 16, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, -1]

slist2=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110,
111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123,
124, 125, 126, 127, 'eof']

flist2=[2, 2, 0, 2, 0, 0, 1, 2, 1, 0, 2, 0, 0, 1, 1, 0, 2, 0, 0, 0, 1,
0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 2, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, -1]

processing(slist1,flist1) ### This works fine
print()

processing(slist2,flist2) ### This leads to MemoryError
 
M

Mok-Kong Shen

Am 14.04.2014 09:46, schrieb Peter Otten:
You ran into a limitation of the compiler. For us to suggest a workaround
you'd have to explain why you want to convert the list returned from
buildhuffmantree() into python source code and back.

That list gives the Huffman encoding tree for compressing a given piece
of source text. I am writing a Python code to implement an algorithm
(not new, being first sketched in the literature since decades but yet
having no publically available implementation as far as I am aware) of
encryption processing that has Huffman data compression as its major
constituent. Now, for good security against cryptanalysis, this list
(which has to be included in the ciphertext for decryption by the
recipient) has to be well scrambled in some way. I choose to use 8-bit
bytes as units for the scrambling. Hence I convert the list to a
bytearray for performing scrambling. On decryption I reverse the
scrambling and get back the original bytearray and use ast to recover
from it the list so as to be able to do the decompression. Hopefully
this description is sufficiently clear.

M. K. Shen
 
D

Dave Angel

Mok-Kong Shen said:
The code attached below produces in one of the two IMHO similar cases
(excepting the sizes of the lists involved) MemoryError. Could experts
kindly tell why that's so and whether there is any work-around feasible.

Where's your stack trace for the error? If it happens that it
gets the error in the Ast call, then examine byarray. I expect
it's too large or too complex for the compiler.
 
P

Peter Otten

Mok-Kong Shen said:
Am 14.04.2014 09:46, schrieb Peter Otten:


That list gives the Huffman encoding tree for compressing a given piece
of source text. I am writing a Python code to implement an algorithm
(not new, being first sketched in the literature since decades but yet
having no publically available implementation as far as I am aware) of
encryption processing that has Huffman data compression as its major
constituent. Now, for good security against cryptanalysis, this list
(which has to be included in the ciphertext for decryption by the
recipient) has to be well scrambled in some way. I choose to use 8-bit
bytes as units for the scrambling. Hence I convert the list to a
bytearray for performing scrambling. On decryption I reverse the
scrambling and get back the original bytearray and use ast to recover
from it the list so as to be able to do the decompression. Hopefully
this description is sufficiently clear.

You could use json, but you may run into the same problem with that, too
(only later):
import json
items = []
for i in range(1000):
.... s = json.dumps(items)
.... items = [items]
....
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/lib/python3.3/json/__init__.py", line 236, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.3/json/encoder.py", line 191, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.3/json/encoder.py", line 249, in iterencode
return _iterencode(o, 0)
RuntimeError: maximum recursion depth exceeded while encoding a JSON object995

The safest option is probably to serialize the original flist and slist, and
use them to create the tree on the fly.
 
M

Mok-Kong Shen

Am 14.04.2014 15:59, schrieb Peter Otten:
You could use json, but you may run into the same problem with that, too
(only later):
import json
items = []
for i in range(1000):
... s = json.dumps(items)
... items = [items]
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/lib/python3.3/json/__init__.py", line 236, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.3/json/encoder.py", line 191, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.3/json/encoder.py", line 249, in iterencode
return _iterencode(o, 0)
RuntimeError: maximum recursion depth exceeded while encoding a JSON object995

The safest option is probably to serialize the original flist and slist, and
use them to create the tree on the fly.

Thank you very much for your efforts to help me.

I have yet a question out of curiosity: Why is my 2nd list structure,
that apparently is too complex for handling by eval and json, seemingly
not a problem for pickle?

M. K. Shen
 
G

Gregory Ewing

Mok-Kong Shen said:
I have yet a question out of curiosity: Why is my 2nd list structure,
that apparently is too complex for handling by eval and json, seemingly
not a problem for pickle?

Pickle is intended for arbitrary data structures, so it
is designed to be able to handle deeply-nested and/or
recursive data. Eval only has to handle nesting to depths
likely to be encountered in source code. Apparently the
json parser also assumes you're not going to be using
very deep nesting.
 
M

Mok-Kong Shen

Am 15.04.2014 01:51, schrieb Gregory Ewing:
Pickle is intended for arbitrary data structures, so it
is designed to be able to handle deeply-nested and/or
recursive data. Eval only has to handle nesting to depths
likely to be encountered in source code. Apparently the
json parser also assumes you're not going to be using
very deep nesting.

What I need is to have my (complicated) list to be put into
a bytearray, do some proceesing, transfer it to the recipient.
The recipient reverses the processing, obtains a bytearray
that is the same as my original one and gets from it the same
list as mine. Using pickle I can manage to do it (in fact I
did try and succeed with my 2nd list), but that's IMHO a
rather unnatural/awkward work-around. (It means that I have
to pickle out the list to a file and read in the content of
the file in order to have it as a bytearray etc. etc.)

M. K. Shen
 
P

Peter Otten

Gregory said:
Pickle is intended for arbitrary data structures, so it
is designed to be able to handle deeply-nested and/or
recursive data. Eval only has to handle nesting to depths
likely to be encountered in source code. Apparently the
json parser also assumes you're not going to be using
very deep nesting.

But pickle does have the same limitation:
.... items = []
.... try:
.... for i in range(10**6):
.... assert load(dump(items)) == items
.... items = [items]
.... except RuntimeError:
.... return i
....499

Mok-Kong Shen, for pickle and json you can increase the limit a bit:
999

But be careful, if you choose the limit too high you'll see Python react
like any other C program:
sys.setrecursionlimit(100000)
items = []
for i in range(100000):
.... items = [items]
....Segmentation fault

For literal_eval() the limit is unfortunately hard-coded in the C source.

Mok-Kong Shen said:
What I need is to have my (complicated) list to be put into
a bytearray, do some proceesing, transfer it to the recipient.
The recipient reverses the processing, obtains a bytearray
that is the same as my original one and gets from it the same
list as mine. Using pickle I can manage to do it (in fact I
did try and succeed with my 2nd list), but that's IMHO a
rather unnatural/awkward work-around. (It means that I have
to pickle out the list to a file and read in the content of
the file in order to have it as a bytearray etc. etc.)

The limit has been increased before, see

http://bugs.python.org/issue1881

and maybe you can get the core developers to increase it again, but
generally speaking the existence of a recursion limit is the price you pay
for easy interfacing with C.

literal_eval() could allocate its stack dynamically, but that would probably
complicate the code so that return on investment is questionable.
 
G

Gregory Ewing

Mok-Kong Shen said:
(It means that I have
to pickle out the list to a file and read in the content of
the file in order to have it as a bytearray etc. etc.)

No, you don't -- pickle.dumps() returns the pickled
data as a bytes object instead of writing it to a file.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,154
Members
46,702
Latest member
LukasConde

Latest Threads

Top