Finding size of Variable


Ayushi Dalmia

On 2014-02-04 14:21, Dave Angel wrote:

To get the "total" size of a list of strings, try (untested):

a = sys.getsizeof(mylist)
for item in mylist:
    a += sys.getsizeof(item)



I always find this sort of accumulation weird (well, at least in
Python; it's the *only* way in many other languages) and would write
it as

a = getsizeof(mylist) + sum(getsizeof(item) for item in mylist)



-tkc
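Tim's one-liner can be wrapped in a small helper; this is a minimal sketch (the `total_size` name is ours, not from the thread):

```python
import sys

def total_size(strings):
    # Size of the list object itself (its pointer array) plus each
    # string object it references.  Shared strings are counted every
    # time they appear, so this is an estimate, not an exact heap figure.
    return sys.getsizeof(strings) + sum(sys.getsizeof(s) for s in strings)

words = "to be or not to be".split()
print(total_size(words))
```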
This also doesn't give the true size. I did the following:

import sys

data = []
f = open('stopWords.txt', 'r')
for line in f:
    line = line.split()
    data.extend(line)

print sys.getsizeof(data)



Did you actually READ either of my posts or Tim's? For a
container, you can't just use getsizeof on the container.

a = sys.getsizeof(data)
for item in data:
    a += sys.getsizeof(item)
print a

Yes, I did. I now understand how to find the size.
 

Dennis Lee Bieber

My guess is that if you split a 4K file into words, then put the words
into a list, you'll probably end up with 6-8K in memory.

I'd guess rather more; Python strings have a fair bit of fixed
overhead, so with a whole lot of small strings, it will get more
costly.
'3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan 5 2014, 16:23:43) [MSC v.1600 32 bit (Intel)]'
>>> sys.getsizeof("asdf")
29
>>> import sys
>>> indata = "221B or not to be seeing you again"
>>> sys.getsizeof(indata)
67
>>> worddata = indata.split()
>>> worddata
['221B', 'or', 'not', 'to', 'be', 'seeing', 'you', 'again']
>>> sys.getsizeof(worddata) + sum(sys.getsizeof(wd) for wd in worddata)
451

That's a 7X expansion for just splitting a single line into a list of
words.
 

wxjmfauth

On Wednesday, 5 February 2014 12:44:47 UTC+1, Chris Angelico wrote:

My guess is that if you split a 4K file into words, then put the words
into a list, you'll probably end up with 6-8K in memory.

I'd guess rather more; Python strings have a fair bit of fixed
overhead, so with a whole lot of small strings, it will get more
costly.

"Stop words" tend to be short, rather than long, words, so I'd look at
an average of 2-3 letters per word. Assuming they're separated by
spaces or newlines, that means there'll be roughly a thousand of them
in the file, for about 25K of overhead. A bit less if the words are
longer, but still quite a bit. (Byte strings have slightly less
overhead, 17 bytes apiece, but still quite a bit.)

ChrisA
>>> sum([sys.getsizeof(c) for c in ['a']])
26
>>> sum([sys.getsizeof(c) for c in ['a', 'a€']])
68
>>> sum([sys.getsizeof(c) for c in ['a', 'a€', 'aa€']])
112
>>> sum([sys.getsizeof(c) for c in ['a', 'a€', 'aa€', 'aaa€']])
158
>>> sum([sys.getsizeof(c) for c in ['a', 'a€', 'aa€', 'aaa€', 'aaaaaaaaaaaaaaaaaaaa€']])
238

>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a']])
21
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a€']])
46
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a€', 'aa€']])
75
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a€', 'aa€', 'aaa€']])
108
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a€', 'aa€', 'aaa€', 'aaaaaaaaaaaaaaaaaaaa€']])
209

>>> sum([sys.getsizeof(c) for c in ['a', 'a€', 'aa€']*3])
336
>>> sum([sys.getsizeof(c) for c in ['aa€aa€']*3])
150
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['a', 'a€', 'aa€']*3])
261
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['aa€aa€']*3])
135

jmf
 

Ned Batchelder

>>> sum([sys.getsizeof(c) for c in ['a', 'a€', 'aa€']*3])
336
>>> sum([sys.getsizeof(c) for c in ['aa€aa€']*3])
150
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['a', 'a€', 'aa€']*3])
261
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['aa€aa€']*3])
135

jmf

JMF, we've told you I-don't-know-how-many-times to stop this.
Seriously: think hard about what your purpose is in sending these absurd
benchmarks. I guarantee you are not accomplishing it.
 

wxjmfauth

On Thursday, 6 February 2014 12:10:08 UTC+1, Ned Batchelder wrote:

JMF, we've told you I-don't-know-how-many-times to stop this.
Seriously: think hard about what your purpose is in sending these absurd
benchmarks. I guarantee you are not accomplishing it.

Sorry, I'm only pointing out that you may lose memory when
working with short strings, as was explained.
I really, very really, do not see what is absurd
or obscure in:

37

I apologize for the garbled euro signs; they should have
been a real "€". No idea what happened.

jmf
 

wxjmfauth

Some mysterious problem with the "euro".
Let's take a real "French" char:

37

or a "German" char, ẞẞẞẞẞ:

37
 

Steven D'Aprano

Sorry, I'm only pointing out that you may lose memory when working with
short strings, as was explained. I really, very really, do not see what
is absurd or obscure in:

37


Why do you care about NINE bytes? The least amount of memory in any PC
that I know about is 500000000 bytes, more than fifty million times more.
And you are whinging about wasting nine bytes?

If you care about that lousy nine bytes, Python is not the language for
you. Go and program in C, where you can spend ten or twenty times longer
programming, but save nine bytes in every string.

Nobody cares about your memory "benchmark" except you. Python is not
designed to save memory, Python is designed to use as much memory as
needed to give the programmer an easier job. In C, I can store a single
integer in a single byte. In Python, horror upon horrors, it takes 14
bytes!!!

py> sys.getsizeof(1)
14

We consider it A GOOD THING that Python spends memory for programmer
convenience and safety. Python looks for memory optimizations when it can
save large amounts of memory, not utterly trivial amounts. So in a Python
wide build, a ten-thousand character string requires a little bit
more than 40KB. In Python 3.3, that can be reduced to only 10KB for a
purely Latin-1 string, or 20KB for a string without any astral characters.
That's the sort of memory savings that are worthwhile, reducing memory
usage by 75%.

Could Python save memory by using UTF-8? Yes. But it would cost
complexity and time, strings would be even slower than they are now. That
is not a trade-off that the core developers have chosen to make, and I
agree with them.
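Steven's 10KB/20KB/40KB figures can be checked directly on a PEP 393 build (Python 3.3 or later); a sketch, with the caveat that the exact per-string header size varies slightly by version and platform:

```python
import sys

# 10,000-character strings in each internal width used since PEP 393:
latin  = 'a' * 10000            # 1 byte/char (Latin-1 range)
bmp    = '\u20ac' * 10000       # 2 bytes/char (BMP, but not Latin-1)
astral = '\U0001F600' * 10000   # 4 bytes/char (astral plane present)

for s in (latin, bmp, astral):
    print(sys.getsizeof(s))    # roughly 10K, 20K and 40K plus a small header
```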
 

Ethan Furman

That is not a trade-off that the core developers have chosen to make,
and I agree with them.

Even though you haven't broken all the build-bots yet, you can still stop saying "them". ;)
 

Mark Lawrence

In C, I can store a single integer in a single byte. In Python, horror
upon horrors, it takes 14 bytes!!!

py> sys.getsizeof(1)
14

This is a C +1 to save memory when compared against this Python +1 :)
 

Rustom Mody

One could argue that if you're parsing a particular file, a very large
one, those 9 bytes can go into the optimization of parsing the
aforementioned file. Of course, we have faster processors, so why care?
Because it goes into the optimization of the code one is 'developing'
in Python.

Yes... there are cases when Python is an inappropriate language to use...
So???

It's good to get a bit of context here.

loop:
jmf says python is inappropriate.
Someone asks him: Is it? In what case?
jmf: No answer
After a delay of a few days, jmp to start of loop

[BTW: In my book this is classic trolling]
 

Chris Angelico

I didn't say she couldn't optimize in another language, and was just
prototyping in Python. I just said she was optimizing her python
code...dufus.

And there are a *lot* of cases where that is inappropriate language to
use. Please don't.

ChrisA
 

Ned Batchelder

On Sat, Feb 8, 2014 at 8:25 PM, Rustom Mody <[email protected]

One could argue that if you're parsing a particular file, a very
large one, those 9 bytes can go into the optimization of parsing
the aforementioned file. Of course, we have faster processors, so
why care? Because it goes into the optimization of the code one is
'developing' in Python.

Yes... There are cases when Python is an inappropriate language to
use... So???

I didn't say she couldn't optimize in another language, and was just
prototyping in Python. I just said she was optimizing her python
code...dufus.

Please keep the discussion respectful. Misunderstandings are easy, I
suspect this is one of them. There's no reason to start calling people
names.
It's good to get a bit of context here.

loop:
jmf says python is inappropriate.
Someone asks him: Is it? In what case?
jmf: No answer
After a delay of a few days, jmp to start of loop

loop:
mov head,up_your_ass
push repeat
pop repeat
jmp loop

Please keep in mind the Code of Conduct:

http://www.python.org/psf/codeofconduct

Thanks.
[BTW: In my book this classic trolling]
And the title of this book would be..."Pieces of Cliche Bullshit
Internet Arguments for Dummies"

 

David Hutto

Maybe I'll just roll my fat, bald, troll arse out from under the bridge,
and comment back, off list, next time.
 

Ned Batchelder

Maybe I'll just roll my fat, bald, troll arse out from under the bridge,
and comment back, off list, next time.

I'm not sure what happened in this thread. It might be that you think
Rustom Mody was referring to you when he said, "BTW: In my book this
classic trolling." I don't think he was, I think he was referring to JMF.

In any case, perhaps it would be best to just take a break?
 

Rustom Mody

I'm not sure what happened in this thread. It might be that you think
Rustom Mody was referring to you when he said, "BTW: In my book this
classic trolling." I don't think he was, I think he was referring to JMF.

Of course!
And given the turn of this thread, we must hand it to jmf for being even better at trolling than I thought :)

See the first para
http://en.wikipedia.org/wiki/Troll_(Internet)
 

wxjmfauth

On Saturday, 8 February 2014 03:48:12 UTC+1, Steven D'Aprano wrote:

We consider it A GOOD THING that Python spends memory for programmer
convenience and safety. Python looks for memory optimizations when it can
save large amounts of memory, not utterly trivial amounts. So in a Python
wide build, a ten-thousand character string requires a little bit
more than 40KB. In Python 3.3, that can be reduced to only 10KB for a
purely Latin-1 string, or 20KB for a string without any astral characters.
That's the sort of memory savings that are worthwhile, reducing memory
usage by 75%.

In its attempt to save memory, Python only succeeds in doing
worse than any utf* coding scheme.

---

Python does not save memory at all. A str (unicode string)
uses less memory only - and only - because and when one uses
explicitly characters which consume less memory.

Not only is the memory gain zero, Python falls back to the
worst case.

4000048

The opposite of what utf8/utf16 do!

2000025


jmf
 

Asaf Las

On Monday, February 10, 2014 4:07:14 PM UTC+2, (e-mail address removed) wrote:
Interesting
here you get string type
and here bytes

Why?
 

Mark Lawrence

On Monday, February 10, 2014 4:07:14 PM UTC+2, (e-mail address removed) wrote:

Interesting
here you get string type
and here bytes

Why?

Please don't feed this particular troll, he's spent 18 months driving us
nuts with his nonsense.
 

Tim Chase

Python does not save memory at all. A str (unicode string)
uses less memory only - and only - because and when one uses
explicitly characters which consume less memory.

Not only is the memory gain zero, Python falls back to the
worst case.

4000048

If Python used UTF-32 for EVERYTHING, then all three of those cases
would be 4000048, so it clearly disproves your claim that "python
does not save memory at all".
The opposite of what the utf8/utf16 do!

2000025

However, as pointed out repeatedly, string indexing in fixed-width
encodings is O(1), while indexing into variable-width encodings (e.g.
UTF-8/UTF-16) is O(N). The FSR gives the benefits of O(1) indexing
while saving space when a string doesn't need a full 32-bit
width.

-tkc
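Tim's indexing point can be illustrated with a short sketch (the sample string is ours, not from the thread):

```python
# Fixed-width internal storage makes s[i] a constant-time array lookup
# that always yields a whole character, even past an astral code point.
# The UTF-8 encoding is often smaller, but locating the i-th character
# in it requires scanning from the start of the byte string.
s = 'abc\U0001F600def'                # one astral char in the middle

print(len(s))                         # 7: counted in code points, not bytes
print(s[3] == '\U0001F600')           # True: O(1), and never half a char
print(len(s.encode('utf-8')))         # 10: 3 + 4 + 3 bytes
```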
 
