Finding size of Variable

N

Ned Batchelder

If Python used UTF-32 for EVERYTHING, then all three of those cases
would be 4000048, so it clearly disproves your claim that "python
does not save memory at all".


However, as pointed out repeatedly, string-indexing in fixed-width
encodings are O(1) while indexing into variable-width encodings (e.g.
UTF8/UTF16) are O(N). The FSR gives the benefits of O(1) indexing
while saving space when a string doesn't need to use a full 32-bit
width.

-tkc

Please don't engage in this debate with JMF. His mind is made up, and
he will not be swayed, no matter how persuasive and reasonable your
arguments. Just ignore him.
 
N

Neil Cerutti

Please don't engage in this debate with JMF. His mind is made
up, and he will not be swayed, no matter how persuasive and
reasonable your arguments. Just ignore him.

I think reasonable criticisms should be contested no matter who
posts them. I agree jmf shouldn't be singled out for abuse,
summoned, insulted, or have his few controversial opinions
brought into other topics. Tim's post was responding to a
specific, well-presented criticism of Python's string
implementation. Left unchallenged, it might linger unhappily in
the air, like a symphony ended on a dominant 7th chord.
 
W

wxjmfauth

Le lundi 10 février 2014 15:43:08 UTC+1, Tim Chase a écrit :
If Python used UTF-32 for EVERYTHING, then all three of those cases

would be 4000048, so it clearly disproves your claim that "python

does not save memory at all".







However, as pointed out repeatedly, string-indexing in fixed-width

encodings are O(1) while indexing into variable-width encodings (e.g.

UTF8/UTF16) are O(N). The FSR gives the benefits of O(1) indexing

while saving space when a string doesn't need to use a full 32-bit

width.

A utf optimizes the memory and the performance at the same time.
It behaves like a mathematical operator, a unique operator for
a unique set of elements. Unbeatable.

The FSR is an exclusive or mechanism. I you wish to
same memory, you have to encode, and if you are encoding,
maybe because you have to, one loses performance. Paradoxal.

Your O(1) indexing works only and only because and
when you are working explicitly with a "static" unicode
string you never touch.
It's a little bit the the "corresponding" performance
case of the memory case.

jmf
 
M

Mark Lawrence

Le lundi 10 février 2014 15:43:08 UTC+1, Tim Chase a écrit :

A utf optimizes the memory and the performance at the same time.
It behaves like a mathematical operator, a unique operator for
a unique set of elements. Unbeatable.

The FSR is an exclusive or mechanism. I you wish to
same memory, you have to encode, and if you are encoding,
maybe because you have to, one loses performance. Paradoxal.

Your O(1) indexing works only and only because and
when you are working explicitly with a "static" unicode
string you never touch.
It's a little bit the the "corresponding" performance
case of the memory case.

jmf

Why are you so rude as to continually post your nonsense here that not a
single person believes, and at the same time still quite deliberately
use gg to post it with double line spacing. If you lack the courtesy to
stop the former, please have the courtesy to stop the latter.
 
W

wxjmfauth

Le mardi 11 février 2014 20:04:02 UTC+1, Mark Lawrence a écrit :
Why are you so rude as to continually post your nonsense here that not a

single person believes, and at the same time still quite deliberately

use gg to post it with double line spacing. If you lack the courtesy to

stop the former, please have the courtesy to stop the latter.



--

My fellow Pythonistas, ask not what our language can do for you, ask

what you can do for our language.


Nonsense?
-1


The day you find an operator working on the set of
reals (R) and it is somehow "optimized" for N
(the subset of natural numbers), let me know.

A conflict is quickly appearing. Either the operator is
not correctly defined or the choice of the set is wrong.

You can replace the "operator" with an "encoding" and
the "set" with a "repertoire of characters".

It's the main reason, why we have to live today with
all these coding schemes. Even in more sophisticated
cases like, CID-fonts or "char boxes" in a pdf (with the
hope you understand how it works).

jmf
 
C

Chris Angelico

The day you find an operator working on the set of
reals (R) and it is somehow "optimized" for N
(the subset of natural numbers), let me know.

I have yet to find any computer that works with the set of real
numbers in any way. Never mind optimization, they simply cannot work
with real numbers.

As to operations that are optimized for integers (usually not for
naturals - supporting zero and negatives isn't hard), they are legion.
In Python, integers have arbitrary precision, but floats, Fractions,
and Decimals, don't. Nearly any operation on arbitrarily large numbers
will be either more accurate or more efficient (maybe both) with
integers than with any of the other types.

Letting you know, that's all.

ChrisA
 
W

wxjmfauth

Integers are integers. (1)
Characters are characters. (2)

(1) is a unique "natural" set.

(2) is an artificial construct working
with 3 sets (unicode).

jmf
 
W

wxjmfauth

Le mercredi 12 février 2014 09:35:38 UTC+1, (e-mail address removed) a écrit :
Integers are integers. (1)

Characters are characters. (2)



(1) is a unique "natural" set.



(2) is an artificial construct working

with 3 sets (unicode).



jmf

Addendum: One should not confuse unicode and the implementation
of unicode.

jmf
 
C

Chris Angelico

Not *any* computer? Not in *any* way? The Python built-in ‘float’ type
“works with the set of real numbersâ€, in a way.

No, the Python built-in float type works with a subset of real numbers:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
float("pi")
ValueError: could not convert string to float: 'pi'Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
float("Ï€")
ValueError: could not convert string to float: 'Ï€'

Same goes for fractions.Fraction and [c]decimal.Decimal. All of them
are restricted to some subset of rational numbers, not all reals.
The <URL:http://docs.python.org/2/library/numbers.html#numbers.Real> ABC
defines behaviours for types implementing the set of real numbers.

What specific behaviour would, for you, qualify as “works with the set
of real numbers in any way�

Being able to represent surds, pi, e, etc, for a start. It'd
theoretically be possible with an algebraic notation (eg by carrying
through some representation like "2*pi" rather than 6.28....), but
otherwise, irrationals can't be represented with finite storage and a
digit-based system.

ChrisA
 
J

Jussi Piitulainen

Chris said:
....

In Python, integers have arbitrary precision, but floats, Fractions,
and Decimals, don't. Nearly any operation on arbitrarily large
numbers will be either more accurate or more efficient (maybe both)
with integers than with any of the other types.

Is that true about Fractions?
 
C

Chris Angelico

So, if I understand you right, you want to say that you've not found a
computer that works with the *complete* set of real numbers. Yes?

Correct. When jmf referred to real numbers, he implied that there are
no optimizations done for natural numbers, that everything's just as
efficient for any real number as for any other. My point is that
computers *do not* work with real numbers, but only ever with some
subset thereof, and that certain subsets (integers usually) are
optimized for in ways that other subsets aren't.

A true "real number" type might be useful in a few extremely narrow
situations, but for the most part, I'd much rather have the optimized
implementation that works with a subset thereof, and actually runs
within reasonable time/space complexity. (Though, that said, I think a
lot of programmers could do with some education on exactly _what_
subset of real numbers they're working with. The classic IEEE
double-precision floating point type is good enough with low numbers
that lots of people seem to think it stores reals, which it doesn't.)

ChrisA
 
J

Jussi Piitulainen

Chris said:
Being able to represent surds, pi, e, etc, for a start. It'd
theoretically be possible with an algebraic notation (eg by carrying
through some representation like "2*pi" rather than 6.28....), but
otherwise, irrationals can't be represented with finite storage and
a digit-based system.

I've seen papers on exact computable reals that would, in effect,
generate more precision when needed for some operation. It wasn't
symbolic like 2pi, more like 6.28... with a promise to delve into the
ellipsis, and some notable operations not supported.

Equality testing was missing, I think, and I think it could not be
known in general whether such a number is positive, zero or negative,
so even approximate printing in the usual digit notation would not be
possible. (Interval arithmetic, I hear, has a similar problem about
not knowing the sign of a number.)

In stark contrast, exact rationals work nicely, up to efficiency
considerations.
 
C

Chris Angelico

Is that true about Fractions?

I'm not 100% sure if fraction.Fraction and decimal.Decimal ever limit
the size or precision of their data, but certainly if they don't,
it'll be at horrendous expense of performance. (Decimal can add and
subtract in reasonable time complexity, but multiplication and
division will get slow when you have huge numbers of digits. Fraction
can multiply and divide efficiently, but will get crazily slow on
addition and subtraction.) Integers are an optimized case in many
ways. I can do accurate arbitrary-precision integer arithmetic without
worrying about simple operations suddenly saturating the CPU. I can't
do that with non-integers in any way.

It's not optimized for natural numbers (nonnegative integers), as
negatives are just as cheap as positives, but it's certainly an
optimization for integers.

ChrisA
 
J

Jussi Piitulainen

Chris said:
I'm not 100% sure if fraction.Fraction and decimal.Decimal ever limit
the size or precision of their data, but certainly if they don't,
it'll be at horrendous expense of performance. (Decimal can add and
subtract in reasonable time complexity, but multiplication and
division will get slow when you have huge numbers of digits. Fraction
can multiply and divide efficiently, but will get crazily slow on
addition and subtraction.) Integers are an optimized case in many
ways. I can do accurate arbitrary-precision integer arithmetic without
worrying about simple operations suddenly saturating the CPU. I can't
do that with non-integers in any way.

It's not optimized for natural numbers (nonnegative integers), as
negatives are just as cheap as positives, but it's certainly an
optimization for integers.

Right. I don't know about Decimal, but I don't think there are any
precision restrictions in Fraction, other than running out of heap (or
possibly integer precision).

In my (quite limited) experience, the most expensive operation on both
exact rationals and exact integers has been the printing, in decimal,
of several screenfuls of digits. The actual calculations have taken a
couple of seconds and then I have wished that I could interrupt the
printing of a single number :)
 
C

Chris Angelico

Chris Angelico said:
So, if I understand you right, you want to say that you've not found
a computer that works with the *complete* set of real numbers. Yes?

Correct. […] My point is that computers *do not* work with real
numbers, but only ever with some subset thereof […]

You've done it again: by saying that “computers *do not* work with real
numbersâ€, that if I find a real number – e.g. the number 4 – your
position is that, since it's a real number, computers don't work with
that number.

That's why I think you need to be clear that your point isn't “computers
don't work with real numbersâ€, but rather “computers workonly with a
limited subset of real numbersâ€.

Hmm, I'm not sure that my statement is false. If a computer can work
with "real numbers", then I would expect it to be able to work with
any real number. In C, I can declare an 'int' variable, which can hold
the real number 4 - does that mean that that variable stores real
numbers? No, and it's not useful to say that it does. It doesn't store
rationals either, even though 4 is a rational. The fact that computers
can work with some subset of real numbers does not disprove my
statement that computers don't work with "real numbers" as a class.
Program X works with text files, but it fails if the file contains
U+003C; can I feed it this thing, which is a text file? No, I can't,
because it works only with a subset of text files.

ChrisA
 
C

Chris Angelico

Likewise, if you claim that a computer *does not* work with real
numbers, then I would expect that for any real number, the computer
would fail to work with that number.

Which is why neither of those is a good statement of your position, IMO,
and you're better off saying the *limitations* you're describing.

I think we're using different words to say the same thing here :)

What I mean is that one cannot accurately say that a computer works
with real numbers, because it cannot work with them all. Of course a
computer can work with _some_ real numbers; but only some. (An awful
lot of them, of course. A ridiculously huge number of numbers. More
numbers than you could read in a lifetime! While the number is
extremely large, it still falls pitifully short of infinity.[1]) And
so we do have optimizations for some subset of reals: in approximate
order of performance, an arbitrary-precision integer type, a limited
precision floating point type, and two types that handle fractions
(vulgar and decimal). They're all, in a sense, optimizations. In pure
theory, we could have a single "real number" type and do everything
with that; all the other types are approximations to that.

[1] http://tools.ietf.org/search/rfc2795
 
M

Marko Rauhamaa

Chris Angelico said:
Hmm, I'm not sure that my statement is false. If a computer can work
with "real numbers", then I would expect it to be able to work with
any real number. In C, I can declare an 'int' variable, which can hold
the real number 4 - does that mean that that variable stores real
numbers? No, and it's not useful to say that it does. It doesn't store
rationals either, even though 4 is a rational. The fact that computers
can work with some subset of real numbers does not disprove my
statement that computers don't work with "real numbers" as a class.
Program X works with text files, but it fails if the file contains
U+003C; can I feed it this thing, which is a text file? No, I can't,
because it works only with a subset of text files.

According to your definition, there's no computer in the world that can
work with integers or text files.


Marko
 
C

Chris Angelico

According to your definition, there's no computer in the world that can
work with integers or text files.

Integers as far as RAM will allow, usually (which is the same caveat
as is used when describing a programming language as "Turing complete"
- strictly, that term is valid only if it has infinite memory
available), but yes, technically that's a subset of integers. However,
that subset is bounded by something other than the code, algorithms,
or even hardware - it's theoretically possible to add two numbers
larger than will fit in memory, by reading them in (even over the
network), adding segments, and writing them out again.

Text files. Since there's already no such thing as a "text file"
unless you know what its encoding is, I don't see a problem with this.
There's no such thing as an integer in memory, either, unless you know
how it's encoded (those same bits could be a floating point number, or
a pointer, or anything). If you know that the bytes in the file are,
say, a UTF-8 stream, then the file is a text file, just as it could be
a bash script, or an MS-DOS .COM file, if you've been told to decode
it in that way. Once your encoding is declared (out of band), the file
consists of a series of ASCII characters, or Unicode codepoints, or
whatever else it is. A fully functional program should be able to
process that file regardless of what sequence of codepoints it
carries. Say you want to search a file for a particular string, for
instance. You want to know whether or not "foobar" occurs in a file.
(I'll leave aside the question of word boundaries and say you're
looking for that string of six characters.) The program should be able
to determine the presence or absence of "foobar" regardless of what
other characters (or codepoints) are around it. Having U+001A
shouldn't stop the search there; nor should U+0000 cause problems, nor
U+003C, nor any other value. Doing otherwise would be a restriction:
this program supports only a subset of text files (those not
containing these "problem characters"). It might not be a bug, per se
(maybe text inside <angle_brackets> is considered to be an XML tag and
is deemed to be not what you're looking for), but it's still a
restriction. An inability to represent the integer 9007199254740993
(but able to represent ...992 and ...994) is a restriction.
Restrictions aren't necessarily bad, but they need to be acknowledged.

ChrisA
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,079
Messages
2,570,574
Members
47,205
Latest member
ElwoodDurh

Latest Threads

Top