A few questions about encoding

Νικόλαος Κούρας · Jun 13, 2013

In Python 2:

typing 16474 in interactive session both in python 2 and 3 gives back
the number 16474

while we want the binary representation of the number 16474

Nobody · Jun 13, 2013

And a proper UTF-8 decoder will reject "\xC0\x80" and "\xed\xa0\x80", even
though mathematically they would translate into U+0000 and U+D800
respectively. The UTF-16 *mechanism* is limited to no more than Unicode
has currently used, but I'm left wondering if that's actually the other
way around - that Unicode planes were deemed to stop at the point where
UTF-16 can't encode any more.

Indeed. 5-byte and 6-byte sequences were originally part of the UTF-8
specification, allowing for 31 bits. Later revisions of the standard
imposed the UTF-16 limit on Unicode as a whole.
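
A quick way to check the strictness described above. This is only a sketch, assuming CPython 3's built-in 'utf-8' codec; the helper name try_decode is made up for illustration.

```python
def try_decode(raw):
    """Return the decoded text, or a short note if the bytes are rejected."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError as exc:
        return 'rejected: ' + exc.reason

# Overlong two-byte form of U+0000; the shortest form is the single byte 0x00.
overlong_nul = try_decode(b'\xc0\x80')

# Three-byte form of the surrogate U+D800, which may not appear in UTF-8.
surrogate = try_decode(b'\xed\xa0\x80')

# A well-formed three-byte sequence (U+20AC) decodes normally.
euro = try_decode(b'\xe2\x82\xac')

print(overlong_nul)
print(surrogate)
print(ascii(euro))
```

Both malformed inputs raise UnicodeDecodeError under the default strict error handler, even though the bit arithmetic alone would yield a code point.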

Steven D'Aprano · Jun 13, 2013

typing 16474 in interactive session both in python 2 and 3 gives back
the number 16474

while we want the binary representation of the number 16474

Python does not work that way. Ints *always* display in decimal.
Regardless of whether you enter the decimal in binary:

py> 0b100000001011010
16474

octal:

py> 0o40132
16474

or hexadecimal:

py> 0x405A
16474

ints always display in decimal. The only way to display in another base
is to build a string showing what the int would look like in a different
base:

py> hex(16474)
'0x405a'

Notice that the return value of bin, oct and hex are all strings. If they
were ints, then they would display in decimal, defeating the purpose!
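
The points above can be checked in one go. A minimal sketch using only built-ins; the variable names are illustrative.

```python
n = 16474

# The same int can be written as a literal in any of the four bases.
assert 0b100000001011010 == 0o40132 == 0x405A == n

# bin(), oct() and hex() build strings; the int itself has no base.
as_text = [bin(n), oct(n), hex(n)]
print(as_text)

# format() gives the digits without the prefix, optionally zero-padded.
padded = format(n, '016b')
print(padded)

# int() with an explicit base turns such a string back into the number.
round_trip = int(bin(n), 2)
print(round_trip)
```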

Νικόλαος Κούρας · Jun 13, 2013

On 13/6/2013 2:49 PM, Steven D'Aprano wrote:

Please confirm these are true statements:

A code-point and the code-point's ordinal value are associated into a
Unicode charset. They have the so called 1:1 mapping.

So, i was under the impression that by encoding the code-point into
utf-8 was the same as encoding the code-point's ordinal value into utf-8.

So, now i believe they are two different things.
The code-point *is what actually* needs to be encoded and *not* its
ordinal value.

The leading 0b is just syntax to tell you "this is base 2, not base 8
(0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.

But byte objects are represented as '\x' instead of the aforementioned
'0x'. Why is that?

ints always display in decimal. The only way to display in another base
is to build a string showing what the int would look like in a different
base:

py> hex(16474)
'0x405a'

Notice that the return value of bin, oct and hex are all strings. If they
were ints, then they would display in decimal, defeating the purpose!

Thank you, I didn't know that! Indeed it works like this.

To encode a number we have to turn it into a string first.

"16474".encode('utf-8')
b'16474'

That 'b' stands for bytes.
How can I view this bytes object's representation as hex() or as bin()?

============
Also:17

You said this string consists of 17 chars.
Why does the leading '0b' syntax count as bits as well? Shouldn't it be 15
bits instead of 17?
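
The difference between the two counts can be seen directly. A small sketch; len() counts characters of the string that bin() returns, while int.bit_length() counts bits of the value.

```python
n = 16474
text = bin(n)              # the string '0b100000001011010'

chars = len(text)          # every character, including the '0' and the 'b'
digits = len(text) - 2     # binary digits only, prefix removed
bits = n.bit_length()      # bits needed to hold the value itself

print(chars, digits, bits)
```

So the 17 is a character count of the string, and the 15 is the number of binary digits in it.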

Dennis Lee Bieber · Jun 13, 2013

So, the first high-bits are a directive that UTF-8 uses to know how many
bytes each character is being represented as.

0-127 codepoints(characters) use 1 bit to signify they need 1 bit for
storage and the rest 7 bits to actually store the character ?

Not quite... The leading bit is a 0 -> which means 0..127 are sent
as-is, no manipulation.

while

128-256 codepoints(characters) use 2 bit to signify they need 2 bits for
storage and the rest 14 bits to actually store the character ?

128..255 -- in what encoding? These all have the leading bit with a
value of 1. In 8-bit encodings (ISO-Latin-1) the meaning of those values is
inherent in the specified encoding and they are sent as-is.

BUT, in UTF-8, a byte with a leading 1-bit signals that the byte
identifies a multi-byte sequence. CF:
https://en.wikipedia.org/wiki/UTF-8#Description

So anything that starts with bits 110 is a two byte sequence (and the
second byte must start with bits 10 to be valid)

1110 starts a three byte sequence, 11110 starts a four byte sequence...
Basically, count the number of leading 1-bits before a 0 bit, and that
tells you how many bytes are in the multi-byte sequence -- and all bytes
that start with 10 are supposed to be the continuations of a multibyte set
(and not a signal that this is a 1-byte entry -- those only have a leading
0)
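
The counting rule above can be sketched as follows. The function name sequence_length is made up for illustration, and it only classifies a byte by its leading bits; it does not validate that the byte is legal in modern UTF-8.

```python
def sequence_length(first_byte):
    """Classify a byte by its leading bits.

    Returns the number of bytes in the sequence this byte starts,
    or 0 if the byte is a continuation byte (leading bits 10).
    """
    if first_byte & 0b10000000 == 0:
        return 1                       # 0xxxxxxx: single byte, value 0..127
    ones = 0
    mask = 0b10000000
    while first_byte & mask:           # count leading 1 bits before the first 0
        ones += 1
        mask >>= 1
    return 0 if ones == 1 else ones    # 10xxxxxx is a continuation byte

# One sample character for each of the four lengths, written with escapes.
samples = ['a', '\xe9', '\u20ac', '\U0001f600']
lengths = [sequence_length(s.encode('utf-8')[0]) for s in samples]
print(lengths)

# The second byte of a two-byte sequence is a continuation byte.
print(sequence_length('\xe9'.encode('utf-8')[1]))
```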

Isn't 14 bits way too many to store a character ?

Original UTF-8 allowed for 31-bits to specify a character in the Unicode
set. It used 6 bytes -- 48 bits total, but 7 bits of the first byte were
the flag (6 leading 1 bits and a 0 bit), and two bits (leading 10) of each
continuation.
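
The arithmetic above generalises to every sequence length of the original scheme. A sketch, with payload_bits as an illustrative name:

```python
def payload_bits(n_bytes):
    """Bits left for the code point in an n-byte sequence (original 1..6 byte scheme)."""
    if n_bytes == 1:
        return 7                              # 0xxxxxxx
    lead_flag = n_bytes + 1                   # n leading 1 bits plus the 0 bit
    continuation_flags = 2 * (n_bytes - 1)    # '10' on each continuation byte
    return 8 * n_bytes - lead_flag - continuation_flags

table = {n: payload_bits(n) for n in range(1, 7)}
print(table)
```

Note that a two-byte sequence carries 11 payload bits, not 14: five in the first byte and six in the second.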

Cameron Simpson · Jun 14, 2013

| A code-point and the code-point's ordinal value are associated into
| a Unicode charset. They have the so called 1:1 mapping.
|
| So, i was under the impression that by encoding the code-point into
| utf-8 was the same as encoding the code-point's ordinal value into
| utf-8.
|
| So, now i believe they are two different things.
| The code-point *is what actually* needs to be encoded and *not* its
| ordinal value.

Because there is a 1:1 mapping, these are the same thing: a code
point is directly _represented_ by the ordinal value, and the ordinal
value is encoded for storage as bytes.

| > The leading 0b is just syntax to tell you "this is base 2, not base 8
| > (0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.
|
| But byte objects are represented as '\x' instead of the
| aforementioned '0x'. Why is that?

You're confusing a "string representation of a single number in
some base (eg 2 or 16)" with the "string-ish representation of a
bytes object".

The former is just notation for writing a number in different bases, eg:

27 base 10
1b base 16
33 base 8
11011 base 2

A common convention, and the one used by hex(), oct() and bin() in
Python, is to prefix the non-base-10 representations with "0x" for
base 16, "0o" for base 8 ("o"ctal) and "0b" for base 2 ("b"inary):

27
0x1b
0o33
0b11011

This allows the human reader or a machine lexer to decide what base
the number is written in, and therefore to figure out what the
underlying numeric value is.

Conversely, consider the bytes object consisting of the values [97,
98, 99, 27, 10]. In ASCII (and UTF-8 and the iso-8859-x encodings)
these may all represent the characters ['a', 'b', 'c', ESC, NL].
So when "printing" a bytes object, which is a sequence of small integers representing
values stored in bytes, it is compact to print:

b'abc\x1b\n'

which is ['a', 'b', 'c', chr(27), newline].

The slosh (\) is the common convention in C-like languages and many
others for representing special characters not directly represented
by themselves. So "\\" for a slosh, "\n" for a newline and "\x1b"
for character 27 (ESC).

The bytes object is still just a sequence of integers, but because
it is very common to have those integers represent text, and very
common to have some text one wants represented as bytes in a direct
1:1 mapping, this compact text form is useful and readable. It is
also legal Python syntax for making a small bytes object.

To demonstrate that this is just a _representation_, run this:

[ i for i in b'abc\x1b\n' ]

[97, 98, 99, 27, 10]

at an interactive Python 3 prompt. See? Just numbers.
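
The same point in both directions, as a short Python 3 sketch: a bytes object gives ints when iterated or indexed, and a list of ints builds the same bytes object.

```python
data = b'abc\x1b\n'

values = list(data)                      # iterating a bytes object yields ints
rebuilt = bytes([97, 98, 99, 27, 10])    # a list of ints builds the same object

print(values)
print(rebuilt == data)
print(data[3], type(data[3]).__name__)   # indexing gives an int, not a character
```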

| To encode a number we have to turn it into a string first.
|
| "16474".encode('utf-8')
| b'16474'
|
| That 'b' stand for bytes.

Syntactic details. Read this:
http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

| How can i view this byte's object representation as hex() or as bin()?

See above. A bytes is a _sequence_ of values. hex() and bin() print
individual values in hexadecimal or binary respectively. You could
do this:

for value in b'16474':
    print(value, hex(value), bin(value))
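
If a single hex or binary view of the whole object is wanted, format() per byte is one option. A sketch; bytes.hex() is assumed to be available (Python 3.5 and later).

```python
data = '16474'.encode('utf-8')     # five ASCII digit characters

# Two hex digits per byte, and eight binary digits per byte.
hex_pairs = ' '.join(format(value, '02x') for value in data)
bit_groups = ' '.join(format(value, '08b') for value in data)

print(hex_pairs)
print(bit_groups)
print(data.hex())                  # the hex digits run together, no separators
```

The bytes here are the codes of the characters '1', '6', '4', '7', '4', not the number 16474 itself.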

Cheers,
--
Cameron Simpson

Uhlmann's Razor: When stupidity is a sufficient explanation, there is no need
to have recourse to any other.
- Michael M. Uhlmann, assistant attorney general
for legislation in the Ford Administration

Nick the Gr33k · Jun 14, 2013

Not quite... The leading bit is a 0 -> which means 0..127 are sent
as-is, no manipulation.

So, in utf-8, the leading bit which is a zero 0, its actually a flag to
tell that the code-point needs 1 byte to be stored and the rest 7 bits
is for the actual value of 0-127 code-points ?

128..255 -- in what encoding? These all have the leading bit with a
value of 1. In 8-bit encodings (ISO-Latin-1) the meaning of those values is
inherent in the specified encoding and they are sent as-is.

So, latin-iso or greek-iso, the leading 0 is not a flag like it is in
utf-8 encoding because latin-iso and greek-iso and all *-iso use all 8
bits for storage?

But, in utf-8, the leading bit, which is 1, is to tell that the
code-point needs 2 byte to be stored and the rest 7 bits is for the
actual value of 128-255 code-points ?

But why 2 bytes? leading 1 is a flag and the rest 7 bits can hold the
encoded value.

But that is not the case since we know that utf-8 needs 2 bytes to store
code-points 128-255

1110 starts a three byte sequence, 11110 starts a four byte sequence...
Basically, count the number of leading 1-bits before a 0 bit, and that
tells you how many bytes are in the multi-byte sequence -- and all bytes
that start with 10 are supposed to be the continuations of a multibyte set
(and not a signal that this is a 1-byte entry -- those only have a leading
0)

Why doesn't it work like this?

leading 0 = 1 byte flag
leading 1 = 2 bytes flag
leading 00 = 3 bytes flag
leading 01 = 4 bytes flag
leading 10 = 5 bytes flag
leading 11 = 6 bytes flag

Wouldn't it be more logical?

Original UTF-8 allowed for 31-bits to specify a character in the Unicode
set. It used 6 bytes -- 48 bits total, but 7 bits of the first byte were
the flag (6 leading 1 bits and a 0 bit), and two bits (leading 10) of each
continuation.

utf8 6 bytes = 48 bits - 7 bits (from the first byte) - 2 bits (for each
continuation) * 5 = 48 - 7 - 10 = 31 bits indeed to store the actual
code-point. But 2^31 is still a huge number to store any kind of
character, isn't it?

Zero Piraeus · Jun 14, 2013

:

Why doesn't it work like this?

leading 0 = 1 byte flag
leading 1 = 2 bytes flag
leading 00 = 3 bytes flag
leading 01 = 4 bytes flag
leading 10 = 5 bytes flag
leading 11 = 6 bytes flag

Wouldn't it be more logical?

Think about it. Let's say that, as per your scheme, a leading 0
indicates "1 byte" (as is indeed the case in UTF8). What things could
follow that leading 0? How does that impact your choice of a leading
00 or 01 for other numbers of bytes?

.... okay, you're obviously going to need to be spoon-fed a little more
than that. Here's a byte:

01010101

Is that a single byte representing a code point in the 0-127 range, or
the first of 4 bytes representing something else, in your proposed
scheme? How can you tell?

Now look at the way UTF8 does it:
<http://en.wikipedia.org/wiki/Utf-8#Description>

Really, follow the link and study the table carefully. Don't continue
reading this until you believe you understand the choices that the
designers of UTF8 made, and why they made them.

Pay particular attention to the possible values for byte 1. Do you
notice the difference between that scheme, and yours:

0xxxxxxx
1xxxxxxx
00xxxxxx
01xxxxxx
10xxxxxx
11xxxxxx

If you don't see it, keep looking until you do ... this email gives
you more than enough hints to work it out. Don't ask someone here to
explain it to you. If you want to become competent, you must use your
brain.

-[]z.

Nick the Gr33k · Jun 14, 2013

| A code-point and the code-point's ordinal value are associated into
| a Unicode charset. They have the so called 1:1 mapping.
|
| So, i was under the impression that by encoding the code-point into
| utf-8 was the same as encoding the code-point's ordinal value into
| utf-8.
|
| So, now i believe they are two different things.
| The code-point *is what actually* needs to be encoded and *not* its
| ordinal value.

Because there is a 1:1 mapping, these are the same thing: a code
point is directly _represented_ by the ordinal value, and the ordinal
value is encoded for storage as bytes.

So, you are saying that:

chr(16474).encode('utf-8') #being the code-point encoded

ord(chr(16474)).encode('utf-8') #being the code-point's ordinal
encoded which gives an error.

that shows us that a character is what is being encoded to utf-8 but
the character's ordinal cannot.

So, why do you say "....and the ordinal value is encoded for storage as
bytes." ?

| > The leading 0b is just syntax to tell you "this is base 2, not base 8
| > (0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.
|
| But byte objects are represented as '\x' instead of the
| aforementioned '0x'. Why is that?

You're confusing a "string representation of a single number in
some base (eg 2 or 16)" with the "string-ish representation of a
bytes object".

>>> bin(16474)
'0b100000001011010'
that is a binary format string representation of number 16474, yes?
>>> hex(16474)
'0x405a'
that is a hexadecimal format string representation of number 16474, yes?

WHILE:

b'abc\x1b\n' = a string representation of a byte, which in turn is a
series of integers, so that makes this a string representation of
integers, is this correct?

\x1b = ESC character

\ = for separating bytes
x = to flag that the following bytes are going to be represented as hex
values? whats exactly 'x' means here? character perhaps?

Still its not clear into my head what the difference of '0x1b' and
'\x1b' is:

i think:
0x1b = an integer represented in hex format

\x1b = a character represented in hex format

is this true?

| How can i view this byte's object representation as hex() or as bin()?

See above. A bytes is a _sequence_ of values. hex() and bin() print
individual values in hexadecimal or binary respectively.

>>> for value in b'\x97\x98\x99\x27\x10':
...     print(value, hex(value), bin(value))
...
151 0x97 0b10010111
152 0x98 0b10011000
153 0x99 0b10011001
39 0x27 0b100111
16 0x10 0b10000

>>> for value in b'abc\x1b\n':
...     print(value, hex(value), bin(value))
...
97 0x61 0b1100001
98 0x62 0b1100010
99 0x63 0b1100011
27 0x1b 0b11011
10 0xa 0b1010

Why these two give different values when printed?

Nick the Gr33k · Jun 14, 2013

:

Think about it. Let's say that, as per your scheme, a leading 0
indicates "1 byte" (as is indeed the case in UTF8). What things could
follow that leading 0? How does that impact your choice of a leading
00 or 01 for other numbers of bytes?

... okay, you're obviously going to need to be spoon-fed a little more
than that. Here's a byte:

01010101

Is that a single byte representing a code point in the 0-127 range, or
the first of 4 bytes representing something else, in your proposed
scheme? How can you tell?

Indeed.

You cannot tell if it stands for 1 byte or a 4 byte sequence:

0 + 1010101 = leading 0 stands for 1byte representation of a code-point

01 + 010101 = leading 01 stands for 4byte representation of a code-point

the problem here in my scheme of how utf8 encoding works is that you
cannot tell whether the flag is '0' or '01'

The same happens with leading '1' and '11'. You cannot tell what the flag is,
so you cannot know if the Unicode code-point is being represented as a
2-byte sequence or a 6-byte sequence
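
The ambiguity can be checked mechanically: a set of flags is only usable if no flag is the beginning of another. A sketch, with ambiguous_pairs as an illustrative helper; the second list holds the leading-bit patterns UTF-8 uses today, including '10' for continuation bytes.

```python
def ambiguous_pairs(flags):
    """Return pairs (a, b) where flag a is a proper prefix of flag b."""
    return [(a, b) for a in flags for b in flags
            if a != b and b.startswith(a)]

proposed = ['0', '1', '00', '01', '10', '11']
utf8 = ['0', '10', '110', '1110', '11110']

clashes_proposed = ambiguous_pairs(proposed)
clashes_utf8 = ambiguous_pairs(utf8)

print(clashes_proposed)
print(clashes_utf8)
```

Every UTF-8 pattern ends in a 0 bit after its run of 1 bits, which is what stops one pattern from being the start of another.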

Understood

Now look at the way UTF8 does it:
<http://en.wikipedia.org/wiki/Utf-8#Description>

Really, follow the link and study the table carefully. Don't continue
reading this until you believe you understand the choices that the
designers of UTF8 made, and why they made them.

Pay particular attention to the possible values for byte 1. Do you
notice the difference between that scheme, and yours:

0xxxxxxx
1xxxxxxx
00xxxxxx
01xxxxxx
10xxxxxx
11xxxxxx

If you don't see it, keep looking until you do ... this email gives
you more than enough hints to work it out. Don't ask someone here to
explain it to you. If you want to become competent, you must use your
brain.

0xxxxxxx
110xxxxx 10xxxxxx
1110xxxx 10xxxxxx 10xxxxxx
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

I did read the link but i still cannot see why

1. '110' is the flag for 2-byte code-point
2. why in the 2nd byte and every subsequent byte the leading flag has to
be '10'

Antoon Pardon · Jun 14, 2013

On 13-06-13 10:08, Νικόλαος Κούρας wrote:

Indeed python embraced it in single quoting '0b100000001011010' and
not as 0b100000001011010 which in fact makes it a string.

But since bin(16474) seems to create a string rather than an expected
number (at least in my mind) then how do we get the binary
representation of the number 16474 as a number?

You don't. You should remember that python (or any programming language)
doesn't print numbers. It always prints string representations of
numbers. It is just so that we are so used to the decimal representation
that we think of that representation as being the number.

Normally that is not a problem but it can cause confusion when you are
working with multiple representations.

Nick the Gr33k · Jun 14, 2013

On 13-06-13 10:08, Νικόλαος Κούρας wrote:

You don't. You should remember that python (or any programming language)
doesn't print numbers. It always prints string representations of
numbers. It is just so that we are so used to the decimal representation
that we think of that representation as being the number.

Normally that is not a problem but it can cause confusion when you are
working with multiple representations.

Hold on!
You are basically saying here that:

16474

is not a number as we think but instead is a string representation of a
number?

I dont think so, if it were a string representation of a number that
would print the following:
'16474'

Python prints numbers:
16474

it prints them all to decimal format though.
but when we need a decimal integer to be turned into bin() or hex() we
can bin(number) hex(number) and just remove the pair of single quoting.

Antoon Pardon · Jun 14, 2013

On 14-06-13 09:49, Nick the Gr33k wrote:

Hold on!
You are basically saying here that:

16474

is not a number as we think but instead is a string representation of a
number?

Yes, or if you prefer what python prints is the decimal notation of the number.

I dont think so, if it were a string representation of a number that
would print the following:

'16474'

No it wouldn't, You are confusing representation in the everyday meaning
with representation as python jargon.

Python prints numbers:

No it doesn't, numbers are abstract concepts that can be represented in
various notations, these notations are strings. Those notational strings
end up being printed. As I said before we are so used to using the
decimal notation that we often use the notation and the number interchangeably
without a problem. But when we are working with multiple notations that
can become confusing and we should be careful to separate numbers from their
representations/notations.

but when we need a decimal integer

There are no decimal integers. There is only a decimal notation of the number.
Decimal, octal etc are not characteristics of the numbers themselves.
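
A small sketch of the distinction, using only built-ins. The quotes that appear around '16474' belong to the display of a str object; they are not what separates a number from its notation.

```python
n = 16474

# Four notations in source code, one number.
same = (16474 == 0x405a == 0o40132 == 0b100000001011010)

# Displaying an int means building its decimal digits as text first.
shown_for_int = repr(n)          # the text 16474, five characters
shown_for_str = repr('16474')    # the repr of a str adds quote characters

print(same)
print(shown_for_int, len(shown_for_int))
print(shown_for_str, len(shown_for_str))

# Arithmetic works on the number, never on the notation.
print(n + 1)
```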

Nick the Gr33k · Jun 14, 2013

No it doesn't, numbers are abstract concepts that can be represented in
various notations, these notations are strings. Those notational strings
end up being printed. As I said before we are so used to using the
decimal notation that we often use the notation and the number interchangeably
without a problem. But when we are working with multiple notations that
can become confusing and we should be careful to separate numbers from their
representations/notations.

How do we separate a number then from its representation/notation?

What is a notation anyway? Is it a way of displaying? But that would be
a representation then....

Please explain this line as it uses both terms.

No it doesn't, numbers are abstract concepts that can be represented in
various notations

There are no decimal integers. There is only a decimal notation of the number.
Decimal, octal etc are not characteristics of the numbers themselves.

So everything we see like:

16474
nikos
abc123

everything is a string and nothing is a number? not even number 1?

Heiko Wundram · Jun 14, 2013

On 14.06.2013 10:37, Nick the Gr33k wrote:

So everything we see like:

16474
nikos
abc123

everything is a string and nothing is a number? not even number 1?

Come on now, this is _so_ obviously trolling, it's not even remotely
funny anymore. Why doesn't killfiling work with the mailing list version
of the python list? :-(

Nick the Gr33k · Jun 14, 2013

On 14.06.2013 10:37, Nick the Gr33k wrote:

Come on now, this is _so_ obviously trolling, it's not even remotely
funny anymore. Why doesn't killfiling work with the mailing list version
of the python list? :-(

I'm not trolling man, I just have a hard time understanding why numbers
act as strings.

Cameron Simpson · Jun 14, 2013

| On 14/6/2013 4:00 AM, Cameron Simpson wrote:
| >| A code-point and the code-point's ordinal value are associated into
| >| a Unicode charset. They have the so called 1:1 mapping.
| >|
| >| So, i was under the impression that by encoding the code-point into
| >| utf-8 was the same as encoding the code-point's ordinal value into
| >| utf-8.
| >|
| >| So, now i believe they are two different things.
| >| The code-point *is what actually* needs to be encoded and *not* its
| >| ordinal value.
| >
| >Because there is a 1:1 mapping, these are the same thing: a code
| >point is directly _represented_ by the ordinal value, and the ordinal
| >value is encoded for storage as bytes.
|
| So, you are saying that:
|
| chr(16474).encode('utf-8') #being the code-point encoded
|
| ord(chr(16474)).encode('utf-8') #being the code-point's ordinal
| encoded which gives an error.
|
| that shows us that a character is what is being encoded to utf-8
| but the character's ordinal cannot.
|
| So, why do you say "....and the ordinal value is encoded for storage
| as bytes." ?

No, I mean conceptually, there is no difference between a codepoint
and its ordinal value. They are the same thing.

Inside Python itself, a character (a string of length 1; there is
no separate character type) is a different type from an integer.
Internally, the characters in a string are stored numerically, as
Unicode codepoints, that is, as their ordinal values.

It is a meaningful idea to store a Python string encoded into bytes
using some text encoding scheme (utf-8, iso-8859-7, what have you).

It is not a meaningful thing to store a number "encoded" without
some more context. The .encode() method that accepts an encoding
name like "utf-8" is specifically an encoding procedure FOR TEXT.

So strings have such a method, and integers do not.

When you write:

chr(16474)

you receive a _string_, containing the single character whose ordinal
is 16474. It is meaningful to transcribe this string to bytes using
a text encoding procedure like 'utf-8'.

When you write:

ord(chr(16474))

you get an integer. Because ord() is the reverse of chr(), you get
the integer 16474.

Integers do not have .encode() methods that accept a _text_ encoding
name like 'utf-8' because integers are not text.
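
Put together as a short Python 3 sketch. int.to_bytes() is shown only to contrast with text encoding: it takes a length and a byte order, not an encoding name.

```python
code_point = 16474                       # U+405A

ch = chr(code_point)                     # a str of length 1
encoded = ch.encode('utf-8')             # text -> bytes
decoded = encoded.decode('utf-8')        # bytes -> text

# An int has no text-encoding method.
int_has_encode = hasattr(code_point, 'encode')

# The int-side tool needs a length and a byte order instead.
raw = code_point.to_bytes(2, 'big')

print(len(ch), encoded, ord(decoded))
print(int_has_encode)
print(raw)
```

The UTF-8 form is three bytes, while the plain big-endian form of the integer is two; they are different answers to different questions.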

| >| > The leading 0b is just syntax to tell you "this is base 2, not base 8
| >| > (0o) or base 10 or base 16 (0x)". Also, leading zero bits are dropped.
| >|
| >| But byte objects are represented as '\x' instead of the
| >| aforementioned '0x'. Why is that?
| >
| >You're confusing a "string representation of a single number in
| >some base (eg 2 or 16)" with the "string-ish representation of a
| >bytes object".
|
| >>> bin(16474)
| '0b100000001011010'
| that is a binary format string representation of number 16474, yes?

Yes.

| >>> hex(16474)
| '0x405a'
| that is a hexadecimal format string representation of number 16474, yes?

Yes.

| WHILE:
| b'abc\x1b\n' = a string representation of a byte, which in turn is a
| series of integers, so that makes this a string representation of
| integers, is this correct?

A "bytes" Python object. So not "a byte", 5 bytes.
It is a string representation of the series of byte values,
ON THE PREMISE that the bytes may well represent text.
On that basis, b'abc\x1b\n' is a reasonable way to display them.

In other contexts this might not be a sensible way to display these
bytes, and then another format would be chosen, possibly hand
constructed by the programmer, or equally reasonable, the hexlify()
function from the binascii module.
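
For reference, a sketch of the hexlify() route mentioned above, using the standard binascii module:

```python
import binascii

data = b'abc\x1b\n'

hex_bytes = binascii.hexlify(data)          # bytes made of ASCII hex digits
hex_text = hex_bytes.decode('ascii')        # the same digits as a str
restored = binascii.unhexlify(hex_bytes)    # back to the original bytes

print(hex_bytes)
print(hex_text)
print(restored == data)
```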

| \x1b = ESC character

Considering the bytes to be representing characters, then yes.

| \ = for separating bytes

No, \ to introduce a sequence of characters with special meaning.

Normally a character in a b'...' item represents the byte value
matching the character's Unicode ordinal value. But several characters
are hard or confusing to place literally in a b'...' string. For
example a newline character or an escape character.

'a' means 97.
'\n' means 10 (newline, hence the 'n').
'\x1b' means 27 (escape, value 0x1b in hexadecimal, 33 in octal).
And, of course, '\\' means a literal slosh, value 92.

| x = to flag that the following bytes are going to be represented as
| hex values? whats exactly 'x' means here? character perhaps?

A slosh followed by an 'x' means there will be 2 hexadecimal digits
to follow, and those two digits represent the byte value.

So, yes.

| Still its not clear into my head what the difference of '0x1b' and
| '\x1b' is:

They're the same thing in two similar but slightly different formats.

0x1b is a legitimate "bare" integer value in Python.

\x1b is a sequence you find inside strings (and "byte" strings, the
b'...' format).
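
Side by side, as a Python 3 sketch: the same value 27 reached through an int literal, a string escape and a bytes escape.

```python
as_int = 0x1b            # an int literal written in hexadecimal
as_str = '\x1b'          # a one-character string (ESC)
as_bytes = b'\x1b'       # a one-byte bytes object

print(as_int)                        # ints display in decimal
print(len(as_str), ord(as_str))
print(len(as_bytes), as_bytes[0])

# In a raw string the backslash is kept, so four separate characters remain.
print(len(r'\x1b'))

# The doubled slosh is a single character, value 92.
print(len('\\'), ord('\\'))
```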

| i think:
| 0x1b = an integer represented in hex format

Yes.

| \x1b = a character represented in hex format

Yes.

| >| How can i view this byte's object representation as hex() or as bin()?
| >
| >See above. A bytes is a _sequence_ of values. hex() and bin() print
| >individual values in hexadecimal or binary respectively.
|
| >>> for value in b'\x97\x98\x99\x27\x10':
| ... print(value, hex(value), bin(value))
| ...
| 151 0x97 0b10010111
| 152 0x98 0b10011000
| 153 0x99 0b10011001
| 39 0x27 0b100111
| 16 0x10 0b10000
|
|
| >>> for value in b'abc\x1b\n':
| ... print(value, hex(value), bin(value))
| ...
| 97 0x61 0b1100001
| 98 0x62 0b1100010
| 99 0x63 0b1100011
| 27 0x1b 0b11011
| 10 0xa 0b1010
|
|
| Why these two give different values when printed?

97 is in base 10 (9*10+7=97), but the notation '\x97' is base 16, so 9*16+7=151.
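
The same comparison as a runnable sketch: the digits after \x are read in base 16, while a plain letter stands for its own ASCII value.

```python
first = b'\x97\x98\x99\x27\x10'    # digits after \x are hexadecimal
second = b'abc\x1b\n'              # 'a', 'b', 'c' stand for 97, 98, 99

print(list(first))
print(list(second))

# Decimal 97 is hexadecimal 61, so these two spellings are the same object.
same = (b'\x61\x62\x63' == b'abc')
from_hex_digits = int('97', 16)    # 9*16 + 7

print(same, from_hex_digits)
```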

Cheers,

Heiko Wundram · Jun 14, 2013

On 14.06.2013 11:32, Nick the Gr33k wrote:

I'm not trolling man, I just have a hard time understanding why numbers
act as strings.

If you can't grasp the conceptual differences between numbers and
their/a representation, it's probably best if you stayed away from
programming altogether.

I don't think you're actually as thick as you sound, but rather either
you're simply too damn lazy to take the time to inform yourself from all
the hints/links/information you've been given, or you're trolling. I'm
still leaning towards the second.

Cameron Simpson · Jun 14, 2013

| On 14/6/2013 11:22 AM, Antoon Pardon wrote:
|
| >>Python prints numbers:
| >No it doesn't, numbers are abstract concepts that can be represented in
| >various notations, these notations are strings. Those notational strings
| >end up being printed. As I said before we are so used to using the
| >decimal notation that we often use the notation and the number interchangeably
| >without a problem. But when we are working with multiple notations that
| >can become confusing and we should be careful to separate numbers from their
| >representations/notations.
|
| How do we separate a number then from its representation/notation?

Shrug. When you "print" a number, Python transcribes a string
representation of it to your terminal.

| What is a notation anyway? Is it a way of displaying? But that
| would be a representation then....

Yep. Same thing. A "notation" is a particular formal method of
representation.

| No it doesn't, numbers are abstract concepts that can be represented in
| various notations
|
| >>but when we need a decimal integer
| >
| >There are no decimal integers. There is only a decimal notation of the number.
| >Decimal, octal etc are not characteristics of the numbers themselves.
|
| So everything we see like:
|
| 16474
| nikos
| abc123
|
| everything is a string and nothing is a number? not even number 1?

Everything you see like that is textual information. Internally to
Python, various types are used: strings, bytes, integers etc. But
when you print something, text is output.

Cheers,

Fábio Santos · Jun 14, 2013

On 14.06.2013 10:37, Nick the Gr33k wrote:

Come on now, this is _so_ obviously trolling, it's not even remotely
funny anymore. Why doesn't killfiling work with the mailing list version of
the python list? :-(

I have skimmed the archives for this month, and I estimate that a third of
this month's activity on this list was helping this person. About 80% of
that is wasted in explaining basic concepts he refuses to read in links
given to him. A depressingly large number of replies to his posts are
seemingly ignored.

Since this is a lot of spam, I feel like leaving the list, but I also
honestly want to help people use python and the replies to questions of
others often give me much insight on several matters.
