Claudio said:
It's a funny story
data:image/s3,"s3://crabby-images/1c4fb/1c4fb4a004ac374ae735c210f8560be0dce354ac" alt="Smile :) :)"
)
, but in my eyes not appropriate in given context, because my prior goal
is to understand some more about Python internals, i.e. what is and if
it is at all a differerence between the internal representation of a
string and a long integer.
I know, I should probably look into the C source of Python, but I
suppose it could be too hard for me to find the appropriate piece of
code, so I welcome any hints.
If I knew the internal representation of string and long integer objects
and were able to read/write to memory and point an identifier at a given
memory address, a conversion between long integer and string types were
probably nothing else as changing some bytes and repointing an
identifier (assuming that string and long integer values are in memory
the same data if they represent the same value). That it would save CPU
time is secondary here, but with increasing costs of energy making the
number on my electrical power bill higher each year due to higher power
consumption with increasing number of programs I run (it makes a
difference of 50 Watt between an algorithm keeping the CPU 100% busy and
an algorithm using only 1% of it) it is not necessarily paranoia driving
one to consider potential savings of CPU time.
In this context the example of the bigdec class comes to my mind, where
usage of another algorithm made it possible to cut down power
consumption and time of printing a decimal form of the largest known
prime number from 7 hours of a 100% busy CPU down to 7 seconds!
Claudio
From stringobject.h :
typedef struct {
PyObject_VAR_HEAD
long ob_shash;
int ob_sstate;
char ob_sval[1];
/* Invariants:
* ob_sval contains space for 'ob_size+1' elements.
* ob_sval[ob_size] == 0.
* ob_shash is the hash of the string or -1 if not computed yet.
* ob_sstate != 0 iff the string object is in stringobject.c's
* 'interned' dictionary; in this case the two references
* from 'interned' to this object are *not counted* in ob_refcnt.
*/
} PyStringObject;
From longobject.h :
typedef struct _longobject PyLongObject; /* Revealed in longintrepr.h */
From longintrepr.h :
typedef unsigned short digit;
struct _longobject {
PyObject_VAR_HEAD
digit ob_digit[1];
};
From this I mean to see, that the string data header is longer
containing an additional long and an int compared to long integer data
header. The long integer seem to be an array of unsigned short values
and the string an array of characters. My MSDN help tells me, that:
"Type short int (or simply short) is an integral type that is larger
than or equal to the size of type char, and shorter than or equal to the
size of type int." In Microsoft Visual C++ short is 2 bytes long, and
because I am on an Intel processor the sequence of bytes will differ
from what I will get when creating a binary string out of a long integer
due to swapping of byte order. Am I right here?
My conclusion is, that beeing on e.g. Motorola 68000 based processors
the actual data behind the string and the long integer types will be the
same, but beeing on an Intel 386 series processor based systems the
actual data stored in memory behind a string and long integer will differ.
So finally the idea of converting binary strings forth and back to long
integers without looping over the data content does for sure not work on
Intel processor based systems, but may work on processors which don't
need swapped bytes for integer operations ( I found this page
http://www.cs.princeton.edu/~kazad/resources/cs/endian.htm gives a good
overview about the problems with byte swapping, is there any better
reference related to this subject online? ).
As Python tries to be multiplatform, I don't understand why it stores
long integers as an array of shorts and not as an array of chars?
Historical reasons? Speed reasons? Any other reasons?
Wouldn't it be more intuitive to have in Python a data type beeing an
object with binary data which can be easily 'casted' to both string and
long integer depending how it should be processed or used?
Do you see any advantage to have such data type, or am I quite alone
with this idea?
Claudio