Strings

Dan

I'm having trouble coming to grips with Python strings.

I need to send binary data over a socket. I'm taking the data from a
database. When I extract it, non-printable characters come out as a
backslash followed by three numeric characters representing the
numeric value of the data. I guess this is what you would call a raw
Python string. I want to convert those four characters ( in C-think,
say "\\012" ) into a single character and put it in a new string.

There's probably a simple way to do it, but I haven't figured it out.
What I've done so far is to step through the string, character by
character. Normal characters are appended onto a new string. If I
come across a '\' character, I look for the next three numeric
characters. But I don't know how to convert this code into a single
character and append it onto the new string.

I'm sure what I'm doing is long and convoluted. Any suggestions would
be appreciated.

Dan
 
keirr

I'd use int() and chr(), e.g.:

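new_string = ""
a = '012'  # the three digits that followed the backslash
new_string += chr(int(a))  # int(a) parses as decimal here: chr(12), a form feed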

Just in case the 012 is an octal code, I'll mention that int() accepts a
base as its second argument, as in int('034', 8) or int('AF', 16);
int('012', 8) is 10, the newline character.
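
To tie this back to the character-by-character loop Dan describes, here is a minimal sketch for Python 3. It assumes the data really does contain a literal backslash followed by exactly three octal digits; the function name decode_octal_escapes and the sample value are made up for illustration.

```python
def decode_octal_escapes(text):
    """Replace each backslash + three octal digits with the character it encodes."""
    out = []
    i = 0
    while i < len(text):
        digits = text[i + 1:i + 4]
        # Assumption: an escape is exactly a backslash and three digits in 0-7.
        if text[i] == "\\" and len(digits) == 3 and all(d in "01234567" for d in digits):
            out.append(chr(int(digits, 8)))  # base 8: '012' -> 10 -> '\n'
            i += 4
        else:
            out.append(text[i])  # ordinary character, copied unchanged
            i += 1
    return "".join(out)


converted = decode_octal_escapes("foo\\012bar")
print(repr(converted))  # 'foo\nbar'
```

Collecting the pieces in a list and joining once avoids repeated string concatenation, but appending to a string as in the snippet above would work just as well for short values.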

Cheers,

Keir.
 
Peter Hansen

Dan said:
I'm having trouble coming to grips with Python strings.

I need to send binary data over a socket. I'm taking the data from a
database. When I extract it, non-printable characters come out as a
backslash followed by three numeric characters representing the
numeric value of the data. I guess this is what you would call a raw
Python string. I want to convert those four characters ( in C-think,
say "\\012" ) into a single character and put it in a new string.

Does this help?
foo
bar
Note that the \n in the first one is because I didn't
*print* the result, but merely allowed the interpreter
to call repr() on it. repr() for a newline is of course
backslash-n, so that's what you see (inside quotation marks)
but the string itself has only 9 characters in it, as
you wished.
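
A sketch of the same idea in current Python 3, which may differ from what was originally typed: it decodes backslash escapes with the standard unicode_escape codec. The sample value is an assumption, and the Latin-1 step only works when every character has a code point below 256.

```python
# Assumption: the value holds a literal backslash followed by octal digits.
raw = "foo\\012bar"

# unicode_escape interprets backslash escapes the way a Python literal would.
# Encoding to Latin-1 first turns each character into exactly one byte.
decoded = raw.encode("latin-1").decode("unicode_escape")

print(len(raw))       # 10
print(repr(decoded))  # 'foo\nbar'
print(decoded)        # prints foo and bar on separate lines
```

As in the note above, repr() shows the newline as backslash-n, while print() writes the actual character.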

-Peter
 
Terry Reedy

Dan said:
I'm having trouble coming to grips with Python strings.

I need to send binary data over a socket. I'm taking the data from a
database. When I extract it, non-printable characters come out as a
backslash followed by three numeric characters representing the
numeric value of the data.

Are you sure that the printable expansion is actually in the string itself,
and not just occurring when you 'look' at the string by printing it -- as
in...
3
?
I guess this is what you would call a raw Python string.

No such thing. There are only strings (and unicode strings). 'raw' only
applies to a mode of interpreting string literals in the process of turning
them into bytes or unicode.
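
A small illustration of both points, written for Python 3 with made-up variable names: the r prefix changes only how the literal is read, and len() tells you what is really stored regardless of how repr() displays it.

```python
escaped = "\012"  # normal literal: the escape becomes one character (newline)
raw = r"\012"     # raw literal: backslash and three digits are kept as-is

print(len(escaped), repr(escaped))  # 1 '\n'
print(len(raw), repr(raw))          # 4 '\\012'
print(type(escaped) is type(raw))   # True
```

Both objects are ordinary strings; only the second one actually contains a backslash.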

Terry J. Reedy
 
John Machin

Dan said:
I'm having trouble coming to grips with Python strings.

I need to send binary data over a socket. I'm taking the data from a
database. When I extract it, non-printable characters come out as a
backslash followed by three numeric characters representing the
numeric value of the data.

It would be very strange, but not beyond belief, for a DBMS to be
storing strings like that. What you are seeing is more likely an
artifact of how you are extracting it. If this is so, it would be
better to avoid the complication and error-proneness of converting to
an octal[yuk!]-based representation and back again.

However if the DBMS *IS* storing strings like that, then it would
require a look in the DBMS docs PLUS a look at the empirical evidence
to produce a reliable transcoding.

If you were to tell us which DBMS, and supply a copy&paste snippet
(*NOT* a re-typing) of the *actual* extraction code that you are
using, then we should be able to help you further.

Cheers,

John
 
Bengt Richter

Peter Hansen said:
Does this help?

Note that the \n in the first one is because I didn't
*print* the result, but merely allowed the interpreter
to call repr() on it. repr() for a newline is of course
backslash-n, so that's what you see (inside quotation marks)
but the string itself has only 9 characters in it, as
you wished.

When I wonder how many characters are actually in a_string, I find
list(a_string) helpful, which BTW also re-represents equivalent escapes
in a consistent way, e.g., note \n's at the end:
['e', 's', 'c', 'a', 'p', 'e', 's', ' ', '\\', 'n', ' ', '\n', ' ', '\n', ' ', '\n']
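
For anyone who wants to try this, here is a hedged Python 3 sketch of the list() trick. The input is an assumption: a literal backslash-n followed by three different spellings of the newline character.

```python
# Assumption: one literal backslash-n, then newline written three ways
# (symbolic, octal, and hexadecimal escapes).
a_string = "\\n \n \012 \x0a"

chars = list(a_string)
print(len(a_string))  # 8
print(chars)          # ['\\', 'n', ' ', '\n', ' ', '\n', ' ', '\n']
```

The literal backslash-n shows up as two list items, while each real newline is a single item however it was spelled in the source.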

OTOH, don't try that with '\a':
['\x07', ' ', '\x07', ' ', '\x07']

Why not like \n above, or like \t?
['\t', ' ', '\t', ' ', '\t']

Is this fixed by now? It's not news ;-)
'\x07'
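
For reference, a quick check that can be run under Python 3, where repr() still has short forms for tab, newline, and carriage return but shows the bell character as a hex escape. The variable names are illustrative only.

```python
bell = "\a"
tab = "\t"

print(repr(bell))                # '\x07'
print(repr(tab))                 # '\t'
print(bell == "\x07" == "\007")  # True
```

The display differs, but all three spellings of the bell character compare equal, so nothing is lost in the string itself.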

Regards,
Bengt Richter
 
