Translating unicode data

CaptainMcCrank

Hi list,

I'm struggling with a problem analyzing large amounts of unicode data
in an http wireshark capture.
I've solved the problem with the interpreter, but I'm not sure how to
do this in an automated fashion.

I'd like to grab a line from a text file & translate the unicode
sections of it to ascii. So, for example
I'd like to take
"\u003cb\u003eMar 17\u003c/b\u003e"

and turn it into

"<b>Mar 17</b>"

I can handle this from the interpreter as follows:

But I don't know what I need to do to automate this! The data that is
in the quotes from line 2 will have to come from a variable. I am
unable to figure out how to do this using a variable rather than a
literal string.

Please help!
 
Peter Otten

If wireshark uses the same escape codes as python you can use str.decode()
or open the file with codecs.open():

'\\u003cb\\u003eMar 17\\u003c/b\\u003e'
>>> s.decode("unicode-escape")
u'<b>Mar 17</b>'

Peter
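
For readers on Python 3, where str no longer has a decode() method, the same idea can be sketched with bytes.decode(). This is only a sketch under assumptions: the capture text is read as bytes, the escapes follow Python's \uXXXX form, and the names decode_escapes, decode_lines and capture.txt are made up for illustration. Be aware that the unicode_escape codec treats any unescaped bytes as Latin-1, so it may mangle data that also contains raw UTF-8.

```python
# Hypothetical helpers (not from the thread): decode literal \uXXXX escapes
# held in a variable rather than typed as a string literal.

def decode_escapes(raw):
    # raw is a bytes object containing a literal backslash, 'u', and hex digits.
    return raw.decode("unicode_escape")


def decode_lines(lines):
    # Accepts any iterable of bytes lines, e.g. a file opened in binary mode.
    for line in lines:
        yield decode_escapes(line.rstrip(b"\r\n"))


# The doubled backslashes put a real backslash into the bytes,
# which is what a line read from a capture file would contain.
captured = b"\\u003cb\\u003eMar 17\\u003c/b\\u003e"
print(decode_escapes(captured))  # <b>Mar 17</b>

# Assumed usage with a file (capture.txt is a placeholder name):
# with open("capture.txt", "rb") as fh:
#     for text in decode_lines(fh):
#         print(text)
```

Reading the file in binary mode and decoding one line at a time keeps each escape sequence whole, which is why the sketch does not pass the codec name to open() directly.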
 
CaptainMcCrank

Peter Otten said:
If wireshark uses the same escape codes as python you can use str.decode()
or open the file with codecs.open():

'\\u003cb\\u003eMar 17\\u003c/b\\u003e'
>>> s.decode("unicode-escape")
u'<b>Mar 17</b>'

Peter

This is a workable solution! Thank you Peter!
 
John Machin

You really need to say what version of Python you are working with,
show the code you tried, and the results you got.

Always very good advice, not often taken :)
Using Python 3.1, I get:
     >>> "\u003cb\u003eMar 17\u003c/b\u003e" == '<b>Mar 17</b>'
     True

Using Python 2.1.3 I get: 1

But so what? AFAICT from the OP's description and his joyous response
to Peter's suggestion, what he has (in 3.0 syntax) is not
"\u003cb\u003e etc"
it's
b"\u003cb\u003e etc"
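
To make that distinction concrete, here is a small Python 3 sketch (the variable names text and raw are only for illustration). In a str literal the \u escapes are resolved by the compiler; in a bytes value they stay as six separate bytes each until something decodes them. The rb prefix is used so the backslashes are kept literally.

```python
# str literal: the \u003c escapes are resolved when the source is compiled.
text = "\u003cb\u003eMar 17\u003c/b\u003e"

# bytes value: backslash, 'u', and the hex digits are all kept as-is,
# which matches what a line read from a capture file would hold.
raw = rb"\u003cb\u003eMar 17\u003c/b\u003e"

print(text == "<b>Mar 17</b>")               # True
print(raw == b"<b>Mar 17</b>")               # False
print(raw.decode("unicode_escape") == text)  # True
```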

HTH,
John
 