Embedding a literal "\u" in a unicode raw string.

Romano Giannetti

Hi,

while writing some LaTeX preprocessing code, I stumbled into this problem: (I
have a -*- coding: utf-8 -*- line, obviously)

s = ur"añado $\uparrow$"

Which gave an error because the \u escape is interpreted in raw unicode strings,
too. So I found that the only way to solve this is to write:

s = unicode(r"añado $\uparrow$", "utf-8")

or

s = ur"añado $\u005cuparrow$"

The second one is too ugly to live, while the first is at least acceptable; but
looking around the Python 3.0 doc, I saw that the first one will fail, too.

Am I doing something wrong here, or is there another solution for this?

Romano
 
Diez B. Roggisch

Romano said:
Hi,

while writing some LaTeX preprocessing code, I stumbled into this problem:
(I have a -*- coding: utf-8 -*- line, obviously)

s = ur"añado $\uparrow$"

Which gave an error because the \u escape is interpreted in raw unicode
strings, too. So I found that the only way to solve this is to write:

s = unicode(r"añado $\uparrow$", "utf-8")

or

s = ur"añado $\u005cuparrow$"

The second one is too ugly to live, while the first is at least
acceptable; but looking around the Python 3.0 doc, I saw that the first
one will fail, too.

Am I doing something wrong here, or is there another solution for this?

Why don't you rid yourself of the raw-string? Then you need to do

s = u"añado $\\uparrow$"

which is considerably easier to read than both other variants above.

Diez
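
A minimal sketch of the doubled-backslash suggestion above. It is written for Python 3 (where every str is unicode and the u prefix is optional) so that it runs today; it is an illustration, not the Python 2 code from the thread. The name s follows the original post.

```python
# In a non-raw string, "\\" is an escaped backslash, so the "\u" of
# "\uparrow" is never read as the start of a Unicode escape.
s = "añado $\\uparrow$"

# The resulting string holds exactly one backslash, right before "uparrow".
print(s)               # añado $\uparrow$
print(s.count("\\"))   # 1
```

The cost is that every other LaTeX backslash in the string has to be doubled as well, which is what raw strings were meant to avoid.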
 
OKB (not okblacke)

Romano said:
Hi,

while writing some LaTeX preprocessing code, I stumbled into this
problem: (I have a -*- coding: utf-8 -*- line, obviously)

s = ur"añado $\uparrow$"

Which gave an error because the \u escape is interpreted in raw
unicode strings, too. So I found that the only way to solve this is
to write:

s = unicode(r"añado $\uparrow$", "utf-8")

or

s = ur"añado $\u005cuparrow$"

The second one is too ugly to live, while the first is at least
acceptable; but looking around the Python 3.0 doc, I saw that the
first one will fail, too.

Am I doing something wrong here, or is there another solution for
this?

I too encountered this problem, in the same situation (making
strings that contain LaTeX commands). One possibility is to separate
out just the bit that has the \u, and use string juxtaposition to attach
it to the others:

s = ur"añado " u"$\\uparrow$"

It's not ideal, but I think it's easier to read than your solution
#2.


--
--OKB (not okblacke)
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is
no path, and leave a trail."
--author unknown
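
A short sketch of the juxtaposition idea, again in Python 3 syntax so that it runs on a current interpreter. The \textbf command is an assumption added here only to show that the raw piece keeps its backslashes; it is not in the original posts.

```python
# Adjacent string literals are joined at compile time, and each literal
# keeps its own prefix: the first piece is raw, the second is not.
s = r"\textbf{añado} " "$\\uparrow$"

# Both pieces end up with single backslashes in the final string.
print(s)   # \textbf{añado} $\uparrow$
```

Only the fragment containing \u needs the doubled backslash; the rest of the LaTeX can stay in raw form.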
 
romano.giannetti

I too encountered this problem, in the same situation (making
strings that contain LaTeX commands). One possibility is to separate
out just the bit that has the \u, and use string juxtaposition to attach
it to the others:

s = ur"añado " u"$\\uparrow$"

It's not ideal, but I think it's easier to read than your solution
#2.

Yes, I think I will do something like that, although I really do
not understand why \x5c is not interpreted in a raw string while \u005c
is interpreted in a unicode raw string. It is, well, not elegant. Raw
should be raw...

Thanks anyway
 
Martin v. Löwis

Yes, I think I will do something like that, although I really do
not understand why \x5c is not interpreted in a raw string while \u005c
is interpreted in a unicode raw string. It is, well, not elegant. Raw
should be raw...

Right. IMO, this is just a plain design mistake in the Python Unicode
handling. Unfortunately, there was discussion about this specific issue
in the past, and the proponent of the status quo always defended it,
with the rationale (IIUC) that a) without that, you can't put arbitrary
Unicode characters into a raw string, and b) the semantics of \u in Java
and C are such that \u gets processed before tokenization even starts,
and it should be the same in Python.

Regards,
Martin
 
rmano

Right. IMO, this is just a plain design mistake in the Python Unicode
handling. Unfortunately, there was discussion about this specific issue
in the past, and the proponent of the status quo always defended it,
with the rationale (IIUC) that a) without that, you can't put arbitrary
Unicode characters into a raw string, and b) the semantics of \u in Java
and C are such that \u gets processed before tokenization even starts,
and it should be the same in Python.

Well, I do not know Java, but C, AFAIK, has no raw strings, so you have
to use double backslashes there anyway. Raw strings are a handy shorthand
when you can type the characters directly on your keyboard, and this
asymmetry rather defeats that purpose.

Is this already decided, or is it possible to lobby for a change? :)

Thanks,
Romano

BTW, I think 2to3.py should warn when it finds a raw string (not unicode)
with \u in it. I tried it and it seems to ignore the problem...
 
NickC

BTW, I think 2to3.py should warn when it finds a raw string (not unicode)
with \u in it. I tried it and it seems to ignore the problem...

Python 3.0a3+ (py3k:61229, Mar 4 2008, 21:38:15)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> r"\u"
'\\u'
>>> r"\uparrow"
'\\uparrow'
'\\'

2to3.py may be ignoring a problem, but existing raw 8-bit string
literals containing a '\u' aren't going to be it. If anything is going
to have a problem with conversion to Py3k at this point, it is raw
Unicode literals that contain a Unicode escape.
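
A hedged sketch of both halves of this point, for Python 3. The helper name expand_u_escapes is hypothetical, and routing the text through the standard raw_unicode_escape codec is only one possible way to approximate what Python 2 ur"" literals did; it is not something 2to3 is stated to do in this thread.

```python
# Python 3: a raw string keeps "\u" as two ordinary characters, so old
# 8-bit raw literals such as r"$\uparrow$" carry over unchanged.
raw = r"$\uparrow$"
assert len(r"\u") == 2

# A Python 2 raw unicode literal that relied on \u005c being processed
# is different: in Python 3 the escape stays as literal text.
ported = r"añado $\u005cuparrow$"
assert "u005c" in ported


def expand_u_escapes(text):
    # Assumption: the raw_unicode_escape codec is used to expand \uXXXX
    # escapes while leaving all other backslashes untouched.
    return text.encode("raw_unicode_escape").decode("raw_unicode_escape")


print(expand_u_escapes(ported))   # añado $\uparrow$
```

A raw string containing a bare \u that is not followed by four hex digits would make the decode step fail, so such a helper only suits literals that really do contain escapes.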
 
rmano

Python 3.0a3+ (py3k:61229, Mar 4 2008, 21:38:15)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> r"\u"
'\\u'
>>> r"\uparrow"
'\\uparrow'

Nice to know... so it seems that the 3.0 doc was not updated. I think
this is the correct behaviour. Thanks
 
