Unrecognized escape sequences in string literals

Douglas Alan

A friend of mine is just learning Python, and he's a bit tweaked about
how unrecognized escape sequences are treated in Python. This is from
the Python 3.0 reference manual:

Unlike Standard C, all unrecognized escape sequences are left in the
string unchanged, i.e., the backslash is left in the string. (This
behavior is useful when debugging: if an escape sequence is mistyped,
the resulting output is more easily recognized as broken.) It is also
important to note that the escape sequences only recognized in string
literals fall into the category of unrecognized escapes for bytes
literals.
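The rule quoted from the manual can be observed directly. Below is a minimal sketch that assumes a current Python 3. Current versions also warn about unrecognized escapes, so the hypothetical helper `eval_literal` silences that warning in order to show the value the literal actually produces.

```python
import ast
import warnings


def eval_literal(source: str):
    """Evaluate the source text of a str or bytes literal (hypothetical helper).

    Current Python warns about unrecognized escapes; the warning is
    silenced so the resulting value can be inspected.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


recognized = eval_literal(r'"\n"')    # \n is a recognized escape
unrecognized = eval_literal(r'"\z"')  # \z is not recognized

print(len(recognized), repr(recognized))      # 1 '\n'
print(len(unrecognized), repr(unrecognized))  # 2 '\\z'

# \u is recognized in str literals but not in bytes literals,
# so in a bytes literal the backslash is kept.
print(eval_literal(r'b"\u"'))  # b'\\u'
```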

My friend begs to differ with the above. It would be much better for
debugging if Python generated a parsing error for unrecognized escape
sequences, rather than leaving them unchanged. g++ outputs a warning
for such escape sequences, for instance. This is what I would consider
to be the correct behavior. (Actually, I think it should just generate
a fatal parsing error, but a warning is okay too.)

In any case, I think my friend should mellow out a bit, but we both
consider this something of a wart. He's just more wart-phobic than I
am. Is there any way that this behavior can be considered anything
other than a wart? Other than the unconvincing claim that you can use
this "feature" to save you a bit of typing sometimes when you actually
want a backslash to be in your string?

|>ouglas
 
Steven D'Aprano

A friend of mine is just learning Python, and he's a bit tweaked about
how unrecognized escape sequences are treated in Python. ....
In any case, I think my friend should mellow out a bit, but we both
consider this something of a wart. He's just more wart-phobic than I am.
Is there any way that this behavior can be considered anything other
than a wart? Other than the unconvincing claim that you can use this
"feature" to save you a bit of typing sometimes when you actually want a
backslash to be in your string?

I'd put it this way: a backslash is just an ordinary character, except
when it needs to be special. So Python's behaviour is "treat backslash as
a normal character, except for these exceptions" while the behaviour your
friend wants is "treat a backslash as an error, except for these
exceptions".

Why should a backslash in a string literal be an error?
 
Douglas Alan

Steven said:
Why should a backslash in a string literal be an error?

Because in Python, if my friend sees the string "foo\xbar\n", he has
no idea whether the "\x" is an escape sequence, or if it is just the
characters "\x", unless he looks it up in the manual, or tries it out
in the REPL, or what have you. My friend is adamant that it would be
better if he could just look at the string literal and know. He
doesn't want to be bothered to have to store stuff like that in his
head. He wants to be able to figure out programs just by looking at
them, to the maximum degree that that is feasible.

In comparison to Python, in C++, he can just look at "foo\xbar\n" and
know that "\x" is a special character. (As long as it compiles without
warnings under g++.)

He's particularly annoyed too, that if he types "foo\xbar" at the
REPL, it echoes back as "foo\\xbar". He finds that to be some sort of
annoying DWIM feature, and if Python is going to have DWIM features,
then it should, for example, figure out what he means by "\" and not
bother him with a syntax error in that case.

Another reason that Python should not behave the way that it does, is
that it pegs Python into a corner where it can't add new escape
sequences in the future, as doing so will break existing code.
Generating a syntax error instead for unknown escape sequences would
allow for future extensions.

Now not to pick on Python unfairly, most other languages have similar
issues with escape sequences. (Except for the Bourne Shell and bash,
where "\x" always just means "x", no matter what character "x" happens
to be.) But I've been telling my friend for years to switch to Python
because of how wonderful and consistent Python is in comparison to
most other languages, and now he seems disappointed and seems to think
that Python is just more of the same.

Of course I think that he's overreacting a bit. My point of view is
that every language has *some* warts; Python just has a bit fewer than
most. It would have been nice, I should think, if this wart had been
"fixed" in Python 3, as I do consider it to be a minor wart.

|>ouglas
 
Carl Banks

I'd put it this way: a backslash is just an ordinary character, except
when it needs to be special. So Python's behaviour is "treat backslash as
a normal character, except for these exceptions" while the behaviour your
friend wants is "treat a backslash as an error, except for these
exceptions".

Why should a backslash in a string literal be an error?

Because the behavior of \ in a string is context-dependent, which
means a reader can't know if \ is a literal character or escape
character without knowing the context, and it means an innocuous
change in context can cause a rather significant change in \.

IOW it's an error-prone mess. It would be better if Python (like C)
treated \ consistently as an escape character. (And in raw strings,
consistently as a literal.)

It's kind of a minor issue in terms of overall real-world importance,
but in terms of raw unPythonicness this might be the worst offense the
language makes.


Carl Banks
 
Douglas Alan

while the behaviour your
friend wants is "treat a backslash as an error, except for these
exceptions".

Besides, can't all error situations be described as, "treat the error
situation as an error, except for the exception of when the situation
isn't an error"???

The behavior my friend wants isn't any more exceptional than that!

|>ouglas
 
Steven D'Aprano

Because in Python, if my friend sees the string "foo\xbar\n", he has no
idea whether the "\x" is an escape sequence, or if it is just the
characters "\x", unless he looks it up in the manual, or tries it out in
the REPL, or what have you.

Fair enough, but isn't that just another way of saying that if you look
at a piece of code and don't know what it does, you don't know what it
does unless you look it up or try it out?

My friend is adamant that it would be better
if he could just look at the string literal and know. He doesn't want to
be bothered to have to store stuff like that in his head. He wants to be
able to figure out programs just by looking at them, to the maximum
degree that that is feasible.

I actually sympathize strongly with that attitude. But, honestly, your
friend is a programmer (or at least pretends to be one *wink*). You can't
be a programmer without memorizing stuff: syntax, function calls, modules
to import, quoting rules, blah blah blah. Take C as an example -- there's
absolutely nothing about () that says "group expressions or call a
function" and {} that says "group a code block". You just have to
memorize it. If you don't know what a backslash escape is going to do,
why would you use it? I'm sure your friend isn't in the habit of randomly
adding backslashes to strings just to see whether it will still compile.

This is especially important when reading (as opposed to writing) code.
You read somebody else's code, and see "foo\xbar\n". Let's say you know
it compiles without warning. Big deal -- you don't know what the escape
codes do unless you've memorized them. What does \n resolve to? chr(13)
or chr(97) or chr(0)? Who knows?

Unless you know the rules, you have no idea what is in the string.
Allowing \y to resolve to a literal backslash followed by y doesn't
change that. All it means is that some \c combinations return a single
character, and some return two.


In comparison to Python, in C++, he can just look at "foo\xbar\n" and know
that "\x" is a special character. (As long as it compiles without
warnings under g++.)

So what you mean is, he can just look at "foo\xbar\n" AND COMPILE IT
USING g++, and know whether or not \x is a special character.

[sarcasm] Gosh. That's an enormous difference from Python, where you have
to print the string at the REPL to know what it does. [/sarcasm]

Aside:
\x isn't a special character:

>>> "\x"
ValueError: invalid \x escape

However, \xba is:

>>> ord("\xba")
186
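You can check the aside in a current Python 3. The exact error type differs from the ValueError quoted above, which came from the Python 2 of the time. A minimal sketch:

```python
import ast

# "\xba" is a complete hex escape (two hex digits), so "foo\xbar\n" is
# f, o, o, chr(0xBA), r, newline: six characters and no backslash at all.
s = "foo\xbar\n"
print(len(s))                    # 6
print([hex(ord(c)) for c in s])  # ['0x66', '0x6f', '0x6f', '0xba', '0x72', '0xa']
print(ord("\xba"))               # 186

# A bare \x with no hex digits is rejected when the literal is parsed.
# Python 3 reports a SyntaxError here.
try:
    ast.literal_eval(r'"\x"')
    bare_x_rejected = False
except SyntaxError:
    bare_x_rejected = True
print(bare_x_rejected)           # True
```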



He's particularly annoyed too, that if he types "foo\xbar" at the REPL,
it echoes back as "foo\\xbar". He finds that to be some sort of annoying
DWIM feature, and if Python is going to have DWIM features, then it
should, for example, figure out what he means by "\" and not bother him
with a syntax error in that case.

Now your friend is confused. This is a good thing. Any backslash you see
in Python's default string output is *always* an escape:

>>> "a string with an 'improper' escape \y (backslash-y)"
"a string with an 'improper' escape \\y (backslash-y)"

The REPL is actually doing him a favour. It always escapes backslashes,
so there is no ambiguity. A backslash is displayed as \\, any other \c is
a special character.
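`repr()` makes Steven's point concrete: it writes a real backslash as two, so a single backslash in its output always begins an escape. A minimal sketch (the strings are arbitrary examples):

```python
# repr() writes a real backslash as \\, so any single backslash in its
# output starts an escape for some other character.
with_backslash = "improper escape " + "\\" + "y"  # a backslash, then y
with_tab = "tab " + "\t" + "y"                    # a single TAB character

print(repr(with_backslash))  # 'improper escape \\y'
print(repr(with_tab))        # 'tab \ty'
```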

Of course I think that he's overreacting a bit.
:)


My point of view is that
every language has *some* warts; Python just has a bit fewer than most.
It would have been nice, I should think, if this wart had been "fixed"
in Python 3, as I do consider it to be a minor wart.

And if anyone had cared enough to raise it a couple of years back, it
possibly might have been.
 
John Nagle

Carl said:
IOW it's an error-prone mess. It would be better if Python (like C)
treated \ consistently as an escape character. (And in raw strings,
consistently as a literal.)

Agreed. For one thing, if another escape character ever has to be
added to the language, that may change the semantics of previously
correct strings. If "\" followed by a non-special character is treated
as an error, that doesn't happen.
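Here is a concrete case of the risk John describes. In Python 2, a plain "C:\users" was a byte string in which \u was unrecognized, so the backslash survived. Python 3's str literals recognize \uXXXX, so the same text no longer compiles. The sketch assumes a current Python 3 and contrasts a bytes literal with a str literal on one interpreter. `eval_literal` is a hypothetical helper.

```python
import ast
import warnings


def eval_literal(source: str):
    """Evaluate literal source text, silencing escape warnings (hypothetical helper)."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


# Bytes literals have no \u escape, so the backslash is kept:
as_bytes = eval_literal(r'b"C:\users"')
print(as_bytes)  # b'C:\\users'

# str literals recognize \uXXXX, so the same text is now an error:
try:
    eval_literal(r'"C:\users"')
    str_rejected = False
except SyntaxError:
    str_rejected = True
print(str_rejected)  # True
```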

John Nagle
 
Steven D'Aprano

Because the behavior of \ in a string is context-dependent, which means
a reader can't know if \ is a literal character or escape character
without knowing the context, and it means an innocuous change in context
can cause a rather significant change in \.

*Any* change in context is significant with escapes.

"this \nhas two lines"

If you change the \n to a \t you get a significant difference. If you
change the \n to a \y you get a significant difference. Why is the first
one acceptable but the second not?

IOW it's an error-prone mess.

I've never had any errors caused by this. I've never seen anyone write to
this newsgroup confused over escape behaviour, or asking for help with an
error caused by it, and until this thread, never seen anyone complain
about it either.

Excuse my cynicism, but I believe that you are using "error-prone" to
mean "I don't like this behaviour" rather than "it causes lots of errors".
 
Steven D'Aprano

if another escape character ever has to be
added to the language, that may change the semantics of previously
correct strings.

And that's the only argument in favour of prohibiting non-special
backslash sequences I've seen yet that is even close to convincing.
 
Douglas Alan

Fair enough, but isn't that just another way of saying that if you look
at a piece of code and don't know what it does, you don't know what it
does unless you look it up or try it out?

Not really. It's more like saying that easy things should be easy, and
hard things should be possible. But in this case, Python is making
something that should be really easy, a bit harder and more error
prone than it should be.

In C++, if I know that the code I'm looking at compiles, then I never
need worry that I've misinterpreted what a string literal means. At
least not if it doesn't have any escape characters in it that I'm not
familiar with. But in Python, if I see, "\f\o\o\b\a\z", I'm not really
sure what I'm seeing, as I surely don't have committed to memory some
of the more obscure escape sequences. If I saw this in C++, and I knew
that it was in code that compiled, then I'd at least know that there
are some strange escape codes that I have to look up. Unlike with
Python, it would never be the case in C++ code that the programmer who
wrote the code was just too lazy to type in "\\f\\o\\o\\b\\a\\z"
instead.

I actually sympathize strongly with that attitude. But, honestly, your
friend is a programmer (or at least pretends to be one *wink*).

Actually, he's probably written more code than you, me, and ten other
random decent programmers put together. As he can slap out massive
amounts of code very quickly, he'd prefer not to have crap getting in
his way. In the time it takes him to look something up, he might have
written another page of code.

He's perfectly capable of dealing with crap, as years of writing large
programs in Perl and PHP quickly proves, but his whole reason for
learning Python, I take it, is so that he will be bothered with less
crap and therefore write code even faster.

You can't be a programmer without memorizing stuff: syntax, function
calls, modules to import, quoting rules, blah blah blah. Take C as
an example -- there's absolutely nothing about () that says "group
expressions or call a function" and {} that says "group a code
block".

I don't really think that this is a good analogy. It's like the
difference between remembering rules of grammar and remembering
English spelling. As a kid, I was the best in my school at grammar,
and one of the worst at speling.

You just have to memorize it. If you don't know what a backslash
escape is going to do, why would you use it?

(1) You're looking at code that someone else wrote, or (2) you forget
to type "\\" instead of "\" in your code (or get lazy sometimes), as
that is okay most of the time, and you inadvertently get a subtle bug.

This is especially important when reading (as opposed to writing) code.
You read somebody else's code, and see "foo\xbar\n". Let's say you know
it compiles without warning. Big deal -- you don't know what the escape
codes do unless you've memorized them. What does \n resolve to? chr(13)
or chr(97) or chr(0)? Who knows?

It *is* a big deal. Or at least a non-trivial deal. It means that you
can tell just by looking at the code that there are funny characters
in the string, and not just backslashes. You don't have to go
running for the manual every time you see code with backslashes, where
the upshot might be that the programmer was merely saving themselves
some typing.

So what you mean is, he can just look at "foo\xbar\n" AND COMPILE IT
USING g++, and know whether or not \x is a special character.

I'm not sure that your comments are paying due diligence to full
life-cycle software development issues that involve multiple
programmers (or even just your own program that you wrote a year ago,
and you don't remember all the details of what you did) combined with
maintaining and modifying existing code, etc.

Aside:
\x isn't a special character:

ValueError: invalid \x escape

I think that this all just goes to prove my friend's point! Here I've
been programming in Python for more than a decade (not full time, mind
you, as I also program in other languages, like C++), and even I
didn't know that "\xba" was an escape sequence, and I inadvertently
introduced a subtle bug into my argument because it just so happens
that the first two characters of "bar" are legal hexadecimal! If I did
the very same thing in a real program, it might take me a lot of time
to track down the bug.

Also, it seems that Python is being inconsistent here. Python knows
that the string "\x" doesn't contain a full escape sequence, so why
doesn't it treat the string "\x" the same way that it treats the string
"\z"? After all, if you're a Python programmer, you should know that
"\x" doesn't contain a complete escape sequence, and therefore, you
would not be surprised if Python were so kind as to just leave it
alone, rather than raising a ValueError.

I.e., "\z" is not a legal escape sequence, so it gets left as
"\\z". "\x" is not a legal escape sequence. Shouldn't it also get left
as "\\x"?

Now your friend is confused. This is a good thing. Any backslash you see
in Python's default string output is *always* an escape:

Well, I think he's more annoyed that if Python is going to be so
helpful as to put in the missing "\" for you in "foo\zbar", then it
should put in the missing "\" for you in "\". He considers this to be
an inconsistency.

Me, I'd never, ever, EVER want a language to special-case something at
the end of a string, but I can see that from his new-to-Python
perspective, Python seems to be DWIMing in one place and not the
other, and he thinks that it should either do no DWIMing at all, or
consistently DWIM. To not be consistent in this regard is "inelegant",
says he.

And I can see his point that allowing "foo\zbar" and "foo\\zbar" to be
synonymous is a form of DWIMing.

And if anyone had cared enough to raise it a couple of years back, it
possibly might have been.

So, now if only my friend had learned Python years ago, when I told
him to, he possibly might be happy with Python by now!

|>ouglas
 
Carl Banks

*Any* change in context is significant with escapes.

"this \nhas two lines"

If you change the \n to a \t you get a significant difference. If you
change the \n to a \y you get a significant difference. Why is the first
one acceptable but the second not?

Because when you change \n to \t, you haven't changed the meaning
of the \ character; but when you change \n to \y, you have, and you
did so without even touching the backslash.

I've never had any errors caused by this.

Thank you for your anecdotal evidence. Here's mine: This has gotten
me at least twice, and a compiler complaint would have reduced my bug-
hunting time from tens of minutes to ones of seconds. [Aside: it was
when I was using Python on Windows for the first time]

I've never seen anyone write to
this newsgroup confused over escape behaviour, or asking for help with an
error caused by it, and until this thread, never seen anyone complain
about it either.

More anecdotal evidence. Here's mine: I have.

Excuse my cynicism, but I believe that you are using "error-prone" to
mean "I don't like this behaviour" rather than "it causes lots of errors".

No, I'm using error-prone to mean error-prone.

Someone (obviously not you because you have perfect knowledge of
the language and 100% situation awareness at all times) might have a
string like "abcd\stuv" and change it to "abcd\tuvw" without even
thinking about the fact that the s comes after the backslash.

Worst of all: they might not even notice the error, because the repr
of this string is:

'abcd\tuvw'

They might not notice that the backslash is single, because (unlike
you) mortal fallible human beings don't always register tiny details
like a backslash being single when it should be double.

Point is, this is a very bad inconsistency. It makes the behavior of
\ impossible to learn by analogy, now you have to memorize a list of
situations where it behaves one way or another.
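Carl's edit can be reproduced directly. The following is a minimal sketch assuming a current Python 3. `eval_literal` is a hypothetical helper that evaluates the literal's source text with the escape warning silenced.

```python
import ast
import warnings


def eval_literal(source: str) -> str:
    """Evaluate literal source text, silencing escape warnings (hypothetical helper)."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


before = eval_literal(r'"abcd\stuv"')  # \s is unrecognized: backslash kept
after = eval_literal(r'"abcd\tuvw"')   # \t is recognized: becomes a TAB

print(len(before), repr(before))  # 9 'abcd\\stuv'
print(len(after), repr(after))    # 8 'abcd\tuvw'
```

The edit moves the backslash from an unrecognized escape to a recognized one. The string loses a character and gains a TAB, and nothing complains.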


Carl Banks
 
Douglas Alan

On Aug 10, 2:10 am, Steven D'Aprano wrote:
I've never had any errors caused by this.

But you've seen an error caused by this, in this very discussion.
I.e., "foo\xbar".

"\xba" isn't an escape sequence in any other language that I've used,
which is one reason I made this error... Oh, wait a minute -- it *is*
an escape sequence in JavaScript. But in JavaScript, while "\xba" is a
special character, "\xb" is synonymous with "xb".

The fact that every language seems to treat these things similarly but
differently, is yet another reason why they should just be treated
utterly consistently by all of the languages: I.e., escape sequences
that don't have a special meaning should be an error!
I've never seen anyone write to
this newsgroup confused over escape behaviour,

My friend objects strongly the claim that he is "confused" by it, so I
guess you are right that no one is confused. He just thinks that it
violates the beautiful sense of aesthetics that he was assured over and
over again that Python has.

But aesthetics is a non-negligible issue with practical ramifications.
(Not that anything can be done about this wart at this point,
however.)

or asking for help with an error caused by it, and until
this thread, never seen anyone complain about it either.

Oh, this bothered me too when I first learned Python, and I thought it
was stupid. It just didn't bother me enough to complain publicly.

Besides, the vast majority of Python noobs don't come here, despite
appearances sometimes, and by the time most people get here, they've
probably got bigger fish to fry.

|>ouglas
 
Steven D'Aprano

Because when you change \n to \t, you haven't changed the meaning of
the \ character;

I assume you mean the \ character in the literal, not the (non-existent)
\ character in the string.

but when you change \n to \y, you have, and you did so
without even touching the backslash.

Not at all.

'\n' maps to the string chr(10).
'\y' maps to the string chr(92) + chr(121).

In both cases the backslash in the literal has the same meaning: grab
the next token (usually a single character, but not always), look it up
in a mapping somewhere, and insert the result in the string object being
built.

(I don't know if the *implementation* is precisely as described, but
that's irrelevant. It's still functionally a mapping.)


I've never had any errors caused by this.

Thank you for your anecdotal evidence. Here's mine: This has gotten me
at least twice, and a compiler complaint would have reduced my bug-
hunting time from tens of minutes to ones of seconds. [Aside: it was
when I was using Python on Windows for the first time]

Okay, that's twice in, how many years have you been programming?

I've mistyped "xrange" as "xrnage" two or three times. Does that make
xrange() "an error-prone mess" too? Probably not. Why is my mistake my
mistake, but your mistake the language's fault?


[...]

Oh, wait, no, I tell a lie -- I *have* seen people reporting "bugs" here
caused by backslashes. They're invariably Windows programmers writing
pathnames using backslashes, so I'll give you that one: if you don't know
that Python treats backslashes as special in string literals, you will
screw up your Windows pathnames.

Interestingly, the problem there is not that \y resolves to literal
backslash followed by y, but that \t DOESN'T resolve to the expected
backslash-t. So it seems to me that the problem for Windows coders is not
that \y doesn't raise an error, but the mere existence of backslash
escapes.
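Here is Steven's Windows example, sketched for a current Python 3; the path is only an illustration. `\t` and `\n` are recognized escapes, so an unescaped path silently gains control characters. A raw string avoids this. So does handing forward slashes to the standard library's `pathlib.PureWindowsPath`.

```python
from pathlib import PureWindowsPath

# In an ordinary literal, \t and \n are recognized escapes, so this
# "path" silently contains a TAB and a newline.
broken = "C:\temp\new"
print(repr(broken))  # 'C:\temp\new'

# Two common fixes: a raw string, or forward slashes given to pathlib.
raw = r"C:\temp\new"
via_pathlib = str(PureWindowsPath("C:/temp/new"))
print(raw == via_pathlib)  # True
```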


Someone (obviously not you because you have perfect knowledge of the
language and 100% situation awareness at all times) might have a string
like "abcd\stuv" and change it to "abcd\tuvw" without even thinking
about the fact that the s comes after the backslash.

Deary me. And they might type "4+15" instead of "4*51", and now
arithmetic is an "error-prone mess" too. If you know of a programming
language which can prevent you making semantic errors, please let us all
know what it is.

If you edit code without thinking, you will be burnt, and you get *zero*
sympathy from me.

Worst of all: they might not even notice the error, because the repr of
this string is:

'abcd\tuvw'

They might not notice that the backslash is single, because (unlike you)
mortal fallible human beings don't always register tiny details like a
backslash being single when it should be double.

"Help help, 123145 looks too similar to 1231145, and now I calculated my
taxes wrong and will go to jail!!!"

Point is, this is a very bad inconsistency. It makes the behavior of \
impossible to learn by analogy, now you have to memorize a list of
situations where it behaves one way or another.

No, you don't "have" to memorize anything, you can go right ahead and
escape every backslash, as I did for years. Your code will still work
fine.

You already have to memorize what escape codes return special characters.
The only difference is whether you learn "...and everything else raises
an exception" or "...and everything else is returned unchanged".

There is at least one good reason for preferring an error, namely that it
allows Python to introduce new escape codes without going through a long,
slow process. But the rest of these complaints are terribly unconvincing.
 
Steven D'Aprano

On Aug 10, 2:10 am, Steven D'Aprano wrote:


But you've seen an error caused by this, in this very discussion. I.e.,
"foo\xbar".


Your complaint is that "invalid" escapes like \y resolve to a literal
backslash-y instead of raising an error. But \xbar doesn't contain an
invalid escape, it contains a valid hex escape. Your ignorance that \xHH
is a valid hex escape (for suitable hex digits) isn't an example of an
error caused by "invalid" escapes like \y.


"\xba" isn't an escape sequence in any other language that I've used,
which is one reason I made this error... Oh, wait a minute -- it *is* an
escape sequence in JavaScript. But in JavaScript, while "\xba" is a
special character, "\xb" is synonymous with "xb".

The fact that every language seems to treat these things similarly but
differently, is yet another reason why they should just be treated
utterly consistently by all of the languages: I.e., escape sequences
that don't have a special meaning should be an error!

Perhaps all the other languages should follow Python's lead instead?

Or perhaps they should follow bash's lead, and map \C to C for every
character. If there were no special escapes at all, Windows programmers
wouldn't keep getting burnt when they write "C:\\Documents\today\foo" and
end up with something completely unexpected.

Oh wait, no, that still wouldn't work, because they'd end up with
C:\Documentstodayfoo. So copying bash doesn't work.
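A toy decoder can check Steven's bash hypothetical. `bash_style_unescape` is a hypothetical function implementing only the "backslash followed by C means C" rule. It is not anything Python provides.

```python
def bash_style_unescape(text: str) -> str:
    """Hypothetical rule: a backslash followed by any character C yields C.

    A trailing lone backslash is kept unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # drop the backslash, keep the next char
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# The characters a Windows user might type between the quotes:
typed = r"C:\\Documents\today\foo"
print(bash_style_unescape(typed))  # C:\Documentstodayfoo
```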

But copying C will upset the bash coders, because they'll write
"some\ file\ with\ spaces" and suddenly their code won't even compile!!!

Seems like no matter what you do, you're going to upset *somebody*.


My friend objects strongly the claim that he is "confused" by it, so I
guess you are right that no one is confused. He just thinks that it
violates the beautiful sense of aesthetics that he was assured over and
over again that Python has.

Fair enough.
 
Steven D'Aprano

In C++, if I know that the code I'm looking at compiles, then I never
need worry that I've misinterpreted what a string literal means.

If you don't know what your string literals are, you don't know what your
program does. You can't expect the compiler to save you from semantic
errors. Adding escape codes into the string literal doesn't change this
basic truth.

Semantics matters, and unlike syntax, the compiler can't check it.
There's a difference between a program that does the equivalent of:

os.system("cp myfile myfile~")

and one which does this

os.system("rm myfile myfile~")


The compiler can't save you from typing 1234 instead of 11234, or 31.45
instead of 3.145, or "My darling Ho" instead of "My darling Jo", so why
do you expect it to save you from typing "abc\d" instead of "abc\\d"?

Perhaps it can catch *some* errors of that type, but only at the cost of
extra effort required to defeat the compiler (forcing the programmer to
type \\d to prevent the compiler complaining about \d). I don't think the
benefit is worth the cost. You and your friend do. Who is to say you're
right?


At least not if it doesn't have any escape characters in it that I'm not
familiar with. But in Python, if I see, "\f\o\o\b\a\z", I'm not really
sure what I'm seeing, as I surely don't have committed to memory some of
the more obscure escape sequences. If I saw this in C++, and I knew that
it was in code that compiled, then I'd at least know that there are some
strange escape codes that I have to look up.

And if you saw that in Python, you'd also know that there are some
strange escape codes that you have to look up. Fortunately, in Python,
that's really simple:

>>> "\f\o\o\b\a\z"
'\x0c\\o\\o\x08\x07\\z'

Immediately you can see that the \o and \z sequences resolve to
themselves, and the \f \b and \a don't.


Unlike with Python, it
would never be the case in C++ code that the programmer who wrote the
code was just too lazy to type in "\\f\\o\\o\\b\\a\\z" instead.

But if you see "abc\n", you can't be sure whether the lazy programmer
intended "abc"+newline, or "abc"+backslash+"n". Either way, the compiler
won't complain.


(1) You're looking at code that someone else wrote, or (2) you forget to
type "\\" instead of "\" in your code (or get lazy sometimes), as that
is okay most of the time, and you inadvertently get a subtle bug.

The same error can occur in C++, if you intend \\n but type \n by
mistake. Or vice versa. The compiler won't save you from that.


It *is* a big deal. Or at least a non-trivial deal. It means that you
can tell just by looking at the code that there are funny characters in
the string, and not just backslashes.

I'm not entirely sure why you think that's a big deal. Strictly speaking,
there are no "funny characters", not even \0, in Python. They're all just
characters. Perhaps the closest is newline (which is pretty obvious).


You don't have to go running for
the manual every time you see code with backslashes, where the upshot
might be that the programmer was merely saving themselves some typing.

Why do you care if there are "funny characters"?

In C++, if you see an escape you don't recognize, do you care? Do you go
running for the manual? If the answer is No, then why do it in Python?

And if the answer is Yes, then how is Python worse than C++?


[...]
Also, it seems that Python is being inconsistent here. Python knows that
the string "\x" doesn't contain a full escape sequence, so why doesn't it
treat the string "\x" the same way that it treats the string "\z"? [...]
I.e., "\z" is not a legal escape sequence, so it gets left as "\\z".

No. \z *is* a legal escape sequence, it just happens to map to \z.

If you stop thinking of \z as an illegal escape sequence that Python
refuses to raise an error for, the problem goes away. It's a legal escape
sequence that maps to backslash + z.


"\x" is not a legal escape sequence. Shouldn't it also get left as
"\\x"?

No, because it actually is an illegal escape sequence.


Well, I think he's more annoyed that if Python is going to be so helpful
as to put in the missing "\" for you in "foo\zbar", then it should put
in the missing "\" for you in "\". He considers this to be an
inconsistency.

(1) There is no missing \ in "foo\zbar".

(2) The problem with "\" isn't a missing backslash, but a missing
end-quote.
Me, I'd never, ever, EVER want a language to special-case something at
the end of a string, but I can see that from his new-to-Python
perspective, Python seems to be DWIMing in one place and not the other,
and he thinks that it should either do no DWIMing at all, or
consistently DWIM. To not be consistent in this regard is "inelegant",
says he.

Python isn't DWIMing here. The rules are simple and straightforward,
there's no mind-reading or guessing required. There is no heuristic
trying to predict what the user intends. It's a simple rule:

When parsing a string literal (apart from raw strings), if you see a
backslash, then grab the next token (usually a single character, but for
\x and \0 it could be multiple characters). If there is a mapping
available for that token, insert that in the string being built, and if
not, insert the backslash and the token.

(As I said earlier, this may not be precisely how it is implemented, but
functionally, it is what Python does.)
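Steven's description can be turned into a small decoder. This is a hypothetical, simplified sketch and not CPython's implementation, which also handles \u, \U and \N{...}. The `strict` flag shows the alternative Douglas's friend wants, where an unknown escape is an error instead of mapping to itself.

```python
# Simple escapes: the character after the backslash maps to a fixed result.
SIMPLE_ESCAPES = {
    "\n": "",  # backslash-newline is a line continuation: yields nothing
    "\\": "\\",
    "'": "'",
    '"': '"',
    "a": "\a",
    "b": "\b",
    "f": "\f",
    "n": "\n",
    "r": "\r",
    "t": "\t",
    "v": "\v",
}
HEX_DIGITS = set("0123456789abcdefABCDEF")
OCT_DIGITS = set("01234567")


def decode_escapes(body: str, strict: bool = False) -> str:
    """Decode the escapes in the body of a non-raw str literal (hypothetical).

    After a backslash, grab the next token, look it up, and insert the
    result. Handles the simple escapes, octal and two-digit hex only.
    With strict=True an unknown escape raises instead of mapping to itself.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError("body ends with a lone backslash")
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt in OCT_DIGITS:
            j = i + 1
            while j < len(body) and j < i + 4 and body[j] in OCT_DIGITS:
                j += 1  # consume up to three octal digits
            out.append(chr(int(body[i + 1:j], 8)))
            i = j
        elif nxt == "x":
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or not set(digits) <= HEX_DIGITS:
                raise ValueError("truncated \\x escape")
            out.append(chr(int(digits, 16)))
            i += 4
        elif strict:
            raise ValueError(f"unrecognized escape \\{nxt}")
        else:
            # No mapping for this token: keep backslash and character as-is.
            out.append("\\" + nxt)
            i += 2
    return "".join(out)


print(repr(decode_escapes(r"foo\zbar")))  # 'foo\\zbar'
try:
    decode_escapes(r"foo\zbar", strict=True)
    strict_rejected = False
except ValueError as exc:
    strict_rejected = True
    print(exc)  # unrecognized escape \z
```

The two modes differ only in the final fallback branch. That branch is exactly the point the thread is arguing about.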

And I can see his point that allowing "foo\zbar" and "foo\\zbar" to be
synonymous is a form of DWIMing.

Is it "a form of DWIMing" to consider 1.234e1 and 12.34 synonymous?

What about 68 and 0x44? Is that DWIMing?

I'm sure both you and your friend are excellent programmers, but you're
tossing around DWIM as a meaningless term of opprobrium without any
apparent understanding of what DWIM actually is.
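Each pair below spells one value two ways. The last part applies the same idea to string literals, including a raw string. This is a minimal sketch assuming a current Python 3, where the warning for "foo\zbar" is silenced so the values can be compared.

```python
import ast
import warnings

# Different spellings of the same number:
print(1.234e1 == 12.34)  # True
print(68 == 0x44)        # True

# Three spellings of the same string value.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    spellings = [
        ast.literal_eval(src)
        for src in (r'"foo\zbar"', r'"foo\\zbar"', r'r"foo\zbar"')
    ]
print(len(set(spellings)))  # 1
```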
 
MRAB

Steven said:
On Sun, 09 Aug 2009 17:56:55 -0700, Douglas Alan wrote:
[snip]
My point of view is that
every language has *some* warts; Python just has a bit fewer than most.
It would have been nice, I should think, if this wart had been "fixed"
in Python 3, as I do consider it to be a minor wart.

And if anyone had cared enough to raise it a couple of years back, it
possibly might have been.

My preference would've been that a backslash followed by A-Z, a-z, or
0-9 is special, but a backslash followed by any other character is just
the character, except for backslash followed by a newline, which
suppresses the newline.

I would also have preferred a backslash in a raw string to always be a
literal.

Ah well, something for Python 4.x. :)
 
Douglas Alan

On Aug 10, 4:37 am, Steven D'Aprano wrote:
There is at least one good reason for preferring an error, namely that it
allows Python to introduce new escape codes without going through a long,
slow process. But the rest of these complaints are terribly unconvincing.


What about:

o Beautiful is better than ugly
o Explicit is better than implicit
o Simple is better than complex
o Readability counts
o Special cases aren't special enough to break the rules
o Errors should never pass silently

?

And most importantly:

o In the face of ambiguity, refuse the temptation to guess.
o There should be one -- and preferably only one -- obvious way to
do it.

?

So, what's the one obvious right way to express "foo\zbar"? Is it

"foo\zbar"

or

"foo\\zbar"

And if it's the latter, what possible benefit is there in allowing the
former? And if it's the former, why does Python echo the latter?

|>ouglas
 
Douglas Alan

The string rules reflect C's rules, and I see little
excuse for trying to change them now.

No they don't. Or at least not C++'s rules. C++ behaves exactly as I
should like.

(Or at least g++ does. Or rather *almost* as I would like, as by
default it generates a warning for "foo\zbar", while I think that an
error would be somewhat preferable.)
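A rough equivalent of the g++ warning can be sketched as a lint over the body of a literal. `find_unrecognized_escapes` is a hypothetical helper, and its escape table assumes Python 3 str literals. Python 3.6 and later now warn about such escapes themselves: a DeprecationWarning at first, upgraded to SyntaxWarning in 3.12.

```python
# Characters that may follow a backslash in a (non-raw) Python 3 str
# literal; octal digits are checked separately.
RECOGNIZED = set("\n\\'\"abfnrtvxNuU")
OCTAL = set("01234567")


def find_unrecognized_escapes(body: str) -> list:
    """Hypothetical lint: list the unrecognized escapes in a literal's body.

    It does not validate the digits that must follow x, u, U or N.
    """
    found = []
    i = 0
    while i < len(body) - 1:
        if body[i] == "\\":
            nxt = body[i + 1]
            if nxt not in RECOGNIZED and nxt not in OCTAL:
                found.append("\\" + nxt)
            i += 2  # skip the escaped character, so an escaped backslash is consumed
        else:
            i += 1
    return found


print(find_unrecognized_escapes(r"foo\zbar"))   # ['\\z']
print(find_unrecognized_escapes(r"foo\\zbar"))  # []
```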

But you're right, it's too late to change this now.

|>ouglas
 
Carl Banks

Steven said:
On Sun, 09 Aug 2009 17:56:55 -0700, Douglas Alan wrote:
[snip]
My point of view is that
every language has *some* warts; Python just has a bit fewer than most.
It would have been nice, I should think, if this wart had been "fixed"
in Python 3, as I do consider it to be a minor wart.
And if anyone had cared enough to raise it a couple of years back, it
possibly might have been.

My preference would've been that a backslash followed by A-Z, a-z, or
0-9 is special, but a backslash followed by any other character is just
the character, except for backslash followed by a newline, which
suppresses the newline.

That would be reasonable; it'd match the behavior of regexps.
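Carl's comparison can be checked against Python's own `re` module. In Python 3.6 and later, `re` treats an escaped punctuation character as itself and rejects unknown escapes of ASCII letters. This only approximates MRAB's proposal; for instance, it says nothing about backslash-newline.

```python
import re

# A backslash before punctuation simply means "this character, literally".
print(bool(re.fullmatch(r"a\.b", "a.b")))  # True
print(bool(re.fullmatch(r"a\.b", "axb")))  # False: the escaped dot matches only "."
print(bool(re.fullmatch(r"a\-b", "a-b")))  # True

# A backslash before an ASCII letter is reserved: unknown ones are errors.
try:
    re.compile(r"\y")
    letter_rejected = False
except re.error:
    letter_rejected = True
print(letter_rejected)  # True
```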


Carl Banks
 
Carl Banks

I assume you mean the \ character in the literal, not the (non-existent)
\ character in the string.


Not at all.

'\n' maps to the string chr(10).
'\y' maps to the string chr(92) + chr(121).

In both cases the backslash in the literal has the same meaning: grab
the next token (usually a single character, but not always), look it up
in a mapping somewhere, and insert the result in the string object being
built.

That is a ridiculous rationalization. Nobody sees "\y" in a string
and thinks "it's an escape sequence that returns the bytes '\y'".


[snip rest, because an argument in favor of inconsistent, context-
dependent behavior doesn't need any further refutation than to point
out that it is an argument in favor of inconsistent, context-dependent
behavior]


Carl Banks
 
