Blog "about python 3"


MRAB

On the flip side, that might be the best salesman your company has
ever known, if those three cities have the most customers!

ChrisA
wondering why nobody cares about the customers in TSP discussions
Or, for that matter, ISP customers who don't live in an urban area. :)
 

Steven D'Aprano

The very interesting aspect is the way you are holding
unicodes (strings). By comparing Python 2 with Python 3.3,
you are comparing utf-8 with the internal "representation"
of Python 3.3 (the flexible string representation).

This is incorrect. Python 2 has never used UTF-8 internally for Unicode
strings. In narrow builds, it uses UTF-16, but makes no allowance for
surrogate pairs in strings. In wide builds, it uses UTF-32.

Other implementations, such as Jython or IronPython, may do something else.
 

Roy Smith

Fast is never more important than correct.

Sure it is.

Let's imagine you're building a system which sorts packages for
delivery. You sort 1 million packages every night and put them on
trucks going out for final delivery.

Some assumptions:

Every second I can cut from the sort time saves me $0.01.

If I mis-sort a package, it goes out on the wrong truck, doesn't get
discovered until the end of the day, and ends up costing me $5
(including not just the direct cost of redelivering it, but also
factoring in ill will and having to make the occasional refund for not
meeting the promised delivery time).

I've got a new sorting algorithm which is guaranteed to cut 10 seconds
off the sorting time (i.e. $0.10 per package). The problem is, it makes
a mistake 1% of the time.

Let's see:

1 million packages x $0.10 = $100,000 saved per day because I sort them
faster. 10,000 of them will go to the wrong place, and that will cost
me $50,000 per day. By going fast and making mistakes once in a while,
I increase my profit by $50,000 per day.

The numbers above are fabricated, but I'm sure UPS, FedEx, and all the
other package delivery companies are doing these sorts of analyses every
day. I watch the UPS guy come to my house. He gets out of his truck,
walks to my front door, rings the bell, waits approximately 5
microseconds, leaves the package on the porch, and goes back to his
truck. I'm sure UPS has figured out that the amortized cost of the
occasional stolen or lost package is less than the cost for the delivery
guy to wait for me to come to the front door and sign for the delivery.

Looking at another problem domain, let's say you're a contestant on
Jeopardy. If you listen to the entire clue and spend 3 seconds making
sure you know the correct answer before hitting the buzzer, it doesn't
matter if you're right or wrong. Somebody else beat you to the buzzer,
2.5 seconds ago.

Or, let's take an example from sports. I'm standing at home plate
holding a bat. 60 feet away from me, the pitcher is about to throw a
baseball towards me at darn close to 100 MPH (insert words like "bowl"
and "wicket" as geographically appropriate). 400 ms later, the ball is
going to be in the catcher's glove if you don't hit it. If you have an
absolutely perfect algorithm for determining if it's a ball or a strike,
which takes 500 ms to run, you're going back to the minor leagues. If
you have a 300 ms algorithm which is right 75% of the time, you're
heading to the hall of fame.
 

Chris Angelico

I've got a new sorting algorithm which is guaranteed to cut 10 seconds
off the sorting time (i.e. $0.10 per package). The problem is, it makes
a mistake 1% of the time.

That's a valid line of argument in big business, these days, because
we've been conditioned to accept low quality. But there are places
where quality trumps all, and we're happy to pay for that. Allow me to
expound two examples.

1) Amazon

http://www.amazon.com/exec/obidos/ASIN/1782010165/evertype-20

I bought this book a while ago. It's about the size of a typical
paperback. It arrived in a box too large for it on every dimension,
with absolutely no packaging. I complained. Clearly their algorithm
was: "Most stuff will get there in good enough shape, so people can't
be bothered complaining. And when they do complain, it's cheaper to
ship them another for free than to debate with them on chat." Because
that's what they did. Fortunately I bought the book for myself, not
for a gift, because the *replacement* arrived in another box of the
same size, with ... one little sausage for protection. That saved it
in one dimension out of three, so it arrived only slightly
used-looking instead of very used-looking. And this was a brand new book.
When I complained the second time, I was basically told "any
replacement we ship you will be exactly the same". Thanks.

2) Bad Monkey Productions

http://kck.st/1bgG8Pl

The cheapest the book itself will be is $60, and the limited edition
early ones are more (I'm getting the gold level book, $200 for one of
the first 25 books, with special sauce). The people producing this are
absolutely committed to quality, as are the nearly 800 backers. If
this project is delayed slightly in order to ensure that we get
something fully awesome, I don't think there will be complaints. This
promises to be a beautiful book that'll be treasured for generations,
so quality's far FAR more important than the exact delivery date.

I don't think we'll ever see type #2 become universal, for the same
reason that people buy cheap Chinese imports in the supermarket rather
than something that costs five times as much from a specialist. The
expensive one might be better, but why bother? When the cheap one
breaks, you just get another. The expensive one might fail too, so why
take that risk?

But it's always a tradeoff, and there'll always be a few companies
around who offer the more expensive product. (We have a really high
quality cheese slicer. It's still the best I've seen, after something
like 20 years of usage.) Fast or right? It'd have to be really
*really* fast to justify not being right, unless the lack of rightness
is less than measurable (like representing time in nanoseconds -
anything smaller than that is unlikely to be measurable on most
computers).

ChrisA
 

Rustom Mody

Roy Smith said:
Sure it is.

Let's imagine you're building a system which sorts packages for
delivery. You sort 1 million packages every night and put them on
trucks going out for final delivery.
[...rest of the package-sorting, Jeopardy and baseball examples snipped...]


Neat examples -- thanks.
Only minor quibble: isn't the $5 cost of mis-sorting a gross underestimate?

I am reminded of a passage of Dijkstra's in A Discipline of Programming --
something to this effect:

He laments the fact that hardware engineers were not including
overflow checks in machine ALUs, and explains as follows:
If a test is moderately balanced (statistically speaking), a programmer
will not mind writing an if statement.

If however the test is very skewed -- say 99% one way, 1% the other -- he
will tend to skimp on the test, producing 'buggy' code [EWD would
never use the bad b-word, of course].

The cost equation for hardware is very different: once the
investment in the silicon is made -- a fixed cost, albeit a high one --
there is no variable cost to executing that circuitry once or a
zillion times.

Moral of Story: Intel should take up FSR
[Ducks and runs for cover]
 

Roy Smith

Rustom Mody said:
Neat examples -- thanks
Only minor quibble isnt $5 cost of mis-sorting a gross underestimate?

I have no idea. Like I said, the numbers are all fabricated.

I do have a friend who used to work for UPS. He told me lots of UPS
efficiency stories. One of them had to do with mis-routed packages.
IIRC, the process for dealing with a mis-routed package was to NOT waste
any time trying to figure out why it was mis-routed. It was just thrown
back into the input hopper to go through the whole system again. The
sorting software kept track of how many times it had sorted a particular
package. Only after N attempts (where N was something like 3), was it
kicked out of the automated process for human intervention.
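The re-sort policy Roy describes can be sketched roughly as follows (a hypothetical sketch: the function names, the 1% mis-sort rate, and N=3 are all assumptions taken from the anecdote, not UPS's actual system):

```python
import random

MAX_ATTEMPTS = 3  # the "N" from the anecdote; after this, a human steps in

def machine_sort(package):
    """Stand-in for the real sorter: mis-sorts about 1% of the time."""
    return package["dest"] if random.random() > 0.01 else "WRONG_TRUCK"

def route(package):
    """Don't diagnose a mis-sort; just throw the package back in the hopper."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if machine_sort(package) == package["dest"]:
            return "sorted", attempt
    return "manual", MAX_ATTEMPTS  # kicked out for human intervention

status, tries = route({"dest": "NYC"})
```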
 

Steven D'Aprano

Roy said:
Sure it is.

Sure it isn't. I think you stopped reading my post too early.

None of your examples contradict what I am saying. They all involve exactly
the same sort of compromise regarding "correctness" that I'm talking about,
where you loosen what counts as "correct" for the purpose of getting extra
speed. So, for example:
Let's imagine you're building a system which sorts packages for
delivery. You sort 1 million packages every night and put them on
trucks going out for final delivery.

What's your requirement, i.e. what counts as "correct" for the delivery
algorithm being used? Is it that every parcel is delivered to the specified
delivery address the first time? No it is not. What counts as "correct" for
the delivery algorithm is something on the lines of "No less than 95% of
parcels will be sorted correctly and delivered directly; no more than 5%
may be mis-sorted at most three times" (or some similar requirement).

It may even be that the requirements are looser, e.g.:

"No more than 1% of parcels will be lost/damaged/stolen/destroyed"

in which case they don't care unless a particular driver loses or destroys
more than 1% of his deliveries. But if it turns out that Fred is dumping
every single one of his parcels straight into the river, the fact that he
can make thirty deliveries in the time it takes other drivers to make one
will not save his job. "But it's much faster to dump the parcels in the
river" does not matter. What matters is that the deliveries are made within
the bounds of allowable time and loss.

Things get interesting when the people setting the requirements and the
people responsible for meeting those requirements aren't able to agree.
Then you have customers who complain that the software is buggy, and
developers who complain that the customer requirements are impossible to
provide. Sometimes they're both right.

Looking at another problem domain, let's say you're a contestant on
Jeopardy. If you listen to the entire clue and spend 3 seconds making
sure you know the correct answer before hitting the buzzer, it doesn't
matter if you're right or wrong. Somebody else beat you to the buzzer,
2.5 seconds ago.

I've heard of Jeopardy, but never seen it. But I know about game shows, and
in this case, what you care about is *winning the game*, not answering the
questions correctly. Answering the questions correctly is only a means to
the end, which is "Win". If the rules allow it, your best strategy might
even be to give wrong answers, every time!

(It's not quite a game show, but the British quiz show QI is almost like
that. The rules, if there are any, encourage *interesting* answers over
correct answers. Occasionally that leads to panelists telling what can best
be described as utter porkies[1].)

If Jeopardy does not penalise wrong answers, the "best" strategy might be to
jump in with an answer as quickly as possible, without caring too much
about whether it is the right answer. But if Jeopardy penalises mistakes,
then the "best" strategy might be to take as much time as you can to answer
the question, and hope for others to make mistakes. That's often the
strategy in Test cricket: play defensively, and wait for the opposition to
make a mistake.

Or, let's take an example from sports. I'm standing at home plate
holding a bat. 60 feet away from me, the pitcher is about to throw a
baseball towards me at darn close to 100 MPH (insert words like "bowl"
and "wicket" as geographically appropriate). 400 ms later, the ball is
going to be in the catcher's glove if you don't hit it. If you have an
absolutely perfect algorithm to determining if it's a ball or a strike,
which takes 500 ms to run, you're going back to the minor leagues. If
you have a 300 ms algorithm which is right 75% of the time, you're
heading to the hall of fame.

And if you catch the ball, stick it in your pocket and race through all the
bases, what's that? It's almost certainly faster than trying to play by the
rules. If speed is all that matters, that's what people would do. But it
isn't -- the "correct" strategy depends on many different factors, one of
which is that you have a de facto time limit on deciding whether to swing
or let the ball through.

Your baseball example is no different from the example I gave before. "Find
the optimal path for the Travelling Salesman Problem in a week's time",
versus "Find a close to optimal path in three minutes" is conceptually the
same problem, with the same solution: an imperfect answer *now* can be
better than a perfect answer *later*.
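The conceptual point can be made concrete with a toy sketch (the city coordinates are invented, and `math.dist` requires Python 3.8+): exhaustive search is O(n!) but exact, while a nearest-neighbour pass is near-instant and merely close.

```python
from itertools import permutations
from math import dist

cities = [(0, 0), (2, 1), (1, 3), (4, 2), (3, 0)]  # invented coordinates

def tour_length(order):
    return sum(dist(order[i], order[i + 1]) for i in range(len(order) - 1))

# The "perfect answer later": try every ordering.
best = min(permutations(cities), key=tour_length)

# The "imperfect answer now": always hop to the nearest unvisited city.
def nearest_neighbour(start, rest):
    tour, todo = [start], set(rest)
    while todo:
        nxt = min(todo, key=lambda c: dist(tour[-1], c))
        todo.remove(nxt)
        tour.append(nxt)
    return tour

approx = nearest_neighbour(cities[0], cities[1:])
```

The heuristic tour can never beat the exhaustive one, but it visits every city and is available immediately.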




[1] Porkies, or "pork pies", from Cockney rhyming slang.
 

wxjmfauth

On Sunday, 5 January 2014 03:54:29 UTC+1, Chris Angelico wrote:
That's for Python's unicode type. What Robin said was that they were
using either a byte string ("str") with UTF-8 data, or a Unicode
string ("unicode") with character data. So jmf was right, except that
it's not specifically to do with Py2 vs Py3.3.

Yes, the key point is the preparation of the "unicode
text" for the PDF producer.

It is at this level that the different flavours of Python
may be relevant.

I see four possibilities; I do not know what
the PDF producer API is expecting.

- Py2 with a utf-8 byte string (or utf-16, utf-32)
- Py2 with its internal unicode
- Py3.2 with its internal unicode
- Py3.3 with its internal unicode
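For the byte-string flavour, those encodings differ only in how many bytes each code point takes; a quick check (the sample text is arbitrary):

```python
text = "abc\u20ac"  # three ASCII letters plus the euro sign (U+20AC)

for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, len(text.encode(enc)))
# utf-8 6       (1 byte each for a, b, c; 3 bytes for the euro sign)
# utf-16-le 8   (2 bytes per BMP code point)
# utf-32-le 16  (4 bytes per code point)
```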

jmf
 

Johannes Bauer

Mark said:

I quote:

"...perhaps a brave group of volunteers will stand up and fork Python 2, and
take the incremental steps forward. This will have to remain just an idle
suggestion, as I'm not volunteering myself."

I expect that as excuses for not migrating get fewer, and the deadline for
Python 2.7 end-of-life starts to loom closer, more and more haters^W
Concerned People will whine about the lack of version 2.8 and ask for
*somebody else* to fork Python.

I find it, hmmm, interesting, that so many of these Concerned People who say
that they're worried about splitting the Python community[1] end up
suggesting that we *split the community* into those who have moved forward
to Python 3 and those who won't.

Exactly. I don't know what exactly their problem is. I've pushed the
migration of *large* projects at work to Python3 when support was pretty
early and it really wasn't a huge deal.

Specifically because I love pretty much every single aspect that Python3
introduced. The codec support is so good that I've never seen anything
like it in any other programming language, and then there are the tons of
beautiful changes (div/intdiv, functools.lru_cache, print(),
datetime.timedelta.total_seconds(), int.bit_length(), bytes/bytearray).
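Several of the changes Johannes lists are one-liners to demonstrate:

```python
import functools
from datetime import timedelta

assert 7 / 2 == 3.5 and 7 // 2 == 3   # true division vs. integer division
assert (10).bit_length() == 4         # 0b1010 needs four bits
assert timedelta(minutes=2, seconds=5).total_seconds() == 125.0

@functools.lru_cache(maxsize=None)    # memoise a pure function
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

assert fib(30) == 832040              # fast, thanks to the cache
```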

Regards,
Joe

--
At least not publicly!
Ah, the newest and to this day most ingenious trick of our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <[email protected]>
 

Steven D'Aprano

Devin said:
Is this statement even falsifiable? Can you conceive of a circumstance
where someone has traded correctness for speed, but where one couldn't
describe it that latter way? I can't.

Every time some programmer "optimises" a piece of code (or, more often,
*thinks* they have optimised it) which introduces bugs into the software,
that's a case where somebody has traded correctness for speed where my
statement doesn't apply. Sometimes the response to the subsequent bug
report is "will not fix", and a retroactive change in the software
requirements. ("Oh, did we say that indexing a string would return a
character? We meant it would return a character, so long as the string
includes no Unicode characters in the astral planes.") Sometimes it is to
revert the optimisation or otherwise fix the bug.

I accept that there is sometimes a fine line here. I'm assuming that
software applications have their requirements fully documented, which in
the real world is hardly ever the case. Although, even if the requirements
aren't always written down, often they are implicitly understood. (Although
it gets very interesting when the users' understanding and the developers'
understanding is different.)

Take as an example this "torture test" for a mathematical sum function,
where the built-in sum() gets the wrong answer but math.fsum() gets it
right:

py> from math import fsum
py> values = [1e12, 0.0001, -1e12, 0.0001]*10000
py> fsum(values)
2.0
py> sum(values)
2.4413841796875


Here's another example of the same thing, just to prove it's not a fluke:

py> values = [1e17, 1, 1, -1e17]
py> fsum(values)
2.0
py> sum(values)
0.0


The reason for the different results is that fsum() tries hard to account
for intermediate rounding errors and sum() does not. If you benchmark the
two functions, you'll find that sum() is significantly faster than fsum(). So
the question to be asked is, does sum() promise to calculate floating point
sums accurately? If so, then this is a bug, probably introduced by the
desire for speed. But in fact, sum() does not promise to calculate floating
point sums accurately. What it promises to do is to calculate the
equivalent of a + b + c + ... for as many values as given, and that's
exactly what it does. Conveniently, that's faster than fsum(), and usually
accurate enough for most uses.

Is sum() buggy? No, of course not. It does what it promises, it's just that
what it promises to do falls short of "calculate floating point summations
to high accuracy".

Now, here's something which *would* be a bug, if sum() did it:

class MyInt(int):
    def __add__(self, other):
        return MyInt(super(MyInt, self).__add__(other))
    def __radd__(self, other):
        return MyInt(super(MyInt, self).__radd__(other))
    def __repr__(self):
        return "MyInt(%d)" % self


Adding a zero MyInt to an int gives a MyInt:

py> MyInt(0) + 23
MyInt(23)

so sum() should do the same thing. If it didn't, if it optimised away the
actual addition because "adding zero to a number can't change anything", it
would be buggy. But in fact, sum() does the right thing:

py> sum([MyInt(0), 23])
MyInt(23)

I think by definition you can
always describe it that way, you just make "what counts as
correctness" be "what the customer wants given the resources
available".

Not quite. "Correct" means "does what the customer wants". Or if there is no
customer, it's "does what you say it will do".

How do we tell when software is buggy? We compare what it actually does to
the promised behaviour, or expected behaviour, and if there is a
discrepancy, we call it a bug. We don't compare it to some ideal that
cannot be met. A bug report that math.pi does not have infinite number of
decimal places would be closed as "Will Not Fix".

Likewise, if your customer pays you to solve the Travelling Salesman Problem
exactly, even if it takes a week to calculate, then nothing short of a
program that solves the Travelling Salesman Problem exactly will satisfy
their requirements. It's no good telling the customer that you can
calculate a non-optimal answer twenty times faster if they want the actual
optimal answer.

(Of course, you may try to persuade them that they don't really need the
optimal solution, or that they cannot afford it, or that you cannot deliver
and they need to compromise.)

The conventional definition, however, is "what the
customer wants, imagining that you have infinite resources".

I don't think the resources really come into it. At least, certainly not
*infinite* resources. fsum() doesn't require infinite resources to
calculate floating point summations to high accuracy. An even more accurate
(but even slower) version would convert each float into a Fraction, then
add the Fractions.
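That slower-but-exact Fraction approach, applied to the second torture test above:

```python
from fractions import Fraction
from math import fsum

values = [1e17, 1, 1, -1e17]

# Each float converts to an exact rational; the summation itself is then
# exact, and rounding happens only once, at the final conversion to float.
exact = float(sum(Fraction(v) for v in values))

print(exact, fsum(values), sum(values))  # 2.0 2.0 0.0
```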

With just
a little redefinition that seems reasonable, you can be made never to
be wrong!

I'm never wrong because I'm always right! *wink*

Let's bring this back to the claim made at the beginning. Someone (Mark?)
made a facetious comment about preferring fast code to correct code.
Someone else (I forget who, and am too lazy to look it up -- Roy Smith
perhaps?) suggested that we quite often accept incorrect code if it is
fast. But I maintain that we don't. If we did, we'd explicitly say:

"Sure, I know this program calculates the wrong answer, but gosh look how
fast it is!"

much like an anecdote I gave about the roadie driving in the wrong direction
who stated "Who cares, we're making great time!".

I maintain that people don't as a rule justify incorrect code on the basis
of it being fast. They claim the code isn't incorrect, that any compromises
made are deliberate and not bugs:

- "sum() doesn't promise to calculate floats to high accuracy, it
promises to give the same answer as if you repeatedly added them
with the + operator."

- "We never promised 100% uptime, we promised four nines uptime."

- "Our anti-virus scanner is blindingly fast, while still identifying
at least 99% of all known computer viruses!"

- "The Unix 'locate' command doesn't do a live search of the file
system because that would be too slow, it uses a snapshot of the
state of the file system."


Is locate buggy because it tells you what files existed the last time the
updatedb command ran, instead of what files exist right now? No, of course
not. locate does exactly what it promises to do.
 

Chris Angelico

- "The Unix 'locate' command doesn't do a live search of the file
system because that would be too slow, it uses a snapshot of the
state of the file system."


Is locate buggy because it tells you what files existed the last time the
updatedb command ran, instead of what files exist right now? No, of course
not. locate does exactly what it promises to do.

Even more strongly: We say colloquially that Google, DuckDuckGo, etc,
etc, are tools for "searching the web". But they're not. They're tools
for *indexing* the World Wide Web, and then searching that index. It's
plausible to actually search your file system (and there are times
when you want that), but completely implausible to search the (F or
otherwise) web. We accept the delayed appearance of a page in the
search results because we want immediate results, no waiting a month
to find anything! So the difference between what's technically
promised and what's colloquially described may be more than just
concealing bugs - it may be the vital difference between uselessness
and usefulness. And yet we like the handwave.
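The index-then-query pattern ChrisA describes, in miniature (the page contents here are made up for illustration):

```python
from collections import defaultdict

# "Crawl" phase: build an inverted index once, ahead of time.
pages = {
    "a.html": "python unicode strings",
    "b.html": "fast packages delivery",
    "c.html": "python packages index",
}

index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

# "Search" phase: answers come from the snapshot, not a live scan,
# so results are instant but only as fresh as the last crawl.
def search(word):
    return sorted(index.get(word, ()))

print(search("python"))  # ['a.html', 'c.html']
```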

ChrisA
 

Mark Lawrence

I quote:

"...perhaps a brave group of volunteers will stand up and fork Python 2, and
take the incremental steps forward. This will have to remain just an idle
suggestion, as I'm not volunteering myself."

I expect that as excuses for not migrating get fewer, and the deadline for
Python 2.7 end-of-life starts to loom closer, more and more haters^W
Concerned People will whine about the lack of version 2.8 and ask for
*somebody else* to fork Python.

Should the "somebody else" fork Python, in ten (ish) years time the
Concerned People will be complaining that they can't port their code to
Python 4 and will "somebody else" please produce version 2.9.
 

Stefan Behnel

Johannes Bauer, 05.01.2014 13:14:
I've pushed the
migration of *large* projects at work to Python3 when support was pretty
early and it really wasn't a huge deal.

I think there are two sides to consider. Those who can switch their code
base to Py3 and be happy (as you did, apparently), and those who cannot
make the switch but have to keep supporting Py2 until 'everyone' else has
switched, too. The latter is a bit more work generally and applies mostly
to Python packages on PyPI, i.e. application dependencies.

There are two ways to approach that problem. One is to try convincing
people that "Py3 has failed, let's stop migrating more code before I have
to start migrating mine", and the other is to say "let's finish the
migration and get it done, so that we can finally drop Py2 support in our
new releases and clean up our code again".

As long as we stick in the middle and keep the status quo, we keep the
worst of both worlds. And, IMHO, pushing loudly for a Py2.8 release
provides a very good excuse for others to not finish their part of the
migration, thus prolonging the maintenance burden for those who already did
their share.

Maybe a couple of major projects should start dropping their Py2 support,
just to make their own life easier and to help others in taking their
decision, too.

(And that's me saying that, who maintains two major projects that still
have legacy support for Py2.4 ...)

Stefan
 

wxjmfauth

On Saturday, 4 January 2014 23:46:49 UTC+1, Terry Reedy wrote:
Chris, I appreciate the many contributions you make to this list, but
that does not exempt you from our standard of conduct.

Troll baiting is a form of trolling. I think you are intelligent enough
to know this. Please stop.

If this is true, it is because you have ignored and not read my
numerous, relatively polite posts. To repeat very briefly:

1. Cherry picking (presenting the most extreme case as representative).

2. Calling space saving a problem (repeatedly).

3. Ignoring bug fixes.

4. Repetition (of the 'gazillion example' without new content).

Have you ever acknowledged, let alone thanked people for, the fix for the
one bad regression you did find? The FSR is still a work in progress.
Just today, Serhiy pushed a patch speeding up the UTF-32 encoder, after
previously speeding up the UTF-32 decoder.

--

My examples are ONLY ILLUSTRATING that this FSR
is wrong by design, whether on the side of
memory, performance, linguistics or even
typography.

I will not stop you from wasting your time
adjusting bytes, if the problem is not
on that side.

jmf
 

Ned Batchelder

wxjmfauth said:

My examples are ONLY ILLUSTRATING, this FSR
is wrong by design, can be on the side of
memory, performance, linguistic or even
typography.

JMF: this has been pointed out to you time and again: the flexible
string representation is not wrong. To show that it is wrong, you would
have to demonstrate some semantic of Unicode that is violated. You have
never done this. You've picked pathological cases and shown
micro-timing output, and memory usage. The Unicode standard doesn't
promise anything about timing or memory use.

The FSR makes a trade-off of time and space. Everyone but you considers
it a good trade-off. I don't think you are showing real use cases, but
if they are, I'm sorry that your use-case suffers. That doesn't make
the FSR wrong. The most accurate statement is that you don't like the
FSR. That's fine, you're entitled to your opinion.
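The space half of that trade-off is easy to observe on CPython 3.3+ (exact byte counts vary across versions, so only the ordering is shown):

```python
import sys

ascii_s = "a" * 100            # every character fits in 1 byte
bmp_s = "\u0100" * 100         # needs 2 bytes per character
astral_s = "\U00010000" * 100  # needs 4 bytes per character

# Under the FSR, each string uses the narrowest width that can hold
# its widest character, so sizes grow with the widest code point.
assert sys.getsizeof(ascii_s) < sys.getsizeof(bmp_s) < sys.getsizeof(astral_s)
```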

You say the FSR is wrong linguistically. This can't be true, since an
FSR Unicode string is indistinguishable from an internally-UTF-32
Unicode string, and no, memory use or timings are irrelevant when
discussing the linguistic performance of a Unicode string.

You've also said that the internal representation of the FSR is
incorrect because of encodings somehow. Encodings have nothing to do
with the internal representation of a Unicode string, they are for
interchanging data. You seem to know a lot about Unicode, but when you
make this fundamental mistake, you call all of your expertise into question.

To re-iterate what you are doing wrong:

1) You continue to claim things that are not true, and that you have
never substantiated.

2) You paste code samples without accompanying text that explain what
you are trying to demonstrate.

3) You ignore refutations that disprove your points.

These are all the behaviors of a troll. Please stop.

If you want to discuss the details of Unicode implementations, I'd
welcome an offlist discussion, but only if you will approach it honestly
enough to leave open the possibility that you are wrong. I know I would
be glad to learn details of Unicode that I have missed, but so far you
haven't provided any.

--Ned.
 

Roy Smith

Steven D'Aprano said:
How do we tell when software is buggy? We compare what it actually does to
the promised behaviour, or expected behaviour, and if there is a
discrepancy, we call it a bug. We don't compare it to some ideal that
cannot be met. A bug report that math.pi does not have infinite number of
decimal places would be closed as "Will Not Fix".

That's because it is inherently impossible to "fix" that. But lots of
bug reports legitimately get closed with "Will Not Fix" simply because
the added value from fixing it doesn't justify the cost (whether in
terms of development effort, or run-time resource consumption).

Go back to the package sorting example I gave. If the sorting software
mis-reads the address and sends my package to Newark instead of New York
by mistake, that's clearly a bug.

Presumably, it's an error which could be eliminated (or, at least, the
rate of occurrence reduced) by using a more sophisticated OCR algorithm.
But, if those algorithms take longer to run, the overall expected value
of implementing the bug fix may well be negative.

In the real world, nobody cares if software is buggy. They care that it
provides value.
 

Roy Smith

Chris Angelico said:
That's a valid line of argument in big business, these days, because
we've been conditioned to accept low quality. But there are places
where quality trumps all, and we're happy to pay for that. Allow me to
expound two examples.

1) Amazon

http://www.amazon.com/exec/obidos/ASIN/1782010165/evertype-20

I bought this book a while ago. It's about the size of a typical
paperback. It arrived in a box too large for it on every dimension,
with absolutely no packaging. I complained. Clearly their algorithm
was: "Most stuff will get there in good enough shape, so people can't
be bothered complaining. And when they do complain, it's cheaper to
ship them another for free than to debate with them on chat."

You're missing my point.

Amazon's (short-term) goal is to increase their market share by
undercutting everybody on price. They have implemented a box-packing
algorithm which clearly has a bug in it. You are complaining that they
failed to deliver your purchase in good condition, and apparently don't
care. You're right, they don't. The cost to them to manually correct
this situation exceeds the value. This is one shipment. It doesn't
matter. You are one customer, you don't matter either. Seriously.
This may be annoying to you, but it's good business for Amazon. For
them, fast and cheap is absolutely better than correct.

I'm not saying this is always the case. Clearly, there are companies
which have been very successful at producing a premium product (Apple,
for example). I'm not saying that fast is always better than correct.
I'm just saying that correct is not always better than fast.
 

Dennis Lee Bieber

I know somebody who was once touring in the States, and ended up travelling
cross-country by road with the roadies rather than flying. She tells me of
the time someone pointed out that they were travelling in the wrong
direction, away from their destination. The roadie driving replied "Who
cares? We're making fantastic time!"

At least it wasn't a neophyte to the Panama Canal... Where the Atlantic
end is to the west of the Pacific end.
 

Dennis Lee Bieber

I've heard of Jeopardy, but never seen it. But I know about game shows, and
in this case, what you care about is *winning the game*, not answering the
questions correctly. Answering the questions correctly is only a means to
the end, which is "Win". If the rules allow it, your best strategy might
even be to give wrong answers, every time!

Jeopardy partly derived from the game show scandals of the '50s (The
$64,000 Question, where it came out that some contestants were coached
on answers).
Jeopardy's clues ARE the answers, and the contestants have to provide a
question that could be answered by that clue. Granted, modern Jeopardy's
responses are all in the "who|what|when|where was ..." realm.
(It's not quite a game show, but the British quiz show QI is almost like
that. The rules, if there are any, encourage *interesting* answers over
correct answers. Occasionally that leads to panelists telling what can best
be described as utter porkies[1].)

Would not work for Jeopardy. Hogwash responses will result in a penalty
(losing the question's dollar value) and permit the other two contestants
to ring in with their response.

In the early days, one could ring in even while the question was being
read. They now don't activate the button until the "answer" has been
completely read (meaning contestants need to be quick on the button WHEN
the reading is over). The old days meant one could click as the clue was
revealed and thereby lock in first, while having the whole reading AND the
15-30 second response window in which to make a reply. Now they have to
compete at the end of the reading and then take advantage of only the
15-30 second window.
If Jeopardy does not penalise wrong answers, the "best" strategy might be to

It does, though. You lose the amount of the answer (the "jeopardy") for
a wrong response.
 
