Stack Overflow moderator “animuson”

Joshua Landau

I don't know what you mean by that, but since the joke appears to have
flown over your head, I'll explain it. Steven's "pos" was clearly
mea

What? I don't understand.
 
Joshua Landau

<Unjustified Insult>. [animuson from Stack Overflow] has deleted all
my postings regarding Python regular expression matching being
extremely slow compared to Perl. Additionally my account has been
suspended for 7 days. <Unjustified Insult>.

Whilst I don't normally respond to trolls, I'm actually curious.

Do you have any non-trivial, properly benchmarked real-world examples
that this affects, remembering to use full Unicode support in Perl (as
Python has it by default)?

Remember to try on both major CPython versions, and PyPy -- all of
which are in large-scale usage. Remember not just to use the builtin
re module, as most people also use https://pypi.python.org/pypi/regex
and https://code.google.com/p/re2/ when they are appropriate, so
pathological cases for re aren't actually a concern anyone cares
about.

If you actually can satisfy these basic standards for a comparison (as
I'm sure any competent person with so much bravado could) I'd be willing
to converse with you. I'd like to see these results where Python compares
as "extremely slow". Note that, by your own wording, a 30% drop is irrelevant.
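
A minimal sketch of what such a measurement could look like on the Python side, using only the standard-library timeit and re modules. The pattern, the sample text, and the best_time helper are hypothetical placeholders, not anything posted in this thread; a fair comparison would use real-world data, repeat the run on each interpreter of interest, and time an equivalent Perl script separately.

```python
import re
import timeit

# Hypothetical workload: both the pattern and the text are placeholders.
PATTERN = r"\b\w+@\w+\.\w+\b"
TEXT = "contact: alice@example.com, bob@example.org; no match here. " * 1000


def best_time(pattern, text, repeat=5, number=20):
    """Return the best observed time in seconds for one findall pass."""
    compiled = re.compile(pattern)  # compile once, outside the timed code
    timings = timeit.repeat(lambda: compiled.findall(text),
                            repeat=repeat, number=number)
    # The minimum is the run least disturbed by other system activity.
    return min(timings) / number


seconds = best_time(PATTERN, TEXT)
print(f"re findall: {seconds * 1e3:.3f} ms per pass")
```

Taking the minimum over several repeats is a common way to reduce noise; it says nothing by itself about how Perl would fare on the same input.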
 
Antoon Pardon

Op 10-07-13 11:03, Mats Peterson schreef:
Not a troll. It's just hard to convince Python users that their beloved
language would have inferior regular expression performance to Perl.

All right, you have convinced me. Now what? Why should I care?
 
Joshua Landau

Google Groups is writing about your recently sent mail to "Joshua
Landau". Unfortunately this address has been discontinued from usage
for the foreseeable future. The sent message is displayed below:
 
Steve Simmons

Steven D'Aprano said:
That's by design. We don't want to make the same mistake as Perl, where
every problem is solved by a regular expression:

http://neilk.net/blog/2000/06/01/abigails-regex-to-test-for-prime-numbers/

so we deliberately make regexes as slow as possible so that programmers
will look for a better way to solve their problem. If you check the
source code for the re engine, you'll find that for certain regexes, it
busy-waits for anything up to 30 seconds at a time, deliberately wasting
cycles.

The same with Unicode. We hate French people, you see, and so in an
effort to drive everyone back to ASCII-only text, Python 3.3 introduces
some memory optimizations that ensure that Unicode strings work
correctly and are up to four times smaller than they used to be. You
should get together with jmfauth, who has discovered our dastardly plot
and keeps posting benchmarks showing how on carefully contrived micro-
benchmarks using a beta version of Python 3.3, non-ASCII string
operations can be marginally slower than in 3.2.


dickwad.

I cannot imagine why he would have done that.

:) Thank you.

Sent from a Galaxy far far away
 
Skip Montanaro

... meant to be the word "posted", before his sentence got cut off by the
Python Secret Underground.

Argh! That which shall not be named! Please, for the sake of all that
is right, please only use the initials, PS
 
Mats Peterson

Antoon Pardon said:
Op 10-07-13 11:03, Mats Peterson schreef:

All right, you have convinced me. Now what? Why should I care?

Right. Why should you. And who cares about you?

Mats
 
Mats Peterson

Joshua Landau said:
<Unjustified Insult>. [animuson from Stack Overflow] has deleted all
my postings regarding Python regular expression matching being
extremely slow compared to Perl. Additionally my account has been
suspended for 7 days. <Unjustified Insult>.

Whilst I don't normally respond to trolls, I'm actually curious.

Do you have any non-trivial, properly benchmarked real-world examples
that this affects, remembering to use full Unicode support in Perl (as
Python has it by default)?

Remember to try on both major CPython versions, and PyPy -- all of
which are in large-scale usage. Remember not just to use the builtin
re module, as most people also use https://pypi.python.org/pypi/regex
and https://code.google.com/p/re2/ when they are appropriate, so
pathological cases for re aren't actually a concern anyone cares
about.

If you actually can satisfy these basic standards for a comparison (as
I'm sure any competent person with so much bravado could) I'd be willing
to converse with you. I'd like to see these results where Python compares
as "extremely slow". Note that, by your own wording, a 30% drop is irrelevant.

I haven't provided a "real-world" example, since I expect you Python
Einsteins to be able to do an A/B test between Python and Perl yourselves
(provided you know Perl, of course, which I'm afraid is not always the
case). And why would I use any "custom" version of Python, when I don't
have to do that with Perl?

Mats
 
Mats Peterson

Steven D'Aprano said:
That's by design. We don't want to make the same mistake as Perl, where
every problem is solved by a regular expression:

http://neilk.net/blog/2000/06/01/abigails-regex-to-test-for-prime-numbers/

so we deliberately make regexes as slow as possible so that programmers
will look for a better way to solve their problem. If you check the
source code for the re engine, you'll find that for certain regexes, it
busy-waits for anything up to 30 seconds at a time, deliberately wasting
cycles.

The same with Unicode. We hate French people, you see, and so in an
effort to drive everyone back to ASCII-only text, Python 3.3 introduces
some memory optimizations that ensure that Unicode strings work
correctly and are up to four times smaller than they used to be. You
should get together with jmfauth, who has discovered our dastardly plot
and keeps posting benchmarks showing how on carefully contrived micro-
benchmarks using a beta version of Python 3.3, non-ASCII string
operations can be marginally slower than in 3.2.



I cannot imagine why he would have done that.

You're obviously trying hard to be funny. It fails miserably.

Mats
 
Mats Peterson

Chris Angelico said:
You do? And you haven't noticed the inferior performance of regular
expressions in Python compared to Perl? Then you obviously haven't
used them a lot.

That would be correct. Why have I not used them all that much? Because
Python has way better ways of doing many things. Regexps are
notoriously hard to debug, largely because a nonmatching regex can't
give much information about _where_ it failed to match, and when I
parse strings, it's more often with (s)scanf notation instead - stuff
like this (Pike example as Python doesn't, afaik, have scanf support):
data="Hello, world! I am number 42.";
sscanf(data,"Hello, %s! I am number %d.",foo,x);
(3) Result: 2
foo;
(4) Result: "world"
x;
(5) Result: 42

Or a more complicated example:

sscanf(Stdio.File("/proc/meminfo")->read(),"%{%s: %d%*s\n%}",array data);
mapping meminfo=(mapping)data;

That builds up a mapping (Pike terminology for what Python calls a
dict) with the important information out of /proc/meminfo, something
like this:

([
"MemTotal": 2026144,
"MemFree": 627652,
"Buffers": 183572,
"Cached": 380724,
..... etc etc
])

So, no. I haven't figured out that Perl's regular expressions
outperform Python's or Pike's or SciTE's, because I simply don't need
them all that much. With sscanf, I can at least get a partial match,
which tells me where to look for the problem.
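
For comparison, a rough Python sketch of the same two tasks using the standard re module, since Python has no built-in sscanf. The parse_meminfo helper and the sample string are assumptions made for illustration; the sample stands in for the contents of /proc/meminfo so the snippet runs anywhere.

```python
import re

# Rough counterpart of the sscanf call: capture a word and an integer.
data = "Hello, world! I am number 42."
match = re.fullmatch(r"Hello, (.+?)! I am number (\d+)\.", data)
foo, x = match.group(1), int(match.group(2))


def parse_meminfo(text):
    """Build a dict of name -> integer from /proc/meminfo-style lines."""
    info = {}
    for line in text.splitlines():
        m = re.match(r"([^:]+):\s*(\d+)", line)
        if m:  # skip lines that do not look like "Name: 123 ..."
            info[m.group(1)] = int(m.group(2))
    return info


# Stand-in for open("/proc/meminfo").read(); values taken from the post.
sample = (
    "MemTotal:        2026144 kB\n"
    "MemFree:          627652 kB\n"
    "Buffers:          183572 kB\n"
    "Cached:           380724 kB\n"
)
meminfo = parse_meminfo(sample)
print(foo, x, meminfo)
```

Unlike sscanf, a failed regex match here yields None with no partial result, which is the debugging difference described above.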

ChrisA

You're showing by these examples what regular expressions mean to you.

Mats
 
memilanuk

Or us Brits.

Or the Yanks...

Normally I kill-file threads like this pretty early on, but I have to
admit - I'm enjoying watching y'all play with the troll this time ;)
 
Steven D'Aprano

Ahhh.... so this is pos, right? Telling the truth? Interesting.


Mats, I fear you have misunderstood. If the Python Secret Underground
existed, which it most certainly does not, it would absolutely not have
the power to censor people's emails or cut them off in the middle of
 
Joshua Landau

That's by design. We don't want to make the same mistake as Perl, where
every problem is solved by a regular expression:

http://neilk.net/blog/2000/06/01/abigails-regex-to-test-for-prime-numbers/

so we deliberately make regexes as slow as possible so that programmers
will look for a better way to solve their problem. If you check the
source code for the re engine, you'll find that for certain regexes, it
busy-waits for anything up to 30 seconds at a time, deliberately wasting
cycles.

I hate to sound like this but do you realise that this is exactly what
you're arguing for when saying that sum() shouldn't use "+="?

(There is no spite in the above sentence, but it sounds like there is.
There is however no way obvious to me to remove it without changing
the sentence's meaning.)
The same with Unicode. We hate French people,

And for good damn reason too. They're ruining our language, à mon avis.
 
