Few things

bearophile

Hello,
here are four more questions (or suggestions) about the language
(people have probably already discussed some or all of them):

I've seen the contracts for Python:
http://www.wayforward.net/pycontract/
http://www.python.org/peps/pep-0316.html
They look interesting and nice. How do Python developers feel about
accepting something like this into the standard language? (Maybe they
are a bit complex.)


I think a small standard statistics module that computes
permutations, combinations, the median (quickselect), etc. would be
useful. There is even a C implementation of most of them:
http://probstat.sourceforge.net/
Some Windows users would probably appreciate having this already
compiled (and built in).


A statement like this:
print 0x9f, 054135
prints a hex and an octal literal. I think the syntax for hex is a bit
ugly, and the syntax for octal looks just dangerous (and wrong) to
me.


In some Python source code that I find around, I see things
like:
def foo():
    '''This is just a
silly text'''
    ...

Because:
def foo():
    '''This is just a
    silly text'''
print foo.__doc__

Outputs:
This is just a
    silly text

I think a better rule for such multiline strings would be something
like: from the beginning of every line after the first one, remove a
number of spaces equal to the position of ''' in the source code.
With this rule the print above outputs:
This is just a
silly text

Note: lines after the first one that have even less indentation can
simply be left alone:
def foo2():
    '''This is just a
  silly text'''
print foo2.__doc__

Outputs:
This is just a
silly text

Hello,
Bearophile
 
Josiah Carlson

Hello,
here are a four more questions (or suggestions) for the language
(probably people have already discussed some of/all such things:

I've seen the contracts for Python:
http://www.wayforward.net/pycontract/
http://www.python.org/peps/pep-0316.html
They look interesting and nice, how Python developers feel about
accepting something like this in the standard language? (Maybe they
are a bit complex).

Decorators can do this without additional syntax. Think @accepts and
@returns.
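A minimal sketch of what such decorators might look like, similar in spirit to the type-checking examples in the decorator PEP (PEP 318). The names accepts and returns come from the reply above; the implementation below is only an assumption, is written for Python 3, and handles positional arguments only.

```python
import functools


def accepts(*types):
    """Check the positional arguments against the given types (sketch only)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            if len(args) != len(types):
                raise TypeError("%s expects %d arguments, got %d"
                                % (func.__name__, len(types), len(args)))
            for arg, expected in zip(args, types):
                if not isinstance(arg, expected):
                    raise TypeError("%s: expected %s, got %r"
                                    % (func.__name__, expected.__name__, arg))
            return func(*args)
        return wrapper
    return decorate


def returns(rtype):
    """Check the return value against the given type (sketch only)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not isinstance(result, rtype):
                raise TypeError("%s: expected to return %s, got %r"
                                % (func.__name__, rtype.__name__, result))
            return result
        return wrapper
    return decorate


@accepts(int, int)
@returns(int)
def add(x, y):
    return x + y


print(add(1, 2))
```

Calling add("a", 2) raises TypeError before the function body runs.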

I think it can be useful a little stat standard module that computes
permutations, combinations, median (quickselect), etc. There is even a
C implementation (of most of them):
http://probstat.sourceforge.net/
Probably some Win users can appreciate to have this already compiled
(and built in).

Having a 'faq' for permutation and combination generation would be 99%
of the way there. Quickselect, really, doesn't gain you a whole lot.
Sure, it's a log factor faster to select a median, but many algorithms
involving selecting medians (at least the ones that I run into in CS
theory) end up repeatedly (logn) time selecting the 'kth' smallest
element (varying k's), where sorting would actually run slightly faster.

As for the rest of it, be specific with what you would want to be in
this mythical 'statistics' module ('stat' is already used for the
filesystem stat module). A single-pass average/standard deviation has
already been discussed for such a module, as well as give me all the
k-smallest items of this sequence, etc., but was tossed by Raymond
Hettinger due to the limited demand for such a module.
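For readers on later Python versions, most of what is asked for here eventually landed in the standard library: heapq.nsmallest (Python 2.4), itertools.permutations and itertools.combinations (Python 2.6), and the statistics module (Python 3.4). A short sketch, assuming Python 3; the sample data is made up for illustration.

```python
import heapq
import itertools
import statistics

items = ["a", "b", "c"]

# All orderings of the three items, and all unordered pairs.
perms = list(itertools.permutations(items))
combos = list(itertools.combinations(items, 2))

data = [7, 1, 5, 3, 9, 2]

# "Give me the k smallest items of this sequence" without a full sort.
smallest = heapq.nsmallest(3, data)

# Median of an even-length sequence is the mean of the two central values.
med = statistics.median(data)

print(len(perms), combos, smallest, med)
```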

A command like this:
print 0x9f, 054135
This prints an hex and octal. I think the syntax for the hex is a bit
ugly; and the syntax for the octal looks just dangerous (and wrong) to
me.

Internally those values are Python integers, there would need to be a
special way to tag integers as being originally hex or octal. Or the
pyc would need to store the fact that it was originally one of those
other methods specifically for the print statement.

The preferred way for doing such things (printing some internal type via
some special method) is via string interpolation:
print "0x%x 0%o"%(0x9f, 054135)

Ugly or not, "Special cases aren't special enough to break the rules."
Don't hold your breath for print doing anything special with integers.
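The same interpolation, sketched for Python 3, where print is a function and the octal literal is spelled 0o54135 instead of 054135. The variable names are only for illustration.

```python
hex_value = 0x9f
oct_value = 0o54135  # Python 3 spelling of the Python 2 literal 054135

# Printing the integers directly always gives decimal.
print(hex_value, oct_value)

# String interpolation chooses the base explicitly.
old_style = "0x%x 0%o" % (hex_value, oct_value)
print(old_style)

# The built-ins hex() and oct() add their own prefixes.
builtin_style = "%s %s" % (hex(hex_value), oct(oct_value))
print(builtin_style)
```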

In some Python source code that I find around, I see things
like:
def foo():
    '''This is just a
silly text'''
    ...

Because:
def foo():
    '''This is just a
    silly text'''
print foo.__doc__

Outputs:
This is just a
    silly text

I think a better rule for such multiline strings would be something
like: from the beginning of every line after the first one, remove a
number of spaces equal to the position of ''' in the source code.
With this rule the print above outputs:
This is just a
silly text

Note: lines after the first one that have even less indentation can
simply be left alone:
def foo2():
    '''This is just a
  silly text'''
print foo2.__doc__

Outputs:
This is just a
silly text

It is a wart. An option is to use:
def foo():
    '''\
This is just a
silly text'''

Me, I just don't use docstrings. I put everything in comments indented
with the code. I have contemplated writing an import hook to do
pre-processing of modules to convert such comments to docstrings, but I
never actually use docstrings, so have never written the hook.


- Josiah
 
Josiah Carlson

Josiah Carlson said:
theory) end up repeatedly (logn) time selecting the 'kth' smallest
element (varying k's), where sorting would actually run slightly faster.

That should have read:
theory) end up repeatedly (logn times) selecting the...

- Josiah
 
Nick Coghlan

Josiah said:
Internally those values are Python integers, there would need to be a
special way to tag integers as being originally hex or octal. Or the
pyc would need to store the fact that it was originally one of those
other methods specifically for the print statement.

I believe the OP was objecting to the spelling of "this integer literal is hex"
and "this integer literal is octal".

Python stole these spellings directly from C. Saying it's ugly without
suggesting an alternative isn't likely to result in developers taking any
action, though. (Not that that is particularly likely on this point, regardless)

If the spelling really bothers the OP, the following works:

print int("9f", 16), int("54135", 8)

That's harder to type, is a lot slower at run-time and uses more memory, though.
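A rough way to see the run-time difference, assuming Python 3 (so the octal literal is written 0o54135). The timings depend on the machine and interpreter, so no numbers are claimed here.

```python
import timeit

# Both spellings produce the same integers.
assert int("9f", 16) == 0x9f
assert int("54135", 8) == 0o54135

# The literal is a compile-time constant; int(...) is a function call
# that parses a string every time it runs.
literal_time = timeit.timeit("0x9f", number=100000)
parsed_time = timeit.timeit('int("9f", 16)', number=100000)

print("literal: %.4fs  int(): %.4fs" % (literal_time, parsed_time))
```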

Cheers,
Nick.
 
Nick Coghlan

bearophile said:
I think a better syntax for such multiline strings can be something
like: remove from all the beginnings of the lines successive to the
first one a number of spaces equal to the position of ''' in the
soucecode.

Indeed, a similar rule is used by docstring parsing tools (e.g. the builtin
help() function). The raw text is kept around, but the display tools clean it up
according to whatever algorithm best suits their needs.
>>> def foo():
...     '''This is just a
...     silly text'''
...
>>> print foo.__doc__
This is just a
    silly text
>>> help(foo)
Help on function foo in module __main__:

foo()
    This is just a
    silly text

Raw docstrings are rarely what you want to be looking at. The best place for
info on recommended docstring formats is:
http://www.python.org/peps/pep-0257.html
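The clean-up that such tools perform is available directly through the standard inspect module. A small sketch, assuming Python 3 (inspect.getdoc applies the PEP 257 style trimming; the raw text stays in __doc__):

```python
import inspect


def foo():
    '''This is just a
    silly text'''


raw = foo.__doc__            # exactly what was typed, indentation included
clean = inspect.getdoc(foo)  # indentation of continuation lines removed

print(repr(raw))
print(repr(clean))
```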

If you absolutely, positively need the raw docstrings to be nicely formatted,
then line continuations can help you out:
...     "This is actually "\
...     "all one line\n"\
...     "but this is a second line"
...
This is actually all one line
but this is a second line

I'd advise against actually doing this, though - it's non-standard, handling the
whitespace manually is a serious pain, and most docstring parsers clean up the
indentation of the conventional form automatically.

Cheers,
Nick.
 
Carlos Ribeiro

I think a better syntax for such multiline strings can be something
like: remove from all the beginnings of the lines successive to the
first one a number of spaces equal to the position of ''' in the
soucecode.

I was thinking exactly about this earlier today. There is a utility
function described somewhere in the docutils documentation that does
that. I've borrowed that code and called it "stripindent". It handles
all stuff that you mentioned and also tabs & space conversion. I
already call it almost everywhere I use the triple-quote strings.
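A sketch of what such a helper might look like. This is not the docutils code referred to above; it is a hypothetical stand-in built on the standard textwrap.dedent, with an assumed tabsize parameter for the tab-to-space conversion.

```python
import textwrap


def stripindent(text, tabsize=8):
    """Hypothetical stand-in for the helper described in the post."""
    # Convert tabs to spaces so mixed indentation compares consistently.
    text = text.expandtabs(tabsize)
    # Remove the whitespace prefix common to all non-blank lines.
    text = textwrap.dedent(text)
    # Drop the empty first and last lines that appear when the triple
    # quotes sit on their own lines.
    return text.strip("\n")


sqlquery = stripindent("""
    select column1, column2
    from sometable
    where column1 > blahblahblah
    """)

print(sqlquery)
```

Relative indentation inside the string is preserved; only the common left margin is removed.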

The end result is that my code is full of constructs of the form:

sqlquery = stripindent("""
    select column1, column2
    from sometable
    where column1 > blahblahblah
    """)

And I thought, "wouldn't it be nice if Python automatically
reformatted such strings"? Of course, this is not a change to be taken
lightly. Some pros and cons:

0) it automatically supports what is already done by tools such as
pydoc, coctools, doctest, and every Python-enabled IDE that gets
information from docstrings.

1) the source code reads much better; the intention of the writer in
the case above is clearly *not* to have all those extra spaces
cluttering the string contents.

2) It encourages use of triple-quoted strings in real code (by making
it more practical) and avoids idioms such as:

s = stripindent("""...
    """)
s = "abcdef..." + \
    "rstuvwxyz..."
s = "abcdef..." \
    "rstuvwxyz..."

3) it uses indentation to change the string parsing behavior.
Indentation already has meaning in Python, but not in this situation.

4) It's a change, and people are usually afraid of changes, especially
in this case, where it may look like there is so little to gain from
it.

5) it may break old code that uses triple-quoted strings, and that may
require the extra spaces at the beginning of each line.

6) it may lead to surprises in some cases (especially for Python old-timers).


At this point, this is not yet a serious proposal, but more of a
"Call for Comments". I have another bunch of ideas being worked out
for possible future PEPs ("iunpack" & named tuples), so why not give
this one a try?

The idea is as follows:

1) triple-quoted strings will automatically be reformatted to remove
any extra space on the left side due to indentation. The indentation
will be controlled by the position of the leftmost non-space character
in the string.

2) raw triple-quoted strings will *NOT* be reformatted. Any space to
the left side is deemed to be significant.

This is indeed quite a simple idea, with the potential to simplify
some code. It will also encourage people to write triple-quoted
strings for long strings, which is something that people usually
avoid because of the extra space.


--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: (e-mail address removed)
mail: (e-mail address removed)
 
Josiah Carlson

Nick Coghlan said:
I believe the OP was objecting to the spelling of "this integer literal is hex"
and "this integer literal is octal".

Python stole these spellings directly from C. Saying it's ugly without
suggesting an alternative isn't likely to result in developers taking any
action, though. (Not that that is particularly likely on this point, regardless)

If the spelling really bothers the OP, the following works:

print int("9f", 16), int("54135", 8)

That's harder to type, is a lot slower at run-time and uses more memory, though.

Perhaps, though I thought he was talking specifically about printing
(hence using a print statement). Regardless, I also don't believe the
"I don't like this" without "this is the way it should be" will result
in anything.

- Josiah
 
Nick Coghlan

Carlos said:
The idea is as follows:

1) triple-quoted strings will automatically be reformatted to remove
any extra space on the left side due to indentation. The indentation
will be controlled by the position of the leftmost non-space character
in the string.

2) raw triple-quoted strings will *NOT* be reformatted. Any space to
the left side is deemed to be significant.

This is indeed quite a simple idea, with the potential to simplify
some code. It will also encourage people to write triple-quoted
strings for long strings, which is something that people usually
avoid because of the extra space.

I'd be +0, since the behaviour you suggest is what I generally want when I use
long strings. However, I almost always set up such strings as module globals, so
the indenting issue doesn't usually bother me in practice. . .

If the compatibility problems prove to be a deal breaker (i.e. someone somewhere
actually wants the extra space, and adding an 'r' character to the source for
compatibility with a new Python release is too much of a burden), then another
alternative is a new string type character (e.g. 't' for 'trimmed', to use the
PEP 257 terminology. 'i' for 'indented' would also work - the source code for
the string literal is indented, so that indenting should be removed from the
resulting string).

The argument against the inevitable suggestion of just using a function (as you
already do) is that a function call doesn't work for a docstring.
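To illustrate the point: a call expression as the first statement of a function is not treated as a docstring, so __doc__ stays None. One workaround that needs no new syntax is a decorator that rewrites __doc__ after the function is created. The decorator name trimmed is hypothetical; the sketch assumes Python 3 and that docstrings are not stripped (no -OO).

```python
import inspect
import textwrap


def with_call():
    # Not a docstring: the first statement is a call, not a string literal.
    textwrap.dedent("""\
        This text is computed and then thrown away.
        """)


def trimmed(func):
    """Hypothetical decorator that cleans up a docstring in place."""
    if func.__doc__:
        func.__doc__ = inspect.cleandoc(func.__doc__)
    return func


@trimmed
def foo():
    '''This is just a
    silly text'''


print(with_call.__doc__)
print(repr(foo.__doc__))
```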

Cheers,
Nick.
 
bearophile

Thank you for the comments and answers, and sorry for my answering
delay...

Josiah Carlson:
Decorators can do this without additional syntax. Think @accepts and
@returns.<

The purpose of those pre/post conditions is to write something simple
and very *clean* that states what the inputs and outputs must be. This
is an example of pre/post conditions for a sorting function, taken from
that site (all of this is inside the docstring of the function):

pre:
    # must be a list
    isinstance(a, list)

    # all elements must be comparable with all other items
    forall(range(len(a)),
           lambda i: forall(range(len(a)),
                            lambda j: (a[i] < a[j]) ^ (a[i] >= a[j])))

post[a]:
    # length of array is unchanged
    len(a) == len(__old__.a)

    # all elements given are still in the array
    forall(__old__.a, lambda e: __old__.a.count(e) == a.count(e))

    # the array is sorted
    forall([a[i] >= a[i-1] for i in range(1, len(a))])


Surely such things can be passed (at least as strings) to the @accepts
and @returns decorators (using a "decorate" identifier instead of @ is
probably nicer, because the @ makes Python look more like Perl, but
I've seen that lots of people have already discussed such topic). Such
testing performed by such decorators can be "switched off" with a
global boolean flag when the program is debugged and tested.
So now someone can write and standardise a couple of good @accepts
and @returns decorators/functors :-]
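A sketch of the idea in this paragraph: pre- and postconditions passed to a decorator as callables, with a global flag that switches the checks off. Everything here (contract, CHECK_CONTRACTS, is_sorted, sorted_copy) is a hypothetical name chosen for illustration, not part of pycontract or the standard library, and the code assumes Python 3.

```python
import functools

CHECK_CONTRACTS = True  # hypothetical global switch, read at call time


def contract(pre=None, post=None):
    """Attach optional pre/post checks to a function (sketch only)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not CHECK_CONTRACTS:
                return func(*args, **kwargs)
            if pre is not None and not pre(*args, **kwargs):
                raise AssertionError("precondition failed: " + func.__name__)
            result = func(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise AssertionError("postcondition failed: " + func.__name__)
            return result
        return wrapper
    return decorate


def is_sorted(seq):
    return all(seq[i] >= seq[i - 1] for i in range(1, len(seq)))


@contract(pre=lambda a: isinstance(a, list),
          post=lambda result, a: is_sorted(result) and len(result) == len(a))
def sorted_copy(a):
    return sorted(a)


print(sorted_copy([3, 1, 2]))
```

With CHECK_CONTRACTS set to False the wrapper calls the function directly, so the checks cost one flag test per call.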

Having a 'faq' for permutation and combination generation would be
99% of the way there.<

Uh, I'm sorry, but I don't understand :-]
Aren't such functions quite well defined?

[Fixed] Quickselect, really, doesn't gain you a whole lot. Sure, it's
a log factor faster to select a median, but many algorithms involving
selecting medians (at least the ones that I run into in CS theory) end
up repeatedly (logn times) selecting the 'kth' smallest element
(varying k's), where sorting would actually run slightly faster.<

I've done some tests with a Quickselect that I have essentially
translated and adapted to pure Python from "Numerical Recipes" (it
seems a bit faster than the Quickselect coded by Raymond Hettinger
that can be seen in the cookbook). On my PC, on random sequences of
floating-point numbers, a *single* Quickselect (to find just the
median) is faster than the standard sort only for lists longer than
about 3 million elements. So it's often useless.
But using Psyco, that Quickselect becomes 5-6 times faster (for long
lists), so it beats the (good) standard Sort for lists longer than
600-3000 elements. If the Quickselect works in place (as the sort)
then it returns a partially ordered list, and you can use it to
quickly select other positions (so for close positions, like the
computing of the two central values for the median, the complexity of
the second select is nearly a constant time).
So coding the Quickselect in C/Pyrex can probably make it useful.
If you are interested I can give the Python Quickselect code, etc.
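For readers who have not seen the algorithm, a minimal pure-Python Quickselect sketch follows. It is not the Numerical Recipes adaptation or the cookbook recipe mentioned above; it is a generic version with a random pivot and a three-way partition, written for Python 3, and it works on a copy instead of in place.

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element (0-based) without sorting everything."""
    data = list(items)  # work on a copy; the input is left untouched
    if not 0 <= k < len(data):
        raise IndexError("k out of range")
    lo, hi = 0, len(data) - 1
    while True:
        if lo == hi:
            return data[lo]
        pivot = data[random.randint(lo, hi)]
        # Three-way partition of data[lo:hi+1] around the pivot value.
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if data[i] < pivot:
                data[lt], data[i] = data[i], data[lt]
                lt += 1
                i += 1
            elif data[i] > pivot:
                data[i], data[gt] = data[gt], data[i]
                gt -= 1
            else:
                i += 1
        # Now data[lo:lt] < pivot, data[lt:gt+1] == pivot, data[gt+1:hi+1] > pivot.
        if k < lt:
            hi = lt - 1
        elif k > gt:
            lo = gt + 1
        else:
            return pivot


def median(items):
    """Median via quickselect; averages the two central values if needed."""
    n = len(items)
    if n == 0:
        raise ValueError("median of empty sequence")
    if n % 2:
        return quickselect(items, n // 2)
    return (quickselect(items, n // 2 - 1) + quickselect(items, n // 2)) / 2


print(median([7, 1, 5, 3, 9, 2]))
```

Because this version copies its input, it does not get the benefit described above of reusing the partially ordered list for a second, nearby selection.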

Raymond Hettinger<

I have already seen that this person is working a lot on Python, often
in the algorithmic parts.


Nick Coghlan>I believe the OP was objecting to the spelling of "this
integer literal is hex" and "this integer literal is octal".<

Right.


Josiah Carlson>Regardless, I also don't believe the "I don't like
this" without "this is the way it should be" will result in anything.<

You are right, I was mostly afraid of saying silly things... Here it is.
Such a syntax could look like:
number<Separator><Base>

(Putting <Base><Separator> at the beginning of the number is probably
worse and it goes against normal base representation in mathematics,
where you often subscript the base number).

<Separator> cannot be "B" or "b" (that stands for "base") because
number can be a Hex containing B too... So <Separator> can be "_"
(this is the Subscript in TeX markup, so this agrees with normal
representation of the base)

<Base> can be:
1) just an integer representing the base (similar to the second
parameter of "int"; this also allows any base to be specified).
2) a symbol representing a smaller set of possibilities, like 0=2,
1=8, 2=10, 3=16, 4=64. Instead of such digits a letter can be used:
a=2, b=8, c=10, etc.
I think the first option is better.

So integer numbers can be written like:
1010100111011_2
154545_10
777_8
afa35a_16
Fi3pK_64
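The first option can be tried out at run time with a small helper that splits on the last underscore and hands both parts to int(). The function name parse_postfix is hypothetical, the sketch assumes Python 3, and because it relies on int() it only covers bases 2 to 36, so the base-64 example is rejected.

```python
def parse_postfix(text):
    """Parse 'digits_base' (hypothetical notation from the post) into an int."""
    digits, sep, base = text.rpartition("_")
    if not sep or not digits or not base:
        raise ValueError("expected <number>_<base>: %r" % text)
    # int() raises ValueError for a bad digit or an unsupported base.
    return int(digits, int(base))


for literal in ("1010100111011_2", "154545_10", "777_8", "afa35a_16"):
    print(literal, "->", parse_postfix(literal))
```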


Thank you to Carlos Ribeiro for your development of such doc string
ideas, I appreciate them :-]

Bear hugs,
Bearophile
 
Josiah Carlson

Thank you for the comments and answers, and sorry for my answering
delay...

Josiah Carlson:
Decorators can do this without additional syntax. Think @accepts and
@returns.<

The purpose of those pre-post is to write something simile and very
*clean* that states what inputs and outputs must be. This is an
example of a pre-post conditional for a sorting function taken from
that site (all this is inside the docstring of the function):

pre:
    # must be a list
    isinstance(a, list)

    # all elements must be comparable with all other items
    forall(range(len(a)),
           lambda i: forall(range(len(a)),
                            lambda j: (a[i] < a[j]) ^ (a[i] >= a[j])))

post[a]:
    # length of array is unchanged
    len(a) == len(__old__.a)

    # all elements given are still in the array
    forall(__old__.a, lambda e: __old__.a.count(e) == a.count(e))

    # the array is sorted
    forall([a[i] >= a[i-1] for i in range(1, len(a))])


That is simple and clean? In my opinion, if one wants to write such
complicated pre and post conditions, one should have to write the pre
and post condition functions that would do the test, and either use
decorators, or use calls within the function to do the tests. That is
the way it is done now, and I personally don't see a good reason to make
a change. Then again, I document and test, and haven't used pre/post
conditions in 5+ years.

Surely such things can be passed (at least as strings) to the @accepts
and @returns decorators (using a "decorate" identifier instead of @ is
probably nicer, because the @ makes Python look more like Perl, but
I've seen that lots of people have already discussed such topic). Such
testing performed by such decorators can be "switched off" with a
global boolean flag when the program is debugged and tested.
So now someone can write and let standardise a couple of good @accepts
and @returns decorators/functors :-]

Discussion of the @ decorator syntax is a moot point. Python 2.4 final
was released within the last couple days and uses @, which Guido has
decided is the way it will be. You are around 6 months too late to have
anything to say about what syntax is better or worse. It is done.

Having a 'faq' for permutation and combination generation would be
99% of the way there.<

Uh, I'm sorry, but I don't understand :-]
Aren't such functions quite well defined?

Think of it like an 'example' in the documentation, where the code is
provided for doing both permutations and combinations. There exists a
FAQ for Python that addresses all sorts of "why does Python do A and not
B" questions. Regardless, both are offered in the Python cookbook.

[Fixed] Quickselect, really, doesn't gain you a whole lot. Sure, it's
a log factor faster to select a median, but many algorithms involving
selecting medians (at least the ones that I run into in CS theory) end
up repeatedly (logn times) selecting the 'kth' smallest element
(varying k's), where sorting would actually run slightly faster.<

I've done some tests with a Quickselect that I have essentially
translated and adapted to pure Python from "Numerical Recipes" (it
seems a bit faster than the Quickselect coded by Raymond Hettinger
that can be seen in the cookbook). I have seen that on my PC, on
random sequence of FP numbers, a *single* Quickselect (to find just
the median) is faster than the standard sort for lists longer than
about 3 million elements. So it's often useless.
But using Psyco, that Quickselect becomes 5-6 times faster (for long
lists), so it beats the (good) standard Sort for lists longer than
600-3000 elements. If the Quickselect works in place (as the sort)
then it returns a partially ordered list, and you can use it to
quickly select other positions (so for close positions, like the
computing of the two central values for the median, the complexity of
the second select is nearly a constant time).
So coding the Quickselect in C/Pyrex can probably make it useful.
If you are interested I can give the Python Quickselect code, etc.

No thank you, I have my own.

Nick Coghlan>I believe the OP was objecting to the spelling of "this
integer literal is hex" and "this integer literal is octal".<

Right.


Josiah Carlson>Regardless, I also don't believe the "I don't like
this" without "this is the way it should be" will result in anything.<

You are right, I was mostly afraid of saying silly things... Here is:
Such syntax can be like:
number<Separator><Base>

(Putting <Base><Separator> at the beginning of the number is probably
worse and it goes against normal base representation in mathematics,
where you often subscript the base number).

<Separator> cannot be "B" or "b" (that stands for "base") because
number can be a Hex containing B too... So <Separator> can be "_"
(this is the Subscript in TeX markup, so this agrees with normal
representation of the base)

<Base> can be:
1)just an integer number representing the base (similar to the second
parameter of "int", this also allows to specify any base).
2) a symbol to represent a smaller class of possibilities, like 0=2,
1=8, 2=10, 3=16, 4=64. Instead
of such digits a letter can be used: a=2, b=8, c=10, etc.
I think the first option is better.

So integer numbers can be written like:
1010100111011_2
154545_10
777_8
afa35a_16
Fi3pK_64

Ick. In Python, the language is generally read left to right, in a
similar fashion to English. The prefix notation of 0<octal> and 0x<hex>,
in my opinion, reads better than your postfix-with-punctuation notation.

I'll also mention that two of your examples; afa35a_16 and Fi3pK_64, are
valid python variable names through all of the Python versions I have
access to, so are ambiguous if you want to represent 'integer literals',
which have historically been 'unquoted strings prefixed with a number'.
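The ambiguity is easy to check, assuming Python 3, where str.isidentifier() reports whether a token is a legal name. As a side note, from Python 3.6 onward an underscore between digits is a plain digit separator, so 777_8 already means the decimal number 7778.

```python
# Tokens from the proposal that start with a letter are ordinary names.
for token in ("afa35a_16", "Fi3pK_64", "777_8", "154545_10"):
    print(token, token.isidentifier())

# In Python 3.6+ the digit-only forms parse as decimal with a separator.
print(int("777_8"))
```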

Furthermore, there /is/ already a postfix notation for representing
integers, though it doesn't support all bases at the moment, requires
a bit more punctuation, and is runtime-evaluated:
Traceback (most recent call last):
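The runtime-evaluated form meant here is presumably int(text, base), which Nick used earlier in the thread. A short sketch, assuming Python 3, showing that it covers bases 2 to 36 and raises ValueError beyond that:

```python
# The base comes after the digits, as a second argument.
print(int("feff", 16))
print(int("767", 8))

# Bases above 36 are not supported by int().
try:
    int("Fi3pK", 64)
except ValueError as exc:
    print("unsupported base:", exc)
```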


Your second option (replacing the _2, _10, etc., with _1, _2, ...) is,
in my opinion, shit. You take something that is unambiguous (base
representation) and make it ambiguous through the use of a numbering of
a set of 'standard' bases. What is the use of representing base 10 as a
'2' or 'c'? I cannot think of a good reason to do so, unless being
almost unreadable is desirable.


An option if you want to get all of the base representations available
is a prefix notation that is similar to what already exists. I'm not
advocating it (because I also think it's crap), but the following fixes
the problems with your postfix notation, and is explicit about bases.
0<base>_<number>
like:
016_feff
02_10010010101
010_9329765872
08_767

The above syntax is:
1. unambiguous
2. readable from left-to-right

Note that I think that the syntax that I just provided is ugly. I much
prefer just using decimal and offering the proper base notation
afterwards in a comment...

val = 15 # 1111 in binary
val = 35 # 0x23 in hex
val = 17 # 021 in octal


- Josiah
 
bearophile

Thank you Josiah Carlson for your answers. If certain parts of my
messages upset you, then you can surely skip those parts.
(Now I am reading a big good manual: "Learning Python, II ed.", so in
the future I hope to write better things about this language.)

That is simple and clean?<

Well, it's a clean way to express such complexity.
And now I think decorators aren't so fit for such purpose (they
produce less clear pre-post). The pre-posts in the docs seem better to
me.

Then again, I document and test, and haven't used pre/post conditions
in 5+ years.<

Yep, documentation and doctests (etc) are useful.

topic).<<

Discussion of the @ decorator syntax is a moot point.<

I know, I noticed that... Still, only few things are fixed in stone
:-]

Think of it like an 'example' in the documentation, where the code is
provided for doing both permutations and combinations.<

Ah, now I see.
In my original post in this thread I gave a URL for some routines
written in C because they are about 10 times faster than the
routines that I can write in Python.

No thank you, I have my own.<

I see. I wasn't trying to force you to take my routines, only to offer
any interested person a way to check my words.

Ick. In Python, the language is generally read left to right, in a
similar fashion to english. The prefix notation of 0<octal> and
0x<hex>, in my opinion, reads better than your
postfix-with-punctuation notation.<

As you say, it's your opinion, and I respect it, but in mathematics I
think you usually put the base at the end of the number, as a small
subscripted number (but I understand your following explanations).

I'll also mention that two of your examples; afa35a_16 and Fi3pK_64,
are valid python variable names<

Uh, I am sorry... As I feared, I have written a silly thing :)

I much prefer just using decimal and offering the proper base
notation afterwards in a comment...<

I agree.

I'll avoid giving hugs,
Bearophile
 
Scott David Daniels

Josiah said:
An option if you want to get all of the base representations available
is a prefix notation that is similar to what already exists. I'm not
advocating it (because I also think its crap), but the following fixes
the problems with your postfix notation, and is explicit about bases.
0<base>_<number>
like:
016_feff
02_10010010101
010_9329765872
08_767

The above syntax is:
1. unambiguous
2. readable from left-to-right

I built an interpreted language where based numbers were of the form:
<base-1>..<number>
It has properties 1 and 2 above, can often be snagged from standard
lexers, and keeps _ available as a group separator (for things like
1_000_000_000).
The nice thing is that it is unambiguous no matter what base you
read in. It also worked for floating point values, but we were never
happy with how the exponent should be done.
f..feff == 1..1111_1110_1111_1111
    == 1..1_111_111_011_111_111 == 7..177377
1..10010010101
9..9329765872
7..767
Takes a bit to associate 9 with base ten, but lexing the number is
duck soup. Note I am not advocating this syntax any more than Josiah
is advocating his. I just find alternate representations interesting.
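A small sketch of how integers in this notation might be lexed, assuming Python 3. The function name parse_based is hypothetical, and the sketch assumes the marker is a single character that is the largest digit of the base (so 1 means binary, 7 octal, 9 decimal, f hexadecimal), with _ allowed as a group separator. Floating point values are not handled.

```python
def parse_based(text):
    """Parse '<base-1>..<number>' (notation described above) into an int."""
    marker, sep, digits = text.partition("..")
    if not sep or len(marker) != 1 or not digits:
        raise ValueError("expected <base-1>..<number>: %r" % text)
    # The marker is the largest digit of the base, so the base is one more.
    base = int(marker, 36) + 1
    # Underscores only group digits; drop them before conversion.
    return int(digits.replace("_", ""), base)


for literal in ("f..feff", "1..1111_1110_1111_1111", "7..177377",
                "9..9329765872", "7..767"):
    print(literal, "->", parse_based(literal))
```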

--Scott David Daniels
(e-mail address removed)
 
