Experiences/guidance on teaching Python as a first programming language


Mark Lawrence

Who makes comments like that? As far as I can tell, I'm the resident
regexphile on this newsgroup, and I certainly don't say that.

I find it frustrating that Pythonistas shy away from regex as much as
they do. Yes, Python strings have a rich set of built-in operations
which provide easy ways to do a lot of things traditionally done with
regexes in other languages.

Regex is a powerful tool, and programmers will improve their skill set
by knowing how to use them. But that's not the same as saying you can't
be a programmer if you don't know regex.

Idiots make comments like that; I've seen them in the past, and no, I
can't remember where :) As for me, I'm not a regexphobe, more a
stringmethodphile. But I'm not going to use a dozen string methods when
one fairly simple regex will do the same job in one hit.
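
For instance (an illustrative sketch, not from the thread -- the data and pattern are invented): pulling a version number out of a line with chained string methods versus one small regex:

import re

line = "package: foo, version: 1.4.2, arch: x86_64"

# String-method route: split repeatedly and hope the layout never changes.
version = line.split("version:")[1].split(",")[0].strip()

# Regex route: one pattern states the intent in a single hit.
match = re.search(r"version:\s*([\d.]+)", line)
version_re = match.group(1) if match else None

print(version, version_re)   # 1.4.2 1.4.2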
 

rusi

I suspect what you mean is, "There are some things that don't make sense
until you understand computer architecture".

One way of rephrasing what Grant is saying is:
"You cannot be a C programmer without being a system programmer"

This certainly includes machine (hardware) architecture.

But it includes much else besides, which can generally be subsumed under
the rubric "toolchain"

A python programmer can write foo.py and run:
$ python foo.py

A C programmer writes foo.c and has to run the sequence:
$ gcc foo.c
$ ./a.out

So far the difference looks minimal. However, it does not stop there.
Soon foo has to be split into foo1.c and foo2.c, and suddenly you need to
understand:

1. Separate compilation
2. Make (which is separate from 'separate compilation')
3. Header files and libraries and the connection and difference

Now if you've taught a few classes you will know what a hell each of these is.
In particular, every C teacher struggles with:
"stdio.h is the standard library"

And all this has not yet touched the labyrinths of linker errors, with
the corresponding magic spells called ranlib, nm, etc.

Got past all this kid-stuff?
Now for the Great Initiation into Manhood -- autoconf

So...

Is all this core computer science?
Or is it the curiosities of 40 year old technology?
 

Roy Smith

rusi said:
Soon the foo has to split into foo1.c and foo2.c. And suddenly you need to
understand:

1. Separate compilation
2. Make (which is separate from 'separate compilation')
3. Header files and libraries and the connection and difference

None of that is specific to C. Virtually any language (including
Python) allows a program to be split up into multiple source files. If
you're running all but the most trivial example, you need to know how to
manage these multiple files and how the pieces interact.

It's pretty common here to have people ask questions about how import
works. How altering sys.path affects import. Why is import not finding
my module? You quickly get into things like virtualenv, and now you've
got modules coming from your source tree, from your virtualenv, from your
system library. You need to understand all of that to make it all work.
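
A minimal sketch of the moving parts (the prepended directory is hypothetical):

import sys
import json

# Every import is resolved by walking sys.path in order; the first hit wins.
print(sys.path[:3])
print(json.__file__)          # the stdlib copy, found via one of those entries

# Prepending a directory makes its modules shadow everything later on the
# path -- source tree over virtualenv over system library.
sys.path.insert(0, "/home/me/project/src")    # hypothetical source tree

# A fresh interpreter would now import a json.py from that directory, if one
# existed there, instead of the standard library module.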
 

Chris Angelico

It's pretty common here to have people ask questions about how import
works. How altering sys.path affects import. Why is import not finding
my module? You quickly get into things like virtualenv, and now you've
got modules coming from your source tree, from your virtualenv, from your
system library. You need to understand all of that to make it all work.

Python might one day want to separate "system paths" from "local
paths", to give the same effect as:

#include <stdio.h>
#include "local_config.h"

where the current directory won't be searched for stdio.h. But other
than that, it's exactly the same consideration in either Python or C.
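
A sketch of the current situation (the shadowing module is hypothetical):

# Python has a single search path, so a local file can shadow a system
# module -- the classic "I called my script random.py" problem.  Layout
# assumed for illustration:
#
#   project/
#       random.py     <- local module
#       main.py
#
# main.py:
import sys
print(sys.path[0])    # the script's own directory is searched first
import random         # finds project/random.py, not the standard library
print(random.__file__)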

ChrisA
 

rusi

Roy Smith wrote:

It's pretty common here to have people ask questions about how import
works. How altering sys.path affects import. Why is import not finding
my module? You quickly get into things like virtualenv, and now you've
got modules coming from your source tree, from your virtualenv, from your
system library. You need to understand all of that to make it all work.

Yes agreed. Python is far from stellar in this regard.
Just as distutils got into the core at 2.3(??) now at 3.3 virtualenv(+pip+wheel)
is getting in. Belated but better late than never.
None of that is specific to C. Virtually any language (including
Python) allows a program to be split up into multiple source files. If
you're running all but the most trivial example, you need to know how to
manage these multiple files and how the pieces interact.

That's a strange thing to say. In the abstract every language that
allows for significant programs supports separate units/modules.

Somewhere those units will map onto system entities -- usually though
not always files (think of PL-SQL inside Oracle).

Even assuming files, the lines drawn between interior (to the language)
and exterior (OS-facing) are vastly different.

C, Pascal, Python, Java, SML, APL -- all very different in this regard.
 

Gregory Ewing

Roy said:
even
if you've got all the signatures of foo() in front of you, it can
sometimes be hard to figure out which one the compiler will pick.

And conversely, sometimes the compiler will have a hard
time figuring out which one you want it to pick!

I had an experience in Java recently where a library
author had provided two overloads of a function, that
at first sight could be disambiguated by argument types.
But for a certain combination of types it was
ambiguous, and I was unlucky enough to want to use that
particular combination, and the compiler insisted on
picking the wrong one. As far as I could see, it was
*impossible* to call the other overload with those
parameter types.

I ended up resorting to copying the whole function and
giving it another name, just so I could get it called.
 

Roy Smith

Gregory Ewing said:
And conversely, sometimes the compiler will have a hard
time figuring out which one you want it to pick!

I had an experience in Java recently where a library
author had provided two overloads of a function, that
at first sight could be disambiguated by argument types.
But for a certain combination of types it was
ambiguous, and I was unlucky enough to want to use that
particular combination, and the compiler insisted on
picking the wrong one. As far as I could see, it was
*impossible* to call the other overload with those
parameter types.

I ended up resorting to copying the whole function and
giving it another name, just so I could get it called.

BTDT.

We were doing a huge network management application. There was an
SNMP_Manager class, which had three or four different constructors, each
one taking a dozen or more arguments, many of them optional.

I finally got fed up with eternally trying to figure out which
constructor was being called and replaced them with a series of factory
functions: construct_for_traps(), construct_for_polling(), etc.
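
In Python the same idea falls out of classmethods. A rough sketch of the pattern (the class body and arguments are invented; only the factory names come from Roy's description):

class SNMPManager:
    def __init__(self, host, port, community, timeout):
        self.host = host
        self.port = port
        self.community = community
        self.timeout = timeout

    # Named alternate constructors say *why* the object is being built,
    # instead of making callers decode a dozen positional arguments.
    @classmethod
    def construct_for_traps(cls, host, community="public"):
        return cls(host, port=162, community=community, timeout=5.0)

    @classmethod
    def construct_for_polling(cls, host, community="public"):
        return cls(host, port=161, community=community, timeout=30.0)

mgr = SNMPManager.construct_for_polling("router1.example.com")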
 

88888 Dihedral

On Thursday, December 19, 2013 at 12:16:26 PM UTC+8, Roy Smith wrote:
None of that is specific to C. Virtually any language (including
Python) allows a program to be split up into multiple source files. If
you're running all but the most trivial example, you need to know how to
manage these multiple files and how the pieces interact.

It's pretty common here to have people ask questions about how import
works. How altering sys.path affects import. Why is import not finding
my module? You quickly get into things like virtualenv, and now you've
got modules coming from your source tree, from your virtualenv, from your
system library. You need to understand all of that to make it all work.

OK, just any novice can take the BOA and wxPython packages to
implement an editor in 1 to 3 hours, but that was trivial in Delphi and
Object Pascal a long time ago.

The GUI-to-Python-script generation engine is the smarter way to
get the masses interested in programming.
 

Gregory Ewing

Roy said:
I suspect what you mean is, "There are some things that don't make sense
until you understand computer architecture".

An example of that kind of thing from a different
perspective: I learned Z80 assembly language by first
learning Z80 *machine* language (my homebrew computer
didn't have an assembler, so I had to write my own
and hand assemble it (after writing my own DOS
(after building my own disk controller... etc!))).

If you just look at the Z80 architecture at the
assembly language level, the rules for which
instructions go with which registers and which
addressing modes seem very haphazard. But because
I knew how the instructions were encoded, I never
had any trouble remembering the allowable combinations.
If it didn't have an encoding, you couldn't do it!
 

Gregory Ewing

Dave said:
C is a glorified macro assembler. So the -> operator is not analogous
to the dot operator, it's syntactic sugar:

p->a is really
(*p).a

But it's not above inferring a dereferencing
operation when you call a function via a
pointer. If f is a pointer to a function,
then

f(a)

is equivalent to

(*f)(a)

If the compiler can do that for function calls,
there's no reason it couldn't do it for member
access as well.

If I remember rightly, Ada not only does implicit
dereferencing like this, it doesn't even have an
explicit dereferencing operator! If you want to
refer to the whole record pointed to by p, you
have to say 'p.all'.

BTW, the whole notion of a "pointer to a function"
is redundant in C, since you can't do anything
with what it points to other than call it. The
equivalent concept in Modula, for example, is
just called a function type, not a pointer-to-
function type. Similarly in most languages that
have functions as values.
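
Python is one of those languages: a function is an ordinary value, so there is nothing to take the address of and nothing to dereference. A tiny illustration:

def greet(name):
    return "hello, " + name

f = greet           # no & needed; the name already refers to a value
print(f("world"))   # no (*f) needed; calling through f just works
print(f is greet)   # True -- both names refer to the same function object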
 

Dave Angel

But it's not above inferring a dereferencing
operation when you call a function via a
pointer. If f is a pointer to a function,
then

f(a)

is equivalent to

(*f)(a)

If the compiler can do that for function calls,
there's no reason it couldn't do it for member
access as well.

Quite right. And I recall being confounded by the function pointer
syntax; it never fit in my mental model of how the rest of C worked.

Anyway I was not intending to defend C choices, merely to point out
an advantage this choice gave me. In a language without garbage
collection, the indirection was very important to keep in mind.
 

Wolfgang Keller

I find it frustrating that Pythonistas shy away from regex as much as
they do.

I find regular expression syntax frustrating. >;->

As long as I have the choice, I still prefer syntax like e.g.
VerbalExpressions. That's made for actual humans like me.
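
(Not VerbalExpressions itself, but the stdlib gets part of the way there: re.VERBOSE lets a pattern be spread out and commented. A sketch with an invented URL pattern:)

import re

# The same kind of pattern written for humans: under re.VERBOSE, whitespace
# is ignored and # starts a comment, so the structure can be annotated.
url = re.compile(r"""
    (?P<scheme>https?) ://        # http or https
    (?P<host>[^/:\s]+)            # everything up to a /, :, or whitespace
    (?: : (?P<port>\d+) )?        # optional :port
    (?P<path>/\S*)?               # optional path
""", re.VERBOSE)

m = url.search("see https://example.com:8080/docs for details")
print(m.group("host"), m.group("port"), m.group("path"))
# example.com 8080 /docs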

Sincerely,

Wolfgang
 

Wolfgang Keller

I've never heard C syntax reviled quite so intensely. What syntax
do you prefer?

Pascal, Python, if written by someone who uses semantic identifiers
and avoids C(++)/Java-isms. I've seen Eiffel as well (without
understanding it) and it didn't look ridiculous to me.

Nor did a recent dialect of Cobol (since someone else mentioned it)
horrify me at first sight the way all those C-derivatives do. I
also get to use SQL a bit (instead of those "query builders" that I
consider garbage), although that's just for databases of course.

Verbosity is definitely A Good Thing.

In fact, thinking of it, a really good language should imho *require*
verbosity (how about a *minimum* length - or maybe even a
dictionary-based sanity check - for identifiers?), since that already
keeps all those lazy morons away who think that "shortcuts are cool".

Sincerely,

Wolfgang
 

Steven D'Aprano

I disagree about undefined behaviour causing a large proportion of
security holes.

I didn't actually specify "large proportion"; those are your words. But
since you mention crashes:

Maybe it produces some, but it's more likely to produce
crashes or inoperative code.

*Every* crash is a potential security hole. Not only is it a denial of
service, but a fatal exception[1] is a sign that arbitrary memory has been
executed as if it were code, or that an illegal instruction was executed.
Every such crash is a potential opportunity for an attacker to run
arbitrary code. There are only two sorts of bugs: bugs with exploits, and
bugs that haven't been exploited *yet*.

I think you are severely under-estimating the role of undefined behaviour
in C in security vulnerabilities. I quote from "Silent Elimination of
Bounds Checks":

"Most of the security vulnerabilities described in my book, Secure Coding
in C and C++, Second Edition, are the result of exploiting undefined
behavior in code."

http://www.informit.com/articles/article.aspx?p=2086870

Undefined behaviour interferes with the ability of the programmer to
understand causality with respect to his source code. That makes bugs of
all sorts more likely, including buffer overflows.

Earlier this year, four researchers at MIT analysed how undefined
behaviour is affecting software, and they found that C compilers are
becoming increasingly aggressive at optimizing such code, resulting in
more bugs and vulnerabilities. They found 32 previously unknown bugs in
the Linux kernel, 9 in Postgres and 5 in Python.

http://www.itworld.com/security/380406/how-your-compiler-may-be-compromising-application-security


I believe that the sheer number of buffer overflows in C is more due to
the language semantics than the (lack of) skill of the programmers. C the
language pushes responsibility for safety onto the developer. Even expert
C programmers cannot always tell what their own code will do. Why else do
you think there are so many applications for checking C code for buffer
overflows, memory leaks, buggy code, and so forth? Because even expert C
programmers cannot detect these things without help, and they don't get
that help from the language or the compiler.

[...]
Apart from the last one (file system atomicity, not a C issue at all),
every single issue on that page comes back to one thing: fixed-size
buffers and functions that treat a char pointer as if it were a string.
In fact, that one fundamental issue - the buffer overrun - comes up
directly when I search Google for 'most common security holes in c code'

I think that you have missed the point that buffer overflows are often a
direct consequence of the language. For example:

http://www.kb.cert.org/vuls/id/162289

Quote:

"Some C compilers optimize away pointer arithmetic overflow tests that
depend on undefined behavior without providing a diagnostic (a warning).
Applications containing these tests may be vulnerable to buffer overflows
if compiled with these compilers."

The truly frightening thing about this is that even if the programmer
tries to write safe code that checks the buffer length, the C compiler is
*allowed to silently optimize that check away*.

Python is actually *worse* than C in this respect.

You've got to be joking.

I know this
particular one is reasonably well known now, but how likely is it that
you'll still see code like this:

def create_file():
    f = open(".....", "w")
    f.write(".......")
    f.write(".......")
    f.write(".......")

Looks fine, is nice and simple, does exactly what it should. And in
(current versions of) CPython, this will close the file before the
function returns, so it'd be perfectly safe to then immediately read
from that file. But that's undefined behaviour.

No it isn't. I got chastised for (allegedly) conflating undefined and
implementation-specific behaviour. In this case, whether the file is
closed or not is clearly implementation-specific behaviour, not
undefined. An implementation is permitted to delay closing the file. It's
not permitted to erase your hard drive.

Python doesn't have an ISO standard like C, so where the documentation
doesn't define the semantics of something, CPython behaves as the
reference implementation. CPython allows you to simultaneously open the
same file for reading and writing, in which case subsequent reads and
writes will deterministically depend on the precise timing of when writes
are written to disk. That's not something which the language can control,
given the expected semantics of file I/O. The behaviour is defined, but
it's defined in such a way that what you'll get is deterministic but
unpredictable -- a bit like dict order, or pseudo-random numbers.

A Python implementation is not permitted to optimize away subsequent
reads, erase your hard drive, or download a copy of Wikipedia from the
Internet. A C compiler is permitted to do any of these.

(Of course, no competent C compiler would actually download all of
Wikipedia, since that would be slow. Instead, they would probably only
download the HTTP headers for the main page.)




[1] I'm talking low level exceptions or errors, not Python exceptions.
 

Steven D'Aprano

On 12/18/2013 12:18 AM, Steven D'Aprano wrote:

No, you're being an ass.

My my, it doesn't take much of a challenge to the Holy Church Of C to
bring out the personal attacks.
 

Chris Angelico

I didn't actually specify "large proportion", that's your words. But
since you mention crashes:

You implied that it's a significant cause of security holes. I counter
by saying that most security holes come from well-defined behaviour.
I think you are severely under-estimating the role of undefined behaviour
in C in security vulnerabilities. I quote from "Silent Elimination of
Bounds Checks":

"Most of the security vulnerabilities described in my book, Secure Coding
in C and C++, Second Edition, are the result of exploiting undefined
behavior in code."

http://www.informit.com/articles/article.aspx?p=2086870

I don't intend to buy the book to find out what he's talking about.
All I know is that the one single most common cause of problems in C,
the buffer overrun, is NOT "exploiting undefined behavior", and nor are
several other common problems (as described in my previous message).
Earlier this year, four researchers at MIT analysed how undefined
behaviour is affecting software, and they found that C compilers are
becoming increasingly aggressive at optimizing such code, resulting in
more bugs and vulnerabilities. They found 32 previously unknown bugs in
the Linux kernel, 9 in Postgres and 5 in Python.

http://www.itworld.com/security/380406/how-your-compiler-may-be-compromising-application-security

Yes, those are issues. Not nearly as large as the ones that _don't_
involve your compiler hurting you, except that CPython had proper
memory-usage discipline and didn't have the more glaring bugs.
I believe that the sheer number of buffer overflows in C is more due to
the language semantics than the (lack of) skill of the programmers. C the
language pushes responsibility for safety onto the developer. Even expert
C programmers cannot always tell what their own code will do. Why else do
you think there are so many applications for checking C code for buffer
overflows, memory leaks, buggy code, and so forth? Because even expert C
programmers cannot detect these things without help, and they don't get
that help from the language or the compiler.

I agree. The lack of a native string type is fundamental to probably
99% of C program bugs. (Maybe I'm exaggerating, but I reckon it'll be
ball-park.) But at no point do these programs or programmers *exploit*
undefined behaviour. They might run into it when things go wrong, but
by that time, things have already gone wrong. Example:

int foo()
{
    char buffer[80];
    gets(buffer);
    return buffer[0] == 'A';
}

So long as the user enters no more than 79 characters, this function's
perfectly well defined. It's vulnerable because user input can trigger
a problem, but if anyone consciously exploits compiler-specific memory
layouts, it's the attacker, and *NOT* the original code. On the flip
side, this code actually does depend on undefined behaviour:

int bar()
{
    char buffer[5];
    char tmp;
    memset(buffer, 0, 6);
    return tmp;
}

This code is always going to go past its buffer, and if 'tmp' happens
to be the next thing in memory, it'll be happily zeroed. I'm pretty
sure I saw code like this on thedailywtf.com a while back.
You've got to be joking.

Trolling, more than joking, but as usual, there is a grain of truth in
what I say.
No it isn't. I got chastised for (allegedly) conflating undefined and
implementation-specific behaviour. In this case, whether the file is
closed or not is clearly implementation-specific behaviour, not
undefined. An implementation is permitted to delay closing the file. It's
not permitted to erase your hard drive.

The problem is that delaying closing the file is a potentially major
issue, if the file is about to be reopened. And it _is_ undefined
behaviour that one particular Python implementation handles in a very
simple and convenient way (and, what's more, in a way that matches how
other languages (eg C++, Pike) would handle it, so it's going to "feel
right" to people); it's actually very easy to depend on this without
realizing it.
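
(For the record, the spelling that doesn't depend on any implementation's refcounting is barely longer; a sketch of the same function with the close guaranteed by the language:)

def create_file():
    # 'with' closes the file when the block exits, on every implementation
    # (CPython, PyPy, Jython, IronPython), even if a write raises.
    with open(".....", "w") as f:
        f.write(".......")
        f.write(".......")
        f.write(".......")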
Python doesn't have an ISO standard like C, so where the documentation
doesn't define the semantics of something, CPython behaves as the
reference implementation. CPython allows you to simultaneously open the
same file for reading and writing, in which case subsequent reads and
writes will deterministically depend on the precise timing of when writes
are written to disk.

Errr, Python does have its standard. It's not an
implementation-defined language. Yes, there are places where CPython
is the de facto standard, but that doesn't mean something's not
undefined.

Delaying the close might be completely insignificant, but it has the
potential to be critical (depending on the exact share modes and
such). And, in the strictest sense of the word, it *is* undefined, and
it *is* depended on.

ChrisA
 

Steven D'Aprano

We don't know what locals()['spam'] = 42 will do inside a function,

I am mystified that you would write this.

Context is everything. locals() doesn't just return any old dictionary.
It returns a dictionary of variables.

locals() will "Update and
return a dictionary representing the current local symbol table." The
only thing unspecified is the relation between the 'current local symbol
table' and the *dict* that 'represents' it.
Precisely.


Given that a dict is returned, the rest is unambiguous.
unlike the C case, we can reason about it:

- it may bind 42 to the name "spam";

"somedict['spam'] = 42" will do exactly that.

We're not talking about setting items in an arbitrary dict. We're talking
about setting variables using locals(), and in that case, writing to
locals() does not guarantee to bind the value to the *name*.

def test():
    spam = 23
    locals()["spam"] = 42
    assert spam == 42


test() passes the assertion in IronPython 2.6, but fails in CPython 2.7
and 3.4, and Jython 2.5.

Absolutely not.

I don't know of any Python implementation which does so, but the
documentation says:

The contents of this dictionary should not be modified

so it is hardly beyond the realm of possibility that some implementation
may choose to treat it as an error and raise an exception.

Absolutely not.


In the example I show above, it is a no-op. The dict returned by locals
is modified and then immediately garbage-collected. There are no side-
effects. Should some implementation decide to compile that away as dead
code, it would be perfectly allowed to. (Well, assuming that it
determined first that locals() actually was the built-in and not some
substitute, either by static analysis or runtime testing.) It wouldn't
surprise me if PyPy was capable of doing that today.
 

Steven D'Aprano

Plenty of compile-time warnings depending on the compiler, which the
CPython core devs take a great deal of trouble to eliminate on every
buildbot.

Correct. The *great deal of trouble* part is important. Things which are
the responsibility of the language and compiler in (say) Java, D, Rust,
Go, etc. are the responsibility of the programmer with C.

I mention these languages as they are all intended to be safer languages
than C while still being efficient. Whether they succeed or not is
another question.

Now, I wish to be absolutely clear. There are certain programming areas
where squeezing out every last iota of performance is important, and to
do so may require making some compromises on correctness or safety. I
find the C standard's position on undefined behaviour to be
irresponsible, but, hey, maybe it is justified on the basis that C is a
systems language intended for use in writing performance-critical
operating system kernels, device drivers and similar. It's fine for
Python to promise that nothing you do will ever cause a segfault, but for
a language used to write kernels and device drivers, you probably want
something more powerful and less constrained.

But why is so much non-performance critical code written in C? Why so
many user-space applications? History has shown us that the decision to
prefer efficiency-by-default rather than correctness-by-default has been
a disaster for software safety and security. The C language is practically
the embodiment of premature optimization: it allows compilers
to silently throw your code away in order to generate efficient code by
default, whether you need it or not.
 

rusi

Yes agreed. Python is far from stellar in this regard.
Just as distutils got into the core at 2.3(??) now at 3.3 virtualenv(+pip+wheel)
is getting in. Belated but better late than never.
That's a strange thing to say. In the abstract every language that
allows for significant programs supports separate units/modules.
Somewhere those units will map onto system entities -- usually though
not always files (think of PL-SQL inside Oracle).
Even assuming files, the lines drawn between interior (to the language)
and exterior (OS-facing) are vastly different.
C, Pascal, Python, Java, SML, APL -- all very different in this regard.

Just adding this:
Different languages do their modularizing and packaging differently (what I
earlier said) in order to achieve different tradeoffs.

Here's a thread by a competent programmer who switched from Lisp to C++.
https://groups.google.com/forum/#!topic/ledger-cli/Mjky9AvrRKU
He clearly says that while he loves Lisp the language, its packaging facilities
lost out to C++ and so he rewrote his whole app in C++.
 

Chris Angelico

But why is so much non-performance critical code written in C? Why so
many user-space applications?

Very good question! I don't have an answer. There are a few
"maybe-answers", but they mostly come down to "programmer didn't know
of a viable alternative". When I wrote RosMud, I wrote it in C++,
because I was making tweaks to an existing C++ program (Gmud) and
because I thought that that was the way to make it run fast enough for
what I needed. Seven years on (or will be, come January), I've learned
how much can be done in Python, Pike, and other high level languages,
and RosMud's successor is not written in C.

Maybe part of the answer comes from people who've learned based on old
hardware. Growing up in the 80s on an Epson XT-clone, I wrote code in
BASIC, C, and assembly. Now, most of my performance problems in BASIC
were because of flawed algorithms (it's amazing how slowly an O(n*n)
algorithm will run, isn't it!), but I could imagine someone growing up
learning "C is the only way to make code run fast" and then going on
to teach the next generation of programmers to use C, without
necessarily even explaining why.

But that's just speculation. All I know is, even if you do need to
write in C for some reason (your preferred language doesn't have
bindings for some library, maybe), chances are you can write the
tiniest bit of code that way, and do the rest in a high level
language.
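
(One common shape for that tiniest bit: keep the C in a shared library and reach it from the high-level language. A sketch using ctypes against the platform's C runtime -- the library lookup is OS-dependent:)

import ctypes
import ctypes.util

# Load the C library; find_library papers over the per-OS naming.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the signature so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello from C"))   # 12 -- everything else stays in Python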

ChrisA
 
