Proposal: runtime validation statement

Paul Rubin

I frequently find myself writing stuff like

# compute frob function, x has to be nonnegative
x = read_input_data()
assert x >= 0, x # mis-use of "assert" statement
frob = sqrt(x) + x/2. + 3.

This is not really correct because the assert statement is supposed to
validate the logical consistency of the program itself, not the input
data. So, for example, when you compile with optimization on, assert
statements become no-ops. And yet, it's generally desirable to
validate input whenever you can, and raising an exception is
frequently the right thing to do with bad data. A function like

class ValidationError(Exception): pass
def _validate(cond, message):
    if not cond: raise ValidationError(message)

takes care of it, of course, so it's slightly redundant to add a
special statement like

validate x >= 0, (x, "must not be negative")

which works exactly like assert but raises a different exception and
is never optimized away. But the same can be said of the print
statement (use sys.stdout.write or a print function instead) and for
that matter the addition operator (use "x - (-y)" instead of x+y), the
bool type (use 1 and 0 instead of True and False), etc.
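
The point above about optimization can be checked in a current Python 3 interpreter. What follows is only a sketch, not part of the proposal: it assumes the built-in compile() with its optimize argument is an acceptable stand-in for running under -O, and validate / ValidationError are the user-defined helpers from the post, rewritten as a plain function since no validate statement exists.

```python
class ValidationError(Exception):
    """Raised when input data fails a runtime check."""


def validate(cond, message):
    # An ordinary function call, so it is still executed under -O.
    if not cond:
        raise ValidationError(message)


def run(source, optimize):
    """Compile and run source at the given optimization level.

    Returns the name of the exception raised, or None if it ran cleanly.
    """
    code = compile(source, "<demo>", "exec", optimize=optimize)
    try:
        exec(code, {"validate": validate})
    except Exception as exc:
        return type(exc).__name__
    return None


ASSERT_CHECK = "x = -1\nassert x >= 0, x\n"
VALIDATE_CHECK = "x = -1\nvalidate(x >= 0, (x, 'must not be negative'))\n"

results = {
    "assert, optimize=0": run(ASSERT_CHECK, 0),
    "assert, optimize=1": run(ASSERT_CHECK, 1),
    "validate, optimize=0": run(VALIDATE_CHECK, 0),
    "validate, optimize=1": run(VALIDATE_CHECK, 1),
}
for label, outcome in results.items():
    print(label, "->", outcome)
```

With optimize=1 the assert check passes silently on the negative value, while the validate call raises at both levels.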

We have to conclude that choosing what statements the language
supports is not just a matter of making things possible, but also of
steering what the common idioms should be. Using a user-defined
function to check input means a couple more program-specific things to
remember (the function itself and the exception class it raises),
clutters up the code, etc. And so I've come to feel that a "validate"
statement (maybe with some different keyword) like the above is in the
Pythonic spirit and should be considered for some forthcoming release.

Thoughts?
 
F. GEIGER

Paul said:
> I frequently find myself writing stuff like
>
> # compute frob function, x has to be nonnegative
> x = read_input_data()
> assert x >= 0, x # mis-use of "assert" statement
> frob = sqrt(x) + x/2. + 3.
>
> This is not really correct because the assert statement is supposed to
> validate the logical consistency of the program itself, not the input
> data. So, for example, when you compile with optimization on, assert
> statements become no-ops. And yet, it's generally desirable to
> validate input whenever you can, and raising an exception is
> frequently the right thing to do with bad data. A function like
>
> class ValidationError(Exception): pass
> def _validate(cond, message):
>     if not cond: raise ValidationError(message)
>
> takes care of it, of course, so it's slightly redundant to add a
> special statement like
>
> validate x >= 0, (x, "must not be negative")
>
> which works exactly like assert but raises a different exception and
> is never optimized away. But the same can be said of the print
> statement (use sys.stdout.write or a print function instead) and for
> that matter the addition operator (use "x - (-y)" instead of x+y), the
> bool type (use 1 and 0 instead of True and False), etc.
>
> We have to conclude that choosing what statements the language
> supports is not just a matter of making things possible, but also of
> steering what the common idioms should be. Using a user-defined
> function to check input means a couple more program-specific things to
> remember (the function itself and the exception class it raises),
> clutters up the code, etc. And so I've come to feel that a "validate"
> statement (maybe with some different keyword) like the above is in the
> Pythonic spirit and should be considered for some forthcoming release.
>
> Thoughts?

I use assert to protect my software from me, i.e. to catch programming
errors, e.g. to catch cases where I called a method the wrong way. If
this can happen under production conditions too, then it's a user error,
not a programming error.

So, I use raise to protect my software from user errors.

Your sample seems to be a case of the latter, i.e. you have to write
some sort of exception handling anyway. Issuing an error message
anywhere in your code might not be what you really want.

Kind regards
Franz GEIGER
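
The split described above (assert for the programmer, raise for the user) might look like the following sketch in Python 3. The function names frob and frob_from_text are hypothetical, the formula is the one from the original post, and the choice of ValueError is an assumption rather than anything the post specifies.

```python
import math


def frob(x):
    # Internal contract: code inside the program must only pass x >= 0.
    # A failure here is a programming error, so assert fits
    # (and the check disappears under -O).
    assert x >= 0, x
    return math.sqrt(x) + x / 2.0 + 3.0


def frob_from_text(text):
    # Data from outside the program: always checked, reported with raise.
    try:
        x = float(text)
    except ValueError:
        raise ValueError("not a number: %r" % (text,))
    if not x >= 0:  # written this way so NaN is rejected too
        raise ValueError("must not be negative: %r" % (x,))
    return frob(x)
```

The caller that reads the input still has to catch ValueError somewhere and decide how to report it.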
 
Dave Brueck

Paul Rubin wrote:
> [snip]
> frequently the right thing to do with bad data. A function like
>
> class ValidationError(Exception): pass
> def _validate(cond, message):
>     if not cond: raise ValidationError(message)
>
> takes care of it, of course, so it's slightly redundant to add a
> special statement like
>
> validate x >= 0, (x, "must not be negative")
>
> which works exactly like assert but raises a different exception and
> is never optimized away. But the same can be said of the print
> statement (use sys.stdout.write or a print function instead) and for
> that matter the addition operator (use "x - (-y)" instead of x+y), the
> bool type (use 1 and 0 instead of True and False), etc.
>
> We have to conclude that choosing what statements the language
> supports is not just a matter of making things possible, but also of
> steering what the common idioms should be. Using a user-defined
> function to check input means a couple more program-specific things to
> remember (the function itself and the exception class it raises),
> clutters up the code, etc. And so I've come to feel that a "validate"
> statement (maybe with some different keyword) like the above is in the
> Pythonic spirit and should be considered for some forthcoming release.
>
> Thoughts?

Agreed!

I usually end up subclassing Exception and writing a validation function
like you show above. At first I liked the fact that a module threw a
module-specific family of exceptions that could be caught downstream,
but after having used this approach for some time I've come to the
conclusion that the vast majority of the time the exceptions thrown are
of the generic "ValidationError" variety, and that having them defined
in a module-specific way added no value. By extension, the validation
function itself adds no value and is just a nuisance.

Also, a developer-defined function doesn't stand out as well as a
statement would - a statement sets it apart from normal function calls
which are doing the actual work to solve the problem at hand - and it'd
be easy for syntax-highlighting editors to color it differently too.

IMO 'validate' isn't too bad a choice for a keyword. Sorta long but it's
quick to type.

-Dave
 
Ville Vainio

Paul> takes care of it, of course, so it's slightly redundant to
Paul> add a special statement like

Paul> validate x >= 0, (x, "must not be negative")

Paul> which works exactly like assert but raises a different
Paul> exception and is never optimized away. But the same can be
Paul> said of the print statement (use sys.stdout.write or a print
Paul> function instead) and for that matter the addition operator
Paul> (use "x - (-y)" instead of x+y), the bool type (use 1 and 0
Paul> instead of True and False), etc.

Yes, and I (and many others, I feel) consider the print statement a wart
in the language. Let's not make any more of these... Too bad it's so
widely used that it can't be deprecated outright.

Paul> steering what the common idioms should be. Using a
Paul> user-defined function to check input means a couple more
Paul> program-specific things to remember (the function itself and
Paul> the exception class it raises), clutters up the code, etc.
Paul> And so I've come to feel that a "validate" statement (maybe
Paul> with some different keyword) like the above is in the
Paul> Pythonic spirit and should be considered for some
Paul> forthcoming release.

Any specific reason not to make it a builtin function instead of a
statement? I wouldn't mind a validation function that could also
verify the data types of the arguments, which could then be used for
code completion assistance and type inference, since we don't know
when "real" type declarations will happen. Expecting them to hit 2.5 is
probably a bit too optimistic ;-).

def a(x,y):
    validate((x,int), (y,str), x > int(y))

(validate checks every tuple with isinstance(t[0],t[1]); every other arg
to validate that is "false" in the Pythonic falsehood sense fails the
validation)

Before something like that goes "official", help tools and IDEs can't
use the type information.
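
A minimal sketch of the function described above, written for Python 3. The name validate comes from the post; ValidationError, the error messages, and the rule that only a two-element tuple whose second item is a type counts as a type check are assumptions made here to get something runnable.

```python
class ValidationError(Exception):
    pass


def validate(*checks):
    for position, check in enumerate(checks):
        # A (value, type) pair is treated as an isinstance check.
        if isinstance(check, tuple) and len(check) == 2 and isinstance(check[1], type):
            value, expected = check
            if not isinstance(value, expected):
                raise ValidationError(
                    "argument %d: expected %s, got %r"
                    % (position, expected.__name__, value)
                )
        # Anything else is a condition and must be truthy.
        elif not check:
            raise ValidationError("argument %d: condition is false" % position)


def a(x, y):
    validate((x, int), (y, str), x > int(y))
    return x - int(y)
```

One caveat of the function form: all arguments are evaluated before validate runs, so in a("5", "3") the expression x > int(y) raises TypeError before the type checks are ever looked at.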
 
Ville Vainio

Dave> Also, a developer-defined function doesn't stand out as well
Dave> as a statement would - a statement sets it apart from normal
Dave> function calls which are doing the actual work to solve the
Dave> problem at hand - and it'd be easy for syntax-highlighting
Dave> editors to color it differently too.

It's as easy to color a function.

We have too many statements that don't need to be statements
already. "validate" is obvious library stuff...
 
Dave Brueck

Ville said:
> Dave> Also, a developer-defined function doesn't stand out as well
> Dave> as a statement would - a statement sets it apart from normal
> Dave> function calls which are doing the actual work to solve the
> Dave> problem at hand - and it'd be easy for syntax-highlighting
> Dave> editors to color it differently too.
>
> It's as easy to color a function.
>
> We have too many statements that don't need to be statements
> already. "validate" is obvious library stuff...

I disagree - there's a clear distinction between solving the problem and
e.g. validating inputs to the problem solver, and having such checks as
a statement is a good way to implement that distinction. That's why
'assert' as a statement makes sense to me too - it and validate are sort
of "out of band" with getting the actual work done, but useful nonetheless.

Whether or not a validate keyword is a good idea should be judged
independently of your opinion of whether or not 'print' is a wart.

It's definitely not "obvious library stuff" IMO - if nothing else,
making you import a library just to validate parameters is goofy. It
would be semi-tolerable (though less than ideal) as a builtin.

-Dave
 
Christopher T King

> I disagree - there's a clear distinction between solving the problem and
> e.g. validating inputs to the problem solver, and having such checks as
> a statement is a good way to implement that distinction. That's why
> 'assert' as a statement makes sense to me too - it and validate are sort
> of "out of band" with getting the actual work done, but useful nonetheless.

I like to think 'print' falls into this same "out of band" category --
assuming it's used for debug purposes. Perhaps 'print' should be
deprecated for 'normal' uses (i.e. file I/O, user interaction) in favor of
file I/O operations (which one would hope any more-than-trivial program
uses anyway), and, in some future Python, tossed away in optimized bytecode
(much like assert statements).
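
Something close to this is already available without a language change: code guarded by "if __debug__:" is dropped when compiling with optimization. The sketch below assumes Python 3, uses compile() with its optimize argument in place of the -O switch, and uses a list named log (hypothetical) as a stand-in for debugging print output so the effect can be inspected.

```python
SOURCE = """
if __debug__:
    log.append("debug output")
log.append("real work")
"""


def run(optimize):
    # Execute SOURCE at the given optimization level and return what it logged.
    log = []
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), {"log": log})
    return log


normal = run(0)
optimized = run(1)
print(normal)
print(optimized)
```

At optimize=1 only the "real work" entry is recorded, which is the behaviour suggested above for debugging prints.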
 
Ville Vainio

Dave> distinction. That's why 'assert' as a statement makes sense
Dave> to me too - it and validate are sort of "out of band" with
Dave> getting the actual work done, but useful nonetheless.

Perhaps calling it _validate could imply out-of-bandness? I don't like
the idea of making the "less important" constructs statements, only
the most fundamental things should be statements.

And we already have assert.

Dave> It's definitely not "obvious library stuff" IMO - if nothing
Dave> else, making you import a library just to validate
Dave> parameters is goofy. It would be semi-tolerable (though less
Dave> than ideal) as a builtin.

+1 on making it a builtin.
 
Paul Rubin

F. GEIGER said:
> I use assert to protect my software from me, i.e. to catch programming
> errors, e.g. to catch cases where I called a method the wrong way. If
> this can happen under production conditions too, then it's a user
> error, not a programming error.
>
> So, I use raise to protect my software from user errors.

But I write a lot of code for my own use, which means the user and the
programmer are the same person, and any user error is also a
programming error. Lots of times also, the data came from some other
part of the program, so if the data is invalid, that's still a
programming error.

> Your sample seems to be a case of the latter, i.e. you have to write
> some sort of exception handling anyway. Issuing an error message
> anywhere in your code might not be what you really want.

If you want to keep running after an AssertionError, you have to handle
that too, but that doesn't make the assert statement useless.
 
Paul Rubin

Ville Vainio said:
> Yes, and I (and many others, I feel) consider the print statement a wart
> in the language. Let's not make any more of these... Too bad it's so
> widely used that it can't be deprecated outright.

I can sympathize with the notion that print and assert are warts, but
I think they're considered to be important to Python's
newbie-friendliness or something like that. As such, "validate" ought
to be considered about the same way.

> Any specific reason not to make [validate] a builtin function
> instead of a statement?

It's similar enough to assert that for consistency in the language I
think it ought to be done the same way. But a builtin function would
be ok.

> I wouldn't mind a validation function that could also verify the
> data types of the arguments, which could then be used for code
> completion assistance and type inference, since we don't know when
> "real" type declarations will happen. Expecting them to hit 2.5 is
> probably a bit too optimistic ;-).
>
> def a(x,y):
>     validate((x,int), (y,str), x > int(y))

If the compiler is going to rely on something like that, then it
should definitely be a statement, rather than a function that the user
can shadow with his own function that does something completely
different. But if there are going to be type declarations, they ought
to go into some new construction that cleans up the current scoping
mess at the same time:

local x:int, y:str # type declarations
assert x > int(y) # compiler can use this as advice

> (validate checks every tuple with isinstance(t[0],t[1]); every other arg
> to validate that is "false" in the Pythonic falsehood sense fails the
> validation)

This is a little bit bogus: first of all, data validation should be
able to check arbitrary conditions including those that happen to be
tuples. Second, validation must always be performed and must throw an
exception at runtime, while compiler advice can be optimized away.
So your example is more like "assert" than what I meant by validate.
 
Ville Vainio

Paul> I can sympathize with the notion that print and assert are
Paul> warts, but I think they're considered to be important to
Paul> Python's newbie-friendliness or something like that. As
Paul> such, "validate" ought to be considered about the same way.

How is

print("hello",42)

less newbie friendly than

print "hello",42?

To me, the former actually seems *more* newbie friendly because there
is nothing special about it. The same applies for assert, validate and
other statements that don't need to be statements.

>> Any specific reason not to make [validate] a builtin function
>> instead of statement?

Paul> It's similar enough to assert that for consistency in the
Paul> language I think it ought to be done the same way. But a
Paul> builtin function would be ok.

"foolish consistency..." ;-).

Paul> If the compiler is going to rely on something like that,
Paul> then it should definitely be a statement, rather than a
Paul> function that the user

Of course the compiler wouldn't rely on it. I was thinking of an
interim solution for auxiliary tools like doc generators - but then I
remembered we are going to get decorators soon, and they are better
for this purpose.

>> (validate checks every tuple with isinstance(t[0],t[1]); every other arg
>> to validate that is "false" in the Pythonic falsehood sense fails the
>> validation)

Paul> This is a little bit bogus: first of all, data validation should be

Yes, it is - as I said, decorators will work better. My bad.
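
For illustration, the decorator approach mentioned above might look something like this in present-day Python 3. The post does not specify any API, so everything here is an assumption: the decorator name expects, the expected_types attribute that a doc generator or IDE could read, and the ValidationError class are all hypothetical.

```python
import functools
import inspect


class ValidationError(Exception):
    pass


def expects(**types):
    """Check named arguments against the given types on every call."""
    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments to parameter names.
            bound = signature.bind(*args, **kwargs)
            for name, expected in types.items():
                if name in bound.arguments and not isinstance(
                    bound.arguments[name], expected
                ):
                    raise ValidationError(
                        "%s: expected %s, got %r"
                        % (name, expected.__name__, bound.arguments[name])
                    )
            return func(*args, **kwargs)

        # Exposed so external tools can read the declared types.
        wrapper.expected_types = types
        return wrapper
    return decorator


@expects(x=int, y=str)
def a(x, y):
    return x > int(y)
```

Because the types are checked before the function body runs, a("5", "3") fails with ValidationError rather than with whatever error the body would have hit first.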
 
John Roth

Paul Rubin said:
> I can sympathize with the notion that print and assert are warts, but
> I think they're considered to be important to Python's
> newbie-friendliness or something like that. As such, "validate" ought
> to be considered about the same way.

Both statements have specific functions. The problem arises
with people that try to use them for other purposes than
they were intended.

Print is for debugging prints. I use it a lot for that purpose,
and never use it for any "user" output, even though I am
both the programmer and the user in most cases. It
works well when used for the purpose it was intended.

Assert is intended for debugging and program validation.
Using it for standard program logic is using it outside of
its intended purpose. All the complaints I've ever seen
about assert revolve around that.

I don't like the notion of a validate statement for a number
of reasons. One of them is that the proposal doesn't say
what it would do in enough detail for me to figure out
whether I could actually use it. My suspicion is that it
wouldn't fit in with the way I write validation routines
anyway. Clue: I generally don't throw exceptions.
For me, validation logic is tightly tied to the (g)UI
logic, since it's the UI that's going to have to tell the
user that he mucked up.

John Roth
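
The post gives no code, so the following is only a guess at what exception-free, UI-oriented validation could look like: a function that returns the problems it found, keyed by field, for the UI layer to display. The function name check_order_form, the field names, and the messages are all hypothetical.

```python
def check_order_form(fields):
    """Return a dict of field name -> error message; empty means valid."""
    errors = {}
    if not fields.get("name", "").strip():
        errors["name"] = "Name is required."
    try:
        quantity = int(fields.get("quantity", ""))
    except ValueError:
        errors["quantity"] = "Quantity must be a whole number."
    else:
        if quantity <= 0:
            errors["quantity"] = "Quantity must be greater than zero."
    return errors
```

A form handler would call this once and attach each message to its widget, which is hard to do with a statement that raises on the first failed check.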
 
