Sorry about "obfuscation contest," I just balked at the reduce code, which seemed
like premature overgeneralization ;-)
It's a personal preference, but the only thing I consider premature is
optimization, not generalization. I prefer to try to capture any
concept I can as its own abstracted procedure. In practice this usually
means using a lot of lazy data structures at the end of the day.
Yes, but does collect(fields, fun) always know that fun takes two args? And that
it makes sense to apply fun pairwise as a boolean relation to reduce to a single boolean value?
You could say that by definition it does. In that case, will your
future library extender need
to add some other kind of collect-like function?
The collect does assume that the function takes two parameters; the
logic is to apply it to the result of each call and the next item in
the list.
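Under that reading, a minimal sketch of collect might look like the following. This assumes chained pairwise application (the last operand of each call starts the next one, as with Python's chained comparisons); the field names here are just illustrative:

```python
def collect(fields, fun):
    """Sketch of collect: fun is a two-argument boolean relation,
    applied pairwise down the field values like a chained comparison."""
    def rule(record):
        values = [record[field] for field in fields]
        # the last operand of each call is the first operand of the next
        return all(fun(a, b) for a, b in zip(values, values[1:]))
    return rule

ordered = collect(['low', 'mid', 'high'], lambda a, b: a < b)
print(ordered({'low': 1, 'mid': 2, 'high': 3}))  # True
print(ordered({'low': 3, 'mid': 2, 'high': 1}))  # False
```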
The only more general way I can think of making this would be to do
inspection on the function to determine the number of arguments it
takes. To make a general algorithm that does something like:
let n = number of elements a given function f takes
apply n elements from the list to the function
given a result of the function r, apply [r, *(n-1)] recursively to the
function.
def f(a, b, c):
    return (b + c) * a

map_col([1, 2, 3, 4, 5], f) => 45
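A sketch of that map_col, using the function's code object to discover its arity (the name map_col and the inspection approach are just what the pseudocode above suggests, not an existing library routine):

```python
def map_col(items, f):
    # n = number of elements the given function f takes, found by inspection
    n = f.__code__.co_argcount
    items = list(items)
    # apply the first n elements, then feed each result r back in as the
    # first argument alongside the next n-1 items; assumes the list
    # length works out to whole groups
    result = f(*items[:n])
    i = n
    while i < len(items):
        result = f(result, *items[i:i + n - 1])
        i += n - 1
    return result

def f(a, b, c):
    return (b + c) * a

print(map_col([1, 2, 3, 4, 5], f))  # 45
```

Here f(1, 2, 3) gives 5, and f(5, 4, 5) gives 45, matching the example above.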
However, I don't know how applicable that would be in this context,
since this is a boolean system, and in the simple binary setup it makes
sense to return the last operand to the beginning of the next
evaluation. In a more general case like this, the value of the
evaluation would be the most useful, not the boolean truth value.
I do have one other generic routine besides the booleans and the
collect collectors, which is called check:
def check(fields, test):
    """ check(list fields, func test) -> rule
    """
    def rule(record):
        values = [record[field] for field in fields]
        return test(*values)
    return rule
It basically lets you pull out whatever fields you want and pass them
to the provided function.
def do_complicated_test(a, b, tolerance=0.001):
    return a - ((a + b) / b) < a * tolerance

rule = check(('x', 'y'), do_complicated_test)
Now that I think of it, I should probably swap this to be
check(do_complicated_test, 'x', 'y')
When I first wrote this thing it was 3-stage construction, not 2. I
wanted to abstract out specific tests and then JUST apply fields to
them in the actual rules for the business code. I then figured out that
if I standardized having the fields as the last arguments on the check
routines, I could use a simple curry technique to build the
abstraction.
from functional import curry
comp_rule = curry(check)(do_complicated_test)
Then in the actual business code, you can use it like this:
comp_rule(('a', 'b'))
Or:
ssn_rule = curry(match)(is_ssn)
ssn_rule('my-social-security-field')
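The same construction can be sketched with functools.partial standing in for the functional package's curry. This assumes the argument order has already been swapped to check(test, *fields) as suggested above; do_complicated_test is the example from earlier:

```python
from functools import partial

def check(test, *fields):
    """Fields-last variant of check, so currying works naturally."""
    def rule(record):
        return test(*(record[f] for f in fields))
    return rule

def do_complicated_test(a, b, tolerance=0.001):
    return a - ((a + b) / b) < a * tolerance

# abstract the test once...
comp_rule = partial(check, do_complicated_test)

# ...then the business code only supplies the field names
rule = comp_rule('x', 'y')
print(rule({'x': 1.0, 'y': 1000.0}))  # True
```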
For this to work properly and easily for the business logic, I do need
to get down some more solid rule invocation standards.
rule = when(all(have('length'), have('width')),
            check(['length', 'width'], lambda x, y: x == y))
assert rule({'length' : '2', 'width' : '2'}) == True
assert rule({'length' : '2', 'width' : '1'}) == False
assert rule({'length' : '1', 'width' : '2'}) == False
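A self-contained sketch that makes those asserts runnable; have, check, when, and the all combinator are reconstructions of what has been discussed in this thread (all is renamed here to keep the builtin available):

```python
def have(field):
    def rule(record):
        return bool(record.get(field))  # present with a non-empty value
    return rule

def check(fields, test):
    def rule(record):
        return test(*(record[f] for f in fields))
    return rule

def when(predicate, expr):
    def rule(record):
        return expr(record) if predicate(record) else True
    return rule

def all_rules(*rules):  # 'all' in the thread's DSL
    def rule(record):
        return all(r(record) for r in rules)
    return rule

rule = when(all_rules(have('length'), have('width')),
            check(['length', 'width'], lambda x, y: x == y))

assert rule({'length': '2', 'width': '2'}) == True
assert rule({'length': '2', 'width': '1'}) == False
# when the predicate fails (a field is missing), the rule passes
assert rule({'length': '2'}) == True
```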
But what about when the "when" clause says the rule does not apply?
Maybe return NotImplemented (which passes as True in an if test), e.g.,
I don't really see why a special exception needs to be here, because
"when" falls easily out of the logic. When is basically material
implication, "if A therefore B", and when(A, B) => NOT A OR B.
If A is false, then the expression is always true.
def when(predicate, expr):
    """ when(rule predicate, rule expr) -> rule
    """
    def rule(record):
        if predicate(record):
            return expr(record)
        else:
            return True
    return rule
So if the "test" fails, then the expression is True OR X, which is
always True.
Or, from another POV, the entire system is based on conjunction.
Returning True means that the rule doesn't force the entire validation
False. When the predicate fails on the when expression, the
"therefore" expression does not matter, because the predicate failed.
Therefore, the expression is true no matter what.
Trying not to be condescending, I just think that typical
propositional logic works well in this case.
Because this fits well into the system, can you give me an example of
why this case should be an exception to the boolean logic and raise an
exception?
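The material-implication reading can be checked directly against all four truth combinations (a small sketch using the when definition above):

```python
def when(predicate, expr):
    def rule(record):
        return expr(record) if predicate(record) else True
    return rule

true = lambda record: True
false = lambda record: False

# when(A, B) behaves as (NOT A) OR B
assert when(false, false)({}) == True   # A false: rule always passes
assert when(false, true)({}) == True
assert when(true, false)({}) == False   # A true: B decides
assert when(true, true)({}) == True
```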
The inline code generation is an interesting concept. It would
definitely run faster. Hmm, I did this little test.
This one does arbitrary recursion calls:

def doit(v, i):
    if not i:
        return v
    else:
        return doit(v, i - 1)

# Simple adding while stressing function calls
for i in range(times):
    doit(i, depth / 2) + doit(i, depth / 2)

# benchmark
for i in range(times):
    i + i

# timer
def timer(f, *a, **k):
    s = time.time()
    f(*a, **k)
    return time.time() - s
Even in the best case, where there are 2 function calls per evaluation,
it's at least 10^1 slower.
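A rough rerun of that comparison using the stdlib timeit module; the depth and iteration counts here are arbitrary choices for illustration:

```python
import timeit

def doit(v, i):
    return v if not i else doit(v, i - 1)

# two recursive call chains per evaluation vs. the same addition inline
calls = timeit.timeit('doit(1, 5) + doit(1, 5)',
                      globals=globals(), number=100000)
inline = timeit.timeit('1 + 1', number=100000)
print(calls / inline)  # roughly an order of magnitude or more on CPython
```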
However, given a more realistic example with the kind of domain I'm
going to apply this to:
Hmm, I'd say that for practical application, a record with 500
different fields and an average of 3 rules per field is a common-case
upper bound. Using the product of that for a profile:
I can probably live with those results for now, though optimization in
the future is always an option.
You know, the first implementation of this thing I did had the fields
separated from the complete rules. When I realized that this was just a
basic logic system, it became a problem having the fields separated
from the rules.
In practice, I'm looking to use this for easily collecting many rules
together and applying them to a data set.
def test_validator2(self):
    from strlib import is_ssn
    rules = [
        have('first_name'),
        have('last_name'),
        all(have('ssn'), match(is_ssn, 'ssn')),
        when(all(have('birth_date'), have('hire_date')),
             lt('birth_date', 'hire_date'))
    ]
    v = validator(rules)
    data = {'first_name' : 'John',
            'last_name' : 'Smith',
            'ssn' : '123-34-2343',
            'birth_date' : 0, 'hire_date' : 0}
    assert v(data) == []
    data = {'first_name' : 'John',
            'last_name' : 'Smith',
            'ssn' : '12-34-2343',
            'birth_date' : 0, 'hire_date' : 0}
    assert v(data) == [rules[2]]
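For reference, a minimal self-contained sketch under which this test passes; validator, have, match, lt, and is_ssn are reconstructions (not the actual strlib implementations), and the all combinator is renamed to avoid shadowing the builtin:

```python
import re

def have(field):
    def rule(record):
        return bool(record.get(field))  # present with a truthy value
    return rule

def match(test, field):
    def rule(record):
        return bool(test(record.get(field, '')))
    return rule

def lt(a, b):
    def rule(record):
        return record[a] < record[b]
    return rule

def when(predicate, expr):
    def rule(record):
        return expr(record) if predicate(record) else True
    return rule

def all_rules(*rules):  # 'all' in the thread's DSL
    def rule(record):
        return all(r(record) for r in rules)
    return rule

def validator(rules):
    def validate(record):
        return [r for r in rules if not r(record)]  # failed rules
    return validate

def is_ssn(s):  # stand-in for strlib.is_ssn
    return re.match(r'^\d{3}-\d{2}-\d{4}$', s)

rules = [
    have('first_name'),
    have('last_name'),
    all_rules(have('ssn'), match(is_ssn, 'ssn')),
    when(all_rules(have('birth_date'), have('hire_date')),
         lt('birth_date', 'hire_date')),
]
v = validator(rules)
good = {'first_name': 'John', 'last_name': 'Smith',
        'ssn': '123-34-2343', 'birth_date': 0, 'hire_date': 0}
bad = dict(good, ssn='12-34-2343')
print(v(good))                # []
print(v(bad) == [rules[2]])   # True
```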
The thing I'm wrestling with now is the identity of rules and fields.
I'd like to fully support localization, making a localization library
that will provide string tables for "error messages". I could use
positions to do this.
rules = [
    have('first_name'),
    have('last_name'),
    all(have('ssn'), match(is_ssn, 'ssn')),
    when(all(have('birth_date'), have('hire_date')),
         lt('birth_date', 'hire_date'))
]
msgs = [
    locale('MISSING_VALUE', 'FIRST_NAME'),
    locale('MISSING_VALUE', 'LAST_NAME'),
    locale('INVALID_SSN'),
    locale('INVALID_BIRTH_HIRE_DATE'),
]
# Could use a functional construction to get all of the locale
# references out of here if wanted
and then use a generic routine to join them:
error_map = assoc_list(rules, msgs)
errors = validate(record)
for e in errors:
    print locale.lookup(error_map[e])
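A sketch of that positional join; assoc_list and the rule/message values here are stand-ins for the proposed helpers, relying only on rules being hashable objects:

```python
def assoc_list(keys, values):
    # associate each rule with its locale reference by position
    return dict(zip(keys, values))

# stand-ins for real rule objects and locale references
rules = [lambda record: bool(record.get('first_name')),
         lambda record: bool(record.get('ssn'))]
msgs = ['MISSING_FIRST_NAME', 'MISSING_SSN']
error_map = assoc_list(rules, msgs)

record = {'first_name': 'John'}
errors = [rule for rule in rules if not rule(record)]
for e in errors:
    print(error_map[e])  # MISSING_SSN
```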