Steven D'Aprano
I am soliciting feedback regarding the API of my statistics module:
http://code.google.com/p/pycalcstats/
Specifically, on the following two issues:
(1) Multivariate statistics such as covariance have two obvious APIs:
(A) pass the X and Y values as two separate iterable arguments, e.g.:
    cov([1, 2, 3], [4, 5, 6])
(B) pass the X and Y values as a single iterable of tuples, e.g.:
    cov([(1, 4), (2, 5), (3, 6)])
I currently support both APIs. Do people prefer one, or the other, or
both? If there is a clear preference for one over the other, I may drop
support for the other.
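For comparison, here is a rough sketch of how one function could accept both calling conventions. It assumes the sample covariance (n - 1 denominator) is wanted, and this `cov` is a hypothetical illustration, not pycalcstats' actual code.

```python
def cov(xdata, ydata=None):
    """Sample covariance of paired data (hypothetical sketch).

    Accepts either two separate iterables, cov(xs, ys) (API A),
    or a single iterable of (x, y) pairs, cov(pairs) (API B).
    """
    if ydata is None:
        # API B: one iterable of (x, y) tuples.
        pairs = [tuple(p) for p in xdata]
    else:
        # API A: two separate iterables, zipped into pairs.
        xs, ys = list(xdata), list(ydata)
        if len(xs) != len(ys):
            raise ValueError("x and y must have the same length")
        pairs = list(zip(xs, ys))
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two data points")
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    # Sample covariance: sum of products of deviations over n - 1.
    return sum((x - xbar) * (y - ybar) for x, y in pairs) / (n - 1)


print(cov([1, 2, 3], [4, 5, 6]))         # 1.0
print(cov([(1, 4), (2, 5), (3, 6)]))     # 1.0
```

Supporting both forms means guessing from whether a second argument was supplied, so a mistaken call such as cov([1, 2, 3]) fails with a TypeError from tuple() rather than a clear message about missing y values.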
(2) Statistics textbooks often give formulae in terms of sums and
differences, such as
Sxx = n*Σ(x**2) - (Σx)**2
There are quite a few of these: I count at least six common ones, all
closely related and confusingly named:
Sxx, Syy, Sxy, SSx, SSy, SPxy
(the x and y should all be subscripts).
Are they useful, or would they just add unnecessary complexity? Would
people like to see these included in the package?
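To show how closely these quantities are related, here is a sketch that computes all six for paired data. Sxx follows the convention written above; Syy and Sxy extend it by analogy, and SSx, SSy, SPxy are assumed to be the deviation sums Σ(x - x̄)², Σ(y - ȳ)² and Σ(x - x̄)(y - ȳ). These definitions are assumptions (textbooks vary, and many define Sxx without the factor n, which makes it identical to SSx), and `summary_sums` is a hypothetical helper name.

```python
import math


def summary_sums(xs, ys):
    """Return the six textbook sums for paired data (hypothetical helper).

    Assumed conventions: Sxx = n*Σ(x**2) - (Σx)**2 (and likewise Syy, Sxy);
    SSx, SSy, SPxy are sums of (products of) deviations from the mean.
    """
    xs, ys = list(xs), list(ys)
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty sequences of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    xbar, ybar = sum_x / n, sum_y / n
    return {
        # Raw-sum forms, scaled by n as in the Sxx formula.
        "Sxx": n * sum(x * x for x in xs) - sum_x ** 2,
        "Syy": n * sum(y * y for y in ys) - sum_y ** 2,
        "Sxy": n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y,
        # Deviation forms; under this convention each equals its S-form / n.
        "SSx": sum((x - xbar) ** 2 for x in xs),
        "SSy": sum((y - ybar) ** 2 for y in ys),
        "SPxy": sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)),
    }


s = summary_sums([1, 2, 3], [4, 5, 6])
print(s)
# Pearson's r: the factor n cancels, so either family gives the same value.
r = s["Sxy"] / math.sqrt(s["Sxx"] * s["Syy"])
print(r)  # 1.0
```

Because each S-form is n times the matching deviation form under this convention, ratios such as r come out the same from either family; only the scaling differs.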
Thank you for your feedback.