Voting Project Needs Python People

Andrew Dalke

Alan Dechert:
Let me try to clarify what I'm after. The paper record should always match
the electronic record. So the allowable defects is zero. If there is a
mismatch found in the sample, we don't publish the electronic tally: we take
all the paper ballots and check them.

You cannot get zero allowable defects. At best, you can get the likelihood
of allowable defects to be less than one.

For example, there will be hardware problems. Parts break down,
the machines are exposed to large temperature swings, the scanners
might be confused by a swarm of bugs, pages can stick together because
of the humidity, ballots can be left in the trunk of a car 'by accident'
(happened here in NM for the 2000 election).

You can spend more money to reduce the likelihood of error. You
can develop higher quality scanners and you can hire more people to
review the ballots. But even if you scan the ballots twice by machine
(each with different mechanisms) and another time by human, with
discrepancies broken by another inspection, you still have the chance
that all of those steps fail. A low chance, but still a chance.

Even in the method you propose (machine scanning with a human
double checking a sample), suppose the human doing the sampling
doesn't notice the machine's result is wrong? (Boredom is insidious.)

Alan Dechert:
So, if by the preliminary electronic tally a candidate won a race by 1
percent, I want to know how many ballots we have to check (random sample) to
be certain that the result predicted is true.

Right. In this case you shouldn't mind errors - you just want to be 99.99%
sure that there's less than, say, a 0.05% error rate.
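
As a rough illustration of that kind of target, here is a minimal sketch of the usual zero-defect sample size formula. It assumes independent draws (sampling with replacement), and the function name and the example figures are only illustrative: a clean sample of n ballots misses an error rate p with probability (1 - p)^n, so n must be at least ln(1 - confidence) / ln(1 - p).

```python
import math


def zero_defect_sample_size(confidence, max_error_rate):
    """Smallest n such that a sample of n ballots with no mismatches rules out
    an error rate of max_error_rate or more at the given confidence.

    Assumption: draws are independent (sampling with replacement).
    """
    # Chance that a lot with exactly max_error_rate still gives a clean sample.
    allowed_miss = 1.0 - confidence
    return math.ceil(math.log(allowed_miss) / math.log1p(-max_error_rate))


# The figures from the sentence above: 99.99% sure, 0.05% error rate.
n = zero_defect_sample_size(0.9999, 0.0005)
print(n)
```

The answer does not depend on the total number of ballots, which is the main weakness of the with-replacement assumption when the sample is a large share of the lot.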

Alan Dechert:
When I put one million into this Confidence Level Calculator, and Acceptable
Quality Level of .01, a sample of 10,000 shows a confidence level of "1." A
sample of 2000 gives a C.L. of 0.999999998. Presumably, "1" is really
.9999999999+ more 9s. Can we get more decimal places?

At most 17 9s and possibly fewer. Python's 'str' rounds to 1 at 13 places,
while its 'repr' (which uses a more precise method) takes 17.

More likely it depends on whoever wrote the calculator you're using.
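
To make the floating-point limit concrete, here is a small sketch. It assumes the calculator keeps its result in an IEEE 754 double, which is only a guess about that tool. In current Python 3 releases 'str' and 'repr' of a float print the same shortest round-trip text, so the limit that remains is the double itself.

```python
import sys

# Number of significant decimal digits a double is guaranteed to preserve.
digits = sys.float_info.dig
print(digits)

# Nine 9s (the value quoted for the sample of 2000) fit comfortably.
confidence = 0.999999998
print(repr(confidence))

# Seventeen 9s do not: the subtraction rounds to exactly 1.0.
too_close = 1 - 1e-17
print(too_close == 1.0)

# The complement keeps its precision, so it is safer to report the
# chance of a miss than a confidence level that is nearly 1.
chance_of_miss = 1e-17
print(chance_of_miss)
```

Reporting the small number (the chance of a miss) rather than the number close to 1 sidesteps the question of how many 9s can be shown.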

If you want 100% certainty (and not 99.9999+%) then you'll end up
having to verify everything, which means you are no longer sampling
the results. But the reason to use sampling is because the verification
step is too expensive to apply it to everything, and because if you
*did* test everything then you'll still need a sample to verify your
verifiers. (Whether they be human or machine.)

Do you have an estimate for the failure rate in the verifiers?

Alan Dechert:
So I guess the Lot Fraction Defective is analogous to the predicted victory
margin. Is that right?

I'm the wrong one for that - can you go over to the nearest statistics
department and ask for assistance?

Andrew
(e-mail address removed)
 
Ian Smith

Alan Dechert said:
Thanks, Ian. I could not quite figure out if your binomial distribution
calculator would be applicable.

Here's what Bill Buck ( (e-mail address removed) ) sent me. It turns out there is a
BINOMDIST function in Excel. I think it might be what I want.

http://home.earthlink.net/~adechert/LOTAcceptanceCalculator.xls

Let me try to clarify what I'm after. The paper record should always match
the electronic record. So the allowable defects is zero. If there is a
mismatch found in the sample, we don't publish the electronic tally: we take
all the paper ballots and check them.

We are talking about an election conducted with computer-generated paper
ballots. The paper ballots represent the actual vote since these ballots
are what voters actually saw, verified, and cast. We will have an
electronic record obtained from the computers which should match each paper
ballot generated. We want to use the electronic record since it will give
us an instant result -- but we have to check it against the paper ballots to
be sure the election result is correct. So, in this scenario, the
electronic count is a prediction (or preliminary tally).

So, if by the preliminary electronic tally a candidate won a race by 1
percent, I want to know how many ballots we have to check (random sample) to
be certain that the result predicted is true.

When I put one million into this Confidence Level Calculator, and Acceptable
Quality Level of .01, a sample of 10,000 shows a confidence level of "1." A
sample of 2000 gives a C.L. of 0.999999998. Presumably, "1" is really
.9999999999+ more 9s. Can we get more decimal places?

So I guess the Lot Fraction Defective is analogous to the predicted victory
margin. Is that right?

I would still like a standalone calculator that doesn't require Excel.

Alan Dechert


The BINOMDIST function in Excel calculates either the cumulative
probability function or the probability mass function for the binomial
distribution. BINOMDIST is usually accurate when it works but is known
to fail with large numbers of trials, small event probabilities etc.
The functions used by the calculators in
http://members.aol.com/iandjmsmith/EXAMPLES.HTM are somewhat more
robust and more accurate than BINOMDIST.

I gather what you want is to choose a sample size based on how many
mismatches there would have to be to change the result. If any
mismatches are found then you take all the ballot papers and check
them. This would mean that the closer the result was, the bigger the
sample size would have to be. In the case where the electronic count
is a dead heat all ballot papers would have to be checked!

If so then the spreadsheet at
http://home.earthlink.net/~adechert/LOTAcceptanceCalculator.xls is
doing roughly what you want. There appears to be a slight error in
there where it uses $C$4 instead of $D$4 in cells C15 to V42, although
it is irrelevant to you since the value in $C$4 is 0 anyway. If it
used =BINOMDIST() instead of =1-BINOMDIST() in cells C15 to V42 then
you would be able to see how small the probability of not seeing a
mismatch is (i.e. you don't really need more decimal places). What I'm
not sure of is whether "The Lot Fraction Defective" is analogous to
the predicted victory margin or to twice the predicted victory margin.
This would depend on whether mismatches were ignored or counted the
other way round.


The only real problem is that since the sample size could be large
relative to the population size, the binomial distribution may not
provide an adequate approximation to the Hypergeometric distribution
(I assume you will be using sampling without replacement - the
binomial is strictly only correct for sampling with replacement).


Using the Hypergeometric calculator, if we have 1 million ballot
papers and 100 mismatches could cause the result to be changed then
the probability of observing 0 mismatches will depend on the sample
size as follows

Sample size   Probability of failing to detect a mismatch when it might matter
        100   0.99
       1000   0.90
      10000   0.37
     100000   2.65e-5
     200000   2.03e-10

If we have 1 million ballot papers and 10000 mismatches could cause
the result to be changed then the probability of observing 0
mismatches will depend on the sample size as follows

Sample size   Probability of failing to detect a mismatch when it might matter
        100   0.37
       1000   4.30e-5
      10000   1.35e-44
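
For anyone who wants to check figures like these without Excel, here is a minimal Python sketch. It is not the code from myfunctions.js; the function names are made up for illustration, and the sample sizes in the loop are just example inputs. It computes the chance that a sample drawn without replacement contains none of the mismatched ballots, and prints the with-replacement (binomial) value alongside so the difference at large sample fractions is visible.

```python
import math


def p_miss_without_replacement(total, bad, sample):
    """Hypergeometric P(0): chance a sample holds none of the bad ballots."""
    if sample > total - bad:
        return 0.0  # the sample is too large to avoid every bad ballot
    # C(total - bad, sample) / C(total, sample) is symmetric in bad and sample,
    # so loop over the smaller of the two and sum in log space.
    small, large = sorted((bad, sample))
    log_p = sum(math.log1p(-large / (total - i)) for i in range(small))
    return math.exp(log_p)


def p_miss_with_replacement(total, bad, sample):
    """Binomial P(0): the same chance if ballots were put back after each draw."""
    return math.exp(sample * math.log1p(-bad / total))


TOTAL = 1_000_000
cases = (
    (100, (100, 1_000, 10_000, 100_000, 200_000)),
    (10_000, (100, 1_000, 10_000)),
)
for bad, sizes in cases:
    for n in sizes:
        exact = p_miss_without_replacement(TOTAL, bad, n)
        approx = p_miss_with_replacement(TOTAL, bad, n)
        print(f"{bad:>6} {n:>7} {exact:.3g} {approx:.3g}")
```

The two columns stay close while the sample is a small share of the ballots and drift apart as the sample grows, which is the reason for preferring the hypergeometric calculation here.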


While the calculators at
http://members.aol.com/iandjmsmith/EXAMPLES.HTM are useful in their
own right, they are really supposed to be a demonstration of how to
use the functions in myfunctions.js so people can then use the
functions in their own software! In this case what we really want is
to choose the sample size required given a total number of ballots,
the critical number of mismatches and the probability that we do not
detect a mismatch. The calculations are fairly simple and you can
build your own calculator for this quickly and easily (see
http://members.aol.com/iandjmsmith/reqss.htm).


With this calculator, we can see that if we have a million ballot
papers and the probability of failure to detect a mismatch is 1e-6,
then the minimum sample size is related to the critical number of
mismatches as follows:-

Critical number of mismatches   Sample size required
                            1   999999
                           10   748808
                          100   129031
                         1000   13714
                        10000   1374
                       100000   132
                       200000   62
                       300000   39
                       500000   20
                       900000   6
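
A standalone version of this calculation can be sketched in a few lines of Python. This is not the code behind reqss.htm; the names are hypothetical and the approach (a binary search over the hypergeometric probability of seeing zero mismatches) is just one straightforward way to do it.

```python
import math


def p_miss(total, bad, sample):
    """Chance that a sample drawn without replacement holds no bad ballot."""
    if sample > total - bad:
        return 0.0
    small, large = sorted((bad, sample))
    return math.exp(sum(math.log1p(-large / (total - i)) for i in range(small)))


def required_sample_size(total, critical, target):
    """Smallest sample whose chance of missing all `critical` mismatched
    ballots is at most `target`. p_miss only falls as the sample grows,
    so a binary search over the sample size is enough."""
    lo, hi = 0, total  # sampling every ballot always detects a mismatch
    while lo < hi:
        mid = (lo + hi) // 2
        if p_miss(total, critical, mid) <= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


for critical in (10_000, 100_000, 900_000):
    print(critical, required_sample_size(1_000_000, critical, 1e-6))
```

When the probability lands exactly on the target (as with a single critical mismatch), floating-point rounding can shift the answer by one ballot, so treat boundary cases with care.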


Ian Smith
 
Andrew Dalke

At present there are 50 people who paid the non-refundable fee
for running for the California governorship. The article I just read
said the standard ballot used could handle up to 300 entries.

That brings to mind one of the things I don't like about the
touchscreen solution -- I like to see my whole ballot at once.
In this case, could you even fit 50 names for this one category
on the screen at once and be readable enough?

Then again, I would also like a desk sized display with
200+ dpi. Just gotta wait 10 years, right?

Andrew
(e-mail address removed)
 
