chi squared (X2) in Python

ts8807385 · Feb 16, 2009

I was wondering if anyone has done this in Python. I wrote two
functions that do it (I think... see below), but I do not understand
how to interpret the results. As an experiment, I'm implementing ent
in Python. ent tests the randomness of files, and chi squared is
probably the best of its tests for this purpose. Many of the
statistical tests are easy (like Arithmetic Mean, etc.) and I have no
problems interpreting the results from those, but chi squared has
stumped me. Here are my two simple functions; run them if you like to
better understand the output:

import os
import os.path

def observed(f):

    # argument f is a filepath/filename
    #
    # Return a list of observed characters in decimal ord(char).
    # Decimal value of characters may be 0 through 255.
    # [43, 54, 0, 255, 4, etc.]

    chars = []
    #print f

    fd = open(f, 'rb')
    bytes = fd.read(13312)
    fd.close()

    for byte in bytes:
        chars.append(ord(byte))

    #print chars

    if len(chars) != 13312:
        print "Wait... chars does not equal 13312 in observed!!!"
        return None
    else:
        return chars

def chi(char_list):

    # Expected frequency of characters. I arrived at this like so:
    # expected = number of observations/number of possibilities
    # 52 = 13312/256

    expected = 52.0

    print "observed\texpected\tx2"

    # 0 - 255
    for x in range(0,256):
        observed = 0
        for char in char_list:
            if x == char:
                observed +=1

        # The three chi squared calculations
        # one = observed - expected
        # two = one squared
        # x2 = two/expected

        # x2 = (observed - expected) squared
        #      ----------------------------
        #               expected

        one = observed - expected
        two = one * one
        x2 = two/expected

        print observed, "\t", expected, "\t", x2

chi(observed("filepath"))

The output looks similar to this:

observed    expected    x2
62          52.0        1.92307692308
46          52.0        0.692307692308
60          52.0        1.23076923077
68          52.0        4.92307692308
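
As a cross-check on figures like these, here is a minimal sketch that tallies all 256 byte values in a single pass with collections.Counter, rather than rescanning the list once per value. It assumes Python 3 syntax (unlike the Python 2 code in the post), takes any bytes object instead of a file path, and the name chi_terms is made up for illustration.

```python
from collections import Counter

def chi_terms(data):
    """Return 256 (observed, expected, term) tuples for a bytes object."""
    if not data:
        raise ValueError("need at least one byte")
    # expected = number of observations / number of possibilities
    expected = len(data) / 256.0
    # Iterating over bytes in Python 3 yields ints in 0..255.
    counts = Counter(data)
    terms = []
    for value in range(256):
        observed = counts.get(value, 0)
        term = (observed - expected) ** 2 / expected
        terms.append((observed, expected, term))
    return terms

# Synthetic 13312-byte sample: value 0 occurs 62 times, value 1 fills the rest.
sample = bytes([0]) * 62 + bytes([1]) * (13312 - 62)
print("observed\texpected\tx2")
for obs, exp, term in chi_terms(sample)[:3]:
    print(obs, exp, term, sep="\t")
```

Each term only says how far one byte value's count strays from the expected count, scaled by that expected count, so a single row carries little meaning on its own.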

I know this is a bit off-topic here; I'm just hoping someone can help
me interpret the x2 variable. After that, I'll be OK. I still need to
sum the per-byte values to get an overall x2 for the bytes I've read,
but before doing that, I wanted to post this note. Please feel free to
comment on any aspect of this. If I've got something entirely wrong,
let me know. BTW, I selected 13 KB (13,312 bytes) because it seems
efficient and is a decent size to test; the data could be any larger
amount, up to and including the whole file.
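
For the summing step, a rough sketch in Python 3 syntax: the overall statistic is the sum of the 256 per-value terms, and with 256 categories it has 255 degrees of freedom. For uniformly random bytes the sum should land somewhere near 255 (the mean of that distribution; its standard deviation is sqrt(2 * 255), about 22.6). A sum far above that suggests the byte values are not uniform, and a sum far below it means the counts are suspiciously even. The tail probability below uses the Wilson-Hilferty normal approximation, not an exact chi-squared CDF, and both function names are hypothetical.

```python
import math
import os
from collections import Counter

def chi_squared(data):
    """Sum of (observed - expected)^2 / expected over all 256 byte values."""
    if not data:
        raise ValueError("need at least one byte")
    expected = len(data) / 256.0
    counts = Counter(data)  # iterating bytes yields ints 0..255 in Python 3
    return sum((counts.get(v, 0) - expected) ** 2 / expected
               for v in range(256))

def approx_upper_tail(x2, dof=255):
    """Approximate P(chi-squared >= x2) with the Wilson-Hilferty transform."""
    if x2 <= 0:
        return 1.0
    spread = 2.0 / (9.0 * dof)
    # Cube-root transform makes the statistic roughly standard normal.
    z = ((x2 / dof) ** (1.0 / 3.0) - (1.0 - spread)) / math.sqrt(spread)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Assumption: os.urandom stands in for "a file that should look random".
data = os.urandom(13312)
x2 = chi_squared(data)
print("x2 =", x2, "approx. upper-tail probability =", approx_upper_tail(x2))
```

A tail probability very close to 0 or very close to 1 is the signal to look at; anything in between is consistent with random data for this one test. For an exact probability, a statistics library's chi-squared survival function would replace approx_upper_tail.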

Thanks,

Tiff
