Looking for crossfold validation code

Mark Livingstone

Hello,

I am researching the use of data compression for classification as
part of a university research scholarship. I am looking for Python
code to handle the cross-validation side of things: something that
takes my testing/training corpus and creates the testing and training
files after asking me for the number of folds and the number of
repetitions (or perhaps lets me enter a random seed or offset instead
of a repetition count). I could then either hook my classifier into
the program or run it as a separate step.

Probably not very hard to write, but why reinvent the wheel ;-)

Thanks in advance,

MarkL
 
Sandy

Here is the code I use. I got it from the web, but have lost the link.

from random import shuffle

def k_fold_cross_validation(X, K, randomise=False):
    """
    Generates K (training, validation) pairs from the items in X.

    Each pair is a partition of X, where validation is a list of
    about len(X)/K items (exactly that when K divides len(X)), so
    each training list holds about (K-1)*len(X)/K items.

    If randomise is true, a copy of X is shuffled before partitioning,
    otherwise its order is preserved in training and validation.
    """
    X = list(X)  # copy, so the caller's data is untouched and X can be re-read
    if randomise:
        shuffle(X)
    for k in range(K):  # range replaces Python 2's xrange
        # Item i goes to the validation set of fold (i % K).
        training = [x for i, x in enumerate(X) if i % K != k]
        validation = [x for i, x in enumerate(X) if i % K == k]
        yield training, validation
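
The original question also asked for a number of repetitions and a random seed. A minimal sketch of that, assuming the same modulo-based split as the function above: the name repeated_k_fold and its repeats and seed parameters are hypothetical, not part of the code found on the web, and the corpus is a made-up list of document names.

```python
import random

def repeated_k_fold(items, k, repeats=1, seed=None):
    """Yield (repeat, fold, training, validation) for every fold of every repeat.

    Hypothetical helper: each repeat reshuffles a fresh copy of items with a
    private generator, so the same seed always gives the same splits.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    for repeat in range(repeats):
        shuffled = list(items)  # copy, so the caller's corpus keeps its order
        rng.shuffle(shuffled)
        for fold in range(k):
            # Item i of the shuffled copy is validated in fold (i % k).
            training = [x for i, x in enumerate(shuffled) if i % k != fold]
            validation = [x for i, x in enumerate(shuffled) if i % k == fold]
            yield repeat, fold, training, validation

# Usage: 5 folds, repeated twice, on a made-up corpus of 10 document names.
corpus = ["doc%d" % i for i in range(10)]
for repeat, fold, training, validation in repeated_k_fold(corpus, k=5, repeats=2, seed=42):
    print(repeat, fold, len(training), len(validation))
```

With 10 items and 5 folds, every split has 8 training items and 2 validation items. Each (training, validation) pair can then be written to files or passed straight to a classifier.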


Cheers,
dksr
 
