Jeroen Hegeman
Dear Pythoneers,
I'm moderately new to Python and I'm already completely lost.
I've got a bunch of large (30MB) txt files containing one 'event' per
line. I open the files one after another, read them line by line, and
from each line build a 'data structure': an instance of a main class
(HugeClass) containing some simple information as well as several
instances of some other classes.
No problem so far, but I noticed that the first file was always
processed faster than the others, whereas I would expect it to be
slower, if anything. Testing with two copies of the same file shows the
same behaviour.
Below is a (rather large, I'll explain) chunk of code. I ran this in
a directory with two test files called 'test_file0.txt' and
'test_file1.txt', each containing 10k lines of the same information
as the 'long_line' variable in the code. This shows the following
timing (consistently) for the little piece of code that reads all
lines from file:
....processing all 2 files found
--> 1/2: ./test_file0.txt
Now reading ...
DEBUG readLines A took 0.093 s
....took 8.85717201233 seconds
--> 2/2: ./test_file0.txt
Now reading ...
DEBUG readLines A took 3.917 s
....took 12.8725550175 seconds
So the first time around the file gets read in ~0.1 seconds, but the
second time around it needs almost four seconds! As far as I can see
this is related to 'something in memory being copied around': if I
replace 'alternative 1' with 'alternative 2' (basically making sure
that my classes are not used), the reading time for the second pass
drops back to normal (= roughly what it is for the first pass).
Apologies in advance for the size of the code chunk below. I know
about 'minimal reproducible examples' and such, but I found that if I
comment out the filling (and thus binding) of some of the member
variables in the lower-level classes, the problem (sometimes) also
disappears. Does that also point to some magic happening in memory?
I probably mucked something up, but I'm really lost as to where. Any
help would be appreciated.
The original problem showed up using Python 2.4.3 under Linux (Fedora
Core 1).
Python 2.3.5 on OS X 10.4.10 (PPC) appears not to show this issue(?).
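For anyone who wants to reproduce this, the two test files can be
generated with something like the sketch below. The helper name
write_test_file and the short sample_line are made up for illustration;
for the real test, pass the full long_line from the script instead. It
avoids print and 'with' on purpose, so it should run on old and new
Pythons alike.

```python
def write_test_file(path, line, n_lines=10000):
    """Write n_lines newline-terminated copies of 'line' to 'path'.

    Hypothetical helper, not part of the script in this post.
    """
    out = open(path, "w")
    try:
        for _ in range(n_lines):
            out.write(line + "\n")
    finally:
        # Close the file even if a write fails.
        out.close()


# Stand-in record for illustration only; the real test files hold
# copies of the much longer 'long_line' from the script.
sample_line = "1,31905,0,174501"
```

Calling write_test_file("test_file0.txt", long_line) and then the same
for "test_file1.txt" should give two files of 10k identical lines each.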
Thanks,
Jeroen
P.S. Any ideas on optimising the input to the classes would be
welcome too ;-)
Jeroen Hegeman
jeroen DOT hegeman AT gmail DOT com
===================Start of code chunk=========================
#!/usr/bin/env python
import time
import sys
import os
import gzip
import pdb
long_line = (
    "1,31905,0,174501,46152419,2117961,143,-1.0000,51,2,-19.9139,42,-19.9140,"
    "6.6002,0,0,0,46713.1484,2,0.0000,-1,1.4203220606,0.3876158297,147.121017"
    "4561,147.1284120973,-2,0.0000,-1,1.5887237787,-2.4011900425,-319.7776794"
    "434,319.7906836817,4,21,0.0000,-1,-0.5672637224,2.2052443027,-43.2842369"
    "080,43.3440905719,21,0.0000,-1,-0.8540721536,0.0770076364,-22.7033920288,"
    "22.7195827425,21,0.0000,-1,0.1623233557,0.5845987201,-28.0794525146,28.0"
    "860084170,21,0.0000,-1,0.1943928897,-0.2195242196,-22.0666370392,22.0685"
    "899391,6,0.0000,-1,-40.1810989380,-127.0743789673,-104.9231948853,239.74"
    "36794163,-6,0.0000,-1,43.2013626099,125.0640945435,-67.7339172363,227.17"
    "53587387,24,0.0000,-1,-57.9123306274,-17.3483123779,-71.8334121704,123.4"
    "397648033,-24,0.0000,-1,84.0985488892,54.4542312622,-62.4525032043,144.5"
    "299239704,5,0.0000,-1,17.7312316895,-109.7260665894,-33.0897827148,116.3"
    "039146130,-5,0.0000,-1,-40.8971862793,70.6098632812,-5.2814140320,82.645"
    "4347683,4,0.0000,-1,-6.2859884724,-17.9586020410,-58.9464384913,69.40294"
    "68585,-3,0.0000,-1,-51.6263811588,0.6104701459,-12.8869901896,54.0368221"
    "571,3,0.0000,-1,16.4690684490,48.0271777511,-51.7867884636,74.5327484701"
    ",-4,0.0000,-1,67.6295298338,6.4269350171,-10.6658525467,69.9971834876,7,"
    "7,1.0345464706e+01,-7.0800781250e+01,-2.0385742187e+01,7.5256346272e"
    "+01,1.3148,0.0072,0.0072,1.3148,0.0072,0.0072,1.0255,1.0413,0.0,0.0,0.0,"
    "0.0,-1.0,-4.2383,49.5276,13,0.1537,0.5156,0,0.9982,0.0034,1.0000,7,1,0.9"
    "566,0.0062,1,0,2,1.2736,1,7.8407,1,0,2,1.2736,1,7.8407,0,0,-1.0,-1.0,5,1"
    ",-2.4047853470e+01,4.0832519531e+01,-3.8452150822e+00,4.7851562559e"
    "+01,1.3383,0.0051,0.0051,1.3383,0.0051,0.0051,0.9340,0.9541,0.0,0.0,0.0,"
    "0.0,-1.0,-2.4609,21.3916,7,0.1166,0.5977,0,0.9999,0.0052,1.0000,9,1,0.99"
    "47,0.0063,1,0,2,0.7735,1,74.7937,1,0,2,0.7735,1,74.7937,0,0,-1.0,-1.0,5,"
    "1,-4.4067382812e+01,2.5634796619e+00,-1.1138916016e+01,4.6203614579e"
    "+01,1.3533,0.0054,0.0054,1.3533,0.0054,0.0054,1.0486,1.0903,0.0,0.0,0.0,"
    "0.0,-1.0,-3.9648,31.3733,13,0.1767,0.5508,100,0.9977,0.0040,1.0000,9,1,0."
    "0000,0.4349,0,0,0,0.0000,0,-1000.0000,0,0,0,0.0000,0,-1000.0000,0,0,-1.0"
    ",-1.0,0,1,3.7200927734e+01,2.7465817928e+00,-5.5847163200e"
    "+00,3.7994386563e"
    "+01,1.3634,0.0062,0.0062,1.6488,0.0385,0.0385,0.7141,0.9013,5.3986899118"
    "e+00,6.6766492833e-01,-2.3780213181e-01,5.4460399892e"
    "+00,0.5504,-3.1445,0.7776,9,0.1169,0.7734,0,0.9977,0.0040,1.0000,7,1,0.0"
    "000,0.1099,0,0,0,0.0000,0,-1000.0000,0,0,0,0.0000,0,-1000.0000,1,-1,5.38"
    "93,0.5459,4,1,1.2969970703e+01,3.3203125000e+01,-3.7231445312e"
    "+01,5.2001951876e"
    "+01,1.4414,0.0129,0.0129,1.4414,0.0129,0.0129,0.9019,0.7331,0.0,0.0,0.0,"
    "0.0,-1.0,-10.0195,12.2034,17,0.1922,0.3633,0,0.9774,0.0248,1.0000,6,1,0."
    "0000,0.3523,0,0,0,0.0000,0,-1000.0000,0,0,0,0.0000,0,-1000.0000,0,0,-1.0"
    ",-1.0,0,1,-1.6174327135e+00,-7.1411132812e+00,-1.8798828125e"
    "+01,2.0202637222e"
    "+01,1.7886,0.0352,0.0352,1.7886,0.0352,0.0352,1.8257,1.2368,0.0,0.0,0.0,"
    "0.0,-1.0,-17.3438,45.6714,10,0.1529,0.5625,0,0.9898,0.0094,1.0000,3,1,-1."
    "0000,10000.0000,0,0,0,-1.0000,0,-1.0000,0,0,0,-1.0000,0,-1.0000,0,0,-1.0"
    ",-1.0,-6,0,-5.9204106331e+00,-3.4484868050e+00,-6.5307617187e"
    "+00,9.6740722971e"
    "+00,1.6782,0.0326,0.0326,1.6782,0.0326,0.0326,1.0000,1.0000,0.0,0.0,0.0,"
    "0.0,-1.0,-9.4727,37.3401,13,0.2711,0.2344,100,0.9861,0.0045,1.0000,3,1,-"
    "1.0000,10000.0000,0,0,0,-1.0000,0,-1.0000,0,0,0,-1.0000,0,-1.0000,0,0,-1"
    ".0,-1.0,-6,0"
)
###########################################################################
class SmallClass:
    def __init__(self):
        return

    def input(self, line, c):
        self.item0 = int(line[c]); c += 1
        self.item1 = float(line[c]); c += 1
        self.item2 = int(line[c]); c += 1
        self.item3 = float(line[c]); c += 1
        self.item4 = float(line[c]); c += 1
        self.item5 = float(line[c]); c += 1
        self.item6 = float(line[c]); c += 1
        return c
###########################################################################
class ModerateClass:
    def __init__(self):
        return

    def __del__(self):
        pass
        return

    def input(self, line, c):
        self.items = {}
        self.item0 = float(line[c])
        c += 1
        unit1 = SmallClass()
        c = unit1.input(line, c)
        self.items[len(self.items)] = unit1
        unit2 = SmallClass()
        c = unit2.input(line, c)
        self.items[len(self.items)] = unit2
        units_chunk = []
        chunk_size = int(line[c])
        c += 1
        for i in xrange(chunk_size):
            unit = SmallClass()
            c = unit.input(line, c)
            units_chunk.append(unit)
        for i in xrange(10):
            unit = SmallClass()
            c = unit.input(line, c)
        return c
###########################################################################
class LongClass:
    def __init__(self):
        return

    def clear(self):
        return

    def input(self, foo, c):
        self.item0 = float(foo[c]); c += 1
        self.item1 = float(foo[c]); c += 1
        self.item2 = float(foo[c]); c += 1
        self.item3 = float(foo[c]); c += 1
        self.item4 = float(foo[c]); c += 1
        self.item5 = float(foo[c]); c += 1
        self.item6 = float(foo[c]); c += 1
        self.item7 = float(foo[c]); c += 1
        self.item8 = float(foo[c]); c += 1
        self.item9 = float(foo[c]); c += 1
        self.item10 = float(foo[c]); c += 1
        self.item11 = float(foo[c]); c += 1
        self.item12 = float(foo[c]); c += 1
        self.item13 = float(foo[c]); c += 1
        self.item14 = float(foo[c]); c += 1
        self.item15 = float(foo[c]); c += 1
        self.item16 = float(foo[c]); c += 1
        self.item17 = float(foo[c]); c += 1
        self.item18 = float(foo[c]); c += 1
        self.item19 = int(foo[c]); c += 1
        self.item20 = float(foo[c]); c += 1
        self.item21 = float(foo[c]); c += 1
        self.item22 = int(foo[c]); c += 1
        self.item23 = float(foo[c]); c += 1
        self.item24 = float(foo[c]); c += 1
        self.item25 = float(foo[c]); c += 1
        self.item26 = int(foo[c]); c += 1
        self.item27 = bool(int(foo[c])); c += 1
        self.item28 = float(foo[c]); c += 1
        self.item29 = float(foo[c]); c += 1
        self.item30 = (foo[c] == "1"); c += 1
        self.item31 = (foo[c] == "1"); c += 1
        self.item32 = float(foo[c]); c += 1
        self.item33 = float(foo[c]); c += 1
        self.item34 = int(foo[c]); c += 1
        self.item35 = float(foo[c]); c += 1
        self.item36 = (foo[c] == "1"); c += 1
        self.item37 = (foo[c] == "1"); c += 1
        self.item38 = float(foo[c]); c += 1
        self.item39 = float(foo[c]); c += 1
        self.item40 = int(foo[c]); c += 1
        self.item41 = float(foo[c]); c += 1
        self.item42 = (foo[c] == "1"); c += 1
        self.item43 = float(foo[c]); c += 1
        self.item44 = float(foo[c]); c += 1
        self.item45 = float(foo[c]); c += 1
        self.item46 = int(foo[c]); c += 1
        self.item47 = bool(int(foo[c])); c += 1
        return c
###########################################################################
class HugeClass:
    def __init__(self, line):
        self.clear()
        self.input(line)
        return

    def __del__(self):
        del self.B4v
        return

    def clear(self):
        self.long_classes = {}
        self.B4v = {}
        return

    def input(self, line):
        try:
            foo = line.strip().split(',')
            c = 0
            self.asciiVersion = float(foo[c])
            c += 1
            self.item0 = foo[c]; c += 1
            self.item1 = (self.item0 != "0")
            self.item2 = (foo[c] == "1"); c += 1
            self.item3 = int(foo[c]); c += 1
            self.item4 = int(foo[c]); c += 1
            self.item5 = int(foo[c]); c += 1
            self.item6 = int(foo[c]); c += 1
            self.item7 = float(foo[c]); c += 1
            self.item8 = foo[c]; c += 1
            bit_item = int(self.item8)
            self.item9 = bool(bit_item & 2048)
            self.item10 = bool(bit_item & 1024)
            self.item11 = bool(bit_item & 512)
            self.item12 = bool(bit_item & 256)
            self.item13 = bool(bit_item & 128)
            self.item14 = bool(bit_item & 64)
            self.item15 = bool(bit_item & 32)
            self.item16 = bool(bit_item & 16)
            self.item17 = bool(bit_item & 8)
            self.item18 = bool(bit_item & 4)
            self.item19 = bool(bit_item & 2)
            self.item20 = bool(bit_item & 1)
            self.item21 = int(foo[c]); c += 1
            self.item22 = float(foo[c]); c += 1
            self.item23 = int(foo[c]); c += 1
            self.item24 = float(foo[c]); c += 1
            self.item25 = float(foo[c]); c += 1
            self.item26 = foo[c]; c += 1
            self.item27 = int(foo[c]); c += 1
            self.item28 = int(foo[c]); c += 1
            self.item29 = ModerateClass()
            c = self.item29.input(foo, c)
            self.item30 = int(foo[c]); c += 1
            self.item31 = int(foo[c]); c += 1
            for i in xrange(self.item31):
                unit = LongClass()
                c = unit.input(foo, c)
                self.long_classes[len(self.long_classes)] = unit
            assert(c == len(foo)), "ERROR We did not read the whole line!!!"
        except (ValueError,IndexError), msg:
            print >> sys.stderr, \
                  "ERROR Trouble reading line: `%(msg)s'" % vars()
            self.clear()
            return
        return
###########################################################################
def readLines(f):
    DATA = []
    f.seek(0)
    time_a = time.time()
    for i in f:
        DATA.append(i)
    time_b = time.time()
    time_spent_reading = time_b - time_a
    print "DEBUG readLines took %.3f s" % time_spent_reading
    return DATA
###########################################################################
def ReadClasses(filename):
    print 'Now reading ...'
    built_classes = {}
    # Read lines from file
    in_file = open(filename, 'r')
    LINES = readLines(in_file)
    in_file.close()
    # and interpret them.
    for i in LINES:
        ## This is alternative 1.
        built_classes[len(built_classes)] = HugeClass(long_line)
        ## The next line is alternative 2.
        ## built_classes[len(built_classes)] = long_line
    del LINES
    return
###########################################################################
def ProcessList():
    input_files = ["./test_file0.txt",
                   "./test_file0.txt"]
    # Loop over all files that we found.
    nfiles = len(input_files)
    file_index = 0
    for i in input_files:
        print "--> %i/%i: %s" % (file_index+1, nfiles, i)
        ReadClasses(i)
        file_index += 1
    return
###########################################################################
if __name__ == "__main__":
    ProcessList()
    sys.exit(0)
###########################################################################