Dantium
I have a small script that reads several CSV files from a directory and
puts the data into a DB using Django.
There are about 1.7 million records across 120 CSV files. I am running the
script on a VPS with about 512 MB of memory, Python 2.6.5 on Ubuntu
10.04.
The script gets slow and seems to lock up after about 870,000 records.
Running top shows that the memory is all being used up by the Python
process. Is there some way I can improve this script?
import csv
import re
import urllib2

from django.contrib.gis.geos import Point
from django.core.management.base import BaseCommand

# postcode_dir (the URL of the CSV directory) and the Postcode model
# are defined elsewhere in the project.


class Command(BaseCommand):
    def handle(self, *args, **options):
        count = 0
        # Fetch the directory listing and pull out the CSV file names.
        d = urllib2.urlopen(postcode_dir).read()
        postcodefiles = re.findall('<a href="(.*?\.csv)">', d)
        nprog = 0
        for n in range(nprog, len(postcodefiles)):
            fl = postcodefiles[n]
            print 'Processing %d %s ...' % (n, fl)
            s = urllib2.urlopen(postcode_dir + fl)
            c = csv.reader(s.readlines())
            for row in c:
                postcode = row[0]
                location = Point(map(float, row[10:12]))
                Postcode.objects.create(code=postcode,
                                        location=location)
                count += 1
                if count % 10000 == 0:
                    print "Imported %d" % count
            s.close()
            nprog = n + 1
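
One thing I've been wondering about is whether the memory is going into
Django's debug query logging (if DEBUG is still True in settings, every
query gets appended to connection.queries) and into reading each whole
file at once with readlines(). Below is a rough, untested sketch of what
I had in mind for the per-file loop; import_one_file is just a helper
name I made up, and Postcode / postcode_dir are the same names as above:

import csv
import urllib2

from django import db
from django.contrib.gis.geos import Point

# Postcode and postcode_dir are the same project-level names used above.

def import_one_file(url):
    """Stream one CSV from `url` into Postcode rows, keeping memory flat."""
    s = urllib2.urlopen(url)
    count = 0
    # Feed the response object straight to csv.reader so the file is
    # read line by line instead of being loaded whole with readlines().
    for row in csv.reader(s):
        Postcode.objects.create(code=row[0],
                                location=Point(map(float, row[10:12])))
        count += 1
        if count % 10000 == 0:
            # With DEBUG = True, Django appends every executed query to
            # connection.queries; clearing it periodically stops that
            # list from growing for the life of the script.
            db.reset_queries()
            print "Imported %d" % count
    s.close()
    return count

Does clearing connection.queries like that sound right, or is there a
better approach?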
Thanks
-Dan