urllib2 script slowing and stopping

Dantium · Oct 11, 2010

I have a small script that reads several CSV files in a directory and
puts the data in a DB using Django.

There are about 1.7 million records in 120 CSV files, I am running the
script on a VPS with about 512mb of memory python 2.6.5 on ubuntu
10.04.

The script gets slow and seems to lock after about 870000 records.
running top show that the memory is all being used up y the python
process, is there someway I can improve on this script?

class Command(BaseCommand):

def handle(self, *args, **options):
count = 0
d = urllib2.urlopen(postcode_dir).read()
postcodefiles = re.findall('<a href="(.*?\.csv)">', d)
nprog = 0

for n in range(nprog, len(postcodefiles)):
fl = postcodefiles[n]
print 'Processing %d %s ...' % (n, fl)
s = urllib2.urlopen(postcode_dir + fl)
c = csv.reader(s.readlines())
for row in c:
postcode = row[0]
location = Point(map(float, row[10:12]))
Postcode.objects.create(code=postcode,
location=location)
count += 1
if count % 10000 == 0:
print "Imported %d" % count
s.close()
nprog = n+1

Thanks

-Dan

Ian · Oct 11, 2010

I have a small script that reads several CSV files in a directory and
puts the data in a DB using Django.

There are about 1.7 million records in 120 CSV files, I am running the
script on a VPS with about 512mb of memory python 2.6.5 on ubuntu
10.04.

The script gets slow and seems to lock after about 870000 records.
running top show that the memory is all being used up y the python
process, is there someway I can improve on this script?

Probably you have "DEBUG = True" in your Django settings.py file. In
debug mode, Django records every query that is executed in
django.db.connection.queries. To fix it, either disable debug mode or
just have your script go in and clear out that list from time to time.

HTH,
Ian

Dantium · Oct 11, 2010

Probably you have "DEBUG = True" in your Django settings.py file. In
debug mode, Django records every query that is executed in
django.db.connection.queries. To fix it, either disable debug mode or
just have your script go in and clear out that list from time to time.

HTH,
Ian

Yeah thanks that helped!

It was still running really low on memory by the end though but they
all got added.

Rapidshare to Megaupload script	4	Feb 14, 2009
python/xpath issue..	0	Aug 25, 2008
my first screen scraper	0	Dec 2, 2007
CSV dB script help	9	Jun 2, 2004
script hangs when run from command line and redirecting stdout and stderr to file	2	Jan 5, 2006
ques and and level order traversal	6	Dec 2, 2006
server-side JavaScript: Prototypes of built-in classes, objects and functins	0	Jun 28, 2008
fastcgi performance problems and ruby	14	Nov 2, 2004

urllib2 script slowing and stopping

Dantium

Ian

Dantium

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads