Sean Davis
I have a large file that I would like to transform and then feed to a
function (psycopg2 copy_from) that expects a file-like object (needs
read and readline methods).
I have a class like so:
import gzip
import re

class GeneInfo():
    def __init__(self):
        #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
        self.fh = gzip.open("/tmp/gene_info.gz")
        self.fh.readline() #deal with header line

    def _read(self,n=1):
        for line in self.fh:
            if line=='':
                break
            line=line.strip()
            line=re.sub("\t-","\t",line)
            rowvals = line.split("\t")
            # keep only the columns of interest
            yield "\t".join([rowvals[i] for i in
                             [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"

    def readline(self,n=1):
        return self._read().next()

    def read(self,n=1):
        return self._read().next()

    def close(self):
        self.fh.close()
and I use it like so:
a=GeneInfo()
cur.copy_from(a,"gene_info")
a.close()
It works well except that the end of file is not caught by copy_from.
I get errors like:
psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error during .read() call
CONTEXT: COPY gene_info, line 1000: ""
for a 1000-line test file. Any ideas what is going on?
Thanks,
Sean
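
A likely cause, judging only from the code above: once the input is exhausted, the generator returned by _read() finishes, so .next() raises StopIteration instead of returning an empty string, which is how file-like objects conventionally signal end of file. The following is a minimal sketch of one way around that, not a tested fix against psycopg2. It is written in Python 3 syntax, and the class name TransformedLines and its columns parameter are hypothetical names introduced for illustration. It creates the generator once, returns "" at end of input, and honours the size argument in read and readline.

```python
import io
import re


class TransformedLines:
    """File-like wrapper that transforms lines and returns "" at end of input."""

    def __init__(self, fh, columns):
        # Create the generator once so that its state survives between calls.
        self._lines = self._transform(fh, columns)
        self._buf = ""  # text already produced but not yet handed out

    @staticmethod
    def _transform(fh, columns):
        for line in fh:
            line = line.strip()
            line = re.sub("\t-", "\t", line)
            rowvals = line.split("\t")
            yield "\t".join(rowvals[i] for i in columns) + "\n"

    def readline(self, size=-1):
        if not self._buf:
            # The default "" is returned once the generator is exhausted.
            self._buf = next(self._lines, "")
        if size is None or size < 0:
            size = len(self._buf)
        line, self._buf = self._buf[:size], self._buf[size:]
        return line

    def read(self, size=-1):
        chunks = [self._buf]
        total = len(self._buf)
        # Pull lines until enough text is buffered or the input runs out.
        while size is None or size < 0 or total < size:
            line = next(self._lines, None)
            if line is None:
                break
            chunks.append(line)
            total += len(line)
        data = "".join(chunks)
        if size is None or size < 0:
            size = len(data)
        self._buf = data[size:]
        return data[:size]


if __name__ == "__main__":
    # In-memory stand-in for the real file, with "-" marking an empty field.
    demo = TransformedLines(io.StringIO("a\t-\tc\nd\te\tf\n"), [0, 1, 2])
    print(repr(demo.readline()))
    print(repr(demo.readline()))
    print(repr(demo.readline()))  # end of input
```

With the real data, the wrapped handle would presumably be opened in text mode, for example gzip.open("/tmp/gene_info.gz", "rt"), with the header line consumed before wrapping, and the column list would be the one from the class above. Whether copy_from wants text or bytes depends on the psycopg2 and Python versions in use, so treat that part as an assumption to check.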