remove header line when reading/writing files


RyanL

I'm a newbie with a large number of data files in multiple
directories. I want to uncompress, read, and copy the contents of
each file into one master data file. The code below seems to be doing
this perfectly. The problem is each of the data files has a header
row in the first line, which I do not want in the master file. How
can I skip that first line when writing to the master file? Any help
is much appreciated. Thank you.

import os
import sys
import glob
import gzip
zipdir = "G:/Research/Data/"
outfilename = "G:/Research/Data/master_data.txt"
outfile = open(outfilename,'w')
os.chdir(zipdir)
dirlist = os.listdir(os.curdir)
for item in dirlist:
    if os.path.isdir(item):
        os.chdir(item)
        filelist = glob.glob("*.gz")
        for zipfile in filelist:
            filein = gzip.GzipFile(zipfile,'r')
            filecontent = filein.read()
            filein.close()
            outfile.write(filecontent)
        os.chdir(os.pardir)
outfile.close()
 

Tim Chase


for zipfile in filelist:
    for i, line in gzip.GzipFile(zipfile,'r'):
        if i: outfile.write(line)

should do the trick for you.

If you like a little more readable code, you can change that line to

if i != 0: outfile.write(line)

or

if i == 0: continue
outfile.write(line)

whichever you like.

-tkc
 

Tim Chase

Forgot the enumerate call, of all things:

for zipfile in filelist:
    for i, line in enumerate(gzip.GzipFile(zipfile,'r')):
        if i: outfile.write(line)
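
The enumerate idea can be tried without any gzip files at hand. Below is a minimal sketch for Python 3 that uses an in-memory list of lines as a stand-in for the open file; the helper name write_without_header and the sample rows are made up for illustration and are not part of the original script.

```python
import io


def write_without_header(lines, out):
    """Write every line except the first (index 0) to out."""
    for i, line in enumerate(lines):
        if i == 0:
            continue  # index 0 is the header row, so skip it
        out.write(line)


# Hypothetical sample data standing in for one decompressed file.
sample = ["id,value\n", "1,10\n", "2,20\n"]
buf = io.StringIO()
write_without_header(sample, buf)
result = buf.getvalue()
```

The same function should work on any iterable of lines, including an open file object, since enumerate only needs something it can loop over.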


Some days, I'm braindead.

-tkc
 

timaranz


I would move the 'if' test outside the loop:

for zipfile in filelist:
    zfiter = iter(gzip.GzipFile(zipfile,'r'))
    next(zfiter) # ignore header line
    for i, line in enumerate(zfiter):
        outfile.write(line)

I'm not sure if the iter(...) is required. This will raise a
StopIteration exception if zipfile is empty.
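
On the two open points: calling iter() on an object that is already an iterator simply returns that object, so the call is harmless either way, and the StopIteration on an empty file can be avoided by giving next() a default value. A small sketch of that variant, with a hypothetical helper name skip_header and plain lists standing in for the gzip file:

```python
def skip_header(lines):
    """Return an iterator over lines with the first item dropped.

    next(it, None) returns the default (None) instead of raising
    StopIteration when the input has no lines at all.
    """
    it = iter(lines)
    next(it, None)  # discard the header if there is one
    return it


# An empty input no longer raises; it just yields nothing.
empty_result = list(skip_header([]))
data_result = list(skip_header(["header\n", "a\n", "b\n"]))
```

Whether silently skipping an empty file is the right behaviour depends on the data; if an empty file signals a problem, letting the exception propagate may be preferable.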

Cheers
Tim
 

Scott David Daniels

...
for zipfile in filelist:
    zfiter = iter(gzip.GzipFile(zipfile,'r'))
    next(zfiter) # ignore header line
    for i, line in enumerate(zfiter):
        outfile.write(line)

Or even:

writes = outfile.write
for zipfile in filelist:
    zfiter = iter(gzip.GzipFile(zipfile,'r'))
    next(zfiter) # ignore header line
    for line in zfiter:
        writes(line)
 

Marc 'BlackJack' Rintsch


Untested version with `itertools.islice()`:

import glob
import gzip
import os
from itertools import islice


def main():
    zipdir = 'G:/Research/Data/'
    outfilename = 'G:/Research/Data/master_data.txt'
    out_file = open(outfilename, 'w')
    os.chdir(zipdir)  # as in the original script; os.curdir below is relative to this
    for name in os.listdir(os.curdir):
        if os.path.isdir(name):
            os.chdir(name)
            for zip_name in glob.glob('*.gz'):
                in_file = gzip.GzipFile(zip_name, 'r')
                # islice(in_file, 1, None) skips the first (header) line
                out_file.writelines(islice(in_file, 1, None))
                in_file.close()
            os.chdir(os.pardir)
    out_file.close()


if __name__ == '__main__':
    main()

Ciao,
Marc 'BlackJack' Rintsch
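
For readers on Python 3, here is a hedged sketch of the same islice approach. It assumes the files are text, opens them with gzip.open in "rt" mode so that str lines go into the text-mode output file, and builds paths with os.path.join instead of changing directories. The function name merge_without_headers and the sorted() ordering are choices made for this example, not something taken from the thread.

```python
import glob
import gzip
import os
from itertools import islice


def merge_without_headers(root_dir, out_path):
    """Copy all lines except the first from each *.gz file found in the
    immediate subdirectories of root_dir into out_path."""
    with open(out_path, "w") as out_file:
        # sorted() only makes the output order predictable (an assumption
        # of this sketch; the original script used directory order).
        for name in sorted(os.listdir(root_dir)):
            subdir = os.path.join(root_dir, name)
            if not os.path.isdir(subdir):
                continue
            for gz_path in sorted(glob.glob(os.path.join(subdir, "*.gz"))):
                # "rt" decompresses and decodes to str, one line at a time
                with gzip.open(gz_path, "rt") as in_file:
                    # islice(..., 1, None) drops the header line
                    out_file.writelines(islice(in_file, 1, None))
```

With the paths from the original post, a call would look like merge_without_headers("G:/Research/Data/", "G:/Research/Data/master_data.txt"). Because lines are streamed rather than read in one piece, large files should not need to fit in memory.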
 
