Scott Bass
Hi,
I'm not looking for a full-blown solution, just architectural advice
for the following design criteria...
Input File(s): (tilde delimited)
Line 1:
Header Record:
SourceSystem~EffectiveDate~ExtractDateAndTime~NumberRecords~FileFormatVersion
RemainingRecords:
72 columns of delimited data
Output File:
Concatenate the input files into a single output file. A subset of
the header fields is prepended to each data line as follows:
SourceSystem~EffectiveDate~ExtractDateAndTime~72 columns of delimited
data
Design Criteria:
1) If the number of records in the file does not match the number of
records reported in the header (an incomplete FTP transfer), abort the
entire file, print an error message, but continue processing the
remaining files.
(I'll use split and join to process the header and prepend to the
remainder).
2) Specify the list of input files on the command line. Specify the
output file on the command line. For example:
concat.pl -in foo.dat bar.dat blah.dat -out concat.dat
or possibly:
concat.pl -in src_*.dat -out concat.dat
(I'll use GetOptions to process the command line)
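To make (2) concrete, here's the rough Getopt::Long shape I have in mind. Option names come from the examples above; the sub name and everything else is placeholder, and I've used GetOptionsFromArray just so it's testable:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse "-in file... -out file" from an argument list; returns
# (\@inputs, $output).  The {1,} repeat spec lets -in take a list.
sub parse_args {
    my @argv = @_;
    my (@in, $out);
    GetOptionsFromArray(\@argv,
        'in=s{1,}' => \@in,    # one or more input files
        'out=s'    => \$out,
    ) or die "usage: concat.pl -in file... -out file\n";

    # Expand any wildcards the shell didn't (e.g. if they were quoted).
    @in = map { glob } @in;

    die "no input files\n" unless @in;
    die "no output file\n" unless defined $out;
    return (\@in, $out);
}

my ($in, $out) = parse_args(qw(-in foo.dat bar.dat -out concat.dat));
print "inputs: @$in\noutput: $out\n";
```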
My thoughts:
1) Slurp the file into an array (minus first record). Count the
elements in the array. Abort if not equal to the number in the
header, else concat to the output file.
2) Process the file, reading records. At EOF, get record number from
$. . If correct, rewind to beginning of file handle and concat to
output file. (Not sure how to do the rewind bit).
3) Process the file, writing to a temp file. At EOF, get record
number from $. . If correct, concat the temp file to the output file.
Questions:
A) If I've globbed the files on the command line and am processing
the file handle <>, how do I know when the file name has changed?
B) When that happens, how do I reset $. to 1?
C) Of the three approaches above, which is the "best"? Performance
is important but not critical. I lean toward #3, since I need to
cater for files too large for #1. Or if you have a better idea please
let me know.
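(On A and B: the one lead I have is the eof idiom from perldoc -f eof, sketched below with throwaway sample files -- as I read it, eof with no parentheses is true on the last line of the current file, $ARGV names that file, and "close ARGV" makes $. restart at 1 for the next one. Confirmation that this is the right tool would be welcome.)

```perl
use strict;
use warnings;
use File::Temp ();

# Two small sample files, purely for illustration.
my $dir = File::Temp->newdir;
for my $name (qw(one two)) {
    open my $fh, '>', "$dir/$name.dat" or die $!;
    print {$fh} "$name line $_\n" for 1 .. 2;
    close $fh;
}

local @ARGV = ("$dir/one.dat", "$dir/two.dat");
my @seen;
while (<>) {
    push @seen, "$ARGV:$.";
    if (eof) {        # last line of the current file
        close ARGV;   # resets $. before <> opens the next file
    }
}
print join(' ', @seen), "\n";
```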
I hope this wasn't too cryptic...I was trying to keep it short.
Thanks,
Scott