tab delimited file processing problem

D

Domenico Discepola

Hi all. I have constructed a script that uses Win32::OLE to save an Excel
workbook as a tab-delimited text file (TSV file). This works fine. My next
step is to perform formatting on each field per line in the TSV file while
retaining the # of fields. The problem lies with "empty" cells in the 1st
column of the Excel file.

Example Excel file row:
col A's value=<empty>
col B's value = "1"
col C's value = "2"
<end of row>

When you use Win32::OLE to "tell" Excel to save this as a TSV file (using
the SaveAs method), a hex-dump of the resultant TSV file reveals row1 as:
/^\t12$/ (using regex notation). In other words, I lose the existence of
col A (which I need).

I was thinking of the following solution:
s/^\t/\s\t/, $my_line;
but could there be a 'better' way to handle it?

Any suggestions on how to best solve this problem would be appreciated.

Thanks in advance.
 
G

Greg Bacon

: [...]
: Example Excel file row:
: col A's value=<empty>
: col B's value = "1"
: col C's value = "2"
: <end of row>
:
: When you use Win32::OLE to "tell" Excel to save this as a TSV file (using
: the SaveAs method), a hex-dump of the resultant TSV file reveals row1 as:
: /^\t12$/ (using regex notation). In other words, I lose the existence of
: col A (which I need).

How have you lost column A? Consider the example below:

C:\Temp>type try
#! perl

use warnings;
use strict;

use Data::Dumper;

my $data = "\t12";
my @fields = split /\t/, $data;

print Dumper \@fields;

C:\Temp>perl try
$VAR1 = [
'',
'12'
];

Are you sure there wasn't a TAB between the 1 and the 2? Even so,
you're still happy; note that the first element of @fields is empty:

#! perl

use warnings;
use strict;

use Data::Dumper;

my $data = "\t1\t2";
my @fields = split /\t/, $data;

print Dumper \@fields;

C:\Temp>perl try
$VAR1 = [
'',
'1',
'2'
];

How were you trying to extract the fields in your TSV file?

Technical side note: what you're calling tab-delimited is really
tab-separated. Using [TAB] to make things stand out, a tab-delimited
record would look like

[TAB]field_1[TAB]field_2[TAB]...[TAB]field_n[TAB]

Tab-*separated*, however, would look like

field_1[TAB]field_2[TAB]...[TAB]field_n

Hope this helps,
Greg
 
D

Domenico Discepola

Greg Bacon said:
: [...]
: Example Excel file row:
: col A's value=<empty>
: col B's value = "1"
: col C's value = "2"
: <end of row>
:
: When you use Win32::OLE to "tell" Excel to save this as a TSV file (using
: the SaveAs method), a hex-dump of the resultant TSV file reveals row1 as:
: /^\t12$/ (using regex notation). In other words, I lose the existence of
: col A (which I need).

How have you lost column A? Consider the example below:

C:\Temp>type try
#! perl

use warnings;
use strict;

use Data::Dumper;

my $data = "\t12";
my @fields = split /\t/, $data;

print Dumper \@fields;

C:\Temp>perl try
$VAR1 = [
'',
'12'
];

Are you sure there wasn't a TAB between the 1 and the 2? Even so,
you're still happy; note that the first element of @fields is empty:

#! perl

use warnings;
use strict;

use Data::Dumper;

my $data = "\t1\t2";
my @fields = split /\t/, $data;

print Dumper \@fields;

C:\Temp>perl try
$VAR1 = [
'',
'1',
'2'
];

How were you trying to extract the fields in your TSV file?

Technical side note: what you're calling tab-delimited is really
tab-separated. Using [TAB] to make things stand out, a tab-delimited
record would look like

[TAB]field_1[TAB]field_2[TAB]...[TAB]field_n[TAB]

Tab-*separated*, however, would look like

field_1[TAB]field_2[TAB]...[TAB]field_n

Hope this helps,
Greg
--
It remains true today as it did in fascist Italy, socialist Germany, New
Deal America, and socialist Russia: freedom has no greater opponents than
those who despise and demonize commercial society.
-- Lew Rockwell

1st off, you're right - my regex describing my output from Excel was
incorrect - it should have read (as you suggested) /^\t1\t2$/
2nd, I did mean tab separated (instead of tab delimited). 3rd, thanks for
mentioning the Data::Dumper module, I hadn't heard of that - I'll check it
out. Right off the bat it seems to help diagnosing my problem.
 
B

Bart Lateur

Domenico said:
Example Excel file row:
col A's value=<empty>
col B's value = "1"
col C's value = "2"
<end of row>

When you use Win32::OLE to "tell" Excel to save this as a TSV file (using
the SaveAs method), a hex-dump of the resultant TSV file reveals row1 as:
/^\t12$/ (using regex notation). In other words, I lose the existence of
col A (which I need).

No you don't. The first column comes before the first tab.
 
B

Bill Smith

Domenico Discepola said:
Hi all. I have constructed a script that uses Win32::OLE to save an Excel
workbook as a tab-delimited text file (TSV file). This works fine. My next
step is to perform formatting on each field per line in the TSV file while
retaining the # of fields. The problem lies with "empty" cells in the 1st
column of the Excel file.

I remember having similar problems in the distant past. I used native
EXCEL commands to create the TSV file. Details of the file format
depended on the version of EXCEL. This is probably not a problem to
you, but beware!

Bill
 
D

Domenico Discepola

Bill Smith said:
I remember having similar problems in the distant past. I used native
EXCEL commands to create the TSV file. Details of the file format
depended on the version of EXCEL. This is probably not a problem to
you, but beware!

Bill

As an added note, if one particular row in Excel has fewer columns than
columns in surrounding rows, Excel will not pad the remaining columns with
tabs... In other words, you can end up with a TSV file having a different
number of fields:

Example Excel file:
row1: x, y, z
row2: a, b, c, d

resultant TSV file (in regex format):
row1: /^x\ty\tz$/
row2: /^a\tb\tc\td$/

So, as you see, if you are trying to manipulate a TSV file and expect the
same number of fields per line, you must account for it programmatically as
Excel does not do it for you.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,822
Latest member
israfaceZa

Latest Threads

Top