regexp problem in program

Gary Schenk

I'm a very new user of Perl, trying to convert the output
of an old DOS program to a CSV format to import to
Microstation. This idea should work, but it won't and
I'm stumped. The substitution pattern to remove the
leading whitespaces comes from the llama book, yet
I seem to be implementing it wrong. It is probably
so obvious that I can't see, so I am hoping another
set of eyes can help. Here it is:

#!/usr/bin/perl -w

##############################
# Convert a gcogo EXPORT #
# COORDINATES output file #
# to a Microstation PLTPNT #
# MDL friendly format #
##############################
print "Convert gcogo EXPORT COORDINATES output\n";
print "to a comma delimited file, for export to\n";
print "PLTPNT in Microstation.\n";
print "\n\n";
print "Enter the name of a file to convert: ";
chomp( $output_file = <STDIN> );

open( FILE, "$output_file") or die( "Can't open that file: $!");
open( CONVERT, "+>converted.txt") or die( "Can't open a conversion file: $!");
# opening a filehandle for update. This is OK, right?

foreach (<FILE>) {
if ( /^\s+\d+/ ) {
print ( CONVERT );
}
}
# This works just fine, apparently. It produces a file with the
# letters stripped out.

close( FILE ) or die( "Can't close that file: $!");

foreach (<CONVERT>) {
s/^\s+//; # From the llama book. It should strip the
# the leading whitespaces, leaving number fields
# separated by whitespaces. However the file produced
# is empty
}


close( CONVERT ) or die( "Can't close converted file: $!");
# There is more work to be done, but I'm stuck here. The plan
# is to use another foreach loop to substitute a comma for white-
# spaces.


Here is an example of the file to be converted:

Wed, Jul 14, 2004, 2:59 PM 54


EXPORT COORDINATES
2-205
3 678772.5350000 1839644.3330000 .0000000
4 678777.8420000 1838819.8640000 .0000000
100 678920.7180000 1838881.9720000 .0000000
101 678891.1930000 1838881.7560000 .0000000
102 678777.4864933 1838880.9241421
103 678777.7542000 1838839.6498000 .0000000
104 678786.8979393 1838839.7196468
105 678786.6895588 1838871.8472255
106 678891.2598942 1838872.6122447
107 678790.6481667 1838871.8761860
108 678790.6771513 1838867.9142920
109 678786.7152346 1838867.8885949
110 678786.7996421 1838854.8748686
111 678792.9937615 1838839.7662113
112 678792.8955139 1838854.9144066
113 678810.9942363 1838839.9037124
114 678831.8116290 1838840.0627314
115 678810.7636285 1838872.0233474
116 678831.5810714 1838872.1756443
117 678789.9458443 1838839.7429290
118 678789.9499268 1838838.8229380
119 678549.9651623 1837934.3833071
120 678928.4725482 1838790.1495672
200 678794.9150526 1838871.9074018
201 678791.0665523 1838854.9025439
202 678790.9821449 1838867.9162702
203 678791.1456251 1838842.7108004
204 678792.3847792 1838839.7615594
205 678793.0051731 1838838.0067278
END OF JOB
Table 39:Unprotected Figure Area 68 Coordinate Area 3619
Job 0 0 Completed 0 Errors

Thanks in advance for taking a look.
 
Mladen Gogala

Gary Schenk said:
I'm a very new user of Perl, trying to convert the output
of an old DOS program to a CSV format to import to
Microstation. This idea should work, but it won't and
I'm stumped. The substitution pattern to remove the
leading whitespaces comes from the llama book, yet
I seem to be implementing it wrong. It is probably
so obvious that I can't see, so I am hoping another
set of eyes can help. Here it is:


This works:

#!/usr/bin/perl
print "Convert gcogo EXPORT COORDINATES output\n";
print "to a comma delimited file, for export to\n";
print "PLTPNT in Microstation.\n";
print "\n\n";
print "Enter the name of a file to convert: ";
chomp( $output_file = <STDIN> );
my $BUFF;
open( FILE, "$output_file") or die( "Can't open that file: $!");
open( INTERM, ">", \$BUFF) or die( "Memory problem:$!\n");

while (<FILE>) {
print INTERM if /^\s+\d+/;
}
# This works just fine, apparently. It produces a file with the
# letters stripped out.

close( FILE ) or die( "Can't close that file: $!");
close (INTERM);
open( INTERM , "<",\$BUFF) or die( "Memory problem:$!\n");
open( CONVERT, ">","converted.txt") or die( "Can't open a conversion file: $!");
while (<INTERM>) {
s/^\s+//; # From the llama book. Strips the leading
# whitespace, leaving number fields separated by whitespace.
print CONVERT;
}


close( CONVERT ) or die( "Can't close converted file: $!");
# There is more work to be done, but I'm stuck here. The plan
# is to use another foreach loop to substitute a comma for white-
# spaces.
 
Mladen Gogala

This works:

The program that I wrote is just a fixed version of the original.
Files are NOT exactly like arrays and cannot be treated as such.
I followed your logic, which is flawed. The intermediate file is
completely unnecessary, but it is 1am, I am tired, and I don't feel
like fixing it any further. I got the program working, which is
good enough for this time of day. After all, Perl is the duct
tape of computer engineering, so let's treat it as such.
 
Anno Siegel

Gary Schenk said:
I'm a very new user of Perl, trying to convert the output
of an old DOS program to a CSV format to import to
Microstation.

So what format is that? This is a Perl newsgroup, "Microstation" is
not universally known around here.

This idea should work, but it won't and
I'm stumped.

So what did you expect, and what does your code do instead?

The substitution pattern to remove the
leading whitespaces comes from the llama book, yet
I seem to be implementing it wrong. It is probably
so obvious that I can't see, so I am hoping another
set of eyes can help. Here it is:

#!/usr/bin/perl -w

##############################
# Convert a gcogo EXPORT #
# COORDINATES output file #
# to a Microstation PLTPNT #
# MDL friendly format #
##############################
print "Convert gcogo EXPORT COORDINATES output\n";
print "to a comma delimited file, for export to\n";
print "PLTPNT in Microstation.\n";
print "\n\n";
print "Enter the name of a file to convert: ";
chomp( $output_file = <STDIN> );

open( FILE, "$output_file") or die( "Can't open that file: $!");
open( CONVERT, "+>converted.txt") or die( "Can't open a conversion file: $!");

You should mention the file name in the die message.
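
As a small illustration of that advice, here is a sketch that puts the file name into the message. The file name is hypothetical and chosen so that the open fails. The eval is only there so the example can display the message instead of exiting. The three-argument form of open is an assumption of this sketch, not something the original program used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical name, assumed not to exist, so that open() fails.
my $input_file = "no_such_gcogo_output.txt";

# eval traps the die so the message can be inspected and printed.
my $ok = eval {
    open( my $in, "<", $input_file )
        or die "Can't open '$input_file' for reading: $!\n";
    close($in);
    1;
};
my $message = $ok ? "" : $@;

print "Error was: $message" unless $ok;
```

With the name in the message, a mistyped file name at the prompt is obvious at a glance.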
# opening a filehandle for update. This is OK, right?

Yes and no. You're opening the file for read/write, but since you are
opening it primarily for writing, all previous content of the file will
be lost. That is not what is generally understood by updating, but
I don't think updating is really what you want here.

foreach (<FILE>) {
if ( /^\s+\d+/ ) {
print ( CONVERT );
}
}
# This works just fine, apparently. It produces a file with the
# letters stripped out.

Fine. So why don't you go all the way and strip leading whitespace
too at this step? You won't need the second pass over the data then.
The clearest sequence seems to be to remove white space first, then
see if the line starts with a digit:

foreach ( <FILE> ) {
s/^\s+//;
print CONVERT if /^\d/;
}

You can even combine the steps into a single substitution, though that's
less readable:

s/^\s*(\d)/$1/ and print CONVERT while <FILE>;

close( FILE ) or die( "Can't close that file: $!");

You're done here, the second pass isn't needed any more.

foreach (<CONVERT>) {
s/^\s+//; # From the llama book. It should strip the
# the leading whitespaces, leaving number fields
# separated by whitespaces. However the file produced
# is empty
}

The way you have tried it has two or three serious problems. For one,
after writing (even in update mode) the file pointer is at the end of
the file. Trying to read from it will produce nothing. See "perldoc
-f seek" for how to remedy that.
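
The file-pointer behaviour described here can be tried in isolation. The following is only a sketch: the scratch file name is hypothetical, and the lexical filehandle and the SEEK_SET constant from Fcntl are choices made for this example rather than anything taken from the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);    # symbolic name for "offset from start of file"

# Hypothetical scratch file name, used only for this illustration.
my $scratch = "seek_demo.tmp";

# "+>" truncates the file, then opens it for both writing and reading.
open( my $fh, "+>", $scratch ) or die "Can't open $scratch: $!";
print {$fh} "   3 678772.5350000 1839644.3330000\n";
print {$fh} "   4 678777.8420000 1838819.8640000\n";

# The file position is now just past the last byte written,
# so a read at this point finds nothing.
my @before = <$fh>;

# Rewind to the beginning; after that the lines can be read back.
seek( $fh, 0, SEEK_SET ) or die "Can't seek in $scratch: $!";
my @after = <$fh>;

close($fh) or die "Can't close $scratch: $!";
unlink $scratch;

printf "lines read before seek: %d, after seek: %d\n",
    scalar @before, scalar @after;
```

This only shows why the second loop saw no data. It does not change the advice that the second pass is unnecessary.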

Further, in the loop, you never write the changed line back to a file,
so the changes get lost. You'd need a third file to write out to,
since you're changing the line length of the records. But you don't
need that step anyway.

[snip]


Anno
 
Mladen Gogala

My brain works better in the morning. Here it is:

###############################################################
#!/usr/bin/perl -w
use strict;
print "Convert gcogo EXPORT COORDINATES output\n";
print "to a comma delimited file, for export to\n";
print "PLTPNT in Microstation.\n";
print "\n\n";
open( CONVERT, ">","converted.txt") or die( "Can't open a conversion file: $!");

while (<>) {
if (/^\s+\d+/) {
s/^\s+//;
print CONVERT;
}
}
close(CONVERT);
###############################################################

The program is invoked like: prog <file_to_convert>
 
David K. Wall

#!/usr/bin/perl -w

Enabling warnings is good. Even better would be to enable strictures
as well:

use strict;

This disables certain unsafe practices. See the docs for details, but
the most obvious thing you'll notice is that it forces you to declare
your variables before using them.
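
A minimal sketch of what that looks like in practice. The variable names and sample values below are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Under "use strict", every variable must be declared (here with "my")
# before it is used. Without the declaration, a misspelled name such as
# $input_flie would be a compile-time error instead of a silent bug.
my $input_file = "output.txt";    # hypothetical file name
my @fields     = ( 3, "678772.5350000", "1839644.3330000" );

my $line = join( ",", @fields );
print "$input_file: $line\n";
```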
##############################
# Convert a gcogo EXPORT #
# COORDINATES output file #
# to a Microstation PLTPNT #
# MDL friendly format #
##############################
print "Convert gcogo EXPORT COORDINATES output\n";
print "to a comma delimited file, for export to\n";
print "PLTPNT in Microstation.\n";
print "\n\n";
print "Enter the name of a file to convert: ";
chomp( $output_file = <STDIN> );

open( FILE, "$output_file") or die( "Can't open that file: $!");

Unnecessary quotes around $output_file.

open( CONVERT, "+>converted.txt") or die( "Can't open a conversion file: $!");
# opening a filehandle for update. This is OK, right?

foreach (<FILE>) {

'while' is normally used for this purpose instead of 'foreach'.
Someone correct me if I'm mistaken, but it's my understanding that
foreach constructs the entire list before iterating over it; this
could use a lot of memory if the file is large.
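
One way to see the difference is to ask, inside the first iteration, whether the filehandle is already at end of file. This is a sketch only: the sample data is held in a string and read through an in-memory filehandle so that no real file is needed, and all variable names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three sample lines held in a string; opening a reference to a scalar
# gives an in-memory filehandle.
my $data = "   3 678772.5350000\n   4 678777.8420000\nEND OF JOB\n";

# foreach: <$fh> is evaluated in list context, so every line is read
# before the first iteration runs.
open( my $fh, "<", \$data ) or die "Can't open in-memory file: $!";
my $eof_in_foreach;
foreach my $line (<$fh>) {
    $eof_in_foreach = eof($fh) ? 1 : 0;    # already at end of file?
    last;
}
close($fh);

# while: one line is read per iteration.
open( $fh, "<", \$data ) or die "Can't open in-memory file: $!";
my $eof_in_while;
while ( my $line = <$fh> ) {
    $eof_in_while = eof($fh) ? 1 : 0;      # more lines still unread?
    last;
}
close($fh);

print "at EOF inside foreach: $eof_in_foreach\n";
print "at EOF inside while:   $eof_in_while\n";
```

For a coordinate file of a few hundred lines the memory cost is negligible, but the while form scales to files of any size.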
if ( /^\s+\d+/ ) {
print ( CONVERT );
}
}
# This works just fine, apparently. It produces a file with the
# letters stripped out.

close( FILE ) or die( "Can't close that file: $!");

foreach (<CONVERT>) {

You're at the end of the CONVERT file, so you'd need to seek() to the
beginning before looping over all the lines. But as Anno pointed out,
it would be better to clean the data before writing it to CONVERT and
then this loop would not be needed.

s/^\s+//; # From the llama book. It should strip the
# the leading whitespaces, leaving number fields
# separated by whitespaces. However the file produced
# is empty

Yeah, I suppose it is. :)
}


close( CONVERT ) or die( "Can't close converted file: $!");
# There is more work to be done, but I'm stuck here. The plan
# is to use another foreach loop to substitute a comma for white-
# spaces.

There's no need for me to repeat others' comments, so I'll shut up
now.
 
Gary Schenk

Andrew Palmer said:
Just a question: Why is your input file variable called $output_file?

Yeah, that's wacky, but it is the output of the Gcogo program that is
being processed. But for the purposes of this program it should be
$input_file. As for Gcogo, it is a coordinate geometry program that
goes back before the beginning of time: 1992!

Up to here, everything works like you think it does, but...


You're at the end of the file, so this does nothing. To get to the beginning
you would have to seek(CONVERT,0,0)

AHA! I did not realize that. I thought foreach would start at the
beginning of the file.

Say you're at the beginning of the file. You read every line into $_ one at
a time, then remove leading whitespace and do nothing with $_. It is just
overwritten on the next pass.

Oops.

close( CONVERT ) or die( "Can't close converted file: $!");

You can accomplish what you're trying to do with only one pass. You don't
have to keep bouncing between files like mixing a record with a 4-track.

Consider the following:

foreach (<FILE>)
{
next unless (/^\s+\d+/ ); # skip if doesn't match your pattern
s/^\s+//; # strip leading whitespace
s/[ \t]+/,/g; # do comma substitution for clusters of spaces and tabs
print CONVERT; # print it somewhere!
}


does this look right?

That looks perfect. And a very nice solution. Thanks very much for the
critique.

I'm on my own here, attempting to use Perl to solve some problems.
We're land surveyors, and not really computer savvy. We get these
coordinate files, and then spend half an hour deleting whitespace and
inserting commas manually in an editor.

3,678772.5350000,1839644.3330000,.0000000
4,678777.8420000,1838819.8640000,.0000000
100,678920.7180000,1838881.9720000,.0000000
101,678891.1930000,1838881.7560000,.0000000
102,678777.4864933,1838880.9241421
103,678777.7542000,1838839.6498000,.0000000
etc.

Gary Schenk
 
Gary Schenk

So what format is that? This is a Perl newsgroup, "Microstation" is
not universally known around here.

It is a CADD program. The main idea is to get fields separated by commas.

So what did you expect, and what does your code do instead?

I was hoping to keep the input file intact, and generate a new file in CSV format.

You should mention the file name in the die message.

Good point.

Yes and no. You're opening the file for read/write, but since you are
opening it primarily for writing, all previous content of the file will
be lost. That is not what is generally understood by updating, but
I don't think updating is really what you want here.
Fine. So why don't you go all the way and strip leading whitespace
too at this step? You won't need the second pass over the data then.
The clearest sequence seems to be to remove white space first, then
see if the line starts with a digit:

foreach ( <FILE> ) {
s/^\s+//;
print CONVERT if /^\d/;
}

You can even combine the steps into a single substitution, though that's
less readable:

s/^\s*(\d)/$1/ and print CONVERT while <FILE>;

close( FILE ) or die( "Can't close that file: $!");

You're done here, the second pass isn't needed any more.

foreach (<CONVERT>) {
s/^\s+//;
}

The way you have tried it has two or three serious problems. For one,
after writing (even in update mode) the file pointer is at the end of
the file. Trying to read from it will produce nothing. See "perldoc
-f seek" for how to remedy that.

Further, in the loop, you never write the changed line back to a file,
so the changes get lost. You'd need a third file to write out to,
since you're changing the line length of the records. But you don't
need that step anyway.

[snip]


Anno

Thank you very much for taking the time to critique this program.

Gary Schenk
 
Greg Bacon

: I'm a very new user of Perl, trying to convert the output
: of an old DOS program to a CSV format to import to
: Microstation. This idea should work, but it won't and
: I'm stumped. The substitution pattern to remove the
: leading whitespaces comes from the llama book, yet
: I seem to be implementing it wrong. It is probably
: so obvious that I can't see, so I am hoping another
: set of eyes can help. Here it is:
: [...]

I haven't seen a followup that uses Perl's natural solution to
this problem, i.e., a combination of split and join:

#! /usr/local/bin/perl

use warnings;

##############################
# Convert a gcogo EXPORT #
# COORDINATES output file #
# to a Microstation PLTPNT #
# MDL friendly format #
##############################
print "Convert gcogo EXPORT COORDINATES output\n",
"to a comma delimited file, for export to\n",
"PLTPNT in Microstation.\n",
"\n",
"\n",
"Enter the name of a file to convert: ";
chomp( $input = <STDIN> );

open INPUT, $input or die "$0: open $input: $!";

$conv = "converted.txt";
open CONVERT, ">", $conv or die "$0: open $conv: $!";

while (<INPUT>) {
next unless /^\s+\d/;

@fields = split;

# skip this line if any field has other than digits or dot
next if grep /[^.\d]/, @fields;

print CONVERT join("," => @fields), "\n";
}

close CONVERT or warn "$0: close $conv: $!";

You can cut out unnecessary code by writing it as a filter:

#! /usr/local/bin/perl

use warnings;

##############################
# Convert a gcogo EXPORT #
# COORDINATES output file #
# to a Microstation PLTPNT #
# MDL friendly format #
##############################

while (<>) {
next unless /^\s+\d/;
@fields = split;

# skip this line if any field has other than digits or dot
next if grep /[^.\d]/, @fields;

print join("," => @fields), "\n";
}

The filter version is equivalent to the earlier code when called as

./prog output.txt >converted.txt

where prog is the name of your conversion program.

Hope this helps,
Greg
 
Mladen Gogala

@fields = split;

# skip this line if any field has other than digits or dot
next if grep /[^.\d]/, @fields;

print CONVERT join("," => @fields), "\n";

Why would you need "split" and "join"? He's writing down all lines which
begin with digits preceded by blanks. He also cleans the starting
whitespace characters. Nothing is done with fields, there is no need to
split them and then re-join them, pure waste of CPU. Also, entering the
file name is completely unnecessary. That's what <> is for.
 
Greg Bacon

: On Tue, 20 Jul 2004 17:28:49 +0000, Greg Bacon wrote:
:
: > @fields = split;
: >
: > # skip this line if any field has other than digits or dot
: > next if grep /[^.\d]/, @fields;
: >
: > print CONVERT join("," => @fields), "\n";
:
: Why would you need "split" and "join"?

Did you read the root of this thread?

I'm a very new user of Perl, trying to convert the output
of an old DOS program to a CSV format [...]
                      ^^^^^^^^^^^^^^^

: He's writing down all lines
: which begin with digits preceded by blanks.

Andrew Palmer's post had the following candidate output, and the OP
described it as "perfect."

3,678772.5350000,1839644.3330000,.0000000
4,678777.8420000,1838819.8640000,.0000000
100,678920.7180000,1838881.9720000,.0000000
101,678891.1930000,1838881.7560000,.0000000
102,678777.4864933,1838880.9241421
103,678777.7542000,1838839.6498000,.0000000
etc.

However, Andrew's code leaves the "2-205" line.

: He also cleans the
: starting whitespace characters. Nothing is done with fields, there
: is no need to split them and then re-join them, pure waste of CPU.

Unlike s/\s+/,/g, emulating awk's default behavior with split correctly
handles lines with trailing whitespace.
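
A short sketch of that difference, using a made-up input line with two trailing spaces. The substitution shown is the `[ \t]+` variant posted earlier in the thread, and the variable names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line with leading whitespace and two trailing spaces.
my $line = "   3 678772.5350000 1839644.3330000 .0000000  \n";

# Substitution approach: strip leading whitespace, then turn each run
# of spaces or tabs into a comma. The trailing run becomes a comma too.
( my $subst = $line ) =~ s/^\s+//;
$subst =~ s/[ \t]+/,/g;
chomp $subst;

# split ' ' behaves like awk's default field splitting: leading
# whitespace is skipped and trailing empty fields are dropped.
my $joined = join( ",", split ' ', $line );

print "substitution: [$subst]\n";
print "split/join:   [$joined]\n";
```

The substitution result ends in a stray comma, which a CSV importer may read as an extra empty field. The split/join result does not.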

I doubt that the difference is even perceptible to the user. If you're
counting cycles only for the sake of the tally, why are you using Perl?

: Also, entering the file name is completely unnecessary. That's what
: <> is for.

Um, that's why I suggested writing the program as a filter.

Greg
 
Gary Schenk

Mladen Gogala said:
###############################################################
#!/usr/bin/perl -w
use strict;
print "Convert gcogo EXPORT COORDINATES output\n";
print "to a comma delimited file, for export to\n";
print "PLTPNT in Microstation.\n";
print "\n\n";
open( CONVERT, ">","converted.txt") or die( "Can't open a conversion file: $!");

while (<>) {
if (/^\s+\d+/) {
s/^\s+//;
print CONVERT;
}
}
close(CONVERT);
###############################################################

The program is invoked like: prog <file_to_convert>

Thanks for your time and effort critiquing this program. I did not
realize that a command line argument could be used so easily here.

Gary
 
