extracting text data in the presence of a "look-up" file: Is it possible?

Vumani Dlamini

This problem follows up on a couple of problems I sent to the list 2
months back. The data is structured as follows;

##### data #########
Area=3706
Company=101
PROPdes=1 # description/type of property
PROPpri=2 # public/private
PROPemp=54 # number of employees
PROPdes=6
PROPpri=2
PROPemp=23
Company=106
PROPdes=4
PROPpri=2
PROPemp=56
Area=3709
Company=116
PROPdes=9
PROPpri=1
PROPemp=200
###################

And the data set created is;
3706|101|1|2|054
3706|101|6|2|023
3706|106|4|2|056
3709|116|9|1|200

using the following Perl script;
##### Perl script ######
use strict;
use warnings;
open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";
my ($Area , $Comp, $Pdes, $Ppri, $Pemp);
open PRIVATE, ">c:/.../private.txt";
while (<DATA>){
if (/Area=(\d+)/) {
$Area = $1;
}
elsif (/Company=(\d+)/) {
$Comp = $1;
}
elsif (/PROPdes=(\d+)/) {
$Pdes = $1;
}
elsif (/PROPpri=(\d+)/) {
$Ppri = $1;
}
elsif (/PROPemp=(\d+)/) {
print PRIVATE "$Area$Comp$Pdes$Ppri$1\n";
}
}
}
##### Perl script ######

I now have a "area text file" with specific companies that have to be
extracted, with each row in the "area text file" having a code for an
area. I would like to extract companies only in areas listed in the
"area text file".

If within the areas in the "area text file" I am only interested in
areas with more than 10 companies, is it possible to write a script
which utilizes all this information?

Thanks a lot, again.

Vumani Dlamini

PS: My previous posts related to this problem can be found here:
http://groups.google.nl/[email protected]
http://groups.google.nl/[email protected]
http://groups.google.nl/[email protected]
 
Tore Aursand

[...]
And the data set created is;
3706|101|1|2|054
3706|101|6|2|023
3706|106|4|2|056
3709|116|9|1|200

[...]

I now have a "area text file" with specific companies that have to be
extracted, with each row in the "area text file" having a code for an
area. I would like to extract companies only in areas listed in the
"area text file".

If within the areas in the "area text file" I am only interested in
areas with more than 10 companies, is it possible to write a script
which utilizes all this information?

If I understand your problem correctly, you could use a hash to do that;

my %areas = ();
while ( <DATA> ) {
    chomp;
    my ($area, $company, @tmp) = split( /\Q|\E/ );
    push( @{$areas{$area}}, $company );
}

foreach ( keys %areas ) {
    if ( @{$areas{$_}} > 10 ) {
        print "Area $_ has more than 10 companies\n";
    }
}
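
One caveat with the loop above: it pushes one entry per output row, so a company with several properties is counted once per property. Below is a minimal sketch of a distinct count, assuming the pipe-delimited output shown earlier. The names count_companies and areas_over are hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Count distinct companies per area from pipe-delimited rows such as
# "3706|101|1|2|054". Returns a hash ref: area => number of companies.
# (Hypothetical helper, assuming area and company are the first two fields.)
sub count_companies {
    my @rows = @_;
    my %seen;    # $seen{$area}{$company} = 1
    for my $row (@rows) {
        chomp $row;
        my ($area, $company) = split /\|/, $row;
        next unless defined $company;    # skip blank or malformed rows
        $seen{$area}{$company} = 1;
    }
    my %count;
    $count{$_} = scalar keys %{ $seen{$_} } for keys %seen;
    return \%count;
}

# Areas whose distinct-company count exceeds $min, in numeric order.
sub areas_over {
    my ($counts, $min) = @_;
    my @hits = sort { $a <=> $b } grep { $counts->{$_} > $min } keys %$counts;
    return @hits;
}

my @rows = ('3706|101|1|2|054', '3706|101|6|2|023',
            '3706|106|4|2|056', '3709|116|9|1|200');
my $counts = count_companies(@rows);
print "Area $_ has more than 1 company\n" for areas_over($counts, 1);
```

The threshold is a parameter here only so the four sample rows produce a hit; for the question as asked it would be 10.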


--
Tore Aursand <[email protected]>
"Writing is a lot like sex. At first you do it because you like it.
Then you find yourself doing it for a few close friends and people you
like. But if you're any good at all, you end up doing it for money."
-- Unknown
 
Tad McClellan

Vumani Dlamini said:
This problem follows up on a couple of problems I sent to the list 2
                                                          ^^^^^^^^

This is not a mailing list.

This is a Usenet newsgroup.

using the following Perl script;


I kinda doubt that.

The following is not a Perl script at all! It has a syntax error.

Please be careful to post your _real_ code.

##### Perl script ######
use strict;
use warnings;
open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";
                                                              ^^^
I think you meant "$!\n" instead of "$\n" there.

(if so, then why are you putting the newline there?)

open PRIVATE, ">c:/.../private.txt";


You should always, yes *always*, check the return value from open():

open PRIVATE, '>c:/.../private.txt' or
die "could not open '>c:/.../private.txt' $!";

You did it earlier, why did you stop?
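
A minimal sketch of the same check, using a lexical filehandle and three-argument open (which also avoids reusing the DATA name). open_checked is a hypothetical helper, and a temporary file stands in for the elided paths:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Open a file with the three-argument form and a lexical handle,
# dying with the OS error ($!) on failure. Hypothetical helper name.
sub open_checked {
    my ($mode, $path) = @_;
    open my $fh, $mode, $path
        or die "could not open '$path' ($mode): $!";
    return $fh;
}

# Demo on a temporary file, since the real paths are elided in the thread.
my (undef, $path) = tempfile(UNLINK => 1);

my $out = open_checked('>', $path);
print {$out} "3706|101|1|2|054\n";
close $out or die "could not close '$path': $!";

my $in = open_checked('<', $path);
print while <$in>;    # echo the file back to STDOUT
close $in;
```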

while (<DATA>){


DATA is a special filehandle, you should choose some other name.

if (/Area=(\d+)/) {
$Area = $1;
}
elsif (/Company=(\d+)/) {
$Comp = $1;
}
elsif (/PROPdes=(\d+)/) {
$Pdes = $1;
}
elsif (/PROPpri=(\d+)/) {
$Ppri = $1;
}
elsif (/PROPemp=(\d+)/) {
print PRIVATE "$Area$Comp$Pdes$Ppri$1\n";
}
}
}
^
What does that curly match up with?

I now have a "area text file"


Maybe you do and maybe you don't.

If the open() failed, then there _is no_ file...

If within the areas in the "area text file" I am only interested in
areas with more than 10 companies, is it possible to write a script
which utilizes all this information?


Yes.
 
Jay Tilton

: my ($area, $company, @tmp) = split( /\Q|\E/ );
                                       ^^^^^
An unusual style choice. Am I overlooking an advantage that has over
saying split( /\|/ ) or split( /[|]/ ) ?
 
Tore Aursand

my ($area, $company, @tmp) = split( /\Q|\E/ );
                                     ^^^^^
An unusual style choice. Am I overlooking an advantage that has over
saying split( /\|/ ) or split( /[|]/ ) ?

No advantages that I know of. I've made my editor (FTE) highlight \Q\E in
a special way, so...
 
Michele Dondi

This problem follows up on a couple of problems I sent to the list 2
months back. The data is structured as follows; [snip]
And the data set created is; [snip]
using the following Perl script;
##### Perl script ######
use strict;
use warnings;
open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";

Probably not a very good idea to call it "DATA": no harm done here, but
you may end up needing Perl's own DATA filehandle sooner or later...

open PRIVATE, ">c:/.../private.txt";

We aren't checking the result here, eh?!?
;-)

[snip]
elsif (/PROPemp=(\d+)/) {
print PRIVATE "$Area$Comp$Pdes$Ppri$1\n";

This doesn't seem consistent with the "data set created" that I snipped
above: that output has '|' separators and this print has none.

Here's how I'd do it anyway:

#!/usr/bin/perl -l

use strict;
use warnings;

die "Usage: $0 <infile> <outfile>\n" unless @ARGV == 2;

my ($data,$priv);
open $data, '<', $_ or die "Unable to open `$_': $!\n" for shift;
open $priv, '>', $_ or die "Unable to open `$_': $!\n" for shift;
select $priv;

my %props;
while (<$data>) {
    chomp;
    warn("Input data mismatch"), next unless /^(\w+)=(\d+)\s*$/;
    $props{$1}=$2;
    if ($1 eq 'PROPemp') {
        no warnings 'uninitialized';
        local $,='|';
        print map $props{$_},
            qw/Area Company PROPdes PROPpri PROPemp/;
    }
}

__END__

This is basically the same as your own script, only, IMHO, slightly
more perlish and more maintainable.

I now have a "area text file" with specific companies that have to be
extracted, with each row in the "area text file" having a code for an
area. I would like to extract companies only in areas listed in the
"area text file".

Oh, but then just add as the first statement of the 'if' block the
following line:

next unless in $props{'Area'}, @Areas;

Of course it is up to you to write a suitable 'in' sub (see a recent
thread on the subject too!) or substitute suitable code, and populate
@Areas. But that shouldn't be a problem...
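
For what it's worth, a hash lookup is one common substitute for an 'in' sub. Here is a minimal sketch, assuming the area file holds one numeric area code per line (the thread never shows its layout). The names load_areas and wanted_area are hypothetical.

```perl
use strict;
use warnings;

# Build a lookup hash from a filehandle holding one area code per line.
# ASSUMPTION: the layout of the "area text file" is not shown in the thread.
sub load_areas {
    my ($fh) = @_;
    my %areas;
    while (my $line = <$fh>) {
        next unless $line =~ /^\s*(\d+)\s*$/;    # ignore blanks and junk
        $areas{$1} = 1;
    }
    return \%areas;
}

# True if the given area code is listed in the lookup hash.
sub wanted_area {
    my ($areas, $code) = @_;
    return (defined($code) && exists($areas->{$code})) ? 1 : 0;
}

# Demo with an in-memory file standing in for the real area file.
my $area_file = "3706\n3712\n";
open my $fh, '<', \$area_file or die "Cannot open in-memory file: $!";
my $areas = load_areas($fh);
close $fh;

for my $code (3706, 3709) {
    print "$code: ", (wanted_area($areas, $code) ? "keep" : "skip"), "\n";
}
```

With something like this in place, the line suggested above could read: next unless wanted_area($areas, $props{'Area'});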


Michele
 
