extracting text data in the presence of a "look-up" file: Is it possible?

Vumani Dlamini · Jan 7, 2004

This problem follows up on a couple of problems I sent to the list 2
months back. The data is structured as follows;

##### data #########
Area=3706
Company=101
PROPdes=1 # description/type of property
PROPpri=2 # public/private
PROPemp=54 # number of employees
PROPdes=6
PROPpri=2
PROPemp=23
Company=106
PROPdes=4
PROPpri=2
PROPemp=56
Area=3709
Company=116
PROPdes=9
PROPpri=1
PROPemp=200
###################

And the data set created is;
3706|101|1|2|054
3706|101|6|2|023
3706|106|4|2|056
3709|116|9|1|200

using the following Perl script;
##### Perl script ######
use strict;
use warnings;
open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";
my ($Area , $Comp, $Pdes, $Ppri, $Pemp);
open PRIVATE, ">c:/.../private.txt";
while (<DATA>){
    if (/Area=(\d+)/) {
        $Area = $1;
    }
    elsif (/Company=(\d+)/) {
        $Comp = $1;
    }
    elsif (/PROPdes=(\d+)/) {
        $Pdes = $1;
    }
    elsif (/PROPpri=(\d+)/) {
        $Ppri = $1;
    }
    elsif (/PROPemp=(\d+)/) {
        print PRIVATE "$Area$Comp$Pdes$Ppri$1\n";
    }
}
}
##### Perl script ######

I now have an "area text file" identifying the companies that have to be
extracted: each row of the file holds the code for one area. I would like
to extract only the companies in the areas listed in the "area text file".

If, within the areas in the "area text file", I am only interested in
areas with more than 10 companies, is it possible to write a script
that uses all of this information?

Thanks a lot, again.

Vumani Dlamini

PS: My previous posts related to this problem can be found here:
http://groups.google.nl/[email protected]
http://groups.google.nl/[email protected]
http://groups.google.nl/[email protected]


Tore Aursand · Jan 7, 2004

[...]
And the data set created is;
3706|101|1|2|054
3706|101|6|2|023
3706|106|4|2|056
3709|116|9|1|200

[...]

I now have a "area text file" with specific companies that have to be
extracted, with each row in the "area text file" having a code for an
area. I would like to extract companies only in areas listed in the
"area text file".

If within the areas in the "area text file" I am only interested in
areas with more than 10 companies, is it possible to write a script
which utilizes all this information?

If I understand your problem correctly, you could use a hash to do that;

my %areas = ();
while ( <DATA> ) {
    chomp;
    my ($area, $company, @tmp) = split( /\Q|\E/ );
    push( @{$areas{$area}}, $company );
}

foreach ( keys %areas ) {
    if ( @{$areas{$_}} > 10 ) {
        print "Area $_ has more than 10 companies\n";
    }
}
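
The loop above pushes one entry per row, so a company with several properties (such as 101 in area 3706 in the sample) is counted once per property. A minimal sketch of a variation that counts distinct companies instead, assuming the pipe-delimited rows shown in the original post; the sub name `companies_per_area` and the `$min` threshold are made up for the illustration:

```perl
use strict;
use warnings;

# Count distinct companies per area from rows of the form area|company|...
# Returns a hash ref: area code => number of distinct companies.
sub companies_per_area {
    my ($fh) = @_;
    my %seen;    # $seen{$area}{$company} = 1
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;
        my ( $area, $company ) = split /\|/, $line;
        $seen{$area}{$company} = 1;
    }
    my %count;
    $count{$_} = keys %{ $seen{$_} } for keys %seen;
    return \%count;
}

# Sample rows copied from the data set in the original post,
# read through an in-memory filehandle instead of a real file.
my $rows = "3706|101|1|2|054\n3706|101|6|2|023\n"
         . "3706|106|4|2|056\n3709|116|9|1|200\n";
open my $in, '<', \$rows or die "Cannot open in-memory handle: $!";
my $count = companies_per_area($in);
close $in;

my $min = 1;    # hypothetical threshold for the tiny sample; the question uses 10
for my $area ( sort keys %$count ) {
    print "Area $area: $count->{$area} companies\n" if $count->{$area} > $min;
}
```

With the four sample rows, only area 3706 passes the threshold, because it has two distinct companies (101 and 106).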

--
Tore Aursand <[email protected]>
"Writing is a lot like sex. At first you do it because you like it.
Then you find yourself doing it for a few close friends and people you
like. But if you're any good at all, you end up doing it for money."
-- Unknown

Tad McClellan · Jan 7, 2004

Vumani Dlamini said:
This problem follows up on a couple of problems I sent to the list 2

                                                          ^^^^^^^^

This is not a mailing list.

This is a Usenet newsgroup.

using the following Perl script;

I kinda doubt that.

The following is not a Perl script at all! It has a syntax error.

Please be careful to post your _real_ code.

##### Perl script ######
use strict;
use warnings;
open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";

                                                              ^^^
I think you meant "$!\n" instead of "$\n" there.

(if so, then why are you putting the newline there?)

open PRIVATE, ">c:/.../private.txt";

You should always, yes *always*, check the return value from open():

open PRIVATE, '>c:/.../private.txt' or
die "could not open '>c:/.../private.txt' $!";

You did it earlier, why did you stop?

while (<DATA>){

DATA is a special filehandle, you should choose some other name.
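
A minimal sketch of the two points above (checking every open() and avoiding the name DATA), using lexical filehandles and three-argument open. The file name and the temporary directory are assumptions made for the illustration, not the poster's real paths:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical output location: a throwaway directory removed at exit.
my $dir      = tempdir( CLEANUP => 1 );
my $out_file = File::Spec->catfile( $dir, 'private.txt' );

# Three-argument open with a lexical handle; report $! if it fails.
open my $out, '>', $out_file
    or die "Could not open '$out_file' for writing: $!";
print {$out} "3706|101|1|2|054\n";
close $out
    or die "Could not close '$out_file': $!";

# Read the line back through a handle that is not named DATA.
open my $in, '<', $out_file
    or die "Could not open '$out_file' for reading: $!";
my $first = <$in>;
close $in;

print $first;
```

Lexical handles such as `$out` and `$in` cannot collide with Perl's built-in DATA handle, and they are closed automatically when they go out of scope.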

    if (/Area=(\d+)/) {
        $Area = $1;
    }
    elsif (/Company=(\d+)/) {
        $Comp = $1;
    }
    elsif (/PROPdes=(\d+)/) {
        $Pdes = $1;
    }
    elsif (/PROPpri=(\d+)/) {
        $Ppri = $1;
    }
    elsif (/PROPemp=(\d+)/) {
        print PRIVATE "$Area$Comp$Pdes$Ppri$1\n";
    }
}
}

^
What does that curly match up with?

I now have a "area text file"

Maybe you do and maybe you don't.

If the open() failed, then there _is no_ file...

If within the areas in the "area text file" I am only interested in
areas with more than 10 companies, is it possible to write a script
which utilizes all this information?

Yes.

Jay Tilton · Jan 8, 2004

: my ($area, $company, @tmp) = split( /\Q|\E/ );
                                       ^^^^^
An unusual style choice. Am I overlooking an advantage that has over
saying split( /\|/ ) or split( /[|]/ ) ?

Tore Aursand · Jan 8, 2004

my ($area, $company, @tmp) = split( /\Q|\E/ );

                                     ^^^^^
An unusual style choice. Am I overlooking an advantage that has over
saying split( /\|/ ) or split( /[|]/ ) ?

No advantages that I know of. I've made my editor (FTE) highlight \Q\E in
a special way, so...
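
A short sketch showing that the three spellings discussed here split a row the same way; the sample row is taken from the original post and the variable names are only for illustration:

```perl
use strict;
use warnings;

my $row = '3706|101|1|2|054';    # sample row from the original post

my @quoted  = split /\Q|\E/, $row;    # \Q...\E quotes the metacharacter
my @escaped = split /\|/,    $row;    # backslash escape
my @class   = split /[|]/,   $row;    # single-character class

# Each line printed is the same comma-joined list of five fields.
print join( ',', @$_ ), "\n" for \@quoted, \@escaped, \@class;
```

An unescaped `split /|/` would behave differently: `|` is regex alternation, so the pattern matches the empty string and the row is split into single characters.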

Michele Dondi · Jan 9, 2004

This problem follows up on a couple of problems I sent to the list 2
months back. The data is structured as follows; [snip]
And the data set created is; [snip]
using the following Perl script;
##### Perl script ######
use strict;
use warnings;
open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";

Calling it "DATA" is probably not a very good idea: no harm done here, but
you may end up needing Perl's own DATA filehandle sooner or later...

open PRIVATE, ">c:/.../private.txt";

Not checking the return value here, eh?!?
;-)

[snip]

elsif (/PROPemp=(\d+)/) {
print PRIVATE "$Area$Comp$Pdes$Ppri$1\n";

This doesn't seem consistent with the "data set created" that I snipped
above: the print puts no '|' separators between the fields.

Here's how I'd do it anyway:

#!/usr/bin/perl -l

use strict;
use warnings;

die "Usage: $0 <infile> <outfile>\n" unless @ARGV == 2;

my ($data,$priv);
open $data, '<', $_ or die "Unable to open `$_': $!\n" for shift;
open $priv, '>', $_ or die "Unable to open `$_': $!\n" for shift;
select $priv;

my %props;
while (<$data>) {
    chomp;
    warn("Input data mismatch"), next unless /^(\w+)=(\d+)\s*$/;
    $props{$1}=$2;
    if ($1 eq 'PROPemp') {
        no warnings 'uninitialized';
        local $,='|';
        print map $props{$_},
            qw/Area Company PROPdes PROPpri PROPemp/;
    }
}

__END__

This is basically the same as your own script, only, IMHO, slightly more
perlish and more maintainable.

I now have a "area text file" with specific companies that have to be
extracted, with each row in the "area text file" having a code for an
area. I would like to extract companies only in areas listed in the
"area text file".

Oh, but then just add as the first statement of the 'if' block the
following line:

next unless in $props{'Area'}, @Areas;

Of course it is up to you to write a suitable 'in' sub (see a recent
thread on the subject too!) or substitute suitable code, and populate
@Areas. But that shouldn't be a problem...
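
One way to stand in for the 'in' sub is a hash lookup instead of a search through @Areas. The following is a minimal sketch under stated assumptions: the "area text file" holds one numeric area code per row, and the sub names `read_areas` and `extract_rows` are made up for the illustration. In-memory filehandles replace the real files.

```perl
use strict;
use warnings;

# Build a lookup hash from an area file with one area code per row.
sub read_areas {
    my ($fh) = @_;
    my %wanted;
    while ( my $line = <$fh> ) {
        next unless $line =~ /^\s*(\d+)\s*$/;    # skip blank or malformed rows
        $wanted{$1} = 1;
    }
    return \%wanted;
}

# Return pipe-joined records whose Area appears in the lookup hash.
sub extract_rows {
    my ( $fh, $wanted ) = @_;
    my ( %props, @rows );
    while ( my $line = <$fh> ) {
        next unless $line =~ /^(\w+)=(\d+)/;
        my ( $key, $value ) = ( $1, $2 );
        $props{$key} = $value;
        next unless $key eq 'PROPemp';           # PROPemp closes a record
        my $area = $props{Area};
        next unless defined $area && $wanted->{$area};
        push @rows, join '|',
            map { defined $_ ? $_ : '' }
            @props{qw(Area Company PROPdes PROPpri PROPemp)};
    }
    return @rows;
}

# Hypothetical area file content, plus part of the sample data from the post.
my $area_text = "3709\n";
my $data_text = join "\n",
    qw(Area=3706 Company=101 PROPdes=1 PROPpri=2 PROPemp=54
       Area=3709 Company=116 PROPdes=9 PROPpri=1 PROPemp=200), '';

open my $afh, '<', \$area_text or die "Cannot open area data: $!";
open my $dfh, '<', \$data_text or die "Cannot open property data: $!";
my @rows = extract_rows( $dfh, read_areas($afh) );
print "$_\n" for @rows;
```

A hash lookup costs the same however many areas are listed, whereas scanning @Areas for every record grows with the length of the list. The "more than 10 companies" condition would need a first pass over the data to count companies per area before any rows are printed.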

Michele
