Sorting out sort

Peter Stokes

I'm trying to extract a column from a flatfile database and print it
alphabetically. I can get the data out, but I can't get it to sort.

The database is pipe-separated in a plain text file and looks like
this:

Ref_no|Title|County|Another_field|And_another|And_another
1234|A Name|Devon|Data here|More data here|And more data
1234|A Name|Somerset|Data here|More data here|And more data
1234|A Name|Nottinghamshire|Data here|More data here|And more data
1234|A Name|Essex|Data here|More data here|And more data
.... and so on

My routine is as follows:

#open the file
open (INFILE, 'database.txt') or die "could not open listing file - $!";
while (<INFILE>) {
#split the array at the separator
@out = split /\|/;
#select the field with the county names
@out = $out[2];
#clean out duplicates
@out = grep ( (($out{$_}++ < 1) || 0), @out);
#delete blank lines left by duplicates
if (@out == 1) {
print sort @out, "\n";
}
else {
print "";
}
}
#close the file to tidy up
close INFILE;
#eof

When I print out the results, I get, typically:
Devon
Somerset
Nottinghamshire
Essex

.... which is the order they appear in the database, and they haven't
sorted alphabetically. I've tried it from every angle I can think of, but
I must be missing one because I can't make this result alphabetical. I
get the same result wherever I put the 'sort' command.

Thanks in anticipation
 
Tad McClellan

Peter Stokes said:
I can get the data out, but I can't get it to sort.
^^^^^^^^^^^^^^^^^^^^^^

Not really. Your problem is with getting the data, not with
sorting the data...

#select the field with the county names
@out = $out[2];


Stomps over all of the elements in @out, and puts a single element
in their place.

#clean out duplicates
@out = grep ( (($out{$_}++ < 1) || 0), @out);


@out can never have more than a single element in it.

A single element cannot contain duplicates.

This code serves no purpose.

if (@out == 1) {
print sort @out, "\n";


sort()ing a 1-element list is a rather trivial thing to do, so why do it?

I get the same result wherever I put the 'sort' command.


sort() operates on a list of values.

Your code does not construct a list of values.

You do not have a problem sorting, you have a problem building up
a list that you can then sort.


# untested
my @out;
while ( <INFILE> ) {
    push @out, (split /\|/)[2]; # a "list slice"
}
print "$_\n" for sort @out; # the county field has no newline of its own


or to eliminate duplicates:

# untested
my %out;
while ( <INFILE> ) {
    $out{ (split /\|/)[2] } = 1; # a "list slice" used as a hash key
}
print "$_\n" for sort keys %out; # keys takes the hash %out, not an array
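
For what it's worth, the grep idiom from the original post does do its job once it is handed the complete list of counties instead of one element at a time. A minimal sketch, assuming the records are already in memory; the unique_counties name and the sample @records are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct third fields of pipe-separated
# records, sorted alphabetically.
sub unique_counties {
    my @records = @_;
    my %seen;
    # Collect the county (third field) of every record first...
    my @counties = map { my @f = split /\|/; $f[2] } @records;
    # ...then keep only the first occurrence of each and sort what is left.
    my @sorted = sort grep { !$seen{$_}++ } @counties;
    return @sorted;
}

my @records = (
    '1234|A Name|Devon|Data here|More data here|And more data',
    '1234|A Name|Somerset|Data here|More data here|And more data',
    '1234|A Name|Devon|Data here|More data here|And more data',
    '1234|A Name|Nottinghamshire|Data here|More data here|And more data',
    '1234|A Name|Essex|Data here|More data here|And more data',
);
print "$_\n" for unique_counties(@records);
```

With the sample records above this should print Devon, Essex, Nottinghamshire and Somerset, one per line. The point is only the order of operations: gather everything, then de-duplicate, then sort.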
 
John Strauss

On 29 Jul 2003 07:55:58 -0700
[snip]

working in england, i've had a few beers at lunch,
so take this with a grain of salt...
unless you sort in your open()'d command as in:
open(INFILE, "sort -t'|' -k3,3 database.txt |") or blah; #very unix-y; county is the 3rd pipe-separated field

then you must first read the database.txt file in,
and sort it based on the County name:
my @db=();
open(INFILE, "<database.txt") or die "cannot open file, '$!'";
while (<INFILE>) {
chomp;
my @out = split(/\|/,$_);
push @db, \@out;
}
# $e->[2] is the county name
foreach my $e (sort {$a->[2] cmp $b->[2]} @db) {
print "$e->[2]: @$e\n";
}
use a hash instead of @db if you want to drop duplicate
counties.
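
A rough sketch of that hash variant, assuming you want to keep the first record seen for each county and ignore later ones; the index_by_county name and the sample lines are hypothetical, not part of the code above:

```perl
use strict;
use warnings;

# Hypothetical helper: index records by county, keeping the first record
# seen for each county. Returns a hash reference.
sub index_by_county {
    my @lines = @_;
    my %db;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my @fields = split /\|/, $copy;
        next unless defined $fields[2];     # skip short or blank lines
        $db{ $fields[2] } ||= \@fields;     # later duplicates are ignored
    }
    return \%db;
}

my $db = index_by_county(
    "1234|A Name|Somerset|Data here|More data here|And more data\n",
    "1234|A Name|Devon|Data here|More data here|And more data\n",
    "1234|A Name|Somerset|Data here|More data here|And more data\n",
);
# Hash keys are unique, so sorting them gives each county exactly once.
for my $county (sort keys %$db) {
    print "$county: @{ $db->{$county} }\n";
}
```

Whether to keep the first or the last record per county is a design choice; swapping ||= for = would keep the last one instead.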

regarding your example, i can see where @out is set,
but i don't see how %out (from your grep) is populated.
you've either not shown enough code, or you are not
using the "use warnings" and "use strict" directives.

for the definitive perldoc answer to your question run:
perldoc -q "How do I sort a"



 
JR

### You're sorting as you get the data, rather than getting all
### the data and then sorting. The below changes should help.

### [tested once]

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

my (@out, %out);

## Get data
while (<DATA>) {
my ($refNo, $title, $county, $otherFields) = split /\|/, $_, 4;
push @out, $county;
}

$out{$_} = 1 for @out; # Remove duplicates
print "$_\n" for sort keys %out; # Sort and print counties

__DATA__
Ref_no|Title|County|Another_field|And_another|And_another
1234|A Name|Devon|Data here|More data here|And more data
1234|A Name|Somerset|Data here|More data here|And more data
1234|A Name|Devon|Data here|More data here|And more data
1234|A Name|Nottinghamshire|Data here|More data here|And more data
1234|A Name|Essex|Data here|More data here|And more data
 
James E Keenan

My routine is as follows:
[snip]
while (<INFILE>) {
#split the array at the separator
@out = split /\|/;
#select the field with the county names
@out = $out[2];

Here's your problem (or, at least, "a" problem). You're using @out
for 2 contradictory purposes. On the one hand, you're using it to
temporarily hold the fields for each record in turn. On the other hand,
you're using it to collect the data you *really* want: the counties.

Suggestion: before entering the 'while' loop, define a hash which can
function as a "seen-hash":

my %counties;

Then, for each line/data record:

my @temp = split /\|/;
$counties{$temp[2]}++ unless exists $counties{$temp[2]};

Then, when you've worked thru every line/record, you can get a sorted
list:

my @out = sort keys %counties;

and you can print it out as you wish. HTH.

Jim Keenan
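
Put together, the steps in this reply might look like the sketch below. It is an illustration under a few assumptions rather than Jim's exact code: the sorted_counties name and its $has_header flag are hypothetical, the first line is assumed to be the column-name row shown in the sample data, and an in-memory filehandle stands in for database.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: read pipe-separated records from an open filehandle
# and return the distinct counties (third field), sorted alphabetically.
sub sorted_counties {
    my ($fh, $has_header) = @_;
    my %counties;                           # the "seen-hash"
    while (my $line = <$fh>) {
        next if $has_header && $. == 1;     # assumption: line 1 holds column names
        chomp $line;
        my @temp = split /\|/, $line;
        next unless defined $temp[2] && length $temp[2];
        $counties{ $temp[2] }++;            # counts occurrences; keys stay unique
    }
    my @out = sort keys %counties;
    return @out;
}

# In-memory sample data in place of the real file.
my $sample = <<'END';
Ref_no|Title|County|Another_field|And_another|And_another
1234|A Name|Devon|Data here|More data here|And more data
1234|A Name|Somerset|Data here|More data here|And more data
1234|A Name|Devon|Data here|More data here|And more data
1234|A Name|Essex|Data here|More data here|And more data
END
open my $fh, '<', \$sample or die "could not open sample data - $!";
print "$_\n" for sorted_counties($fh, 1);
close $fh;
```

For the real file, the same sub should work with a handle from open my $fh, '<', 'database.txt'. Pass a false second argument if the file has no column-name row.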
 
Peter Stokes

Many thanks for all your help - Tad, the explanation makes complete
sense, now I see where I was taking the wrong approach. JR, your
solution worked a treat, straight out of the box.

regards and best wishes

peter stokes
 
