sort problem

S

sstark

This is probably a dumb question, but how do I sort so that symbols
always appear before numbers?

foreach my $a (sort keys %HASH){
print $a, "\n";
}

My %HASH keys have a lot of symbols and numbers as their first
characters.

I'm getting a sort order where some symbols (!, %, ", -, etc.) print
first, then any digits print, then another set of symbols :), =, @, [,
\, ^) prints after the digits. I understand this has something to do
with the byte order of the character set.

How can I get all of the symbols to print first, and then the digits,
and then the alpha characters?

thanks
Scott
 
I

it_says_BALLS_on_your forehead

sstark said:
This is probably a dumb question, but how do I sort so that symbols
always appear before numbers?

foreach my $a (sort keys %HASH){
print $a, "\n";
}

My %HASH keys have a lot of symbols and numbers as their first
characters.

I'm getting a sort order where some symbols (!, %, ", -, etc.) print
first, then any digits print, then another set of symbols :), =, @, [,
\, ^) prints after the digits. I understand this has something to do
with the byte order of the character set.

How can I get all of the symbols to print first, and then the digits,
and then the alpha characters?

instead of a hash, you could use an array, and make sure you push all
of the symbols in before the numbers and letters. or, you could split
up the %HASH into %with_symbols and %no_symbols.
 
B

Big and Blue

You write a custom subroutine to pass to sort() that enforces whatever
collating sequence you wish. For your requirements this will be
highly non-trivial.

IIRC its just a few lines. Somewhere around (at work) I have such code
(to cater for hostnames that start with characters and end with a numeric tag).
 
A

Anno Siegel

You write a custom subroutine to pass to sort() that enforces whatever
collating sequence you wish. For your requirements this will be
highly non-trivial.

Not so hard. The requirements are, punctuation first, then digits,
then alpha, then (I'm amending) everything else. This can be taken
almost literally to produce a hash table that has, for each character,
a sort value in the required sequence:

my %order;
# Set $order{ $char} to increasing values in the sequence we want
# to order them
my @ascii = map chr, 0 .. 255;
my $i = 0;
$order{ $_} = $i ++ for grep /[[:punct:]]/, @ascii;
$order{ $_} = $i ++ for grep /[[:digit:]]/, @ascii;
$order{ $_} = $i ++ for grep /[[:alpha:]]/, @ascii;
# everything else sorts higher
$order{ $_} = $i ++ for grep ! exists $order{ $_}, @ascii;

# characters are easier to use later
$_ = chr for values %order;

We can use this table in a regex substitution s/(.)/$order{ $1}/gs
to extract a sort key from every string that will compare in the
required order. We don't care what it looks (prints) like.

Given a hash with an assortment of keys

my %hash;
$hash{ $_} = 'low punct' for qw( ! % " -);
$hash{ $_} = 'high punct' for qw( : = @ [ \ ^);
$hash{ $_} = 'number' for qw( 1 2 123 99999999);
$hash{ $_} = 'alpha' for qw( abc def ghi);
$hash{ $_} = 'mixed' for qw( @mpersand bang!bang! 12GH45);

we can sort the keys in the required order. Because key extraction is
heavy duty, we use a Schwartz transform:

print "$_ => $hash{ $_}\n" for
map $_->[ 1],
sort { $a->[ 0] cmp $b->[ 0] }
map { ( my $key = $_) =~ s/(.)/$order{ $1}/gs; [ $key, $_] }
keys %hash;

Since the s/// operation is on single characters, it is tempting to
use tr/// to speed up key extraction. That can be done, but tr/// doesn't
do run-time interpolation. Some trickery involving string-eval is
required to compile the tr/// operation from the run-time calculated
table.

Anno
 
D

Dr.Ruud

Anno Siegel schreef:
Since the s/// operation is on single characters, it is tempting to
use tr/// to speed up key extraction. That can be done, but tr///
doesn't do run-time interpolation. Some trickery involving
string-eval is required to compile the tr/// operation from the
run-time calculated table.

I tried, but the backslash poses problems:

#!/usr/bin/perl
use strict;
use warnings;

my $fr = chr( 0)x256;

for (0..255) { substr($fr, $_, 1) = chr($_) };

my $to = chr(255)x256;
my $i = 0;
for (0..255) { substr($to, $_, 1) = chr($i++)
if chr =~ /^[[:punct:]]$/ };
for (0..255) { substr($to, $_, 1) = chr($i++)
if chr =~ /^[[:digit:]]$/ };
for (0..255) { substr($to, $_, 1) = chr($i++)
if chr =~ /^[[:alpha:]]$/ };
for (0..255) { substr($to, $_, 1) = chr($i++)
if chr(255) eq substr($to, $_, 1) };
my %hash;
$hash{$_} = [ 'low punct' ] for qw( ! % " - );
$hash{$_} = [ 'high punct' ] for qw( : = @ [ \ ^ );
$hash{$_} = [ 'number' ] for qw( 1 2 123 9999 );
$hash{$_} = [ 'alpha' ] for qw( abc def ghi );
$hash{$_} = [ 'mixed' ] for qw( @mp bang! 12GH45 );

for my $key ( keys %hash ) {
$_ = $key;
eval "tr{$fr}{$to}"; die $@ if $@;
$hash{ $key }[1] = $_;
}

{ local ($,, $\) = ("\t\t", "\n");
print($_, $hash{$_}[0], "\Q$hash{$_}[1]")
for (sort {$hash{$a}[1] cmp $hash{$b}[1]} keys %hash);
}
 
U

Uri Guttman

AS> # Set $order{ $char} to increasing values in the sequence we want
AS> # to order them
AS> my @ascii = map chr, 0 .. 255;
AS> my $i = 0;
AS> $order{ $_} = $i ++ for grep /[[:punct:]]/, @ascii;
AS> $order{ $_} = $i ++ for grep /[[:digit:]]/, @ascii;
AS> $order{ $_} = $i ++ for grep /[[:alpha:]]/, @ascii;

one line could do all that:

$order{$_} = chr( $i++ ) for
grep( /[[:punct:]]/, @ascii ),
grep( /[[:digit:]]/, @ascii ),
grep( /[[:alpha:]]/, @ascii ) ;

AS> # everything else sorts higher
AS> $order{ $_} = $i ++ for grep ! exists $order{ $_}, @ascii;

AS> # characters are easier to use later

you should say single bytes are easier to use that integers.

AS> $_ = chr for values %order;

you could just put chr before $i++ in the above grep lines and remove
that conversion loop. my version has the chr in it.

AS> we can sort the keys in the required order. Because key extraction is
AS> heavy duty, we use a Schwartz transform:

AS> print "$_ => $hash{ $_}\n" for
AS> map $_->[ 1],
AS> sort { $a->[ 0] cmp $b->[ 0] }
AS> map { ( my $key = $_) =~ s/(.)/$order{ $1}/gs; [ $key, $_] }
AS> keys %hash;

or you could use sort::maker to generate the ST or GRT for
you. (untested but the module is well tested :)

use Sort::Maker ;
my $sorter = make_sorter(
GRT => 1,
string => '(my $key = $_) =~ s/(.)/$order{ $1}/gs; $key',
) ;


print map "$_ => $hash{ $_}\n", $sorter->( keys %hash ) ;

# that key extraction line can also be written:

string => 'join "", map $order{$_}, split //',

AS> Since the s/// operation is on single characters, it is tempting to
AS> use tr/// to speed up key extraction. That can be done, but tr/// doesn't
AS> do run-time interpolation. Some trickery involving string-eval is
AS> required to compile the tr/// operation from the run-time calculated
AS> table.

i think there could be a better way but i don't have the spare neurons
to attach to this problem. the speedup of tr may be lost by the need to
build up the tr code and eval it. this will depend on the size of the
strings and the number of hash keys. and as dr. ruud has shown,
generating syntax clean code with funny chars can be tricky. i had fun
doing it with sort::maker.

uri
 
A

Anno Siegel

Dr.Ruud said:
Anno Siegel schreef:


I tried, but the backslash poses problems:

That can be fixed by quotemeta (applied as \Q...\E pairs below).
#!/usr/bin/perl
use strict;
use warnings;

my $fr = chr( 0)x256;

for (0..255) { substr($fr, $_, 1) = chr($_) };

my $to = chr(255)x256;
my $i = 0;
for (0..255) { substr($to, $_, 1) = chr($i++)
if chr =~ /^[[:punct:]]$/ };
for (0..255) { substr($to, $_, 1) = chr($i++)
if chr =~ /^[[:digit:]]$/ };
for (0..255) { substr($to, $_, 1) = chr($i++)
if chr =~ /^[[:alpha:]]$/ };
for (0..255) { substr($to, $_, 1) = chr($i++)
if chr(255) eq substr($to, $_, 1) };
my %hash;
$hash{$_} = [ 'low punct' ] for qw( ! % " - );
$hash{$_} = [ 'high punct' ] for qw( : = @ [ \ ^ );
$hash{$_} = [ 'number' ] for qw( 1 2 123 9999 );
$hash{$_} = [ 'alpha' ] for qw( abc def ghi );
$hash{$_} = [ 'mixed' ] for qw( @mp bang! 12GH45 );

for my $key ( keys %hash ) {
$_ = $key;
eval "tr{$fr}{$to}"; die $@ if $@;

eval "tr{\Q$fr\E}{\Q$to\E}"; die $@ if $@;
$hash{ $key }[1] = $_;
}

{ local ($,, $\) = ("\t\t", "\n");
print($_, $hash{$_}[0], "\Q$hash{$_}[1]")
for (sort {$hash{$a}[1] cmp $hash{$b}[1]} keys %hash);
}

I'd factor "eval" out of the loop:

my $trans = eval "sub { tr{\Q$fr\E}{\Q$to\E} }" or die $@;

for my $key ( keys %hash ) {
$_ = $key;
$trans->();
$hash{ $key }[1] = $_;
}

Anno
 
A

Anno Siegel

Uri Guttman said:
AS> # Set $order{ $char} to increasing values in the sequence we want
AS> # to order them
AS> my @ascii = map chr, 0 .. 255;
AS> my $i = 0;
AS> $order{ $_} = $i ++ for grep /[[:punct:]]/, @ascii;
AS> $order{ $_} = $i ++ for grep /[[:digit:]]/, @ascii;
AS> $order{ $_} = $i ++ for grep /[[:alpha:]]/, @ascii;

one line could do all that: ^^^^
expression


$order{$_} = chr( $i++ ) for
grep( /[[:punct:]]/, @ascii ),
grep( /[[:digit:]]/, @ascii ),
grep( /[[:alpha:]]/, @ascii ) ;

Right. I thought it would lose readability, but it's okay.
AS> # everything else sorts higher
AS> $order{ $_} = $i ++ for grep ! exists $order{ $_}, @ascii;

AS> # characters are easier to use later

you should say single bytes are easier to use that integers.

No, i mean "characters". The fact that they all happen to be
single-byte characters is irrelevant. The salient point is that
they are strings, not integers.

[snip]
AS> Since the s/// operation is on single characters, it is tempting to
AS> use tr/// to speed up key extraction. That can be done, but tr/// doesn't
AS> do run-time interpolation. Some trickery involving string-eval is
AS> required to compile the tr/// operation from the run-time calculated
AS> table.

i think there could be a better way but i don't have the spare neurons
to attach to this problem. the speedup of tr may be lost by the need to
build up the tr code and eval it. this will depend on the size of the
strings and the number of hash keys.

String eval needs only to be used once per program run. If tr/// is
faster than s/// at all, for sufficiently large/many hashes to sort,
the tr/// solution will win out.
and as dr. ruud has shown,
generating syntax clean code with funny chars can be tricky.

Nothing quotemeta can't fix.

Anno
 
U

Uri Guttman

AS> # Set $order{ $char} to increasing values in the sequence we want
AS> # to order them
AS> my @ascii = map chr, 0 .. 255;
AS> my $i = 0;
AS> $order{ $_} = $i ++ for grep /[[:punct:]]/, @ascii;
AS> $order{ $_} = $i ++ for grep /[[:digit:]]/, @ascii;
AS> $order{ $_} = $i ++ for grep /[[:alpha:]]/, @ascii;AS> ^^^^
AS> expression

statement :)

and here is another fun way to do that:

my $i = 0 ;
foreach my $re ( qr/[[:punct:]]/, qr/[[:digit:]]/, /[[:alpha:]]/ ) {

my @chars = grep /$re/, @ascii ;
@order{@chars} = ($i) x @chars ;
$i++ ;
}

note that all letters in a group get the same ordering. this assumes
that you will also sort on the actual letter after you sort on the type
of letter.

AS> No, i mean "characters". The fact that they all happen to be
AS> single-byte characters is irrelevant. The salient point is that
AS> they are strings, not integers.

i agree they aren't integers. bytes also covers that. i consider perl
strings as strings of bytes and only as characters when you are
interpreting them as text. this is because you can have binary data in
perl strings. it is a minor difference in how i think about them.

AS> Nothing quotemeta can't fix.

it won't even need that. i think he just needed to use the right delimiters
and not the default /. something like this (untested):

my $replace = join '', map $hash{$_}, @ascii ;
my tr_sub = eval "sub tr{\\x00-\\xff}{$replace}" ;

since in your design the sort order values that are {} will also be
paired and in the same order, you can have them cleanly nested in the
tr's {}. if you left / as the delim you would have to escape it.

note that the tr code has to be a sub if you wanted to do this with the
regular ST (unless you generated the code for the entire ST). in
sort::maker you could actually use the tr/// code text inside a key
extraction string since it will be evaled later on.

uri
 
B

Big and Blue

Big said:
IIRC its just a few lines. Somewhere around (at work) I have such
code (to cater for hostnames that start with characters and end with a
numeric tag).

OK - here it is. It was written to handler hostnames which, in general
look something like sys001, but contains the odd sys002a, sysoo2b as well.

sub by_myname {
my ($astem, $anum, $atag) = ($a =~ /^(\D*)(\d*)(.*)$/);
my ($bstem, $bnum, $btag) = ($b =~ /^(\D*)(\d*)(.*)$/);
return (($astem cmp $bstem) or ($anum <=> $bnum) or ($atag cmp > $btag));
}
my @hosts = sort by_myname @hlist;

If you don't have any trailing bits after your numbers remove the
$atag, $btag bits.
 
D

Dr.Ruud

Anno Siegel schreef:
I'd factor "eval" out of the loop:

my $trans = eval "sub { tr{\Q$fr\E}{\Q$to\E} }" or die $@;

for my $key ( keys %hash ) {
$_ = $key;
$trans->();
$hash{ $key }[1] = $_;
}

Revision:

#!/usr/bin/perl
use strict; use warnings;

my $to = "\xFF" x 0x100;

my $i = 0;
for my $re ( qr/[[:punct:]]/
, qr/[[:digit:]]/
, qr/[[:alpha:]]/
, qr/[^[:punct:][:digit:][:alpha:]]/ ) {
for (0x00..0xFF) { substr($to, $_, 1) = chr($i++) if chr =~ /$re/ };
}
my $trans = eval "sub { tr{\\x00-\\xFF}{$to} }" or die $@;

my %hash;
$hash{$_} = ['lo punct', undef] for qw(! % " -);
$hash{$_} = ['hi punct', undef] for qw:) = @ [ \ ^);
$hash{$_} = ['number' , undef] for qw(1 2 123 9999);
$hash{$_} = ['alpha' , undef] for qw(ab CD ef GH ij KL mn);
$hash{$_} = ['mixed' , undef] for qw(@mp bang! 12gh45);

for my $key (keys %hash) {
$_ = $key;
$trans->();
$hash{$key}[1] = $_;
}

{ local ($,, $\) = ("\t\t", "\n");

print( $_, $hash{$_}[0] )
for (sort { $hash{$a}[1] cmp $hash{$b}[1] } keys %hash);
}
 
R

robic0

Anno Siegel schreef:
I'd factor "eval" out of the loop:

my $trans = eval "sub { tr{\Q$fr\E}{\Q$to\E} }" or die $@;

for my $key ( keys %hash ) {
$_ = $key;
$trans->();
$hash{ $key }[1] = $_;
}

Revision:

#!/usr/bin/perl
use strict; use warnings;

my $to = "\xFF" x 0x100;

my $i = 0;
for my $re ( qr/[[:punct:]]/
, qr/[[:digit:]]/
, qr/[[:alpha:]]/
, qr/[^[:punct:][:digit:][:alpha:]]/ ) {
for (0x00..0xFF) { substr($to, $_, 1) = chr($i++) if chr =~ /$re/ };
}
my $trans = eval "sub { tr{\\x00-\\xFF}{$to} }" or die $@;

my %hash;
$hash{$_} = ['lo punct', undef] for qw(! % " -);
$hash{$_} = ['hi punct', undef] for qw:) = @ [ \ ^);
$hash{$_} = ['number' , undef] for qw(1 2 123 9999);
$hash{$_} = ['alpha' , undef] for qw(ab CD ef GH ij KL mn);
$hash{$_} = ['mixed' , undef] for qw(@mp bang! 12gh45);

for my $key (keys %hash) {
$_ = $key;
$trans->();
$hash{$key}[1] = $_;
}

{ local ($,, $\) = ("\t\t", "\n");

print( $_, $hash{$_}[0] )
for (sort { $hash{$a}[1] cmp $hash{$b}[1] } keys %hash);
}

Holy crap Anno, thats something you don't see very often.
Playing with the default variable again?

$hash{$_} =
$hash{$_} =
$hash{$_} =
$hash{$_} =
$hash{$_} =
 
S

sstark

Wow, thanks for all the useful replies, looks like it wasn't such a
dumb question after all.

best,
Scott
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,183
Messages
2,570,968
Members
47,524
Latest member
ecomwebdesign

Latest Threads

Top