Sort Part of a string

G

Gunnar Hjalmarsson

Anno said:
I told you why it matters. People who see your code will not
believe you are a competent Perl coder.

Or, worse, less experienced people may believe he is, while adopting a
similar misconception with respect to the /o operator.
 
A

Anno Siegel

Eugene Mikheyev said:
Let's play another game from here :)

my $re = qr{(\d+)\.log$};
...
sort {
my($a_part) = $a =~ $re;
my($b_part) = $b =~ $re;
$b_part <=> $a_part
}

:)

Okay, fine. Here's a possible improvement.

When the sort key must be extracted from each list element, that is
generally an invitation to use a Schwartz Transform. The Schwartz
Transform saves time by extracting the sort key only once from each
element, instead of doing it twice each time two elements must be
compared.

map $_->[ 0],
sort { $a->[ 1] <=> $b->[ 1] }
map [ $_, /(\d+)/ ],
@list;

This transformation only really pays when the lists to be sorted are
long (exactly how long depends on the time needed to extract the sort
key). In this case, the transformed code isn't longer than the
original, so one could decide to use it anyway.

Anno
 
E

Eugene Mikheyev

Frankly I don't care, as you are killfiled as of this moment.
Yes, please do, so I can never see your comments on my posts. Do that for
me, please.
 
N

norfernuman

I'm the perfect target audience some are talking about here.

I lurk here to learn about Perl. The programmers here have 'use strict' on.
That's about it. All errors are sent to the news reader. If not, dummies like me
might think that bad code is good code. This kind of strictness is actually
very real world, were this a work environment. A good manager would look point out all errors
in your code, not just the one you might ask him/her about.
Then they (most likely without much tact) ask you why you brought them such stupid code
and tell you to fix that junk before you show it to me again.
Doing so would be good for you, but I'll let you figure out why.

Apologize,.. I don't know why they would care.
Just get with the program.

Now I'll go back to "as the PG turns" ;-)

- NN
 
U

Uri Guttman

AS> Okay, fine. Here's a possible improvement.

AS> When the sort key must be extracted from each list element, that is
AS> generally an invitation to use a Schwartz Transform. The Schwartz
AS> Transform saves time by extracting the sort key only once from each
AS> element, instead of doing it twice each time two elements must be
AS> compared.

AS> map $_->[ 0],
AS> sort { $a->[ 1] <=> $b->[ 1] }
AS> map [ $_, /(\d+)/ ],
AS> @list;


and you can use Sort::Maker (fresh on CPAN) to generate that for you
from a clean specification of the sort.

uri
 
J

Joe Smith

Eugene said:
I need it in every piece of code that contains a constant regex compiled
multiple times.

No, you do not. Perl pre-compiles *all* regexes that have constant strings.
The only time /o is needed is if the regexp has a variable and you are
sure that the value of the variable does not change each time through
the loop.

Since /(\d+)\.log/ does not have any variables, the /o modifier does
absolutely nothing.
-Joe
 
D

David H. Adler

When the sort key must be extracted from each list element, that is
generally an invitation to use a Schwartz Transform.

More commonly called the "Schwartzian Transform".

Just wanted to keep up the level of pedantry. :)

dha
 
R

Randal L. Schwartz

Anno> When the sort key must be extracted from each list element, that is
Anno> generally an invitation to use a Schwartz Transform. The Schwartz
Anno> Transform saves time by extracting the sort key only once from each
Anno> element, instead of doing it twice each time two elements must be
Anno> compared.

A "Schwartz Transform" sounds like something from Spaceballs.
I think you mean "Schwartzian Transform".

And no, it wasn't named *by* me, but I'm the Schwartz in the
Schwartzian Transform.

print "Just another Perl hacker,"; # the original
 
A

Anno Siegel

Randal L. Schwartz said:
Anno> When the sort key must be extracted from each list element, that is
Anno> generally an invitation to use a Schwartz Transform. The Schwartz
Anno> Transform saves time by extracting the sort key only once from each
Anno> element, instead of doing it twice each time two elements must be
Anno> compared.

A "Schwartz Transform" sounds like something from Spaceballs.

What's spaceballs?

"Schwartz transform" reminds me of "Lapace transform". The analogy is
not in what it is, but how it is used: Transform a hard-to-solve
differential equation, solve it, and transform it back to a solution
of the original problem. Similarly you transform a hard-to-sort list
into an easy to-sort one, sort it and transform back.

The Laplace transform is not called "Laplacian transform". "Laplacian"
by itself exists, but it is short for "Laplacian operator", which is
a differential operator that has no similarity with the sorting operation.
I think you mean "Schwartzian Transform".

If that is what you prefer, that's what it is, never mind analogies.

Anno
 
D

David Combs

....

and you can use Sort::Maker (fresh on CPAN) to generate that for you
from a clean specification of the sort.

Uri -- did you ever get a look at the vast-speedup
to quicksort, discovered by Sedgewick, that I saw at
a talk and later mentioned to you?

Anyway, I happened to just run across it again,
with implementation, makefile, etc, and thought
you might be interested (given your interest in sorting).

Perhaps if you think worth while, you could forward
this addr, etc, to Randal and Damian and to the
guys working on P6.

Here's the lynx-output for the home-page of the sort-place:





A library of internal sorting routines
Ariel Faigon
________________________________________________________________

Keywords:
insertion sort, shellsort, radixsort, heapsort, quicksort,
quickersort, sedgesort (Robert Sedgewick quicksort
optimization)

This is a collection of sorting algorithms, in C. All the examples
are thoroughly tested using random input generation and
assertions, there are no known bugs. I've been using these,
especially the fastest ``sedgesort'', in production code.

The source of these algorithms are classic algorithm and C texts
which I most highly recommended, with minor modifications I made
over time:
* [1]Robert Sedgewick: Algorithms in C:
[2]Fundamentals, Data Structures, Sorting, Searching
* [3]Jon Louis Bentley: Programming Pearls
* [4]Kernighan & Ritchie: The C Programming Language

All the sorting routines are internal sorts, they assume your data
is in memory. When sorting big arrays that do not fit in memory on
virtual memory systems, performance may suffer significantly.

Certain current implementations of sorting in C++ standard
libraries like STL (the HP/SGI Standard Template Library which is
now a part of the GNU free software collection) are comparable in
speed to ``sedgesort''. Note however that the general C/Unix
``qsort'' routine is much slower due to its generality (e.g.
element compare is done via an indirect function call passed as a
function pointer.)

You may use this collection to trade generality for performance
and for instructional purposes.
_____________________________________________________________

You may [5]download the whole sort library or browse the
individual files below.

Here are some brief descriptions of the sort library source files:

[6]Makefile
Simple makefile to build the library sort.a and the test
program sorttest.

[7]sort.h
Header file. defines the key_t type, the type of the items we
are sorting, macros for key-comparisons: GT (greater than), LT
(less than), GE (greater or equal), LE (less or equal). SWAP
macro to swap elements in the array. Macros are used instead
of a compare function for speed. You may define all the above
to your liking and recompile the library.

[8]insort.c
Insertion sort. Most efficient for small arrays (about 15
elements or less). For greater arrays, use one of the
O(n*log(n)) methods. If you have small arrays, use this. Other
methods are an overkill, and are not faster.

[9]shellsort.c
Shell Sort. More efficient than insertion sort for arrays
larger than about 15 elements, but less efficient than the
O(n*log(n)) methods. Average time is K*O(n**1.2)

[10]gamasort.c
A radix sort based on bits in the binary representation of the
sorted numbers. Contributed by José Gama. Gamasort will work
only on unsigned integers in 2-complement representation. Thus
it is less general than the other methods, it also has a
relatively high complexity factor. On the positive side this
is an O(n*log(n)) algorithm in which the 'n' is proportional
to the largest number in the sorted array so it scales well to
the cases where the array is very big but the numbers in it
are relatively small like a million element array containing
unsigned bytes.

[11]quicksort.c
quicksort. Invented by C.A.R. Hoare. Very good average time
K*n*log(n), where the constant K is very small. but worst case
is bad: O(n*n). See, sedgesort for an impressive improvement.
^^^^^^^^^^

[12]quickersort.c
An optimization of quicksort. Use a sentinel containing
MAXINT, to make the constant K even smaller. This is similar
to the qsort libc function, but without the (*compar)() call
penalty.

[13]sedgesort.c
Robert Sedgewick optimization. Use partial quickersort with a
sentinel to reach a partial order, leave the unsorted segments
of length == CUTOFF unsorted. Then apply a simpler one-pass
low-overhead insertion sort on the whole array to reach
complete order. This method was found empirically to be one of
the fastest sorting methods in practice. If you don't believe
it, try to beat it. Note, in big arrays, worst cases are rare
in big arrays, so to keep the overhead to minimum, this sort
does not try to avoid worst cases (e.g. using median of three
to select a pivot).

[14]heapsort.c
Heap sort. This is on the average, slower than quicksort. But
is has the advantage that it worst case is still O(n*log(n))
Like quicksort, and other methods that pick far elements at
random to exchange it is not stable: elements of the same
magnitude, might change relative order. This is a relatively
optimized version of heapsort which minimizes neighboring
elements exchanges in the innermost loop, using an insertion
strategy (first find final location, only then exchange).
Lastly, if you want only the top N elements out of M elements,
(kinda priority queue) heap sort is the way to go; just create
a heap, then sift N times.

Auxilliary test routines:

+ [15]sorttest.c: Program containing main() to test the sort
library sort.a.
+ [16]bstest.c: Program containing main() to test the binary
search routine that is included here as well.
+ [17]sorted.c: Test routine to verify that an array is
sorted.
_____________________________________________________________

Feedback to:
Antispam Coalition
________________________________________________________________

If you used this, and found it useful, please let me know.
I'd be more inclined to enrich this collection if I know people find
it useful.

References

5. http://www.yendor.com/programming/sort/sort-all.tgz
6. http://www.yendor.com/programming/sort/Makefile
7. http://www.yendor.com/programming/sort/sort.h
8. http://www.yendor.com/programming/sort/insort.c
9. http://www.yendor.com/programming/sort/shellsort.c
10. http://www.yendor.com/programming/sort/gamasort.c
11. http://www.yendor.com/programming/sort/quicksort.c
12. http://www.yendor.com/programming/sort/quickersort.c
13. http://www.yendor.com/programming/sort/sedgesort.c
14. http://www.yendor.com/programming/sort/heapsort.c
15. http://www.yendor.com/programming/sort/sorttest.c
16. http://www.yendor.com/programming/sort/bstest.c
17. http://www.yendor.com/programming/sort/sorted.c


Hope this is useful to *someone*!

David
 
D

David Combs

Anno> When the sort key must be extracted from each list element, that is
Anno> generally an invitation to use a Schwartz Transform. The Schwartz
Anno> Transform saves time by extracting the sort key only once from each
Anno> element, instead of doing it twice each time two elements must be
Anno> compared.

A "Schwartz Transform" sounds like something from Spaceballs.
I think you mean "Schwartzian Transform".

And no, it wasn't named *by* me, but I'm the Schwartz in the
Schwartzian Transform.

Aren't you actually the *child* of Schwartz, and thus sadly
*limited* by that fact, such that while out walking on
especially dark nights, you come perilously close to
falling into deep holes?

David
 
T

Tassilo v. Parseval

Also sprach David Combs:
Uri -- did you ever get a look at the vast-speedup
to quicksort, discovered by Sedgewick, that I saw at
a talk and later mentioned to you?

Anyway, I happened to just run across it again,
with implementation, makefile, etc, and thought
you might be interested (given your interest in sorting).

Perhaps if you think worth while, you could forward
this addr, etc, to Randal and Damian and to the
guys working on P6.

Actually, you don't need to do that since they are most likely aware of
those concepts.
[13]sedgesort.c
Robert Sedgewick optimization. Use partial quickersort with a
sentinel to reach a partial order, leave the unsorted segments
of length == CUTOFF unsorted. Then apply a simpler one-pass
low-overhead insertion sort on the whole array to reach
complete order. This method was found empirically to be one of
the fastest sorting methods in practice. If you don't believe
it, try to beat it. Note, in big arrays, worst cases are rare
in big arrays, so to keep the overhead to minimum, this sort
does not try to avoid worst cases (e.g. using median of three
to select a pivot).

This is what Perl's implementation of quicksort is already doing. AFAIR,
perl uses a CUTOFF of 6 at which point it skips to insertion sort.
Furthermore, it does try to avoid worst case by using the median method
as mentioned above (also with three elements).

It features some other goodies which presumably makes it even faster
than the above implementation. For example, it never swaps elements that
need not be swapped and goes through quite some hoops to avoid it.

It's quite enlighting to have a look at pp_sort.c in the Perl source
distribution. It must be one of the most extensively commented files in
the source tree and it explains the algorithms in great detail.

Tassilo
 
R

Randal L. Schwartz

David> A library of internal sorting routines
David> Ariel Faigon

If this is not as loosely licensed as the Artistic License (which is
even broader than the BSD license, and FAR broader than the GPL), it
will not be useful, except to know that something is out there.

That's the cool thing/bad thing about Perl's licensing. It's
not compatible with the GPL.

print "Just another Perl hacker,"; # the original
 
R

Randal L. Schwartz

David> Aren't you actually the *child* of Schwartz, and thus sadly
David> *limited* by that fact, such that while out walking on
David> especially dark nights, you come perilously close to
David> falling into deep holes?

This makes absolutely no sense to me. If it's a reference
to a story somewhere, would you mind giving me a pointer?

Jokes only work when the audience understands the basis of the joke. :)

print "Just another Perl hacker,"; # the original
 
A

Anno Siegel

Randal L. Schwartz said:
David> Aren't you actually the *child* of Schwartz, and thus sadly
David> *limited* by that fact, such that while out walking on
David> especially dark nights, you come perilously close to
David> falling into deep holes?

This makes absolutely no sense to me. If it's a reference
to a story somewhere, would you mind giving me a pointer?

Jokes only work when the audience understands the basis of the joke. :)

Took me a while too.

Karl Schwarzschild, 1873-1916, Astrophysicist, namesake of the
Schwarzschild limit, which describes the conditions under which
black holes are formed. Now s/zs/tz/ and you're there.

Anno
 
D

David Combs

Took me a while too.

Karl Schwarzschild, 1873-1916, Astrophysicist, namesake of the
Schwarzschild limit, which describes the conditions under which
black holes are formed. Now s/zs/tz/ and you're there.

Anno

Some (many!) years back, Scientific American, for a while,
had it seems like a whole series of articles on the
I guess new idea of black holes, and his name was
all over the place. Like a couple of decades ago (or more?).

I suppose that today it's strings, branes, all that stuff --
not that I understand anything about it.

From watching cspan's cspan-2's 48-hours each weekend "book-tv",
which has had Brian Green on twice, for his two books

[] *** Brian Green, "the fabric of the cosmos"


[] ***** "The Elegant Universe", by Brian Greene (not! layman's expl).

(Winner of the 2000 Aventis Prize for the best scientific book of
the year, and a Pulitzer finalist.)

This guy (green) is some super-duper physicist (max planck lab?)
who works on string theory, and seems to have written two
really good books on that and other maybe-related subjects.

The "reviews" at Amazon are worth looking at.

David
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,159
Messages
2,570,879
Members
47,416
Latest member
LionelQ387

Latest Threads

Top