gamo said:
...and it definitively solves the problem, but I wonder if it
could be optimized even more. I.e., given a large list, could one
predict when the candidates will follow a pattern of success?
Yes, but that is not going to get you an orders-of-magnitude improvement.
The first thing I did was translate the algorithm already described
(including the part about fixing one of the "a"s to be in the first slot,
which means fewer permutations plus fewer rotations: you only have to find
the other "a" and rotate to it, rather than doing all 20 rotations) into
C, because that will get you orders of magnitude of improvement, and my
gut tells me that nothing left that you can do in Perl will (other than
large-scale parallelization, assuming you have a 100-CPU cluster lying
around). Also, at this point it was still pretty straightforward to
translate to C, whereas after more optimization done in Perl the
translation would have become far harder.
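To make the "only one rotation to check" point concrete, here is a whole-string version of the test as a C sketch under the stated assumptions (exactly two "a"s, and 'a' the alphabetically smallest letter); is_canonical and its signature are my illustration, not the actual code:

```c
/* Sketch: with s[0] == 'a' and exactly one other 'a' at slot p, the
 * only rotation that can sort before the string is the one starting
 * at p; every other rotation starts with a letter bigger than 'a'.
 * Returns 1 if s is <= that rotation (passes the test), else 0. */
static int is_canonical(const char *s, int n, int p)
{
    for (int j = 0; j < n; j++) {
        char orig = s[j];
        char rot  = s[(p + j) % n]; /* the rotated string's j-th character */
        if (orig < rot) return 1;   /* original sorts first: keep it */
        if (orig > rot) return 0;   /* rotation sorts first: reject */
    }
    return 1;                       /* equal to its own rotation: keep it */
}
```

So instead of 20 rotate-and-compare passes per candidate there is one, and it usually stops after a character or two.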
The next optimization is to hack the dpermute code itself (as opposed to
the callback made from it) in the manner you describe, so that as soon
as the second 'a' is placed into the variable 'prefix', I can start
testing whether the "rotated" string is, to our knowledge so far,
going to be less than, greater than, or undecidable compared to the
original string. If the rotated string will be less than the original,
then the candidate fails the self-canonicalization test and I can abort
now, before filling in any more letters. If the rotated string will be
greater than the original, then I know that any string I make by filling
in more letters will pass the canonicalization test, so I can set a flag
so that from here (recursively) on, no more lexical-order testing is
needed: it passes by induction. If it is undecidable, then I just have
to do the test-skip-flag thing when the next letter is added. (And of
course I only need to test the added letter against its mate; I don't
have to test all preceding letters, because I know they are equal;
otherwise I wouldn't have reached this point without the flag being set.)
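A minimal sketch of how that incremental test could sit inside a dpermute-style recursion (the names compare_so_far and count_canonical, and the 26-entry counts[] representation, are my own illustration, not the actual program): the recursion carries a frontier index jnext into the original-vs-rotation comparison and advances it whenever both characters of the next pair have been placed.

```c
enum { UNDECIDED, PASSES };  /* state of the original-vs-rotation comparison */

static char buf[64];
static int  n, p;            /* total length; slot of the second 'a' (-1 = unplaced) */
static long found;

/* Advance the running comparison of buf against its rotation starting at
 * slot p, using only slots filled so far (0..i).  Returns +1 if the
 * original sorts first (every extension passes by induction), -1 if the
 * rotation sorts first (abort now), 0 if still undecidable. */
static int compare_so_far(int i, int *jnext)
{
    while (*jnext < n) {
        int mate = (p + *jnext) % n;          /* rotated string's char at *jnext */
        if (*jnext > i || mate > i) return 0; /* one side not placed yet */
        if (buf[*jnext] != buf[mate])
            return buf[*jnext] < buf[mate] ? 1 : -1;
        (*jnext)++;
    }
    return 1;                                 /* equals its own rotation: keep it */
}

/* dpermute-style recursion: fill slot i from the remaining letter counts. */
static void dpermute(int i, int counts[26], int state, int jnext)
{
    if (i == n) { found++; return; }
    for (int c = 0; c < 26; c++) {
        if (!counts[c]) continue;
        buf[i] = (char)('a' + c);
        int st = state, jn = jnext;
        if (c == 0) { p = i; jn = 1; }        /* second 'a' placed; j = 0 is 'a' vs 'a' */
        if (p >= 0 && st == UNDECIDED) {
            int cmp = compare_so_far(i, &jn);
            if (cmp < 0) { if (c == 0) p = -1; continue; }  /* prune this branch */
            if (cmp > 0) st = PASSES;         /* no more testing needed below here */
        }
        counts[c]--;
        dpermute(i + 1, counts, st, jn);
        counts[c]++;
        if (c == 0) p = -1;
    }
}

/* counts[] must hold exactly two 'a's, with 'a' the rarest letter. */
static long count_canonical(int counts[26])
{
    n = 0;
    for (int c = 0; c < 26; c++) n += counts[c];
    buf[0] = 'a';                             /* fix one 'a' in the first slot */
    counts[0]--;                              /* one 'a' left to place */
    p = -1; found = 0;
    dpermute(1, counts, UNDECIDED, 0);
    counts[0]++;
    return found;
}
```

For the small multiset {a,a,b,c} this walks the pruned tree and finds the three canonical strings aabc, aacb, and abac, aborting branches like abc... and ac... as soon as the comparison decides against them.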
After all that, I got C code that could generate all 4,888,643,760
answers for the size 2-3-4-5-6 problem in about one hour, on a 900 MHz
processor, using negligible memory.
Of course, if there were not exactly two instances of the rarest letter,
then this optimization would not work. A general-case optimization
should be possible, but would be much more difficult. (And if the rarest
letter were not also the alphabetically smallest, then rather than
changing my code to work under that scenario, I would just remap the
alphabet so that the rarest letter was the smallest, and then use either
Perl's tr/// or Linux's "tr" to map the answers back to the original
alphabet.)
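A sketch of that remapping idea in C (build_maps and map_back are hypothetical helpers standing in for what Perl's tr/// or the tr(1) utility would do to the output):

```c
#include <string.h>

/* Hypothetical sketch: permute the alphabet so the rarest letter
 * becomes 'a', generate in the working alphabet, translate answers
 * back.  fwd maps original letter -> working letter, rev maps a
 * working-letter index -> original letter. */
static void build_maps(const int counts[26], char fwd[26], char rev[26])
{
    int order[26], used = 0;
    for (int c = 0; c < 26; c++)
        if (counts[c] > 0) order[used++] = c;
    /* rarest letter first, ties broken alphabetically */
    for (int i = 0; i < used; i++)
        for (int j = i + 1; j < used; j++)
            if (counts[order[j]] < counts[order[i]] ||
                (counts[order[j]] == counts[order[i]] && order[j] < order[i])) {
                int t = order[i]; order[i] = order[j]; order[j] = t;
            }
    for (int i = 0; i < used; i++) {
        fwd[order[i]] = (char)('a' + i);       /* original -> working */
        rev[i]        = (char)('a' + order[i]); /* working -> original */
    }
}

/* Translate one generated answer back to the original alphabet, in place. */
static void map_back(char *s, const char rev[26])
{
    for (; *s; s++)
        *s = rev[*s - 'a'];
}
```

In practice you would generate with the remapped counts and pipe every answer through map_back (or, as above, just through tr) on the way out.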
But this is all because I think it's fun. If I were doing this for real,
I'd probably be asking, "What the heck am I going to do with over 4
billion 20-letter strings?" and maybe I should focus my optimization on
the use of them rather than the generation of them.
Xho