Velocity Reviews > Perl > Circular lists

# Circular lists

Xho Jingleheimerschmidt
Guest
Posts: n/a

 01-18-2009
gamo wrote:
> On Thu, 15 Jan 2009, Xho Jingleheimerschmidt wrote:
>
>> gamo wrote:
>>
>>> The thing could change radically if there is a method to canonicalize all
>>> the rotations of a list in a compact string. Did you say that is possible?

>> Of course. From a previous post:
>>
>> my \$s = join '',@set;
>> my \$two = \$s . \$s;
>> next if (\$two =~ /gg/);
>> my (\$canon) = sort map {substr \$two,\$_,length \$s} 0..length(\$s) -1;
>>

>
> That's fine, but don't works for handling the circular permutations.

Of course it does. That is the only thing it works for. That is the
whole point of the code. When I originally posted the code, it was
tested and verified to work, converging on the same answer as the other
implementation did.

> It detects distintinct permutations

Distinct permutations need no canonicalization. They do not have
multiple different representations of the same underlying conceptual
thing, so canonicalization is neither necessary nor possible. (Unless
you actually did append a digit to each letter to indicate the original
order, as someone proposed as a pedagogical exercise. Then stripping
that digit off would be a form of canonicalization.)

> but not the circular ones, as far as
> I tested. Probably the problem comes for taking the first element of the
> sorted list (only).

Taking just the first element of the sorted list (of strings, which are
themselves implicit lists of letters) is the whole point. If, for some
reason, you later need all the rotations, not just the canonical one,
you can recompute then on the fly given the canonical one. I can't
comment on what your tests were showing you, unless you
post the test code.

....

> In the case of 20 letters it has 20*20 rotations per item.

There only 20 rotations, not 20*20. And you only save one of them, if that.

> About testing candidates on the fly I still don't imagine how could be
> done without using hashes of the previous ones.

Rather than using memory and hashes to explicitly compare to the past,
you use logic and rules to implicitly compare to the past and future.
To pull it off you, need to combine both the distinct permutation
generator and the circular canonicalization together, otherwise it won't
work.

1) Has this linear permutation been generated in past, or will it be
generated again in the future (of this particular execution of the
code)? No!

1a) How do you know that? Because I successfully implemented a distinct
permutation generator designed explicitly to have that behavior.

2) Will the underlying circular concept represented by this linear
permutation be represented by some other linear permutation, past or
future? Yes!

2a) So how can I efficiently communicate with those past and future
representations so that one and only one of us knows whether we are the
"it" one or not? Do it by rule, not by explicit communication. Each of
us tests whether our representation canonicalizes to itself. If it
does, I know I am "it". If it doesn't, I know that one of the other
representations is (or will be) "it", so I bow out gracefully.

Cheers,

Xho

gamo
Guest
Posts: n/a

 01-18-2009
On Sat, 17 Jan 2009, Xho Jingleheimerschmidt wrote:

> > That's fine, but don't works for handling the circular permutations.

>
> Of course it does. That is the only thing it works for. That is the whole
> point of the code. When I originally posted the code, it was tested and
> verified to work, converging on the same answer as the other implementation
> did.
>
>
> > It detects distintinct permutations

>
> Distinct permutations need no canonicalization. They do not have multiple
> different representations of the same underlying conceptual thing, so
> canonicalization is neither necessary nor possible. (Unless you actuallydid
> append a digit to each letter to indicate the original order, as someone
> proposed as a pedagogical exercise. Then stripping that digit off would be a
> form of canonicalization.)
>
>
> > but not the circular ones, as far as
> > I tested. Probably the problem comes for taking the first element of the
> > sorted list (only).

>
> Taking just the first element of the sorted list (of strings, which are
> themselves implicit lists of letters) is the whole point. If, for some
> reason, you later need all the rotations, not just the canonical one, youcan
> recompute then on the fly given the canonical one. I can't comment on what
> your tests were showing you, unless you
> post the test code.
>
> ...

You are right, again.
#!/usr/local/bin/perl -w
use List::Util qw(shuffle);
@a = qw(a a a a a r r g g n);
# @a = qw(a a b b b c c c c d d d d d e e e e e e);
\$n = @a;
for (1..10_000_000){
@set = shuffle(@a);
\$s = join '',@set;
\$two = \$s . \$s;
my (\$canon) = sort map {substr \$two,\$_,\$n} 0..\$n-1;
if (!defined \$clist{\$s} && !defined \$hash{\$canon}){
\$clist{\$s}++;
# print "\$s -> length canon \$canon ", length \$canon, "\n";
\$hash{\$canon}++;
\$exito++;
print "\$0 \$exito\n";
\$kount=0;
}else{
\$kount++;
last if \$kount>=100_000;
}
}
print "El número de permutaciones circulares es \$exito\n";

__END__
756

>
> > In the case of 20 letters it has 20*20 rotations per item.

>
> There only 20 rotations, not 20*20. And you only save one of them, if that.
>
> > About testing candidates on the fly I still don't imagine how could be done
> > without using hashes of the previous ones.

>
> Rather than using memory and hashes to explicitly compare to the past, you use
> logic and rules to implicitly compare to the past and future.
> To pull it off you, need to combine both the distinct permutation generator
> and the circular canonicalization together, otherwise it won't work.
>
> 1) Has this linear permutation been generated in past, or will it be generated
> again in the future (of this particular execution of the code)? No!
>
> 1a) How do you know that? Because I successfully implemented a distinct
> permutation generator designed explicitly to have that behavior.
>
> 2) Will the underlying circular concept represented by this linear permutation
> be represented by some other linear permutation, past or future? Yes!
>
> 2a) So how can I efficiently communicate with those past and future
> representations so that one and only one of us knows whether we are the "it"
> one or not? Do it by rule, not by explicit communication. Each of us tests
> whether our representation canonicalizes to itself. If it does, I know Iam
> "it". If it doesn't, I know that one of the other representations is (or
> will be) "it", so I bow out gracefully.
>

Good idea.

> Cheers,
>
> Xho
>

Best regards,

--
http://www.telecable.es/personales/gamo/
"Was it a car or a cat I saw?"
perl -E 'say 111_111_111**2;'

gamo
Guest
Posts: n/a

 01-18-2009
On Sun, 18 Jan 2009, gamo wrote:

> >
> > 2a) So how can I efficiently communicate with those past and future
> > representations so that one and only one of us knows whether we are the "it"
> > one or not? Do it by rule, not by explicit communication. Each of us tests
> > whether our representation canonicalizes to itself. If it does, I know I am
> > "it". If it doesn't, I know that one of the other representations is (or
> > will be) "it", so I bow out gracefully.
> >

> Good idea.

....and it solves definitively the problem, but I wonder if it
could be optimized even more. I.e. given a large list, could be
predicted when the candidates would be of a pattern of success?

Thanks!

--
http://www.telecable.es/personales/gamo/
"Was it a car or a cat I saw?"
perl -E 'say 111_111_111**2;'

Xho Jingleheimerschmidt
Guest
Posts: n/a

 01-18-2009
gamo wrote:
> On Sun, 18 Jan 2009, gamo wrote:
>
>>> 2a) So how can I efficiently communicate with those past and future
>>> representations so that one and only one of us knows whether we are the "it"
>>> one or not? Do it by rule, not by explicit communication. Each of us tests
>>> whether our representation canonicalizes to itself. If it does, I know I am
>>> "it". If it doesn't, I know that one of the other representations is (or
>>> will be) "it", so I bow out gracefully.
>>>

>> Good idea.

>
> ....and it solves definitively the problem, but I wonder if it
> could be optimized even more. I.e. given a large list, could be
> predicted when the candidates would be of a pattern of success?

Yes, but that is not going to get you orders of magnitude improvement.

The first thing I did was translate the algorithm already described
(including the part about fixing one of the "a"s to be the first slot,
which means less permutations plus less rotations (you only have to find
the other "a" and rotate it, rather than doing all 20 rotations) into
C, because that will get you orders of magnitude improvement and my gut
tells me that nothing left that you can do in Perl will do so (other
than large scale parallelization, assuming you have a 100 CPU cluster
laying around). Also, because at this point it was pretty
straightforward to translate to C, but after more optimization done in
Perl the translation to C would become exponentially harder.

The next optimization is to hack the dpermute code itself (as opposed to
the callback made from it) in the manner you describe, so that as soon
as the second 'a' is placed into the variable 'prefix', I can start
testing to see if the "rotated" string is, to our knowledge so far,
going to be less than, greater than, or undecidable compared to the
original string. If the rotated will be less than the original, then we
will fail the self-canonicalization test and I can abort now before
filling in any more letters. If the rotated will be greater than the
original, then I know that any string I make by filling in more letters
will pass the canonicalization test, so I can set a flag so that from
here (recursively) on, no more lexical order testing is needed, it
passes by induction. If it is undecidable, then the I just have to do
the test-skip-flag thing when the next letter is added. (And of course
I only need to test the added letter to its mate, I don't have to test
all preceding letters as I know that they are equal because otherwise I
wouldn't have reached this point without the flag being set)

After all that, I got C code that could generate all 4,888,643,760
answers for the size 2-3-4-5-6 problem in about one hour, on a 900MHz
processor, using negligible memory.

Of course, if there were not exactly two instance of the rarest letter,
then this optimization would not work. A general-case optimization
should be possible, but much more difficult. (and if the rarest letter
were not also the alphabetically smallest, then rather than changing my
code to work under that scenario, I would just remap the alphabet to
make it be the case that the rarest was the smallest, and then use
either Perl's or linux's "tr" to map the answers back to the original
alphabet.)

But this all because I think it's fun. If I were doing this for real,
I'd probably be asking, "What the heck am I going to do with over 4
billion 20-letter strings, and maybe I should focus my optimization on
the use of them rather than the generation of them"

Xho