The T-Files


Tue, 28 Sep 2010

Shuffle Kerfuffle

One of the most important lessons to learn from cryptography courses is that nothing is as simple as it looks (especially if it does not look so simple), and if you use ad-hoc methods instead of publicized and proven algorithms you will not get it right. And I rightfully got slammed a little when somehow asked how to shuffle an array and I suggested the procedure I came up with when I was twelve and have been using ever since:

Loop a couple of times (length of array should do), and in every iteration, get two random array indexes and swap the two elements there.

Of course, using something established such as the Knuth-Fisher-Yates Shuffle would be better, and I can see that my algorithm is flawed in that it has a (marginal) bias for keeping the array elements in their original positions. However, it'd like to be able to understand why, and be able to quantify the bias.

My current proof is based on two observations:

  1. An element that has not been touched by a shuffle will remain in its original position
  2. An element that has been touched by at least one shuffle will end up in a completely random position (even if that position is the original).
With these two observations in place, I think I can deduce that the probability that the shuffle is not ideal is the same as the probability that not all cards have been touched, which is easy to calculate and declines as the number of iterations increases. The first observation is obvious, the second one needs a proof (that includes the fact that the random position is also independent of other elements' positions, see the rotation paradox).

The proof still eludes this poor math-school dropout (who did do very well on Elementary Statistics in his day, though, and has studied Pseudorandom Number Generation for an unbelievable four semesters). I have brought it up with the experts.