Skip to content

Conversation

@d8uv
Copy link
Contributor

@d8uv d8uv commented Feb 28, 2013

The current "Shuffling Array Elements" page recommends a naive algorithm, which produces a slow and biased shuffle. Included is a highly idiomatic refactorization of the Fisher-Yates shuffle (aka the Knuth Shuffle), a less-idiomatic-but-better refactorization, and a version of the algorithm that adds to Array.prototype, for those that program in that style.

The current "Shuffling Array Elements" page recommends a naive algorithm, which produces 
a slow and biased shuffle. Included is a highly idiomatic refactorization of the Fisher-
Yates shuffle (aka the Knuth Shuffle), a less-idiomatic-but-better refactorization, and a
version of the algorithm that adds to Array.prototype, for those that program in that 
style.
@dbrady
Copy link
Member

dbrady commented Feb 28, 2013

THIS. This is what a fantastic recipe entry looks like. I saw this email and thought "oh no, they've deleted the simple shuffle and replaced it with a needlessly complex one". I came in here expecting to write a message asking you to preserve the old shuffle, but then explain when and why it might be unsuitable, and propose the Fisher-Yeates shuffle as a replacement. When I read your diff I wanted to cry with joy. d8uv, this is awesome. THANK YOU.

This is the happiest "I hereby give you commit rights" message I've written to date. :-) Welcome!

dbrady added a commit that referenced this pull request Feb 28, 2013
Change the shuffle to the Fisher-Yates shuffle
@dbrady dbrady merged commit d0ee5cb into coffeescript-cookbook:master Feb 28, 2013
@dbrady
Copy link
Member

dbrady commented Feb 28, 2013

Actually, rereading it, might be better if it started with array.shuffle(), presented for fairly simple cases of randomness. Is it slightly biased, or is it obviously biased? I haven't run the Chi-squareq analysis on it to see how broken it is. I wonder if we could give some examples for when it works as a naive example, along with a specific use case that shows just how bad it can be--sort of like how you should never use modulus on Linear Congruential generators because rand() % 2 in the C stdlib will return 0,1,0,1,0,1,... etc.

@d8uv d8uv deleted the patch-1 branch February 28, 2013 21:13
@d8uv
Copy link
Contributor Author

d8uv commented Feb 28, 2013

The chi-square analysis for the random.sort vs. fisher-yates has been done by Rob Weir. I haven't done a proper formal analysis of my refactorization, so if you're gonna drag out R you might as well check to see if my math-fu is strong, but my informal tests seem to pan out wonderfully.

The reason I didn't write it in a "this is what everyone uses, these are the problems, and this is the solution" way is... I find that style to be a little complex when you're trying to learn something specific. People are here to learn how best to shuffle in Coffeescript, and Knuth Shuffle is the best way. People aren't here to learn about a bunch of different methods and their pitfalls, they just want the goods, and to get out. Adding that supplementary stuff at the end is great, and I'd be really curious as to what that'd look like, but the solution should stay at the top

And as to if the naive solution is good enough... I'd say probably never. Doing it properly only takes up 5 lines of source, is a lot faster, and is free from bias. When you want something random, you really want it to be random, not random-ish.

@dbrady
Copy link
Member

dbrady commented Mar 1, 2013

Heh, I misread the naive solution, sorry--I thought there was a
library-supplied Array.shuffle() method. Yes, sorting by randomly
swapping adjacent pairs is a horrible idea.

Were there an array.shuffle() method, even if it was pretty bad (but not
horrible) I would argue that it should go first, on account of it being
the best solution for trivial cases. A solution that takes up 1 line of
code is 5x better than a solution that takes 5, so unless it's literally
5x worse in some other dimension (like randomness) it's still better.

But yes. Doing a swapsort by random() is truly horrific--moving an
element away from its home position requires 2x as much entropy for each
step it moves, and that's assuming it's moving forward, in the direction
the shuffler is iterating. Even worse, an element can only move
backwards 1 step for each PASS of the shuffler. Look at that link you
sent me: In some of them there's a huge lump at the second-to-last
position, in others the lump is at the second position. That tells you
in which direction their swapper is biased...

Sorting by random DOES work with a schwartzian transform (assign each
element a random number, then sort the array by that number) but this a)
requires n floats of memory and b) I can't write it cleanly in much less
than 5 lines of code. :-)

Thanks again!

David

On 02/28/2013 02:32 PM, d8uv wrote:

The chi-square analysis for the random.sort vs. fisher-yates has been
done by Rob Weir
http://www.robweir.com/blog/2010/02/microsoft-random-browser-ballot.html.
I haven't done a proper formal analysis of my refactorization, so if
you're gonna drag out R you might as well check to see if my math-fu
is strong, but my informal tests seem to pan out wonderfully.

The reason I didn't write it in a "this is what everyone uses, these
are the problems, and this is the solution" way is... I find that
style to be a little complex when you're trying to learn something
specific. People are here to learn how best to shuffle in
Coffeescript, and Knuth Shuffle is the best way. People aren't here to
learn about a bunch of different methods and their pitfalls, they just
want the goods, and to get out. Adding that supplementary stuff at the
end is great, and I'd be really curious as to what that'd look like,
but the solution should stay at the top

And as to if the naive solution is good enough... I'd say probably
never. Doing it properly only takes up 5 lines of source, is a lot
faster, and is free from bias. When you want something random, you
/really/ want it to be random, not random-ish.


Reply to this email directly or view it on GitHub
#72 (comment).

David Brady
dbrady@shinybit.com
We enter this world naked, bloody, and screaming. That sort of thing doesn't have to end there if you know how to live right. -- Dana Gould

d8uv added a commit that referenced this pull request Mar 1, 2013
As mentioned by @dbrady in #72 we shouldn't overwrite a native `Array::shuffle`
@d8uv
Copy link
Contributor Author

d8uv commented Mar 1, 2013

I... actually disagree with that. If Array::shuffle gets into native javascript, then code I wrote in 0648bcf automatically becomes a polyfill. It'll use the native method if it's available, and will use our method if it isn't. And regardless, most of the time, we should be using a utility library like Lo-dash or Underscore, because they smooth out inconsistencies when the native methods differ from platform to platform.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants