Change the shuffle to the Fisher-Yates shuffle #72

d8uv · 2013-02-28T20:04:04Z

The current "Shuffling Array Elements" page recommends a naive algorithm, which produces a slow and biased shuffle. Included is a highly idiomatic refactorization of the Fisher-Yates shuffle (aka the Knuth Shuffle), a less-idiomatic-but-better refactorization, and a version of the algorithm that adds to Array.prototype, for those that program in that style.

The current "Shuffling Array Elements" page recommends a naive algorithm, which produces a slow and biased shuffle. Included is a highly idiomatic refactorization of the Fisher- Yates shuffle (aka the Knuth Shuffle), a less-idiomatic-but-better refactorization, and a version of the algorithm that adds to Array.prototype, for those that program in that style.

dbrady · 2013-02-28T20:50:22Z

THIS. This is what a fantastic recipe entry looks like. I saw this email and thought "oh no, they've deleted the simple shuffle and replaced it with a needlessly complex one". I came in here expecting to write a message asking you to preserve the old shuffle, but then explain when and why it might be unsuitable, and propose the Fisher-Yeates shuffle as a replacement. When I read your diff I wanted to cry with joy. d8uv, this is awesome. THANK YOU.

This is the happiest "I hereby give you commit rights" message I've written to date. :-) Welcome!

Change the shuffle to the Fisher-Yates shuffle

dbrady · 2013-02-28T20:56:01Z

Actually, rereading it, might be better if it started with array.shuffle(), presented for fairly simple cases of randomness. Is it slightly biased, or is it obviously biased? I haven't run the Chi-squareq analysis on it to see how broken it is. I wonder if we could give some examples for when it works as a naive example, along with a specific use case that shows just how bad it can be--sort of like how you should never use modulus on Linear Congruential generators because rand() % 2 in the C stdlib will return 0,1,0,1,0,1,... etc.

d8uv · 2013-02-28T21:32:49Z

The chi-square analysis for the random.sort vs. fisher-yates has been done by Rob Weir. I haven't done a proper formal analysis of my refactorization, so if you're gonna drag out R you might as well check to see if my math-fu is strong, but my informal tests seem to pan out wonderfully.

The reason I didn't write it in a "this is what everyone uses, these are the problems, and this is the solution" way is... I find that style to be a little complex when you're trying to learn something specific. People are here to learn how best to shuffle in Coffeescript, and Knuth Shuffle is the best way. People aren't here to learn about a bunch of different methods and their pitfalls, they just want the goods, and to get out. Adding that supplementary stuff at the end is great, and I'd be really curious as to what that'd look like, but the solution should stay at the top

And as to if the naive solution is good enough... I'd say probably never. Doing it properly only takes up 5 lines of source, is a lot faster, and is free from bias. When you want something random, you really want it to be random, not random-ish.

dbrady · 2013-03-01T17:28:53Z

Heh, I misread the naive solution, sorry--I thought there was a
library-supplied Array.shuffle() method. Yes, sorting by randomly
swapping adjacent pairs is a horrible idea.

Were there an array.shuffle() method, even if it was pretty bad (but not
horrible) I would argue that it should go first, on account of it being
the best solution for trivial cases. A solution that takes up 1 line of
code is 5x better than a solution that takes 5, so unless it's literally
5x worse in some other dimension (like randomness) it's still better.

But yes. Doing a swapsort by random() is truly horrific--moving an
element away from its home position requires 2x as much entropy for each
step it moves, and that's assuming it's moving forward, in the direction
the shuffler is iterating. Even worse, an element can only move
backwards 1 step for each PASS of the shuffler. Look at that link you
sent me: In some of them there's a huge lump at the second-to-last
position, in others the lump is at the second position. That tells you
in which direction their swapper is biased...

Sorting by random DOES work with a schwartzian transform (assign each
element a random number, then sort the array by that number) but this a)
requires n floats of memory and b) I can't write it cleanly in much less
than 5 lines of code. :-)

Thanks again!

David

On 02/28/2013 02:32 PM, d8uv wrote:

The chi-square analysis for the random.sort vs. fisher-yates has been
done by Rob Weir
http://www.robweir.com/blog/2010/02/microsoft-random-browser-ballot.html.
I haven't done a proper formal analysis of my refactorization, so if
you're gonna drag out R you might as well check to see if my math-fu
is strong, but my informal tests seem to pan out wonderfully.

The reason I didn't write it in a "this is what everyone uses, these
are the problems, and this is the solution" way is... I find that
style to be a little complex when you're trying to learn something
specific. People are here to learn how best to shuffle in
Coffeescript, and Knuth Shuffle is the best way. People aren't here to
learn about a bunch of different methods and their pitfalls, they just
want the goods, and to get out. Adding that supplementary stuff at the
end is great, and I'd be really curious as to what that'd look like,
but the solution should stay at the top

And as to if the naive solution is good enough... I'd say probably
never. Doing it properly only takes up 5 lines of source, is a lot
faster, and is free from bias. When you want something random, you
/really/ want it to be random, not random-ish.

—
Reply to this email directly or view it on GitHub
#72 (comment).

David Brady
dbrady@shinybit.com
We enter this world naked, bloody, and screaming. That sort of thing doesn't have to end there if you know how to live right. -- Dana Gould

@dbrady

As mentioned by @dbrady in #72 we shouldn't overwrite a native `Array::shuffle`

d8uv · 2013-03-01T21:55:10Z

I... actually disagree with that. If Array::shuffle gets into native javascript, then code I wrote in 0648bcf automatically becomes a polyfill. It'll use the native method if it's available, and will use our method if it isn't. And regardless, most of the time, we should be using a utility library like Lo-dash or Underscore, because they smooth out inconsistencies when the native methods differ from platform to platform.

dbrady added a commit that referenced this pull request Feb 28, 2013

Merge pull request #72 from d8uv/patch-1

d0ee5cb

Change the shuffle to the Fisher-Yates shuffle

dbrady merged commit d0ee5cb into coffeescript-cookbook:master Feb 28, 2013

d8uv deleted the patch-1 branch February 28, 2013 21:13

d8uv added a commit that referenced this pull request Mar 1, 2013

Make Array::shuffle safer

0648bcf

As mentioned by @dbrady in #72 we shouldn't overwrite a native `Array::shuffle`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Change the shuffle to the Fisher-Yates shuffle #72

Change the shuffle to the Fisher-Yates shuffle #72

Uh oh!

d8uv commented Feb 28, 2013

Uh oh!

dbrady commented Feb 28, 2013

Uh oh!

dbrady commented Feb 28, 2013

Uh oh!

d8uv commented Feb 28, 2013

Uh oh!

dbrady commented Mar 1, 2013

Uh oh!

d8uv commented Mar 1, 2013

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Change the shuffle to the Fisher-Yates shuffle #72

Change the shuffle to the Fisher-Yates shuffle #72

Uh oh!

Conversation

d8uv commented Feb 28, 2013

Uh oh!

dbrady commented Feb 28, 2013

Uh oh!

dbrady commented Feb 28, 2013

Uh oh!

d8uv commented Feb 28, 2013

Uh oh!

dbrady commented Mar 1, 2013

Uh oh!

d8uv commented Mar 1, 2013

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants