0

Let's say I have 5 marbles, 3 are red and 2 are blue. The marbles of the same color are numbered from 1 to n, where n is the number of marbles of that color.

What I need is way to shuffle the list ['red', 'red', 'red','blue', 'blue'] , where the permutations of the individual marbles does not matter.

E.g ['red1', 'red2', 'red3','blue', 'blue'] and ['red2', 'red1', 'red3','blue', 'blue'] are only one solution.

Because my original problem has a lot more marbles of one color, compared to the others, the normal shuffle of the list results in a high bias for certain orders.

One idea I had would be:

  1. Start with a new empty list,
  2. Generate 2 random integers (the first integer to choose a color, the second integer to determine how many marbles to pick of that color)
  3. Append the picked marbles to the new list
  4. Repeat until all marbles have been used up

I'm wondering if there are some better ways to solve this, maybe with a more guaranteed unbiased result.

Edit: A small scaled scenario would be 4 different colors with 2 to 5 marbles per color. A realistic scenario would be 20 diff. colors, with between 5 and 50 marbles per color.

5
  • Because this is a performance question, how many possible values are there, and how big is the list? Commented Feb 11, 2020 at 15:20
  • Are the colors supposed to not change position? You only want to shuffle by same colors? Commented Feb 11, 2020 at 15:34
  • 1
    "the normal shuffle of the list results in a high bias for certain ordes." What do you mean by this? A standard shuffling algorithm like Fisher-Yates (which random.shuffle implements) will give unbiased shuffles regardless of how many duplicates there are. Commented Feb 11, 2020 at 15:38
  • @kaya3 : If i want to shuffle ['red', 'red', 'blue] Fisher-Yates will handle it like ['red1','red2','blue'], if I understand the algorithm correctly. Which adds solutions that look exactly the same, but the distribution will change towards those results Commented Feb 11, 2020 at 15:50
  • 1
    It adds solutions which "look the same" in equal proportion to all solutions. For your example, Fisher-Yates will generate (red,red,blue), (red,blue,red) and (blue,red,red) with equal probability. How could it be any less biased? Commented Feb 11, 2020 at 16:30

2 Answers 2

1

If you want an unbiased shuffle, then you are comparing your algorithm to Knuth's (slightly improved version of Fisher-Yate's) shuffle

If you are going to optimise using a two stage selection and want it to remain fair, you need to calculate the probabilities of getting runs of one, two, three of the same colour, and implement this calculation more efficiently than the corresponding number of swaps, and then do that number of copies from your sources into the target list.

I wouldn't bother for performance sake, as it is unlikely with 1000 marbles that you will improve the performance by doing the maths over the swap - you're going to have to do another list iteration on each stage to calculate the probabilities and then extra work which is going to make it O(N²) rather than O(N).

Sign up to request clarification or add additional context in comments.

Comments

0

This will give a uniform distribution of the uniques combinations:

from random import randrange
from collections import Counter
def marblePick(marbles):
    remaining = list(Counter(marbles).elements())
    return [remaining.pop(randrange(r)) for r in range(len(remaining),0,-1)]

The performance seems to be linear and I tested the distribution by sampling as follows:

marbles = {"R":3,"B":2}

N = 1000000
dist = Counter()
for _ in range(N):
    combo = marblePick(marbles)
    dist[tuple(combo)]+=1

for s,c in dist.items():
    print(s,c/N)

import statistics as stats
prop = [c/N for c in dist.values()]
mean = stats.mean(prop)
stdev = stats.stdev(prop)
print("mean:",mean,"stdDev:",f"{100*stdev/mean:.1f}%",N,"samples",len(dist),"combos")

the results are conclusive for this sample (and a few others I tried):

('B', 'R', 'B', 'R', 'R') 0.100245
('R', 'R', 'R', 'B', 'B') 0.099789
('R', 'R', 'B', 'B', 'R') 0.099952
('B', 'R', 'R', 'B', 'R') 0.100579
('R', 'R', 'B', 'R', 'B') 0.099732
('R', 'B', 'R', 'R', 'B') 0.100033
('B', 'B', 'R', 'R', 'R') 0.099861
('B', 'R', 'R', 'R', 'B') 0.099941
('R', 'B', 'R', 'B', 'R') 0.100103
('R', 'B', 'B', 'R', 'R') 0.099765
mean: 0.1 stdDev: 0.3% 1000000 samples 10 combos

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.