
I am creating lists like this:

In:

import random

x = [[[random.randint(1,1000), random.randint(0,2)] for i,j in zip(range(1), range(1))][0] for i in range(5)]

Out:

[[76, 0], [128, 2], [194, 2], [851, 2], [123, 1]]

However, what is the most efficient way of making the first element of each sublist unique? It seems that randint doesn't have this option. How can I force 76, 128, 194, 851, 123 to be unique?

  • Are you interested in a numpy solution? The question is tagged as such, but you also seem to deal with lists and the random module. Commented May 26, 2021 at 14:59
  • Yes @Reti43, a numpy solution also works. Commented May 26, 2021 at 15:00

3 Answers


You can use np.random.default_rng().choice (or the legacy np.random.choice) with replace=False to ensure uniqueness.

import numpy as np

first = np.random.choice(np.arange(1, 1001), 5, replace=False)  # values from 1..1000 as in the question; replace=False ensures uniqueness
# first = array([645, 543, 233,  93, 420])

second = np.random.choice(3, 5)  # 0, 1 or 2, like random.randint(0, 2)
# second = array([1, 1, 2, 1, 2])

Combine the two with np.array and take the transpose:

np.array((first, second)).T.tolist()
# [[645, 1], [543, 1], [233, 2], [93, 1], [420, 2]]

Update:

Based on the comment by @Sam Mason and according to this thread, the preferred approach since NumPy 1.17 seems to be the Generator API, rng = np.random.default_rng().

So the line that creates first becomes:

rng = np.random.default_rng()
first = rng.choice(np.arange(1, 1001), 5, replace=False)
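A minimal sketch of the full Generator-based version, assuming the question's sizes (5 rows, first values in 1..1000). Following @Sam Mason's later comment, the population is passed as an int instead of an explicit np.arange, and the second column uses Generator.integers in place of choice. The names n, low and high are illustrative, not from the answer.

```python
import numpy as np

rng = np.random.default_rng()
n, low, high = 5, 1, 1000  # illustrative: same sizes and range as the question

# An int population samples from 0..high-low; shifting by `low` gives low..high.
# replace=False guarantees the n values are distinct.
first = rng.choice(high - low + 1, n, replace=False) + low
# Generator.integers excludes the upper bound, so this yields 0, 1 or 2
second = rng.integers(0, 3, n)

# Pair the columns side by side and convert to plain Python lists of ints
pairs = np.column_stack((first, second)).tolist()
print(pairs)
```

np.column_stack does the same job here as np.array((first, second)).T.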

Timing Comparison

This is a rough timing comparison for a single, large choice of length and max_val. A proper comparison would need many combinations of the array length and the range to pick from. Feel free to edit this as new answers appear.

length, max_val = 100000, 10000000

%timeit op(length, max_val)          # 1 loop, best of 5: 392 ms per loop
%timeit akilat90(length, max_val)    # 10 loops, best of 5: 45.4 ms per loop
%timeit Reti43_np(length, max_val)   # 1 loop, best of 5: 13.8 s per loop
%timeit Reti43_p(length, max_val)    # 1 loop, best of 5: 261 ms per loop
%timeit Shivam_Roy(length, max_val)  # 1 loop, best of 5: 364 ms per loop

Code to reproduce:

import random

import numpy as np

def op(length, max_val):
    """
    The OP's original approach: first values are drawn from [1, max_val]
    and are NOT guaranteed to be unique (kept only as a timing baseline).
    """
    if max_val < length:
        raise ValueError("Can't ensure uniqueness")
    return [[[random.randint(1,max_val), random.randint(0,2)] for i,j in zip(range(1), range(1))][0] for i in range(length)]

def akilat90(length, max_val):
    if max_val < length:
        raise ValueError("Can't ensure uniqueness")
    value_range = np.arange(max_val)
    rng = np.random.default_rng()

    first = rng.choice(value_range, length, replace=False)
    second = rng.choice(3, length)  # 0, 1 or 2, matching random.randint(0, 2)
    return np.array((first, second)).T.tolist()

def Reti43_np(length, max_val):
    if max_val < length:
        raise ValueError("Can't ensure uniqueness")    
    a = np.arange(max_val)[:,None]
    np.random.shuffle(a)
    a = a[:length]
    b = np.random.randint(0, 3, (length, 1))
    out = np.hstack([a, b])
    return out

def Reti43_p(length, max_val):
    if max_val < length:
        raise ValueError("Can't ensure uniqueness")
    a = random.sample(range(1, max_val + 1), length)
    b = [random.randint(0, 2) for _ in range(length)]
    # If you want a list of lists instead `[[first, second] for first, second in zip(a, b)]`
    return list(zip(a, b))    

def Shivam_Roy(length, max_val):
    if max_val < length:
        raise ValueError("Can't ensure uniqueness")
    rand_list = random.sample(range(0, max_val), length)
    return [[[rand_list[x], random.randint(0,2)] for i,j in zip(range(1), range(1))][0] for x in range(length)]
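The paragraph above notes that a fair comparison needs many (length, max_val) combinations. A minimal sketch of such a sweep, using the standard timeit module instead of the IPython %timeit magic and timing only the "unique first column" step of two approaches. The function names first_col_stdlib, first_col_numpy and sweep are hypothetical, and the grid values are arbitrary.

```python
import random
import timeit

import numpy as np

RNG = np.random.default_rng()

def first_col_stdlib(length, max_val):
    # Distinct values from [0, max_val) via the standard library
    return random.sample(range(max_val), length)

def first_col_numpy(length, max_val):
    # Distinct values from [0, max_val); int population instead of np.arange
    return RNG.choice(max_val, length, replace=False)

def sweep(lengths, max_vals, number=3, repeat=3):
    """Best-of-`repeat` mean seconds per call for every valid combination."""
    results = {}
    for length in lengths:
        for max_val in max_vals:
            if max_val < length:
                continue  # uniqueness impossible, same check as the functions above
            for fn in (first_col_stdlib, first_col_numpy):
                times = timeit.repeat(lambda: fn(length, max_val),
                                      number=number, repeat=repeat)
                results[(fn.__name__, length, max_val)] = min(times) / number
    return results

if __name__ == "__main__":
    for (name, length, max_val), t in sorted(sweep([10, 1_000], [10_000, 1_000_000]).items()):
        print(f"{name:18s} length={length:<6d} max_val={max_val:<8d} {t * 1e3:.3f} ms")
```

Taking the minimum over repeats is the usual timeit convention, since slower runs mostly reflect interference from other processes.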

7 Comments

I keep forgetting that numpy.random.choice supports sampling with no replacement because I tend to think in parallels of random.choice and random.sample.
@Reti43 it's quite useful! Also, please check the timing code - I might have made a mistake there. Feel free to edit if that's the case.
@akilat90 your version is so fast because it's ignoring the parameters! Also note that using choice from the non-legacy RNG has much better performance than the interface you're using
@akilat90 yup, that looks right. you can also just pass max_val directly to it, which makes things another 10 times faster for me
just realised that I was testing without tolist which now takes the vast majority of the runtime, so you'll only see things improving by 2x
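The first comment draws a parallel between the standard library and NumPy. A small sketch of that mapping, assuming a toy population of 1..10: random.sample corresponds to Generator.choice with replace=False, and random.choices to Generator.choice with its default replace=True.

```python
import random

import numpy as np

rng = np.random.default_rng()
population = list(range(1, 11))

# Without replacement: every drawn value is distinct
py_unique = random.sample(population, 5)
np_unique = rng.choice(population, 5, replace=False)

# With replacement: repeats are allowed
py_any = random.choices(population, k=5)
np_any = rng.choice(population, 5)  # replace=True is the default
```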

To get random but unique elements, shuffle the candidate values and take the first N of them.

import numpy as np

rows = 5
a = np.arange(1, 1001)[:,None]
np.random.shuffle(a)
a = a[:rows]
b = np.random.randint(0, 3, (rows, 1))
out = np.hstack([a, b])

Result

array([[  3,   1],
       [291,   1],
       [159,   1],
       [814,   0],
       [989,   2]])
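The same shuffle-then-slice idea can be sketched with the Generator API from the update in the first answer. This version shuffles a 1-D array rather than an (N, 1) column, which avoids moving whole rows; whether that matters for speed is worth timing (Reti43_np was the slowest entry in the comparison above, but that run also included other work). Generator.permutation and Generator.integers are swapped in for np.random.shuffle and np.random.randint.

```python
import numpy as np

rows = 5
rng = np.random.default_rng()

# permutation(1000) returns a shuffled copy of 0..999; the first `rows`
# entries are distinct, and +1 shifts them into 1..1000
a = rng.permutation(1000)[:rows] + 1
b = rng.integers(0, 3, rows)  # 0, 1 or 2
out = np.column_stack((a, b))
print(out)
```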

For a pure-Python solution, you can use random.sample to draw unique elements from a collection.

import random

rows = 5

a = random.sample(range(1, 1001), rows)
b = [random.randint(0, 2) for _ in range(rows)]
# If you want a list of lists instead `[[first, second] for first, second in zip(a, b)]`
out = list(zip(a, b))

1 Comment

@JDo If you don't have a lot of items, you shouldn't be concerned about speed. If speed is a requirement, you should put it in the question.

You can use random.sample to get unique values from a range, like this:

import random

rand_list = random.sample(range(1, 10000), 5)

x = [[[rand_list[x], random.randint(0,2)] for i,j in zip(range(1), range(1))][0] for x in range(5)]
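The inner [...][0] built from zip(range(1), range(1)) creates a one-element list and immediately unwraps it, so it adds nothing. A simplified sketch of the same answer that iterates over rand_list directly:

```python
import random

# Draw all 5 distinct first values in one call, then pair each with 0, 1 or 2
rand_list = random.sample(range(1, 10000), 5)
x = [[value, random.randint(0, 2)] for value in rand_list]
print(x)
```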

5 Comments

Thanks! However, I got TypeError: Population must be a sequence or set. For dicts, use list(d).
I'm sorry, I just made an edit, the input parameters have a specific format. Please refer to the documentation as well, for more information.
Apologies for the trouble, I also realised that random.sample returns a list. Please refer to the edit.
This is a bad application of random.sample(). Yes, one call gets k unique elements, but if you call it N times, you don't guarantee there won't be duplicates across calls. If you want to use that function, you need to generate all k elements upfront and then zip them with the 0-2 values for the second column.
@JDo Hi, as Reti43 pointed out, my code was not guaranteeing unique values. I have edited the code to make sure that you always get unique values. I really hope this helps, and apologise for so many edits. But this one would work perfectly. Also, please note that I have used a variable x instead of using i twice.
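Reti43's point can be demonstrated deterministically with a deliberately small range (1..10, an illustrative choice): 20 separate single-value random.sample calls must repeat some value, by the pigeonhole principle, while one call for all values cannot.

```python
import random

# Anti-pattern: one random.sample call per row. Each call is internally
# unique, but nothing prevents two calls from returning the same value.
bad = [random.sample(range(1, 11), 1)[0] for _ in range(20)]

# 20 draws from only 10 possible values must contain a repeat
assert len(set(bad)) < len(bad)

# Fix: draw all k values in a single call, then pair them with the 0-2 column
good = random.sample(range(1, 11), 10)
pairs = list(zip(good, [random.randint(0, 2) for _ in good]))
assert len(set(good)) == 10
```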
