I have the following test DataFrame:

| tag      | list                                                | Count |
| -------- | ----------------------------------------------------|-------|
| icecream | [['A',0.9],['B',0.6],['C',0.5],['D',0.3],['E',0.1]] |  5    |
| potato   | [['U',0.8],['V',0.7],['W',0.4],['X',0.3]]           |  4    |
| cheese   | [['I',0.2],['J',0.4]]                               |  2    |

I want to randomly sample the list column, picking any 3 items from the first 4 items of each list of lists. (For example, ['E',0.1] is never considered for tag = icecream.)

The rule should pick 3 items at random from the list of lists. If fewer than 3 are available, pick whatever is there and shuffle it.

The result should differ on every run, so I need to set a seed to reproduce the same output:

| tag      | list                           | 
| -------- | -------------------------------|
| icecream | [['B',0.6],['C',0.5],['A',0.9]]|
| potato   | [['W',0.4],['X',0.3],['U',0.8]]|
| cheese   | [['J',0.4],['I',0.2]]          | 

This is what I tried:

import pandas as pd

data = [['icecream', [['A', 0.9],['B', 0.6],['C',0.5],['D',0.3],['E',0.1]]], 
        ['potato', [['U', 0.8],['V', 0.7],['W',0.4],['X',0.3]]],
        ['cheese',[['I',0.2],['J',0.4]]]]

df = pd.DataFrame(data, columns=['tag', 'list'])
# Number of items in each list (sorting here has no effect, since assignment realigns on the index)
df['Count'] = df['list'].str.len()
df
--

import random
item_top_3 =  []
find = 4
num = 3
for i in range(df.shape[0]):
    item_id = df["tag"].iloc[i]
    whole_list = df["list"].iloc[i]
    item_top_3.append([item_id, random.sample(whole_list[0:find], num)])

--
I get this error:
ValueError: Sample larger than population or is negative.

Can anyone help with randomizing it? The original DataFrame has over 50,000 rows, and the solution should work for any rule: for example, tomorrow someone may want to pick 5 random items from the first 20 elements of each list of lists.

  • Can you provide a DataFrame constructor of the input? Commented Aug 3, 2022 at 5:56
  • @mozway - updated it in the question. Can you check? Commented Aug 3, 2022 at 6:02

2 Answers

Use a list comprehension combined with random.sample:

import random

find = 4
num = 3
# Cap k at the size of the sliced pool so short lists do not raise ValueError
df['list'] = [random.sample(l[:find], k=min(num, len(l[:find]))) for l in df['list']]

output:

        tag                            list  Count
0  icecream  [[C, 0.5], [B, 0.6], [D, 0.3]]      5
1    potato  [[V, 0.7], [U, 0.8], [X, 0.3]]      4
2    cheese            [[J, 0.4], [I, 0.2]]      2
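
The loop in the question fails because random.sample raises ValueError when asked for more items than the population holds, which happens for cheese (2 items, num = 3). For reproducible output, one option is a dedicated random.Random instance with a fixed seed. The following is a minimal sketch, not part of the original answer: the helper name sample_first_n, the output column name sampled, and the seed value 42 are all assumptions made for illustration.

```python
import random

import pandas as pd

df = pd.DataFrame({
    "tag": ["icecream", "potato", "cheese"],
    "list": [
        [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]],
        [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3]],
        [['I', 0.2], ['J', 0.4]],
    ],
})


def sample_first_n(items, find, num, rng):
    """Return up to `num` distinct items drawn from the first `find` items.

    Hypothetical helper, named here for illustration only.
    """
    pool = items[:find]
    # Cap k at the pool size so short lists are shuffled instead of raising.
    return rng.sample(pool, k=min(num, len(pool)))


# Assumed seed: any fixed value makes the output repeatable across runs.
rng = random.Random(42)
df["sampled"] = [sample_first_n(l, find=4, num=3, rng=rng) for l in df["list"]]
```

Changing the rule (say, 5 items from the first 20) only requires different find and num arguments; dropping the seed argument from random.Random() gives a different result on each run.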

4 Comments

But E shouldn't appear in the results, correct? It is the 5th item in the list for icecream.
I think we have to do find = 3, but this works. Thanks again, Mozway. You are awesome!
I pasted the wrong output. find should be 4 ;)
yeah, it is fine

Alternatively, you can combine np.random.choice with apply after creating a temporary list column that contains only the first n elements of your original list column.

Code:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "tag": ["icecream", "potato", "cheese"],
    "list": [[['A',0.9],['B',0.6],['C',0.5],['D',0.3],['E',0.1]], [['U',0.8],['V',0.7],['W',0.4],['X',0.3]], [['I',0.2],['J',0.4]]],
    "count": [5, 4, 2]
})

first_n = 4
size = 3
df["ls_tmp"] = df["list"].str[:first_n].apply(np.array)
df["list"] = df["ls_tmp"].apply(lambda x: list(x[np.random.choice(len(x), size=size)]))

You can also write a helper function and use map instead of apply, which keeps the sampling logic reusable; any speed difference between the two is likely to be small for a plain Python function:

def randomize(x, size=3):
    return list(x[np.random.choice(len(x), size=size)])

df["list"] = df["ls_tmp"].map(randomize)

Output:

    tag       list                              count   ls_tmp
0   icecream  [[A, 0.9], [A, 0.9], [C, 0.5]]    5       [[A, 0.9], [B, 0.6], [C, 0.5], [D, 0.3]]
1   potato    [[W, 0.4], [V, 0.7], [V, 0.7]]    4       [[U, 0.8], [V, 0.7], [W, 0.4], [X, 0.3]]
2   cheese    [[J, 0.4], [J, 0.4]]              2       [[I, 0.2], [J, 0.4]]

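where the column ls_tmp contains the original first n values.

Note that np.random.choice samples with replacement by default, which is why the output above contains duplicates such as [A, 0.9] twice. Pass replace=False together with size=min(size, len(x)) if each element should appear at most once.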
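
A variant that draws distinct elements and can be seeded might look like the sketch below. It is not the original answer's code: it assumes NumPy's Generator API (np.random.default_rng), a hypothetical helper named sample_without_replacement, an output column named sampled, and a seed of 0. It samples positions rather than converting each list to an array, so the floats in the pairs keep their type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "tag": ["icecream", "potato", "cheese"],
    "list": [
        [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]],
        [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3]],
        [['I', 0.2], ['J', 0.4]],
    ],
})


def sample_without_replacement(items, first_n, size, rng):
    """Pick up to `size` distinct items from the first `first_n` items.

    Hypothetical helper, named here for illustration only.
    """
    pool = items[:first_n]
    k = min(size, len(pool))  # never ask for more than the pool holds
    # Draw k distinct positions, then index the plain Python list so the
    # pairs are returned unchanged (no mixed-type NumPy array is built).
    idx = rng.choice(len(pool), size=k, replace=False)
    return [pool[i] for i in idx]


# Assumed seed: a fixed value makes the output repeatable across runs.
rng = np.random.default_rng(0)
df["sampled"] = df["list"].map(
    lambda x: sample_without_replacement(x, first_n=4, size=3, rng=rng)
)
```

Because only indices are sampled, no temporary ls_tmp column is needed in this version.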
