How to get unique random values from a list within a for loop?

Question

I wrote a script that combines data from two CSV files and generates a text file with one line (prompt) per person. I want to avoid repeating the same "fintag" value, so that all the prompts are different.

The script does what I need, except that it repeats some of the values, because ran is just a random number.

I can't avoid repetitions of the same random number, because the random number is used in multiple columns. Creating a separate variable for each column would solve it, but the number of columns is high and might change over time.

The alternative is to remove elements from the "asstag" list once they have been used, but the list is generated inside a for loop, and I don't know how to remove elements from a list while a for loop is iterating over it.

Input:

people = {'Name': ['mark', 'bill', 'tim', 'frank'],
          'Tag': ['color', 'animal', 'clothes', 'animal']}
dic = {'color': ['blu', 'green', 'red', 'yellow'],
       'animal': ['dog', 'cat', 'horse', 'shark'],
       'clothes': ['gloves', 'shoes', 'shirt', 'socks']}

Expected Output:

mark blu (or green, or red, or yellow)
bill horse (or dog, or cat, or shark)
tim socks (or gloves, or shoes, or shirt)
frank dog (or cat, or shark, but not horse if horse is already assigned to bill)

Code:

import random
import pandas as pd

people = pd.read_csv("people.csv")
dic = pd.read_csv("dic.csv")

nam = list(people.loc[:, "Name"])
tag = list(people.loc[:, "Tag"])

with open("test.txt", "w+") as file:
    for n, t in zip(nam, tag):
        asstag = list(dic.loc[:, t])  # candidate values for this person's tag
        # Independent draw per person, so the same value can come up twice.
        ran = random.randint(0, len(asstag) - 1)
        fintag = asstag[ran]
        prompt = str(n) + " " + str(fintag)
        print(prompt)
        file.write(prompt + "\n")

Please add input and expected output. What is othervariable?
– Dani Mesejo, Commented Jul 19, 2022 at 7:34
What happens if there are more names than possible unique tags?
– Dani Mesejo, Commented Jul 19, 2022 at 8:03

Dani Mesejo · Accepted Answer · 2022-07-20 08:17:24Z

One approach is to draw unique elements per tag with random.sample:

import pandas as pd
import random
from collections import Counter

random.seed(42)

people = pd.DataFrame({'Name': ['mark', 'bill', 'tim', 'frank'],
                       'Tag': ['color', 'animal', 'clothes', 'animal']})
dic = pd.DataFrame({'color': ['blu', 'green', 'red', 'yellow'],
                    'animal': ['dog', 'cat', 'horse', 'shark'],
                    'clothes': ['gloves', 'shoes', 'shirt', 'socks']})

names = list(people.loc[:, "Name"])
tags = list(people.loc[:, "Tag"])

samples_by_tag = {tag: random.sample(dic.loc[:, tag].unique().tolist(), count) for tag, count in Counter(tags).items()}

for name, tag in zip(names, tags):
    print(name, samples_by_tag[tag].pop())

Output

mark blu
bill horse
tim shirt
frank dog

The idea is to draw n_i unique elements for each tag with random.sample, where n_i is the number of times that tag appears in tags. This is done in the line:

samples_by_tag = {tag: random.sample(dic.loc[:, tag].unique().tolist(), count) for tag, count in Counter(tags).items()}

For a given run, samples_by_tag can take the following value:

{'color': ['blu'], 'animal': ['dog', 'horse'], 'clothes': ['shirt']}
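
To make the counting step easier to follow, here is a minimal sketch of the same idea without pandas. The plain dict named values stands in for the dic DataFrame and is an assumption for illustration only; the tags and values are the ones from the question.

```python
import random
from collections import Counter

tags = ['color', 'animal', 'clothes', 'animal']
# Hypothetical stand-in for the dic DataFrame: tag -> candidate values.
values = {
    'color': ['blu', 'green', 'red', 'yellow'],
    'animal': ['dog', 'cat', 'horse', 'shark'],
    'clothes': ['gloves', 'shoes', 'shirt', 'socks'],
}

# How many values each tag needs: one per person carrying that tag
# (here animal -> 2, color -> 1, clothes -> 1).
counts = Counter(tags)

# random.sample draws without replacement, so values within a tag never repeat.
samples_by_tag = {tag: random.sample(values[tag], n) for tag, n in counts.items()}

# Hand the pre-drawn values out in order; pop() removes each one once it is used.
assigned = [samples_by_tag[tag].pop() for tag in tags]
print(assigned)
```

Because every value is drawn up front, nothing has to be removed from a list while the loop runs; the pop() call only consumes the small per-tag sample.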

Note that you need to remove:

random.seed(42)

to get different results on each run. See the documentation on random.seed and the notes on reproducibility.
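
A small sketch of what the seed does, using a hypothetical pool list: re-seeding with the same number restarts the same pseudo-random sequence, so the same sample comes back.

```python
import random

pool = ['dog', 'cat', 'horse', 'shark']  # illustrative values only

random.seed(42)
first = random.sample(pool, 2)

random.seed(42)  # re-seeding restarts the same pseudo-random sequence
second = random.sample(pool, 2)

print(first == second)  # True
```

Without the seed call, Python seeds the generator from the operating system or the clock, so each run usually produces a different sample.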

UPDATE

If one tag has fewer values than needed and you have a list to replace them with, do the following:

other_colors = ['black', 'violet', 'green', 'brown']
populations = { tag : dic.loc[:, tag].unique().tolist() for tag in set(tags) }
populations["color"] = list(set(other_colors))

samples_by_tag = {tag: random.sample(populations[tag], count) for tag, count in Counter(tags).items()}

for name, tag in zip(names, tags):
    print(name, samples_by_tag[tag].pop())
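
If it is not known in advance which tag will run short, one possible way to avoid the "Sample larger than population" error is to check the pool size before sampling. The sketch below is only an illustration: the function name sample_by_tag, the fallbacks argument and the example data are assumptions, not part of the original script.

```python
import random
from collections import Counter


def sample_by_tag(tags, populations, fallbacks=None):
    """Draw one distinct value per occurrence of each tag.

    fallbacks is a hypothetical dict (tag -> replacement list) that is used
    instead of the tag's own pool when that pool is too small.
    """
    fallbacks = fallbacks or {}
    samples = {}
    for tag, count in Counter(tags).items():
        pool = list(dict.fromkeys(populations[tag]))  # de-duplicate, keep order
        if len(pool) < count:
            # Not enough unique values: switch to the replacement list entirely.
            pool = list(dict.fromkeys(fallbacks.get(tag, [])))
        if len(pool) < count:
            raise ValueError(
                f"tag {tag!r} needs {count} unique values, only {len(pool)} available"
            )
        samples[tag] = random.sample(pool, count)
    return samples


# Illustrative data: 'color' is needed three times but has only two values.
tags = ['color', 'color', 'color', 'animal']
populations = {'color': ['blu', 'green'], 'animal': ['dog', 'cat']}
fallbacks = {'color': ['black', 'violet', 'green', 'brown']}

samples = sample_by_tag(tags, populations, fallbacks)
print(samples)
```

Checking the length first means the replacement happens before random.sample is called, so the error is only raised when neither the original pool nor the fallback has enough unique values.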

edited Jul 20, 2022 at 8:17

answered Jul 19, 2022 at 8:08

Dani Mesejo


6 Comments

Francesco Calderone Over a year ago

This is not random; it gives the exact same result every time it's run.

Dani Mesejo Over a year ago

@FrancescoCalderone That's because I set the seed for reproducibility; just remove the random.seed(42) line.

Francesco Calderone Over a year ago

What if there are actually more names than unique tags, as you suggested before? Let's say that a specific tag called "colors" only has 2 entries. What I want to do is disregard those 2 entries entirely and get the values from an entirely different list (just for that tag). The list is stored in a variable named colors_list. How would I do that?

Dani Mesejo Over a year ago

Just update the samples_by_tag dictionary: replace the values under the "colors" key with the new ones.

Francesco Calderone Over a year ago

Not sure what that means. The samples_by_tag dictionary never gets generated because of "ValueError: Sample larger than population or is negative", so I can't change the values afterwards. And I'm not sure how to handle that case before the samples_by_tag line is run.
