I have a DataFrame containing various information about a product. Input3 is a column filled with sentences drawn at random from a list, as shown below:

sentence_list = (['Køb online her','Sammenlign priser her','Tjek priser fra 4 butikker','Se produkter fra 4 butikker', 'Stort udvalg fra 4 butikker','Sammenlign og køb'])
df["Input3"] = np.random.choice(sentence_list, size=len(df))

Full_Input is a string built by joining several columns; its content looks something like "ProductName from Brand - Buy online here - Sitename". It is created like this:

df["Full_Input"] = df['TitleTag'].astype(str) +  " " + df['Input2'].astype(str) + " " + df['Input3'].astype(str) + " " +  df['Input4'].astype(str) + " " +  df['Input5'].astype(str) 

The problem is that Full_Input_Length should be under 55 characters. I am therefore trying to figure out how to apply a condition while randomly generating Input3, so that once it is joined with the other columns' strings, the full input length does not go over 55.

This is what I tried:

for col in range(len(df)):
    condlist = [df["Full_Input"].apply(len) < 55]
    choicelist = [sentence_list]
    df['Input3_OK'][col] = np.random.choice.select(condlist, choicelist)

As expected, it doesn't work like that. np.random.choice.select is not a thing and I am getting an AttributeError.

How can I do that instead?

1 Answer

If you are guaranteed to have at least one item in sentence_list that satisfies this condition, you can restrict the random selection to ONLY those values in sentence_list that are of an acceptable length:

# keep only the sentences short enough to fit
# (MAX_LENGTH is the number of characters still available for Input3):
my_sentences = [s for s in sentence_list if len(s) < MAX_LENGTH]

# randomly select from this filtered list:
np.random.choice(my_sentences)

In other words, perform the filter on each list of strings BEFORE you call random.choice.
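
As a small illustration of filtering before choosing, here is a sketch using the sentence_list from the question. The budget of 20 characters is an assumption picked only for the demo, and the empty-list guard is an addition: np.random.choice raises a ValueError when given an empty sequence.

```python
import numpy as np

sentence_list = ['Køb online her', 'Sammenlign priser her', 'Tjek priser fra 4 butikker',
                 'Se produkter fra 4 butikker', 'Stort udvalg fra 4 butikker', 'Sammenlign og køb']

MAX_LENGTH = 20  # assumed character budget, for illustration only

# Filter first: only sentences strictly shorter than the budget remain.
my_sentences = [s for s in sentence_list if len(s) < MAX_LENGTH]

# np.random.choice raises ValueError on an empty sequence, so fall back to None.
picked = np.random.choice(my_sentences) if my_sentences else None
print(my_sentences, picked)
```

With this budget only the two shortest sentences remain as candidates, so the random pick can never exceed the limit.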

You can run this for each row in a dataframe like so:

def choose_string(full_input):
    return np.random.choice([
        s 
        for s in sentence_list 
        if len(s) + len(full_input) < 55
    ])

df["Input3_OK"] = df.Full_Input.map(choose_string)
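
One caveat with the snippet above: Full_Input already contains the Input3 that was drawn earlier, so adding len(s) to it counts that sentence twice, and np.random.choice will raise a ValueError for any row where no sentence fits. A possible variation, sketched below, computes the budget from the other columns instead and picks Input3 before building Full_Input. The toy rows, the other_cols list, the MAX_TOTAL name, and the empty-string fallback are all assumptions made for this example, not part of the original data.

```python
import numpy as np
import pandas as pd

sentence_list = ['Køb online her', 'Sammenlign priser her', 'Tjek priser fra 4 butikker',
                 'Se produkter fra 4 butikker', 'Stort udvalg fra 4 butikker', 'Sammenlign og køb']

MAX_TOTAL = 55                      # Full_Input must stay under this length
rng = np.random.default_rng(0)      # seeded only to make the demo repeatable

# Hypothetical rows standing in for the real DataFrame.
df = pd.DataFrame({
    "TitleTag": ["ProductName from Brand", "A much longer product name from some brand"],
    "Input2": ["-", "-"],
    "Input4": ["-", "-"],
    "Input5": ["Sitename", "Sitename"],
})

other_cols = ["TitleTag", "Input2", "Input4", "Input5"]

def choose_sentence(row):
    # Characters already used: the four other parts plus the four joining spaces.
    used = sum(len(str(row[c])) for c in other_cols) + 4
    candidates = [s for s in sentence_list if used + len(s) < MAX_TOTAL]
    # Assumed fallback: leave Input3 empty when nothing fits.
    return rng.choice(candidates) if candidates else ""

df["Input3"] = df.apply(choose_sentence, axis=1)
df["Full_Input"] = (df["TitleTag"].astype(str) + " " + df["Input2"].astype(str) + " "
                    + df["Input3"].astype(str) + " " + df["Input4"].astype(str) + " "
                    + df["Input5"].astype(str))
print(df[["Input3", "Full_Input"]])
```

Rows whose other columns already use up the budget end up with an empty Input3 here; whether to leave them empty, truncate the title, or flag them for review depends on your use case.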