0

Hi I want to create a dataframe that stores a unique variable and its average in every column. Currently I have a dataframe that has 2 columns. One has a list of names while the other has a single value. I want to associate that value with all of the names in the list and eventually find the average value for all the names This is the data I have:

Df1:
names_col                    cost_col
[milk, eggs, cookies]          3
[water, milk, yogurt]          5 
[cookies, diaper, yogurt]      7

This is what I want:

Df2:
names_col             avg_cost_col
milk                       4
eggs                       3
cookies                    5
water                      5
yogurt                     6
diaper                     7

I thought about doing an apply over all the rows somehow or using set() to remove duplicates out of every list but I am not sure. Any help would be appreicated

3 Answers 3

1

Following @MaxU's suggestion of vectorized solution in this answer

ndf = pd.DataFrame({'a':np.repeat(df.cost_col.values, df.names_col.str.len()),
                    'b':np.concatenate(df.names_col.values)})

yields

    a   b
0   3   milk
1   3   eggs
2   3   cookies
3   5   water
4   5   milk
5   5   yogurt
6   7   cookies
7   7   diaper
8   7   yogurt

Then just plain groupy+mean

ndf.groupby('b').mean()

        a
b   
cookies 5
diaper  7
eggs    3
milk    4
water   5
yogurt  6
Sign up to request clarification or add additional context in comments.

Comments

1

IIUC flatten your list (unnest)

pd.DataFrame(data=df.cost_col.repeat(df.names_col.str.len()).values,index=np.concatenate(df.names_col.values)).mean(level=0)
Out[221]: 
         0
milk     4
eggs     3
cookies  5
water    5
yogurt   6
diaper   7

Comments

1

Set() would be the easiest answer to remove duplicates.
The notation *list_name extends/unpack your list.

l = [["milk", "eggs", "cookies"], 
    ["water", "milk", "yogurt"], 
    ["cookies", "diaper", "yogurt"]]
c = set([*l[0], *l[1], *l[2]])
{'cookies', 'eggs', 'diaper', 'yogurt', 'water', 'milk'}

And use it for your pandas columns names. You can convert it back to a list if needed with list()

1 Comment

This is not an answer to the question the OP want's to use pandas and this doesn't answer the whole question

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.