
I have a DataFrame, grouped_reps, that contains names and a list of numbers associated with each name.

The DataFrame looks roughly like this:

import pandas as pd

grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87],
                  [0.74, 0.34, 0.90, 0.45]]
})

Both columns currently have the object dtype.

I'm trying to take the mean of each array associated with a name and store it in a new column of the DataFrame, but to do this I have to convert each array to a float array first. I'm trying to do this with:

grouped_reps["util_rate"] = grouped_reps["util_rate"].astype(str).astype(float)

But I get this error:

ValueError: could not convert string to float: '[1.0, 0.75, 0.9]'
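The error comes from the fact that astype(str) turns each whole list into a single string such as '[1.0, 0.75, 0.9]', and a float conversion can only parse one number, not a bracketed list. A minimal sketch reproducing this on the sample frame (variable names as_text and error_message are illustrative only):

```python
import pandas as pd

grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87],
                  [0.74, 0.34, 0.90, 0.45]]
})

# astype(str) stringifies each whole list, not the numbers inside it
as_text = grouped_reps['util_rate'].astype(str)
print(repr(as_text[0]))

# Parsing that bracketed string as one float fails, as in the question
try:
    as_text.astype(float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```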
  • @HenryEcker I don't know how to make the problem reproducible, as this problem is unique to me b/c of having an imported dataset. Commented Jun 7, 2021 at 15:27
  • @garchukins Your edits moved in the right direction! Take a look at the edits I've made for some ideas. Generally you want to make sure that your attempt works on the sample frame as well: for example, if the DataFrame is grouped_reps, don't call it df, and if the column is 'util_rate', don't call it 'B'. Additionally, report the error message for the sample data so it is reproducible. Commented Jun 7, 2021 at 15:33
  • @HenryEcker Thank you very much for your help! You helped me understand what I did wrong. Commented Jun 7, 2021 at 16:00

1 Answer


To get the mean of each list, explode the lists into multiple rows, convert the values to float with astype, and then calculate the mean for each label on index level 0:

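# Series.mean(level=0) was removed in pandas 2.0;
# groupby(level=0).mean() is the equivalent that current versions accept
grouped_reps['mean'] = (
    grouped_reps['util_rate'].explode().astype(float).groupby(level=0).mean()
)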
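A complete, runnable version of the steps above, including the import and the sample frame. It is written with groupby(level=0).mean(), the form that current pandas versions accept (the level argument of Series.mean was deprecated in pandas 1.3 and removed in 2.0):

```python
import pandas as pd

grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87],
                  [0.74, 0.34, 0.90, 0.45]]
})

# One row per list element (original index label repeated), cast to float,
# then average the elements that share an index label
grouped_reps['mean'] = (
    grouped_reps['util_rate']
    .explode()
    .astype(float)
    .groupby(level=0)
    .mean()
)

print(grouped_reps)
```

Because the grouped result is indexed by the original row labels, assigning it back to grouped_reps['mean'] aligns each mean with its row.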

grouped_reps:

      A                util_rate      mean
0  John         [1.0, 0.75, 0.9]  0.883333
1  Mary         [1.0, 0.8, 0.87]  0.890000
2   Tom  [0.74, 0.34, 0.9, 0.45]  0.607500

Explanation:

explode produces a Series in which each list element gets its own row, repeating the index label of the row it came from:

grouped_reps['util_rate'].explode()
0     1.0
0    0.75
0     0.9
1     1.0
1     0.8
1    0.87
2    0.74
2    0.34
2     0.9
2    0.45
Name: util_rate, dtype: object
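A quick check of the two properties that the next steps rely on: the exploded values keep their original row labels, and they still have object dtype, which is why the astype(float) step is needed:

```python
import pandas as pd

grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87],
                  [0.74, 0.34, 0.90, 0.45]]
})

exploded = grouped_reps['util_rate'].explode()

# Each element keeps the index label of the row it came from
print(exploded.index.tolist())

# explode does not infer a numeric dtype; the values are still objects
print(exploded.dtype)
```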

Convert to float:

grouped_reps['util_rate'].explode().astype(float)
0    1.00
0    0.75
0    0.90
1    1.00
1    0.80
1    0.87
2    0.74
2    0.34
2    0.90
2    0.45
Name: util_rate, dtype: float64
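This step assumes each cell really holds a Python list. If an imported dataset instead stores each list as text (for example '[1.0, 0.75, 0.9]' after reading a CSV file), explode leaves the strings unchanged, since strings are not treated as list-like, and astype(float) fails with the same error as in the question. Assuming that is the situation, one option, not part of the original answer, is to parse the strings with the standard library's ast.literal_eval first. The string-valued frame below is a hypothetical stand-in for such an import:

```python
import ast

import pandas as pd

# Hypothetical frame where each list was stored as a string (e.g. from CSV)
grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': ['[1.0, 0.75, 0.9]', '[1.0, 0.8, 0.87]',
                  '[0.74, 0.34, 0.9, 0.45]']
})

# Turn each string back into a real list before exploding
grouped_reps['util_rate'] = grouped_reps['util_rate'].apply(ast.literal_eval)

floats = grouped_reps['util_rate'].explode().astype(float)
print(floats.dtype)
```

ast.literal_eval only evaluates Python literals, so unlike eval it will not run arbitrary code found in the data.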

Since each element keeps the index label of its original row, we can group on level=0 of the index and take the mean of each group:

grouped_reps['util_rate'].explode().astype(float).groupby(level=0).mean()
0    0.883333
1    0.890000
2    0.607500
Name: util_rate, dtype: float64
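For comparison, a per-row alternative that is not from the original answer: since each cell already holds a list of floats, the standard library's statistics.fmean (Python 3.8+) can be applied to each list directly, without exploding. On the sample frame it should agree with the explode approach up to floating-point rounding:

```python
from statistics import fmean

import pandas as pd

grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87],
                  [0.74, 0.34, 0.90, 0.45]]
})

# Alternative: average each list in place, one row at a time
per_row = grouped_reps['util_rate'].apply(fmean)

# The explode-based result from the answer, for comparison
exploded = (
    grouped_reps['util_rate'].explode().astype(float).groupby(level=0).mean()
)

print(per_row.round(6).tolist())
```

The apply version is shorter to read but calls a Python function once per row; the explode version does the averaging inside pandas' groupby. Note that fmean expects numbers, so it would not handle lists of numeric strings.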