How to convert an array column to a float array in Pandas?

Question

I have a DataFrame, grouped_reps, that contains names and an array of numbers associated with each name.

The dataframe looks roughly like this:

import pandas as pd

grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87],
                  [0.74, 0.34, 0.90, 0.45]]
})

Both columns currently have the object dtype.
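Because the data was imported, it may be worth confirming what the object cells actually hold: real Python lists, or strings that only look like lists (for example, after a round trip through a CSV file). A quick check on the sample frame, assuming only that pandas is installed:

```python
import pandas as pd

grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87],
                  [0.74, 0.34, 0.90, 0.45]]
})

# A column whose cells are Python lists is stored with the generic object dtype
print(grouped_reps['util_rate'].dtype)

# The dtype alone does not say what the cells are, so inspect one cell directly.
# <class 'list'> means real lists; <class 'str'> would mean list-like text.
print(type(grouped_reps.loc[0, 'util_rate']))
```

If the second line prints <class 'str'>, the cells need to be parsed into lists before any numeric work.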

I'm trying to take the mean of each array associated with a name and store it in a new column of the dataframe, but to do this I have to convert the array to a float array first. I'm trying to do this with:

grouped_reps["util_rate"] = grouped_reps["util_rate"].astype(str).astype(float)

But I get this error:

ValueError: could not convert string to float: '[1.0, 0.75, 0.9]'
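The error comes from the first astype(str): it turns each list into its printed representation, brackets and commas included, and float() cannot parse that text as a single number. The same failure, reproduced with plain Python:

```python
# str() of a list is its text representation, brackets and commas included
text = str([1.0, 0.75, 0.90])
print(text)  # [1.0, 0.75, 0.9]

# float() expects text for one number, so the whole list text is rejected
message = None
try:
    float(text)
except ValueError as err:
    message = str(err)
print(message)  # could not convert string to float: '[1.0, 0.75, 0.9]'
```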

@HenryEcker I don't know how to make the problem reproducible, as this problem is unique to me because I'm working with an imported dataset. – garchukins, commented Jun 7, 2021 at 15:27
@garchukins Your edits moved in the right direction! Take a look at the edits I've made for some ideas. Generally, you want to make sure that your attempt works on the sample frame as well: for example, if the dataframe is grouped_reps don't call it df, and if the column is 'util_rate' don't call it 'B'. Additionally, report the error message for the sample data so it is reproducible. – Henry Ecker ♦, commented Jun 7, 2021 at 15:33
@HenryEcker Thank you very much for your help! You helped me understand what I did wrong. – garchukins, commented Jun 7, 2021 at 16:00

Henry Ecker · Accepted Answer · 2021-06-07 16:12:12Z

To get the mean of each list, explode the lists so that each element gets its own row, convert to float with astype, then group on index level 0 and take the mean:

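# Series.mean(level=0) was deprecated in pandas 1.3 and removed in 2.0;
# groupby(level=0).mean() computes the same per-row means
grouped_reps['mean'] = (
    grouped_reps['util_rate'].explode().astype(float).groupby(level=0).mean()
)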

grouped_reps:

      A                util_rate      mean
0  John         [1.0, 0.75, 0.9]  0.883333
1  Mary         [1.0, 0.8, 0.87]  0.890000
2   Tom  [0.74, 0.34, 0.9, 0.45]  0.607500

Explanation:

explode produces a Series in which each list element sits on its own row and keeps the index label of the row it came from:

grouped_reps['util_rate'].explode()

0     1.0
0    0.75
0     0.9
1     1.0
1     0.8
1    0.87
2    0.74
2    0.34
2     0.9
2    0.45
Name: util_rate, dtype: object
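The repeated index labels are what make the later per-row averaging possible. A small sketch contrasting the default with explode(ignore_index=True) (pandas 1.1 and later), which renumbers the rows and so loses the link back to the original row:

```python
import pandas as pd

grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': [[1.0, 0.75, 0.90], [1.0, 0.80, 0.87],
                  [0.74, 0.34, 0.90, 0.45]]
})

# Default: every element keeps the label of the row it came from
exploded = grouped_reps['util_rate'].explode()
print(exploded.index.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]

# ignore_index=True relabels the rows 0..n-1, so they can no longer be regrouped by original row
renumbered = grouped_reps['util_rate'].explode(ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```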

Convert to float:

grouped_reps['util_rate'].explode().astype(float)

0    1.00
0    0.75
0    0.90
1    1.00
1    0.80
1    0.87
2    0.74
2    0.34
2    0.90
2    0.45
Name: util_rate, dtype: float64

Since each element keeps the index label of its original row, we can group on index level 0 and take the mean:

# equivalent to .mean(level=0) on pandas versions before 2.0
grouped_reps['util_rate'].explode().astype(float).groupby(level=0).mean()

0    0.883333
1    0.890000
2    0.607500
Name: util_rate, dtype: float64
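The answer assumes the util_rate cells are real Python lists. If the imported data instead holds strings such as '[1.0, 0.75, 0.9]' (for example, after a round trip through a CSV file; this is an assumption, not something the question confirms), explode leaves each string intact and astype(float) fails just as before. A sketch under that assumption, using ast.literal_eval to parse each string into a list first; the mean_alt column is a per-row alternative with statistics.mean that is not part of the answer:

```python
import ast
import statistics

import pandas as pd

# Assumed imported data: each cell is text that looks like a list, not a list
grouped_reps = pd.DataFrame({
    'A': ['John', 'Mary', 'Tom'],
    'util_rate': ['[1.0, 0.75, 0.9]', '[1.0, 0.8, 0.87]',
                  '[0.74, 0.34, 0.9, 0.45]'],
})

# Parse each string into a Python list; literal_eval only evaluates Python
# literals, so unlike eval it does not run arbitrary code
grouped_reps['util_rate'] = grouped_reps['util_rate'].map(ast.literal_eval)

# The answer's approach: one row per element, cast to float, average per original row
grouped_reps['mean'] = (
    grouped_reps['util_rate'].explode().astype(float).groupby(level=0).mean()
)

# Per-row alternative (not from the answer): average each list directly.
# Unlike the explode version, which yields NaN for an empty list,
# statistics.mean raises StatisticsError on an empty list.
grouped_reps['mean_alt'] = grouped_reps['util_rate'].map(statistics.mean)

print(grouped_reps)
```

With real lists, as in the question's sample frame, the literal_eval step is simply skipped.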
