I was trying to find the maximum value of a column in a dataframe that contains numpy arrays.

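import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 33, 4],
                   'a': [1, 22, 23, 44],
                   'b': [1, 42, 23, 42]})
# Pack each row into a numpy array stored in a single object column
df['new'] = df.apply(lambda r: tuple(r), axis=1).apply(np.array)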

This is what the dataframe looks like:

    id  a   b   new
0   1   1   1   [1, 1, 1]
1   2   22  42  [2, 22, 42]
2   33  23  23  [33, 23, 23]
3   4   44  42  [4, 44, 42]

Now I want to find the single maximum value across all arrays in column new. In this case it is 44. Is there a quick and easy way to do this?

3
  • df["new"].apply(max).max() ? Commented Feb 3, 2023 at 22:18
  • Do the arrays in new always have the same dimension? Commented Feb 3, 2023 at 22:21
  • Yes, always the same dimension! In the real world it has up to 8000 entries. Commented Feb 3, 2023 at 22:26

6 Answers

1

Because your new column is constructed from the columns id, a, and b, you can take the maximum over those columns before you create the new column:

single_max = np.max(df.values)

Or, if the dataframe must already contain the new column, drop it before taking the maximum:

single_max = np.max(df.drop('new',axis=1).values)
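A minimal sketch that checks this shortcut against the question's data. It only holds under the assumption that new is built from every other column in the frame; the variable names max_from_sources and max_from_new are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 33, 4],
                   'a': [1, 22, 23, 44],
                   'b': [1, 42, 23, 42]})
df['new'] = df.apply(lambda r: tuple(r), axis=1).apply(np.array)

# Maximum over the plain numeric columns, ignoring the array column
max_from_sources = np.max(df.drop('new', axis=1).values)

# Maximum computed directly from the arrays, for comparison
max_from_new = max(arr.max() for arr in df['new'])

print(max_from_sources, max_from_new)
```

If new were ever built from only a subset of the columns, the two values could differ, so the comparison is worth keeping as a sanity check.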

Comments

0

You can apply a lambda to the values that calls the array's max method. This would result in a Series that also has a max method.

df['new'].apply(lambda arr: arr.max()).max()

Just guessing, but this should be faster than .apply(max) because it uses the array's optimized max method instead of having Python's built-in max iterate over the array elements one by one.
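One way to test that guess is a rough timing comparison. The sketch below uses synthetic data; the row count, array length, and repeat count are arbitrary assumptions, and the actual timings will depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the real column: 1,000 rows of 100-element arrays
rng = np.random.default_rng(0)
s = pd.Series([rng.integers(0, 1000, size=100) for _ in range(1000)])

# Both approaches should agree on the result
via_builtin = s.apply(max).max()
via_method = s.apply(lambda arr: arr.max()).max()

# Time each approach over a few repetitions
t_builtin = timeit.timeit(lambda: s.apply(max).max(), number=5)
t_method = timeit.timeit(lambda: s.apply(lambda arr: arr.max()).max(), number=5)

print(f"built-in max: {t_builtin:.4f}s, ndarray.max: {t_method:.4f}s")
```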

Comments

0

A possible solution:

df.new.explode().max()

Or a faster alternative:

np.max(np.vstack(df.new.values))

Returns 44.
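A short sketch of what the np.vstack variant does on the question's data, assuming (as the asker confirmed in the comments) that all arrays have the same length. The names stacked, overall_max, and row_max are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 33, 4],
                   'a': [1, 22, 23, 44],
                   'b': [1, 42, 23, 42]})
df['new'] = df.apply(lambda r: tuple(r), axis=1).apply(np.array)

# Stack the per-row arrays into one 2-D numeric array: one row per dataframe row
stacked = np.vstack(df['new'].values)
print(stacked.shape)

# A single vectorized reduction over the whole block
overall_max = stacked.max()

# The same array also gives per-row maxima if those are needed
row_max = stacked.max(axis=1)
print(overall_max, row_max)
```

Stacking copies the data once, but afterwards the reduction runs on a regular numeric array rather than on an object column.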

Comments

0

Assuming you only want to consider the column "new":

import numpy as np
out = np.max(tuple(df['new'])) # or np.max(df['new'].tolist())

Output: 44
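Passing a tuple or list of arrays to np.max relies on the arrays having the same length, which the asker confirmed in the comments. If that did not hold, flattening with np.concatenate would be one alternative. A sketch with made-up ragged data (the Series ragged is hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical column whose arrays have different lengths
ragged = pd.Series([np.array([1, 1]),
                    np.array([2, 22, 42]),
                    np.array([33]),
                    np.array([4, 44, 42, 7])])

# Join all arrays end to end into one flat 1-D array, then reduce once
flat = np.concatenate(ragged.to_list())
overall_max = flat.max()
print(overall_max)
```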

Comments

0
df1.new.map(pd.eval).explode().max()

Output: 44

Comments

0

1- Combination of max and explode()

df['new'].explode().max()
# 44

2- List comprehension

max([max(e) for e in df['new']])
# 44

Comments
