At the most basic I have the following dataframe:

import numpy as np
import pandas as pd

a = {'possibility': np.array([1, 2, 3])}
b = {'possibility': np.array([4, 5, 6])}

df = pd.DataFrame([a, b])

This gives me a dataframe of shape 2x1, like so:

row 1:  np.array([1,2,3])
row 2:  np.array([4,5,6])

I also have a vector of length 2, like so:

[1,2]

These are the zero-based positions I want from each row's array.
So for [1,2] I want the value 2 from row 1 and the value 6 from row 2. Ideally, my output is the vector [2,6], of length 2.

Is this possible? I can easily run through a for loop, but I am looking for fast approaches, ideally vectorized ones, since the data is already in pandas/numpy.

For the actual use case, I need this to work in the 300k-400k row range, and it has to run inside optimization problems (hence the need for speed).

  • There's a separate array in each cell. It isn't one multidimensional array. Commented Feb 17, 2022 at 9:24

1 Answer

You could stack the column into a single two-dimensional NumPy array and use take_along_axis:

v = np.array([1,2])
a = np.vstack(df['possibility'])
np.take_along_axis(a.T, v[None], axis=0)[0]

output: array([2, 6])
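
As a possible alternative, the same per-row selection can be written with integer-array indexing on the stacked array, or with take_along_axis along axis 1 so that no transpose is needed. This is a minimal sketch, assuming every cell in the column holds an array of the same length; the names picked and picked_alt are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {'possibility': np.array([1, 2, 3])},
    {'possibility': np.array([4, 5, 6])},
])
v = np.array([1, 2])  # zero-based position to pick from each row

# Stack the per-cell arrays into one (n_rows, n_cols) array.
# Assumption: every cell holds an array of the same length.
a = np.vstack(df['possibility'].to_list())

# Pair row i with column v[i] using integer-array indexing.
picked = a[np.arange(len(v)), v]

# Same selection via take_along_axis, without transposing.
picked_alt = np.take_along_axis(a, v[:, None], axis=1)[:, 0]

print(picked)      # [2 6]
print(picked_alt)  # [2 6]
```

Both forms read a[i, v[i]] for each row i, so which one to use is mostly a matter of readability.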

2 Comments

works! even when lengths are in the 3 million. nice and sweet. (though it does add 3 seconds to the calc... are there simple optimizations from here?)
@user1639926 good question, I imagine most of the computing is spent stacking the arrays so I doubt you can greatly improve it. Might be worth asking a new numpy question though ;)
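
If stacking really is where the time goes, as the comment above guesses, one option worth trying is to stack once before the optimization loop and only redo the indexing inside it. The sketch below uses made-up random data and a hypothetical helper named pick; it assumes the underlying arrays stay fixed between iterations and only the index vector changes, which the question does not confirm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n_rows arrays of equal length, as they would sit in the column.
n_rows, n_cols = 1000, 3
cells = [rng.integers(0, 100, size=n_cols) for _ in range(n_rows)]

# Pay the stacking cost once, before the optimization loop.
a = np.vstack(cells)
rows = np.arange(n_rows)


def pick(v):
    """Return a[i, v[i]] for every row i; only indexing happens here."""
    return a[rows, v]


# Inside the loop only the index vector changes.
for _ in range(5):
    v = rng.integers(0, n_cols, size=n_rows)
    result = pick(v)
```

Whether this helps depends on the real workload, so it is worth timing the stacking and the indexing separately before restructuring anything.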
