
I have a pandas DataFrame with a list of integers in one of its columns, and I'd like to access individual elements of those lists. I've found a way to do it by calling tolist() and turning the result back into a DataFrame, but I'm wondering if there is a simpler or better way. In this example, I add column A to the middle element of the list in column B.

import pandas as pd
df = pd.DataFrame({'A': (1, 2, 3), 'B': ([0, 1, 2], [3, 4, 5], [6, 7, 8])})
df['C'] = df['A'] + pd.DataFrame(df['B'].tolist())[1]
df

Is there a better way to do this?

3 Answers


A bit more straightforward is:

df['C'] = df['A'] + df['B'].apply(lambda x:x[1])
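
For anyone who wants to run this end to end, here is a minimal self-contained sketch using the data from the question. The intermediate name `middle` is just something I introduced for illustration; it is not needed in practice.

```python
import pandas as pd

# Same data as in the question: column B holds one list per row.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [[0, 1, 2], [3, 4, 5], [6, 7, 8]]})

# Pull the element at index 1 out of every list in column B.
middle = df['B'].apply(lambda x: x[1])

# Element-wise addition with column A.
df['C'] = df['A'] + middle
print(df['C'].tolist())  # [2, 6, 10]
```

The lambda runs once per row in Python, so this is not vectorised, but it avoids building an intermediate DataFrame.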


One option is to use apply row-wise, which should be faster than building a new DataFrame from the lists:

df['C'] = df['A'] + df.apply(lambda row: row['B'][1], axis=1)

Some speed tests:

%timeit df['C'] = df['A'] + pd.DataFrame(df['B'].tolist())[1]
# 1000 loops, best of 3: 567 µs per loop
%timeit df['C'] = df['A'] + df.apply(lambda row: row['B'][1], axis=1)
# 1000 loops, best of 3: 406 µs per loop
%timeit df['C'] = df['A'] + df['B'].apply(lambda x:x[1])
# 1000 loops, best of 3: 250 µs per loop

OK, so the row-wise apply is only slightly faster than the original. @breucopter's answer, which calls apply on column B directly, is the fastest.
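
The timings above are on a three-row frame, where fixed overhead probably dominates. If you want to check how the three approaches compare on something bigger, here is a rough sketch using the standard timeit module outside IPython. The frame `big`, its size `n`, and the `candidates` dict are my own assumptions for illustration; results will vary with machine and pandas version, so I have not included any numbers.

```python
import timeit

import pandas as pd

# Hypothetical larger frame: n rows, each holding a three-element list.
n = 2000
big = pd.DataFrame({
    'A': list(range(n)),
    'B': [[i, i + 1, i + 2] for i in range(n)],
})

# The three approaches timed above, wrapped as zero-argument callables.
candidates = {
    'tolist + DataFrame': lambda: big['A'] + pd.DataFrame(big['B'].tolist())[1],
    'row-wise apply': lambda: big['A'] + big.apply(lambda row: row['B'][1], axis=1),
    'Series.apply': lambda: big['A'] + big['B'].apply(lambda x: x[1]),
}

for name, fn in candidates.items():
    # Best of 3 repeats, 3 calls each, reported as time per call.
    seconds = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f'{name}: {seconds * 1e3:.2f} ms per call')
```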



You can also use the .str accessor, which indexes into each list:

df['C'] = df['A'] + df['B'].str[1]

Performance of this method:

%timeit df['C'] = df['A'] + df['B'].str[1]
# 1000 loops, best of 3: 445 µs per loop
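
One practical difference from the apply version, as far as I can tell, is how lists that are too short get handled. In the sketch below the `ragged` frame is made up for illustration: its second row has only one element. Indexing through .str gives a missing value for that row, while the lambda raises IndexError.

```python
import pandas as pd

# Hypothetical data: the second list has no element at index 1.
ragged = pd.DataFrame({'A': [1, 2, 3], 'B': [[0, 1, 2], [3], [6, 7, 8]]})

# .str[1] is shorthand for .str.get(1); out-of-range positions become missing.
second = ragged['B'].str[1]
print(second.isna().tolist())  # [False, True, False]

# The apply version fails on the short list instead.
try:
    ragged['B'].apply(lambda x: x[1])
    apply_failed = False
except IndexError:
    apply_failed = True
print(apply_failed)  # True
```

Which behaviour is preferable depends on whether a short list should be treated as missing data or as an error.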
