8

I need to initialize the cells in a column of a DataFrame to lists.

df['some_col'] = [[] for _ in range(no_of_rows)]

Is there a better way to do this in terms of time efficiency?

3
  • You have accepted an answer that offers a solution 3x slower than your starting point. Commented May 24, 2016 at 14:08
  • @Stefan it seems that you are correct, as apply(list) is indeed slightly slower than my old code. Commented May 24, 2016 at 14:15
  • As you can see below, you can get a tiny bit faster using itertools, but I think your starting point is already quite good: I don't see a faster way to add the column than the standard assignment. Perhaps someone will come up with some magic. Commented May 24, 2016 at 14:18

2 Answers 2

6

Since you are looking for time efficiency, here are some benchmarks. The list comprehension is already quite fast at creating the list of empty lists, but you can squeeze out a marginal improvement using itertools.repeat. For the column assignment, apply is about 3x slower because it first fills the column and then loops over every row in Python:

import numpy as np
import pandas as pd
from itertools import repeat
df = pd.DataFrame({"A": np.arange(100000)})

%timeit df['some_col'] = [[] for _ in range(len(df))]
100 loops, best of 3: 8.75 ms per loop

%timeit df['some_col'] = [[] for i in repeat(None, len(df))]
100 loops, best of 3: 8.02 ms per loop

%%timeit 
df['some_col'] = ''
df['some_col'] = df['some_col'].apply(list)
10 loops, best of 3: 25 ms per loop
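
The list-construction part of the comparison can also be checked outside IPython. The following is a minimal sketch using the standard-library timeit module; the helper names build_with_range and build_with_repeat are hypothetical, it times only the creation of the lists (not the DataFrame assignment), and the numbers will vary by machine and Python version.

```python
from itertools import repeat
from timeit import timeit


def build_with_range(n):
    # One fresh empty list per row, driven by range().
    return [[] for _ in range(n)]


def build_with_repeat(n):
    # Same result, but repeat() avoids creating an int per iteration.
    return [[] for _ in repeat(None, n)]


n = 100_000   # same row count as the benchmark above
loops = 20    # assumption: small loop count to keep the run short

t_range = timeit(lambda: build_with_range(n), number=loops)
t_repeat = timeit(lambda: build_with_repeat(n), number=loops)

print(f"range:  {t_range / loops * 1e3:.2f} ms per loop")
print(f"repeat: {t_repeat / loops * 1e3:.2f} ms per loop")
```

Either helper's result can be assigned directly, e.g. df['some_col'] = build_with_repeat(len(df)).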

5

Try apply:

df1['some_col'] = ''
df1['some_col'] = df1['some_col'].apply(list)

Sample:

df1 = pd.DataFrame({'a': pd.Series([1,2])})
print (df1)
   a
0  1
1  2

df1['some_col'] = ''
df1['some_col'] = df1['some_col'].apply(list)
print (df1)
   a some_col
0  1       []
1  2       []
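
One property worth checking with any of these approaches is that every row gets its own list rather than a shared one. A small sketch, assuming the same df1 as in the sample; the variable names first, second and lengths are only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2]})
df1['some_col'] = ''
df1['some_col'] = df1['some_col'].apply(list)  # list('') returns a new [] per row

first = df1['some_col'].iloc[0]
second = df1['some_col'].iloc[1]

# Each cell holds a distinct list object.
print(first is not second)

# Mutating one cell's list leaves the other row untouched.
first.append('x')
lengths = df1['some_col'].apply(len).tolist()
print(lengths)
```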

7 Comments

How is this better in terms of time efficiency?
Hmm, I don't think it is better in terms of time efficiency. But it is up to the OP which answer to mark as accepted. Maybe he preferred mine because I was first, or maybe because he liked it. He may change his mind in a few seconds. I don't know.
Also note that lambda x: [] will be faster than list.
Just asking because the question was about time efficiency, so it's a good thing if the answer tries to do so as well.
@Stefan And now maybe your solution will be accepted.