How to populate a dataframe list with missing values

Question

I have a dataframe with the following:

colA  colB
ABC   0.12
GHI   0.01

And a list of unique values from which I want to build a new dataframe:

ABC
DEF
GHI

The dataframe I need to create would have:

colA   colB
ABC    0.12
DEF    0.00
GHI    0.01

What would be the fastest way to populate my new dataframe? My intuition would be to loop.

When asking a question on Stack Overflow, kindly remember that if it is solved, you should accept the best answer by clicking the checkmark next to the solution. Thanks!
– David Erickson, Commented Oct 25, 2020 at 22:19

sammywemmy · Accepted Answer · 2020-10-20 23:22:03Z

1

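Set colA as the index, reindex against the full list with fill_value=0 for the missing labels, then turn colA back into a column:

df.set_index("colA").reindex(["ABC", "DEF", "GHI"], fill_value=0).reset_index()

Output:
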
   colA colB
0   ABC 0.12
1   DEF 0.00
2   GHI 0.01
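
For anyone who wants to run the one-liner above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question, and the variable name `wanted` is only an illustrative name for the unique list. Note that `reindex` needs the labels in colA to be unique; with duplicates it raises an error.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"colA": ["ABC", "GHI"], "colB": [0.12, 0.01]})

# `wanted` is a hypothetical name for the unique list from the question
wanted = ["ABC", "DEF", "GHI"]

result = (
    df.set_index("colA")               # make the labels the index
      .reindex(wanted, fill_value=0)   # add missing labels, filling colB with 0
      .reset_index()                   # restore colA as a regular column
)
print(result)
```

The rows come back in the order of `wanted`, so the list also controls the final row order.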

David Erickson · Accepted Answer · 2020-10-20 23:39:46Z

You could use .combine_first if you create a second dataframe from the list and use .set_index('colA') for both dataframes:

import pandas as pd

df1 = pd.DataFrame({'colA': {0: 'ABC', 1: 'GHI'}, 'colB': {0: 0.12, 1: 0.01}})
lst = ['ABC','DEF','GHI']
df2 = pd.DataFrame({'colA' : lst})
df3 = df1.set_index('colA').combine_first(df2.set_index('colA')).reset_index().fillna(0)
df3
Out[1]: 
  colA  colB
0  ABC  0.12
1  DEF  0.00
2  GHI  0.01

I was curious which of combine_first and reindex is faster. sammywemmy's reindex approach was about three times faster (784 µs vs. 2.35 ms per call), at least for this small dataframe.

df1 = pd.DataFrame({'colA': {0: 'ABC', 1: 'GHI'}, 'colB': {0: 0.12, 1: 0.01}}).set_index('colA')
lst = ['ABC','DEF','GHI']
df2 = pd.DataFrame({'colA' : lst}).set_index('colA')

def f1(): 
    return df1.combine_first(df2).reset_index().fillna(0)


def f2(): 
    return df1.reindex(lst, fill_value=0).reset_index()

%timeit f1()
%timeit f2()

2.35 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
784 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
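
The timings above are for a three-row frame, so they may mostly reflect fixed overhead. A rough sketch for repeating the comparison on a larger frame outside IPython, using the standard-library `timeit` module, could look like this. The size `n_keys`, the generated labels, and the function names are all assumptions made for illustration, and no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark size; adjust to resemble your own data
n_keys = 10_000
keys = [f"K{i:05d}" for i in range(n_keys)]   # full list of unique labels
present = keys[::2]                           # only half of them have values

rng = np.random.default_rng(0)
df1 = pd.DataFrame({"colA": present, "colB": rng.random(len(present))}).set_index("colA")
df2 = pd.DataFrame({"colA": keys}).set_index("colA")


def with_combine_first():
    return df1.combine_first(df2).reset_index().fillna(0)


def with_reindex():
    return df1.reindex(keys, fill_value=0).reset_index()


# Report the best average time per call over a few repeats
for fn in (with_combine_first, with_reindex):
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

Absolute numbers will depend on the pandas version and the machine, so it is worth running this on data shaped like your own before drawing conclusions.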

wwnde · Accepted Answer · 2020-10-20 23:55:34Z

0

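Another way is to turn the list into a dataframe, concatenate it onto the existing dataframe, and drop duplicates. DataFrame.append was removed in pandas 2.0, so this uses pd.concat instead; l is the list of unique values. The appended rows land at the bottom, so sort by colA and reset the index at the end:

pd.concat([df, pd.DataFrame(l, columns=['colA'])]).drop_duplicates(subset=['colA'], keep='first').fillna(0).sort_values('colA').reset_index(drop=True)

  colA  colB
0  ABC  0.12
1  DEF  0.00
2  GHI  0.01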
