You could use .combine_first if you create a second dataframe from the list and use .set_index('colA') for both dataframes:
import pandas as pd
df1 = pd.DataFrame({'colA': {0: 'ABC', 1: 'GHI'}, 'colB': {0: 0.12, 1: 0.01}})
lst = ['ABC','DEF','GHI']
df2 = pd.DataFrame({'colA' : lst})
df3 = df1.set_index('colA').combine_first(df2.set_index('colA')).reset_index().fillna(0)
df3
Out[1]:
  colA  colB
0  ABC  0.12
1  DEF  0.00
2  GHI  0.01
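To see why this works, it can help to look at the intermediate result before .fillna(0). The following is only a minimal sketch reusing the frames above; the names left, right and combined are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'colA': ['ABC', 'GHI'], 'colB': [0.12, 0.01]})
lst = ['ABC', 'DEF', 'GHI']
df2 = pd.DataFrame({'colA': lst})

# Index both frames by colA; df2 then carries only labels and no data columns.
left = df1.set_index('colA')
right = df2.set_index('colA')

# combine_first aligns on the union of the two indexes, so 'DEF' shows up
# with a missing colB value because neither frame has data for it.
combined = left.combine_first(right)
print(combined)

# fillna(0) then replaces that missing value with 0.
print(combined.reset_index().fillna(0))
```

The second dataframe contributes nothing but its index labels here, which is why a plain reindex can likely do the same job.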
I was curious which method is faster, combine_first or reindex. Sammy's reindex approach was about three times faster, at least for this small dataframe.
df1 = pd.DataFrame({'colA': {0: 'ABC', 1: 'GHI'}, 'colB': {0: 0.12, 1: 0.01}}).set_index('colA')
lst = ['ABC','DEF','GHI']
df2 = pd.DataFrame({'colA' : lst}).set_index('colA')
def f1():
    return df1.combine_first(df2).reset_index().fillna(0)
def f2():
    return df1.reindex(lst, fill_value=0).reset_index()
%timeit f1()
%timeit f2()
2.35 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
784 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
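%timeit only works inside IPython or Jupyter. For a plain Python script, a rough equivalent might use the standard timeit module. The sketch below also checks that both functions return the same colB values before timing them. The loop count n and the names t1 and t2 are arbitrary choices made for this example, and the timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df1 = pd.DataFrame({'colA': ['ABC', 'GHI'], 'colB': [0.12, 0.01]}).set_index('colA')
lst = ['ABC', 'DEF', 'GHI']
df2 = pd.DataFrame({'colA': lst}).set_index('colA')

def f1():
    return df1.combine_first(df2).reset_index().fillna(0)

def f2():
    return df1.reindex(lst, fill_value=0).reset_index()

# Sanity check: both approaches should yield the same colB values.
assert f1()['colB'].tolist() == f2()['colB'].tolist()

n = 200  # arbitrary number of repetitions for this sketch
t1 = timeit.timeit(f1, number=n) / n
t2 = timeit.timeit(f2, number=n) / n
print(f"combine_first: {t1 * 1e6:.0f} µs per call")
print(f"reindex:       {t2 * 1e6:.0f} µs per call")
```

With only three rows, the measurements mostly reflect fixed per-call overhead, so the gap may look different on larger dataframes.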