
I have a dataframe with the following:

colA  colB
ABC   0.12
GHI   0.01

I also have a list of unique values from which I want to build a new dataframe:

ABC
DEF
GHI

The dataframe I need to create would have:

colA   colB
ABC    0.12
DEF    0.00
GHI    0.01

What would be the fastest way to populate my new dataframe? My intuition would be to loop.
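For comparison with the answers below, here is a minimal sketch of the loop approach mentioned in the question. It assumes `df` and `lst` hold the sample data shown above. The names `lookup` and `rows` are illustrative only. A dict lookup avoids scanning `df` once per label, but the pandas answers avoid the Python-level loop entirely.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"colA": ["ABC", "GHI"], "colB": [0.12, 0.01]})
lst = ["ABC", "DEF", "GHI"]

# Build a label -> value lookup once, then loop over the wanted labels
lookup = dict(zip(df["colA"], df["colB"]))
rows = []
for key in lst:
    # Labels missing from df get 0.0, matching the desired output
    rows.append({"colA": key, "colB": lookup.get(key, 0.0)})

result = pd.DataFrame(rows)
print(result)
```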


3 Answers


Try this:

df.set_index("colA").reindex(["ABC", "DEF", "GHI"], fill_value=0).reset_index()



   colA colB
0   ABC 0.12
1   DEF 0.00
2   GHI 0.01
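Below is a self-contained version of the reindex answer, assuming the sample data from the question (the answer itself assumes `df` already exists). The second call uses a hypothetical label list to show two side effects of `reindex`: rows come out in the order of the list, and labels that are not in the list are dropped.

```python
import pandas as pd

df = pd.DataFrame({"colA": ["ABC", "GHI"], "colB": [0.12, 0.01]})
lst = ["ABC", "DEF", "GHI"]

# Make colA the index so reindex can align on its labels;
# labels in lst that are missing from df get fill_value=0
result = df.set_index("colA").reindex(lst, fill_value=0).reset_index()
print(result)

# Hypothetical label list: rows follow this order and ABC is dropped
subset = df.set_index("colA").reindex(["GHI", "DEF"], fill_value=0)
print(subset)
```

`reindex` needs unique labels. If `colA` contains duplicates, it raises a `ValueError`, so deduplicate first.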

You could use .combine_first if you create a second dataframe from the list and call .set_index('colA') on both dataframes:

import pandas as pd
df1 = pd.DataFrame({'colA': {0: 'ABC', 1: 'GHI'}, 'colB': {0: 0.12, 1: 0.01}})
lst = ['ABC','DEF','GHI']
df2 = pd.DataFrame({'colA' : lst})
df3 = df1.set_index('colA').combine_first(df2.set_index('colA')).reset_index().fillna(0)
df3
Out[1]: 
  colA  colB
0  ABC  0.12
1  DEF  0.00
2  GHI  0.01
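The two approaches differ when the original data already contains missing values. This sketch uses a hypothetical `df1` in which GHI has a NaN. Calling `fillna(0)` after `combine_first` replaces every NaN, including ones that were in the original data. By contrast, `reindex(fill_value=0)` only fills rows for labels that were newly added.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of df1 where GHI already has a missing value
df1 = pd.DataFrame({"colA": ["ABC", "GHI"], "colB": [0.12, np.nan]}).set_index("colA")
lst = ["ABC", "DEF", "GHI"]
df2 = pd.DataFrame({"colA": lst}).set_index("colA")

# combine_first + fillna(0): GHI's original NaN also becomes 0
via_combine = df1.combine_first(df2).reset_index().fillna(0)

# reindex(fill_value=0): only the new label DEF gets 0; GHI stays NaN
via_reindex = df1.reindex(lst, fill_value=0).reset_index()

print(via_combine)
print(via_reindex)
```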

I was curious which method was faster, combine_first or reindex. Sammy's reindex approach was about three times faster, at least for this small dataframe:

import pandas as pd

df1 = pd.DataFrame({'colA': {0: 'ABC', 1: 'GHI'}, 'colB': {0: 0.12, 1: 0.01}}).set_index('colA')
lst = ['ABC','DEF','GHI']
df2 = pd.DataFrame({'colA' : lst}).set_index('colA')

def f1(): 
    return df1.combine_first(df2).reset_index().fillna(0)


def f2(): 
    return df1.reindex(lst, fill_value=0).reset_index()

%timeit f1()
%timeit f2()

2.35 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
784 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
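`%timeit` is an IPython/Jupyter magic, so that benchmark will not run as a plain Python script. A rough equivalent with the standard-library `timeit` module is sketched below. It reports the fastest of several repeats rather than a mean ± standard deviation, and absolute numbers will depend on the machine and pandas version.

```python
import timeit

import pandas as pd

df1 = pd.DataFrame({"colA": ["ABC", "GHI"], "colB": [0.12, 0.01]}).set_index("colA")
lst = ["ABC", "DEF", "GHI"]
df2 = pd.DataFrame({"colA": lst}).set_index("colA")


def f1():
    return df1.combine_first(df2).reset_index().fillna(0)


def f2():
    return df1.reindex(lst, fill_value=0).reset_index()


NUMBER = 100  # calls per repeat
timings = {}
for name, fn in [("combine_first", f1), ("reindex", f2)]:
    # Take the fastest of 5 repeats, then convert to microseconds per call
    best = min(timeit.repeat(fn, number=NUMBER, repeat=5))
    timings[name] = best / NUMBER * 1e6
    print(f"{name}: {timings[name]:.1f} µs per call")
```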


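Another way is to turn the list into a DataFrame, concatenate it with the existing dataframe, and drop the duplicates. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this uses pd.concat. The combined rows come out in the order ABC, GHI, DEF; the list here happens to be alphabetical, so sorting by colA gives the desired order:

lst = ['ABC', 'DEF', 'GHI']
pd.concat([df, pd.DataFrame(lst, columns=['colA'])]).drop_duplicates(subset=['colA'], keep='first').fillna(0).sort_values('colA').reset_index(drop=True)

  colA  colB
0  ABC  0.12
1  DEF  0.00
2  GHI  0.01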
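Below is a self-contained sketch of the concatenate-and-deduplicate approach, assuming the sample data from the question and using pd.concat, since DataFrame.append no longer exists in pandas 2.0. It shows why keep='first' matters. The original rows precede the placeholder rows built from the list, so keeping the first occurrence preserves the real colB values. With keep='last', the NaN placeholders would survive and fillna(0) would turn every value into 0.

```python
import pandas as pd

df = pd.DataFrame({"colA": ["ABC", "GHI"], "colB": [0.12, 0.01]})
lst = ["ABC", "DEF", "GHI"]

# Original rows first, then one placeholder row (colB = NaN) per list entry
stacked = pd.concat([df, pd.DataFrame(lst, columns=["colA"])], ignore_index=True)

# keep="first" keeps the original rows; DEF only exists as a placeholder
keep_first = stacked.drop_duplicates(subset=["colA"], keep="first").fillna(0)

# keep="last" keeps the placeholders, so every colB ends up as 0
keep_last = stacked.drop_duplicates(subset=["colA"], keep="last").fillna(0)

print(keep_first)
print(keep_last)
```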
