I have two DataFrames. "df" is my original DataFrame with 100000+ values, and "df_result" contains only certain columns and certain index labels of "df". I have changed the values in the "df_result" columns and want to apply them back to my original DataFrame "df". The column names and index of "df_result" match those of "df", but "df_result" does not contain every index label of "df" (e.g. df.index is [0,1,2,.....,92808,92809] and df_result.index is [23429,23430,32349,42099,45232,.....,91324,91423]). Is there an efficient way to write every value in "df_result" back into the original "df" at the corresponding index and column? Thank you!

  • Can you add data samples and expected output? Commented Dec 20, 2017 at 8:46

2 Answers


You can use combine_first:

import pandas as pd

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})

print (df)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b

df_result = pd.DataFrame({'A':list('abc'),
                   'B':[4,5,4],
                   'C':[7,9,3],
                   'D':[5,7,1],
                   'E':[5,3,6],
                   'F':list('klo')}, index=[2,4,5])

print (df_result)
   A  B  C  D  E  F
2  a  4  7  5  5  k
4  b  5  9  7  3  l
5  c  4  3  1  6  o

df = df_result.combine_first(df)
print (df)
   A    B    C    D    E  F
0  a  4.0  7.0  1.0  5.0  a
1  b  5.0  8.0  3.0  3.0  a
2  a  4.0  7.0  5.0  5.0  k
3  d  5.0  4.0  7.0  9.0  b
4  b  5.0  9.0  7.0  3.0  l
5  c  4.0  3.0  1.0  6.0  o
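One caveat with combine_first is how it treats NaN in df_result. The sketch below uses small made-up frames (not the question's data) to show that a NaN in df_result is treated as missing and filled from df rather than written over the original value.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({'B': [4.0, 5.0, 4.0], 'C': [7.0, 8.0, 9.0]})
df_result = pd.DataFrame({'B': [np.nan, 50.0]}, index=[1, 2])

# Values from df_result win, except where df_result holds NaN:
# there, combine_first falls back to the value from df.
combined = df_result.combine_first(df)
print(combined)
```

Row 1 of B keeps the original 5.0 because df_result holds NaN there, so combine_first is not suitable if NaN values need to be copied back into df.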

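Another solution, which also works with NaNs, is to concatenate the DataFrames and remove rows with duplicated indices, keeping the first occurrence (the rows from df_result):

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df_result, df])
df = df[~df.index.duplicated()].sort_index()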
print (df)

   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  a  4  7  5  5  k
3  d  5  4  7  9  b
4  b  5  9  7  3  l
5  c  4  3  1  6  o
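The concatenate-and-deduplicate approach can be sketched in self-contained form as follows. The data is made up, and pd.concat is used because DataFrame.append was removed in pandas 2.0. The sketch assumes df_result has the same columns as df, since whole rows are replaced.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only; both frames share the same columns.
df = pd.DataFrame({'B': [4.0, 5.0, 4.0], 'C': [7.0, 8.0, 9.0]})
df_result = pd.DataFrame({'B': [np.nan, 50.0], 'C': [80.0, 90.0]},
                         index=[1, 2])

# Put df_result first so its rows are the "first" occurrence of each label.
stacked = pd.concat([df_result, df])

# duplicated() marks later repeats of an index label (keep='first' by default),
# so negating the mask drops the old rows from df that df_result replaces.
out = stacked[~stacked.index.duplicated()].sort_index()
print(out)
```

Unlike combine_first, the NaN in row 1 of B survives here, because entire rows are taken from df_result rather than individual non-missing values.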

EDIT:

Does this also work with np.nan values, and if df has more columns than df_result?

import numpy as np

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[np.nan,4,8,9,4,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})

print (df)
   A  B    C  D  E  F
0  a  4  NaN  1  5  a
1  b  5  4.0  3  3  a
2  c  4  8.0  5  6  a
3  d  5  9.0  7  9  b
4  e  5  4.0  1  2  b
5  f  4  3.0  0  4  b

df_result = pd.DataFrame({'A':list('abc'),
                   'B':[np.nan,50,40],
                   'E':[50,30,60],
                   'F':list('klo')}, index=[2,4,5])

print (df_result)
   A     B   E  F
2  a   NaN  50  k
4  b  50.0  30  l
5  c  40.0  60  o

You can assign into df by index and column labels with loc (NaN values in df_result are copied over as well):

df.loc[df_result.index, df_result.columns] = df_result
print (df)
   A     B    C  D   E  F
0  a   4.0  NaN  1   5  a
1  b   5.0  4.0  3   3  a
2  a   NaN  8.0  5  50  k
3  d   5.0  9.0  7   9  b
4  b  50.0  4.0  1  30  l
5  c  40.0  3.0  0  60  o
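A minimal numeric-only sketch of the same loc assignment, with made-up data, is shown below. The label check before the assignment is an addition for illustration and assumes, as in the question, that df_result only uses labels that already exist in df.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({'B': [4.0, 5.0, 4.0, 5.0],
                   'C': [7.0, 8.0, 9.0, 4.0],
                   'E': [5.0, 3.0, 6.0, 9.0]})
df_result = pd.DataFrame({'B': [np.nan, 50.0],
                          'E': [30.0, 60.0]}, index=[1, 3])

# Illustrative guard: every label in df_result must already exist in df.
extra_rows = df_result.index.difference(df.index)
extra_cols = df_result.columns.difference(df.columns)
if len(extra_rows) or len(extra_cols):
    raise KeyError("df_result has labels that are missing from df")

# Overwrite only the selected rows and columns; column C is left untouched.
df.loc[df_result.index, df_result.columns] = df_result
print(df)
```

The NaN in df_result lands in df at row 1 of B, which is the behaviour asked about in the comment.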

1 Comment

Does this also work with np.nan values, and if df has more columns than df_result? Thanks for the answer.

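DataFrame.update should work as long as df_result contains no NaN values that you need to copy over, because update skips NaN values in df_result. It modifies df in place and returns None, so do not assign the result back to df:

df.update(df_result)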

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.update.html
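To illustrate how DataFrame.update behaves, here is a minimal sketch with made-up data. It shows two things: the method works in place and returns None, and NaN values in df_result do not overwrite existing values in df.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({'B': [4.0, 5.0, 4.0, 5.0],
                   'C': [7.0, 8.0, 9.0, 4.0],
                   'E': [5.0, 3.0, 6.0, 9.0]})
df_result = pd.DataFrame({'B': [np.nan, 50.0],
                          'E': [30.0, 60.0]}, index=[1, 3])

# update aligns on index and columns and modifies df in place.
ret = df.update(df_result)

print(ret)  # update has no return value
print(df)   # row 1 of B is unchanged because df_result holds NaN there
```

Writing df = df.update(df_result) would therefore replace df with None.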
