0

I need to replace values in one dataframe by the mutual date index of another dataframe. Here there are specific dates (the 5th through the 10th), where column B's values need to be substituted with those in dataframe2. I've looked at merges / joins / replace / etc, but cannot find out how to do this.

import pandas as pd
import numpy as np

list1 = [10,80,6,38,41,54,12,280,46,21,46,22]
list2 = [4,3,22,6,'NA','NA','NA','NA','NA','NA',452,13]
list3 = ['2016-01-01', '2016-01-02','2016-01-03','2016-01-04','2016-01-05','2016-01-06',
         '2016-01-07','2016-01-08','2016-01-09','2016-01-10','2016-01-11','2016-01-12',]

dat = pd.DataFrame({'A' : list1, 'B' : list2, 'Date' : list3}, columns = ['A', 'B', 'Date'])
dat['Date'] = pd.to_datetime(dat['Date'], format = '%Y-%m-%d')
dat = dat.set_index('Date')
print(dat)

Values 2016-01-05 to 2016-01-10 need to be replaced with values in the second dataframe:

              A    B
Date                
2016-01-01   10    4
2016-01-02   80    3
2016-01-03    6   22
2016-01-04   38    6
2016-01-05   41   NA
2016-01-06   54   NA
2016-01-07   12   NA
2016-01-08  280   NA
2016-01-09   46   NA
2016-01-10   21   NA
2016-01-11   46  452
2016-01-12   22   13

Here is the second dataframe, where these values need to be "mapped" into the first dataframe:

list4 = [78,15,16,79,71,90]
list5 = ['2016-01-05','2016-01-06','2016-01-07','2016-01-08','2016-01-09','2016-01-10']
dat2 = pd.DataFrame({'B' : list4, 'Date' : list5}, columns = ['B', 'Date'])
dat2['Date'] = pd.to_datetime(dat2['Date'], format = '%Y-%m-%d')
dat2 = dat2.set_index('Date')
print(dat2)

             B
Date          
2016-01-05  78
2016-01-06  15
2016-01-07  16
2016-01-08  79
2016-01-09  71
2016-01-10  90

The final output should look like:

              A    B
Date                
2016-01-01   10    4
2016-01-02   80    3
2016-01-03    6   22
2016-01-04   38    6
2016-01-05   41   78
2016-01-06   54   15
2016-01-07   12   16
2016-01-08  280   79
2016-01-09   46   71
2016-01-10   21   90
2016-01-11   46  452
2016-01-12   22   13

Any help would be greatly appreciated! Thank you.

0

2 Answers 2

1

You can update cells by their location (index and column) to precisely target what you update:

replace = [pd.to_datetime(d) for d in ['2016-01-05', '2016-01-10']
dat.loc[replace, 'B'] = dat2.loc[replace, 'B']

This ensures that you only touch the indices you expect, and only touch the columns you expect.

EDIT: Here is the documentation for that .loc method. I'd give it a look, it's a very versatile tool.

EDIT2: I saw you're actually replacing a slice of time, not just those two values locations. This can also be achieved with .loc:

start, end = pd.to_datetime('2016-01-05'), pd.to_datetime('2016-01-10')
dat.loc[start:end, 'B'] = dat2.loc[start:end, 'B']
Sign up to request clarification or add additional context in comments.

Comments

1

One way using combine_first

df1 = dat2.combine_first(dat)

print (df1)

            A    B
Date        
2016-01-01  10  4.0
2016-01-02  80  3.0
2016-01-03  6   22.0
2016-01-04  38  6.0
2016-01-05  41  78.0
2016-01-06  54  15.0
2016-01-07  12  16.0
2016-01-08  280 79.0
2016-01-09  46  71.0
2016-01-10  21  90.0
2016-01-11  46  452.0
2016-01-12  22  13.0

Or using DataFrame.update

dat.update(dat2)

Or you could also use .loc

dat.loc[dat2.index, 'B'] = dat2.loc[:, 'B'] 

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.