0

I am trying to replace some missing and incorrect values in my master dataset by filling it in with correct values from two different datasets.

I created a miniature version of the full dataset like so (note the real dataset is several thousand rows long):

import pandas as pd

data = {'From':['GA0251','GA5201','GA5551','GA510A','GA5171','GA5151'],
        'To':['GA0201_T','GA5151_T','GA5151_R','GA5151_V','GA5151_P','GA5171_B'],
        'From_Latitude':[55.86630869,0,55.85508787,55.85594626,55.85692217,55.85669934],
        'From_Longitude':[-4.27138731,0,-4.24126866,-4.24446585,-4.24516129,-4.24358251,],
        'To_Latitude':[55.86614756,0,55.85522197,55.85593762,55.85693878,0],
        'To_Longitude':[-4.271040979,0,-4.241466534,-4.244607602,-4.244905037,0]}
 
dataset_to_correct = pd.DataFrame(data)

However, some values in the From lat/long and the To lat/long are incorrect. I have two tables like the one below for each of From and To, which I would like to substitute into the table in place of the two values for that row.

Table of Corrected From lat/long:

data = {'Site':['GA5151_T','GA5171_B'],
        'Correct_Latitude':[55.85952791,55.87044558],
        'Correct_Longitude':[55.85661767,-4.24358251,]}
        
correct_to_coords = pd.DataFrame(data)

I would like to match this table to the From column and then replace the From_Latitude and From_Longitude with the correct values.

Table of Corrected To lat/long:

data = {'Site':['GA5201','GA0251'],
        'Correct_Latitude':[55.857577,55.86616756],
        'Correct_Longitude':[-4.242770,-4.272140979]}

correct_from_coords = pd.DataFrame(data)

I would like to match this table to the To column and then replace the To_Latitude and To_Longitude with the correct values.

Is there a way to match the site in each table to the corresponding From or To column and then replace only the values in the respective columns?

I have tried using code from this answer (Elegant way to replace values in pandas.DataFrame from another DataFrame) but it seems to have no effect on the database.

(correct_to_coords.set_index('Site').rename(columns = {'Correct_Latitude':'To_Latitude'})                        .combine_first(dataset_to_correct.set_index('To')))
1
  • hi! Is any one of the answers below working? If so & if you wish, you might consider accepting one of them to signal others that the issue is resolved. If not, you can provide feedback so they can be improved (or removed altogether) Commented Aug 14, 2021 at 6:06

3 Answers 3

1

@zswqa 's answer produces right result, @Anurag Dabas 's doesn't.

Another possible solution, It is a bit faster than merge method suggested above, although both are correct.

dataset_to_correct.set_index("To",inplace=True)
correct_to_coords.set_index("Site",inplace=True)
dataset_to_correct.loc[correct_to_coords.index, "To_Latitude"] = correct_to_coords["Correct_Latitude"]
dataset_to_correct.loc[correct_to_coords.index, "To_Longitude"] = correct_to_coords["Correct_Longitude"]
dataset_to_correct.reset_index(inplace=True)

dataset_to_correct.set_index("From",inplace=True)
correct_from_coords.set_index("Site",inplace=True)
dataset_to_correct.loc[correct_from_coords.index, "From_Latitude"] = correct_from_coords["Correct_Latitude"]
dataset_to_correct.loc[correct_from_coords.index, "From_Longitude"] = correct_from_coords["Correct_Longitude"]
dataset_to_correct.reset_index(inplace=True)

Sign up to request clarification or add additional context in comments.

Comments

1
merge = dataset_to_correct.merge(correct_to_coords, left_on='To', right_on='Site', how='left')

merge.loc[(merge.To == merge.Site), 'To_Latitude'] = merge.Correct_Latitude
merge.loc[(merge.To == merge.Site), 'To_Longitude'] = merge.Correct_Longitude

# del merge['Site']
# del merge['Correct_Latitude']
# del merge['Correct_Longitude']
merge = merge.drop(columns = ['Site','Correct_Latitude','Correct_Longitude'])

merge = merge.merge(correct_from_coords, left_on='From', right_on='Site', how='left')

merge.loc[(merge.From == merge.Site), 'From_Latitude'] = merge.Correct_Latitude
merge.loc[(merge.From == merge.Site), 'From_Longitude'] = merge.Correct_Longitude

# del merge['Site']
# del merge['Correct_Latitude']
# del merge['Correct_Longitude']
merge = merge.drop(columns = ['Site','Correct_Latitude','Correct_Longitude'])

merge

3 Comments

Before added answer checked in REPL - worked.
If I am not wrong then you are updating the opposite columns
@AnuragDabas Try and check in REPL.
0

lets try dual merge by merge()+pop()+fillna()+drop():

dataset_to_correct=dataset_to_correct.merge(correct_to_coords,left_on='To',right_on='Site',how='left').drop('Site',1)
dataset_to_correct['From_Latitude']=dataset_to_correct.pop('Correct_Latitude').fillna(dataset_to_correct['From_Latitude'])
dataset_to_correct['From_Longitude']=dataset_to_correct.pop('Correct_Longitude').fillna(dataset_to_correct['From_Longitude'])
dataset_to_correct=dataset_to_correct.merge(correct_from_coords,left_on='From',right_on='Site',how='left').drop('Site',1)
dataset_to_correct['To_Latitude']=dataset_to_correct.pop('Correct_Latitude').fillna(dataset_to_correct['To_Latitude'])
dataset_to_correct['To_Longitude']=dataset_to_correct.pop('Correct_Longitude').fillna(dataset_to_correct['To_Longitude'])

3 Comments

Zero in columns To_Latitude and To_Longitude because uses From_X and nowhere To_X
@zswqa thanks that was a typo corrected it thanks :)
@MySlav bruh ¯\_(ツ)_/¯

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.