pandas two dataframes multiple column values comparison

Question

I have a few very large datasets with x, y and z values. These datasets represent interpolated height measurements over time. The first dataset (the original) contains the data for the entire area. Over time, parts of the area have been measured again. I want to overwrite the original dataset at the locations where x and y are equal but z is different (i.e. the height has changed at location (x, y)).

So my dataframes look something like this

Original:

x    y    z
1    1    0.5
1    2    0.5
1    3    0.5
2    1    0.5
2    2    0.5
2    3    0.5
3    1    0.5
3    2    0.5
3    3    0.5

New measurement:

x    y    z
0    1    0.5
0    2    0.5
1    1    1.5
1    2    0.5
2    1    0.5
2    2    1.0

The final dataframe should look like this:

x    y    z
1    1    1.5
1    2    0.5
1    3    0.5
2    1    0.5
2    2    1.0
2    3    0.5
3    1    0.5
3    2    0.5
3    3    0.5

I can loop through all the measurements and check whether the x and y occur in the original and whether the z value is different (if so, replace it), but this takes forever, and I imagine there must be a better way using pandas. How would I do this in a fast and efficient way?

Correct, just changed it.

– Yorian, Commented Sep 9, 2017 at 7:54

Alexander · Accepted Answer · 2017-09-09 08:09:08Z

3

Given that 'Original' is df1 and 'New Measurement' is df2:

df3 = df1.set_index(['x', 'y'])
df3.update(df2.set_index(['x', 'y']))  # In-place modification of df3.
>>> df3.reset_index()
   x  y    z
0  1  1  1.5
1  1  2  0.5
2  1  3  0.5
3  2  1  0.5
4  2  2  1.0
5  2  3  0.5
6  3  1  0.5
7  3  2  0.5
8  3  3  0.5
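For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same set_index/update idea, with the frames rebuilt from the sample data in the question. The helper name overwrite_heights and its keys parameter are hypothetical, chosen only for illustration. It assumes every (x, y) pair is unique within each frame. Note that update only touches index labels that already exist in the original, so the (0, 1) and (0, 2) points are not added, and NaN values in the new data are skipped by default.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    'x': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'y': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'z': [0.5] * 9,
})
df2 = pd.DataFrame({
    'x': [0, 0, 1, 1, 2, 2],
    'y': [1, 2, 1, 2, 1, 2],
    'z': [0.5, 0.5, 1.5, 0.5, 0.5, 1.0],
})


def overwrite_heights(original, new, keys=('x', 'y')):
    """Hypothetical helper: return a copy of `original` with z taken from `new`
    wherever the key columns match. Assumes the keys are unique per frame."""
    keys = list(keys)
    result = original.set_index(keys)      # new frame indexed by (x, y)
    result.update(new.set_index(keys))     # in-place on `result` only
    return result.reset_index()


result = overwrite_heights(df1, df2)
print(result)
```

Wrapping the steps in a function keeps the in-place update confined to a temporary frame, so df1 itself is left as it was.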


1 Comment

Yorian Over a year ago

Very nice solution!

Zero · Accepted Answer · 2017-09-09 07:55:15Z

1

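You can:

- merge df1 and df2 on the x, y keys (a left merge, so every row of df1 is kept)
- assign a new column z, taking z_y (the new measurement) and falling back to z_x (the original) via fillna
- drop the now unwanted z_x and z_y columns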

In [716]: (df1.merge(df2, on=['x', 'y'], how='left')
              .assign(z=lambda x: x.z_y.fillna(x.z_x))
              .drop(['z_x', 'z_y'], axis=1))
Out[716]:
   x  y    z
0  1  1  1.5
1  1  2  0.5
2  1  3  0.5
3  2  1  0.5
4  2  2  1.0
5  2  3  0.5
6  3  1  0.5
7  3  2  0.5
8  3  3  0.5

Details

In [717]: df1.merge(df2, on=['x', 'y'], how='left')
Out[717]:
   x  y  z_x  z_y
0  1  1  0.5  1.5
1  1  2  0.5  0.5
2  1  3  0.5  NaN
3  2  1  0.5  0.5
4  2  2  0.5  1.0
5  2  3  0.5  NaN
6  3  1  0.5  NaN
7  3  2  0.5  NaN
8  3  3  0.5  NaN
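The same merge approach can be sketched as a standalone script. This is an illustrative variant rather than the answer's exact code: the suffixes ('_old', '_new') are my own choice to make the intermediate columns easier to read, and the frames are rebuilt from the question's sample data. It assumes (x, y) pairs are unique in df2, since a left merge would otherwise duplicate rows of df1.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    'x': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'y': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'z': [0.5] * 9,
})
df2 = pd.DataFrame({
    'x': [0, 0, 1, 1, 2, 2],
    'y': [1, 2, 1, 2, 1, 2],
    'z': [0.5, 0.5, 1.5, 0.5, 0.5, 1.0],
})

# Left merge keeps every row of df1; unmatched rows get NaN in z_new.
merged = df1.merge(df2, on=['x', 'y'], how='left', suffixes=('_old', '_new'))

# Prefer the new measurement, fall back to the original height.
merged['z'] = merged['z_new'].fillna(merged['z_old'])

# Drop the helper columns by name.
result = merged.drop(columns=['z_old', 'z_new'])
print(result)
```

Unlike update, this never modifies df1 and makes the intermediate old/new columns available for inspection, at the cost of materialising a wider temporary frame.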


1 Comment

piRSquared Over a year ago

Could I get you to vote your conscience on my post? Thanks stackoverflow.com/a/46192213/2336654

chrisckwong821 · Accepted Answer · 2017-09-09 08:09:35Z

-1

original[(original.x == new.x) | (original.y == new.y)].z = new.z


1 Comment

Yorian Over a year ago

This doesn't work, it throws an error (besides the | needing to be an &)
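To expand on why the one-liner fails: as far as I can tell, comparing original.x with new.x raises a ValueError because the two Series have different indexes, and even with matching shapes the comparison would be row-by-row rather than by (x, y) key, and assigning through a filtered copy would not write back to original. Below is a sketch of one possible mask-based alternative, not the answerer's method. The variable names orig_keys, new_keys and new_z are made up for illustration, and it assumes (x, y) pairs are unique in new.

```python
import pandas as pd

# Sample data copied from the question.
original = pd.DataFrame({
    'x': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'y': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'z': [0.5] * 9,
})
new = pd.DataFrame({
    'x': [0, 0, 1, 1, 2, 2],
    'y': [1, 2, 1, 2, 1, 2],
    'z': [0.5, 0.5, 1.5, 0.5, 0.5, 1.0],
})

# Build (x, y) keys for both frames so rows are matched by location,
# not by row position.
orig_keys = pd.MultiIndex.from_frame(original[['x', 'y']])
new_keys = pd.MultiIndex.from_frame(new[['x', 'y']])

# New z values, looked up by (x, y).
new_z = pd.Series(new['z'].to_numpy(), index=new_keys)

# Boolean mask: rows of `original` whose location was measured again.
mask = orig_keys.isin(new_keys)

# One .loc assignment writes directly into `original` (no chained indexing).
original.loc[mask, 'z'] = new_z.reindex(orig_keys[mask]).to_numpy()
print(original)
```

This overwrites z at every re-measured location, including those where the value did not change, which gives the same result as replacing only the differing ones.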
