2

I have a few very large datasets with x, y and z values. These datasets represent interpolated height measurements in time. The first dataset (the original) contains the data for the entire area. In time parts of the area have been measured again. I want to overwrite the original dataset in the location where x and y are equal but z is different (the height has changed at location(x,y)).

So my dataframes look something like this

Original:

x    y    z
1    1    0.5
1    2    0.5
1    3    0.5
2    1    0.5
2    2    0.5
2    3    0.5
3    1    0.5
3    2    0.5
3    3    0.5

New measurement:

x    y    z
0    1    0.5
0    2    0.5
1    1    1.5
1    2    0.5
2    1    0.5
2    2    1.0

The final dataframe should look like this:

x    y    z
1    1    1.5
1    2    0.5
1    3    0.5
2    1    0.5
2    2    1.0
2    3    0.5
3    1    0.5
3    2    0.5
3    3    0.5

I can loop through all the measurements and see of the x and y occur in the original and if the z value is different (if so, replace it) but this takes forever and I can imagine that there must be a better way using pandas. How would I do this in a fast and efficient way?

1
  • Correct, just changed it. Commented Sep 9, 2017 at 7:54

3 Answers 3

3

Given that 'Original' is df1 and 'New Measurement' is df2:

df3 = df1.set_index(['x', 'y'])
df3.update(df2.set_index(['x', 'y']))  # Inplace modificatioin on df3.
>>> df3.reset_index()
   x  y    z
0  1  1  1.5
1  1  2  0.5
2  1  3  0.5
3  2  1  0.5
4  2  2  1.0
5  2  3  0.5
6  3  1  0.5
7  3  2  0.5
8  3  3  0.5
Sign up to request clarification or add additional context in comments.

1 Comment

Very nice solution!
1

You can use

merge on df1 and df2 with x, y keys
assign new column z with fillna using z_x, z_y
drop these unwanted columns

In [716]: (df1.merge(df2, on=['x', 'y'], how='left')
              .assign(z=lambda x: x.z_y.fillna(x.z_x))
              .drop(['z_x', 'z_y'], 1))
Out[716]:
   x  y    z
0  1  1  1.5
1  1  2  0.5
2  1  3  0.5
3  2  1  0.5
4  2  2  1.0
5  2  3  0.5
6  3  1  0.5
7  3  2  0.5
8  3  3  0.5

Details

In [717]: df1.merge(df2, on=['x', 'y'], how='left')
Out[717]:
   x  y  z_x  z_y
0  1  1  0.5  1.5
1  1  2  0.5  0.5
2  1  3  0.5  NaN
3  2  1  0.5  0.5
4  2  2  0.5  1.0
5  2  3  0.5  NaN
6  3  1  0.5  NaN
7  3  2  0.5  NaN
8  3  3  0.5  NaN

1 Comment

Could I get you to vote your conscience on my post? Thanks stackoverflow.com/a/46192213/2336654
-1
original[(original.x == new.x) | (original.y == new.y)].z = new.z

1 Comment

This doesn't work, it throws an error (besides the | needing to be an &)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.