I want to update the values in selected columns of one GeoPandas GeoDataFrame with values from another GeoDataFrame. Both have a common key column called 'geometry'.

For example

import pandas as pd

df1 = pd.DataFrame([["X",1,1,0],
              ["Y",0,1,0],
              ["Z",0,0,0],
              ["Y",0,0,0]],columns=["geometry","Nonprofit","Business", "Education"])

df2 = pd.DataFrame([["Y",1,1],
              ["Z",1,1]],columns=["geometry","Non", "Edu"])

Following this answer, I did the following:

df1 = df1.set_index('geometry')
df2 = df2.set_index('geometry')

list_1 = ['Nonprofit', 'Education']
list_2 = ['Non', 'Edu']

df1[list_1].update(df2[list_2])

This gives the wrong result without any warning. How can I fix this?

Notes:

Updating one column at a time (df1['Nonprofit'].update(df2['Non'])) will produce the correct result.

The geometry values (LineStrings in GeoPandas) are replaced by single characters for simplicity.

  • Check your pandas version Commented Apr 22, 2020 at 22:08
  • I'm using pandas version '1.0.3'. Thanks Commented Apr 22, 2020 at 22:11
  • Then it is because you are not using the current version of pandas. The answer in the link says the warning occurs when using the current version of pandas. Commented Apr 22, 2020 at 22:31
  • I think the issue is related to passing multiple column labels in a list. When I used df1['Nonprofit'].update(df2['Non']) I got the correct answer. I have issues when I pass the lists of column names in df1[list_1].update(df2[list_2]). Thanks Commented Apr 22, 2020 at 23:02

1 Answer

DataFrame.update only updates columns with the same name!

Accordingly, one solution would be to first rename the columns in df2 to match those in df1.

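Note that when calling update(), there is no need to select the target columns in df1: all columns common to both frames will be updated. (Selecting with df1[list_1] also returns a new DataFrame, so an update applied to that selection would not reach df1.) If required, you can restrict which columns are taken from df2 by using column indexing.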

df2 = df2.rename(columns={'Non': 'Nonprofit', 'Edu': 'Education'})
df1.update(df2)  

# optionally restrict columns:
# df1.update(df2['Nonprofit'])  

# alternative short version, leaving df2 untouched
df1.update(df2.rename(columns={'Non': 'Nonprofit', 'Edu': 'Education'})) 

gives

          Nonprofit  Business  Education
geometry                                
X               1.0         1        0.0
Y               1.0         1        1.0
Z               1.0         0        1.0
Y               1.0         0        1.0
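
If the column pairs are already held in two lists, as list_1 and list_2 are in the question, the rename mapping can be built from them instead of being typed out. The following is a minimal sketch under a few assumptions: the two lists are in matching order, and the duplicate 'Y' row from the question is left out so that the index of df1 is unique. The name `mapping` is introduced here for illustration only.

```python
import pandas as pd

# Sample frames from the question; the duplicate 'Y' row is omitted
# (an assumption of this sketch) so that df1 has a unique index.
df1 = pd.DataFrame(
    [["X", 1, 1, 0], ["Y", 0, 1, 0], ["Z", 0, 0, 0]],
    columns=["geometry", "Nonprofit", "Business", "Education"],
).set_index("geometry")
df2 = pd.DataFrame(
    [["Y", 1, 1], ["Z", 1, 1]],
    columns=["geometry", "Non", "Edu"],
).set_index("geometry")

list_1 = ["Nonprofit", "Education"]  # target columns in df1
list_2 = ["Non", "Edu"]              # matching source columns in df2, same order

# Pair each df2 column name with its df1 counterpart.
mapping = dict(zip(list_2, list_1))

# Take only the wanted df2 columns, rename them, and update df1 in place.
# df2 itself is left untouched because rename returns a new frame.
df1.update(df2[list_2].rename(columns=mapping))

print(df1)
```

Because only the columns named in list_2 are passed on, any other columns in df2 cannot overwrite df1 by accident.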

The reason your "single column" approach works is that there you're implicitly using Series.update, where there is no such concept as common columns.
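
The difference between the two methods can be seen in a small sketch. It assumes the sample data from the question with the duplicate 'Y' row dropped, and it works on copies so that df1 stays unchanged for comparison; the names `s` and `unchanged` are illustrative only.

```python
import pandas as pd

# Sample frames from the question; the duplicate 'Y' row is omitted
# (an assumption of this sketch) so that df1 has a unique index.
df1 = pd.DataFrame(
    [["X", 1, 1, 0], ["Y", 0, 1, 0], ["Z", 0, 0, 0]],
    columns=["geometry", "Nonprofit", "Business", "Education"],
).set_index("geometry")
df2 = pd.DataFrame(
    [["Y", 1, 1], ["Z", 1, 1]],
    columns=["geometry", "Non", "Edu"],
).set_index("geometry")

# Series.update aligns on the index only; the Series names
# ('Nonprofit' vs 'Non') play no role, so Y and Z are updated.
s = df1["Nonprofit"].copy()
s.update(df2["Non"])

# DataFrame.update aligns on the index and on the column labels.
# 'Non' and 'Edu' do not exist in df1, so nothing is changed.
unchanged = df1.copy()
unchanged.update(df2[["Non", "Edu"]])

print(s)
print(unchanged.equals(df1))
```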
