0

I have the some data that looks like this:

City         1994 1995  1996  1997                           
Rome            X  NaN   NaN   NaN
New York      NaN  NaN   NaN   NaN
Paris         NaN    X   NaN   NaN
Sidney        NaN  NaN   NaN   NaN

The crosses represent that an event happened in that city in that specific year. I want to add new years to my data frame, where the data looks like this:

City         1998                           
Rome            X
Sidney          X
London          X
Madrid          X

Basically the new data indicates where events happened in a new different year (so it'll always have 'X'). The result I'm looking for should look like this:

City         1994 1995  1996  1997   1998                          
Rome            X  NaN   NaN   NaN      X
New York      NaN  NaN   NaN   NaN    NaN
Paris         NaN    X   NaN   NaN    NaN
Sidney        NaN  NaN   NaN   NaN      X
London        NaN  NaN   NaN   NaN      X
Madrid        NaN  NaN   NaN   NaN      X

So it should add the new non existing cities to the city list and put NaN for the already present years for that city and also put an X in the newly added year for the cities where the event occurred.

I tried using df.append() but I get repeaded values

City         1994 1995  1996  1997   1998                           
Rome            X  NaN   NaN   NaN    NaN
New York      NaN  NaN   NaN   NaN    NaN
Paris         NaN    X   NaN   NaN    NaN
Sidney        NaN  NaN   NaN   NaN    NaN
Rome          NaN  NaN   NaN   NaN      X
Sidney        NaN  NaN   NaN   NaN      X
London        NaN  NaN   NaN   NaN      X
Madrid        NaN  NaN   NaN   NaN      X

I also tried using df.merge() but I didn't get any better results. Any suggestions on how to do it using pandas? I can write some code to manually does it but it defeats the purpose of using such a powerful tool.

1 Answer 1

1

You can use df.merge() to achieve this.

import pandas as pd

data = [['Rome','X',  None,   None,   None],['New York' ,None,  None,   None,   None], ['Paris',None,  'X',   None,   None]]
df = pd.DataFrame(data, columns = ['City','1994', '1995',  '1996' , '1997' ])

df.head()

       City  1994  1995  1996  1997
0      Rome     X  None  None  None
1  New York  None  None  None  None
2     Paris  None     X  None  None

df2 = pd.DataFrame([['Rome', 'X', ],['Sydney','X'], ['London','X']], columns = ['City',         '1998'])

     City 1998
0    Rome    X
1  Sydney    X
2  London    X

Then you can merge the two DataFrame

df.merge(df2, how='outer')

       City  1994  1995  1996  1997 1998
0      Rome     X  None  None  None    X
1  New York  None  None  None  None  NaN
2     Paris  None     X  None  None  NaN
3    Sydney   NaN   NaN   NaN   NaN    X
4    London   NaN   NaN   NaN   NaN    X

You need to specify outer as type of merge to be performed.

outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.

Sign up to request clarification or add additional context in comments.

3 Comments

@MR I have updated the answer, can you check? If that is giving you desired output, you can mark is as accepted answer.
That does the trick, thank you very much!
This is a good answer. Just to add a little bit more context to it. merge is a versatile option to work with pandas, so if both dataframes have the same index/structure you could merge both dataframes based on the same reference index/column (like a left join). This is also an important remark. (external reference ) @lrfanuddin

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.