I have a pandas DataFrame built by concatenating 4 DataFrames, like this:

In [121]: all
Out[121]:
        E    H    N    S
102P    Y  NaN  NaN  NaN
103R    Y  NaN  NaN  NaN
102P  NaN  NaN    Y  NaN
103R  NaN  NaN    Y  NaN
109F  NaN  NaN    Y  NaN
103R  NaN    Y  NaN  NaN
109F  NaN    Y  NaN  NaN
102P  NaN  NaN  NaN    Y
103R  NaN  NaN  NaN    Y
109F  NaN  NaN  NaN    Y

I want to consolidate this into a dataframe like this:

        E    H    N    S
102P    Y  NaN    Y    Y
103R    Y    Y    Y    Y
109F  NaN    Y    Y    Y

How can I merge them based on all.index?

1 Answer

Do a groupby on the index (I presume from the data you posted that the values 102P... are in the index) and count the values in each column. Because count() skips NaN, this returns a DataFrame of zeros and ones. Then just replace them with the appropriate values.

>>> import numpy as np
>>> ndf = df.groupby(level=0).count().astype(object)  # object dtype so 'Y' and NaN can be stored
>>> ndf[ndf == 1] = 'Y'
>>> ndf[ndf == 0] = np.nan
>>> ndf
         E    H  N  S
label                
102P     Y  NaN  Y  Y
103R     Y    Y  Y  Y
109F   NaN    Y  Y  Y

If a label can have more than one non-null value in the same column (repetitions), change the condition from ndf[ndf == 1] to ndf[ndf > 0].
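Here is a self-contained sketch of the groupby/count approach above, assuming the concatenated frame has the shape shown in the question (string labels in the index, 'Y' or NaN in columns E, H, N, S). The names `frame`, `all_df`, `consolidate` and `consolidate_first` are hypothetical, and `all_df` is used instead of `all` to avoid shadowing the built-in. It uses `counts > 0`, so repetitions are handled too. `consolidate_first` shows a different technique that the answer does not use: `GroupBy.first()` skips NaN by default, so it keeps the first non-null value of each column per label.

```python
import numpy as np
import pandas as pd

COLS = ["E", "H", "N", "S"]


def frame(labels, column):
    """Sample frame with 'Y' in one column and NaN elsewhere (illustrative data only)."""
    df = pd.DataFrame(np.nan, index=labels, columns=COLS, dtype=object)
    df[column] = "Y"
    return df


# Rebuild the question's concatenated frame: a label appears once per source frame.
all_df = pd.concat([
    frame(["102P", "103R"], "E"),
    frame(["102P", "103R", "109F"], "N"),
    frame(["103R", "109F"], "H"),
    frame(["102P", "103R", "109F"], "S"),
])


def consolidate(df):
    """Collapse rows sharing an index label into one row of 'Y' / NaN."""
    # count() skips NaN, so each cell is the number of non-null values for that label.
    counts = df.groupby(level=0).count()
    # Start from an all-'Y' frame and set every cell whose count is zero to NaN.
    filled = pd.DataFrame("Y", index=counts.index, columns=counts.columns)
    return filled.where(counts > 0)


def consolidate_first(df):
    """Alternative: GroupBy.first() keeps the first non-null value per column."""
    return df.groupby(level=0).first()


print(consolidate(all_df))
```

Building the result with `where` avoids writing strings into an integer frame, so no dtype conversion is needed.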

But why are you concatenating the DataFrames instead of combining them? For example:

>>> df1
      E   H   N   S
0                  
102P  Y NaN NaN NaN
103R  Y NaN NaN NaN
>>> df2
       E   H  N   S
0                  
102P NaN NaN  Y NaN
103R NaN NaN  Y NaN
109F NaN NaN  Y NaN

...

>>> from functools import reduce
>>> reduce(lambda first, second: first.combine_first(second),
           [df1, df2, df3, df4])
        E    H  N  S
0                   
102P    Y  NaN  Y  Y
103R    Y    Y  Y  Y
109F  NaN    Y  Y  Y
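A runnable version of the combine_first fold, as a sketch: df1 and df2 match the frames shown above, while df3 and df4 are elided in the answer and are assumed here to hold the H and S rows from the question's concatenated frame. The helper `frame` is hypothetical.

```python
from functools import reduce

import numpy as np
import pandas as pd

COLS = ["E", "H", "N", "S"]


def frame(labels, column):
    """Sample frame with 'Y' in one column and NaN elsewhere (illustrative data only)."""
    df = pd.DataFrame(np.nan, index=labels, columns=COLS, dtype=object)
    df[column] = "Y"
    return df


df1 = frame(["102P", "103R"], "E")
df2 = frame(["102P", "103R", "109F"], "N")
df3 = frame(["103R", "109F"], "H")          # assumed contents (elided in the answer)
df4 = frame(["102P", "103R", "109F"], "S")  # assumed contents (elided in the answer)

# combine_first keeps the left frame's values and fills its NaN cells (and missing
# rows) from the right frame; folding it over the list merges all four frames.
combined = reduce(lambda first, second: first.combine_first(second),
                  [df1, df2, df3, df4])
print(combined)
```

No empty starting frame is needed: `reduce` uses the first list element as the initial value.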

3 Comments

pd.merge takes only two DataFrames at a time, so I ended up with two merged DataFrames, which I then merged to get the final one. Is this the right way, or am I missing something?
@UNagaswamy I updated the answer and added an example. Sorry for misleading you; I was not clear enough when I said merge :( I meant merging in the general sense, not the merge function...
You could use nested merge.
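The nested-merge idea can be written as a fold with functools.reduce. This is a sketch assuming each source frame fills a different column, as in the question: the all-NaN columns are dropped first, so the frames being merged share no column names and pandas adds no _x/_y suffixes. The helper `frame` and the list `frames` are hypothetical, and the sample data mirrors the question's concatenated frame.

```python
from functools import reduce

import numpy as np
import pandas as pd

COLS = ["E", "H", "N", "S"]


def frame(labels, column):
    """Sample frame with 'Y' in one column and NaN elsewhere (illustrative data only)."""
    df = pd.DataFrame(np.nan, index=labels, columns=COLS, dtype=object)
    df[column] = "Y"
    return df


frames = [
    frame(["102P", "103R"], "E"),
    frame(["102P", "103R", "109F"], "N"),
    frame(["103R", "109F"], "H"),
    frame(["102P", "103R", "109F"], "S"),
]

# Keep only the column(s) each frame actually fills, so merged frames share no names.
trimmed = [df.dropna(axis=1, how="all") for df in frames]

# Outer-merge on the index two frames at a time, then restore the original column order.
merged = reduce(
    lambda left, right: pd.merge(left, right, left_index=True,
                                 right_index=True, how="outer"),
    trimmed,
)[COLS]
print(merged)
```

If the frames did overlap in a column, combine_first is the better fit, because merge would keep both copies under suffixed names instead of filling one from the other.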
