I have two dataframes, one with daily info starting in 1990 and one with daily info starting in 2000. Both dataframes contain information ending in 2016.

df1:

   Date       A     B     C 
1990-01-01   3.0  40.0  70.0  
1990-01-02  20.0  50.0  80.0  
1990-01-03  30.0  60.0  90.0  
1990-01-04   2.0   1.0   1.0 
1990-01-05   1.0   8.0   3.0  

df2:

   Date       A     B     C 
2000-01-01   NaN   NaN   NaN  
2000-01-02   5.0   NaN   NaN  
2000-01-03   1.0   NaN   5.0  
2000-01-04   2.0   4.0   8.0 
2000-01-05   1.0   3.0   4.0 

I need to compare the columns in df1 and df2 that share a name. That wouldn't usually be too complicated, but each column must be compared only from the point at which data is available in both dataframes (e.g. from 2000-01-02 for column 'A' and from 2000-01-04 for column 'B' in df2). I need to return True if the columns are the same from that point on and False if they differ. I have started by merging, which gives me:

df2.merge(df1, how = 'left', on = 'Date')


   Date      A_x   B_x   C_x   A_y   B_y   C_y   
2000-01-01   NaN   NaN   NaN   3.0   4.0   5.0
2000-01-02   5.0   NaN   NaN   5.0   9.0   2.0
2000-01-03   1.0   NaN   5.0   1.0   6.0   5.0
2000-01-04   2.0   4.0   8.0   2.0   4.0   1.0
2000-01-05   1.0   3.0   4.0   1.0   3.0   3.0

I have figured out how to find the common date, but am stuck as to how to do the same/different comparison. Can anyone help me compare the columns from the point at which there is a common value? A dictionary comes to mind as a useful output format, but wouldn't be essential:

comparison_dict = {
    "A" : True,
    "B" : True,
    "C" : False
}

Many thanks.

  • I don't get that output at all when I merge on Date. Also, it seems that none of the column values in df1 and df2 are equal anywhere. It looks like you copied the wrong df1. Commented Aug 31, 2018 at 16:35
  • Apologies, yes, the wrong data was copied. Commented Sep 4, 2018 at 8:30

3 Answers

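Assuming the Date column is the index:

  1. Stack both frames into long Series; stacking will drop NaN by default
  2. Align the two Series with 'inner' logic, so only the (date, column) pairs present in both remain
  3. Check equality element-wise
  4. Group by column name and check that all values are True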

pd.Series.eq(*df1.stack().align(df2.stack(), 'inner')).groupby(level=1).all()

If Date is not the index:

pd.Series.eq(
    *df1.set_index('Date').stack().align(
        df2.set_index('Date').stack(), 'inner'
    )
).groupby(level=1).all()
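
The four steps above can be sketched one at a time. This is only an illustration: it uses the sample data from the Setup section of the fillna answer, and the intermediate names (`s1`, `s2`, `left`, `right`, `same`) are made up for readability. It also calls `dropna()` explicitly after `stack()`, on the assumption that whether `stack()` drops NaN by itself can depend on the pandas version.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(
    ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05"]
)
df1 = pd.DataFrame(
    {"A": [3.0, 5.0, 1.0, 2.0, 1.0],
     "B": [4.0, 9.0, 6.0, 4.0, 3.0],
     "C": [5.0, 2.0, 5.0, 1.0, 3.0]},
    index=dates,
)
df2 = pd.DataFrame(
    {"A": [np.nan, 5.0, 1.0, 2.0, 1.0],
     "B": [np.nan, np.nan, np.nan, 4.0, 3.0],
     "C": [np.nan, np.nan, 5.0, 8.0, 4.0]},
    index=dates,
)

# 1. Long format: one value per (date, column) pair.
#    dropna() is explicit so the result does not rely on stack()'s default.
s1 = df1.stack().dropna()
s2 = df2.stack().dropna()

# 2. Keep only the (date, column) pairs that have a value in both Series.
left, right = s1.align(s2, join="inner")

# 3. Element-wise equality on the aligned pairs.
same = left.eq(right)

# 4. Level 1 of the MultiIndex is the column name: require all True per column.
result = same.groupby(level=1).all()

# Optional: the dictionary format asked for in the question.
comparison_dict = result.to_dict()
print(comparison_dict)
```

Because the inner alignment discards every pair that is missing on either side, each column is effectively compared only where both frames have data.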

Check with eq and isnull (data from user3483203's answer):

((df1.eq(df2))|df2.isnull()|df1.isnull()).all(0)
Out[22]: 
A     True
B     True
C    False
dtype: bool
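
To see why this works, it may help to build the mask in separate pieces. The sketch below assumes both frames share the same index and columns, reuses the sample data from the fillna answer, and introduces the names `equal`, `missing` and `mask` purely for illustration.

```python
import numpy as np
import pandas as pd

idx = ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05"]
df1 = pd.DataFrame(
    {"A": [3.0, 5.0, 1.0, 2.0, 1.0],
     "B": [4.0, 9.0, 6.0, 4.0, 3.0],
     "C": [5.0, 2.0, 5.0, 1.0, 3.0]},
    index=idx,
)
df2 = pd.DataFrame(
    {"A": [np.nan, 5.0, 1.0, 2.0, 1.0],
     "B": [np.nan, np.nan, np.nan, 4.0, 3.0],
     "C": [np.nan, np.nan, 5.0, 8.0, 4.0]},
    index=idx,
)

# True where both cells hold the same value (NaN never equals anything).
equal = df1.eq(df2)

# True where at least one side has no data; these cells are ignored.
missing = df1.isna() | df2.isna()

# A cell passes if it matches or if it cannot be compared.
mask = equal | missing

# A column passes only if every one of its cells passes.
result = mask.all(axis=0)
comparison_dict = result.to_dict()
print(comparison_dict)
```

One consequence of this design: a column that is entirely NaN in either frame comes out as True, since there is nothing to contradict the match.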

2 Comments

  • Very nice, it looks like the check for isnull is faster than using fillna.
  • What if df1 has the nulls? You'd want (df1.eq(df2) | df2.isna() | df1.isna()).all(0)

Using fillna with eq

df2.fillna(df1).eq(df1).all(0)

A        True
B        True
C       False
dtype: bool

This works by filling the NaN values in df2 with the corresponding values from df1, so the two frames always compare equal wherever df2 is null (essentially the same as ignoring those cells). Next, we create a boolean mask comparing the two DataFrames:

df2.fillna(df1).eq(df1)

               A     B      C
2000-01-01  True  True   True
2000-01-02  True  True   True
2000-01-03  True  True   True
2000-01-04  True  True  False
2000-01-05  True  True  False

Finally, all(0) checks that every value in each column is True, which is the condition for the columns to be considered equal.
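
In the question, df1 starts years before df2, so the two frames do not share the same dates. A possible way to handle that, sketched here under the assumption that Date is the index of both frames, is to restrict both to their common dates before applying the fillna approach. The extra 1999-12-31 row and the names `common`, `a` and `b` are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd

# df1 has one extra, earlier row (hypothetical values) that df2 lacks.
df1 = pd.DataFrame(
    {"A": [7.0, 3.0, 5.0, 1.0, 2.0, 1.0],
     "B": [7.0, 4.0, 9.0, 6.0, 4.0, 3.0],
     "C": [7.0, 5.0, 2.0, 5.0, 1.0, 3.0]},
    index=pd.to_datetime(
        ["1999-12-31", "2000-01-01", "2000-01-02",
         "2000-01-03", "2000-01-04", "2000-01-05"]
    ),
)
df2 = pd.DataFrame(
    {"A": [np.nan, 5.0, 1.0, 2.0, 1.0],
     "B": [np.nan, np.nan, np.nan, 4.0, 3.0],
     "C": [np.nan, np.nan, 5.0, 8.0, 4.0]},
    index=pd.to_datetime(
        ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05"]
    ),
)

# Dates present in both frames, so rows found in only one frame
# do not take part in the comparison.
common = df1.index.intersection(df2.index)
a, b = df1.loc[common], df2.loc[common]

# Same idea as above: NaN in b is replaced by a's value, so it always matches.
result = b.fillna(a).eq(a).all(axis=0)
comparison_dict = result.to_dict()
print(comparison_dict)
```

Note that this only ignores missing values in the second frame; a NaN in `a` would still compare as unequal.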


Setup

It looks like you copied the wrong DataFrame for df1 based on your desired output and merge, so I derived it from your merge:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': {'2000-01-01': 3.0, '2000-01-02': 5.0, '2000-01-03': 1.0, '2000-01-04': 2.0, '2000-01-05': 1.0}, 'B': {'2000-01-01': 4.0, '2000-01-02': 9.0, '2000-01-03': 6.0, '2000-01-04': 4.0, '2000-01-05': 3.0}, 'C': {'2000-01-01': 5.0, '2000-01-02': 2.0, '2000-01-03': 5.0, '2000-01-04': 1.0, '2000-01-05': 3.0}})

df2 = pd.DataFrame({'A': {'2000-01-01': np.nan, '2000-01-02': 5.0, '2000-01-03': 1.0, '2000-01-04': 2.0, '2000-01-05': 1.0}, 'B': {'2000-01-01': np.nan, '2000-01-02': np.nan, '2000-01-03': np.nan, '2000-01-04': 4.0, '2000-01-05': 3.0}, 'C': {'2000-01-01': np.nan, '2000-01-02': np.nan, '2000-01-03': 5.0, '2000-01-04': 8.0, '2000-01-05': 4.0}})
