1

I have two data frames that are both multi-indexed on 'Date' and 'name', and want to do a SQL style JOIN to combine them. I've tried

pd.merge(df1.reset_index(), df2.reset_index(), on=['name', 'Date'], how='inner')

which then results in an empty DataFrame.

If I inspect the data frames I can see that the index of one is represented as '2015-01-01' and the other is represented as '2015-01-01 00:00:00' which explains my issues with joining.

Is there a way to 'recast' the index to a specific format within pandas?

I've included the tables to see what data I'm working with.

df1=
+-------------+------+------+------+
|    Date     | name | col1 | col2 |
+-------------+------+------+------+
| 2015-01-01  | mary |   12 |  123 |
| 2015-01-02  | mary |   23 |   33 |
| 2015-01-03  | mary |   34 |   45 |
| 2015-01-01  | john |   65 |   76 |
| 2015-01-02  | john |   67 |   78 |
| 2015-01-03  | john |   25 |   86 |
+-------------+------+------+------+

df2= 
+------------+------+-------+-------+
|    Date    | name | col3  | col4  |
+------------+------+-------+-------+
| 2015-01-01 | mary | 80809 | 09885 |
| 2015-01-02 | mary | 53879 | 58972 |
| 2015-01-03 | mary | 23887 |  3908 |
| 2015-01-01 | john |  9238 |  2348 |
| 2015-01-02 | john |   234 |   234 |
| 2015-01-03 | john |  5325 |  6436 |
+------------+------+-------+-------+

DESIRED Result:

+-------------+------+------+-------+-------+-------+
|    Date     | name | col1 | col2  | col3  | col4  |
+-------------+------+------+-------+-------+-------+
| 2015-01-01  | mary |   12 |   123 | 80809 | 09885 |
| 2015-01-02  | mary |   23 |    33 | 53879 | 58972 |
| 2015-01-03  | mary |   34 |    45 | 23887 |  3908 |
| 2015-01-01  | john |   65 |    76 |  9238 |  2348 |
| 2015-01-02  | john |   67 |    78 |   234 |   234 |
| 2015-01-03  | john |   25 |    86 |  5325 |  6436 |
+-------------+------+------+-------+-------+-------+
3
  • Sorry are your 'Date' dtypes datetimes? Commented Jun 25, 2015 at 13:29
  • It's multi - indexed, but when I print a concatenated table it shows some dates as '2015-06-09 00:00:00' and u'2013-04-17' so I think some are strings and others are datetimes. Can I convert them? I tried df.index = df.index.date but I'm having difficulty access the single component of the multi index. Commented Jun 25, 2015 at 13:47
  • After resetting the index, convert the dtype using df['date'] = pd.to_datetime(df['date']) you should then be able to merge Commented Jun 25, 2015 at 13:49

1 Answer 1

2

The reason you cannot join is because you have different dtypes on the indicies. Pandas silently fails if the indicies have different dtypes.

You can easily change your indicies from string representations of time to proper pandas datetimes like this:

df = pd.DataFrame({"data":range(1,30)}, index=['2015-04-{}'.format(d) for d in range(1,30)])

df.index.dtype
dtype('O')

df.index = df.index.to_series().apply(pd.to_datetime)

df.index.dtype
dtype('<M8[ns]')

Now you can merge the dataframes on their index:

pd.merge(left=df, left_index=True,
         right=df2, right_index=True)

Assuming you have a df2, which my example is omitting...

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.