
I'm trying to create a new pandas.DataFrame from another pandas.DataFrame, keeping one row per unique value of a multi-column index. I can get the unique index values as a pandas.core.index.MultiIndex with df.index.drop_duplicates(), and the result is correct, but I can't figure out how to turn it into a pandas.DataFrame.

The following script creates the original DataFrame from a SQL query:

import sqlite3 as db
import pandas as pd

conn = db.connect('C:/data.db')
query = """SELECT TimeStamp, UnderlyingSymbol, Expiry, Strike, CP, BisectIV, OTMperc FROM ActiveOptions
           WHERE TimeStamp = '2015-11-09 16:00:00' AND UnderlyingSymbol = 'INTC' AND
           Expiry < '2015-11-27 16:00:00' AND OTMperc < .02  AND OTMperc > -.02
           ORDER BY UnderlyingSymbol, Expiry, ABS(OTMperc)"""

df = pd.read_sql_query(sql=query, con=conn, index_col=['TimeStamp', 'UnderlyingSymbol', 'Expiry'],
                       parse_dates=['TimeStamp', 'Expiry'])

The script creates the following DataFrame:

In[6]: df
Out[6]: 
                                                          Strike  CP  BisectIV  OTMperc
TimeStamp           UnderlyingSymbol Expiry                                            
2015-11-09 16:00:00 INTC             2015-11-13 16:00:00    33.5  -1    0.2302  -0.0045
                                     2015-11-13 16:00:00    33.5   1    0.2257   0.0045
                                     2015-11-13 16:00:00    33.0  -1    0.2442   0.0105
                                     2015-11-13 16:00:00    33.0   1    0.2426  -0.0106
                                     2015-11-13 16:00:00    34.0   1    0.2240   0.0191
                                     2015-11-13 16:00:00    34.0  -1    0.2295  -0.0195

                                     2015-11-20 16:00:00    33.5   1    0.2817   0.0045
                                     2015-11-20 16:00:00    33.5  -1    0.2840  -0.0045
                                     2015-11-20 16:00:00    33.0  -1    0.2935   0.0105
                                     2015-11-20 16:00:00    33.0   1    0.2914  -0.0106
                                     2015-11-20 16:00:00    34.0   1    0.2718   0.0191
                                     2015-11-20 16:00:00    34.0  -1    0.2784  -0.0195
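For anyone who wants to reproduce this without the SQLite database, here is a minimal sketch that rebuilds the same frame in memory. It only copies the values from the output above and skips read_sql_query entirely, so it is a stand-in for the loading script, not a replacement for it.

```python
import pandas as pd

# Index levels copied from the output above: one TimeStamp, one symbol, two expiries
timestamps = pd.to_datetime(["2015-11-09 16:00:00"] * 12)
symbols = ["INTC"] * 12
expiries = pd.to_datetime(["2015-11-13 16:00:00"] * 6 + ["2015-11-20 16:00:00"] * 6)

index = pd.MultiIndex.from_arrays(
    [timestamps, symbols, expiries],
    names=["TimeStamp", "UnderlyingSymbol", "Expiry"],
)

# Column values copied row by row from the output above
df = pd.DataFrame(
    {
        "Strike": [33.5, 33.5, 33.0, 33.0, 34.0, 34.0] * 2,
        "CP": [-1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1],
        "BisectIV": [0.2302, 0.2257, 0.2442, 0.2426, 0.2240, 0.2295,
                     0.2817, 0.2840, 0.2935, 0.2914, 0.2718, 0.2784],
        "OTMperc": [-0.0045, 0.0045, 0.0105, -0.0106, 0.0191, -0.0195,
                    0.0045, -0.0045, 0.0105, -0.0106, 0.0191, -0.0195],
    },
    index=index,
)

print(df)
```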

Trying to get a new DataFrame with a unique multi-column index produces the following output:

In[10]: new_df = df.index.drop_duplicates()
In[11]: new_df
Out[11]: 
MultiIndex(levels=[[2015-11-09 16:00:00], [u'INTC'], [2015-11-13 16:00:00, 2015-11-20 16:00:00]],
           labels=[[0, 0], [0, 0], [0, 1]],
           names=[u'TimeStamp', u'UnderlyingSymbol', u'Expiry'])

In[12]: type(new_df)
Out[12]: pandas.core.index.MultiIndex

Any ideas?

1 Answer

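The problem is that you set new_df to the index itself with the duplicates removed. Calling drop_duplicates() on an index returns another index (here a MultiIndex), not a DataFrame: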

new_df = df.index.drop_duplicates()
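If all you need is the unique key combinations themselves as a DataFrame, MultiIndex.to_frame() turns each index level into a column. A minimal sketch, assuming a pandas version that provides MultiIndex.to_frame with the index parameter (check this on older installs); the two-entry MultiIndex from the question is rebuilt by hand here:

```python
import pandas as pd

# Rebuild the MultiIndex returned by df.index.drop_duplicates() in the question
unique_keys = pd.MultiIndex.from_arrays(
    [
        pd.to_datetime(["2015-11-09 16:00:00"] * 2),
        ["INTC", "INTC"],
        pd.to_datetime(["2015-11-13 16:00:00", "2015-11-20 16:00:00"]),
    ],
    names=["TimeStamp", "UnderlyingSymbol", "Expiry"],
)

# Each level becomes a column; index=False gives the result a plain RangeIndex
keys_df = unique_keys.to_frame(index=False)
print(keys_df)
```

This only gives you the keys, though, not the Strike/CP/BisectIV/OTMperc values that belong to them.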

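What you want is to keep only the first row for each index value. Index.duplicated() returns a boolean array that is True for every repeat of an index value after its first occurrence, so negating it with ~ and using it to filter your old DataFrame keeps one row per unique index: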

new_df = df[~df.index.duplicated()]
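Applied to the question's data, a sketch using four of the rows from the output (two per Expiry, rebuilt in memory rather than loaded from the database) looks like this:

```python
import pandas as pd

# Four rows copied from the question's output: two per Expiry
index = pd.MultiIndex.from_arrays(
    [
        pd.to_datetime(["2015-11-09 16:00:00"] * 4),
        ["INTC"] * 4,
        pd.to_datetime(["2015-11-13 16:00:00"] * 2 + ["2015-11-20 16:00:00"] * 2),
    ],
    names=["TimeStamp", "UnderlyingSymbol", "Expiry"],
)
df = pd.DataFrame(
    {
        "Strike": [33.5, 33.5, 33.5, 33.5],
        "CP": [-1, 1, 1, -1],
        "BisectIV": [0.2302, 0.2257, 0.2817, 0.2840],
        "OTMperc": [-0.0045, 0.0045, 0.0045, -0.0045],
    },
    index=index,
)

# True for each repeat of an index value after its first occurrence;
# here the mask is [False, True, False, True]
mask = df.index.duplicated()

# ~mask keeps the first row for each (TimeStamp, UnderlyingSymbol, Expiry)
new_df = df[~mask]
print(new_df)
```

Because the first occurrence wins, the row order coming out of the query decides which row survives. With ORDER BY ABS(OTMperc) that is the option closest to the money, although ties (like the two 33.5 strikes at 0.0045) are not ordered by that clause.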

A small example, adapted from the MultiIndex example in the pandas documentation:

import numpy as np
import pandas as pd

# create a data sample with a MultiIndex
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'one', 'one', 'two', 'one', 'two', 'one', 'one']]
# ('bar', 'one') and ('qux', 'one') each appear twice
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)

The original data:

>>> s
first  second
bar    one      -0.932521
       one       1.969771
baz    one       1.574908
       two       0.125159
foo    one      -0.075174
       two       0.777039
qux    one      -0.992862
       one      -1.099260
dtype: float64

And with the duplicate index entries filtered out (the first occurrence of each is kept):

>>> s[~s.index.duplicated()]
first  second
bar    one      -0.932521
baz    one       1.574908
       two       0.125159
foo    one      -0.075174
       two       0.777039
qux    one      -0.992862
dtype: float64
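Index.duplicated() also takes a keep parameter ('first' by default, 'last', or False) that controls which occurrence counts as the original. keep=False marks every repeated label, so filtering with it drops all rows whose index appears more than once. A sketch reusing the example index with fixed integer values instead of np.random.randn, so the results are predictable:

```python
import pandas as pd

# Same index layout as the example above, with fixed values 0..7
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'one', 'one', 'two', 'one', 'two', 'one', 'one']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
s = pd.Series(range(8), index=index)

first = s[~s.index.duplicated(keep='first')]        # keep the first occurrence (default)
last = s[~s.index.duplicated(keep='last')]          # keep the last occurrence instead
unique_only = s[~s.index.duplicated(keep=False)]    # drop every repeated label entirely

print(first.tolist())        # [0, 2, 3, 4, 5, 6]
print(last.tolist())         # [1, 2, 3, 4, 5, 7]
print(unique_only.tolist())  # [2, 3, 4, 5]
```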