pandas dataframe indexing filtering

Question

I have two dataframes with the same time resolution. From the first dataframe (in my case: df_data1) I only want the values of ['A'] where ['B'] is < 90. Then I'd like to filter my second dataframe so that it only keeps the rows whose timestamp (time index) also appears in the filtered first dataframe.

df_data1 = pd.io.parsers.read_csv(station_path, skiprows=0, index_col=0, na_values=[-999], names= names_header , sep=';', header=None , squeeze=True)

date                A    B
16.08.2013 03:00    -1   97
16.08.2013 03:15    -1   95
16.08.2013 03:30    0    92
16.08.2013 03:45    4    90
16.08.2013 04:00    18   88
16.08.2013 04:15    42   86
16.08.2013 04:30    73   83
16.08.2013 04:45    110  81
16.08.2013 05:00    151  78

Now I'd like to have all df_data['A'] where df_data['B'] is <90. So I do:

df_data = df_data[(df_data['B']  < 90)]

The second dataframe looks like this:

df_data2 = pd.io.parsers.read_csv(station_path, skiprows=1, sep=";",  index_col=False, header=None)

date    w   x   y   z
16.08.2013 03:00    0   0   0   0
16.08.2013 03:15    0   0   0   0
16.08.2013 03:30    0   0   0   0
16.08.2013 03:45    0   0   0   0
16.08.2013 04:00    0   0   0   0
16.08.2013 04:15    0   0   0   0
16.08.2013 04:30    47  47  48  0
16.08.2013 04:45    77  78  79  88
16.08.2013 05:00    111 112 113 125

Does anyone have an idea how to solve this? I need both dataframes to have the same shape because I'd also like to calculate np.corrcoef and so on.

EdChum · Accepted Answer · 2015-03-25 11:06:57Z

2

Well your first part is pretty much done:

df_data = df_data[(df_data['B']  < 90)]

You can then access column A using df_data['A'].
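As a minimal sketch, the filter and the column access can also be combined in a single .loc call. The three-row frame below is only a toy stand-in built from rows in the question, and a_values is an illustrative name, not something from the original code.

```python
import pandas as pd

# Toy stand-in for the first dataframe, using three rows from the question.
idx = pd.to_datetime(
    ["16.08.2013 03:45", "16.08.2013 04:00", "16.08.2013 04:15"],
    format="%d.%m.%Y %H:%M",
)
df_data = pd.DataFrame({"A": [4, 18, 42], "B": [90, 88, 86]}, index=idx)

# Row selection (boolean mask on B) and column selection (label A) in one step.
a_values = df_data.loc[df_data["B"] < 90, "A"]
print(a_values)
```

The row with B equal to 90 is dropped because the comparison is strict.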

If the index values are the same in both dataframes, then this should work:

In [40]:

df1.loc[df_data.index]
Out[40]:
                       w    x    y   z
date                                  
2013-08-16 04:00:00    0    0    0   0
2013-08-16 04:15:00    0    0    0   0
2013-08-16 04:30:00   47   47   48   0
2013-08-16 04:45:00   77   78   79  88
2013-08-16 05:00:00  111  112  125 NaN
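For anyone who wants to reproduce this, here is a small self-contained sketch of the same idea. It assumes both frames share an identical DatetimeIndex, which is built here with pd.date_range instead of being parsed from the CSV files. The names filtered and matching are only illustrative.

```python
import pandas as pd

# Shared 15-minute time index covering the sample rows from the question.
idx = pd.date_range("2013-08-16 03:00", periods=9, freq="15min")

df_data1 = pd.DataFrame(
    {"A": [-1, -1, 0, 4, 18, 42, 73, 110, 151],
     "B": [97, 95, 92, 90, 88, 86, 83, 81, 78]},
    index=idx,
)
df_data2 = pd.DataFrame(
    {"w": [0, 0, 0, 0, 0, 0, 47, 77, 111],
     "x": [0, 0, 0, 0, 0, 0, 47, 78, 112],
     "y": [0, 0, 0, 0, 0, 0, 48, 79, 113],
     "z": [0, 0, 0, 0, 0, 0, 0, 88, 125]},
    index=idx,
)

# Keep only the rows of the first frame where B is below 90 ...
filtered = df_data1[df_data1["B"] < 90]

# ... and pull the rows with the same timestamps out of the second frame.
matching = df_data2.loc[filtered.index]
print(matching)
```

Label-based lookup with .loc expects every requested timestamp to exist in the second frame, so this variant only fits when the indexes really do match.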

EDIT

It's unclear why you get a KeyError, but you can also use the following:

df_data2[df_data2.index.isin(df_data1.index)]

This builds a boolean mask instead of looking labels up, so it handles any index values that are not present in your second df without raising a KeyError.
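To illustrate the difference, here is a hedged sketch with a made-up gap: the second frame is assumed to lack the 05:00 timestamp. This is not necessarily what happened in the question. In current pandas versions .loc raises a KeyError for the missing label, while the isin mask simply skips it. Index.intersection is shown as one way to trim the first frame to the same timestamps. The names loc_raised, subset1, subset2 and common are illustrative.

```python
import pandas as pd

# First frame: 04:00 through 05:00.
idx1 = pd.date_range("2013-08-16 04:00", periods=5, freq="15min")
# Second frame: 03:00 through 04:45, so 05:00 is missing (hypothetical gap).
idx2 = pd.date_range("2013-08-16 03:00", periods=8, freq="15min")

df_data1 = pd.DataFrame({"A": [18, 42, 73, 110, 151]}, index=idx1)
df_data2 = pd.DataFrame({"w": [0, 0, 0, 0, 0, 0, 47, 77]}, index=idx2)

# Label lookup fails because one requested timestamp does not exist.
try:
    df_data2.loc[df_data1.index]
    loc_raised = False
except KeyError:
    loc_raised = True

# The boolean mask keeps only the rows whose timestamp is in df_data1.
subset2 = df_data2[df_data2.index.isin(df_data1.index)]

# Trim the first frame to the shared timestamps so both have the same shape.
common = df_data1.index.intersection(df_data2.index)
subset1 = df_data1.loc[common]

print(loc_raised, subset1.shape, subset2.shape)
```

Trimming both sides matters if the frames are compared element by element afterwards, since isin on its own only filters the second frame.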

edited Mar 25, 2015 at 11:06

answered Mar 25, 2015 at 8:58

EdChum


4 Comments

steff Over a year ago

Thx for answering! I got this error: File "C:\WinPython-64bit-2.7.9.3\python-2.7.9.amd64\lib\site-packages\pandas\core\indexing.py", line 1283, in _has_valid_type (key, self.obj._get_axis_name(axis)))

EdChum Over a year ago

Are you using my answer verbatim or doing this: df_data2.loc[df_data.index]?

steff Over a year ago

df_data1 is of type TimeSeries and df_data2 is of type DataFrame. Is this a problem?

EdChum Over a year ago

It should still work. Can you post raw input data and code to reproduce your problem? An alternative is df_data2[df_data2.index.isin(df_data1.index)]

steff · Accepted Answer · 2015-03-25 11:06:18Z

1

To complete this: with the first approach I got an error, but the following expression works well:

df_data2[df_data2.index.isin(df_data1.index)]
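Since the original goal was np.corrcoef, here is a rough sketch of how the two filtered frames can feed into it. It assumes the sample values from the question and a shared DatetimeIndex built with pd.date_range. Only column w of the second frame is used to keep it short, and r is an illustrative name.

```python
import numpy as np
import pandas as pd

# Shared 15-minute time index covering the sample rows from the question.
idx = pd.date_range("2013-08-16 03:00", periods=9, freq="15min")

df_data1 = pd.DataFrame(
    {"A": [-1, -1, 0, 4, 18, 42, 73, 110, 151],
     "B": [97, 95, 92, 90, 88, 86, 83, 81, 78]},
    index=idx,
)
df_data2 = pd.DataFrame({"w": [0, 0, 0, 0, 0, 0, 47, 77, 111]}, index=idx)

# Filter the first frame, then keep the matching timestamps in the second.
df_data1 = df_data1[df_data1["B"] < 90]
df_data2 = df_data2[df_data2.index.isin(df_data1.index)]

# Both columns now have the same length, so corrcoef can compare them.
# np.corrcoef returns a 2x2 matrix; [0, 1] is the coefficient between A and w.
r = np.corrcoef(df_data1["A"], df_data2["w"])[0, 1]
print(r)
```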

answered Mar 25, 2015 at 11:06

steff
