1

I have two dataframes in the same time resolution. From the first dataframe (in my case: df_data1) I only want to have all values ['A'] where ['B'] is < 90. And now I'd like filter my secound dataframe so that I have only the values with the same timestamp(timeindex) from my first dataframe

df_data1 = pd.io.parsers.read_csv(station_path, skiprows=0, index_col=0, na_values=[-999], names= names_header , sep=';', header=None , squeeze=True)

date     A  B
16.08.2013 03:00     -1  97
16.08.2013 03:15     -1  95
16.08.2013 03:30     0   92
16.08.2013 03:45     4  90
16.08.2013 04:00     18 88
16.08.2013 04:15     42 86
16.08.2013 04:30 73 83
16.08.2013 04:45     110    81
16.08.2013 05:00    151 78

Now I'd like to have all df_data['A'] where df_data['B'] is <90. So I do:

df_data = df_data[(df_data['B']  < 90)]

the second dataframe looks like:

df_data2 = pd.io.parsers.read_csv(station_path, skiprows=1, sep=";",  index_col=False, header=None)

date    w   x   y   z
16.08.2013 03:00    0   0   0   0
16.08.2013 03:15    0   0   0   0
16.08.2013 03:30    0   0   0   0
16.08.2013 03:45    0   0   0   0
16.08.2013 04:00    0   0   0   0
16.08.2013 04:15    0   0   0   0
16.08.2013 04:30    47  47  48  0
16.08.2013 04:45    77  78  79  88
16.08.2013 05:00    111 112 113 125

Have anyone an idea to solve this? I need the dataframes in the same shape cause furthermore I'd like to calculate the np.corrcoef and so on.

2 Answers 2

2

Well your first part is pretty much done:

df_data = df_data[(df_data['B']  < 90)]

you can then access column A using df_data['A']

if your index values are the same in both df then this should work:

In [40]:

df1.loc[df_data.index]
Out[40]:
                       w    x    y   z
date                                  
2013-08-16 04:00:00    0    0    0   0
2013-08-16 04:15:00    0    0    0   0
2013-08-16 04:30:00   47   47   48   0
2013-08-16 04:45:00   77   78   79  88
2013-08-16 05:00:00  111  112  125 NaN

EDIT

Unclear why you get a KeyError but you can use the following also:

df_data2[df_data2.index.isin(df_data1.index)]

This will handle any index values that are not present in your second df.

Sign up to request clarification or add additional context in comments.

4 Comments

Thx for answering! I got this error: File "C:\WinPython-64bit-2.7.9.3\python-2.7.9.amd64\lib\site-packages\pandas\core\indexing.py", line 1283, in _has_valid_type (key, self.obj._get_axis_name(axis)))
Are you using my answer verbatim or doing this: df_data2.loc[df_data.index]?
df_data1 is from Type TimeSeries and df_data2 is from Type DataFrame is this a problem?
It should still work, can you post raw input data and code to reproduce your problem, an alternative is df_data2[df_data2.index.isin(df_data1.index)]
1

to complete this:

  • with the first approach I got an error

but with the following expression it works well:

df_data2[df_data2.index.isin(df_data1.index)]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.