
I have a DataFrame that includes some "invalid" rows, which I would like to remove. I have a second DataFrame whose index contains the dates of these invalid rows.

The index of the invalid rows is:

DatetimeIndex(['2019-11-11', '2019-12-06', '2019-12-13', '2019-12-15',
           '2019-12-17', '2019-12-18', '2019-12-19', '2019-12-31',
           '2020-01-01', '2020-01-02', '2020-01-03', '2020-01-10',
           '2020-01-15', '2020-01-17', '2020-01-22', '2020-02-05',
           '2020-02-07', '2020-02-09', '2020-02-10', '2020-02-12',
           '2020-02-14', '2020-02-19', '2020-02-20', '2020-02-21',
           '2020-02-25', '2020-02-26', '2020-02-28', '2020-03-02',
           '2020-03-04', '2020-03-06', '2020-03-11', '2020-03-12',
           '2020-03-15', '2020-03-22', '2020-03-29', '2020-04-04',
           '2020-04-11', '2020-04-13', '2020-05-13', '2020-05-23',
           '2020-05-29', '2020-05-30', '2020-06-12', '2020-06-15',
           '2020-06-19', '2020-06-24', '2020-06-26', '2020-07-09',
           '2020-07-10', '2020-07-11', '2020-07-12', '2020-07-16',
           '2020-07-17', '2020-07-18', '2020-07-20', '2020-07-23',
           '2020-07-24', '2020-07-26'],
          dtype='datetime64[ns]', name='dateTime', freq=None)

I want to remove these rows (dates) from the DataFrame with this hourly index:

DatetimeIndex(['2019-11-11 11:00:00', '2019-11-11 12:00:00',
           '2019-11-11 13:00:00', '2019-11-11 14:00:00',
           '2019-11-11 15:00:00', '2019-11-11 16:00:00',
           '2019-11-11 17:00:00', '2019-11-11 18:00:00',
           '2019-11-11 19:00:00', '2019-11-11 20:00:00',
           ...
           '2020-07-26 05:00:00', '2020-07-26 06:00:00',
           '2020-07-26 07:00:00', '2020-07-26 08:00:00',
           '2020-07-26 09:00:00', '2020-07-26 10:00:00',
           '2020-07-26 11:00:00', '2020-07-26 12:00:00',
           '2020-07-26 13:00:00', '2020-07-26 14:00:00'],
          dtype='datetime64[ns]', name='dateTime', length=6196, freq='H')

I tried:

df_steps1h.loc[df_steps1h.index.difference(df_valid.index), ]

and

df_steps1h[~df_steps1h.index.isin(df_valid.index)].dropna()

The DataFrames are different, so I don't want to use concat or merge, but neither attempt removes anything. Any ideas why? Thanks!
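A minimal reproduction of the symptom, using small made-up frames (the names hourly and invalid_dates and the values are hypothetical, not the poster's data). The invalid dates are stored as midnight timestamps, so an exact isin on the hourly index can only match the 00:00 rows of those dates.

```python
import pandas as pd

# Hypothetical hourly data covering two full days (48 rows)
hourly = pd.DataFrame(
    {"steps": range(48)},
    index=pd.date_range("2019-11-11", periods=48, freq="h", name="dateTime"),
)
# Hypothetical "invalid" dates: a daily index, i.e. midnight timestamps
invalid_dates = pd.DatetimeIndex(["2019-11-11"], name="dateTime")

# Exact timestamp comparison: only 2019-11-11 00:00:00 equals an invalid entry
exact = hourly[~hourly.index.isin(invalid_dates)]
print(len(hourly) - len(exact))  # prints 1: only the midnight row is removed
```

The other 23 rows of 2019-11-11 survive, which is why the filter appears to do nothing on real hourly data.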

1 Answer


Assume df is the DataFrame holding the invalid dates and df_valid is the original hourly DataFrame from which you want to remove them.

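import pandas as pd
from datetime import datetime

# Helper column: each hourly timestamp truncated to its calendar date
df_valid["actual_index"] = df_valid.index
df_valid["actual_index"] = df_valid["actual_index"].apply(lambda x: datetime.strftime(x, '%Y-%m-%d'))
df_valid["actual_index"] = pd.to_datetime(df_valid["actual_index"])
# Drop rows whose date is invalid, then the helper column; reassigning (no inplace=True) avoids SettingWithCopyWarning
df_valid = df_valid[~df_valid["actual_index"].isin(df.index)].drop("actual_index", axis=1)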

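Although both DataFrames have a DatetimeIndex, their resolutions differ: the invalid dates are daily timestamps at midnight (e.g. 2019-11-11 00:00:00), while the original index is hourly (e.g. 2019-11-11 11:00:00). An exact isin or difference on the two indexes can therefore only match the midnight rows.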

The solution therefore converts the hourly timestamps to dates, so both sides are compared at the same daily resolution, and then filters.
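A shorter way to do the same date-level comparison, offered here as an alternative rather than the answer's own method, is DatetimeIndex.normalize(), which sets every timestamp's time to midnight. The sketch below reuses the answer's names (df for the invalid dates, df_valid for the hourly data) but fills them with made-up data; the normalized index is used only for the comparison, so the remaining rows keep their hourly timestamps.

```python
import pandas as pd

# Hypothetical hourly data: three full days, 72 rows
df_valid = pd.DataFrame(
    {"steps": range(72)},
    index=pd.date_range("2019-11-11", periods=72, freq="h", name="dateTime"),
)
# Hypothetical invalid dates (midnight timestamps, as in the question)
df = pd.DataFrame(index=pd.DatetimeIndex(["2019-11-11", "2019-11-13"], name="dateTime"))

# normalize() maps e.g. 2019-11-11 14:00 to 2019-11-11 00:00 for the comparison only
mask = df_valid.index.normalize().isin(df.index.normalize())
cleaned = df_valid[~mask]

print(len(cleaned))  # 24: only the hours of 2019-11-12 remain
print(cleaned.index.min(), cleaned.index.max())
```

Because the mask is built from a temporary normalized index, no helper column is added to the DataFrame, and there is nothing to drop afterwards.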


6 Comments

Hello! Thanks for answering. I tried running your code, but it doesn't remove any rows :/
Please try after the edit. I've changed this line df_valid=df_valid[~df_valid.actual_index.isin(df.index)]
Now I get: A value is trying to be set on a copy of a slice from a DataFrame. See the caveats in the documentation: pandas.pydata.org/pandas-docs/stable/user_guide/… errors=errors,
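The message quoted above is pandas' SettingWithCopyWarning. It is typically raised when an in-place change (such as drop(..., inplace=True)) is made on a DataFrame that was itself produced by boolean filtering, which is likely what happens on the answer's original last line. A minimal sketch of two common ways to avoid it, using a hypothetical data frame with a hypothetical helper column: take an explicit .copy() of the filtered result, or reassign the return value of drop instead of using inplace=True.

```python
import pandas as pd

# Hypothetical frame with a helper column to be removed after filtering
data = pd.DataFrame({"steps": [1, 2, 3, 4], "helper": [0, 1, 0, 1]})

# Option 1: make the filtered frame an explicit, independent copy,
# so the later in-place drop unambiguously modifies that copy
filtered = data[data["helper"] == 0].copy()
filtered.drop("helper", axis=1, inplace=True)

# Option 2: skip inplace and reassign the new frame returned by drop
filtered2 = data[data["helper"] == 0]
filtered2 = filtered2.drop("helper", axis=1)

print(filtered.columns.tolist())   # ['steps']
print(filtered2.columns.tolist())  # ['steps']
```

Either way the original data frame is left unchanged; only the filtered result loses the helper column.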
But I also still need the DataFrame to contain data for every hour.
Hey! I changed it to df_valid=df_valid[~df_valid.index.isin(df.index)] and it works.
