
I want my Python script to delete a row in a DataFrame if the term at the current index is a substring of the following term, or if the following term is a substring of the term at the current index.

In the following example, only the rows with the terms 'A 600 Strom' and 'Silent' should be left.

    term            timestamp
83  A 6             2018-09-27 18:26:46
85  A 60            2018-09-27 18:26:46
86  A 600           2018-09-27 18:26:46
89  A 600           2018-09-27 18:26:47
91  A 600 S         2018-09-27 18:26:47
93  A 600 Str       2018-09-27 18:26:48
95  A 600 Stro      2018-09-27 18:26:49
97  A 600 Str       2018-09-27 18:26:53
98  A 600 Strom     2018-09-27 18:26:5
99  S               2018-09-27 18:26:48
100 Sil             2018-09-27 18:26:49
101 Silen           2018-09-27 18:26:53
102 Silent          2018-09-27 18:26:5

Is there an elegant and efficient solution or do I have to process a series of if-statements in a loop?

  • Is the term always in the same format, e.g. "A 600 Storm" or "B 250 Rain", where "B 2" would be a subset of it? Commented Jun 22, 2020 at 15:57
  • It is not. It could also be something like "weather", and "weat" would be a subset. For a better understanding: the data comes from an application that gathers all search queries from the users, so the term could be in any format. Commented Jun 22, 2020 at 16:07
  • Is there a user key in the table? Commented Jun 22, 2020 at 16:20
  • Yes, but unfortunately it is inconsistent and therefore not really usable. Commented Jun 22, 2020 at 16:23

1 Answer


Use Series.shift to shift the term column up by one row and assign the result to a new column s_1, so that each row holds its own term next to the following term. Then use DataFrame.agg along axis=1 to build a boolean mask that is True when the current term is a substring of the next term (s_1) or the next term is a substring of the current term. Finally, use the inverted mask to filter the DataFrame:

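# s_1 holds the term of the following row (shift(-1)), cast to str
mask = (
    df.assign(s_1=df['term'].shift(-1).astype(str))
    # True when either term is a substring of the other
    .agg(lambda s: s['term'] in s['s_1'] or s['s_1'] in s['term'], axis=1)
)

# Keep only the rows that are not related to the row after them
df1 = df[~mask]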

Result:

# print(df1)
           term            timestamp
98  A 600 Strom  2018-09-27 18:26:53
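
One caveat, shown as a minimal sketch rather than a definitive fix: on the full sample from the question, including the 'Silent' rows, a plain substring test also relates row 98 to row 99, because 'S' occurs inside 'A 600 Strom'. If the terms only ever grow or shrink at the end, as in type-ahead search queries (an assumption about the data, not something the question confirms), a prefix test keeps both final terms. The helper names keep_final_terms, substring_either_way and prefix_either_way are hypothetical, and the sketch assumes every term is a string.

```python
import pandas as pd

# Sample terms copied from the question; timestamps are omitted because
# they play no role in the comparison.
df = pd.DataFrame(
    {
        "term": [
            "A 6", "A 60", "A 600", "A 600", "A 600 S", "A 600 Str",
            "A 600 Stro", "A 600 Str", "A 600 Strom",
            "S", "Sil", "Silen", "Silent",
        ]
    },
    index=[83, 85, 86, 89, 91, 93, 95, 97, 98, 99, 100, 101, 102],
)


def substring_either_way(cur, nxt):
    """Same test as the mask above: one term occurs anywhere in the other."""
    return cur in nxt or nxt in cur


def prefix_either_way(cur, nxt):
    """Stricter test: one term is a prefix of the other."""
    return cur.startswith(nxt) or nxt.startswith(cur)


def keep_final_terms(frame, related):
    """Keep rows whose term is not related to the term in the next row."""
    terms = frame["term"].tolist()
    # Compare each term with its successor.
    keep = [not related(cur, nxt) for cur, nxt in zip(terms, terms[1:])]
    if terms:
        # The last row has no successor, so it is always kept.
        keep.append(True)
    return frame.loc[keep]


by_substring = keep_final_terms(df, substring_either_way)
by_prefix = keep_final_terms(df, prefix_either_way)

print(by_substring.index.tolist())  # [102]
print(by_prefix.index.tolist())     # [98, 102]
```

The prefix test is narrower than what the question literally asks for, so it would miss a case where a user edits the beginning or middle of a query; whether that matters depends on the real data.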

1 Comment

Exactly what I needed. Thank you very much!
