1

I have a pandas.DataFrame:

    index    question_id    tag
    0        1858           [pset3, game-of-fifteen]
    1        2409           [pset4]
    2        4346           [pset6, cs50submit]
    3        9139           [pset8, pset5, gradebook]
    4        9631           [pset4, recover]

I need to remove every string from the lists in the tag column except those that start with pset.

So I need to end with something like this:

    index    question_id    tag
    0        1858           [pset3]
    1        2409           [pset4]
    2        4346           [pset6]
    3        9139           [pset8, pset5]
    4        9631           [pset4]

How can I do that please?

3 Answers

2

One option: use the apply method to loop through the items in the tag column. For each item, use a list comprehension with the startswith method to keep only the strings that have the "pset" prefix:

    df['tag'] = df.tag.apply(lambda lst: [x for x in lst if x.startswith("pset")])
    df
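For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand (assuming the tag column holds real Python lists, not strings) and uses a named helper, keep_pset, which is a hypothetical name introduced only for illustration.

```python
import pandas as pd

# Rebuild the sample frame from the question; assumes each tag cell is a Python list.
df = pd.DataFrame({
    "question_id": [1858, 2409, 4346, 9139, 9631],
    "tag": [
        ["pset3", "game-of-fifteen"],
        ["pset4"],
        ["pset6", "cs50submit"],
        ["pset8", "pset5", "gradebook"],
        ["pset4", "recover"],
    ],
})


def keep_pset(tags):
    """Return only the tags that start with 'pset' (hypothetical helper name)."""
    return [t for t in tags if t.startswith("pset")]


# apply calls keep_pset once per row; assigning back replaces the column.
df["tag"] = df["tag"].apply(keep_pset)
print(df)
```

A named function does the same job as the lambda above; it is just easier to test and reuse.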

2

You can apply a function to the tag Series that builds a new list containing only the elements that start with 'pset':

    df.tag.apply(lambda x: [xx for xx in x if xx.startswith('pset')])

    # returns:
    0           [pset3]
    1           [pset4]
    2           [pset6]
    3    [pset8, pset5]
    4           [pset4]
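Note that apply returns a new Series and leaves the DataFrame itself unchanged, so the result has to be assigned back if the column should be updated. A small sketch of that, using a two-row frame made up from the question's data (the names before and filtered are only illustrative):

```python
import pandas as pd

# Two rows taken from the question's sample data.
df = pd.DataFrame({"tag": [["pset3", "game-of-fifteen"], ["pset4", "recover"]]})

# apply builds a new Series; df["tag"] still holds the original lists.
filtered = df["tag"].apply(lambda tags: [t for t in tags if t.startswith("pset")])
before = df["tag"].tolist()

# Assign the result back to actually replace the column.
df["tag"] = filtered
print(before)
print(df["tag"].tolist())
```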

2

You can also use Python's in operator. Note that it matches 'pset' anywhere in the string, not only at the start:

    df.tag = df.tag.apply(lambda x: [elem for elem in x if 'pset' in elem])

    # df.tag is now:
    0           [pset3]
    1           [pset4]
    2           [pset6]
    3    [pset8, pset5]
    4           [pset4]
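For the data in the question, the substring test and the prefix test give the same result. They can differ when a tag merely contains "pset". A quick sketch of the difference, where "old-pset-notes" is a made-up tag that does not appear in the question:

```python
# "old-pset-notes" is a hypothetical tag, added only to show the difference.
tags = ["pset8", "gradebook", "old-pset-notes"]

# Substring test: keeps any tag that contains "pset" anywhere.
by_substring = [t for t in tags if "pset" in t]

# Prefix test: keeps only tags that begin with "pset".
by_prefix = [t for t in tags if t.startswith("pset")]

print(by_substring)
print(by_prefix)
```

If only tags of the form pset* should survive, startswith is probably the safer choice.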
