
I'm trying to create a subset of a pandas dataframe, based on values in a list. However, I need to include string indexing. I'll demonstrate with an example:

Here is my dataframe:

import pandas as pd

df = pd.DataFrame({'A': ['1-2', '2', '3', '3-8', '4']})

Here is what it looks like:

     A
0    1-2
1      2
2      3
3    3-8
4      4

I have a list of values I want to use to select rows from my dataframe.

list1 = ['2', '3']

I can use the .isin() method to select the rows whose value exactly matches an item in my list.

subset = df[df['A'].isin(list1)]
print(subset)

   A
1  2
2  3

However, I want every row whose value includes '2' or '3', including hyphenated values such as '1-2'. This is my desired output:

     A
0  1-2
1    2
2    3
3  3-8

Can I use string indexing with .isin()? I am struggling to come up with a workaround.

2 Answers

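Use str.split with isin and any: split each value on '-', test every piece against the list, and keep the rows where any piece matches.

Newdf = df[df['A'].str.split('-', expand=True).isin(['2', '3']).any(axis=1)].copy()
print(Newdf)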
     A
0  1-2
1    2
2    3
3  3-8
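
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its three steps. The intermediate names (parts, hits, mask) are hypothetical and only used for illustration; the data is the dataframe from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': ['1-2', '2', '3', '3-8', '4']})

# Step 1: split on '-' into separate columns; rows with fewer pieces are padded with missing values
parts = df['A'].str.split('-', expand=True)

# Step 2: element-wise membership test against the wanted values
hits = parts.isin(['2', '3'])

# Step 3: collapse each row to one boolean (True if any column in that row is True)
mask = hits.any(axis=1)

print(parts)
print(hits)
print(df[mask])
```

Because the test runs on whole pieces, a value such as '12' would not be selected by this approach.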

2 Comments

What does .any() do? More specifically, what is the argument (1) in .any(1)?
It returns True for each row that contains at least one True; the 1 is the axis argument, i.e. any(axis=1). @ErichPurpur

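You can try a regular expression:

import re

# Build an alternation such as ".*(2|3)"; re.escape guards against special characters in the list items
pattern = re.compile(".*(" + "|".join(map(re.escape, list1)) + ")")

# Keep the rows whose value contains any of the list items
print(df.loc[df['A'].apply(lambda x: bool(pattern.match(x)))])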

Output:

     A
0  1-2
1    2
2    3
3  3-8
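
Note that a regular expression like this matches substrings, so it behaves differently from splitting on '-' once the data contains multi-digit values. The sketch below illustrates the difference under some assumptions: the extra value '12' is not in the original data and is added only for the comparison, and Series.str.contains is swapped in as a vectorized alternative to apply, not the method used in this answer.

```python
import re
import pandas as pd

# '12' is a hypothetical extra row, added only to show the difference
df = pd.DataFrame({'A': ['1-2', '2', '3', '3-8', '4', '12']})
list1 = ['2', '3']

# Substring match: True for any value containing '2' or '3' anywhere
alternation = "|".join(map(re.escape, list1))
substring_mask = df['A'].str.contains(alternation, regex=True)

# Token match: only whole pieces between '-' separators count
token_mask = df['A'].str.split('-', expand=True).isin(list1).any(axis=1)

print(df[substring_mask])  # includes the '12' row
print(df[token_mask])      # excludes the '12' row
```

Which of the two is appropriate depends on whether '12' should count as "including 2" in your data.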

