3

I have a dataframe like this:

 A     B        D
 d     151     M
 d     152     a
 d     153     M
 d     161     z
 d     164     M
 d     169     g
 d     178     g
 d     173     e
 d     176     j
 d     605     j
 d     603     d
 d     607     e
 d     905     j
 d     903     d
 d     907     e

I want to filter it based on the values of column B. Specifically i want the values of column B that begins with 15, 16 and 60. My dataframe would be reduced to:

 A     B        D
 d     151     M
 d     152     a
 d     153     M
 d     161     z
 d     164     M
 d     169     g
 d     605     j
 d     603     d
 d     607     e

So far i've been unsuccessful to achieve this result. Which would be the best way to achieve this?

0

2 Answers 2

2

You need convert to str first and select first 2 values which are compared with isin and filter with boolean indexing:

print (df.B.astype(str).str[:2])
0     15
1     15
2     15
3     16
4     16
5     16
6     17
7     17
8     17
9     60
10    60
11    60
12    90
13    90
14    90
Name: B, dtype: object

print (df[df.B.astype(str).str[:2].isin(['15','16','60'])])
    A    B  D
0   d  151  M
1   d  152  a
2   d  153  M
3   d  161  z
4   d  164  M
5   d  169  g
9   d  605  j
10  d  603  d
11  d  607  e

Another solution is use contains with custom regex - ^ means start of string and | or:

L = [15,16,90]
val = '|'.join(['^' + str(x) for x in L])
print (val)
^15|^16|^90
print (df[df.B.astype(str).str.contains(val)])
    A    B  D
0   d  151  M
1   d  152  a
2   d  153  M
3   d  161  z
4   d  164  M
5   d  169  g
12  d  905  j
13  d  903  d
14  d  907  e
Sign up to request clarification or add additional context in comments.

Comments

0
>>> df[df['B'].map(lambda x: str(x[:2]) in ['15', '16', '60'])]
    A    B  D
0   d  151  M
1   d  152  a
2   d  153  M
3   d  161  z
4   d  164  M
5   d  169  g
9   d  605  j
10  d  603  d
11  d  607  e

7 Comments

A bit danger, if in column B is None or NaN - get error.
@jezrael Updated to address None or NaN
only I think if in column B is None or NaN get error - so solution works only if is 100% sure never get it. So better is use str.contains or isin which nice handling it.
@jezrael yea right, can you explain me df.B.astype(str).str[:2] i got it till df.B.astype(str) it typecasting it to str. but str[:2] i don't get it. as str is built in python.
It is called indexing with str
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.