7

I have a DataFrame like the one below:

 name         genre
 satya      |ACTION|DRAMA|IC|
 satya      |COMEDY|BIOPIC|SOCIAL|
 abc        |CLASSICAL|
 xyz        |ROMANCE|ACTION|DARMA|
 def        |DISCOVERY|SPORT|COMEDY|IC|
 ghj        |IC|

Now I want to query the DataFrame so that I get rows 1, 5 and 6, i.e. I want to find |IC| either alone or in any combination with other genres.

So far I am able to do either an exact search using

df[df['genre'] == '|ACTION|DRAMA|IC|']  # exact value, yields row 1

or a string contains search by

 df[df['genre'].str.contains('IC')]  # yields rows 1, 2, 3, 5, 6
 # because BIOPIC contains IC, and so does CLASSICAL

But I don't want either of these.

#df[df['genre'].str.contains('|IC|')]  # row 6
# This also does not meet my need, as I am missing rows 1 and 5

So my requirement is to find genres that contain |IC|. (My string search fails because '|' is treated as the OR operator in the regular expression.)

Can somebody suggest a regex or any other method to do that? Thanks in advance.

2 Answers

8

I think you can escape | with \ in the regex, because | without \ is interpreted as OR. The Python re documentation says:

'|'

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].
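The quoted rule can be checked with the standard `re` module alone. The following is a minimal sketch, assuming a small hypothetical list `genres` taken from the question's sample values; it compares the unescaped pattern with the two ways of writing a literal `|` that the documentation mentions.

```python
import re

# Hypothetical sample values, copied from the question's data.
genres = ["|ACTION|DRAMA|IC|", "|COMEDY|BIOPIC|SOCIAL|", "|CLASSICAL|", "|IC|"]

# Unescaped: '|IC|' reads as "empty OR 'IC' OR empty".
# The empty branch matches at the start of any string.
unescaped = [bool(re.search("|IC|", g)) for g in genres]

# Escaped with a backslash: both pipes are literal characters.
escaped = [bool(re.search(r"\|IC\|", g)) for g in genres]

# Character class: '[|]' is another way to write a literal pipe.
char_class = [bool(re.search("[|]IC[|]", g)) for g in genres]

print(unescaped)
print(escaped)
print(char_class)
```

Under `re`, the unescaped pattern matches every string through its empty alternative, so it cannot isolate the |IC| token, while both escaped forms match only the values that contain the literal text |IC|.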

print(df['genre'].str.contains(r'\|IC\|'))
0     True
1    False
2    False
3    False
4     True
5     True
Name: genre, dtype: bool

print(df[df['genre'].str.contains(r'\|IC\|')])
    name                        genre
0  satya            |ACTION|DRAMA|IC|
4    def  |DISCOVERY|SPORT|COMEDY|IC|
5    ghj                         |IC|
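For anyone who would rather not write the backslashes by hand, here is a sketch of three alternatives that should select the same rows. It rebuilds the question's DataFrame so that it runs on its own; the variable names `token`, `literal_mask`, `escaped_mask` and `split_mask` are only illustrative.

```python
import re

import pandas as pd

# The sample data from the question, rebuilt so the example is self-contained.
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|BIOPIC|SOCIAL|",
        "|CLASSICAL|",
        "|ROMANCE|ACTION|DARMA|",
        "|DISCOVERY|SPORT|COMEDY|IC|",
        "|IC|",
    ],
})

token = "|IC|"

# Option 1: plain substring search, no regex interpretation at all.
literal_mask = df["genre"].str.contains(token, regex=False)

# Option 2: keep regex matching, but let re.escape add the backslashes.
escaped_mask = df["genre"].str.contains(re.escape(token))

# Option 3: split each value on '|' and test for the exact genre name.
split_mask = (
    df["genre"].str.strip("|").str.split("|").apply(lambda parts: "IC" in parts)
)

print(df[literal_mask])
```

Options 1 and 2 rely on every genre being wrapped in pipes, as in the sample data. Option 3 does not need the leading and trailing pipe, which may matter if the real column is less regular.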

1

Maybe this construction (with `import re`, and `df` being your DataFrame):

    df[df['columnName'].str.contains(re.compile('regex_pattern'))]
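A compiled pattern can also be applied row by row, which does not depend on how `str.contains` handles pattern objects. This is a rough sketch, assuming a hypothetical three-row frame and the escaped pattern for |IC|; `columnName` is the placeholder name used above.

```python
import re

import pandas as pd

# Hypothetical three-row frame; 'columnName' is a placeholder column name.
df = pd.DataFrame({"columnName": ["|ACTION|DRAMA|IC|", "|CLASSICAL|", "|IC|"]})

# Compile once, with the pipes escaped so they are matched literally.
pattern = re.compile(r"\|IC\|")

# Build a boolean mask by searching each value with the compiled pattern.
mask = df["columnName"].map(lambda value: bool(pattern.search(value)))

result = df[mask]
print(result)
```

This assumes the column has no missing values; a NaN entry would need to be filled or skipped before calling `pattern.search`.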
