1

This continues my previous Stack Overflow question, searching matching string pattern from dataframe column in python pandas.

Suppose I have a dataframe:

 name         genre
 satya      |ACTION|DRAMA|IC|
 satya      |COMEDY|DRAMA|SOCIAL|MUSIC|
 abc        |DRAMA|ACTION|BIOPIC|
 xyz        |ACTION||ROMANCE|DRAMA|
 def        |ACTION|SPORT|COMEDY|IC|
 ghj        |IC|ACTIONDRAMA|NOACTION|

From the answer to my last question, I can search for a single genre (e.g. IC) when it exists independently in the genre column and not as part of another genre string (such as MUSIC or BIOPIC).

Now I want to find the rows where ACTION and DRAMA are both present in the genre column, not necessarily in any particular order, and each as an individual value rather than as part of a longer string.

So I need rows 1, 3 and 4 in the output:

 name         genre
 satya      |ACTION|DRAMA|IC|          # both present, adjacent
 # row 2 is excluded: only DRAMA is present, not ACTION
 abc        |DRAMA|ACTION|BIOPIC|      # both present, adjacent, in a different order
 xyz        |ACTION||ROMANCE|DRAMA|    # both present, not adjacent
 # row 5 is excluded: DRAMA is not present
 # row 6 is excluded: neither is present individually (only as part of a longer string)

I tried something like:

 x = df[df['genre'].str.contains(r'\|ACTION\|DRAMA\|')]
 # returns only row 1 (ACTION and DRAMA adjacent and in the order ACTION -> DRAMA)

Could somebody suggest what to change or add here so that I get the rows I need?

6
  • x = df[df['genre'].str.contains(r'(?s)^(?=.*\bACTION\b)(?=.*\bDRAMA\b)')] Commented Apr 25, 2016 at 6:39
  • Do you want to return the whole row if ACTION and DRAMA are present? Commented Apr 25, 2016 at 6:40
  • Or just check whether they are present in the row? Commented Apr 25, 2016 at 6:41
  • @JanLeeYu I want to return the rows to another dataframe. Commented Apr 25, 2016 at 6:43
  • What about ghj? Is it really ACTIONDRAMA or ACTION|DRAMA? Commented Apr 25, 2016 at 7:06

2 Answers

2

I think you can call str.contains twice and combine the two conditions with the element-wise AND operator, &:

print(df)
    name                        genre
0  satya            |ACTION|DRAMA|IC|
1  satya  |COMEDY|DRAMA|SOCIAL|MUSIC|
2    abc        |DRAMA|ACTION|BIOPIC|
3    xyz      |ACTION||ROMANCE|DRAMA|
4    def     |ACTION|SPORT|COMEDY|IC|
5    ghj    |IC|ACTIONDRAMA|NOACTION|

print(df['genre'].str.contains(r'\|ACTION\|') & df['genre'].str.contains(r'\|DRAMA\|'))
0     True
1    False
2     True
3     True
4    False
5    False
Name: genre, dtype: bool

print(df[df['genre'].str.contains(r'\|ACTION\|') & df['genre'].str.contains(r'\|DRAMA\|')])
    name                    genre
0  satya        |ACTION|DRAMA|IC|
2    abc    |DRAMA|ACTION|BIOPIC|
3    xyz  |ACTION||ROMANCE|DRAMA|
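
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and, as an alternative to escaping the pipes, passes regex=False so that str.contains treats each pattern as a literal substring. The variable names has_action, has_drama and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|DRAMA|SOCIAL|MUSIC|",
        "|DRAMA|ACTION|BIOPIC|",
        "|ACTION||ROMANCE|DRAMA|",
        "|ACTION|SPORT|COMEDY|IC|",
        "|IC|ACTIONDRAMA|NOACTION|",
    ],
})

# regex=False means "|" is an ordinary character, so no escaping is needed.
# Including both surrounding pipes keeps ACTIONDRAMA and NOACTION from matching.
has_action = df["genre"].str.contains("|ACTION|", regex=False)
has_drama = df["genre"].str.contains("|DRAMA|", regex=False)

# Element-wise AND keeps only the rows where both genres appear.
result = df[has_action & has_drama]
print(result)
```

This relies on every genre value being wrapped in pipes on both sides, as in the sample data; a value without a leading or trailing pipe would be missed.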

3 Comments

@jezrael It works. Maybe an offbeat question: is there a way to make this dynamic (using a for loop or a list comprehension)? For example, if I pass [x, y, z], the result should apply all three conditions to a base dataframe (like ACTION AND DRAMA in the question). The list should be of variable length.
I think this answers the question in your comment: np.logical_and.reduce([X, Y, Z]).
@jezrael Could you please help me with this: stackoverflow.com/questions/37578530/…
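
The np.logical_and.reduce suggestion from the comments could be wired up roughly as follows. This is a sketch, not the answerer's code: the helper name rows_with_all and its parameters are made up for illustration, and it assumes every genre value is wrapped in pipes as in the sample data.

```python
import re

import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|DRAMA|SOCIAL|MUSIC|",
        "|DRAMA|ACTION|BIOPIC|",
        "|ACTION||ROMANCE|DRAMA|",
        "|ACTION|SPORT|COMEDY|IC|",
        "|IC|ACTIONDRAMA|NOACTION|",
    ],
})


def rows_with_all(frame, column, genres):
    """Hypothetical helper: keep rows whose column holds every genre as a whole token."""
    if not genres:
        # No conditions to apply, so every row qualifies.
        return frame.copy()
    # One boolean mask per genre; re.escape guards against regex metacharacters.
    masks = [
        frame[column].str.contains(r"\|" + re.escape(genre) + r"\|", na=False)
        for genre in genres
    ]
    # Reduce the list of masks with element-wise AND into a single mask.
    return frame[np.logical_and.reduce(masks)]


print(rows_with_all(df, "genre", ["ACTION", "DRAMA"]))
print(rows_with_all(df, "genre", ["ACTION", "DRAMA", "IC"]))
```

The list can be any length, since each entry only adds one more mask to the reduction.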
0

I'm not really sure about this answer because I can't test it here, but try this pattern:

(\|ACTION|\|DRAMA).*?(\|ACTION|\|DRAMA)

Hope it helps.
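
A caveat on the pattern above: it accepts any two matches of either alternative, so a value that lists ACTION twice and never lists DRAMA would still match. A variant of the lookahead idea from the question's comments, with explicit pipe delimiters in place of \b, avoids that. The sketch below compares the two; the test string and the names posted and lookahead are illustrative only.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|DRAMA|SOCIAL|MUSIC|",
        "|DRAMA|ACTION|BIOPIC|",
        "|ACTION||ROMANCE|DRAMA|",
        "|ACTION|SPORT|COMEDY|IC|",
        "|IC|ACTIONDRAMA|NOACTION|",
    ],
})

# Pattern as posted in this answer.
posted = r"(\|ACTION|\|DRAMA).*?(\|ACTION|\|DRAMA)"
# Two lookaheads anchored at the start: each must find its genre between pipes.
lookahead = r"^(?=.*\|ACTION\|)(?=.*\|DRAMA\|)"

# Made-up value with ACTION twice and no DRAMA.
tricky = "|ACTION|SPORT|ACTION|"
posted_hits_tricky = re.search(posted, tricky) is not None
lookahead_hits_tricky = re.search(lookahead, tricky) is not None
print(posted_hits_tricky, lookahead_hits_tricky)

# The lookahead pattern has no capture groups, so str.contains accepts it as is.
result = df[df["genre"].str.contains(lookahead)]
print(result)
```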
