searching a string pattern from a Data-frame column in pandas

Question

This continues my previous Stack Overflow question, "searching matching string pattern from dataframe column in python pandas".

Suppose I have a dataframe:

 name         genre
 satya      |ACTION|DRAMA|IC|
 satya      |COMEDY|DRAMA|SOCIAL|MUSIC|
 abc        |DRAMA|ACTION|BIOPIC|
 xyz        |ACTION||ROMANCE|DRAMA|
 def        |ACTION|SPORT|COMEDY|IC|
 ghj        |IC|ACTIONDRAMA|NOACTION|

From the answer to my last question, I am able to search for a single genre (e.g. IC) when it exists independently in the genre column and not as part of another genre string (such as MUSIC or BIOPIC).

Now I want to find the rows where ACTION and DRAMA are both present in the genre column, in any order, each as an individual genre and not as part of a longer string.

So I need rows 1, 3 and 4 in the output:

 name         genre
 satya      |ACTION|DRAMA|IC|           # both present, adjacent
 # row 2 is excluded: only DRAMA is present, not ACTION
 abc        |DRAMA|ACTION|BIOPIC|       # both present, adjacent, in the other order
 xyz        |ACTION||ROMANCE|DRAMA|     # both present, not adjacent
 # row 5 is excluded: DRAMA is not present
 # row 6 is excluded: ACTION and DRAMA appear only as parts of other strings

I tried something like:

 x = df[df['genre'].str.contains(r'\|ACTION\|DRAMA\|')]
 # returns only row 1 (ACTION and DRAMA adjacent, in the order ACTION -> DRAMA)

Can somebody suggest what to change or add here so that I get the rows I need?

x = df[df['genre'].str.contains(r'(?s)^(?=.*\bACTION\b)(?=.*\bDRAMA\b)')]
– Wiktor Stribiżew, Commented Apr 25, 2016 at 6:39
Do you want to return the whole row if ACTION and DRAMA are both present?
– JanLeeYu, Commented Apr 25, 2016 at 6:40
What about the ghj row? Is it really ACTIONDRAMA, or ACTION|DRAMA?
– JanLeeYu, Commented Apr 25, 2016 at 7:06
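
The lookahead pattern from the first comment can be tried on the sample data roughly as follows. This is only a sketch: the frame is retyped by hand, with DRAMA in the xyz row (which is what the expected output implies), and the names `pattern`, `mask` and `result` are chosen here for illustration.

```python
import pandas as pd

# Sample data retyped from the question (xyz row spelled with DRAMA).
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|DRAMA|SOCIAL|MUSIC|",
        "|DRAMA|ACTION|BIOPIC|",
        "|ACTION||ROMANCE|DRAMA|",
        "|ACTION|SPORT|COMEDY|IC|",
        "|IC|ACTIONDRAMA|NOACTION|",
    ],
})

# Two lookaheads anchored at the start: each one must find its word
# somewhere in the string, so the order of ACTION and DRAMA does not matter.
pattern = r"(?s)^(?=.*\bACTION\b)(?=.*\bDRAMA\b)"

mask = df["genre"].str.contains(pattern)
result = df[mask]
print(result)
```

One caveat: `\b` treats any non-word character as a boundary, not only `|`, so a genre containing a hyphen or a space (a made-up value such as NON-ACTION) would also count as containing ACTION.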

jezrael · Accepted Answer · 2016-04-25 06:38:41Z

2

I think you can use str.contains with two conditions combined with AND (&):

print(df)
    name                        genre
0  satya            |ACTION|DRAMA|IC|
1  satya  |COMEDY|DRAMA|SOCIAL|MUSIC|
2    abc        |DRAMA|ACTION|BIOPIC|
3    xyz      |ACTION||ROMANCE|DRAMA|
4    def     |ACTION|SPORT|COMEDY|IC|
5    ghj    |IC|ACTIONDRAMA|NOACTION|

print(df['genre'].str.contains(r'\|ACTION\|') & df['genre'].str.contains(r'\|DRAMA\|'))
0     True
1    False
2     True
3     True
4    False
5    False
Name: genre, dtype: bool

print(df[df['genre'].str.contains(r'\|ACTION\|') & df['genre'].str.contains(r'\|DRAMA\|')])
    name                    genre
0  satya        |ACTION|DRAMA|IC|
2    abc    |DRAMA|ACTION|BIOPIC|
3    xyz  |ACTION||ROMANCE|DRAMA|
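
Since the delimiters are literal `|` characters, the same two-condition idea can probably be written without escaping by passing `regex=False`, so that `str.contains` does a plain substring test. A minimal sketch, with the sample frame rebuilt by hand and the names `mask` and `result` chosen for illustration:

```python
import pandas as pd

# Sample data retyped from the printed frame above.
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|DRAMA|SOCIAL|MUSIC|",
        "|DRAMA|ACTION|BIOPIC|",
        "|ACTION||ROMANCE|DRAMA|",
        "|ACTION|SPORT|COMEDY|IC|",
        "|IC|ACTIONDRAMA|NOACTION|",
    ],
})

# regex=False makes "|" an ordinary character, so "|ACTION|" only matches
# a genre that is delimited by pipes on both sides.
mask = (
    df["genre"].str.contains("|ACTION|", regex=False)
    & df["genre"].str.contains("|DRAMA|", regex=False)
)
result = df[mask]
print(result)
```

This relies on every genre value starting and ending with `|`, as in the sample data.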



3 Comments

Satya Over a year ago

@jezrael That works. Maybe a slightly off-topic question: is there a way to make this dynamic (using a for loop or a list comprehension)? For example, I want to pass [x, y, z] and get the rows that satisfy all three conditions on a base dataframe (the way ACTION AND DRAMA were applied in the question). The list should be allowed to have variable length.

jezrael Over a year ago

I think this answers your comment question: np.logical_and.reduce([X, Y, Z]).

Satya Over a year ago

@jezrael Can you please help me with this: stackoverflow.com/questions/37578530/…
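
The np.logical_and.reduce suggestion from the comments could be wired up for a variable-length list along these lines. This is a sketch under assumptions: `has_all_genres` is a hypothetical helper name, the list is assumed to be non-empty, and each genre is matched as a literal `|GENRE|` substring.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|DRAMA|SOCIAL|MUSIC|",
        "|DRAMA|ACTION|BIOPIC|",
        "|ACTION||ROMANCE|DRAMA|",
        "|ACTION|SPORT|COMEDY|IC|",
        "|IC|ACTIONDRAMA|NOACTION|",
    ],
})


def has_all_genres(series, genres):
    """Hypothetical helper: boolean array, True where every genre is present.

    Assumes `genres` is a non-empty list and values look like "|A|B|C|".
    """
    # One boolean mask per requested genre; na=False treats missing values
    # as "not present" so the masks stay purely boolean.
    masks = [
        series.str.contains("|" + genre + "|", regex=False, na=False)
        for genre in genres
    ]
    # AND all masks together element-wise, however many there are.
    return np.logical_and.reduce(masks)


wanted = ["ACTION", "DRAMA"]
result = df[has_all_genres(df["genre"], wanted)]
print(result)
```

Passing a longer list, for example `["ACTION", "DRAMA", "IC"]`, narrows the result further without changing the helper.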

JanLeeYu · Accepted Answer · 2016-04-25 07:09:59Z

0

I'm not really sure about this answer because I can't run it here, but try this pattern:

(\|ACTION|\|DRAMA).*?(\|ACTION|\|DRAMA)

Hope it helps.
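
Since the pattern above was posted untested, here is a quick check with Python's `re` module, as a sketch only. The six sample strings are retyped from the question; the extra string `|ACTION|ACTIONMAN|` is a made-up value added to probe one edge case.

```python
import re

pattern = re.compile(r"(\|ACTION|\|DRAMA).*?(\|ACTION|\|DRAMA)")

samples = [
    "|ACTION|DRAMA|IC|",
    "|COMEDY|DRAMA|SOCIAL|MUSIC|",
    "|DRAMA|ACTION|BIOPIC|",
    "|ACTION||ROMANCE|DRAMA|",
    "|ACTION|SPORT|COMEDY|IC|",
    "|IC|ACTIONDRAMA|NOACTION|",
]

# True where the pattern finds a match anywhere in the string.
hits = [bool(pattern.search(s)) for s in samples]
print(hits)

# Made-up edge case: ACTION followed by a longer genre that starts with ACTION.
# The pattern does not require the two alternatives to differ, and it does
# not check the closing "|", so this matches even though DRAMA is absent.
edge_case = bool(pattern.search("|ACTION|ACTIONMAN|"))
print(edge_case)
```

On the six sample rows the pattern selects the same rows as the two-condition approach, but the edge case shows it can report a row that has no DRAMA at all, so it should be used with care.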

