Is there a more general way to do the below? I'd like to write the st function so that it can search for a non-predefined number of strings.

For instance, I'd like to create a generalized st function and then call st('Governor', 'Virginia', 'Google').

Here's my current function, but it predefines the two words you can use (df is a pandas DataFrame):

def st(word1, word2, df):
    """
    Return the rows of df whose Name column contains both terms.
    """
    return df[df.Name.str.contains(word1) & df.Name.str.contains(word2)]

st('Governor', 'Virginia', newauthdf)

2 Answers

You could use np.logical_and.reduce:

import pandas as pd
import numpy as np
def search(df, *words):  #1
    """
    Return a sub-DataFrame of those rows whose Name column match all the words.
    """
    return df[np.logical_and.reduce([df['Name'].str.contains(word) for word in words])]   # 2


df = pd.DataFrame({'Name':['Virginia Google Governor',
                           'Governor Virginia',
                           'Governor Virginia Google']})
print(search(df, 'Governor', 'Virginia', 'Google'))

prints

                       Name
0  Virginia Google Governor
2  Governor Virginia Google

  1. The * in def search(df, *words) allows search to accept an unlimited number of positional arguments. It collects all the arguments after the first and places them in a tuple called words.
  2. np.logical_and.reduce([X,Y,Z]) is equivalent to X & Y & Z. It allows you to handle an arbitrarily long list, however.
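
To illustrate point 2, here is a minimal sketch comparing the chained & expression with np.logical_and.reduce. The Series name `names` and the masks X, Y, Z are made up for this illustration and reuse the sample data from the answer.

```python
import numpy as np
import pandas as pd

# Sample data taken from the answer; the variable names are illustrative.
names = pd.Series(['Virginia Google Governor',
                   'Governor Virginia',
                   'Governor Virginia Google'])

# One boolean mask per search word.
X = names.str.contains('Governor')
Y = names.str.contains('Virginia')
Z = names.str.contains('Google')

chained = X & Y & Z                          # hand-written intersection
reduced = np.logical_and.reduce([X, Y, Z])   # same result for any number of masks

print(list(chained))   # [True, False, True]
print(list(reduced))   # [True, False, True]
```

The reduce form takes a list, so the number of masks does not need to be known when the function is written.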

2 Comments

Sorry, is there an equivalent for 'OR'? If I also wanted to mix OR and AND searches, how would I do that?
There are two ways to handle OR. You could combine the regex patterns with |, as behzad.nouri shows, or you could use np.logical_or.reduce. It might be easiest, however, to allow the user to enter regex (which might contain |), and just use search to combine the regex with np.logical_and.reduce.
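
A rough sketch of both suggestions from the comment above. The function name search_any and the sample rows are assumptions made for this example; search is the AND version from the answer, with its arguments treated as regex patterns.

```python
import numpy as np
import pandas as pd

def search(df, *patterns):
    """AND search from the answer: every pattern must match the Name column."""
    return df[np.logical_and.reduce([df['Name'].str.contains(p) for p in patterns])]

def search_any(df, *words):
    """Hypothetical OR counterpart: at least one word must match the Name column."""
    return df[np.logical_or.reduce([df['Name'].str.contains(w) for w in words])]

# Illustrative data, not from the original post.
df = pd.DataFrame({'Name': ['Virginia Google Governor',
                            'Governor Virginia',
                            'Senator Virginia',
                            'Apple']})

# Pure OR: rows mentioning Google or Apple.
any_match = search_any(df, 'Google', 'Apple')

# Mixed: (Governor OR Senator) AND Virginia, with the OR written as a regex.
mixed = search(df, 'Governor|Senator', 'Virginia')

print(any_match)
print(mixed)
```

Both functions assume at least one word is passed; with no words, the reduce call returns a single scalar rather than a row mask.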
str.contains accepts a regex, so you can use '|'.join(words) as the pattern, which matches rows containing any of the words. To be safe, map re.escape over the words as well:

>>> df
                 Name
0                Test
1            Virginia
2              Google
3  Google in Virginia
4               Apple

[5 rows x 1 columns]
>>> words = ['Governor', 'Virginia', 'Google']

'|'.join(map(re.escape, words)) would be the search pattern:

>>> import re
>>> pat = '|'.join(map(re.escape, words))
>>> df.Name.str.contains(pat)
0    False
1     True
2     True
3     True
4    False
Name: Name, dtype: bool
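
As a small illustration of why the re.escape step matters, consider a search word that contains regex metacharacters. The sample names and the word 'U.S.' are invented for this sketch.

```python
import re
import pandas as pd

# Illustrative data: the second row only matches if '.' is read as a wildcard.
names = pd.Series(['U.S. Senator', 'UXSX Corp', 'Governor'])
words = ['U.S.']

# Without escaping, each '.' matches any character.
raw = names.str.contains('|'.join(words))

# With escaping, each '.' must match a literal dot.
escaped = names.str.contains('|'.join(map(re.escape, words)))

print(raw.tolist())      # [True, True, False]
print(escaped.tolist())  # [True, False, False]
```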

2 Comments

This is helpful! I like both answers, but I chose the one below because it allows you to input an arbitrarily long list of words with *words, which I didn't know about. I also didn't know regex worked in str.contains, so that's very useful.
Is it possible to run contains on multiple fields without using the and operator? Pseudocode: df['Name', 'AnotherField'].str.contains(pattern)