Search for Multiple String Values of Entire Row of Dataframe in python pandas

Question

In a pandas DataFrame, I want to search row by row for multiple string values. If a row contains one of the strings, the function should write 1 for that row into a new column at the end of the df, and 0 otherwise.
There have been multiple tutorials on how to select rows of a Pandas DataFrame that match a (partial) string.

For Example:

import pandas as pd

#create sample data
data = {'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
        'launched': [1983,1984,1984,1984],
        'discontinued': [1986, 1985, 1984, 1986]}

df = pd.DataFrame(data, columns = ['model', 'launched', 'discontinued'])
df

I'm pulling the above example from this website: https://davidhamann.de/2017/06/26/pandas-select-elements-by-string/

How would I do a multi-value search of the entire row for: 'int', 'tos', '198'?

Then add a column next to discontinued, for example a column int, that holds 1 or 0 depending on whether the row contains that keyword.

rafaelc · Accepted Answer · 2018-06-14 02:02:40Z

6

If you have

l=['int', 'tos', '198']

Then you can use str.contains, joining the words with '|' (regex "or"), to get every model that contains any of these words:

df.model.str.contains('|'.join(l))

0    False
1    False
2     True
3     True
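Because str.contains treats the pattern as a regular expression by default, keywords containing characters such as '.' or '+' would be interpreted as regex syntax. A minimal sketch of a safer variant, assuming literal, case-insensitive matching is what you want (the names pattern and mask are only illustrative):

```python
import re
import pandas as pd

df = pd.DataFrame({
    'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
    'launched': [1983, 1984, 1984, 1984],
    'discontinued': [1986, 1985, 1984, 1986],
})

l = ['int', 'tos', '198']

# Escape each keyword so it is matched literally, then OR them together.
pattern = '|'.join(re.escape(word) for word in l)

# case=False makes the match case-insensitive (an assumption about intent).
mask = df['model'].str.contains(pattern, case=False)
print(mask)
```

For this sample data the result is the same as above, since none of the keywords contain special characters.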

Edit

If the intention is to check all columns as @jpp interpreted, I'd suggest:

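from functools import reduce
# m is the regex pattern; '1985' (as in @jpp's answer) matches the output shown below
m = '|'.join(['int', 'tos', '1985'])
res = reduce(lambda a, b: a | b, [df[col].astype(str).str.contains(m) for col in df.columns])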

0    False
1     True
2     True
3     True

If you want it as a column with integer values, just do

df['new_col'] = res.astype(int)

     new_col
0    0
1    1
2    1
3    1
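The question also asks for a separate 0/1 column per keyword. One possible sketch that reuses the same reduce idea, assuming the original keywords 'int', 'tos', '198' and naming each new column after its keyword (both choices are illustrative, not part of the answer above):

```python
from functools import reduce
import pandas as pd

df = pd.DataFrame({
    'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
    'launched': [1983, 1984, 1984, 1984],
    'discontinued': [1986, 1985, 1984, 1986],
})

keywords = ['int', 'tos', '198']

# Freeze the list of columns to search, so the flag columns
# added inside the loop are not searched themselves.
source_cols = list(df.columns)

for word in keywords:
    # One boolean Series per source column; regex=False matches literally.
    per_column = [df[col].astype(str).str.contains(word, regex=False)
                  for col in source_cols]
    # OR the Series together: True if any column of the row matched.
    df[word] = reduce(lambda a, b: a | b, per_column).astype(int)

print(df)
```

With the search term '198', every row is flagged, because each launch year contains that substring.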

edited Jun 14, 2018 at 2:02

answered Jun 13, 2018 at 20:53


jpp · Accepted Answer · 2018-06-14 23:35:00Z

If I understand correctly, you wish to check the existence of strings across all columns in each row. This is not straightforward given you have mixed types (integers, strings). One way is to use pd.DataFrame.apply with a custom function.

The main point we need to remember is to convert your entire dataframe to type str, since you cannot test the existence of substrings within an integer.

match = ['int', 'tos', '1985']

def string_finder(row, words):
    if any(word in field for field in row for word in words):
        return True
    return False

df['isContained'] = df.astype(str).apply(string_finder, words=match, axis=1)

print(df)

            model  launched  discontinued  isContained
0            Lisa      1983          1986        False
1          Lisa 2      1984          1985         True
2  Macintosh 128K      1984          1984         True
3  Macintosh 512K      1984          1986         True
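A row-wise apply calls a Python function once per row, which can be slow on large frames. As a rough alternative sketch (not part of the answer above), the same check can be done column by column with str.contains and then collapsed with any(axis=1); the variable names here are only illustrative:

```python
import re
import pandas as pd

df = pd.DataFrame({
    'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
    'launched': [1983, 1984, 1984, 1984],
    'discontinued': [1986, 1985, 1984, 1986],
})

match = ['int', 'tos', '1985']

# Build one literal "or" pattern from the keywords.
pattern = '|'.join(re.escape(word) for word in match)

# Convert everything to str, test each column, then check
# whether any column in the row produced a match.
hits = df.astype(str).apply(lambda col: col.str.contains(pattern))
df['isContained'] = hits.any(axis=1)

print(df)
```

Appending .astype(int) to the last assignment would give 1/0 instead of True/False, as the question requested.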

mrGreenBrown · Accepted Answer · 2018-06-26 21:14:48Z

0

The simplest method, without any fancy pandas stuff, is to use nested for loops. I would be glad if someone could give a better solution, but my approach would be this:

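def check_all_for(column_name, search_terms):
    flags = []
    # iterrows() yields (index, row) pairs, so unpack them
    for _, row in df.iterrows():
        flag = 0
        for element in row:
            for search_term in search_terms:
                # values are lowercased, so search terms should be lowercase too
                if search_term in str(element).lower():
                    flag = 1
        flags.append(flag)
    # assign once at the end; writing to the row copy would not change df
    df[column_name] = flags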

This assumes that your DataFrame is defined as df and that you want the new column flagged with 1 or 0.

answered Jun 26, 2018 at 21:14


harpan · Accepted Answer · 2018-06-13 21:11:11Z

0

You need to check whether any of the strings in match is a substring of model.

match = [ 'int', 'tos', '198']
df['isContained'] = df['model'].apply(lambda x: 1 if any(s in x for s in match) else 0)

Output:

            model  launched  discontinued  isContained
0            Lisa      1983          1986            0
1          Lisa 2      1984          1985            0
2  Macintosh 128K      1984          1984            1
3  Macintosh 512K      1984          1986            1

edited Jun 13, 2018 at 21:11

answered Jun 13, 2018 at 21:01


0xFK · Accepted Answer · 2019-03-31 16:41:50Z

@Guy_Fuqua, my understanding is that you want to make sure all of the words are present in a row. Is that right?

If so, a small modification of jpp's answer should help you achieve this. Note the AssessAllString function here:

match = ['int', 'tos', '1984']

def string_finder(row, words):
    if any(word in field for field in row for word in words):
        return True
    return False

def AssessAllString(row, words):
    b = True
    for x in words:
        b = b & string_finder(row, [x])
    return b

df['isContained'] = df.astype(str).apply(AssessAllString, words=match, axis=1)

print(df)

            model  launched  discontinued  isContained
0  Lisa            1983      1986          False      
1  Lisa 2          1984      1985          False      
2  Macintosh 128K  1984      1984          True       
3  Macintosh 512K  1984      1986          True

Another example:

match = ['isa','1984']
df['isContained'] = df.astype(str).apply(AssessAllString, words=match, axis=1)

            model  launched  discontinued  isContained
0  Lisa            1983      1986          False      
1  Lisa 2          1984      1985          True       
2  Macintosh 128K  1984      1984          False      
3  Macintosh 512K  1984      1986          False

I believe the code still needs optimization, but it should serve the purpose for now.
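As one possible simplification, the "every keyword must appear somewhere in the row" check can be written as a single all(any(...)) expression. This is only a sketch, and the helper name contains_all is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
    'launched': [1983, 1984, 1984, 1984],
    'discontinued': [1986, 1985, 1984, 1986],
})

def contains_all(row, words):
    # True only if every keyword appears in at least one field of the row.
    return all(any(word in field for field in row) for word in words)

match = ['int', 'tos', '1984']
# astype(str) is needed so that substring tests work on the numeric columns.
df['isContained'] = df.astype(str).apply(contains_all, words=match, axis=1)

print(df)
```

It should give the same flags as the two examples above, since all() stops at the first missing keyword and any() stops at the first matching field.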
