3

In a pandas DataFrame, I want to search row by row for multiple string values. If a row contains one of the values, the function should write 1 for that row into a new column at the end of the DataFrame, and 0 otherwise.
There are multiple tutorials on how to select rows of a pandas DataFrame that match a (partial) string.

For example:

import pandas as pd

#create sample data
data = {'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
        'launched': [1983,1984,1984,1984],
        'discontinued': [1986, 1985, 1984, 1986]}

df = pd.DataFrame(data, columns = ['model', 'launched', 'discontinued'])
df

I'm pulling the above example from this website: https://davidhamann.de/2017/06/26/pandas-select-elements-by-string/

How would I do a multi-value search of the entire row for: 'int', 'tos', '198'?

Then add a column next to discontinued, for example a column named int, that holds 1 or 0 depending on whether the row contains that keyword.

5 Answers

6

If you have

l=['int', 'tos', '198']

Then you use str.contains by joining with '|' to get every model that contains any of these words

df.model.str.contains('|'.join(l))

0    False
1    False
2     True
3     True
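
One caveat with the joined pattern: str.contains treats it as a regular expression by default, so a search term containing characters such as . or + would be read as regex syntax. A minimal sketch, assuming the terms are meant literally and that case should be ignored (the names terms, pattern and mask are illustrative, not from the answer above):

```python
import re
import pandas as pd

df = pd.DataFrame({
    'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
    'launched': [1983, 1984, 1984, 1984],
    'discontinued': [1986, 1985, 1984, 1986],
})

terms = ['int', 'tos', '198']

# Escape each term so it is matched literally, then OR the terms together.
pattern = '|'.join(re.escape(t) for t in terms)

# case=False ignores case; na=False treats missing values as "no match".
mask = df['model'].str.contains(pattern, case=False, na=False)
print(mask.tolist())
```

For this sample data the result is the same as above, because none of the terms contains a regex metacharacter.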

Edit

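If the intention is to check all columns, as @jpp interpreted, I'd suggest:

from functools import reduce
# Regex pattern; the output below corresponds to these terms (with '198' every row would match).
m = 'int|tos|1985'
res = reduce(lambda a, b: a | b, [df[col].astype(str).str.contains(m) for col in df.columns])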

0    False
1     True
2     True
3     True

If you want it as a column with integer values, just do

df['new_col'] = res.astype(int)

     new_col
0    0
1    1
2    1
3    1
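
If one 0/1 column per keyword is wanted, as the question describes, the same idea can be repeated per term. This is only a sketch: it assumes the new columns should be named after the keywords themselves, and the name as_text is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
    'launched': [1983, 1984, 1984, 1984],
    'discontinued': [1986, 1985, 1984, 1986],
})

terms = ['int', 'tos', '198']

# Convert every cell to text once so the numeric columns can be searched too.
# This snapshot is taken before any flag columns are added to df.
as_text = df.astype(str)

for term in terms:
    # Test each column for the literal term, then check whether any column in the row matched.
    hit = as_text.apply(lambda col: col.str.contains(term, regex=False)).any(axis=1)
    df[term] = hit.astype(int)

print(df)
```

Searching the text snapshot rather than df itself keeps the newly added flag columns out of later searches.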

3

If I understand correctly, you wish to check the existence of strings across all columns in each row. This is not straightforward given you have mixed types (integers, strings). One way is to use pd.DataFrame.apply with a custom function.

The main point we need to remember is to convert your entire dataframe to type str, since you cannot test the existence of substrings within an integer.

match = ['int', 'tos', '1985']

def string_finder(row, words):
    if any(word in field for field in row for word in words):
        return True
    return False

df['isContained'] = df.astype(str).apply(string_finder, words=match, axis=1)

print(df)

            model  launched  discontinued  isContained
0            Lisa      1983          1986        False
1          Lisa 2      1984          1985         True
2  Macintosh 128K      1984          1984         True
3  Macintosh 512K      1984          1986         True
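
The reason for the astype(str) conversion can be seen in isolation. A small sketch using a single integer value in place of a DataFrame cell (the variable names are illustrative):

```python
year = 1985  # an integer cell, as in the 'discontinued' column

try:
    found = '198' in year
except TypeError:
    # The `in` operator cannot search inside an int.
    found = None

# After converting to text, the substring test works.
found_in_text = '198' in str(year)
print(found, found_in_text)
```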

0

The simplest method, without any fancy pandas features, is to use nested for loops. I would welcome a better solution, but my approach would be this:

def check_all_for(column_name, search_terms):
    # Collect one 0/1 flag per row, then assign the whole column at once.
    flags = []
    # iterrows() yields (index, row) pairs, so unpack them.
    for _, row in df.iterrows():
        flag = 0
        for element in row:
            for search_term in search_terms:
                # Cells are lowercased, so search terms should be lowercase too.
                if search_term in str(element).lower():
                    flag = 1
        flags.append(flag)
    df[column_name] = flags

This assumes the DataFrame is defined as df and that you want to fill the new column with 1 or 0.
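
A variation on the loop approach, sketched under the assumption that the DataFrame is passed in as an argument instead of being read from a global. The helper name flag_rows is hypothetical. It uses itertuples(index=False), which yields each row's values as a plain tuple.

```python
import pandas as pd

def flag_rows(frame, search_terms):
    """Return a list with 1 for each row containing any search term, else 0."""
    terms = [t.lower() for t in search_terms]
    flags = []
    for row in frame.itertuples(index=False):
        # Convert every cell to lowercase text so numbers can be searched too.
        cells = [str(value).lower() for value in row]
        found = any(term in cell for cell in cells for term in terms)
        flags.append(int(found))
    return flags

df = pd.DataFrame({
    'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
    'launched': [1983, 1984, 1984, 1984],
    'discontinued': [1986, 1985, 1984, 1986],
})

df['flag'] = flag_rows(df, ['int', 'tos', '1985'])
print(df['flag'].tolist())
```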

0

You need to check whether any string in match is a substring of model.

match = [ 'int', 'tos', '198']
df['isContained'] = df['model'].apply(lambda x: 1 if any(s in x for s in match) else 0)

Output:

            model  launched  discontinued  isContained
0            Lisa      1983          1986            0
1          Lisa 2      1984          1985            0
2  Macintosh 128K      1984          1984            1
3  Macintosh 512K      1984          1986            1

0

@Guy_Fuqua, my understanding is that you want to make sure all of the words are present in a row. Am I right?

If so, a small modification of jpp's answer should help you achieve this. Note the AssessAllString function here:

match = ['int', 'tos', '1984']

def string_finder(row, words):
    if any(word in field for field in row for word in words):
        return True
    return False

def AssessAllString(row, words):
    # True only if every word is found somewhere in the row.
    return all(string_finder(row, [word]) for word in words)

df['isContained'] = df.astype(str).apply(AssessAllString, words=match, axis=1)

print(df)

            model  launched  discontinued  isContained
0  Lisa            1983      1986          False      
1  Lisa 2          1984      1985          False      
2  Macintosh 128K  1984      1984          True       
3  Macintosh 512K  1984      1986          True 

Another example:

match = ['isa','1984']
df['isContained'] = df.astype(str).apply(AssessAllString, words=match, axis=1)

            model  launched  discontinued  isContained
0  Lisa            1983      1986          False      
1  Lisa 2          1984      1985          True       
2  Macintosh 128K  1984      1984          False      
3  Macintosh 512K  1984      1986          False 

I believe the code could still be optimized, but so far it should serve the purpose.
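
One possible direction for that optimization, offered as a sketch and not as a tested replacement: build one boolean Series per term with str.contains and require all of them to be true. The names as_text, per_term and contains_all are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
    'launched': [1983, 1984, 1984, 1984],
    'discontinued': [1986, 1985, 1984, 1986],
})

match = ['isa', '1984']

# Convert every cell to text so the numeric columns can be searched.
as_text = df.astype(str)

# One boolean Series per term: does any cell in the row contain the term?
per_term = [
    as_text.apply(lambda col, t=term: col.str.contains(t, regex=False)).any(axis=1)
    for term in match
]

# A row qualifies only if every term was found somewhere in it.
contains_all = pd.concat(per_term, axis=1).all(axis=1)
df['isContained'] = contains_all
print(df)
```

For match = ['isa', '1984'] this marks only the Lisa 2 row, in line with the second example above.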
