Check if string is in another column pandas

Question

Below is my DataFrame:

import pandas as pd

df = pd.DataFrame({'col1': ['[7]', '[30]', '[0]', '[7]'], 'col2': ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]']})

col1    col2    
[7]     [0%, 7%]
[30]    [30%]
[0]     [30%, 7%]
[7]     [7%]

The aim is to check whether the col1 value is contained in col2. Below is what I've tried:

df['test'] = df.apply(lambda x: str(x.col1) in str(x.col2), axis=1)

Below is the expected output

col1    col2       col3
[7]     [0%, 7%]   True
[30]    [30%]      True
[0]     [30%, 7%]  False
[7]     [7%]       True

Wiktor Stribiżew · Accepted Answer · 2021-11-08 11:13:44Z

2

You can also replace the square brackets with word boundaries (\b) and use re.search, like this:

import re
#...
df.apply(lambda x: bool(re.search(x['col1'].replace("[",r"\b").replace("]",r"\b"), x['col2'])), axis=1)
# => 0     True
#    1     True
#    2    False
#    3     True
#    dtype: bool

This works because \b7\b finds a match in [0%, 7%]: the 7 is neither preceded nor followed by a letter, digit or underscore. No match is found in [30%, 7%] because \b0\b does not match a zero that directly follows another digit (here, 3).
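To see the word-boundary behaviour in isolation, here is a minimal sketch on plain strings. The helper name contains_whole_number is hypothetical, and the use of re.escape is my own addition (an assumption that col1 might one day hold characters that are special in a regex); the rows are the ones from the question.

```python
import re


def contains_whole_number(needle: str, haystack: str) -> bool:
    """Return True if the number inside `needle` occurs as a whole number in `haystack`."""
    number = needle.strip('[]')                  # '[7]' -> '7'
    pattern = r'\b' + re.escape(number) + r'\b'  # '7'   -> r'\b7\b'
    return re.search(pattern, haystack) is not None


# (col1, col2) pairs taken from the question's DataFrame
pairs = [('[7]', '[0%, 7%]'), ('[30]', '[30%]'), ('[0]', '[30%, 7%]'), ('[7]', '[7%]')]
results = [contains_whole_number(a, b) for a, b in pairs]
print(results)  # [True, True, False, True]
```

The third pair is False because the only 0 in [30%, 7%] sits directly after the 3, so there is no boundary in front of it.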

mozway · Accepted Answer · 2021-11-08 11:06:08Z

2

You can extract the numbers from both columns and join them, then check whether there is at least one match per row using eval+groupby+any:

(df['col2'].str.extractall(r'(?P<col2>\d+)').droplevel(1)
   .join(df['col1'].str[1:-1])
   .eval('col2 == col1')
   .groupby(level=0).any()
)

output:

0     True
1     True
2    False
3     True
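For anyone who wants to inspect the intermediate frames, the chain can be unrolled roughly as follows. This is only a sketch: the variable names (extracted, joined, row_matches) are mine, and I compare the two columns directly instead of going through eval, which should give the same result here.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['[7]', '[30]', '[0]', '[7]'],
    'col2': ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]'],
})

# One row per number found in col2; the original row label is repeated
extracted = df['col2'].str.extractall(r'(?P<col2>\d+)').droplevel(1)

# Attach the col1 value (brackets sliced off) to each extracted number
joined = extracted.join(df['col1'].str[1:-1])

# Compare number by number, then collapse back to one value per original row
row_matches = joined['col2'] == joined['col1']
result = row_matches.groupby(level=0).any()
print(result.tolist())  # [True, True, False, True]
```

Both sides are compared as strings, so '7' matches '7' but never '07'.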

Dani Mesejo · Accepted Answer · 2021-11-08 11:07:23Z

2

One approach:

import ast

# convert to integer list
col2_lst = df["col2"].str.replace("%", "").apply(ast.literal_eval)

# check list containment
df["col3"] = [all(bi in a for bi in b) for a, b in zip(col2_lst, df["col1"].apply(ast.literal_eval))]

print(df)

Output

   col1       col2   col3
0   [7]   [0%, 7%]   True
1  [30]      [30%]   True
2   [0]  [30%, 7%]  False
3   [7]       [7%]   True
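The all(...) check above means every number in col1 has to appear in col2, which is a subset test. A possible variant with sets is sketched below; the helper to_int_set is a hypothetical name, and the multi-value col1 case at the end is an assumption, since the question only shows single values.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    'col1': ['[7]', '[30]', '[0]', '[7]'],
    'col2': ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]'],
})


def to_int_set(text: str) -> set:
    """Parse a string such as '[30%, 7%]' into a set of ints, e.g. {30, 7}."""
    return set(ast.literal_eval(text.replace('%', '')))


col1_sets = df['col1'].map(to_int_set)
col2_sets = df['col2'].map(to_int_set)

# a <= b is True when every element of a is also in b (subset test)
df['col3'] = [a <= b for a, b in zip(col1_sets, col2_sets)]
print(df['col3'].tolist())  # [True, True, False, True]

# Hypothetical multi-value case: both 7 and 30 must be present
print(to_int_set('[7, 30]') <= to_int_set('[30%, 7%]'))  # True
```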

jezrael · Accepted Answer · 2021-11-08 11:10:11Z

2

Use Series.str.extractall to get the numbers and reshape with Series.unstack, so that each row can be compared against col1 with DataFrame.isin and reduced with DataFrame.any:

df['test'] = (df['col2'].str.extractall(r'(\d+)')[0].unstack()
                        .isin(df['col1'].str.strip('[]'))
                        .any(axis=1))
print(df)
   col1       col2   test
0   [7]   [0%, 7%]   True
1  [30]      [30%]   True
2   [0]  [30%, 7%]  False
3   [7]       [7%]   True
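When a Series is passed to DataFrame.isin, the comparison is aligned on the index, so each row is only checked against its own col1 value. A sketch that makes this row-wise comparison explicit with DataFrame.eq is below; the names wide and target are mine, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['[7]', '[30]', '[0]', '[7]'],
    'col2': ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]'],
})

# One column per match position; rows with fewer numbers are padded with NaN
wide = df['col2'].str.extractall(r'(\d+)')[0].unstack()

# col1 without the brackets, still indexed like df
target = df['col1'].str.strip('[]')

# axis=0 aligns `target` with the rows of `wide`, so row i is compared to target[i]
df['test'] = wide.eq(target, axis=0).any(axis=1)
print(df['test'].tolist())  # [True, True, False, True]
```

One thing to watch: a row whose col2 contains no digits at all produces no rows in extractall, so it would be missing from wide and end up as NaN in df['test'] rather than False.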

edited Nov 8, 2021 at 11:10

answered Nov 8, 2021 at 11:03
