Below is my DataFrame:

import pandas as pd

df = pd.DataFrame({'col1': ['[7]', '[30]', '[0]', '[7]'], 'col2': ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]']})

col1    col2    
[7]     [0%, 7%]
[30]    [30%]
[0]     [30%, 7%]
[7]     [7%]

The aim is to check whether the value in col1 is contained in col2. Below is what I've tried:

df['test'] = df.apply(lambda x: str(x.col1) in str(x.col2), axis=1)
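
A quick check on the sample data suggests why this attempt does not produce the expected output: both columns hold plain strings, so `in` performs a substring test, and the bracketed text `[7]` never occurs verbatim inside `[0%, 7%]`. The sketch below only reuses the frame from above; the variable name `attempt` is introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["[7]", "[30]", "[0]", "[7]"],
    "col2": ["[0%, 7%]", "[30%]", "[30%, 7%]", "[7%]"],
})

# `in` on two strings is a substring test, brackets included:
# "[7]" is not a substring of "[0%, 7%]" or even of "[7%]".
attempt = df.apply(lambda x: str(x.col1) in str(x.col2), axis=1)
print(attempt.tolist())  # [False, False, False, False]
```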

Below is the expected output:

col1    col2       col3
[7]     [0%, 7%]   True
[30]    [30%]      True
[0]     [30%, 7%]  False
[7]     [7%]       True

4 Answers

You can also replace the square brackets with word boundaries (\b) and use re.search, as in:

import re
#...
df.apply(lambda x: bool(re.search(x['col1'].replace("[",r"\b").replace("]",r"\b"), x['col2'])), axis=1)
# => 0     True
#    1     True
#    2    False
#    3     True
#    dtype: bool

This works because \b7\b finds a match in [0%, 7%]: the 7 is neither preceded nor followed by a letter, digit, or underscore. No match is found in [30%, 7%], because \b0\b cannot match a zero that directly follows another digit (here, 3).
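
The word-boundary behaviour described above can be checked in isolation. This is a minimal sketch: `to_pattern` is a hypothetical helper, not part of the answer, and the `re.escape` call is an added precaution that makes no difference for plain digits.

```python
import re

# Hypothetical helper: turn a cell such as "[7]" into the pattern r"\b7\b".
def to_pattern(cell):
    number = cell.strip("[]")
    return r"\b" + re.escape(number) + r"\b"

print(to_pattern("[7]"))                                 # \b7\b
print(bool(re.search(to_pattern("[7]"), "[0%, 7%]")))    # True
print(bool(re.search(to_pattern("[30]"), "[30%]")))      # True
# The 0 in "30" follows a digit, so there is no boundary before it.
print(bool(re.search(to_pattern("[0]"), "[30%, 7%]")))   # False
```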

You can extract the numbers from both columns, join them, and then check whether there is at least one match per original row using eval + groupby + any:

(df['col2'].str.extractall(r'(?P<col2>\d+)').droplevel(1)
   .join(df['col1'].str[1:-1])
   .eval('col2 == col1')
   .groupby(level=0).any()
)

output:

0     True
1     True
2    False
3     True
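
To make the chain easier to follow, here is a sketch that splits it into intermediate steps on the sample data. The names `nums`, `pairs`, and `matches` are illustrative, and a plain `==` comparison stands in for the `eval` call, which should give the same result here.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["[7]", "[30]", "[0]", "[7]"],
    "col2": ["[0%, 7%]", "[30%]", "[30%, 7%]", "[7%]"],
})

# One row per number found in col2; dropping the inner "match" level
# leaves an index that points back to the original row.
nums = df["col2"].str.extractall(r"(?P<col2>\d+)").droplevel(1)

# Attach the col1 number (brackets sliced off) to every extracted number.
pairs = nums.join(df["col1"].str[1:-1])
print(pairs)

# A row matches if any of its extracted numbers equals its col1 number.
matches = (pairs["col2"] == pairs["col1"]).groupby(level=0).any()
print(matches.tolist())  # [True, True, False, True]
```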

One approach:

import ast

# strip the percent signs, then parse each string into a list of integers
col2_lst = df["col2"].str.replace("%", "", regex=False).apply(ast.literal_eval)

# check that every col1 value is contained in the matching col2 list
df["col3"] = [all(bi in a for bi in b) for a, b in zip(col2_lst, df["col1"].apply(ast.literal_eval))]

print(df)

Output

   col1       col2   col3
0   [7]   [0%, 7%]   True
1  [30]      [30%]   True
2   [0]  [30%, 7%]  False
3   [7]       [7%]   True
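
For illustration, the parsed lists that this approach compares can be inspected directly. The sketch below also swaps in `set.issubset` as an alternative way to write the containment test; that substitution is an assumption of this example rather than part of the answer.

```python
import ast
import pandas as pd

df = pd.DataFrame({
    "col1": ["[7]", "[30]", "[0]", "[7]"],
    "col2": ["[0%, 7%]", "[30%]", "[30%, 7%]", "[7%]"],
})

# Once the percent signs are gone, each cell is a valid Python list literal.
col1_lst = df["col1"].apply(ast.literal_eval)
col2_lst = df["col2"].str.replace("%", "", regex=False).apply(ast.literal_eval)
print(col1_lst.tolist())  # [[7], [30], [0], [7]]
print(col2_lst.tolist())  # [[0, 7], [30], [30, 7], [7]]

# Alternative containment test (assumed equivalent here): subset check.
result = [set(b).issubset(a) for a, b in zip(col2_lst, col1_lst)]
print(result)  # [True, True, False, True]
```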

Use Series.str.extractall to get the numbers and reshape with Series.unstack, so that you can compare with DataFrame.isin and reduce with DataFrame.any:

df['test'] = (df['col2'].str.extractall(r'(\d+)')[0].unstack()
                        .isin(df['col1'].str.strip('[]'))
                        .any(axis=1))
print(df)
   col1       col2   test
0   [7]   [0%, 7%]   True
1  [30]      [30%]   True
2   [0]  [30%, 7%]  False
3   [7]       [7%]   True
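
The intermediate objects may help explain why this works. When DataFrame.isin receives a Series, the comparison is aligned on the index, so each row of the reshaped frame is checked only against the col1 value of that same row. A sketch on the sample data, with `wide`, `keys`, and `hits` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["[7]", "[30]", "[0]", "[7]"],
    "col2": ["[0%, 7%]", "[30%]", "[30%, 7%]", "[7%]"],
})

# One column per match position; rows with fewer numbers are padded with NaN.
wide = df["col2"].str.extractall(r"(\d+)")[0].unstack()
print(wide)

# col1 with the brackets stripped: "7", "30", "0", "7".
keys = df["col1"].str.strip("[]")

# Index-aligned comparison: row i of `wide` is tested against keys[i] only.
hits = wide.isin(keys)
print(hits.any(axis=1).tolist())  # [True, True, False, True]
```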
