
I have a DataFrame as below:

ID  COUNTRY  GENDER  AGE  V1       V2     V3       V4           V5
1   1        1       53   APPLE    apple  bosck    APPLE123     xApple111t
2   2        2       51   BEKO     beko   SIMSUNG  SamsungO123  ttBeko111t
3   3        1       24   SAMSUNG  bosch  SEMSUNG  BOSC1123     uuSAMSUNG111t
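To make the example reproducible, the table above can be built directly as a DataFrame. This is a sketch that assumes ID, COUNTRY, GENDER and AGE are integers and V1 to V5 hold plain strings; the name `df` matches the code used below.

```python
import pandas as pd

# Sample data from the table above; V1..V5 are assumed to be plain strings.
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'COUNTRY': [1, 2, 3],
    'GENDER': [1, 2, 1],
    'AGE': [53, 51, 24],
    'V1': ['APPLE', 'BEKO', 'SAMSUNG'],
    'V2': ['apple', 'beko', 'bosch'],
    'V3': ['bosck', 'SIMSUNG', 'SEMSUNG'],
    'V4': ['APPLE123', 'SamsungO123', 'BOSC1123'],
    'V5': ['xApple111t', 'ttBeko111t', 'uuSAMSUNG111t'],
})
print(df)
```

With this layout, `df.iloc[:, 4:]` selects exactly the columns V1 to V5.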

I want to replace a value with np.nan if it exactly matches a value in one list, or if it contains one of the values in another list. I tried the code below, but it raises an error.

remove_list = ['APPLE', 'BEKO']

remove_contain_list = ['SUNG', 'bosc']

df.iloc[:,4:].str.replace(remove_list, np.nan, case=False) # exact match & case-insensitive
df.iloc[:,4:].str.contains(remove_contain_list, np.nan, case=False) # contains & case-insensitive

How can I solve these problems?
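A likely first cause of the error in the attempt above: `.str` is an accessor defined on Series, not on DataFrame, so `df.iloc[:,4:].str` fails with `AttributeError` before `replace` or `contains` is even reached. Separately, `str.replace` and `str.contains` are documented to take a single pattern (a string or regex), not a list. A minimal sketch reproducing the accessor problem on a small hypothetical frame named `frame`:

```python
import pandas as pd

# Small hypothetical frame with two string columns.
frame = pd.DataFrame({'V1': ['APPLE', 'BEKO'], 'V2': ['apple', 'bosch']})

# .str works on a single column, which is a Series...
print(frame['V1'].str.lower().tolist())  # ['apple', 'beko']

# ...but not on a DataFrame, which is what df.iloc[:, 4:] returns.
error_message = ''
try:
    frame.str
except AttributeError as exc:
    error_message = str(exc)
    print('AttributeError:', error_message)
```

This is why the answer below works on a single stacked Series (or column by column) instead of calling `.str` on the DataFrame.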

  • Could you format your code properly? Data should be indented with at least four spaces. (commented Apr 3, 2020 at 7:01)

1 Answer


You can create a MultiIndex Series with DataFrame.stack, build masks for exact and partial matches with Series.isin (on lowercased values) and Series.str.contains, replace the matches with Series.mask (its default replacement value is NaN, so it does not need to be specified), and finally reshape with Series.unstack and assign back:

remove_list = ['APPLE', 'BEKO']
remove_contain_list = ['SUNG', 'bosc']

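# Stack V1..V5 into one Series (keeping missing cells) so both masks are built at once
s = df.iloc[:,4:].stack(dropna=False)
# Exact, case-insensitive match against remove_list
m1 = s.str.lower().isin([x.lower() for x in remove_list])
# Case-insensitive substring match; na=False treats missing cells as non-matches
m2 = s.str.contains('|'.join(remove_contain_list), case=False, na=False)
# Replace matches with NaN (the default of mask)
s = s.mask(m1 | m2)

# Reshape back to columns V1..V5 and assign
df.iloc[:,4:] = s.unstack()
print(df)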
   ID  COUNTRY  GENDER  AGE   V1   V2   V3        V4          V5
0   1        1       1   53  NaN  NaN  NaN  APPLE123  xApple111t
1   2        2       2   51  NaN  NaN  NaN       NaN  ttBeko111t
2   3        3       1   24  NaN  NaN  NaN       NaN         NaN
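If reshaping with stack/unstack feels roundabout, the same two masks can be applied column by column with `DataFrame.apply`. This is a sketch of an alternative, not the answer's method; it assumes V1 to V5 hold strings, and `replace_matches` is a hypothetical helper name. As in the answer, `Series.mask` fills matching cells with NaN by default.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    [[1, 1, 1, 53, 'APPLE', 'apple', 'bosck', 'APPLE123', 'xApple111t'],
     [2, 2, 2, 51, 'BEKO', 'beko', 'SIMSUNG', 'SamsungO123', 'ttBeko111t'],
     [3, 3, 1, 24, 'SAMSUNG', 'bosch', 'SEMSUNG', 'BOSC1123', 'uuSAMSUNG111t']],
    columns=['ID', 'COUNTRY', 'GENDER', 'AGE', 'V1', 'V2', 'V3', 'V4', 'V5'],
)

remove_list = ['APPLE', 'BEKO']
remove_contain_list = ['SUNG', 'bosc']
lowered = [x.lower() for x in remove_list]
pattern = '|'.join(remove_contain_list)


def replace_matches(col: pd.Series) -> pd.Series:
    """Hypothetical helper: NaN where a cell equals or contains a listed value."""
    exact = col.str.lower().isin(lowered)                      # case-insensitive exact match
    partial = col.str.contains(pattern, case=False, na=False)  # case-insensitive substring match
    return col.mask(exact | partial)                           # mask() fills NaN by default


value_cols = df.columns[4:]  # V1..V5
df[value_cols] = df[value_cols].apply(replace_matches)
print(df)
```

Each column is processed as its own Series, so no reshaping is needed; the NaN pattern should match the output shown above.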

EDIT: To give matching cells a yellow background instead of replacing them, return a DataFrame of CSS styles from a function passed to Styler.apply:

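import pandas as pd

def color(x):
    c1 = 'background-color: yellow'
    c = ''

    remove_list = ['APPLE', 'BEKO']
    remove_contain_list = ['SUNG', 'bosc']

    # Same masks as above, built on the stacked V1..V5 columns
    s = x.iloc[:,4:].stack(dropna=False)
    m1 = s.str.lower().isin([i.lower() for i in remove_list])
    m2 = s.str.contains('|'.join(remove_contain_list), case=False, na=False)
    m = m1 | m2

    # Start with an empty style for every cell of the full frame
    df1 = pd.DataFrame(c, index=x.index, columns=x.columns)
    # Back to 2D; columns outside V1..V5 are added as False (no highlight)
    mask = m.unstack(fill_value=False).reindex(x.columns, fill_value=False, axis=1)
    df1 = df1.mask(mask, c1)
    return df1

# axis=None passes the whole DataFrame to color() at once
df.style.apply(color, axis=None)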
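A column-wise alternative for the highlighting: with the default `axis=0`, `Styler.apply` passes each column as a Series and expects a list of CSS strings of the same length back, so the stacking step can be skipped. A sketch, with `highlight_matches` as a hypothetical name and V1 to V5 assumed to hold strings; it is checked here on a plain Series so no Styler is needed:

```python
import pandas as pd

remove_list = ['APPLE', 'BEKO']
remove_contain_list = ['SUNG', 'bosc']


def highlight_matches(col: pd.Series) -> list:
    """Hypothetical helper: yellow background where a cell equals or contains a listed value."""
    exact = col.str.lower().isin([x.lower() for x in remove_list])
    partial = col.str.contains('|'.join(remove_contain_list), case=False, na=False)
    return ['background-color: yellow' if hit else '' for hit in exact | partial]


# Quick check on one column of the sample data (column V3 plus a non-matching value).
styles = highlight_matches(pd.Series(['bosck', 'SIMSUNG', 'APPLE123']))
print(styles)
```

With the question's `df`, it would be applied as `df.style.apply(highlight_matches, subset=df.columns[4:])`, which restricts styling to V1 to V5. Rendering a Styler requires the optional jinja2 dependency.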

Comments

Thank you. Could you explain what '|'.join means?
@purplecollar - '|' means OR in a regex, so '|'.join(remove_contain_list) builds 'SUNG|bosc', which matches the first value of the list OR the second.
Under the same conditions as above, how can I change the background color to yellow instead of replacing the values?
@purplecollar - So matches from remove_list should be yellow and matches from remove_contain_list red? What does the expected output look like?
If a value in these columns matches remove_list or contains a value from remove_contain_list, I want to change its background color to yellow.
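To expand on the `'|'.join` explanation in the comments: joining the list with `|` builds one regular expression with alternatives, so `str.contains` matches any cell containing at least one of the substrings. If the substrings could contain regex metacharacters such as `.` or `+`, escaping each one with `re.escape` first is a common precaution; that step is an addition here, not part of the answer, and the `v1.0`/`c++` values are purely illustrative.

```python
import re

import pandas as pd

remove_contain_list = ['SUNG', 'bosc']

# '|' is the regex alternation operator: 'SUNG|bosc' matches 'SUNG' OR 'bosc'.
pattern = '|'.join(remove_contain_list)
print(pattern)  # SUNG|bosc

values = pd.Series(['SAMSUNG', 'bosch', 'APPLE123'])
print(values.str.contains(pattern, case=False).tolist())  # [True, True, False]

# Escape each item so characters like '.' and '+' are matched literally.
safe_pattern = '|'.join(map(re.escape, ['v1.0', 'c++']))
print(safe_pattern)
print(pd.Series(['v1x0', 'v1.0', 'c++ lib']).str.contains(safe_pattern).tolist())  # [False, True, True]
```

Without escaping, the `.` in `v1.0` would also match `v1x0`.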
