Checking multiple columns for condition and creating new column

Question

I have the following dataframe

   text         ext_nms ext_dt  ext_cd  ext_addr
Stonetrust      CRR     xxx     xxx     xxx
Wilkes          CRR     xxx     xxx     xxx
wcv 0086967 05  xxx     xxx     COD     xxx
SILVER  INC     xxx     xxx     xxx     ADD
PO BOX 988      xxx     xxx     xxx     ADD
LA 70520        xxx     xxx     xxx     ADD
02/12/2019      xxx     DAT     xxx     ADD

My expected output:

text            ext_nms ext_dt  ext_cd  ext_addr    out
Stonetrust      CRR     xxx     xxx     xxx         CRR
Wilkes          CRR     xxx     xxx     xxx         CRR
wcv 0086967 05  xxx     xxx     COD     xxx         COD
SILVER  INC     xxx     xxx     xxx     ADD         ADD
PO BOX 988      xxx     xxx     xxx     ADD         ADD
LA 70520        xxx     xxx     xxx     ADD         ADD
02/12/2019      xxx     DAT     xxx     xxx         DAT

Essentially, I want to create a new column that has a non-xxx string and that value needs to be moved to that new column.

What I tried:

df['qs_key'] = df.filter(like='^ext').replace('^xxx', '').apply(lambda x: ''.join(x), axis=1)

I am not getting the output that I desire.

Any help will be appreciated. thanks

Henry Yik · Accepted Answer · 2020-08-21 12:15:17Z

1

IIUC use df.ne("xxx"), stack and map:

   df["out"] = df["text"].map(df[df.ne("xxx")].set_index("text").stack().droplevel(1))
   
   print (df)
   
                text ext_nms ext_dt ext_cd ext_addr  out
   0      Stonetrust     CRR    xxx    xxx      xxx  CRR
   1          Wilkes     CRR    xxx    xxx      xxx  CRR
   2  wcv 0086967 05     xxx    xxx    COD      xxx  COD
   3      SILVER INC     xxx    xxx    xxx      ADD  ADD
   4      PO BOX 988     xxx    xxx    xxx      ADD  ADD
   5        LA 70520     xxx    xxx    xxx      ADD  ADD
   6      02/12/2019     xxx    DAT    xxx      xxx  DAT

answered Aug 21, 2020 at 12:15

Henry Yik

22.6k5 gold badges21 silver badges44 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

jezrael · Accepted Answer · 2020-08-21 12:22:59Z

In your code use regex for filter by regex code, then add regex=True for substring replacement and last lambda should be omitted:

df['qs_key'] = df.filter(regex='^ext').replace('^xxx', '', regex=True).apply(''.join, axis=1)
print (df)
             text ext_nms ext_dt ext_cd ext_addr qs_key
0      Stonetrust     CRR    xxx    xxx      xxx    CRR
1          Wilkes     CRR    xxx    xxx      xxx    CRR
2  wcv 0086967 05     xxx    xxx    COD      xxx    COD
3     SILVER  INC     xxx    xxx    xxx      ADD    ADD
4      PO BOX 988     xxx    xxx    xxx      ADD    ADD
5        LA 70520     xxx    xxx    xxx      ADD    ADD
6      02/12/2019     xxx    DAT    xxx      xxx    DAT

If possible multiple values and need first replace ^xxx to missing values, then back filling missing values and last seelct first column by position:

df['qs_key'] = df.filter(regex='^ext').replace('^xxx', np.nan, regex=True).bfill(axis=1).iloc[:, 0]

print (df)
             text ext_nms ext_dt ext_cd ext_addr qs_key
0      Stonetrust     CRR    xxx    xxx      xxx    CRR
1          Wilkes     CRR    xxx    xxx      xxx    CRR
2  wcv 0086967 05     xxx    xxx    COD      xxx    COD
3     SILVER  INC     xxx    xxx    xxx      ADD    ADD
4      PO BOX 988     xxx    xxx    xxx      ADD    ADD
5        LA 70520     xxx    xxx    xxx      ADD    ADD
6      02/12/2019     xxx    DAT    xxx      xxx    DAT

Collectives™ on Stack Overflow

Checking multiple columns for condition and creating new column

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related