Create new columns in pandas data frame based on existing column

Question

I have a Python dictionary as follows:

ref_dict = {
"Company1" :["C1_Dev1","C1_Dev2","C1_Dev3","C1_Dev4","C1_Dev5",],
"Company2" :["C2_Dev1","C2_Dev2","C2_Dev3","C2_Dev4","C2_Dev5",],
"Company3" :["C3_Dev1","C3_Dev2","C3_Dev3","C3_Dev4","C3_Dev5",],
 }

I have a Pandas data frame called df whose one of the columns looks like this:

    DESC_DETAIL
0   Probably task Company2 C2_Dev5
1   File system C3_Dev1
2   Weather subcutaneous Company2
3   Company1 Travesty C1_Dev3
4   Does not match anything 
...........

My goal is to add two extra columns to this data frame and name the columns, COMPANY and DEVICE. The value in each row of the COMPANY column will be either be the company key in the dictionary if it exists in the DESC_DETAIL column or if the corresponding device exists in the DESC_DETAIL column. The value in the DEVICE column will simply be the device string in the DESC_DETAIL column. If no match is found, the corresponding row is empty. Hence the final output will look like this:

     DESC_DETAIL                        COMPANY         DEVICE
 0   Probably task Company2 C2_Dev5     Company2        C2_Dev5
 1   File system C3_Dev1                Company3        C3_Dev1
 2   Weather subcutaneous Company2      Company2        NaN
 3   Company1 Travesty C1_Dev3          Company1        C1_Dev3
 4   Does not match anything            NaN             NaN

My attempt:

for key, value in ref_dict.items():
    df['COMPANY'] = df.apply(lambda row: key if row['DESC_DETAIL'].isin(key) else Nan, axis=1)

This is obviously just wrong and does not work. How do I make it work ?

perl · Accepted Answer · 2021-04-21 22:41:57Z

1

You can extract values with str.extract using a regex pattern:

import re

s = pd.Series(ref_dict).explode()

# extract company
df['COMPANY'] = df['DESC_DETAIL'].str.extract(
    f"({'|'.join(s.index.unique())})", flags=re.IGNORECASE)

# extract device
df['DEVICE'] = df['DESC_DETAIL'].str.extract(
    f"({'|'.join(s)})", flags=re.IGNORECASE)

# fill missing company values based on device
df['COMPANY'] = df['COMPANY'].fillna(
    df['DEVICE'].str.lower().map(dict(zip(s.str.lower(), s.index))))

df

Output:

                      DESC_DETAIL   COMPANY   DEVICE
0  Probably task Company2 C2_Dev5  Company2  C2_Dev5
1             File system C3_Dev1  Company3  C3_Dev1
2   Weather subcutaneous Company2  Company2      NaN
3       Company1 Travesty C1_Dev3  Company1  C1_Dev3
4         Does not match anything       NaN      NaN

edited Apr 21, 2021 at 22:41

answered Apr 21, 2021 at 21:22

perl

9,9811 gold badge14 silver badges23 bronze badges

Sign up to request clarification or add additional context in comments.

10 Comments

Amistad Over a year ago

@perl..the extract device line gives me an error saying..ValueError: Wrong number of items passed 3, placement implies 1

perl Over a year ago

@Amistad Can you share your data? Running without any problem on your sample

Amistad Over a year ago

@perl..i figured the issue but now this is not working since the string is upper and lower at various places..How do I make the search case insensitive

perl Over a year ago

@Amistad Just pass flags=re.IGNORECASE to str.extract

Amistad Over a year ago

Thank you so much...you have been awesome..everything works now !!!

|

SomeDude · Accepted Answer · 2021-04-21 21:43:07Z

0

You need a device to company dictionary as well and you can build it from the ref_dict easily as below:

dev_to_company_dict = {v:l[0] for l in zip(ref_dict.keys(), ref_dict.values()) for v in l[1]}

Then it is easy to do this:

df['COMPANY'] = df['DESC_DETAIL'].apply(lambda det : ''.join(set(re.split("\\s+", det)).intersection(ref_dict.keys())))
df['COMPANY'].replace('', np.nan, inplace=True)
df['DEVICE'] = df['DESC_DETAIL'].apply(lambda det : ''.join(set(re.split("\\s+", det)).intersection(dev_to_company_dict.keys())))
df['DEVICE'].replace('', np.nan, inplace=True)
df['COMPANY'] = df['COMPANY'].fillna(df['DEVICE'].map(dev_to_company_dict))

Output:

                       DESC_DETAIL   COMPANY     DEVICE
0   Probably task Company2 C2_Dev5  Company2    C2_Dev5
1   File system C3_Dev1             Company3    C3_Dev1
2   Weather subcutaneous Company2   Company2        NaN
3   Company1 Travesty C1_Dev3       Company1    C1_Dev3
4   Does not match anything              NaN        NaN

edited Apr 21, 2021 at 21:43

answered Apr 21, 2021 at 21:38

SomeDude

14.3k5 gold badges26 silver badges49 bronze badges

2 Comments

Amistad Over a year ago

@SomeDude..how do i make the search case insensitive

SomeDude Over a year ago

Can be done, but going to be ugly. @perl's is better.

Collectives™ on Stack Overflow

Create new columns in pandas data frame based on existing column

2 Answers 2

10 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

10 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related