I want to create a new column that categorizes records according to a substring in a tracking code. For example, if the tracking code contains 'KNC-', the new column Marketing Channel should be 'Paid Search'.

From the post "Pandas: Check if a substring exists in another column then create a new column with a specific value", I was able to find a solution:

desc = {"KNC-":"Paid Search","SL-": "Display",'SNP-':'Social','EMC-':'Email'}
def check_desc(x):
    for key in desc:
        if key.lower() in x.lower():
            return desc[key]
    return ''
df['Marketing Channel'] = df["Tracking Code"].map(check_desc)

However, the first thing I tried was using numpy select:

conditions = [
    ('KNC-' in df['Tracking Code']),
    ('SL-' in df['Tracking Code']),
    ('SNP-' in df['Tracking Code']),
    ('EMC-' in df['Tracking Code'])
    ]
values = ['Paid Search', 'Display', 'Social', 'Email']
df['Marketing Channel'] = np.select(conditions, values)

This latter code created the column but all values were zero. Why?

1 Answer

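'KNC-' in df['Tracking Code'] checks whether KNC- is one of the Series' index labels. It does not look at the values at all, let alone check each value for the substring KNC-, so it evaluates to a single False rather than one result per row. With every condition False, np.select falls back to its default of 0, which is why the whole column is zero.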
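
To see what the in operator actually tests on a Series, here is a minimal sketch. The sample codes are made up for illustration and are not from the question's data.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
codes = pd.Series(["KNC-123", "SL-456", "EMC-789"])

# `in` on a Series tests the index labels (0, 1, 2 here), not the values.
label_hit = 0 in codes            # 0 is an index label
value_hit = "KNC-123" in codes    # an exact value is still not a label
substring_hit = "KNC-" in codes   # no substring matching happens either

# str.contains tests each element and returns a boolean Series.
per_row = codes.str.contains("KNC-", regex=False)

print(label_hit, value_hit, substring_hit, per_row.tolist())
```

Only the last expression yields one boolean per row, which is the shape np.select needs for its conditions.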

Change your conditions to use str.contains:

conditions = [
    df['Tracking Code'].str.contains('KNC-'),
    df['Tracking Code'].str.contains('SL-'),
    df['Tracking Code'].str.contains('SNP-'),
    df['Tracking Code'].str.contains('EMC-'),
]
values = ['Paid Search', 'Display', 'Social', 'Email']
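# default='' labels unmatched rows with an empty string, like check_desc does;
# the implicit default of 0 does not share a dtype with the string values
df['Marketing Channel'] = np.select(conditions, values, default='')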
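
As a rough end-to-end sketch, the same idea can be driven from the mapping dict in the question. The sample DataFrame and the name channel_by_prefix are assumptions made for illustration. Passing case=False mirrors the case-insensitive check in check_desc, regex=False treats each prefix as a literal string, and na=False makes missing tracking codes count as non-matches.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {"Tracking Code": ["KNC-001", "sl-002", "SNP-003", "EMC-004", "XYZ-005", None]}
)

# Same prefix-to-channel mapping as the question's `desc` dict.
channel_by_prefix = {
    "KNC-": "Paid Search",
    "SL-": "Display",
    "SNP-": "Social",
    "EMC-": "Email",
}

# One boolean Series per prefix: literal, case-insensitive, NaN treated as no match.
conditions = [
    df["Tracking Code"].str.contains(prefix, case=False, regex=False, na=False)
    for prefix in channel_by_prefix
]

# The first matching condition wins; rows with no match get the empty string.
df["Marketing Channel"] = np.select(
    conditions, list(channel_by_prefix.values()), default=""
)

print(df)
```

Building the conditions from the dict keeps the prefixes and labels in one place, so adding a channel means adding one dict entry.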