pandas-python dataframe update a column

Question

Say I have a list BRANDS that contains brand names:

BRANDS = ['Samsung', 'Apple', 'Nike', .....]

DataFrame A has the following structure:

row  | item_title    | brand_name
1    | Apple 6S      | Apple
2    | Nike BB Shoes | na  <-- needs to be filled with Nike
3    | Samsung TV    | na  <-- needs to be filled with Samsung
4    | Used bike     | na  <-- nothing to do: the title contains no brand name
    ....

I want to fill the brand_name column of row 2 with Nike and row 3 with Samsung, because those values are null and the item_title contains a keyword that appears in the list BRANDS. How can I do it?

MaxU - stand with Ukraine · Accepted Answer · 2018-01-29 22:23:46Z

3

Vectorized solution:

In [168]: x = df.item_title.str.split(expand=True)

In [169]: df['brand_name'] = \
              df['brand_name'].fillna(x[x.isin(BRANDS)]
                                         .ffill(axis=1)
                                         .bfill(axis=1)
                                         .iloc[:, 0])

In [170]: df
Out[170]:
   row     item_title brand_name
0    1       Apple 6S      Apple
1    2  Nike BB Shoes       Nike
2    3     Samsung TV    Samsung
3    4      Used bike        NaN
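
A step-by-step sketch of the same idea, with the intermediate frames named so each step can be inspected. The sample `df` is reconstructed from the question (with NaN standing in for `na`), and the names `words`, `brand_words`, and `first_brand` are illustrative, not from the answer. It uses only `bfill`, which is enough to bring the first match in each row into column 0.

```python
import numpy as np
import pandas as pd

BRANDS = ['Samsung', 'Apple', 'Nike']

# Sample frame reconstructed from the question; NaN stands in for "na"
df = pd.DataFrame({
    'row': [1, 2, 3, 4],
    'item_title': ['Apple 6S', 'Nike BB Shoes', 'Samsung TV', 'Used bike'],
    'brand_name': ['Apple', np.nan, np.nan, np.nan],
})

# 1. One column per whitespace-separated word; shorter titles are padded with missing values
words = df['item_title'].str.split(expand=True)

# 2. Keep only the words that are brand names; every other cell becomes NaN
brand_words = words[words.isin(BRANDS)]

# 3. Back-fill along each row so column 0 holds the first brand found (or NaN)
first_brand = brand_words.bfill(axis=1).iloc[:, 0]

# 4. Fill only the rows where brand_name is missing
df['brand_name'] = df['brand_name'].fillna(first_brand)
print(df)
```

Note that this matches whole words exactly, so a title such as "nike shoes" (lower case) would not be filled unless the words are normalised first.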

answered Jan 29, 2018 at 22:23

MaxU - stand with Ukraine


pault · Accepted Answer · 2018-01-29 22:01:40Z

One approach is to use apply():

import pandas as pd
BRANDS = ['Samsung', 'Apple', 'Nike']

def get_brand_name(row):
    if not pd.isnull(row['brand_name']):
        # don't do anything if brand_name is not null
        return row['brand_name']

    item_title = row['item_title']
    title_words = map(str.title, item_title.split())
    for tw in title_words:
        if tw in BRANDS:
            # return first 'match'
            return tw
    # default return None
    return None

df['brand_name'] = df.apply(get_brand_name, axis=1)
print(df)
#   row     item_title brand_name
#0    1       Apple 6S      Apple
#1    2  Nike BB Shoes       Nike
#2    3     Samsung TV    Samsung
#3    4      Used bike       None

Notes

I converted the tokenized title to title-case using str.title() because that's how you defined BRANDS.
If you have a lot of brands, use a set instead of a list, because membership tests will be faster. Sets are unordered, though, so this only works if you don't depend on the order of BRANDS (the function above only tests membership, so it doesn't).
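
A minimal sketch of the set-based variant mentioned in the notes, assuming the sample data from the question. `BRAND_SET`, `first_brand_in_title`, and the `missing` mask are hypothetical names introduced here; the function is applied only to rows whose brand_name is missing, so existing values are never touched.

```python
import numpy as np
import pandas as pd

# A set gives constant-time membership tests; it is only used for `in` checks
BRAND_SET = {'Samsung', 'Apple', 'Nike'}

def first_brand_in_title(title, brands=BRAND_SET):
    """Return the first word of `title` that is a known brand, else NaN."""
    for word in title.split():
        candidate = word.title()  # normalise case to match how the brands are written
        if candidate in brands:
            return candidate
    return np.nan

# Sample frame reconstructed from the question; NaN stands in for "na"
df = pd.DataFrame({
    'row': [1, 2, 3, 4],
    'item_title': ['Apple 6S', 'nike BB Shoes', 'Samsung TV', 'Used bike'],
    'brand_name': ['Apple', np.nan, np.nan, np.nan],
})

# Only look at rows where brand_name is missing
missing = df['brand_name'].isna()
df.loc[missing, 'brand_name'] = df.loc[missing, 'item_title'].apply(first_brand_in_title)
print(df)
```

Because `str.title()` turns "LG" into "Lg", brands that are not written in title case would need a different normalisation, such as comparing lower-cased strings on both sides.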

fpersyn · Accepted Answer · 2019-11-26 11:31:22Z

0

You can achieve the result you are after by writing a simple function. You can then use .apply() with a lambda function to generate your desired column.

import numpy as np

def contains_any(s, arr):
    # return the first item of arr that occurs as a substring of s
    for item in arr:
        if item in s:
            return item
    return np.nan

df['brand_name'] = df['item_title'].apply(lambda x: contains_any(x, BRANDS))
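
Plain substring matching also fires inside longer words (for example, "Apple" in "Applewood"), and assigning the result directly overwrites brand names that are already present. A hedged alternative, assuming the question's column names and a made-up "Applewood smoker" row for illustration, is to build a word-boundary regular expression and fill only the missing values:

```python
import re

import numpy as np
import pandas as pd

BRANDS = ['Samsung', 'Apple', 'Nike']

# Illustrative data; the "Applewood smoker" row is invented to show the boundary check
df = pd.DataFrame({
    'item_title': ['Apple 6S', 'Nike BB Shoes', 'Applewood smoker', 'Used bike'],
    'brand_name': ['Apple', np.nan, np.nan, np.nan],
})

# \b anchors each alternative at word boundaries, so "Applewood" does not match "Apple";
# re.escape protects brand names that contain regex metacharacters
pattern = r'\b(' + '|'.join(re.escape(b) for b in BRANDS) + r')\b'

# With a single capture group and expand=False, str.extract returns a Series
# holding the first match per row, or NaN where nothing matches
found = df['item_title'].str.extract(pattern, expand=False)

# Fill only the rows where brand_name is missing
df['brand_name'] = df['brand_name'].fillna(found)
print(df)
```

Passing `flags=re.IGNORECASE` to `str.extract` would make the match case-insensitive, at the cost of returning the brand as it is spelled in the title rather than as it appears in `BRANDS`.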

edited Nov 26, 2019 at 11:31

answered Nov 26, 2019 at 10:38

fpersyn
