Python: Creating New Column Based on Conditional Function of Another Column and Row

Question

I am trying to produce some new columns based on data from different columns and rows. For instance, take the series below:

import pandas as pd
df = pd.Series(['Fruit[edit]','Apple','Orange','Banana','Vegetable[edit]','Celery','Beans','Kale'])

0        Fruit[edit]
1              Apple
2             Orange
3             Banana
4    Vegetable[edit]
5             Celery
6              Beans
7               Kale

I'm starting off with a series where the elements ending in "[edit]" are the categories, and the remaining elements are the names of the items that belong to the preceding category. I would like to create two new columns: "Category" (i.e. Fruit or Vegetable) and "Name", showing the items belonging to that category.

The end result should look something like this:

Desired Result

    Category    Name
0   Fruit       Apple
1   Fruit       Orange
2   Fruit       Banana
3   Vegetable   Celery
4   Vegetable   Beans
5   Vegetable   Kale

As we go down the series, I would like the code to recognize a new category (i.e. an element that ends with '[edit]') and store it as the current category for the items that follow, until the next category is reached.

jezrael · Accepted Answer · 2019-02-24 06:25:42Z

3

Use:

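# if necessary, convert the Series to a DataFrame
df = df.to_frame('Name')
# flag the category header rows (those ending with '[edit]')
mask = df['Name'].str.endswith('[edit]')
# strip the 6-character '[edit]' suffix from the header rows
df.loc[mask, 'Name'] = df['Name'].str[:-6]
# create the Category column: keep header values, forward-fill them downward
df.insert(0, 'Category', df['Name'].where(mask).ffill())
# drop the header rows, where Category and Name hold the same value
df = df[~mask].copy()
print(df)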
    Category    Name
1      Fruit   Apple
2      Fruit  Orange
3      Fruit  Banana
5  Vegetable  Celery
6  Vegetable   Beans
7  Vegetable    Kale
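
The same mask-and-forward-fill idea can be sketched as a self-contained function. This is only an illustration: the name `split_categories` and its `marker` parameter are hypothetical and not part of pandas. It also resets the index so the result is numbered 0 to 5 as in the question's desired output, and it assumes the series starts with a category row (items before the first header would get NaN as their category).

```python
import pandas as pd

def split_categories(s, marker="[edit]"):
    """Hypothetical helper: split a flat Series into Category/Name columns."""
    # True for the header rows such as 'Fruit[edit]'
    is_header = s.str.endswith(marker)
    # Keep the header text (without the marker) on header rows only,
    # then carry each header down to the items that follow it
    category = s.str[:-len(marker)].where(is_header).ffill()
    out = pd.DataFrame({"Category": category, "Name": s})
    # Drop the header rows and renumber the remaining rows from 0
    return out[~is_header].reset_index(drop=True)

s = pd.Series(['Fruit[edit]', 'Apple', 'Orange', 'Banana',
               'Vegetable[edit]', 'Celery', 'Beans', 'Kale'])
result = split_categories(s)
print(result)
```

Leaving out `reset_index(drop=True)` keeps the original positions (1, 2, 3, 5, 6, 7) as the index, as in the output above.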

anky · Accepted Answer · 2019-02-24 05:56:38Z

2

This may be ugly, but does the job:

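import numpy as np

df = pd.DataFrame(df)  # df is a Series, so its single column is labelled 0
# number the blocks: each '[edit]' row starts a new group
groups = df[0].str.contains('[edit]', regex=False).cumsum()
# within each block, pair every row with the value that follows it
df['Name'] = df.groupby(groups)[0].shift(-1)
# the last row of each block has no successor, so drop it
df = df.dropna().rename(columns={0: 'Category'})
# keep only the header values in Category, then carry them downward
df.loc[~df.Category.str.contains('[edit]', regex=False), 'Category'] = np.nan
df.Category = df.Category.ffill()
df.Category = df.Category.str.split('[').str[0]
print(df)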

    Category    Name
0      Fruit   Apple
1      Fruit  Orange
2      Fruit  Banana
4  Vegetable  Celery
5  Vegetable   Beans
6  Vegetable    Kale

Vaishali · Accepted Answer · 2019-02-24 06:34:37Z

2

You can use str.extract with an optional named group, so that rows containing the keyword fill Category and all other rows fill Name:

import numpy as np

new_df = df.str.extract(r'(?P<Category>.*\[edit\])?(?P<Name>.*)')\
.replace(r'\[edit\]', '', regex = True).ffill()\
.replace('', np.nan).dropna()

    Category    Name
1      Fruit   Apple
2      Fruit  Orange
3      Fruit  Banana
5  Vegetable  Celery
6  Vegetable   Beans
7  Vegetable    Kale
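
To see why this pattern separates headers from items, here is a small sketch that applies the same regular expression with the standard `re` module to two of the sample strings. It only illustrates the matching behaviour and is not how pandas is implemented; in pandas the unmatched optional group shows up as NaN rather than `None`.

```python
import re

# Same pattern as in the answer: an optional Category group that must end
# with the literal text '[edit]', followed by a Name group taking the rest.
pattern = re.compile(r'(?P<Category>.*\[edit\])?(?P<Name>.*)')

header = pattern.match('Fruit[edit]').groupdict()
item = pattern.match('Apple').groupdict()

print(header)  # {'Category': 'Fruit[edit]', 'Name': ''}
print(item)    # {'Category': None, 'Name': 'Apple'}
```

A header row therefore ends up with an empty Name, which is why the answer replaces '' with NaN before calling dropna, while an item row has a missing Category that ffill fills from the header above it.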
