I am trying to produce some new columns based on data from different columns and rows. For instance, take the below series:

import pandas as pd

df = pd.Series(['Fruit[edit]','Apple','Orange','Banana','Vegetable[edit]','Celery','Beans','Kale'])

0        Fruit[edit]
1              Apple
2             Orange
3             Banana
4    Vegetable[edit]
5             Celery
6              Beans
7               Kale

I'm starting off with a series where the elements with "[edit]" represent the categories, and the rest are the names of the items that belong in that category. I would like to create two new columns, one showing the "Category" (i.e. fruit or vegetable) and another with the column title "Name" showing the items belonging to that category.

The end result should look something like this:

    Category    Name
0   Fruit       Apple
1   Fruit       Orange
2   Fruit       Banana
3   Vegetable   Celery
4   Vegetable   Beans
5   Vegetable   Kale

As we go down the series, I would like the code to recognize each new category (i.e. an element that ends with '[edit]') and store it as the current category for the items that follow, until the next category is reached.
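To make the intended rule concrete, here is a minimal plain-Python sketch of it (not a pandas solution). The function name `split_categories` and the `marker` parameter are made up for illustration only.

```python
def split_categories(items, marker="[edit]"):
    """Pair each item with the most recent category header seen above it."""
    rows = []
    current = None  # no category has been seen yet
    for item in items:
        if item.endswith(marker):
            # Header row: remember it (without the marker) and emit nothing.
            current = item[: -len(marker)]
        else:
            # Ordinary row: attach the category currently in effect.
            rows.append((current, item))
    return rows


items = ['Fruit[edit]', 'Apple', 'Orange', 'Banana',
         'Vegetable[edit]', 'Celery', 'Beans', 'Kale']
rows = split_categories(items)
for category, name in rows:
    print(category, name)
```

What I am looking for is the vectorized pandas equivalent of this loop.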

3 Answers

3

Use:

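# the question's `df` is a Series; convert it to a one-column DataFrame
df = df.to_frame('Name')
# flag the category header rows (those ending in '[edit]')
mask = df['Name'].str.endswith('[edit]')
# strip the 6-character '[edit]' suffix from the header rows
df.loc[mask, 'Name'] = df['Name'].str[:-6]
# Category = header text on header rows, forward-filled onto the rows below
df.insert(0, 'Category', df['Name'].where(mask).ffill())
# drop the header rows, where Category and Name hold the same value
df = df[~mask].copy()
print(df)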
    Category    Name
1      Fruit   Apple
2      Fruit  Orange
3      Fruit  Banana
5  Vegetable  Celery
6  Vegetable   Beans
7  Vegetable    Kale
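The key step is `where(mask).ffill()`. As a rough self-contained sketch of the same idea, the version below works directly on the Series and builds the frame in one go. The names `s`, `headers`, `category` and `result` are illustrative and not part of the code above. The added `reset_index(drop=True)` is an assumption about what is wanted: it renumbers the rows 0 to 5 as in the question's desired output.

```python
import pandas as pd

s = pd.Series(['Fruit[edit]', 'Apple', 'Orange', 'Banana',
               'Vegetable[edit]', 'Celery', 'Beans', 'Kale'])

# True on the category header rows, False on item rows.
mask = s.str.endswith('[edit]')

# Keep only the header values; every other row becomes NaN.
headers = s.where(mask)

# Carry each header down to the rows below it, then strip '[edit]'.
category = headers.ffill().str[:-len('[edit]')]

# Keep the item rows only and renumber the index from 0.
result = pd.DataFrame({'Category': category[~mask],
                       'Name': s[~mask]}).reset_index(drop=True)
print(result)
```

Without `reset_index`, the original positions (1, 2, 3, 5, 6, 7) are kept as the index, as in the output above.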

2

This may be ugly, but does the job:

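import numpy as np

df = pd.DataFrame(df)  # df is a Series, so its single column is labelled 0
# within each block that starts at an 'edit' row, pull the next row's value up;
# groupby(...).shift(-1) keeps the original index (apply may not on pandas >= 2.0)
df['Name'] = df.groupby(df[0].str.contains('edit').cumsum())[0].shift(-1)
# the last row of each block has no successor, so drop it
df = df.dropna().rename(columns={0: 'Category'})
# keep only the header text in Category, fill it downwards, strip '[edit]'
df.loc[~df.Category.str.contains('edit'), 'Category'] = np.nan
df.Category = df.Category.ffill()
df.Category = df.Category.str.split("[").str[0]
print(df)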

    Category    Name
0      Fruit   Apple
1      Fruit  Orange
2      Fruit  Banana
4  Vegetable  Celery
5  Vegetable   Beans
6  Vegetable    Kale


2

You can use str.extract with two named groups: the optional Category group matches only rows that contain '[edit]', and the Name group captures whatever is left:

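# needs: import numpy as np
# raw strings keep the regex backslashes from being read as string escapes
new_df = df.str.extract(r'(?P<Category>.*\[edit\])?(?P<Name>.*)')\
.replace(r'\[edit\]', '', regex = True).ffill()\
.replace('', np.nan).dropna()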

    Category    Name
1      Fruit   Apple
2      Fruit  Orange
3      Fruit  Banana
5  Vegetable  Celery
6  Vegetable   Beans
7  Vegetable    Kale
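To see why the pattern separates headers from items, it can help to run it on single strings with the standard `re` module first. This is only an illustrative sketch. `str.extract` applies the pattern to each element and returns one column per named group, and the helper name `split_row` is made up here.

```python
import re

# Same pattern as above: an optional Category group, then a Name group.
PATTERN = re.compile(r'(?P<Category>.*\[edit\])?(?P<Name>.*)')


def split_row(text):
    """Return the (Category, Name) groups the pattern captures for one string."""
    match = PATTERN.match(text)
    return match.group('Category'), match.group('Name')


header = split_row('Fruit[edit]')
item = split_row('Apple')
print(header)
print(item)
```

A header row fills Category and leaves Name as an empty string, while an item row leaves Category unmatched and fills Name. That is why the chained calls then strip '[edit]', forward-fill Category, and drop the rows whose Name is empty.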
