Slice pandas dataframe column into multiple columns using substring

Question

dataframe 'df' has the following data -

Column A	Column B
Item_ID1	Information - information for item that has ID as 1\nPrice - $7.99\nPlace - Albany, NY
Item_ID2	Information - item's information with ID as 2\nPrice - $5.99\nPlace - Ottawa, ON

How to segregate the values from column B using 'Information', 'Price', and 'Place' into different columns like -

Column A	Information	Price	Place
Item_ID1	information for item that has ID as 1	$7.99	Albany, NY
Item_ID2	item's information with ID as 2	$5.99	Ottawa, ON

I tried splitting the column B based on string values like 'Information - ', 'Price - ', 'Place - ' but that is becoming more complicated and the very first slice has information on Price and Place which is not required in it.

Timeless · Accepted Answer · 2022-11-22 08:38:35Z

2

You can approach this by using pandas.Series.split :

df[["Information", "Price", "Place"]]= df.pop("Column B").str.split(r"\\n", expand=True)

df= df.astype(str).apply(lambda x: x.replace(x.name, "", regex=True).str.strip(" - "))

# Output :

print(df.to_string())

   Column A                            Information  Price       Place
0  Item_ID1  information for item that has ID as 1  $7.99  Albany, NY
1  Item_ID2        item's information with ID as 2  $5.99  Ottawa, ON

answered Nov 22, 2022 at 8:38

Timeless

38.3k6 gold badges33 silver badges54 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

mozway · Accepted Answer · 2022-11-22 08:43:03Z

2

For a generic method in which you do not need to know the future columns in advance, you can use str.extractall and a pivot:

out = df.drop(columns='Column B').join(
 df['Column B']
 .str.extractall(r'([^-]+) - ([^\n]+)\n?')
 .droplevel('match')
 .pivot(columns=0, values=1)
)

NB. I am assuming that you have real newlines, if you have instead the two characters \ and n, you can convert with df['Column B'] = df['Column B'].str.replace(r'\\n', '\n')

Output:

   Column A                            Information       Place  Price
0  Item_ID1  information for item that has ID as 1  Albany, NY  $7.99
1  Item_ID2        item's information with ID as 2  Ottawa, ON  $5.99

answered Nov 22, 2022 at 8:43

mozway

267k13 gold badges56 silver badges106 bronze badges

Comments

PaulS · Accepted Answer · 2022-11-22 10:58:55Z

1

Another possible solution, based on the following ideas:

Split Column B by \s-\s|\\n, using pandas.Series.str.split.
Reshape the result, using numpy.reshape.
Apply pandas.pivot_table.

(pd.concat([df['Column A'], pd.DataFrame(
    df['Column B'].str.split(r'\s-\s|\\n', expand=True, regex=True).values
    .reshape((-1,2)))
 .pivot_table(columns=0, values=1, aggfunc=list)
 .pipe(lambda d: d.explode(d.columns.tolist(), ignore_index=True))], axis=1))

Output:

   Column A                            Information       Place  Price
0  Item_ID1  information for item that has ID as 1  Albany, NY  $7.99
1  Item_ID2        item's information with ID as 2  Ottawa, ON  $5.99

edited Nov 22, 2022 at 10:58

answered Nov 22, 2022 at 10:39

PaulS

27.1k3 gold badges18 silver badges40 bronze badges

Collectives™ on Stack Overflow

Slice pandas dataframe column into multiple columns using substring

3 Answers 3

# Output :

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

# Output :

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related