1

dataframe 'df' has the following data -

Column A Column B
Item_ID1 Information - information for item that has ID as 1\nPrice - $7.99\nPlace - Albany, NY
Item_ID2 Information - item's information with ID as 2\nPrice - $5.99\nPlace - Ottawa, ON

How to segregate the values from column B using 'Information', 'Price', and 'Place' into different columns like -

Column A Information Price Place
Item_ID1 information for item that has ID as 1 $7.99 Albany, NY
Item_ID2 item's information with ID as 2 $5.99 Ottawa, ON

I tried splitting the column B based on string values like 'Information - ', 'Price - ', 'Place - ' but that is becoming more complicated and the very first slice has information on Price and Place which is not required in it.

3 Answers 3

2

You can approach this by using pandas.Series.split :

df[["Information", "Price", "Place"]]= df.pop("Column B").str.split(r"\\n", expand=True)

df= df.astype(str).apply(lambda x: x.replace(x.name, "", regex=True).str.strip(" - "))

# Output :

print(df.to_string())

   Column A                            Information  Price       Place
0  Item_ID1  information for item that has ID as 1  $7.99  Albany, NY
1  Item_ID2        item's information with ID as 2  $5.99  Ottawa, ON
Sign up to request clarification or add additional context in comments.

Comments

2

For a generic method in which you do not need to know the future columns in advance, you can use str.extractall and a pivot:

out = df.drop(columns='Column B').join(
 df['Column B']
 .str.extractall(r'([^-]+) - ([^\n]+)\n?')
 .droplevel('match')
 .pivot(columns=0, values=1)
)

NB. I am assuming that you have real newlines, if you have instead the two characters \ and n, you can convert with df['Column B'] = df['Column B'].str.replace(r'\\n', '\n')

Output:

   Column A                            Information       Place  Price
0  Item_ID1  information for item that has ID as 1  Albany, NY  $7.99
1  Item_ID2        item's information with ID as 2  Ottawa, ON  $5.99

Comments

1

Another possible solution, based on the following ideas:

  1. Split Column B by \s-\s|\\n, using pandas.Series.str.split.

  2. Reshape the result, using numpy.reshape.

  3. Apply pandas.pivot_table.

(pd.concat([df['Column A'], pd.DataFrame(
    df['Column B'].str.split(r'\s-\s|\\n', expand=True, regex=True).values
    .reshape((-1,2)))
 .pivot_table(columns=0, values=1, aggfunc=list)
 .pipe(lambda d: d.explode(d.columns.tolist(), ignore_index=True))], axis=1))

Output:

   Column A                            Information       Place  Price
0  Item_ID1  information for item that has ID as 1  Albany, NY  $7.99
1  Item_ID2        item's information with ID as 2  Ottawa, ON  $5.99

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.