2

I have two columns, phones and emails, that need to be exploded into rows. I have figured out how to do this to either one, but not both simultaneously. The biggest problem is that I may have 0 to many phones and 0 to many emails. So, if a customer has three emails and no phones, then I need 3 rows. If they have four phones and three emails, then I need 4 rows. One for each phone, and the three emails in those four rows. Example data:

| many columns | phones | emails |
|:-------------|:------:|:-------|
| row 1        | A,B,C  | A,B    |
| row 2        |        | D,E,F  |

Example Results:

| many columns | phones | emails |
|:-------------|:------:|:-------|
| row 1        | A      | A      |
| row 1        | B      | B      |
| row 1        | C      |        |
| row 2        |        | D      |
| row 2        |        | E      |
| row 2        |        | F      |
# Convert cell contents into lists rather than strings
df0['phones'] = df0['phones'].str.split(";", expand=False)
df0['emails'] = df0['emails'].str.split(",", expand=False)
df0 = df0.apply(pd.Series.explode) # DOES NOT WORK

When I try the above code, I get the error: ValueError: cannot reindex on an axis with duplicate labels

1

2 Answers 2

1

I assume the index on your original dataframe is unique. If not, run df = df.reset_index() before the following snippet:

columns = ["phones", "emails"]

# Explode each column individually, but instead of using `explode`, we will
# use`stack` to give us a second index level
exploded = [
    df[col].str.split(",", expand=True).stack().rename(col)
    for col in columns
]

# Align the exploded columns
exploded = pd.concat(exploded, axis=1).droplevel(-1)

# Merge it with the original data frame
result = pd.concat([df.drop(columns=columns), exploded], axis=1)
Sign up to request clarification or add additional context in comments.

1 Comment

Worked amazing except for the last line. I had to change it to df0 = df0.drop(columns=columns) result = df0.join(exploded) to get it to join on index. Thank you soooo much!
0
import itertools
import pandas as pd
import numpy as np
from pandas import DataFrame as df



df = pd.DataFrame({"x":[1,3,7],"y":["A","B","C"], 
                   "z":["p1,p2,p3","p4","p5,p6"],"package_code":["111,222,333","444","555,666"]})


print(df)
"""
   x  y         z package_code
0  1  A  p1,p2,p3  111,222,333
1  3  B        p4          444
2  7  C     p5,p6      555,666

"""
aa = (
    df.set_index(['x','y'])
    .apply(lambda col : pd.Series(col).str.split(','))
    .explode(['z','package_code'])
    .reset_index()
    .reindex(df.columns,axis=1)
    )
print(aa)

"""
   x  y   z package_code
0  1  A  p1          111
1  1  A  p2          222
2  1  A  p3          333
3  3  B  p4          444
4  7  C  p5          555
5  7  C  p6          666
"""

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.