df=spark.sql("select key, name, subjects from table")

Input df from the select statement above:

key name    subjects
12  x,y,z   1,2,3
20  a,b     8,7

Expected output df:

12  x 1
12  y 2
12  z 3
20  a 8
20  b 7

I tried converting the columns to lists and using explode, but it still throws an error. What is an efficient way to achieve this?

2 Answers

One way using pandas.DataFrame.apply, assuming df is a pandas DataFrame (a Spark DataFrame can be converted with df.toPandas()):

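import pandas as pd

# If the columns are not already split into lists:
# df["name"] = df["name"].str.split(",")
# df["subjects"] = df["subjects"].str.split(",")

# Explode every column; lists in the same row must have equal lengths
new_df = df.apply(pd.Series.explode)
print(new_df)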

Output:

   key name subjects
0   12    x        1
0   12    y        2
0   12    z        3
1   20    a        8
1   20    b        7
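
On newer pandas versions, the same result can be sketched with DataFrame.explode directly. This assumes pandas 1.3 or later (where explode accepts a list of columns) and rebuilds the question's sample data by hand as a pandas DataFrame. Passing ignore_index=True gives the result a fresh 0..n-1 index instead of repeating the original row labels.

```python
import pandas as pd

# Sample data copied from the question (assumed to already be a pandas DataFrame).
df = pd.DataFrame(
    {"key": [12, 20], "name": ["x,y,z", "a,b"], "subjects": ["1,2,3", "8,7"]}
)

# Split the comma-separated strings into lists of equal length per row.
df["name"] = df["name"].str.split(",")
df["subjects"] = df["subjects"].str.split(",")

# Explode both list columns together; requires pandas >= 1.3.
new_df = df.explode(["name", "subjects"], ignore_index=True)
print(new_df)
```

Note that the exploded values stay strings here; cast subjects with astype(int) if numbers are needed.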

Comments

Thanks, Chris. The columns are getting exploded, but I still get the error "Cannot reindex from a duplicate axis". Concat with ignore_index is not working. Is it possible to generate temporary unique indexes, since key is duplicated during explode? pandas version: 1.0.5

df["name"] = df["name"].str.split(",") 
df["subjects"] = df["subjects"].str.split(",") 
new_df= df.apply(pd.Series.explode).reindex() 
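
One possible workaround for the duplicate-axis error, as a sketch: move key into the index before exploding, so that apply only sees the list columns and every exploded column ends up with the same index, then call reset_index to turn key back into a column with a fresh unique index. Calling reindex() with no arguments does not renumber the rows, so it would not help here. The sample data is rebuilt from the question, and the name fixed_df is only illustrative; this has not been checked against pandas 1.0.5 specifically.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"key": [12, 20], "name": ["x,y,z", "a,b"], "subjects": ["1,2,3", "8,7"]}
)
df["name"] = df["name"].str.split(",")
df["subjects"] = df["subjects"].str.split(",")

# Keep the scalar column out of apply by making it the index,
# explode the remaining list columns, then restore key as a column.
fixed_df = df.set_index("key").apply(pd.Series.explode).reset_index()
print(fixed_df)
```

With several scalar columns, the same idea would pass all of them to set_index as a list.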
