9

I'm using pandas 0.25.3 and trying to explode several columns at once.

The data looks like this:

d1 = {'user':['user1','user2','user3','user4'],
      'paid':['Y','Y','N','N'],
      'last_active':['11 Jul 2019','23 Sep 2018','08 Dec 2019','03 Mar 2018'],
      'col4':'data'}

I loaded this into a DataFrame with df = pd.DataFrame([d1], columns=d1.keys()), which looks like this:

user                              paid              last_active                                                col4               
['user1','user2','user3','user4'] ['Y','Y','N','N'] ['11 Jul 2019','23 Sep 2018','08 Dec 2019','03 Mar 2018']  'data'

There are other columns as well that hold a single value each ({'A':'B'}-type entries), but I'm not worried about those.

When I call df.explode('user') it works for that column, and likewise for each of the others, but when I try df.explode(column=('user','paid','last_active')) I get the following error:

KeyError: ('user','paid','last_active')

How can I use the explode function on multiple columns to get the following DataFrame?

user     paid  last_active    col4
'user1'  'Y'   '11 Jul 2019'  'data'
'user2'  'Y'   '23 Sep 2018'  NaN
'user3'  'N'   '08 Dec 2019'  NaN
'user4'  'N'   '03 Mar 2018'  NaN
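
The KeyError in the question is consistent with how explode treats its argument: a tuple is taken as one column label (as it would be for MultiIndex columns), not as a collection of labels, so pandas looks for a single column named ('user', 'paid', 'last_active'). A minimal sketch reproducing this with the data from the question; the names `single` and `error` are only for illustration:

```python
import pandas as pd

d1 = {'user': ['user1', 'user2', 'user3', 'user4'],
      'paid': ['Y', 'Y', 'N', 'N'],
      'last_active': ['11 Jul 2019', '23 Sep 2018', '08 Dec 2019', '03 Mar 2018'],
      'col4': 'data'}

# One row whose first three cells each hold a whole list
df = pd.DataFrame([d1])

# A single column label works: the one row becomes four rows
single = df.explode('user')
print(single.shape)

# A tuple is looked up as ONE label, which does not exist in df.columns
error = None
try:
    df.explode(('user', 'paid', 'last_active'))
except KeyError as exc:
    error = exc
print(type(error).__name__, error)
```

Note that exploding only 'user' leaves the full lists in 'paid' and 'last_active' on every row, which is why the columns need to be exploded together.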
  • Just do df = pd.DataFrame(d1), without the []. Commented Dec 17, 2019 at 15:55
  • That gives me an error because the arrays are not the same length (col4 has one element, the others have several). Commented Dec 17, 2019 at 16:03
  • @QuangHoang that would give you 'data' in every row (not only the first row). Commented Dec 17, 2019 at 16:10

2 Answers

6

As of pandas 0.25.3, explode does not accept multiple columns, but there are workarounds. One simple approach:

import pandas as pd

df = pd.DataFrame(
    {
        'A': [1, 2],
        'B': [['a','b'], ['c','d']],
        'C': [['z','y'], ['x','w']]
    }
)
print(df)

--------------
A    B     C
--------------
1 [a, b] [z, y]
2 [c, d] [x, w]

# list_cols holds the columns to be exploded (a list keeps their order fixed)
list_cols = ['B', 'C']

# other_cols holds all the remaining column names in df
# (the set() round trip is just an easy way to take the difference of two lists)
other_cols = list(set(df.columns) - set(list_cols))

# explode each of list_cols separately; every result is a long Series that
# repeats the original row index once per list element (print it to see the format)
exploded = [df[col].explode() for col in list_cols]

# zip pairs each column name with its exploded Series, and dict() turns those
# pairs into a mapping from which a DataFrame can be created
# (this relies on the lists in each row having the same length)
df2 = pd.DataFrame(dict(zip(list_cols, exploded)))

# now merge the other_cols back in on the index
df2 = df[other_cols].merge(df2, how="right", left_index=True, right_index=True)

# lastly, restore the original column order
df2 = df2.loc[:, df.columns]

print(df2)

------
A B C
------
1 a z
1 b y
2 c x
2 d w
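
The steps above can be wrapped in a small helper. This is a sketch, not a pandas API: the name `explode_columns` is hypothetical, and it assumes that, within each row, the lists in all exploded columns have the same length (otherwise the exploded Series no longer line up).

```python
import pandas as pd


def explode_columns(df, list_cols):
    """Explode several list columns in lockstep (hypothetical helper).

    Assumes the lists in list_cols have equal length within each row.
    """
    # Each exploded Series repeats the original row label once per element,
    # so all of them end up with the same (duplicated) index
    exploded = pd.DataFrame({col: df[col].explode() for col in list_cols})
    # Join the untouched columns back on the index; their values are
    # repeated for every exploded row
    others = df.drop(columns=list_cols)
    return others.join(exploded)[list(df.columns)]


df = pd.DataFrame(
    {
        'A': [1, 2],
        'B': [['a', 'b'], ['c', 'd']],
        'C': [['z', 'y'], ['x', 'w']],
    }
)

out = explode_columns(df, ['B', 'C'])
print(out)
```

Applied to the one-row DataFrame from the question, this repeats 'data' in every row of col4 rather than leaving NaN after the first row. In pandas 1.3 and later, DataFrame.explode itself accepts a list of columns, so df.explode(['B', 'C']) should give the same rows there; a helper like this is only needed on older versions such as 0.25.3.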
4 Comments

I applied this logic, but the B and C columns get interchanged every time.
The above code should work. Please share your code so we can check what is going wrong.
Could you please add a more step-by-step description to your code? It is hard to understand what happens here.
I have added a few comments inline.
4

I guess you need (note the difference in data for col4 which has None as OP mentioned):

pd.DataFrame([[i] if not isinstance(i,list) else i 
             for i in d1.values()],index=d1.keys()).T

    user paid  last_active  col4
0  user1    Y  11 Jul 2019  data
1  user2    Y  23 Sep 2018  None
2  user3    N  08 Dec 2019  None
3  user4    N  03 Mar 2018  None

2 Comments

@anky_91 nice one! +1
@anky what if I have a DataFrame rather than a dictionary? How could I modify your code to get the same result directly from the DataFrame, by exploding it or otherwise? This works great for my test dictionary, but my real data is in a DataFrame, and even converting it with to_dict() does not produce the format your code expects.
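
For the DataFrame case raised in the last comment, one possible sketch is to pull the single row back out as a dict and reuse the same construction. It assumes the DataFrame has exactly one row, as in the question, and that the multi-valued cells are Python lists; `row`, `cells` and `result` are illustrative names.

```python
import pandas as pd

d1 = {'user': ['user1', 'user2', 'user3', 'user4'],
      'paid': ['Y', 'Y', 'N', 'N'],
      'last_active': ['11 Jul 2019', '23 Sep 2018', '08 Dec 2019', '03 Mar 2018'],
      'col4': 'data'}
df = pd.DataFrame([d1])  # the one-row DataFrame from the question

# Take the first (only) row as {column name: cell value}
row = df.iloc[0].to_dict()

# Wrap scalar cells in a list so every cell is a list
cells = [v if isinstance(v, list) else [v] for v in row.values()]

# One row per original column, padded where a list is shorter,
# then transposed so the original columns are columns again
result = pd.DataFrame(cells, index=list(row)).T
print(result)
```

As in the answer above, col4 keeps its value only in the first row and is missing in the rest. For a DataFrame with several rows, the same steps would have to be applied per row and the pieces concatenated.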
