
I'm working with a dataframe that contains multiple columns. My goal is to create one extra column containing a list of the values from those columns, and then explode the dataframe on that new column.

This is the original dataset:

         id  day_a1  day_a2  ...   day_a6
13804  002n    25.0    25.0  ...     25.0
30842  002c    30.0    30.0  ...     30.0
1624   002k    25.0     NaN  ...     25.0
8959   002j    25.0    25.0  ...     25.0
21216  003t    25.0    25.0  ...     25.0

I use df['vector'] = df[['day_a1','day_a2','day_a3','day_a4','day_a5','day_a6']].astype(str).apply(lambda x: ','.join(x), axis=1) to create this extra column, which should be a list of all the values from the day columns 1 to 6.

print(df['vector']) returns the following output:

13804    25.0,25.0,24.0,25.0,25.0,25.0
30842    30.0,30.0,31.0,28.0,31.0,30.0
1624         25.0,nan,nan,nan,nan,25.0
8959     25.0,25.0,25.0,25.0,25.0,25.0

That string is not being interpreted as a list, so if I try new_df = df.explode('vector') nothing happens.
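Just to show what I mean by "nothing happens": explode treats a plain string as a scalar, so the row count doesn't change. A minimal check on a small made-up frame (not my real data):

import pandas as pd

tmp = pd.DataFrame({'id': ['002n', '002c'],
                    'vector': ['25.0,25.0', '30.0,30.0']})
# 'vector' holds plain strings, so explode has nothing list-like to expand
print(len(tmp.explode('vector')))  # still 2 rows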

I've also tried the following to convert the vector column into a list:

def listing(row):
    # build a list from the row's 'vector' value (a string at this point)
    val = list(row['vector'])
    return val

df['vector_b'] = df.apply(listing, axis=1)

But that doesn't work either, because each row's 'vector' value is a string, so the list is being created as:

13804    [2, 5, ., 0, ,, 2, 5, ., 0, ,, 2, 4, ., 0, ,, ...
30842    [3, 0, ., 0, ,, 3, 0, ., 0, ,, 3, 1, ., 0, ,, ...
1624     [2, 5, ., 0, ,, n, a, n, ,, n, a, n, ,, n, a, ...
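In other words, list() is just iterating over the joined string character by character, which you can reproduce in plain Python:

s = '25.0,nan,25.0'
print(list(s))  # ['2', '5', '.', '0', ',', 'n', 'a', 'n', ',', '2', '5', '.', '0']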

How can I create an extra column with the values of the columns day_a1 to day_a6 that will be interpreted as a list, so that I can later use explode on it?

  • I've also tried using ast.literal_eval() in a custom function, but it didn't work; it returned an error.
  • I need to use .astype(str) before applying the lambda, otherwise I get an error saying a string was expected but a float was received.

Thanks.

The expected output would be this:

         id  vector  
13804  002n    25.0 
13804  002n    25.0
       ....    ....
13804  002n    25.0
30842  002c    30.0
30842  002c    30.0
  ...   ...     ...
30842  002c    30.0
1624   002k    25.0
1624   002k     NaN
 ...    ...     ...
1624   002k    25.0
  • Instead of astype(str).apply(','.join, axis=1) do astype(str).apply(list, axis=1)? (see the sketch after these comments) Commented Nov 15, 2019 at 20:09
  • I'll try it right now. Commented Nov 15, 2019 at 20:10
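A minimal sketch of the comment's suggestion, assuming the same six day_a columns; the astype(str) is dropped here so the values stay as floats and NaN stays NaN, which matches the expected output above:

day_cols = ['day_a1', 'day_a2', 'day_a3', 'day_a4', 'day_a5', 'day_a6']
# build a real Python list per row instead of a comma-joined string
df['vector'] = df[day_cols].apply(list, axis=1)
# explode now expands each list into one row per value
new_df = df[['id', 'vector']].explode('vector')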

2 Answers


On second thought, this might work better for you:

df.set_index('id', append=True).stack()

Output:

       id          
13804  002n  day_a1    25.0
             day_a2    25.0
             day_a6    25.0
30842  002c  day_a1    30.0
             day_a2    30.0
             day_a6    30.0
1624   002k  day_a1    25.0
             day_a6    25.0
8959   002j  day_a1    25.0
             day_a2    25.0
             day_a6    25.0
21216  003t  day_a1    25.0
             day_a2    25.0
             day_a6    25.0
dtype: float64
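If you need exactly the two-column id/vector layout from the question, here is a sketch of one way to finish from here, assuming df holds only the id and the six day_aX columns; dropna=False keeps the NaN rows that stack would otherwise drop:

# stack into a long Series, keeping NaN values
long_s = df.set_index('id', append=True).stack(dropna=False)
# drop the day_aX level, then turn the Series back into id / vector columns
out = long_s.reset_index(level=2, drop=True).reset_index('id', name='vector')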


  • Yes, both apply(list, axis=1) and this answer produce the expected output, thanks! I'll mark it as the answer as soon as I can.
  • @IvanLibedinsky this is the recommended approach as it is vectorized; it would perform much faster than apply followed by explode.

You could also do:

df[['day_a1','day_a2','day_a3','day_a4','day_a5','day_a6']].apply(lambda x: x.tolist(), axis=1)
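For example, assigning that result to a column and exploding it (a small sketch, keeping only id next to the new column):

day_cols = ['day_a1', 'day_a2', 'day_a3', 'day_a4', 'day_a5', 'day_a6']
df['vector'] = df[day_cols].apply(lambda x: x.tolist(), axis=1)
# each cell of 'vector' is now a real list, so explode expands it row by row
new_df = df[['id', 'vector']].explode('vector')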
