Python Split a value of arrays into different columns

Question

Thanks for the help in advance. I have a pandas data frame that looks like this:

     index   source    timestamp    value
      1        car       1         ['98']
      2        bike      2         ['98', '100']
      3        car       3         ['65']
      4        bike      4         ['100', '120']
      5        plane     5         ['20' , '12', '30']

What I need is to turn each element of the lists in the 'value' column into its own column, so the output would look like this:

      index   source    timestamp   car  bike1  bike2  plane1  plane2  plane3
        1      car          1       98    Na     Na     Na       Na     Na
        2      bike         2       Na    98     100    Na       Na     Na
        3      car          3       65    Na     Na     Na       Na     Na
        4      bike         4       Na    100    120    Na       Na     Na
        5      plane        5       Na    Na     Na     20       12     30

For car, the list always has one element; for bike, two; and for plane, three. That determines how many new columns I need in the new data frame. What is the best way to achieve this?

type(df['value']) returns <class 'pandas.core.series.Series'>
– Mauricio Rodriguez, Commented Oct 11, 2018 at 12:00
Yes, but I am asking about a single value, not the column. What does print(type(df.loc[1, 'value'])) return?
– jezrael, Commented Oct 11, 2018 at 12:00
Could you please post the code that created the dataframe? It will save time, thanks :)
– quest, Commented Oct 11, 2018 at 12:06

jezrael · Accepted Answer · 2018-10-11 12:09:01Z

First, if the values are stored as strings, convert them to lists:

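import ast
import pandas as pd  # needed for pd.DataFrame in the last step
# Parse each string such as "['98', '100']" into a real Python list
df['value'] = df['value'].apply(ast.literal_eval)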
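
The conversion is only needed when each cell holds the text of a list rather than an actual list, which is what the `type(df.loc[1, 'value'])` comment was checking. A minimal sketch, assuming a small hand-built frame; `to_list` is a hypothetical helper (not part of pandas) that leaves real lists untouched:

```python
import ast

import pandas as pd

# Hypothetical sample data: the 'value' cells are strings that look like lists.
df = pd.DataFrame({
    "source": ["car", "bike", "plane"],
    "value": ["['98']", "['98', '100']", "['20', '12', '30']"],
})


def to_list(cell):
    """Parse a string such as "['98']" into a list; pass real lists through."""
    if isinstance(cell, str):
        return ast.literal_eval(cell)
    return cell


df["value"] = df["value"].apply(to_list)
print(type(df.loc[0, "value"]))  # <class 'list'>
```

Applying `to_list` a second time is harmless, whereas calling `ast.literal_eval` directly on a cell that is already a list raises an error.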

Then create dictionaries for each row:

# Key each list element by its source name plus its 1-based position
L = [{f'{i}{x+1}': y for x, y in enumerate(j)}
     for i, j in zip(df['source'], df['value'])]
print(L)
[{'car1': '98'}, 
 {'bike1': '98', 'bike2': '100'}, 
 {'car1': '65'}, 
 {'bike1': '100', 'bike2': '120'}, 
 {'plane1': '20', 'plane2': '12', 'plane3': '30'}]
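
The comprehension packs two loops into one line, so here is the same logic unrolled as a sketch. `row_to_columns` is a hypothetical helper name, and the sample lists are assumed rather than taken from the real frame:

```python
def row_to_columns(source, values):
    """Map each list element to a key such as 'bike1', 'bike2', ..."""
    columns = {}
    for position, item in enumerate(values, start=1):
        columns[f"{source}{position}"] = item
    return columns


# Assumed sample rows, mirroring the 'source' and 'value' columns.
sources = ["car", "bike", "plane"]
values = [["98"], ["98", "100"], ["20", "12", "30"]]

L = [row_to_columns(s, v) for s, v in zip(sources, values)]
print(L)
```

Each row yields only the keys for its own source; the missing keys become NaN once the dictionaries are turned into a DataFrame.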

Create DataFrame and join to original df:

# Build the new columns on the same index so join aligns row by row
df = df.join(pd.DataFrame(L, index=df.index))
print(df)
   index source  timestamp         value bike1 bike2 car1 plane1 plane2 plane3
0      1    car          1          [98]   NaN   NaN   98    NaN    NaN    NaN
1      2   bike          2     [98, 100]    98   100  NaN    NaN    NaN    NaN
2      3    car          3          [65]   NaN   NaN   65    NaN    NaN    NaN
3      4   bike          4    [100, 120]   100   120  NaN    NaN    NaN    NaN
4      5  plane          5  [20, 12, 30]   NaN   NaN  NaN     20     12     30
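
To get closer to the layout asked for in the question, the result can be tidied further. The sketch below is one possible follow-up, not part of the original answer; it assumes the values should be numeric, that the single car column should be named `car` rather than `car1`, and that the original `value` column is no longer needed:

```python
import pandas as pd

# Assumed reconstruction of the question's data, with real lists in 'value'.
df = pd.DataFrame({
    "index": [1, 2, 3, 4, 5],
    "source": ["car", "bike", "car", "bike", "plane"],
    "timestamp": [1, 2, 3, 4, 5],
    "value": [["98"], ["98", "100"], ["65"], ["100", "120"], ["20", "12", "30"]],
})

# One dictionary per row, keyed by source name plus 1-based position.
L = [{f"{s}{n}": v for n, v in enumerate(vals, start=1)}
     for s, vals in zip(df["source"], df["value"])]

wide = pd.DataFrame(L, index=df.index)
# Assumption: the strings should become numbers (NaN stays NaN).
wide = wide.apply(pd.to_numeric)
# Assumption: the question wants 'car', not 'car1'.
wide = wide.rename(columns={"car1": "car"})
# Fix the column order explicitly, since it can differ between pandas versions.
wide = wide[["car", "bike1", "bike2", "plane1", "plane2", "plane3"]]

result = df.drop(columns="value").join(wide)
print(result)
```

Because the new columns contain NaN, `pd.to_numeric` produces float columns, so the values display as `98.0` rather than `98`.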


@MauricioRodriguez - You are welcome! And thank you for comment ;)
– jezrael, Commented over a year ago
