
Thanks for the help in advance. I have a pandas data frame that looks like this:

     index   source    timestamp    value
      1        car       1         ['98']
      2        bike      2         ['98', '100']
      3        car       3         ['65']
      4        bike      4         ['100', '120']
      5        plane     5         ['20', '12', '30']

What I need is to convert each element inside the 'value' pandas Series to its own new column. The output would look like this:

      index   source    timestamp   car  bike1  bike2  plane1  plane2  plane3
        1      car          1       98    NaN    NaN    NaN      NaN    NaN
        2      bike         2       NaN   98     100    NaN      NaN    NaN
        3      car          3       65    NaN    NaN    NaN      NaN    NaN
        4      bike         4       NaN   100    120    NaN      NaN    NaN
        5      plane        5       NaN   NaN    NaN    20       12     30

For car the list always has one element, for bike two, and for plane three. That determines how many new columns I need in the new data frame. What is the best way to achieve this?

  • What does print(type(df.loc[1, 'value'])) return? Commented Oct 11, 2018 at 11:57
  • type(df['value']) returns <class 'pandas.core.series.Series'> Commented Oct 11, 2018 at 12:00
  • Yes, but I am asking about one value, not the column. What does print(type(df.loc[1, 'value'])) return? Commented Oct 11, 2018 at 12:00
  • My bad. That returns 'str' Commented Oct 11, 2018 at 12:04
  • Could you please post the code that created the dataframe? It will save time, thanks :) Commented Oct 11, 2018 at 12:06
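The comments establish that each cell in `value` is a string that only looks like a list (`type(...)` returned `'str'`). Below is a minimal sketch that rebuilds the sample frame under that assumption. The column names and values come from the question's table. The string storage is inferred from the `'str'` reply.

```python
import pandas as pd

# Rebuild the sample data from the question. Each 'value' cell is a
# string that merely looks like a list, matching the 'str' reply above.
df = pd.DataFrame({
    'index': [1, 2, 3, 4, 5],
    'source': ['car', 'bike', 'car', 'bike', 'plane'],
    'timestamp': [1, 2, 3, 4, 5],
    'value': ["['98']", "['98', '100']", "['65']",
              "['100', '120']", "['20', '12', '30']"],
})

print(type(df.loc[1, 'value']))  # <class 'str'>
```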

1 Answer


First convert the string values to real lists:

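import ast
import pandas as pd  # needed later for pd.DataFrame

# Each cell is a string such as "['98', '100']"; parse it into a list
df['value'] = df['value'].apply(ast.literal_eval)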
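To show what `ast.literal_eval` does to a single cell, here is a small standalone sketch. It evaluates only Python literals, not arbitrary code, so it is a safer choice than `eval` for this job. The malformed string in the second half is a made-up example.

```python
import ast

# Turn the string representation back into a real list of strings.
raw = "['98', '100']"
parsed = ast.literal_eval(raw)
print(type(parsed), parsed)  # <class 'list'> ['98', '100']

# A malformed cell (hypothetical example with an unbalanced quote)
# raises an exception instead of being silently accepted.
try:
    ast.literal_eval("['98', 100']")
except (ValueError, SyntaxError) as exc:
    print('cannot parse:', type(exc).__name__)
```

If some cells might be malformed, wrapping the call like this makes the bad rows easy to find before calling `apply` on the whole column.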

Then create dictionaries for each row:

L = [{f'{i}{x+1}':y for x, y in enumerate(j)} for i, j in zip(df['source'], df['value'])]
print (L)
[{'car1': '98'}, 
 {'bike1': '98', 'bike2': '100'}, 
 {'car1': '65'}, 
 {'bike1': '100', 'bike2': '120'}, 
 {'plane1': '20', 'plane2': '12', 'plane3': '30'}]
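The one-liner above pairs each `source` with its parsed list and numbers the elements from 1. Here is an equivalent explicit loop, sketched on a few hypothetical sample lists rather than the full frame. It uses `enumerate(..., start=1)` in place of the answer's `x+1`. Both give the same keys.

```python
sources = ['car', 'bike', 'plane']
values = [['98'], ['98', '100'], ['20', '12', '30']]

rows = []
for source, items in zip(sources, values):
    row = {}
    # start=1 yields car1, bike1, bike2, ... instead of car0, bike0, ...
    for position, item in enumerate(items, start=1):
        row[f'{source}{position}'] = item
    rows.append(row)

print(rows)
# [{'car1': '98'}, {'bike1': '98', 'bike2': '100'}, {'plane1': '20', 'plane2': '12', 'plane3': '30'}]
```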

Create DataFrame and join to original df:

df = df.join(pd.DataFrame(L, index=df.index))
print (df)
   index source  timestamp         value bike1 bike2 car1 plane1 plane2 plane3
0      1    car          1          [98]   NaN   NaN   98    NaN    NaN    NaN
1      2   bike          2     [98, 100]    98   100  NaN    NaN    NaN    NaN
2      3    car          3          [65]   NaN   NaN   65    NaN    NaN    NaN
3      4   bike          4    [100, 120]   100   120  NaN    NaN    NaN    NaN
4      5  plane          5  [20, 12, 30]   NaN   NaN  NaN     20     12     30
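Here is a minimal end-to-end sketch that combines the steps above, assuming the cells are strings as the comments establish. It goes one step beyond the answer in two optional ways, neither of which the answer itself does. First, it converts the parsed strings to numbers with `pd.to_numeric`, so missing entries become float `NaN`. Second, it drops the original `value` column. The order of the new columns may depend on your pandas version: the output above shows them sorted alphabetically, while newer releases may keep first-appearance order.

```python
import ast
import pandas as pd

df = pd.DataFrame({
    'index': [1, 2, 3, 4, 5],
    'source': ['car', 'bike', 'car', 'bike', 'plane'],
    'timestamp': [1, 2, 3, 4, 5],
    'value': ["['98']", "['98', '100']", "['65']",
              "['100', '120']", "['20', '12', '30']"],
})

# Parse the string cells into real lists.
values = df['value'].apply(ast.literal_eval)

# One dict per row, keyed source1, source2, ...
records = [
    {f'{src}{pos}': item for pos, item in enumerate(items, start=1)}
    for src, items in zip(df['source'], values)
]

# Build the wide frame, convert strings to numbers (gaps stay NaN),
# then replace the original 'value' column with the new columns.
wide = pd.DataFrame(records, index=df.index).apply(pd.to_numeric)
out = df.drop(columns='value').join(wide)
print(out)
```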

1 Comment

@MauricioRodriguez - You are welcome! And thank you for the comment ;)
