Adding columns dynamically to a pandas dataframe, from a list contained in the dataframe

Question

I have a dataframe in which the first column contains a list of random size, from 0 to around 10 items in each list. This dataframe also contains several other columns of data.

I would like to insert as many new columns as there are items in the longest list, and then fill them sequentially so that each new column holds one item from the list in the first column.

I was unsure of a good way to go about this.

import pandas as pd

sample = [[[0,2,3,7,8,9],2,3,4,5],[[1,2],2,3,4,5],[[1,3,4,5,6,7,8,9,0],2,3,4,5]]
headers = ["col1","col2","col3","col4","col5"]
df = pd.DataFrame(sample, columns=headers)

In this example I would like to add 9 columns after column 1, since the list in the third row, the longest one, has 9 items. These columns would be populated with:

 0 2  3    7    8     9  NULL NULL NULL in the first row,
 1 2 NULL NULL NULL NULL NULL NULL NULL in the second, etc...
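One way to build exactly this layout is sketched below. It is not taken from the question or the answers: it assumes that passing the lists straight to the pd.DataFrame constructor is acceptable (shorter rows are padded with NaN, which plays the role of NULL here), uses a hypothetical col1_1 ... col1_9 naming scheme, and relies on pd.concat to place the new columns right after col1.

```python
import pandas as pd

sample = [[[0, 2, 3, 7, 8, 9], 2, 3, 4, 5],
          [[1, 2], 2, 3, 4, 5],
          [[1, 3, 4, 5, 6, 7, 8, 9, 0], 2, 3, 4, 5]]
headers = ["col1", "col2", "col3", "col4", "col5"]
df = pd.DataFrame(sample, columns=headers)

# Building a frame from lists of unequal length pads the shorter rows
# with NaN, giving one column per list position (9 here).
expanded = pd.DataFrame(df["col1"].tolist(), index=df.index)
# Hypothetical naming scheme: col1_1, col1_2, ..., col1_9
expanded.columns = [f"col1_{i + 1}" for i in expanded.columns]

# Put the new columns directly after col1, followed by the remaining columns.
result = pd.concat([df[["col1"]], expanded, df.drop(columns="col1")], axis=1)
print(result)
```

Keeping the index of df on the expanded frame means pd.concat lines the rows up correctly even if df does not have a default 0..n-1 index.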

Please build some code with a copyable sample dataframe, showing what you would like to get and what your current attempt is. It will greatly help others to give relevant answers. BTW, as a new user, you should read How to Ask to learn how to present a question on this site... – Serge Ballesta, Commented Feb 12, 2020 at 14:38
Did you try my answer? The solution applies in the same way to your example set – Celius Stingher, Commented Feb 12, 2020 at 15:08

Celius Stingher · Accepted Answer · 2020-02-12 15:08:14Z

1

Edit: updated to fit the OP's edit

This is how I would do it. First, I would pad the lists in the original column so that they all have the same length, which makes them easier to work with. Afterwards, it's a matter of creating the new columns and filling each one with the value at the corresponding position in the list. Using the sample from the question (lists of up to 9 items):

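import numpy as np
import pandas as pd

df = pd.DataFrame(sample, columns=headers)
df = df.rename(columns={'col1': 'col_of_lists'})
# Length of the longest list in the column
max_length = int(df['col_of_lists'].apply(len).max())
# Pad every list with NaN so that all lists have max_length items
df['col_of_lists'] = df['col_of_lists'].apply(lambda x: x + [np.nan] * (max_length - len(x)))
# Create one column per list position
for i in range(max_length):
    df['col_' + str(i)] = df['col_of_lists'].apply(lambda x: x[i])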
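The loop above appends col_0 ... col_8 at the end of the frame, after col2 ... col5, while the question asks for them right after the list column. A minimal sketch of one way to move them, assuming the sample and headers from the question; the reordering lines are an addition and not part of the original answer:

```python
import numpy as np
import pandas as pd

sample = [[[0, 2, 3, 7, 8, 9], 2, 3, 4, 5],
          [[1, 2], 2, 3, 4, 5],
          [[1, 3, 4, 5, 6, 7, 8, 9, 0], 2, 3, 4, 5]]
headers = ["col1", "col2", "col3", "col4", "col5"]

# The answer's approach: pad the lists, then create one column per position.
df = pd.DataFrame(sample, columns=headers).rename(columns={"col1": "col_of_lists"})
max_length = int(df["col_of_lists"].apply(len).max())
df["col_of_lists"] = df["col_of_lists"].apply(lambda x: x + [np.nan] * (max_length - len(x)))
for i in range(max_length):
    df["col_" + str(i)] = df["col_of_lists"].apply(lambda x: x[i])

# Reorder: list column first, then the new columns, then everything else.
new_cols = ["col_" + str(i) for i in range(max_length)]
others = [c for c in df.columns if c not in new_cols and c != "col_of_lists"]
df = df[["col_of_lists"] + new_cols + others]
print(list(df.columns))
```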

edited Feb 12, 2020 at 15:08

answered Feb 12, 2020 at 14:55

Celius Stingher


filbranden · Accepted Answer · 2020-02-12 15:55:20Z

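The easiest way to turn a series of lists into separate columns is to use apply with pd.Series: because the function returns a Series for each element, pandas expands the result into a DataFrame with one column per list position, padding shorter lists with NaN: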

result = df['col1'].apply(pd.Series)
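A quick check of what this step produces on the question's sample, as a sketch: the new columns are labelled 0 to 8, and positions past the end of a shorter list are NaN.

```python
import pandas as pd

sample = [[[0, 2, 3, 7, 8, 9], 2, 3, 4, 5],
          [[1, 2], 2, 3, 4, 5],
          [[1, 3, 4, 5, 6, 7, 8, 9, 0], 2, 3, 4, 5]]
headers = ["col1", "col2", "col3", "col4", "col5"]
df = pd.DataFrame(sample, columns=headers)

# Each list becomes one row of a new DataFrame; missing positions are NaN.
result = df["col1"].apply(pd.Series)
print(result.shape)  # (3, 9): one column per position of the longest list
print(list(result.columns))
```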

At this point, we can rename the automatically numbered columns (0, 1, 2, ...) so that they include the name of the original 'col1', for example:

result.columns = [
    'col1_{}'.format(i + 1)
    for i in result.columns]

Finally, we can join it back to the original DataFrame. Because 'col1' was the first column, this is easy: we join the original frame, minus 'col1', to the right of the new columns, so the expanded columns end up first:

result = result.join(df.drop('col1', axis=1))
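Putting the three steps together on the question's sample gives the sketch below. The join works here because both frames share the same default index (0, 1, 2), which is what join aligns on.

```python
import pandas as pd

sample = [[[0, 2, 3, 7, 8, 9], 2, 3, 4, 5],
          [[1, 2], 2, 3, 4, 5],
          [[1, 3, 4, 5, 6, 7, 8, 9, 0], 2, 3, 4, 5]]
headers = ["col1", "col2", "col3", "col4", "col5"]
df = pd.DataFrame(sample, columns=headers)

# 1. Expand the lists into numbered columns.
result = df["col1"].apply(pd.Series)
# 2. Rename 0..8 to col1_1..col1_9.
result.columns = ["col1_{}".format(i + 1) for i in result.columns]
# 3. Append the remaining original columns, aligned on the shared index.
result = result.join(df.drop("col1", axis=1))
print(list(result.columns))
```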

You can even do it all in a single expression by using the rename() method to change the column names:

df['col1'].apply(pd.Series).rename(
    lambda i: 'col1_{}'.format(i + 1),
    axis=1,
).join(df.drop('col1', axis=1))
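To confirm that the one-liner matches the step-by-step version, the sketch below builds both on the question's sample and compares them with DataFrame.equals, which treats NaN values in matching positions as equal.

```python
import pandas as pd

sample = [[[0, 2, 3, 7, 8, 9], 2, 3, 4, 5],
          [[1, 2], 2, 3, 4, 5],
          [[1, 3, 4, 5, 6, 7, 8, 9, 0], 2, 3, 4, 5]]
headers = ["col1", "col2", "col3", "col4", "col5"]
df = pd.DataFrame(sample, columns=headers)

# Step-by-step version
step = df["col1"].apply(pd.Series)
step.columns = ["col1_{}".format(i + 1) for i in step.columns]
step = step.join(df.drop("col1", axis=1))

# One-liner: rename() accepts a function that maps each column label
one_liner = df["col1"].apply(pd.Series).rename(
    lambda i: "col1_{}".format(i + 1),
    axis=1,
).join(df.drop("col1", axis=1))

print(step.equals(one_liner))
```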
