I have a dataframe in which the first column contains a list of random size, from 0 to around 10 items in each list. This dataframe also contains several other columns of data.

I would like to insert as many columns as the length of the longest list, and then populate the values across sequentially such that each column has one item from the list in column one.

I was unsure of a good way to go about this.

import pandas as pd

sample = [[[0,2,3,7,8,9],2,3,4,5],[[1,2],2,3,4,5],[[1,3,4,5,6,7,8,9,0],2,3,4,5]]
headers = ["col1","col2","col3","col4","col5"]
df = pd.DataFrame(sample, columns = headers)

In this example I would like to add 9 columns after column 1, since 9 is the maximum list length (the list in the third row of the dataframe). These columns would be populated with:

 0  2  3     7     8     9     NULL  NULL  NULL   in the first row,
 1  2  NULL  NULL  NULL  NULL  NULL  NULL  NULL   in the second, etc.
  • could you show an example? Commented Feb 12, 2020 at 14:38
  • Please build some code with a copyable sample dataframe, showing what you would like to get and what your current attempt is. It will greatly help others give relevant answers. BTW, as a new user, you should read How to Ask to learn how to present a question on this site... Commented Feb 12, 2020 at 14:38
  • Did you try my answer? The solution applies in the same way with your example set Commented Feb 12, 2020 at 15:08

2 Answers

1

Edited to match the OP's edit.

This is how I would do it. First, pad the lists in the original column so that they are all the same length, which makes them easier to work with. Then create the new columns and fill each one with the value at the corresponding position in the list. Using the sample data from the question:

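import numpy as np
import pandas as pd

# sample and headers are the lists defined in the question
df = pd.DataFrame(sample, columns = headers)
df = df.rename(columns={'col1':'col_of_lists'})
# Length of the longest list
max_length = max(df['col_of_lists'].apply(len))
# Pad every list with NaN up to max_length
df['col_of_lists'] = df['col_of_lists'].apply(lambda x: x + [np.nan] * (max_length - len(x)))
# One new column per list position
for i in range(max_length):
    df['col_'+str(i)] = df['col_of_lists'].apply(lambda x: x[i])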
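
The loop above appends the new columns at the end of the frame. If they should sit directly after the list column, as the question asks, DataFrame.insert could be used instead. The following is only a minimal sketch under the assumption that the data looks like the question's sample; the column names col1_0, col1_1, ... are made up for illustration.

```python
import numpy as np
import pandas as pd

sample = [[[0,2,3,7,8,9],2,3,4,5],[[1,2],2,3,4,5],[[1,3,4,5,6,7,8,9,0],2,3,4,5]]
headers = ["col1","col2","col3","col4","col5"]
df = pd.DataFrame(sample, columns=headers)

# Length of the longest list in the list column
max_length = int(df["col1"].map(len).max())

for i in range(max_length):
    # Take the i-th item of each list, or NaN when the list is too short
    values = [lst[i] if i < len(lst) else np.nan for lst in df["col1"]]
    # col1 sits at position 0, so the i-th new column goes to position i + 1
    # (the name "col1_<i>" is a hypothetical naming choice)
    df.insert(loc=i + 1, column="col1_{}".format(i), value=values)

print(df.columns.tolist())
```

Building the values with a list comprehension avoids padding the original lists in place, so col1 keeps its original content.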

0

The easiest way to turn a Series of lists into separate columns is to call apply with pd.Series. Because the function returns a Series for each row, apply expands the result into a DataFrame, filling the missing positions of shorter lists with NaN:

result = df['col1'].apply(pd.Series)
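
As a side note, a similar expansion can be obtained by passing the lists straight to the DataFrame constructor, which tends to be faster than apply on larger frames. This is a swapped-in technique rather than the method described in this answer, and the sketch assumes the question's sample data; the variable name expanded is made up for illustration.

```python
import pandas as pd

sample = [[[0,2,3,7,8,9],2,3,4,5],[[1,2],2,3,4,5],[[1,3,4,5,6,7,8,9,0],2,3,4,5]]
headers = ["col1","col2","col3","col4","col5"]
df = pd.DataFrame(sample, columns=headers)

# Build a frame from a plain list of lists; shorter rows are padded with NaN.
# Reusing df.index keeps the rows aligned with the original frame.
expanded = pd.DataFrame(df["col1"].tolist(), index=df.index)

print(expanded.shape)
```

The columns are numbered 0 to 8 here, just like the result of apply(pd.Series), so the renaming and joining steps that follow should work the same way.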

At this point, we can rename the automatically numbered columns so that they include the name of the original 'col1', for example:

result.columns = [
    'col1_{}'.format(i + 1)
    for i in result.columns]

Finally, we can join it back to the original DataFrame. Because the list column was the first column, this is easy: the expanded columns go on the left and the rest of the original frame is joined on the right, dropping the original 'col1' in the process:

result = result.join(df.drop('col1', axis=1))

You can even do it all in a single expression, by using the rename() method to change the column names:

df['col1'].apply(pd.Series).rename(
    lambda i: 'col1_{}'.format(i + 1),
    axis=1,
).join(df.drop('col1', axis=1))
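
For reference, here is the same chain run end to end on the sample from the question. It is a sketch that assumes the question's data as posted; the variable name wide is made up for illustration.

```python
import pandas as pd

sample = [[[0,2,3,7,8,9],2,3,4,5],[[1,2],2,3,4,5],[[1,3,4,5,6,7,8,9,0],2,3,4,5]]
headers = ["col1","col2","col3","col4","col5"]
df = pd.DataFrame(sample, columns=headers)

wide = (
    df["col1"]
    .apply(pd.Series)                                   # one column per list position
    .rename(lambda i: "col1_{}".format(i + 1), axis=1)  # 0..8 -> col1_1..col1_9
    .join(df.drop("col1", axis=1))                      # re-attach col2..col5
)

print(wide.columns.tolist())
```

The result has nine expanded columns followed by the four remaining original columns, with NaN wherever a list was shorter than the longest one.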
