Pandas Dataframe - Values are lists

Question

I have a Pandas DataFrame with a single column, where the value in each row is a list of five elements, something like this:

            Column
timestamp
06:54:00    [1, 2, 3, 4, 5]
06:55:00    [0.5, 2.3, 4.5, 1, 3]

I would like to split the data into five additional columns, each containing one element of the list for that row. Like this (I show only the first two to save space):

            Column                   Column 1   Column 2
timestamp
06:54:00    [1, 2, 3, 4, 5]          1          2
06:55:00    [0.5, 2.3, 4.5, 1, 3]    0.5        2.3

I tried with:

        L = [pd.DataFrame(data[col].values.tolist()) for col in data]
        print(L)
        df_new = pd.concat(L, axis=1, ignore_index=True)
        print(df_new)

And

        for column in data.columns:
            column_name = f'TColumn {column}'
            val = data[column][column]
            n = 0
            for n in range(5):
                data[column_name] = val[n]
                n = n + 1
        print(data)

I haven't managed to get either approach to work. Could someone please give me a hand with this?

Thank you in advance,

yaputra jordi · Accepted Answer · 2021-12-08 16:59:52Z

To further simplify what @Manlai A has posted, we can create the new columns on the fly like this (the range must match the number of elements in each list):

df[[f'Column {i}' for i in range(5)]] = df['Columns'].tolist()

And yes, this one-liner actually answers the question above.

Here is a small demo with dummy data to make it easier to reproduce: https://colab.research.google.com/drive/1NJLuS0thpjz4U-REpu1vOtrSfYdWmFIn?usp=sharing
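For readers who cannot open the notebook, here is a minimal self-contained sketch of the same idea. It is not the linked demo: the dummy data and the names `expanded` and `result` are assumptions made for illustration. Rather than assigning to several new columns at once, it builds a separate DataFrame from the lists and joins it back, so the number of columns is taken from the data instead of being hard-coded to 5.

```python
import pandas as pd

# Dummy data mirroring the question; the column name 'Columns' follows the answer.
df = pd.DataFrame(
    {"Columns": [[1, 2, 3, 4, 5], [0.5, 2.3, 4.5, 1, 3]]},
    index=pd.Index(["06:54:00", "06:55:00"], name="timestamp"),
)

# One column per list position; index=df.index keeps the rows aligned
# with the original frame when the index is not 0..n-1.
expanded = pd.DataFrame(df["Columns"].tolist(), index=df.index)

# Name the columns from the actual width instead of hard-coding range(5).
expanded.columns = [f"Column {i}" for i in range(expanded.shape[1])]

# Attach the new columns next to the original list column.
result = df.join(expanded)
print(result)
```

Because the width comes from `expanded.shape[1]`, a list that turns out to hold six elements instead of five simply yields one more column rather than a length-mismatch error.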

Edit 1

For the second question asked in the comment section below:

"If I now have some rows that have empty lists are values ([]) and the rest of them are as in the example (lists with 5 or 6 elements), and I want to create a new column with the first of the elements of the list and, if empty, just delete the row, how could I do that?"

If you have for example a dummy table df like this:

    Columns
0   []
1   [2]
2   [18, 14]
3   [12, 19, 5]
4   [13, 12, 2, 19]
5   [8, 0, 10, 19, 8]
6   [12, 1, 4, 7, 14, 14]
7   [18, 2, 6, 12, 6, 12, 9]
8   [0, 8, 4, 19, 4, 5, 7, 4]
9   [11, 8, 5, 11, 3, 2, 4, 6, 12]

and you want to take the first item of each row if it exists, you can do it like this:

df['Item'] = df['Columns'].apply(lambda items: items[0] if len(items) else None)

and the table will become:

    Columns                          Item
0   []                                NaN
1   [2]                               2.0
2   [18, 14]                         18.0
3   [12, 19, 5]                      12.0
4   [13, 12, 2, 19]                  13.0
5   [8, 0, 10, 19, 8]                 8.0
6   [12, 1, 4, 7, 14, 14]            12.0
7   [18, 2, 6, 12, 6, 12, 9]         18.0
8   [0, 8, 4, 19, 4, 5, 7, 4]         0.0
9   [11, 8, 5, 11, 3, 2, 4, 6, 12]   11.0
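As a rough, self-contained sketch of this step (with a shortened dummy table, and a helper named `first_or_none` that is only an illustrative stand-in for the lambda), the same logic can be written as follows. The integers show up as floats such as `2.0` because the missing value for the empty list is stored as NaN, which typically makes pandas use a float column.

```python
import pandas as pd

# Shortened version of the dummy table above.
df = pd.DataFrame({"Columns": [[], [2], [18, 14], [12, 19, 5]]})


def first_or_none(items):
    """Return the first element of a list, or None when the list is empty."""
    return items[0] if len(items) else None


# Same logic as the lambda, applied element by element to the list column.
df["Item"] = df["Columns"].apply(first_or_none)
print(df)
```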

After that you can simply drop any row that contains an NA value (None, np.nan, pd.NA, etc.):

df = df.dropna(axis=0)

and it will become:

    Columns                          Item
1   [2]                               2.0
2   [18, 14]                         18.0
3   [12, 19, 5]                      12.0
4   [13, 12, 2, 19]                  13.0
5   [8, 0, 10, 19, 8]                 8.0
6   [12, 1, 4, 7, 14, 14]            12.0
7   [18, 2, 6, 12, 6, 12, 9]         18.0
8   [0, 8, 4, 19, 4, 5, 7, 4]         0.0
9   [11, 8, 5, 11, 3, 2, 4, 6, 12]   11.0

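Notice that index 0 is now missing. To renumber the rows, you can reset the index (by default the old index is kept as a new column named index; pass drop=True to discard it):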

df = df.reset_index()
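The drop-and-renumber steps can be sketched end to end as below. This is a variation on the calls above under two assumptions of mine: `subset=["Item"]` restricts the NA check to the new column, so rows are not dropped because of missing values elsewhere, and `drop=True` discards the old index. The name `cleaned` is only illustrative.

```python
import pandas as pd

# Small dummy table with one empty list.
df = pd.DataFrame({"Columns": [[], [2], [18, 14]]})
df["Item"] = df["Columns"].apply(lambda items: items[0] if len(items) else None)

# Only the 'Item' column decides which rows are dropped.
cleaned = df.dropna(subset=["Item"])

# Renumber the rows from 0 and discard the old index.
cleaned = cleaned.reset_index(drop=True)
print(cleaned)
```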

I've also included this second answer in the previous demo.

edited Dec 8, 2021 at 16:59

answered Dec 2, 2021 at 18:21

yaputra jordi


6 Comments

Sara.SP92 Over a year ago

I see that this works in your demo but for some reason I'm getting an error when trying to run it in my code: ValueError: setting an array element with a sequence. TypeError: only size-1 arrays can be converted to Python scalars

Sara.SP92 Over a year ago

Sorry, I copied the wrong error message, it says ValueError: Must have equal len keys and value when setting with an ndarray

Sara.SP92 Over a year ago

Sorry, I made it work finally! Your code was correct, the problem was that instead of 5 elements on my list I had 6 (I wasn't seeing one of them), so I changed the range and it's working perfectly now. Thank you!

Sara.SP92 Over a year ago

@yaputrajordi I have another question related to this, but I don't know if I should open a new one or just ask it here. If I now have some rows that have empty lists are values ([]) and the rest of them are as in the example (lists with 5 or 6 elements), and I want to create a new column with the first of the elements of the list and, if empty, just delete the row, how could I do that? I've been trying some variations of your code but I haven't managed to make it work.

yaputra jordi Over a year ago

You can add an additional question by editing your main question. That way I can answer by editing my current answer too, since it is easier to explain in an answer post than in the comment section. For now I'll try to answer your question by updating the demo I've provided.
