I have a pandas DataFrame with a single column, where the value in each row is a list of five elements, something like this:

           Column
timestamp
06:54:00   [1, 2, 3, 4, 5]
06:55:00   [0.5, 2.3, 4.5, 1, 3]

I would like to split the data into five additional columns, each containing one element of the list for every row. Like this (I show only the first two to save space):

           Column                 Column 1  Column 2
timestamp
06:54:00   [1, 2, 3, 4, 5]        1         2
06:55:00   [0.5, 2.3, 4.5, 1, 3]  0.5       2.3

I tried with:

        L = [pd.DataFrame(data[col].values.tolist()) for col in data]
        print(L)
        df_new = pd.concat(L, axis=1, ignore_index=True)
        print(df_new)

And

        for column in data.columns:
            column_name = f'TColumn {column}'
            val = data[column][column]
            n = 0
            for n in range(5):
                data[column_name] = val[n]
                n = n + 1
        print(data)

Neither attempt gave me the result I want. Could someone please give me a hand with this?

Thank you in advance,

1 Answer

To further simplify what @Manlai A has posted, we can create the new columns on the fly like this (here the list column is named 'Columns', as in the demo, so use your own column name):

df[[f'Column {i}' for i in range(5)]] = df['Columns'].tolist()

And yes, this one-liner actually answers the question above.
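
For anyone who wants to try it without opening a notebook, here is a minimal self-contained sketch of that assignment. The DataFrame is dummy data built from the values in the question, and `n_items` and `new_cols` are names I made up for illustration. It assumes every row holds a list of the same length, and it numbers the new columns from 1 to match the layout asked for in the question.

```python
import pandas as pd

# Dummy data shaped like the question (the values are taken from it).
df = pd.DataFrame(
    {"Columns": [[1, 2, 3, 4, 5], [0.5, 2.3, 4.5, 1, 3]]},
    index=pd.Index(["06:54:00", "06:55:00"], name="timestamp"),
)

# Read the list length from the first row instead of hard-coding 5.
# Assumption: every row holds a list of that same length.
n_items = len(df["Columns"].iloc[0])
new_cols = [f"Column {i + 1}" for i in range(n_items)]

# Same assignment as the one-liner: one new column per list position.
df[new_cols] = df["Columns"].tolist()
print(df)
```

Taking the width from the data avoids the mismatch you get when the lists turn out to be longer or shorter than the number of column names.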

Here is a small demo with dummy data to make it reproducible: https://colab.research.google.com/drive/1NJLuS0thpjz4U-REpu1vOtrSfYdWmFIn?usp=sharing
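
If the lists do not all have the same length, the direct assignment fails with "ValueError: Must have equal len keys and value when setting with an ndarray". One possible workaround, sketched here with made-up dummy data, is to let the DataFrame constructor build the new columns, since it pads shorter lists with missing values. This is a different technique from the one-liner, and `expanded` is just an illustrative name.

```python
import pandas as pd

# Dummy data with lists of different lengths, including an empty one.
df = pd.DataFrame({"Columns": [[1, 2, 3], [4, 5], []]})

# The constructor creates as many columns as the longest list needs
# and fills the gaps left by shorter lists with NaN.
expanded = pd.DataFrame(df["Columns"].tolist(), index=df.index)
expanded.columns = [f"Column {i}" for i in expanded.columns]

# Attach the new columns to the original frame, aligned on the index.
df = df.join(expanded)
print(df)
```

Note that the padded columns become float, because NaN cannot be stored in an integer column.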

Edit 1

For the second question asked in the comment section below:

"If I now have some rows that have empty lists as values ([]) and the rest of them are as in the example (lists with 5 or 6 elements), and I want to create a new column with the first of the elements of the list and, if empty, just delete the row, how could I do that?"

If you have for example a dummy table df like this:

    Columns
0   []
1   [2]
2   [18, 14]
3   [12, 19, 5]
4   [13, 12, 2, 19]
5   [8, 0, 10, 19, 8]
6   [12, 1, 4, 7, 14, 14]
7   [18, 2, 6, 12, 6, 12, 9]
8   [0, 8, 4, 19, 4, 5, 7, 4]
9   [11, 8, 5, 11, 3, 2, 4, 6, 12]

and you want to take the first item of each row if it exists, you can do it like this:

df['Item'] = df['Columns'].apply(lambda items: items[0] if len(items) else None)

and the table will become:

    Columns                          Item
0   []                                NaN
1   [2]                               2.0
2   [18, 14]                         18.0
3   [12, 19, 5]                      12.0
4   [13, 12, 2, 19]                  13.0
5   [8, 0, 10, 19, 8]                 8.0
6   [12, 1, 4, 7, 14, 14]            12.0
7   [18, 2, 6, 12, 6, 12, 9]         18.0
8   [0, 8, 4, 19, 4, 5, 7, 4]         0.0
9   [11, 8, 5, 11, 3, 2, 4, 6, 12]   11.0

After that you can simply drop any row that contains an NA value (None, np.nan, pd.NA, etc.):

df = df.dropna(axis=0)

and it will become:

    Columns                          Item
1   [2]                               2.0
2   [18, 14]                         18.0
3   [12, 19, 5]                      12.0
4   [13, 12, 2, 19]                  13.0
5   [8, 0, 10, 19, 8]                 8.0
6   [12, 1, 4, 7, 14, 14]            12.0
7   [18, 2, 6, 12, 6, 12, 9]         18.0
8   [0, 8, 4, 19, 4, 5, 7, 4]         0.0
9   [11, 8, 5, 11, 3, 2, 4, 6, 12]   11.0

Notice that index 0 is now missing. To reset the index without keeping the old one as an extra column, you can call

df = df.reset_index(drop=True)

I've also included this second answer in the previous demo.
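
As an alternative sketch of the same steps, you could drop the rows with empty lists first and only then take the first item. Because no NaN ever enters the new column, the items keep their integer type instead of turning into floats like 2.0. The dummy data and the name `non_empty` are mine, for illustration only.

```python
import pandas as pd

# Dummy data: the first row holds an empty list.
df = pd.DataFrame({"Columns": [[], [2], [18, 14], [12, 19, 5]]})

# Keep only the rows whose list has at least one element.
non_empty = df[df["Columns"].map(len) > 0].copy()

# Every remaining list is non-empty, so indexing with [0] is safe.
non_empty["Item"] = non_empty["Columns"].apply(lambda items: items[0])

# drop=True discards the old index instead of adding it as an "index" column.
non_empty = non_empty.reset_index(drop=True)
print(non_empty)
```

Filtering first also means that rows with missing values in other, unrelated columns are left alone, which a plain dropna would not guarantee.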

6 Comments

I see that this works in your demo but for some reason I'm getting an error when trying to run it in my code: ValueError: setting an array element with a sequence. TypeError: only size-1 arrays can be converted to Python scalars
Sorry, I copied the wrong error message, it says ValueError: Must have equal len keys and value when setting with an ndarray
Sorry, I made it work finally! Your code was correct, the problem was that instead of 5 elements on my list I had 6 (I wasn't seeing one of them), so I changed the range and it's working perfectly now. Thank you!
@yaputrajordi I have another question related to this, but I don't know if I should open a new one or just ask it here. If I now have some rows that have empty lists as values ([]) and the rest of them are as in the example (lists with 5 or 6 elements), and I want to create a new column with the first of the elements of the list and, if empty, just delete the row, how could I do that? I've been trying some variations of your code but I haven't managed to make it work.
You can add an additional question by editing your main question. That way I can answer by editing my current answer too, since it is easier to explain in an answer post than in the comment section. For now I'll try to answer your question by updating the demo I've provided.