
I have an array 'multilabel' which looks like this:

       [[0, 0, 0, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 1, 0, 0, 0, 0],
                  ...
       [0, 0, 0, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0]]

I want to store each of these inner arrays as a single value in my target variable, because I am facing a multi-label classification task. How can I achieve that? My code:

pd.DataFrame(multilabel)

Outputs multiple columns:

   0  1  2  3  4  5  6  7
0  0  0  0  0  1  0  0  0
1  1  0  0  0  0  0  0  0
2  1  0  0  0  0  0  0  0

Thanks in advance!

  • Are you saying you just want the dataframe to have a single column with arrays of length 8 as each value in the column? Commented Sep 1, 2021 at 14:08
  • Exactly! I want this as my target variable for a neural network. Commented Sep 1, 2021 at 14:14

3 Answers

2
import pandas as pd

df = pd.DataFrame(list(multilabel))
# Collapse each row of the wide frame into a single NumPy array
list_column = df.apply(lambda row: row.values, axis=1)
pd.DataFrame(list_column, columns=['list_column'])



3 Comments

But you don't need to change your multilabel list of lists to a df to train your NN.
Can I just pass it directly? I am new to NNs, and especially to the Hugging Face library with its Dataset objects. I have to do the train/test split by splitting into train_data and test_data rather than using the x_train/x_test/y_train/y_test structure. That's why I want to transfer the multilabel list into a dataframe structure...
From the sklearn.model_selection.train_test_split documentation: "Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes."
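To illustrate the comment above, here is a minimal sketch of passing a nested list straight to train_test_split without building a DataFrame first. The `texts` list and the variable names on the left-hand side are placeholders invented for this example, not part of the original question.

```python
from sklearn.model_selection import train_test_split

# Hypothetical inputs: five placeholder texts and their multi-hot label rows.
texts = ["doc a", "doc b", "doc c", "doc d", "doc e"]
multilabel = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
]

# Both sequences are shuffled and split with the same indices,
# so each text stays paired with its own label row.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, multilabel, test_size=2, random_state=0
)
```

Whether this fits a Hugging Face Dataset workflow depends on how the dataset is built afterwards; the sketch only shows that plain lists are accepted as input.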
1

Have you considered using the following trick?

import pandas as pd

arr =  [[0, 0, 0, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0]]

pd.DataFrame([arr]).T

Output

                          0
0  [0, 0, 0, 1, 0, 0, 0, 0]
1  [1, 0, 0, 0, 0, 0, 0, 0]
2  [1, 0, 0, 1, 0, 0, 0, 0]
3  [0, 0, 0, 1, 0, 0, 0, 0]
4  [1, 0, 0, 0, 0, 0, 0, 0]

EDIT: If you are using NumPy arrays, you can use the following:

import numpy as np

pd.DataFrame(np.array(arr))\
  .apply(lambda x: np.array(x), axis=1)
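Since the stated goal is a target variable for a neural network, the column of row arrays will probably need to become a 2-D array again at some point. A minimal sketch of that round trip follows, assuming a column named `labels` (a name chosen here for illustration) and building the column with `list(arr)` for brevity.

```python
import numpy as np
import pandas as pd

arr = np.array([
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
])

# One object column whose cells are 1-D arrays of length 8.
df = pd.Series(list(arr), name="labels").to_frame()

# Stack the per-row arrays back into a single (n_samples, n_labels) array.
target = np.stack(df["labels"].tolist())
```

The stacked `target` has the same shape and values as the original `arr`, so nothing is lost by storing the rows in a single column.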

9 Comments

Doesn't work for me! It just transposes my 8x1026 array into a dataframe of 1024x8
That was @soulwreckedyouth, sorry
Probably the data is in a numpy array, rather than a python list of lists; that gives exactly this error
Thanks, Jiri. @soulwreckedyouth, for future reference, please try to ask questions using these resources: how-to-ask, mcve. If your question uses lists instead of numpy arrays, people will try to answer using the same data format you provided.
Sorry for that. I will do that in the future.
1

So, the real question is why... it doesn't seem like the most useful data structure.

That said, the one-dimensional data type in pandas is the Series:

>>> pd.Series(multilabel)
0    [0, 0, 0, 1, 0, 0, 0, 0]
1    [1, 0, 0, 0, 0, 0, 0, 0]
2    [1, 0, 0, 1, 0, 0, 0, 0]
3    [0, 0, 0, 1, 0, 0, 0, 0]
4    [1, 0, 0, 0, 0, 0, 0, 0]
dtype: object

You can then convert it further into a DataFrame:

>>> pd.DataFrame(pd.Series(multilabel))
                          0
0  [0, 0, 0, 1, 0, 0, 0, 0]
1  [1, 0, 0, 0, 0, 0, 0, 0]
2  [1, 0, 0, 1, 0, 0, 0, 0]
3  [0, 0, 0, 1, 0, 0, 0, 0]
4  [1, 0, 0, 0, 0, 0, 0, 0]

Edit: Per further discussion, this works if multilabel is a nested Python list, but not if it's a NumPy array.
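For the NumPy case, one possible workaround is to split the 2-D array into a list of rows before handing it to pandas, since pandas does not accept a 2-D array passed directly to pd.Series. This is a sketch under that assumption; the column name `labels` is made up for the example.

```python
import numpy as np
import pandas as pd

multilabel = np.array([
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
])

# list(...) iterates over the first axis, giving a list of 1-D row arrays,
# which pandas stores as an object-dtype Series.
labels = pd.Series(list(multilabel), name="labels")
df = labels.to_frame()
```

Using `multilabel.tolist()` instead of `list(multilabel)` would store plain Python lists in each cell rather than NumPy arrays.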
