Setting cell in pandas dataframe with list of strings

Question

I have a custom dataframe B with the following properties (obtained via B.dtypes):

[236 rows x 10 columns]
Filename               object
Path                   object
Filenumber              int64
Channel                object
Folder                 object
User-defined labels    object
Ranking                 int64
Lower limit             int64
Upper limit             int64
Enabled                  bool
dtype: object

As example, calling

print(
    B.loc[
        B["User-defined labels"].explode().eq("Label 1").groupby(level=0).any()
        & (B["Filenumber"] == 98)
    ]
)

gives

        Filename                                               Path  Filenumber      Channel   Folder User-defined labels  Ranking  Lower limit  Upper limit  Enabled
0    File_98.csv  C:\Users\test_user\Documents\Training_data\Label 1\...          98  subfolder_0  Label 1           [Label 1]        0            0       999999     True
39   File_98.csv  C:\Users\test_user\Documents\Training_data\Label 1\...          98  subfolder_1  Label 1           [Label 1]        0            0       999999     True
78   File_98.csv  C:\Users\test_user\Documents\Training_data\Label 1\...          98  subfolder_2  Label 1           [Label 1]        0            0       999999     True
117  File_98.csv  C:\Users\test_user\Documents\Training_data\Label 1\...          98  subfolder_3  Label 1           [Label 1]        0            0       999999     True

Now, I would like to replace/add new labels to the column ["User-defined labels"] by setting them via a list of labels. The list of labels is defined as

new_labels = ["Label 1", "Label 3"]

I accessed the column via

B.loc[
    B["User-defined labels"].explode().eq("Label 1").groupby(level=0).any()
    & (B["Filenumber"] == 98),
    ["User-defined labels"],
]

Initially, I am starting from an output of

        Filename                                               Path  Filenumber      Channel   Folder User-defined labels  Ranking  Lower limit  Upper limit  Enabled
0    File_98.csv  C:\Users\riro\Documents\Training_data\Label 1\...          98  subfolder_0  Label 1           [Label 1]        0            0       999999     True
39   File_98.csv  C:\Users\riro\Documents\Training_data\Label 1\...          98  subfolder_1  Label 1           [Label 1]        0            0       999999     True
78   File_98.csv  C:\Users\riro\Documents\Training_data\Label 1\...          98  subfolder_2  Label 1           [Label 1]        0            0       999999     True
117  File_98.csv  C:\Users\riro\Documents\Training_data\Label 1\...          98  subfolder_3  Label 1           [Label 1]        0            0       999999     True

and would like to obtain

        Filename                                               Path  Filenumber      Channel   Folder User-defined labels  Ranking  Lower limit  Upper limit  Enabled
0    File_98.csv  C:\Users\riro\Documents\Training_data\Label 1\...          98  subfolder_0  Label 1           [Label 1, Label 3]        0            0       999999     True
39   File_98.csv  C:\Users\riro\Documents\Training_data\Label 1\...          98  subfolder_1  Label 1           [Label 1, Label 3]        0            0       999999     True
78   File_98.csv  C:\Users\riro\Documents\Training_data\Label 1\...          98  subfolder_2  Label 1           [Label 1, Label 3]        0            0       999999     True
117  File_98.csv  C:\Users\riro\Documents\Training_data\Label 1\...          98  subfolder_3  Label 1           [Label 1, Label 3]        0            0       999999     True

For that, I tried the following operations:

B.loc[] = new_labels
Fails with Must have equal len keys and value when setting with an iterable
B.loc[] = [new_labels]
Fails with Must have equal len keys and value when setting with an ndarray
B.loc[] = pd.Series(new_labels, dtype=object)
Sets the first entry in the column User-defined labels to Label 1 (from [Label 1]) and the entries in all further columns to NaN
B.loc[] = pd.Series([new_labels], dtype=object) Sets the first entry in the column User-defined labels to [Label 1, Label 3] (as intended), but all further columns are set to NaN

I repeated the same approach with at() instead of loc[], as described here: https://stackoverflow.com/a/70968810/2546099, resulting in:

B.at[] = new_labels
Fails with Must have equal len keys and value when setting with an iterable
B.at[] = [new_labels]
Fails with Must have equal len keys and value when setting with an ndarray
B.at[] = pd.Series(new_labels, dtype=object)
Fails with unhashable type: 'list'
B.at[] = pd.Series([new_labels], dtype=object) Fails with unhashable type: 'list'

Are there other approaches which can solve my issues? Of course, I could just add new columns when adding new labels, but that would raise additional issues (based on my current knowledge):

Each time I add new labels for some rows, I would have to update the entire dataframe to new columns
If I remove labels, I would have to keep empty columns. For cleanup, I would have to check if a column is completely empty before deleting it
Iterating over labels is faster if I keep them in a list in one column compared to iterating over all columns which contain the name "label"

Or are those non-issues?

Do you want to change the value of label to new_labels? Why don't you use B["User-defined labels"] = anything. — Amir Pourmand
– Amir Pourmand, Commented Aug 5, 2022 at 9:44
Wouldn't that change the entire column? I only want to change some specific rows, not the entire column — arc_lupus
– arc_lupus, Commented Aug 5, 2022 at 9:45
That's right. You can still use df.loc[row_indexer,column_indexer]. Take a look at here if this is what you meant. — Amir Pourmand
– Amir Pourmand, Commented Aug 5, 2022 at 9:47
Hmm, I tried that (first test), and it failed for me, as stated above. Or did I misunderstand something? — arc_lupus
– arc_lupus, Commented Aug 5, 2022 at 9:55
I tested with numpy and understood your problem ... It seems stupid that pandas gives error :( — Amir Pourmand
– Amir Pourmand, Commented Aug 5, 2022 at 10:06

Ze'ev Ben-Tsvi · Accepted Answer · 2022-08-05 11:57:43Z

I think you can achieve your goal with apply:

import pandas as pd

df = pd.DataFrame({'a':
    [
        ['label 1'],
        ['label 1'],
        ['label 1'],
        ['label 1', 'label 2'],
        ['label 2', 'label 3'],
        ['label 1'],
        ['label 1'],
        ['label 1'],
        ['label 1'],
        ['label 1'],
    ],
    'b': [
        1,
        2,
        3,
        1,
        1,
        2,
        3,
        1,
        2,
        3,
    ]
})

print(df)
                    a  b
0           [label 1]  1
1           [label 1]  2
2           [label 1]  3
3  [label 1, label 2]  1
4  [label 2, label 3]  1
5           [label 1]  2
6           [label 1]  3
7           [label 1]  1
8           [label 1]  2
9           [label 1]  3

iterating through the rows with lambda will set a new value in every row with an index that exists in the mask

mask = df['a'].loc[df['a'].explode().eq("label 1").groupby(level=0).any() & (df['b'] == 1)]
df['a'] = df.apply(lambda x: ['label 1', 'label 3'] if x.name in mask.index else x['a'], axis=1)

print(df)

                    a  b
0  [label 1, label 3]  1
1           [label 1]  2
2           [label 1]  3
3  [label 1, label 3]  1
4  [label 2, label 3]  1
5           [label 1]  2
6           [label 1]  3
7  [label 1, label 3]  1
8           [label 1]  2
9           [label 1]  3

Collectives™ on Stack Overflow

Setting cell in pandas dataframe with list of strings

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related