Getting a list of unique values within a pandas column

Question

Can you please help me with the following issue? Imagine I have the following df:

import pandas as pd

data = {
    'A': ['A1, B2, C', 'A2, A9, C', 'A3', 'A4, Z', 'A5, A1, Z'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
}
df = pd.DataFrame(data)

How can I create a list of the unique values stored in column 'A'? I want something like this:

list_A = ['A1', 'B2', 'C', 'A2', 'A9', 'A3', 'A4', 'Z', 'A5']

All values of the column A are already unique: 'A1, B2, C' != 'A2, A9, C' != 'A3' != 'A4, Z' != 'A5, A1, Z'
– ForceBru, Commented Dec 20, 2022 at 17:54

mozway · Accepted Answer · 2022-12-20 17:54:14Z

5

Assuming that by "values" you mean the comma-separated substrings, you can split, explode, and use unique:

list_A = df['A'].str.split(r',\s*').explode().unique().tolist()

Output: ['A1', 'B2', 'C', 'A2', 'A9', 'A3', 'A4', 'Z', 'A5']
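To see what each step of the chain contributes, it can be unpacked into intermediate results. This is a minimal sketch on the question's data; the names parts and flat are only illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A1, B2, C', 'A2, A9, C', 'A3', 'A4, Z', 'A5, A1, Z'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
})

# Step 1: split each string on a comma followed by optional whitespace.
# A multi-character pattern is treated as a regular expression, so each
# cell becomes a list, e.g. 'A1, B2, C' -> ['A1', 'B2', 'C'].
parts = df['A'].str.split(r',\s*')

# Step 2: explode gives every list element its own row
# (the original index labels are repeated).
flat = parts.explode()

# Step 3: unique keeps the first occurrence of each value,
# in order of appearance; tolist turns the result into a plain list.
list_A = flat.unique().tolist()
print(list_A)
```

Because unique works in order of appearance, the repeated 'C', 'Z' and 'A1' are dropped where they show up a second time.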


1 Comment

Alberto Alvarez Over a year ago

Thank you very much for your reply. Can you please clarify why we need ".str" and what '\s*' is responsible for?

Ludo Schmidt · Accepted Answer · 2022-12-20 18:16:45Z

2

The code applies a lambda function to the 'A' column to remove any whitespace from the strings.

Next, the code uses the str.split() method to split the strings in the 'A' column on the delimiter ',', resulting in a column of lists.

Finally, the code uses a list comprehension to flatten the list of lists into a single list, and then uses the set() function to create a set containing the unique elements of that list. The set is then printed to the console.
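The steps described above could look roughly like the following. This is a sketch reconstructed from the description only, not the answer's original code; the variable names no_spaces, lists and unique_values are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A1, B2, C', 'A2, A9, C', 'A3', 'A4, Z', 'A5, A1, Z'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
})

# Remove the spaces from each string with a lambda (assumed form of this step).
no_spaces = df['A'].apply(lambda s: s.replace(' ', ''))

# Split on the plain ',' delimiter, giving a column of lists.
lists = no_spaces.str.split(',')

# Flatten the list of lists with a comprehension, then deduplicate with set().
unique_values = set([item for sublist in lists for item in sublist])
print(unique_values)
```

Note that a set is unordered, so this approach does not preserve the order of appearance asked for in the question.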


Hadij · Accepted Answer · 2022-12-28 02:12:16Z

1

Converting column A to the desired list (new column C). In this case, instead of 'A1, B2, C', we will have ['A1', 'B2', 'C'].

df['C'] = df['A'].str.split(r',\s*')

.str is the accessor that exposes pandas' vectorized string methods on a column; it does not convert values to strings. .split(r',\s*') splits each string at every comma together with any whitespace that follows it (\s* matches zero or more whitespace characters).

Finding the sorted unique values of the converted column (a set by itself is unordered, so sort it explicitly):

sorted(set(df['C'].explode()))
# ['A1', 'A2', 'A3', 'A4', 'A5', 'A9', 'B2', 'C', 'Z']

If sorting is not important, and you want to see them in the order of their appearance:

list(df['C'].explode().unique())
# ['A1', 'B2', 'C', 'A2', 'A9', 'A3', 'A4', 'Z', 'A5']

edited Dec 28, 2022 at 2:12

answered Dec 20, 2022 at 19:24


2 Comments

Alberto Alvarez Over a year ago

Thank you for your reply. Can you please explain why we need ".str" and 'split(',\s*')' and not just 'split(',')'?

Hadij Over a year ago

@AlbertoAlvarez If you do split(','), then the spaces will remain there. For instance, for 'A1, B2, C', you will get ['A1', ' B2', ' C']. However, you don't want to have spaces there. .str is needed because a column (Series) has no .split method of its own; .str is the accessor that applies string methods to each element. It does not convert values: if an entry is something like the integer 1, the result for that entry is NaN, so convert the column with .astype(str) first if that can happen.
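The difference between the two separators discussed in these comments can be checked on a small example. This is a minimal sketch; the Series s and mixed are made-up illustration data, not part of the question.

```python
import pandas as pd

s = pd.Series(['A1, B2, C', 'A4, Z'])

# Literal comma: the space after each comma stays attached to the next item.
plain = s.str.split(',').tolist()

# Comma plus optional whitespace: the separator swallows the spaces.
clean = s.str.split(r',\s*').tolist()

print(plain)  # [['A1', ' B2', ' C'], ['A4', ' Z']]
print(clean)  # [['A1', 'B2', 'C'], ['A4', 'Z']]

# Hypothetical column that mixes strings with an integer.
mixed = pd.Series(['A1, B2', 1], dtype=object)

# The .str accessor does not convert: the integer entry comes back as NaN.
as_is = mixed.str.split(r',\s*')

# Converting to strings first makes every entry splittable.
converted = mixed.astype(str).str.split(r',\s*')
print(converted.tolist())  # [['A1', 'B2'], ['1']]
```

An alternative to the regular expression is to split on ',' and then strip each piece, but the single pattern keeps the whole operation in one call.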
