How to get the distinct count of values in a python pandas dataframe

Question

I would like to get the distinct count of values in a python pandas dataframe and write the result to a new column. This is what I have so far.

import pandas as pd

df = pd.DataFrame({
    'OrderNo': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'Barcode': [1234, 2345, 3456, 3456, 1234, 1234, 2345, 1234, 3456],
})

df['barcodeCountPerOrderNo'] = df.groupby(['OrderNo', 'Barcode'])['Barcode'].transform('count')

# Placeholder for the column I am trying to compute
df['distinctBarcodesPerOrder'] = '?'

print(df)

This gives:

   Barcode  OrderNo  barcodeCountPerOrderNo distinctBarcodesPerOrder
0     1234        1                       1                       ?
1     2345        1                       1                       ?
2     3456        1                       2                       ?
3     3456        1                       2                       ?
4     1234        2                       2                       ?
5     1234        2                       2                       ?
6     2345        2                       1                       ?
7     1234        3                       1                       ?
8     3456        3                       1                       ?

But how can I get distinctBarcodesPerOrder, the number of distinct barcodes within each order? The desired result is:

   Barcode  OrderNo  barcodeCountPerOrderNo distinctBarcodesPerOrder
0     1234        1                       1                       3
1     2345        1                       1                       3
2     3456        1                       2                       3
3     3456        1                       2                       3
4     1234        2                       2                       2
5     1234        2                       2                       2
6     2345        2                       1                       2
7     1234        3                       1                       2
8     3456        3                       1                       2

You can use the drop_duplicates method. See the following document for the details: pandas.pydata.org/pandas-docs/stable/generated/…
– arnold, Commented May 8, 2017 at 12:22
I am sorry, but it is not clear how you expect to obtain the distinctBarcodesPerOrder column. Could you clarify? Perhaps df.distinctBarcodesPerOrder.unique() can do the trick?
– feedthemachine, Commented May 8, 2017 at 12:36
That won't work, since I would like to have the count of distinct barcodes per order (df.distinctBarcodesPerOrder.unique() gives the count over the entire dataframe).
– Jabb, Commented May 8, 2017 at 12:40

lanenok · Answer · 2017-05-08 12:44:36Z

3

You can use nunique to calculate the number of unique barcodes per order:

Barcode_distinct = df.groupby('OrderNo')['Barcode'].nunique()

The result is a pandas Series indexed by OrderNo:

> OrderNo
> 1    3
> 2    2
> 3    2
> Name: Barcode, dtype: int64

Then merge this Series with the original DataFrame:

df.merge(Barcode_distinct.to_frame(), left_on='OrderNo', right_index=True, suffixes=('', '_unique_per_OrderNo'))

The result is:

>    Barcode  OrderNo  Barcode_unique_per_OrderNo
> 0     1234        1                           3
> 1     2345        1                           3
> 2     3456        1                           3
> 3     3456        1                           3
> 4     1234        2                           2
> 5     1234        2                           2
> 6     2345        2                           2
> 7     1234        3                           2
> 8     3456        3                           2
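The two steps above can be put together as a self-contained sketch. It renames the Series before merging (the name distinctBarcodesPerOrder is borrowed from the question) so the new column does not depend on the suffixes argument, and it passes how='left' explicitly, which is an assumption about the intended join rather than something stated in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'OrderNo': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'Barcode': [1234, 2345, 3456, 3456, 1234, 1234, 2345, 1234, 3456],
})

# Number of distinct barcodes per order, as a Series indexed by OrderNo.
barcode_distinct = (
    df.groupby('OrderNo')['Barcode']
      .nunique()
      .rename('distinctBarcodesPerOrder')
)

# Join the per-order counts back onto every row of the original frame.
# merge returns a new DataFrame, so the result has to be assigned.
result = df.merge(
    barcode_distinct.to_frame(),
    left_on='OrderNo',
    right_index=True,
    how='left',
)

print(result)
```

Note that merge does not modify df in place; the joined frame is only available through the assigned variable.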


Thanks for being the first with a working solution. I accepted Fabio Lamanna's solution because it was slightly shorter.
– Jabb

Fabio Lamanna · Accepted Answer · 2017-05-08 13:19:34Z

1

I would use map to look up the number of unique barcodes for each row's OrderNo and assign the result directly to a new column of the original dataframe:

df['distinctBarcodesPerOrder'] = df['OrderNo'].map(df.groupby('OrderNo')['Barcode'].nunique())

which returns:

   Barcode  OrderNo  barcodeCountPerOrderNo  distinctBarcodesPerOrder
0     1234        1                       1                         3
1     2345        1                       1                         3
2     3456        1                       2                         3
3     3456        1                       2                         3
4     1234        2                       2                         2
5     1234        2                       2                         2
6     2345        2                       1                         2
7     1234        3                       1                         2
8     3456        3                       1                         2
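As a runnable sketch of the one-liner above, the lookup Series and the map step can be separated to make it clear what is happening. The variable names per_order and via_transform are chosen here for illustration. The last line shows an alternative that is not part of this answer: transform('nunique') broadcasts the per-group result back to the original rows in a single step.

```python
import pandas as pd

df = pd.DataFrame({
    'OrderNo': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'Barcode': [1234, 2345, 3456, 3456, 1234, 1234, 2345, 1234, 3456],
})

# Lookup table: OrderNo -> number of distinct barcodes in that order.
per_order = df.groupby('OrderNo')['Barcode'].nunique()

# map replaces each OrderNo value with its entry in the lookup table.
df['distinctBarcodesPerOrder'] = df['OrderNo'].map(per_order)

# Alternative (not from the answer): let groupby broadcast the result itself.
via_transform = df.groupby('OrderNo')['Barcode'].transform('nunique')

print(df)
```

Both approaches keep the original row order and index, so the new column lines up with the existing rows without a merge.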

edited May 8, 2017 at 13:19

answered May 8, 2017 at 13:11


For the sake of elegance and simplicity, this is the working and accepted answer. Thanks!
– Jabb

Allen Qin · Answer · 2017-05-08 19:50:35Z

If you want a one-liner, you can use apply to get distinctBarcodesPerOrder for each row, although this method might be slow on a large dataset because it filters the whole dataframe again for every row.

df['distinctBarcodesPerOrder'] = df.apply(lambda x: df.loc[df.OrderNo==x.OrderNo,'Barcode'].nunique(), axis=1)

df
Out[237]: 
   Barcode  OrderNo  barcodeCountPerOrderNo  distinctBarcodesPerOrder
0     1234        1                       1                         3
1     2345        1                       1                         3
2     3456        1                       2                         3
3     3456        1                       2                         3
4     1234        2                       2                         2
5     1234        2                       2                         2
6     2345        2                       1                         2
7     1234        3                       1                         2
8     3456        3                       1                         2
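To get a feel for the performance caveat, here is a rough timing sketch on synthetic data. The frame size, value ranges, and random seed are arbitrary assumptions made for illustration, and the groupby plus map comparison is borrowed from the other answer. Actual timings depend on the machine and pandas version, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data; the sizes and ranges are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
n_rows = 2000
big = pd.DataFrame({
    'OrderNo': rng.integers(0, 200, size=n_rows),
    'Barcode': rng.integers(1000, 1050, size=n_rows),
})

# Row-wise apply: every row re-filters the whole frame.
start = time.perf_counter()
row_wise = big.apply(
    lambda x: big.loc[big.OrderNo == x.OrderNo, 'Barcode'].nunique(),
    axis=1,
)
apply_seconds = time.perf_counter() - start

# Grouped version: one pass to count per order, one lookup per row.
start = time.perf_counter()
grouped = big['OrderNo'].map(big.groupby('OrderNo')['Barcode'].nunique())
map_seconds = time.perf_counter() - start

print(f"apply: {apply_seconds:.3f}s, groupby+map: {map_seconds:.3f}s")
print("same result:", row_wise.tolist() == grouped.tolist())
```

The two approaches should produce identical columns; only the amount of repeated work differs.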
