I would like to get the distinct count of values in a python pandas dataframe and write the result to a new column. This is what I have so far.
import pandas as pd
df = pd.DataFrame( {
'OrderNo': [1,1,1,1,2,2,2,3,3],
'Barcode': [1234,2345,3456,3456,1234,1234,2345,1234,3456]
} );
df['barcodeCountPerOrderNo'] = df.groupby(['OrderNo', 'Barcode'])['Barcode'].transform('count')
df['distinctBarcodesPerOrderNo'] = '?'
print df
This gives:
Barcode OrderNo barcodeCountPerOrderNo distinctBarcodesPerOrder
0 1234 1 1 ?
1 2345 1 1 ?
2 3456 1 2 ?
3 3456 1 2 ?
4 1234 2 2 ?
5 1234 2 2 ?
6 2345 2 1 ?
7 1234 3 1 ?
8 3456 3 1 ?
But how can I get the distinctBarcodesPerOrder?
Barcode OrderNo barcodeCountPerOrderNo distinctBarcodesPerOrder
0 1234 1 1 3
1 2345 1 1 3
2 3456 1 2 3
3 3456 1 2 3
4 1234 2 2 2
5 1234 2 2 2
6 2345 2 1 2
7 1234 3 1 2
8 3456 3 1 2
drop_duplicatesmethod. See the following document for the datails: pandas.pydata.org/pandas-docs/stable/generated/…