I'm trying to make my script less resource-heavy, or find simpler Python code, for the following problem:
Example Table (dataset.xlsx):
no order materials status Status_id
1 1000 100 available 1
2 1000 200 not available 3
3 1001 500 Feb-20 2
4 1002 400 available 1
5 1002 300 not available 3
6 1002 600 available 1
7 1002 900 available 1
8 1003 700 available 1
9 1003 800 available 1
I want to add a new column that repeats the maximum Status_id of each order on every row of that order.
df = dataset  # DataFrame holding the contents of dataset.xlsx
# Broadcast each order's maximum Status_id back onto every row of that order
df['Max'] = df.groupby('order')['Status_id'].transform('max')
df
and I get:
no order materials status Status_id Max
1 1000 100 available 1 3
2 1000 200 not available 3 3
3 1001 500 Feb-20 2 2
4 1002 400 available 1 3
5 1002 300 not available 3 3
6 1002 600 available 1 3
7 1002 900 available 1 3
8 1003 700 available 1 1
9 1003 800 available 1 1
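For anyone who wants to reproduce this without the Excel file, here is a minimal self-contained sketch. It assumes the sample rows are typed in by hand (the inline DataFrame is only for illustration; my real data comes from dataset.xlsx) and applies the same groupby/transform step:

```python
import pandas as pd

# Sample rows copied by hand from the example table (illustration only).
df = pd.DataFrame({
    "no": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "order": [1000, 1000, 1001, 1002, 1002, 1002, 1002, 1003, 1003],
    "materials": [100, 200, 500, 400, 300, 600, 900, 700, 800],
    "status": [
        "available", "not available", "Feb-20",
        "available", "not available", "available",
        "available", "available", "available",
    ],
    "Status_id": [1, 3, 2, 1, 3, 1, 1, 1, 1],
})

# For each order, compute the maximum Status_id and repeat it on every row
# belonging to that order.
df["Max"] = df.groupby("order")["Status_id"].transform("max")

print(df)
```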
It looks simple and works on small data sets, but my actual data has 80k+ rows and up to 80 distinct Status_id values, and it takes hours to compute.
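To help narrow this down, here is a rough sketch that generates synthetic data of about the same size, so the step can be tried in isolation. The number of distinct orders (20,000), the random values, and the name `big` are assumptions made up for the test, not taken from my real file. It also prints the column dtypes and compares the transform result with an equivalent lookup via `Series.map`; I am not claiming one is faster than the other.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 80,000 rows, Status_id in 1..80.
# The 20,000 distinct order numbers are an assumed figure.
rng = np.random.default_rng(0)
n_rows = 80_000
big = pd.DataFrame({
    "order": rng.integers(1000, 21000, size=n_rows),
    "Status_id": rng.integers(1, 81, size=n_rows),
})

# Show the column dtypes, e.g. to see whether Status_id is numeric.
print(big.dtypes)

# Same approach as in the question: per-order maximum repeated on each row.
via_transform = big.groupby("order")["Status_id"].transform("max")

# Equivalent formulation: compute one maximum per order, then look it up
# for every row by its order number.
per_order_max = big.groupby("order")["Status_id"].max()
via_map = big["order"].map(per_order_max)

# Both formulations should produce identical values.
same = bool((via_transform.to_numpy() == via_map.to_numpy()).all())
print(same)
```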
Any suggestions?