0

I'm trying to make my script less resource heavy or just looking for an easier code for python to process for the following problem:

Example Table (dataset.xlsx):

no order materials status         Status_id
1  1000  100       available       1 
2  1000  200       not available   3 
3  1001  500       Feb-20          2 
4  1002  400       available       1 
5  1002  300       not available   3 
6  1002  600       available       1 
7  1002  900       available       1 
8  1003  700       available       1 
9  1003  800       available       1 

I wanted to get the new column that duplicates max Status_id per order.

df=dataset
df.groupby('Status_id').max()
df['Max'] = df.groupby('order')['Status_id'].transform('max')
df

and I get:

no order materials status         Status_id   Max
1  1000  100       available       1          3
2  1000  200       not available   3          3
3  1001  500       Feb-20          2          2
4  1002  400       available       1          3
5  1002  300       not available   3          3
6  1002  600       available       1          3
7  1002  900       available       1          3
8  1003  700       available       1          1
9  1003  800       available       1          1

Although it looks simple and it works with small sets of data, but my actual data has 80k+ rows of data and maximum of 80 status_ids, and so it takes hours to calculate all that.

any suggestions?

1
  • IMO for large file I prefer using Dask(dask.org). Dask will automatically prallelize your operations. It also provides an API almost equal as the pandas one, so you are going to feel comfortable with it. Commented Feb 21, 2020 at 7:47

1 Answer 1

1

You can try to sort by 'Status_id' first and then take the last value from each group:

df = df.sort_values('Status_id')
df['Max'] = df.groupby('order')['Status_id'].transform('last')
Sign up to request clarification or add additional context in comments.

3 Comments

Do you have some test? It should be interesting see if it is faster.
@jezrael No. Maybe you can test it.
Idea for improve answer, be free use sample dataset from this

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.