1

df

Hello Community,

in my dataframe i have three columns as follow:

restaurant_id, annee, moyenne_etoile_annee.

for each restaurant_id i would like to substract the "moyenne_etoiles_annee' of the last year minus the one from the first year.

By example for the first restaurant_id df[diff]=moyenne_etoiles_annee(2017)-moyenne_etoiles_annee(2015) 2-2.66=-0.66

I tried to .agg and select first and last value but wasn't able to execute properly.

Sample:

df = pd.DataFrame({"restaurant_id": [1,1,2,2],
                   "annee": [2015,2016,2019,2022],
                   "moyenne_etoile_annee": [1.2,1.4,1.3,1.3]}) 
3
  • Please post the data as code (text) so it's easy for us to copy to test a possible solution. Good job though, on giving us input and expected output, and explain what you tried. Commented Feb 25, 2021 at 5:23
  • i just created some sample data replicating df = pd.DataFrame({"A": [1,1,2,2], "B": [2015,2016,2019,2022],"C": [1.2,1.4,1.3,1.3]}) print (df) Commented Feb 25, 2021 at 5:29
  • @Aris - Super, added to quetion. Commented Feb 25, 2021 at 5:31

1 Answer 1

1

Solution working well if unique years per restaurant_id like in question data.

First use DataFrame.sort_values per both columns (if necessary) for first and last annee per restaurant_id, then subtract first and last value per restaurant_id in GroupBy.transform:

df1 = df.sort_values(['restaurant_id','annee'])
g = df1.groupby('restaurant_id')['moyenne_etoile_annee']
df1['diff'] = g.transform('last').sub(g.transform('first'))

Alternative with lambda function:

df1 = df.sort_values(['restaurant_id','annee'])
g = df1.groupby('restaurant_id')['moyenne_etoile_annee']
df1['diff'] = g.transform(lambda x: x.iat[-1] - x.iat[0])
Sign up to request clarification or add additional context in comments.

4 Comments

if the data frame has duplicate rows for the same year then we won't get proper results
@Jethen - hmmm, your answer has same problem.
i just tested the first solution and it worked. Perfect thanks
@Jethen - So if there are duplicated year per restaurant_id there is same problem - not seelcted first and last value. If you think my solution not working per restaurant_id then you are wrong, because using groupby

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.