
I have data like this, without z1. I need to add a column z1 to the DataFrame with values as in the example: z1 should hold the z value from one day earlier for the same Start value.

[Image: example data]

I was thinking it could be done with apply and a lambda in pandas, but I'm not sure how to define the lambda function:

import pandas as pd

data = pd.read_csv("....")

# incomplete attempt: not sure how to write the lambda
data["Z"] = data[["Start", "Z"]].apply(lambda x:
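For reference, a minimal sketch of how the apply/lambda idea could be written. It assumes the lowercase column names date, start and z used in the answer below, reuses the sample values from that answer's printed output, and assumes each (start, date) pair occurs only once. It is row-wise, so it is likely slower than a vectorized approach on large frames.

```python
import pandas as pd

# Hypothetical sample data built from the values printed in the answer
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-12-01", "2012-12-01", "2012-12-01", "2012-12-02",
                            "2012-12-02", "2012-12-01", "2012-12-03"]),
    "start": [324, 384, 349, 855, 324, 341, 324],
    "z": [564545, 5555, 554, 635, 56, 98, 888],
})

# Lookup table (start, date) -> z; assumes each (start, date) pair is unique
lookup = df.set_index(["start", "date"])["z"].to_dict()

one_day = pd.Timedelta(days=1)

# For every row, fetch z of the same start on the previous day, NaN if missing
df["z1"] = df.apply(
    lambda row: lookup.get((row["start"], row["date"] - one_day), float("nan")),
    axis=1,
)
print(df)
```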
  • Why 564545 in the last row? Isn't it supposed to be 56? If you want the z value from one day before for the same Start date, it would correspond to 32400000 2012-10-02 (row 7) instead of 32400000 2012-10-01 (row 2). Commented Jul 18, 2016 at 18:19
  • Yes, you're correct, it's a mistake in the example given to me. Commented Jul 18, 2016 at 18:22

1 Answer


You can use DataFrameGroupBy.shift with merge:

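import pandas as pd

# convert to datetime if the column is not already datetime
df['date'] = pd.to_datetime(df.date)
# shift(freq=...) moves the index itself, so 'date' must be a DatetimeIndex
df.set_index('date', inplace=True)
df1 = df.groupby('start')['z'].shift(freq='1D', periods=1).reset_index()
print(pd.merge(df.reset_index(), df1, on=['start', 'date'], how='left', suffixes=('', '1')))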

        date  start       z        z1
0 2012-12-01    324  564545       NaN
1 2012-12-01    384    5555       NaN
2 2012-12-01    349     554       NaN
3 2012-12-02    855     635       NaN
4 2012-12-02    324      56  564545.0
5 2012-12-01    341      98       NaN
6 2012-12-03    324     888      56.0
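A variant sketch that swaps groupby(...).shift(freq=...) for a plain Timedelta shift of the date column followed by the same left merge. This is a different technique from the answer's, not its code; it does not need a DatetimeIndex. The sample frame is rebuilt from the printed output above and is an assumption.

```python
import pandas as pd

# Hypothetical sample data rebuilt from the printed output
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-12-01", "2012-12-01", "2012-12-01", "2012-12-02",
                            "2012-12-02", "2012-12-01", "2012-12-03"]),
    "start": [324, 384, 349, 855, 324, 341, 324],
    "z": [564545, 5555, 554, 635, 56, 98, 888],
})

# Copy of (start, date, z) with each date moved one day forward, so a row dated D
# in the copy carries the z value that was observed on D - 1 day
prev = df[["start", "date", "z"]].copy()
prev["date"] = prev["date"] + pd.Timedelta(days=1)
prev = prev.rename(columns={"z": "z1"})

# Left merge keeps every original row in order; z1 is NaN without a previous-day value
out = df.merge(prev, on=["start", "date"], how="left")
print(out)
```

If a (start, date) pair appeared more than once, the merge would duplicate rows, so this assumes those pairs are unique, as in the example.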

EDIT:

To set z1 to 0 for rows whose start value appears more than once, find the duplicates and fill their missing values with 0:

df['date'] = pd.to_datetime(df.date)
df.set_index('date', inplace=True)
df1 = df.groupby('start')['z'].shift(freq='1D', periods=1).reset_index()
df2 = pd.merge(df.reset_index(), df1, on=['start', 'date'], how='left', suffixes=('', '1'))
# keep=False marks every row whose start value is repeated
mask = df2.start.duplicated(keep=False)
# .ix was removed in pandas 1.0; .loc does the same boolean selection here
df2.loc[mask, 'z1'] = df2.loc[mask, 'z1'].fillna(0)
print(df2)
        date  start       z        z1
0 2012-12-01    324  564545       0.0
1 2012-12-01    384    5555       NaN
2 2012-12-01    349     554       NaN
3 2012-12-02    855     635       NaN
4 2012-12-02    324      56  564545.0
5 2012-12-01    341      98       NaN
6 2012-12-03    324     888      56.0
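A small sketch of just the EDIT step, to show what duplicated(keep=False) selects. The frame is a hypothetical reduction of the merged result above, keeping only the start and z1 columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: start and z1 as they look right after the merge
df2 = pd.DataFrame({
    "start": [324, 384, 349, 855, 324, 341, 324],
    "z1": [np.nan, np.nan, np.nan, np.nan, 564545.0, np.nan, 56.0],
})

# keep=False flags every occurrence of a repeated start, not only the later ones
mask = df2["start"].duplicated(keep=False)

# Only rows with a repeated start get their missing z1 replaced by 0
df2.loc[mask, "z1"] = df2.loc[mask, "z1"].fillna(0)
print(df2)
```

With the default keep='first', row 0 (the first 324) would not be flagged and would stay NaN.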

15 Comments

That's great, thanks! But how come with a different data set I get NotImplementedError: Not supported for type Index?
It looks like you forgot the DatetimeIndex: df.set_index('date', inplace=True).
What does print(df.index) show before df1 = df.groupby('start')['z'].shift(freq='1D', periods=1).reset_index()?
I tried df['date'] = pd.to_datetime(pd.Series(['date']), format="%Y-%m-%d")
but it tells me that date doesn't match the format; I checked the data itself and nothing is wrong with it.
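A possible explanation, sketched under the assumption that the column is literally named date and holds strings like 2012-12-01: pd.Series(['date']) builds a one-element Series containing the string "date" rather than selecting the column, so it cannot match "%Y-%m-%d". The small frame below is hypothetical.

```python
import pandas as pd

# Hypothetical frame with string dates
df = pd.DataFrame({"date": ["2012-12-01", "2012-12-02", "2012-12-03"]})

# pd.Series(["date"]) holds the literal text "date", not the column's values
try:
    pd.to_datetime(pd.Series(["date"]), format="%Y-%m-%d")
    literal_failed = False
except ValueError:
    literal_failed = True

# Passing the column itself parses every value
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
df = df.set_index("date")  # gives the DatetimeIndex mentioned in the comments
print(literal_failed, type(df.index).__name__)
```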
