3

I might be doing this wrong, or there might be a much better way, as I am still new to Python. Apologies upfront for any obvious mistakes.

I have a Pandas DataFrame with a STR column that holds a date and time. It is STR because the times are "Broadcast" formatted, which means the hours of a day run up to 29, so we will see dates like 01/Jan/2018 29:59:59. Add 1 second to that and it's 02/Jan/2018 06:00:00.

My goal here is to convert this data to real time, which means any hour between 24 and 29 requires a date shift too. I have already split the STR into 2 new columns, ['Dt'] and ['Ti'], then pulled the hour out of ['Ti'] into a new column ['Hr'] and made it an INT.

I then applied a pd.to_datetime to the ['Dt'] and added a rule.

df['Dt'] = np.where(df['Hr'] > 23, df['Dt']+pd.DateOffset(1),df['Dt']+pd.DateOffset(0) )

This works perfectly.

I now need to change the hour to be real time, e.g. 24 = 00, 25 = 01 etc.

I thought the best way was to use a DICT and map it, so I made a DICT:

HourMap = {'24':'00','25':'01','26':'02','27':'03','28':'04','29':'05','30':'06'}  

Then wrote this

df['Hr1'] = np.where(df['Hr'] > 23, df.replace({'Hr':HourMap}),df['Hr'])

But I get a "ValueError"

ValueError: operands could not be broadcast together with shapes (273,) (273,29) (273,)

I have looked at those rows in the dataframe and they are just normal INTs. On testing I can apply maths to them (e.g. df['Test'] = df['Hr'] + 1).

I did convert them to STR and try the same rules, but got the same error.

Am I just crazy?

Thanks,

1
  • Don't use a dictionary, use the modulo operator, i.e. %. So it's just the hour % 24. Commented Oct 29, 2018 at 11:37

2 Answers

4

I believe you need to change:

df.replace({'Hr':HourMap})

to map, and if some values are not matched and come back as NaN, replace them with the original values using fillna:

df['Hr'].map(HourMap).fillna(df['Hr'])
#alternative solution if performance is not important in large df
#df['Hr'].replace(HourMap)

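because df.replace returns every column of the DataFrame (with column Hr replaced), not a single column, so np.where receives a 2-D argument that cannot be broadcast against the 1-D condition.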
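A minimal sketch of the difference, on a small made-up frame (the `Other` column and the sample hours are assumptions for illustration, not the asker's data). It uses integer keys in the dictionary, since `Hr` holds ints:

```python
import pandas as pd

# Hypothetical sample data; "Other" stands in for the remaining columns.
df = pd.DataFrame({"Hr": [6, 23, 24, 29], "Other": ["a", "b", "c", "d"]})

# Integer keys, matching the integer dtype of df["Hr"].
hour_map = {24: 0, 25: 1, 26: 2, 27: 3, 28: 4, 29: 5}

# DataFrame.replace returns the whole frame (2-D).
replaced_frame = df.replace({"Hr": hour_map})
# Series.map returns one column (1-D), with NaN where no key matched.
mapped_column = df["Hr"].map(hour_map)

print(replaced_frame.shape)  # (4, 2)
print(mapped_column.shape)   # (4,)

# Fill the unmatched hours back in from the original column.
df["Hr1"] = mapped_column.fillna(df["Hr"]).astype(int)
print(df["Hr1"].tolist())    # [6, 23, 0, 5]
```

The 2-D shape of `replaced_frame` is what np.where could not broadcast against the 1-D condition and fallback column.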


4 Comments

Thanks for pointing me at .map. In this case, I get NaN for every matched INT within the Dict. As there will only ever be 24 - 29, due to the restrictions at source, all potential outputs are mapped in the Dict. When I use df['Hr1'] = np.where(df['Hr'] > 23, df['Hr'].map(HourMap),df['Hr']), any of the items within the Dict come back as NaN rather than the mapped value, e.g. 26 = 02?
@Runawaygeek - I see the problem: the dictionary uses strings. You need to change HourMap = {'24':'00','25':'01','26':'02','27':'03','28':'04','29':'05','30':'06'} to HourMap = {24: 0, 25: 1, 26: 2, 27: 3, 28: 4, 29: 5, 30: 6}, with integer keys to match the INT column (integer literals with a leading zero, such as 01, are a syntax error in Python 3).
But why use a dictionary at all? There is a constant difference here, you can literally just subtract 24 to get the value.
Thanks, I also quickly read more about .map and see I don't need to pass it inside the np.where function either. Made the adjustments and now it works.
2

You really shouldn't be using a dictionary here, and you don't even need the np.where. Use the modulo operator:

In [1]: import numpy as np
In [2]: np.arange(31)%24
Out[2]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23,  0,  1,  2,  3,  4,  5,  6], dtype=int32)

You have numbers that 'wrap around' at 24; this is the textbook use case for modulo. So the full code just becomes:

df['Hr1'] = df['Hr'] % 24

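Also, by the same token, you can add to your dates without np.where by making use of integer division. pd.DateOffset takes a scalar count rather than a column, so convert the per-row day counts with pd.to_timedelta:

df['Dt'] = df['Dt'] + pd.to_timedelta(df['Hr'] // 24, unit='D')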
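Put together, the modulo and integer-division steps might look like the sketch below. The sample frame and the `Dt1` column name are made up for illustration. The day shift uses `pd.to_timedelta` because `pd.DateOffset` expects a scalar count, not a per-row column.

```python
import pandas as pd

# Hypothetical sample: the date column is already parsed and the broadcast
# hour is an int, as in the question.
df = pd.DataFrame({
    "Dt": pd.to_datetime(["2018-01-01"] * 4),
    "Hr": [6, 23, 24, 29],
})

# Hours wrap around at 24: 24 -> 0, 29 -> 5; hours below 24 are unchanged.
df["Hr1"] = df["Hr"] % 24

# Integer division gives the number of whole days to add (0 or 1 here).
df["Dt1"] = df["Dt"] + pd.to_timedelta(df["Hr"] // 24, unit="D")

print(df["Hr1"].tolist())
# [6, 23, 0, 5]
print(df["Dt1"].dt.strftime("%Y-%m-%d").tolist())
# ['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02']
```

Both columns come from the same `Hr` value, so no np.where condition or lookup table is involved.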

1 Comment

Thanks for the clean up knowledge. Not come across that in readings, but have made a note and will read up on it tonight. :-)
