0

I have tried searching other posts on here, but can't seem to solve this problem. I have a CSV file in which Year, Crash_Month, Crash_Day and Crash_Time are all seperate columns in the CSV 'data_dict'. I am trying to solve the below question. How would I go about this? I have tried to use a data frame, and pandas convert to datetime, but I'm not sure if this is the right approach. Many thanks

Here's the data frame I'm trying to assign datetime to

    year  month  day       time
0   2000      1    1   4:30:59 
1   2000      1    1   0:07:35 
2   2000      1    1   4:51:37 
3   2000      1    1   4:27:56 
4   2000      1    1   2:16:31 
5   2000      1    1   0:37:21 
6   2000      1    1   0:52:57 
7   2000      1    1   3:35:14 
8   2000      1    1   2:41:58 
9   2000      1    1   3:43:02 
10  2000      1    1   3:49:19 
11  2000      1    1   3:03:55 
12  2000      1    1   4:46:01 
13  2000      1    1   1:07:24 
14  2000      1    1   8:29:04 
15  2000      1    1   6:35:21 
16  2000      1    1   6:06:25 
17  2000      1    1   7:10:13 
18  2000      1    1   10:57:24 
19  2000      1    1   7:54:38

So far, I have coded this.

import pandas as pd

df = pd.DataFrame({'year': (data_dict['Year']),
                   'month': (data_dict['Crash_Month']),
                   'day': (data_dict['Crash_Day']),
                   'time': (data_dict['Crash_Time'])})

date=pd.to_datetime(df[["year", "month", "day", "time"]],format='%YYYY%mm%dd, %HH%MM%SS')
print(date)

day_of_week = {0 : 'Monday',
              1: 'Tuesday',
              2: 'Wednesday',
              3: 'Thursday',
              4: 'Friday',
              5: 'Saturday',
              6: 'Sunday'}

month_season= {1: 'Summer',
              2: 'Summer',
              3: 'Autumn',
              4: 'Autumn',
              5: 'Autumn',
              6: 'Winter',
              7: 'Winter',
              8: 'Winter',
              9: 'Spring',
              10: 'Spring',
              11: 'Spring',
              12: 'Summer'}
8
  • 1
    can you provide a sample of your data? 5 rows will do Commented Apr 22, 2020 at 1:36
  • Hi @nickp please provide a mcve. Commented Apr 22, 2020 at 1:39
  • 2
    Please provide a reproducible copy of the DataFrame with to_clipboard. Stack Overflow Discourages Screenshots. It is likely the question will be downvoted. You are discouraging assistance because no one wants to retype your data or code, and screenshots are often illegible. Commented Apr 22, 2020 at 1:43
  • Heres the first few rows of the CSV file Commented Apr 22, 2020 at 1:49
  • Index,Age,Year,Crash_Month,Crash_Day,Crash_Time,Road_User,Gender,Crash_Type,Injury_Severity,Crash_LGA,Crash_Area_Type 1,37,2000,1,1,4:30:59,PEDESTRIAN,MALE,UNKNOWN,1,MARIBYRNONG,MELBOURNE 2,22,2000,1,1,0:07:35,DRIVER,MALE,ADJACENT DIRECTION,1,YARRA,MELBOURNE 3,47,2000,1,1,4:51:37,DRIVER,FEMALE,ADJACENT DIRECTION,0,YARRA,MELBOURNE 4,70,2000,1,1,4:27:56,DRIVER,MALE,ADJACENT DIRECTION,1,BANYULE,MELBOURNE 5,16,2000,1,1,2:16:31,PASSENGER,MALE,OPPOSING DIRECTION,0,MONASH,MELBOURNE Commented Apr 22, 2020 at 1:49

1 Answer 1

1

We can use str.zfill and string concenation with pd.to_datetime to build up your datetime.

df2['date'] = pd.to_datetime(df2['year'].astype(str) 
               + df2['month'].astype(str).str.zfill(2)
               + df2['day'].astype(str).str.zfill(2)
               + ' '
               + df2['time'].astype(str),format='%Y%m%d %H:%M:%S'
              )

    year  month  day       time                date
0   2000      1    1   4:30:59  2000-01-01 04:30:59
1   2000      1    1   0:07:35  2000-01-01 00:07:35
2   2000      1    1   4:51:37  2000-01-01 04:51:37
3   2000      1    1   4:27:56  2000-01-01 04:27:56
4   2000      1    1   2:16:31  2000-01-01 02:16:31
5   2000      1    1   0:37:21  2000-01-01 00:37:21
6   2000      1    1   0:52:57  2000-01-01 00:52:57
7   2000      1    1   3:35:14  2000-01-01 03:35:14
8   2000      1    1   2:41:58  2000-01-01 02:41:58
9   2000      1    1   3:43:02  2000-01-01 03:43:02
10  2000      1    1   3:49:19  2000-01-01 03:49:19
11  2000      1    1   3:03:55  2000-01-01 03:03:55
12  2000      1    1   4:46:01  2000-01-01 04:46:01
13  2000      1    1   1:07:24  2000-01-01 01:07:24
14  2000      1    1   8:29:04  2000-01-01 08:29:04
15  2000      1    1   6:35:21  2000-01-01 06:35:21
16  2000      1    1   6:06:25  2000-01-01 06:06:25
17  2000      1    1   7:10:13  2000-01-01 07:10:13
18  2000      1    1  10:57:24  2000-01-01 10:57:24
19  2000      1    1   7:54:38  2000-01-01 07:54:38
Sign up to request clarification or add additional context in comments.

2 Comments

thanks. However, this did not work- there are a few records with the string value "UNKNOWN", which is presenting an error for the whole function
you add in argument in your pd.to_datetime(...,errors='coerce') @nickp

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.