0

I have a tab separated .txt file with several hundred thousand rows. One of the columns is 'date' and the date format is '14JAN2020' which is non-standard. I'm trying to convert this to a standard datetime. My code so far:

import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt

with open('C:/Path/to/file/myfile.txt') as completionFile:
     completionFile.read()

df = pd.read_csv('C:/Path/to/file/myfile.txt', sep='\t', header=0)
df.head()

job-id | employee_id | date | job_type | job_time

1234 | ABCD | 14JAN2020 | foo | bar

df["date"] = df['date'].str.replace(r'^((?:\D*\d){2})', r'\1-')
df["date"] = df['date'].str.replace(r'^((?:[^a-zA-Z0-9]*[a-zA-Z0-9]){5})(?=.+)', r'\1-')
df["date"] =pd.to_datetime(df['date'])

When I do this, I get KeyError: 'date'. I have 'date' as a column header and not as an index so struggling with why I'm getting this error here

1 Answer 1

1

Coerce the date to datetime using pd.to_datetime and state the datetime format.

   print(df)
    job-id employee_id       date        job_type job_time
0    1234        ABCD        14JAN2020      foo      bar




df['date']=pd.to_datetime(df['date'])
#df['date']=pd.to_datetime(df['date'], format='%Y%m%d')
print(df)
     job-id employee_id       date           job_type job_time
0    1234        ABCD         2020-01-14      foo      bar
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks @wwnde! that saved me a whole bunch of string replacement. The reason for the error was that I had an unseen header row in the original .txt file, so I eliminated that and then used your solution

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.