I have dates stored in several different types and formats in Python pandas.

1. 17/12/04 14:19:48.374835 <class 'str'>

2. 20100202072111 <class 'numpy.int64'>

3. 2.017120e+11 <class 'numpy.float64'>

4. 2018-04-04 AM 10:26:39 <class 'str'>

5. 17/12/18 13:00:04.204254 <class 'str'>

I have 5 different CSV files, loaded as df1['Timestamp'], df2['Timestamp'], df3['Timestamp'], df4['Timestamp'], df5['Timestamp'].

The column is named 'Timestamp' in every CSV file, and the data in each file is formatted as shown above.

The types differ between files, and even values of the same type 'str' can be formatted differently, as in #4 and #5.

In this case, how can I convert these values to an int such as yyyymmddhhss?

I want to drop the microseconds; the expected final result looks like 201911202322.

Everything is done in Python with pandas.

  • I'm interested: why is this format useful? I can see it being useful in str format, but why as an int? Commented Nov 23, 2019 at 8:25
  • @FChm The displayed date format might look different when an end user opens the CSV files in a third-party application. That is why standardizing the date format as an int can be useful. Commented Nov 24, 2019 at 16:30

1 Answer

You have to parse each date string into a datetime object, specifying the format used in each file. Use strptime:

from datetime import datetime

date = datetime.strptime("17/12/04 14:19:48.374835", "%y/%m/%d %H:%M:%S.%f")

To convert it to an int, you can use strftime() and int():

date_number = int(date.strftime("%Y%m%d%H%M%S"))

print(date_number)
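The question's expected result (201911202322) has 12 digits, while the format above produces 14 because it keeps the seconds. A small sketch of both variants, assuming the 12-digit target means year, month, day, hour and minute; the names with_seconds and without_seconds are only illustrative:

```python
from datetime import datetime

# Parse the first sample from the question.
date = datetime.strptime("17/12/04 14:19:48.374835", "%y/%m/%d %H:%M:%S.%f")

# Microseconds are dropped simply because %f is not in the output format.
with_seconds = int(date.strftime("%Y%m%d%H%M%S"))   # 14 digits
without_seconds = int(date.strftime("%Y%m%d%H%M"))  # 12 digits (assumed target)

print(with_seconds)     # 20171204141948
print(without_seconds)  # 201712041419
```

Pick whichever output format matches what the downstream consumer expects.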

I hope this helps you with your problem.

Edit: Example with dataframe:

import pandas as pd
from datetime import datetime

data = ["17/12/04 14:19:48.374835", "19/11/05 15:20:48.374835"]

df = pd.DataFrame(data, columns=['Timestamp'])

# Replace each timestamp string in the first DataFrame with its integer form
for idx, value in df['Timestamp'].items():
    date = datetime.strptime(value, "%y/%m/%d %H:%M:%S.%f")
    date_number = int(date.strftime("%Y%m%d%H%M%S"))
    df.loc[idx, 'Timestamp'] = date_number

Do this for every DataFrame and format you have or, for example, write a function that checks the format of each value and converts it to int.

I guess you can take it from there :)
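A function that checks the format, as suggested above, might look roughly like this. It is only a sketch: timestamp_to_int, STRING_FORMATS and NUMERIC_FORMATS are made-up names, and it assumes the float column (#3) holds 12-digit yyyymmddHHMM values, which the truncated sample 2.017120e+11 does not confirm. Numeric values are dispatched by digit count rather than by trial and error, because strptime would otherwise happily read a 12-digit value as a 14-digit layout with one-digit minutes and seconds.

```python
from datetime import datetime

# String layouts seen in the question (#1/#5 and #4).
STRING_FORMATS = (
    "%y/%m/%d %H:%M:%S.%f",  # 17/12/04 14:19:48.374835
    "%Y-%m-%d %p %I:%M:%S",  # 2018-04-04 AM 10:26:39
)

# Numeric layouts keyed by digit count; the 12-digit entry is an assumption.
NUMERIC_FORMATS = {
    14: "%Y%m%d%H%M%S",  # 20100202072111
    12: "%Y%m%d%H%M",    # e.g. 201712041419 (assumed layout of the float column)
}


def timestamp_to_int(value, out_format="%Y%m%d%H%M%S"):
    """Hypothetical helper: parse one raw value and return it as an int."""
    if isinstance(value, str):
        text = value.strip()
        candidates = STRING_FORMATS
    else:
        # int64 / float64 values become a plain digit string first.
        text = str(int(value))
        fmt = NUMERIC_FORMATS.get(len(text))
        candidates = (fmt,) if fmt else ()

    for fmt in candidates:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # layout did not match, try the next one
        return int(parsed.strftime(out_format))

    raise ValueError(f"unrecognised timestamp: {value!r}")


print(timestamp_to_int("17/12/04 14:19:48.374835"))  # 20171204141948
print(timestamp_to_int("2018-04-04 AM 10:26:39"))    # 20180404102639
print(timestamp_to_int(20100202072111))              # 20100202072111
```

Missing values (NaN) are not handled here and would need to be filtered out or caught before calling the function.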

3 Comments

Nice answer, but to be more comprehensive it would be nice to show how this works on a pd.Series of these dates.
@FChm Can you help me implement this on a pd.Series?
An easy solution would be to look at the .apply method of the pd.Series object.
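For reference, the .apply approach on a pd.Series could be sketched like this, using the two samples of format #1/#5 from the question. The helper name to_int is made up for illustration, and the second variant (pd.to_datetime with an explicit format) is a vectorised alternative, not something proposed in the answer itself:

```python
from datetime import datetime

import pandas as pd

# Sample values copied from formats #1 and #5 in the question.
timestamps = pd.Series(["17/12/04 14:19:48.374835", "17/12/18 13:00:04.204254"])


def to_int(text):
    """Hypothetical helper: parse one string and return yyyymmddHHMMSS as int."""
    parsed = datetime.strptime(text, "%y/%m/%d %H:%M:%S.%f")
    return int(parsed.strftime("%Y%m%d%H%M%S"))


# Option 1: call the helper once per element with Series.apply.
as_int_apply = timestamps.apply(to_int)

# Option 2: parse the whole column at once, then format and cast.
parsed = pd.to_datetime(timestamps, format="%y/%m/%d %H:%M:%S.%f")
as_int_vectorised = parsed.dt.strftime("%Y%m%d%H%M%S").astype("int64")

print(as_int_apply.tolist())       # [20171204141948, 20171218130004]
print(as_int_vectorised.tolist())  # [20171204141948, 20171218130004]
```

Both options give the same integers; the vectorised one tends to be the better fit for large columns.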
