
I am trying to fix wrongly formatted datetime strings by inserting a space between their 'day' and 'hours' sections (as seen in the data below).

The data below are the incorrectly formatted values, which I filtered out using a regex. I can't figure out how to insert the missing space. Could I do it with a regex? I'm new to regex and would appreciate it if anyone could share their experience!

df[df['time'].str.contains(r'\.[0-9][0-9][0-9][0-9]\:')]['time']
20192    2010.12.2512:01:45
20193    2010.12.2512:02:52
20196    2010.12.2512:07:43
20197    2010.12.2512:25:33
42811    2013.01.0103:04:10
43527    2013.05.1605:49:49
  • Why not just slice the first and last part and join them with a space? Commented Oct 2, 2020 at 17:38
  • What is wrongly formatted for you? Why not simply adjust the format for the parser? Commented Oct 2, 2020 at 18:21
  • @ThierryLathuille: explicit slicing might not work if you encounter a combination of "valid" and "invalid" formats. Commented Oct 2, 2020 at 18:39

1 Answer

Your regex isn't far off; you just need to add capturing groups, between which you can place the missing space. For example:

import pandas as pd

df = pd.DataFrame({'time': ['2010.12.2512:01:45', '2010.12.2512:02:52', '2010.12.25 12:07:43']})

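# Group 1 captures '.DD' and group 2 captures 'HH:'; the replacement puts a space between them.
# The pattern is a raw string so that '\.' is not treated as a string escape.
df['time'].replace(r'(\.[0-9]{2})([0-9]{2}:)', r'\1 \2', regex=True)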
# 0    2010.12.25 12:01:45
# 1    2010.12.25 12:02:52
# 2    2010.12.25 12:07:43
# Name: time, dtype: object

Note that I've added one "valid" string at the end of the list. It's not modified by the regex replace.
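Once every string has the same layout, the column can be parsed with a single explicit format. The following is a minimal sketch rather than a definitive recipe: the names `s`, `fixed` and `parsed` are made up for illustration, and the format string `'%Y.%m.%d %H:%M:%S'` is an assumption based on the sample data in the question.

```python
import pandas as pd

# Hypothetical sample: two strings with the missing space, one already valid.
s = pd.Series(['2010.12.2512:01:45', '2010.12.25 12:07:43', '2013.01.0103:04:10'])

# Insert a space only where the two day digits run straight into the hour.
fixed = s.str.replace(r'(\.\d{2})(\d{2}:)', r'\1 \2', regex=True)

# Assumed format, inferred from the sample data; adjust it if the real data differs.
parsed = pd.to_datetime(fixed, format='%Y.%m.%d %H:%M:%S')
print(parsed)
```

If some rows still fail to parse, passing `errors='coerce'` to `pd.to_datetime` turns them into `NaT`, which makes the remaining bad values easy to find.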
