
I have a dataframe that looks like this:

   Film      Description       
0  Batman    Viewed in 2021-10-04T14:30:31Z City Hall, London
1  Superman  Aired 2012-01-04R11:01:10Z in the USA first
2  Hulk      2010-07-04S07:22:02Z Still being produced

I want to remove the date-time from each row in the 'Description' column, to look like this:

    Film      Description      
0   Batman    Viewed in City Hall, London
1   Superman  Aired in the USA first
2   Hulk      Still being produced

I have attempted this string regex:

df['Description'] = df['Description'].str.replace(r'\^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z', '')

3 Answers


\^ matches a literal caret symbol. Your strings contain no caret, so the pattern never matches.

Besides T, your timestamps also use R and S as the separator between the date and the time, so those must be allowed as well.
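A quick check with Python's re module illustrates both problems. This is a small sketch using the three descriptions from the question. The second pattern is a hypothetical variant of the original with the caret dropped, included only to isolate the T/R/S issue.

```python
import re

descriptions = [
    'Viewed in 2021-10-04T14:30:31Z City Hall, London',
    'Aired 2012-01-04R11:01:10Z in the USA first',
    '2010-07-04S07:22:02Z Still being produced',
]

# The question's pattern: \^ is an escaped caret, i.e. a literal "^" character
original = r'\^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z'
# Hypothetical variant: same timestamp pattern without the caret, still only "T"
t_only = r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z'

original_hits = [bool(re.search(original, d)) for d in descriptions]
t_only_hits = [bool(re.search(t_only, d)) for d in descriptions]

print(original_hits)  # [False, False, False]
print(t_only_hits)    # [True, False, False]
```

Even without the caret, only the Batman row matches until R and S are accepted as separators.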

Use

\s*\b\d{4}-\d{2}-\d{2}[TRS]\d{2}:\d{2}:\d{2}Z\b

See proof.

EXPLANATION

--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  \d{4}                    digits (0-9) (4 times)
--------------------------------------------------------------------------------
  -                        '-'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  -                        '-'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  [TRS]                    any character of: 'T', 'R', 'S'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  Z                        'Z'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
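Applied to the DataFrame, the pattern might be used as below. This is a minimal sketch that assumes pandas is installed and rebuilds the question's sample data. regex=True is passed explicitly because pandas 2.0 changed the default of Series.str.replace to literal (non-regex) matching. The trailing .str.strip() is an extra step not in this answer: the pattern only consumes whitespace before the timestamp, so a timestamp at the very start of a string (the Hulk row) would otherwise leave a leading space.

```python
import pandas as pd

# Rebuild the question's sample data
df = pd.DataFrame({
    'Film': ['Batman', 'Superman', 'Hulk'],
    'Description': [
        'Viewed in 2021-10-04T14:30:31Z City Hall, London',
        'Aired 2012-01-04R11:01:10Z in the USA first',
        '2010-07-04S07:22:02Z Still being produced',
    ],
})

pattern = r'\s*\b\d{4}-\d{2}-\d{2}[TRS]\d{2}:\d{2}:\d{2}Z\b'

# Remove the timestamp plus the whitespace in front of it, then trim
# any leading space left when the timestamp opened the string
df['Description'] = (
    df['Description']
    .str.replace(pattern, '', regex=True)
    .str.strip()
)
print(df)
```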

2 Comments

Does this remove the datetime regardless of where it is in the string (i.e. at the beginning, middle, end, etc.)?
@user341383 Yes.
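One way to confirm this is to run the pattern with re.sub, which replaces every match wherever it occurs. The "end" string below is a hypothetical description made up for illustration; the other two come from the question.

```python
import re

pattern = r'\s*\b\d{4}-\d{2}-\d{2}[TRS]\d{2}:\d{2}:\d{2}Z\b'

samples = {
    'start': '2010-07-04S07:22:02Z Still being produced',
    'middle': 'Viewed in 2021-10-04T14:30:31Z City Hall, London',
    'end': 'Last shown 2012-01-04R11:01:10Z',  # hypothetical example
}

# re.sub scans the whole string, so the position of the timestamp does not matter
cleaned = {where: re.sub(pattern, '', text) for where, text in samples.items()}
for where, text in cleaned.items():
    print(f'{where}: {text!r}')
```

Note that a timestamp at the start leaves a leading space, because \s* only consumes whitespace before the match.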

I haven't gone as far as replicating your dataframe, but your regex is not going to work: a caret ^ would lock the match to the beginning of the string (and escaped as \^ it matches a literal caret instead), and the T in there will only match one of those descriptions.

Try:

(\d{4}-\d{2}-\d{2}[TSR]\d{2}:\d{2}:\d{2})Z
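A minimal sketch of this pattern in pandas, assuming pandas is installed and using the question's sample data. Because the pattern does not consume the surrounding spaces, the result keeps a double space (or a leading space). The whitespace cleanup at the end is an extra step that is not part of this answer.

```python
import pandas as pd

# Rebuild the question's sample data
df = pd.DataFrame({
    'Film': ['Batman', 'Superman', 'Hulk'],
    'Description': [
        'Viewed in 2021-10-04T14:30:31Z City Hall, London',
        'Aired 2012-01-04R11:01:10Z in the USA first',
        '2010-07-04S07:22:02Z Still being produced',
    ],
})

pattern = r'(\d{4}-\d{2}-\d{2}[TSR]\d{2}:\d{2}:\d{2})Z'

# Remove the timestamp itself; the spaces on either side are left behind
removed = df['Description'].str.replace(pattern, '', regex=True)

# Extra cleanup (not part of this answer): collapse runs of whitespace
# into a single space and trim both ends
cleaned = removed.str.replace(r'\s{2,}', ' ', regex=True).str.strip()

print(removed.tolist())
print(cleaned.tolist())
```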


Use str.replace to remove:

any non-whitespace characters before a colon, OR any non-whitespace characters after a colon, OR the colon itself.

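    # regex=True is needed on pandas >= 2.0, where str.replace defaults to literal matching
    df['Description'] = df['Description'].str.replace(r'\S+(?=[:])|(?<=[:])\S+|[:]', '', regex=True)
    print(df)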



       Film             Description
0    Batman  Viewed in  City Hall, London
1  Superman       Aired  in the USA first
2      Hulk          Still being produced
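Because this pattern keys on colons rather than on the timestamp format, it removes every whitespace-delimited token that contains a colon. The sketch below compares it with the timestamp-specific pattern from the first answer on a hypothetical description (made up for illustration) that contains a colon outside the timestamp.

```python
import re

colon_pattern = r'\S+(?=[:])|(?<=[:])\S+|[:]'
timestamp_pattern = r'\s*\b\d{4}-\d{2}-\d{2}[TRS]\d{2}:\d{2}:\d{2}Z\b'

# Hypothetical description: "16:9" is not a timestamp but contains a colon
text = 'Aspect ratio 16:9, aired 2012-01-04R11:01:10Z'

colon_result = re.sub(colon_pattern, '', text)
timestamp_result = re.sub(timestamp_pattern, '', text)

print(repr(colon_result))      # 'Aspect ratio  aired '
print(repr(timestamp_result))  # 'Aspect ratio 16:9, aired'
```

On the question's data both approaches work; the difference only shows up when other colons are present.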
