I'm trying to capture and replace by regex in a DataFrame column which contains a date, i want capture the following date format in the text "YYYY-MM-DD" it seems my syntax for capturing and replace is correct but some how it doesn't work.
lst_date_version = ["2021-10-10 rev. 002", "2021-11-28 rev. 003", "2021-09-27 rev. 008","2021-11-29 rev. 008", "2021-10-16 rev. 003", "2021-10-25 rev. 008","2021-11-03 rev. 003", "2021-04-12 rev. 008", "2021-03-19 rev. 004"]
df_test_date = pd.DataFrame({"Version":lst_date_version})
df_test_date["Version"] = df_test_date["Version"].str.replace(r"(\d{4}-\d{2})-(\d{2})", r"\1", regex=True)
print(df_test_date["Version"])
the result seems the day in the date format is remove (2021-10-10 rev. 002 ==> 2021-10 rev. 002):
0 2021-10 rev. 002
1 2021-11 rev. 003
2 2021-09 rev. 008
3 2021-11 rev. 008
4 2021-10 rev. 003
5 2021-10 rev. 008
6 2021-11 rev. 003
7 2021-04 rev. 008
8 2021-03 rev. 004
but when i do the following :
df_test_date["Version"] = df_test_date["Version"].str.replace(r"(\d{4}-\d{2})-(\d{2})", r"\0", regex=True)
print(df_test_date["Version"])
the result is :
0 rev. 002
1 rev. 003
2 rev. 008
3 rev. 008
4 rev. 003
5 rev. 008
6 rev. 003
7 rev. 008
8 rev. 004
in the meantime i find different way (invert the capture) to what i wanted to achieve by this:
df_test_date["Version"] = df_test_date["Version"].str.replace(r"(\srev.+)", r"\0", regex=True))
a big thanks for your help in advance :)
PS: adapted the questions based on remarks :)
rhave to do with your question? Please consider editing your tags in order for us to be more helpful to your problem.no changes to initial list... What happened to the day component of the date? You start off withYYYY-MM-DDand somehow end up with justYYYY-MM.