I have a column in a pandas dataframe where some of the values are in this format: "From https://....com?gclid=... to https://...com". I would like to parse only the first URL so that the gclid and other IDs are removed, and then map the result back into the dataframe, e.g.: "From https://....com to https://...com"
I know that there is a Python module called urllib, but if I apply it to this string and call path() on it, it just parses the first URL and I lose the other part, which is as important as the first one.
Could somebody please help me? Thank you!
If the value always has the form "From https://....com to https://...com", then you can use text.replace("From ", "").replace(" to ", ' ').split(" ") to get the list ["https://....com", "https://...com"].
If the value contains ?gclid= ..., then you can try to use a regex to replace it.
Alternatively, once you have ["https://....com?gclid=", "https://...com"], you can get the first element from the list and split('?') to remove it.
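The split-based suggestion above can be sketched roughly as follows. This is only a minimal sketch: the function name strip_first_url_query and the example.com / example.org URLs are made up for illustration, and it assumes every matching value starts with "From " and separates the two URLs with " to ". Values that do not fit that pattern (including missing values) are returned unchanged.

```python
def strip_first_url_query(text):
    """Drop everything from '?' onward in the first URL; keep the second URL as-is."""
    # Leave non-strings (e.g. NaN) and values in other formats untouched.
    if not isinstance(text, str) or not text.startswith("From ") or " to " not in text:
        return text
    # Cut off the "From " prefix, then split once on " to " into the two URLs.
    first, second = text[len("From "):].split(" to ", 1)
    # Keep only the part of the first URL before the query string.
    first = first.split("?", 1)[0]
    return f"From {first} to {second}"


example = "From https://example.com/page?gclid=abc123&x=1 to https://example.org"
print(strip_first_url_query(example))

# With pandas, the function could be mapped over the column
# (the column name "referrer" is hypothetical):
# df["referrer"] = df["referrer"].map(strip_first_url_query)
```

Because the second URL is never parsed, it is carried over exactly as it was, which avoids the problem of losing it when only the first URL is handed to urllib.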