I want to replace some characters within a string in pandas (based on a match to the entirety of the string), while leaving the rest of the string unchanged.

For instance, replace a dash with a decimal point in a number string, but only if the dash isn't at the start of the string:

'26.15971' -> '26.15971'

'1030899' -> '1030899'

'26-404700' -> '26.404700'

'-26-403268' -> '-26.403268'

Code:

import pandas as pd

# --- simple dataframe
df = pd.DataFrame({'col1': ['26.15971', '1030899', '26-404700']})

# --- regex that only matches items of interest (raw string keeps \d intact)
regex_match = r'^\d{1,2}-\d{1,8}'
df.col1.str.match(regex_match)

# --- not sure how to replace only the middle hyphens?
# something like  df.col1.str.replace(r'^\d{1,2}(-)\d{1,8}', r'^\d{1,2}\.\d{1,8}') ??
# unclear how to get a repl that only alters a capture group and leaves the rest
# of the string unchanged

1 Answer

You could try using a regex replacement with lookarounds:

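# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df["col1"] = df["col1"].str.replace(r"(?<=\d)-(?=\d)", ".", regex=True)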

The regex pattern (?<=\d)-(?=\d) targets every dash that sits between two digits and replaces it with a dot. A leading minus sign has no digit before it, so it is left unchanged.
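As a quick check of the lookaround pattern, here is a minimal sketch run against the four sample values from the question. The variable name `result` is only illustrative, and `regex=True` is passed on the assumption that pandas 2.0 or later is in use.

```python
import pandas as pd

# Sample values from the question, including the negative-number case.
df = pd.DataFrame({"col1": ["26.15971", "1030899", "26-404700", "-26-403268"]})

# Replace a dash only when it has a digit on both sides.
# The leading minus sign has no digit before it, so it is not touched.
result = df["col1"].str.replace(r"(?<=\d)-(?=\d)", ".", regex=True)

print(result.tolist())
# ['26.15971', '1030899', '26.404700', '-26.403268']
```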

We could also approach this using capture groups:

# \1 and \2 re-insert the digits captured on each side of the dash
df["col1"] = df["col1"].str.replace(r"(\d{2,3})-(\d{4,8})", r"\1.\2", regex=True)
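If the replacement should apply only when the entire string has the expected shape, as the question asks, the capture-group version can be anchored with ^ and $. The sketch below is one possible variant, not part of the original answer: the digit ranges come from the question's own pattern, the optional leading minus sign is an assumption, and the value '12-34-56' is a made-up example of a string that should stay untouched.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": ["26.15971", "1030899", "26-404700", "-26-403268", "12-34-56"]}
)

# Group 1: optional minus sign plus 1-2 digits; group 2: 1-8 digits.
# Anchoring with ^ and $ means only strings that match in full are altered.
pattern = r"^(-?\d{1,2})-(\d{1,8})$"
df["col1"] = df["col1"].str.replace(pattern, r"\1.\2", regex=True)

print(df["col1"].tolist())
# ['26.15971', '1030899', '26.404700', '-26.403268', '12-34-56']
```

The lookaround version is shorter, but it would also rewrite every inner dash of a value like '12-34-56'; the anchored pattern leaves such strings alone.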
5 Comments

Very nice! So I think a positive lookbehind can't be variable width like (?<=\d{2,3}) which reduces flexibility in the match. Any thoughts?
@Mark_Anderson Actually, you can use a variable width positive lookahead, so this would be legitimate also: (?<=\d)-(?=\d{2,3})
Agreed, but is there a way to get flexibility in the lookbehind? I still really like the solution, but I'm wondering if there is a way to get full flexibility. (If the lookbehind can't be flexible, maybe use 3 capture groups, (\d{2,3})(?P<hypen>-)(\d{4,8}), and only swap out the middle capture group that holds the hyphen?)
@Mark_Anderson I don't know why you think you need this, but I updated my answer anyway. I think just asserting that even one digit be on either side of the dash should be OK logic here.
Just a general principle. More flexibility is more good. Mostly in case someone with a similar problem ends up on this post.
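
On the lookbehind question raised in these comments: Python's built-in re module, which pandas uses for str.replace, does reject a variable-width lookbehind such as (?<=\d{2,3}). One way to keep variable-width context is to capture the digits on both sides and rebuild the match with a callable replacement. The following is a rough sketch of that idea; the group names `head` and `tail` and the variable names are illustrative, not taken from the answer.

```python
import re
import pandas as pd

# The re module refuses to compile a variable-width lookbehind.
try:
    re.compile(r"(?<=\d{2,3})-(?=\d)")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Workaround: capture the digits around the dash with named groups
# and return the rebuilt text from a callable, changing only the dash.
pattern = r"(?P<head>\d{2,3})-(?P<tail>\d{4,8})"
s = pd.Series(["26.15971", "1030899", "26-404700", "-26-403268"])
fixed = s.str.replace(
    pattern,
    lambda m: f"{m.group('head')}.{m.group('tail')}",
    regex=True,  # a callable replacement requires regex=True
)

print(lookbehind_ok)   # False
print(fixed.tolist())  # ['26.15971', '1030899', '26.404700', '-26.403268']
```

A plain backreference string like r"\1.\2" does the same job here; the callable form is mainly useful when the replacement needs logic beyond re-inserting the groups.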
