Replacing non-alphanumeric characters in regex match using Python

Question

I have a text file (Verilog) that contains certain string sequences (escaped identifiers) that I want to modify. In the example below, I want to find any group starting with '\' and ending with ' ' (any printable character can be in between). After finding a group that matches these criteria, I want to replace all non-alphanumeric characters with alphanumeric ones (I don't really care which alphanumeric characters they get replaced with).

In[1]:  here i$ \$0me text to \m*dify
Out[1]: here i$ aame text to madify

I have no problem finding the groups that need replacing using regex. However, if I just use re.findall(), I no longer have the location of each match in the string, so I can't reconstruct the string after modifying the matched groups.

Is there a way to preserve the location of the words in the string while modifying each match separately?

Note: I previously asked a very similar question here, but I oversimplified my example. I thought editing my existing question would make the existing comments and answers confusing to future readers.

Have you experimented with re.sub yet? I think it's capable of taking your modified text and reconstructing the string on its own, without any additional effort on your part. And you can pass a callable for the repl parameter of sub, so you can execute arbitrary code on each match rather than just replacing it with something static.
– Kevin, Commented Aug 4, 2017 at 15:22
I don't see how my answer to your previous question is not useful to you. It uses re.sub, and the same is applicable here.
– cs95, Commented Aug 4, 2017 at 15:25
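
To illustrate the point about passing a callable as repl, here is a minimal sketch. The pattern `\\\S+` and the names `shout` and `spans` are assumptions made up for this illustration, not part of the question. It shows that the callback receives a match object that still carries its position, and that re.sub copies everything between matches through unchanged.

```python
import re

text = r'here i$ \$0me text to \m*dify'
spans = []  # hypothetical list, used only to record where each match was found

def shout(m):
    # m is a match object, so its location in the original string is available
    spans.append(m.span())
    # any per-match transformation can go here; upper-casing is just a placeholder
    return m.group().upper()

# Assumed pattern: a backslash followed by one or more non-whitespace characters
result = re.sub(r'\\\S+', shout, text)

print(result)  # here i$ \$0ME text to \M*DIFY
print(spans)   # [(8, 13), (22, 29)]
```

The text outside the matches comes back untouched, so there is no need to track positions by hand.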

cs95 · Accepted Answer · 2017-08-04 15:36:05Z

My answer to your previous question still applies, with some minor modifications: the approach is still re.sub, but the regex changes.

Since the replacement logic is more complex this time, define a function and pass it to re.sub as the callback.

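In [56]: import re

In [57]: def foo(m):
    ...:     # Drop the backslash, keep letters, turn anything else (digits too) into 'a'.
    ...:     return ''.join(x if re.match('[a-zA-Z]', x)
    ...:                    else ('' if x == '\\' else 'a') for x in m.group())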

Now, call re.sub:

In [58]: text = r'here i$ \$0me text to \m*dify'

In [59]: re.sub(r'\\.*?(?= |$)', foo, text)
Out[59]: 'here i$ aame text to madify'
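
As a possible variation, not something the answer itself does, the callback can run a second re.sub on the matched text. This sketch assumes that digits should be kept, which differs from the expected output in the question, and it strips only the leading backslash. The name `sanitize` and the `filler` parameter are hypothetical.

```python
import re

def sanitize(m, filler='a'):
    # Assumption: only the leading backslash is dropped; any later backslash
    # is treated like every other non-alphanumeric character.
    body = m.group()[1:]
    # Inner substitution: replace everything except ASCII letters and digits
    return re.sub(r'[^A-Za-z0-9]', filler, body)

text = r'here i$ \$0me text to \m*dify'
# Same outer pattern as in the answer: backslash up to the next space or end of string
result = re.sub(r'\\.*?(?= |$)', sanitize, text)

print(result)  # here i$ a0me text to madify
```

Here the '0' survives because it is alphanumeric. With the original callback it becomes 'a', which matches the output shown in the question.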

edited Aug 4, 2017 at 15:36

answered Aug 4, 2017 at 15:29

cs95


2 Comments

digitaLink Over a year ago

I think I understand now. I didn't realize that a function could be passed into re.sub().

cs95 Over a year ago

@digitaLink Yes. If a lambda can be passed, any function can. Feel free to mark accepted, thanks!
