I am trying to replace each word that follows a . in the text file below:

line1
line2
field: [orders.cancelled,orders.delivered,orders.reached
orders.pickup,orders.time]
some line
some line

I have a dictionary:

   d = {'cancelled':'cancelled_at', 'deliver':'xxx'}

I am running the code below, but it also replaces partial matches. The new file contains:

field: [orders.cancelled_at, orders.xxxed ..........

In the word delivered, the program still replaces the first 7 characters (deliver) and leaves 'ed' at the end. I am not sure why.

with open('list.txt', 'r') as g:
    text = g.read()
    for k in d:
        before = f'.{k}'
        after = f'.{d[k]}'
        #print(before)
        #print(after)
        text = text.replace(before, after)
        #print(text)

with open('new_list.txt', 'w') as w:
    w.write(text)

I also tried this version with re.sub and got the same result:

import re

with open('list.txt', 'r') as f:
    text = f.read()
    for k in d:
        before = f'.{k}(?!=\w)'
        print(before)
        after = f'.{d[k]}'
        print(after)
        text = re.sub(before, after, text)

with open('new_list.txt', 'w') as w:
    w.write(text)
  • you are replacing deliver from the word delivered with xxx. The result is xxxed. Add "delivered": "xxx" to your dictionary. Commented Sep 3, 2020 at 11:43
  • 1) Use word boundaries to match whole words, 2) Escape . outside a character class to match a literal . Commented Sep 3, 2020 at 11:46
  • @spyralab I only want to remove the word after the dot if the key has a value 'deliver' and not 'delivered'. I expect the program to not change anything if the exact match is not found, in this case then it should give the new line as orders.cancelled_at, orders._delivered Commented Sep 3, 2020 at 11:50
  • f'\b{k}\b' - this should work ? @WiktorStribiżew sorry I am not that familiar with regex and would appreciate if you can explain more Commented Sep 3, 2020 at 11:54
  • \b word boundary is necessary to match only when the string contains the whole word, so short\b will match in short. and not in shorts. Commented Sep 7, 2020 at 12:06
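
To make the comments above concrete, here is a minimal sketch comparing the three approaches on a shortened sample string. The variable names are only illustrative. It assumes the lookahead in the second attempt was meant to be (?!\w); as written, (?!=\w) only rejects a literal = followed by a word character, so it still matches inside delivered.

```python
import re

text = "orders.cancelled,orders.delivered"

# str.replace works on substrings, so '.deliver' is found inside '.delivered'.
naive = text.replace(".deliver", ".xxx")

# Escaping the dot and adding \b matches 'deliver' only as a whole word,
# so 'delivered' is left alone.
bounded = re.sub(r"\.deliver\b", ".xxx", text)

# (?!=\w) means "not followed by '=' plus a word character", which is true
# for 'delivered', so the partial match still happens.
mistyped = re.sub(r"\.deliver(?!=\w)", ".xxx", text)

print(naive)
print(bounded)
print(mistyped)
```

Only the \b version leaves orders.delivered untouched; the other two produce orders.xxxed.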

1 Answer

You can use

import re

d = {'cancelled':'cancelled_at', 'deliver':'xxx'}
rx = re.compile(fr"(?<=\.)(?:{'|'.join(d)})\b")

with open('list.txt', 'r') as f:
    print( re.sub(rx, lambda x: d[x.group()], f.read()) )

See the Python demo

The regex generated by the code looks like

(?<=\.)(?:cancelled|deliver)\b

See the regex demo. Details:

  • (?<=\.) - a positive lookbehind that matches a location immediately preceded with a literal .
  • (?:cancelled|deliver) - two alternatives: cancelled or deliver
  • \b - a word boundary, so the alternatives match only as whole words.

The lambda x: d[x.group()] replacement looks up the matched word as a key in d and replaces it with the corresponding value.
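
As a hedged variation on the same idea, the sketch below wraps the pattern in a helper and runs it on the sample text from the question held in a string instead of a file. The function name replace_after_dot is hypothetical. Passing the keys through re.escape and sorting them longest first are extra precautions, not part of the answer above; they only matter if a key contains regex metacharacters or is a prefix of another key.

```python
import re

def replace_after_dot(text, mapping):
    """Replace whole words that follow a literal '.' using mapping."""
    if not mapping:
        return text
    # Longest keys first, so a longer key is tried before a shorter prefix.
    keys = sorted(mapping, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in keys)
    pattern = re.compile(r"(?<=\.)(?:" + alternatives + r")\b")
    # The matched text is always one of the keys, so the lookup is safe.
    return pattern.sub(lambda m: mapping[m.group()], text)

d = {'cancelled': 'cancelled_at', 'deliver': 'xxx'}
sample = (
    "field: [orders.cancelled,orders.delivered,orders.reached\n"
    "orders.pickup,orders.time]"
)
result = replace_after_dot(sample, d)
print(result)
```

Here orders.cancelled becomes orders.cancelled_at, while orders.delivered is unchanged because deliver is not followed by a word boundary.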

2 Comments

Hey, can you explain when to use 'f' and 'fr'? I can see that you used fr in re.compile. I would really appreciate it if you could explain the difference between them.
@HamzaShehzad r is the raw string literal prefix, used to define a string literal where the backslash does not form string escape sequences (please read the BONUS top sections in the "Regular expression works on regex101.com, but not on prod" thread). f is the f-string prefix, which allows variable interpolation (or variable expansion), i.e. you can use {varname} inside the string literal to combine the text you type with the values of variables (instead of using str.format). fr combines both.
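
A small illustration of the prefixes described in the comment above (the variable names are only for this example):

```python
import re

name = "deliver"

plain = "\bword"        # no prefix: \b becomes a single backspace character
raw = r"\bword"         # r prefix: the backslash and the b are both kept
formatted = f"{name}ed" # f prefix: {name} is replaced by the variable's value
both = fr"\.{name}\b"   # fr prefix: interpolation plus literal backslashes

print(len(plain), len(raw))
print(formatted)
print(both)

# The fr string is a usable regex: a literal dot, the word, a word boundary.
print(bool(re.search(both, "orders.deliver,")))
print(bool(re.search(both, "orders.delivered,")))
```

The last two lines show why the raw prefix matters for regex patterns: \b reaches the regex engine as a word boundary instead of being turned into a backspace by Python.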