I have a bunch of input files and need to replace a few strings in them. First, I built a dictionary of key-value pairs using regex. Each key is a string to be replaced, and its value is the replacement.

Example line in input file: Details of first student are FullName ="ABC XYZ KLM" FirstName ="ABC" ID = "123"

My dictionary would be ->

student = {
    'ABC':'Student Firstname',
    'ABC XYZ KLM':'Student Fullname',
    '123':'Student ID'
    }

I am using string replace() to do the replacement like this:

for line in inputfile1:
    for src, dst in student.items():
        line = line.replace(src, dst)

My output is coming as: Details of first student are FullName ="Student Firstname XYZ KLM" FirstName ="Student Firstname" ID = "Student ID"

What I am looking for is: Details of first student are FullName ="Student Fullname" FirstName ="Student Firstname" ID = "Student ID"

Can you please help me with figuring this out?

1 Answer

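This happens because the loop replaces the ABC string before it reaches ABC XYZ KLM. Once ABC has been replaced, the full name no longer appears in the line, so its replacement never matches. You need to make sure that the longest pattern is replaced first. To do that, you can follow one of these options: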

option 1:

Use an OrderedDict instead and put the longest strings to be replaced before the shorter ones:

In [3]: from collections import OrderedDict

In [6]: student = OrderedDict([('ABC XYZ KLM', 'Student Fullname'),  ('ABC', 'Student Firstname'),('123', 'Student ID')])

In [7]: student.items()
Out[7]: 
[('ABC XYZ KLM', 'Student Fullname'),
 ('ABC', 'Student Firstname'),
 ('123', 'Student ID')]

In [8]: line = 'FullName ="ABC XYZ KLM" FirstName ="ABC" ID = "123"' 

In [9]: for src, dst in student.items():
   ...:        line = line.replace(src, dst)

In [10]: line
Out[10]: 'FullName ="Student Fullname" FirstName ="Student Firstname" ID = "Student ID"'

The overall code looks like this:

from collections import OrderedDict
student = OrderedDict([('ABC XYZ KLM', 'Student Fullname'),
                       ('ABC', 'Student Firstname'),
                       ('123', 'Student ID')])
line = 'FullName ="ABC XYZ KLM" FirstName ="ABC" ID = "123"' 
for src, dst in student.items():
    line = line.replace(src, dst)

option 2:

As suggested by @AlexHall in the comments below, you can also use a plain list of tuples and sort it by key length, longest first, before doing the replacements. The code looks like this:

In [2]: student = [('ABC', 'Student Firstname'),('123', 'Student ID'), ('ABC XYZ KLM', 'Student Fullname')]

In [3]: sorted(student, key=lambda x: len(x[0]), reverse=True)
Out[3]: 
[('ABC XYZ KLM', 'Student Fullname'),
 ('ABC', 'Student Firstname'),
 ('123', 'Student ID')]

In [9]: line = ' "Details of first student are FirstName ="ABC" FullName ="ABC XYZ KLM" ID = "123"'

In [10]: for src, dst in sorted(student, key=lambda x: len(x[0]), reverse=True):
    ...:     line = line.replace(src, dst)
    ...:     

In [11]: line
Out[11]: ' "Details of first student are FirstName ="Student Firstname" FullName ="Student Fullname" ID = "Student ID"'

Overall code:

student = [('ABC', 'Student Firstname'),
           ('123', 'Student ID'), 
           ('ABC XYZ KLM', 'Student Fullname')]

line = ' "Details of first student are FirstName ="ABC" FullName ="ABC XYZ KLM" ID = "123"'    
for src, dst in sorted(student, key=lambda x: len(x[0]), reverse=True):
    line = line.replace(src, dst)
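
A possible alternative, not part of the options above, is to do all replacements in one pass with a regular expression. This is only a sketch: the names keys, pattern and replace_all are made up for illustration. The keys are escaped with re.escape and joined into a single alternation, longest first, so the full name is tried before the first name. Because re.sub scans the line once, text that has already been replaced is never scanned again.

```python
import re

student = {
    'ABC': 'Student Firstname',
    'ABC XYZ KLM': 'Student Fullname',
    '123': 'Student ID',
}

# Longest keys first, so the alternation tries 'ABC XYZ KLM' before 'ABC'.
keys = sorted(student, key=len, reverse=True)

# re.escape keeps any regex metacharacters in the keys literal.
pattern = re.compile('|'.join(re.escape(k) for k in keys))


def replace_all(line):
    # Look up each matched key in the dictionary; the replacement text
    # itself is not rescanned for further matches.
    return pattern.sub(lambda m: student[m.group(0)], line)


line = 'FirstName ="ABC" FullName ="ABC XYZ KLM" ID = "123"'
result = replace_all(line)
print(result)
```

Compared with chained str.replace calls, this also avoids the case where a replacement value happens to contain another key.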

4 Comments

There's actually no need for a dictionary at all here. A list of tuples will do.
@AlexHall you are right, but I was trying to suggest the smallest change to the OP's approach.
Sorry, I should have been more specific. My input line could also be like: "Details of first student are FirstName ="ABC" FullName ="ABC XYZ KLM" ID = "123". FirstName can appear first. In this case the ordered dictionary doesn't produce the required output.
@user8237652 The order of occurrence in the string does not affect the way this works. The problem is that, given "ABC XYZ KLM", there are two possible replacements: ABC -> Firstname and "ABC XYZ KLM" -> Fullname. If the Firstname replacement is executed first, the pattern for Fullname will no longer match. The important thing is to perform the replacements with the longest key string first, then the shorter key strings.
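
To illustrate the point in the last comment, here is a small sketch that runs the same replacements on both line layouts. The helper replace_in_order and the variable names are hypothetical, chosen only for this example. It suggests that the order of the fields in the line makes no difference, while the order in which the keys are applied does.

```python
def replace_in_order(line, pairs):
    # Apply each (src, dst) replacement in the order given.
    for src, dst in pairs:
        line = line.replace(src, dst)
    return line


# Shortest key first: this is the order that causes the problem.
pairs = [('ABC', 'Student Firstname'),
         ('ABC XYZ KLM', 'Student Fullname'),
         ('123', 'Student ID')]

# Same pairs, sorted so the longest key is applied first.
longest_first = sorted(pairs, key=lambda p: len(p[0]), reverse=True)

line_a = 'FullName ="ABC XYZ KLM" FirstName ="ABC" ID = "123"'
line_b = 'FirstName ="ABC" FullName ="ABC XYZ KLM" ID = "123"'

# With 'ABC' applied first, the full name is damaged in both layouts.
broken_a = replace_in_order(line_a, pairs)
broken_b = replace_in_order(line_b, pairs)

# With the longest key applied first, both layouts come out as intended.
fixed_a = replace_in_order(line_a, longest_first)
fixed_b = replace_in_order(line_b, longest_first)

for text in (broken_a, broken_b, fixed_a, fixed_b):
    print(text)
```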
