How to replace all instance of a String with another indexed string Python

Question

This question is a little difficult to articulate with my inadequate English but I will do my best.

I have a directory of xml files, each file contains xml such as:

<root>
    <fields>
        <field>
            <description/>
            <region id="Number.T2S366_R_487" page="1"/>
        </field>
        <field>
            <description/>
            <region id="Number.T2S366_R_488.`0" page="1"/>
            <region id="String.T2S366_R_488.`1" page="1"/>
        </field>
    </fields>
</root>

I'd like to do a String replacement on the lines which contain the dot, tick, number notation such as .`0 with an index notation like [0],[1], [2], ... and so forth.

So the transformed xml payload should look like something below:

<root>
    <fields>
        <field>
            <description/>
            <region id="Number.T2S366_R_487" page="1"/>
        </field>
        <field>
            <description/>
            <region id="Number.T2S366_R_488[0]" page="1"/>
            <region id="String.T2S366_R_488[1]" page="1"/>
        </field>
    </fields>
</root>

How can I accomplish this using python? This seems fairly straight forward to do using regex but that would be difficult to do for a directory of files containing multiple files. I'd like to see an implementation using python 3.x, as I am learning it.

Suggest checking out the lxml library for Python3: lxml.de/installation.html — JacobIRR
– JacobIRR, Commented Jan 26, 2018 at 0:16

Imran · Accepted Answer · 2018-01-26 00:49:50Z

3

In Python you can loop over all files in your directory with os.listdir and make substitutions in-place with fileinput:

import os
import fileinput

path = '/home/arabian_albert/'
for f in os.listdir(path):
    with fileinput.FileInput(f, inplace=True, backup='.bak') as file:
        for line in file:
            print(re.sub(r'\.`(\d+)', r'\[\1\]', line), end='')

However, you should consider doing this from the command line with sed:

find . -type f -exec sed -i.bak -E "s/\.`([0-9]+)/[\1]/g" {} \;

The above will make the substitution for all files in the current directory, and backup with old files with .bak.

edited Jan 26, 2018 at 0:49

answered Jan 26, 2018 at 0:37

Imran

13.6k8 gold badges69 silver badges82 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

arabian_albert Over a year ago

Oh nice implementation, I especially like the way of backing up old files with the .bak extension. I guess there is no way around regex

Imran Over a year ago

Yes, I left that in there in case I screwed up :)

ascripter · Accepted Answer · 2018-01-26 00:25:01Z

2

You can use a simple regex to do that:

import re
sample_str = """
<root>
    <fields>
        <field id="S366/487" type="xs:int" bind="T2S366/487">
            <description/>
            <region id="WholeNumberWithSeparator.T2S366_R_487" page="1"/>
        </field>
        <field id="S366/488" type="xs:int" bind="T2S366/488">
            <description/>
            <region id="Number.T2S366_R_488.`0" page="1"/>
            <region id="String.T2S366_R_488.`1" page="1"/>
        </field>
    </fields>
</root>
"""
pattern = "\.`(\d+)"
result = re.sub(pattern, lambda x: "[{}]".format(x.groups()[0]), sample_str)
print result

yields

<root>
    <fields>
        <field id="S366/487" type="xs:int" bind="T2S366/487">
            <description/>
            <region id="WholeNumberWithSeparator.T2S366_R_487" page="1"/>
        </field>
        <field id="S366/488" type="xs:int" bind="T2S366/488">
            <description/>
            <region id="Number.T2S366_R_488[0]" page="1"/>
            <region id="String.T2S366_R_488[1]" page="1"/>
        </field>
    </fields>
</root>

answered Jan 26, 2018 at 0:25

ascripter

6,31512 gold badges54 silver badges74 bronze badges

1 Comment

arabian_albert Over a year ago

Thanks, very straightforward implementation

tykom · Accepted Answer · 2018-01-26 01:10:13Z

How about this:

wholefile = ''

with open(r'xml_input.xml', 'r+') as f:
    lines = f.readlines()
    for line in lines:
        split_line = line.split('.')  # split at periods
        end_point = split_line.pop(-1)  # get and remove existing endpoint
        if end_point[0] == '`':  # if it matches tick notation
            idx_after_num = end_point.find('"')  # get the first index that matches a double quote
            the_int = end_point[1:idx_after_num]  # slice from after the tick to the end of the int
            end_point = list(end_point)  # convert to list
            del(end_point[:idx_after_num])  # delete up to the double quote
            end_point = ''.join(end_point)  # reconstruct string
            new_endpoint = '[{}]'.format(the_int) + end_point  # create new endpoint
            split_line += [new_endpoint]  # append new endpoint to end of list of split strs
            new_line = ''  # new empty string
            for n, segment in enumerate(split_line):
                if n >= len(split_line) - 2:  # if we're at or beyond the endpoint
                    new_line += segment  # concatenate the new endpoint
                else:
                    new_line += segment + '.'  # concatenate, replacing the needed '.'s
            wholefile += new_line  # replace, with changes
        else:
            wholefile += line  # replace, with no changes

with open('xml_out.xml', 'w+') as f:
    f.write(wholefile)

my output:

<root>
    <fields>
        <field id="S366/487" type="xs:int" bind="T2S366/487">
            <description/>
            <region id="WholeNumberWithSeparator.T2S366_R_487" page="1"/>
        </field>
        <field id="S366/488" type="xs:int" bind="T2S366/488">
            <description/>
            <region id="Number.T2S366_R_488[0]" page="1"/>
            <region id="String.T2S366_R_488[1]" page="1"/>
        </field>
    </fields>
</root>

Collectives™ on Stack Overflow

How to replace all instance of a String with another indexed string Python

3 Answers 3

2 Comments

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related