I have a repeating text in a large file which I want to replace with some other text. For example:

some text.......\n partition by range (STRT_DTTM)\n some more text......\n );

I want to use a regex to find the blocks that start with partition by range and end with ); and replace each block with 'THIS IS TEST'. I am using the code below:

import re

with open(r"C:\Users\x217838\Desktop\python\input.txt","rt") as in_file:
    text = in_file.read()
    s = re.compile("^partition by range(.*);\)$)",re.MULTILINE)
    replace = re.sub(s, 'THIS IS TEST', text)
    print(replace)

Can you please let me know where I am going wrong?

2 Answers

2

You have to escape every regex reserved symbol with \ when you want to match it literally --> [\^$.|?*+(){}. The final code will be:

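import re

text = "partition by range(CANE) uno"
# Raw string so the backslashes reach the regex engine unchanged;
# \( and \) match literal parentheses.
s = re.compile(r"^partition by range\(.*\)", re.MULTILINE)
replace = s.sub('THIS IS TEST', text)
print(replace)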

The result is:

THIS IS TEST uno
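
As a rough sketch of why the pattern in the question fails and what escaping changes (the variable names are only for illustration): the final ) in the question's pattern is unescaped and has no matching (, so the pattern is rejected when it is compiled. If you would rather not escape by hand, re.escape can do it for a literal piece of text.

```python
import re

# The pattern from the question: the final ")" is unescaped and has no
# matching "(", so the regex engine rejects it at compile time.
try:
    re.compile("^partition by range(.*);\\)$)", re.MULTILINE)
    compiled = True
except re.error:
    compiled = False

# re.escape() backslash-escapes the metacharacters in a literal string.
literal = re.escape("range(")
pattern = re.compile(r"^partition by " + literal + r".*\)", re.MULTILINE)

result = pattern.sub("THIS IS TEST", "partition by range(CANE) uno")
print(compiled)
print(result)
```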

5 Comments

That works, but I would have liked to have an explanation on why you escape the parentheses in your answer.
The reason is that '(' and ')' are reserved symbols, used internally by the regex engine. If you want to match the actual characters, you have to escape them.
You have to use \ for all these symbols --> [\^$.|?*+(){}
I know. I meant in the answer. comments aren't made for that.
Can you show that with multiline text? The block ends with ); so please use this text: text = partition by range(CANE) uno blah blah blah blah blahhh );
1

If your text spans multiple lines, something like this,

some text.......
partition by range (STRT_DTTM)
some more text......
);

Then you will have to use the (?s) modifier (the inline form of re.DOTALL) so that . also matches a newline.

Sample Python code,

import re

s = '''some text.......
partition by range (STRT_DTTM)
some more text......
);'''

mods = re.sub(r'(?s)partition by range(.*?)\);','THIS IS TEST',s)
print(mods)

Prints,

some text.......
THIS IS TEST
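
With more than one block in the file, the difference between .* and .*? becomes visible. The following is a minimal sketch using a made-up two-block input (the table names and column names are assumptions for illustration, not taken from the real file). It passes flags=re.DOTALL, which has the same effect as the inline (?s).

```python
import re

# Hypothetical input containing two "partition by range ... );" blocks.
ddl = """create table a (
partition by range (STRT_DTTM)
p1, p2
);
create table b (
partition by range (END_DTTM)
p3
);"""

# Greedy: .* runs to the end of the text and backtracks to the LAST ");",
# so both blocks are swallowed by a single match.
greedy = re.sub(r"partition by range.*\);", "THIS IS TEST", ddl, flags=re.DOTALL)

# Lazy: .*? stops at the FIRST ");" after each start, so each block is
# replaced separately.
lazy = re.sub(r"partition by range.*?\);", "THIS IS TEST", ddl, flags=re.DOTALL)

print(greedy)
print("---")
print(lazy)
```

The greedy version also deletes the "create table b (" line that sits between the two blocks, which is usually not what you want when editing a large file.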

3 Comments

Thank you! This works, but it matches up to the last ); in the text. Is there a way to match in a non-greedy way, so that it stops as soon as it finds the first ); replaces that block, and then starts looking for the next one?
Sure, just change .* to .*? in the regex. Let me update my answer
I got it. THANK YOU VERYYY MUCH.
