RegEx for multiline search and replace in SQL query code

Question

There are plenty of good documents on the Internet on the topic of "search and replace using regular expressions". Only a few of them show how to do this in a multiline context. Even fewer show how to build a regex that handles several varying items within such a match.

I spent the entire day trying both the RegEx tools built into editors (EditPad Pro, RJ TextEd, EmEditor, Notepad++, Sublime Text 3, Visual Studio Professional 2019, the latest JetBrains PhpStorm version, and others) and online RegEx services (regex101, RegExr). I also read the StackOverflow answers that matched my title criteria and tried to make the most of various online tutorials.

You may call me stupid, but I have not been able to work out whether the following concept is feasible at all.

The part of the SQL query I want to change is the following one:

    AND op.OP1OPVerfahren > 0

    AND p.Testzwecke = 0

    AND NOT EXISTS (SELECT DISTINCT 1 FROM ods39.dat_optherapie op2 WHERE op2.patID = p.ID AND op2.revision > op.revision)

    UNION ALL

Legend:

op.OP1OPVerfahren is the database field for the first surgery performed; up to 10 surgical procedures can be documented (OP1OPVerfahren to OP10OPVerfahren)
p.Testzwecke is a field reached via a JOIN to the patient's personal data such as first name, last name, etc.
ods39.dat_optherapie is the table dat_optherapie from database ods39 - the system consists of 50 MySQL databases of the exact same structure
p.ID is merely the patient's ID
op.revision is an autoincrementing tracker of how many data record sets for the same surgical procedure have been saved (sometimes revisions, in the sense of more precise entries, are required)

The difficulty with the above-mentioned part of the query is its sheer quantity: within the query, this segment appears 780 times in the following variations:

    AND **op.OP1OPVerfahren** _up_to_ **op.OP10OPVerfahren** > 0

    AND p.Testzwecke = 0

    AND NOT EXISTS (SELECT DISTINCT 1 FROM **ods01.dat_optherapie** _up_to_ **ods39.dat_optherapie** op2 WHERE op2.patID = p.ID AND op2.revision > op.revision)

    UNION ALL

To make clear what I want to achieve, here is the expression I want to replace the aforementioned one with:

    AND **op.OP1OPVerfahren** _up_to_ **op.OP10OPVerfahren** > 0

    AND p.Testzwecke = 0

    AND NOT EXISTS (SELECT DISTINCT 1 FROM **ods01.dat_optherapie** _up_to_ **ods39.dat_optherapie** op2 WHERE op2.patID = p.ID AND op2.revision > op.revision)

    GROUP BY **OP1OPVerfahren** _up_to_ **OP10OPVerfahren**

    UNION ALL

The op.OP_x_OPVerfahren (x = 1 to 10) in the very first line and the OP_x_OPVerfahren (x = 1 to 10) within the GROUP BY statement are numerically correlated, i.e. when the replacement moves on from op.OP1OPVerfahren across 39 databases to op.OP2OPVerfahren across the same 39 databases and so on, the GROUP BY number shall change accordingly.

This replacement shall be carried out for all 39 databases. The entire SQL query is about 20,000 lines of code, which is why I do not want to spend hours on replacing manually, especially as there are more such SQL query structures in different files which need replacing in a similar fashion.

To give you an example:

The code ...

    AND op.OP1OPVerfahren > 0

    AND p.Testzwecke = 0

    AND NOT EXISTS (SELECT DISTINCT 1 FROM ods39.dat_optherapie op2 WHERE op2.patID = p.ID AND op2.revision > op.revision)

    UNION ALL

... needs to be expanded with a GROUP BY OP1OPVerfahren before the UNION ALL for the 39 databases ods01 up to ods39, accordingly. Then with op.OP2OPVerfahren and OP2OPVerfahren for the same 39 databases again until (op.)OP10OPVerfahren is finally reached (= 780 replacements).

The newly inserted GROUP BY statement's OP_x_... counting shall have the same number as the op.OP_x_... numbering.
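
For trying out a candidate pattern without touching the real file, the variations described above can be generated as test input. This is only a sketch in Python: the helper name `build_segments` and the compact layout without blank lines between the SQL lines are assumptions, and the number ranges (procedures 1 to 10, databases ods01 to ods39) are taken from the description in the question.

```python
from itertools import product

# Template of one query segment; {x} is the procedure number,
# {db:02d} the zero-padded database number (ods01 .. ods39).
SEGMENT = (
    "    AND op.OP{x}OPVerfahren > 0\n"
    "    AND p.Testzwecke = 0\n"
    "    AND NOT EXISTS (SELECT DISTINCT 1 FROM ods{db:02d}.dat_optherapie op2 "
    "WHERE op2.patID = p.ID AND op2.revision > op.revision)\n"
    "    UNION ALL\n"
)


def build_segments(procedures=range(1, 11), databases=range(1, 40)):
    """Return one input segment per (procedure number, database number) pair."""
    return [SEGMENT.format(x=x, db=db) for x, db in product(procedures, databases)]


if __name__ == "__main__":
    segments = build_segments()
    print(segments[0])   # first pair: procedure 1, database ods01
    print(segments[-1])  # last pair: procedure 10, database ods39
```

Joining the segments into one string gives a small stand-in for the real query against which any search-and-replace expression can be checked.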

I have experimented with tons of different regex statements (such as \d\d, (\d)(\d), \d{2}, and many others, adapted to the individual needs of the above-mentioned editors), but I was not able to find out how to make one "number detection" (op.OP_x_OPVerfahren and OP_x_OPVerfahren) dependent on the "number detection" from the databases (ods_x_.dat_optherapie).

I would greatly appreciate a bit of help from your valuable experience and expertise, and I would also be very thankful for recommendations of editors other than those mentioned with good (and maybe even testable) regex handling.

Yes, thank you, it worked like a charm right out of the box. I had to refine it a trifle because I had overlooked a small comment right in front of ods39 indicating the switch to a new OP_x_OPVerfahren, but that was no problem. You saved me loads of tedious hours of manual editing!
– mtjmohr, Commented Oct 9, 2020 at 23:31

wp78de · Accepted Answer · 2020-10-09 23:48:03Z

1

We can make this work using a regex replace like this:

(AND\ +op\.(OP\d0?OPVerfahren)\ *>\ *0\s+AND\ +p\.Testzwecke\ *=\ *0\s+AND\ +NOT\ +EXISTS\ *\(SELECT\ +DISTINCT\ +1\ +FROM\ +ods[0123][0-9]\.dat_optherapie\ +op2\ +WHERE\ +op2\.patID\ *=\ *p\.ID\ +AND\ +op2\.revision\ *>\ *op\.revision\))(\s+UNION\s+ALL)

Demo

It sticks rather closely to the original string and mostly only introduces variable-length quantifiers for whitespace characters. Where \ * appears, an optional space may occur; where the space is mandatory, \ + is used. Otherwise the whitespace shorthand character \s is used to allow not only spaces but also newlines and the like. To make it work, enable the s|singleline flag (or add (?s) in front of the pattern).
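
As a rough illustration, the pattern can be tried in Python's `re` module. The pattern is copied unchanged from above; the replacement string is not given in the answer text, so the one used here (`\1`, a new GROUP BY line built from `\2`, then `\3`) is an assumption derived from the three capturing groups, and the sample input is made up for the test.

```python
import re

# Pattern from the answer, split over several lines for readability only.
# Group 1: the whole segment up to the closing parenthesis of NOT EXISTS (...)
# Group 2: the column name OPxOPVerfahren
# Group 3: the trailing UNION ALL including its leading whitespace
PATTERN = re.compile(
    r"(AND\ +op\.(OP\d0?OPVerfahren)\ *>\ *0\s+"
    r"AND\ +p\.Testzwecke\ *=\ *0\s+"
    r"AND\ +NOT\ +EXISTS\ *\(SELECT\ +DISTINCT\ +1\ +FROM\ +"
    r"ods[0123][0-9]\.dat_optherapie\ +op2\ +"
    r"WHERE\ +op2\.patID\ *=\ *p\.ID\ +"
    r"AND\ +op2\.revision\ *>\ *op\.revision\))"
    r"(\s+UNION\s+ALL)"
)

# Assumed replacement: keep the segment, add GROUP BY with the captured
# column name on its own line, then put UNION ALL back.
REPLACEMENT = r"\1\n    GROUP BY \2\3"

SAMPLE = (
    "    AND op.OP10OPVerfahren > 0\n"
    "    AND p.Testzwecke = 0\n"
    "    AND NOT EXISTS (SELECT DISTINCT 1 FROM ods07.dat_optherapie op2 "
    "WHERE op2.patID = p.ID AND op2.revision > op.revision)\n"
    "    UNION ALL\n"
)

result = PATTERN.sub(REPLACEMENT, SAMPLE)

if __name__ == "__main__":
    print(result)
```

Because group 2 captures the column name once and the replacement reuses it, the number in the GROUP BY line always follows the number in the first line, independently of which database number the segment contains.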

answered Oct 9, 2020 at 23:48


user9601310 · Answer · 2020-10-09 21:44:32Z

0

I believe something like the following regex find/replace expressions will do what you are asking:

Find:

AND op.OP(\d{1,2})(OPVerfahren.*?\))

Replace with:

AND op.OP$1$2 \n GROUP BY OP$1OPVerfahren

Note that it needs the "global" and "dot matches newline" options set for the regex.

To briefly explain, this has 2 capturing groups: the first for the digit(s) between op.OP and OPVerfahren, and the second for everything after that up to the closing bracket of the "(SELECT DISTINCT ... )" subquery. These are then used as $1 and $2 in the replacement section of the regex.
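
The same find/replace pair can be sketched in Python to check its behaviour. Python's `re` module writes group references as `\1` and `\2` instead of `$1` and `$2`, `re.DOTALL` corresponds to the "dot matches newline" option, and `re.sub` replaces every match by default, which covers the "global" option. The sample input below is made up for illustration.

```python
import re

# Find expression from the answer, unchanged.
FIND = r"AND op.OP(\d{1,2})(OPVerfahren.*?\))"

# Replace expression from the answer, with $1/$2 written as \1/\2 for Python.
REPLACE = r"AND op.OP\1\2 \n GROUP BY OP\1OPVerfahren"

SAMPLE = (
    "    AND op.OP1OPVerfahren > 0\n"
    "    AND p.Testzwecke = 0\n"
    "    AND NOT EXISTS (SELECT DISTINCT 1 FROM ods01.dat_optherapie op2 "
    "WHERE op2.patID = p.ID AND op2.revision > op.revision)\n"
    "    UNION ALL\n"
    "    AND op.OP2OPVerfahren > 0\n"
    "    AND p.Testzwecke = 0\n"
    "    AND NOT EXISTS (SELECT DISTINCT 1 FROM ods39.dat_optherapie op2 "
    "WHERE op2.patID = p.ID AND op2.revision > op.revision)\n"
    "    UNION ALL\n"
)

# Without DOTALL the dot stops at line ends, so the lazy .*? cannot reach
# the closing bracket two lines further down and nothing matches.
no_flag_match = re.search(FIND, SAMPLE)

# With DOTALL each segment is matched up to its first closing bracket.
result = re.sub(FIND, REPLACE, SAMPLE, flags=re.DOTALL)

if __name__ == "__main__":
    print(no_flag_match)
    print(result)
```

Note that the lazy `.*?\)` stops at the first closing bracket after the match start, so this relies on the subquery containing no nested brackets, as in the segments shown in the question.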

Test example here. I believe this should work in Notepad++.

(By the way, I think your "GROUP BY OP1Verfahren" should be "GROUP BY OP1OPVerfahren" right? i.e. 2 lots of "OP"s!)

edited Oct 9, 2020 at 21:44

answered Oct 9, 2020 at 21:38


4 Comments

mtjmohr Over a year ago

Yes, you are absolutely right regarding your comment in brackets, I have edited my question accordingly.

mtjmohr Over a year ago

Thank you very much for your solution. After a bit of modifying it, it worked in EditPad Pro.

user9601310 Over a year ago

You are welcome. If you are happy with the solution, would you care to mark the question as answered?

mtjmohr Over a year ago

I am very grateful to you for investing your time in looking at and trying to solve my problem, but I had to give @wp78de credit for delivering an immediate solution. Thank you again anyway. Please do not consider me rude; I have learnt from both of your solutions, which I highly appreciate.
