1

There is a lot of qualified documents on the Internet regarding the topic of "search and replace using regular expressions". Only few of them show how to do this in a multiline context. Even fewer show indicate how to generate a regex for several items therein.

I have tried both installable RegEx tools within editors (EditPad Pro, RJ TextED, EmEditor, Notepad++, Sublime Text 3, Visual Studio Professional 2019, the latest JetBrains PHPstorm version, and others) and online RegEx services (regular expressions 101, RegExr) the entire day, read the answers on StackOverflow which corresponded to my title criteria, and additionally tried to make the most of various online tutorials.

You make call me stupid, but I have not been able to understand whether the following concept is feasible at all

The part of the SQL query I want to change is the following one:

    AND op.OP1OPVerfahren > 0

    AND p.Testzwecke = 0

    AND NOT EXISTS (SELECT DISTINCT 1 FROM ods39.dat_optherapie op2 WHERE op2.patID = p.ID AND op2.revision > op.revision)

    UNION ALL

Legend:

  • op.OP1OPVerfahren is the database field for the first surgery performed, 10 surgical procedures can be documented (OP1OPVerfahren until OP10OPVerfahren)
  • p.Testzwecke is a JOIN to the patient's personal data such as first name, last name, etc.
  • ods39.dat_optherapie is the table dat_optherapie from database ods39 - the system consists of 50 MySQL databases of the exact same structure
  • p.ID is merely the patient's ID
  • op.revision is an autoincrementing tracker of how many data record sets for the same surgical procedure have been saved (sometime revisions in the sense of precisions are required)

The above-mentioned part of the query has a quantitative complexity associated: Within the query, this segment appears 780 times in the following variation:

    AND **op.OP1OPVerfahren** _up_to_ **op.OP10OPVerfahren** > 0

    AND p.Testzwecke = 0

    AND NOT EXISTS (SELECT DISTINCT 1 FROM **ods01.dat_optherapie** _up_to_ **ods39.dat_optherapie** op2 WHERE op2.patID = p.ID AND op2.revision > op.revision)

    UNION ALL

To fully understand what I want to solve here the expression I want to replace the fore-mentioned with:

    AND **op.OP1OPVerfahren** _up_to_ **op.OP10OPVerfahren** > 0

    AND p.Testzwecke = 0

    AND NOT EXISTS (SELECT DISTINCT 1 FROM **ods01.dat_optherapie** _up_to_ **ods39.dat_optherapie** op2 WHERE op2.patID = p.ID AND op2.revision > op.revision)

    GROUP BY **OP1OPVerfahren** _up_to_ **OP10OPVerfahren**

    UNION ALL

The op.OP_x_OPVerfahren (x = 1 to 10) from the very first line and the OP_x_OPVerfahren (x = 1 to 10) within the GROUP BY statement are numerically correlated to each other, i. e. when I want to change my replacing procedure from op.OP1OPVerfahren along 39 databases to op.OP2OPVerfahren for again 39 databases and so on, the GROUP BY numbers shall change accordingly.

Now, this replacement shall be carried out for all 39 databases. The entire SQL query code is about 20.000 lines of code - my reason why I do not want to spend hours on replacing manually as there are more such SQL query structures in different files which need replacing in a similar fashion.

To give you an example:

The code ...

    AND op.OP1OPVerfahren > 0

    AND p.Testzwecke = 0

    AND NOT EXISTS (SELECT DISTINCT 1 FROM ods39.dat_optherapie op2 WHERE op2.patID = p.ID AND op2.revision > op.revision)

    UNION ALL

... needs to be expanded with a GROUP BY OP1OPVerfahren before the UNION ALL for the 39 databases ods01 up to ods39, accordingly. Then with op.OP2OPVerfahren and OP2OPVerfahren for the same 39 databases again until (op.)OP10OPVerfahren is finally reached (= 780 replacements).

The newly inserted GROUP BY statement's OP_x_... counting shall have the same number as the op.OP_x_... numbering.

I have experimented with tons of different regex statements (such as \d\d, (\d)(\d), \d{2}, and many others according to the individual needs of the above-mentioned editors I used) but I was not able to find out how to make one "number detection" (op.OP_x_OPVerfahren and OP_x_OPVerfahren) dependent on the "number detection" from the databases ods_x_.dat_optherapie).

I would greatly appreciate a bit of help from your most valuable experience and expertise, and I would also be very thankful for receiving further recommendations for other than the mentioned editors with a good (and maybe even testable) regex handling.

3
  • 1
    Something along these lines? regex101.com/r/0NA6OO/1 Commented Oct 9, 2020 at 22:16
  • Yes, thank you, it worked like a charm right out of the box. I had to refine it a trifle becuase I had overlooked that there was a small comment right in front of ods39 indicating the switch to a new OP_x_OPVerfahren but that was no problem. You saved me loads of tedious hours of manual editing! Commented Oct 9, 2020 at 23:31
  • How can I rate your comment as the solution? Commented Oct 9, 2020 at 23:40

2 Answers 2

1

We can make this work using a regex replace like this:

(AND\ +op\.(OP\d0?OPVerfahren)\ *>\ *0\s+AND\ +p\.Testzwecke\ *=\ *0\s+AND\ +NOT\ +EXISTS\ *\(SELECT\ +DISTINCT\ +1\ +FROM\ +ods[0123][0-9]\.dat_optherapie\ +op2\ +WHERE\ +op2\.patID\ *=\ *p\.ID\ +AND\ +op2\.revision\ *>\ *op\.revision\))(\s+UNION\s+ALL)

Demo

It's sticks rather tight to the original string and mostly only introduces variable-length quantifiers for whitespace characters. When there is a \ * an optional space may occur, if the space is mandatory \ + is used. Otherwise the whitespace shorthand character \s is used to allow not only spaces but newlines and alike. To make it work, enable the s|singleline flag (or add (?s) in front of the pattern).

Sign up to request clarification or add additional context in comments.

Comments

0

I believe something like the following regex find/replace expressions will do what you are asking:

Find:

AND op.OP(\d{1,2})(OPVerfahren.*?\))

Replace with:

AND op.OP$1$2 \n GROUP BY OP$1OPVerfahren

Note that it needs the "global" and "dot matches newline" options set for the regex.

To briefly explain, this has 2 capturing groups, one for the digit(s) between op.OP and OPVerfahren and the second to capture everything after that up to the closing bracket of the "(SELECT DISTINCT... ). These are then used as $1 and $2 in the replacement section of the regex.

Test example here. I believe this should work in Notepad++.

(By the way, I think your "GROUP BY OP1Verfahren" should be "GROUP BY OP1OPVerfahren" right? i.e. 2 lots of "OP"s!)

4 Comments

Yes, you are absolutely right regarding your comment in brackets, I have edited my question accordingly.
Thank you very much for your solution. After a bit of modifying it, it worked in EditPad Pro.
You are welcome. If you are happy with the solution, would you care to mark the question as answered?
I am very grateful to you for having invested your time into looking and trying to solve my problem, but I had to give @wp78de credit for delivering an immediate solution. Thank you again, anyway, please do not consider me to be rude as I have learnt from both of your solutions which I highly appreciate.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.