Regex: Replacing a string in a sub-string only

Question

I have a special file format where I need to replace dozens of strings and reformat its structure. As the simplest solution, I have prepared a patterns file where all regex definitions/replacements are stored (~100 replacements). I'm using perl to find and replace patterns (perl -p patterns source.file). So far so good.

However, there is one case I'm unable to resolve using regex. I need to replace strings in only part of the line, i.e. replace a string within a sub-string only.

Example: For simplicity, I need to replace all "A" to "X" only in the middle string (delimited by ;).

Input line:

ABCD ABCD; ABCD ABCD; ABCD ABCD

Expected output:

ABCD ABCD; XBCD XBCD; ABCD ABCD
           ^    ^
           the only replaced characters

This correctly replaces all characters:

s/A/X/g;

But I need to replace the A's in the middle field only. I tried:

s/(.*?;.*?)A/\1X/g;
s/(.*?;.*)A(.*?;)/\1X\2/g;  # alternative to find the last A

But these replace only the first or the last A. I could repeat patterns like this several times to redo the search & replace, but that does not sound like a good solution, as I don't know how many A's I will have in the sub-string.
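To illustrate why a single global pass is not enough, the two attempts can be reproduced with Python's `re.sub`, which should behave like Perl's `s///g` for these particular patterns. This is only a sketch on the sample line; the variable names are made up for illustration. Each match consumes everything up to and including the replaced A, so the next match has to find a fresh `;` prefix further right.

```python
import re

line = "ABCD ABCD; ABCD ABCD; ABCD ABCD"

# First attempt: lazy prefix up to a ';', then the nearest following A.
first_attempt = re.sub(r"(.*?;.*?)A", r"\1X", line)

# Alternative: greedy middle part, so the last A that still has a ';' after it.
last_attempt = re.sub(r"(.*?;.*)A(.*?;)", r"\1X\2", line)

print(first_attempt)  # ABCD ABCD; XBCD ABCD; XBCD ABCD
print(last_attempt)   # ABCD ABCD; ABCD XBCD; ABCD ABCD
```

On this input the first pattern changes one A in the middle field and then one in the third field, and the alternative changes only the last A of the middle field, so neither covers every A in the target field.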

I tried to play with lookbehind but unsuccessfully. Please note, I just need a regex definition I could use in my patterns file (i.e. no perl code). Alternatively, I'm able to use sed or awk to handle this case but I'm not too much familiar with it.

Thanks, community!

Regex101: https://regex101.com/r/Ic4ciA/1

Are you restricted to sed and awk, or is there a programming language which you can use here?
– Tim Biegeleisen, Commented Nov 4, 2019 at 9:52
This replacing procedure is just one of the actions in my bash script, so I can just add any simple command to the pipeline. I think I would be able to handle this using some code (I'm pretty familiar with python). But I'm more curious whether it's possible to match & replace the strings with some special regex pattern.
– CraZ, Commented Nov 4, 2019 at 9:57

Toto · Accepted Answer · 2019-11-04 11:12:47Z

1

A perl one liner:

echo 'ABCD ABCD; ABCD ABCD; ABCD ABCD' | perl -pe 's/(?:.+?;|\G).*?\KA(?=.*?;)/X/g'
ABCD ABCD; XBCD XBCD; ABCD ABCD

Explanation:

(?:             # non capture group
    .+?         # 1 or more any character but newline, not greedy
    ;           # semicolon
  |             # OR
    \G          # restart from last match position
)               # end group
.*?             # 0 or more any character but newline, not greedy
\K              # forget all we have seen until this position
A               # letter A
(?=             # positive lookahead, make sure we have after:
    .*?         # 0 or more any character but newline, not greedy
    ;           # a semicolon
)               # end lookahead
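For readers who would rather do this in Python: the standard `re` module does not support `\G` or `\K`, so the one-liner cannot be ported verbatim. The sketch below swaps in a different technique (lookarounds plus a replacement callback) that should give the same result on the sample line. The function name and its parameters are hypothetical.

```python
import re

def replace_between_semicolons(line: str, old: str = "A", new: str = "X") -> str:
    """Replace `old` with `new` only in fields that have a ';' on both sides."""
    # (?<=;) and (?=;) are zero-width, so the semicolons are never consumed
    # and each interior field is matched in turn.
    return re.sub(
        r"(?<=;)[^;]*(?=;)",
        lambda m: m.group(0).replace(old, new),
        line,
    )

print(replace_between_semicolons("ABCD ABCD; ABCD ABCD; ABCD ABCD"))
# ABCD ABCD; XBCD XBCD; ABCD ABCD
```

Like the Perl pattern, this leaves the first and last fields untouched and, when a line has more than three fields, rewrites every field in between.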


edited Nov 4, 2019 at 11:12

answered Nov 4, 2019 at 10:56


1 Comment

CraZ Over a year ago

Wow, this is awesome! Very functional and on top of that I've learned many new things - non-capturing groups, \G and \K. Many thanks, Toto!

Tim Biegeleisen · Answer · 2019-11-04 10:04:52Z

0

I don't know of a clean way to do this in one go using a regex tool alone. But if you are open to a more iterative approach, it can fairly easily be handled in any scripting language. Here is a Python script which gets the job done:

inp = "ABCD ABCD; ABCD ABCD; ABCD ABCD"
parts = inp.split(';')

index = 1
while index < len(parts)-1:
    parts[index] = parts[index].replace('A', 'X')
    index += 1

output = ';'.join(parts)
print(output)

This prints:

ABCD ABCD; XBCD XBCD; ABCD ABCD

The approach is to split the input string on semicolon, generating a list of terms. Then, iterate from the second to second-to-last term, doing a replacement of the letter A to X. Finally, join together to produce the output you want.
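The same split/replace/join idea can be wrapped in a small reusable function. This is a minimal sketch: the function name and the `old`, `new` and `sep` parameters are assumptions added for illustration, not part of the script above.

```python
def replace_in_inner_fields(line: str, old: str = "A", new: str = "X", sep: str = ";") -> str:
    """Replace `old` with `new` in every field except the first and the last."""
    parts = line.split(sep)
    # parts[1:-1] is the second through second-to-last field; it is empty
    # when the line has fewer than three fields, so such lines pass through unchanged.
    parts[1:-1] = [part.replace(old, new) for part in parts[1:-1]]
    return sep.join(parts)

print(replace_in_inner_fields("ABCD ABCD; ABCD ABCD; ABCD ABCD"))
# ABCD ABCD; XBCD XBCD; ABCD ABCD
```

Slice assignment avoids the manual index counter. To use it as one step in a shell pipeline, the function could be called on each line read from standard input.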

answered Nov 4, 2019 at 10:04


2 Comments

CraZ Over a year ago

Yes, this is exactly my "backup" solution, i.e. break down the line, replace the required strings in its sub-string and then concatenate the parts back. Thanks, anyway.

Tim Biegeleisen Over a year ago

I like your backup solution more than any alternative. The thing is, you don't even need regex here for the replacement, and regex alone isn't a parsing/iterating tool. Even when using regex from a programming language, you will still need to iterate to get the behavior you want.
