
In this string

ab<(CN)cdXYlm<(CI)efgXYop<(CN)zXYklmn<(CI)efgXYuvw<

I want to replace each substring that starts with XY and ends just before the next < with either ONE or TWO, depending on the characters inside the preceding parentheses:

If the XY follows (CN), replace the substring with ONE.

If the XY follows (CI), replace the substring with TWO.

So the result should be:

ab<(CN)cdONE<(CI)efgTWO<(CN)zONE<(CI)efgTWO<

XY and the characters following it should be replaced, but not the angle bracket <.

This is for modifying HTML, so arbitrary characters can occur between XY and <. I guess I need two regexes, one for (CN) and one for (CI).

# This replaces every XY...< run with ONE, regardless of (CN) or (CI);
# the < is written back in the replacement so it is kept:
my $s = 'ab<(CN)cdXYlm<(CI)efgXYop<(CN)zXYklmn<(CI)efgXYuvw<';
$s =~ s/XY.*?</ONE</g;
# But how do I add the conditions to the regex?

3 Answers


You don't need two regexes. Capture the C[NI] and retrieve the corresponding replacement value from a hash:

#!/usr/bin/perl
use warnings;
use strict;

my $s = 'ab<(CN)cdXYlm<(CI)efgXYop<(CN)zXYklmn<(CI)efgXYuvw<';

my %replace = (CN => 'ONE', CI => 'TWO');

$s =~ s/(\((C[NI])\).*?)XY.*?</$1$replace{$2}</g;

my $exp = 'ab<(CN)cdONE<(CI)efgTWO<(CN)zONE<(CI)efgTWO<';

use Test::More tests => 1;
is $s, $exp;
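A variant of the same hash-lookup idea, sketched under the assumption that each XY run must stay inside a single <-delimited segment. With the .*? used above, a (CN) segment that contains no XY would let the match run on into the next segment (for example, "(CN)abc<(CI)efgXYop<" would get ONE). Here [^<] cannot cross a <, and it also matches newlines without /s. \K (Perl 5.10+) drops everything matched before it from the replaced text, and the lookahead (?=<) checks for the < without consuming it, so nothing has to be captured and re-inserted:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %replace = (CN => 'ONE', CI => 'TWO');

my $s = 'ab<(CN)cdXYlm<(CI)efgXYop<(CN)zXYklmn<(CI)efgXYuvw<';

# \K discards "(CN)cd" etc. from the match, so only "XY..." is replaced;
# (?=<) requires the < but leaves it in place; [^<] keeps each match
# inside one <-delimited segment (and, unlike ., also matches newlines).
$s =~ s/\((C[NI])\)[^<]*?\KXY[^<]*(?=<)/$replace{$1}/g;

print "$s\n";    # ab<(CN)cdONE<(CI)efgTWO<(CN)zONE<(CI)efgTWO<
```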

2 Comments

Thanks, that works. Actually, I need more character combinations between the parentheses, so I replaced (C[NI]) with (CN|CO|IN).
@Fabian_Z071: If you also add a matching pair to the %replace hash for each new tag, everything should work :-)
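Following that suggestion, one way to keep the pattern and %replace from drifting apart is to build the alternation from the hash keys. A minimal sketch: the CO and IN tags come from the comment above, but their replacement values THREE and FOUR are hypothetical placeholders, as is the modified sample string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# CO/IN values are hypothetical, for illustration only.
my %replace = (CN => 'ONE', CI => 'TWO', CO => 'THREE', IN => 'FOUR');

# Build "CI|CN|CO|IN" from the hash keys; quotemeta escapes any regex metacharacters.
my $tags = join '|', map { quotemeta } sort keys %replace;

my $s = 'ab<(CN)cdXYlm<(CO)efgXYop<(IN)zXYklmn<(CI)efgXYuvw<';

# Same substitution as the answer, with the alternation interpolated into the pattern.
$s =~ s/(\(($tags)\).*?)XY.*?</$1$replace{$2}</g;

print "$s\n";    # ab<(CN)cdONE<(CO)efgTHREE<(IN)zFOUR<(CI)efgTWO<
```

Adding a tag then only requires adding one hash entry.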

My guess is that this expression, or a modified version of it, might work, though I'm not sure:

([a-z]{2}<\([A-Z]{2}\)[a-z]{2})([^<]+)(<\([A-Z]{2}\)[a-z]{3})([^<]+)(<\([A-Z]{2}\)[a-z])([^<]+)(<\([A-Z]{2}\)[a-z]{3})([^<]+)<

Test

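use strict;
use warnings;

my $str = 'ab<(CN)cdXYlm<(CI)efgXYop<(CN)zXYklmn<(CI)efgXYuvw<';
# Each group pins exact letter counts, so this matches only inputs laid out like $str.
my $regex = qr/([a-z]{2}<\([A-Z]{2}\)[a-z]{2})([^<]+)(<\([A-Z]{2}\)[a-z]{3})([^<]+)(<\([A-Z]{2}\)[a-z])([^<]+)(<\([A-Z]{2}\)[a-z]{3})([^<]+)</;
# ONE/TWO are hard-coded by position, not chosen from the (CN)/(CI) tag.
my $subst = '"$1ONE$3TWO$5ONE$7TWO<"';

# /ee: the first /e yields the string in $subst, the second evaluates it as code,
# which interpolates $1, $3, $5 and $7. /r returns the result instead of modifying $str.
my $result = $str =~ s/$regex/$subst/rgee;

print "$result\n";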

The expression is explained on the top right panel of this demo, if you wish to explore/simplify/modify it, and in this link, you can watch how it would match against some sample inputs step by step, if you like.

2 Comments

Note that Perl interpolates only once in the replacement part, i.e. $subst will be interpolated, but not the $1 it contains.
It works with my $subst = '"$1ONE$3TWO$5ONE$7TWO<"'; and /rgee.
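Because the replacement text in this answer is fixed, a simpler option is to write the capture variables straight into the replacement part, which s/// already interpolates once, so /ee is not needed. A minimal sketch reusing the answer's regex; ${1} instead of $1 only makes the variable boundary explicit:

```perl
use strict;
use warnings;

my $str   = 'ab<(CN)cdXYlm<(CI)efgXYop<(CN)zXYklmn<(CI)efgXYuvw<';
my $regex = qr/([a-z]{2}<\([A-Z]{2}\)[a-z]{2})([^<]+)(<\([A-Z]{2}\)[a-z]{3})([^<]+)(<\([A-Z]{2}\)[a-z])([^<]+)(<\([A-Z]{2}\)[a-z]{3})([^<]+)</;

# The replacement is interpolated like a double-quoted string,
# so the captures can be used directly; /r returns the new string.
my $result = $str =~ s/$regex/${1}ONE${3}TWO${5}ONE${7}TWO</r;

print "$result\n";    # ab<(CN)cdONE<(CI)efgTWO<(CN)zONE<(CI)efgTWO<
```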

This can be done with a one-line regex by using the /e modifier and the ternary operator ?: in the replacement part. The /r modifier makes s/// return the resulting string, so the original string $s is left unmodified.

use strict;
use warnings;

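my $s = 'ab<(CN)cdXYlm<(CI)efgXYop<(CN)zXYklmn<(CI)efgXYuvw<';
# /e evaluates the replacement as Perl code; /r returns the new string and leaves $s unchanged.
# The tag and replacement words are quoted, since bare words are not allowed under "use strict".
my $out = $s =~ s/\(([^)]+)\)([^(]+)XY[^(]+</"($1)$2" . ($1 eq 'CN' ? 'ONE' : 'TWO') . '<'/gre;
print "$out\n";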

Output:

ab<(CN)cdONE<(CI)efgTWO<(CN)zONE<(CI)efgTWO<
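To see what /r does on its own, here is a small sketch (/r needs Perl 5.14 or later) with a simplified, unconditional pattern and a shortened sample string: the original string survives, and the substituted copy is returned.

```perl
use strict;
use warnings;

my $s = 'ab<(CN)cdXYlm<';

# With /r, s/// returns a modified copy and leaves $s as it was.
my $new = $s =~ s/XY[^<]*</ONE</r;

print "$s\n";      # ab<(CN)cdXYlm<
print "$new\n";    # ab<(CN)cdONE<
```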
