I want to mark <a> tags as external links in an HTML fragment (not a full HTML document). However, this Perl program fails to replace every match when the pattern occurs more than once on the same line.

Here is a sample program:

use strict;
use warnings;

my $baseURL = "https://example.com";
my $input = <<'END';
<ul>
    <li><a href="https://www.amazon.com">Amazon</a></li>
    <li>
        <!-- Keep it in one line. -->
        <a href="https://www.google.com.tw">Google</a> and <a href="https://tw.yahoo.com">Yahoo</a> and <a href="https://duckduckgo.com">DuckDuckGo</a>
    </li>
</ul>
END

# Replace external links globally.
$input =~ s{<a href=\"([^"]+)\">(.+)</a>}{
    # Skip local URIs.
    substr($1, 0, 4) ne "http" ? "<a href=\"$1\">$2</a>"
    # Skip links in same domain.
    : index($1, "$baseURL") >= 0 ? "<a href=\"$1\">$2</a>"
    # Disable search engines from following links.
    : "<a href=\"$1\" target=\"_blank\" rel=\"noopener nofollow\">$2</a>"}ge;

# Print modified input to STDOUT.
print $input;

1 Answer

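(.+) is greedy: it matches as much as it can, so on a line with several links it runs from the first link's text all the way to the last </a>, and the whole line is consumed as a single match. Use the non-greedy (.+?) instead, which stops at the nearest </a>.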
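
As a minimal sketch of that fix, the substitution from the question can be wrapped in a small helper with the capture changed to (.+?). The sub name mark_external_links and the sample line are hypothetical, chosen only for illustration; the same-domain check keeps the question's index() test.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): adds target/rel
# attributes to external links in an HTML fragment.
sub mark_external_links {
    my ($html, $base_url) = @_;

    # (.+?) is non-greedy, so each match stops at the nearest </a>
    # and every link on a line is handled separately.
    $html =~ s{<a href="([^"]+)">(.+?)</a>}{
        my ($href, $text) = ($1, $2);
        # Leave local URIs and same-domain links unchanged.
        ($href !~ /^http/ || index($href, $base_url) >= 0)
            ? qq(<a href="$href">$text</a>)
            : qq(<a href="$href" target="_blank" rel="noopener nofollow">$text</a>)
    }ge;

    return $html;
}

# Three links on one line: one external, one local, one same-domain.
my $line = '<a href="https://www.google.com.tw">Google</a> and '
         . '<a href="/about">About</a> and '
         . '<a href="https://example.com/x">Home</a>';
print mark_external_links($line, "https://example.com"), "\n";
```

With this change, only the Google link in the sample line should gain the target and rel attributes; the local and same-domain links pass through unchanged. Note that (.+?) still does not match across newlines and the pattern assumes href is the only attribute, so for anything beyond simple fragments an HTML parser is likely the safer choice.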
