
I would like to replace strings only in the text between > and <. For example, I want to replace center with foo in the excerpt >is the sun the center of the universe?:<, but not the center in the excerpt <...center;">.

I am using the following command:

perl -pi -w -e 's/center/foo/g;' file.html

So, based on the question replace all "foo" between two HTML tags using REGEX (PHP code), I tried this:

perl -pi -w -e 's/(?<![\w$<])\$\(center\)(?![\w$>])/foo/g;' file.html

but it doesn't work the way I want. The closest things I found on the web are Perl string replace: Match, but not replace, part of regex; Perl Regex - Search and replace between tags only if string is in-between; and Replace text in string with exceptions. But I still can't work out how to replace only the occurrences that are not inside a tag, such as the center in <...center;">.

fragment_html_code:

</td></tr><tr><th colspan="2" class="" style="text-align:center;">is the sun the center of the universe?:</th></tr><tr class=""><td colspan="2" class="" style="text-align:center;">
center </td></tr>

EDIT UPDATE:

About Lordadmira's solution:

The code fails whenever there is a line break between the opening tag <...> and the closing tag </...>. For example, it fails when the word to be replaced comes after a line break, as in <tr> (line break) center </...>. What could be happening? Here is an example in context: after the opening <th ...> tag there is a line break, and Lordadmira's solution does not replace anything in the text that follows it:

</td></tr><tr><th colspan="2" class="" style="text-align:center;">
   is the sun the center of the universe?:
    </th></tr><tr class=""><td colspan="2" class="" style="text-align:center;">
        center </td></tr>

EDIT UPDATE 01:

I modified Lordadmira's initial solution to perl -0777 -pi -w -e 's{>\K[^<]*?\K.foo[^<]*(?=<).}{ bar }g;' file.html or perl -0777 -pi -w -e 's{>\K[^<]*?\K.foo.[^<]*(?=<).}{ bar }g;' file.html. This works across line breaks, but it erases everything that comes after foo. I tried several ways to keep the text after foo from being erased, without success. If I manage to solve this, the question will be fully answered.

EDIT UPDATE 02:

I have now changed my modification of Lordadmira's solution from EDIT UPDATE 01 to perl -0777 -pi -w -e 's{>\K[^<]*?\K.foo.[^<](?!=<)}{ bar }g;' so that the text after foo is no longer deleted. But this erases the first character of the word after foo. For example, in

> "lorem
  foo ipsum "< 

replacing foo does not give the expected result: I get >" lorem bar psum "<, that is, the "i" of ipsum is deleted.


The solution below fixes the deletion of a character after foo on each replacement. So far, across a broad range of inputs, it has been the most functional adaptation of Lordadmira's initial solution.

To fix this, I dropped the . (dot) after foo and, following the explanation of negative lookahead in Regex matching line not containing the string and in the section "Positive and Negative Lookahead", changed the (?=<) part of Lordadmira's initial solution to (?!=<).

perl -0777 -pi -w -e 's{>\K[^<]*?\K.foo[^<](?!=<)}{ bar }g;'


EDIT UPDATE 3:

After several tests, I believe I have arrived at a solution that fully satisfies my needs:

perl -0777 -pi -w -e 's{>[^<]*?\K\b(foo)\b(?!=<)}{bar}g;'

  • What about perl -pi -w -e 's/(?<=>)[^<]+(?=<)/foo/g;' file.html or do you need to match "center" specifically? Commented Mar 4, 2021 at 6:53
  • @JerryJeremiah, center is only the word used in fragment_code_html to serve as an MWE; I actually need to replace many other strings around center as well. On reflection, the presence of other strings around center probably affects the solution. Commented Mar 4, 2021 at 7:03

2 Answers

You would do this.

s{>\K[^<]*?center[^<]*(?=<)}{foo}g;

EDIT: perl -p reads the file line by line and presumes that all the work you want to do is contained within single lines. If you need to work across lines, you have to read in the entire file (or sufficiently large chunks of it). Use perl -0777 -p and it should work.

See perlrun for more information.

HTH


5 Comments

The code fails every time there is a line break between <> and </>. For example, it fails when the word to be replaced comes after a line break, as in <tr> (line break) center </>. What could be happening?
When you use the -p switch it is reading the file line by line. If there can be line breaks in the match, read in the entire file at once. Either put -0777 on the command line or make a short script and undef $/.
perl -0777 -p did not work. I read the perlrun documentation and searched the web, but found nothing that solved it. Here is a sample (github.com/yaacovNaNachRabbeinu/things/blob/main/…); in this file I would like to replace, for example, light with luz, but the text is not changed when I run perl -0777 -pi -e 's{>k[^]*?light[^<]*?=<)}{light}g;' Alexandrite.html.
In fact, the command perl -0777 -pi -e 's{>k[^]*?light[^<]*?=<)}{light}g;' Alexandrite_mini.txt did not work even on a short, non-HTML text file such as github.com/yaacovNaNachRabbeinu/things/blob/main/…
I have modified your initial solution to perl -0777 -pi -w -e 's{>\K[^<]*?\K.foo[^<]*(?=<).}{ bar }g;' file.html or perl -0777 -pi -w -e 's{>\K[^<]*?\K.foo.[^<]*(?=<).}{ bar }g;' file.html, and this works across line breaks, but it erases everything that comes after foo. I tried several ways to keep the text after foo from being erased, without success. If I manage to solve this, the question will be fully answered.

My answer is a straightforward adaptation of @lordadmira's initial solution above:

Lordadmira's initial solution needed two changes: it had to work across line breaks, and it had to keep the original text after foo intact. The adaptation is as follows:

perl -0777 -pi -w -e 's{>\K[^<]*?\K.foo[^<](?!=<)}{ bar }g;'

To do this, I dropped the . (dot) after ..\K.foo and, following the explanation of negative lookahead in Regex matching line not containing the string and in the section "Positive and Negative Lookahead", changed the (?=<) part of Lordadmira's initial solution to (?!=<).

Note: I am not sure it will work for every possible code layout or HTML content, but it has been sufficient in the tests I have done so far.

Final solution (i.e. EDIT UPDATE 3 in my question above):

perl -0777 -pi -w -e 's{>[^<]*?\K\b(foo)\b(?!=<)}{bar}g;'
