
I'm trying to build a regular expression that removes empty <p> tags, which may or may not contain whitespace between the opening and closing tag.

So far I'm using this:

 $pattern = '/<p>\s*<\/p>/im';
 $cleaned_html = preg_replace($pattern, "", $unclean_html);

The contents of $unclean_html are shown here:

<!DOCTYPE html>
<!-- Generated by PHPWord -->
<html>
<head>
<meta charset="UTF-8" />
<title>PHPWord</title>
<meta name="author" content="Dustin Chandler" />
<style>
* {font-family: Arial; font-size: 12pt;}
a.NoteRef {text-decoration: none;}
hr {height: 1px; padding: 0; margin: 1em 0; border: 0; border-top: 1px solid #CCC;}
table {border: 1px solid black; border-spacing: 0px; width : 100%;}
td {border: 1px solid black;}
</style>
</head>
<body>
<p style="margin-top: 0; margin-bottom: 0;"><span style="font-weight: bold;">Cutline: North Carolina is the second-most at-risk state in the nation for farmland loss, according to a study by American Farmland Trust.</span></p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;"><span style="font-weight: bold;">Head shot c</span><span style="font-weight: bold;">utline: N.C. Agriculture Commissioner Steve Troxler is working on ways to preserve N.C. farmland.</span></p>
<p> </p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;"><span style="font-weight: bold;">Digging into solutions: Troxler highlights strategies</span><span style="font-weight: bold;"> </span><span style="font-weight: bold;">for</span><span style="font-weight: bold;"> </span><span style="font-weight: bold;">N.C. farmland</span></p>
<p> </p>
<p style="text-align: left; margin-top: 0pt; margin-bottom: 0pt;">Agricultual education</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Agriculture may be North Carolina’s top industry, but the state is losing farmland – fast.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">In fact, North Carolina had the second highest rate of farmland loss in the country in 2020, according to a report from the American Farmland Trust.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Evan Davis, director of the Agricultural Development and Farmland Preservation Trust Fund, joined N.C. Agriculture Commissioner Steve Troxler and other officials from the Department of Agriculture & Consumer Services in the third professional development seminar of the fall, speaking to students and faculty about the importance of farmland preservation and conservation measures to curb future loss.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">According to the report, Davis said, “732,000 acres of ag land were converted to non-ag uses between 2001 and 2016. More than 571,000 acres were converted to scattered, large-lot housing developments. North Carolina led the nation in this kind of development.”</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Unfortunately, much of this land was categorized by the agency as “nationally significant land,” the best land for long-term production of food and fiber.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“So, not only are we losing farmland, but we’re losing our most productive land,” Davis said,</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Upon entering office in 2005, Troxler said he and his attorney spent several weeks looking for answers on how to preserve N.C. farmland.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“I grew up in the little town of Brown Summit,” said Troxler. “When I was farming, I saw encroachment start to happen and I saw farms and forests start disappearing. When I went into (this) office, I looked around the state and saw the same thing all over North Carolina.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“We found out that the states that were doing the most to preserve farmland were the states that had already lost the majority of their farmland,” said Troxler. “We certainly don’t want to get to that point in North Carolina.”</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">A recent report by American Farmland Trust predicted the rates of national farmland loss by the year 2040 under current development trends, runaway sprawl and “better-built cities” (compact and dense development). According to the report, North Carolina would lose farmland at the second-highest rate in the nation in each scenario.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“If you look at the current development trends, we’re projected to lose almost 1.2 million acres by 2040,” said Davis. “In the worst scenario we would lose more acres, 1.6 million, than the entire state of Delaware.”</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">To mitigate this, Davis said it’s important to look at legal tools that would aid in preservation such as agricultural conservation easement.  Easements restrict residential, commercial, and industrial development to ensure the land remains in agricultural, horticultural, or forestry production. The most common, a perpetual conservation easement, is often used in partnership with USDA, the military and ADFP Trust Fund.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Land -development issues making sure that the open space and natural resources of private farmland cleansing water runoff, wildlife habitats, etc. – were preserved during zoning while not decreasing the property value as well as ensuring the landowner’s private property rights were not infringed.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“We set up a system where we pay people for the development rights on the piece of property that they own,” Troxler explained. “They can participate in escalating real estate prices, but at the same time, make sure that the land is always farmland.”</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Assistant Commissioner Alexander “Sandy” Stewart, Ph.D., used the history of his farm in Moore County, a 186-acre non-contiguous property owned by his family since 1775, to illustrate his personal stake in preservation.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“Farmland preservation, for my farm, is important, but it’s also important to what my neighbors do with their place, up the hill and around in the community,” said Stewart.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Studies show that both agricultural land and industrial land are net contributors to a county, Stewart explained. </p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“The cost of a county providing trash, water, sewer, fire prevention, police or other services to the land costs the county less than what the land generates in tax base. However, when you get to residential land, the cost of community services is usually the opposite. It normally costs the county more per acre to provide those services because they’re such heavy users of the services. Tax rates are higher, but they use the services at such a higher rate.”</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Stewart explained that while those in agriculture and general landowners aren’t categorically opposed to development plans, a proper arrangement between landowners and county/municipal government.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“I think that everybody wants good neighbors,” said Stewart. “They just want to see a smart plan.”</p>
<p> </p>
<p> </p>
</body>
</html>

It doesn't match anything in the unclean HTML, even though the online regular expression testers I've tried show it matching the empty paragraph tags.

I've also tried

$clean_html = str_replace("<p> </p>", "", $unclean_html);

That doesn't replace anything either.
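str_replace compares bytes exactly, so the needle "<p> </p>" (with an ASCII space) cannot match a paragraph whose gap is a UTF-8 non-breaking space or the literal text &nbsp;. A minimal sketch, assuming those three spellings are the only ones in the input (the sample string is made up for illustration):

```php
<?php
// Made-up sample: an ASCII space, a UTF-8 non-breaking space and the &nbsp; entity.
$unclean_html = "<p>A</p><p> </p><p>\u{00A0}</p><p>&nbsp;</p><p>B</p>";

// A single needle with an ASCII space removes only the first empty paragraph.
$partly_clean = str_replace("<p> </p>", "", $unclean_html);

// str_replace also accepts an array of needles; each one is replaced in turn.
// Assumption: these three spellings are the only "empty" variants present.
$needles = ["<p> </p>", "<p>\u{00A0}</p>", "<p>&nbsp;</p>"];
$clean_html = str_replace($needles, "", $unclean_html);

echo $clean_html, PHP_EOL; // <p>A</p><p>B</p>
```

Unlike a regular expression, this does not handle runs of several spaces or mixed whitespace.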

  • Your link to the contents asks me to log in. Rather than trying to show us your real data, narrow the problem down to a minimal reproducible example - test on small parts of the input, and edit the question to show that small input, the output you get, and the output you wanted for that example. AterLux's answer shows a good example of what this can look like. Commented Dec 9, 2022 at 20:08
  • I apologize, I copied the wrong link when doing it...I've corrected the link, it shouldn't ask to log in now. Commented Dec 9, 2022 at 20:14
  • I'm not sure why but I got the full data to post properly when editing the question, it would not let me add that HTML code originally as a code block. It's working now, so no more link. Commented Dec 9, 2022 at 20:18
  • You've got it to paste ... but you've ignored my advice to reduce it to a minimal example, and include the expected and actual output. Debugging is a key skill in programming, and that includes narrowing down where problems are occurring, rather than repeating the same test every time. Commented Dec 9, 2022 at 20:21
  • I...have...this is the minimal input, it's a single file...and I stated my expected output from the start: it's expected to replace the empty tags. I'm not seeing where you're coming from here. I have been debugging this for hours; my last resort was coming HERE, not my first option. I gave my input, I gave what I'm using to try and solve my problem, and I gave what I expected to happen. That's one document, one single output from PHPWord. If you're wanting to know whether I've used a small input, I have...same thing...but the fact remains it does not find <p> </p> despite my regex. Commented Dec 9, 2022 at 20:28
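One way to narrow this down, in the spirit of the comments above, is to dump the raw bytes between the tags instead of trusting how the HTML looks on screen. A minimal sketch; the three-paragraph sample is hypothetical and only stands in for the PHPWord output:

```php
<?php
// Hypothetical sample input: three paragraphs that all render as "empty".
$unclean_html = "<p> </p>\n<p>\u{00A0}</p>\n<p>&nbsp;</p>";

// Capture whatever sits between each <p> and </p> ...
preg_match_all('#<p>(.*?)</p>#s', $unclean_html, $matches);

// ... and show it as hex so invisible differences become visible.
$hex = array_map('bin2hex', $matches[1]);
foreach ($hex as $h) {
    echo $h, PHP_EOL;
}
// 20           -> ordinary ASCII space (matched by \s)
// c2a0         -> U+00A0 NO-BREAK SPACE encoded as UTF-8
// 266e6273703b -> the six characters of the entity "&nbsp;"
```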

1 Answer


First of all, you have to use the backslash \ as the escape character.

Next, pay attention to backslash escaping in single-quoted and double-quoted strings. It's better to use single-quoted strings for regular expression patterns.
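A short illustration of the quoting point, using plain PHP string literals (the subject string is made up):

```php
<?php
// Single quotes keep a backslash literally (except before ' or \).
$single = '\n';
// Double quotes turn \n into a real line feed before PCRE ever sees it.
$double = "\n";
echo strlen($single), ' ', strlen($double), PHP_EOL; // 2 1

// \s is not a recognized double-quote escape, so here both spellings agree,
// but single quotes make it obvious that PCRE receives the pattern exactly as typed.
var_dump('\s' === "\s"); // bool(true)

$pattern = '/<p>\s*<\/p>/';
var_dump(preg_match($pattern, "<p>\n</p>")); // int(1)
```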

You have to use preg_replace (not str_replace) for regular expressions.
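To see why the function matters, here is the same pattern passed to both functions; str_replace treats it as literal text. The sample input is made up:

```php
<?php
$html = "<p>x</p><p>  </p>";
$pattern = '#<p>\s*</p>#';

// str_replace looks for the pattern text itself, character for character,
// so nothing is found and the string comes back unchanged.
$with_str = str_replace($pattern, '', $html);

// preg_replace compiles the pattern and removes every match.
$with_preg = preg_replace($pattern, '', $html);

echo $with_str, PHP_EOL;  // <p>x</p><p>  </p>
echo $with_preg, PHP_EOL; // <p>x</p>
```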

Also, if you want to use the forward slash / as part of your pattern (e.g. in </p>), consider using a different pattern delimiter, e.g. #:

<?php

  $unclean_html = '<p>tag</p>;  empty<p> </p>tag;  linefeed<p>
  </p>tag; <p>other tag</p>';

  $clean_html = preg_replace('#<p>[\s\x{00A0}]*</p>#iu', '', $unclean_html);

  print($clean_html); // <p>tag</p>;  emptytag;  linefeedtag; <p>other tag</p>
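For comparison, a sketch of the same kind of removal written with / as the delimiter, where every / in </p> has to be escaped; both forms behave the same (the sample input is made up):

```php
<?php
$unclean_html = "<p>keep</p><p> </p><p>\n</p>";

// With / as the delimiter, each / inside the pattern must be escaped as \/.
$slash_delimited = preg_replace('/<p>\s*<\/p>/i', '', $unclean_html);

// With # as the delimiter, </p> can be written as-is.
$hash_delimited = preg_replace('#<p>\s*</p>#i', '', $unclean_html);

echo $slash_delimited, PHP_EOL; // <p>keep</p>
echo $hash_delimited, PHP_EOL;  // <p>keep</p>
```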

UPD: in the example you've provided, there is a non-breaking space character between <p> and </p>.

The preg \s class does not include this character, so you need to add it manually. Use the code point U+00A0 (written \x{00A0} in the pattern) and the u modifier if your text is encoded in UTF-8.
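A minimal sketch of that difference, with a made-up input that contains real U+00A0 characters (written as \u{00A0}, which PHP 7+ encodes as UTF-8 in double-quoted strings):

```php
<?php
// U+00A0 is stored as the two bytes C2 A0 in UTF-8.
$unclean_html = "<p>text</p><p>\u{00A0}</p><p> \u{00A0} </p>";

// Without \x{00A0} and u, the bytes between the tags are not all ASCII whitespace,
// so nothing is removed.
$ascii_only = preg_replace('#<p>\s*</p>#', '', $unclean_html);

// The u modifier makes PCRE read the subject as UTF-8 code points,
// so \x{00A0} matches the two-byte sequence C2 A0 as one character.
$clean_html = preg_replace('#<p>[\s\x{00A0}]*</p>#u', '', $unclean_html);

var_dump($ascii_only === $unclean_html); // bool(true)
echo $clean_html, PHP_EOL;               // <p>text</p>
```

With the u modifier, preg_replace returns null instead of a string if the subject is not valid UTF-8, so checking preg_last_error() can help when nothing seems to happen.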


3 Comments

Thank you for your reply, but this still did not work. I am using preg_replace with the exact regular expression you provided, and it still does not match anything in the text I provided.
Thank you again for all of your help; however, it's still not finding it. I thought it might be a carriage return issue, but adding that to the pattern still doesn't find it. I'm not a regex master by any means (in fact, I try to avoid regex as much as possible), but this has me completely stumped. You can take a look at this link to see it better; I have put your expression in: word-document-converter-dragonniichan919181.codeanyapp.com Thank you again for at least attempting!
I solved it... I put &nbsp; in and it fixed it. When outputting the code to a text area, the &nbsp; entity is displayed as a plain space, which I did not know.
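Going by that last comment, the HTML apparently contained the entity text &nbsp; rather than a U+00A0 character, and a text area renders the entity as a space. A sketch that handles whitespace, U+00A0 and the entity together; the entity spellings listed (&nbsp; and &#160;) are an assumption about what the converter emits, and the sample input is made up:

```php
<?php
$unclean_html = "<p>keep</p>\n<p>&nbsp;</p>\n<p> &#160;\u{00A0} </p>\n<p>also keep</p>";

// Entities are multi-character sequences, so they go in an alternation rather than
// a character class. Assumption: &nbsp; and &#160; are the only entity spellings used.
$pattern = '~<p>(?:\s|\x{00A0}|&nbsp;|&#160;)*</p>~iu';
$clean_html = preg_replace($pattern, '', $unclean_html);

// The newlines that surrounded the removed paragraphs are left in place.
var_dump($clean_html === "<p>keep</p>\n\n\n<p>also keep</p>"); // bool(true)

// When debugging in a <textarea>, escape the output so entities stay visible
// (&nbsp; is shown as the text "&nbsp;" instead of being rendered as a space).
$debug_view = htmlspecialchars($unclean_html);
echo '<textarea>', $debug_view, '</textarea>', PHP_EOL;
```

Matching the entities in the pattern avoids running html_entity_decode over the whole document, which would also change every other entity such as &amp;.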
