I'm adding Schema.org markup (specifically the description property) to our product pages. They're all dynamically generated, so I'm looking for a good general-purpose regular expression to clean up that description.
So here's what I'm currently working with (spaced a little oddly for ease of reading):
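```php
<meta itemprop="description" content="
<?php
$original_desc = $_product->getShortDescription();
$schema_desc = preg_replace('Rocking REGEX theoretically goes here','$1 $2', $original_desc);
$schema_desc = strip_tags($schema_desc); // strip_tags() returns the result; it doesn't modify its argument
echo htmlspecialchars($schema_desc, ENT_QUOTES); // escape quotes so they can't end the content="" attribute early
?>
">
```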
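For reference, the clean-up that has to happen around the regex step (strip the tags, then make the text safe inside a quoted attribute) can be pulled into a small function. This is only a sketch: `schema_description()` is a hypothetical name, it assumes UTF-8 input, and it also collapses the CMS's line breaks into single spaces, which the code above does not do.

```php
<?php
// Hypothetical helper: converts CMS HTML into plain text that is safe to
// print inside a double-quoted content="..." attribute. Assumes UTF-8 input.
function schema_description(string $html): string
{
    // strip_tags() returns the stripped copy; the result must be assigned.
    $text = strip_tags($html);

    // Decode entities such as &amp; so they are not double-escaped below.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of ASCII whitespace (e.g. the CMS's line breaks) to one space.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // Escape &, <, >, " and ' for use inside an HTML attribute.
    return htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo schema_description("<p><strong>Title</strong> - \"Quoted\" &amp; more</p>\n<p>Next</p>"), "\n";
// prints: Title - &quot;Quoted&quot; &amp; more Next
```

Decoding entities before re-escaping means an `&amp;` already present in the CMS text comes out as `&amp;` rather than `&amp;amp;`.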
Problem is, our product descriptions are being pulled from the admin of our CMS, so the formatting is a little squirrelly.
Here's what they look like:
```html
content="<p><strong>Product Title</strong> - Other Product Name - <em>Blah Blah</em></p>
<p><strong>Product Heading 1</strong> </p>
<p><strong>Product Heading 2:</strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras vulputate pellentesque sem, id mattis sem blandit at.
Suspendisse tempus sodales enim nec aliquam. Vestibulum laoreet tincidunt dui, sit amet laoreet ipsum gravida at. Nulla in tempus justo,
et bibendum dolor.</p>
<p><strong>Product Heading 3:</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras vulputate pellentesque
sem, id mattis sem blandit at. Suspendisse tempus sodales enim nec aliquam. Vestibulum laoreet tincidunt dui, sit amet laoreet ipsum gravida at.
Nulla in tempus justo, et bibendum dolor.</p>"
```
So here's what I want to do: I want to KEEP the text inside the first <strong></strong> pair, because that's the product category/title, but remove the text inside every subsequent <strong></strong> pair, because those are just headings that are useless in a search description. I've found ways to strip the text from ALL the <strong></strong> pairs, but not from all but the first.
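One way to express "all but the first" is `preg_replace_callback()` with a counter, so a single pattern decides per match whether to keep it. This is a minimal sketch assuming the `<strong>` elements are never nested (a lazy regex cannot pair nested tags); `strip_strong_except_first()` is a hypothetical helper name, not something from the CMS.

```php
<?php
// Hypothetical helper: removes every <strong>...</strong> element (tags and
// text) except the first one, which holds the product title.
// Assumes <strong> elements are never nested.
function strip_strong_except_first(string $html): string
{
    $seen = 0; // how many <strong> elements have been matched so far

    $result = preg_replace_callback(
        // \b stops <strongfoo> from matching; [^>]* allows attributes;
        // i = case-insensitive, s = "." also matches newlines; .*? is lazy,
        // so each match ends at the nearest </strong>.
        '~<strong\b[^>]*>.*?</strong>~is',
        function (array $m) use (&$seen): string {
            // Keep the first match as-is; replace every later match with ''.
            return $seen++ === 0 ? $m[0] : '';
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$desc = "<p><strong>Product Title</strong> - Other Product Name - <em>Blah Blah</em></p>\n"
      . "<p><strong>Product Heading 1</strong> </p>\n"
      . "<p><strong>Product Heading 2:</strong>Lorem ipsum dolor sit amet.</p>";

echo strip_tags(strip_strong_except_first($desc)), "\n";
```

The result still contains the remaining markup, so this step would run before `strip_tags()`. For CMS output that is less regular than this (nested or malformed tags), parsing it with PHP's `DOMDocument` would be a sturdier route than a regex.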
Thanks!