13

Before we start, strip_tags() doesn't work.

Now, I've got some data that needs to be parsed. The problem is that I need to get rid of all the HTML, which has been formatted very strangely. The tags look like this (notice the spaces):

< p > blah blah blah < / p > < a href= " link.html " > blah blah blah < /a >

All the regexes I've been trying aren't working, and I don't know enough about regex formatting to make them work. I don't care about preserving anything inside of the tags, and would prefer to get rid of the text inside a link if I could.

Anyone have any idea?

(I really need to just sit down and learn regular expressions one day)

6 Answers

35

Does

preg_replace('/<[^>]*>/', '', $content)

work?


1 Comment

Instead of * you could use +, because with * you will also replace <> if found in text.
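
The difference this comment describes can be seen in a small sketch. The sample string and variable names are assumptions made for illustration; only the two patterns come from the thread.

```php
<?php
// Hypothetical sample: spaced-out tags plus a literal "<>" in the text.
$content = '< p > a <> b < / p >';

// '*' allows zero characters between the brackets, so the bare "<>" is removed too.
$withStar = preg_replace('/<[^>]*>/', '', $content);

// '+' requires at least one character, so "<>" survives while the spaced tags still match.
$withPlus = preg_replace('/<[^>]+>/', '', $content);

var_dump($withStar);
var_dump($withPlus);
```

Either way the spaced tags are matched, because [^>] accepts spaces as readily as any other character.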
17

strip_tags() will work if you run html_entity_decode() on the variable before calling strip_tags():

<?php
$text = '< p > blah blah blah < / p > < a href= " link.html " > blah blah blah< /a >';
echo strip_tags(html_entity_decode($text));
?>
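
If strip_tags() still leaves the spaced-out tags in place after decoding, as the asker reports, one possible workaround is to close up the whitespace after each "<" first so the tags look conventional. This is only a sketch under that assumption: the normalizing pattern and the variable names are illustrative and not from the thread, and the pattern would also alter any literal "<" followed by a space in ordinary text.

```php
<?php
$text = '< p > blah blah blah < / p > < a href= " link.html " > blah blah blah< /a >';

// Decode entities first in case the tags arrive as &lt; and &gt;.
$decoded = html_entity_decode($text);

// Remove whitespace after "<" and around an optional "/", e.g. "< / p >" becomes "</p >".
$normalized = preg_replace('/<\s*(\/?)\s*/', '<$1', $decoded);

// Strip the now conventional-looking tags, then collapse the leftover whitespace.
$plain = trim(preg_replace('/\s+/', ' ', strip_tags($normalized)));

echo $plain, "\n";
```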

Comments

2

Solution which isn't fool-proof, but will work for what you posted:

s/<[^>]*>//g
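
The s/…//g form is sed/Perl substitution syntax. In PHP the same substitution goes through preg_replace(), which replaces every match by default. The sketch below also removes whole links first, since the question prefers to drop link text. The anchor pattern, sample string, and variable names are assumptions for illustration, and like the pattern above this is not fool-proof (a ">" inside an attribute value would break it, for example).

```php
<?php
$content = '< p > blah blah blah < / p > < a href= " link.html " > link text < /a >';

// Drop each <a ...>...</a> element, tolerating spaces inside the tags.
// Flags: i = case-insensitive, s = let "." match newlines; ".*?" is non-greedy.
$noLinks = preg_replace('/<\s*a\b[^>]*>.*?<\s*\/\s*a\s*>/is', '', $content);

// Then remove any remaining tags with the same pattern as above.
$noTags = preg_replace('/<[^>]*>/', '', $noLinks);

// Tidy up the whitespace the removed tags leave behind.
$result = trim(preg_replace('/\s+/', ' ', $noTags));

echo $result, "\n";
```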

Comments

1

Formatted strangely? That is valid HTML though, right? In that case I wouldn't touch it with regular expressions. Examples of how this can go wrong and why it's a bad idea are legion. Instead I'd use HTML Tidy on it to, for example, clean up the unnecessary whitespace.

2 Comments

I was going to post this, but was too tired to word it intelligibly. +1.
When I run the string through HTML Tidy, it changes the < and > signs to &lt; and &gt;, so strip_tags() still won't work on those. I was using both tidy_parse_string() and tidy_repair_string(). Is there another function that will work that I don't see?
-2

https://www.php.net/strip_tags is probably what you need.

1 Comment

strip_tags() doesn't work (as noted by the first line of my question) because PHP doesn't recognize the tags as HTML due to the formatting. That was my first thought as well.
-2

Try this out and let me know.

<?php
$text = '< p > blah blah blah < / p > < a href= " link.html " > blah blah blah< /a >';
echo strip_tags($text);
echo "\n";
echo strip_tags($text, '<p><a>');
?> 

3 Comments

strip_tags() doesn't work (as noted by the first line of my question) because PHP doesn't recognize the tags as HTML. That was my first thought as well.
Did you add that later? I totally missed that... Did you try using preg_replace?
Nope, the post hasn't been edited at all. I was asking about the regex I could use. chaos' answer is most likely the one I'll end up using. If I could use HTML Tidy to clean up the code and then use strip_tags(), that would be fine, but I can't find a function in HTML Tidy that does what I need; hence why I haven't accepted chaos' answer. :)
