1

I'm sure someone has covered this before, but I didn't find it in a quick search of the site. Right now I'm trying to filter some input from a WYSIWYG editor so that it removes characters like ¢©÷µ·¶±€£®§™¥ but keeps HTML characters. I've tried htmlentities and htmlspecialchars, but both still seem to leave those characters intact. Are there any built-in methods for this, or does anybody have a good regex that would handle it? Thanks!

0

4 Answers

1

If you are using PHP >= 5.2.0, the Filter extension (filter_var() and friends) could be helpful.
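As a rough sketch of what that could look like: the answer does not name a specific filter, so the choice of FILTER_UNSAFE_RAW with FILTER_FLAG_STRIP_HIGH below is an assumption, and strip_high_bytes is a hypothetical helper name. The flag drops every byte above 127, so HTML tags and existing entities survive, but so does nothing else outside ASCII (accented letters are removed too).

```php
<?php
// Hypothetical helper, not from the original answer.
function strip_high_bytes(string $html): string
{
    // FILTER_UNSAFE_RAW leaves the string as-is apart from what the flag strips;
    // FILTER_FLAG_STRIP_HIGH removes every byte with a value above 127.
    $clean = filter_var($html, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH);

    // filter_var() returns false on failure; fail closed with an empty string.
    return $clean === false ? '' : $clean;
}

echo strip_high_bytes("<p>Price: 5€ ©</p>"), "\n"; // <p>Price: 5 </p>
```

Because it works on bytes rather than characters, this does not depend on the input being valid UTF-8.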


1 Comment

Awesome, filter worked great. I'm still tinkering with the options, so maybe I can avoid string replacement. However, this is what I'm doing now: $ret = str_replace(" ", "&nbsp;", $_POST[$varname]); $ret = str_replace("/", "&aslash;", $ret); $ret = filter_var($ret, FILTER_SANITIZE_URL); $ret = str_replace("&nbsp;", " ", $ret); $ret = str_replace("&aslash;", "/", $ret);
0

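This regex should work (note the u modifier, which makes PCRE treat the multi-byte characters as whole UTF-8 characters instead of individual bytes):

$text = preg_replace('/[¢©÷µ·¶±€£®§™¥]+/u', '', $text);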

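You could also replace the characters with their HTML entities like this (str_replace() rather than preg_replace(), since these are plain strings and not delimited patterns):

$bad = array('©', '®'); $good = array('&copy;', '&reg;');

$text = str_replace($bad, $good, $text);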
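For anyone who would rather not list every symbol, here is a minimal sketch of the character-class idea in both directions: a deny-list of the symbols from the question, and an allow-list that keeps only printable ASCII plus tab, CR and LF. The function names are hypothetical, and the allow-list range is an assumption about what counts as a "regular character"; it also removes accented letters.

```php
<?php
// Deny-list: remove only the symbols named in the question.
function remove_listed_symbols(string $text): string
{
    // The u modifier makes PCRE read pattern and subject as UTF-8,
    // so each symbol is matched as one character, not byte by byte.
    $result = preg_replace('/[¢©÷µ·¶±€£®§™¥]+/u', '', $text);

    // preg_replace() returns null on error (e.g. invalid UTF-8 input).
    return $result === null ? '' : $result;
}

// Allow-list: keep printable ASCII, tab, CR and LF; drop everything else.
function keep_printable_ascii(string $text): string
{
    $result = preg_replace('/[^\x20-\x7E\t\r\n]+/u', '', $text);

    return $result === null ? '' : $result;
}

echo remove_listed_symbols("<b>5€</b> ©ok™"), "\n"; // <b>5</b> ok
echo keep_printable_ascii("<i>café</i> ©"), "\n";   // <i>caf</i> 
```

Both functions leave tags and existing entities such as &amp; untouched, because those are made up of plain ASCII characters. The file itself needs to be saved as UTF-8 for the deny-list pattern to match.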

4 Comments

Any way that I wouldn't have to specify each character I want removed, instead just replacing what's not a regular character or html tag/entity?
I would use htmlspecialchars() or strip_tags() first. If you want to replace the bad characters with nothing just use the first regex.
regex for this? are you serious?
Wait -- you want to keep HTML entities?
0

Have you tried the htmlentities() function? Try like this:

$text = htmlentities($text);

There are some other optional parameters, which you can check out at http://php.net/manual/en/function.htmlentities.php . You might have to set the quote_style and charset ones, at the very least.

2 Comments

I've tried htmlentities with no luck. Here's what I tried: $ret = htmlentities($_POST[$varname], ENT_NOQUOTES, 'UTF-8', false); Still getting the weird characters, any idea if I'm messing something up there?
Oh, for some reason I caught the fact that you tried htmlspecialchars() but not htmlentities(). My bad. Anyway, I'd try the ISO charset options listed in my link, just in case.
0

htmlentities() and htmlspecialchars() aren't going to work for you if you want to remove those characters completely, rather than just converting them to HTML entities.

EDIT

I just noticed that at one point you said you want to preserve HTML entities. If that's the case, use htmlentities()!! It will convert all those symbols into their html entity equivalent. If you echo it, you're still going to see the characters you tried to remove, but if you view the source, you'll see the &name; formatted entity instead.
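One caveat with calling htmlentities() on the whole string is that it also escapes < and >, which would break the WYSIWYG markup. A possible workaround, sketched here as an assumption rather than anything from the question, is to run htmlentities() only over the non-ASCII runs. The helper name encode_non_ascii is hypothetical.

```php
<?php
// Hypothetical helper: convert only non-ASCII characters to named entities,
// leaving tags and already-encoded entities untouched.
function encode_non_ascii(string $html): string
{
    $result = preg_replace_callback(
        '/[^\x00-\x7F]+/u',            // runs of characters outside ASCII
        function (array $m): string {
            return htmlentities($m[0], ENT_NOQUOTES, 'UTF-8');
        },
        $html
    );

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8).
    return $result === null ? '' : $result;
}

echo encode_non_ascii("<p>© 2010 &amp; 5€</p>"), "\n"; // <p>&copy; 2010 &amp; 5&euro;</p>
```

The browser will still render the symbols, as described above; only the page source changes.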


You may need to use a regex for this, as sad as that is. Most PHP functions try to preserve those characters for you in one format or another. It's surprising that there isn't a function to remove them, at least not one that I know of!
