
I'm trying to build a robust system that lets users paste any mixture of BBCode and HTML into an input, which I then sanitize and strip exactly the way I want.

The content is copied from forums, and the issue is that they all seem to use different markup. Some display more than one <br>, some use a self-closing <br /> tag. Others use [URL=...], and others just use [URL]URL[/URL], etc.

So far, I use HTML Purifier to strip everything except img tags.

HTML Purifier doesn't (as far as I can see) remove BBCode. So, given a string like this:

[URL=http://awebsite.com]My Link [IMG]imagelink.png[/IMG][/URL]

How can I remove the URL tags and leave just the IMG tags?

I want to remove everything belonging to the URL tag: the URL given in the opening tag and the link text as well, which may prove difficult.

I have got quite far by converting [IMG] tags etc. with regular expressions, which works, but I feel there are too many variants to hardcode.

Any suggestions for a more efficient (or any workable) way to remove the URL tags?


2 Answers


Option 1

If you just want to remove tags such as [URL=http://awebsite.com] and [/URL], leaving the content inside, the regex is simple:

Search: \[/?URL[^\]]*\]

Replace: Empty string

In JavaScript

replaced = string.replace(/\[\/?URL[^\]]*\]/g, "");

In PHP

$replaced = preg_replace('%\[/?URL[^\]]*\]%', '', $str);
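If you need to strip several BBCode tag names rather than hardcoding one regex per variant, the Option 1 pattern can be generated from a list of names. Below is a minimal JavaScript sketch, not a complete BBCode parser: `removeBBTags` is a hypothetical helper name, it assumes tag names are plain alphanumeric (so they need no regex escaping), and it adds the `i` flag so lowercase tags such as [url] are caught too.

```javascript
// Hypothetical helper: removes opening and closing BBCode tags whose names are
// listed in tagNames (e.g. [URL=...], [URL], [/URL]) but keeps the text between them.
// Assumes tag names are alphanumeric, so they are inserted into the regex unescaped.
function removeBBTags(text, tagNames) {
  const names = tagNames.join("|");
  // \[\/?(?:NAMES)  -> opening or closing tag with one of the given names
  // (?=[\]=\s])     -> the name must end here, so "B" does not match "[BR]"
  // [^\]]*\]        -> any attributes (e.g. =http://...) up to the closing bracket
  const pattern = new RegExp(`\\[\\/?(?:${names})(?=[\\]=\\s])[^\\]]*\\]`, "gi");
  return text.replace(pattern, "");
}

const input = "[URL=http://awebsite.com]My Link [IMG]imagelink.png[/IMG][/URL]";
console.log(removeBBTags(input, ["URL"]));
// My Link [IMG]imagelink.png[/IMG]
```

Passing more names, e.g. `removeBBTags(text, ["URL", "B", "COLOR"])`, strips all of them in one pass instead of one regex per tag.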

Option 2: Also removing content such as My Link

Here, we also remove the text that follows [URL...], up to the next tag.

Search: \[URL[^\]]*\][^\[\]]*|\[/URL[^\]]*\]

Replace: Empty string

JavaScript:

replaced = string.replace(/\[URL[^\]]*\][^\[\]]*|\[\/URL[^\]]*\]/g, "");

PHP:

$replaced = preg_replace('%\[URL[^\]]*\][^\[\]]*|\[/URL[^\]]*\]%', '', $str);
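A comment below asks how to match lowercase [url] tags as well. Adding the `i` (case-insensitive) flag handles that: in PHP it goes after the closing delimiter (`'%...%i'`), in JavaScript next to `g`. A small sketch of Option 2 with that flag, wrapped in a hypothetical `removeUrlTagsAndText` function:

```javascript
// Option 2 pattern with the "i" flag so [URL], [url] and [Url] are all matched.
// First alternative: an opening [URL...] tag plus the text after it, up to the next "[" or "]".
// Second alternative: a closing [/URL...] tag.
const URL_WITH_TEXT = /\[URL[^\]]*\][^\[\]]*|\[\/URL[^\]]*\]/gi;

function removeUrlTagsAndText(text) {
  return text.replace(URL_WITH_TEXT, "");
}

console.log(removeUrlTagsAndText("[url=http://awebsite.com]My Link [img]imagelink.png[/img][/url]"));
// [img]imagelink.png[/img]
```

Note that only the text directly after the opening tag is removed: in `[URL=a][IMG]x[/IMG] more[/URL]`, the word ` more` comes after a nested tag and is kept.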

4 Comments

Amazing! Works perfectly! Didn't think it would be possible, but a bit of regex magic and it works. Really need to have a good learn of creating regex. Last question @zx81: is there a way to make the URL part of the regex case-insensitive? Sometimes people use lowercase url tags. I could use two different preg_replace calls, one for lowercase and one for uppercase, but that seems silly if there's a way to do it case-insensitively. Thanks!
"Really need to have a good learn of creating regex." Well, if you're starting to study more regex and are interested in collecting cool techniques, maybe you'd like to look at this question about a very common problem (matching... except), or save it for later. I had a lot of fun writing the answer. :)
I'll take a look! So much in my learning-list bookmarks right now :) Have upvoted too!
Thanks, see you next time. :)

A solution could be to extract only IMG tags using regex:

$pattern ="#\[IMG\](https?://[-\w\.]+(:\d+)?/[\w/_\.]*(\?\S+?)?)?\[\/IMG\]#";
$str = "[URL=http://awebsite.com]My Link [IMG]http://google.com/imagelink.png[/IMG][/URL]";
preg_match($pattern, $str, $matches);
print_r($matches);

Result:

Array
(
    [0] => [IMG]http://google.com/imagelink.png[/IMG]
    [1] => http://google.com/imagelink.png
)
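`preg_match` stops at the first match; `preg_match_all` collects every one. The same idea in JavaScript uses `String.prototype.matchAll` with the `g` flag. A minimal sketch, with two deliberate changes from the PHP pattern above: the `i` flag is added so lowercase [img] tags are found too, and the URL group is made mandatory so empty [IMG][/IMG] pairs are skipped. `extractImageUrls` is a hypothetical name.

```javascript
// Adapted from the PHP pattern above: capture http(s) URLs inside [IMG]...[/IMG].
// [-\w.]+      host name
// (?::\d+)?    optional port
// \/[\w\/.]*   path
// (?:\?\S+?)?  optional query string
const IMG_TAG = /\[IMG\](https?:\/\/[-\w.]+(?::\d+)?\/[\w\/.]*(?:\?\S+?)?)\[\/IMG\]/gi;

function extractImageUrls(text) {
  // matchAll yields every match; m[1] is the captured URL.
  return Array.from(text.matchAll(IMG_TAG), (m) => m[1]);
}

const post =
  "[URL=http://awebsite.com]My Link [IMG]http://google.com/imagelink.png[/IMG][/URL] " +
  "[img]https://example.com/a/b.jpg?w=1[/img]";
console.log(extractImageUrls(post)); // logs both image URLs
```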
