
I'm developing a PHP-based web application with a form whose textarea inputs can accept links via anchor tags. But when I tested it after adding a hyperlink as follows, the link pointed to a non-existent local subdirectory:
<a href="www.link.com">link</a>
I realized that this was because I had not prepended http:// to the link.

There might be cases where a user inputs a link just as I did above, and in such cases I don't want the link to behave that way. Is there a possible solution, such as automatically prepending http:// to the link when it is missing? How do I do that?
Edit: Please consider that the anchor tags appear amidst other plain text, which makes this harder to work with.

  • If you're only interested in links contained within A tags then this might actually make your life easier, at least as far as detection goes. You can use the DOMDocument extension (which has been part of PHP by default for a while) to grab the A tags and examine their attributes, including href. The normalisation process is still going to be problematic though. Commented Feb 17, 2011 at 8:10
  • @gordon Can you please explain that a bit further? That would be a great help. Thanks. :) Commented Feb 17, 2011 at 15:37
  • I've not made much use of the domdocument extension, but it would involve using uk3.php.net/manual/en/domdocument.getelementsbytagname.php to grab all the A tags. I'm afraid the rest is up to you. Commented Feb 18, 2011 at 7:20
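To make the commenter's suggestion concrete, here is a minimal sketch of using DOMDocument to grab the A tags from a user-submitted fragment and inspect their href attributes. The function name fix_anchor_hrefs is my own invention, not from the thread, and the LIBXML flags assume PHP 5.4+ with a reasonably recent libxml:

```php
<?php
// Sketch: find every <a> tag in a fragment of user-submitted markup and
// prepend http:// to any href that lacks a scheme. Helper name is mine.
function fix_anchor_hrefs(string $html): string
{
    $doc = new DOMDocument();
    // Suppress warnings from malformed user markup; the LIBXML flags keep
    // DOMDocument from wrapping the fragment in <html><body> and a doctype.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        // parse_url() reports no scheme for inputs like "www.link.com"
        if ($href !== '' && !parse_url($href, PHP_URL_SCHEME)) {
            $anchor->setAttribute('href', 'http://' . $href);
        }
    }
    return $doc->saveHTML();
}
```

Because DOMDocument only visits actual A elements, the surrounding plain text mentioned in the edit is left untouched.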

2 Answers


I'd go for something like this:

if (!parse_url($url, PHP_URL_SCHEME)) {
    $url = 'http://' . $url;
}

This is an easy and stable way to check for the presence of a scheme in a URL, and it allows other schemes (e.g. ftp, https) that users may enter.



The anchor tag will be amidst other text in the text area. Where and how shall I use the code?

What you're talking about involves two steps: URL detection and URL normalization. First you'll have to detect all the URLs in the string being parsed and store them in a data structure, such as an array, for further processing. Then you need to iterate over the array and normalize each URL in turn before attempting to store them.

Unfortunately, both detection and normalization can be problematic, as a URL has a quite complicated structure. http://www.regexguru.com/2008/11/detecting-urls-in-a-block-of-text/ makes some suggestions, but as the page itself says, no regex URL detection is ever perfect.

There are examples of regular expressions that can detect URLs available from various sites, but in my experience none of them are completely reliable.
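To illustrate the detection step, here is a deliberately simple pattern of my own (not from the linked page) that matches URL-like tokens in plain text. As noted above, it is not reliable: it will both miss valid URLs and over-match in edge cases:

```php
<?php
// Illustrative only: grab http(s) URLs and bare "www." hostnames from text.
// Stops at whitespace, quotes, and angle brackets; pattern is my own sketch.
function detect_urls(string $text): array
{
    $pattern = '~\b(?:https?://|www\.)[^\s<>"\']+~i';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}
```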

As for normalization, Wikipedia has an article on the subject which may be a good starting point. http://en.wikipedia.org/wiki/URL_normalization
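As a tiny taste of the normalization step, the sketch below applies just two of the rules from that article (lower-casing the scheme and host) by round-tripping the URL through parse_url(). Real normalization involves many more rules (percent-encoding, default ports, dot-segments), and the helper name is my own:

```php
<?php
// Sketch of two normalisation rules: lowercase the scheme and the host.
function normalize_url(string $url): string
{
    $p = parse_url($url);
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        return $url; // not parseable as an absolute URL; leave untouched
    }
    $out = strtolower($p['scheme']) . '://' . strtolower($p['host']);
    if (isset($p['port']))     { $out .= ':' . $p['port']; }
    if (isset($p['path']))     { $out .= $p['path']; }
    if (isset($p['query']))    { $out .= '?' . $p['query']; }
    if (isset($p['fragment'])) { $out .= '#' . $p['fragment']; }
    return $out;
}
```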


If none of the URL detection and normalization techniques are perfect in themselves, then what am I expected to do?
You have two choices: either be a lot stricter about what you'll recognise as a valid URL (which reduces usability), or do what you're doing and attempt to munge invalid URLs into something that works (which increases the risk of invalid data in your system). Neither approach is ideal, but I would tend to favour the former, as it poses less of a risk regarding the data that might end up in your system.
