MySQL query - replace any and all URL's with specific URL in table

Question

I have a table with a bunch of text and URL's. I need to have any string (URL) that has the format http://anything.anything.anything and replace it with http://www.google.com so any and all URL's get replaced with the latter.

I have tried online but I could not find anything. This should be a mysql command that I can run at certain times to replace any and all URL's.

You are looking for something that is part of an update statement? or something that is part of a select statement? — user289086
– user289086, Commented Dec 13, 2013 at 0:36

Community · Accepted Answer · 2017-05-23 11:49:45Z

2

You want to do a Regular Expression replace, which is not a MySQL native function although there have been various questions about this before.

How to do a regular expression replace in MySQL?

If you're not limited to using just MySQL, a relatively clean approach would be to write a bash script that pulls a list of all text fields with URL's in your table along with sufficient key information, do the RegEx in bash (perhaps via Perl or some other script language, if necessary), and then update back to MySQL.

edited May 23, 2017 at 11:49

CommunityBot

11 silver badge

answered Dec 13, 2013 at 0:31

DougM

2,88819 silver badges15 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

user289086 · Accepted Answer · 2013-12-13 01:59:32Z

There are things that you can do. Realize that this is far from optimal - I can heard the DBAs out there screaming at the prospect of doing an rlike within a join condition.

All of this is in sqlfiddle: http://sqlfiddle.com/#!2/a3ea9/1/0

Consider the tables and data of:

create table data (
      url varchar(255)
  );

create table repls (
    regex varchar(255),
    repl varchar(255)
  );

insert into data values
   ("http://example.com"),
   ("http://google.com"),
   ("http://stackoverflow.com"),
   ("http://space.stackexchange.com");

insert into repls values
  ('.*ex.*', 'http://www.yahoo.com'),
  ('.*google.*', 'http://www.bing.com'),
  ('.*stack.*', 'http://programmers.stackexchange.com');

You've got a table of urls, and a table of regexes and the replacements.

Then, with a 'simple' query in a select statement...

select
  D.url,
  R.repl
from
  data D
  join repls R on (
      D.url rlike R.regex
    )

We can get all of the urls and what they are replaced with. However, there's a gotcha that I embedded in there.

With this data and query it returns 5 rows:

url                                repl
http://www.example.com             http://www.yahoo.com
http://google.com                  http://www.bing.com
http://stackoverflow.com           http://programmers.stackexchange.com
http://space.stackexchange.com     http://www.yahoo.com
http://space.stackexchange.com     http://programmers.stackexchange.com

The first thing to note here is that this is a single replacement. It doesn't do a replacement until nothing matches anymore. This is a good thing in this case because we'd have an infinite loop with http://programmers.stackexchange.com triggering on the .*stack.* rule (it would also trigger on the .*ex.* rule too, but that's not an infinite loop because the replacement there doesn't contain ex).

Second thing of note is that something that matches two rows will be in there twice. You would need additional logic to handle precedence of rules.

Lastly, this is a wholesale replacement. There are no fancy pattern matching groups so that the additional path information on some site gets moved into the replaced string's text.

Its probably possible to do all these things, but it gets quite a bit more involved.

For doing this work, it is probably best to pull the data down and process it in a language that can handle the changes.

On the other hand, if you wanted instead to do a wholesale replacement of the data with an update statement (rather than pulling it down and reissuing the associated update statements), or if the system the data is moving to is more cumbersome and error prone than the above example, this might be an approach to investigate.

Collectives™ on Stack Overflow

MySQL query - replace any and all URL's with specific URL in table

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related