Update substrings using lookup table and replace function

Question

Here's my setup:

Table 1 (table_with_info): Contains a list of varchars with substrings that I'd like to replace.

Table 2 (sub_info): Contains two columns: the substring in table_with_info that I'd like to replace and the string I'd like to replace it with.

What I'd like to do is replace all the substrings in table_with_info with their substitutions in sub_info.

This works up to a point, but the issue is that select replace(...) returns one row per substitution, each with only that one substring replaced, instead of applying all the substitutions to a single row.

I'm explaining this as best I can, but I don't know if it's clear. Here's the code and an example of what's happening and what I'd like to happen.

Here's my code:

create table table_with_info
(
    val varchar
);

insert into table_with_info values
  ('this this is test data');

create table sub_info
(
    word_from varchar,
    word_to varchar
);

insert into sub_info values
  ('this','replace1')
, ('test', 'replace2');

update table_with_info set val = (select replace("val", "word_from", "word_to")
from "table_with_info", "sub_info");

The update statement doesn't work because the select returns two rows:

Row 1: replace1 replace1 is test data
Row 2: this this is replace2 data

So what I'd like the select statement to return is:

Row 1: replace1 replace1 is test data

Any thoughts? I can't create UDFs on the system I'm running.

You are referencing the columns with " around them, are you aware of that? Isn't that getting them treated as strings rather than fields?
– Noam Rathaus, Commented Nov 22, 2013 at 19:10
Are you sure you don't want the select statement to return replace1 replace1 is replace2 data?
– Jonathan Ruffin, Commented Nov 22, 2013 at 19:51
@nrathaus: Standard SQL and PostgreSQL use double quotes around identifiers that contain spaces, are case sensitive, ... MySQL uses back-ticks for this purpose, various MS products use brackets. String literals use single quotes.
– mu is too short, Commented Nov 22, 2013 at 21:38

mu is too short · Accepted Answer · 2013-11-22 21:42:18Z

Your UPDATE statement is incorrect in multiple ways. Consult the manual before you try to run anything like this again. You introduce two cross joins that would make this statement extremely expensive, besides yielding nonsense.

To do this properly, you need to apply each UPDATE sequentially. Within a single statement, every replace() would start from the same original row version, and only one of the resulting row versions would survive. You can use a DO statement for this, or wrap it in a plpgsql function, for instance:

DO
$do$
DECLARE
   r sub_info;
BEGIN

FOR r IN
   TABLE sub_info
   -- SELECT * FROM sub_info ORDER BY ??? -- order is relevant
LOOP
   UPDATE table_with_info
   SET    val = replace(val, r.word_from, r.word_to) 
   WHERE  val LIKE ('%' || r.word_from || '%'); -- avoid empty updates
END LOOP;

END
$do$;
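
For contrast, here is a minimal sketch of what a single joined UPDATE does with the tables from the question (the aliases t and s are only for illustration). PostgreSQL uses just one of the matching sub_info rows per target row, and which one is not predictable, so at most one substitution is applied:

```sql
-- Single statement: each target row is updated at most once,
-- so only one of the matching substitutions takes effect.
UPDATE table_with_info t
SET    val = replace(t.val, s.word_from, s.word_to)
FROM   sub_info s
WHERE  t.val LIKE ('%' || s.word_from || '%');

-- Inspect the result: either 'this' or 'test' has been replaced, not both.
SELECT val FROM table_with_info;
```

The loop above sidesteps this by running one UPDATE per sub_info row, so each replacement sees the result of the previous one.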

Be aware that the order in which the updates are applied can make a difference, for example if the first update creates a string that the second one matches (but the original did not).
So, order the rows from sub_info if that can be relevant.
Avoid empty updates. Without the additional WHERE clause, you would write many new row versions without changing anything. Expensive and useless.
Double quotes are optional for legal, lower-case names.
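
A small sketch of the ordering issue, using two made-up substitutions that are not in the question's data ('a' to 'b' and 'b' to 'c'). The nested replace() calls stand in for two sequential updates:

```sql
-- The same two substitutions applied in both possible orders.
SELECT replace(replace('a', 'a', 'b'), 'b', 'c') AS a_first  -- 'a' -> 'b' -> 'c'
     , replace(replace('a', 'b', 'c'), 'a', 'b') AS b_first; -- no 'b' yet, then 'a' -> 'b'
-- a_first = 'c', b_first = 'b'
```

With rows like these in sub_info, the loop needs a deterministic ORDER BY to give repeatable results.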

Denis de Bernardy · Accepted Answer · 2013-11-22 20:55:47Z

Expanding on Erwin's answer, a do block with dynamic SQL can do the trick as well:

do $$
declare
  rec record;
  repl text;
begin
  repl := 'val'; -- quote_ident() this if needed
  for rec in select word_from, word_to from sub_info
  loop
    repl := 'replace(' || repl || ', '
                       || quote_literal(rec.word_from) || ', '
                       || quote_literal(rec.word_to) || ')';
  end loop;
  -- now do them all in a single query
  execute 'update ' || 'table_with_info'::regclass || ' set val = ' || repl;
end;
$$ language plpgsql;

Optionally, build a LIKE condition for a WHERE clause in a similar way to avoid updating rows needlessly.
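
For the sample data in the question, the statement built by the loop would look roughly like the sketch below, assuming the loop happens to visit 'this' before 'test' (the query has no ORDER BY, so that order is not guaranteed). The WHERE clause is not produced by the block above; it is a hand-written illustration of the optional LIKE filter:

```sql
-- Roughly what the dynamic SQL expands to for the two sample rows in sub_info.
UPDATE table_with_info
SET    val = replace(replace(val, 'this', 'replace1'), 'test', 'replace2')
-- Hypothetical filter: skip rows that contain none of the search strings.
WHERE  val LIKE '%this%'
   OR  val LIKE '%test%';

-- The sample row becomes: replace1 replace1 is replace2 data
SELECT val FROM table_with_info;
```

Note that any % or _ characters in word_from would act as wildcards inside a LIKE pattern.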

edited Nov 22, 2013 at 20:55

answered Nov 22, 2013 at 20:14

Denis de Bernardy

5 Comments

Erwin Brandstetter Over a year ago

I think I see where this is going. Instead of writing multiple row versions, you take the value and run all replacements inside the plpgsql, thereby saving some intermediate row versions. Just that your current version doesn't work. See fiddle: sqlfiddle.com/#!15/346cd/5. And the current design would only work for a single row.

Erwin Brandstetter Over a year ago

If I went all-out, I would write a function to create-or-replace a multi-replace-function (to administer all replacements sequentially), and then call that function in a single UPDATE. Or persist the multi-replace-function and rewrite it with a trigger on the sub_info table, if changes on the sub_info table are rarer than function calls.

Denis de Bernardy Over a year ago

Works now (typos fixed). And yeah, that's yet another approach. :-)

Denis de Bernardy Over a year ago

One thing that neither of our approaches solves, though, is if a rec.word_from is in rec.word_to — the results can then be inconsistent depending on the order in which the replacements occur. @ErwinBrandstetter

Erwin Brandstetter Over a year ago

+1 for the working version. As for the order: I did address that in depth in my answer ...
