2

Here's my setup:

Table 1 (table_with_info): Contains a list of varchars with substrings that I'd like to replace.

Table 2 (sub_info): Contains two columns: the substring in table_with_info that I'd like to replace and the string I'd like to replace it with.

What I'd like to do is replace all the substrings in table_with_info with their substitutions in sub_info.

This works to a point but the issue is that select replace(...) returns a new row for each one of the substituted words replaced and doesn't replace all of the ones in an individual row.

I'm explaining the best I can but I don't know if it's too clear. Here's the code an example of what's happening/what I'd like to happen.

Here's my code:

create table table_with_info
(
    val varchar
);

insert into table_with_info values
  ('this this is test data');

create table sub_info
(
    word_from varchar,
    word_to varchar
);

insert into sub_info values
  ('this','replace1')
, ('test', 'replace2');

update table_with_info set val = (select replace("val", "word_from", "word_to")
from "table_with_info", "sub_info"

the update() function doesn't work as select() returns two rows:

Row 1: replace1 replace1 is test data
Row 2: this this is replace2 data

so what I'd like for it for the select statement to return is:

Row 1: replace1 replace1 is test data

Any thoughts? I can't create UDFs on the system I'm running.

3
  • 1
    You are referencing the columns with " around them, are you aware of that? isn't that getting them treated as strings rather than fields? Commented Nov 22, 2013 at 19:10
  • Are you sure you don't want the select statement to return replace1 replace1 is replace2 data? Commented Nov 22, 2013 at 19:51
  • 3
    @nrathaus: Standard SQL and PostgreSQL use double quotes around identifiers that contain spaces, are case sensitive, ... MySQL uses back-ticks for this purpose, various MS products use brackets. Strings literals use single quotes. Commented Nov 22, 2013 at 21:38

2 Answers 2

4

Your UPDATE statement is incorrect in multiple ways. Consult the manual before you try to run anything like this again. You introduce two cross joins that would make this statement extremely expensive, besides yielding nonsense.

To do this properly, you need to administer each UPDATE sequentially. In a single statement, one row version eliminates the other, while each replace would use the same original row version. You can use a DO statement for this or wrap it in a plpgsql function for instance:

DO
$do$
DECLARE
   r sub_info;
BEGIN

FOR r IN
   TABLE sub_info
   -- SELECT * FROM sub_info ORDER BY ??? -- order is relevant
LOOP
   UPDATE table_with_info
   SET    val = replace(val, r.word_from, r.word_to) 
   WHERE  val LIKE ('%' || r.word_from || '%'); -- avoid empty updates
END LOOP;

END
$do$;
  • Be aware, that the order in which updates are applied can make a difference! If the first update creates a string where the second matches (but not otherwise) ..
    So, order your columns in sub_info if that can be relevant.

  • Avoid empty updates. Without the additional WHERE clause, you would write many new row versions without changing anything. Expensive and useless.

  • double-quotes are optional for legal, lower-case names.

->SQLfiddle

Sign up to request clarification or add additional context in comments.

Comments

2

Expanding on Erwin's answer, a do block with dynamic SQL can do the trick as well:

do $$
declare
  rec record;
  repl text;
begin
  repl := 'val'; -- quote_ident() this if needed
  for rec in select word_from, word_to from sub_info
  loop
    repl := 'replace(' || repl || ', '
                       || quote_literal(rec.word_from) || ', '
                       || quote_literal(rec.word_to) || ')';
  end loop;
  -- now do them all in a single query
  execute 'update ' || 'table_with_info'::regclass || ' set val = ' || repl;
end;
$$ language plpgsql;

Optionally, build a like parameter in a similar way to avoid updating rows needlessly.

5 Comments

I think I see where this is going. Instead of writing multiple row versions, you take the value and run all replacements inside the plpgsql, thereby saving some intermediate row versions. Just that your current version doesn't work. See fiddle: sqlfiddle.com/#!15/346cd/5. And the current design would only work for a single row.
If I went all-out, I would write a function to create-or-replace a multi-replace-function (to administer all replacements sequentially), and then call that function in a single UPDATE. Or persist the multi-replace-function and rewrite it with a trigger on the sub_info table, if changes on the sub_info table are rarer than function calls.
Works now (typos fixed). And yeah, that's yet another approach. :-)
One thing that neither of our approaches solves, though, is if a rec.word_from is in rec.word_to — the results can then be inconsistent depending on the order in which the replacements occur. @ErwinBrandstetter
+1 for the working version. As for the order: I did address that in depth in my answer ...

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.