
I need to update column 'sp' (varchar) using column 'a_sp' (varchar) in the same table. Each column holds anything from nothing (NULL or empty) to a list of several values. The values are joined with the '+' sign, and the stored string also starts and ends with '+'. I want to loop through each value in 'a_sp' and check whether it already exists in the 'sp' cell of the same row: if it does, nothing is appended to 'sp'; if not, the value is appended. I would appreciate your help, as I am just adjusting to database programming after working in OOP.
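
A minimal Python sketch of the per-value behavior described above, for checking expected results outside the database. It is not SQL; the helper name append_missing is hypothetical, and None stands in for NULL.

```python
def append_missing(sp, a_sp):
    """Append each '+'-delimited value of a_sp to sp unless sp already has it."""
    # Split on '+' and drop the empty pieces produced by the leading/trailing '+'.
    current = [v for v in (sp or "").split("+") if v]
    for value in (v for v in (a_sp or "").split("+") if v):
        if value not in current:      # only append values sp doesn't hold yet
            current.append(value)
    # Re-wrap in '+'; if there is nothing at all, return sp unchanged.
    return "+" + "+".join(current) + "+" if current else sp
```

This keeps the existing order of sp and appends new values at the end, so append_missing('+hk2+', '+hk1+') gives '+hk2+hk1+'.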

UPDATE work.table
SET sp = 
    CASE
    WHEN (sp IS NULL OR sp = '') THEN a_sp --no sp val return a_sp
    WHEN (sp IS NOT NULL OR sp != '') THEN --if there exists sp val
        CASE
        WHEN sp LIKE '%'||a_sp||'%' THEN sp --if the existing sp value already contains a_sp, return sp
        ELSE sp||ltrim(a_sp, '+') --if the existing sp value does not contain a_sp, append it
        END
    ELSE sp --if both cases above not met (which is odd) return sp
    END 
    WHERE a_sp IS NOT NULL;
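
To see why this query can still produce duplicates, here is a Python model of its CASE expression. The helper name first_attempt is hypothetical; it assumes a_sp is not NULL (as the WHERE clause guarantees) and that the values contain no LIKE wildcards (% or _), so LIKE reduces to a plain substring test.

```python
def first_attempt(sp, a_sp):
    """Python model of the CASE expression in the first UPDATE."""
    if not sp:                       # sp IS NULL OR sp = ''
        return a_sp
    if a_sp in sp:                   # sp LIKE '%' || a_sp || '%': the WHOLE a_sp must occur in sp
        return sp
    return sp + a_sp.lstrip("+")     # sp || ltrim(a_sp, '+')


print(first_attempt("+hk1+", "+rk3+hk1+xk5+"))  # +hk1+rk3+hk1+xk5+
```

Because the whole a_sp string is compared at once, a single shared value such as hk1 does not stop the append, so it ends up in sp twice.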

My solution (although tacky):

--I found the function below, which removes duplicate elements from an array of any type, at https://postgres.cz/wiki/Array_based_functions
CREATE OR REPLACE FUNCTION work.array_distinct(anyarray)
RETURNS anyarray AS $$
  SELECT ARRAY(SELECT DISTINCT unnest($1))
$$ LANGUAGE sql;

--for sp col
UPDATE work.tab
  SET sp = 
    CASE
    WHEN (sp IS NULL OR sp = '') THEN a_sp --no sp val return a_sp
    WHEN (sp IS NOT NULL OR sp != '') THEN
    --below I did quite a few manipulations:
    --I removed the '+' from the left of the a_sp column
    --then concatenated the sp and a_sp columns
    --I replaced '+' with spaces and split the concatenated result into an array on whitespace
    --next, I used the function 'array_distinct' above to return unique values (this avoids looping through the lists to append)
    --I converted the result back to a string (joined with ',', then swapped ',' for '+')
    --finally, I added '+' to both ends and collapsed any '++' into '+'
    replace(replace('+'||rtrim(array_to_string(work.array_distinct(regexp_split_to_array(replace(sp||ltrim(a_sp, '+'), '+', ' '), E'\\s+')::varchar[]), ','), ',')||'+',',', '+'), '++', '+') --if there exists sp val
    --ELSE sp --if both cases above not met (which is odd) return sp
    END
    WHERE a_sp IS NOT NULL;
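
The nested expression is easier to follow one step at a time. Below is a Python model of it (not the SQL itself; tacky_merge is a hypothetical name). It assumes sp is non-empty, as in this CASE branch, and uses sorted() in place of DISTINCT only to make the result deterministic, since DISTINCT guarantees no order.

```python
import re


def tacky_merge(sp, a_sp):
    """Step-by-step Python model of the nested replace(...) expression above."""
    joined = sp + a_sp.lstrip("+")           # sp || ltrim(a_sp, '+')
    spaced = joined.replace("+", " ")        # replace(..., '+', ' ')
    parts = re.split(r"\s+", spaced)         # regexp_split_to_array(..., '\s+'): '' at both ends
    # array_distinct(); sorted() only makes the model deterministic,
    # because SQL DISTINCT does not guarantee any order
    distinct = sorted(set(parts))
    csv = ",".join(distinct).rstrip(",")     # array_to_string(..., ','), then rtrim(..., ',')
    wrapped = "+" + csv + "+"                # '+' || ... || '+'
    # ',' -> '+', then collapse the '++' left behind by the empty element
    return wrapped.replace(",", "+").replace("++", "+")
```

Wherever the single empty element lands after DISTINCT (first, middle or last), either the rtrim or the final replace('++', '+') removes it, which is why the expression works despite the unordered result.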

1 Answer


I assume your code works and you're just looking for a better way to do it. Doing it in a single UPDATE statement, as you're doing, rather than in a loop is definitely the way to go: a set-based update will be far faster than looping row by row.

Your code can be written a little more cleanly, with fewer redundant checks and without updating rows where nothing needs to change:

UPDATE work.table
SET sp = CASE WHEN NULLIF(sp, '') IS NULL THEN a_sp ELSE sp || ltrim(a_sp, '+') END
WHERE a_sp IS NOT NULL
AND COALESCE(sp, '') !~ a_sp; -- don't perform unnecessary updates

Your CASE expression can be simplified: if sp is NULL or empty, set it to a_sp; otherwise, append a_sp to it. The extra condition in the WHERE clause filters out rows that shouldn't be updated (rows where the a_sp pattern already matches inside sp).
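
A tiny Python model of why NULLIF(sp, '') IS NULL can replace the two-part test sp IS NULL OR sp = '' (the helpers nullif and sp_is_blank are hypothetical, and None stands in for NULL):

```python
def nullif(value, sentinel):
    """Python model of SQL NULLIF(value, sentinel): NULL if they are equal, else value."""
    # In SQL, NULL = '' is not true, so NULLIF(NULL, '') returns its first argument, NULL.
    if value is None:
        return None
    return None if value == sentinel else value


def sp_is_blank(sp):
    """NULLIF(sp, '') IS NULL: true for both NULL and the empty string."""
    return nullif(sp, "") is None
```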


Edit:

sp LIKE '%'||a_sp||'%' won't work as you expect: it wraps the whole a_sp field in wildcards, so it only checks whether sp contains the entire a_sp value, and it won't find cases where a_sp contains sp as a substring. A regex would be an easy way to check, but you could also use the POSITION function, which is like indexOf in other languages:

WHERE ...
AND (NULLIF(sp, '') IS NULL OR POSITION(sp IN a_sp) = 0) -- i.e. a_sp does not contain sp

I added a NULLIF to the check because POSITION('' IN 'x') is 1 and POSITION(NULL IN a_sp) is NULL, and we don't want to exclude rows where sp is NULL or empty.
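
A Python model of these semantics, using hypothetical helpers position and should_update. str.find is 0-based, so 1 is added to match POSITION's 1-based result, and None stands in for NULL.

```python
def position(substring, string):
    """Python model of SQL POSITION(substring IN string): 1-based, 0 if absent, NULL if either is NULL."""
    if substring is None or string is None:
        return None
    return string.find(substring) + 1      # "x".find("") is 0, so POSITION('' IN 'x') is 1


def should_update(sp, a_sp):
    """The WHERE condition: NULLIF(sp, '') IS NULL OR POSITION(sp IN a_sp) = 0."""
    if sp is None or sp == "":
        return True
    return position(sp, a_sp) == 0         # None == 0 is False, like NULL = 0 failing the filter
```

With sp = '+hk1+' and a_sp = '+rk3+hk1+xk5+', should_update returns False, so the row is skipped even though rk3 and xk5 are not yet in sp.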


Edit 2:

I understand now what you're trying to do. I thought you were just checking whether the value of sp was in a_sp: if so, ignore it; if not, concatenate the fields. But each field actually contains delimited data, so each field's data needs to be split into parts, and then the distinct parts from both fields need to be recombined and written to the sp field. That is effectively what your solution does. Below I propose a slightly different way of doing it.

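I also see why the regex didn't work: the values contain '+', which is a regex quantifier, so a pattern that starts with '+' has nothing to quantify (hence "quantifier operand invalid"). In any case, given what I now understand about what you're trying to do, it's no longer a valid way to check.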
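
The same failure can be reproduced with Python's re module, used here only as a stand-in: its engine differs from PostgreSQL's, but it also rejects a pattern that starts with '+'. The helper names are hypothetical. Escaping the value turns each '+' into a literal character:

```python
import re


def is_valid_regex(pattern):
    """True if the pattern compiles; '+hk1+' fails because the first '+' has nothing to repeat."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def contains_literal(haystack, needle):
    """Plain substring search: re.escape() makes every '+' in needle literal."""
    return re.search(re.escape(needle), haystack) is not None
```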


Setup

CREATE TABLE t (id INTEGER, sp TEXT, a_sp TEXT);

INSERT INTO t
VALUES
(1, '+hk1+', '+rk3+hk1+xk5+'),
(2, '+hk1+hk2+', '+jk8+hk1+'),
(3, NULL, '+hk1+dk7+'),
(4, '', '+hk1+dk7+'),
(5, '+hk3+', NULL),
(6, '+hk2+', '+hk1+');

Query

UPDATE t
SET sp = (
  SELECT '+' || STRING_AGG(DISTINCT u, '+' ORDER BY u) || '+'
  FROM UNNEST(string_to_array(sp, '+', '') || string_to_array(a_sp, '+', '')) u
  WHERE u IS NOT NULL
)
WHERE a_sp IS NOT NULL;

Results (of a SELECT * FROM t after the update has run)

| id  | sp            | a_sp          |
| --- | ------------- | ------------- |
| 1   | +hk1+rk3+xk5+ | +rk3+hk1+xk5+ |
| 2   | +hk1+hk2+jk8+ | +jk8+hk1+     |
| 3   | +dk7+hk1+     | +hk1+dk7+     |
| 4   | +dk7+hk1+     | +hk1+dk7+     |
| 5   | +hk3+         |               |
| 6   | +hk1+hk2+     | +hk1+         |


The subquery:

  1. Operates on the row being updated
  2. Splits both fields by the + delimiter. You could also use regexp_split_to_array but I decided to go with string_to_array as I could set '' to NULL (3rd parameter).
  3. Concatenates the arrays. Now you have one array with each value from both fields, containing some duplicate values.
  4. Unnests this new combined array and aggregates the distinct values, excluding NULL values. Also orders the values for good measure. This creates a new string where the values are delimited by '+'.
  5. Adds leading and trailing '+' to the new string

And the result is written to sp.
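
The five steps above, modelled in Python for checking results outside the database. This is not part of the SQL; merge_fields is a hypothetical name, None stands in for NULL, and sorted() assumes the values sort the same way as under the database collation, which holds for simple lowercase codes like these.

```python
def merge_fields(sp, a_sp):
    """Python model of the UPDATE's subquery: split, concatenate, de-duplicate, sort, re-wrap."""
    parts = []
    for field in (sp, a_sp):
        if field is None:                  # string_to_array(NULL, ...) is NULL and adds nothing
            continue
        # string_to_array(field, '+', '') turns empty pieces into NULL,
        # and WHERE u IS NOT NULL drops them; skipping '' does the same here
        parts.extend(p for p in field.split("+") if p != "")
    if not parts:
        return None                        # STRING_AGG over zero rows is NULL, so sp becomes NULL
    return "+" + "+".join(sorted(set(parts))) + "+"


# Sample rows from the Setup (row 5 is left alone by WHERE a_sp IS NOT NULL)
rows = [(1, "+hk1+", "+rk3+hk1+xk5+"), (2, "+hk1+hk2+", "+jk8+hk1+"),
        (3, None, "+hk1+dk7+"), (4, "", "+hk1+dk7+"), (6, "+hk2+", "+hk1+")]
for row_id, sp, a_sp in rows:
    print(row_id, merge_fields(sp, a_sp))
```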


5 Comments

Thank you for the response. I tried your code and got 'ERROR: invalid regular expression: quantifier operand invalid'. I think it is the !~ operator. My initial code works to some extent, but I am trying to avoid concatenating values from a_sp that already exist in sp. I would appreciate your response. Thanks
@bobbynaira That regex syntax should be correct, but I've edited the answer with another way of performing that check which may work better.
I adjusted the query as you said and now it runs without an error. However, values that already exist in column 'sp' are still being appended from column 'a_sp', hence duplicates in the final result of column 'sp'.
@bobbynaira Right I understand now what you're trying to accomplish, I've updated the answer again.
Many thanks. It works perfectly now and returns the expected result. Thank you so much.
