2

I need to update a table in my database. For the sake of simplicity, let's assume the table's name is tab and it has 2 columns: id (PRIMARY KEY, NOT NULL) and col (UNIQUE VARCHAR(300)). I need to update the table this way:

id                    col
----------------------------------------------------
1                     'One two three'
2                     'One twothree'
3                     'One two       three'
4                     'Remove white spaces'
5                     'Something'
6                     'Remove whitespaces '

to:

id                    col
----------------------------------------------------
1                     'Onetwothree'
2                     'Removewhitespaces'
3                     'Something'

The id numbers and the order of the rows after the update are not important and can be different. I use PostgreSQL. Some of the columns are FOREIGN KEYs, which is why dropping the UNIQUE constraint from col would be troublesome.

3
  • Actually, you want to remove the duplicates that would be created after removing whitespaces, right? Commented Jul 18, 2013 at 15:50
  • How did you get foreign key on non primary key column? Commented Jul 18, 2013 at 16:15
  • @DavidLevel: It's defined UNIQUE, that is sufficient for a FK constraint referencing col. For an FK constraint pointing the other way, you don't need either. Unfortunately, the Q is unclear about the direction of the FK constraint. Commented Jul 18, 2013 at 17:58

3 Answers

2

I think just using replace in this format will do what you want.

update tab
set col = replace(col, ' ', '');

Here's a SQLFiddle for it.
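
Because col is UNIQUE, the UPDATE above fails as soon as two rows collapse to the same value. As a rough sketch (using the tab, id and col names from the question, and assuming only plain spaces need to be removed), a query like this lists the groups that would collide, so you can see what has to be merged first:

```sql
-- Sketch: find values that become identical once spaces are removed.
-- Each returned row is a group the plain UPDATE would reject.
SELECT replace(col, ' ', '')    AS stripped,
       count(*)                 AS n,
       array_agg(id ORDER BY id) AS ids
FROM   tab
GROUP  BY replace(col, ' ', '')
HAVING count(*) > 1;
```

If this returns no rows, the single UPDATE should be safe with respect to the UNIQUE constraint.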


5 Comments

The replace function is not global; it will just remove the first whitespace, won't it? Good link, though, I didn't know it.
The Postgres documentation says: Replace all occurrences in string of substring from with substring to .. postgresql.org/docs/9.1/static/functions-string.html
Yeah, it worked, but if it's something other than whitespace he will have a lot of lines to write, won't he? I'm telling you this because I already tried it in a script before.
SELECT regexp_replace(myfield, '[ \t\n\r]*', '', 'g') FROM mytable; This will work, I guess. I deleted my answer because I forgot about the unique constraint on col.
David - that's exactly what I tried to do. But you forget that col is UNIQUE, so it would NOT work: after SELECT regexp_replace(myfield, '[ \t\n\r]*', '', 'g') you get a UNIQUE constraint error.
1

You shouldn't be using the non-descriptive column name id, even if some half-wit ORMs are in the habit of doing that. I use tab_id instead for this demo.

I interpret your description this way: You have other tables with FK columns pointing to tab.col. Like table child1 in my example below.

To clean up the mess, do all of this in a single session to preserve the temporary table I use. Better yet, do it all in a single transaction.

  1. Update all referencing tables so that all referencing rows point to the "first" row (unambiguously, however you define that) in each set of going-to-be duplicates in tab.

    Create a translation table up to be used for all updates:

    CREATE TEMP TABLE up AS
    WITH t AS (
        SELECT tab_id, col, replace(col, ' ', '') AS col1
             ,row_number() OVER (PARTITION BY replace(col, ' ', '')
                                 ORDER BY  tab_id) AS rn
        FROM   tab
        )
    SELECT b.col AS old_col, a.col AS new_col
    FROM  (SELECT * FROM t WHERE rn = 1) a
    JOIN  (SELECT * FROM t WHERE rn > 1) b USING (col1);
    

    Then update all your referencing tables.

    UPDATE child1 c
    SET    col = up.new_col
    FROM   up
    WHERE  c.col = up.old_col;
    
    --  more tables?   
    

    -> SQLfiddle

    Now, all references point to the "first" in a group of dupes, and you have got your license to kill the rest.

  2. Remove duplicate rows except the first from tab.

    DELETE FROM tab t
    USING  up
    WHERE  t.col = up.old_col;
    
  3. Be sure that all referencing FK constraints have the ON UPDATE CASCADE clause.

    ALTER TABLE child1 DROP CONSTRAINT child1_col_fkey;
    
    ALTER TABLE child1  ADD CONSTRAINT child1_col_fkey FOREIGN KEY (col)
    REFERENCES tab (col)
    ON UPDATE CASCADE;
    
    -- more tables?
    
  4. Sanitize your values by removing white space:

    UPDATE tab
    SET    col = replace(col, ' ', '');
    

    This only takes care of good old space characters (ASCII value 32, Unicode U+0020). Do you have others?
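
If other whitespace characters such as tabs or newlines are involved, step 4 could use a regular expression instead. This is only a sketch: it assumes standard_conforming_strings is on (the default since PostgreSQL 9.1), and whether \s also matches non-ASCII spaces in your encoding and locale is something to verify separately.

```sql
-- Sketch: strip every character the regex class \s matches, not only U+0020.
-- Assumes standard_conforming_strings = on, so the backslash reaches
-- the regex engine unchanged.
UPDATE tab
SET    col = regexp_replace(col, '\s+', '', 'g')  -- 'g' replaces all matches
WHERE  col ~ '\s';                                 -- skip rows that are already clean
```

If you go this way, use the same expression in place of replace(col, ' ', '') when building the translation table in step 1, so that the duplicates found there match the values produced here.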

All FK constraints should be pointing to tab.tab_id to begin with. Your tables would be smaller and faster and all of this would be easier.
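
For illustration, a layout along those lines might look like the following. The column types and the child1_id column are assumptions made for this sketch, not taken from the question.

```sql
-- Hypothetical schema: child1 references the surrogate key, not col.
CREATE TABLE tab (
    tab_id serial PRIMARY KEY,
    col    varchar(300) UNIQUE
);

CREATE TABLE child1 (
    child1_id serial PRIMARY KEY,
    tab_id    integer NOT NULL REFERENCES tab (tab_id)
);
```

With this design, changing tab.col never touches a referencing row, so step 3 is not needed; merging duplicates only means repointing child1.tab_id before deleting the surplus rows.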

2 Comments

The scenario may not match the OP's but I find it important that you've stated it clearly and it makes sense (to me anyway), as does the approach to solving the issue. On a secondary note in your answer, why do you say that naming a column simply as id is something that the OP shouldn't do rather than something that you yourself wouldn't do? I mean, isn't that a matter of personal preference (or of corporate policy at most)? To me, for instance, customer.id would be just as clear as customer.customer_id. The latter may be somewhat beneficial down the road, but so may the former.
@AndriyM: Well, to some degree it's always a matter of opinion, of course. But using non-descriptive names like id is almost always bad style. When you join a couple of tables in a query - which is what you do a lot in relational databases - you end up with multiple columns all named id. That's not helpful and leads to mistakes. I have dealt with a few of those cases here on SO.
0

I solved it much more easily than Erwin. I don't have SQL on my computer to test it, but something like this worked for me:

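DELETE FROM tab WHERE id IN (
    SELECT id FROM (
        -- number the rows within each group that collapses to the same value
        SELECT id, row_number() OVER (PARTITION BY regexp_replace(col, '[ \t\n]+', '', 'g')
                                      ORDER BY id) AS c
        FROM tab
    ) AS numbered
    WHERE c > 1  -- keep the lowest id of each group, delete the rest
);

UPDATE tab SET col = regexp_replace(col, '[ \t\n]+', '', 'g');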

1 Comment

How could it work if you were unable even to test it? What exactly did you mean by worked for me?
