1

I have a table t with columns id (primary key), a, b, c and d. Assume that columns id, a, b and c are already populated. I want to set column d = md5(concat(b,c)). The issue is that the table contains millions of records, while there are only a few thousand unique combinations of b and c. I want to avoid the time spent computing the md5 of the same values over and over. Is there a way to update multiple rows of this table with the same value without computing the md5 again, something like this:

update t set d=md5(concat(b,c)) group by b,c;

This does not work, because GROUP BY cannot be used with an UPDATE statement.

3
  • Please provide the table DDL and what a, b and c contain. Is a a unique/primary key? Commented Jun 5, 2015 at 9:39
  • Assume another primary key column is also present. a,b,c are all INT(11). Commented Jun 5, 2015 at 9:43
  • I figured out the actual cause of the long query time. Column d was indexed, and because the table was already populated, updating d resulted in the index being rebuilt, which took up the majority of the query time. Commented Jun 10, 2015 at 9:20
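
The last comment points at index maintenance on d as the real cost. A minimal sketch of one way to avoid it, assuming MySQL and assuming the index on d is called idx_d (a hypothetical name; check SHOW INDEX FROM t for the real one): drop the index, run the plain update, then build the index once at the end.

```sql
-- Assumption: the secondary index on d is named idx_d (hypothetical).
ALTER TABLE t DROP INDEX idx_d;

-- Plain update with no index on d to maintain while rows change.
UPDATE t SET d = MD5(CONCAT(b, c));

-- Rebuild the index in a single pass once all values are in place.
ALTER TABLE t ADD INDEX idx_d (d);
```

Whether this is faster overall depends on the storage engine and table size, so it is worth timing on a copy of the data first.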

2 Answers

1

One method is a join:

-- compute md5 once per distinct (b, c) pair, then join back to t
update t join
       (select b, c, md5(concat(b, c)) as val
        from t
        group by b, c
       ) tt
       on t.b = tt.b and t.c = tt.c
     set t.d = tt.val;

However, it is quite possible that the extra work of grouping and joining the data would take longer than simply calling md5() on every row, so doing the update directly could be feasible.

EDIT:

Actually, updating the entire table is likely to take time, just for the updates and logging. I would suggest that you create another table entirely for the b/c/d values and join in the values when you need them.
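
A rough sketch of that second-table idea, assuming MySQL, that b and c are INT, and a hypothetical table name bc_hash (none of these names come from the question):

```sql
-- Hypothetical lookup table: one row per distinct (b, c) pair.
CREATE TABLE bc_hash (
    b INT NOT NULL,
    c INT NOT NULL,
    d CHAR(32) NOT NULL,      -- MD5() returns 32 hex characters
    PRIMARY KEY (b, c)
);

-- md5 is computed only once per distinct pair.
-- Assumption: rows with NULL b or c are skipped, since CONCAT would yield NULL.
INSERT INTO bc_hash (b, c, d)
SELECT DISTINCT b, c, MD5(CONCAT(b, c))
FROM t
WHERE b IS NOT NULL AND c IS NOT NULL;

-- Join in d when it is needed instead of storing it on every row of t.
SELECT t.id, t.a, t.b, t.c, h.d
FROM t
JOIN bc_hash h ON h.b = t.b AND h.c = t.c;
```

With only a few thousand pairs, the lookup table stays small and the primary key on (b, c) keeps the join cheap.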

2 Comments

Good solution. But as you said, I have tried this and the joins are even more expensive as the columns are not indexed.
@Abhay . . . You should normalize the data structure as I suggest in the edit. Your problem may not be the MD5 function but simply the volume of updates. A properly structured second table should be much faster.
1

Create a temp table:

CREATE TEMPORARY TABLE IF NOT EXISTS tmpTable
AS (SELECT b, c, md5(concat(b, c)) as d FROM t group by b, c);

Update initial table:

UPDATE t orig
JOIN tmpTable tmp ON orig.b = tmp.b AND orig.c = tmp.c
SET orig.d = tmp.d;

Drop the temp table:

DROP TEMPORARY TABLE tmpTable;

1 Comment

As I already commented on the other answer, I have tried this and the joins are even more expensive as the columns are not indexed.
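
One possible response to the missing-index concern, sketched for MySQL: index the small temporary table rather than the large one, so each row of t can probe tmpTable by (b, c). The index name idx_b_c is hypothetical, and whether the optimizer actually uses it should be checked with EXPLAIN.

```sql
-- Run after creating tmpTable and before the UPDATE ... JOIN.
-- Assumption: idx_b_c is an illustrative name, not from the original answer.
ALTER TABLE tmpTable ADD INDEX idx_b_c (b, c);

-- Inspect the plan of the same join to see whether the index is picked up.
EXPLAIN
SELECT orig.id
FROM t orig
JOIN tmpTable tmp ON orig.b = tmp.b AND orig.c = tmp.c;
```

Indexing a few thousand rows is cheap, whereas adding an index on (b, c) to t itself would mean another pass over millions of rows.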
