0

I am trying to update a database table based on a certain condition. Here is the sample table.

     fname   mname  lname
 1   RONALD  D      VALE
 2   RONALD         VALE
 3   JACK    A      SMITH
 4   JACK    B      SMITH
 5   JACK           SMITH

I would like to fill in a missing middle name from another row when the first and last names match. In this example, I would expect the following output.

     fname   mname  lname
 1   RONALD  D      VALE
 2   RONALD  D      VALE
 3   JACK    A      SMITH
 4   JACK    B      SMITH
 5   JACK           SMITH

I am not sure how to go about this. Any suggestions or ideas?

EDIT

Please note that I also do not want to update the table if there are two different middle initials for the same first and last name.

I am trying to make the data consistent. There are some missing values in the data, so the main aim is to identify and merge multiple entries that are possibly the same. At the same time, we do not want to introduce erroneous data into the table. The data shown here consists of only a few columns of the entire table. There are other attributes that make the tuples unique.

5
  • Given that a tuple should be unique, why do you want to destroy the integrity of your data? Restated: why is row 2 equal to row 1, and therefore why are there two rows for the same data? I suspect there is a design issue you need to solve. Commented Feb 24, 2015 at 23:47
  • @kjtl Please see my edit above Commented Feb 24, 2015 at 23:57
  • I'd do this in 3 passes. Commented Feb 25, 2015 at 6:52
  • @kjtl Could you please elaborate on the 3 passes? Commented Feb 25, 2015 at 16:22
  • See the answer below: there are 3 subqueries for the 3 passes that find the middle names to update with. There is still something wrong with the design of the database, however; otherwise you would not be trying to maintain data integrity across different tuples. Commented Mar 1, 2015 at 0:34

3 Answers

1

This is one possible answer to the first version of the question.

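# `table` is a placeholder name; TABLE is a reserved word in MySQL, so it is quoted with backticks
UPDATE `table` t JOIN
  ( SELECT fname, MAX(mname) AS mname, lname, COUNT(*) AS qty  # MAX() picks the non-blank middle name
    FROM `table`
    GROUP BY fname, lname
    HAVING qty > 1
) sub
ON t.fname = sub.fname AND t.lname = sub.lname
SET t.mname = sub.mname
WHERE t.mname = '' AND sub.qty = 2
;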

UPDATE

This version uses IF() rather than CASE WHEN. It handles the RONALD VALE records.

UPDATE `table` t JOIN
  ( SELECT fname, MIN(mname) minname, MAX(mname) mxname, lname, COUNT(*) AS qty
    FROM `table`
    GROUP BY fname, lname
    HAVING qty > 1
) sub
ON t.fname = sub.fname AND t.lname = sub.lname
# sub.mxname replaces the bare mname column, whose value is not well defined under GROUP BY fname, lname
SET t.mname = IF(sub.qty = 2, sub.mxname, IF(sub.qty > 2, sub.mxname, NULL))
# parentheses make the original precedence explicit (AND binds tighter than OR)
WHERE t.mname IS NULL
   OR (LEFT(t.mname,1) = LEFT(sub.mxname, 1) AND t.mname <> sub.mxname)
;

UPDATE 2

# Update 1: fill in missing middle names
UPDATE `table` t JOIN
  ( SELECT fname, MIN(mname) minname, MAX(mname) mxname, lname, COUNT(*) AS qty
    FROM `table`
    GROUP BY fname, lname
    HAVING qty > 1
) sub
ON t.fname = sub.fname AND t.lname = sub.lname
SET t.mname = IF(sub.qty = 2, sub.mxname, IF(sub.qty > 2 AND sub.minname = sub.mxname, sub.mxname, NULL))
WHERE t.mname IS NULL
;
# Update 2: expand an initial to the longer middle name
UPDATE `table` t JOIN
  ( SELECT fname, MIN(mname) minname, MAX(mname) mxname, lname, COUNT(*) AS qty
    FROM `table`
    GROUP BY fname, lname
    HAVING qty > 1
) sub
ON t.fname = sub.fname AND t.lname = sub.lname
SET t.mname = IF(sub.qty = 2, sub.mxname, IF(sub.qty > 2, sub.mxname, NULL))
WHERE LEFT(t.mname,1) = LEFT(sub.mxname, 1)
AND t.mname <> sub.mxname  # reduce unnecessary tasks
;

before

         DANIEL J   ABADI
         DANIEL     ABADI
         DANIEL     ABADI
         DANIEL     ABADI
         ROBERT     ABADI
         ROBERT K   ABADI
         AMEY   S   BAILEY
         AMEY   SCHENCK BAILEY
         KARL   K   KWON
         KARL       KWON
         DINESH     MAJETI
         ADAM   M   SMITH
         ADAM   B   SMITH
         ADAM   C   SMITH
         ADAM       SMITH
         ADAM       SMITH
         JACK   A   SMITH
         JACK   B   SMITH
         JACK       SMITH
         RONALD A   VALE
         RONALD D   VALE
         RONALD DAVID   VALE
         RONALD     VALE

after

         DANIEL J   ABADI
         DANIEL J   ABADI
         DANIEL J   ABADI
         DANIEL J   ABADI
         DANIEL J   ABADI
         ROBERT K   ABADI
         ROBERT K   ABADI
         AMEY   SCHENCK BAILEY
         AMEY   SCHENCK BAILEY
         KARL   K   KWON
         KARL   K   KWON
         DINESH     MAJETI
         ADAM   M   SMITH
         ADAM   B   SMITH
         ADAM   C   SMITH
         ADAM       SMITH
         ADAM       SMITH
         JACK   A   SMITH
         JACK   B   SMITH
         JACK       SMITH
         RONALD A   VALE
         RONALD DAVID   VALE
         RONALD DAVID   VALE
         RONALD     VALE

2 Comments

Changing AMEY S BAILEY / AMEY SCHENCK BAILEY to AMEY SCHENCK BAILEY / AMEY SCHENCK BAILEY breaks the specification.
There is also something wrong with the design of the database (see en.wikipedia.org/wiki/Fourth_normal_form): the name should not be duplicated in the table, regardless of the other columns.
1

Use a subselect to make a "clone" of the table and update the middle name, joining on first and last name.

UPDATE names JOIN 
  (SELECT fname, mname, lname FROM names WHERE mname IS NOT NULL
     GROUP BY fname,mname,lname
     HAVING COUNT(*) = 1) AS clone 
ON clone.fname = names.fname AND clone.lname=names.lname
SET names.mname = clone.mname;
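
A join like this has more than one candidate row whenever a first/last name pair carries several different middle names, so it can help to list the groups before updating anything. The following is only a minimal sketch of such a read-only check. It assumes Python's sqlite3 module as a stand-in for MySQL, blanks stored as NULL or '', and the sample rows from the question; the column aliases distinct_mnames and blanks are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption);
# table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (fname TEXT, mname TEXT, lname TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [
        ("RONALD", "D", "VALE"),
        ("RONALD", None, "VALE"),
        ("JACK", "A", "SMITH"),
        ("JACK", "B", "SMITH"),
        ("JACK", None, "SMITH"),
    ],
)

# Per (fname, lname): how many distinct non-blank middle names exist,
# and how many rows are blank. NULLIF turns '' into NULL, and
# COUNT(DISTINCT ...) ignores NULLs.
rows = conn.execute(
    """
    SELECT fname, lname,
           COUNT(DISTINCT NULLIF(mname, '')) AS distinct_mnames,
           SUM(CASE WHEN mname IS NULL OR mname = '' THEN 1 ELSE 0 END) AS blanks
    FROM names
    GROUP BY fname, lname
    ORDER BY fname, lname
    """
).fetchall()

for fname, lname, distinct_mnames, blanks in rows:
    # A group is safe to fill only if exactly one middle name is known
    # and at least one row is missing it.
    status = "safe to fill" if distinct_mnames == 1 and blanks > 0 else "leave alone"
    print(fname, lname, distinct_mnames, blanks, status)
```

With the sample data, RONALD VALE has one known middle name and one blank row, while JACK SMITH has two different middle names and should be left untouched.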

2 Comments

You should have "ORDER BY mname DESC" in the subquery. Otherwise, all middle names will be NULL.
Thanks for the response but it does not handle the negative case. Please see my edit above.
0
/* create the table */
CREATE TABLE if not exists  `duplicated_names` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(50) DEFAULT NULL,
  `middle_name` varchar(50) DEFAULT NULL,
  `last_name` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `first_name` (`first_name`),
  KEY `middle_name` (`middle_name`),
  KEY `last_name` (`last_name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

/* drop old values if any */
truncate duplicated_names ;

/* set up data for the example */
insert into duplicated_names ( first_name, middle_name, last_name ) values ( 'Ronald', 'D', 'Vale') ;
insert into duplicated_names ( first_name, middle_name, last_name ) values ( 'Ronald', 'D', 'Vale') ;
insert into duplicated_names ( first_name, middle_name, last_name ) values ( 'Ronald', '', 'Vale') ;
insert into duplicated_names ( first_name, middle_name, last_name ) values ( 'Jack', 'A', 'Smith') ;
insert into duplicated_names ( first_name, middle_name, last_name ) values ( 'Jack', 'B', 'Smith') ;
insert into duplicated_names ( first_name, middle_name, last_name ) values ( 'Jack', '', 'Smith') ;

update duplicated_names
join (
    /* find the middle names */
    select duplicated_names.id
    , duplicated_names.first_name
    , duplicated_names.middle_name
    , duplicated_names.last_name 
    from duplicated_names
    inner join (
        /* find first_name and last_name that have only one middle name */
        select count(*) sum, first_name, last_name from (
            /* find candidate middle name donors who have middle names */
            select count(*) sum, first_name, middle_name, last_name
            from duplicated_names
            where middle_name <> ''
            group by first_name, middle_name, last_name
        ) candidate_middle_name_donors
        group by first_name, last_name
        having count(*) = 1
    ) names_with_one_middle_name
    on names_with_one_middle_name.first_name = duplicated_names.first_name
    and names_with_one_middle_name.last_name = duplicated_names.last_name
    and duplicated_names.middle_name <> ''
) middle_names
on duplicated_names.first_name = middle_names.first_name
and duplicated_names.last_name = middle_names.last_name
set duplicated_names.middle_name = middle_names.middle_name ;

select * from duplicated_names ;

/* 

results 

id  first_name  middle_name last_name
1   Ronald  D   Vale
2   Ronald  D   Vale
3   Ronald  D   Vale
4   Jack    A   Smith
5   Jack    B   Smith
6   Jack        Smith

*/
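
The nested subqueries above come down to one rule: fill a blank middle name only when the same first and last name has exactly one distinct non-blank middle name. As a rough sketch of that rule, and not a replacement for the MySQL statement, the version below runs the same sample data through Python's sqlite3 module (an assumption made so the example is self-contained). It uses correlated subqueries, which SQLite accepts; MySQL rejects a subquery on the table being updated, which is why the answer goes through a join on a derived table.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE duplicated_names ("
    "id INTEGER PRIMARY KEY, first_name TEXT, middle_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO duplicated_names (first_name, middle_name, last_name) VALUES (?, ?, ?)",
    [
        ("Ronald", "D", "Vale"),
        ("Ronald", "D", "Vale"),
        ("Ronald", "", "Vale"),
        ("Jack", "A", "Smith"),
        ("Jack", "B", "Smith"),
        ("Jack", "", "Smith"),
    ],
)

# Fill blank middle names, but only in groups that have exactly one
# distinct non-blank middle name. Groups with conflicting middle names
# (Jack A / Jack B) are not touched.
conn.execute(
    """
    UPDATE duplicated_names
    SET middle_name = (
        SELECT MAX(d.middle_name)
        FROM duplicated_names AS d
        WHERE d.first_name = duplicated_names.first_name
          AND d.last_name = duplicated_names.last_name
          AND d.middle_name <> ''
    )
    WHERE middle_name = ''
      AND (
        SELECT COUNT(DISTINCT d.middle_name)
        FROM duplicated_names AS d
        WHERE d.first_name = duplicated_names.first_name
          AND d.last_name = duplicated_names.last_name
          AND d.middle_name <> ''
      ) = 1
    """
)
conn.commit()

result = conn.execute(
    "SELECT id, first_name, middle_name, last_name FROM duplicated_names ORDER BY id"
).fetchall()
for row in result:
    print(row)
```

On this data the third Ronald row receives 'D' and the blank Jack row stays blank, which matches the results listed above.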

2 Comments

What if "Ronald David Vale" exists? Then it doesn't work. If it does exist, the records are supposed to be updated to "Ronald David Vale". @kjtl
The specification is: if there is one consistent middle name and one blank middle name, make the blank middle name the same as the consistent one. If there are two different middle names for the same first and last name, make no updates.
