I know this has been asked before, but I'm not a coder and cannot figure it out from other similar posts. I've spent over 5 hours on this without success :( So I ask for your help.

1) Prevent Duplicates

I have a PHP script that writes to a MySQL database. Here is the code:

$sql = "INSERT INTO results (total, size, persq, strip, material, region) 
VALUES ('$total', '$size', '$persq', '$strip', '$material', '$region')";

I want to prevent duplicate rows based on the TOTAL and SIZE columns. If a new entry matches an existing row in both TOTAL and SIZE, no new row should be inserted.

2) Delete Duplicates

I want to delete ALL existing duplicate rows from the DB, also based on the TOTAL and SIZE columns.

If a row duplicates another row in both TOTAL and SIZE, delete the entire row.

How do I do this?

PS - I've read that I can use the SQL IGNORE keyword to prevent future duplicates. Example (I've tried to structure it to work for my situation):

INSERT IGNORE INTO results ...;

Would something like this work? If so, please help me structure it (I'm new to PHP and MySQL).

Big thanks in advance.

  • Applying a unique key to (total, size) will prevent future duplicates. Commented Dec 15, 2014 at 17:52
  • I think INSERT IGNORE will only check if the primary key already exists. I don't know if it checks every unique constraint. But you can just try to insert the record and check whether it works. Anyway, Bhavik's advice of adding a unique key is the right starting point (after removing the duplicates, that is) Commented Dec 15, 2014 at 18:02
  • @GolezTrol If I start from scratch (drop the existing table), set TOTAL and SIZE to be int NOT NULL UNIQUE, and use INSERT IGNORE, will that avoid duplicates? If so, I can just start from scratch (the data in the DB is not essential, as it will repopulate fast anyway) Commented Dec 15, 2014 at 18:28
  • I think Size and Total should not each be unique but only the combination of them. Commented Dec 15, 2014 at 18:56
  • @GolezTrol - after some testing, I figured out that I only need TOTAL to be unique. Also, I ended up dropping the whole table and recreating it, making total UNIQUE VARCHAR (not INT). I dropped the table because GROUPING really messed up how results display (logically) ... anyway - thank you for your help. Now my DB is easier to use, more accurate, and not populated with duplicates ... although all the data is gone :D Commented Dec 15, 2014 at 19:25

2 Answers

I think the easiest way to remove the duplicates is to use a CTAS (Create Table As Select) statement to copy your data into a temporary table. Using GROUP BY, you can remove the duplicates. With ONLY_FULL_GROUP_BY disabled (the default before MySQL 5.7.5), MySQL simply picks an arbitrary value for each of the other fields from the rows that match the group. With that mode enabled, the SELECT * below is rejected.

/* De-duplicate and copy all the data to a temporary table. */
CREATE TABLE Temp AS
  SELECT * FROM results
  GROUP BY total, size;

/* Delete all data from your current table. Truncate is faster but more dangerous. */
DELETE FROM results; /* TRUNCATE results; */

/* Insert the de-duplicated data back into your table. */
INSERT INTO results
SELECT * FROM Temp;

/* Drop the temporary table. */
DROP TABLE Temp;
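
On MySQL 5.7.5 and later, ONLY_FULL_GROUP_BY is enabled by default, and the SELECT * ... GROUP BY in the copy step fails there. What follows is a sketch of a replacement for that first step only. It assumes the table has exactly the six columns from the question, in that order, and that ANY_VALUE() (available from MySQL 5.7.5) can be used:

```sql
/* Optional: list the (total, size) pairs that occur more than once. */
SELECT total, size, COUNT(*) AS copies
FROM results
GROUP BY total, size
HAVING COUNT(*) > 1;

/* Keep one output row per (total, size) pair.
   ANY_VALUE() lets the server pick an arbitrary value for each
   non-grouped column, so no particular duplicate is guaranteed to win. */
CREATE TABLE Temp AS
  SELECT total,
         size,
         ANY_VALUE(persq)    AS persq,
         ANY_VALUE(strip)    AS strip,
         ANY_VALUE(material) AS material,
         ANY_VALUE(region)   AS region
  FROM results
  GROUP BY total, size;
```

The DELETE, INSERT and DROP steps stay as they are. If the table has more columns than these six (an id column, for example), they would need to be added to the SELECT list in the table's column order.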

After that, you can add a unique constraint on (total, size) to prevent new duplicates.

ALTER TABLE results 
  ADD UNIQUE results_uni_total_size (total, size);
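
With that constraint in place, the INSERT IGNORE from the question could look roughly like this. It is a sketch: the literal values are made up for illustration, and in the PHP script they would come from the variables. INSERT IGNORE skips a row that collides with any unique index, not only the primary key.

```sql
/* First run: no row with this (total, size) pair exists yet, so it is inserted. */
INSERT IGNORE INTO results (total, size, persq, strip, material, region)
VALUES ('1200', '15', '80', 'yes', 'asphalt', 'MA');

/* Second run with the same total and size: the row is skipped with a warning
   instead of failing with a duplicate-key error. */
INSERT IGNORE INTO results (total, size, persq, strip, material, region)
VALUES ('1200', '15', '95', 'no', 'metal', 'CT');

/* Number of rows affected by the previous statement; 0 means it was skipped. */
SELECT ROW_COUNT();
```

One thing to keep in mind is that IGNORE also downgrades some other errors (such as invalid data conversions) to warnings, so a plain INSERT with the duplicate-key error handled in PHP is a stricter alternative.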

4 Comments

How do I add a unique constraint for Total, Size?
I just added that for completeness, although I think a developer should be able to find such information once given concrete terms like 'add', 'unique' and 'constraint'. ;)
thanks for your time - I really appreciate it. I'm not a developer - I'm a roofer with CS background. It's just EASIER (not cheaper) to figure this out myself than to hire a developer.
Kudos for trying yourself. :)

If you have duplicate rows where EVERY column has a duplicate value, the easiest thing to do is create a new table and import all of the rows using a GROUP BY on every column. First create a new table with a single unique key that spans all of the columns:

CREATE TABLE newresults (
  total INT NOT NULL,
  size ...,
  UNIQUE KEY (total, size, persq, strip, material, region)
);
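
Because the column definitions here are abbreviated, one way to avoid retyping them is to copy the structure of the existing table and add the key afterwards. A sketch, assuming MySQL's CREATE TABLE ... LIKE; the key name newresults_uni_all is only an illustrative choice:

```sql
/* Create an empty table with the same columns and indexes as results. */
CREATE TABLE newresults LIKE results;

/* Add one composite unique key covering all six columns. */
ALTER TABLE newresults
  ADD UNIQUE KEY newresults_uni_all (total, size, persq, strip, material, region);
```

If some of the columns are long VARCHARs, the combined key may exceed MySQL's index length limit, in which case the key would need to cover fewer columns or column prefixes.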

Then push clean values into the new table:

INSERT INTO newresults (total, size, persq, strip, material, region)
SELECT total, size, persq, strip, material, region
FROM results
GROUP BY total, size, persq, strip, material, region;

That will give you a clean data set. The final thing you'll have to do is drop the old table and rename newresults to results:

DROP TABLE results;
RENAME TABLE mydatabase.newresults TO mydatabase.results;
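
A variation on this last step, as a sketch: RENAME TABLE accepts several renames in one statement, so the two tables can be swapped in one go and the old data kept until the new table has been checked. The name results_old is hypothetical.

```sql
/* Swap the tables in a single statement; the old data stays in results_old. */
RENAME TABLE results TO results_old,
             newresults TO results;

/* Only after checking the new table: remove the old copy. */
DROP TABLE results_old;
```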

Hope that helps...

3 Comments

That is the easiest solution if you don't have any foreign key constraints, indexes, or other things related to the original table.
@Michael Bissell - I did not need every field to be unique. Originally I wanted TOTAL and SIZE to be UNIQUE, but ended up making only the total column unique. Thanks for your input.
By grouping them this becomes a key -- UNIQUE (col1, col2) means that you can have values "1, 2" as one key and "1, 3" as another -- col 1 can have duplicate values and col 2 can have duplicate values, but you can't have two rows with the same values in col1 and col2.
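
The behaviour described in this comment can be tried on a throwaway table. A minimal sketch; the table name demo_pairs and key name demo_uni are made up for the example:

```sql
CREATE TABLE demo_pairs (
  col1 INT NOT NULL,
  col2 INT NOT NULL,
  UNIQUE KEY demo_uni (col1, col2)
);

INSERT INTO demo_pairs (col1, col2) VALUES (1, 2);  /* accepted */
INSERT INTO demo_pairs (col1, col2) VALUES (1, 3);  /* accepted: col1 repeats, the pair is new */
INSERT INTO demo_pairs (col1, col2) VALUES (2, 2);  /* accepted: col2 repeats, the pair is new */
INSERT INTO demo_pairs (col1, col2) VALUES (1, 2);  /* rejected: duplicate-entry error (1062) */

DROP TABLE demo_pairs;
```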
