
I am using MySQL.

Every month I upload a txt file, create a table from it, and then, after running a query to filter the results, I append the new rows to a bigger table that keeps all the months of the year.

The table I create every month always maintains the same structure.

I have found that after running the query, the INSERT INTO table_name statement does insert all the rows into the bigger table. The problem is that if I forget I have already uploaded a month's data and process it again, there is no filter: the rows are inserted a second time and end up duplicated.

Is there a way to avoid this?

I do not use primary keys on either table.
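
For concreteness, the monthly flow looks roughly like this; the file path, table names, and columns below are made up for illustration:

    -- Load this month's file into a fresh staging table.
    LOAD DATA LOCAL INFILE '/path/to/monthly.txt'
    INTO TABLE monthly_staging
    FIELDS TERMINATED BY '\t';

    -- Filter and append to the year-wide table. Re-running this for a
    -- month that was already loaded is what creates the duplicates.
    INSERT INTO yearly_data (report_month, customer_id, amount)
    SELECT report_month, customer_id, amount
    FROM monthly_staging
    WHERE amount > 0;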

Comments

  • Why do you not use primary keys? Commented Jul 11, 2013 at 20:50
  • Primary keys are pretty fundamental to stuff like this. You probably want to start using them. Commented Jul 11, 2013 at 20:51
  • Is it possible for 2 rows to be identical, but from different months? Do you store timestamps with each row? Adding UNIQUE(timeStampColumn) in your CREATE TABLE statement would fix it. Commented Jul 11, 2013 at 21:05
  • A table without a PRIMARY KEY or a column with a UNIQUE index is going to be trouble if these are your objectives. Commented Jul 11, 2013 at 21:06
  • In DBMS land, a table without a PK is not really a table. Commented Jul 11, 2013 at 21:12

3 Answers


Set up a unique index on the columns you class as unique, and then use INSERT IGNORE instead of a plain INSERT.
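
A minimal sketch of that approach; the table and column names (yearly_data, monthly_staging, and the three columns) are assumptions, not from the question:

    -- Make the combination of columns that identifies a row unique.
    ALTER TABLE yearly_data
        ADD UNIQUE INDEX uq_yearly_row (report_month, customer_id, amount);

    -- INSERT IGNORE silently skips rows that would violate the unique
    -- index, so re-running a monthly load cannot create duplicates.
    INSERT IGNORE INTO yearly_data (report_month, customer_id, amount)
    SELECT report_month, customer_id, amount
    FROM monthly_staging;

Note that INSERT IGNORE downgrades other errors (such as data truncation) to warnings as well, so check the warnings if you use it routinely.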



I understand your problem.

Add a column to your table with the MD5 value of the file.

Before uploading, check whether the MD5 value you just calculated already exists in at least one row of the table; if it does, don't upload the file.

MD5 gives you very good uniqueness for this purpose.
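
One way to sketch that idea is with a separate bookkeeping table; the names loaded_files and file_md5 here are assumptions:

    -- One row per file ever loaded; the UNIQUE key rejects repeats.
    CREATE TABLE loaded_files (
        file_md5  CHAR(32)  NOT NULL,
        loaded_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        UNIQUE KEY uq_file_md5 (file_md5)
    );

    -- Compute the file's hash outside MySQL (e.g. with md5sum) and try
    -- to record it before loading. If the same file was already
    -- processed, this INSERT fails and you know to skip the upload.
    INSERT INTO loaded_files (file_md5)
    VALUES ('d41d8cd98f00b204e9800998ecf8427e');  -- placeholder hash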

Cheers!

Comments

  • This solution is like scratching your left ear by using your right arm and reaching over your head. Unique constraints or primary keys would solve the problem. The database software can prevent duplicates from being entered; there is no need to put that burden on a higher application or the DBA.
  • I do not state that the value above cannot be used as a primary key. It's what everyone uses to identify a single file!
  • MD5 is pretty trashy when SHA1 is available.
  • Do we really want to go on with this? Are you concerned with security? SHA-2, maybe? Or do we want to help solve a problem?
  • Your solution will work. But I think the glaring problem is that the database accepts bad data. It should not allow duplicates to be inserted; it shouldn't break due to human error. Fixing his large table will prevent errors with uploading the same file twice. If there is no consistency to his file names, won't the hashes be different? An MD5 hash on a file also hashes the metadata, doesn't it?

I'd write this in a comment, but there are too many links so it would be too long.

You only need to add keys/constraints and recreate the table (rerun your CREATE TABLE script).

Check out these links:

  • Constraints
  • Unique Constraint
  • Primary Keys

If you do not want to use primary keys, use the UNIQUE constraint:

    UNIQUE(column1, column2, column3, ..., columnLast)
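
For example, here is a minimal sketch of recreating the big table with such a constraint; the table name yearly_data and its columns are assumptions, not from the question:

    -- Sketch only: all names are assumed for illustration.
    CREATE TABLE yearly_data (
        report_month DATE          NOT NULL,
        customer_id  INT           NOT NULL,
        amount       DECIMAL(10,2) NOT NULL,
        -- This constraint is what rejects a second load of the same rows.
        CONSTRAINT uq_yearly_row UNIQUE (report_month, customer_id, amount)
    );

With the constraint in place, inserting the same month's rows a second time fails with a "Duplicate entry" error instead of silently duplicating the data (or the duplicates are skipped if you combine it with INSERT IGNORE, as in the first answer).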

