
I am using MySQL.

Every month I upload a txt file, create a table from it, and then, after running a query to filter the results, I append the new rows to a bigger table that keeps all the months of the year.

The table I create every month always maintains the same structure.

I have found that after running the query, the INSERT INTO table_name statement does insert all the rows into the bigger table. The problem is that if I forget I have already uploaded a month's data and process it again, there is no filter: the rows are inserted a second time and end up duplicated.

Is there a way to avoid this?

I do not use primary keys on either table.
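
For concreteness, the monthly flow looks roughly like this; the file path, table names, and columns below are made up for illustration:

    -- Load this month's file into a fresh staging table.
    LOAD DATA LOCAL INFILE '/path/to/monthly.txt'
    INTO TABLE monthly_staging
    FIELDS TERMINATED BY '\t';

    -- Filter and append to the year-wide table. Re-running this for a
    -- month that was already loaded is what creates the duplicates.
    INSERT INTO yearly_data (report_month, customer_id, amount)
    SELECT report_month, customer_id, amount
    FROM monthly_staging
    WHERE amount > 0;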

Comments

  • Why do you not use primary keys? Commented Jul 11, 2013 at 20:50
  • Primary keys are pretty fundamental to stuff like this. You probably want to start using them. Commented Jul 11, 2013 at 20:51
  • Is it possible for 2 rows to be identical, but from different months? Do you store timestamps with each row? Adding UNIQUE(timeStampColumn) in your CREATE TABLE statement would fix it. Commented Jul 11, 2013 at 21:05
  • A table without a PRIMARY KEY or a column with a UNIQUE index is going to be trouble if these are your objectives. Commented Jul 11, 2013 at 21:06
  • In DBMS land, a table without a PK is not really a table. Commented Jul 11, 2013 at 21:12

3 Answers


Set up a unique index on the columns you class as unique, and then use INSERT IGNORE instead of a plain INSERT.
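
A minimal sketch of that approach; the table and column names (yearly_data, monthly_staging, and the three columns) are assumptions, not from the question:

    -- Make the combination of columns that identifies a row unique.
    ALTER TABLE yearly_data
        ADD UNIQUE INDEX uq_yearly_row (report_month, customer_id, amount);

    -- INSERT IGNORE silently skips rows that would violate the unique
    -- index, so re-running a monthly load cannot create duplicates.
    INSERT IGNORE INTO yearly_data (report_month, customer_id, amount)
    SELECT report_month, customer_id, amount
    FROM monthly_staging;

Note that INSERT IGNORE downgrades other errors (such as data truncation) to warnings as well, so check the warnings if you use it routinely.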



I understand your problem.

Add a column to your table with the MD5 value of the file.

Before uploading, check whether the MD5 value you just calculated already exists in at least one row of the table; if it does, don't upload the file.

MD5 gives you very good uniqueness for this purpose.
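
One way to sketch that idea is with a separate bookkeeping table; the names loaded_files and file_md5 here are assumptions:

    -- One row per file ever loaded; the UNIQUE key rejects repeats.
    CREATE TABLE loaded_files (
        file_md5  CHAR(32)  NOT NULL,
        loaded_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        UNIQUE KEY uq_file_md5 (file_md5)
    );

    -- Compute the file's hash outside MySQL (e.g. with md5sum) and try
    -- to record it before loading. If the same file was already
    -- processed, this INSERT fails and you know to skip the upload.
    INSERT INTO loaded_files (file_md5)
    VALUES ('d41d8cd98f00b204e9800998ecf8427e');  -- placeholder hash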

Cheers!

Comments

  • This solution is like scratching your left ear by using your right arm and reaching over your head. Unique constraints or primary keys would solve the problem. The database software can prevent duplicates from being entered; there is no need to put that burden on a higher application or the DBA.
  • I do not state that the value above cannot be used as a primary key. It's what everyone uses to identify a single file!
  • MD5 is pretty trashy when SHA1 is available.
  • Do we really want to go on with this? Are you concerned with security? SHA-2, maybe? Or do we want to help solve a problem?
  • Your solution will work. But I think the glaring problem is that the database accepts bad data. It should not allow duplicates to be inserted; it shouldn't break due to human error. Fixing his large table will prevent errors with uploading the same file twice. If there is no consistency to his file names, won't the hashes be different? An MD5 hash on a file also hashes the metadata, doesn't it?

I'd write this in a comment, but there are too many links so it would be too long.

You only need to add keys/constraints and recreate the table (rerun your CREATE TABLE script).

Check out these links:

  • Constraints
  • Unique Constraint
  • Primary Keys

If you do not want to use primary keys, use the UNIQUE constraint:

    UNIQUE(column1, column2, column3, ..., columnLast)
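
For example, here is a minimal sketch of recreating the big table with such a constraint; the table name yearly_data and its columns are assumptions, not from the question:

    -- Sketch only: all names are assumed for illustration.
    CREATE TABLE yearly_data (
        report_month DATE          NOT NULL,
        customer_id  INT           NOT NULL,
        amount       DECIMAL(10,2) NOT NULL,
        -- This constraint is what rejects a second load of the same rows.
        CONSTRAINT uq_yearly_row UNIQUE (report_month, customer_id, amount)
    );

With the constraint in place, inserting the same month's rows a second time fails with a "Duplicate entry" error instead of silently duplicating the data (or the duplicates are skipped if you combine it with INSERT IGNORE, as in the first answer).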

