[MySQL] Procedure to Add A Unique Key Constraint to a Table

The following is a procedure to safely add a unique key constraint to a table.

I wanted to add a unique key constraint, so that I could use REPLACE instead of INSERT and then DELETE (INSERT IGNORE was not an option because the latest data would have potentially more up to date other fields, which I wanted.) And I didn't want to retain lots of duplicate data in my table.

Steps:

  1. Backup the table
  2. Restore the table to test schema (or test instance)
  3. Test the change against the test table
  4. Make the change to the production table

1) Backup The Table

From the command line:

mysqldump --column-statistics=0 --user=USERNAME --password=PASSWORD --host=MYSQL_SERVER --port=3306 DATABASE TABLE --result-file=DATABASE.TABLE.sql

Note: I had to add --set-gtid-purged=off to the command, else the restore would not work (with permissions error.)

2) Restore the table to test schema (or test instance)

mysql --user=USERNAME --password=PASSWORD --host=MYSQL_SERVER --port=3306 test DATABASE.TABLE.sql

Note 1: If you're not using non-standard port, you can skip the --port
Note 2: You do not need to specify the table for the restore (since that's what you backed up. Examine the .sql file if you have doubts.

3) Test the change against the test table

ALTER TABLE DATABASE.TABLE ADD CONSTRAINT UKEY UNIQUE (c1,c2,c3...)

Where cX are your constraints.

Note: If you have a test schema, you could always simply do a restore to the test schema.

4) Make the change to the production table

If the testing in part 3 all meets with your satisfaction, go ahead and make the change to production.

APPENDIX A: What happened when I tried to make the change?

The change didn't do what I expected. I actually got an error:

Error Code: 1062. Duplicate entry...

So, does that mean I have to delete all the duplicates in order to enable my unique key constraint?

Well, that's what I did, deleted all the duplicate rows from using the knowledge gain from the Appendix B.

DELETE FROM database.table WHERE id = X

APPENDIX B: Finding the Duplicates Which Might Disappear When you Make You Change

The challenge with implementing a Unique Key Constraint to a table, is that non-unique stuff will be deleted (so you should be left with only 1 row that has the Unique Key.) Really, every table you create should have an id column, and you'd hope that the latest id is the row that is kept, but there's no guarantee (at least I don't think so, maybe the latest is kept.)

If you want to find the max and min IDs of any to-be-unique sets in your data, and counts. It is fairly simple:

  • Table = dwh_custom.my_table
  • Unique key constraints = c1,c2,c3,c4
SELECT * FROM(
  SELECT *, min(ID), max(ID), count(*) AS 'count'
  FROM dwh_custom.my_table
  GROUP BY c1,c2,c3,c4
) AS t1 WHERE `count` > 1

If you wanted to manually delete data, you could simple delete the min IDs until you have not output from the above.

If you want to see all the duplicate rows (based on your desired unique key):

SELECT a.* FROM dwh_custom.my_table AS a JOIN(
  SELECT * FROM(
    SELECT *, min(ID), max(ID), count(*) AS 'count'
    FROM dwh_custom.my_table
    GROUP BY c1,c2,c3,c4
  ) AS t1 WHERE `count` > 1
) AS t ON t.c1 = a.c1 AND t.c2 = a.c2 AND t.c3 = a.c3 AND t.c4 = a.c4


Comments