Tuesday, August 25, 2009

Why can't I add a unique index?

Adding unique indexes once data is already loaded into a table can prove to be a bit of a headache. The question of how to clean up duplicate keys is one that I have had posed to me many times and it seems somewhat simple. I guess anything is simple if you already know how to do it, so to that end, I thought I would write my solution.

This became an issue again recently because the UNIX admin where I work was creating a database to store system statistics. He has daily flat files that he wants to load and knew the basics of creating a database. As he was loading the data it occurred to him that he did not want to be able to load the same file twice and he decided to try to add a unique index. Funny thing was that what he was trying to prevent had already happened and he received an error trying to create the unique index. His next stop was my desk to ask why the error was happening and how to fix it.

The first thing we needed to do was to figure out which rows were duplicated. The following SQL gave us the date that was loaded twice:

select collection_date, server, count(*)
from system_stats
group by collection_date, server
having count(*) > 1;


This gave us a list of machines for one date with 2 rows each. Now we knew what date was loaded twice, but since they were exactly the same, how do we get rid of one row for each key? This took creating a temporary table to hold the records we didn't want to delete.

create table hold_records as
select distinct * from system_stats
where collection_date = ;


Checking the hold_records table shows us that we have exactly half the number of records for that date and each server only has one record for that date. Now we clean up the data for that date from the main table.

delete system_stats where collection_date = ;

Now we put the data back:

insert into system_stats (select * from hold_records);

Voila! One record per server for that date and the unique index could be created. Again, this is not really rocket science, but I am surprised at how many times I have been asked to do something similar. Hopefully it will save someone else some time in the future.

1 comment:

  1. have you looked into the NOVALIDATE option?

    check this out

    SQL> desc dupes
    Name Null? Type
    ----------------------------------------- -------- ----------------------------
    MYNUMBER NUMBER

    SQL> select mynumber,count(*) from dupes group by mynumber having count(*) > 1;

    MYNUMBER COUNT(*)
    ---------- ----------
    1 3
    15 2

    SQL> alter table dupes add constraint dupes_pk primary key(mynumber)
    2 using index(create index dupes_pk on dupes(mynumber)) enable novalidate;

    Table altered.

    SQL> select mynumber,count(*) from dupes group by mynumber having count(*) > 1;

    MYNUMBER COUNT(*)
    ---------- ----------
    1 3
    15 2

    SQL> insert into dupes values(1);
    insert into dupes values(1)
    *
    ERROR at line 1:
    ORA-00001: unique constraint (NEIL.DUPES_PK) violated


    SQL>

    ReplyDelete