Friday, October 2, 2009

Delete duplicate rows in PostgreSQL

The basic assumptions are that you have a table, named "emp_exp", that contains a int field, called emp_id and that the emp_id field may contain duplicate rows.

having table "emp_exp", it contain rows like

emp_id organization
------- -------------
1567 POSTGRES
1567 ORACLE
2089 WEBRAIN
2089 IBM

1) First, as always, backup your database.
2) Create a unique id (i.e. like a primary key) for each row, if one does not exist. else follow the step 3, use your unique column in where clause and select clause of sub-query in step 3

I) CREATE SEQUENCE emp_sequence
INCREMENT 1 MINVALUE 1 MAXVALUE 1999999 START 1 CACHE 1;

-- add new column to the table and Populate the field
II) ALTER TABLE emp_exp ADD COLUMN id integer;
III) UPDATE emp_exp SET id = nextval('emp_sequence');

-- after above step your table look like (You have added id column)

id emp_id organization
-- ------- -------------
1 1567 POSTGRES
2 1567 ORACLE
3 2089 WEBRAIN
4 2089 IBM



--NOTE before delete row in your table, if you want to know no.of duplicate rows in your table kindly follow step 4

3) Delete the duplicates

I) DELETE FROM emp_exp
WHERE id IN (SELECT max(id)
FROM emp_exp
GROUP BY emp_id HAVING count(*) > 1)

---same query can be re-write like (OR)

I) DELETE FROM emp_exp
WHERE id NOT IN (SELECT min(id)
FROM emp_exp
GROUP BY emp_id HAVING count(*) >= 1)


now your table emp_exp, have only unique emp_id value



4) Estimate the number of duplicates

You can use the query below to estimate the number of duplicates, so that once the deletion takes place you can figure out if something has gone horribly wrong.

SELECT sum(dupes.c) - count(dupes.c)
FROM (SELECT count(*) AS c
FROM emp_exp
GROUP BY emp_id
HAVING count(*) > 1
) AS dupes


Delete the duplicate records from table in postgresql

DELETE FROM emp_exp
WHERE ctid NOT IN (SELECT min(ctid) FROM emp_exp GROUP BY username)

CTID : It uniquely identifies row in a table and it refer the physical location of the records in table.

No comments: