Summary -

In this topic, we described about the below sections -

A table may contain duplicate rows if the key not defined. To define a key on the table, we have to delete the duplicate rows from the table. So as a first step, we have to check the table has a duplicate row or not. To do this, sort the table using the column on which the key to define. If multiple rows found with the same data, then follow the below process to delete the duplicates before defining the key.


Note :- Sometimes even though the primary key defined on the table, the table may contain duplicates.

There are two ways to delete the duplicate rows from the table. Those are –

  1. By using temp table - This method is useful when all columns have duplicates and no unique column exists.
  2. By using direct DELETE query – This method is useful when atleast one unique column exists in the table.

Let us discuss about both processes in detail –

By using temp table

In this process, we need to create a temp table. To do that, we need to have admin privileges.

If you have it, then we can proceed with this process. Otherwise, we should proceed with option2.

Let us assume, we have admin privileges and the process steps are specified below -

  • Create a temp table with same structure as base table
  • Copy the distinct rows from base table
  • Delete all rows from base table
  • Copy distinct rows from temp table to base table.

The syntaxes for the above process is -

Syntax for create a temporary table as base table -

CREATE TABLE temp_table AS
SELECT * FROM base_table WHERE 1=2; 

Syntax for moving DISTICT rows from base table to a temporary table –

INSERT INTO temp_table
SELECT DISTINCT * FROM base_table; 

Syntax for deleting duplicate data in base table –

DELETE FROM base_table; 

Syntax for moving DISTICT rows from a temporary table to actual table –

INSERT INTO base_table
SELECT DISTINCT * FROM temp_table;   

Example –

Let us consider below table(s) as an example table(s) to frame the SQL query for getting the desired results.

employee_details -

emp_id emp_name designation manager_id date_of_hire salary dept_id
001 Employee1 Director 2019-07-11 45000.00 1000
002 Employee2 Director 2019-07-11 40000.00 2000
003 Employee3 Manager Employee1 2019-07-11 27000.00 1000
004 Employee4 Manager Employee2 2019-10-08 25000.00 2000
005 Employee5 Analyst Employee3 2019-07-11 20000.00 1000
006 Employee6 Analyst Employee3 2019-10-08 18000.00 1000
007 Employee7 Clerk Employee3 2019-07-11 15000.00 1000
008 Employee8 Salesman Employee4 2019-09-09 14000.00 2000
009 Employee9 Salesman Employee4 2019-10-08 13000.00 2000
010 Employee6 Analyst Employee3 2019-10-08 18000.00 1000
011 Employee7 Clerk Employee3 2019-07-11 15000.00 1000
012 Employee8 Salesman Employee4 2019-09-09 14000.00 2000
013 Employee9 Salesman Employee4 2019-10-08 13000.00 2000

Step1 – Create table as employee_details.

CREATE TABLE temp_edetails AS
SELECT * FROM employee_details
WHERE 1=2; 

By executing above query, we can create temp_edetails table with the same structure as employee_details table.


Step2 – Copy distinct rows from base table to temp table.

INSERT INTO temp_edetails
SELECT DISTINCT * FROM employee_details; 

By executing above query, we can copy unique rows from employee_detail table to temp_edetails table. The temp_edetails table now contains the below rows -

Temp_edetails -

emp_id emp_name designation manager_id date_of_hire salary dept_id
001 Employee1 Director 2019-07-11 45000.00 1000
002 Employee2 Director 2019-07-11 40000.00 2000
003 Employee3 Manager Employee1 2019-07-11 27000.00 1000
004 Employee4 Manager Employee2 2019-10-08 25000.00 2000
005 Employee5 Analyst Employee3 2019-07-11 20000.00 1000
006 Employee6 Analyst Employee3 2019-10-08 18000.00 1000
007 Employee7 Clerk Employee3 2019-07-11 15000.00 1000
008 Employee8 Salesman Employee4 2019-09-09 14000.00 2000
009 Employee9 Salesman Employee4 2019-10-08 13000.00 2000

Step3 – Delete all rows from the base table.

DELETE FROM employee_details; 

By executing above query, we can delete all data present in the employee_details table. The output was as follows –

employee_details -

emp_id emp_name designation manager_id date_of_hire salary dept_id

Step4 – Copy distinct rows from temp table to base table

INSERT INTO employee_details
SELECT DISTINCT * FROM temp_edetails; 

By executing above query, we can INSERT DISTINCT details from temp_edetails to employee_details. Now, employee_details table contains unique rows without duplicates –

employee_details -

emp_id emp_name designation manager_id date_of_hire salary dept_id
001 Employee1 Director 2019-07-11 45000.00 1000
002 Employee2 Director 2019-07-11 40000.00 2000
003 Employee3 Manager Employee1 2019-07-11 27000.00 1000
004 Employee4 Manager Employee2 2019-10-08 25000.00 2000
005 Employee5 Analyst Employee3 2019-07-11 20000.00 1000
006 Employee6 Analyst Employee3 2019-10-08 18000.00 1000
007 Employee7 Clerk Employee3 2019-07-11 15000.00 1000
008 Employee8 Salesman Employee4 2019-09-09 14000.00 2000
009 Employee9 Salesman Employee4 2019-10-08 13000.00 2000

By using direct DELETE query

Execute DELETE query to delete the duplicate rows. Let us see how to prepare the DELETE query. The process should follow the below steps –

  1. Prepare query to get only unique rows
  2. Verify the query as a subquery to get duplicate rows
  3. Execute DELETE query

Let us consider the above employee_details table with duplicates.

emp_id emp_name designation manager_id date_of_hire salary dept_id
001 Employee1 Director 2019-07-11 45000.00 1000
002 Employee2 Director 2019-07-11 40000.00 2000
003 Employee3 Manager Employee1 2019-07-11 27000.00 1000
004 Employee4 Manager Employee2 2019-10-08 25000.00 2000
005 Employee5 Analyst Employee3 2019-07-11 20000.00 1000
006 Employee6 Analyst Employee3 2019-10-08 18000.00 1000
007 Employee7 Clerk Employee3 2019-07-11 15000.00 1000
008 Employee8 Salesman Employee4 2019-09-09 14000.00 2000
009 Employee9 Salesman Employee4 2019-10-08 13000.00 2000
010 Employee6 Analyst Employee3 2019-10-08 18000.00 1000
011 Employee7 Clerk Employee3 2019-07-11 15000.00 1000
012 Employee8 Salesman Employee4 2019-09-09 14000.00 2000
013 Employee9 Salesman Employee4 2019-10-08 13000.00 2000

Step1 – Prepare query to get only unique rows.

In the above example, emp_id has unique values. However, the table contains duplicates. So prepare the query to get the latest unique rows emp_id from the table.

SELECT MAX(emp_id) FROM employee_details GROUP BY emp_name,
designation, manager_id, date_of_hire, salary, dept_id;  

By executing above query, we can get the DISTINCT rows emp_id from employee_details table. The result was as follows –

emp_id
001
002
003
004
005
006
007
008
009
010
011
012
013

Note :- The columns in the GROUP BY are the columns having the duplicate values.

Step2 – Verify the above query as a subquery to get duplicate rows

SELECT * FROM employee_details WHERE emp_id NOT IN (SELECT MAX(emp_id)
FROM employee_details GROUP BY emp_name, designation, manager_id,
date_of_hire, salary, dept_id);  

By executing above query, we can get only duplicate rows from employee_details table. The output was as follows –

emp_id emp_name designation manager_id date_of_hire salary dept_id
006 Employee6 Analyst Employee3 2019-10-08 18000.00 1000
007 Employee7 Clerk Employee3 2019-07-11 15000.00 1000
008 Employee8 Salesman Employee4 2019-09-09 14000.00 2000
009 Employee9 Salesman Employee4 2019-10-08 13000.00 2000

Step3 - Execute DELETE query

DELETE FROM employee_details WHERE emp_id NOT IN (SELECT MAX(emp_id)
FROM employee_details GROUP BY emp_id, emp_name, designation, manager_id,
date_of_hire, salary, dept_id);  

By executing above query, we can delete the duplicate rows from employee_details table.

Once the DELETE statement executed, verify whether duplicate rows are deleted or not by using below query –

SELECT * FROM employee_details; 

emp_id emp_name designation manager_id date_of_hire salary dept_id
001 Employee1 Director 2019-07-11 45000.00 1000
002 Employee2 Director 2019-07-11 40000.00 2000
003 Employee3 Manager Employee1 2019-07-11 27000.00 1000
004 Employee4 Manager Employee2 2019-10-08 25000.00 2000
005 Employee5 Analyst Employee3 2019-07-11 20000.00 1000
010 Employee6 Analyst Employee3 2019-10-08 18000.00 1000
011 Employee7 Clerk Employee3 2019-07-11 15000.00 1000
012 Employee8 Salesman Employee4 2019-09-09 14000.00 2000
013 Employee9 Salesman Employee4 2019-10-08 13000.00 2000