Dear All,
There are many ways to delete duplicate rows in a table. The choice depends on many factors like; density of duplicates; whether the whole row has to be identical to be considered duplicate or is it based on one or many attributes, etc.
Technique 1: when the density of duplicates is very high and you do not have any unique identifier in the table:
SELECT DISTINCT * INTO dbo.TableTmp FROM dbo.TableSource; DROP TABLE dbo.TableSource; EXEC sp_rename 'dbo.TableTmp', 'TableSource'; -- Add constraints, indexes, triggers, etc
Technique 2: When there is a unique identifier available, say KeyColumn and attribute like IDCol can be used to identify the duplicates:
DELETE FROM dbo.TableSource WHERE EXISTS (SELECT * FROM dbo.TableSource AS T2 WHERE T2.IDCol = dbo.TableSource.IDCol AND T2.KeyColumn > dbo.TableSource.KeyColumn);
Technique 3: Using CTE & Row_NUMBER function; no dependency on unique identifier or complete row to be duplicate.
WITH TableDup AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY IDCol ORDER BY IDCol) AS RN_col
FROM dbo.TableSource
)
DELETE FROM TableDup WHERE RN_col > 1;
Regards
Rahul Sharma
Like us on FaceBook | Follow us on Twitter | Join the fastest growing SQL Server group on FaceBook
Follow me on Twitter | Follow me on FaceBook
I like it, its very simple and very useful