Monday, July 2, 2012

Best Way to Delete multiple non unique records

Yesterday when standing with my friends we started discussing best interview questions and one of the best questions is “how to delete duplicate records if there is no unique key in table”.

So today I will tell you best way to achieve this, I heard many answers like using temp tables with while loop or cursors, sub tables etc etc. But best is CTE (Common table expression). I think CTE is most power full method wrt recussion or looping.

Coming back to the point here are two approaches:

USING CURSORS

DECLARE @Count int

DECLARE @FirstName nvarchar(50)

DECLARE @LastName nvarchar(50)

 

DECLARE dublicate_cursor CURSOR FAST_FORWARD FOR

SELECT FirstName, LastName, Count(*) - 1

FROM Users

GROUP BY FirstName, LastName

HAVING Count(*) > 1

 

OPEN dublicate_cursor

 

FETCH NEXT FROM dublicate_cursor INTO @FirstName, @LastName, @Count

 

WHILE @@FETCH_STATUS = 0

BEGIN

 

DELETE TOP(@Count) FROM Users WHERE FirstName = @FirstName AND LastName = @LastName

 

FETCH NEXT FROM dublicate_cursor INTO @FirstName, @LastName, @Count

END

 

CLOSE dublicate_cursor

DEALLOCATE dublicate_cursor

 

NOW with CTEs

 

WITH CTE (FirstName,LastName,DuplicateCount)

AS

(

SELECT FirstName,LastName,

ROW_NUMBER() OVER(PARTITION BY FirstName,LastName  ORDER BY FirstName) AS DuplicateCount

FROM Users

)

DELETE

FROM CTE

WHERE DuplicateCount > 1