日期:2014-05-16  浏览次数:20633 次

从数据库中删除重复记录 SQL(MSSQL)

假定一张表Person, 主键为Id (Identity), 还有intPersonId, Name, Sex, Address 等等字段。分为一下两种情况:

1、删除单一字段上的重复:

SELECT  *
FROM    Person
WHERE   intPersonId IN ( SELECT intPersonId
                         FROM   Person
                         GROUP BY intPersonId
                         HAVING COUNT(*) > 1 ) 

 

2、多字段上的重复。我们假定两个人如果名字和住址一样,那这个人就是重复的。选出重复的人,只保留一个,代码如下:

SELECT *
FROM Person
WHERE intPersonId IN (	SELECT	intPersonId
						FROM	Person A
						WHERE	EXISTS(	SELECT * FROM Person B
										WHERE A.strName = B.strNAME
										  AND A.strAddress = B.straddress
										GROUP BY B.strName, B.strAddress
										HAVING COUNT (*)>1))
  AND intPersonId NOT IN (	SELECT	MIN(intPersonId)
							FROM	Person A
							WHERE	EXISTS(	SELECT * FROM Person B
											WHERE A.strName = B.strNAME
											  AND A.strAddress = B.straddress
											GROUP BY B.strName, B.strAddress
											HAVING COUNT (*)>1))
							GROUP BY A.strName, A.strAddress
						 )


如果intPersonId 是一个主键,没有重复的,效率更高的解决方案是:

SELECT * 
FROM Person P
WHERE EXISTS (	SELECT * FROM Person WHERE intPersonId < P.intPersonId AND strName = P.strNAME AND strAddress = P.straddress)


这样就成功解决了MSSQL不能使用 where aaa,bbb IN(SELECT AAA, BBB FROM XXXX)的问题了