日期:2014-05-18  浏览次数:20526 次

处理重复数据的问题
表结构如下:Timer
[Id]   [smallint]   NOT   NULL   ,
[Card]   [int]   Not   NULL   ,
[EmployeeId]   [int]   NULL   ,
[BrushTime]   [datetime]   NOT   NULL   ,
现在此表有近一亿多条数据,其中有大量重复数据,现在要清理掉这些重复数据,大家有什么好的解决办法!现在主要是因为数据量太大,处理不过来!

------解决方案--------------------
表结构如下:Timer
[Id] [smallint] NOT NULL ,
[Card] [int] Not NULL ,
[EmployeeId] [int] NULL ,
[BrushTime] [datetime] NOT NULL ,
现在此表有近一亿多条数据,其中有大量重复数据,现在要清理掉这些重复数据,大家有什么好的解决办法!现在主要是因为数据量太大,处理不过来!

以ID为例,BrushTime分大小
delete from timer where BrushTime not in (select min(BrushTime) from timer group by id)
------解决方案--------------------
如果所有列都有重复,指不能区分两行(按任何字段排序).

则需要建立临时表来删除.(不过一亿,恐怖啊)

select px = identity(int,1,1),* into temp from timer
delete from temp where px not in (select min(px) from temp group by id)
------解决方案--------------------
---保留相同Id,Cardid,EmployeeId,BrushTime最大那行
Delete A From Timer As A Where Exists
(Select 1 From Timer Where Id=A.Id And Cardid=A.Cardid And
EmployeeId=A.EmployeeId And BrushTime> A.BrushTime)

---保留相同Id,Cardid,EmployeeId,BrushTime最小那行
Delete A From Timer As A Where Exists
(Select 1 From Timer Where Id=A.Id And Cardid=A.Cardid And
EmployeeId=A.EmployeeId And BrushTime <A.BrushTime)

---只要Id,Cardid,EmployeeId有重复的,就将其全部删除
Delete A From @Timer As A Where Exists
(Select 1 From @Timer Where Id=A.Id And Cardid=A.Cardid And
EmployeeId=A.EmployeeId Group By Id,Cardid,EmployeeId Having Count(1)> 1)
------解决方案--------------------
哇,看來新增時要多注意重復的問題才行。
------解决方案--------------------
数据重复是指数据完全一样,现在的问题是怎么处理大量的重复数据!

1亿数据,天啊。这样:

select distinct * into #temp from Timer -- 但愿你的内存和磁盘空间足够

truncate table Timer

insert Timer select * from #temp

drop table #temp