非常棘手的重复记录有关问题，大大帮忙解决-MSSQL教程-爱易网页

非常棘手的重复记录有关问题，大大帮忙解决

日期：2014-05-19　浏览次数：20511 次

非常棘手的重复记录问题，大大帮忙解决啊
现有数据库村有500万条以上的记录，由于当时设计人员的问题，没有给表建立主键索引，现在里面有大量的重复记录，存取速度也异常缓慢，数据库的结构如下：
有 barcode、temprature、data_1、data_2、data_3……字段，其中Barcode和temprature确定一条记录，即需要将barcode和temprature设置成联合主键。
可是我用distanct无法过滤掉重复记录，有没有什么方法能过滤掉重复记录？

------解决方案--------------------
1、如果barcode和temprature相同的组合中，data_1值各不同：

delete
t
from
表 t
where
exists(select
1
from
表
where
barcode=t.barcode and temprature=t.temprature and data_1> t.data_1)
------解决方案--------------------
select barcode,tempratur,max(data_1) as data_1...
from 表
group by barcode,tempratur
having count(*)> 1
将上面数据导出,删除原表再导入
------解决方案--------------------
简单粗暴的处理方式：

select identity(int,1,1) as nid,* into test from 表
go

delete t
from
test t
where
exists(select 1 from test where barcode=t.barcode and temprature=t.temprature and nid <t.nid)
go

alter table test drop column nid
go

truncate table 表
go

insert into 表 select * from test
go

drop table test
go
------解决方案--------------------
--将所有数据自动编号到临时表 test
select identity(int,1,1) as nid,* into test from 表
go

--从test中删除重复记录
delete t
from
test t
where
exists(select 1 from test where barcode=t.barcode and temprature=t.temprature and nid <t.nid)
go

--删除test表中的nid字段
alter table test drop column nid
go

--清空原表
truncate table 表
go

--将过滤掉重复记录的数据insert会原表
insert into 表 select * from test
go

--删除临时表test
drop table test
go
------解决方案--------------------
在几千条记录里,存在着些相同的记录,如何能用SQL语句,删除掉重复的呢?谢谢!
1、查找表中多余的重复记录，重复记录是根据单个字段（peopleId）来判断
select * from people
where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)

2、删除表中多余的重复记录，重复记录是根据单个字段（peopleId）来判断，只留有rowid最小的记录
delete from people
where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)
and rowid not in (select min(rowid) from people group by peopleId having count(peopleId )> 1)

3、查找表中多余的重复记录（多个字段）
select * from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)

4、删除表中多余的重复记录（多个字段），只留有rowid最小的记录
delete from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)> 1)

5、查找表中多余的重复记录（多个字段），不包含rowid最小的记录
select * from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)> 1)

免责声明： 本文仅代表作者个人观点，与爱易网无关。其原创性以及文中陈述文字和内容未经本站证实，对本文以及其中全部或者部分内容、文字的真实性、完整性、及时性本站不作任何保证或承诺，请读者仅作参考，并请自行核实相关内容。

非常棘手的重复记录有关问题，大大帮忙解决

相关资料更多>

推荐阅读更多>