非常棘手的重复记录问题,大大帮忙解决啊
现有数据库村有500万条以上的记录,由于当时设计人员的问题,没有给表建立主键索引,现在里面有大量的重复记录,存取速度也异常缓慢,数据库的结构如下:
有 barcode、temprature、data_1、data_2、data_3……字段,其中Barcode和temprature确定一条记录,即需要将barcode和temprature设置成联合主键。
可是我用distanct无法过滤掉重复记录,有没有什么方法能过滤掉重复记录?
------解决方案--------------------1、如果barcode和temprature相同的组合中,data_1值各不同:
delete
t
from
表 t
where
exists(select
1
from
表
where
barcode=t.barcode and temprature=t.temprature and data_1> t.data_1)
------解决方案--------------------select barcode,tempratur,max(data_1) as data_1...
from 表
group by barcode,tempratur
having count(*)> 1
将上面数据导出,删除原表再导入
------解决方案--------------------简单粗暴的处理方式:
select identity(int,1,1) as nid,* into test from 表
go
delete t
from
test t
where
exists(select 1 from test where barcode=t.barcode and temprature=t.temprature and nid <t.nid)
go
alter table test drop column nid
go
truncate table 表
go
insert into 表 select * from test
go
drop table test
go
------解决方案----------------------将所有数据自动编号到临时表 test
select identity(int,1,1) as nid,* into test from 表
go
--从test中删除重复记录
delete t
from
test t
where
exists(select 1 from test where barcode=t.barcode and temprature=t.temprature and nid <t.nid)
go
--删除test表中的nid字段
alter table test drop column nid
go
--清空原表
truncate table 表
go
--将过滤掉重复记录的数据insert会原表
insert into 表 select * from test
go
--删除临时表test
drop table test
go
------解决方案--------------------在几千条记录里,存在着些相同的记录,如何能用SQL语句,删除掉重复的呢?谢谢!
1、查找表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断
select * from people
where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)
2、删除表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断,只留有rowid最小的记录
delete from people
where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)
and rowid not in (select min(rowid) from people group by peopleId having count(peopleId )> 1)
3、查找表中多余的重复记录(多个字段)
select * from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
4、删除表中多余的重复记录(多个字段),只留有rowid最小的记录
delete from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)> 1)
5、查找表中多余的重复记录(多个字段),不包含rowid最小的记录
select * from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)> 1)