MSSQL某列数据中去除重复数据,求解,该怎么解决-MSSQL教程-爱易网页

MSSQL某列数据中去除重复数据,求解,该怎么解决

日期：2014-05-18　浏览次数：20906 次

MSSQL某列数据中去除重复数据,求解
例:
字段1 | 字段2
A: | '美国,法国,英国,美国,日本,韩国,日本'
B: | '朝鲜,缅甸,缅甸'

需要得到的数据应该是以下格式

字段1 | 字段2
A: | '美国,法国,英国,日本,韩国'
B: | '朝鲜,缅甸'

求解

------解决方案--------------------
先拆分再合并精华帖子里面有很多。

搜索合并拆分列
------解决方案--------------------
尽量不要这样设计表
你可以这样
字段1 字段2
A 美国
A 法国
A 英国
A 美国
A 日本
A 韩国
A 日本
B 朝鲜
B 缅甸
B 缅甸
这样即便是有重复的,也很容易筛选.
------解决方案--------------------

当SQLServer设计表的时候没有建组合字段唯一约束，以后需要增加这一约束时，却发现表里已经有了很多重复记录了。

当SQLServer设计表的时候没有建组合字段唯一约束，以后需要增加这一约束时，
却发现表里已经有了很多重复记录了。

请看看我用的去掉SQLServer表里组合字段重复的记录方法：

假设原始表名为source_table,字段名1为field_name1,字段名2为field_name2。

（当然稍加修改也可以用到三个及以上组合字段重复的情况）

第一步: 生成组合字段重复的临时表source_dup_simple
select field_name1,field_name2,count(0) as num into source_dup_simple
from source_table
group by field_name1,field_name2 having count(0)>1

第二步: 生成组合字段重复的主表里完整记录的临时表source_dup
select t1.* into source_table_dup
from source_table t1,source_dup_simple t2
where t1.field_name1=t2.field_name1 and t1.field_name2=t2.field_name2

第三步: 删去source_dup里的全部重复记录
delete from source_table
where convert(varchar,field_name1)+convert(varchar,field_name2) in
(select convert(varchar,field_name1)+convert(varchar,field_name2)
from source_dup_simple)

第四步: 生成有序列号的重复组合字段记录表source_table_dup_2
select IDENTITY(int,1,1) as rowid,t1.* into source_table_dup_2
from source_table_dup t1
order by field_name1,field_name2,date_field_name

说明：用自动增长序列号IDENTITY(int,1,1)生成唯一的行号字段rowid
这里是按字段field_name1,field_name2,date_field_name排序，
以方便后面删除最新还是最旧时间的重复记录

当然date_field_name字段可以替换成你想要排序的字段，并可用desc选项

第五步: 删去有序列号重复组合字段记录表source_table_dup_2里面的重复记录
delete from source_table_dup_2
where rowid in (select min(rowid) from source_table_dup_2
group by field_name1,field_name2 having count(*)>1)

（所影响的行数为 5586 行）

delete from source_table_dup_2
where rowid in (select min(rowid) from source_table_dup_2
group by field_name1,field_name2 having count(*)>1)
（所影响的行数为 1108 行）

...... ......

delete from source_table_dup_2
where rowid in (select min(rowid) from source_table_dup_2
group by field_name1,field_name2 having count(*)>1)
（所影响的行数为 0 行）

注意：上面这条删除的SQL要执行一到多次，因为组合字段重复记录可能一条以上，
一直到它（所影响的行数为 0 行）才算彻底删除干净重复记录。

我这里是保留重复记录里时间字段date_field_name最新的记录。

第六步: 把剩下的没有重复的记录插回原始表
insert into source_table(field_name1,field_name2,.....)
select field_name1,field_name2,...... from source_table_dup_2

------解决方案--------------------

SQL code

create table t1(col1 varchar(10),col2 varchar(8000))
insert t1
select 'A' , '美国,法国,英国,美国,日本,韩国,日本' union all
select 'B' , '朝鲜,缅甸,缅甸'
go
;with cte as
(
select col1,
col2=SUBSTRING(case when right(col2,1)=',' then col2 else col2+',' end,1,charindex(',',case when right(col2,1)=',' then col2 else col2+',' end)-1),
col3=stuff(case when right(col2,1)=',' then col2 else col2+',' end,1,charindex(',',case when right(col2,1)=',' then col2 else col2+',' end),'')
 from t1
union all
select col1,col2=SUBSTRING(col3,1,charindex(',',col3)-1),col3=stuff(col3,1,charindex(',',col3),'') from cte
where CHARINDEX(',',col3)>0
)
select distinct col1,col2 from cte
order by col1
/*
col1  col2
--  ----
A    法国
A    韩国
A    美国
A    日本
A    英国
B    朝鲜
B    缅甸
*/
go
drop table t1

免责声明： 本文仅代表作者个人观点，与爱易网无关。其原创性以及文中陈述文字和内容未经本站证实，对本文以及其中全部或者部分内容、文字的真实性、完整性、及时性本站不作任何保证或承诺，请读者仅作参考，并请自行核实相关内容。

MSSQL某列数据中去除重复数据,求解,该怎么解决

相关资料更多>

推荐阅读更多>