一. 刪除完全重復的記錄
完全重復的數(shù)據,通常是由于沒有設置主鍵/唯一鍵約束導致的。
測試數(shù)據:
復制代碼 代碼如下:www.CuoXIn.com
if OBJECT_ID('duplicate_all') is not null
drop table duplicate_all 
GO 
create table duplicate_all 
( 
c1 int, 
c2 int, 
c3 varchar(100) 
) 
GO 
insert into duplicate_all 
select 1,100,'aaa' union all
select 1,100,'aaa' union all
select 1,100,'aaa' union all
select 1,100,'aaa' union all
select 1,100,'aaa' union all
select 2,200,'bbb' union all
select 3,300,'ccc' union all
select 4,400,'ddd' union all
select 5,500,'eee'
GO
(1) 借助臨時表
利用DISTINCT得到單條記錄,刪除源數(shù)據,然后導回不重復記錄。
如果表不大的話,可以把所有記錄導出一次,然后truncate表后再導回,這樣可以避免delete的日志操作。
復制代碼 代碼如下:www.CuoXIn.com
if OBJECT_ID('tempdb..#tmp') is not null
drop table #tmp 
GO 
select distinct * into #tmp 
from duplicate_all 
where c1 = 1 
GO 
delete duplicate_all where c1 = 1 
GO 
insert into duplicate_all 
select * from #tmp
(2) 使用ROW_NUMBER
復制代碼 代碼如下:www.CuoXIn.com
with tmp 
as
( 
select *,ROW_NUMBER() OVER(PARTITION BY c1,c2,c3 ORDER BY(getdate())) as num 
from duplicate_all 
where c1 = 1 
) 
delete tmp where num > 1
如果多個表有完全重復的行,可以考慮通過UNION將多個表聯(lián)合,插到一個新的同結構的表,SQL Server會幫助去掉表和表之間的重復行。
二. 刪除部分重復的記錄
部分列重復的數(shù)據,通常表上是有主鍵的,可能是程序邏輯造成了多行數(shù)據列值的重復。
測試數(shù)據:
復制代碼 代碼如下:www.CuoXIn.com
if OBJECT_ID('duplicate_col') is not null
drop table duplicate_col 
GO 
create table duplicate_col 
( 
c1 int primary key, 
c2 int, 
c3 varchar(100) 
) 
GO 
insert into duplicate_col 
select 1,100,'aaa' union all
select 2,100,'aaa' union all
select 3,100,'aaa' union all
select 4,100,'aaa' union all
select 5,500,'eee'
GO
(1) 唯一索引
唯一索引有個忽略重復建的選項,在創(chuàng)建主鍵約束/唯一鍵約束時都可以使用這個索引選項。
復制代碼 代碼如下:www.CuoXIn.com
if OBJECT_ID('tmp') is not null
drop table tmp 
GO 
create table tmp 
( 
c1 int, 
c2 int, 
c3 varchar(100), 
constraint UQ_01 unique(c2,c3) with(IGNORE_DUP_KEY = ON) 
) 
GO 
insert into tmp 
select * from duplicate_col 
select * from tmp
(2) 借助主鍵/唯一鍵來刪除
通常會選擇主鍵/唯一鍵的最大/最小值保留,其他行刪除。以下只保留重復記錄中c1最小的行。
復制代碼 代碼如下:www.CuoXIn.com
delete from duplicate_col 
where exists(select 1 from duplicate_col b where duplicate_col.c1 > b.c1 and (duplicate_col.c2 = b.c2 and duplicate_col.c3 = b.c3)) 
--或者 
復制代碼 代碼如下:www.CuoXIn.com
delete from duplicate_col 
where c1 not in (select min(c1) from duplicate_col group by c2,c3)
如果要保留重復記錄中的第N行,可以參考05.取分組中的某幾行。
(3) ROW_NUMBER
和刪除完全重復記錄的寫法基本一樣。
復制代碼 代碼如下:www.CuoXIn.com
with tmp 
as
( 
select *,ROW_NUMBER() OVER(PARTITION BY c2,c3 ORDER BY(getdate())) as num 
from duplicate_col 
) 
delete tmp where num > 1 
select * from duplicate_col
SQL刪除重復數(shù)據只保留一條 (下面的代碼,很多網友反饋錯誤,大家多測試)
用SQL語句,刪除掉重復項只保留一條
在幾千條記錄里,存在著些相同的記錄,如何能用SQL語句,刪除掉重復的呢
1、查找表中多余的重復記錄,重復記錄是根據單個字段(peopleId)來判斷 
select * from people 
where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1) 
2、刪除表中多余的重復記錄,重復記錄是根據單個字段(peopleId)來判斷,只留有rowid最小的記錄 
delete from people 
where   peopleName in (select peopleName    from people group by peopleName      having count(peopleName) > 1) 
and   peopleId not in (select min(peopleId) from people group by peopleName     having count(peopleName)>1) 
3、查找表中多余的重復記錄(多個字段) 
select * from vitae a 
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) 
4、刪除表中多余的重復記錄(多個字段),只留有rowid最小的記錄 
delete from vitae a 
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) 
and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1) 
5、查找表中多余的重復記錄(多個字段),不包含rowid最小的記錄 
select * from vitae a 
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) 
and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)   
6.消除一個字段的左邊的第一位:
update tableName set [Title]=Right([Title],(len([Title])-1)) where Title like '村%'
7.消除一個字段的右邊的第一位:
update tableName set [Title]=left([Title],(len([Title])-1)) where Title like '%村'
8.假刪除表中多余的重復記錄(多個字段),不包含rowid最小的記錄 
update vitae set ispass=-1
where peopleId in (select peopleId from vitae group by peopleId