Duplicate records come up often, and this post shows one way to clean them out of a MySQL table. Today I created an airport table with these main fields:
- id: primary key
- name: airport name in English
- c_name: airport name in Chinese
- code: three-letter airport code
- city_name: name of the city the airport is in
Since every airport's three-letter code is unique, I started by adding a unique index on code:
```sql
ALTER TABLE `airport` ADD UNIQUE(`code`);
```
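To see why this index gets in the way of a crawler that produces duplicates, here is a minimal sketch. It assumes Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL; the column layout follows the field list above, while the sample rows and the helper insert_airport are hypothetical. With the unique index in place, a second row with the same code is rejected (MySQL reports this as error 1062, "Duplicate entry").

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL airport table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE airport ("
    "id INTEGER PRIMARY KEY, name TEXT, c_name TEXT, code TEXT, city_name TEXT)"
)
# Same effect as ALTER TABLE airport ADD UNIQUE(code) in MySQL
conn.execute("CREATE UNIQUE INDEX code ON airport (code)")


def insert_airport(conn, name, c_name, code, city_name):
    """Hypothetical helper: insert one scraped row, return False if it is rejected."""
    try:
        conn.execute(
            "INSERT INTO airport (name, c_name, code, city_name) VALUES (?, ?, ?, ?)",
            (name, c_name, code, city_name),
        )
        return True
    except sqlite3.IntegrityError:
        # The unique index refuses a second row with an existing code
        return False


# Hypothetical scraped data: the same airport appears twice
first = insert_airport(conn, "Beijing Capital", "北京首都国际机场", "PEK", "北京")
second = insert_airport(conn, "Beijing Capital", "北京首都国际机场", "PEK", "北京")
print(first, second)  # True False
rows = conn.execute("SELECT COUNT(*) FROM airport").fetchone()[0]
```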
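While writing the crawler, however, I found that the scraped data contained duplicates, so I removed the unique index for the time being. Because the index was created without a name, MySQL named it after its column, code: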
```sql
SHOW INDEX FROM airport;    -- inspect the table's indexes
DROP INDEX code ON airport;
```
The crawler code is as follows (it uses the phpQuery library):
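```php
<?php
require('phpQuery.php'); // phpQuery: jQuery-style DOM queries for PHP
// Load the source page (URL withheld by the author)
phpQuery::newDocumentFile("http://******************(omitted for certain reasons)**********");
// Combined text of every <tr> inside <tbody>
$res = pq('tbody')->find('tr')->text();
```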
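I then inserted the rows into the table according to my own business logic, which went quickly. After that I still had to deal with the duplicate three-letter codes. My approach was to delete, for every duplicated code, all records except the one with the largest id. My first attempt was: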
```sql
DELETE FROM airport WHERE
id IN (SELECT id FROM airport GROUP BY code HAVING COUNT(code) > 1)
AND id NOT IN (SELECT max(id) FROM airport GROUP BY code HAVING COUNT(code) > 1);
```
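Separately from the MySQL error, the first subquery in this attempt has a subtler problem: `SELECT id FROM airport GROUP BY code HAVING COUNT(code) > 1` selects id without aggregating it, so it yields one arbitrary id per duplicated code rather than every id in the group, and MySQL 5.7 and later reject it outright when the default ONLY_FULL_GROUP_BY mode is on. The sketch below, again assuming in-memory SQLite as a stand-in and hypothetical rows trimmed to the two columns that matter, compares that subquery with one that lists exactly the rows to delete: every row that is not the highest id for its code.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airport (id INTEGER PRIMARY KEY, code TEXT)")
# PEK appears three times (ids 1, 3, 4), PVG once (id 2)
conn.executemany(
    "INSERT INTO airport (id, code) VALUES (?, ?)",
    [(1, "PEK"), (2, "PVG"), (3, "PEK"), (4, "PEK")],
)

# The original subquery: a bare, non-aggregated id gives one row per duplicated code
one_per_group = conn.execute(
    "SELECT id FROM airport GROUP BY code HAVING COUNT(code) > 1"
).fetchall()

# Rows that should really go: everything except the highest id of each code
to_delete = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM airport "
        "WHERE id NOT IN (SELECT MAX(id) FROM airport GROUP BY code) "
        "ORDER BY id"
    )
]

print(len(one_per_group), to_delete)  # 1 [1, 3]
```

Since the first query returns only one id per group, a delete built on it removes at most one extra copy of each code per run.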
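Running this statement, however, fails: MySQL does not allow a DELETE to read the same table in a subquery (error 1093, "You can't specify target table 'airport' for update in FROM clause"). The workaround is to go through a temporary table: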
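```sql
-- Collect the ids to delete: rows with a duplicated code, except the highest id per code.
-- Match on code: selecting id with GROUP BY code would yield only one id per group.
CREATE TEMPORARY TABLE tmp SELECT id FROM airport WHERE
code IN (SELECT code FROM airport GROUP BY code HAVING COUNT(code) > 1)
AND id NOT IN (SELECT max(id) FROM airport GROUP BY code HAVING COUNT(code) > 1);
-- Allowed by MySQL: the subquery reads tmp, not airport
DELETE FROM airport WHERE id IN (SELECT id FROM tmp);
```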
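Here is a minimal sketch of the temporary-table approach, again assuming in-memory SQLite as a stand-in so it runs without a server, with hypothetical rows and helper names. It collects the candidate ids with `NOT IN (SELECT MAX(id) ... GROUP BY code)`, which catches every extra copy in a single pass. The function dedup_via_derived_table shows a commonly used one-statement alternative for MySQL: wrapping the subquery in a derived table, which MySQL must materialize because it contains GROUP BY, so error 1093 does not apply. SQLite accepts the same SQL, although it never raised that error in the first place.

```python
import sqlite3


def make_table():
    """Build an in-memory airport table (SQLite stand-in) with hypothetical duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE airport (id INTEGER PRIMARY KEY, code TEXT)")
    conn.executemany(
        "INSERT INTO airport (id, code) VALUES (?, ?)",
        [(1, "PEK"), (2, "PVG"), (3, "PEK"), (4, "PEK"), (5, "PVG")],
    )
    return conn


def dedup_via_temp_table(conn):
    """Keep the highest id per code, collecting the other ids in a temp table first."""
    conn.execute(
        "CREATE TEMPORARY TABLE tmp AS SELECT id FROM airport "
        "WHERE id NOT IN (SELECT MAX(id) FROM airport GROUP BY code)"
    )
    conn.execute("DELETE FROM airport WHERE id IN (SELECT id FROM tmp)")
    conn.execute("DROP TABLE tmp")


def dedup_via_derived_table(conn):
    """One statement; in MySQL the derived table keep_ids is materialized first."""
    conn.execute(
        "DELETE FROM airport WHERE id NOT IN ("
        "SELECT id FROM (SELECT MAX(id) AS id FROM airport GROUP BY code) AS keep_ids)"
    )


def remaining(conn):
    """Ids left in the table, in ascending order."""
    return [row[0] for row in conn.execute("SELECT id FROM airport ORDER BY id")]


for dedup in (dedup_via_temp_table, dedup_via_derived_table):
    conn = make_table()
    dedup(conn)
    print(dedup.__name__, remaining(conn))  # [4, 5] in both cases
```

The temporary table keeps the author's two-step structure and is easy to inspect before deleting; the derived-table form trades that visibility for a single statement.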
OK, that's done. Next, check whether any duplicates remain:
```sql
SELECT code, COUNT(code) FROM airport GROUP BY code HAVING COUNT(code) > 1;
```
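If any rows come back, run the deletion again. Once the three-letter codes are no longer duplicated, add the unique index on code back with the same ALTER TABLE statement used at the start.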
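The check-then-restore step can be sketched the same way, assuming in-memory SQLite as a stand-in, hypothetical rows, and a hypothetical helper find_duplicates. It lists each duplicated code with its count and ids, deletes the extra copies, confirms the list is empty, and only then adds the unique index back. GROUP_CONCAT exists in both MySQL and SQLite, though the order of the concatenated ids is not guaranteed here.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airport (id INTEGER PRIMARY KEY, code TEXT)")
conn.executemany(
    "INSERT INTO airport (id, code) VALUES (?, ?)",
    [(1, "PEK"), (2, "PVG"), (3, "PEK")],
)


def find_duplicates(conn):
    """Return (code, count, ids) for every code that appears more than once."""
    return conn.execute(
        "SELECT code, COUNT(*), GROUP_CONCAT(id) FROM airport "
        "GROUP BY code HAVING COUNT(*) > 1"
    ).fetchall()


before = find_duplicates(conn)

# SQLite accepts this directly; MySQL needs the temporary-table route (error 1093)
conn.execute(
    "DELETE FROM airport WHERE id NOT IN (SELECT MAX(id) FROM airport GROUP BY code)"
)

after = find_duplicates(conn)
if not after:
    # Codes are unique now, so the unique index can be added back
    conn.execute("CREATE UNIQUE INDEX code ON airport (code)")
```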