Avoiding duplicate crawling with a custom Scrapy middleware in Python

2020-02-23 00:34:18
Source: reposted
Contributed by: a site user

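This article gives an example of a custom Scrapy spider middleware, written in Python, that keeps a spider from re-crawling item pages it has already visited. It is shared here for reference. The code is as follows: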

from scrapy import Item, Request
# scrapy.utils.request.fingerprint exists in Scrapy 2.7+; older releases provide request_fingerprint instead.
from scrapy.utils.request import fingerprint

from myproject.items import MyItem  # project item class; it must declare visit_id and visit_status fields


class IgnoreVisitedItems:
    """Spider middleware that avoids re-visiting item pages already visited.

    Requests to be filtered carry a meta['filter_visited'] flag and may
    optionally set meta['visited_id'] to identify them. The id defaults to the
    request fingerprint, but if you already know the item id beforehand,
    using it instead makes the filter more robust.
    """

    FILTER_VISITED = 'filter_visited'
    VISITED_ID = 'visited_id'
    CONTEXT_KEY = 'visited_ids'

    def process_spider_output(self, response, result, spider):
        # Store visited ids on the spider itself so they persist across responses.
        if not hasattr(spider, 'context'):
            spider.context = {}
        visited_ids = spider.context.setdefault(self.CONTEXT_KEY, {})
        for x in result:
            if isinstance(x, Request) and self.FILTER_VISITED in x.meta:
                visit_id = self._visited_id(x)
                if visit_id in visited_ids:
                    spider.logger.info("Ignoring already visited: %s", x.url)
                    # Drop the duplicate request and emit a marker item in its place.
                    yield MyItem(visit_id=visit_id, visit_status='old')
                    continue
            elif isinstance(x, Item):
                # The page that produced this item is now recorded as visited.
                visit_id = self._visited_id(response.request)
                visited_ids[visit_id] = True
                x['visit_id'] = visit_id
                x['visit_status'] = 'new'
            yield x

    def _visited_id(self, request):
        # Explicit id from meta if present, otherwise the hex request fingerprint.
        return request.meta.get(self.VISITED_ID) or fingerprint(request).hex()
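Trying the middleware requires a full Scrapy project. As a rough, Scrapy-free sketch of the same bookkeeping, the snippet below replays its logic with hypothetical stand-in classes (`FakeRequest`, `FakeItem`) and a hypothetical `filter_output` function. Hashing the URL with SHA-1 is an assumption made here in place of Scrapy's request fingerprint, and a set replaces the dict of visited ids. It shows a tagged request whose `visited_id` was already recorded being replaced by an `'old'` marker item, while untagged requests pass through unchanged.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request; only url and meta are modelled."""
    url: str
    meta: dict = field(default_factory=dict)


class FakeItem(dict):
    """Hypothetical stand-in for a scrapy.Item with visit_id/visit_status fields."""


def visited_id(request):
    # Prefer an explicit id from meta; otherwise fall back to a hash of the URL.
    # Simplification: Scrapy's real fingerprint also covers the method and body.
    return request.meta.get("visited_id") or hashlib.sha1(request.url.encode()).hexdigest()


def filter_output(source_request, result, visited_ids):
    """Mirror of IgnoreVisitedItems.process_spider_output, without Scrapy."""
    for x in result:
        if isinstance(x, FakeRequest) and "filter_visited" in x.meta:
            vid = visited_id(x)
            if vid in visited_ids:
                # Already visited: emit a marker item instead of the request.
                yield FakeItem(visit_id=vid, visit_status="old")
                continue
        elif isinstance(x, FakeItem):
            # The page that produced this item counts as visited from now on.
            vid = visited_id(source_request)
            visited_ids.add(vid)
            x["visit_id"] = vid
            x["visit_status"] = "new"
        yield x


visited = set()
item_page = FakeRequest("https://example.com/item/1",
                        {"filter_visited": True, "visited_id": "item-1"})

# 1) Crawling the item page yields an item, which marks "item-1" as visited.
first = list(filter_output(item_page, [FakeItem(title="A")], visited))

# 2) A later listing page links to the same item again, plus an untagged link.
listing = FakeRequest("https://example.com/list")
again = FakeRequest("https://example.com/item/1?ref=list",
                    {"filter_visited": True, "visited_id": "item-1"})
other = FakeRequest("https://example.com/about")
second = list(filter_output(listing, [again, other], visited))

print(first)   # [{'title': 'A', 'visit_id': 'item-1', 'visit_status': 'new'}]
print(second)  # [{'visit_id': 'item-1', 'visit_status': 'old'}, FakeRequest(url='https://example.com/about', meta={})]
```

Note that the second link to the item page uses a different URL, yet it is still filtered because both requests carry the same explicit `visited_id`; this is why the docstring recommends using the item id when it is known. To use the real middleware, register it in `settings.py` under `SPIDER_MIDDLEWARES`, for example `{"myproject.middlewares.IgnoreVisitedItems": 543}` (the module path and the order value 543 are placeholders), and pass `meta={"filter_visited": True, "visited_id": ...}` on the requests that should be filtered.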

We hope this article is helpful for your Python programming.