Avoiding Duplicate Crawling in Python with a Custom Scrapy Middleware

2019-11-25 17:49:23
Contributed by: site user

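This article shows, by example, how to write a custom Scrapy middleware in Python that avoids crawling the same item pages twice. It is shared here for reference. The details are as follows: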

from scrapy import log
from scrapy.http import Request
from scrapy.item import BaseItem
from scrapy.utils.request import request_fingerprint

from myproject.items import MyItem


class IgnoreVisitedItems(object):
    """Middleware to ignore re-visiting item pages if they
    were already visited before.

    The requests to be filtered have a meta['filter_visited']
    flag enabled and optionally define an id to use
    for identifying them, which defaults to the request fingerprint,
    although you'd want to use the item id,
    if you already have it beforehand, to make it more robust.
    """

    FILTER_VISITED = 'filter_visited'
    VISITED_ID = 'visited_id'
    CONTEXT_KEY = 'visited_ids'

    def process_spider_output(self, response, result, spider):
        # Visited ids persist across calls only if the spider defines
        # a `context` dict attribute; otherwise a fresh dict is used.
        context = getattr(spider, 'context', {})
        visited_ids = context.setdefault(self.CONTEXT_KEY, {})
        ret = []
        for x in result:
            visited = False
            if isinstance(x, Request):
                # Only requests carrying the flag are checked.
                if self.FILTER_VISITED in x.meta:
                    visit_id = self._visited_id(x)
                    if visit_id in visited_ids:
                        log.msg("Ignoring already visited: %s" % x.url,
                                level=log.INFO, spider=spider)
                        visited = True
            elif isinstance(x, BaseItem):
                # An item was scraped: record the id of the request
                # that produced this response.
                visit_id = self._visited_id(response.request)
                if visit_id:
                    visited_ids[visit_id] = True
                    x['visit_id'] = visit_id
                    x['visit_status'] = 'new'
            if visited:
                # Replace the filtered request with a placeholder item.
                ret.append(MyItem(visit_id=visit_id, visit_status='old'))
            else:
                ret.append(x)
        return ret

    def _visited_id(self, request):
        # Prefer an explicit id from meta; fall back to the fingerprint.
        return request.meta.get(self.VISITED_ID) or request_fingerprint(request)
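
The filtering rule used by the middleware above can be tried out without a Scrapy installation. The following is only a minimal sketch under stated assumptions: `FakeRequest`, `filter_requests`, and `mark_visited` are hypothetical names introduced for illustration, and a SHA-1 hash of the URL stands in for Scrapy's request fingerprint. It is not the middleware itself, just the same decision logic in isolation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.http.Request (url and meta only)."""
    url: str
    meta: dict = field(default_factory=dict)


def visited_id(request):
    """Return meta['visited_id'] if present, else a hash of the URL.

    The URL hash is an assumption standing in for Scrapy's
    request fingerprint.
    """
    explicit = request.meta.get("visited_id")
    return explicit or hashlib.sha1(request.url.encode("utf-8")).hexdigest()


def mark_visited(request, visited_ids):
    """Record a request's id once its item has been scraped."""
    visited_ids[visited_id(request)] = True


def filter_requests(requests, visited_ids):
    """Split requests into (kept, skipped).

    Only requests carrying the 'filter_visited' flag are checked,
    mirroring the presence test in the middleware.
    """
    kept, skipped = [], []
    for req in requests:
        if "filter_visited" in req.meta and visited_id(req) in visited_ids:
            skipped.append(req)
        else:
            kept.append(req)
    return kept, skipped


visited = {}
first = FakeRequest("http://example.com/item/1",
                    {"filter_visited": True, "visited_id": "item-1"})
mark_visited(first, visited)

candidates = [
    # Same item id under a different URL: filtered thanks to the explicit id.
    FakeRequest("http://example.com/item/1?ref=list",
                {"filter_visited": True, "visited_id": "item-1"}),
    # Flagged but never seen before: kept.
    FakeRequest("http://example.com/item/2", {"filter_visited": True}),
    # No flag: never filtered, even though the id was seen.
    FakeRequest("http://example.com/item/1", {"visited_id": "item-1"}),
]
kept, skipped = filter_requests(candidates, visited)
print([r.url for r in kept])
print([r.url for r in skipped])
```

The first candidate illustrates why the docstring recommends an explicit item id: a URL-based fallback would treat the two URLs of the same item as different pages. In a real project the middleware class would also need to be listed under the `SPIDER_MIDDLEWARES` setting, and the spider would set `meta['filter_visited']` on the requests it wants checked.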

Hopefully this article is helpful for your Python programming.
