

A Simple Spider Crawler Built on Scrapy

2019-11-25 17:42:26
Source: reposted
Contributed by: a site user

This article presents a simple spider (crawler) built on Scrapy, shared here for reference. The code is as follows:

# Standard Python library imports

# 3rd party imports
# Note: these import paths come from older Scrapy releases; newer versions
# expose scrapy.spiders, scrapy.linkextractors.LinkExtractor and scrapy.Selector.
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector

# My imports
from poetry_analysis.items import PoetryAnalysisItem

# Regex suffix matching any .html file name below a start URL
HTML_FILE_NAME = r'.+\.html'


class PoetryParser(object):
    """
    Provides common parsing method for poems formatted this one specific way.
    """
    # Matches dates such as "03 March 2011"
    date_pattern = r'(\d{2} \w{3,9} \d{4})'

    def parse_poem(self, response):
        hxs = HtmlXPathSelector(response)
        item = PoetryAnalysisItem()
        # All poetry text is in pre tags
        text = hxs.select('//pre/text()').extract()
        item['text'] = ''.join(text)
        item['url'] = response.url
        # head/title contains title - a poem by author
        title_text = hxs.select('//head/title/text()').extract()[0]
        item['title'], item['author'] = title_text.split(' - ')
        item['author'] = item['author'].replace('a poem by', '')
        for key in ['title', 'author']:
            item[key] = item[key].strip()
        # date_pattern is a class attribute, so it is reached through self
        item['date'] = hxs.select("//p[@class='small']/text()").re(self.date_pattern)
        return item


class PoetrySpider(CrawlSpider, PoetryParser):
    name = 'example.com_poetry'
    allowed_domains = ['www.example.com']
    root_path = 'someuser/poetry/'
    start_urls = ['http://www.example.com/someuser/poetry/recent/',
                  'http://www.example.com/someuser/poetry/less_recent/']
    # Follow every .html link found under either start URL and parse it as a poem
    rules = [Rule(SgmlLinkExtractor(allow=[start_urls[0] + HTML_FILE_NAME]),
                  callback='parse_poem'),
             Rule(SgmlLinkExtractor(allow=[start_urls[1] + HTML_FILE_NAME]),
                  callback='parse_poem')]
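The string handling inside parse_poem and the link pattern used in rules can be tried without Scrapy installed. The following is a minimal sketch using only the standard library. The helper names split_title, find_dates and is_poem_link are hypothetical and not part of the original spider, and the sample title and date strings are invented for illustration. The patterns assume the regular expressions were meant to use backslash escapes (\d, \w, \.). One deliberate deviation: the sketch splits on the last ' - ' so that a title containing its own ' - ' does not raise an error.

```python
import re

# Assumed patterns: the spider's regexes with backslash escapes.
HTML_FILE_NAME = r'.+\.html'
DATE_PATTERN = r'(\d{2} \w{3,9} \d{4})'


def split_title(title_text):
    """Split '<title> - a poem by <author>' into (title, author).

    Hypothetical helper. Uses rsplit with maxsplit=1 so only the last
    ' - ' separates title from author (the spider uses a plain split).
    """
    title, author = title_text.rsplit(' - ', 1)
    author = author.replace('a poem by', '')
    return title.strip(), author.strip()


def find_dates(text):
    """Return every 'DD Month YYYY' style date found in text."""
    return re.findall(DATE_PATTERN, text)


def is_poem_link(url, start_url):
    """Check whether url looks like an .html page below start_url.

    This approximates a link extractor's allow pattern with a plain
    regex search; start_url is used unescaped, as in the spider.
    """
    return re.search(start_url + HTML_FILE_NAME, url) is not None
```

For example, split_title('Autumn Rain - a poem by Jane Doe') yields the pair ('Autumn Rain', 'Jane Doe'), and is_poem_link rejects the bare listing URL because the pattern requires at least one character before '.html'.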

Hopefully this example is useful for your own Python programming.
