本文實例講述了Python3實現(xiàn)并發(fā)檢驗代理池地址的方法。分享給大家供大家參考,具體如下:
```python
# encoding=utf-8
# author: walker
# date: 2016-04-14
# summary: check proxy validity concurrently with coroutines/a thread pool
import os
import traceback
from concurrent import futures

import requests

cur_dir_fullpath = os.path.dirname(os.path.abspath(__file__))

Headers = {
    'Accept': '*/*',
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)',
}


# Check whether a single proxy works.
# Returns the proxy if it is valid; otherwise returns an empty string.
def Check(desturl, proxy, feature):
    # Only http:// target URLs are routed through this proxy
    proxies = {'http': 'http://' + proxy}
    r = None  # declared up front so the finally clause can always see it
    exMsg = None
    try:
        r = requests.get(url=desturl, headers=Headers, proxies=proxies, timeout=3)
    except Exception:
        exMsg = '* ' + traceback.format_exc()
        # print(exMsg)
    finally:
        if r is not None:
            r.close()  # body is already downloaded (stream=False), so r.text still works
    if exMsg:
        return ''
    if r.status_code != 200:
        return ''
    if r.text.find(feature) < 0:  # the target page's feature string must be present
        return ''
    return proxy


# Take a raw proxy collection (set/list) and return the list of valid proxies
def GetValidProxyPool(rawProxyPool, desturl, feature):
    validProxyList = list()  # valid proxies
    futureList = list()
    # The with-block shuts the pool down once all checks have finished
    with futures.ThreadPoolExecutor(8) as pool:
        for proxy in rawProxyPool:
            futureList.append(pool.submit(Check, desturl, proxy, feature))
        print('\n submit done, waiting for responses\n')
        # as_completed yields each future as soon as its check finishes
        for future in futures.as_completed(futureList):
            proxy = future.result()
            print('proxy:' + proxy)
            if proxy:  # valid proxy
                validProxyList.append(proxy)
    print('validProxyList size:' + str(len(validProxyList)))
    return validProxyList


# Obtain the raw proxy pool
def GetRawProxyPool():
    rawProxyPool = set()
    # Obtain the raw proxy pool by some means......
    return rawProxyPool


if __name__ == "__main__":
    rawProxyPool = GetRawProxyPool()
    desturl = 'http://...'  # target URL that must be reached through the proxy
    feature = 'xxx'  # feature string expected in the target page
    validProxyPool = GetValidProxyPool(rawProxyPool, desturl, feature)
```
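Because `Check` talks to the network, the concurrency pattern in the script is hard to try out on its own. The sketch below keeps the same structure: it submits every proxy to a `ThreadPoolExecutor`, collects results with `as_completed`, and keeps the non-empty ones. The difference is that it takes the fetch step as a parameter, so it can run offline. The `fetch` callable, the names `check_one` and `validate_proxies`, and the fake proxies are hypothetical stand-ins and are not part of the original script.

```python
from concurrent import futures
from typing import Callable, Iterable, List, Tuple

# Hypothetical fetch signature: (url, proxy) -> (status_code, body); raises on failure.
Fetch = Callable[[str, str], Tuple[int, str]]


def check_one(fetch: Fetch, desturl: str, proxy: str, feature: str) -> str:
    """Return proxy if the page fetched through it is 200 and contains feature, else ''."""
    try:
        status, body = fetch(desturl, proxy)
    except Exception:
        return ''  # connection errors, timeouts, etc. count as invalid
    if status != 200 or feature not in body:
        return ''
    return proxy


def validate_proxies(raw: Iterable[str], desturl: str, feature: str,
                     fetch: Fetch, workers: int = 8) -> List[str]:
    """Check all proxies concurrently; return the valid ones in completion order."""
    valid: List[str] = []
    with futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futs = [pool.submit(check_one, fetch, desturl, p, feature) for p in raw]
        for fut in futures.as_completed(futs):
            proxy = fut.result()
            if proxy:
                valid.append(proxy)
    return valid


if __name__ == '__main__':
    # Fake fetch for offline testing: pretends only the 10.x proxies work.
    def fake_fetch(url: str, proxy: str) -> Tuple[int, str]:
        if proxy.startswith('10.'):
            return 200, '<html>feature-ok</html>'
        if proxy.startswith('192.'):
            return 503, ''
        raise ConnectionError('proxy unreachable')

    raw = {'10.0.0.1:8080', '10.0.0.2:3128', '192.168.1.1:80', '172.16.0.1:8888'}
    print(sorted(validate_proxies(raw, 'http://example.com/', 'feature-ok', fake_fetch)))
    # ['10.0.0.1:8080', '10.0.0.2:3128']
```

The `sorted` call is only there to make the output stable. `as_completed` yields futures as they finish, so the unsorted order can differ between runs. In the real script, `fetch` would wrap the `requests.get(..., proxies=..., timeout=3)` call from `Check`.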
I hope this article is helpful for your Python programming.