Python requests: Timeouts and Retries Explained

2020-01-04 13:46:42

Contributed by: a site user

Network requests inevitably run into timeouts. With requests, if you do not set a timeout, your program may hang indefinitely.

Timeouts come in two kinds: connect timeouts and read timeouts.

Connect timeout

The connect timeout is the number of seconds Requests waits for your client to establish a connection to a port on the remote machine (this corresponds to the connect() call).

import time
import requests

url = 'http://www.google.com.hk'

print(time.strftime('%Y-%m-%d %H:%M:%S'))
try:
    # A single number is used for both the connect and the read timeout.
    html = requests.get(url, timeout=5).text
    print('success')
except requests.exceptions.RequestException as e:
    print(e)
print(time.strftime('%Y-%m-%d %H:%M:%S'))

Because Google is blocked on the network this was run from, the connection cannot be established, and the error message reports a connect timeout.

2018-12-14 14:38:20
HTTPConnectionPool(host='www.google.com.hk', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x00000000047F80F0>, 'Connection to www.google.com.hk timed out. (connect timeout=5)'))
2018-12-14 14:38:25

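Even with no timeout set, a connection attempt does not wait forever. requests itself has no default, but the operating system's TCP stack eventually gives up (about 21 seconds in my test; the exact value depends on the OS).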

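Read timeout

The read timeout is the number of seconds the client waits for the server to send a response. (Specifically, it is the time the client will wait between bytes sent from the server. In 99.9% of cases this is the time before the server sends the first byte.)

Put simply, the connect timeout is the maximum time between starting the connection attempt and the connection being established, and the read timeout is the maximum time to wait, once connected, for the server to return its response.

The read timeout has no default value. If you do not set it, the program waits indefinitely. This is why our crawlers so often hang without producing any error message.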
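One way to guard against a forgotten timeout is to have the session supply one whenever the caller leaves it out. The following is a minimal sketch, not part of requests itself: the class name `TimeoutSession` and its `default_timeout` attribute are made up for illustration, and the `(3.05, 27)` default is simply the pair used elsewhere in this article.

```python
import requests


class TimeoutSession(requests.Session):
    """Hypothetical helper: a Session that fills in a timeout when none is given."""

    def __init__(self, timeout=(3.05, 27)):
        super().__init__()
        # (connect timeout, read timeout) in seconds; an assumed default.
        self.default_timeout = timeout

    def request(self, method, url, **kwargs):
        # Only applies when the caller did not pass `timeout` at all.
        # An explicit timeout=None still means "wait forever".
        kwargs.setdefault("timeout", self.default_timeout)
        return super().request(method, url, **kwargs)
```

Calls such as `TimeoutSession().get(url)` then go through `request()` and pick up the default, while `get(url, timeout=1)` keeps the caller's value.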

If you specify a single value for timeout, like this:

r = requests.get('https://github.com', timeout=5)

that value is used for both the connect and the read timeout. To set them separately, pass a tuple:

r = requests.get('https://github.com', timeout=(3.05, 27))

Level 4 of the heibanke crawler challenge happens to impose an artificial 15-second response delay, which makes it an ideal demonstration.

import time
import requests

url_login = 'http://www.heibanke.com/accounts/login/?next=/lesson/crawler_ex03/'

# Log in first; the session keeps the cookies for the later request.
session = requests.Session()
session.get(url_login)
token = session.cookies['csrftoken']
session.post(url_login, data={'csrfmiddlewaretoken': token, 'username': 'guliang21', 'password': '123qwe'})

print(time.strftime('%Y-%m-%d %H:%M:%S'))
url_pw = 'http://www.heibanke.com/lesson/crawler_ex03/pw_list/'
try:
    # 5 s to connect, 10 s to read; the page takes about 15 s to respond.
    html = session.get(url_pw, timeout=(5, 10)).text
    print('success')
except requests.exceptions.RequestException as e:
    print(e)
print(time.strftime('%Y-%m-%d %H:%M:%S'))

This time the error message reports a read timeout.

2018-12-14 15:20:47
HTTPConnectionPool(host='www.heibanke.com', port=80): Read timed out. (read timeout=10)
2018-12-14 15:20:57
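The two failures shown so far also surface as different exception classes, `requests.exceptions.ConnectTimeout` and `requests.exceptions.ReadTimeout`, so code can react to each one separately instead of printing the message. A small sketch follows; the function name `classify_request` and its return strings are illustrative only.

```python
import requests


def classify_request(url, timeout=(3.05, 27)):
    """Fetch `url` and report which phase failed, if any (illustrative helper)."""
    try:
        requests.get(url, timeout=timeout)
        return "success"
    except requests.exceptions.ConnectTimeout:
        # The connection was not established within timeout[0] seconds.
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        # Connected, but the server sent nothing for timeout[1] seconds.
        return "read timeout"
    except requests.exceptions.RequestException as exc:
        # Anything else: DNS failure, refused connection, invalid URL, ...
        return "other error: " + type(exc).__name__
```

Both timeout classes derive from `requests.exceptions.Timeout`, so catching that single class is enough when the distinction does not matter.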

Retrying on timeout

Usually we do not give up as soon as a request times out. Instead we retry, up to three attempts in total.

import requests

def gethtml(url):
    # Try up to three times; return None if every attempt fails.
    i = 0
    while i < 3:
        try:
            html = requests.get(url, timeout=5).text
            return html
        except requests.exceptions.RequestException:
            i += 1
    return None
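The loop above retries immediately and discards the error. A variant that pauses between attempts and re-raises the last exception might look like the sketch below. The names `get_with_retries`, `attempts`, `backoff`, and `session` are assumptions made for this example, not anything requests provides.

```python
import time

import requests


def get_with_retries(url, attempts=3, timeout=5, backoff=1.0, session=None):
    """Return the response body, retrying on any requests exception.

    `attempts`, `backoff`, and `session` are illustrative parameters.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    http = session if session is not None else requests
    last_error = None
    for attempt in range(attempts):
        try:
            return http.get(url, timeout=timeout).text
        except requests.exceptions.RequestException as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait backoff * 1, 2, 4, ... seconds before the next try.
                time.sleep(backoff * 2 ** attempt)
    # Every attempt failed: surface the last error instead of returning None.
    raise last_error
```

Raising at the end lets the caller tell "the page was empty" apart from "the request never succeeded", which a None return value hides.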

In fact, requests already packages this up for us. (Though the code does seem to have grown...)

import time
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
# Retry failed connections up to 3 times for both URL schemes.
s.mount('http://', HTTPAdapter(max_retries=3))
s.mount('https://', HTTPAdapter(max_retries=3))

print(time.strftime('%Y-%m-%d %H:%M:%S'))
try:
    r = s.get('http://www.google.com.hk', timeout=5)
    html = r.text
    print('success')
except requests.exceptions.RequestException as e:
    print(e)
print(time.strftime('%Y-%m-%d %H:%M:%S'))

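max_retries is the maximum number of retries. Three retries plus the initial request make four attempts in total, so the code above takes 20 seconds rather than 15. Note that, according to the requests documentation, max_retries applies only to failed DNS lookups, socket connections, and connection timeouts; it never retries a request whose data has already reached the server.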

2018-12-14 15:34:03
HTTPConnectionPool(host='www.google.com.hk', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000000013269630>, 'Connection to www.google.com.hk timed out. (connect timeout=5)'))
2018-12-14 15:34:23
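`max_retries` also accepts a urllib3 `Retry` object in place of an integer, which gives finer control such as a delay between attempts and retries on selected HTTP status codes. The sketch below shows one possible configuration; the function name `make_retrying_session` and the particular values (a 0.5 backoff factor, status codes 502/503/504) are assumptions chosen for illustration.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=0.5):
    """Build a Session whose adapters retry with backoff (illustrative helper)."""
    retry = Retry(
        total=total,                      # overall cap on retries
        backoff_factor=backoff_factor,    # growing pause between attempts
        status_forcelist=(502, 503, 504), # also retry on these responses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

A timeout still has to be passed on each call, for example `make_retrying_session().get(url, timeout=5)`, because the retry policy and the timeout are configured independently. By default urllib3 applies status-based retries only to idempotent methods, so a POST is not retried on those status codes.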
