Requests
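A package for fetching network resources (URLs)
It addresses the shortcomings of urllib2 and makes fetching network resources more convenient
It can access network resources through REST operations (POST, PUT, GET, DELETE)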
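As a rough illustration of the REST operations listed above, the sketch below builds one request per verb without sending anything over the network. The URL and the payload are hypothetical placeholders, not part of the original notes.

```python
import requests

# Hypothetical endpoint, used only for illustration; nothing is sent.
url = 'http://example.com/api/items/1'
payload = {'name': 'demo'}

# Build (but do not send) one prepared request per REST verb.
prepared = {}
for method in ('GET', 'POST', 'PUT', 'DELETE'):
    # Only POST and PUT carry a form body in this sketch.
    data = payload if method in ('POST', 'PUT') else None
    prepared[method] = requests.Request(method, url, data=data).prepare()

for method, req in prepared.items():
    print(method, req.url, req.body)
```

To actually send such requests, requests offers the shortcuts requests.get, requests.post, requests.put and requests.delete, each of which returns a response object.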
Using requests
import requests

# Fetch the page, then decode the response body as UTF-8 before printing it.
res = requests.get('http://news.sina.com.cn/china/')
res.encoding = 'utf-8'
print(res.text)
BeautifulSoup4 example
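from bs4 import BeautifulSoup

# A small HTML document; each trailing backslash continues the string literal.
html_sample = '\
<html>\
 <body>\
 <h1 id = "title">Hello world</h1>\
 <a href = "#" class = "link">This is link1</a>\
 <a href = "# link2" class = "link">This is link2</a>\
 </body>\
</html>'

# Parse with Python's built-in HTML parser and print only the text content.
soup = BeautifulSoup(html_sample, 'html.parser')
print(soup.text)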
Use select to find the elements with the h1 tag
soup = BeautifulSoup(html_sample, 'html.parser')
header = soup.select('h1')
print(header)
Use select to find the elements with the a tag
soup = BeautifulSoup(html_sample, 'html.parser')
alink = soup.select('a')
print(alink)
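The select calls above return lists of tag objects rather than plain strings. A minimal sketch of how the text and attributes might be read from them follows, reusing the same sample HTML; selecting by id (#title) and by class (.link) is an addition that the notes above do not cover.

```python
from bs4 import BeautifulSoup

html_sample = '''
<html>
 <body>
  <h1 id="title">Hello world</h1>
  <a href="#" class="link">This is link1</a>
  <a href="# link2" class="link">This is link2</a>
 </body>
</html>
'''

soup = BeautifulSoup(html_sample, 'html.parser')

# select() returns a list of Tag objects; read each tag's text and href.
links = [(a.text, a['href']) for a in soup.select('a')]
print(links)

# CSS selectors also match by id (#title) and by class (.link).
title = soup.select('#title')[0].text
link_count = len(soup.select('.link'))
print(title, link_count)
```

Indexing a tag like a dictionary (a['href']) raises KeyError when the attribute is missing; a.get('href') returns None instead, which may be safer on real pages.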