Parsing HTML in Python with BeautifulSoup: A Worked Example

2020-01-04 17:05:53

Source: reposted

Contributed by: a site user

Introduction

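Beautiful Soup provides a few simple, Pythonic functions for navigating, searching, and modifying the parse tree. It is a toolkit that parses a document and hands you the data you want to scrape. Because it is simple, a complete application takes very little code.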

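Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You do not need to think about encodings unless the document does not declare one and Beautiful Soup cannot detect it. In that case, you only need to state the original encoding.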
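
As a small illustration of the encoding point above, the sketch below feeds Beautiful Soup raw bytes and names their encoding through the `from_encoding` argument. The sample markup and the choice of Big5 are assumptions made for this example, and `html.parser` (Python's built-in parser) is used so that nothing beyond beautifulsoup4 is required.

```python
from bs4 import BeautifulSoup

# Hypothetical input: Big5-encoded bytes with no charset declaration.
raw = "<html><body><p>編碼測試</p></body></html>".encode("big5")

# from_encoding tells Beautiful Soup how to decode the incoming bytes.
soup = BeautifulSoup(raw, "html.parser", from_encoding="big5")

text = soup.p.get_text()          # a Unicode str
encoded = soup.p.encode("utf-8")  # the <p> tag serialized as UTF-8 bytes

print(text)
print(soup.original_encoding)
```

If the document is already a Python `str`, no decoding takes place and `from_encoding` is not needed.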

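Beautiful Soup works on top of well-regarded Python parsers such as lxml and html5lib, which lets you choose between different parsing strategies or trade flexibility for speed.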
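
To make the choice of parser concrete, here is a minimal sketch that parses the same malformed snippet with each parser and prints how each one repairs it. The snippet is invented for this example. lxml and html5lib are skipped when they are not installed, which Beautiful Soup signals by raising `FeatureNotFound`.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately malformed markup (unclosed <p> and <b>), made up for this sketch.
broken = "<p>Unclosed paragraph<b>bold text"

results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        results[parser] = str(BeautifulSoup(broken, parser))
    except FeatureNotFound:
        # Raised when the requested parser library is not installed.
        results[parser] = None

for parser, output in results.items():
    print(parser, "->", output)
```

The parsers can repair broken markup differently, so it is worth naming the parser explicitly instead of relying on whichever one happens to be installed.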

This article walks through how to parse HTML with BeautifulSoup in Python, step by step.

1. Install beautifulsoup4

pip install beautifulsoup4
pip install lxml
pip install html5lib

lxml and html5lib are parsers that Beautiful Soup can use.

2. The HTML

<!-- This is the example.html file. -->
<html>
<head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com" rel="external nofollow">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body>
</html>

Save the HTML above as example.html.

3. Start parsing

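import bs4

# Read the saved file and parse it with the html5lib parser.
with open('example.html') as exampleFile:
    exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html5lib')

# select() takes a CSS selector and returns a list-like ResultSet of matches.
elems = exampleSoup.select('#author')
type(elems)  # in an interactive shell this shows bs4.element.ResultSet
print(elems[0].getText())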

Output: Al Sweigart

BeautifulSoup finds elements with the select method, which accepts CSS selectors much like jQuery does.
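
To show a few more selector forms, the sketch below runs `select` and `select_one` against the example.html content from section 2, inlined as a string so the snippet is self-contained. It uses the built-in `html.parser` instead of html5lib purely to avoid an extra dependency, and the selectors are only a sample of what CSS allows.

```python
from bs4 import BeautifulSoup

# The example.html content from section 2, inlined for this sketch.
html = """<html><head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

paragraphs = soup.select("p")            # every <p> element
slogan = soup.select_one("p.slogan")     # first <p> with class "slogan"
link = soup.select_one("a[href]")        # first <a> that has an href attribute
strong = soup.select_one("p > strong")   # <strong> directly inside a <p>

print(len(paragraphs))      # 3
print(slogan.get_text())    # Learn Python the easy way!
print(link["href"])         # http://inventwithpython.com
print(strong.get_text())    # Python
```

`select` always returns a list-like result, even for a single match, while `select_one` returns the first matching element or `None` when nothing matches.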