国产探花免费观看_亚洲丰满少妇自慰呻吟_97日韩有码在线_资源在线日韩欧美_一区二区精品毛片,辰东完美世界有声小说,欢乐颂第一季,yy玄幻小说排行榜完本

首頁 > 編程 > Python > 正文

python爬蟲之BeautifulSoup 使用select方法詳解

2020-01-04 16:38:45
字體:
來源:轉載
供稿:網友

本文介紹了python/95394.html">python爬蟲之BeautifulSoup 使用select方法詳解 ,分享給大家。具體如下:

<html><head><title>The Dormouse's story</title></head><body><p class="title" name="dromouse"><b>The Dormouse's story</b></p><p class="story">Once upon a time there were three little sisters; and their names were<a href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" class="sister" id="link1"><!-- Elsie --></a>,<a href="http://example.com/lacie" rel="external nofollow" rel="external nofollow" rel="external nofollow" class="sister" id="link2">Lacie</a> and<a href="http://example.com/tillie" rel="external nofollow" rel="external nofollow" rel="external nofollow" class="sister" id="link3">Tillie</a>;and they lived at the bottom of a well.</p><p class="story">...</p>"""

我們在寫 CSS 時,標簽名不加任何修飾,類名前加點,id名前加 #,在這里我們也可以利用類似的方法來篩選元素,用到的方法是 soup.select(),返回類型是 list

(1)通過標簽名查找

print soup.select('title') #[<title>The Dormouse's story</title>] print soup.select('a')#[<a class="sister" href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link3">Tillie</a>] print soup.select('b')#[<b>The Dormouse's story</b>]

(2)通過類名查找

print soup.select('.sister')#[<a class="sister" href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link3">Tillie</a>]

(3)通過 id 名查找

print soup.select('#link1')#[<a class="sister" href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>]

(4)組合查找

組合查找即和寫 class 文件時,標簽名與類名、id名進行的組合原理是一樣的,例如查找 p 標簽中,id 等于 link1的內容,二者需要用空格分開

print soup.select('p #link1')#[<a class="sister" href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>]

直接子標簽查找

print soup.select("head > title")#[<title>The Dormouse's story</title>]

(5)屬性查找

查找時還可以加入屬性元素,屬性需要用中括號括起來,注意屬性和標簽屬于同一節點,所以中間不能加空格,否則會無法匹配到。

print soup.select("head > title")#[<title>The Dormouse's story</title>] print soup.select('a[href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" ]')#[<a class="sister" href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>]

同樣,屬性仍然可以與上述查找方式組合,不在同一節點的空格隔開,同一節點的不加空格

print soup.select('p a[href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" ]')#[<a class="sister" href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>]

以上就是本文的全部內容,希望對大家的學習有所幫助,也希望大家多多支持VEVB武林網。


注:相關教程知識閱讀請移步到python教程頻道。
發表評論 共有條評論
用戶名: 密碼:
驗證碼: 匿名發表
主站蜘蛛池模板: 客服| 泰兴市| 永登县| 大竹县| 甘德县| 通河县| 江山市| 同德县| 邵阳市| 万载县| 钟祥市| 武威市| 定远县| 定陶县| 桐乡市| 广丰县| 呼玛县| 磐安县| 彭泽县| 城步| 光泽县| 安庆市| 天全县| 安宁市| 肥城市| 昂仁县| 山阴县| 临海市| 邵武市| 沙湾县| 丘北县| 新蔡县| 天长市| 凯里市| 杂多县| 成都市| 洪湖市| 旺苍县| 蒲江县| 临夏县| 澄城县|