This example uses HtmlAgilityPack to parse HTML source and extract the data you need.
using HtmlAgilityPack;
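1. Fetch the page source with the C# WebRequest, WebResponse, and StreamReader classes (from the System.Net and System.IO namespaces). Here url is the page address, encoding is the page's System.Text.Encoding, and result is a string that receives the source.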
WebRequest request = WebRequest.Create(url);
// Dispose the response and the reader as soon as the body has been read
using (WebResponse response = request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream(), encoding))
    result = reader.ReadToEnd();
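WebRequest has been marked obsolete since .NET 6, so on newer runtimes the same step is usually written with HttpClient. The following is a minimal sketch of that alternative, not the approach used in this article; the class name PageFetcher and the methods GetHtmlSourceAsync and Decode are hypothetical names chosen for illustration.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class PageFetcher
{
    // A single shared HttpClient avoids exhausting sockets under load.
    private static readonly HttpClient Client = new HttpClient();

    // Turns the raw response bytes into text using the caller's encoding.
    public static string Decode(byte[] body, Encoding encoding)
    {
        return encoding.GetString(body);
    }

    // Downloads the page at url and decodes it with the given encoding.
    // GetByteArrayAsync throws HttpRequestException on a non-success status.
    public static async Task<string> GetHtmlSourceAsync(string url, Encoding encoding)
    {
        byte[] body = await Client.GetByteArrayAsync(url);
        return Decode(body, encoding);
    }
}
```

Reading the body as bytes and decoding it explicitly keeps the caller in control of the encoding, as the StreamReader version above does.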
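2. Build an HtmlNode from the page source fetched in step 1, using the HtmlDocument class from HtmlAgilityPack. Here htmlSource is the string obtained in step 1.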
// Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument
HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(htmlSource);
HtmlNode rootNode = document.DocumentNode;
return rootNode;
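3. Call SelectSingleNode on the HtmlNode to get the content you need. In the code below, path is an XPath expression built from the HTML tag hierarchy, for example:
path = "//div[@class='article_title']/h1/span/a"; // XPath of the article title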
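This corresponds to the following markup:
<div class='article_title'>
<h1>
<span>
<a>The content to extract</a>
</span>
</h1>
</div>
Reference code:
HtmlNode temp = srcNode.SelectSingleNode(path);
// SelectSingleNode returns null when nothing matches the XPath
if (temp == null)
    return null;
return temp.InnerText;
Return value: The content to extract
temp.InnerHtml returns the markup inside the matched node, which for this <a> element is the same text. To get the element together with its own tags, as in <a>The content to extract</a>, use temp.OuterHtml.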
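Steps 2 and 3 can be tried together without any network access by parsing the sample markup from a string. This is a small sketch under that assumption; the class name TitleExtractor and the method names GetRootNode and FindTitleLink are hypothetical, and it requires the HtmlAgilityPack package to be referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class TitleExtractor
{
    // Step 2: parse the HTML source and return the document's root node.
    public static HtmlNode GetRootNode(string htmlSource)
    {
        var document = new HtmlAgilityPack.HtmlDocument();
        document.LoadHtml(htmlSource);
        return document.DocumentNode;
    }

    // Step 3: locate the title link; returns null when nothing matches.
    public static HtmlNode FindTitleLink(HtmlNode srcNode)
    {
        return srcNode.SelectSingleNode("//div[@class='article_title']/h1/span/a");
    }

    public static void Main()
    {
        string html = "<div class='article_title'><h1><span>"
                    + "<a>The content to extract</a>"
                    + "</span></h1></div>";

        HtmlNode link = FindTitleLink(GetRootNode(html));
        if (link == null)
        {
            Console.WriteLine("No node matched the XPath.");
            return;
        }

        Console.WriteLine(link.InnerText);  // text only
        Console.WriteLine(link.InnerHtml);  // markup inside <a>, here just the text
        Console.WriteLine(link.OuterHtml);  // the <a> element including its tags
    }
}
```

InnerText is the usual choice for a plain title; OuterHtml is useful when the element's tags or attributes need to be kept along with the text.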