A crawler, also called a spider, is a program that fetches resources from other websites. In C#.NET, a page can be fetched as follows:
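    // At the top of the file:
    using System.IO;
    using System.Net;
    using System.Text;

    // Inside the page or class:
    protected string GetPageHtml(string url)
    {
        string pageinfo;
        try
        {
            WebRequest myreq = WebRequest.Create(url);
            // Dispose the response and reader so the connection is released.
            using (WebResponse myrep = myreq.GetResponse())
            using (StreamReader reader = new StreamReader(myrep.GetResponseStream(), Encoding.GetEncoding("gb2312")))
            {
                pageinfo = reader.ReadToEnd();
            }
        }
        catch
        {
            // Any failure (bad URL, network or HTTP error) yields an empty string.
            pageinfo = "";
        }
        return pageinfo;
    }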
With the method above, a program can retrieve the HTML source of the page at a given URL.
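The method above hard-codes gb2312, so a page served in another encoding (UTF-8, for example) comes back garbled. One option, sketched here under the assumption that the server declares a charset in its Content-Type header, is to pick the decoder from that header and fall back to a default otherwise. `CharsetHelper` and `ResolveEncoding` are hypothetical names, not part of .NET.

```csharp
using System;
using System.Text;

public static class CharsetHelper
{
    // Hypothetical helper: picks an Encoding from a Content-Type value such as
    // "text/html; charset=utf-8". Returns 'fallback' when no charset is declared
    // or the declared name is not recognized by the current runtime.
    public static Encoding ResolveEncoding(string contentType, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentType))
            return fallback;

        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                // Strip optional surrounding quotes: charset="UTF-8".
                string name = p.Substring("charset=".Length).Trim().Trim('"');
                try
                {
                    return Encoding.GetEncoding(name);
                }
                catch (ArgumentException)
                {
                    // Unknown or empty charset name: use the caller's default.
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

Inside `GetPageHtml` it could replace the fixed encoding, e.g. `new StreamReader(myrep.GetResponseStream(), CharsetHelper.ResolveEncoding(myrep.ContentType, Encoding.GetEncoding("gb2312")))`. On .NET Core and later, legacy code pages such as gb2312 are only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`.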
Some websites, however, block crawlers. In that case the request has to imitate a browser, as in the following code:
    // Requires: using System.IO; using System.Net; using System.Text;
    protected string GetPageHtml(string url)
    {
        string pageinfo;
        try
        {
            HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create(url);
            // Send the same Accept and User-Agent headers as IE 6 on Windows XP.
            myReq.Accept = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*";
            myReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)";
            using (HttpWebResponse myRep = (HttpWebResponse)myReq.GetResponse())
            // Encoding.Default is the system ANSI code page on .NET Framework
            // (GBK on Chinese Windows) but UTF-8 on .NET Core.
            using (StreamReader sr = new StreamReader(myRep.GetResponseStream(), Encoding.Default))
            {
                pageinfo = sr.ReadToEnd();
            }
        }
        catch
        {
            pageinfo = "";
        }
        return pageinfo;
    }
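`WebRequest` and `HttpWebRequest` still work, but starting with .NET 6 they are marked obsolete (compiler warning SYSLIB0014) and Microsoft recommends `HttpClient` for new code. A minimal sketch of the same browser-imitating request with `HttpClient` follows, assuming a modern .NET (5 or later); `BrowserLikeFetcher`, `BuildRequest` and `GetPageHtmlAsync` are hypothetical names. Building the request is kept separate from sending it so the headers can be checked without network access.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class BrowserLikeFetcher
{
    // The same IE 6 User-Agent string used in the HttpWebRequest example.
    public const string UserAgent =
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)";

    // One shared HttpClient for the whole process, rather than one per request.
    private static readonly HttpClient Client = new HttpClient();

    // Builds a GET request carrying browser-like headers (hypothetical helper).
    public static HttpRequestMessage BuildRequest(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        // TryAddWithoutValidation stores the value verbatim, without strict parsing.
        request.Headers.TryAddWithoutValidation("User-Agent", UserAgent);
        request.Headers.TryAddWithoutValidation("Accept", "*/*");
        return request;
    }

    // Downloads the page body; returns "" on an HTTP error status,
    // a network failure, or a timeout.
    public static async Task<string> GetPageHtmlAsync(string url)
    {
        try
        {
            using (HttpRequestMessage request = BuildRequest(url))
            using (HttpResponseMessage response = await Client.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                // Decodes with the charset from Content-Type when one is declared.
                return await response.Content.ReadAsStringAsync();
            }
        }
        catch (Exception ex) when (ex is HttpRequestException || ex is TaskCanceledException)
        {
            return "";
        }
    }
}
```

Usage from an async method: `string html = await BrowserLikeFetcher.GetPageHtmlAsync("http://www.example.com/");`. As with the first method, gb2312/GBK pages on .NET Core need the code-pages provider registered before they can be decoded.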