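If you do not see curl listed, you need to configure PHP and enable the library. On Windows this is simple: edit your php.ini file, find the php_curl.dll line, and remove the leading semicolon that comments it out, as shown below:
Uncomment the following line: extension=php_curl.dll
On Linux, you need to recompile PHP. When compiling, add the "--with-curl" option to the configure command.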
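Before editing php.ini or recompiling, it can help to check from code whether the extension is already available. The following is a minimal sketch using the standard extension_loaded, function_exists, and curl_version functions; the helper name curl_is_available is made up for illustration.

```php
<?php
// Hypothetical helper: reports whether the curl extension can be used.
function curl_is_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    // curl_version() returns an array that includes the library version string.
    $info = curl_version();
    echo 'curl is enabled, version ' . $info['version'] . PHP_EOL;
} else {
    echo 'curl is not enabled; check php.ini or rebuild PHP with --with-curl' . PHP_EOL;
}
```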
A small example: once everything is ready, here is a short routine:
- <?php
- // Initialize a curl handle
- $curl = curl_init();
- // Set the URL to fetch
- curl_setopt($curl, CURLOPT_URL, 'http://Vevb.com');
- // Include the response headers in the output
- curl_setopt($curl, CURLOPT_HEADER, 1);
- // Return the result as a string instead of printing it to the screen
- curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
- // Run curl and request the page
- $data = curl_exec($curl);
- // Close the curl handle
- curl_close($curl);
- // Display the fetched data
- var_dump($data);
- ?>
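Because CURLOPT_HEADER is enabled in the example, the string returned by curl_exec holds the response headers followed by the body. The sketch below shows one way to separate the two. The helper name split_response and the sample response are assumptions for illustration, and the sample is hand-written so the snippet runs without a network connection.

```php
<?php
// Hypothetical helper: split a raw response (headers + body) at $headerSize bytes.
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    ];
}

// Hand-written sample standing in for the return value of curl_exec().
$sampleHeaders = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";
$sampleRaw = $sampleHeaders . '<html>hello</html>';

// With a real handle, the size would come from
// curl_getinfo($curl, CURLINFO_HEADER_SIZE), read before curl_close().
$parts = split_response($sampleRaw, strlen($sampleHeaders));
echo $parts['body'] . PHP_EOL;
```

In a real request it is also worth checking whether curl_exec returned false and, if so, reading curl_error($curl) before closing the handle.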
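How to POST data
The code above fetches a web page. The code below POSTs data to a page. Suppose we have a form handler at http://www.example.com/sendsms.php that accepts two form fields, a phone number and a message text: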
- <?php
- $phonenumber = '13912345678';
- $message = 'this message was generated by curl and php';
- // URL-encode each value and build the form body
- $curlpost = 'pnumber=' . urlencode($phonenumber) . '&message=' . urlencode($message) . '&submit=send';
- $ch = curl_init();
- curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/sendsms.php');
- curl_setopt($ch, CURLOPT_HEADER, 1);
- curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
- // Use the POST method and attach the form body
- curl_setopt($ch, CURLOPT_POST, 1);
- curl_setopt($ch, CURLOPT_POSTFIELDS, $curlpost);
- $data = curl_exec($ch);
- curl_close($ch);
- ?>
As the program above shows, CURLOPT_POST switches the request to the HTTP POST method instead of GET, and CURLOPT_POSTFIELDS then sets the data to post.
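As an alternative to concatenating the form body by hand, the built-in http_build_query function can encode an array of fields in one call. This is a sketch only: the field names are taken from the example, and the message text is a placeholder.

```php
<?php
// Same field names as the example above; the message text is a placeholder.
$fields = [
    'pnumber' => '13912345678',
    'message' => 'hello world',
    'submit'  => 'send',
];

// http_build_query() URL-encodes each value and joins the pairs with '&'.
$curlpost = http_build_query($fields, '', '&');
echo $curlpost . PHP_EOL;
```

The resulting string can be passed to CURLOPT_POSTFIELDS exactly like the hand-built one.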
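About proxy servers
Below is an example of using a proxy server. Note the three proxy-related options (CURLOPT_HTTPPROXYTUNNEL, CURLOPT_PROXY, and CURLOPT_PROXYUSERPWD). The code is simple enough that it needs little explanation: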
- <?php
- $ch = curl_init();
- curl_setopt($ch, CURLOPT_URL, 'http://m.survivalescaperooms.com');
- curl_setopt($ch, CURLOPT_HEADER, 1);
- curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
- // Tunnel the request through the HTTP proxy
- curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
- // Proxy host and port
- curl_setopt($ch, CURLOPT_PROXY, 'fakeproxy.com:1080');
- // Proxy credentials in the form user:password
- curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');
- $data = curl_exec($ch);
- curl_close($ch);
- ?>
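About SSL and cookies
For SSL, that is, the HTTPS protocol, you only need to change http:// to https:// in the CURLOPT_URL value. There is also an option called CURLOPT_SSL_VERIFYHOST, which can be set to check the site's host name against its certificate.
For cookies, you need to know the following three options:
CURLOPT_COOKIE: sets a cookie in the current session.
CURLOPT_COOKIEJAR: the file to which cookies are saved when the session ends.
CURLOPT_COOKIEFILE: the file from which cookies are read.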
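CURLOPT_COOKIE takes the contents of the Cookie header as a single string, with cookies separated by a semicolon and a space. The sketch below builds such a string from an array. The helper name build_cookie_string is hypothetical, and percent-encoding the values is a precaution that assumes the server decodes them.

```php
<?php
// Hypothetical helper: turn name => value pairs into the string CURLOPT_COOKIE expects.
function build_cookie_string(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        // Percent-encode the value (an assumption about what the server accepts).
        $pairs[] = $name . '=' . rawurlencode((string) $value);
    }
    return implode('; ', $pairs);
}

$cookieString = build_cookie_string(['fruit' => 'apple', 'colour' => 'red']);
echo $cookieString . PHP_EOL;

// Typical use on an existing handle $ch (not executed here):
// curl_setopt($ch, CURLOPT_COOKIE, $cookieString);
// curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // write cookies when the handle closes
// curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // read cookies from this file
```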
HTTP server authentication
Finally, let's look at HTTP server authentication:
- <?php
- $ch = curl_init();
- curl_setopt($ch, CURLOPT_URL, 'http://m.survivalescaperooms.com');
- curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
- // Use HTTP Basic authentication
- curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
- // Credentials in the form username:password
- curl_setopt($ch, CURLOPT_USERPWD, '[username]:[password]');
- $data = curl_exec($ch);
- curl_close($ch);
- ?>
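With Basic authentication, the credentials travel in an Authorization header as the base64 encoding of "username:password". They are encoded, not encrypted, so an https:// URL is preferable. The sketch below builds that header by hand to show what is sent; the helper name basic_auth_header and the sample credentials are made up for illustration.

```php
<?php
// Hypothetical helper: build the header that HTTP Basic authentication uses.
function basic_auth_header(string $user, string $pass): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $pass);
}

// Sample credentials, for illustration only.
$header = basic_auth_header('user', 'password');
echo $header . PHP_EOL;
```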
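An example: using curl to fetch the 163 mailbox address list
Put simply, curl imitates what a browser does in order to fetch pages or submit forms, which makes many interesting things possible. The code is as follows: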
- <?php
- error_reporting(0);
- // Mailbox user name (without the @163.com suffix)
- $user = 'papatata_test';
- // Mailbox password
- $pass = '000000';
- // Target mailbox
- //$mail_addr = 'uenucom@163.com';
- // Log in
- $url = 'http://reg.163.com/logins.jsp?type=1&url=http://entry.mail.163.com/coremail/fcg/ntesdoor2?lightweight%3d1%26verifycookie%3d1%26language%3d-1%26style%3d-1';
- $ch = curl_init($url);
- // Create a temporary file to hold the cookie data
- $cookie = tempnam('.','~');
- $referer_login = 'http://mail.163.com';
- // Return the result in a variable instead of printing it directly
- curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
- curl_setopt($ch, CURLOPT_HEADER, true);
- curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
- curl_setopt($ch, CURLOPT_POST, true);
- curl_setopt($ch, CURLOPT_REFERER, $referer_login);
- $fields_post = array(
- 'username'=> $user,
- 'password'=> $pass,
- 'verifycookie'=>1,
- 'style'=>-1,
- 'product'=> 'mail163',
- 'seltype'=>-1,
- 'secure'=>'on'
- );
- // CURLOPT_HTTPHEADER expects a list of "Name: value" strings
- $headers_login = array(
- 'User-Agent: mozilla/5.0 (windows; u; windows nt 5.1; zh-cn; rv:1.9) gecko/2008052906 firefox/3.0',
- 'Referer: http://www.163.com'
- );
- // Build the URL-encoded form body
- $fields_string = '';
- foreach($fields_post as $key => $value)
- {
- $fields_string .= $key . '=' . urlencode($value) . '&';
- }
- $fields_string = rtrim($fields_string , '&');
- curl_setopt($ch, CURLOPT_COOKIESESSION, true);
- // When the connection is closed, save the cookies returned by the server to this file
- curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
- curl_setopt($ch, CURLOPT_HTTPHEADER, $headers_login);
- curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
- $result= curl_exec($ch);
- curl_close($ch);
- // Follow the redirect
- $url='http://entry.mail.163.com/coremail/fcg/ntesdoor2?lightweight=1&verifycookie=1&language=-1&style=-1&username=loki_wuxi';
- $ch = curl_init($url);
- // CURLOPT_HTTPHEADER expects a list of "Name: value" strings
- $headers = array(
- 'User-Agent: mozilla/5.0 (windows; u; windows nt 5.1; zh-cn; rv:1.9) gecko/2008052906 firefox/3.0'
- );
- curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
- curl_setopt($ch, CURLOPT_HEADER, true);
- curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
- curl_setopt($ch, CURLOPT_POST, true);
- curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
- // Send the previously saved cookies along to the server
- curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
- curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
- $result = curl_exec($ch);
- curl_close($ch);
- // Extract the sid
- preg_match('/sid=[^"].*/', $result, $location);
- $sid = substr($location[0], 4, -1);
- //file_put_contents('./result.txt', $sid);
- // Address book URL
- $url='http://g4a30.mail.163.com/jy3/address/addrlist.jsp?sid='.$sid.'&gid=all';
- $ch = curl_init($url);
- // CURLOPT_HTTPHEADER expects a list of "Name: value" strings
- $headers = array(
- 'User-Agent: mozilla/5.0 (windows; u; windows nt 5.1; zh-cn; rv:1.9) gecko/2008052906 firefox/3.0'
- );
- curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
- curl_setopt($ch, CURLOPT_HEADER, true);
- curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
- curl_setopt($ch, CURLOPT_POST, true);
- curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
- curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
- curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
- $result = curl_exec($ch);
- curl_close($ch);
- //file_put_contents('./result.txt', $result);
- // Delete the temporary cookie file
- unlink($cookie);
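- // Extract the content; '#' is the delimiter so '/' in closing tags needs no escaping
- preg_match_all('#<td class="ibx_td_addrname"><a[^>]*>(.*?)</a></td><td class="ibx_td_addremail"><a[^>]*>(.*?)</a></td>#i', $result, $infos, PREG_SET_ORDER);
- // Index 1: name, index 2: email address
- print_r($infos);
- ?>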
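The extraction step can be tried offline against a hand-written table row. This is a sketch that assumes markup shaped like the pattern in the script; the real page may differ, and the sample name and address are invented. It uses # as the regex delimiter so that the / in closing tags needs no escaping.

```php
<?php
// Hand-written sample row; the real page markup may differ.
$sample = '<td class="ibx_td_addrname"><a href="#">Alice</a></td>'
        . '<td class="ibx_td_addremail"><a href="#">alice@example.com</a></td>';

// '#' is used as the delimiter so that '/' in closing tags needs no escaping.
$pattern = '#<td class="ibx_td_addrname"><a[^>]*>(.*?)</a></td>'
         . '<td class="ibx_td_addremail"><a[^>]*>(.*?)</a></td>#i';

// PREG_SET_ORDER groups the captures per matched row.
preg_match_all($pattern, $sample, $infos, PREG_SET_ORDER);

foreach ($infos as $row) {
    // $row[1] is the name, $row[2] is the email address.
    echo $row[1] . ' <' . $row[2] . '>' . PHP_EOL;
}
```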