2014-12-23
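Booking a training car at Dongfang Shishang (東方時尚) means picking lesson slots online, and slots are so scarce that I decided to write my own grabbing program to poll for them.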
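Step 1: capture the traffic. Log in to the lesson-selection system in a browser and watch the network activity with a packet-capture tool; I used Fiddler. The capture needs to yield two things: which URLs the browser requests, and the HTTP headers plus the content and format of the POSTed data.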
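Step 2: simulate the login. Copy the captured headers and have Python act as the browser, submitting the username and password. The login requires a captcha, so I started with manual recognition: Python saves the fetched .jpg file locally and pauses for input, I read the captcha by eye and type it in, and the program carries on. At first the server kept reporting a wrong captcha, and the cause turned out to be the way the requests were made. Note that every URL must be fetched through the same opener, so that the server treats the requests as coming from the same browser. After that change, the login succeeded.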
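The "same opener" point can be sketched as below. This is a minimal illustration rather than the original script: it is written for Python 3, where `urllib2` and `cookielib` became `urllib.request` and `http.cookiejar`, and the name `make_opener` is hypothetical. The likely reason one opener matters is that its cookie jar carries the session cookie from the captcha request to the login request.

```python
import http.cookiejar
import urllib.request


def make_opener(headers):
    """Build one opener that stores cookies and sends the given headers."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    # addheaders expects a list of (name, value) tuples
    opener.addheaders = list(headers.items())
    return opener, jar


# Hypothetical header; in practice these come from the captured request.
opener, jar = make_opener({"User-Agent": "Mozilla/5.0"})
# The captcha image, the login POST and the booking request would all go
# through opener.open(...), so any cookie set by an earlier response is
# sent back on the later ones.
```

Calling a bare `urlopen` for the captcha and the opener for the login would give the server two unrelated sessions, which matches the "wrong captcha" symptom described above.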
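Step 3: solve the captcha automatically. Typing captchas by hand is no long-term solution, so the machine has to do it. I used the PIL and pytesser packages, which already implement captcha recognition and can be used directly. Because the recognition rate is below 100%, I put the login code in a while loop that repeats until the login succeeds.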
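The retry idea can be sketched on its own, separated from the network and OCR calls. This is an illustrative variant under assumptions: `login_with_retries`, `fetch_captcha_text` and `submit_login` are hypothetical names, and unlike the unbounded while loop described above it gives up after `max_attempts` tries.

```python
def login_with_retries(fetch_captcha_text, submit_login, max_attempts=20):
    """Try to log in with a fresh captcha guess each round.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        guess = fetch_captcha_text()  # new captcha image + OCR each round
        if submit_login(guess):
            return attempt
    return None


# Stand-ins for the real OCR and login calls (made up for illustration):
guesses = iter(["a8x", "7k2q", "7kzq"])
attempts = login_with_retries(lambda: next(guesses), lambda g: g == "7kzq")
```

Bounding the loop is a design choice: it avoids hammering the server indefinitely if the login fails for a reason other than a misread captcha.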
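Step 4: grab the lesson. Keep capturing traffic, work out the sequence of URL requests involved in selecting a lesson, and have the program replay it. For example, I saw in the browser that a Friday evening lesson was open and told the program to grab it; the program returned ok, and after refreshing the browser the lesson was indeed booked, so the program works.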
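Follow-up: reading the data. The people who built the Dongfang Shishang site are no pushovers either. Good time slots are usually full, so I had the program loop, retrying until it got a lesson. After a few hours the site said I had made too many operations and blocked my access for the day. To get around this I lowered the polling rate to once every 10 minutes. But selecting a lesson also needs a captcha, and the recognition rate is low; if a slot were open and a wrong captcha cost me a 10-minute wait, the chance would be wasted. So I changed the program again: wait 10 minutes when no slot is open, and retry continuously when one is. That requires extracting the data. Analysing the page structure showed that a separate URL serves the data, with the remaining lesson counts in a JSON-format string. A regular expression pulls out that string, and parsing the JSON gives the numbers needed.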
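The extraction and the wait-or-retry decision can be sketched as follows (Python 3). The `SAMPLE` payload is invented for illustration and only mimics the shape the script relies on: a JSON array whose entries begin with `fchrdate`, with a per-session value whose part after the `/` is the remaining count. The function names are hypothetical.

```python
import json
import re

# Invented sample response; the real payload from the site will differ.
SAMPLE = ('callback([{"fchrdate":"2014-12-24","3":"5/0"},'
          '{"fchrdate":"2014-12-25","3":"5/2"}]);')


def remaining_slots(body, day_index, session_id):
    """Return the remaining count for one session, or None if no data found."""
    match = re.search(r'\[\{"fchrdate.*?\}\]', body)
    if match is None:
        return None
    days = json.loads(match.group())
    # The value looks like "a/b"; the part after the slash is the remainder.
    return int(days[day_index][session_id].split("/")[1])


def seconds_to_wait(remaining):
    """Back off 10 minutes when nothing is free, otherwise retry at once."""
    return 600 if remaining == 0 else 0
```

With this split, a misread captcha while a slot is open leads to an immediate retry, and the 10-minute pause applies only when the data says the slot is full.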
Finally, here is my code:

import re
import json
import time
import urllib
import urllib2
import urlparse
import cookielib
from PIL import Image, ImageDraw, ImageFont, ImageFilter
from pytesser import *
from datetime import date
import os

# run from the pytesser install directory
os.chdir('C://Python27/Lib/site-packages/pytesser')

def getVerify(name):
    # OCR the captcha: convert to grayscale, recognize, strip non-word chars
    im = Image.open(name)
    imgry = im.convert('L')
    text = image_to_string(imgry)
    text = re.sub(r'\W', '', text)
    return text

def urlToString(url):
    # fetch a captcha image without a session and OCR it
    data = urllib2.urlopen(url).read()
    f = open('buffer/temp.jpg', 'wb')
    f.write(data)
    f.close()
    return getVerify('buffer/temp.jpg')

def openerUrlToString(opener, url):
    # fetch a captcha image through the shared opener and OCR it
    data = opener.open(url).read()
    f = open('buffer/temp.jpg', 'wb')
    f.write(data)
    f.close()
    return getVerify('buffer/temp.jpg')

def getOpener(head):
    # deal with the Cookies
    cj = cookielib.CookieJar()
    pro = urllib2.HTTPCookieProcessor(cj)
    opener = urllib2.build_opener(pro)
    header = []
    for key, value in head.items():
        elem = (key, value)
        header.append(elem)
    opener.addheaders = header
    return opener

def decodeAnyType(data):
    # return the first successful decoding, or the raw data if none works
    for encoding in ('utf-8', 'gbk', 'gb2312'):
        try:
            return data.decode(encoding)
        except Exception:
            pass
    return data

header = {
    'Connection': 'Keep-Alive',
    'Accept': 'text/html, application/xhtml+xml, */*',
    'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Accept-Encoding': 'gzip, deflate',
    'Host': 'wsyc.dfss.com.cn',
    'DNT': '1'
}

## the settings below are chosen by the user to select the lesson needed
start = 13
end = 17
numid = '3'
year = 2014
month = 12
day = 22
username = 'myname'
password = 'mypasswd'

opener = getOpener(header)
url1 = 'http://wsyc.dfss.com.cn/'
url2 = 'http://wsyc.dfss.com.cn/DfssAjax.aspx'
url3 = 'http://wsyc.dfss.com.cn/validpng.aspx?aa=3&page=lg'
url4 = 'http://wsyc.dfss.com.cn/pc-client/jbxx.aspx'
url5 = 'http://wsyc.dfss.com.cn/validpng.aspx'

## try to login until the validcode is right
count = 0
while True:
    print '------------------------'
    print 'have tried to login %d times, now try again!' % (count)
    count = count + 1
    validcode = openerUrlToString(opener, url3)
    print 'the validcode is ' + validcode
    postDict = {
        'AjaxMethod': 'LOGIN',
        'Account': username,
        'ValidCode': validcode,
        'Pwd': password
    }

    postData = urllib.urlencode(postDict).encode()
    op = opener.open(url2, postData)
    result = op.read().decode('utf-8')
    print 'the result of login is ' + result
    #if result.find('true') >= 0:
    if result == 'true':
        print 'login success!'
        break


yuechedate = date(year, month, day)
today = date.today()
intervaldays = (yuechedate - today).days
print intervaldays
# give up if the target date is fewer than 2 days away
if intervaldays < 2:
    exit()
validcode = ''
count = 0
## try to select a class until success
while True:
    print '--------------------------'
    print 'have tried to select %d times, now try again!' % (count)
    count = count + 1
    try:
        validcode = openerUrlToString(opener, url5)
    except Exception:
        continue
    # query the remaining lessons
    url7 = ('http://wsyc.dfss.com.cn/Ajax/StuHdl.ashx?loginType=2&method=stu'
            + '&stuid=%s&sfznum=&carid=&ValidCode=%s' % (username, validcode))
    data = opener.open(url7).read().decode('utf-8')
    strs = re.search(r'\[\{"fchrdate.*?\}\]', data)
    #print data
    print strs
    if strs is None:
        continue
    jsontext = json.loads(strs.group())
    num = jsontext[intervaldays][numid].split('/')[1]
    print 'remain num is ' + num
    if num == '0':
        # nothing free: wait 10 minutes before polling again
        print 'no class available!'
        time.sleep(600)
        continue
    try:
        validcode = openerUrlToString(opener, url5)
    except Exception:
        continue
    # a slot is free: send the booking request
    url6 = ('http://wsyc.dfss.com.cn/Ajax/StuHdl.ashx?loginType=2&method=yueche'
            + '&stuid=%s&bmnum=BD14101500687&start=%d&end=%d' % (username, start, end)
            + '&lessionid=001&trainpriceid=BD13040300001&lesstypeid=02'
            + '&date=%d-%d-%d' % (year, month, day)
            + '&id=1&carid=&ycmethod=03&cartypeid=01&trainsessionid=0' + numid
            + '&ReleaseCarID=&ValidCode=' + validcode)
    result = opener.open(url6).read().decode('utf-8')
    print 'result of select is ' + result
    if result == 'success':
        print 'select success!'
        break