本文實(shí)例講述了python讀取word文檔的方法。分享給大家供大家參考。具體如下:
首先下載安裝win32com
from win32com import client as wcword = wc.Dispatch('Word.Application')doc = word.Documents.Open('c:/test')doc.SaveAs('c:/test.text', 2)doc.Close()word.Quit()這種方式產(chǎn)生的text文檔,不能用python用普通的r方式讀取,為了讓python可以用r方式讀取,應(yīng)當(dāng)寫(xiě)成
doc.SaveAs('c:/test', 4)注意:系統(tǒng)執(zhí)行完成后,會(huì)自動(dòng)產(chǎn)生文件后綴txt(雖然沒(méi)有指明后綴)。
在xp系統(tǒng)下面,應(yīng)當(dāng),
open(r'c:/text','r')wdFormatDocument = 0wdFormatDocument97 = 0wdFormatDocumentDefault = 16wdFormatDOSText = 4wdFormatDOSTextLineBreaks = 5wdFormatEncodedText = 7wdFormatFilteredHTML = 10wdFormatFlatXML = 19wdFormatFlatXMLMacroEnabled = 20wdFormatFlatXMLTemplate = 21wdFormatFlatXMLTemplateMacroEnabled = 22wdFormatHTML = 8wdFormatPDF = 17wdFormatRTF = 6wdFormatTemplate = 1wdFormatTemplate97 = 1wdFormatText = 2wdFormatTextLineBreaks = 3wdFormatUnicodeText = 7wdFormatWebArchive = 9wdFormatXML = 11wdFormatXMLDocument = 12wdFormatXMLDocumentMacroEnabled = 13wdFormatXMLTemplate = 14wdFormatXMLTemplateMacroEnabled = 15wdFormatXPS = 18
照著字面意思應(yīng)該能對(duì)應(yīng)到相應(yīng)的文件格式,如果你是office 2003可能支持不了這么多格式。word文件轉(zhuǎn)html有兩種格式可選wdFormatHTML、wdFormatFilteredHTML(對(duì)應(yīng)數(shù)字 8、10),區(qū)別是如果是wdFormatHTML格式的話,word文件里面的公式等ole對(duì)象將會(huì)存儲(chǔ)成wmf格式,而選用 wdFormatFilteredHTML的話公式圖片將存儲(chǔ)為gif格式,而且目測(cè)可以看出用wdFormatFilteredHTML生成的HTML 明顯比wdFormatHTML要干凈許多。
當(dāng)然你也可以用任意一種語(yǔ)言通過(guò)com來(lái)調(diào)用office API,比如PHP.
from win32com import client as wcword = wc.Dispatch('Word.Application')doc = word.Documents.Open(r'c:/test1.doc')doc.SaveAs('c:/test1.text', 4)doc.Close()import restrings=open(r'c:/test1.text','r').read()result=re.findall('/(/s*[A-D]/s*/)|/(/xa1*[A-D]/xa1*/)|/(/s*[A-D]/s*/)|/(/xa1*[A-D]/xa1*/)',strings)chan=re.sub('/(/s*[A-D]/s*/)|/(/xa1*[A-D]/xa1*/)|/(/s*[A-D]/s*/)|/(/xa1*[A-D]/xa1*/)','()',strings)question=open(r'c:/question','a+')question.write(chan)question.close()answer=open(r'c:/answeronly','a+')for i,a in enumerate(result): m=re.search('[A-D]',a) answer.write(str(i+1)+' '+m.group()+'/n')answer.close()chan=re.sub(r'/xa3/xa8/s*[A-D]/s*/xa3/xa9','()',strings)#不要(),容易引起歧義。希望本文所述對(duì)大家的Python程序設(shè)計(jì)有所幫助。
新聞熱點(diǎn)
疑難解答
圖片精選