Finding nestable string groups with regular expressions in Python

2020-01-04 16:38:50

Contributed by: a site user

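I came across a small requirement online that calls for regular expressions. The original requirement reads:

Find the sentences in a text that contain "因为……所以" ("because ... so") and print them aligned on those two words: the 3 characters before "因为", everything between the two words, and the 3 characters after "所以". If another "因为" / "所以" pair occurs between the outer "因为" and "所以", find it as well and print it on its own line. The output format is:

line number, 3 preceding characters, *因为*, everything in between, &所以&, 3 following characters (a punctuation mark counts as one character)

2 还不是 *因为* 这里好， &所以& 没有人

The implementation is as follows: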

    # encoding: utf-8
    import re


    def getPairStriList(filename):
        pairStrList = []
        # \u56e0\u4e3a and \u6240\u4ee5 are the Unicode escapes for 因为 and 所以.
        pattern = re.compile('.{3}\u56e0\u4e3a.*\u6240\u4ee5.{3}')
        with open(filename, 'r', encoding='utf8') as textFile:
            for line in textFile:
                result = pattern.search(line)
                while result:
                    resultStr = result.group()
                    pairStrList.append(resultStr)
                    # Search again inside the match, skipping its outer edges,
                    # to pick up a nested 因为 ... 所以 pair.
                    result = pattern.search(resultStr, 2, len(resultStr) - 2)
        # Format each match: mark the two keywords and prepend a
        # 1-based sequence number.
        for i in range(len(pairStrList)):
            s = pairStrList[i]
            s = s[:3] + s[3:5].replace('\u56e0\u4e3a', ' *\u56e0\u4e3a* ', 1) + s[5:]
            s = s[:len(s) - 5] + s[len(s) - 5:].replace('\u6240\u4ee5', ' &\u6240\u4ee5& ', 1)
            pairStrList[i] = str(i + 1) + ' ' + s
        return pairStrList


    if __name__ == '__main__':
        for pairStr in getPairStriList('test.txt'):
            print(pairStr)
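To see how the repeated search picks up a nested pair, here is a minimal sketch that runs the same loop on a string in memory instead of a file. The helper names `find_pairs` and `format_pair` and the sample sentence are assumptions made for illustration; they are not part of the original code.

```python
import re

# Same pattern as the article: 3 characters, 因为, anything, 所以, 3 characters.
PATTERN = re.compile('.{3}因为.*所以.{3}')


def find_pairs(text):
    """Return the outer match and any nested matches found inside it.

    Hypothetical helper: it applies the article's search loop to an
    in-memory string instead of a file.
    """
    found = []
    result = PATTERN.search(text)
    while result:
        matched = result.group()
        found.append(matched)
        # Re-search inside the match, excluding its first and last two
        # characters, so the outer pair cannot match again.
        result = PATTERN.search(matched, 2, len(matched) - 2)
    return found


def format_pair(index, matched):
    """Mark the two keywords and prepend a 1-based sequence number."""
    # By construction, matched[3:5] is 因为 and matched[-5:-3] is 所以.
    body = matched[:3] + ' *因为* ' + matched[5:-5] + ' &所以& ' + matched[-3:]
    return '{} {}'.format(index, body)


# Made-up sample sentence containing one pair nested inside another.
sample = '大家说因为他觉得因为下雨所以不出门所以没来了'
lines = [format_pair(i, m) for i, m in enumerate(find_pairs(sample), start=1)]
for line in lines:
    print(line)
```

For this sample the sketch prints two lines, the outer pair first and the nested pair second:

    1 大家说 *因为* 他觉得因为下雨所以不出门 &所以& 没来了
    2 他觉得 *因为* 下雨 &所以& 不出门

Because the pattern requires 3 characters on each side, a nested pair is only found when it has at least 3 characters before its "因为" and after its "所以" inside the narrowed search window.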

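PS: Next, a look at nesting groups in Python regular expressions.

Because a group is itself a complete regular expression, groups can be nested inside other groups to build more complex expressions. The following example shows nested groups: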

    # python 3.6
    # 蔡軍生
    # http://blog.csdn.net/caimouse/article/details/51749579
    #
    import re


    def test_patterns(text, patterns):
        """Given source text and a list of patterns, look for
        matches for each pattern within the text and print
        them to stdout.
        """
        # Look for each pattern in the text and print the results
        for pattern, desc in patterns:
            print('{!r} ({})\n'.format(pattern, desc))
            print(' {!r}'.format(text))
            for match in re.finditer(pattern, text):
                s = match.start()
                e = match.end()
                prefix = ' ' * (s)
                print(
                    ' {}{!r}{} '.format(prefix,
                                        text[s:e],
                                        ' ' * (len(text) - e)),
                    end=' ',
                )
                print(match.groups())
                if match.groupdict():
                    print('{}{}'.format(
                        ' ' * (len(text) - s),
                        match.groupdict()),
                    )
            print()
        return

Example:

    # python 3.6
    # 蔡軍生
    # http://blog.csdn.net/caimouse/article/details/51749579
    #
    from re_test_patterns_groups import test_patterns

    test_patterns(
        'abbaabbba',
        [(r'a((a*)(b*))', 'a followed by 0-n a and 0-n b')],
    )

The output is as follows:

    'a((a*)(b*))' (a followed by 0-n a and 0-n b)

     'abbaabbba'
     'abb'        ('bb', '', 'bb')
        'aabbb'   ('abbb', 'a', 'bbb')
             'a'  ('', '', '')
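The tuples in that output may be easier to read with the group numbers spelled out. The following is a minimal sketch using only the standard `re` module. Groups are numbered by the position of their opening parenthesis, so group 1 is the outer group and groups 2 and 3 are the ones nested inside it. The second pattern, `a((a+)?(b*))`, is an assumption added for contrast and does not appear in the original example.

```python
import re

# Group 1 is the outer (...); groups 2 and 3 are nested inside it.
pattern = re.compile(r'a((a*)(b*))')

rows = []
for match in pattern.finditer('abbaabbba'):
    # group(0) is the whole match; group(1)..group(3) are the sub-groups.
    rows.append((match.group(0), match.group(1), match.group(2), match.group(3)))

for row in rows:
    print(row)

# A group that matches zero characters yields '', while an optional
# group that takes no part in the match yields None.
optional = re.match(r'a((a+)?(b*))', 'abb')
print(optional.groups())
```

In the last match of the first pattern all three groups are empty strings, because `a*` and `b*` both succeed on zero characters. With the contrast pattern, the optional `(a+)?` group does not take part in the match at all, so it is reported as `None` rather than `''`.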

Summary

That covers finding nestable string groups with regular expressions in Python. I hope it is helpful.
