
Python examples of forward maximum matching and backward maximum matching word segmentation

2020-02-15 23:40:55
Contributed by: a site user

Forward maximum matching

# -*- coding: utf-8 -*-

CODEC = 'utf-8'


def u(s, encoding):
    'Convert a byte string in the given encoding to a unicode string.'
    if isinstance(s, str):
        return s
    else:
        return s.decode(encoding)


def fwd_mm_seg(wordDict, maxLen, text):
    'Forward maximum matching segmentation.'
    wordList = []
    segStr = text
    segStrLen = len(segStr)
    # Debug output: list every dictionary entry.
    for word in wordDict:
        print('word: ', word)
    print("\n")
    while segStrLen > 0:
        # Start with the longest candidate allowed at the head of the string.
        if segStrLen > maxLen:
            wordLen = maxLen
        else:
            wordLen = segStrLen
        subStr = segStr[0:wordLen]
        print("subStr: ", subStr)
        # Drop the last character until the candidate is a dictionary word
        # or only a single character is left.
        while wordLen > 1:
            if subStr in wordDict:
                print("subStr1: %r" % subStr)
                break
            else:
                print("subStr2: %r" % subStr)
                wordLen = wordLen - 1
                subStr = subStr[0:wordLen]
        wordList.append(subStr)
        # Continue with the text after the matched word.
        segStr = segStr[wordLen:]
        segStrLen = segStrLen - wordLen
    for wordstr in wordList:
        print("wordstr: ", wordstr)
    return wordList


def main():
    wordDict = {}
    # words.dic holds one word per line, encoded as UTF-8.
    with open('words.dic', 'rb') as fp_dict:
        for eachWord in fp_dict:
            wordDict[u(eachWord.strip(), CODEC)] = 1
    segStr = '你好世界hello world'
    print(segStr)
    wordList = fwd_mm_seg(wordDict, 10, segStr)
    print("==".join(wordList))


if __name__ == '__main__':
    main()
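The listing above reads its dictionary from words.dic, which is not shown in the article. To try the idea without that file, here is a minimal sketch of the same greedy left-to-right scan, assuming a small in-memory dictionary. The function name forward_max_match and the four sample words are made up for illustration and are not part of the original code.

```python
def forward_max_match(text, words, max_len):
    """Greedy left-to-right segmentation: at each position take the
    longest dictionary word, falling back to a single character."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in words:
                tokens.append(candidate)
                pos += size
                break
    return tokens


# Hypothetical dictionary standing in for words.dic.
words = {"你好", "世界", "hello", "world"}
print("==".join(forward_max_match("你好世界hello world", words, 5)))
```

With this sample dictionary the call prints 你好==世界==hello== ==world. The space is not in the dictionary, so it comes out as a one-character token, which matches the single-character fallback in fwd_mm_seg.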

Backward maximum matching

# -*- coding: utf-8 -*-

CODEC = 'utf-8'


def u(s, encoding):
    'Convert a byte string in the given encoding to a unicode string.'
    if isinstance(s, str):
        return s
    else:
        return s.decode(encoding)


def bwd_mm_seg(wordDict, maxLen, text):
    'Backward maximum matching segmentation.'
    wordList = []
    segStr = text
    segStrLen = len(segStr)
    # Debug output: list every dictionary entry.
    for word in wordDict:
        print('word: ', word)
    print("\n")
    while segStrLen > 0:
        # Start with the longest candidate allowed at the tail of the string.
        if segStrLen > maxLen:
            wordLen = maxLen
        else:
            wordLen = segStrLen
        subStr = segStr[-wordLen:]
        print("subStr: ", subStr)
        # Drop the first character until the candidate is a dictionary word
        # or only a single character is left.
        while wordLen > 1:
            if subStr in wordDict:
                print("subStr1: %r" % subStr)
                break
            else:
                print("subStr2: %r" % subStr)
                wordLen = wordLen - 1
                subStr = subStr[-wordLen:]
        wordList.append(subStr)
        # Continue with the text before the matched word.
        segStr = segStr[0:-wordLen]
        segStrLen = segStrLen - wordLen
    # Words were collected from the end of the text, so restore reading order.
    wordList.reverse()
    for wordstr in wordList:
        print("wordstr: ", wordstr)
    return wordList


def main():
    wordDict = {}
    # words.dic holds one word per line, encoded as UTF-8.
    with open('words.dic', 'rb') as fp_dict:
        for eachWord in fp_dict:
            wordDict[u(eachWord.strip(), CODEC)] = 1
    segStr = '你好世界hello world'
    print(segStr)
    wordList = bwd_mm_seg(wordDict, 10, segStr)
    print("==".join(wordList))


if __name__ == '__main__':
    main()
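The two scan directions can give different results on ambiguous text. The sketch below shows this with compact versions of both scans and a small dictionary. The function names, the sample sentence and the dictionary are illustrative assumptions and do not come from the original article or from words.dic.

```python
def forward_max_match(text, words, max_len):
    """Scan from the start, taking the longest dictionary word each time."""
    tokens, pos = [], 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            # A single character is the fallback when nothing matches.
            if size == 1 or text[pos:pos + size] in words:
                tokens.append(text[pos:pos + size])
                pos += size
                break
    return tokens


def backward_max_match(text, words, max_len):
    """Scan from the end, taking the longest dictionary word each time."""
    tokens, end = [], len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            if size == 1 or text[end - size:end] in words:
                tokens.append(text[end - size:end])
                end -= size
                break
    # Tokens were collected right to left, so restore reading order.
    tokens.reverse()
    return tokens


# Hypothetical dictionary chosen to create an ambiguity.
words = {"研究", "研究生", "生命", "起源"}
text = "研究生命的起源"
print("/".join(forward_max_match(text, words, 3)))
print("/".join(backward_max_match(text, words, 3)))
```

With this dictionary the forward scan prints 研究生/命/的/起源, because it commits to the three-character word first, while the backward scan prints 研究/生命/的/起源. Which direction suits a given text depends on the dictionary and the input, so it is worth testing both on your own data.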

That is everything for these examples of forward and backward maximum matching word segmentation in Python. I hope they are a useful reference.
