

Python: Methods for Counting English Word Occurrences in a Plain-Text File [Tested and Working]

2020-02-15 22:31:29
Source: reposted
Contributor: site user

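This article shows, by example, how to count the occurrences of English words in a plain-text file with Python. It is shared here for reference; the details follow: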

Version 1: inefficient

# -*- coding:utf-8 -*-
#!python3
path = 'test.txt'
with open(path, encoding='utf-8', newline='') as f:
    word = []
    words_dict = {}
    for letter in f.read():
        if letter.isalnum():
            word.append(letter)
        elif letter.isspace():  # whitespace: space, \t, \n
            if word:
                word = ''.join(word).lower()  # convert to lowercase
                if word not in words_dict:
                    words_dict[word] = 1
                else:
                    words_dict[word] += 1
                word = []
        # any other character (e.g. punctuation) is simply skipped

# handle the last word
if word:
    word = ''.join(word).lower()  # convert to lowercase
    if word not in words_dict:
        words_dict[word] = 1
    else:
        words_dict[word] += 1
    word = []

for k, v in words_dict.items():
    print(k, v)

Output:

we 4
are 1
busy 1
all 1
day 1
like 1
swarms 1
of 6
flies 1
without 1
souls 1
noisy 1
restless 1
unable 1
to 1
hear 1
the 7
voices 1
soul 1
as 1
time 1
goes 1
by 1
childhood 1
away 2
grew 1
up 1
years 1
a 1
lot 1
memories 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence 1
regardless 1
shackles 1
mind 1
indulge 1
in 1
world 1
buckish 1
focus 1
on 1
beneficial 1
principle 1
lost 1
themselves 1

Version 2:

Drawback: the whole file is read into memory at once, so performance suffers on large files.

# -*- coding:utf-8 -*-
#!python3
import re

path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    data = f.read()
    word_reg = re.compile(r'\w+')
    # word_reg = re.compile(r'\w+\b')
    word_list = word_reg.findall(data)
    word_list = [word.lower() for word in word_list]  # convert to lowercase
    word_set = set(word_list)  # avoid counting the same word more than once
    # words_dict = {}
    # for word in word_set:
    #     words_dict[word] = word_list.count(word)
    # concise form
    words_dict = {word: word_list.count(word) for word in word_set}
    for k, v in words_dict.items():
        print(k, v)

Output:

on 1
also 1
souls 1
focus 1
soul 1
time 1
noisy 1
grew 1
lot 1
childish 1
like 1
voices 1
indulge 1
swarms 1
buckish 1
restless 1
we 4
hear 1
childhood 1
as 1
world 1
themselves 1
are 1
bottom 1
memories 1
the 7
of 6
flies 1
without 1
have 2
day 1
busy 1
to 1
eroded 1
regardless 1
unable 1
innocence 1
up 1
a 1
in 1
mind 1
goes 1
by 1
lost 1
principle 1
once 1
away 2
years 1
beneficial 1
all 1
shackles 1
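
The drawback noted for Version 2 (the whole file is read into memory) can be avoided by processing the file one line at a time. The following is only a minimal sketch, not the original author's method: the helper name `count_words` and the sample lines are hypothetical. It uses `collections.Counter` from the standard library, so each word is tallied in a single pass instead of calling `list.count` once per distinct word.

```python
import re
from collections import Counter

WORD_RE = re.compile(r'\w+')  # same pattern as Version 2


def count_words(lines):
    """Count lowercased words from any iterable of text lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Only the current line is held in memory, never the whole file.
        counts.update(word.lower() for word in WORD_RE.findall(line))
    return counts


# Illustrative in-memory input; an open file object works the same way,
# because iterating over a file yields one line at a time.
sample = ["We are busy all day,", "like swarms of flies. We are"]
for word, n in count_words(sample).items():
    print(word, n)
```

To run this against the same input as the versions above, pass an open file instead of the list, for example `with open('test.txt', encoding='utf-8') as f: counts = count_words(f)`. Since `\w+` never matches across a newline, splitting the work by line should give the same counts as matching against the whole text.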
