A Summary of Ways to Count English Word Occurrences in a Plain-Text File with Python [Tested]

2020-01-04 14:45:03
Source: reprinted
Contributor: site user

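This article walks through several examples of counting how often each English word occurs in a plain-text file using Python. The details are as follows: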

Version 1: inefficient

# -*- coding:utf-8 -*-
#!python3
path = 'test.txt'
with open(path, encoding='utf-8', newline='') as f:
    word = []
    words_dict = {}
    for letter in f.read():
        if letter.isalnum():
            word.append(letter)
        elif letter.isspace():  # whitespace: space, \t, \n
            if word:
                word = ''.join(word).lower()  # convert to lowercase
                if word not in words_dict:
                    words_dict[word] = 1
                else:
                    words_dict[word] += 1
                word = []
        # any other character (punctuation) is skipped, not treated as a boundary
# handle the last word
if word:
    word = ''.join(word).lower()  # convert to lowercase
    if word not in words_dict:
        words_dict[word] = 1
    else:
        words_dict[word] += 1
    word = []
for k, v in words_dict.items():
    print(k, v)

Output:

we 4
are 1
busy 1
all 1
day 1
like 1
swarms 1
of 6
flies 1
without 1
souls 1
noisy 1
restless 1
unable 1
to 1
hear 1
the 7
voices 1
soul 1
as 1
time 1
goes 1
by 1
childhood 1
away 2
grew 1
up 1
years 1
a 1
lot 1
memories 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence 1
regardless 1
shackles 1
mind 1
indulge 1
in 1
world 1
buckish 1
focus 1
on 1
beneficial 1
principle 1
lost 1
themselves 1

Version 2:

Drawback: the whole file is read into memory at once, so it performs poorly on large files.

# -*- coding:utf-8 -*-
#!python3
import re

path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    data = f.read()
    word_reg = re.compile(r'\w+')
    #word_reg = re.compile(r'\w+\b')
    word_list = word_reg.findall(data)
    word_list = [word.lower() for word in word_list]  # convert to lowercase
    word_set = set(word_list)  # avoid counting the same word more than once
    # words_dict = {}
    # for word in word_set:
    #     words_dict[word] = word_list.count(word)
    # concise form
    words_dict = {word: word_list.count(word) for word in word_set}
    for k, v in words_dict.items():
        print(k, v)

Output:

on 1
also 1
souls 1
focus 1
soul 1
time 1
noisy 1
grew 1
lot 1
childish 1
like 1
voices 1
indulge 1
swarms 1
buckish 1
restless 1
we 4
hear 1
childhood 1
as 1
world 1
themselves 1
are 1
bottom 1
memories 1
the 7
of 6
flies 1
without 1
have 2
day 1
busy 1
to 1
eroded 1
regardless 1
unable 1
innocence 1
up 1
a 1
in 1
mind 1
goes 1
by 1
lost 1
principle 1
once 1
away 2
years 1
beneficial 1
all 1
shackles 1
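
Version 2 reads the whole file at once and then calls word_list.count once per distinct word, so the word list is rescanned many times. The following is a minimal sketch, not part of the original article, of how both costs might be avoided by walking the input line by line and updating a plain dict in a single pass. The names count_words, WORD_RE and sample are hypothetical and exist only for this illustration.

```python
import io
import re

WORD_RE = re.compile(r"\w+")  # one or more word characters


def count_words(lines):
    """Count lowercase words from an iterable of lines in a single pass."""
    counts = {}
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            # dict.get supplies 0 the first time a word is seen
            counts[word] = counts.get(word, 0) + 1
    return counts


# Hypothetical in-memory sample; an open file object can be passed the same way.
sample = io.StringIO("We are busy all day,\nwe grew up, years away\n")
result = count_words(sample)
print(result["we"])  # 2
```

Because count_words accepts any iterable of lines, it should also work with the file object returned by open(path, encoding='utf-8'), which yields one line at a time instead of loading the full text.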

Version 3:

# -*- coding:utf-8 -*-
#!python3
import re

path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    word_list = []
    word_reg = re.compile(r'\w+')
    for line in f:
        #line_words = word_reg.findall(line)
        # simpler than the regex above, but keeps case and attached punctuation
        line_words = line.split()
        word_list.extend(line_words)
    word_set = set(word_list)  # avoid counting the same word more than once
    words_dict = {word: word_list.count(word) for word in word_set}
    for k, v in words_dict.items():
        print(k, v)

Output:

childhood 1
innocence, 1
are 1
of 6
also 1
lost 1
We 1
regardless 1
noisy, 1
by, 1
on 1
themselves. 1
grew 1
lot 1
bottom 1
buckish, 1
time 1
childish 1
voices 1
once 1
restless, 1
shackles 1
world 1
eroded 1
As 1
all 1
day, 1
swarms 1
we 3
soul. 1
memories, 1
in 1
without 1
like 1
beneficial 1
up, 1
unable 1
away 1
flies 1
goes 1
a 1
have 2
away, 1
mind, 1
focus 1
principle, 1
hear 1
to 1
the 7
years 1
busy 1
souls, 1
indulge 1
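
In the output above, "We" and "we" are counted separately and tokens such as "day," keep their trailing punctuation, because str.split only splits on whitespace. One possible way to keep the split-based approach while merging these variants is sketched below. It is an assumption-laden illustration rather than the article's method: normalize and count_split_words are hypothetical helper names, and only ASCII punctuation from string.punctuation is stripped.

```python
import string


def normalize(token):
    """Strip leading/trailing ASCII punctuation and lowercase the token."""
    return token.strip(string.punctuation).lower()


def count_split_words(lines):
    """Count whitespace-separated tokens after normalizing each one."""
    counts = {}
    for line in lines:
        for token in line.split():
            word = normalize(token)
            if word:  # skip tokens that consisted of punctuation only
                counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_split_words(["We are busy all day, we grew up."])
print(counts["we"], counts["day"])  # 2 1
```

Note that str.strip only removes punctuation at the ends of a token, so an inner apostrophe or hyphen (as in "don't") is left in place.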

Version 4: counting with Counter

# -*- coding:utf-8 -*-
#!python3
import collections

path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    word_list = []
    for line in f:
        line_words = line.split()  # split on whitespace; case and punctuation are kept
        word_list.extend(line_words)
    words_dict = dict(collections.Counter(word_list))  # count with Counter
    for k, v in words_dict.items():
        print(k, v)

Output:

We 1
are 1
busy 1
all 1
day, 1
like 1
swarms 1
of 6
flies 1
without 1
souls, 1
noisy, 1
restless, 1
unable 1
to 1
hear 1
the 7
voices 1
soul. 1
As 1
time 1
goes 1
by, 1
childhood 1
away, 1
we 3
grew 1
up, 1
years 1
away 1
a 1
lot 1
memories, 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence, 1
regardless 1
shackles 1
mind, 1
indulge 1
in 1
world 1
buckish, 1
focus 1
on 1
beneficial 1
principle, 1
lost 1
themselves. 1
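
Version 4 converts the Counter to a dict and prints entries in first-seen order. If a frequency ranking is wanted instead, Counter.most_common can return the entries sorted by count. The sketch below is an illustration under assumptions, not the article's code: top_words is a hypothetical helper, tokens are lowercased before counting, and the Counter is updated per line so the full word list is never stored.

```python
import collections


def top_words(lines, n=3):
    """Return the n most frequent lowercase tokens as (word, count) pairs."""
    counter = collections.Counter()
    for line in lines:
        # Counter.update adds one count for each token in the iterable
        counter.update(line.lower().split())
    return counter.most_common(n)


print(top_words(["the voices of the soul", "the bottom of the mind"], n=2))
# [('the', 4), ('of', 2)]
```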

Note: the test file test.txt used here contains the following text:

We are busy all day, like swarms of flies without souls, noisy, restless, unable to hear the voices of the soul. As time goes by, childhood away, we grew up, years away a lot of memories, once have also eroded the bottom of the childish innocence, we regardless of the shackles of mind, indulge in the world buckish, focus on the beneficial principle, we have lost themselves.

Hopefully this article is helpful for your Python programming.
