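This article shows, with examples, how to count how many times each English word occurs in a plain-text file using Python. It is shared here for reference; the details follow.
Version 1: inefficient, because it examines the text one character at a time in a Python loop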
```python
#!python3
# -*- coding:utf-8 -*-
path = 'test.txt'
with open(path, encoding='utf-8', newline='') as f:
    word = []
    words_dict = {}
    for letter in f.read():
        if letter.isalnum():
            word.append(letter)
        elif letter.isspace():  # whitespace characters: space, \t, \n
            if word:
                word = ''.join(word).lower()  # convert to lowercase
                if word not in words_dict:
                    words_dict[word] = 1
                else:
                    words_dict[word] += 1
                word = []
# handle the last word (the file may not end with whitespace)
if word:
    word = ''.join(word).lower()  # convert to lowercase
    if word not in words_dict:
        words_dict[word] = 1
    else:
        words_dict[word] += 1
    word = []
for k, v in words_dict.items():
    print(k, v)
```

Output:
we 4
are 1
busy 1
all 1
day 1
like 1
swarms 1
of 6
flies 1
without 1
souls 1
noisy 1
restless 1
unable 1
to 1
hear 1
the 7
voices 1
soul 1
as 1
time 1
goes 1
by 1
childhood 1
away 2
grew 1
up 1
years 1
a 1
lot 1
memories 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence 1
regardless 1
shackles 1
mind 1
indulge 1
in 1
world 1
buckish 1
focus 1
on 1
beneficial 1
principle 1
lost 1
themselves 1
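To see exactly what version 1 treats as a word, its scanning loop can be wrapped in a function that works on a string instead of a file. This is a sketch: the name count_words_by_scanning is hypothetical, and it uses collections.defaultdict in place of the if/else insertion. It also makes one side effect of the rule visible: punctuation is skipped without ending the current word, so "don't" is counted as "dont" and "well-known" as "wellknown".

```python
from collections import defaultdict


def count_words_by_scanning(text):
    """Count words using version 1's rule: alphanumeric characters build a word,
    whitespace ends it, and every other character is silently skipped."""
    counts = defaultdict(int)
    word = []
    for ch in text:
        if ch.isalnum():
            word.append(ch)
        elif ch.isspace():
            if word:
                counts[''.join(word).lower()] += 1
                word = []
        # any other character (punctuation) is dropped WITHOUT ending the word
    if word:  # the text may not end with whitespace
        counts[''.join(word).lower()] += 1
    return dict(counts)


print(count_words_by_scanning("Don't stop, we are well-known."))
# {'dont': 1, 'stop': 1, 'we': 1, 'are': 1, 'wellknown': 1}
```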
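Version 2:
Drawback: the whole file is read into memory at once, which performs poorly on large files. In addition, calling word_list.count() once for every distinct word rescans the entire list each time, so the counting step takes time proportional to (number of words) × (number of distinct words).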
```python
#!python3
# -*- coding:utf-8 -*-
import re

path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    data = f.read()
    word_reg = re.compile(r'\w+')  # \w+ matches a run of word characters
    # word_reg = re.compile(r'\w+\b')
    word_list = word_reg.findall(data)
    word_list = [word.lower() for word in word_list]  # convert to lowercase
    word_set = set(word_list)  # count each distinct word only once
    # words_dict = {}
    # for word in word_set:
    #     words_dict[word] = word_list.count(word)
    # the same thing as a dict comprehension
    words_dict = {word: word_list.count(word) for word in word_set}
    for k, v in words_dict.items():
        print(k, v)
```

Output (the order differs from version 1 because the dictionary is built by iterating over a set):
on 1
also 1
souls 1
focus 1
soul 1
time 1
noisy 1
grew 1
lot 1
childish 1
like 1
voices 1
indulge 1
swarms 1
buckish 1
restless 1
we 4
hear 1
childhood 1
as 1
world 1
themselves 1
are 1
bottom 1
memories 1
the 7
of 6
flies 1
without 1
have 2
day 1
busy 1
to 1
eroded 1
regardless 1
unable 1
innocence 1
up 1
a 1
in 1
mind 1
goes 1
by 1
lost 1
principle 1
once 1
away 2
years 1
beneficial 1
all 1
shackles 1
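One way to address the drawbacks noted for version 2 is to process the file one line at a time and update a collections.Counter as each line is read. This avoids holding the whole text in memory and avoids calling list.count() once per distinct word. A minimal sketch follows; the function name count_words is hypothetical, and it assumes the file has no extremely long lines, since each line is still read whole.

```python
import re
from collections import Counter

WORD_RE = re.compile(r'\w+')  # same pattern as version 2


def count_words(lines):
    """Count lowercase words from an iterable of lines, such as an open file.

    Only one line is processed at a time, and the Counter is updated in a
    single pass instead of calling list.count() once per distinct word.
    """
    counts = Counter()
    for line in lines:
        # Counter.update() with an iterable adds 1 for each element
        counts.update(word.lower() for word in WORD_RE.findall(line))
    return counts


sample = ["We are busy all day,\n", "we have lost themselves.\n"]
print(count_words(sample))
```

To use it with the article's file: `with open('test.txt', encoding='utf-8') as f: words_dict = count_words(f)`.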
Version 3: reading the file line by line
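```python
#!python3
# -*- coding:utf-8 -*-
import re

path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    word_list = []
    word_reg = re.compile(r'\w+')
    for line in f:
        # line_words = word_reg.findall(line)
        # str.split() is simpler than the regex above, but it keeps
        # punctuation and letter case (see the output)
        line_words = line.split()
        word_list.extend(line_words)
    word_set = set(word_list)  # count each distinct word only once
    words_dict = {word: word_list.count(word) for word in word_set}
    for k, v in words_dict.items():
        print(k, v)
```

Because str.split() only splits on whitespace and the words are not lowercased, punctuation stays attached to words ("day,", "soul.") and "We" and "we" are counted separately, as the output shows.

Output: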
childhood 1
innocence, 1
are 1
of 6
also 1
lost 1
We 1
regardless 1
noisy, 1
by, 1
on 1
themselves. 1
grew 1
lot 1
bottom 1
buckish, 1
time 1
childish 1
voices 1
once 1
restless, 1
shackles 1
world 1
eroded 1
As 1
all 1
day, 1
swarms 1
we 3
soul. 1
memories, 1
in 1
without 1
like 1
beneficial 1
up, 1
unable 1
away 1
flies 1
goes 1
a 1
have 2
away, 1
mind, 1
focus 1
principle, 1
hear 1
to 1
the 7
years 1
busy 1
souls, 1
indulge 1
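Version 3's output counts "We" and "we", or "away" and "away,", as different words. If the simplicity of str.split() is preferred over a regex, each token can be normalized afterwards. In this sketch the helper name normalize_tokens is hypothetical; it strips leading and trailing characters from string.punctuation and lowercases the result. Unlike \w+, it keeps an apostrophe inside a word such as "don't". Note that string.punctuation only covers ASCII punctuation.

```python
import string


def normalize_tokens(line):
    """Split on whitespace like version 3, then strip leading/trailing ASCII
    punctuation and lowercase each token. Tokens that were pure punctuation
    become empty strings and are dropped."""
    tokens = (token.strip(string.punctuation).lower() for token in line.split())
    return [token for token in tokens if token]


print(normalize_tokens("As time goes by, childhood away, we grew up,"))
# ['as', 'time', 'goes', 'by', 'childhood', 'away', 'we', 'grew', 'up']
```

In version 3 or 4, `word_list.extend(normalize_tokens(line))` could take the place of the two lines that call line.split().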
Version 4: counting with Counter
```python
#!python3
# -*- coding:utf-8 -*-
import collections

path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    word_list = []
    for line in f:
        line_words = line.split()
        word_list.extend(line_words)
    words_dict = dict(collections.Counter(word_list))  # count with Counter
    for k, v in words_dict.items():
        print(k, v)
```
Output (the same tokens as version 3, listed in order of first appearance):
We 1
are 1
busy 1
all 1
day, 1
like 1
swarms 1
of 6
flies 1
without 1
souls, 1
noisy, 1
restless, 1
unable 1
to 1
hear 1
the 7
voices 1
soul. 1
As 1
time 1
goes 1
by, 1
childhood 1
away, 1
we 3
grew 1
up, 1
years 1
away 1
a 1
lot 1
memories, 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence, 1
regardless 1
shackles 1
mind, 1
indulge 1
in 1
world 1
buckish, 1
focus 1
on 1
beneficial 1
principle, 1
lost 1
themselves. 1
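Version 4 converts the Counter into a plain dict, which gives up Counter's extra methods. Keeping the Counter makes it easy, for example, to list the most frequent words with most_common(). A small sketch on a made-up word list (not the article's test file):

```python
from collections import Counter

# A made-up word list standing in for word_list from version 4
words = "the of the we of the the".split()
counts = Counter(words)

# most_common(n) returns the n most frequent (word, count) pairs, highest count first
for word, n in counts.most_common(2):
    print(word, n)
# the 4
# of 2

# Unlike a plain dict, a Counter returns 0 for a missing word instead of raising KeyError
print(counts['missing'])  # 0
```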
Note: the test file test.txt used here contains the following text:
We are busy all day, like swarms of flies without souls, noisy, restless, unable to hear the voices of the soul. As time goes by, childhood away, we grew up, years away a lot of memories, once have also eroded the bottom of the childish innocence, we regardless of the shackles of mind, indulge in the world buckish, focus on the beneficial principle, we have lost themselves.
I hope this article is helpful for your Python programming.