Designing a Code Statistics Tool in Python

2020-01-04 15:32:40

Source: reprinted

Contributor: a site user

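Problem

Design a program that counts the lines of code in a project: the number of files, code lines, comment lines, and blank lines. Make it as flexible as possible, so that passing different arguments lets it count projects written in different languages, for example:

# --type specifies the file type
python counter.py --type python

Output: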

files:10
code_lines:200
comments:100
blanks:20

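Analysis

This design problem looks simple but is a little tricky to implement. We can shrink it: once we can count the lines of a single file correctly, counting a whole directory is no problem. The hardest part is multi-line comments. Taking Python as an example, comment lines come in the following forms (strictly speaking, triple-quoted strings are string literals, usually docstrings, rather than comments, but this tool counts them as comments):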

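1. Single-line comments starting with a hash (#)

# single-line comment

2. Opening and closing multi-line comment delimiters on the same line

"""This is a multi-line comment"""
'''This is also a multi-line comment'''

3. Multi-line comment delimiters on separate lines

"""
All 3 of these lines are comments
"""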

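Our approach is to parse the file line by line. Multi-line comments need an extra flag, in_multi_comment, that records whether the current line is inside a multi-line comment. It defaults to False, is set to True when a multi-line comment opens, and is set back to False at the next delimiter. Every line from the opening delimiter through the next closing delimiter counts as a comment line.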

Key points

How to read a file correctly, and the common string methods to use once each line is handled as a string.
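As a small illustration of these two points, here is a sketch, assuming the source files are UTF-8. The names `read_lines` and `classify` are illustrative, not part of the article's code. `errors="replace"` turns undecodable bytes into U+FFFD instead of raising UnicodeDecodeError, which matters once a whole directory (images, files in other encodings) is walked.

```python
def read_lines(path):
    """Yield each line of a text file with surrounding whitespace removed."""
    # Assumption: sources are UTF-8. errors="replace" turns undecodable bytes
    # into U+FFFD instead of raising UnicodeDecodeError in the middle of a count.
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:  # iterate lazily rather than calling f.readlines()
            yield line.strip()  # drop indentation, trailing spaces and "\n"


def classify(line):
    """Label one stripped Python line using only str methods."""
    if not line:
        return "blank"
    if line.startswith("#"):
        return "comment"
    if line.startswith(('"""', "'''")):  # startswith also accepts a tuple
        return "delimiter"
    return "code"
```

Passing a tuple to `str.startswith` (or `str.endswith`) checks several prefixes at once, which keeps the comment conditions below short.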

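Simplified version

We'll build it up step by step. First comes a simplified program that counts a single Python file and ignores multi-line comments; anyone who has started learning Python can write it. The key step is that after reading each line, we first call strip() to remove the spaces and newline from both ends of the string, and then classify the line.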

# -*- coding: utf-8 -*-
"""Counts .py files that contain only single-line comments"""


def parse(path):
    comments = 0
    blanks = 0
    codes = 0
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()  # remove surrounding spaces and the newline
            if line == "":
                blanks += 1
            elif line.startswith("#"):
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


if __name__ == '__main__':
    print(parse("xxx.py"))

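Multi-line comment version

A counter that only handles single-line comments is of little use; only once it handles multi-line comments is it a real code counter.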

# -*- coding: utf-8 -*-
"""
Counts .py files that may contain multi-line comments
"""


def parse(path):
    in_multi_comment = False  # flag: are we inside a multi-line comment?
    comments = 0
    blanks = 0
    codes = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # blank lines inside a multi-line comment count as comments
            if line == "" and not in_multi_comment:
                blanks += 1
            # There are 4 kinds of comment lines:
            # 1. single-line comments starting with #
            # 2. opening and closing delimiters on the same line
            # 3. lines between multi-line delimiters
            elif (line.startswith("#")
                  or (line.startswith('"""') and line.endswith('"""') and len(line) > 3)
                  or (line.startswith("'''") and line.endswith("'''") and len(line) > 3)
                  or (in_multi_comment and not (line.startswith('"""') or line.startswith("'''")))):
                comments += 1
            # 4. the opening and closing lines of a multi-line comment
            elif line.startswith('"""') or line.startswith("'''"):
                in_multi_comment = not in_multi_comment
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


if __name__ == '__main__':
    print(parse("xxx.py"))

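In case 4 above, the key operation when a multi-line comment delimiter is met is to invert the in_multi_comment flag rather than simply set it to False or True. The first """ sets it to True; the second """ is the closing delimiter, so inverting sets it back to False; the third starts a new comment and inverts it to True again, and so on.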
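One gap in this toggle logic: a closing delimiter is only recognized when it starts a line. If a comment ends on a text line such as `more text"""`, in_multi_comment stays True and every following line is counted as a comment. The sketch below is one possible alternative for the same four cases; `parse_lines` is a hypothetical name, and it takes any iterable of lines so it can be tried without files. Instead of toggling, it sets the flag explicitly: True on an opening line, False on any line inside the comment that ends with a delimiter.

```python
DELIMS = ('"""', "'''")


# Hypothetical variant of parse(): same four cases, but a line inside a
# multi-line comment that ENDS with a delimiter also closes the comment.
def parse_lines(lines):
    in_multi_comment = False
    comments = blanks = codes = 0
    for line in lines:
        line = line.strip()
        if line == "" and not in_multi_comment:
            blanks += 1
        elif in_multi_comment:
            # every line inside the comment (blank ones too) is a comment;
            # a line ending with a delimiter (including a bare one) closes it
            comments += 1
            if line.endswith(DELIMS):
                in_multi_comment = False
        elif line.startswith("#"):
            comments += 1
        elif line.startswith(DELIMS) and line.endswith(DELIMS) and len(line) > 3:
            comments += 1  # opening and closing delimiter on the same line
        elif line.startswith(DELIMS):
            comments += 1  # first line of a multi-line comment
            in_multi_comment = True
        else:
            codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}
```

For example, `parse_lines(['"""Start', 'more text"""', 'x = 1', '', '# c'])` returns `{"comments": 3, "blanks": 1, "codes": 1}`, whereas the toggle logic above counts all five of those lines as comments. Like the original, it still does not track a delimiter in the middle of a line, such as `x = """text`.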

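So does each additional language need its own parsing function? If you look closely, the four comment cases can be abstracted into four conditions, because most languages have single-line and multi-line comments; only the symbols differ.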

CONF = {
    "py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
    "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"},
}


def parse(path, extension="py"):
    conf = CONF[extension]
    start_comment = conf["start_comment"]
    end_comment = conf["end_comment"]
    in_multi_comment = False
    comments, blanks, codes = 0, 0, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # cond2: a start delimiter and its matching end delimiter on one line
            cond2 = any(line.startswith(start) and line.endswith(end) and len(line) > len(start)
                        for start, end in zip(start_comment, end_comment))
            # cond3: the line begins with an end delimiter
            cond3 = any(line.startswith(end) for end in end_comment)
            # cond4: the line begins with a start or end delimiter
            cond4 = any(line.startswith(item) for item in start_comment + end_comment)
            if line == "" and not in_multi_comment:
                blanks += 1
            # There are 4 kinds of comment lines:
            # 1. single-line comments (# in Python, // in Java)
            # 2. start and end delimiters on the same line
            # 3. lines between multi-line delimiters
            elif line.startswith(conf["single"]) or cond2 or (in_multi_comment and not cond3):
                comments += 1
            # 4. the start and end lines of a comment spread over several lines
            elif cond4:
                in_multi_comment = not in_multi_comment
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}

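All it takes is one configuration constant that records every language's single-line and multi-line comment symbols, mapped onto conditions cond1 (the single-line check) through cond4. The remaining task is to parse multiple files, which the os.walk function handles.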

import os


def counter(path):
    """
    Count a directory or a single file.
    :param path: path of a directory or a file
    :return: dict with the comment, blank and code line totals
    """
    if os.path.isdir(path):
        comments, blanks, codes = 0, 0, 0
        list_dirs = os.walk(path)
        for root, dirs, files in list_dirs:
            for f in files:
                file_path = os.path.join(root, f)
                stats = parse(file_path)
                comments += stats.get("comments")
                blanks += stats.get("blanks")
                codes += stats.get("codes")
        return {"comments": comments, "blanks": blanks, "codes": codes}
    else:
        return parse(path)
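counter() hands every file it finds to parse(), including images and files in other languages (which can raise UnicodeDecodeError), and it does not report the files figure the problem statement asks for. A minimal sketch of a walker that filters by extension and counts files; `count_project` and its parameters are hypothetical names, and `parse_file` is assumed to be any function returning the same three-key dict as parse():

```python
import os


def count_project(path, extensions, parse_file):
    """Total the stats of files under `path` whose extension is in `extensions`.

    `parse_file` is any function returning a dict with "comments", "blanks"
    and "codes" keys, such as the parse() functions above.
    """
    if not os.path.isdir(path):
        # a single file is counted directly, as counter() does
        return dict(parse_file(path), files=1)
    totals = {"files": 0, "comments": 0, "blanks": 0, "codes": 0}
    for root, dirs, files in os.walk(path):
        for name in files:
            ext = os.path.splitext(name)[1].lstrip(".")  # "b.py" -> "py"
            if ext not in extensions:
                continue  # skip other languages, images, logs, ...
            stats = parse_file(os.path.join(root, name))
            totals["files"] += 1
            for key in ("comments", "blanks", "codes"):
                totals[key] += stats[key]
    return totals
```

Usage would look like `count_project("my_project", {"py"}, parse)`, where `my_project` is a placeholder directory.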

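Of course, there is still plenty of work left to make this program complete, including command-line parsing and counting only the language named by an argument.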
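The command-line part could be sketched with the standard library's argparse. Everything here is an assumption for illustration: the allowed `--type` values, the `TYPE_TO_EXT` mapping onto the CONF keys, the optional `path` argument, and `format_report`, whose layout follows the expected output in the problem statement.

```python
import argparse

# Hypothetical mapping from --type values to the CONF keys ("py", "java").
TYPE_TO_EXT = {"python": "py", "java": "java"}


def build_parser():
    parser = argparse.ArgumentParser(
        description="Count files, code lines, comment lines and blank lines.")
    parser.add_argument("path", nargs="?", default=".",
                        help="file or directory to count (default: current directory)")
    parser.add_argument("--type", choices=sorted(TYPE_TO_EXT), default="python",
                        help="language of the files to count")
    return parser


def format_report(stats):
    """Render totals in the files/code_lines/comments/blanks layout."""
    return "\n".join([
        "files:%d" % stats["files"],
        "code_lines:%d" % stats["codes"],
        "comments:%d" % stats["comments"],
        "blanks:%d" % stats["blanks"],
    ])
```

After `args = build_parser().parse_args()`, `TYPE_TO_EXT[args.type]` gives the extension to pass to a directory counter, and that counter's totals can be printed with `format_report`.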

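Supplement:

Implementing a line-count tool in Python

We often want to count the lines of code in a project, but a reasonably full-featured counter is not that simple to write. Let's look at how to implement a line-counting tool in Python.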

Approach:

First collect all the files, then count the lines in each file, and finally add the line counts together.

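Features:

Count the lines of each file;
Count the total number of lines;
Measure the running time;
Support specifying which file types to count, excluding file types you don't want to count;
Recursively count the files in a folder, including those in its subfolders;
Exclude blank lines.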

# coding=utf-8
import os
import time

basedir = '/root/script'
filelists = []
# file types to count
whitelist = ['php', 'py']


# walk the files; os.walk already descends into every subfolder
def getFile(basedir):
    global filelists
    for parent, dirnames, filenames in os.walk(basedir):
        # for dirname in dirnames:
        #     getFile(os.path.join(parent, dirname))  # recursion, not needed with os.walk
        for filename in filenames:
            ext = filename.split('.')[-1]
            # count only the listed file types, skipping log and cache files
            if ext in whitelist:
                filelists.append(os.path.join(parent, filename))


# count the lines of one file
def countLine(fname):
    count = 0
    with open(fname, encoding='utf-8', errors='replace') as f:
        for file_line in f:
            if file_line != '' and file_line != '\n':  # skip blank lines
                count += 1
    print(fname + '----', count)
    return count


if __name__ == '__main__':
    startTime = time.perf_counter()
    getFile(basedir)
    totalline = 0
    for filelist in filelists:
        totalline = totalline + countLine(filelist)
    print('total lines:', totalline)
    print('Done! Cost Time: %0.2f second' % (time.perf_counter() - startTime))

Result:

[root@pythontab script]# python countCodeLine.py
/root/script/test/gametest.php---- 16
/root/script/smtp.php---- 284
/root/script/gametest.php---- 16
/root/script/countCodeLine.py---- 33
/root/script/sendmail.php---- 17
/root/script/test/gametest.php---- 16
total lines: 382
Done! Cost Time: 0.00 second
[root@pythontab script]#

Only PHP and Python files are counted, which is very convenient.
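The feature list mentions excluding file types, but the script above only has a whitelist, and its blank-line test does not strip the line, so lines containing only spaces are still counted. A minimal Python 3 sketch addressing both; `count_lines`, `include` and `exclude` are hypothetical names, with extensions given without the dot and `include=None` meaning "every type":

```python
import os


def count_lines(basedir, include=("php", "py"), exclude=()):
    """Count non-blank lines per file under basedir; return (per_file, total)."""
    per_file = {}
    for parent, dirnames, filenames in os.walk(basedir):  # os.walk recurses itself
        for filename in filenames:
            ext = os.path.splitext(filename)[1].lstrip(".")
            if include is not None and ext not in include:
                continue  # not a wanted type
            if ext in exclude:
                continue  # explicitly excluded type, e.g. "log"
            path = os.path.join(parent, filename)
            with open(path, encoding="utf-8", errors="replace") as f:
                # strip() makes whitespace-only lines count as blank too
                per_file[path] = sum(1 for line in f if line.strip())
    return per_file, sum(per_file.values())
```

For instance, `count_lines("/root/script", include=None, exclude={"log"})` would count every file type except `.log` files.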

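Summary

That concludes this introduction to designing a code statistics tool in Python. I hope it helps; if you have any questions, please leave a message and I will reply promptly. Many thanks for your support of the VEVB武林網 website!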
