Python example: reading a CSV file, dropping one column, and writing the result to a new file

2020-02-16 11:21:57

Source: reposted

Contributed by: a site user

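I solved this problem in two ways, both of them existing solutions found online.

Scenario:

There is a data file, stored as plain text, with three columns: user_id, plan_id, mobile_id. The goal is a new file that contains only mobile_id and plan_id.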

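Solutions

Approach 1: use Python's plain file reading and writing to go through the data once, processing each line inside a for loop and writing it to the new file.

The code is as follows: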

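    def readwrite1(input_file, output_file):
        # Open both files; the with block closes them even if an error occurs.
        with open(input_file, 'r') as f, open(output_file, 'w') as out:
            for line in f:
                # Column order per the scenario: user_id, plan_id, mobile_id.
                a = line.rstrip("\n").split(",")
                # Write mobile_id, plan_id and end the record with a newline.
                out.write(a[2] + "," + a[1] + "\n")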
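Splitting on "," works only while no field contains a comma of its own. As a hedged variant of approach 1, the sketch below does the same loop with the standard csv module, which understands quoted fields. The function name readwrite_csv is made up for illustration, and the column order (user_id, plan_id, mobile_id) is assumed from the scenario above.

```python
import csv


def readwrite_csv(input_file, output_file):
    """Copy mobile_id and plan_id from input_file to output_file."""
    # newline="" lets the csv module manage line endings itself.
    with open(input_file, "r", newline="") as src, \
         open(output_file, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, lineterminator="\n")
        for row in reader:
            if not row:
                continue  # skip blank lines
            # Assumed column order: user_id, plan_id, mobile_id.
            writer.writerow([row[2], row[1]])
```

The header row passes through the same loop, so the output starts with mobile_id,plan_id.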

Approach 2: use pandas to read the data into a DataFrame, select the wanted columns, and write them to the new file with the DataFrame's own writer.

The code is as follows:

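    import pandas as pd

    def readwrite2(input_file, output_file):
        # Read the whole file into a DataFrame; the first row is the header.
        data = pd.read_csv(input_file, header=0, sep=',')
        # Column names follow the scenario: keep mobile_id and plan_id, in that order.
        data[['mobile_id', 'plan_id']].to_csv(output_file, sep=',', header=True, index=False)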

Judging by the code alone, the pandas logic is clearer.

Now let's look at how fast each one runs.

    import time

    def getRunTimes(fun, input_file, output_file):
        # Time a single call of fun, in whole milliseconds.
        begin_time = int(round(time.time() * 1000))
        fun(input_file, output_file)
        end_time = int(round(time.time() * 1000))
        print("Read/write run time:", (end_time - begin_time), "ms")

    # input_file, output_file and output_file1 are file paths set elsewhere.
    getRunTimes(readwrite1, input_file, output_file)   # plain loop over the lines
    getRunTimes(readwrite2, input_file, output_file1)  # DataFrame read and write

Read/write run time: 976 ms

Read/write run time: 777 ms
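A single run per function can be noisy, and the function that runs second may benefit from the operating system having already cached the input file. As a rough sketch of a steadier measurement (not the method used for the numbers above), the helper below repeats each call and keeps the fastest time. The name best_run_time_ms and the repeats parameter are assumptions made for illustration.

```python
import time


def best_run_time_ms(fun, input_file, output_file, repeats=3):
    """Return the fastest of `repeats` runs of fun, in milliseconds."""
    timings = []
    for _ in range(repeats):
        # perf_counter is a monotonic clock intended for measuring intervals.
        start = time.perf_counter()
        fun(input_file, output_file)
        timings.append((time.perf_counter() - start) * 1000)
    return min(timings)
```

It takes the same arguments as getRunTimes, for example best_run_time_ms(readwrite1, input_file, output_file), and returns the value instead of printing it.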

input_file holds roughly 270,000 records. At that size the DataFrame approach is somewhat faster than the for loop (777 ms versus 976 ms). Would the gap become more pronounced with more data?
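One way to explore that question is to generate input files of different sizes. The sketch below writes a synthetic file with the three columns from the scenario. The function name make_test_csv and the value ranges are invented for illustration, and real data may behave differently.

```python
import random


def make_test_csv(path, rows, seed=0):
    """Write a header plus `rows` synthetic records to path."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    with open(path, "w") as out:
        out.write("user_id,plan_id,mobile_id\n")
        for user_id in range(rows):
            plan_id = rng.randint(1, 50)                # made-up value range
            mobile_id = rng.randint(10**9, 10**10 - 1)  # made-up value range
            out.write(f"{user_id},{plan_id},{mobile_id}\n")
```

Calling it with increasing values of rows gives a series of files on which both functions can be timed.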

Next, I increased the number of records in input_file, with the following results: