A Basic Tutorial on Python's collections Module

2020-01-04 13:54:15


Contributed by: a reader

Preface

Previously we covered Python's basic data types and data structures. Now let's look at a more advanced one: collections. To see what a module is for and which classes it offers, just read its __init__.py:

'''This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.

* namedtuple   factory function for creating tuple subclasses with named fields
* deque        list-like container with fast appends and pops on either end
* ChainMap     dict-like class for creating a single view of multiple mappings
* Counter      dict subclass for counting hashable objects
* OrderedDict  dict subclass that remembers the order entries were added
* defaultdict  dict subclass that calls a factory function to supply missing values
* UserDict     wrapper around dictionary objects for easier dict subclassing
* UserList     wrapper around list objects for easier list subclassing
* UserString   wrapper around string objects for easier string subclassing

'''

__all__ = ['deque', 'defaultdict', 'namedtuple', 'UserDict', 'UserList',
            'UserString', 'Counter', 'OrderedDict', 'ChainMap']

The collections module implements specialized container datatypes that can serve as alternatives to Python's general-purpose built-in containers dict, list, set, and tuple. Put simply, it adds a higher-level layer on top of the basic data types.

1. deque

Purpose: a double-ended queue, a list-like container that can insert and remove elements at both the head and the tail in O(1) time.

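A double-ended queue can be operated on at both ends. Its difference from Python's built-in list is that inserting and removing at the head takes O(1) time. Here's an example to get a feel for it: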

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = 'liao gao xiang'
"""Keep the last n elements"""
from collections import deque


def search(file, pattern, history=5):
    # A generator function using yield: it decouples the search logic
    # from the code that consumes the search results
    previous_lines = deque(maxlen=history)  # holds at most `history` lines
    for l in file:
        if pattern in l:
            yield l, previous_lines
        previous_lines.append(l)


with open('file.txt', mode='r', encoding='utf-8') as f:
    for line, prevlines in search(f, 'Python', 5):
        for pline in prevlines:
            print(pline, end='')
        print(line, end='')

d = deque()
d.append(1)
d.append("2")
print(len(d))
print(d[0], d[1])
d.extendleft([0])
print(d)
d.extend([6, 7, 8])
print(d)

d2 = deque('12345')
print(len(d2))
d2.popleft()
print(d2)
d2.pop()
print(d2)

# Inserting or removing at either end of a deque is O(1); for a list,
# inserting or removing at the head is O(N)
d3 = deque(maxlen=2)
d3.append(1)
d3.append(2)
print(d3)
d3.append(3)  # maxlen reached: the leftmost element is discarded
print(d3)

The output is as follows (the first two lines are the result of searching file.txt for 'Python'):

人生苦短
我用Python
2
1 2
deque([0, 1, '2'])
deque([0, 1, '2', 6, 7, 8])
5
deque(['2', '3', '4', '5'])
deque(['2', '3', '4'])
deque([1, 2], maxlen=2)
deque([2, 3], maxlen=2)
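A typical workload that removes from the head over and over is a FIFO queue, for example the frontier of a breadth-first search. The sketch below is illustrative only: the function name bfs_order and the small adjacency dict graph are hypothetical, not part of collections. append() adds to the tail and popleft() takes from the head; the list equivalent, list.pop(0), would have to shift every remaining element.

```python
from collections import deque


def bfs_order(graph, start):
    """Return nodes in breadth-first order starting from `start`.

    `graph` is a hypothetical adjacency dict: node -> list of neighbours.
    The deque is the FIFO queue: append() adds at the tail and
    popleft() removes from the head, both in O(1).
    """
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()          # take the oldest node from the head
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)  # schedule it at the tail
    return order


graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D', 'E'], 'D': [], 'E': []}
print(bfs_order(graph, 'A'))  # ['A', 'B', 'C', 'D', 'E']
```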

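So if you often need to operate on the head of a sequence, deque is the best choice. As for the rest of deque's methods, the quickest way to learn them is to try each one yourself. Their outline is listed below (the bodies are placeholders; only the signatures and docstrings matter):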

class deque(object):
    """
    deque([iterable[, maxlen]]) --> deque object

    A list-like sequence optimized for data accesses near its endpoints.
    """
    def append(self, *args, **kwargs): # real signature unknown
        """ Add an element to the right side of the deque. """
        pass

    def appendleft(self, *args, **kwargs): # real signature unknown
        """ Add an element to the left side of the deque. """
        pass

    def clear(self, *args, **kwargs): # real signature unknown
        """ Remove all elements from the deque. """
        pass

    def copy(self, *args, **kwargs): # real signature unknown
        """ Return a shallow copy of a deque. """
        pass

    def count(self, value): # real signature unknown; restored from __doc__
        """ D.count(value) -> integer -- return number of occurrences of value """
        return 0

    def extend(self, *args, **kwargs): # real signature unknown
        """ Extend the right side of the deque with elements from the iterable """
        pass

    def extendleft(self, *args, **kwargs): # real signature unknown
        """ Extend the left side of the deque with elements from the iterable """
        pass

    def index(self, value, start=None, stop=None): # real signature unknown; restored from __doc__
        """
        D.index(value, [start, [stop]]) -> integer -- return first index of value.
        Raises ValueError if the value is not present.
        """
        return 0

    def insert(self, index, p_object): # real signature unknown; restored from __doc__
        """ D.insert(index, object) -- insert object before index """
        pass

    def pop(self, *args, **kwargs): # real signature unknown
        """ Remove and return the rightmost element. """
        pass

    def popleft(self, *args, **kwargs): # real signature unknown
        """ Remove and return the leftmost element. """
        pass

    def remove(self, value): # real signature unknown; restored from __doc__
        """ D.remove(value) -- remove first occurrence of value. """
        pass

    def reverse(self): # real signature unknown; restored from __doc__
        """ D.reverse() -- reverse *IN PLACE* """
        pass

    def rotate(self, *args, **kwargs): # real signature unknown
        """ Rotate the deque n steps to the right (default n=1). If n is negative, rotates left. """
        pass

Note that some methods modify the deque in place and return None. For example, reverse() reverses the deque, and rotate(1) shifts every element one position to the right, moving the tail element to the head.
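A quick way to see the in-place, return-None behaviour described above (the values are just example data):

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])

result = d.reverse()   # reverses d in place and returns None
print(result)          # None
print(d)               # deque([5, 4, 3, 2, 1])

d.rotate(1)            # the rightmost element moves to the left end
print(d)               # deque([1, 5, 4, 3, 2])

d.rotate(-2)           # a negative n rotates to the left
print(d)               # deque([4, 3, 2, 1, 5])
```

A common slip is writing d = d.reverse(), which rebinds d to None.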

2. defaultdict

Purpose: a dictionary with default values. Its parent class is Python's built-in dict.

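What's the advantage of a dictionary with default values? Take an example: creating a multi-valued mapping (a dict that maps each key to several values) is conceptually simple, but if you implement it yourself, initializing the values gets a little tedious. You might write something like this: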

d = {}
for key, value in pairs:
    if key not in d:
        d[key] = []
    d[key].append(value)

With defaultdict the code becomes more concise:

from collections import defaultdict

d = defaultdict(list)
for key, value in pairs:
    d[key].append(value)

A key feature of defaultdict is that it automatically initializes the value for a key the first time that key is accessed, so you only need to focus on adding elements. For example:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = 'liao gao xiang'
# A dictionary that maps keys to multiple values
from collections import defaultdict

d = defaultdict(list)
print(d)
d['a'].append([1, 2, 3])
d['b'].append(2)
d['c'].append(3)
print(d)

d = defaultdict(set)
print(d)
d['a'].add(1)
d['a'].add(2)
d['b'].add(4)
print(d)

The output is:

defaultdict(<class 'list'>, {})
defaultdict(<class 'list'>, {'a': [[1, 2, 3]], 'b': [2], 'c': [3]})
defaultdict(<class 'set'>, {})
defaultdict(<class 'set'>, {'a': {1, 2}, 'b': {4}})
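The automatic initialization only happens on a subscript lookup, d[key]; get() and the in operator do not create missing keys. A small sketch using defaultdict(int) as a counter (the string 'mississippi' is arbitrary example data):

```python
from collections import defaultdict

counts = defaultdict(int)  # int() returns 0, so every missing key starts at 0
for ch in 'mississippi':
    counts[ch] += 1
print(dict(counts))        # {'m': 1, 'i': 4, 's': 4, 'p': 2}

# Only d[key] calls the factory for a missing key
print(counts.get('z'))     # None: get() does not create the key
print('z' in counts)       # False
counts['z']                # this lookup creates 'z' with the value 0
print(counts['z'], 'z' in counts)  # 0 True
```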

3. namedtuple()

Purpose: creates tuples with named fields. A factory function.

namedtuple is mainly used to create data objects whose elements can be accessed by name. It is typically used to make code more readable, and it is especially handy when working with tuple data.

For example, suppose we have a data set in which each record is a tuple of three elements. With namedtuple we can easily turn these tuples into a data structure that is more readable and easier to use:

from collections import namedtuple

websites = [
    ('Sohu', 'http://www.sohu.com/', '張朝陽'),
    ('Sina', 'http://www.sina.com.cn/', '王志東'),
    ('163', 'http://www.163.com/', '丁磊')
]

Website = namedtuple('Website', ['name', 'url', 'founder'])

for website in websites:
    website = Website._make(website)  # build an instance from an existing sequence
    print(website)

# Output:
# Website(name='Sohu', url='http://www.sohu.com/', founder='張朝陽')
# Website(name='Sina', url='http://www.sina.com.cn/', founder='王志東')
# Website(name='163', url='http://www.163.com/', founder='丁磊')

Note that namedtuple is a function, not a class: it is a factory function that returns a new tuple subclass.
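Because the class returned by namedtuple subclasses tuple, its instances still support indexing and unpacking, and they are immutable. A short sketch reusing the Website type from the example above (the https URL is just a made-up replacement value):

```python
from collections import namedtuple

Website = namedtuple('Website', ['name', 'url', 'founder'])
site = Website('Sohu', 'http://www.sohu.com/', '張朝陽')

print(site.name)                # Sohu  (access by field name)
print(site[1])                  # http://www.sohu.com/  (indexing still works)
print(isinstance(site, tuple))  # True: the generated class subclasses tuple

name, url, founder = site       # unpacking works like a normal tuple

# Fields are read-only; _replace() returns a new instance instead
updated = site._replace(url='https://www.sohu.com/')
print(updated.url)              # https://www.sohu.com/
print(site._asdict())           # field name -> value mapping
```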

4. Counter

Purpose: counts hashable objects. Its parent class is Python's built-in dict.

A common task is finding the most frequently occurring elements in a sequence. Suppose you have a list of words and want to find out which words occur most often:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = 'liao gao xiang'
from collections import Counter

words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
    'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
    'my', 'eyes', "you're", 'under'
]
word_counts = Counter(words)
# The three most frequent words
top_three = word_counts.most_common(3)
print(top_three)
# Outputs [('eyes', 8), ('the', 5), ('look', 4)]
print(word_counts['eyes'])

morewords = ['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes']
# To increment counts manually, simply use addition:
for word in morewords:
    print(word)
    word_counts[word] += 1
print(word_counts['eyes'])

The result is:

[('eyes', 8), ('the', 5), ('look', 4)]
8
why
are
you
not
looking
in
my
eyes
9

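Because Counter inherits from dict, it has all of dict's methods (the same is true of defaultdict and OrderedDict). Setting special methods such as __missing__ and the arithmetic operators aside, Counter implements or overrides the following six methods: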

most_common(self, n=None)
elements(self)
fromkeys(cls, iterable, v=None)
update(*args, **kwds)
subtract(*args, **kwds)
copy(self)
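A brief sketch of a few of these methods, plus the zero default for missing keys (the counts are arbitrary example data):

```python
from collections import Counter

c = Counter(a=3, b=1)
print(sorted(c.elements()))  # ['a', 'a', 'a', 'b']: each key repeated by its count

c.update({'a': 1, 'c': 2})   # update() adds counts instead of replacing them
print(c)                     # Counter({'a': 4, 'c': 2, 'b': 1})

c.subtract({'a': 5})         # subtract() may leave zero or negative counts
print(c['a'])                # -1

print(c['missing'])          # 0: a missing key counts as zero, no KeyError
print(+c)                    # unary + keeps only positive counts: Counter({'c': 2, 'b': 1})
```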

5. OrderedDict

Purpose: an ordered dictionary. Its parent class is Python's built-in dict.

When you iterate over an OrderedDict, it preserves the order in which the keys were inserted. Internally it maintains a doubly linked list ordered by key insertion: each newly inserted key is appended to the end of the list, and reassigning an existing key does not change its position.

Note that an OrderedDict takes roughly twice as much memory as a regular dict, because it maintains the extra linked list. So if you are building a data structure that needs a large number of OrderedDict instances (for example, reading 100,000 rows of CSV data into a list of OrderedDicts), weigh carefully whether the benefits of OrderedDict outweigh the extra memory cost.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = 'liao gao xiang'
import json
from collections import OrderedDict

d = OrderedDict()
d['foo'] = 1
d['bar'] = 2
d['spam'] = 3
d['grok'] = 4
# d['bar'] = 22  # reassigning an existing key does not change its position
for key in d:
    print(key, d[key])
print(d)
print(json.dumps(d))

The result is:

foo 1
bar 2
spam 3
grok 4
OrderedDict([('foo', 1), ('bar', 2), ('spam', 3), ('grok', 4)])
{"foo": 1, "bar": 2, "spam": 3, "grok": 4}

OrderedDict implements or overrides the following methods. What does each of them do? That is left as homework ^_^

clear(self)
popitem(self, last=True)
move_to_end(self, key, last=True)
keys(self)
items(self)
values(self)
pop(self, key, default=__marker)
setdefault(self, key, default=None)
copy(self)
fromkeys(cls, iterable, value=None)
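Since Python 3.7 a plain dict also preserves insertion order, so what OrderedDict still adds is mainly reordering, for example move_to_end() and popitem(last=False). As one illustration, here is a minimal least-recently-used cache sketch. The LRUCache class is hypothetical, not a library API; for memoizing functions, the standard library already provides functools.lru_cache.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical minimal LRU cache built on OrderedDict.

    Keys are kept in order from least to most recently used.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)   # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used (first) item


cache = LRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')           # 'a' becomes the most recently used key
cache.put('c', 3)        # over capacity: 'b' is evicted
print(list(cache.data))  # ['a', 'c']
```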

6. ChainMap

Purpose: creates a single view of multiple mappings. A dict-like class.

很簡(jiǎn)單，如下：

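#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = 'liao gao xiang'
from collections import ChainMap
from itertools import chain

# Iterating over the elements of several different collections
a = [1, 2, 3, 4]
b = ('x', 'y', 'z')
c = {1, 'a'}
# Method 1: itertools.chain accepts any iterables
for i in chain(a, b, c):
    print(i)
print('--------------')
# Method 2: ChainMap expects mappings (dicts); iterating over it yields
# each distinct key once, even if several mappings contain it
m1 = {'x': 1, 'z': 3}
m2 = {'y': 2, 'z': 4}
for j in ChainMap(m1, m2):
    print(j)
# chain() is a lazy iterator that does not copy its inputs; ChainMap keeps
# references to the original dicts instead of merging them into a new one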

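A ChainMap takes several dictionaries and makes them logically appear as one. However, the dictionaries are not actually merged: ChainMap just keeps an internal list of them and redefines the common dictionary operations to scan that list. Most dictionary operations work as usual, for example: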

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = 'liao gao xiang'
# Combining multiple dicts and mappings
from collections import ChainMap

a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}
# Suppose you need to perform lookups across both dicts
# (e.g. look in a first, then in b if the key is not found).
# A very simple solution is the ChainMap class from the collections module.
c = ChainMap(a, b)
print(c)
a['x'] = 11  # updates to the original dicts show up in the ChainMap view
print(c)     # the two dicts are combined in order
print(c['x'])
print(c['y'])
print(c['z'])  # a is searched before b

# Updates and deletions always affect only the first dict in the list.
c['z'] = 10
c['w'] = 40
del c['x']
print(a)
# del c['y'] would raise KeyError, because 'y' is not in the first dict

# ChainMap is very useful for scoped variables in programming languages
# (such as globals, locals, etc.). In fact, some methods make this easy:
values = ChainMap()  # creates a single empty dict by default
print(values)
values['x'] = 1
values = values.new_child()  # add a new empty dict in front
values['x'] = 2
values = values.new_child()
values['x'] = 30
# values = values.new_child()
print(values, values['x'])  # values['x'] returns the most recently added value
values = values.parents  # discard the most recently added dict
print(values['x'])
values = values.parents
print(values)

# Alternative: merge into a new dict with update()
a = {'x': 1, 'y': 2}
b = {'y': 2, 'z': 3}
merge = dict(b)
merge.update(a)
print(merge['x'], merge['y'], merge['z'])
a['x'] = 11
print(merge['x'])  # still 1: the merged copy does not see later changes to a

The output is:

ChainMap({'x': 1, 'z': 3}, {'y': 2, 'z': 4})
ChainMap({'x': 11, 'z': 3}, {'y': 2, 'z': 4})
11
2
3
{'z': 10, 'w': 40}
ChainMap({})
ChainMap({'x': 30}, {'x': 2}, {'x': 1}) 30
2
ChainMap({'x': 1})
1 2 3
1

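As an alternative to ChainMap, you might consider merging the dictionaries with update(). That works, but it requires creating a completely separate dictionary object (or modifying one of the existing dictionaries). Also, if an original dictionary is updated later, the change is not reflected in the merged dictionary.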

ChainMap implements or overrides the following methods:

get(self, key, default=None)
fromkeys(cls, iterable, *args)
copy(self)
new_child(self, m=None)
parents(self)
popitem(self)
pop(self, key, *args)
clear(self)
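A common use of ChainMap is layered configuration, where earlier mappings override later ones. The sketch below uses hypothetical layer names (cli_args, user_settings, defaults) and shows lookups, get() with a default, writes going to the first mapping, and the maps attribute that holds the underlying list of dicts:

```python
from collections import ChainMap

# Hypothetical configuration layers, highest priority first
cli_args = {}
user_settings = {'user': 'alice'}
defaults = {'color': 'red', 'user': 'guest'}

config = ChainMap(cli_args, user_settings, defaults)
print(config['user'])              # alice: user_settings is searched before defaults
print(config['color'])             # red: only defaults has this key
print(config.get('debug', False))  # False: get() accepts a default

config['color'] = 'blue'           # writes always go to the first mapping
print(cli_args)                    # {'color': 'blue'}
print(defaults['color'])           # red: lower layers are untouched

print(config.maps[-1] is defaults)  # True: .maps is the list of underlying dicts
```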

7. UserDict, UserList, UserString

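These three classes wrap the dict, list, and str types respectively, and are mainly intended to make it easier to implement your own data types. In Python 2 they lived in three separate modules, UserDict, UserList, and UserString, and had to be imported with something like from UserDict import UserDict. In Python 3 they were moved into the collections module. All three are base classes: to extend one of these types, simply subclass the corresponding class.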
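Why not subclass dict directly? The built-in dict's constructor and update() do not call an overridden __setitem__, whereas UserDict routes them through it. The two classes below are hypothetical examples that lower-case string keys on insertion:

```python
from collections import UserDict


class LowerKeyUserDict(UserDict):
    """Hypothetical example: stores every (string) key in lower case."""

    def __setitem__(self, key, value):
        self.data[key.lower()] = value  # UserDict keeps its contents in self.data


class LowerKeyDict(dict):
    """The same idea built directly on dict, for comparison."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


d1 = LowerKeyUserDict({'A': 1})
d1.update(B=2)
print(dict(d1))  # {'a': 1, 'b': 2}: __init__ and update() go through __setitem__

d2 = LowerKeyDict({'A': 1})
d2.update(B=2)
d2['C'] = 3
print(dict(d2))  # {'A': 1, 'B': 2, 'c': 3}: only d2['C'] used the override
```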

Summary

That's all for this article. I hope it has some reference value for your study or work. If you have any questions, please leave a comment. Thank you for your support of VEVB武林網.
