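Overview
1.3 collections: container data types
The main types are:
namedtuple(): a factory function for creating tuple subclasses with named fields. New in Python 2.6.
deque: a double-ended queue, similar to a list, with fast appends and pops at both ends. New in Python 2.4.
Counter: a dict subclass for counting hashable objects. New in Python 2.7.
OrderedDict: a dict subclass that remembers the order in which entries were added. New in Python 2.7.
defaultdict: a dict subclass that calls a factory function to supply missing values. New in Python 2.5.
The module also provides abstract base classes for testing whether a class provides a particular interface, for example whether it is hashable or is a mapping.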
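As a quick tour of the types listed above, here is a minimal sketch. It is written for Python 3, where the abstract base classes live in collections.abc (the listings in this article use Python 2 syntax). The names Point and groups are illustrative only.

```python
from collections import namedtuple, deque, defaultdict, OrderedDict
from collections.abc import Hashable, Mapping

# namedtuple: a tuple subclass whose fields can be read by name
Point = namedtuple('Point', ['x', 'y'])
p = Point(x=1, y=2)

# deque: appends and pops work at both ends
d = deque([1, 2, 3])
d.appendleft(0)
d.append(4)
left, right = d.popleft(), d.pop()

# defaultdict: list() is called to supply the value for a missing key
groups = defaultdict(list)
groups['even'].append(2)

# OrderedDict: iteration follows insertion order
od = OrderedDict()
od['first'] = 1
od['second'] = 2

# Abstract base classes: test for an interface rather than a concrete type
tuple_is_hashable = isinstance((1, 2), Hashable)
list_is_hashable = isinstance([1, 2], Hashable)
groups_is_mapping = isinstance(groups, Mapping)

print(p.x, p.y, left, right, list(d))
print(tuple_is_hashable, list_is_hashable, groups_is_mapping)
```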
1.3.1 Counter
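A Counter is a container that tracks how many times each value occurs. It is similar to a bag or multiset in other languages.
A Counter supports three forms of initialization: its constructor can be called with a sequence of items, with a dictionary containing keys and counts, or with keyword arguments that map names to counts.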
    import collections

    print collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
    print collections.Counter({'a': 2, 'b': 3, 'c': 1})
    print collections.Counter(a=2, b=3, c=1)
Output:
    # ./collections_counter_init.py
    Counter({'b': 3, 'a': 2, 'c': 1})
    Counter({'b': 3, 'a': 2, 'c': 1})
    Counter({'b': 3, 'a': 2, 'c': 1})
Note that the printed representation lists the keys in order of decreasing count.
An empty counter can also be created and then filled with update():
    import collections

    c = collections.Counter()
    print 'Initial :', c

    c.update('abcdaab')
    print 'Sequence:', c

    c.update({'a': 1, 'd': 5})
    print 'Dict    :', c
Output:
    # ./collections_counter_update.py
    Initial : Counter()
    Sequence: Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
    Dict    : Counter({'d': 6, 'a': 4, 'b': 2, 'c': 1})
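In the update() output, the counts for 'a' and 'd' grow rather than being overwritten. The following sketch (Python 3 syntax, with illustrative variable names) contrasts this with the update() method of a plain dict, which replaces existing values.

```python
from collections import Counter

counts = Counter('abcdaab')        # a=3, b=2, c=1, d=1
plain = dict(counts)               # ordinary dict with the same contents

counts.update({'a': 1, 'd': 5})    # Counter.update adds to the existing counts
plain.update({'a': 1, 'd': 5})     # dict.update overwrites the existing values

print(counts['a'], counts['d'])    # 4 6
print(plain['a'], plain['d'])      # 1 5
```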
Accessing counts:
    import collections

    c = collections.Counter('abcdaab')

    for letter in 'abcde':
        print '%s : %d' % (letter, c[letter])
Output:
    # ./collections_counter_get_values.py
    a : 3
    b : 2
    c : 1
    d : 1
    e : 0
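The 'e : 0' line comes from a key that was never counted. A small sketch (Python 3 syntax, variable names are illustrative) shows that such a lookup returns zero without raising KeyError and without inserting the key, whereas an explicit assignment of zero does store it.

```python
from collections import Counter

c = Counter('abcdaab')

missing = c['e']                  # 0: a missing key reads as zero, no KeyError
created = 'e' in c                # False: the lookup did not insert the key

c['e'] = 0                        # an explicit assignment does store the key
stored = 'e' in c                 # True, even though the count is zero

del c['e']                        # del is what actually removes the entry
removed = 'e' not in c            # True

print(missing, created, stored, removed)
```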
elements() lists all of the elements, repeating each one as many times as its count:
    import collections

    c = collections.Counter('extremely')
    c['z'] = 0
    print c
    print list(c.elements())
Output:
    # ./collections_counter_elements.py
    Counter({'e': 3, 'm': 1, 'l': 1, 'r': 1, 't': 1, 'y': 1, 'x': 1, 'z': 0})
    ['e', 'e', 'e', 'm', 'l', 'r', 't', 'y', 'x']
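The 'z' key with a count of zero does not appear in the expanded list. The sketch below (Python 3 syntax, keys chosen purely for illustration) shows that negative counts are skipped in the same way, and that feeding the elements back into Counter rebuilds only the positive entries. The result is sorted so that it does not depend on iteration order.

```python
from collections import Counter

c = Counter(a=2, b=1, z=0, n=-3)

expanded = sorted(c.elements())   # zero and negative counts contribute nothing
rebuilt = Counter(expanded)       # only the positive entries come back

print(expanded)
print(rebuilt)
```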
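most_common() returns the most frequently occurring elements together with their counts. The example below counts the letters of every word in the system dictionary file and prints the three most frequent:
    import collections

    c = collections.Counter()
    with open('/usr/share/dict/words', 'rt') as f:
        for line in f:
            c.update(line.rstrip().lower())

    print 'Most common:'
    for letter, count in c.most_common(3):
        print '%s: %7d' % (letter, count)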
Output:
    # ./collections_counter_most_common.py
    Most common:
    e:  484673
    i:  382454
    a:  378030
Counter also supports arithmetic and set operations. Their results keep only the keys whose counts are positive.
    import collections

    c1 = collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
    c2 = collections.Counter('alphabet')

    print 'C1:', c1
    print 'C2:', c2

    print '\nCombined counts:'
    print c1 + c2

    print '\nSubtraction:'
    print c1 - c2

    print '\nIntersection (taking positive minimums):'
    print c1 & c2

    print '\nUnion (taking maximums):'
    print c1 | c2
Output:
    # ./collections_counter_arithmetic.py
    C1: Counter({'b': 3, 'a': 2, 'c': 1})
    C2: Counter({'a': 2, 'b': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

    Combined counts:
    Counter({'a': 4, 'b': 4, 'c': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

    Subtraction:
    Counter({'b': 2, 'c': 1})

    Intersection (taking positive minimums):
    Counter({'a': 2, 'b': 1})

    Union (taking maximums):
    Counter({'b': 3, 'a': 2, 'c': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})
The examples above may give the impression that Counter can only handle single characters. That is not the case, as the examples from the standard library documentation show:
    >>> from collections import Counter
    >>> cnt = Counter()
    >>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
    ...     cnt[word] += 1
    ...
    >>> cnt
    Counter({'blue': 3, 'red': 2, 'green': 1})
    >>> cnt = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
    >>> cnt
    Counter({'blue': 3, 'red': 2, 'green': 1})
    >>> import re
    >>> words = re.findall(r'\w+', open('/etc/ssh/sshd_config').read().lower())
    >>> Counter(words).most_common(10)
    [('yes', 27), ('no', 23), ('to', 12), ('the', 9), ('for', 8), ('and', 8), ('protocol', 6), ('ssh', 6), ('default', 6), ('this', 6)]
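The word count over /etc/ssh/sshd_config depends on a file that may not exist on every system. The following self-contained sketch (Python 3 syntax) applies the same technique to an in-memory string. The sample text is made up for illustration and is not the content of any real configuration file.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the contents of a config file.
text = (
    "Port 22\n"
    "PermitRootLogin no\n"
    "PasswordAuthentication no\n"
    "X11Forwarding yes\n"
    "UsePAM yes\n"
    "PrintMotd no\n"
)

# r'\w+' matches runs of word characters; the raw string keeps the backslash.
words = re.findall(r'\w+', text.lower())

# most_common(2) returns the two highest (word, count) pairs.
top = Counter(words).most_common(2)
print(top)   # [('no', 3), ('yes', 2)]
```

Note the raw-string pattern r'\w+': without the backslash, the pattern 'w+' would match only runs of the letter w.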
The first and second snippets have the same effect. The last one uses Counter to implement a simple word count, which answers, for example, this interview question: use Python to print the 10 most frequent words in /etc/ssh/sshd_config together with their counts.
Now for the definition of Counter:
    class collections.Counter([iterable-or-mapping])
Note that a Counter is an unordered dictionary. Looking up a key that does not exist returns 0 instead of raising KeyError. Setting a count to zero, as in c['sausage'] = 0, does not remove the element; use del c['sausage'] for that.
In addition to the standard dictionary methods, Counter adds:
elements(): returns an iterator over all elements, repeating each as many times as its count and ignoring counts less than 1.
most_common([n]): returns a list of the n most common elements and their counts. By default all elements are returned.
subtract([iterable-or-mapping]): subtracts counts in place.
    >>> c = Counter(a=4, b=2, c=0, d=-2)
    >>> d = Counter(a=1, b=2, c=3, d=4)
    >>> c - d
    Counter({'a': 3})
    >>> c
    Counter({'a': 4, 'b': 2, 'c': 0, 'd': -2})
    >>> d
    Counter({'d': 4, 'c': 3, 'b': 2, 'a': 1})
    >>> c.subtract(d)
    >>> c
    Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
    >>> d
    Counter({'d': 4, 'c': 3, 'b': 2, 'a': 1})
As shown above, the - operator returns a new Counter and leaves c unchanged, whereas subtract() modifies the Counter in place and keeps zero and negative results.
Among the standard dictionary methods, fromkeys() is not implemented for Counter, and update() is overridden: it adds counts instead of replacing them.
Common patterns:
    sum(c.values())                 # total of all counts
    c.clear()                       # reset all counts
    list(c)                         # list unique elements
    set(c)                          # convert to a set
    dict(c)                         # convert to a regular dictionary
    c.items()                       # convert to a list of (elem, cnt) pairs
    Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
    c.most_common()[:-n-1:-1]       # n least common elements
    c += Counter()                  # remove zero and negative counts
Arithmetic, intersection, and union:
    >>> c = Counter(a=3, b=1)
    >>> d = Counter(a=1, b=2)
    >>> c + d                       # add two counters together: c[x] + d[x]
    Counter({'a': 4, 'b': 3})
    >>> c - d                       # subtract (keeping only positive counts)
    Counter({'a': 2})
    >>> c & d                       # intersection: min(c[x], d[x])
    Counter({'a': 1, 'b': 1})
    >>> c | d                       # union: max(c[x], d[x])
    Counter({'a': 3, 'b': 2})
Notes on these operations, quoted from the standard library documentation:
The Counter class itself is a dictionary subclass with no restrictions on its keys and values. The values are intended to be numbers representing counts, but you could store anything in the value field.
The most_common() method requires only that the values be orderable.
For in-place operations such as c[key] += 1, the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are supported. The same is also true for update() and subtract() which allow negative and zero values for both inputs and outputs.
The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.
The elements() method requires integer counts. It ignores zero and negative counts.
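To make the difference between the - operator and subtract() concrete, and to check two of the common patterns, here is a minimal sketch in Python 3 syntax. The votes counter and the names diff, cleaned, and least are illustrative assumptions, not part of the library.

```python
from collections import Counter

c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)

diff = c - d            # new Counter; only positive results are kept
c.subtract(d)           # in place; zero and negative results are kept

cleaned = c.copy()
cleaned += Counter()    # adding an empty Counter drops counts <= 0

# n least common elements: take the last n entries of most_common(), reversed.
votes = Counter(red=5, blue=3, green=2, grey=1)
n = 2
least = votes.most_common()[:-n-1:-1]

print(diff)       # Counter({'a': 3})
print(dict(c))    # {'a': 3, 'b': 0, 'c': -3, 'd': -6}
print(cleaned)    # Counter({'a': 3})
print(least)      # [('grey', 1), ('green', 2)]
```

The slice [:-n-1:-1] walks backwards from the end of the list and stops after n entries. A slice of [:-n:-1] would stop one entry early and return only n - 1 elements.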
Reposted from: http://www.2cto.com/kf/201303/196938.html