Python sets, mutability, and hashing: how a set decides that elements are duplicates

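I used to say that dict keys and set elements must be immutable objects. That is not quite accurate: the real requirement is that they be hashable. This also explains why instances of a user-defined class, which are mutable, can still be used as set elements or dict keys.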
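A minimal sketch of the difference between "immutable" and "hashable". The Point class and the is_hashable helper are hypothetical names introduced only for this illustration: a mutable instance is accepted as a dict key, while a list, or an immutable tuple that contains a list, is rejected.

```python
class Point:
    """Hypothetical mutable class; it inherits the default identity-based __hash__."""
    def __init__(self, x):
        self.x = x


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


p = Point(1)
registry = {p: "first point"}   # a mutable instance works as a dict key
p.x = 2                         # mutating it does not change its default hash
print(p in registry)            # True

print(is_hashable(p))           # True: mutable but hashable
print(is_hashable([1, 2]))      # False: lists are unhashable
print(is_hashable((1, [2])))    # False: an immutable tuple holding a list
print(is_hashable((1, 2)))      # True
```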

The official documentation likewise describes a set as an unordered collection of distinct hashable objects:

https://docs.python.org/3.8/library/stdtypes.html#set-types-set-frozenset

A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built-in dict, list, and tuple classes, and the collections module.)

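So a set decides whether an element is a duplicate from its hash value together with ==, not from its memory address: two objects count as the same element when their hashes match and they compare equal. Dict keys work the same way.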
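One detail worth checking is that a matching hash is necessary but not sufficient: the set also calls __eq__ before treating two objects as duplicates. The sketch below uses a hypothetical SameHash class whose instances all share one hash value, which is an assumption made purely to force collisions.

```python
class SameHash:
    """Hypothetical class whose instances all collide on one hash value."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, SameHash):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        return 42  # deliberate collision, for illustration only


a, b, c = SameHash("x"), SameHash("y"), SameHash("x")
print(hash(a) == hash(b))   # True: same hash
print(a == b)               # False: different values
collided = {a, b, c}
print(len(collided))        # 2: a and b are both kept, c duplicates a

# Built-in numbers follow the same rule: equal values with equal hashes collapse.
print(hash(1) == hash(1.0) == hash(True))  # True
print(len({1, 1.0, True}))                 # 1
```

Returning a constant from __hash__ is legal but degrades lookups to a linear scan, so it is only useful as a demonstration.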

Here is some code to verify this:

class SetHash():
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Objects of other types are never equal to a SetHash
        if not isinstance(other, SetHash):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        # Compute the hash from value
        return hash(self.value)


hash_set = set()

s1 = SetHash('hash')
s2 = SetHash('hash')
print("s1 is s2:{}".format(s1 is s2))
print("s1 == s2:{}".format(s1 == s2))
print("s1 address:{}".format(id(s1)))
print("s2 address:{}".format(id(s2)))
# Add only s1 to the set
hash_set.add(s1)
# Then check whether s2 is in the set
print("s2 in hash_set:{}".format(s2 in hash_set))

Output:

s1 is s2:False
s1 == s2:True
s1 address:4372952344
s2 address:4374306544
s2 in hash_set:True

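Clearly s1 and s2 are not the same object, since their addresses differ. Yet although only s1 was added to the empty set hash_set, s2 is also reported as being in hash_set. That is because s1 and s2 have the same hash value (both hashes are computed from self.value) and they compare equal under __eq__.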

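However, if we override __hash__ to use id(self), s1 and s2 get different hashes and s2 is no longer found in hash_set. The same happens if we implement neither __hash__ nor __eq__: the default __hash__ is derived from the object's identity (its address in CPython) and the default __eq__ compares identity. Defining __eq__ without __hash__ is a different case: Python then sets __hash__ to None and the instances become unhashable.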
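The default behaviour can be checked with a short sketch. Plain and EqOnly are hypothetical classes introduced for this illustration: the first defines neither method, the second defines only __eq__.

```python
class Plain:
    """Hypothetical class with neither __eq__ nor __hash__ defined."""
    def __init__(self, value):
        self.value = value


class EqOnly:
    """Hypothetical class that defines __eq__ but not __hash__."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, EqOnly):
            return NotImplemented
        return self.value == other.value


p1, p2 = Plain("hash"), Plain("hash")
plain_set = {p1}
print(p1 == p2)          # False: default equality compares identity
print(p2 in plain_set)   # False
print(p1 in plain_set)   # True

print(EqOnly.__hash__ is None)   # True: defining __eq__ alone disables hashing
try:
    {EqOnly("hash")}
except TypeError as exc:
    print("TypeError:", exc)
```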

Modified code:

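class SetHash():
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Objects of other types are never equal to a SetHash
        if not isinstance(other, SetHash):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        # Compute the hash from id (demonstration only:
        # objects that compare equal now hash differently)
        return hash(id(self))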

Output:

s1 is s2:False
s1 == s2:True
s1 address:4374007536
s2 address:4374007368
s2 in hash_set:False
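A related consequence of hashing by value, sketched here under the assumption that an instance is mutated after it has been added to a set. ValueHashed is a hypothetical stand-in for the value-based version of SetHash: the set keeps the hash recorded at insertion time, so a later lookup with the new hash misses the element even though it is still stored.

```python
class ValueHashed:
    """Hypothetical class that hashes and compares by its mutable value."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, ValueHashed):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


item = ValueHashed("hash")
bucket = {item}
print(item in bucket)    # True

item.value = "changed"   # the hash now differs from the one stored at insertion
print(item in bucket)    # False in practice: the lookup uses the new hash
print(any(x is item for x in bucket))   # True: the object is still stored
```

This is why attributes that feed __hash__ are usually treated as read-only once the object is used as a set element or dict key.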

wangjinyu124419
