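Earlier I kept saying that dictionary keys and set elements must be immutable objects. That is not quite accurate: the precise requirement is that they be hashable. This also explains why instances of a class, which are mutable, can still be used as set elements or dictionary keys.
The official documentation says the same thing: a set is an unordered collection of distinct hashable objects.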
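To make the difference between "immutable" and "hashable" concrete, here is a minimal sketch. The class name Point and the variable names are made up for illustration. It shows a mutable instance working as a set element, and an immutable tuple that cannot be hashed because it contains a list.

```python
class Point:
    """Hypothetical mutable class that keeps the default identity-based hash."""

    def __init__(self, x):
        self.x = x


p = Point(1)
points = {p}                # instances are hashable by default
p.x = 2                     # mutating the instance does not change its hash
still_member = p in points

# A tuple is immutable, but this one holds a list, so hashing it fails.
immutable_but_unhashable = (1, [2, 3])
try:
    hash(immutable_but_unhashable)
    tuple_hashable = True
except TypeError:
    tuple_hashable = False

print(still_member)     # True
print(tuple_hashable)   # False
```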
https://docs.python.org/3.8/library/stdtypes.html#set-types-set-frozenset
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built-in dict, list, and tuple classes, and the collections module.)
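So a set decides whether an element is a duplicate by its hash value, followed by an equality comparison, and not by its memory address. Dictionary keys work the same way.
Here is some code to verify this: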
class SetHash:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __hash__(self):
        # compute the hash from value
        return hash(self.value)


hash_set = set()
s1 = SetHash('hash')
s2 = SetHash('hash')
print("s1 is s2:{}".format(s1 is s2))
print("s1 == s2:{}".format(s1 == s2))
print("address of s1:{}".format(id(s1)))
print("address of s2:{}".format(id(s2)))
# add only s1 to the set
hash_set.add(s1)
# then check whether s2 is in the set
print("s2 in hash_set:{}".format(s2 in hash_set))

Output:
s1 is s2:False
s1 == s2:True
address of s1:4372952344
address of s2:4374306544
s2 in hash_set:True
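The instances s1 and s2 are clearly not the same object, since their addresses differ. Yet although only s1 was added to the empty set hash_set, s2 is also reported as being in it. That is because s1 and s2 have the same hash value, both computed from self.value, and they also compare equal through __eq__.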
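As a complementary sketch, the following shows that an equal hash alone is not enough: the set also calls __eq__ before treating two elements as duplicates. The class SameHash and its constant hash value are invented for illustration only.

```python
class SameHash:
    """Hypothetical class whose instances all share one hash value."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, SameHash):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        # deliberately make every instance collide
        return 1


a = SameHash('a')
b = SameHash('b')
a_again = SameHash('a')

collided = {a, b, a_again}
print(hash(a) == hash(b))   # True
print(len(collided))        # 2
print(b in {a})             # False
print(a_again in {a})       # True
```

a and b land in the same hash bucket but stay separate elements because they compare unequal, while a_again is dropped as a duplicate of a.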
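If we instead rewrite __hash__ to use id(self), s2 will no longer be found in hash_set, because the two instances now have different hash values even though they still compare equal. The same happens if we implement neither __hash__ nor __eq__: the default __hash__ is derived from the object's identity (its address in CPython), and the default __eq__ compares identity.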
Modified code:
class SetHash:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __hash__(self):
        # compute the hash from id, so equal instances get different hashes
        return hash(id(self))
Output:
s1 is s2:False
s1 == s2:True
address of s1:4374007536
address of s2:4374007368
s2 in hash_set:False
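The case where the class defines neither method can be sketched the same way. The class names Plain and EqOnly below are hypothetical. The sketch also covers a related case that is easy to trip over: when a class defines __eq__ but not __hash__, Python sets its __hash__ to None, so its instances cannot be put in a set at all.

```python
class Plain:
    """Hypothetical class that defines neither __eq__ nor __hash__."""

    def __init__(self, value):
        self.value = value


class EqOnly:
    """Hypothetical class that defines __eq__ but not __hash__."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, EqOnly):
            return NotImplemented
        return self.value == other.value


p1, p2 = Plain('hash'), Plain('hash')
plain_found = p2 in {p1}    # default hash and equality are identity-based

try:
    {EqOnly('hash')}        # building the set needs hash(), which fails
    eq_only_hashable = True
except TypeError:
    eq_only_hashable = False

print(plain_found)          # False
print(EqOnly.__hash__)      # None
print(eq_only_hashable)     # False
```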