Does the ziplist encoding of Redis hash objects achieve O(1) complexity?

Problem description

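Question: Redis hash objects have two encodings, ziplist and hashtable. A hash object should presumably rely on a hashing algorithm so that basic operations run in O(1). The hashtable encoding uses a dict, which works like Java's HashMap, and that part I can understand. But in the ziplist encoding all entries are stored back to back, so how does it implement hashing and make lookups and similar operations O(1)?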

Analysis:

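Let's start from the call path. As we all know, the command that fetches a field from a hash object is "HGET". When the hash object is ziplist-encoded, it executes as follows: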

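First, ziplistFind is called to locate the entry for the given field in the ziplist.
Then ziplistNext is called to move the pointer to the value entry right next to the field entry, and that value entry is returned.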
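
The two steps above can be sketched on a toy model. This is only an illustration under stated assumptions: a plain array of C strings stands in for the ziplist's packed byte layout, and the names flat_find and flat_hget are hypothetical, not Redis functions. The point it shows is why a skip count of 1 makes the scan compare fields only and step over values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a ziplist-encoded hash: fields and values alternate in
 * one flat array (field0, value0, field1, value1, ...). Hypothetical model,
 * not the real ziplist byte layout. */

/* Scan from index `start`, comparing one entry and then skipping `skip`
 * entries, in the spirit of the skip parameter of ziplistFind.
 * Returns the index of the first match, or -1 if there is none. */
int flat_find(const char **entries, size_t n, size_t start,
              const char *needle, size_t skip) {
    assert(entries != NULL && needle != NULL);
    for (size_t i = start; i < n; i += skip + 1) {
        if (strcmp(entries[i], needle) == 0) return (int)i;
    }
    return -1;
}

/* HGET on the toy model: find the field, then step to the adjacent value. */
const char *flat_hget(const char **entries, size_t n, const char *field) {
    int f = flat_find(entries, n, 0, field, 1); /* skip = 1: fields only */
    if (f < 0 || (size_t)f + 1 >= n) return NULL;
    return entries[f + 1];
}
```

With the array {"name", "redis", "port", "6379"}, flat_hget(h, 4, "port") visits indices 0 and 2 and returns the entry at index 3. Nothing is hashed at any point; the work grows with the number of pairs.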

From how HGET executes under the ziplist encoding, it is clear that the crux of the question lies in ziplistFind.
ziplistFind is defined in src/ziplist.c in the Redis source:

/* Annotations come from Huang Jianhong's annotated Redis source repository on GitHub: https://github.com/huangz1990/redis-3.0-annotated
 * Find pointer to the entry equal to the specified entry.
 *
 * Skip 'skip' entries between every comparison.
 *
 * Returns NULL when the field could not be found.
 *
 * T = O(N^2)
 */
unsigned char *ziplistFind(unsigned char *p, unsigned char *vstr, unsigned int vlen, unsigned int skip) {
    int skipcnt = 0;
    unsigned char vencoding = 0;
    long long vll = 0;

    // Keep iterating until the end of the list is reached
    // T = O(N^2)
    while (p[0] != ZIP_END) {
        unsigned int prevlensize, encoding, lensize, len;
        unsigned char *q;

        ZIP_DECODE_PREVLENSIZE(p, prevlensize);
        ZIP_DECODE_LENGTH(p + prevlensize, encoding, lensize, len);
        q = p + prevlensize + lensize;

        if (skipcnt == 0) {

            /* Compare current entry with specified entry */
            // Compare string values
            // T = O(N)
            if (ZIP_IS_STR(encoding)) {
                if (len == vlen && memcmp(q, vstr, vlen) == 0) {
                    return p;
                }
            } else {
                /* Find out if the searched field can be encoded. Note that
                 * we do it only the first time, once done vencoding is set
                 * to non-zero and vll is set to the integer value. */
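                // The searched value may be representable as an integer,
                // so the first time an integer entry is compared, the program tries to encode it.
                // This attempt is made only once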
                if (vencoding == 0) {
                    if (!zipTryEncoding(vstr, vlen, &vll, &vencoding)) {
                        /* If the entry can't be encoded we set it to
                         * UCHAR_MAX so that we don't retry again the next
                         * time. */
                        vencoding = UCHAR_MAX;
                    }
                    /* Must be non-zero by now */
                    assert(vencoding);
                }

                /* Compare current entry with specified entry, do it only
                 * if vencoding != UCHAR_MAX because if there is no encoding
                 * possible for the field it can't be a valid integer. */
                // Compare integer values
                if (vencoding != UCHAR_MAX) {
                    // T = O(1)
                    long long ll = zipLoadInteger(q, encoding);
                    if (ll == vll) {
                        return p;
                    }
                }
            }

            /* Reset skip count */
            skipcnt = skip;
        } else {
            /* Skip entry */
            skipcnt--;
        }

        /* Move to next entry */
        // Move the pointer forward to the next entry
        p = q + len;
    }

    // The specified entry was not found
    return NULL;
}

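Judging from Huang Jianhong's source annotations (I can't really follow the source itself :) ), the function finds the entry by walking the list from front to back, so its cost grows with the number of entries. The annotation puts the bound at O(N²), since each of up to N visits may also perform a string comparison whose cost depends on the entry length. From this we can draw our conclusion: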

Conclusion:

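A hash object using the ziplist encoding cannot perform basic operations in O(1); it finds elements by scanning. However, the ziplist encoding is only used while every field and value stays within a small length limit (64 bytes by default, set by hash-max-ziplist-value) and the hash holds a small number of field-value pairs (512 by default, set by hash-max-ziplist-entries), so the impact on performance is much smaller than you might expect.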
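
To make the size condition concrete, here is a minimal sketch of the check on the same kind of toy model (an array of alternating field and value strings). The function name fits_compact_encoding and the two macros are hypothetical. The limits are the default values mentioned above, and treating them as inclusive is an assumption of this sketch rather than something the text above establishes.

```c
#include <stddef.h>
#include <string.h>

/* Assumed defaults of hash-max-ziplist-entries and hash-max-ziplist-value. */
#define MAX_ZIPLIST_ENTRIES 512
#define MAX_ZIPLIST_VALUE   64

/* Returns 1 if a toy hash with `n_pairs` field-value pairs, stored as
 * 2 * n_pairs alternating strings, stays within both limits, else 0.
 * Assumption: both limits are inclusive. */
int fits_compact_encoding(const char **entries, size_t n_pairs) {
    if (n_pairs > MAX_ZIPLIST_ENTRIES) return 0;      /* too many pairs */
    for (size_t i = 0; i < 2 * n_pairs; i++) {
        if (strlen(entries[i]) > MAX_ZIPLIST_VALUE) return 0; /* too long */
    }
    return 1;
}
```

Because both limits are small constants, a linear scan over such a hash touches only a bounded amount of contiguous memory, which is why the lack of O(1) lookup matters little in practice.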

Further notes

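When a Redis hash object uses the other encoding, hashtable, does it guarantee O(1) for basic operations?
Yes, on average. The hashtable encoding uses a dict (similar to Java's HashMap) as its underlying data structure. A lookup calls the dictFind function, whose source is as follows: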

/*
 * Return the node in the dict that contains the key `key`.
 *
 * Returns the node if found, or NULL if not found.
 *
 * T = O(1)
 */
dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    unsigned int h, idx, table;

    // The dict's hash table is empty
    if (d->ht[0].size == 0) return NULL; /* We don't have a table at all */

    // Perform a single rehash step if conditions allow
    if (dictIsRehashing(d)) _dictRehashStep(d);

    // Compute the key's hash value
    h = dictHashKey(d, key);
    // Look up the key in the dict's hash tables
    // T = O(1)
    for (table = 0; table <= 1; table++) {

        // Compute the index
        idx = h & d->ht[table].sizemask;

        // Walk all nodes of the linked list at the given index, looking for key
        he = d->ht[table].table[idx];
        // T = O(1)
        while(he) {

            if (dictCompareKeys(d, key, he->key))
                return he;

            he = he->next;
        }

        // If the key was not found after walking hash table 0,
        // check whether the dict is rehashing
        // before deciding whether to return NULL or go on to search hash table 1
        if (!dictIsRehashing(d)) return NULL;
    }

    // Reaching this point means the key was found in neither hash table
    return NULL;
}

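As you can see, the lookup is O(1) on average: one hash computation, one index calculation, and a walk along a bucket chain that stays short as long as keys are spread evenly across buckets.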
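
For contrast with the linear scan, here is a minimal sketch of a chained hash table lookup in the same spirit as dictFind. It is a simplified toy, not the Redis dict: there is a single table, no rehashing, and the type and function names (entry, table, toy_hash, table_put, table_get) are hypothetical. The djb2 string hash is swapped in for whatever hash function the dict is configured with.

```c
#include <stddef.h>
#include <string.h>

typedef struct entry {
    const char *key;
    const char *val;
    struct entry *next;      /* chain of entries sharing one bucket */
} entry;

typedef struct {
    entry **buckets;
    unsigned long sizemask;  /* bucket count - 1; count is a power of two */
} table;

/* djb2 string hash, standing in for the dict's real hash function. */
unsigned long toy_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert at the head of the bucket's chain (no duplicate check). */
void table_put(table *t, entry *e) {
    unsigned long idx = toy_hash(e->key) & t->sizemask;
    e->next = t->buckets[idx];
    t->buckets[idx] = e;
}

/* Hash, mask to an index, then walk only that bucket's chain. */
const char *table_get(const table *t, const char *key) {
    unsigned long idx = toy_hash(key) & t->sizemask;
    for (entry *e = t->buckets[idx]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) return e->val;
    }
    return NULL;
}
```

The lookup only inspects entries that landed in the same bucket as the key. With a single bucket (sizemask 0) every key collides and the walk degrades to the same linear scan as before, which is why the O(1) figure is an average that depends on keys being spread across buckets.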
