Redis Basic Data Structures (8): Ziplist

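A ziplist (compressed list) defines an encoding format, and the contents held in memory are read back by decoding it. The ziplist in Redis does not involve any new techniques overall. One detail to watch: after a realloc the memory address may change, so pointers saved beforehand must be used with care. Overall layout:

<zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>

Unless stated otherwise, all fields are stored little endian.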

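zlbytes: a uint32_t holding the total number of bytes the ziplist occupies, including this field itself. With this field the ziplist can be resized without traversing it.

zltail: a uint32_t holding the offset of the last entry. With this field a pop from the tail does not need to traverse the whole ziplist.

zllen: a uint16_t holding the number of entries. If there are more than 2^16-2 entries, the field is set to all f (0xFFFF, that is 2^16-1), and the entry count can then only be obtained by traversing the whole ziplist.

zlend: a uint8_t marking the end of the ziplist. Its value is 255, and no entry is allowed to start with the byte 255.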
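The header fields can be read straight out of a byte buffer. The following is a minimal sketch, assuming the 10-byte little-endian header described above; `zl_header`, `read_le` and `zl_parse_header` are hypothetical names used only for illustration and are not part of ziplist.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of ziplist.c. */
typedef struct {
    uint32_t zlbytes; /* total bytes, header and end marker included */
    uint32_t zltail;  /* offset of the last entry */
    uint16_t zllen;   /* entry count, or 0xFFFF meaning "scan to count" */
} zl_header;

/* Read a little-endian unsigned integer of n bytes (n <= 4). */
static uint32_t read_le(const unsigned char *p, size_t n) {
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++) v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Parse the header of a ziplist held in zl[0..size-1].
 * Returns 0 on success, -1 if the buffer is too short, zlbytes does not
 * match the buffer size, or the last byte is not the 255 end marker. */
static int zl_parse_header(const unsigned char *zl, size_t size, zl_header *h) {
    assert(zl != NULL && h != NULL);
    if (size < 11) return -1;            /* 10-byte header + zlend */
    h->zlbytes = read_le(zl, 4);
    h->zltail  = read_le(zl + 4, 4);
    h->zllen   = (uint16_t)read_le(zl + 8, 2);
    if (h->zlbytes != size) return -1;
    if (zl[size - 1] != 0xFF) return -1; /* zlend must be 255 */
    return 0;
}
```

For an empty ziplist (11 bytes: header plus end marker) this sketch reports zlbytes 11, zltail 10 and zllen 0.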

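ziplist entries:

Each entry has two prefixes. The first is the length of the previous entry, which makes it possible to traverse from the tail towards the head. The second is the entry's encoding: the type (integer or string) and, for a string, the length of the string payload. The structure is:

<prevlen> <encoding> <entry-data>

Sometimes the value is carried in some bits of the encoding itself, as with some small integers. In that case there is no entry-data field:

<prevlen> <encoding>

The rule for <prevlen>: if the previous entry is shorter than 254 bytes, a single byte is enough, holding an 8-bit integer from 0 to 253. If it is 254 bytes or longer, the field takes 5 bytes: the first byte is 0xFE, meaning "254 or more", and the following 4 bytes hold the actual length of the previous entry.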
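The 1-byte and 5-byte forms of <prevlen> can be sketched as a pair of store/load helpers. This is an illustration under the rule stated above, with the 4-byte length written little endian; `prevlen_store` and `prevlen_load` are hypothetical names, not the functions from ziplist.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write <prevlen> for a previous entry of len bytes into p.
 * Returns the number of bytes written: 1 or 5. */
static size_t prevlen_store(unsigned char *p, uint32_t len) {
    assert(p != NULL);
    if (len < 254) {
        p[0] = (unsigned char)len;
        return 1;
    }
    p[0] = 0xFE;                         /* marker: a 4-byte length follows */
    for (int i = 0; i < 4; i++)          /* little endian */
        p[1 + i] = (unsigned char)(len >> (8 * i));
    return 5;
}

/* Read <prevlen> from p. Stores the field size (1 or 5) in *size and
 * returns the decoded length of the previous entry. */
static uint32_t prevlen_load(const unsigned char *p, size_t *size) {
    assert(p != NULL && size != NULL);
    if (p[0] < 0xFE) {
        *size = 1;
        return p[0];
    }
    *size = 5;
    return (uint32_t)p[1] | (uint32_t)p[2] << 8 |
           (uint32_t)p[3] << 16 | (uint32_t)p[4] << 24;
}
```

A length of 253 round-trips through one byte, while 254 already needs the five bytes FE FE 00 00 00.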

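The rule for <encoding>: the top 2 bits of the first byte decide whether the entry is an integer or a string. 00, 01 and 10 mean string, 11 means integer. In detail:

|00pppppp|: a string of at most 63 bytes; pppppp is the actual length

|01pppppp|qqqqqqqq|: a string of at most 16383 bytes; note that this 14-bit length is stored big endian

|10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt|: a string of 16384 bytes or more; the low 6 bits of the first byte are unused and the following 4 bytes hold the length, also big endian

|11000000|: an int16_t; the following 2 bytes are the integer value, 3 bytes in total

|11010000|: an int32_t; the following 4 bytes are the integer value, 5 bytes in total

|11100000|: an int64_t; the following 8 bytes are the integer value, 9 bytes in total

|11110000|: a 24-bit signed integer; the following 3 bytes are the integer value, 4 bytes in total

|11111110|: an 8-bit signed integer; the following 1 byte is the integer value, 2 bytes in total

|1111xxxx|: a 4-bit integer. To avoid clashing with the other markers (0000, 1110 and 1111 are taken), xxxx can only range from 0001 to 1101, that is 1 to 13 in decimal. Subtracting 1 gives 0 to 12, which is the range this 4-bit integer can represent.

|11111111|: the end-of-ziplist marker

As with the ziplist header, all integers are stored little endian, even when compiled on a big-endian system.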
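The encoding rules can be turned into a small classifier that reports the kind of entry, how many bytes the <encoding> field takes, and how many bytes of <entry-data> follow. This is a sketch under the rules listed above plus the 5-byte string form that ziplist.c names ZIP_STR_32B; `enc_kind`, `enc_info` and `decode_encoding` are hypothetical names, and the caller is assumed to guarantee that the bytes being read exist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only. */
typedef enum { ENC_STR, ENC_INT, ENC_END, ENC_BAD } enc_kind;

typedef struct {
    enc_kind kind;
    size_t   lensize; /* bytes used by <encoding> itself */
    uint32_t len;     /* bytes used by <entry-data> */
} enc_info;

/* Classify the <encoding> field starting at p. */
static enc_info decode_encoding(const unsigned char *p) {
    assert(p != NULL);
    enc_info e = { ENC_BAD, 1, 0 };
    unsigned char b = p[0];
    switch (b >> 6) {
    case 0: /* |00pppppp| */
        e.kind = ENC_STR;
        e.len = b & 0x3F;
        break;
    case 1: /* |01pppppp|qqqqqqqq|, 14-bit length, big endian */
        e.kind = ENC_STR;
        e.lensize = 2;
        e.len = ((uint32_t)(b & 0x3F) << 8) | p[1];
        break;
    case 2: /* |10000000| followed by a 4-byte big-endian length */
        e.kind = ENC_STR;
        e.lensize = 5;
        e.len = (uint32_t)p[1] << 24 | (uint32_t)p[2] << 16 |
                (uint32_t)p[3] << 8 | p[4];
        break;
    default: /* top bits 11: integers and the end marker */
        e.kind = ENC_INT;
        if (b == 0xFF) e.kind = ENC_END;
        else if (b == 0xC0) e.len = 2;  /* int16 */
        else if (b == 0xD0) e.len = 4;  /* int32 */
        else if (b == 0xE0) e.len = 8;  /* int64 */
        else if (b == 0xF0) e.len = 3;  /* 24-bit */
        else if (b == 0xFE) e.len = 1;  /* 8-bit */
        else if (b >= 0xF1 && b <= 0xFD) e.len = 0; /* 4-bit immediate */
        else e.kind = ENC_BAD;
        break;
    }
    return e;
}
```

For the byte 0x0b this yields a string with an 11-byte payload, and for 0xf3 an integer with no payload bytes at all.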

An example of a ziplist storing integers:

[0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff]

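The first 4 bytes are zlbytes, meaning 15 bytes in total. The next 4 bytes are zltail, meaning the last entry is at offset 12. The next 2 bytes are the number of entries, which is 2. The first entry is 00 f3: 00 is the length of the previous entry, which is 0 because this is the first entry, and f3 is a 4-bit integer whose value is 3-1=2. The second entry is 02 f6: 02 is the number of bytes in the previous entry, and f6 is the 4-bit integer 5. The final byte is the end marker ff.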
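The walk-through above can be reproduced with a short forward scan. The sketch below only understands the 4-bit immediate, 8-bit and 16-bit integer encodings and gives up on anything else; `zl_read_small_ints` is a hypothetical name, and the casts to `int8_t` and `int16_t` assume the usual two's-complement behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode small-integer entries of the ziplist in zl[0..size-1] into out.
 * Returns the number of entries decoded (at most max), or -1 on an
 * encoding this sketch does not handle or on a truncated buffer. */
static int zl_read_small_ints(const unsigned char *zl, size_t size,
                              int64_t *out, int max) {
    assert(zl != NULL && out != NULL);
    size_t pos = 10;                        /* skip zlbytes, zltail, zllen */
    int n = 0;
    while (pos < size && zl[pos] != 0xFF && n < max) {
        pos += (zl[pos] == 0xFE) ? 5 : 1;   /* skip <prevlen> */
        if (pos >= size) return -1;
        unsigned char enc = zl[pos++];
        if (enc >= 0xF1 && enc <= 0xFD) {   /* 4-bit immediate: value + 1 */
            out[n++] = (enc & 0x0F) - 1;
        } else if (enc == 0xFE) {           /* 8-bit signed integer */
            if (pos + 1 > size) return -1;
            out[n++] = (int8_t)zl[pos];
            pos += 1;
        } else if (enc == 0xC0) {           /* 16-bit signed, little endian */
            if (pos + 2 > size) return -1;
            out[n++] = (int16_t)(zl[pos] | (zl[pos + 1] << 8));
            pos += 2;
        } else {
            return -1;                      /* not handled in this sketch */
        }
    }
    return n;
}
```

Fed the 15 bytes of the example, the scan visits offsets 10 and 12 and yields the values 2 and 5.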

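An example of a ziplist storing a string:

[02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]: this is the content of a single entry. 02 means the previous entry is 2 bytes long. 0b means this is a string in the 00pppppp form with a length of 11 bytes. The 11 bytes that follow are the ASCII codes of the string, namely "Hello World".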
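Going the other way, the same entry can be produced from the string. The sketch below covers only the simplest case, a string of at most 63 bytes behind a 1-byte <prevlen>; `encode_short_string` is a hypothetical helper, not a function from ziplist.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode s (slen bytes, at most 63) as <prevlen><encoding><entry-data>
 * into dst, which has room for cap bytes. Only a 1-byte <prevlen> is
 * supported. Returns the number of bytes written, or 0 if the input does
 * not fit this simple case. */
static size_t encode_short_string(unsigned char *dst, size_t cap,
                                  unsigned char prevlen,
                                  const char *s, size_t slen) {
    assert(dst != NULL && s != NULL);
    if (prevlen >= 254 || slen > 63 || cap < slen + 2) return 0;
    dst[0] = prevlen;               /* 1-byte <prevlen> */
    dst[1] = (unsigned char)slen;   /* |00pppppp|: top bits 00, length below */
    memcpy(dst + 2, s, slen);       /* <entry-data>: the raw bytes */
    return slen + 2;
}
```

Calling it with a previous length of 2 and the 11-byte string "Hello World" writes 13 bytes, starting with 02 0b.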

Macros provided in ziplist.c:

#define ZIP_END 255         /* Special "end of ziplist" entry. */
#define ZIP_BIG_PREVLEN 254 /* ZIP_BIG_PREVLEN - 1 is the max number of bytes
                               of the previous entry, for the "prevlen" field
                               prefixing each entry, to be represented with
                               just a single byte. Otherwise it is represented
                               as FE AA BB CC DD, where AA BB CC DD are a
                               4 bytes unsigned integer representing the
                               previous entry len. */

/* Different encoding/length possibilities */
#define ZIP_STR_MASK 0xc0           // mask for the string type bits
#define ZIP_INT_MASK 0x30           // mask for the integer type bits
#define ZIP_STR_06B (0 << 6)        // string with a 6-bit length
#define ZIP_STR_14B (1 << 6)        // string with a 14-bit length
#define ZIP_STR_32B (2 << 6)        // string with a 32-bit length
#define ZIP_INT_16B (0xc0 | 0<<4)   // 16-bit integer
#define ZIP_INT_32B (0xc0 | 1<<4)   // 32-bit integer
#define ZIP_INT_64B (0xc0 | 2<<4)   // 64-bit integer
#define ZIP_INT_24B (0xc0 | 3<<4)   // 24-bit integer
#define ZIP_INT_8B 0xfe             // 8-bit integer

/* 4 bit integer immediate encoding |1111xxxx| with xxxx between
 * 0001 and 1101. */
#define ZIP_INT_IMM_MASK 0x0f   /* Mask to extract the 4 bits value. To add
                                   one is needed to reconstruct the value. */
#define ZIP_INT_IMM_MIN 0xf1    /* 11110001 */
#define ZIP_INT_IMM_MAX 0xfd    /* 11111101 */

#define INT24_MAX 0x7fffff
#define INT24_MIN (-INT24_MAX - 1)  // -0x800000 (bit pattern 0x800000 in 24 bits)

/* Macro to determine if the entry is a string. String entries never start
 * with "11" as most significant bits of the first byte. */
#define ZIP_IS_STR(enc) (((enc) & ZIP_STR_MASK) < ZIP_STR_MASK)

/* Utility macros.*/

/* Return total bytes a ziplist is composed of. */
#define ZIPLIST_BYTES(zl)       (*((uint32_t*)(zl)))

/* Return the offset of the last item inside the ziplist. */
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl) + sizeof(uint32_t))))

/* Return the length of a ziplist, or UINT16_MAX if the length cannot be
 * determined without scanning the whole ziplist. */
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl) + sizeof(uint32_t) * 2)))

/* The size of a ziplist header: two 32 bit integers for the total
 * bytes count and last item offset. One 16 bit integer for the number
 * of items field. */
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t) * 2 + sizeof(uint16_t))

/* Size of the "end of ziplist" entry. Just one byte. */
#define ZIPLIST_END_SIZE        (sizeof(uint8_t))

/* Return the pointer to the first entry of a ziplist. */
#define ZIPLIST_ENTRY_HEAD(zl)  ((zl) + ZIPLIST_HEADER_SIZE)

/* Return the pointer to the last entry of a ziplist, using the
 * last entry offset inside the ziplist header. */
#define ZIPLIST_ENTRY_TAIL(zl)  ((zl) + intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)))

/* Return the pointer to the last byte of a ziplist, which is, the
 * end of ziplist FF entry. */
#define ZIPLIST_ENTRY_END(zl)   ((zl) + intrev32ifbe(ZIPLIST_BYTES(zl)) - 1)

/* Increment the number of items field in the ziplist header. Note that this
 * macro should never overflow the unsigned 16 bit integer, since entries are
 * always pushed one at a time. When UINT16_MAX is reached we want the count
 * to stay there to signal that a full scan is needed to get the number of
 * items inside the ziplist. */
#define ZIPLIST_INCR_LENGTH(zl, incr) { \
    if (ZIPLIST_LENGTH(zl) < UINT16_MAX) \
        ZIPLIST_LENGTH(zl) = intrev16ifbe(intrev16ifbe(ZIPLIST_LENGTH(zl)) + incr); \
}

Structs provided in ziplist.c:

typedef struct zlentry {
    unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
    unsigned int prevrawlen;     /* Previous entry len. */
    unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                    For example strings have a 1, 2 or 5 bytes
                                    header. Integers always use a single byte.*/
    unsigned int len;            /* Bytes used to represent the actual entry.
                                    For strings this is just the string length
                                    while for integers it is 1, 2, 3, 4, 8 or
                                    0 (for 4 bit immediate) depending on the
                                    number range. */
    unsigned int headersize;     /* prevrawlensize + lensize. */
    unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                    the entry encoding. However for 4 bits
                                    immediate integers this can assume a range
                                    of values and must be range-checked. */
    unsigned char *p;            /* Pointer to the very start of the entry, that
                                    is, this points to prev-entry-len field. */
} zlentry;

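API provided in ziplist.c:

ZIPLIST_ENTRY_ZERO: zeroes a zlentry struct

ZIP_ENTRY_ENCODING: reads an entry's encoding from the first byte of its <encoding> field. For an integer that byte is the whole encoding; for a string only the type bits are returned, not the full encoding

zipIntSize: returns the length of the <entry-data> part for a given integer encoding

zipStoreEntryEncoding: builds the <encoding> field of an entry

ZIP_DECODE_LENGTH: decodes the length of the <entry-data> part from an entry's <encoding> field

zipStorePrevEntryLengthLarge: builds an entry's <prevlen> field, always in the 5-byte form. It is normally used when the previous entry is 254 bytes or longer, and also when a 5-byte field has to be kept for a shorter previous entry (see __ziplistCascadeUpdate)

zipStorePrevEntryLength: builds an entry's <prevlen> field, covering both the 1-byte and the 5-byte case

ZIP_DECODE_PREVLENSIZE: decodes the size of the <prevlen> field of an entry

ZIP_DECODE_PREVLEN: decodes the length of the previous entry from an entry's <prevlen> field

zipPrevLenByteDiff: given a new length for the previous entry, returns the difference between the number of bytes needed to store it and the number of bytes the entry's current <prevlen> field uses

zipRawEntryLength: returns the total length of an entry, covering the three parts <prevlen><encoding><entry-data>

zipTryEncoding: checks whether an entry's content can be encoded as an integer and, if so, yields its encoding and value

zipSaveInteger: builds the <entry-data> part of an integer entry

zipLoadInteger: reads the integer value stored in an integer entry

zipEntry: parses the fields of an entry into a zlentry struct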

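ziplistNew: initializes a ziplist

ziplistResize: resizes a ziplist; note that this function also writes the end marker

__ziplistCascadeUpdate: handles the case where a change to one entry means the <prevlen> field of the following entries has to be updated. Note that the intent is to avoid shrinking the ziplist, so even if an entry's length drops below 254 bytes, the next entry keeps its 5-byte <prevlen>

__ziplistDelete: deletes a run of consecutive entries starting at a given entry

__ziplistInsert: inserts an entry. This is an internal function; the caller supplies the <entry-data>, its length, and the position to insert at

ziplistMerge: merges two ziplists. Merging a ziplist with itself is not supported. Note that this function frees the memory of the shorter of the two ziplists

ziplistPush: inserts an entry at the head or at the tail

ziplistIndex: returns the entry at position index. Counting from the head starts at 0, counting from the tail starts at -1

ziplistNext: returns the entry after the current one

ziplistPrev: returns the entry before the current one

ziplistGet: returns the content stored in the <entry-data> of an entry

ziplistInsert: a wrapper around __ziplistInsert that inserts one entry

ziplistDelete: a wrapper around __ziplistDelete that deletes one entry

ziplistDeleteRange: deletes a number of entries starting at a given index

ziplistCompare: checks whether the value stored in an entry's <entry-data> equals a given value

ziplistFind: searches a ziplist for a particular entry, with support for skipping a number of entries after each failed comparison

ziplistLen: returns the number of entries in a ziplist

ziplistBlobLen: returns the total length of a ziplist in bytes

ziplistRepr: prints the full content of every entry in a ziplist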
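The idea behind tail access and ziplistPrev, jumping to the last entry through zltail and then stepping back by each entry's <prevlen>, can be sketched directly on a raw buffer. This is an illustration of the layout only, not the code from ziplist.c; `le32` and `zl_offsets_backward` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HDR 10 /* zlbytes (4) + zltail (4) + zllen (2) */

/* Read a little-endian 32-bit value. */
static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Collect entry offsets from the tail towards the head into out.
 * Returns the number of offsets stored (at most max), 0 for an empty
 * list, or -1 if the buffer looks inconsistent. */
static int zl_offsets_backward(const unsigned char *zl, size_t size,
                               size_t *out, int max) {
    assert(zl != NULL && out != NULL);
    if (size < HDR + 1) return -1;
    if (zl[HDR] == 0xFF) return 0;          /* empty list */
    size_t pos = le32(zl + 4);              /* zltail: offset of last entry */
    int n = 0;
    while (n < max) {
        if (pos < HDR || pos >= size) return -1;
        out[n++] = pos;
        if (pos == HDR) break;              /* reached the first entry */
        uint32_t prevlen;
        if (zl[pos] < 0xFE) {
            prevlen = zl[pos];              /* 1-byte <prevlen> */
        } else {
            if (pos + 5 > size) return -1;
            prevlen = le32(zl + pos + 1);   /* 5-byte <prevlen> */
        }
        if (prevlen == 0 || prevlen > pos - HDR) return -1;
        pos -= prevlen;                     /* step to the previous entry */
    }
    return n;
}
```

On the 15-byte integer example this visits offset 12 first and then offset 10, without ever scanning forward from the head.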
