Linux Kernel Study Notes --- Chapter 2 Memory Management, 2.5 The slab Allocator

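The buddy system allocates memory in units of pages, but in practice many requests are for only a few bytes. How should such small blocks be allocated?

The slab allocator exists to solve exactly this small-block allocation problem, and it is one of the most important parts of kernel memory allocation.

The slab allocator still obtains its physical pages from the buddy system. It then runs its own algorithm on top of those contiguous physical pages to manage small memory blocks.

The slab allocator provides the following interfaces for creating and destroying slab descriptors and for allocating and freeing cache objects.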

Create a slab descriptor
struct kmem_cache * kmem_cache_create(const char *name, size_t size, size_t align, unsigned long flags, void (*ctor)(void *))

Destroy a slab descriptor
void kmem_cache_destroy(struct kmem_cache *s)

Allocate a cache object
void *kmem_cache_alloc(struct kmem_cache *, gfp_t flags)

Free a cache object
void kmem_cache_free(struct kmem_cache *, void *)

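kmem_cache_create() takes the following parameters.
❑ name: the name of the slab descriptor.
❑ size: the size of each cache object.
❑ align: the alignment, in bytes, required for each cache object.
❑ flags: SLAB flags that control the cache's behavior (for example SLAB_HWCACHE_ALIGN).
❑ ctor: the constructor for the objects.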

The Intel graphics driver, for example, makes heavy use of kmem_cache_create() to create its own slab descriptors.

//drivers/gpu/drm/i915/i915_gem.c

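void i915_gem_load(struct drm_device *dev)
{
	struct drm_i915_private *dev_priv = dev->dev_private;

	// Create a dedicated slab descriptor for GEM objects (excerpt)
	dev_priv->slab = kmem_cache_create("i915_gem_object", sizeof(struct drm_i915_gem_object), 0, SLAB_HWCACHE_ALIGN, NULL);
}

void *i915_gem_object_alloc(struct drm_device *dev)
{
	struct drm_i915_private *dev_priv = dev->dev_private;

	// Allocate a zeroed cache object
	return kmem_cache_zalloc(dev_priv->slab, GFP_KERNEL);
}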

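The other heavy user of the slab mechanism is the kmalloc() interface.
kmem_cache_create() creates a dedicated cache descriptor of your own,
whereas kmalloc() allocates from general-purpose caches, much like the C standard library's malloc() in user space.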
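
The difference can be pictured this way: a dedicated cache serves exactly one object size, while kmalloc() has to map an arbitrary request size onto one of a fixed set of general-purpose caches. Below is a minimal user-space sketch of that mapping. It assumes power-of-two size classes starting at 8 bytes; real kernels also provide 96-byte and 192-byte caches, and the minimum size depends on the architecture. size_class() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pick the general-purpose cache size that would serve
 * a request of `size` bytes.
 * Assumptions: power-of-two classes only, smallest class is 8 bytes,
 * and `size` is small enough that doubling never overflows. */
static size_t size_class(size_t size)
{
	size_t cls = 8;

	assert(size > 0);
	while (cls < size)
		cls <<= 1;	/* try the next larger class */
	return cls;
}
```

Under these assumptions a 20-byte request is served from the 32-byte class, so 12 bytes per object are wasted. A dedicated cache created with kmem_cache_create() avoids most of that waste for frequently allocated objects.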

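Consider an example: on the ARM Vexpress platform, create a slab descriptor named "figo_object" with a size of 20 bytes, an align of 8 bytes, and flags of 0.
Assume the L1 cache line is 16 bytes.
A simple kernel module is enough to implement this.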

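#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static struct kmem_cache *fcache;
static void *buf;

// Example: create a slab descriptor named "figo_object", 20 bytes per object, 8-byte alignment
static int __init fcache_init(void)
{
	fcache = kmem_cache_create("figo_object", 20, 8, 0, NULL);
	if (!fcache)
		return -ENOMEM;	// nothing was created, so there is nothing to destroy

	buf = kmem_cache_zalloc(fcache, GFP_KERNEL);
	if (!buf) {
		kmem_cache_destroy(fcache);
		return -ENOMEM;
	}
	return 0;
}

static void __exit fcache_exit(void)
{
	kmem_cache_free(fcache, buf);
	kmem_cache_destroy(fcache);
}

module_init(fcache_init);
module_exit(fcache_exit);
MODULE_LICENSE("GPL");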
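
For the figo_object parameters, the effect of the align argument on the per-object size can be checked with a small user-space sketch. The helper name align_up() is hypothetical (the kernel has its own ALIGN() macro), and the sketch assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `align`.
 * Assumption: `align` is a nonzero power of two. */
static size_t align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}
```

With size = 20 and align = 8, align_up(20, 8) gives 24, so each figo_object slot would occupy 24 bytes.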

2.5.1 Creating a slab descriptor

struct kmem_cache is the core data structure of the slab allocator; we call it the slab descriptor.
Every slab descriptor is represented by one struct kmem_cache.

struct kmem_cache is defined as follows:

struct kmem_cache {
	struct array_cache __percpu *cpu_cache;
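	// A per-CPU struct array_cache, one per CPU, representing the local CPU's object pool.

/* 1) Cache tunables. Protected by slab_mutex */
	unsigned int batchcount;
	// Number of objects fetched from the shared pool or from the slabs_partial/slabs_free lists when the current CPU's local object pool (array_cache) is empty.

	unsigned int limit;
	// When the number of free objects in the local object pool reaches limit, batchcount objects are released so that the kernel can reclaim and destroy slabs.

	unsigned int shared; // Used on multi-core systems.

	unsigned int size;	// Object length, including the padding added for align.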
	struct reciprocal_value reciprocal_buffer_size;
/* 2) touched by every alloc & free from the backend */

	unsigned int flags;		/* constant flags */  // The cache's SLAB flags.
	unsigned int num;		/* # of objs per slab */ // Maximum number of objects in one slab.

/* 3) cache_grow/shrink */
	/* order of pgs per slab (2^n) */
	unsigned int gfporder; // One slab occupies 2^gfporder pages.

	/* force GFP flags, e.g. GFP_DMA */
	gfp_t allocflags;

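	size_t colour;			/* cache colouring range */  // Number of distinct colour offsets available in one slab.
	unsigned int colour_off;	/* colour offset */ // Length of one cache colour, equal to the L1 cache line size.
	struct kmem_cache *freelist_cache;  // Cache used to allocate the freelist when it is stored off-slab.
	unsigned int freelist_size; // Size of the freelist; each object takes 1 byte in it.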

	/* constructor func */
	void (*ctor)(void *obj);

/* 4) cache creation/removal */
	const char *name; // Name of the slab descriptor.
	struct list_head list;
	int refcount;
	int object_size; // Actual size of the object.
	int align;		// Alignment.

/* 5) statistics */
#ifdef CONFIG_DEBUG_SLAB
	unsigned long num_active;
	unsigned long num_allocations;
	unsigned long high_mark;
	unsigned long grown;
	unsigned long reaped;
	unsigned long errors;
	unsigned long max_freeable;
	unsigned long node_allocs;
	unsigned long node_frees;
	unsigned long node_overflow;
	atomic_t allochit;
	atomic_t allocmiss;
	atomic_t freehit;
	atomic_t freemiss;

	/*
	 * If debugging is enabled, then the allocator can add additional
	 * fields and/or padding to every object. size contains the total
	 * object size including these internal fields, the following two
	 * variables contain the offset to the user object and its size.
	 */
	int obj_offset;
#endif /* CONFIG_DEBUG_SLAB */
#ifdef CONFIG_MEMCG_KMEM
	struct memcg_cache_params *memcg_params;
#endif

	struct kmem_cache_node *node[MAX_NUMNODES];
	// Slab nodes: on a NUMA system each node has one struct kmem_cache_node.
};

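struct array_cache is defined as follows:

struct array_cache {
	unsigned int avail;	// Number of objects available in the pool.
	unsigned int limit;	// When the number of free objects in the local object pool reaches limit, batchcount objects are released so that the kernel can reclaim and destroy slabs.
	unsigned int batchcount;
	// Number of objects fetched from the shared pool or from the slabs_partial/slabs_free lists when the current CPU's local object pool is empty.

	unsigned int touched;	// Set to 1 when an object is taken from the pool; set to 0 when the cache is shrunk.
	void *entry[];	// Pointers to the cached objects.
};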
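
How avail, limit, and batchcount work together can be illustrated with a small user-space model. This is only a sketch, not the kernel code: the names ac_model, ac_get(), and ac_put() are hypothetical, the pool has a fixed capacity, and "releasing" objects is modelled by a counter instead of returning them to a slab.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AC_MAX 8	/* capacity of entry[] in this sketch (hypothetical) */

struct ac_model {
	unsigned int avail;		/* objects currently cached */
	unsigned int limit;		/* flush threshold */
	unsigned int batchcount;	/* objects released per flush */
	unsigned int touched;		/* set when an object is taken */
	unsigned int flushed;		/* total objects released (illustration only) */
	void *entry[AC_MAX];
};

/* Take the most recently freed object (LIFO); NULL when the pool is empty.
 * A real allocator would refill batchcount objects instead of failing. */
static void *ac_get(struct ac_model *ac)
{
	if (ac->avail == 0)
		return NULL;
	ac->touched = 1;
	return ac->entry[--ac->avail];
}

/* Put an object back. If the pool has reached limit, release the oldest
 * batchcount objects first and slide the remaining ones to the front. */
static void ac_put(struct ac_model *ac, void *obj)
{
	assert(ac->limit <= AC_MAX);
	assert(ac->batchcount > 0 && ac->batchcount <= ac->limit);
	if (ac->avail >= ac->limit) {
		ac->avail -= ac->batchcount;
		memmove(ac->entry, ac->entry + ac->batchcount,
			ac->avail * sizeof(void *));
		ac->flushed += ac->batchcount;
	}
	ac->entry[ac->avail++] = obj;
}
```

The LIFO order means that the object handed out next is the one freed most recently, which is also the one most likely to still be in the CPU cache.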

kmem_cache_create() is implemented in mm/slab_common.c.

/*
 * kmem_cache_create - Create a cache.
 * @name: A string which is used in /proc/slabinfo to identify this cache.
 * @size: The size of objects to be created in this cache.
 * @align: The required alignment for the objects.
 * @flags: SLAB flags
 * @ctor: A constructor for the objects.
 *
 * Returns a ptr to the cache on success, NULL on failure.
 * Cannot be called within a interrupt, but can be interrupted.
 * The @ctor is run when new pages are allocated by the cache.
 *
 * The flags are
 *
 * %SLAB_POISON - Poison the slab with a known test pattern (a5a5a5a5)
 * to catch references to uninitialised memory.
 *
 * %SLAB_RED_ZONE - Insert `Red' zones around the allocated memory to check
 * for buffer overruns.
 *
 * %SLAB_HWCACHE_ALIGN - Align the objects in this cache to a hardware
 * cacheline.  This can be beneficial if you're counting cycles as closely
 * as davem.
 */
struct kmem_cache * kmem_cache_create(const char *name, size_t size, size_t align, unsigned long flags, void (*ctor)(void *))
{
	struct kmem_cache *s;
	char *cache_name;
	int err;

	/* Excerpt: locking and most error handling are omitted. */
	get_online_cpus();
	get_online_mems();
	err = kmem_cache_sanity_check(name, size);
	/*
	 * Some allocators will constraint the set of valid flags to a subset
	 * of all flags. We expect them to define CACHE_CREATE_MASK in this
	 * case, and we'll just provide them with a sanitized version of the
	 * passed flags. */
	flags &= CACHE_CREATE_MASK;
	s = __kmem_cache_alias(name, size, align, flags, ctor);
	if (s)
		goto out_unlock;	/* an existing descriptor can be reused */

	cache_name = kstrdup(name, GFP_KERNEL);
	s = do_kmem_cache_create(cache_name, size, size,
				 calculate_alignment(flags, align, size), flags, ctor, NULL, NULL);
out_unlock:
	put_online_mems();
	put_online_cpus();
	return s;
}
EXPORT_SYMBOL(kmem_cache_create);

kmem_cache_create() first calls __kmem_cache_alias() to check whether an existing slab descriptor can be reused.
If none can, it calls do_kmem_cache_create() to create a new slab descriptor.
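
The reuse-or-create decision can be sketched in user space as follows. This is a simplified model under stated assumptions: the names cache_desc, find_alias(), and cache_create() are hypothetical, and the matching rule used here (same size, same alignment, no constructor) is much simpler than the checks the kernel actually performs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHES 16	/* hypothetical table size */

struct cache_desc {
	const char *name;
	size_t size;
	size_t align;
	void (*ctor)(void *);
	int refcount;
};

static struct cache_desc caches[MAX_CACHES];
static size_t nr_caches;

/* Look for an existing descriptor that could serve the request.
 * Assumed rule: same size and alignment, no constructor on either side. */
static struct cache_desc *find_alias(size_t size, size_t align,
				     void (*ctor)(void *))
{
	if (ctor)
		return NULL;
	for (size_t i = 0; i < nr_caches; i++) {
		struct cache_desc *c = &caches[i];

		if (!c->ctor && c->size == size && c->align == align)
			return c;
	}
	return NULL;
}

/* Reuse a compatible descriptor if one exists, otherwise create a new one.
 * Returns NULL when the table is full. */
static struct cache_desc *cache_create(const char *name, size_t size,
				       size_t align, void (*ctor)(void *))
{
	struct cache_desc *c = find_alias(size, align, ctor);

	if (c) {
		c->refcount++;	/* one more user of the shared descriptor */
		return c;
	}
	if (nr_caches == MAX_CACHES)
		return NULL;
	c = &caches[nr_caches++];
	c->name = name;
	c->size = size;
	c->align = align;
	c->ctor = ctor;
	c->refcount = 1;
	return c;
}
```

In this model, two requests with identical size and alignment end up sharing one descriptor whose refcount is 2, which is why a descriptor can only be torn down once its last user has released it.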

static struct kmem_cache * do_kmem_cache_create(char *name, size_t object_size, size_t size, size_t align,
		     unsigned long flags, void (*ctor)(void *), struct mem_cgroup *memcg, struct kmem_cache *root_cache)
{
	struct kmem_cache *s;
	int err;

	/* Excerpt: error handling is omitted. */
	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);

	s->name = name;
	s->object_size = object_size;
	s->size = size;
	s->align = align;
	s->ctor = ctor;

	err = memcg_alloc_cache_params(memcg, s, root_cache);
	err = __kmem_cache_create(s, flags);
	s->refcount = 1;
	list_add(&s->list, &slab_caches);	/* publish on the global list */

	return s;
}

do_kmem_cache_create() first allocates a struct kmem_cache. It then fills in the relevant members (name, size, align, and so on), calls __kmem_cache_create() to set up the slab cache, and finally adds the newly created slab descriptor to the global list slab_caches.

calculate_slab_order() works out how many physical pages one slab needs and how many objects that slab can hold.

A slab consists of 2^gfporder contiguous physical pages and contains num slab objects, a colouring area, and a freelist area.
[Figure: slab layout]

calculate_slab_order() calls cache_estimate() to compute how many objects fit in 2^gfporder pages; whatever space is left over is used for cache colouring.
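
The arithmetic behind this estimate can be sketched in user space. The model below is not the kernel's cache_estimate(). It assumes the freelist lives inside the slab, each object costs one extra byte of freelist index, the freelist is rounded up to the object alignment, and the object size passed in is already a multiple of that alignment. The names slab_layout, round_up_to(), and estimate_slab() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct slab_layout {
	size_t num;		/* objects per slab */
	size_t freelist_size;	/* bytes used by the freelist index array */
	size_t left_over;	/* bytes left for colouring */
	size_t colour;		/* number of colour offsets that fit */
};

static size_t round_up_to(size_t n, size_t align)
{
	return (n + align - 1) / align * align;
}

/* Estimate the layout of one slab of slab_size bytes.
 * Assumptions: on-slab freelist, 1 index byte per object,
 * obj_size is a nonzero multiple of align. */
static struct slab_layout estimate_slab(size_t slab_size, size_t obj_size,
					size_t align, size_t colour_off)
{
	struct slab_layout l;

	assert(align > 0 && colour_off > 0);
	assert(obj_size > 0 && obj_size % align == 0);

	/* First guess: every object costs obj_size plus one index byte. */
	l.num = slab_size / (obj_size + 1);
	l.freelist_size = round_up_to(l.num, align);

	/* Padding the freelist can make the first guess one object too many. */
	if (l.num > 0 && slab_size - l.num * obj_size < l.freelist_size) {
		l.num--;
		l.freelist_size = round_up_to(l.num, align);
	}

	l.left_over = slab_size - l.num * obj_size - l.freelist_size;
	l.colour = l.left_over / colour_off;
	return l;
}
```

For one 4096-byte page, 24-byte objects, 8-byte alignment, and a 16-byte colour offset, this sketch yields 163 objects, a 168-byte freelist, 16 bytes left over, and therefore 1 colour.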

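__do_tune_cpucache() first calls alloc_kmem_cache_cpus() to allocate the per-CPU struct array_cache, which we call the object pool.

The slab descriptor holds a per-CPU struct array_cache pointer, so every CPU in the system has its own struct array_cache.
The array_cache of the current CPU is called the local object pool; there is also the concept of a shared object pool.

When alloc_kmem_cache_cpus() allocates the object pool, note that the size computation takes the pool's maximum threshold, limit, into account: the entries parameter is that limit.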
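
The size computation can be illustrated with a user-space sketch: the pool header is followed by room for entries object pointers. The names array_cache_model, array_cache_bytes(), and alloc_pool() are hypothetical, and the sketch allocates a pool for a single CPU with calloc(), whereas the kernel allocates one per CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct array_cache_model {
	unsigned int avail;
	unsigned int limit;
	unsigned int batchcount;
	unsigned int touched;
	void *entry[];		/* flexible array: room for `entries` pointers */
};

/* Bytes needed for one pool that can hold `entries` object pointers. */
static size_t array_cache_bytes(unsigned int entries)
{
	return sizeof(struct array_cache_model) +
	       (size_t)entries * sizeof(void *);
}

/* Allocate and initialise one pool; returns NULL on allocation failure.
 * The caller releases it with free(). */
static struct array_cache_model *alloc_pool(unsigned int entries,
					    unsigned int batchcount)
{
	struct array_cache_model *ac = calloc(1, array_cache_bytes(entries));

	if (!ac)
		return NULL;
	ac->limit = entries;		/* entries is the pool's limit */
	ac->batchcount = batchcount;
	return ac;
}
```

Because entry[] is sized from limit, the pool can never hold more than limit pointers, which is why limit has to be known before the pool is allocated.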

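Back in __do_tune_cpucache(), the newly allocated object pool cpu_cache becomes the slab descriptor's local object pool. alloc_kmem_cache_node() is then called to continue initializing the slab cache's cachep->kmem_cache_node data structure.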

Jaimex8
