Memory Reordering Caught in the Act
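Earlier articles on memory barriers mentioned the consequences of memory reordering several times; this post builds a concrete scenario that reproduces it. The setup: on a multicore machine, two threads share two variables X and Y, both initially 0. Thread 1 sets X to 1 and then reads Y; thread 2 sets Y to 1 and then reads X. The two threads run concurrently, possibly on different CPU cores.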
processor 1 | processor 2 |
---|---|
X = 1 | Y = 1 |
r1 = Y | r2 = X |
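Let r1 be the value of Y read by processor 1 and r2 the value of X read by processor 2. Intuitively, we would expect one of the following outcomes:
- r1 = 1, r2 = 1
- r1 = 1, r2 = 0
- r1 = 0, r2 = 1
- r1 = 0, r2 = 0 <-- intuitively impossible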
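Getting 0 for both reads is counterintuitive, yet on real hardware it can happen. For example, a core may hold its store in a store buffer and perform its subsequent load before that store becomes visible to the other core (StoreLoad reordering). The following program runs the experiment repeatedly and counts how often both reads return 0: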
```c
#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

sem_t beginSema1;
sem_t beginSema2;
sem_t endSema;

int X, Y;
int r1, r2;

void *thread1Func(void *param)
{
    srandom(1);
    for (;;) // Loop indefinitely
    {
        sem_wait(&beginSema1); // Wait for signal from main thread
        while (random() % 8 != 0) {
        } // Add a short, random delay

        // ----- THE TRANSACTION! -----
        X = 1;
        asm volatile("" ::: "memory"); // Prevent compiler reordering
        /*
        #if defined __aarch64__ || defined __arm__
        asm volatile("dmb ish" ::: "memory");
        #elif defined __x86_64__
        asm volatile("mfence" ::: "memory");
        #else
        #error "unsupported architecture"
        #endif
        */
        r1 = Y;

        sem_post(&endSema); // Notify transaction complete
    }
    return NULL; // Never returns
}

void *thread2Func(void *param)
{
    srandom(1);
    for (;;) // Loop indefinitely
    {
        sem_wait(&beginSema2); // Wait for signal from main thread
        while (random() % 8 != 0) {
        } // Add a short, random delay

        // ----- THE TRANSACTION! -----
        Y = 1;
        asm volatile("" ::: "memory"); // Prevent compiler reordering
        /*
        #if defined __aarch64__ || defined __arm__
        asm volatile("dmb ish" ::: "memory");
        #elif defined __x86_64__
        asm volatile("mfence" ::: "memory");
        #else
        #error "unsupported architecture"
        #endif
        */
        r2 = X;

        sem_post(&endSema); // Notify transaction complete
    }
    return NULL; // Never returns
}

int main(void)
{
    // Initialize the semaphores
    sem_init(&beginSema1, 0, 0);
    sem_init(&beginSema2, 0, 0);
    sem_init(&endSema, 0, 0);

    // Spawn the threads
    pthread_t thread1, thread2;
    pthread_create(&thread1, NULL, thread1Func, NULL);
    pthread_create(&thread2, NULL, thread2Func, NULL);

    // Repeat the experiment ad infinitum
    int detected = 0;
    for (int iterations = 1; ; iterations++)
    {
        // Reset X and Y
        X = 0;
        Y = 0;

        // Signal both threads
        sem_post(&beginSema1);
        sem_post(&beginSema2);

        // Wait for both threads
        sem_wait(&endSema);
        sem_wait(&endSema);

        // Check if there was a simultaneous reorder
        if (r1 == 0 && r2 == 0)
        {
            detected++;
            printf("%d reorders detected after %d iterations\n", detected, iterations);
        }
    }
    return 0; // Never returns
}
```
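`asm volatile("" ::: "memory");` is a compiler barrier. It tells the compiler not to reorder memory operations across it, but it emits no instruction, so it cannot stop the CPU from reordering them at run time. Compiling the program gives the following binary code: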
```shell
$ gcc -O2 ordering.c -lpthread
# Using aarch64 as an example:
$ aarch64-linux-gnu-gcc -O2 ordering.c -lpthread
```

The relevant part of the disassembly of `thread1Func`:

```
40093c: 52800037 mov w23, #0x1 // #1
400940: 97ffff58 bl 4006a0 <srandom@plt>
400944: d503201f nop
400948: aa1603e0 mov x0, x22
40094c: 97ffff59 bl 4006b0 <sem_wait@plt>
400950: 97ffff40 bl 400650 <random@plt>
400954: f240081f tst x0, #0x7
400958: 54ffffc1 b.ne 400950 <thread1Func+0x50> // b.any
40095c: b90002b7 str w23, [x21] <-- store 1 first
400960: b9400261 ldr w1, [x19]
400964: 91026300 add x0, x24, #0x98
400968: b9000281 str w1, [x20]
40096c: 97ffff49 bl 400690 <sem_post@plt>
```
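At the instruction level the order is exactly what we asked for: the store of 1 (`str`) comes before the load (`ldr`). Even so, running the program still produces the unexpected result: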
```
1 reorders detected after 1964 iterations
2 reorders detected after 57781 iterations
3 reorders detected after 99155 iterations
4 reorders detected after 142275 iterations
5 reorders detected after 201890 iterations
6 reorders detected after 236544 iterations
7 reorders detected after 438035 iterations
...
```
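To prevent this, we need a CPU-level memory barrier instruction that stops the hardware from reordering the memory operations at run time. Specifically, this case needs a StoreLoad barrier, which keeps a store from being reordered with a later load. The different kinds of barrier are covered in the Memory Model articles on this blog.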
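Enable the barrier by removing the comment markers around it in both thread functions:

```c
// ...
#if defined __aarch64__ || defined __arm__
asm volatile("dmb ish" ::: "memory");
#elif defined __x86_64__
asm volatile("mfence" ::: "memory");
#else
#error "unsupported architecture"
#endif
// ...
```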
After recompiling, the disassembly of `thread1Func` shows a `dmb ish` between the store and the load:

```
40093c: 52800037 mov w23, #0x1 // #1
400940: 97ffff58 bl 4006a0 <srandom@plt>
400944: d503201f nop
400948: aa1603e0 mov x0, x22
40094c: 97ffff59 bl 4006b0 <sem_wait@plt>
400950: 97ffff40 bl 400650 <random@plt>
400954: f240081f tst x0, #0x7
400958: 54ffffc1 b.ne 400950 <thread1Func+0x50> // b.any
40095c: b90002b7 str w23, [x21] <-- store 1 first
400960: d5033bbf dmb ish <------- memory barrier
400964: b9400261 ldr w1, [x19]
400968: 91026300 add x0, x24, #0x98
40096c: b9000281 str w1, [x20]
400970: 97ffff48 bl 400690 <sem_post@plt>
```