On sequence points and side effects in C

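There are many introductions to sequence points online; this post draws on several of them and summarizes. Section 5.1.2.3 of the C99 standard covers sequence points. A sequence point is a point in the execution of a program with a special property: all side effects of the evaluations before it are complete, and no side effects of the evaluations after it have taken place yet. The standard also requires that, between two consecutive sequence points, the stored value of an object be modified at most once. At a sequence point everything is settled; between sequence points you cannot assume that a variable's value has stabilized. In short, sequence points exist to tell you where values are guaranteed to be settled. To make sense of this, let us first look at what a side effect is.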

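     Every expression has a value, and often you write an expression only to obtain that value. Some expressions, however, also have side effects while others have none, and sometimes the side effect is exactly what we want. For example:
     int a = 10;
     int b = a;     /* The expression a has no side effect here; it only yields */
             /* the value 10 of the variable a. b = a, on the other hand, has a */
             /* side effect: it changes b to the value of a. */
     That is what is meant by the side effect of an expression. Side effects are what make much of a program's work possible. Some expressions produce both a value and a side effect: i++ produces a value (the value of i before the increment) and also, as a side effect, increments i.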
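To make the difference between the value of i++ and its side effect concrete, here is a minimal sketch. The function name post_increment_value is made up for illustration and is not from any of the sources.

```c
/* Hypothetical helper: returns the value of the expression (*p)++,
   which is the value *p had before the increment. The increment is
   the side effect; the caller observes it through *p afterwards. */
int post_increment_value(int *p)
{
    int value = (*p)++;   /* value: old *p; side effect: *p grows by 1 */
    return value;
}
```

Called with a variable holding 10, the function returns 10 and leaves 11 in the variable: the return value is the value of the expression, the change to the variable is the side effect.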
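      Between two sequence points, modifying a variable twice, or modifying it and also reading it for some other purpose, causes trouble. The classic example:
int i = 1;
a[i] = i++;
Here i is both modified and read with no sequence point in between. What does the statement do? Is it a[1] = 1, or a[2] = 1? The standard does not say: the behavior is undefined. Code like this is worthless, your boss will not admire you for being so terse, and it may well get you fired. An even more classic example: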
int i = 1;
printf("%d, %d, %d\n", i++, i++, i++);
i = 1;
printf("%d\n", i++ + i++ + i++);
i = 1;
printf("%d\n", ++i + ++i + ++i);
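     Many university C instructors lecture on this question, mine included, and I did not understand it at the time.
     In fact the question is not worth lecturing on; it amounts to arguing with the compiler. Different compilers may produce different results (though the common ones may happen to agree, which leads programmers to draw wrong conclusions on their own), and code whose result depends on the implementation is of no use. i++ + i++ + i++ is a single expression that modifies i several times without an intervening sequence point, so its result is not defined. This raises another interesting question: someone might assume that after the statement has executed, i has been incremented three times and must therefore be 4. That is not guaranteed either. Many compilers may indeed produce 4, but the standard classifies modifying an object more than once between two sequence points as undefined behavior, which means it places no requirements at all on what the program does. Suppose, for example, that a compiler reads i only once in i++ + i++ + i++ and keeps using that value. It evaluates the first i++: i is 1, so it stores 2 into i. For the second i++ it still uses the remembered 1 and stores 2 again (no sequence point has passed, so nothing guarantees that the stored 2 is visible yet). After three such steps from 1 to 2, i ends up as 2 rather than the expected 4. It all depends on how the compiler is implemented, and precisely because it does, writing code like this is also grounds for getting sacked.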
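If the intent behind i++ + i++ + i++ is "add up three successive values of i", it can be written so that every increment is separated by a sequence point. The sketch below is one such rewrite; the function name is invented, and reading the operands left to right is my assumption about the intent, not something the standard prescribes for the original expression.

```c
/* Adds the values of three post-increments of *i, one per statement.
   Each statement ends in a sequence point, so every increment is
   complete before the next one starts. */
int sum_three_post_increments(int *i)
{
    int sum = 0;
    sum += (*i)++;
    sum += (*i)++;
    sum += (*i)++;
    return sum;
}
```

Starting from i equal to 1 this returns 1 + 2 + 3 = 6 and leaves i at 4 on every conforming compiler.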
     

1. A very accessible explanation found on chinaunix, which puts it well.

In C, a statement that consists of a single expression, such as
x = (i++) * 2;
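is called an "expression statement". The ";" that ends an expression statement is one of the sequence points defined by the C standard, but that does not mean every ";" is a sequence point, nor that this is the only kind. The sequence points defined by the standard are:
(1) in a function call, after all the arguments have been evaluated and before the first instruction of the function executes (note that the "," separating arguments is not a sequence point);
(2) the end of the left operand of the && operator;
(3) the end of the left operand of the || operator;
(4) the end of the first operand of the ?: operator;
(5) the comma operator, at the end of its left operand;
(6) the end of a full expression, which covers: the end of an initializer of an automatic object; the semicolon that ends an expression statement; the closing parenthesis of the controlling expression of a do/while/if/switch/for statement; the two semicolons inside the control part of a for statement; and the end of the return-value expression of a return statement (its final semicolon).
Sequence points are defined to remove as much ambiguity as possible from how a compiler interprets an expression. Where sequence points still leave an ambiguity unresolved, the standard lets the implementation choose its own interpretation. Sequence points are best understood in terms of the purpose they were defined for.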
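Item (1) in the list can be observed directly. In the sketch below, the global counter and both function names are hypothetical and exist only for illustration. The argument expression counter++ finishes its side effect before the callee's body starts, so the callee already sees the incremented global.

```c
int counter = 0;   /* hypothetical global, for illustration only */

/* Ignores its argument and reports the global as seen inside the callee. */
int counter_seen_by_callee(int argument)
{
    (void)argument;
    return counter;
}

int call_with_increment(void)
{
    counter = 5;
    /* The value passed is 5, but the side effect of counter++ is complete
       before the callee's body runs, so the callee reads 6. */
    return counter_seen_by_callee(counter++);
}
```

call_with_increment() returns 6, which is the sequence point between argument evaluation and the call doing its job.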
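Another example:
y = (x++, x+1);
Given that x = 2 before this statement executes, what is the value of y?
The comma operator is a sequence point, so the value of the comma expression is well defined: it is 4. By the definition of a sequence point, all side effects of evaluating x++, the expression before the ",", are complete before x+1 is evaluated, so x is 3 when x+1 is computed. (The parentheses are required: assignment binds more tightly than the comma operator, so y = x++, x+1; would parse as (y = x++), x+1; and leave y equal to 2.) In this example the sequence point removes the ambiguity.
Note how the ambiguity is removed: the sequence point in the middle ensures that the condition "the value of an object is modified at most once between adjacent sequence points" is met.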
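Because assignment binds more tightly than the comma operator, the parentheses decide which value ends up in y. The two functions below are a small sketch of that difference; their names are invented, and the (void) cast is only there to keep the compiler from warning about the discarded value.

```c
/* y = (x++, x + 1): the comma operator yields its right operand,
   evaluated after the increment of x has taken effect. */
int comma_with_parentheses(int x)
{
    int y;
    y = (x++, x + 1);
    return y;
}

/* y = x++, x + 1: parsed as (y = x++), (x + 1).
   y receives the old value of x; the value of x + 1 is discarded. */
int comma_without_parentheses(int x)
{
    int y;
    y = x++, (void)(x + 1);
    return y;
}
```

With x equal to 2, the first function returns 4 and the second returns 2. Both are well defined; they simply mean different things.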
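y = (x++) * (x++), with x = 2 beforehand: what is y?
The answer is that this expression contains no sequence point between the two increments, so sequence points cannot remove the ambiguity. The behavior is undefined: a compiler that makes y equal to 4, or 6 (or a number of other values), conforms to the standard, and the programmer is responsible for the consequences of writing an expression that breaks the sequence point rules.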
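I apologize for my limited powers of explanation, but I really do not intend to explain this any further. Let me quote a passage from "Expert C Programming" as a way to bow out gracefully: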
However, the problem with standards manuals is that they only make sense if you already know what they mean. If people write them in English, the more precise they try to be, the longer, duller and more obscure they become. If they write them using mathematical notation to define the language, the manuals become inaccessible to too many people.
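The imprecision of natural language tends to make things less clear the more they are explained, while precise mathematical language is beyond what most people, myself included, can understand and apply.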

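Sequence points are special points in a program's execution sequence. Where a sequence point exists, the expression before it must be fully evaluated, side effects included, before the expression after it and its side effects are evaluated.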

2. What is a side effect? An example.

int a = 5;
int b = a ++;

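In the statement that initializes b, the expression a++ has a side effect: it yields the current value of a, 5, and a must then be incremented by 1.

Which symbols create sequence points?

The comma operator "," creates a sequence point.

The "," operator joins several expressions into a single expression. For example:

int b;
b = 5;
++ b;

can be joined with "," into

int b;
b = 5, ++b;

(The "," in a declaration such as int a = 1, b = 2; merely separates declarators; it is not the comma operator, and int b = 5, ++b; does not compile.) Because the "," operator creates a sequence point, the expression on its left must be evaluated first, and any side effect it has must take effect, before the expression on its right is processed.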
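The place where the comma operator shows up most often in real code is the control part of a for loop. The sketch below (the function name is my own) reverses an array with two indices that are set up and advanced using ",".

```c
#include <stddef.h>

/* Reverses the n elements of a in place. */
void reverse_ints(int *a, size_t n)
{
    size_t i, j;
    if (n < 2)
        return;
    /* "i = 0, j = n - 1" and "i++, j--" each use the comma operator:
       the left operand is fully evaluated before the right one. */
    for (i = 0, j = n - 1; i < j; i++, j--) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Since i and j are different variables, the order would not matter here anyway; the comma operator is used for its ability to pack two expressions into one slot of the for statement.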

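&& and || create sequence points

Logical AND (&&) and logical OR (||) create sequence points.

Because && short-circuits, its left operand must be fully evaluated first; if the result is false (0), the right operand is not evaluated at all and the result of && is 0.

|| works like &&: if its left operand is true (nonzero), the right operand is not evaluated and the result is 1.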
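Whether the right operand is evaluated at all can be checked by counting. In this sketch the counter right_operand_calls and the three function names are hypothetical, introduced only to make the skipped evaluation visible.

```c
int right_operand_calls = 0;   /* hypothetical counter, for illustration */

/* Stands in for the right operand and records that it was evaluated. */
int right_operand(int value)
{
    right_operand_calls++;
    return value;
}

/* The right operand runs only when left is nonzero. */
int and_example(int left)
{
    return left && right_operand(1);
}

/* The right operand runs only when left is zero. */
int or_example(int left)
{
    return left || right_operand(0);
}
```

and_example(0) and or_example(1) leave the counter untouched, while and_example(1) and or_example(0) each increase it by one.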

The "?" in ?: creates a sequence point

The "?" in the ternary operator ?: creates a sequence point. For example:

int a = 5;
int b = a++ > 5? 0 : a;

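What is b? Because there is a sequence point at the "?", the expression on its left must be fully evaluated first. When a++ > 5 compares against 5, the value used is that of a before the increment, so the expression is false. Because of the sequence point at the "?", the side effect of the left-hand expression must also have taken effect by then, so a has been incremented to 6. Since the expression to the left of the "?" is false, ?: yields the value to the right of the ":", namely a. At that point a is 6, so b is 6.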
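The same expression wrapped in a function makes it easy to try other starting values. The function name is made up; the body is the example from above with a turned into a parameter.

```c
/* Evaluates a++ > 5 ? 0 : a for a given starting value of a. */
int conditional_example(int a)
{
    /* The comparison uses the old value of a; by the time the second or
       third operand is evaluated, a has already been incremented. */
    int b = a++ > 5 ? 0 : a;
    return b;
}
```

For a starting value of 5 the result is 6, as worked out above; for 6 the comparison is true and the result is 0.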

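Since sequence points matter this much, it is time to go over the important ones. These are worth collecting for yourself as you program.
     1). An important sequence point lies at the end of every full expression. A full expression is an expression that is not a subexpression of a larger expression; read that carefully.
     int i = 1;
     i++;     /* i++ is a full expression */
     i++ + 1; /* here i++ is not a full expression, because it is part of the full expression i++ + 1 */
     The specific kinds of full expression can be looked up in reference material; the C99 standard document is a good choice.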
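     2). The comma expression. A comma expression is evaluated strictly in order, and there is a sequence point between the expressions separated by the comma. So if the expression before the comma is i++, the expression after it can rely on i being its original value plus 1 (overflow aside). For example: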
     int i = 1;
     i++, i++, i++;
     printf("%d\n", i);
i is now guaranteed to be 4.
3). The && and || operators. Short-circuit evaluation can be used to guard against division by zero, as follows:
int a = 10;
int b = 0;
if (b && a/b) 
{ /* some code here */ }
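Evaluation short-circuits at b: because b is 0, a/b is never executed, so the division can be used safely. Both operators can be treated as sequence points. If the value of the left operand already determines whether the whole expression is true or false, the right operand does not need to be evaluated and is skipped. In the code above, a/b is exactly such an operand: it must not be evaluated, since evaluating it would be an error. The order of evaluation of the two operands is guaranteed.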
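The guard from the if statement above can be packaged as a function. This is a small sketch with an invented name; it returns the truth value of b && a/b.

```c
/* Returns 1 if b is nonzero and a / b is nonzero, otherwise 0.
   When b is 0 the division is never evaluated. */
int quotient_is_nonzero(int a, int b)
{
    return b && a / b;
}
```

Swapping the operands to a / b && b would defeat the purpose: the division would then be evaluated first, before b has been checked.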
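4). The conditional operator ? :. There is also a sequence point at the question mark, and there is not much to add. The point is that the same variable can be accessed and modified on both sides of the question mark, and doing so is safe.
     Finally, the order of evaluation within an expression is not fixed. One more place where this shows up: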
     funa() + funb() + func();
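     The C standard does not specify which of these three functions executes first. If the order matters, call them in separate statements and store the results in temporary variables.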
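A sketch of the temporary-variable approach follows. The bodies of funa, funb and func are placeholders I made up (each records its name in a hypothetical log and returns a constant), so that the call order can be inspected afterwards.

```c
#include <stddef.h>

char call_log[4];        /* hypothetical log of the call order */
size_t call_count = 0;

/* Appends one character to the log, up to three entries. */
void record(char name)
{
    if (call_count < 3)
        call_log[call_count++] = name;
}

int funa(void) { record('a'); return 1; }
int funb(void) { record('b'); return 2; }
int func(void) { record('c'); return 3; }

/* Each initializer is a full expression ending in a sequence point,
   so the calls happen in the order written. */
int sum_in_fixed_order(void)
{
    int a = funa();
    int b = funb();
    int c = func();
    return a + b + c;
}
```

After sum_in_fixed_order() the log reads a, b, c on every compiler, whereas funa() + funb() + func() may produce any of the six orders.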

Order of execution between sequence points

An example given in "奇怪的C代码" ("Strange C Code").

	int i = 3;
	int ans = (++i)+(++i)+(++i);

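There is no sequence point between the three operands of (++i)+(++i)+(++i), so in what order are they executed? Code compiled with gcc executes two of the ++i first, adds them, then evaluates the third ++i and adds it. Code compiled with Microsoft VC++ executes all three ++i first and then adds. The two produce different results, so which one is wrong?

Neither. The C standard leaves the order of evaluation between two sequence points unspecified (subject, of course, to operator precedence and associativity), and because this expression also modifies i more than once between sequence points, its behavior is undefined, so any result conforms. The purpose of this latitude is to leave room for compiler optimization.

Knowing this, we should avoid having the same variable that is being incremented appear more than once in a single expression, because the compiler's behavior is unpredictable. Consider replacing (++i)+(++i)+(++i) with (++a)+(++b)+(++c), where a, b and c are distinct variables: whichever of ++a, ++b and ++c is evaluated first, the result is the same.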
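Both ways out can be sketched in a few lines. The function names are mine. The first function uses three distinct variables, so the expression is well defined as written; the second keeps a single variable but puts each increment in its own statement, which fixes a left-to-right reading (an assumption about intent, since the original expression has no defined meaning).

```c
/* Distinct variables: the evaluation order of the three operands
   cannot affect the sum. */
int preinc_sum_distinct(int a, int b, int c)
{
    return (++a) + (++b) + (++c);
}

/* One variable, one increment per statement: every ++i is separated
   from the next by a sequence point. */
int preinc_sum_sequenced(int i)
{
    int sum = ++i;
    sum += ++i;
    sum += ++i;
    return sum;
}
```

With every input equal to 3, the first function returns 4 + 4 + 4 = 12 and the second returns 4 + 5 + 6 = 15.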

3.  What MISRA-C:2004 tells users:

Rule 12.2 (required): The value of an expression shall be the same under any order of evaluation that the standard permits. [Unspecified 7–9; Undefined 18]
Apart from a few operators (notably the function call operator (), &&, ||, ?: and , (comma)) the order in which sub-expressions are evaluated is unspecified and can vary. This means that no reliance can be placed on the order of evaluation of sub-expressions, and in particular no reliance can be placed on the order in which side effects occur. Those points in the evaluation of an expression at which all previous side effects can be guaranteed to have taken place are called “sequence points”. Sequence points and side effects are described in sections 5.1.2.3, 6.3 and 6.6 of ISO 9899:1990 [2].
Note that the order of evaluation problem is not solved by the use of parentheses, as this is not a precedence issue.
The following notes give some guidance on how dependence on order of evaluation may occur, and therefore may assist in adopting the rule.
increment or decrement operators
As an example of what can go wrong, consider
x = b[i] + i++;
This will give different results depending on whether b[i] is evaluated before i++ or vice versa. The problem could be avoided by putting the increment operation in a separate statement. The example would then become:
x = b[i] + i;
i++;
function arguments
The order of evaluation of function arguments is unspecified.
x = func( i++, i );
This will give different results depending on which of the function’s two parameters is evaluated first.
function pointers
If a function is called via a function pointer there shall be no dependence on the order in which function designator and function arguments are evaluated.
p->task_start_fn(p++);
function calls
Functions may have additional effects when they are called (e.g. modifying some global data). Dependence on order of evaluation could be avoided by invoking the function prior to the expression that uses it, making use of a temporary variable for the value.
For example
x = f(a) + g(a);
could be written as
x = f(a);
x += g(a);
As an example of what can go wrong, consider an expression to get two values off a stack, subtract the second from the first, and push the result back on the stack:
push( pop() - pop() );
This will give different results depending on which of the pop() function calls is evaluated first (because pop() has side effects).
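The stack example can be made order-independent in the same way as x = f(a) + g(a). The stack below is a minimal hypothetical one (fixed capacity, no real error handling), written only so the rewrite can be run; it is not taken from the MISRA text.

```c
#include <stddef.h>

#define STACK_CAPACITY 16   /* hypothetical fixed capacity */

int stack_data[STACK_CAPACITY];
size_t stack_size = 0;

/* Pushes value; silently ignores the push when the stack is full. */
void push(int value)
{
    if (stack_size < STACK_CAPACITY)
        stack_data[stack_size++] = value;
}

/* Pops the top value; returns 0 when the stack is empty. */
int pop(void)
{
    return stack_size > 0 ? stack_data[--stack_size] : 0;
}

/* Pops two values, subtracts the second one popped from the first one
   popped, and pushes the result. The two pops are separate statements,
   so their order is fixed. */
void subtract_top_two(void)
{
    int first = pop();
    int second = pop();
    push(first - second);
}
```

After push(3) and push(10), subtract_top_two() leaves a single 7 on the stack, regardless of the compiler.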
nested assignment statements
Assignments nested within expressions cause additional side effects. The best way to avoid any chance of this leading to a dependence on order of evaluation is to not embed assignments within expressions.
For example, the following is not recommended:
x = y = y = z / 3 ;
x = y = y++;
accessing a volatile
The volatile type qualifier is provided in C to denote objects whose value can change independently of the execution of the program (for example an input register). If an object of volatile qualified type is accessed this may change its value. C compilers will not optimise out reads of a volatile. In addition, as far as a C program is concerned, a read of a volatile has a side effect (changing the value of the volatile). It will usually be necessary to access volatile data as part of an expression, which then means there may be dependence on order of evaluation. Where possible though it is recommended that volatiles only be accessed in simple assignment statements, such as the following:
volatile uint16_t v;
x = v;
The rule addresses the order of evaluation problem with side effects. Note that there may also be an issue with the number of times a sub-expression is evaluated, which is not covered by this rule. This can be a problem with function invocations where the function is implemented as a macro. For example, consider the following function-like macro and its invocation:
#define MAX(a, b) ( ((a) > (b)) ? (a) : (b) )
z = MAX( i++, j );
The definition evaluates the first parameter twice if a > b but only once if a ≤ b. The macro invocation may thus increment i either once or twice, depending on the values of i and j. It should be noted that magnitude-dependent effects, such as those due to floating-point rounding, are also not addressed by this rule. Although the order in which side-effects occur is undefined, the result of an operation is otherwise well-defined and is controlled by the structure of the expression. In the following example, f1 and f2 are floating-point variables; F3, F4 and F5 denote expressions with floating-point types.
f1 = F3 + (F4 + F5);
f2 = (F3 + F4) + F5;
The addition operations are, or at least appear to be, performed in the order determined by the position of the parentheses, i.e. first F4 is added to F5 then secondly F3 is added to give the value of f1. Provided that F3, F4 and F5 contain no side-effects, their values are independent of the order in which they are evaluated. However, the values assigned to f1 and f2 are not guaranteed to be the same because floating-point rounding following the addition operations will depend on the values being added.
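The rounding effect described in the last paragraph is easy to reproduce. The sketch below uses my own function names and assumes IEEE 754 double arithmetic without value-changing optimizations such as -ffast-math.

```c
/* f1 = F3 + (F4 + F5) */
double sum_right_grouped(double f3, double f4, double f5)
{
    return f3 + (f4 + f5);
}

/* f2 = (F3 + F4) + F5 */
double sum_left_grouped(double f3, double f4, double f5)
{
    return (f3 + f4) + f5;
}
```

With the arguments 1e20, -1e20 and 1.0, the right-grouped sum first computes -1e20 + 1.0, which rounds back to -1e20, and so yields 0.0, while the left-grouped sum cancels the two large values first and yields 1.0. No side effects are involved; the parentheses alone change the result.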
3. gcc itself makes an effort to warn about expressions that violate the sequence point rules: -Wsequence-point, which is also enabled by -Wall, produces this warning.
-Wsequence-point
Warn about code that may have undefined semantics because of violations of sequence point rules in the C standard. The C standard defines the order in which expressions in a C program are evaluated in terms of sequence points, which represent a partial ordering between the execution of parts of the program: those executed before the sequence point, and those executed after it. These occur after the evaluation of a full expression (one which is not part of a larger expression), after the evaluation of the first operand of a &&, ||, ? : or , (comma) operator, before a function is called (but after the evaluation of its arguments and the expression denoting the called function), and in certain other places. Other than as expressed by the sequence point rules, the order of evaluation of subexpressions of an expression is not specified. All these rules describe only a partial order rather than a total order, since, for example, if two functions are called within one expression with no sequence point between them, the order in which the functions are called is not specified. However, the standards committee have ruled that function calls do not overlap. It is not specified when between sequence points modifications to the values of objects take effect. Programs whose behavior depends on this have undefined behavior; the C standard specifies that “Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.”. If a program breaks these rules, the results on any particular implementation are entirely unpredictable.
Examples of code with undefined behavior are a = a++;, a[n] = b[n++] and a[i++] = i;. Some more complicated cases are not diagnosed by this option, and it may give an occasional false positive result, but in general it has been found fairly effective at detecting this sort of problem in programs. The present implementation of this option only works for C programs. A future implementation may also work for C++ programs. The C standard is worded confusingly, therefore there is some debate over the precise meaning of the sequence point rules in subtle cases. Links to discussions of the problem, including proposed formal definitions, may be found on the GCC readings page, at http://gcc.gnu.org/readings.html
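For two of the undefined examples quoted above, a[n] = b[n++] and a[i++] = i, a well-defined version has to commit to one reading. The sketch below picks "use the old index, then advance"; that choice and the function names are my assumptions, not something the gcc documentation states.

```c
#include <stddef.h>

/* One possible intent of a[n] = b[n++]:
   copy element n, then advance n. */
void copy_then_advance(int *a, const int *b, size_t *n)
{
    a[*n] = b[*n];
    (*n)++;
}

/* One possible intent of a[i++] = i:
   store the old index in its own slot, then advance. */
void store_index_then_advance(int *a, int *i)
{
    a[*i] = *i;
    (*i)++;
}
```

Moving the increment into its own statement is the whole fix: each object is then modified at most once per full expression, and no value is read in the expression that also modifies it.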
4. This is how gcc implements the check:
   Walk the tree X, and record accesses to variables.  If X is written by the parent tree, WRITER is the parent. We store accesses in one of the two lists: PBEFORE_SP, and PNO_SP.  If this  expression or its only operand forces a sequence point, then everything up to the sequence point is stored in PBEFORE_SP.  Everything else gets stored in PNO_SP.
  Once we return, we will have emitted warnings if any subexpression before such a sequence point could be undefined.  On a higher level, however, the sequence point may not be relevant, and we'll merge the two lists.
Example: (b++, a) + b;
   The call that processes the COMPOUND_EXPR will store the increment of B in PBEFORE_SP, and the use of A in PNO_SP.  The higher-level call that processes the PLUS_EXPR will need to merge the two lists so that eventually, all accesses end up on the same list (and we'll warn about the unordered subexpressions b++ and b.
A note on merging.  If we modify the former example so that our expression becomes
     (b++, b) + a
care must be taken not simply to add all three expressions into the final PNO_SP list.  The function merge_tlist takes care of that by merging the before-SP list of the COMPOUND_EXPR into its after-SP list in a special way, so that no more than one access to B is recorded.
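5. However, gcc's implementation of this warning has four problems:
(1) It gives no warning for structure members (s->a++ = s->a + 5;). The reason is that it does not treat s->a as a single element but analyzes it in pieces, so it cannot recognize that s->a is a read and s->a++ is a write.
(2) It analyzes a[i] in pieces, so it can check "a[i] + i++" but can do nothing about "a[i]++ + a[i]".
(3) verify_sequence_points is not run for return statements.
(4) It cannot handle aliases (for example p = q; *p++ = q++;), because the front end only performs a simple analysis of the syntax tree and cannot go that far.
I have dealt only with (1) and (3). (2) is inherently self-contradictory unless the check is done twice, so I kept the original behavior. As for (4), a front-end check can do nothing about it.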

However, the -Wsequence-point option only takes effect for C; I have not yet figured out why g++ does not perform this check.


Reference: http://www.cnblogs.com/jiqingwu/p/c_sequence_point.html

Reference: http://blog.sina.com.cn/s/blog_6591eb240100q01k.html

Reference: http://tieba.baidu.com/p/673634549

Reference: http://www.2cto.com/kf/201210/161225.html


Reposted from blog.csdn.net/tsroad/article/details/49834261