C++ Explained: Object initialization and assignment, lvalues and rvalues, copy and move semantics and the copy-and-swap idiom
February 27, 2013
In this article I’m going to try and clear up some fundamental C++ topics which are a common source of confusion for beginning and intermediate C++ programmers, and programmers coming to C++ from other languages such as C# or Java:
- the difference between initialization and assignment, which uses of the “=” operator trigger initialization and which trigger assignment, and how to implement them correctly in your own classes
- the meaning of lvalues, rvalues and rvalue references and how to spot which is which in your programs
- an introduction to C++11 move semantics
- when copy and move constructors and assignment operators are called in your classes, and how to implement them
- reducing the amount of duplicated and error-prone code in constructor and assignment operator implementations by using the so-called copy-and-swap idiom
I won’t be dealing with other initialization or assignment topics here (e.g. those of built-in or POD (plain-old-data) types, and type conversions), only the initialization and assignment of class objects. It is assumed you understand the following concepts (but not necessarily how to implement them):
- constructors and copy constructors
- assignment operator overloading (operator=)
In the examples below, I will show how to manipulate a class containing a single resource in the form of a std::unique_ptr, a single-ownership smart pointer which is part of the C++11 standard library. You can replace this with any resource type which shouldn’t be copied in a bit-wise (shallow copy) manner, e.g. any raw pointer or resource handle whose target should only be freed once no matter how many pointers to it are held across object instances, or whose target should be copied if the object instance is copied.
Defining the class
Our class will receive a pointer to a string in its constructor, allocate memory to make a copy of this string, copy it and store a pointer to the copy in a std::unique_ptr. This will be our resource.
The final goal will be to copy the string resource into a newly allocated block of memory when an instance of the class is copied (so that when the original object is destructed, the string in the copied object is preserved), and to allow ownership of the resource to be transferred to a different object when the original object is “moved” (I will explain C++11 move semantics below).
A good reason to use std::unique_ptr instead of a raw pointer is that only one object can own (control memory management of) the resource at a time. This prevents accidental copying of the pointer, which could otherwise lead to the pointer being freed (delete’d) more than once when multiple objects pointing to the same resource are destructed. The copy constructor and copy assignment operator of std::unique_ptr are deleted, which prevents code that might copy the pointer (the std::unique_ptr object) from compiling. This leaves you with only the two desired alternatives: copy the resource when the object is copied (make a new copy of the resource in a different memory block, and therefore a different pointer), or transfer ownership of the pointer to the new object.
The initial definition of the class is:
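A minimal sketch of a class matching the description above. The class name Resource and the accessor get() are assumptions made for illustration; only the member name resource is taken from the text.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

// Hypothetical class name; the member name `resource` follows the text.
class Resource {
public:
    // "Standard" constructor: allocate a buffer and copy the string into it.
    explicit Resource(const char* text) {
        assert(text != nullptr);  // precondition: a valid C string
        resource.reset(new char[std::strlen(text) + 1]);
        std::strcpy(resource.get(), text);
    }

    // Read-only view of the owned string (accessor added for illustration).
    const char* get() const { return resource.get(); }

private:
    std::unique_ptr<char[]> resource;  // sole owner of the copied string
};
```

No destructor is needed in this sketch: the std::unique_ptr member frees the buffer when the object is destroyed.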
(std::unique_ptr is defined in the <memory> header of the C++ standard library)
For now, we don’t define a copy constructor or assignment operator overload, so the compiler supplies defaults for us. The defaults attempt to copy the object member by member, i.e. they apply something like memberOfThisObject = memberOfCopiedObject for each member. Because std::unique_ptr cannot be copied, the defaulted copy operations of our class are unusable (they are defined as deleted), so any code that tries to copy an instance will not compile until we supply our own.
With only the constructor we have supplied so far, we can write code like this:
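The three objects discussed below could be created along these lines. This is a sketch only: the class name Resource, the accessor get(), the string values and the body of LoadResource are assumptions, while the variable names and the LoadResource name come from the text.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

class Resource {
public:
    explicit Resource(const char* text) {
        assert(text != nullptr);
        resource.reset(new char[std::strlen(text) + 1]);
        std::strcpy(resource.get(), text);
    }
    const char* get() const { return resource.get(); }

private:
    std::unique_ptr<char[]> resource;
};

// Factory function named in the text; this body is an assumption.
Resource LoadResource(const char* text) {
    return Resource(text);
}

// Builds the three objects and reports whether each owns the expected string.
bool createThree() {
    Resource resource1("first");                 // ordinary construction
    Resource resource2 = Resource("second");     // temporary is elided (or moved)
    Resource resource3 = LoadResource("third");  // return value is elided (or moved)
    return std::strcmp(resource1.get(), "first") == 0 &&
           std::strcmp(resource2.get(), "second") == 0 &&
           std::strcmp(resource3.get(), "third") == 0;
}
```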
The compiler processes the creation of these three objects as follows:
resource1 is the simplest case where we supply the resource argument to the constructor and the object is constructed in the normal, expected way
resource2 is created in the same way as resource1. Although it may appear from the syntax that a temporary object is created and then copied into resource2, this is not the case. Modern optimizing compilers are smart enough to elide (remove or optimize away) the temporary copy and instead initialize resource2 directly. As a result, the copy constructor or copy assignment operator is not called, only the standard constructor as with resource1.
resource3 uses a factory method to create the object. Again, it appears that returning a new object instance from LoadResource creates a temporary object, but once again the compiler elides the copy and initializes resource3 directly as the object created in LoadResource. Only the standard constructor is used.
The difference between initialization and assignment
- When you declare an object (with its type) and provide an expression to give the object an initial value, this is initialization.
- When you change the value of an existing object (and in fact, in all other cases besides the above), this is assignment.
Initialization:
For the example class above:
Assignment:
When you initialize an object, the standard constructor, copy constructor, or in C++11 move constructor is called.
When you assign to an object, the copy assignment operator, or in C++11 move assignment operator is called. If a temporary object is created on the right-hand side of the expression, the standard constructor for that object is called to initialize the temporary object, then the copy assignment operator (or move assignment operator) is called on the object being assigned to, with the temporary object as the argument.
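One way to see the difference for yourself is to record which special member function runs. The Tracer type below is hypothetical and exists only for this illustration; it is not the resource class from the article.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tracer type: records which special member function ran.
struct Tracer {
    static std::vector<std::string>& log() {
        static std::vector<std::string> entries;
        return entries;
    }
    Tracer()                         { log().push_back("default ctor"); }
    Tracer(const Tracer&)            { log().push_back("copy ctor"); }
    Tracer(Tracer&&)                 { log().push_back("move ctor"); }
    Tracer& operator=(const Tracer&) { log().push_back("copy assign"); return *this; }
    Tracer& operator=(Tracer&&)      { log().push_back("move assign"); return *this; }
};

std::vector<std::string> traceInitAndAssign() {
    Tracer::log().clear();
    Tracer a;        // initialization: default constructor
    Tracer b = a;    // initialization despite the "=": copy constructor
    (void)b;         // silence unused-variable warnings
    Tracer c;        // initialization: default constructor
    c = a;           // assignment: copy assignment operator
    c = Tracer();    // constructor for the temporary, then move assignment
    assert(Tracer::log().size() == 6);
    return Tracer::log();
}
```

The recorded sequence is: default ctor, copy ctor, default ctor, copy assign, default ctor, move assign.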
Copy initialization
If you initialize an object with another object, this is called copy initialization, and is the only case in which your object’s copy constructor is called:
Note that the copy constructor is not called here:
because this is assignment, not initialization.
Implementing copy initialization and (copy) assignment
Copy initialization is handled by a copy constructor, and assignment is handled by overloading the assignment operator (=). A naive implementation for our class might look like this:
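A sketch of what such a naive implementation could look like, covering both the copy constructor and the copy assignment operator. The class name Resource and the accessor get() are assumptions; the parameter name r, the member name resource and the self-assignment check follow the text.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

class Resource {
public:
    explicit Resource(const char* text) {
        assert(text != nullptr);
        resource.reset(new char[std::strlen(text) + 1]);
        std::strcpy(resource.get(), text);
    }

    // Copy constructor: allocate a fresh buffer and copy r's string into it.
    Resource(const Resource& r)
        : resource(new char[std::strlen(r.resource.get()) + 1]) {
        std::strcpy(resource.get(), r.resource.get());
    }

    // Copy assignment: replace our buffer with a copy of r's string.
    Resource& operator=(const Resource& r) {
        if (&r != this) {  // skip self-assignment
            // Move-assigning a new unique_ptr releases the old buffer.
            resource = std::unique_ptr<char[]>(
                new char[std::strlen(r.resource.get()) + 1]);
            std::strcpy(resource.get(), r.resource.get());
        }
        return *this;
    }

    const char* get() const { return resource.get(); }

private:
    std::unique_ptr<char[]> resource;
};
```

This sketch assumes the source object always owns a string, which holds here because the class has no default constructor and no move operations yet.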
You will notice this code is very similar to the standard constructor we defined earlier, except that we get the existing resource pointer from the object being copied (r) and copy the string from it, rather than copying the string supplied in the argument as we did in the standard constructor.
Assignment is a little more tricky than initialization, because what we are essentially doing is destructing the existing object and then re-constructing it with new values. In a more complex class, you might need to free various resources and then re-allocate them using copies of the resources from the object being copied. std::unique_ptr makes our life easy here, because assigning a new instance of std::unique_ptr to an existing std::unique_ptr object with = as above (or by using reset()) first frees the old pointer, so releasing the resource is handled for us automatically (this is the same reason our class doesn’t need a destructor: when std::unique_ptr goes out of scope as our object goes out of scope, std::unique_ptr’s destructor is called and the resource is freed automatically).
The condition if (&r != this) checks for the usually rare case of self-assignment, in which case we want to do nothing. Indeed, if we try to destroy and re-create resources with the same source and target object, we’re almost certainly going to cause problems, so we want to avoid this. This does impose a very small and usually pointless performance penalty, and below we’ll discuss a way to avoid this check altogether with the copy-and-swap idiom.
Let’s look at what happens now:
This invokes the copy assignment operator to copy resource1 into resource2. Since a new copy of the resource pointed to by the std::unique_ptr is made, the resource pointer in resource1 remains valid.
This invokes the copy constructor (because we are using initialization, not assignment) and creates a new copy of the resource in the same fashion.
lvalues & rvalues
Although the concept of lvalues and rvalues has always existed in C++, it is really with the advent of C++11 that it has become particularly important to understand it. The term lvalue is historical: it originally referred to a value that can appear on the left-hand side of an assignment, and is often read as “locator value”; rvalue essentially means “everything that isn’t an lvalue”. Simply put, an lvalue is a concrete variable or object which has non-temporary memory allocated to it, and an rvalue is a temporary expression or object. For example:
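A few illustrative cases, written as a small self-checking function. The names square and lvalueRvalueDemo are made up for this sketch.

```cpp
#include <cassert>
#include <string>

int square(int n) { return n * n; }

int lvalueRvalueDemo() {
    int x = 5;                  // x is an lvalue; the literal 5 is an rvalue
    int y = x + 1;              // y is an lvalue; x + 1 is an rvalue
    int* p = &x;                // fine: an lvalue has an address
    // int* q = &(x + 1);       // error: cannot take the address of an rvalue
    std::string s = std::string("temp");  // std::string("temp") is an rvalue
    *p = square(y);             // *p is an lvalue; the returned int is an rvalue
    assert(s == "temp");
    return x;                   // x was modified through p: 6 * 6 = 36
}
```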
When we talk about references in C++, we are really talking about lvalue references:
Notice that lvalue references must always refer to something (unlike raw pointers, which don’t have to point to anything specific). A non-const lvalue reference must refer to an lvalue; it cannot be bound to an rvalue (only a const lvalue reference can):
This is why you must always initialize a reference when it is declared (or in the initializer list of a class constructor): it must always point to an lvalue. If the reference wasn’t initialized, it would not point to anything.
Once a reference has been initialized, you cannot change which lvalue it refers to. If you use the assignment operator with the reference, you change the value of the lvalue it refers to:
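The reference rules in this section can be sketched as follows; the function name referenceDemo and the variable names are illustrative only.

```cpp
#include <cassert>

int referenceDemo() {
    int a = 1;
    int b = 2;
    int& ref = a;         // a reference must be initialized; ref now names a
    // int& bad = 10;     // error: a non-const lvalue reference cannot bind to an rvalue
    const int& c = 10;    // allowed: a const reference can bind to a temporary
    ref = b;              // does not rebind ref; copies b's value into a
    assert(&ref == &a);   // ref still refers to a
    assert(b == 2);       // b is untouched
    ref += c;             // a becomes 2 + 10
    return a;
}
```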
This is why references can be used on the left-hand side of an assignment statement.
C++11 move semantics: rvalue references
Copy assignment in earlier versions of C++ is often a complete waste of memory and processor time. Consider a class which allocates a lot of memory and performs a lot of work in the constructor:
Wow, this is horrible. When o is assigned to, a temporary object (an rvalue) is created, allocating 1 million ints, then the whole object is copied to o. In this example, it is easily avoidable by simply changing the code to:
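But if o is a class member, things are less convenient: before C++11, non-static data members could not be given an initial value where they are declared, so unless o is initialized in the constructor’s initializer list, it is default-constructed first and then assigned to, and the copy becomes unavoidable.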
Since we don’t need the temporary object except for the purposes of initializing o, what we would really like to do is just copy the int pointer in foo, and prevent the temporary object from freeing the allocated memory when it is destructed. C++11 makes this possible by introducing the concepts of rvalue references, the move constructor and the move assignment operator.
When you initialize or assign to an object using a temporary object (an rvalue), C++11 looks to see if you have defined a move constructor or move assignment operator in your class. If you have, the temporary object is passed to it as a modifiable (non-const) rvalue reference, allowing you to transfer ownership of resource pointers and handles, and nullify them in the temporary object. Note that the destructor in SomeClass only frees the int pointer if it is non-null; this is crucial to making this paradigm work correctly.
We can implement the move constructor and move assignment operator as follows:
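A minimal sketch of such a class, assuming a raw int pointer member named foo as in the text. The constant kCount, the accessor data() and the function moveDemo are illustrative additions, and the copy operations are left out for brevity (declaring the move operations makes them implicitly deleted).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

const std::size_t kCount = 1000000;  // one million ints, as in the text

class SomeClass {
public:
    SomeClass() : foo(new int[kCount]()) {}  // expensive: allocates and zeroes
    ~SomeClass() { if (foo) delete[] foo; }  // only free a non-null pointer

    // Move constructor: take over the pointer and null it in the source.
    SomeClass(SomeClass&& other) noexcept : foo(other.foo) {
        other.foo = nullptr;
    }

    // Move assignment: free our own array, then take over the source's.
    SomeClass& operator=(SomeClass&& other) noexcept {
        if (this != &other) {
            delete[] foo;
            foo = other.foo;
            other.foo = nullptr;
        }
        return *this;
    }

    const int* data() const { return foo; }

private:
    int* foo;
};

bool moveDemo() {
    SomeClass o;
    o = SomeClass();            // temporary (rvalue): move assignment, no array copy
    assert(o.data() != nullptr);
    SomeClass p(std::move(o));  // move construction from an existing object
    return o.data() == nullptr && p.data() != nullptr;
}
```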
Note the use of the special new syntax && to indicate that the variable is an rvalue reference. This is much better! When the temporary object is assigned, we now simply copy the pointer instead of the whole array.
Implementing move initialization and move assignment
Let’s see how this applies to our resource class:
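A sketch of the move constructor and move assignment operator for the resource class, followed by the two assignments from temporaries discussed further down. The class name Resource, the accessor get(), the body of LoadResource and the string values are assumptions; the copy operations are omitted here for brevity.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <utility>

class Resource {
public:
    explicit Resource(const char* text) {
        assert(text != nullptr);
        resource.reset(new char[std::strlen(text) + 1]);
        std::strcpy(resource.get(), text);
    }

    // Move constructor: take ownership of r's buffer; r is left owning nothing.
    Resource(Resource&& r) noexcept : resource(std::move(r.resource)) {}

    // Move assignment: free our buffer, then take ownership of r's.
    Resource& operator=(Resource&& r) noexcept {
        resource = std::move(r.resource);
        return *this;
    }

    const char* get() const { return resource.get(); }

private:
    std::unique_ptr<char[]> resource;
};

// Factory function named in the text; this body is an assumption.
Resource LoadResource(const char* text) { return Resource(text); }

bool moveAssignDemo() {
    Resource resource("old");
    resource = Resource("new");         // rvalue: move assignment
    bool first = std::strcmp(resource.get(), "new") == 0;
    resource = LoadResource("loaded");  // returned temporary: move assignment
    return first && std::strcmp(resource.get(), "loaded") == 0;
}
```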
The standard library introduces a new function std::move which takes an lvalue and turns it into an rvalue reference. This is very handy for handling objects such as std::unique_ptr: if we had just written resource = r.resource, an attempt to make a copy of the std::unique_ptr would be made (and fail because the copy assignment operator in std::unique_ptr is deleted). However, std::unique_ptr defines a public move assignment operator, which copies the raw pointer into the target std::unique_ptr and nullifies it in the std::unique_ptr being moved from, essentially transferring ownership of the pointer, which is exactly what we want.
Notice that the process is completely transparent and backwards-compatible: if the object can be copied, and no move assignment operator is defined, then the object will be copied on assignment. If a move assignment operator is defined, assignments from rvalues will automatically use that instead. Therefore, you don’t need to modify your main application code besides adding a move constructor and move assignment operator to your existing classes.
Let’s look at what happens now:
The expression on the right-hand side is an rvalue. The compiler sees that we have defined a move assignment operator, so it converts the temporary object into an rvalue reference and calls our move assignment operator, which transfers ownership of the std::unique_ptr to the member in resource.
Similarly, with the factory method:
The compiler knows that the return value from LoadResource is an rvalue, and uses the same logic as above to call resource’s move assignment operator.
Moving or transferring ownership of an existing object
Existing objects are lvalues, so initialization or assignment of new objects from them will call the copy constructor or copy assignment operator by default. We can use std::move to convert the existing objects to rvalue references, forcing invocation of the move constructor or move assignment operator instead:
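A sketch of both cases, assuming the same hypothetical Resource class with move operations only. The names resource3 and resource4 and the string values are made up for this example.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <utility>

class Resource {
public:
    explicit Resource(const char* text) {
        assert(text != nullptr);
        resource.reset(new char[std::strlen(text) + 1]);
        std::strcpy(resource.get(), text);
    }
    Resource(Resource&& r) noexcept : resource(std::move(r.resource)) {}
    Resource& operator=(Resource&& r) noexcept {
        resource = std::move(r.resource);
        return *this;
    }
    const char* get() const { return resource.get(); }

private:
    std::unique_ptr<char[]> resource;
};

bool transferDemo() {
    Resource resource1("one");
    Resource resource2("two");

    Resource resource3(std::move(resource1));  // forces the move constructor
    Resource resource4("four");
    resource4 = std::move(resource2);          // forces move assignment

    // The moved-from objects now own nothing; the targets own the strings.
    return resource1.get() == nullptr && resource2.get() == nullptr &&
           std::strcmp(resource3.get(), "one") == 0 &&
           std::strcmp(resource4.get(), "two") == 0;
}
```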
After this code executes, the resource pointers in resource1 and resource2 will be null, preventing std::unique_ptr from freeing the resources when the moved-from objects are destroyed.
Unifying assignment operator and the copy-and-swap idiom
Let’s face it: adding 3 constructors and 2 assignment operator overloads to our class, which almost entirely duplicate the same code, is pretty horrible. The copy-and-swap idiom kills not two, not three, but four birds with one stone: it allows us to define only a copy constructor and move constructor, then implement the assignment operators in terms of these; it unifies the two assignment operators into a single overload; it provides exception safety in the case of memory allocation failure (for example); and it removes the need for a self-assignment check.
There are various implementations and I shall demonstrate the most modern C++11 version here. The principle is that the implementation of the assignment operator will receive the object being assigned from by value instead of by reference, which causes the creation of a temporary object which is local in scope to the assignment operator function. We then swap the contents of the object being assigned to with this local temporary object. When the assignment operator function returns, the temporary object goes out of scope and is destructed, but since we have swapped everything with the contents of the existing object, it is the values that were in the existing object which are destructed. Since we are over-writing all of those values with new values from the temporary object, this is exactly what we want!
We delete our existing two assignment operator overloads and replace them with:
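A sketch of the unified assignment operator and swap() in context. The class name Resource, the private helper clone() and the accessor get() are assumptions; a default constructor is included because the walk-throughs below rely on one.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <utility>

class Resource {
public:
    Resource() = default;  // owns nothing
    explicit Resource(const char* text) : resource(clone(text)) {}
    Resource(const Resource& r) : resource(clone(r.resource.get())) {}
    Resource(Resource&& r) noexcept : resource(std::move(r.resource)) {}

    // Unified assignment operator: r is received by value, so it is already
    // a copy of (or has been moved from) the source object.
    Resource& operator=(Resource r) {
        swap(r);
        return *this;  // r, now holding our old contents, is destroyed here
    }

    // Swap every member with the other object's.
    void swap(Resource& other) noexcept {
        std::swap(resource, other.resource);
    }

    const char* get() const { return resource.get(); }

private:
    // Hypothetical helper: returns a new copy of text, or null if text is null.
    static std::unique_ptr<char[]> clone(const char* text) {
        if (!text) return nullptr;
        std::unique_ptr<char[]> copy(new char[std::strlen(text) + 1]);
        std::strcpy(copy.get(), text);
        return copy;
    }

    std::unique_ptr<char[]> resource;
};

void unifiedAssignDemo() {
    Resource a("alpha");
    Resource b;
    b = a;                  // lvalue source: copied, then swapped
    assert(std::strcmp(b.get(), "alpha") == 0 && b.get() != a.get());
    b = Resource("beta");   // temporary source: no copy of the string is made
    assert(std::strcmp(b.get(), "beta") == 0);
    b = std::move(a);       // moved source: a is left owning nothing
    assert(a.get() == nullptr && std::strcmp(b.get(), "alpha") == 0);
}
```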
In swap(), you will essentially implement calls to std::swap on all the members in your class.
The true genius of this method is what happens when the passed-by-value object to be assigned from (the source object) is created as a temporary object in the assignment operator function. There are three possibilities:
- The source object was an lvalue: the source object’s copy constructor will be called to make a copy and the overload will behave as a copy assignment operator.
- The source object was an rvalue: nothing is called; the compiler elides the temporary copy and passes the object by value; the overload will behave as a move assignment operator.
- The source object was an rvalue reference (for example, the result of std::move): the source object’s move constructor will be called and the overload will behave as a move assignment operator.
Additionally, as mentioned earlier, because the object is passed to the unifying assignment operator by value, we no longer need to check for self-assignment.
Lvalue example:
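A sketch of this case, assuming a hypothetical Resource class that uses copy-and-swap. The counters copyCount and moveCount and the helper clone() are instrumentation added for this example so that the constructor calls can be checked.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <utility>

int copyCount = 0;  // instrumentation only: bumped by the copy constructor
int moveCount = 0;  // instrumentation only: bumped by the move constructor

// Returns a newly allocated copy of text, or null if text is null.
std::unique_ptr<char[]> clone(const char* text) {
    if (!text) return nullptr;
    std::unique_ptr<char[]> copy(new char[std::strlen(text) + 1]);
    std::strcpy(copy.get(), text);
    return copy;
}

class Resource {
public:
    Resource() = default;
    explicit Resource(const char* text) : resource(clone(text)) {}
    Resource(const Resource& r) : resource(clone(r.resource.get())) { ++copyCount; }
    Resource(Resource&& r) noexcept : resource(std::move(r.resource)) { ++moveCount; }
    Resource& operator=(Resource r) { std::swap(resource, r.resource); return *this; }
    const char* get() const { return resource.get(); }

private:
    std::unique_ptr<char[]> resource;
};

bool lvalueExample() {
    Resource resource("ResourceToCopy");
    Resource copiedResource;    // default constructor: owns nothing
    copiedResource = resource;  // lvalue: r is copy-constructed, then swapped
    assert(copyCount == 1 && moveCount == 0);
    return std::strcmp(copiedResource.get(), "ResourceToCopy") == 0 &&
           std::strcmp(resource.get(), "ResourceToCopy") == 0 &&
           copiedResource.get() != resource.get();  // two separate buffers
}
```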
Intended behaviour: copy the resource in resource to copiedResource so that each points to its own copy of the resource. The resource in resource should remain valid afterwards.
Here is what happens:
- resource is initialized with the standard constructor (the std::unique_ptr points to "ResourceToCopy")
- copiedResource is initialized with the default constructor (the std::unique_ptr doesn’t point to anything)
- resource is copied into r using the copy constructor when the unifying assignment operator receives resource by value; a new copy of the string resource is made and is pointed to by a new instance of std::unique_ptr
- the contents of copiedResource and r are swapped (copiedResource now has ownership of the std::unique_ptr pointing to "ResourceToCopy", and r has ownership of nothing because copiedResource never pointed to a resource originally)
- r goes out of scope and is destructed; no resource is destroyed because it is a copy and not the original of resource, and the resource it pointed to was replaced by the null resource from copiedResource
End result: copiedResource controls ownership of a new std::unique_ptr pointing to a new copy of "ResourceToCopy" and resource remains unmodified. When resource or copiedResource is later destructed, the resource in the other object will not be affected since a copy has been made.
Rvalue example:
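A sketch of this case, again assuming a hypothetical Resource class that uses copy-and-swap, with instrumentation counters (copyCount, moveCount) and a clone() helper added for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <utility>

int copyCount = 0;  // instrumentation only: bumped by the copy constructor
int moveCount = 0;  // instrumentation only: bumped by the move constructor

// Returns a newly allocated copy of text, or null if text is null.
std::unique_ptr<char[]> clone(const char* text) {
    if (!text) return nullptr;
    std::unique_ptr<char[]> copy(new char[std::strlen(text) + 1]);
    std::strcpy(copy.get(), text);
    return copy;
}

class Resource {
public:
    Resource() = default;
    explicit Resource(const char* text) : resource(clone(text)) {}
    Resource(const Resource& r) : resource(clone(r.resource.get())) { ++copyCount; }
    Resource(Resource&& r) noexcept : resource(std::move(r.resource)) { ++moveCount; }
    Resource& operator=(Resource r) { std::swap(resource, r.resource); return *this; }
    const char* get() const { return resource.get(); }

private:
    std::unique_ptr<char[]> resource;
};

bool rvalueExample() {
    Resource resource("ResourceToReplace");
    // The temporary initializes the by-value parameter r directly; the string
    // is never copied. A compiler that does not elide the temporary would
    // call the move constructor once instead.
    resource = Resource("ResourceToMove");
    assert(copyCount == 0);
    return std::strcmp(resource.get(), "ResourceToMove") == 0;
}
```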
Intended behaviour: move (transfer ownership of the resource in) the temporary to resource, freeing the resource originally pointed to by resource.
Here is what happens:
- resource is initialized with the standard constructor (the std::unique_ptr points to "ResourceToReplace")
- a temporary is created with the standard constructor (the std::unique_ptr points to "ResourceToMove")
- the temporary is passed to the unifying assignment operator directly by value (no copy is made, the compiler elides it)
- the contents of resource and r are swapped (resource now has ownership of the std::unique_ptr pointing to "ResourceToMove", and r has ownership of the std::unique_ptr pointing to "ResourceToReplace")
- r goes out of scope and is destructed; the "ResourceToReplace" resource is destroyed
- if the compiler did not elide the temporary and moved it into r instead, the temporary in the calling function also goes out of scope and is destructed; nothing happens because its pointer is null after the move
End result: resource now controls ownership of the std::unique_ptr pointing to "ResourceToMove", and the "ResourceToReplace" pointer has been freed.
Note that while the behaviour is different to when the copy-and-swap idiom is not used (where the temporary will be moved if the object has a move assignment operator, but copied if it doesn’t have one but does have a copy assignment operator), the end result is the same.
Rvalue reference example:
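A sketch of this case, assuming the same hypothetical copy-and-swap Resource class with instrumentation counters (copyCount, moveCount) and a clone() helper added for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <utility>

int copyCount = 0;  // instrumentation only: bumped by the copy constructor
int moveCount = 0;  // instrumentation only: bumped by the move constructor

// Returns a newly allocated copy of text, or null if text is null.
std::unique_ptr<char[]> clone(const char* text) {
    if (!text) return nullptr;
    std::unique_ptr<char[]> copy(new char[std::strlen(text) + 1]);
    std::strcpy(copy.get(), text);
    return copy;
}

class Resource {
public:
    Resource() = default;
    explicit Resource(const char* text) : resource(clone(text)) {}
    Resource(const Resource& r) : resource(clone(r.resource.get())) { ++copyCount; }
    Resource(Resource&& r) noexcept : resource(std::move(r.resource)) { ++moveCount; }
    Resource& operator=(Resource r) { std::swap(resource, r.resource); return *this; }
    const char* get() const { return resource.get(); }

private:
    std::unique_ptr<char[]> resource;
};

bool rvalueReferenceExample() {
    Resource resource("ResourceToMove");
    Resource newResource("ResourceToReplace");
    newResource = std::move(resource);  // r is move-constructed, then swapped
    assert(copyCount == 0 && moveCount == 1);
    return resource.get() == nullptr &&  // ownership has left resource
           std::strcmp(newResource.get(), "ResourceToMove") == 0;
}
```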
Intended behaviour: move (transfer ownership of the resource in) resource to newResource, freeing the resource originally pointed to by newResource.
Here is what happens:
- resource is initialized with the standard constructor (the std::unique_ptr points to "ResourceToMove")
- newResource is initialized with the standard constructor (the std::unique_ptr points to "ResourceToReplace")
- resource is converted to an rvalue reference and a copy of the object (r) is made using the move constructor when the unifying assignment operator receives resource by value; ownership of the pointer owning "ResourceToMove" has been transferred from resource to r
- the contents of newResource and r are swapped (newResource now has ownership of the std::unique_ptr pointing to "ResourceToMove", and r has ownership of the std::unique_ptr pointing to "ResourceToReplace")
- r goes out of scope and is destructed; the "ResourceToReplace" resource is destroyed
End result: newResource now controls ownership of the std::unique_ptr pointing to "ResourceToMove", the "ResourceToReplace" pointer has been freed and resource no longer has ownership of any pointer/resource. When resource is later destructed, no resource will be freed as ownership has been transferred to newResource.
Bringing it all together
Below are two concrete example programs, one using the copy-and-swap idiom and one using the normal assignment operators. Run these programs to prove that the resource copies and moves work as expected, and step through them with your debugger to confirm which constructors and assignment operator overloads are called for each initialization and assignment. Note that the contents of main() in both examples are identical, but the comments in some places are different to highlight the different behaviour when the copy-and-swap idiom is used.
Without copy-and-swap idiom:
With copy-and-swap idiom:
Both programs produce the same output, of course:
I hope you found this article useful. I’ll finish with some references to other great related articles where you can find more intricate details. Please leave comments and feedback below!
References
Move semantics and rvalue references in C++11
The new C++ 11 rvalue reference && and why you should start using it
Understanding lvalues and rvalues in C and C++