5. Deep copy and shallow copy + return value optimization

Concept

In C++, deep copy and shallow copy are two different methods of object copying, and they show different behaviors when dealing with dynamically allocated memory.

Shallow Copy:
- A shallow copy copies the value of each data member from one object to another. If the object contains a pointer to dynamically allocated memory, only the pointer value is copied, so both objects point to the same memory.
- When one of the objects is destroyed and releases that memory, the pointer held by the other object becomes a dangling pointer. Accessing the memory through it, or releasing it a second time, is undefined behavior.
- Shallow copies are what the compiler-generated (default) copy constructor and assignment operator perform.

Deep Copy:
- A deep copy creates a new object and copies the data members of the original into it. For a pointer to dynamically allocated memory, it allocates new memory and copies the pointed-to contents rather than simply copying the pointer itself.
- The original object and the new object therefore each own an independent block of memory, and operations on one object do not affect the other.
- To implement a deep copy, you usually need to write your own copy constructor and assignment operator so that all resources are copied correctly.

This may still sound abstract, so let's work through it in code.

Shallow copy

 /*************************************************************************
         > File Name: test.cpp
         > Author: Xiao Yuheng
         > Mail: [email protected]
         > Created Time: Sat Oct 28 17:03:20 2023
  ************************************************************************/

 #include <iostream>

 using namespace std;

 #define BEGINS(x) namespace x {
 #define ENDS(x) }

 BEGINS(xyh)

 class A {
 public:
     int x, y;
     A(int x = 100, int y = 100) : x(x), y(y) {}

     ~A() {}
 };

 ENDS(xyh)

 int main() {
     xyh::A a;
     xyh::A b = a; // calls the compiler-generated copy constructor
     cout << a.x << " " << a.y << endl;
     cout << b.x << " " << b.y << endl;
     return 0;
 }

Let's first look at the result of this code: it prints 100 100 twice.

As we know from constructors, the line xyh::A b = a; calls the copy constructor. Since we have not written one, the compiler-generated default copy constructor is called, which copies the value of every member of a into b. The class has only int members, so this causes no problem. Now let's look at a second piece of code:

 #include <iostream>

 using namespace std;

 #define BEGINS(x) namespace x {
 #define ENDS(x) }

 BEGINS(xyh)

 class A {
 public:
     int n;
     int *data;
     A(int n = 100) : n(n), data(new int[n]) {}

     ~A() {
         delete[] data;
     }
 };

 ENDS(xyh)

 int main() {
     xyh::A a;
     xyh::A b = a; // shallow copy: b.data holds the same address as a.data
     return 0;
 }

Let's look directly at the result of running it:

The program fails with an error, and the message tells us that the same memory was freed twice (a double free). Why does this happen? This is the weakness of shallow copy. In the second example, xyh::A b = a; again calls the compiler-generated copy constructor, which copies every member of a into b, including the address stored in the pointer data. When the program ends, the destructor runs for both objects, and because a.data and b.data hold the same address, delete[] is applied to the same memory twice. This is undefined behavior, and here it shows up as a runtime error. To confirm this, we can change main to print both pointers:

 int main() {
     xyh::A a;
     xyh::A b = a;
     cout << a.data << endl;
     cout << b.data << endl;
     return 0;
 }

This prints the addresses stored in a.data and b.data. Let's look at the output:

The two addresses are identical, which explains the error. How do we fix it? Let's look at deep copy next.

Deep copy

We add a hand-written copy constructor to the previous code:

 #include <iostream>

 using namespace std;

 #define BEGINS(x) namespace x {
 #define ENDS(x) }

 BEGINS(xyh)

 class A {
 public:
     int n;
     int *data;
     A(int n = 100) : n(n), data(new int[n]) {}
     // Deep copy: allocate a separate array, then copy the elements.
     A(const A &a) : n(a.n), data(new int[n]) {
         for (int i = 0; i < n; i++) {
             data[i] = a.data[i];
         }
     }
     ~A() {
         delete[] data;
     }
 };

 ENDS(xyh)

 int main() {
     xyh::A a;
     xyh::A b = a;
     cout << a.data << endl;
     cout << b.data << endl;
     return 0;
 }

The copy constructor allocates a new array, makes b.data point to it, and then copies the elements over one by one, so the double free no longer occurs. Let's run it:

The address stored in a.data is now different from the address stored in b.data, so each destructor frees its own memory and the deep copy is complete for this example. But does the same approach work in every situation, for instance when the elements themselves own memory? Let's look at the next listing.

 /*************************************************************************
         > File Name: test1.cpp
         > Author: Xiao Yuheng
         > Mail: [email protected]
         > Created Time: Fri Oct 27 16:16:22 2023
  ************************************************************************/
 #include <iostream>

 using namespace std;

 template<typename T>
 class Vector {
 public:
     Vector(int n = 100) : n(n), data(new T[n]) {}
     Vector(const Vector &a) : n(a.n), data(new T[n]) {
         for (int i = 0; i < n; i++) {
             data[i] = a.data[i]; // element-wise assignment
         }
     }
     ~Vector() {
         delete[] data;
     }

     int n;
     T *data;

 };

 int main() {
     Vector<Vector<int>> arr1;
     Vector<Vector<int>> arr2(arr1);
     cout << arr1.data << endl;
     cout << arr2.data << endl;
     return 0;
 }

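Take a look at this code. Does the same copy constructor still work correctly? Let's run it and see:

An error is reported again. What is the reason? Here T is Vector<int>, so each data[i] is itself a Vector<int> object that owns a pointer. The statement data[i] = a.data[i]; calls the assignment operator of Vector<int>, and since we have not written one, the compiler-generated version simply copies the members one by one. That is a shallow copy of the inner pointer, so the inner arrays end up shared and are freed twice. How should we solve this problem?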

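Here we introduce a new concept: in-place copy construction, written with placement new. The expression new(address) T(value) does not allocate memory; it calls the constructor of T to build an object at the given address. Using it with the copy constructor means the copy is performed recursively: copying the outer Vector copy-constructs each element, which in turn deep-copies its own data. The specific code is as follows: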

 #include <iostream>
 #include <new> // placement new

 using namespace std;

 template<typename T>
 class Vector {
 public:
     Vector(int n = 100) : n(n), data(new T[n]) {}
     // Note: new T[n] default-constructs every element first.
     Vector(const Vector &a) : n(a.n), data(new T[n]) {
         for (int i = 0; i < n; i++) {
             new(data + i) T(a.data[i]); // copy-construct element i in place
         }
     }
     ~Vector() {
         delete[] data;
     }

     int n;
     T *data;

 };

 int main() {
     Vector<Vector<int>> arr1;
     Vector<Vector<int>> arr2(arr1);
     cout << arr1.data << endl;
     cout << arr2.data << endl;
     return 0;
 }

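This solves the problem: each element is now copy-constructed in place, so every inner Vector<int> gets its own deep copy and nothing is freed twice.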

Return value optimization

Return value optimization (RVO) is a C++ compiler optimization that avoids unnecessary copy operations when a function returns an object.

Without the optimization, returning an object by value works like this: the local object inside the function is copied into a temporary object that holds the return value, and the caller then copies that temporary into its own object. Each of these steps calls the copy constructor.

With RVO, the compiler constructs the returned object directly in the storage of the caller's object instead of constructing it inside the function and copying it out. This saves both memory and run time. Since C++17 the language itself requires this elision when the returned expression is an unnamed temporary; for a named local variable (often called NRVO) it remains an optional optimization, though common compilers usually apply it.

This may not be clear yet, so let's look at a piece of code:

 /*************************************************************************
         > File Name: test3.cpp
         > Author: Xiao Yuheng
         > Mail: [email protected]
         > Created Time: Sat Oct 28 20:16:17 2023
  ************************************************************************/

 #include <iostream>

 using namespace std;

 class A {
 public:
     int x, y;
     A() {
         cout << "Constructor" << endl;
     }
     A(const A &a) {
         cout << "copy construction" << endl;
     }

 };

 A fun() {
     A temp;
     return temp;
 }

 int main() {
     A b = fun();
     return 0;
 }

Can you guess what this code will output? Let's run it:

Only a single "Constructor" is printed. But if we follow the code step by step, we would expect the output to be:

Constructor: A temp; calls the constructor
copy construction: return temp; copies temp into a temporary object
copy construction: A b = fun(); copies the temporary object into b

The difference comes from return value optimization, which current compilers apply by default. Let's turn it off (with GCC or Clang, for example, by compiling with -fno-elide-constructors under a pre-C++17 standard such as -std=c++14) and run again:

Now the program prints all three lines listed above.

This is the process the code literally describes. So what does the program effectively do when return value optimization is turned on?

 #include <iostream>

 using namespace std;

 class A {
 public:
     int x, y;
     A() {
         cout << "Constructor" << endl;
     }
     A(const A &a) {
         cout << "copy construction" << endl;
     }

 };

 void fun(A &temp) {
     return ;
 }

 int main() {
     A b;
     fun(b);
     return 0;
 }

Take a look at this code. With return value optimization turned on, the earlier program behaves almost exactly like this one. Let's look at the result of running it:

The output is the same as with return value optimization: a single "Constructor". Let's analyze the code. We first define an object b, then pass a reference to b into fun(), and the body of fun works directly on b. Return value optimization does much the same thing: the object that the function returns is constructed directly in the storage of b, so no copy is needed.