[C++] Vector simulation implementation | Problems when using memcpy copy | Implementing deep copy

Table of Contents

Basic framework and interface

Constructor

No-argument construction

Iterator range construction

Initialization construction

destructor

size() | capacity()

Expanded reserve()

Problems using memcpy copy

resize() to change size

operator[]

Iterator implementation

Addition and deletion of vector

End insertion push_back()

Tail deletion pop_back()

Use insert and erase to discuss the issue of iterator failure

insert()

erase()

Implementation of deep copy

copy constructor

Assignment operator=

In the last article, we talked about vector, which is a class template that can accommodate various types of objects as its elements and can be dynamically resized. Can be understood as a dynamic array.

In this article, we will implement a simple version of vector ourselves, which can greatly deepen our understanding of vector!

Because the implementation of vector has many similarities with string, some details of the implementation process will not be detailed.

Basic framework and interface

vector.h:

#pragma once
namespace jzy //In order to distinguish it from the vector in the STL library, we put it into a custom namespace
{
    template<typename T>
    class vector
    {
    public:
        typedef T* iterator;
        
    private:
        iterator_start;
        iterator _finish; //finish represents the position after the last position
        iterator _end_of_storage;
    };
}

The three member variables here are implemented according to STL 3.0 version with reference to “STL Source Code Analysis”.

In this case, if you want to know _size or _capacity, just use member variables to subtract.

Constructor

No-parameter construction

vector()
    :_start(nullptr)
    ,_finish(nullptr)
    , _end_of_storage(nullptr)
{}

Iterator interval construction

It is constructed by passing the starting and starting range of the iterator (left closed and right open).

vector(InputIterator first, InputIterator last)
        {
            InputIterator it = first;
            int num = 0; //Count number
            while (it != last)
            {
                it++;
                num + + ;
            }
?
            _start = new T[num];
            for (int i = 0; i < num; i + + )
            {
                _start[i] = *first + + ;
            }
            _finish = _start + num;
            _end_of_storage = _start + num;
        }

Initialization construction

The object can be initialized during construction to contain n val values.

vector(int n, const T & amp; val = T()) //Note: size_t cannot be given here!
        {
            _start = new T[n];
            for (int i = 0; i < n; i + + )
            {
                _start[i] = val;
            }
            _finish = _start + n;
            _end_of_storage = _start + n;
        }

Why can’t the type of n be size_t?

If it is size_t, when the two parameters passed are of int type, the test result is:

void test7()
{
    vector<int> v1(5,1);
    for (auto & e : v1)
    {
        cout << e << " ";
    }
}

reason:

We know that when v1 matches the constructor, it matches based on the type of the parameter.

size_t and int do not match well, but InputIerator can match the int type, because the InputIerator itself is a template, and int can be matched without conversion.

So the constructor called by v1 is vector(InputIterator first, InputIterator last); ,

In this function, int is dereferenced, so an error is reported: illegal indirect addressing.

Destructor

~vector()
{
    delete[] _start;
    _start = _finish = _end_of_storage = nullptr;
}

size() | capacity()

The current three member variables cannot intuitively represent the capacity and size, so we need to implement it ourselves.

size_t size()
{
    return _finish - _start;
}
?
size_t capacity()
{
    return _end_of_storage - _start;
}

Expanded reserve()

The idea for expansion is:

First open a new space, then copy all the data to the new space, then release the old space and let the pointer point to the new space.

Uncorrected version of reserve:

void reserve(size_t n)
        {
            if (n > capacity())
            {
                size_t sz = size();
?
                T* tmp = new T[n];
                int a = size();
                if (_start)
                {
                    memcpy(tmp, _start, sz* sizeof(T));
                    delete[] _start;
                }
                _start = tmp;
                _finish = _start + sz;
                _end_of_storage = _start + n;
            }
        }

Let’s test it:

void test10()
{
    vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    v.push_back(4);
    v.push_back(5);
?
    for (auto & e : v)
    {
        cout << e << " ";
    }
    cout << endl;
}

Looks like it’s done. But is it really OK?

If we use a custom type, such as vector to test:

void test9()
{
    vector<string> v;
    v.push_back("happy");
    v.push_back("happy");
    v.push_back("happy");
    v.push_back("happy");
    v.push_back("happy");
?
    for (auto & e : v)
    {
        cout << e << " ";
    }
    cout << endl;
?
}

The program actually crashed!

In fact, this is all the fault of memcpy.

Problems using memcpy copy

?memcpy can only perform shallow copies, so if you are copying built-in types, you are happy to use memcpy.

If it is a custom type and involves resource management, memcpy cannot be used, otherwise it may cause memory leaks or even program crashes.

Now let’s explain why the vector use case crashes:

When calling push_back, if there is not enough space, push_back will call reserve internally to open up space. The problem lies in this reserve. Let’s take a look at how reserve is implemented:

void reserve(size_t n)
        {
            if (n > capacity())
            {
                size_t sz = size();
?
                T* tmp = new T[n];
                int a = size();
                if (_start)
                {
                    memcpy(tmp, _start, sz* sizeof(T)); //Use memcpy when copying data
                    delete[] _start;
                }
                _start = tmp;
                _finish = _start + sz;
                _end_of_storage = _start + n;
            }
        }

As you can see, reserve is used to adjust memcpy to copy data, and _start is released after copying.

The principle of memcpy copy is shallow copy, which directly copies the value. If the elements in the vector are int or char, then there is no problem in directly copying them. But at this time, the vector in the vector is a string. We know that if we want to find the string, we need to know the address of the first element. Therefore, the vector container stores the addresses of the first elements. At this time, shallow copying will be like this:

The content of tmp is copied from the memcpy value and points to the same space as _start. When _start is deleted, the tmp space is also released.

Therefore, if the object involves resource management, you must not use memcpy to copy between objects. You still have to copy it honestly yourself.

Modified reserve:

void reserve(size_t n)
        {
            if (n > capacity())
            {
                //open space
                T* tmp = new T[n];
                //copy data
                iterator begin = _start;
                int i = 0;
                while (begin != _finish)
                {
                    tmp[i + + ] = *begin + + ;
                }
                //Release, assignment
                delete[] _start;
                _start = tmp;
                _finish = tmp + i;
                _end_of_storage = tmp + n;
            }
        }

Test again at this time:

resize() that changes the size

void resize(size_t n , T val = T())
        {
            if (n < size())
            {
                _finish = _end_of_storage = _start + n;
            }
            else
            {
                reserve(n);
                for (int i = size(); i < n; i + + )
                {
                    _start[i] = val;
                }
                _finish = _start + n;
            }
        }

operator[]

T & amp; operator[] (size_t pos)
{
    assert(pos < size());
    return *(_start + pos);
}

Iterator implementation

begin() | end() of ordinary iterator:

typedef T* iterator;
iterator begin()
{
    return _start;
}
?
iterator end()
{
    return _finish;
}

begin() | end() of const iterator:

After being modified by const, it can only be read, not written.

typedef const T* const_iterator;
const_iterator begin() const
{
    return _start;
}
?
const_iterator end() const
{
    return _finish;
}

About scope for:

As long as the iterator is implemented, the range for can be used without having to implement it specially:

void test1()
{
    vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    v.push_back(4);
?
    for (auto e : v) //Traverse with range for
    {
        cout << e << " ";
    }
    cout << endl;
}

In fact, the underlying principle of range for is iterator. It relies on begin() and end() to implement, and only knows begin() and end().

If I change the name of begin() to Begin(), the iterator will still work, but the range for won’t work: it doesn’t know Begin(). . .

vector additions and deletions

Tail insert push_back()

void push_back(const T & amp; val)
{
    //First consider whether the capacity is enough
    if (size() == capacity())
    {
        reserve(capacity() == 0 ? 4 : 2 * capacity());
    }
?
    *_finish = val;
    _finish + + ;
}

Note here: the formal parameter must be modified by const and passed by reference.

It is more labor-saving to pass by reference, otherwise deep copying will be expensive; with const, formal parameters can receive constant strings.

Tail delete pop_back()

void pop_back()
{
    assert(_start<_finish);
    _finish--;
}

Use insert and erase to discuss the issue of iterator failure

insert()

void insert(iterator pos, const T & amp; val)
        {
            assert(pos >= _start);
            assert(pos <= _finish);
            //First consider whether there is enough space
            if (_finish == _end_of_storage)
            {
                reserve(capacity() == 0 ? 4 : 2 * capacity());
            }
            //Move data
            iterator end = _finish - 1;
            while (end >= pos)
            {
                *(end + 1) = *end;
                end--;
            }
            //insert
            *pos = val;
            _finish + + ;
        }

Writing this way is actually not enough. Problems will arise once expansion is involved. Let’s test it out:

void test2()
{
    vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    v.pop_back();
    v.insert(v.begin(), 10);
    v.insert(v.begin(), 11);
    v.insert(v.begin(), 12);
?
    for (auto e : v)
    {
        cout << e << " ";
    }
    cout << endl;
}

A random value appears!

The reason is actually that there was an omission in the reserve expansion step, which made the iterator pos invalid.

This is the iterator invalidation problem.

In other words, after the expansion, the iterator pos needs to be updated: so that pos that originally pointed to the old space now points to the same position in the new space.

After modification:

void insert(iterator pos, const T & amp; val)
        {
            int flag_pos = pos - _start; //Record the relative position of pos first so that pos can be updated later
?
            assert(pos >= _start);
            assert(pos <= _finish);
            //Consider whether there is enough space
            if (_finish == _end_of_storage)
            {
                reserve(capacity() == 0 ? 4 : 2 * capacity());
                pos = _start + flag_pos; //Update pos based on the position just recorded
            }
?
            //Move data
            iterator end = _finish - 1;
            while (end >= pos)
            {
                *(end + 1) = *end;
                end--;
            }
            //insert
            *pos = val;
            _finish + + ;
        }

Now you can insert successfully:

Expanded thinking: If v.begin() is passed to pos, is it feasible to pass parameters by reference?
void insert(iterator & amp; pos, const T & amp; val);
Not feasible. This question tests our basic knowledge of classes and objects.

Let’s take a look at begin():
iterator begin()
        {
            return _start;
        }
It returns by value, and what is returned is not _start, but its copied temporary object.

The temporary object is permanent, so pos cannot be its alias. We can only copy it and save it in pos.

erase()

void erase(iterator pos)
        {
            assert(pos >= _start & amp; & amp; pos < _finish ); //Note here: cannot <=_finish! Because it points to the position after the last element
            
            iterator begin = pos + 1;
            while (begin < _finish)
            {
                *(begin - 1) = *begin;
                begin++;
            }
            
            _finish--;
        }

but! The seemingly calm erase() actually contains hidden dangers: erase will also have the problem of iterator failure.

Now we use an example to show its problem: we are now required to delete all even numbers.

v is divided into two groups, namely A: {1,2,3,4,5}; B: {1,2,3,4}.

void test3()
{
    vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    v.push_back(4);
    v.push_back(5);
?
    vector<int>::iterator it = v.begin();
    while (it != v.end())
    {
        if (*it % 2 == 0)
        {
            v.erase(it);
        }
        it++;
    }
    for (auto e : v)
    {
        cout << e << " ";
    }
    cout << endl;
}

Deletion successful. But if v is 1 2 3 4, it won’t work.

B：

void test3()
{
    vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    v.push_back(4);
    …
}

The program crashed:

This is because the iterator is invalid. We use a diagram to illustrate the reason:

And {1,2,3,4,5} is just a coincidence. The deleted even number is followed by an odd number, so no error is exposed.

So how do the experts who write C++ deal with the failure of the iterator in erase?

The processing idea is: change the return value from void to iterator, and return the position of pos after deletion. In this case, the iterator will still point to pos after deletion, and the comparison of pos positions will not be missed.

Modified erase:

iterator erase(iterator pos)
        {
            assert(pos >= _start & amp; & amp; pos < _finish ); //Note here: cannot <=_finish! Because it points to the position after the last element
            
            iterator begin = pos + 1;
            while (begin < _finish)
            {
                *(begin - 1) = *begin;
                begin++;
            }
            
            _finish--;
            return pos;
        }

test:

void test3()
{
    vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    v.push_back(4);
?
    //Request to delete all even numbers
    vector<int>::iterator it = v.begin();
    while (it != v.end())
    {
        if (*it % 2 == 0) //Use if else statement. After deletion, the iterator will still stop at pos position and will not increase by itself.
        {
            it = v.erase(it);
        }
        else
        {
            it++;
        }
    }
    for (auto e : v)
    {
        cout << e << " ";
    }
    cout << endl;
}

Pay attention to the way the test is written here! ! After v.erase(), use an iterator to receive its return value, and use an if else statement to increase it. This prevents the program from crashing due to accessing an invalid iterator.

If you still write it this way:
void test5() {
vector<int> v1;
v1.push_back(1);
v1.push_back(2);
v1.push_back(3);
v1.push_back(4);

vector<int>::iterator it = v1.begin();
while (it != v1.end()) {
if (*it % 2 == 0) {
v1.erase(it);
}
it++;
}
for (auto e : v1) {
cout << e << " ";
}
}
That would still crash the program! !

In fact, no error will be reported when the iterator is invalid. The error is reported because the invalid iterator is accessed.

Therefore, be careful when using erase.

In order for the program to run normally, firstly, the underlying implementation must return an iterator;

Second, when using erase, use an if else statement, and use an iterator to receive the return value of the function.

Written question about iterator failure

Use the knowledge you just learned to answer this question!

What is the result of running the following program?

int main()
{
int ar[] ={1,2,3,4,0,5,6,7,8,9};
int n = sizeof(ar) / sizeof(int);
vector<int> v(ar, ar + n);
vector<int>::iterator it = v.begin();
while(it != v.end())
{
if(*it != 0)
cout<<*it;
else
v.erase(it);
it++;
}
return 0;
}

A. The program crashes
B.1 2 3 4 5 0 6 7 8 9
C.1 2 3 4 5 6 7 8 9
D.1 2 3 4 6 7 8 9

We have just emphasized that when using erase, the writing method is very particular. If else statement + use an iterator to receive the return value, the program can run normally. Written this way, the program will crash due to accessing an invalid iterator. Choose A.

implementation of deep copy

copy constructor

If we use the default copy constructor to perform a shallow copy of the vector:

void test4()
{
    vector<int> v1;
    vector<int> v2(v1);
}

This is because a shallow copy can only copy the value, not the same space.

In this way, v1 and v2 point to the same space. When v1 and v2 are destructed, the same space is destructed twice, so the program crashes.

Therefore, we need to manually implement the copy construction of vector and implement deep copy.

Way1 traditional writing method: open space and copy data honestly.

vector(vector<T> & amp; v)
            :_start(new T[v.capacity()])
            , _finish(_start + v.size())
            , _end_of_storage(_start + v.capacity())
        {
            memcpy(_start, v._start, sizeof(T) * v.size());
        }

Way2 modern writing method: The essence is to reuse existing code, “construct a new object + swap yourself with the new object”.

vector(const vector<T> & amp; v)
            :_start(nullptr)
            , _finish(nullptr)
            , _end_of_storage(nullptr)
        {
            vector<T> tmp(v.begin(), v.end());
            swap(_start, tmp._start);
            swap(_finish, tmp._finish);
            swap(_end_of_storage, tmp._end_of_storage);
        }

assignment operator=

vector<T> & amp; operator=(vector<T> v) //Because parameters are passed by value, v is already a copy of the actual parameters, so there is no need to construct tmp again.
        {
            swap(_start, v._start);
            swap(_finish, v._finish);
            swap(_end_of_storage, v._end_of_storage);
            return *this;
        }