Rust Syntax: Ownership, References, and Lifetimes

Contents

    • Ownership
      • Memory management with garbage collection
      • Manual memory management
      • Ownership in Rust
      • Transfer of ownership
      • Ownership transfer in function calls
    • References and borrowing
      • Mutable and immutable references
    • Lifetimes
      • Dangling references
      • Lifetime annotations on functions
      • Lifetime annotations on structs
      • Lifetime elision
      • Lifetime constraints
      • The static lifetime

Ownership

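Memory management with garbage collection

Languages such as Python and Java manage memory with a technique called garbage collection. One common form, used by CPython, is reference counting: the runtime keeps a counter on each object that records how many references currently point to it. Other runtimes, such as the JVM, use tracing collectors that periodically search for objects that are no longer reachable.

With reference counting, an object whose count drops to 0 can be reclaimed right away. A tracing collector instead reclaims unreachable objects at some later point that it chooses itself.
Take Python as an example: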

x = [1, 2]  # The list object [1, 2] is referenced by x; its reference count is 1
y = x       # The same list object is now also referenced by y; its reference count is 2
del x       # The name x is removed; the object survives and its reference count is 1
del y       # The name y is removed; the reference count drops to 0
# In CPython the list is freed at this point; a tracing collector would free it at a later collection

Garbage collection frees developers from managing memory by hand, but it also brings problems of its own:

  1. A tracing collector does not reclaim memory immediately but at a time of its own choosing, so unused objects keep occupying memory in the meantime.
  2. Garbage collection costs performance at run time, because the runtime has to maintain reference counts or trace the object graph to decide which objects can be reclaimed.

Manual memory management

C and C++ use manual memory management: you request memory yourself and you must release it yourself. Memory that is never released accumulates as a leak. The advantages of this approach are control and high performance. The disadvantage is that it is easy to make mistakes, for example releasing the same memory twice (a double free).

#include <stdlib.h>

void test(int arr[]) {
    free(arr); // The array has been released at this point
}

int main() {
    int *arr = (int *)malloc(sizeof(int) * 10);
    test(arr);
    free(arr); // Double free: the same block is released a second time
    return 0;
}

You may think nobody would write code as careless as the example above, but once the program logic becomes complicated it is hard to guarantee that you will never make this mistake.
A double free is very harmful: after a block of memory is released, the allocator may hand it out again for a new object, and releasing it a second time can corrupt that new object or the allocator's own bookkeeping, causing a series of serious errors.

Ownership in Rust

Rust takes a different approach and uses a mechanism called ownership, whose rules are as follows:

  1. Every value in Rust is owned by a variable, called the owner of the value.
  2. A value can have only one owner at a time.
  3. When the owner (the variable) goes out of scope, the value is dropped.

Let's look at the following example:

fn main() {
    let x = 4; // The scope of x begins
    { // The scope of y begins
        let y = 10;
    } // The scope of y ends, and y is dropped
} // The scope of x ends, and x is dropped
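The third rule can be observed directly. The following is a minimal sketch, assuming a hypothetical helper type `Noisy` that records its name in a shared log when it is dropped; neither the type nor the function `drop_order` comes from the original article, and the lifetime parameter on `Noisy` is the kind of annotation covered in the lifetimes part of this article.

```rust
use std::cell::RefCell;

// Hypothetical helper type: pushes its name onto a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the names in the order in which the values were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _x = Noisy { name: "x", log: &log }; // scope of _x begins
        {
            let _y = Noisy { name: "y", log: &log }; // scope of _y begins
        } // scope of _y ends: "y" is logged
    } // scope of _x ends: "x" is logged
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

The inner value is dropped first because its scope ends first, which matches the comments in the example above.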

At this point it is necessary to mention two kinds of memory, the heap and the stack. The main differences are as follows:

The stack is last-in, first-out memory with a highly organized layout. Data stored on the stack must have a size known at compile time, and access to stack memory is very fast.

Unlike the stack, the heap can store data whose size is not known at compile time, and access is relatively slow. The memory allocator finds a free block that is large enough and hands it to the program. A pointer refers to that block of heap memory, and because the pointer itself has a fixed size it can be stored on the stack.

                    Heap                 Stack
Access speed        Slower               Faster
Allocation size     Need not be fixed    Must be fixed

In Rust, simple types with a fixed size, such as integers, floating-point numbers, bool, char, and tuples made up only of such types, are stored on the stack, and these types implement a trait called Copy.

Since values of these types are small and cheap to copy, Rust simply copies them on assignment and when passing them to functions, without the need for techniques such as references.

fn main() {
    let x = 10;
    let y = x; // y now holds its own copy of 10, and x is still usable
}
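Copy semantics are not limited to built-in types. As a small sketch, a struct made only of Copy fields can opt in with `#[derive(Clone, Copy)]`. The `Point` type and the `copy_demo` function are hypothetical names used only for illustration.

```rust
// Hypothetical example type: deriving Copy (and Clone) opts the struct into copy semantics.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Both the original and the copy remain usable after assignment.
fn copy_demo() -> (i32, i32, Point, Point) {
    let a = 10;
    let b = a; // a is copied, not moved
    let p = Point { x: 1, y: 2 };
    let q = p; // p is copied as well, because Point is Copy
    (a, b, p, q)
}

fn main() {
    let (a, b, p, q) = copy_demo();
    println!("{} {} {:?} {:?}", a, b, p, q);
}
```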

Transfer of ownership

Because the types above are simply copied, they cannot show ownership at work, so we use another type, String, whose contents live on the heap, to demonstrate it.

fn main() {
    let s = String::from("hello"); // Create a String from a string literal
}

Here we need to distinguish String from string literals. A String keeps its contents in heap memory, and it can grow and change. A string literal (such as let x = "hello";) is fixed and determined at compile time.

Consider the following code:

fn main() {
    let s1 = String::from("hello"); // Create a String from a string literal
    let s2 = s1;
    println!("{}", s1); // An error will be reported
}

The code above fails to compile. Let's look at the error message:

error[E0382]: borrow of moved value: `s1`
 --> src\main.rs:4:17
  |
2 | let s1 = String::from("hello"); //Get a String type from the string literal value
  | - move occurs because `s1` has type `String`, which does not implement the `Copy` trait
3 | let s2 = s1;
  | - value moved here
4 | println!("{}", s1); //An error will be reported
  | ^ value borrowed here after move
  |
  = note: this error originates in the macro `$crate::format_args_nl` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)

The error message says that we borrowed a value, s1, after it had been moved.
Remember the ownership rules mentioned at the beginning:

  1. Every value in Rust is owned by a variable, called the owner of the value.
  2. A value can have only one owner at a time.
  3. When the owner (the variable) goes out of scope, the value is dropped.

The value String::from("hello") was originally owned by s1, but at let s2 = s1; a move occurred and ownership was handed over to s2.

This follows from the second rule: a value can have only one owner at a time, so s1 no longer owns String::from("hello") and naturally can no longer be used.

Now look at this from the memory perspective. At the beginning, s1 holds a pointer (together with a length and a capacity) to the heap memory that stores hello.

Then, because of let s2 = s1;, that pointer is copied into s2, so for a moment both variables would point to the same heap memory.

But since a value can have only one owner, the compiler treats s1 as invalid from that point on, and only s2 refers to the heap memory.

So in the end, ownership of the value hello has moved from s1 to s2.
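That a move copies only the pointer and leaves the heap data where it is can be checked with a small sketch. The function name `move_keeps_heap_buffer` is hypothetical; `String::as_ptr` returns the address of the heap buffer.

```rust
// Returns true if the heap buffer has the same address before and after the move.
fn move_keeps_heap_buffer() -> bool {
    let s1 = String::from("hello");
    let before = s1.as_ptr(); // address of the heap buffer owned by s1
    let s2 = s1; // move: only the pointer, length, and capacity are copied
    let after = s2.as_ptr(); // address of the heap buffer now owned by s2
    before == after
}

fn main() {
    println!("same heap buffer after move: {}", move_keeps_heap_buffer());
}
```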

This has a great advantage: since a value has only one owner at a time, the value is never released more than once when its owner goes out of scope.

fn main() {
    let s1 = String::from("hello"); // The scope of s1 begins
    let s2 = s1; // The scope of s2 begins, and s1 loses ownership
} // The scopes of s1 and s2 end; since s1 no longer owns the value, only s2 releases the heap memory

This effectively avoids double frees, but it also gives developers new things to think about.

Ownership transfer in function calls

When a variable is passed to a function as an argument, its ownership is transferred to the parameter inside the function. See the following example:

fn get_len(s: String) -> usize { // The parameter s takes ownership of the value that s1 owned
    return s.len(); // Return the length of s
} // The scope of s ends; s owns hello, so hello is released here

fn main() {
    let s1 = String::from("hello"); // Create a String from a string literal
    let len = get_len(s1); // Hand s1 over to the function get_len
    println!("{}", s1); // Error: ownership has moved to the parameter s, so s1 can no longer be used
}

The comments above describe what happens. The error occurs because passing the argument transfers ownership and nothing hands it back afterwards, so main ends up using a variable that no longer owns its value.

Solution 1:
Return ownership from the function and use shadowing to bind it to the same name again.

fn get_len(s: String) -> (usize, String) { // Return the length and the string
    return (s.len(), s);
}

fn main() {
    let s = String::from("hello"); // Create a String from a string literal
    let (len, s) = get_len(s); // Use tuple destructuring and shadowing to take ownership back
    println!("{}'s len is {}", s, len);
}

Solution 2:
See the section on references and borrowing below.

If you would rather not go to this trouble and simply want to pass a copy of hello instead of handing over ownership, you can use the clone method, which makes a copy of the data on the heap and passes that copy to the function.

fn get_len(s: String) -> usize { // Return only the length
    return s.len();
}

fn main() {
    let s = String::from("hello"); // Create a String from a string literal
    let len = get_len(s.clone()); // Clone s; ownership of the clone is handed to the parameter s
    println!("{}'s len is {}", s, len);
}

Note that cloning copies the data on the heap, which can be expensive for large values. The good thing is that the original keeps its ownership, so you do not have to worry about it.
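As a small sketch of what cloning gives you, the clone owns a separate heap buffer, so changing it leaves the original untouched. The function name `clone_is_independent` is hypothetical.

```rust
// Returns the original and a modified clone to show that they are independent.
fn clone_is_independent() -> (String, String) {
    let original = String::from("hello");
    let mut copy = original.clone(); // a second heap buffer with the same contents
    copy.push_str(", world"); // only the clone changes
    (original, copy)
}

fn main() {
    let (original, copy) = clone_is_independent();
    println!("{} / {}", original, copy);
}
```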

References and borrowing

It is obviously a headache to transfer ownership every time you want to pass a value, which is why borrowing exists. Borrowing means creating a reference to a value; what makes it special is that it does not take ownership. A borrow is written with the & operator. For example:

fn get_len(s: &String) -> usize { // s is a borrow, not a transfer of ownership
    return s.len();
}

fn main() {
    let s1 = String::from("hello"); // Create a String from a string literal
    let len = get_len(&s1); // Pass in a reference to s1
    println!("{}'s len is {}", s1, len);
}

In memory, the reference s points only to s1, not directly to the heap memory that s1 manages. So a borrow can be seen as a reference to a variable. Because of this, the borrow does not obtain ownership of the value; it only uses it temporarily.

So after the get_len function ends, since s is just a borrow, it has no right to release hello. Ownership remains with s1.
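A common variation, sketched here as a design note rather than as the article's own method, is to take `&str` instead of `&String`. A `&String` converts to `&str` automatically, so the same function also accepts string literals.

```rust
// Borrowing as &str accepts both borrowed Strings and string literals.
fn get_len(s: &str) -> usize {
    s.len()
}

fn main() {
    let owned = String::from("hello");
    println!("{}", get_len(&owned)); // &String is converted to &str automatically
    println!("{}", get_len("hi")); // a string literal is already a &str
    println!("{}", owned); // owned is still usable: it was only borrowed
}
```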

Mutable and immutable references

Borrowing solves the problem of passing arguments very well, but we are not finished yet. Sometimes we want to modify the borrowed value. If you try to modify it through a plain & reference, the compiler reports an error, because references are immutable by default. To modify the value, both the variable and the reference must be marked with the mut keyword.

fn get_len(s: &mut String) -> usize { // Mutable reference to a String
    return s.len();
}

fn main() {
    let mut s = String::from("hello"); // Create a String from a string literal
    let len = get_len(&mut s); // Pass in a mutable reference
    println!("{}'s len is {}", s, len);
}
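The example above takes a mutable reference but only reads through it. A minimal sketch of a function that actually modifies the borrowed value might look like this; `append_world` is a hypothetical name.

```rust
// Modifies the caller's String in place through a mutable reference.
fn append_world(s: &mut String) {
    s.push_str(", world");
}

fn main() {
    let mut s = String::from("hello"); // the variable itself must be declared mut
    append_world(&mut s); // lend s mutably for the duration of the call
    println!("{}", s); // s still owns the (now longer) string
}
```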

The problem seems to be solved, but a bigger one now appears. Let's look at the following example:

fn main() {
    let mut s: String = String::from("hello");
    let re1: &mut String = &mut s;
    let re2: &mut String = &mut s; // An error is reported here
    println!("{} {}", re1, re2);
}

The reason for the error is simple: two mutable references to the same value are not allowed to be in use at the same time. Rust does this to prevent data races. A data race can occur when all of the following hold:

1. Two or more pointers access the same data at the same time
2. At least one pointer is used to write the data
3. There is no mechanism for synchronizing access to the data
(Reference source: https://course.rs/)

Likewise, so that data seen through an immutable reference never changes unexpectedly because of a write through a mutable reference, Rust does not allow an immutable reference and a mutable reference to the same value to be in use at the same time.

fn main() {
    let mut s: String = String::from("hello");
    let re1: &String = &s;
    let re2: &mut String = &mut s; // Error: s is already borrowed as immutable
    println!("{} {}", re1, re2);
}

But within the same scope, multiple immutable references can appear at the same time.
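A short sketch of the permitted case, with a hypothetical function name `read_through_two_refs`: any number of immutable references may be alive together, because none of them can change the value.

```rust
// Reads the same String through two shared references that are alive at once.
fn read_through_two_refs() -> (usize, bool) {
    let s = String::from("hello");
    let r1 = &s;
    let r2 = &s; // a second immutable borrow is fine
    (r1.len(), r1 == r2) // both references are used together
}

fn main() {
    println!("{:?}", read_through_two_refs());
}
```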

If you really need more than one mutable reference, you can use curly braces to create a new scope, so that the references are never alive at the same time.

fn main() {
    let mut s: String = String::from("hello");
    {
        let re1: &mut String = &mut s;
        re1.push_str(", world");
    } // re1 goes out of scope here, so its mutable borrow ends
    let re2: &mut String = &mut s;
    println!("{}", re2); // re1 cannot be used here, because it only exists inside the curly braces
}
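In current Rust compilers the extra braces are often unnecessary, because a borrow ends at the last place the reference is used rather than at the end of the enclosing block (this is known as non-lexical lifetimes). A minimal sketch, with a hypothetical function name:

```rust
// Two mutable borrows in the same block, one after the other.
fn sequential_mut_borrows() -> String {
    let mut s = String::from("hello");
    let re1 = &mut s;
    re1.push_str(", "); // last use of re1: its borrow ends here
    let re2 = &mut s; // allowed, because re1 is never used again
    re2.push_str("world");
    s
}

fn main() {
    println!("{}", sequential_mut_borrows());
}
```

Using re1 again after re2 is created would bring the error back, since the two mutable borrows would then overlap.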

Lifetimes

In Rust, every variable has its own lifetime, that is, a scope within which it is valid.

If a variable is no longer valid but a reference to it is still used, the result is a dangling reference. Rust's lifetimes are designed to rule out dangling references.

Dangling references

Take a look at the following code:

fn main() {
    let result;
    {
        let tmp = String::from("abc");
        result = &tmp;
    }
    println!("{}", result);
}

The code above fails to compile. The lifetime of result spans the whole of main, but tmp is only a local variable of the inner block. By the time result is printed, tmp has already been dropped.

Yet result still holds a reference to tmp and tries to use it, so an error is reported. Take a look at the error message:

error[E0597]: `tmp` does not live long enough
 --> src\main.rs:6:18
  |
5 | let tmp = String::from("abc");
  | --- binding `tmp` declared here
6 | result = &tmp;
  | ^^^^ borrowed value does not live long enough
7 | }
  | - `tmp` dropped here while still borrowed
8 | println!("{}", result);
  | ------ borrow later used here

For more information about this error, try `rustc --explain E0597`.

The message is very straightforward: result borrows a variable that does not live long enough.

Let's take a look at the lifetimes of the two variables:

fn main() {
    let result;                         // ---------+-- result
    {                                   //          |
        let tmp = String::from("abc");  // -+-- tmp |
        result = &tmp;                  //  |       |
    }                                   // -+       |
    println!("{}", result);             //          |
}                                       // ---------+

From the diagram above we can see directly that the lifetime of result is longer than the lifetime of tmp. A variable with a longer lifetime borrows one with a shorter lifetime, so an error occurs.

We only need to modify the code slightly to make it correct:

fn main() {
    let result;
    let tmp = String::from("abc");
    result = &tmp;
    println!("{}", result);
}

Now tmp is still alive when println! is called. Notice, though, that result is declared before tmp, so the scope of result is still slightly larger than that of tmp.

This shows that it is not necessarily wrong for a variable with a larger scope to borrow from one with a smaller scope; what matters is that the reference is never used after the borrowed value has been dropped. When the compiler cannot prove this, it rejects the code at compile time.

A variable with a shorter lifetime borrowing from one with a longer lifetime, on the other hand, is never a problem, because the borrower goes away first and can never see an invalid value.

Lifetime annotations on functions

Let's first look at a function. The requirement is to return the longer of two strings. You might want to write code like this:

fn get_greater(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

You think the code is correct, but the compiler reports the following error:

error[E0106]: missing lifetime specifier
 --> src\main.rs:9:37
  |
9 | fn get_greater(x: &str, y: &str) -> &str {
  | ---- ---- ^ expected named lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `x` or `y`

What does this mean? Rust expects you to add a lifetime annotation by hand.

In Rust's eyes, this function returns a string reference, &str, and that reference may come from x, from y, or from somewhere else.

Which one depends on how the function body is written. The compiler checks callers using the signature alone, so it cannot tell how long the returned &str is valid.
Is it valid as long as x, or as long as y? If it is assumed to be valid as long as x, and x lives longer than y, then returning a reference to y may produce a dangling reference.
Look at the following example:

fn main() {
    let result;
    let s1 = String::from("abc"); // s1 has a long lifetime
    {
        let s2 = String::from("abcd"); // s2 has a short lifetime
        result = get_greater(s1.as_str(), s2.as_str()); // A reference to the short-lived value is assigned to the long-lived variable
    }
    println!("{}", result); // s2 has been dropped by now, so result would be dangling
}

fn get_greater(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y // The shorter-lived one is returned here
    }
}

If it were accepted, the code above would be unsafe, because the function may return a reference with a short lifetime that is then stored in a variable with a long lifetime.

Because the compiler does not look inside your function when checking a call, it cannot tell whether the returned reference has a long or a short lifetime.

So Rust requires you to annotate the lifetimes of the reference parameters, so that the compiler can work out the lifetime of the return value. In other words, a lifetime annotation tells the compiler how long the returned reference is valid, which lets it detect potential dangling references.

Lifetime annotations start with ', for example:

'a
'b
'abc

They are declared in angle brackets after the function name, which introduces them as lifetime parameters.

fn get_greater<'a>(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

Then you annotate the references with the lifetime. The lifetime parameter is written right after the &, followed by the type:

fn get_greater<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

Let's look at the code above. It declares a lifetime parameter 'a; the lifetime of x is 'a, and so are the lifetimes of y and of the return value. That is to say, x, y, and the return value share the same lifetime.

You might wonder what this means, since the arguments passed for x and y can obviously live for different lengths of time. You can think of 'a as the shorter of the two lifetimes (a longer-lived reference can always be treated as a shorter-lived one, but not the other way around).

So the annotation in the code above tells the compiler two things:

  1. Inside the function, the returned reference must have the lifetime 'a shared by x and y; otherwise the function body is wrong.
  2. Outside the function, the returned reference is valid only for the shorter of the lifetimes of x and y, so if a caller stores it in a variable that lives longer, the compiler can detect this and report an error.

After reading these two points you will see that adding lifetime annotations does not change how the program runs at all; it only gives the compiler a way to check your code for errors.

Let's look at the following code to deepen our understanding of this:

fn main() {
    let result;
    let s1 = String::from("abc");
    {
        let s2 = String::from("abcd");
        result = get_greater(s1.as_str(), s2.as_str());
    }
    println!("{}", result);
}

fn get_greater<'a>(x: &'a str, y: &str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y // An error will be reported here
    }
}

The code above annotates only x and the return value, that is to say, the return value has the same lifetime as x.
When the compiler checks the function body it finds a hole in the logic: the function may return y, and the lifetime of y is not 'a. So an error is reported at y.

If we annotate y with 'a as well, the error moves to the caller:

fn main() {
    let result;
    let s1 = String::from("abc");
    {
        let s2 = String::from("abcd");
        result = get_greater(s1.as_str(), s2.as_str()); // An error is reported here
    }
    println!("{}", result);
}

fn get_greater<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

With the function modified this way, the compiler finds that every reference the function can return has the lifetime 'a, so the logic inside the function is correct.

The compiler then checks the caller. The signature says that x and y share one lifetime, but one of the arguments (s2) lives too briefly, so 'a becomes that short lifetime, the lifetime of s2.

The compiler only needs the function signature to know the lifetime of the return value; it does not need the implementation details, which is exactly why lifetimes are annotated. It then sees that the lifetime of the return value is shorter than the lifetime of the variable it is assigned to.

That is the situation from the beginning again: a longer-lived variable borrows from a shorter-lived one. So the compiler reports an error.

From all this one thing is clear: lifetime annotations let the compiler check the logic inside the function and, separately, check every call site, all at compile time. The annotations serve only this checking purpose; they exist for the compiler.
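For contrast, here is a sketch of a caller that the compiler accepts: the returned reference is used only while both arguments are still alive. The helper `longer_len` is a hypothetical name; `get_greater` is the fully annotated function from the text.

```rust
// Returns the longer of the two string slices; both inputs and the result share 'a.
fn get_greater<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

// Uses the result inside the inner scope and copies out only a plain number.
fn longer_len() -> usize {
    let s1 = String::from("abc");
    let len;
    {
        let s2 = String::from("abcd");
        let result = get_greater(s1.as_str(), s2.as_str());
        len = result.len(); // result is used only while s2 is alive
    } // s2 and result end here together
    len
}

fn main() {
    println!("{}", longer_len());
}
```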

Lifetime annotations on structs

References may also appear in structs and enums. In that case we need to annotate the lifetime of each reference,
as follows:

struct Test<'a> {
    name: &'a String,
}

The annotation means that the value name refers to must live at least as long as the struct instance. Here is a counterexample:

struct Test<'a> {
    name: &'a String,
}

fn main() {
    let test;
    {
        let name = String::from("abc");
        test = Test {
            name: &name, // An error will be reported here: name does not live long enough
        };
    }
    println!("{}", test.name);
}

After checking, Rust finds that name does not live as long as test, so it reports an error.
The lifetime must also be written when adding methods with impl, because it is part of the struct's type.

impl<'a> Test<'a> {
    fn print_hello(&self) {
        println!("hello");
    }
}
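A working counterpart to the counterexample, as a minimal sketch: the referenced String is declared before the struct, so it outlives it. The method `greeting` and the function `make_greeting` are hypothetical names added for illustration.

```rust
struct Test<'a> {
    name: &'a String,
}

impl<'a> Test<'a> {
    // Builds an owned String, so nothing borrowed escapes from the method.
    fn greeting(&self) -> String {
        format!("hello, {}", self.name)
    }
}

fn make_greeting() -> String {
    let name = String::from("abc"); // declared first, so it outlives test
    let test = Test { name: &name };
    test.greeting()
} // test is dropped before name, so the reference never dangles

fn main() {
    println!("{}", make_greeting());
}
```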

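Lifetime elision

In some cases Rust can infer the lifetimes in a function signature by itself, mainly according to the following rules:

  1. Each parameter of reference type gets its own lifetime.
  2. If there is exactly one input lifetime, it is assigned to all output lifetimes.
  3. If one of the parameters is &self or &mut self, the lifetime of self is assigned to all output lifetimes.

Let's look at example 1:

fn test(s: &String) -> &String {
    s
}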

The code above compiles because Rust has already inferred the lifetime of the return value.

  1. According to the first rule, each parameter of reference type has its own lifetime, so s has a lifetime
  2. According to the second rule, there is only one input lifetime, so it is given to the return value

So Rust has inferred the lifetime of the returned reference. This works because the lifetime of a returned reference can in practice only come from the inputs: returning a reference to an object created inside the function would produce a dangling reference, because the object is dropped as soon as the function ends.
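When a function creates the value itself, the usual way out is to return the value instead of a reference, so that ownership moves to the caller. A minimal sketch, with hypothetical function names:

```rust
// This version does not compile: s is dropped when the function returns,
// so a reference to it would dangle.
//
// fn make_ref() -> &String {
//     let s = String::from("abc");
//     &s
// }

// Returning the owned String moves it out to the caller instead.
fn make_owned() -> String {
    let s = String::from("abc");
    s // ownership moves to the caller; nothing is dropped here
}

fn main() {
    let s = make_owned();
    println!("{}", s);
}
```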

Let's look at another situation:

impl<'a> Test<'a> {
    fn print_hello(&self, word: &String) -> &String {
        self.name
    }
}

The signature above is accepted without explicit annotations, for the following reasons:

  1. According to the first rule, each parameter of reference type has its own lifetime, so self and word each have their own lifetime
  2. According to the third rule, the lifetime of self is assigned to the return value. At this point every lifetime in the signature has been inferred.

Note that word keeps its own lifetime, so the body must return something borrowed from self; returning word here would be rejected by the compiler.
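Under the third rule the elided return lifetime is always tied to self, so a method that wants to return one of its other reference parameters has to say so explicitly. The following sketch shows both cases; the method names `get_name` and `echo` and the function `elision_demo` are hypothetical.

```rust
struct Test<'a> {
    name: &'a String,
}

impl<'a> Test<'a> {
    // Elided form: the returned reference is tied to the borrow of self.
    fn get_name(&self) -> &String {
        self.name
    }

    // Explicit form: the returned reference is tied to word, not to self.
    fn echo<'b>(&self, word: &'b String) -> &'b String {
        word
    }
}

fn elision_demo() -> (String, String) {
    let name = String::from("abc");
    let word = String::from("hi");
    let test = Test { name: &name };
    (test.get_name().clone(), test.echo(&word).clone())
}

fn main() {
    println!("{:?}", elision_demo());
}
```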

Lifetime constraints

We can declare any number of lifetime parameters, as in the following signature:

fn test<'a, 'b>(s1: &'a String, s2: &'b String) -> &String {
    s1
}

Of course, the code above is wrong, because with two input lifetimes the compiler cannot infer the lifetime of the return value.

After a small modification we get this:

fn test<'a, 'b>(s1: &'a String, s2: &'b String) -> &'a String {
    s1
}

This version compiles. But what if we change it to return s2 instead?

fn test<'a, 'b>(s1: &'a String, s2: &'b String) -> &'a String {
    s2
}

Now the compiler reports an error again, because it does not know how 'a and 'b are related. The signature promises a reference with lifetime 'a, but the function actually returns one with lifetime 'b.

We can solve this by adding a constraint that relates the two lifetimes:

fn test<'a, 'b>(s1: &'a String, s2: &'b String) -> &'a String
    where 'b: 'a // 'b lasts at least as long as 'a
{
    s2
}

No error is reported now. What we return has lifetime 'b, and 'b lasts at least as long as 'a, so the returned reference is valid for all of 'a and cannot dangle.
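A sketch of a caller that satisfies the constraint: the second argument lives longer than the first, and the result is used only while the shorter-lived one is alive. The function is renamed `pick_second` here purely for illustration, and `call_with_longer_lived_s2` is likewise a hypothetical name.

```rust
// Same shape as the function in the text: returns s2, which must outlive 'a.
fn pick_second<'a, 'b>(s1: &'a String, s2: &'b String) -> &'a String
where
    'b: 'a, // 'b lasts at least as long as 'a
{
    let _ = s1; // s1 is not needed in this sketch
    s2
}

fn call_with_longer_lived_s2() -> String {
    let long_lived = String::from("long");
    let copy;
    {
        let short_lived = String::from("short");
        let r = pick_second(&short_lived, &long_lived);
        copy = r.clone(); // take an owned copy before short_lived is dropped
    }
    copy
}

fn main() {
    println!("{}", call_with_longer_lived_s2());
}
```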

The static lifetime

Some values live for the entire run of the program. The special lifetime 'static denotes a lifetime that lasts for the whole program.

fn test(s: &'static str) {
}

String literals are the most common example: they have the type &'static str.
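A short sketch of where 'static shows up in practice. The names `GREETING`, `greeting`, and `needs_static` are hypothetical and used only for illustration.

```rust
// A reference stored in a static item has the 'static lifetime.
static GREETING: &str = "hello";

// A string literal can be returned as &'static str because it lives in the program binary.
fn greeting() -> &'static str {
    "hello"
}

// Accepts only string slices that are valid for the whole program.
fn needs_static(s: &'static str) -> usize {
    s.len()
}

fn main() {
    println!("{} {}", greeting(), needs_static(GREETING));
}
```

A reference to a local String would be rejected by needs_static, because the String is dropped before the program ends.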