RUST – Safety During and After Programming

Intro

The programming language Rust is dwelling on the web for half a decade already. Initially started as a personal project by an Mozilla employee and later continued by the Mozilla Foundation itself, it repeatedly gained attention with the claims of being the language of choice for technically safe software – including system software.

With this blog entry I want to give you an introduction to the language and talk about how said safety has been proven mathematically.

By the way, if you want to try out some code snippets from this blog post, you can do so directly online here: https://play.rust-lang.org/

 

The Basics

Rust is a fully compiled language with a Syntax fairly similar to C.
The basic language is procedural and the variable scopes are within curly brackets ( ‘{}’ ).

A first significant difference is the variable initialization with the keyword Rust and optional implied variable type. A very simple program looks as follows (Note, the code-blocks say “C/C++” because Rust is sadly not available here >: ):

fn main() {
   let on_world = true;
    
   if on_world 
    {println!("Hello World!");}
   else
    {println!("Goodbye, World.");}
}

It has the output “Hello World”.

To choose the variable type explicitly, the first line would need to look as follows:

let on_world : bool = true;

So far so good and unsurprising.
Things turn more interesting in the following example:

let s1 = String::from("Hello");
let s2 = s1;
println!("{}, world!", s2);
println!("{}, world!", s1);

ERROR: -> Value used here after MOVE

This error occurs because Rust MOVES the content of non-primitive variables by default instead of copying them. That simply means that the String “Hello” is in variable s2 and not anymore in s1 which has been invalidated automatically.
If you do want to COPY the content, you must explicitly write:

let s2 = s1.clone();

The following is just another variant of the same aspect but shows a different way of avoiding the error:

let s1 = String::from("Hello");
toWorld( &s1 );
println!("{}, world!", s1);

 

toWorld is a function here which takes a single argument of the String type.
Note the ‘&’ symbol for the variable. Without it, we would have the same error as before – the println macro (behaves here like a function) would raise an error because s1 isn’t anymore.

However ‘&’ caused that only a reference has been given away, not the content of the variable itself.
The result is that the outer scope can still use s1 but the toWorld function would not be able to modify the variable.

We will talk about the background and reason of this behavior later on.

Let’s stay at functions and look at a code snipped you might accidentally write similarly in C++ and thus end up in one of its infamous traps:

fn main()
{
   let reference_to_nothing = dangle();
}
fn dangle() -> &String
{
   let s = String::from("Hello");
   return &s;
}

ERROR: -> no value to BORROW from

The function dangle ís defined to have the return type “reference to String”.
However while C++ would allow you to return this reference of the object that has been created inside the function, Rust throws an error at runtime.

The reason is that objects are automatically destroyed when they end out of their scope. That means after the function-call, s doesn’t exist anymore.
Furthermore the compiler recognized that the variable would be empty and doesn’t allow that.

The result is that there are no dangling references in Rust and therefore no “null pointer” or “access violation” trouble!

Now on to the last basic example I shall give you as it shows another unique feature of the language:

let mut s = String::from("Hello");
let rs = &mut s;
rs.push(‘W’); // pushes a letter into the string
s.push(‘o’); // pushes another letter

ERROR: -> cannot borrow ‘s’ as mutable more than once

So far, all variables – or maybe we should just call them ‘objects’ – we defined simply with ‘let, were in fact what other languages would call constants. They were not changeable.

Of course, usually we want variables which are… well, variable.

That’s where the ‘mut’ keyword comes into play and tells the compiler to allow us to change the value. However the upper code fails with said error nevertheless. The reason is that technically spoken, we have given the RIGHT to modify the variable to the variable “rs”. As a result, ‘s’ does not have this right anymore and therefore may not modify the content.

The reason for this initially cumbersome behavior is what the developers of Rust call the “Ownership Concept”.

 

The Ownership Concept

The roots of said constrictions can be summed up in three rules:

  1. Each value has a variable that’s called its “owner”
  2. There can only be one owner for the value at a time
  3. When the owner goes out of scope, the value will be dropped

The consequences are that no memory is accessible without “owning” it and there’s no memory around which doesn’t have an owner (except for inside ‘unsafe blocks’ about which we will talk  later).

This setup prevents several problems and has certain benefits:

  • Secure memory handling
    • Malicious code (SQL injection, buffer-overflow etc.) cannot access memory because it doesn’t own it
    • No invalid access attempts at runtime
  • Object-Lifetime determined at compile-time
  • Clear knowledge where data is manipulated in in complicated, hierarchical software
    • Less unintended behavior
    • No race conditions
    • Feels safe during programming

However those restrictions, if enforced strictly, also make some nowadays very normal and required things impossible:

  • No simple recursive algorithms on the same data-structure
    • The owner and scope system complicates things
  • Some design patterns not possible (Factory-Pattern for example)
  • No Multithreading
    • As an object only has one owner, multiple threads would not be able to work on the same data structure
  • No direct hardware access
    • Drivers naturally require direct memory allocation

With the standard API included in rust those things can be achieved through special data structures though. One of those is “CELL”.

let c1: &Cell = &Cell::new(0);
let c2: &Cell = c1;
c1.set(2);
println!("{:?}", c2.get());

The “Cell” encompasses a value (pretty much like a container in java) and provides functions to set and get the value. This works even with references of the cell! Unlike if it were a String for example as we saw earlier.

This data-structure provides certain skills for efficient algorithms (and that at zero cost after compilation). Yet it retains Rust’s memory safety with a series of constrictions enforced by the API:

  1. Data only accessible through set and get
  2. No deep-stacking (aka Cell in Cell)
  3. No passing to multiple threads

Now the big question: How is it even possible that something like Cell, a structure that breaks the fundamental rules of Rust, has been implemented in Rust?

The answer is: It is not possible. Something based on certain rules cannot break those rules.

That’s why Rust supports the so called “UNSAFE” keyword!

 

The “Unsafe” keyword

“Unsafe” can be used before a code-block and allows a group of very specific calls inside which are otherwise rejected by the compiles.
As it can be seen in the following example* one can more or less simply allocate memory inside such a block. However the compiler still ensures that nothing actually unsafe – like a raw memory-address (pointer) – from inside the block can be taken outside.

*Note that the code of this example has been simplified to be better readable!

let mut safe_box: Box;
unsafe {
   let size = 20;
   let unsafe_ptr = Heap::allocate(size);

   safe_box = Box::from_raw(unsafe_ptr, size);
}

If we had defined the “unsafe_ptr” directly outside the “UNSAFE” scope, the compiler would have thrown an error. But because we “BOXED” the memory into the special object “Box”, we have given that memory an owner and thus satisfy the ownership rules again.

Therefore, we are allowed to take this memory outside the unsafe block and the “Box” will also ensure that the memory is freed once the variable reaches the end of its scope.

Using constructs like this, Rust gains the full power of a system language it claims to be. At the same time its safety is based on the strict rule not to use the “unsafe” keyword directly unless absolutely necessary.

Still the programmer will use “unsafe code” indirectly. For example by using the “Cell” object shown earlier.

The second, big question now is, how can the programmer trust that the usage of such an object (through its API) will be safe, when its “insides” are officially not?

 

Mathematical proof that an unsafe code has a safe API

The computer scientists RALF JUNG, JACQUES-HENRI JOURDAN, ROBBERT KREBBERS, and DEREK DREYER have developed and performed a method for find a mathematical solution to this problem.
They described that in the paper: RustBelt: Securing the Foundations of the Rust Programming Language

The work consisted of four steps:

  1. Abstracting the Rust language to an intermediate language λRust (LambdaRust) which can form mathematical axioms and yet describe all of Rusts significant features. Especially its ownership concept.
    This set of formal, primitive constructs is close to Rust’s own “Mid-level Intermediate Representation” (MIR) which is used by the compiler.For the true computer scientists among the readers, here is the whole syntax system of λRust:The paper describes the way it works in more detail.
  2. Prove that the axioms are mathematically sound and can therefore form rightful terms.The semantic interpretation of a premise always has to match the semantic interpretation of the conclusion.This ensures that the model can be used to make a statement about the safeness of the code later on.
    The paper calls this “Fundamental theorem of logical relations”.-> The correctness of λRust has been proven
  3. It has to be proven that code that follows λRust will not result in any unsafe or undefined behavior. Partially, automated code-interpretation were used to fulfill this task successfully.For example the existence of data-races can be checked semantically and the result was that λRust does not allow those to occur.The rest has been verified manually. The paper calls this “Adequacy”.-> The safety of the language Rust has been proven.The exact procedure can be seen in the paper and the associated papers.
  4. Convert the API of the important core libraries that use the unsafe keyword internally, into the new language. If this is flawlessly possible and therefore the result does not exhibit any unsafe/undefined behavior, that means the real API does not either.-> The safety of the library has been proven.

Step 4. has been performed for several APIs, including the earlier mentioned “Cell”: Arc, Rc, Cell, RefCell, Mutex, mem::swap, thread::spawn, rayon::join and take_mut.

 

Conclusion

RustBelt proves that a significant part of Rust is truly safe now (after a few corrections had been implemented during the development of the paper).
This is the first time that has been achieved to a fully usable and preexisting language.

Furthermore this has shown a method to achieve true safety in a program without needing to bother the programmer with low-level, mathematical axioms directly.
-> The developer can trust the language.

In the end that makes Rust better usable for truly secure systems as long as only verified libraries are used and the unsafe keyword is avoided.

It remains to be seen whether more developers will adapt to rust now due to this mathematical proof and thus increase the chance that the language will be in use for more projects.

In my personal opinion it would make a lot of sense even in not extremely safety-critical programs like it’s used now, because it would generally raise the base of trust in general software and maybe also reduce the number of minor exploits significantly.

 

Research Questions

Those thoughts and my attempts to understand Rust as well as the paper has revealed a couple of questions that will require more research:

Could this verification method be applied to libraries in other languages too? For example to Java where an “unsafe” class exists as well?

Can a theoretical language with proved safety, be converted into a widespread usable language? (as if λRust were developed first)

How well could Rust replace C and C++ as system languages on a large scale?

Powerpoint: Rust