Get car location using Raspberry Pi and Google Cloud IoT Core

Project idea

Have you ever parked your car somewhere in the city and, a few hours later, been unable to remember where you left it? In that moment you probably wished for a smartphone application that could locate your car. This is the idea from which the development of CarFuelLocate started.

Goal

With CarFuelLocate you can not only track the current location of your car but also its fuel level. Moreover, you can locate the cheapest and the closest gas station for the matching fuel type.

CarFuelLocate Web Application

Hardware

We equipped our car with a Raspberry Pi 3B to collect the car’s telemetry data. We extended the Raspberry Pi with an expansion board that provides Bluetooth, UMTS and GPS connectivity through a serial interface. The vehicle data (e.g. the fuel level) is retrieved through an OBD2 diagnostic device, which is connected to the Bluetooth interface of the Raspberry Pi expansion board.

This data is processed by a simple Python script running on the Raspberry Pi, which sends it to Google Cloud Platform IoT Core, where it is handled by a cloud function (we will look at this later).

Architecture and cloud components

CarFuelLocate consists of multiple parts: the Raspberry Pi, a simple Node.js backend, a simple Vue.js frontend, and several cloud services and APIs.

basic architecture of CarFuelLocate

Raspberry Pi

As already mentioned, the Raspberry Pi collects the vehicle telemetry data and sends it to Google Cloud IoT Core.

Cloud IoT Core and Cloud Functions

Google Cloud IoT Core is a service that allows you to manage IoT devices and collect real-time data from them. In our use case, it basically acts as an authentication service for the Raspberry Pi.

The Python script running on the Raspberry Pi sends its requests to the Google Cloud, where they are authenticated by Cloud IoT Core. Each request is served by a cloud function, which writes the data (e.g. the position of the car) directly into the Cloud Firestore database.

// db is a Firestore client, e.g. initialized via the Firebase Admin SDK:
// const admin = require('firebase-admin');
// admin.initializeApp();
// const db = admin.firestore();
async function updateLocation(data, deviceId) {
  // write the latest position into the 'location' collection, keyed by device ID
  const docRef = db.collection('location').doc(deviceId);
  await docRef.set({
    lat: data.lat,
    lon: data.lon,
  });
}

example code for a cloud function, saving the location to the database

When a user wants to connect their Raspberry Pi for the first time, they receive a token from the web application and enter it into the Python application on the Raspberry Pi, which starts the registration.

During this process the Raspberry Pi generates an RSA key pair and registers its public key with the Cloud IoT platform, which assigns an individual device ID that is used to link the device to a specific user.
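On the backend side, adding a device with its public key boils down to a single call to the Cloud IoT Core device manager. The following sketch uses the official Node.js client; the project, region and registry names are placeholders and not taken from the actual CarFuelLocate code.

// Sketch: registering a device's public key with Cloud IoT Core (names are placeholders).
const iot = require('@google-cloud/iot');
const client = new iot.v1.DeviceManagerClient();

async function registerDevice(deviceId, publicKeyPem) {
  const registryPath = client.registryPath(
    'my-gcp-project', 'europe-west1', 'carfuellocate-registry'
  );

  // create the device with the RSA public key generated on the Raspberry Pi
  const [device] = await client.createDevice({
    parent: registryPath,
    device: {
      id: deviceId,
      credentials: [{ publicKey: { format: 'RSA_PEM', key: publicKeyPem } }],
    },
  });
  return device;
}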

Cloud Firestore

Cloud Firestore stores all user information, including the data sent by the Raspberry Pi (e.g. the location) and the data that the user configures through the frontend (e.g. the fuel type).

The Cloud Firestore is accessed by the Cloud Functions and the Node.js backend.

Google Cloud Identity platform

We used the Google Cloud Identity Platform as OAuth provider, so that users can log in with their Google account.

Node.js Express backend

The application consists of a simple Node.js Express backend, which handles the API requests from the frontend as well as the one-time Raspberry Pi user registration.

The backend is deployed on App Engine, which is pretty easy because you basically just upload the code and the Google Cloud Platform does the rest. This lets us focus on our code, while the responsibility for operating the Node.js server rests in the hands of Google.
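As a rough illustration, a backend endpoint that returns the stored car location could look like the following sketch. The route path and the Firestore initialization are assumptions; the collection layout follows the cloud function example above.

// Sketch of a backend route returning the stored car location (route path is an assumption).
const express = require('express');
const admin = require('firebase-admin');

admin.initializeApp();
const db = admin.firestore();
const app = express();

app.get('/api/location/:deviceId', async (req, res) => {
  const doc = await db.collection('location').doc(req.params.deviceId).get();
  if (!doc.exists) {
    return res.status(404).send('no location stored for this device');
  }
  res.json(doc.data()); // { lat: ..., lon: ... }
});

app.listen(process.env.PORT || 8080);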

Vue.js frontend

A simple Vue.js frontend presents the car-related data (e.g. location, fuel level, cheapest gas stations) in a responsive web application.

CarFuelLocate web application

The frontend is also deployed on Google Cloud App Engine.

APIs

The application is able to show the cheapest and the nearest gas station. The backend accesses the “Tankerkönig” API to get the desired data about gas stations. The API lists all gas stations in Germany, their positions, and their price details.

The frontend uses the Google Maps API to display the car and the gas stations on an interactive map.
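For illustration, placing the markers with the Maps JavaScript API could look roughly like this sketch; the element ID and the shape of the data objects are assumptions.

// Sketch: showing the car and nearby gas stations with the Maps JavaScript API.
function showMap(car, stations) {
  const map = new google.maps.Map(document.getElementById('map'), {
    center: { lat: car.lat, lng: car.lon },
    zoom: 14,
  });

  // marker for the car itself
  new google.maps.Marker({ position: { lat: car.lat, lng: car.lon }, map, title: 'Your car' });

  // one marker per gas station, with the price in the tooltip
  stations.forEach((s) => {
    new google.maps.Marker({
      position: { lat: s.lat, lng: s.lng },
      map,
      title: `${s.name} – ${s.price} €`,
    });
  });
}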

CI/CD

We used the HdM GitLab not only for version control but also for its CI/CD functionality, to perform automated testing and an automated build and deployment process. We used a shared GitLab runner offered by our university.

Our pipeline, which is triggered on each commit, consists of automatically launched tests and an automated deployment process that deploys the App Engine services to the Google Cloud Platform.

We developed on different branches, which were deployed to separate development App Engine instances. Merging changes into the master branch deployed them to the production environment.

Challenges

During development, we ran into several challenges and learned a lot.

In the beginning we developed CarFuelLocate entirely without the Cloud IoT platform. We defined a token on the Raspberry Pi, which was then copied into the frontend of our web application. With this approach it was possible for unauthorized clients to send requests to the backend: by brute-forcing the token, an attacker would have been able to manipulate user data.

When we noticed this security problem, we switched to the Cloud IoT platform, and this kind of attack is no longer possible. The individual device ID is now bound to an RSA key pair, so even if an attacker guesses the device ID, they still cannot manipulate user data because they do not possess the key pair.

Another conceivable attack is that an attacker guesses the one-time token that is shown to the user during the first-time device registration in the web application. An attacker could then connect their own client to another person’s account during the period in which the real user is trying to connect their Raspberry Pi. This could be solved by adding a confirmation dialog in the web application, asking the real user whether they really want to connect this Raspberry Pi to their account.

Google Cloud Platform

There is no single reason why we used the Google Cloud Platform; our decision was based on several points.

First of all, we had already learned a lot about the IBM Cloud and AWS in our lecture Software Development for Cloud Computing and wanted to see how other cloud providers work. Another point is that we wanted to develop the whole project with one service provider, and the Google Cloud Platform offers everything we needed.

Alternatively, we could have used Amazon Web Services and their IoT platform in combination with EC2 instances and Amazon DocumentDB for our web application. Our project would also have been possible with the IoT platform of the IBM Cloud. Amazon and IBM do not offer a map service comparable to Google Maps, but we would still have been able to use the Google Maps API in combination with the IBM Cloud or AWS, or we could have used OpenStreetMap.

There are many ways to develop a cloud-based IoT application; our approach is only one of them.

Your first Web App in the cloud – AWS and Beanstalk

Hello fellow readers! 

In this blog post you will learn how to set up a web game with a worldwide ranking in the cloud without having to deal with complicated deployment. That means for you: more time for your application.

The app uses Node.js with Express and MongoDB for the backend. The frontend is made of plain HTML, CSS and JavaScript. For deployment we used AWS Elastic Beanstalk, Amazon’s PaaS (platform as a service).

You can find the game on GitHub:
https://github.com/Pyrokahd/CircleClicker

App 

CircleClicker is a skill game with the goal to click on the appearing circles as fast as possible before they disappear again. 

We used JavaScript for the frontend and, in the form of Node.js, for the backend. JavaScript is well known and used almost everywhere, which means a huge community and many already-answered questions.

Tip:
When deciding which programming language or tool you want to work with, always consider what the community is like and how many tutorials are available. 

Backend

Getting started

After installing Node.js and Express, we can use the “Express Application Generator” to generate a skeleton web server. We do this by navigating to the folder in which we want the project to live and typing the following command:
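A typical invocation looks like this; the project name and the chosen view engine are just placeholders:

npx express-generator --view=ejs circle-clicker
cd circle-clicker
npm install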

With --view we define which view engine to use, for example pug or ejs. View engines generate response pages from templates and variables.
However, in this project we don’t use them beyond the auto-generated parts, to keep things simple.

To install additional modules type npm install ModuleName.

This will add the dependency to the package.json. Beanstalk will install all the modules listed in your package.json file automatically, so you don’t have to worry about uploading or installing them by yourself.

Now we add responses for the server to reply to client requests.

The first one sends the default page to the client, once it is connected. This will be our html site containing the game.

app.get('/', function(req, res, next) {
    res.sendFile(path.join(__dirname, "public", "main.html"));
});

Here we tell our server to respond to GET requests on the root path “/”, which is the URL without anything after it. The response is our main.html located in the public folder below the project root, which makes main.html our default web page.

The next GET response will respond to the url “/getScore” by sending the top 10 highscores to the client. 

app.get('/getScore', function(req, res, next) {

In there we make a query to our MongoDB and then construct a JSON string out of it to send it to the client. The query is shown later when we talk about MongoDB.

The last response answers a POST request from the client and is used to create new entries in the database when the player sends a score.

app.post('/sendScore', function(req, res, next) {

In this function we receive a name and a score from the client in the form of a JSON object. Those variables are then used to create a database entry.
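Put together, the handler might look like this sketch, using the mongoose model that is defined later in the MongoDB section (the exact variable names and response text are assumptions):

app.post('/sendScore', function(req, res, next) {
    // name and score arrive in the request body sent by the client
    var _name = req.body.name;
    var _score = req.body.score;

    // create and save a new entry via the mongoose model from the MongoDB section
    var entry = new UserModel({name: _name, score: _score});
    entry.save(function (err) {
        if (err) return console.error(err);
        res.send("score saved");
    });
});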

Frontend

The frontend consists of the HTML page, the CSS stylesheet and one or more JavaScript files for the logic and functionality.

The HTML and JavaScript files are stored in their respective folders under the “public” folder on the server.

HTML

For the HTML document the important parts are a form to enter the achieved score plus a name, and buttons to start the game and send the score to the server. Other important parts, like the HTML canvas, are generated once the game is started.

How exactly the HTML looks is not important. It just needs all the relevant elements with respective IDs so they can be referenced in the JavaScript file, where all the event handling and game logic happens.

The following should be avoided because it violates the content security policy (CSP):

  • inline CSS
  • inline JavaScript
  • inline event handlers (e.g. onClick="function" on a button)
  • importing scripts that are not hosted on your own server (i.e. download jQuery and put it on your own server instead)

It is also important to import the game logic script at the end of the HTML body. That way the HTML is loaded first and we can access all elements from the JavaScript file without trouble.

Javascript

First we give the buttons on the site functions to invoke by adding click event listeners to them. Once the site has loaded, an HTML canvas is created inside the object that holds the game logic.

document.getElementById("startBtn").addEventListener("click", startGame);
document.getElementById("showPopUpBtn").addEventListener("click", showPopUp);
document.getElementById("submitBtn").addEventListener("click", sendScore);
document.getElementById("resetBtn").addEventListener("click", hidePopUp);

The first three of those buttons provide the necessary functionality to play the game and interact with the server. 

The first one, “startBtn”, does exactly what the name suggests: it starts the game by calling the startGame function. This sets some variables like lives and points and then starts an interval timer to act as our game loop. After a certain time, which decreases the longer the game runs, a circle object is spawned. It contains the logic to increase points or decrease lives and to despawn after a while.
The canvas has an event listener to check for clicks inside it. To check whether a circle was clicked, the mouse position and the circle position are compared (the circle position needs to be increased by the offset of the canvas position and decreased by the vertical scroll offset).
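One way to do that comparison is sketched below using getBoundingClientRect, which already accounts for the canvas offset and scrolling; the variable names are assumptions:

canvas.addEventListener("click", function (e) {
    // mouse position relative to the canvas
    var rect = canvas.getBoundingClientRect();
    var mouseX = e.clientX - rect.left;
    var mouseY = e.clientY - rect.top;

    // the click counts as a hit if it lies inside the circle
    var dx = mouseX - circle.x;
    var dy = mouseY - circle.y;
    if (dx * dx + dy * dy <= circle.radius * circle.radius) {
        // hit: increase points and despawn the circle
    }
});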

The second button shows the popup form, which is used to enter a name and send it, together with the achieved score, to the server. This function is also called automatically once you lose the game.

The submit button calls the sendScore function, which uses jQuery and Ajax to make a POST request to the server and send the score and username as a JSON object named data. Here we also use the “/sendScore” URL for which we set up a POST handler on the server.

const serverURL = window.location.origin;

$.ajax({
    type: "POST",
    async: true,
    url: serverURL + "/sendScore",
    data: data,
    success: function(dataResponse){
        console.log("score sent successfully:");
        console.log(dataResponse);
        setTimeout(requestLeaderboard, 1000);
    },
    statusCode: {
        404: function(){
            alert("not found");
        } //, 300:()=>{}, ...
    }
});

The last relevant function requests the leaderboard. It is called when the page loads and again after we send a new score to the server, to update the leaderboard. It also uses jQuery and Ajax to simplify the GET request.

$.ajax({
    type: "GET",
    async: true,
    url: serverURL + "/getScore",
    success: function(dataResponse){
        highScores = JSON.parse(dataResponse);

        //Build an HTML string from the JSON response
        var leaderBoardString = "<h2>Leaderboard</h2>";
        var tableString = '<table id="table"><tr>' +
                          '<th>Name</th><th>Score</th></tr>';
        for (var i = 0; i < highScores.user.length; i++){
            tableString += "<tr><td>" + highScores.user[i].name
                        + "</td><td>"
                        + highScores.user[i].score
                        + "</td></tr>";
        }
        tableString += '</table>';
        //Update the leaderboard with the created table
        document.getElementById("leaderboardArea").innerHTML
                        = leaderBoardString + tableString;
    },
    statusCode: {
        404: function(){
            alert("not found");
        } //, 300:()=>{}, ...
    }
});

The “/getScore” URL from the server’s GET handler is used to get the top 10 scores as a JSON object. This JSON object is then used to create a table in the form of an HTML string, which is set as the innerHTML of a div box.

MongoDB

In our application we use MongoDB together with the mongoose module. The mongoose module simplifies certain MongoDB functions. 

Host

As the host for the database we used MongoDB Atlas, which lets you create a free MongoDB in the cloud of a provider of your choice. We chose AWS, since we also use their Beanstalk service. The free version of this database has some limits, but for a small project like this, it is more than enough.

After a database is created, you can create a user, assign rights and copy a connection-string from MongoDB Atlas. 

Connection

We are using this string and mongoose to connect to the DB in a few lines of code.

mongoose.connect(mongoDBConnectionString, { useNewUrlParser: true },);
//Get the default connection
var db = mongoose.connection;
//Bind connection to error event (to get notification of connection errors)
db.on('error', console.error.bind(console, 'MongoDB connection error:'));

After the connection is established we can create a mongoose model, which is like a blueprint for the entries (documents) in one collection. The model is created from a mongoose schema in which we define the structure.

var UserModelSchema = new mongoose.Schema({
    name: String,
    score: Number
});
var UserModel = mongoose.model('UserModel', UserModelSchema);

Create Entry

Now we are ready to add new entries to the database. This is easily done by creating an instance of the mongoose model and filling in the appropriate fields. After calling the save function, a new entry is added.

var testInstance = new UserModel({name: _name, score: _score});
testInstance.save(function (err, testInstance) {
    if (err) return console.error(err);
    console.log("new Entry saved");
});

Query

Making a query is just as easy. First we define a query object and set its parameters according to our needs.

// find all users
var query = UserModel.find();
// select only the 'name' and 'score' fields, without the _id field
query.select('name score -_id');
// sort by score (highest first)
query.sort({ score: -1 });
// limit the results to 10 items
query.limit(10);
// return plain JS objects instead of Mongoose documents
query.lean();

We can then execute the query and receive a result according to these settings:

query.exec(function (err, queryResult) {
    if (err) return handleError(err);

    // queryResult is an array of plain JS objects;
    // wrap it in an object and send it to the client as a JSON string
    var resJSON = JSON.stringify({ user: queryResult });
    res.send(resJSON);
});

Into the cloud!

But how do we get into the cloud? The first thing to do is decide on a cloud provider. We very quickly chose AWS because we were particularly impressed by one of their services: AWS Elastic Beanstalk. Amazon promises to take over a lot of backend provisioning, like load balancing, scaling, EC2 instances and security groups.

Which means more time for actually programming. 

Does that work? To a large extent yes! 

How does it work?

Very simple. You create an AWS account, go into the management console and create a new Elastic Beanstalk environment. There you define the domain name and the platform (programming language) your app is written in, then start the environment. Afterwards you package your app into an archive, upload it and check on the domain whether the app works. Updates are done the same way: pack the new code into an archive, upload it and select it.

You now have a website with your own game. A load balancer is included, the highscore works and basic security settings are in place.

Check Logs on CloudWatch

In your Beanstalk environment configuration under monitoring, you can enable your logs to be streamed to CloudWatch. This comes with some extra cost, but allows you to have all your logs online and centralized in one location. 

CloudWatch also has many more features, like alarms when your application gets scaled down because there is less traffic, or monitoring the performance of your resources.

However, we are interested in our logs, which you find under your log groups in CloudWatch. There is one log ending in stdout.log (by default). This is where all console.log and console.error messages from our server end up.

Security Settings

If you want to make your application even more secure, you can put a little more time into it. We decided to set up HTTPS and to use a Content Security Policy (CSP).

HTTPS

To provide HTTPS, proceed as follows:

  • Create a certificate for your domain name in AWS Certificate Manager (ACM)
  • Configure HTTPS on the load balancer:
    • Add a listener
    • Port 443
    • Protocol HTTPS
    • Upload the certificate

Helmet Security

Helmet is a middleware library that sets a number of security-related HTTP headers. One important aspect is the Content Security Policy (CSP) header, which forces you to write your HTML files in a secure manner to avoid cross-site scripting attacks.
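Wiring Helmet into the Express app is essentially a one-liner. The following is a minimal sketch, not the exact project code; depending on the Helmet version, the CSP header can be switched off via an option:

const helmet = require('helmet');

// enable Helmet's default security headers
app.use(helmet());

// Alternative: keep the other headers but disable only the CSP middleware
// (useful when the Content Security Policy causes trouble, see below):
// app.use(helmet({ contentSecurityPolicy: false }));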

Problem: CSP-header enforces security policies

Despite not being completely CSP conform, the game worked on the local server, but to make it work in the cloud we had to follow the CSP guidelines. So we removed all inline JavaScript and CSS to eliminate all errors regarding CSP.

Problem: HTTPS Forwarding

After activating Helmet, the resources of the website became unreachable. In the AWS logs we found the following error message, but could not pinpoint where the problem was:

2020/09/05 17:53:16 [error] 19019#0: *7 connect() failed (111: Connection refused) while connecting to upstream, client: 172.31.37.17, server: , request: “GET / HTTP/1.1”, upstream: “http://127.0.0.1:8080/”, host: “172.31.41.163”

We also looked at the load balancer and the security groups, but all settings were correct there. Investigating further with the browser’s developer console, we found out that HTTPS requests are required.

GET https://…elasticbeanstalk.com/javascripts/GameLogic.js net::ERR_CONNECTION_TIMED_OUT

Since we did not have the required permissions, we could not set up HTTPS. So we had to turn off the CSP for our game. The already implemented security measures, such as the removal of inline CSS, remain in place.

In earlier tests, when disabling Helmet, we did not generate a new package-lock.json file, which holds the tree of installed packages. Because it was not updated, Helmet was not removed completely and the tests gave misleading results, which wrongly led us to believe that Helmet was not the problem.

Conclusion

Although this was not the initial goal, we learned a lot about networking during this project, especially topics like setting up a server and the communication between database, client and server. Despite the roadblocks, we were very positively surprised by the AWS cloud and Beanstalk. We can recommend Beanstalk if you want to get simple applications and websites into the cloud as quickly as possible. As soon as we implemented security-related code, however, we had to make an additional effort beyond Beanstalk alone to get it running.

Getting Started with your first Software Project using DevOps

Getting Started with DevOps cover image

by Benedikt Gack (bg037), Janina Mattes (jm105), Yannick Möller (ym018)


This blog post is about getting started with your first large-scale software project using DevOps. We will discuss how to design your application, what DevOps is, why you need it, and a good approach for a collaborative workflow using git.

This post consists of two parts. Part one is this article, which helps you find out which topics to consider when starting with DevOps for your own project and gives a basic introduction to these topics along with recommendations for further reading. The second part is our “getting started” repository, which contains a small microservice example project with additional readme files that explain some of the topics above in practice, so you can try our workflow on your own.

This post is a summary of the knowledge and experience we collected over a period of six months while accompanying and supporting a larger-scale online platform project called “Schule 4.0” in DevOps-related topics.

Link to our example repository: https://gitlab.com/curiosus42/cc-getting-started-example


What is Software Architecture?

The Software Development Life Cycle

Before we can talk about Software Architecture, we have to talk about the life cycle of a typical software project, because a successful software project involves a lot of planning, consists of many steps and architecture is just one of them. Furthermore, actual coding is a late step in the process and can be quite time consuming when not planned properly. Without further ado, let’s have a look at a typical Software Development Life Cycle:

Software Development phases.
Software Development phases. Source

The first two steps, “Ideation” and “Requirements”, basically mean thinking about your project from a business and user perspective. Technical details are not important and might even be inappropriate, as you should involve all members of your small team in the process and not all of them have the technical knowledge. Also, don’t forget to write everything down!

In the Ideation phase, after you brainstormed ideas, we recommend creating a Use Case Diagram which shows how the users can interact with your software. You can learn more about that here.

Furthermore, we also recommend creating a requirements catalog for the requirements phase. You can download our template with this link. When you’re done thinking about your project and want to start with the technical stuff, but can’t describe your platform requirements in detail, why don’t you go with a more agile project life cycle? Just make sure you thought about your goal and how to achieve it beforehand.

Agile Software Development model.
Agile Software Development model. Source

Planning done? Then let’s start with designing your application, especially Software Architecture.

Difference between design patterns and architectural patterns

If you are familiar with design patterns like the Singleton and Iterator patterns in software development, don’t confuse them with architectural patterns. While design patterns deal with how to build the important parts of your software, the defining components, architectural patterns are about how these parts are organized and play together.

“[software] Architecture is about the important stuff. Whatever that is.”

Ralph Johnson (Fowler, 2003)

So, what you’re going to think about in the design process heavily depends on what you define as important components and which architectural style you choose. Just remember that developing is mostly filling in the blanks and applying some design patterns for whatever you came up with in the design phase or, to put it short, implementing your software architecture.

Which architectural pattern to choose?

3 architectural styles – There is always a trade-off

There are many architectural patterns to begin with. In order to understand what these patterns are all about, here is an introduction to the three major architectural styles. These styles describe the overall idea behind different groups of patterns. Note that it is possible to mix these styles together for your own use case.

N-tier:

N-tier applications are typically divided into X different logical layers and N physical tiers. Each layer has a unique responsibility and may only communicate with the layer below it, not the other way around.

N-tier vs monolithic architecture

A typical architecture for web applications is the three-tier architecture, which is described in detail below, but you could also go with a monolithic approach and separate your one-tier application into different logical layers, like most game engines and the Windows NT platform architecture (user and kernel mode) do.

This style is the most developer-friendly and easy to understand if you feel comfortable with monolithic software development. The challenge is to end up with meaningful logical and physical layers and to deploy small changes or features while the platform is already running, which can mean rebuilding and redeploying a large part of the application. If the application is split into multiple layers, deployment becomes easier, but there is a high risk that some layers are unnecessary and just pass requests on to the next layer, decreasing overall performance and increasing complexity.

Service based

In this architectural style, the application is split into multiple services, which communicate through a network protocol over a network. Each service is a black box with a well-defined interface, serves only one purpose, most likely a business activity, and can be easily designed for its purpose, for example by using different technologies like programming languages. These services are working by either chaining – service 1 calls service 2 which calls service 3 – or by one service acting as a coordinator. A typical modern implementation is the Microservice architecture, which will be covered in detail later.

Key benefits are:

  • maintainability: changes will only affect one specific service, not the whole application
  • scalability: each service can be load balanced independently
  • reusability: services are available system wide and in “Schule 4.0”, we often ended up reusing an already existing service as a template

The service-based style is by far the most difficult architectural style. A big challenge is specifying your services so that they stay as decoupled as possible from each other, which strongly depends on the design of the service APIs. In addition, it is much more complicated to deal with not just one but many applications.

Event-Driven

This style consists of two decoupled participants, producer and consumer. The producer creates an event and puts it in a queue, typically on behalf of an incoming request from a client, and one of the available consumers consumes the event and processes it. Pure event driven architectures are mostly used for IoT scenarios, but there are variations for the web context like the Web-Queue-Worker pattern.

Their clear benefit is high and simple scalability and short response times. A challenge is how to deal with long running tasks which demand a response, and when to favor direct shortcuts instead of a queue approach.

Architectural patterns in detail

The Three Tier Architecture

The previously mentioned three-tier architecture belongs to the N-tier style and consists of three tiers and three layers. It is very common and straightforward for developing simple web applications. The example below shows a dynamic web application.

Example of a Three Tier architecture for the web applications
Example of a Three Tier architecture for the web applications

The topmost layer is the presentation layer, which contains the user interface displaying the important information and communicates with the application layer. Depending on whether you need a web application or just static web pages, this layer either runs solely on the client tier (desktop PC or mobile device) or additionally requires a web server to serve and render the static content for the client. The application tier contains your functional business logic, for example the REST API for receiving and adding user-specific dynamic content like shopping cart items, and the last layer, the data tier, stores all data that needs to be persistent.

In the example graph above, the React web application is served by a small web server and runs on the client devices. The Node.js application exposes a REST API to the client and talks to the MongoDB document database. The Node app and the database are typically deployed into different tiers, most commonly a virtual machine and a dedicated database server or service.

Microservices

A microservice architecture is by far the hardest choice. This style is most often used by massive tech enterprises with many developer teams in order to run their large-scale online platform and allowing for rapid innovation. There are many big platforms like Amazon.com, Netflix and eBay that evolved from a monolithic architecture to a microservice architecture for many good reasons, but most likely none of them really apply to your project for now. To be honest, none of the reasons applied to our project “Schule 4.0” either, which only consisted of five developers, but in the never-ending river of new technologies we were able to find technologies that made things much easier, convinced us to at least give it a try and made things turn out great in the end. Those technologies will be discussed later.

The core idea is to have multiple loosely coupled services which expose an API. The biggest difference to normal service oriented architectures is that in a microservice architecture, each microservice is responsible for its own data and there is nothing like a centralized data tier. That means that only the specific service has access to its data and other services cannot access it directly other than via the service API.

Key benefits:

  • Highly maintainable, rapid innovation and development
    • Developers can work independently on a service
    • Services can be deployed independently
    • Services are typically small applications and therefore easy to understand
  • Testability: small services are easy to test
  • Availability: Errors in one service won’t affect other services
  • Dynamic technology stack
    • You can pick whatever technology you want whenever you want

Drawbacks:

  • Services are products and not projects: Each developer is responsible for their code over the whole life cycle
  • Highly distributed system
  • Requires to deal with continuous integration and deployment (which will be covered below)
  • Nightmare without containerization or virtualization

Domain Driven Design

Many challenges can be avoided by carefully designing your microservice application. A helpful method for designing your application is the Domain Driven Design approach and its notion of bounded contexts. The idea is that the most important thing is your core business, what your application is about. Our goal is to model the core business as different contexts and to design the architecture after that. Here are some tips:

After finishing with your requirements analysis you can start defining the domain of your application. Imagine your project as a company. What is your company about? What are its products? We will call that your domain and the products your contexts.

Let us continue with a simplified example of the “Schule 4.0” model. The platform domain is all about teachers sharing content with their students in the form of pins, boards and exercises. With this knowledge we can already distinguish three contexts:

Example of a Domain Model for Schule 4.0
Example of a Domain Model for “Schule 4.0”

These contexts consist of different objects which have relationships. For example, a Board can contain multiple Pins and a User can own multiple Boards. Every context can be implemented as its own microservice. The crucial point later is how the services are connected to each other. Remember, all the data is physically separated into different services but needs to be logically connected in order to query data. That is where the bounded contexts appear.

Note in the graph above, that the User object can be found in multiple contexts. The object itself is most likely modelled differently in each context, but the overall idea of a user stays the same. The different User objects are explicitly linked together over the UserId attribute. Because the concept of a User is shared across multiple Contexts, we can query all data related to the current user.

Communication in a Microservice architecture – GraphQL Federation

After we have modelled the domain of our application, everything is connected logically, but we still cannot request or change data, because we have not defined the service APIs yet. We considered the following options:

  1. Specifying a RESTful API for each service using HTTP
  2. Specifying a Graph API for each service using GraphQL
  3. Do a mixture of both

Note: We do not recommend doing a mixture of both. Historically speaking, there are many solutions migrating from REST to Graph, but starting from scratch there is no need for that.

Regardless of which API flavor you choose, the data is still spread over multiple services. For example, if you want to display the dashboard for the current user, pin and board data, exercises and user data need to be collected from different services. That means the code displaying the dashboard needs to interact with each service. It is not a good idea to leave that task to the client application, due to various performance and security reasons. In general, you should avoid making the internals of your backend publicly available.

That is why all microservice architectures are equipped with an API Gateway, which is the only entry point to your backend. Depending on the use case, it aggregates data from different services or routes the request to the appropriate service.

small example of a microservice architecture using http
Small example for the first option using http

The services expose simple http endpoints to the gateway for manipulating and retrieving data. The gateway itself only exposes an endpoint for retrieving the dashboard for the current user. The UserId will be included in the requests by the client. In order to create and to get specific boards, the gateway needs to be expanded with further endpoints, which might be routed directly to the underlying service.

In conclusion, there is nothing wrong with this approach, especially considering that there are great tools like Swagger which help building large REST APIs, but you still have to design, manage and implement the service APIs and the Gateway API separately.

The second option is different. The idea is to design a data graph which can be queried and manipulated with a language called GraphQL. Imagine it like SQL, but instead of talking to a database you send the statement to your backend.

Example with a representation of a data graph
Representation of a data graph

The following pseudo example query me (name) boards (id, title) returns the name of the current user and its boards as a JSON object.

With GraphQL, the client can precisely ask only for the data it needs, which can reduce network traffic a lot. The best part is that you already possess a data graph if you have created a domain model beforehand. You just have to merge the different contexts and attributes of the same objects to one coherent graph.

It gets even better: by using Apollo GraphQL Federation, most of the implementation of your Graph API is done automatically. For this to work, you only have to define the data graph for each service, which are just the contexts from your domain model, and set up the GraphQL API Gateway. The implementation is straightforward:

  1. Write down your service graph in the GraphQL Schema Definition Language (SDL)
  2. Implement the Resolvers, which are functions for populating a single attribute/field in your graph
    • Note: instead of requesting every field from the database separately, you can request the whole document and Apollo GraphQL generates the resolvers automatically
  3. Implement the Mutation Resolvers, which are functions for updating and creating data
  4. Tell the GraphQL API Gateway where to find its services

The Gateway then automatically merges the service schemas to one coherent schema and automatically collects the requested data from the implementing services. It is even possible to reference objects in other services and the Gateway will combine the data from different services to a single object.
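To make this more concrete, here is a minimal sketch of one federated service and the gateway, based on Apollo Server 2 with @apollo/federation and @apollo/gateway. Service names, ports, URLs and fields are placeholders derived from the domain model above, not the actual “Schule 4.0” code, and in a real setup the service and the gateway live in separate projects.

// one federated service (e.g. "pinboards"): defines its part of the graph and its resolvers
const { ApolloServer, gql } = require('apollo-server');
const { buildFederatedSchema } = require('@apollo/federation');

const typeDefs = gql`
  type Board @key(fields: "id") {
    id: ID!
    title: String
  }
  extend type Query {
    boards: [Board]
  }
`;

const resolvers = {
  Query: {
    boards: () => [{ id: '1', title: 'Biology' }],
  },
};

new ApolloServer({ schema: buildFederatedSchema([{ typeDefs, resolvers }]) })
  .listen(4001);

// the gateway: merges the service schemas into one coherent schema
const { ApolloGateway } = require('@apollo/gateway');

const gateway = new ApolloGateway({
  serviceList: [
    // adjust the URLs/paths to wherever your services actually listen
    { name: 'pinboards', url: 'http://localhost:4001/graphql' },
    { name: 'users', url: 'http://localhost:4002/graphql' },
  ],
});

new ApolloServer({ gateway, subscriptions: false }).listen(4000);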

GraphQL Federation – Limitation

As good as Apollo GraphQL Federation sounds on paper, it is not an all-round solution. In reality, you will always have to overcome obstacles, no matter which decisions you make.

One technical limitation you might come across is when you try to delete a user. To do so, you have to decide which service defines and implements the deleteUser mutation. It is not possible (yet?) to define the same mutation in multiple services.

Visualization of the Apollo GraphQL Federation limitations

Because deleting a user also involves deleting its referenced pins and boards, the PinBoards service needs to be notified through an additional API, which is only accessible internally and not exposed to the client API.

Furthermore this additional API makes testing your service more complicated. Testing will be covered in detail in our example repository.

Architecture summary

Comparison table of different software architectures
Comparison table of different architectures

Git Workflow

In a successful project, smooth collaboration is one of, if not the, central factor for success. What is more annoying than merge conflicts or accidentally pushed changes that cause the program to crash? But we can make arrangements to avoid problems like this during our project, so let’s start:

Tools

First we go to our collaboration platform, reachable at https://gitlab.mi.hdm-stuttgart.de/, or to the official GitLab page, and register a new project. Give it a name and add the members of the project.

For local development, everyone needs a tool to make git commits, push them, pull the changes of others and so on.

For Windows systems and command-line lovers we recommend Git Bash from https://gitforwindows.org/. If you prefer to click and see what you are doing, you can use https://www.sourcetreeapp.com or the Git integrations of e.g. IntelliJ or VS Code to organize your collaboration via Git.

Sourcetree

Settings

In our git repository, we first go to the project settings and activate Merge Requests and Pipelines. Afterwards the navigation sidebar has two new entries.

The settings page

Workflow

The following step-by-step instructions describe a good workflow for working together in the configured Git repository.

The workflow in simple view
  1. Locally check out the master (or develop) branch via git checkout master or in Sourcetree.
  2. Create a new branch and give it a meaningful name. A good way to keep the repository tidy is to group branches into directories by naming them with your abbreviation and the feature or the matching bug ticket, for example if you use GitLab’s issue tracker.
  3. Start coding: fix bugs, develop new features and test your code locally on your machine without breaking the master.
  4. If your local changes work as intended, commit them with a meaningful commit message. Beware of committing unintentionally changed files. If you do not want to use Sourcetree, you can do it with the command git commit -a -m “commit-message”.
  5. Then push your branch with Sourcetree or with git push origin <feature-branch>.
  6. Go to the web interface and create a new merge request. You can add not only a message but also further text, screenshots and files to illustrate your changes to your team members.
  7. Inform your team members so they can review your changes, comment on them, leave suggestions and finally approve the merge request if everything is fine.
  8. Merge your feature branch into the master or develop branch once your team members have given their approval.
  9. If all tests are green and the build is okay, you can delete your feature branch to avoid clutter.

To work successfully with this model, keep a few points in mind:

  • always branch new branches from master
  • do not commit or push directly to the master
  • before a merge request, merge the newest master into your feature branch
  • do not keep too many branches open at the same time
  • keep feature branches small, no monsters…
  • name your branches meaningfully

CI & CD Pipeline

What is DevOps?

The term DevOps stems from the idea of agile software development and aims to remove silos in order to encourage collaboration between development and operations. From this principle the name DevOps = development + operations is derived (Wikipedia, 2020).

To achieve more collaboration, DevOps promotes a mentality of shared responsibility between team members. Such a team shares responsibility for maintaining a system throughout its life cycle. At the same time, each developer takes responsibility for their own code, from the early development phase to deployment and maintenance. The overall goal is to shorten the time between writing new code and going live. To achieve this goal, all steps that were previously performed manually, such as software tests, are fully automated through the integration of a CI/CD pipeline.

The full automation allows a reduction of error-prone manual tasks like testing, configuration, and deployment. This brings certain advantages. On one hand, the SDLC (Software Development Life Cycle) is more efficient, and more automation frees team resources. On the other hand, automated scripts and tests serve as a useful, always up-to-date documentation of the system itself. This supports the idea of a pipeline as code (Fowler, 2017).

DevOps cycle
Figure 3 : The DevOps cycle.
Source: Akamai (2020).

The structure of the CI/CD pipeline is defined within a YML file in the project’s root and formulates so-called actions, action blocks, or jobs. Pipeline jobs are structured as blocks of shell commands, which allow, for example, automatically downloading all necessary dependencies for a job and executing scripts. A pipeline contains at least a build, test, and deployment job. All jobs are fully or partly automated (GitLab CI/CD pipelines, 2020); partial automation of a job means it includes manual activation steps. The benefits of a pipeline integration are, among others, an accelerated code cycle time, reduced human error, and a fast and automated feedback loop for the developer. In addition, costly problems when integrating new code into the current code base are reduced.
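To give a rough idea, a minimal .gitlab-ci.yml with one job per phase could look like the following sketch; the image names, script contents and the manual deployment trigger are placeholders, not the actual “Schule 4.0” pipeline:

# sketch of a minimal .gitlab-ci.yml with one job per phase
stages:
  - build
  - test
  - deploy

build-service:
  stage: build
  image: node:14
  script:
    - npm ci
    - npm run build

unit-test:
  stage: test
  image: node:14
  script:
    - npm ci
    - npm test

deploy-production:
  stage: deploy
  script:
    - ./scripts/deploy.sh   # all logic lives in scripts, the pipeline only calls them
  when: manual              # partly automated: deployment requires a manual trigger
  only:
    - master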

A pipeline is the high-level construct of continuous integration, delivery, and deployment. The jobs executed in a pipeline, from a code commit until its deployment in production, can be divided into three different phases, depending on the methodology to which they can be assigned. These are the three continuous methodologies:

  • Continuous Integration (CI)
  • Continuous Delivery (CD)
  • Continuous Deployment (CD)

While Continuous Integration clearly stands for itself, Continuous Delivery or Continuous Deployment are related terms, sometimes even used synonymously. However, there is a difference, which is also shown in Figure 4 and will be further explained in the following.

Figure 4: The three continuous methodologies.
Source: RedHat (2020)

What is Continuous Integration (CI)?

The term Continuous Integration can be traced back to Kent Beck’s definition of the Extreme Programming (XP) process, which in turn is derived from the mindset of agile software development, which not only allows short cycle times but also a fast feedback loop (Fowler, 2017). Within such a software practice, each member of a team merges his/her work at least daily into the main branch of a source code repository. All integrations are automatically verified by an automated build which also includes testing and code quality checks, e.g. linting, to detect and fix integration errors early. This approach allows a team to develop software faster and reduces the time spent manually searching for and identifying errors.

Further, each team member bears the same responsibility for maintaining and fixing bugs in the software infrastructure and for maintaining additional tools such as the integrated CI/CD pipelines. Furthermore, a pipeline is not static but has to be monitored, maintained, and optimized iteratively with a growing codebase. To achieve high-quality software products, everyone must work well together in a team culture free of the fear of failure (Fowler, 2017).

Figure 5: Meme
Source: Memegenerator, Yoda (2020)

Some important CI Principles

Continuous Integration is accomplished by adhering to the following principles:

  • Regular integration to the mainline: New code needs to be integrated at least once per day.
  • Maintaining a single-source repository: Keep a stable, consistent base within the mainline.
  • Shared responsibility: Each team member bears the same responsibility to maintain the pipeline and project over its complete lifecycle.
  • Fix software in < 10min: Bugs have to be fixed as fast as possible by the responsible developer.
  • Automate the build: Test and validate each build.
  • Automate testing: Write automatically executable scripts and keep the testing pyramid in mind.
  • Build quickly: Keep the time to run a pipeline as minimal as possible.
  • Test in a clone: Always test in the intended environment to avoid false results.

What belongs in a Source Code Repository?

As the source code repository forms the base for the pipeline, it is important to keep it as complete as possible so that the CI jobs can run. A source code repository for a CI/CD pipeline should therefore always include:

  • Source code
  • Test scripts
  • Property files
  • Database schemas
  • Install scripts
  • Third-party libraries

Why Containerize pipeline jobs?

Some of you may have come across this problem before: “Defect in production? Works on my machine.” Such costly issues caused by different environments on different machines (e.g. CI/CD server and production server) can be prevented by ensuring that builds and tests in the CI pipeline as well as in the CD pipeline are always executed in a clone of the same environment. For this, it is recommended to use Docker containers or virtualization. For virtualization, the use of a virtual machine can be enforced by running a Vagrant file that ensures the same VM setup across different machines. In short, automating containerized jobs standardizes the execution and ensures that no errors slip through due to different environments in which builds and tests are executed.

Further, combining the CI/CD pipeline with Docker enables a tight integration between CI and CD. The advantage is that the same database software and versions, the same operating system version, all necessary libraries, the same IP addresses and ports, as well as the same hardware settings, are provided and enforced throughout build, test and deployment.

Figure 6: Meme 2
Source: Memegenerator, (2020)

How to identify CI Jobs?

To identify pipeline jobs, you should consider, together with the team, which manual steps are currently performed frequently and repetitively and are therefore good candidates for automation. Such repetitive tasks can be, for example, testing, building, and deployment, as well as the installation of shared dependencies, or even clean-up tasks to free storage space after a build has been executed. The number of jobs can be expanded as desired, but each job must remain self-contained. This means, for example, that a unit test job contains only the dependencies, shell commands, and unit test scripts needed to fulfill that job.

Further, when defining the structure of a pipeline, it is important not to integrate any logic into the pipeline itself. All logic should be outsourced into scripts, which the pipeline then executes automatically. Furthermore, pipelines need to be maintained and updated over the life cycle of a project to keep them up to date and prevent errors. This means that pipelines also accompany the whole software development life cycle (SDLC).

How to integrate Testing?

As already mentioned above (see chapter 1 – Microservices), when writing test scripts it is important to cover the entire test pyramid, from automated unit tests to end-to-end tests. These test scripts can then be integrated into a CI pipeline’s test jobs.

Overall, testing is very important to avoid the later costs of time-consuming and expensive bugs and downtimes in production. When testing architectures such as microservices, care must be taken to test the individual services not only independently of each other but also in their overall composition. The challenge here is that there are dependencies between the individual services.

Furthermore, the integration with a real database must also be tested. Since databases are not directly mapped into source code, it makes sense to define them in scripts against which tests can be executed. Overall, it makes sense not only to mock the database but also to test against a real database integration.
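As a small illustration, an integration test against a real MongoDB instance (for example one started as a service container in the CI job) might look like this sketch; the userService module and the MONGO_URL variable are assumptions, not part of the actual project:

// integration test sketch (Jest): runs against a real MongoDB provided by the CI job
const mongoose = require('mongoose');
const { createUser, findUser } = require('./userService'); // hypothetical module under test

beforeAll(async () => {
  // MONGO_URL would point at the database service started for this CI job
  await mongoose.connect(process.env.MONGO_URL, {
    useNewUrlParser: true,
    useUnifiedTopology: true,
  });
});

afterAll(async () => {
  await mongoose.connection.close();
});

test('stores and retrieves a user', async () => {
  await createUser({ name: 'Alice' });
  const user = await findUser('Alice');
  expect(user.name).toBe('Alice');
});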

Image 7: The three continuous methodologies.
Source: RedHat (2020)

What is Continuous Delivery (CD)?

The term Continuous Delivery is based on the methodology of Continuous Integration and describes the ability to deliver source code changes, such as new features, configuration changes, bug fixes, etc., in a continuous, secure, and fast manner to a production-like environment or staging repository. This fast delivery approach can only be achieved by ensuring that the mainline code is always kept ready for production and as such can be delivered to an end-customer at any time. From this point on an application can be quickly and easily deployed into production. With this methodology, traditional code integration, testing, and hardening phases can be eliminated (Fowler, 2019).

Figure 8: Meme.
Source: Memegenerator, Yoda (2020)

The Five CD Principles

The five principles are also valid for the other methodologies.

  • Build quality in
  • Work in small batches
  • Computers perform repetitive tasks, people solve problems
  • Relentlessly pursue continuous improvement
  • Everyone is responsible

(Humble, 2017)

Figure 9: The three continuous methodologies.
Source: RedHat (2020)

What is Continuous Deployment (CD)?

The term Continuous Deployment implies a previous step of Continuous Delivery where production-ready builds are automatically handed over to a code repository. In conclusion, Continuous Deployment also builds on the methodology of Continuous Integration.

Continuous Deployment implies that each commit should be deployed and released to production immediately. For this purpose, the Continuous Deployment block in the pipeline automates code deployment via scripts to enable continuous and secure deployment. This follows the fail-fast pattern: errors that have slipped past the automated test blocks into production and lead to an unhealthy server cluster are detected quickly. Thanks to the fail-fast pattern, an error can easily be correlated with the latest integration and quickly be reverted. This keeps production downtimes to a minimum while code with new features goes live as quickly as possible. In practice, this means that a successful code change, from the commit to the branch until going live, only takes a few minutes (Fitz, 2008).

GitLab workflow example
Figure 10: The GitLab CI/CD basic workflow.
Source: GitLab (2020)

Automated rollbacks and incremental deployments

Even if the automated end-to-end tests against a build in the CI pipeline have passed successfully, unforeseen bugs might still occur in production. In such a case an important capability of pipelines is automated rollbacks: if deployed code puts production into an unhealthy state, a fast rollback allows returning to the last healthy working state, including automatically reverting the build. Another option is incremental deployments, which deploy new software to one node at a time, gradually replacing the application to gain more control and minimise risk (GitLab, 2020).

The relationship between Continuous Integration, Delivery, and Deployment

The following graph gives a quick overview of the workflow from the completion of a new feature in a working branch until its reintegration into the mainline. Figure 11 shows a high-level overview of this workflow depicted as a pipeline.

Figure 11: Overview of the CI/CD Pipeline workflow.
Source: SolidStudio (2020)

When a new code feature is committed to a remote feature branch, the CI will carry out an automated build. Within this build process, the source code of the feature branch will automatically be checked for code linting and compiled. The compilation is linked to an executable and all automated tests are run by the pipeline. If all builds and tests run without errors the overall build is successful.

Bugs or errors encountered throughout the CI pipeline will make the build fail and halt the entire pipeline. It is then the responsibility of the developer to fix all occurring bugs and repeat the process as fast as possible to be able to commit a new feature and merge it into the mainline to trigger the deployment process. Figure 12 shows such a CI/CD workflow between Continuous Integration, Delivery, and Deployment (Wilsenach, 2015).

Figure 12: The relationship between continuous integration, delivery and deployment
Source: Whaseem, M., Peng, L. and Shahin M. (August 2020)

Why did we pick the GitLab CI/CD Pipeline?

There are many products on the market, from Jenkins, GitHub Actions, DroneCI and CircleCI to AWS CodePipeline, and they all offer great services for DevOps integration. After researching some free tools, we decided to stay with GitLab for the university project “Schule 4.0”, as GitLab is a single DevOps tool that already covers all steps from project management and source code management to CI/CD, security, and application monitoring.

This choice was also made to minimize the tech stack. Since the university already provides accounts on its own GitLab instance for source code management and issue tracking, it made the most sense not to spread across too many different tools. Sticking with one product to integrate a custom CI/CD pipeline into our project meant having everything in one place, which reduces the complexity of the toolchain and speeds up the cycle time.

What is a GitLab Runner?

To be able to execute and run CI/CD pipeline jobs, a runner needs to be assigned to a project. A GitLab runner can either be specific to a certain project (Specific Runner) or serve any project in the GitLab CI (Shared Runner). Shared runners are a good choice if multiple jobs with similar requirements have to be run. In our case, however, as HdM runs its own GitLab instance (at https://gitlab.mi.hdm-stuttgart.de/), a Specific Runner installed on our own server instance was the way to go.

The GitLab Runner itself is a Go binary that can run on FreeBSD, GNU/Linux, macOS, and Windows. Architectures such as x86, AMD64, ARM64, ARM, and s390x are supported. Furthermore, to keep the build and test jobs in a simple and reproducible environment, it is advisable to use a GitLab runner with a Docker executor so that jobs run in your own images. This also comes with the benefit of being able to test commands in the shell.
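
With a Docker executor configured, each job simply declares the image it should run in. The following sketch assumes a hypothetical pre-built image at registry.example.com/schule40/node-build and a runner tagged "docker"; both names are made up for illustration and are not our actual configuration.

# Sketch: running a job inside a custom Docker image (hypothetical registry path)
build_frontend:
  image: registry.example.com/schule40/node-build:latest
  stage: build
  script:
    - npm ci          # dependencies are resolved inside the container
    - npm run build
  tags:
    - docker          # route the job to a runner with a Docker executor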

Before installing a runner, it is important to keep in mind that runners should be isolated machines and therefore should not be installed on the same server as GitLab itself. Furthermore, if there is a need to scale out horizontally, it can make sense to split jobs and hand them over to multiple runners installed on different server instances, which can then execute jobs in parallel. If the jobs are relatively small, an installation on a Raspberry Pi is also a possible solution. This comes with the benefit of more control, higher flexibility, and, most importantly, lower costs.

Figure 13: Architecture for GitLab CI/CD in School 4.0
Source: Own graphic

For the project “Schule 4.0” we chose the architecture in figure 13 with two independent runners on two different machines. For this purpose, two runners were installed on an AWS EC2 Ubuntu 20.04 and in a VM on the HdM server instance to run the jobs for CI and CD separately.

Challenges and Limitations

The reason for splitting CI and CD jobs was less about horizontal scaling and more about the fact that the HdM server used for deployment is located in a virtual private network. Due to this isolation, it is not possible to deploy directly from outside the network. Since we had no permissions on the HdM server for customizations, the resulting challenges could be circumvented with a second runner installed in a VM on the target machine.

Furthermore, it is important to isolate the runner: when it is installed directly on a server that also serves as a production environment, unwanted problems can occur, e.g. ports that are already in use or duplicate Docker image tag names shared between build jobs and already deployed builds, which can cause failures and halt the pipeline.

To reduce costs, which in our case amounted to 250 US$ (luckily virtual play money) on our educational AWS accounts in only two months, the CI runner was also temporarily installed on a Raspberry Pi 3 B+ with a 32 GB SD card. It is important to note that a slow home network and large jobs have an impact on the Raspberry Pi’s overall performance and can make build and test jobs slow. This is fine for testing the pipeline, but takes too much time when developing in a team. Therefore, to speed things up, the runner for the CI jobs was later installed again on a free tier AWS EC2 Ubuntu instance.

How to install a GitLab Runner?

To install a GitLab runner two main steps need to be followed: 

  1. Install a GitLab Runner 
  2. Register a GitLab Runner

Where to install a GitLab Runner?

A GitLab Runner can be installed on any suitable server instance, e.g. on an IBM or Amazon EC2 Ubuntu instance using an AWS Free Tier test account or an educational account. The AWS getting-started guide provides further information on how to get started with EC2.

Another possibility is to install a runner on your own Raspberry Pi. The Raspberry Pi 3B+ we used has a 32 GB SD card and runs the Raspbian Buster image, which can be downloaded from the official distribution. Be aware that following the standard runner installation guide on GitLab might cause issues, as the Raspberry Pi 3B+ uses an ARMv7 architecture. A good tutorial to follow can be found on the blog of Sebastian Martens. After its installation, the runner will keep working even after completely rebooting your Raspberry Pi.

  1. Runner installation on an AWS EC2 instance

Advantages

  • Better network speed than in a private home network.
  • Faster build times.
  • Easy and quick setup of an instance.

Limitations 

  • High costs for services, even with a free tier selection.
  • Limitations in server configuration due to free or educational accounts, e.g. memory < 10 GB and restrictions on server location, which can lead to latency and timeouts.
  • The server can be misconfigured and become unhealthy – all configurations are lost if no backup/clone of the instance has been made.
  • AWS offers quite complex and ambiguous documentation.

In contrast to using a paid service, a more cost-effective solution may be to install the GitLab Runner inside a virtual machine on the HdM server, on your own Raspberry Pi, or on other services.

2. Runner installation on a RaspberryPi

 Advantages

  • Full flexibility over available software and software versions
  • Costs are lower compared to leased servers including root access
  • Access data to the target infrastructure is available in the local network 

Limitations 

  • Takes longer to execute the pipeline 
  • Local home-network speed can slow job execution down due to images, dependencies, git repository, etc. that need to be downloaded
  • Not well suited for the execution of very large jobs
  • It is advisable to use a Raspberry Pi 3 or newer

How does the GitLab Runner work?

To better understand how a GitLab Runner picks up jobs from the CI pipeline, executes them and returns the build and test results to a coordinator, the following sequence diagram in figure 14 is explained in more detail.

Figure 14: Sequence diagram of the GitLab Runner interaction with the GitLab CI server.
Source: Evertse, Joost (August 2019, p. 56)

A Specific Runner, as used in our project, executes all jobs in the manner of a FIFO (first-in-first-out) queue. When the GitLab Runner starts, it tries to find the corresponding coordinator (the project’s GitLab server) by contacting the GitLab URL that was provided when the runner was registered. When the runner registers using the registration token, also provided at registration, it receives a special token to connect to GitLab. After a restart, the GitLab Runner connects and waits for requests from the GitLab CI.

A runner registered to a source code repository listens for change events on a particular branch, which cause the runner to fetch and pull the GitLab repository. The runner then executes the CI/CD jobs defined in the .gitlab-ci.yml file. The build and test results, as well as logging information, are returned to the GitLab server, which displays them for monitoring purposes. If all jobs were executed successfully, each job in the pipeline receives a green symbol and the push or merge onto the branch can be completed.

Figure 15: Pipeline jobs successfully executed
Source: Own GitLab project pipeline
Figure 16: Pipeline jobs successfully executed.
Source: GitLab (2020)

How to build a GitLab CI/CD Pipeline?

A GitLab CI/CD Pipeline is configured by a YAML file which is named .gitlab-ci.yml and lies within each project’s root directory. The .gitlab-ci.yml file defines the structure and order of the pipeline jobs which are then executed sequentially or in parallel by the GitLab Runner.

Introduction to a Pipeline Structure

Each pipeline configuration consists of jobs, which can be seen as bundled blocks of command-line instructions. Pipelines contain jobs, which determine what should be done, and stages, which define when the jobs should be executed.

stages:
  # ------- CI ------
  - build
  - quality
  - test
  # ------- CD ------
  - staging
  - production

The stages use the stage-tags to define the order in which the individual pipeline blocks/jobs are executed. Blocks with the same stage-tag are executed in parallel. Usually there are at least the following stage-tags:

  • build – code is executed and built.
  • test – code testing, as well as quality checking by means of linting, etc.
  • deploy – deploy the code to production.

The Pipeline Architecture

Figure 20: Pipeline Architecture Example.
Source: GitLab Docs (2020)

The architecture shown in figure 20 is not very efficient, but it is the easiest to maintain. It is therefore shown, in combination with the following .gitlab-ci.yml example, as a basic example to understand how a pipeline’s architecture is constructed. By defining the relationships between jobs, a pipeline can be sped up.

# default image
image: node:alpine

stages:
  - build
  - test
  - deploy

# ----- CI ----- #
build_a:
  stage: build
  script:
    - echo "This job builds something."
  # ...

build_b:
  stage: build
  script:
    - echo "This job builds something else."
  # ...

test_a:
  stage: test
  script:
    - echo "This job tests something."
  # ...

test_b:
  stage: test
  script:
    - echo "This job tests something else."
  # ...

# ----- CD ----- #
deploy_a:
  stage: deploy
  script:
    - echo "This job deploys something."
  # ...

deploy_b:
  stage: deploy
  script:
    - echo "This job deploys something else."
  # ...

Structure of a job

The basic structure of an individual job/block within a pipeline includes the job name, the stage keyword with a stage-tag, and the script keyword with the executable scripts/commands. In our setup, each job also contains a tag for the selected runner (here: gitlab-runner-shell). This structure can easily be extended.

job_name:
  stage: stage-tag
  script:
    - echo "Do something important."
  tags:
    - gitlab-runner-shell

Jobs can be triggered in different ways, depending on the keywords assigned to the individual jobs. For example, it can make sense not to fully automate a job. In this case a job, e.g. a deploy job, gets the keyword for a manual job and is thus forced to wait for manual approval and release by a developer.

job_name:
  ...
  when: manual
  only:
    - staging
    - merge_requests

How to optimize a Pipeline?

Efficiency and speed are very important when running the jobs through a CI/CD Pipeline. Therefore it is important to think not only about the architecture but also consider the following concepts to speed things up.

  • Host your own GitLab Runner: Often the bottleneck is not the hardware but the network. While GitLab’s Shared Runners are quick to use, the network of a private cloud server is faster.
  • Pre-install dependencies: Downloading all needed dependencies for each CI job is laborious and takes a lot of time. It makes sense to pre-install all dependencies in your own Docker image, push it to a container registry and fetch it from there when needed. Another possibility is to cache dependencies locally.
  • Use slim Docker images: Use a tiny Linux distribution for the images that execute a CI job rather than a bloated one with dependencies you might not even need. In our project we therefore used an Alpine Linux distribution.
  • Cache dynamic dependencies: If dependencies have to be installed dynamically during a job and thus can’t be pre-installed in your own Docker image, it makes sense to cache them. GitLab’s cache keyword allows dependencies to be cached between job runs (see the sketch after the example below).
  • Only run a job if relevant changes happened: This is very useful, especially for a project with a microservice architecture. For example, if only the front-end changed, the build and test jobs for all the other services don’t need to run as well. Such behavior can be achieved by using the only keyword in a job. The following gives a short example.

job_name:
  ...
  only:
    changes:
      - apps/example1/**/*
      - shared-dependencies/**/*
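
To illustrate the caching point from the list above, the following is a minimal sketch of GitLab’s cache keyword for a Node.js test job; the job name, cache key and paths are chosen for illustration and are not taken from our pipeline.

# Sketch: caching npm downloads between job runs (illustrative job name and paths)
test_frontend:
  stage: test
  cache:
    key: "$CI_COMMIT_REF_SLUG"   # one cache per branch
    paths:
      - .npm/
  script:
    - npm ci --cache .npm --prefer-offline   # reuse cached packages where possible
    - npm test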

More detailed information can be found in the Example Repository’s README files.

Useful tools to be considered for building a GitLab CI/CD pipeline

Avoid syntax errors by using CI linting. To get things right from the start, GitLab offers a web-based linting tool that checks the .gitlab-ci.yml file for invalid syntax. To use it, simply append -/ci/lint to the end of your project’s URL in GitLab.

As DevOps and CI/CD are quite popular and complex topics, we also want to introduce some further possibilities to optimize your DevOps setup, for example:

  • CI with Linting & Testing – Drone CI
  • Deployment with Jenkins
  • Linting with Sonarqube
  • Monitoring and Logging with GitLab CI/CD

Part 2 – “getting started” repository

Once you are comfortable with the topics above, it is time to see an example implementation of the theory. Head over to our “getting started” repository, which contains a microservice application and further explanations.

Link to our example repository: https://gitlab.com/curiosus42/cc-getting-started-example

Conclusion

DevOps is a big buzzword covering many complex tools and practices that can make your project easier to work with. It’s a long way from a simple project idea to a working infrastructure, but it will help you work more easily and efficiently on your project in the future. For all the described tools and use cases there are many alternatives you can choose from.

In a good software project, all the sections above are important: how to design your software, how to work together, how to make the development workflow easier to use, and how to bring your project from your local machine to a reachable server.

With this blog post and the additional repository content you have an overview of the possibilities, along with a bunch of instructions on how to start with DevOps in your own project.


Further Reading…

Architecture

Getting started with your project
Software Development Life Cycle: A Guide to Phases and Models
https://ncube.com/blog/software-development-life-cycle-guide

Why software architecture matters
Is High Quality Software Worth the Cost? – Martin Fowler
https://martinfowler.com/articles/is-quality-worth-cost.html

Testing Strategies in a Microservice Architecture – Toby Clemson
https://martinfowler.com/articles/microservice-testing/

Best resource for microservice related topics with examples
Microservice patterns – Chris Richardson
https://microservices.io/patterns/
The 12 factors that really matter when developing software
https://12factor.net/

Git Workflow

Git Workflow: https://www.atlassian.com/de/git/tutorials/comparing-workflows/gitflow-workflow

CI/CD Pipeline

Continuous Integration:
Definition of Continuous Integration – Martin Fowler
https://martinfowler.com/articles/continuousIntegration.html/

Continuous Delivery:
Continuous Delivery – Martin Fowler
https://www.martinfowler.com/bliki/ContinuousDelivery.html
Architecture – Jez Humble https://continuousdelivery.com/implementing/architecture/
Principles – Jez Humble
https://continuousdelivery.com/principles/

Continuous Deployment:
Continuous Deployment Blog – Timothy Fitz http://timothyfitz.com/2009/02/08/continuous-deployment/

GitLab CI/CD:
Rollbacks – GitLab Docs https://docs.gitlab.com/ee/ci/environments/#retrying-and-rolling-back
Incremental Rollouts – GitLab Docs https://docs.gitlab.com/ee/ci/environments/incremental_rollouts.html
CI Runner – GitLab Docs
https://docs.gitlab.com/ee/ci/runners/README.html
Pipeline Linting – GitLab Docs
https://docs.gitlab.com/ee/ci/lint.html
CI/CD Job Logging – GitLab Docs
https://docs.gitlab.com/ee/administration/job_logs.html

Containerization and virtualization:
Docker: https://www.docker.com/
Docker integration – GitLab Docs
https://docs.gitlab.com/ee/ci/docker/README.html
Vagrant virtualization – Vagrant
https://www.vagrantup.com/

Runner installation:
AWS EC2 Instance
https://aws.amazon.com/de/getting-started/
Runner installation on RaspberryPi – Sebastian Martens
https://blog.sebastian-martens.de/technology/install-gitlab-runner-on-raspberry-pi/
GitLab Runner
https://docs.gitlab.com/runner/


List of References

Software Architecture

CI/CD Pipeline

Images and Tables

Safety Systems in Spaceflight

https://p0.pikrepo.com/preview/53/442/falcon-heavy-demo-mission-launching-rocket.jpg

Disclaimer

Modern spaceflight, or spaceflight in general, is a very complex and multi-layered topic. It is not without reason that difficult subjects are colloquially referred to as “rocket science”. This article does not attempt to describe spaceflight in its entirety, but only sheds light on a very small part of it in the area of safety. The focus is primarily on the launch of a rocket and the landing with a space capsule. Of course, there are many more safety concepts in spaceflight than are described in this article. In addition, not all but only a few of the safety systems are examined, since research in this area is ongoing and what counts as state of the art today may already be outdated tomorrow. The systems described here are therefore not pure prototypes but rather established systems that have been in use for years, with occasional outlooks on future or recently tested variations and ideas.

Furthermore, this article is not to be understood as a manual or a detailed, complete explanation of the systems described here. It is merely intended to provide an overview of these systems and perhaps spark interest in engaging more deeply with them and with spaceflight in general.

Why rockets, anyway?

If you spend some time looking into spaceflight, you quickly come across ideas and concepts for various systems to get into space, or rather into orbit. Ideas such as space elevators (spacelifts), skyhooks or space launch cannons come up again and again. Many of these systems offer a cost-effective long-term alternative for transporting cargo into orbit. Yet globally, and almost without exception, we still rely on rockets, even though rockets actually have a lot of disadvantages.

Obviously, rockets are extremely loud, and the propellant burned during launch can be very harmful to the environment. Rockets are therefore not suited to being launched in the middle of a city; instead, huge areas have to be cleared just for the launches. Rockets are also very expensive. Estimated prices for a launch of an Ariane 5 are around 150-200 million US dollars; for the smaller commercial Falcon 9, still around 62 million US dollars. On top of that, only a fraction of the mass is effective payload. A Falcon 9 weighs about 550 tonnes, but only about 22.8 tonnes of that arrive in low Earth orbit (LEO). That is just 4% of the total mass. As a comparison, that is roughly like using a large truck to transport 100 kg of cargo on the road, with the driver and the cab counted as part of the cargo. In addition, we would simply throw the truck away after arriving at the destination, since rockets, at least until recently, were not reusable. Even with SpaceX’s Falcon 9, which is partially reusable, a relatively large part is still thrown away and the cost savings only amount to a rather small share. As if that were not enough, a flight on a rocket is also extremely dangerous. In principle, you are sitting on a continuous explosion and hundreds of tonnes of explosives. If an error occurs, the rocket usually cannot be saved. So why do we still use rockets?
The answers to this are actually relatively simple. First of all, rockets are simply very fast: it takes only about 10 minutes to reach space. Speaking of space, there is the additional advantage that rockets also work in a vacuum. Unlike an aircraft, they carry everything they need for propulsion with them. Furthermore, we simply have a very large body of knowledge today about rockets and how to use them effectively. In other words, they are technologically very feasible. We owe this technological foundation above all to the Second World War and the Cold War, because the space race relied on rockets, as they can also be used perfectly as weapons or long-range carriers for nuclear bombs. In short, the fact that we use rockets today is above all a result of history.

How do rocket launches work?

http://www.russianspaceweb.com/images/spacecraft/application/communications/ses/ses15/ascent_profile_1.jpg

The launch of a rocket can basically be divided into several stages. When each of these stages is reached, another part of the launch vehicle is usually separated, so that the rocket loses weight overall and becomes lighter. Why, for example, should an empty fuel tank be carried around any longer once it is no longer needed? The graphic above shows a normal launch sequence of a Soyuz rocket. It lists which part of the rocket is separated at which point and roughly when this happens. The structure of a Soyuz rocket with its individual parts, the so-called stages, is shown in the image below.

http://www.russianspaceweb.com/images/rockets/r7/soyuz_fg/soyuz_fg_silo_1.jpg

Launch Escape System (LES)

As already mentioned among the disadvantages of rockets, a rocket usually cannot be saved if something goes wrong during launch. And since this usually ends in a very large explosion, not much remains of the rocket or its cargo. That is why, if the cargo is to be protected, a so-called launch escape system is usually installed on the rocket. This LES is essentially one or more additional small rockets that are meant to pull the cargo away from the launch vehicle and bring it to safety. For this purpose, the capsule or cargo is decoupled from the launch vehicle, as would normally only happen in orbit, and the rockets of the LES are ignited. As soon as the LES rockets have burned out, the LES is also separated from the capsule, and the capsule can glide gently to the ground using the parachutes intended for landing.

Two different methods are now used to implement an LES. On the one hand, there is the classic tower method, in which a tower containing the LES and a few other systems is installed on top of the capsule. On the other hand, there is the pusher system, now used by a few vehicles, in which the LES is integrated into the capsule.

https://forum.nasaspaceflight.com/assets/6364.0/1857223.jpg

The classic tower concept has the advantages that it can be jettisoned during flight as soon as it is no longer needed, reducing the load on the rocket. It also does not require extra space in the capsule and can theoretically be placed on top of any cargo capsule, as long as the adapters fit. A major disadvantage of the tower, however, is that once it is jettisoned, it is irretrievably gone.

https://www.gannett-cdn.com/-mm-/2887f5d7d58a75a6437c45e987fd994202334aff/c=0-156-3000-1844/local/-/media/2015/07/01/Brevard/B9317936634Z.1_20150701164344_000_G90B83P9D.1-0.jpg

The advantage of the integrated pusher system, when it is not used as an LES, is that it can also serve as braking rockets during landing or as thrusters in space. It also remains available for the entire flight, in case something does go wrong at a later point.

In summary, both systems have their advantages and disadvantages. What both systems have in common, however, is that they are based on the very technology for whose failures they are supposed to provide the emergency solution, namely rockets themselves. An LES is based on rocket propulsion and has similar vulnerabilities and drawbacks. Fortunately, there have actually been only very few cases in which an LES really had to be used.

Atmospheric reentry

https://www.esa.int/var/esa/storage/images/esa_multimedia/images/2000/08/atmospheric_re-entry_demonstrator_-_artist_s_impression2/17866009-3-eng-GB/Atmospheric_Re-entry_Demonstrator_-_artist_s_impression.jpg

While during the flight into orbit the main concern is how to build up the speed required to remain in a stable orbit, during landing the concern is how to safely shed exactly this speed again. Besides the active option of applying thrust against the current trajectory to reduce speed, there is the passive option of reducing speed through friction alone. This has the advantage that no additional propellant is needed, which would otherwise have to be carried into orbit in the first place. The obvious disadvantage is that extreme speeds are shed solely through air friction, which generates extreme heat. While this heat ensures that small meteorites burn up in the atmosphere and cannot cause any damage on the surface, this very burning up is exactly what must be prevented for space capsules returning from orbit.

For the capsule to withstand the extreme heat, it has to be made resistant in one way or another. This can be done through active, passive or so-called ablative cooling, or through heat absorption.

Active and passive cooling are relatively conventional cooling systems. The heat absorbed by the structure is gradually released back into the environment. This can happen passively, if the heat exchange simply takes place through the structure itself without further intervention, or actively, if, for example, coolant is pumped through the structure so that the heat can be dissipated more quickly and does not accumulate at one point and potentially damage the structure there. And although these cooling methods work without problems almost everywhere, they are not effective enough for spaceflight and especially for atmospheric reentry. The heat in front of the capsule builds up so quickly and becomes so hot (up to 2000 °C) that even steel immediately begins to melt before a passive or active cooling method can really take effect.

There are essentially two ways to get around this: first, using a much more heat-resistant material that can withstand the temperatures; second, preventing the extremely hot air from touching the capsule at all, so that its heat energy cannot be transferred in the first place.

For the first option, so-called heat absorption, special materials were developed primarily for the American Space Shuttle programme. They continuously absorb heat and can withstand much more of it than conventional materials. The problem with these materials, however, is that they are usually not as structurally stable as, say, steel. At the high speeds at which the capsule or the Space Shuttle travels, even raindrops can be enough to damage the heat shield and render it useless, because it only takes a single small spot where the heat penetrates the shield to destroy the structure behind it.

The second option, so-called ablative cooling, in contrast to the other techniques, relies on the heat shield being destroyed over the course of the landing. The goal of this method is to keep the air that heats up in front of the capsule away from the actual structure. To this end, the material of the ablative heat shield is burned off in a controlled manner to form a gas layer between the hot air and the capsule. This gas layer, which consists of the combustion gases of the heat shield, is also very hot, but not as extremely hot as the air, which makes the heat much easier to handle. The great challenge of the ablative heat shield is, of course, to build up an even gas buffer everywhere, because otherwise it again becomes useless.

https://spaceflight.nasa.gov/gallery/images/shuttle/sts-114/med/jsc2003e61578.jpg
https://i.gzn.jp/img/2020/03/24/picking-up-hot-space-shuttle-tiles/img-snap03405_m.jpg

Of course, all of the methods mentioned above can be combined. This increases safety, because several layers can absorb the potential heat, but it also makes the capsule heavier again, which comes with significantly higher costs.

If you make it through the atmosphere in one piece, the tried-and-tested parachute usually handles the last stretch, letting you arrive gently back on Earth.

Sources

Person Detection and Person Identification, course no. 143307a, taught by Prof. Walter Kriha

Rafael Janetzko, matriculation no. 40190

The fields of application of person detection and person identification are growing and becoming ever more precise. In the area of security, person detection and identification is therefore increasingly regarded as essential and is present almost everywhere. Nowadays most people unlock their smartphone with biometric data or monitor their property with camera systems that are becoming ever more sophisticated and cheaper. But systems for person detection and identification are also being expanded in the public sphere, for example to increase security or for political purposes. People who encounter such a system often give no thought to the data processing involved, since it has become part of everyday life. Besides its advantages, however, the technology also brings disadvantages.

1. Introduction

Person detection and person identification should be distinguished. We speak of person detection when data can be used to recognise that a person is present in an area. If, however, the data can be assigned to an identity, we speak of person identification. Person identification is thus considerably more precise and more useful. It usually takes place on the basis of biometric data. The most common techniques are facial recognition using camera systems and fingerprint, iris or vein scans. But indirect data, such as an address, a QR code or a browsing history, can also lead to person identification. [7]

2. The Rise of the Technology

The leading countries with the largest use of person identification are China and the USA. The Utah Data Center in the USA alone has an estimated storage capacity for person identification of between 1 yottabyte and 5 zettabytes. Spread across the world’s population, this corresponds to a data volume of between 1.4 megabytes and 140 gigabytes per person that can be stored.
That China uses this technology very intensively is evident from the fact that in 2020 there were already 626 million surveillance cameras in China alone. That corresponds to one camera for every three inhabitants. The city with the most heavily used person identification is Shenzhen, where there are on average 60 cameras along a 500-metre stretch. Camera technology is particularly popular because no active contact with the system is necessary; a fingerprint sensor, by contrast, requires an active action. The breakthrough of person identification via camera technology came with artificial intelligence (AI), when the US AI AlphaGo beat China in the national sport of Go. As a result of this event, President Xi Jinping declared AI a strategic priority. Since then there have been 1,200 start-ups in Beijing alone, among them SenseTime, the most valuable start-up in the world with a current market value of around three billion euros. SenseTime develops facial recognition software for identifying people within the population, which is already so advanced that it recognises people better than humans do. AI can help with the analysis of the data, so that even from pixelated images the original can be reconstructed sufficiently well. [1,7,9]

Figure 1: Reconstruction of faces from a pixelated image [1]

3. Acceptance and Everyday Life

For person identification to be accepted and to gain trust, people’s convenience, fear or desire for security is leveraged. To familiarise residents with the system, China has the Haidan Park in Beijing, a smart city park in which it is possible, among other things, to pay in facilities and on public transport via facial recognition, to operate lockers and to request a song at the pavilion. All that is required is to install and set up an application on one’s personal smartphone. The system has also arrived in everyday life in China. To get into work or to identify oneself and pay on the street, no ID card or credit card is needed; a glance into the camera is enough. Even at school, parents and teachers can check whether their children are paying attention in class, as the system recognises, analyses and reports back on the pupils’ actions.

Furthermore, the government uses the system to prosecute crime, to police the Internet against abuse and unwanted content, and for traffic management and safety, with misconduct being punished directly. Other areas of application are the fair distribution of resources and the elimination of poverty through education by means of a social points system. [2,6,10]

Figure 2: Traffic surveillance in China using automated person and object detection [8]

4. The Points System

The points system is based on the following idea: those who behave well have nothing to fear and can even benefit; those who behave badly are punished. Every citizen starts with 1,000 points. Those who behave well and help the community can earn additional points; those who behave badly or hold an unacceptable opinion towards a state institution lose points. Each citizen can reach a maximum of 1,300 points and a minimum of 600 points. These points can be decisive for the allocation of jobs and university places, credit and account management, the waiving of bail, travel permission or access to public institutions. Those with a poor score have a harder time, and this can lead to them being excluded from society. Such a points system also helps to prevent social unrest, but it can likewise lead to the system being abused politically, with certain groups and minorities being persecuted on the basis of statistics. In addition, there is research in the USA and China claiming to be able to recognise potentially criminal behaviour from facial features. [3,11,4,5,8,9,16]

Figure 3: Simplified depiction of the points system in China [16]

5. Forgery and Abuse

Nowadays it is hardly possible to trust the images of a recording. Using deep fakes, it is now possible to forge recordings convincingly, so that someone in a video says things they never actually said or does things they never did. People can thus be accused of behaviour that never took place, for example to convict them wrongly or to remove them politically.
It can likewise lead to increased psychological pressure to perform and thus harm people’s health. Moreover, it is not known what effects a hacker attack on such a system would have, since there is hardly any transparency. It can, however, lead to data theft, data falsification or manipulation of identity assignment. This lack of transparency offers hardly any insight into the automated data processing, which may be faulty. [14,15]

6. Is such a system possible in the Western world?

That such a system is also used and promoted politically in the Western world has been known since the publication of the NSA files by Edward Snowden and the Stasi records in Germany. Increasing security serves as the legitimisation for such a system. For example, the COVID-19 virus is being exploited to legitimise person detection and the filming of people by means of a social distancing detector, or to push people towards cashless payment. These systems are also being made ever more convenient so that more people use them. Furthermore, there are attempts to ban cash payments for large amounts and to abolish large banknotes, such as the 500-euro note, in order to be able to trace the flow of money more easily. There are also systems such as the black box Schufa, which uses personal data to create an opaque risk assessment. People can thus be disadvantaged even if they have always paid their bills; instead, age, gender and place of residence can be decisive. If, for example, someone is male, young and lives in a wealthy area, they have a better chance of getting a contract or a bank loan. [12,13,17,18,19]

Figure 4: Social distancing detector in a pedestrian zone [17]

7. Conclusion

Person identification can be used positively if applied correctly. Due to the lack of transparency of the data basis of such a system, there is no guarantee of security, since audio and video can be convincingly altered and misused. The results should therefore always be questioned and the evaluation of the data should be transparent, because people without expert knowledge currently do not expect such forgeries. Having data about a person also means having power over that person, and people’s lives can be changed to their disadvantage through data misuse. Google AI researcher Ian Goodfellow said on this subject: “It is a historical stroke of luck that humanity has largely been able to rely on videos in recent decades to make sure that something actually happened.” [20]

References

[1] https://www.watson.ch/digital/social%20media/564033268-hast-du-verpixelte-fotos-auf-einer-dating-seite-dann-hast-du-ein-problem
[2] https://www.blick.ch/news/ausland/neue-aera-totaler-ueberwachung-dank-gesichtserkennung-big-brother-wohnt-in-china-id7593308.htm
[3] https://mixed.de/ki-ueberwachung-china-kontrolliert-ethnische-minderheit-per-gesichtserkennung/
[4] https://arxiv.org/pdf/1611.04135v2.pdf
[5] https://mixed.de/kann-eine-kuenstliche-intelligenz-verbrecher-am-gesicht-erkennen/
[6] https://www.derstandard.de/story/2000093116588/chinas-singender-pavillon-und-die-kuenstliche-intelligenz
[7] https://www.berliner-zeitung.de/zukunft-technologie/kuenstliche-intelligenz-totale-ueberwachung-ist-in-china-laengst-normalitaet-li.37733
[8] https://www.technocracy.news/china-claims-its-social-credit-system-has-restored-morality/
[9] https://www.welt.de/kultur/article191734655/Wie-China-mit-kuenstlicher-Intelligenz-zum-Ueberwachungsstaat-wird.html
[10] https://www.golem.de/news/gesichtserkennung-schule-in-china-testet-system-zur-aufmerksamkeitserkennung-1805-134465.html
[11] https://netzpolitik.org/2019/gesichtserkennung-automatisierter-rassismus-gegen-uigurische-minderheit-in-china/
[12] https://netzpolitik.org/2019/usa-erneut-klage-gegen-massenueberwachung-durch-nsa-abgewiesen/
[13] https://www.zeit.de/2019/39/edward-snowden-whistleblower-staatsfeind-cia
[14] https://ars.electronica.art/center/de/obama-deep-fake/
[15] https://futurezone.at/digital-life/deepfake-barack-obama-schimpft-in-video-ueber-donald-trump/400023301
[16] https://www.faz.net/aktuell/wirtschaft/infografik-chinas-sozialkredit-system-15913709.html
[17] https://mixed.de/social-distancing-abstandskontrolle-per-ki-tracking/
[18] https://www.aachener-zeitung.de/ratgeber/geld/corona-und-die-bargeldabschaffung_aid-50403137
[19] https://www.faz.net/aktuell/finanzen/bargeld-noch-eine-menge-500-euro-scheine-im-umlauf-16718681.html
[20] https://mixed.de/kuenstliche-intelligenz-ki-forscher-glaubt-an-massenhaft-fake-multimedia/

The development of the intranet into BeyondCorp

Aron Köcher, Miro Bilge

Only a few years ago, the solution for exchanging digital information like documents or pictures was to establish a physical connection between the participants. A USB stick was passed around the class to exchange music, you went to your friend’s house to print some urgent papers, or a group of friends met to play games over LAN. With increasing access to the Internet, new solutions have emerged.
While some users are satisfied with sending files by mail, a company with different locations and a large amount of digital data requires a business solution.
With Virtual Private Network (VPN), a solution was created that allows one or more participants to become part of another network. The connected member has full access to devices, data and services as if he were physically present. Instead of a real connection, a tunnel is built over public networks, which is why it is called “virtual private”. The basic structure of a virtual private network is always the same: a VPN connection consists of two participants, a client and a server, which establish a connection. Depending on the protocol, the connection can be encrypted and use different layers.

Initially, dedicated lines and connections based on the data link layer were used to connect individual locations. With the help of Frame Relay, a permanent virtual link was established between the sites. This technology was replaced with the increasing shift from layer 2 to IP-based network technology. Compared to a dedicated line, the financial costs of an IP-based link are much lower: only a one-time configuration and Internet access, which is usually available anyway, are required.
The end user of VPNs usually does not come into contact with layer-2 VPNs, because a rented dedicated line is set up in the background by the network operator and appears to the consumer as a physical connection. Therefore, due to current usage and the relevance for the readers of this blog, we will only discuss IP-based VPNs in the following.

Besides different protocols with different encryption methods, there are three types of VPN:

Client to Client VPN

Here a connection between two clients is established. This is used, for example, to control a computer using TeamViewer, and it is the only connection type where the complete message traffic is encrypted. Therefore it is limited to two devices.

Client to Site

With a Client to Site VPN, it is possible to connect a client to a remote network. This makes it possible for remote employees, for example working from home during the COVID-19 crisis, to be a regular part of the company network. They have unlimited access to data and devices in the network. On the network side, this requires a VPN server to which the employee can connect after configuring a VPN client. The client software comes with the common operating systems and is also available as a mobile solution for smartphones. The message traffic is encrypted until it enters the network. This can also be used to protect privacy: web pages that are accessed via a VPN only see the VPN server, not the client. This allows the user to spoof his position and, unless logging is used on the server, he cannot be distinguished from the other users. Furthermore, a man-in-the-middle attack in insecure networks is made more difficult. Attackers only see that the client has established an encrypted tunnel to a server. It is hard to draw conclusions about the services being called, even if they are not encrypted, because the different data protocols are repackaged in VPN frames and are therefore not recognizable.

Site to Site

If companies grow beyond one location, the question arises how employees at both locations are given access to company data.
For smaller locations, a client-to-site solution is sufficient to allow employees to access data from the main location. If the second location hosts services, devices or data storage that are to be accessed bidirectionally, both networks must be reachable via VPN. For this purpose, a VPN server is set up on the main network and a VPN concentrator on the second. With this site-to-site VPN, all internal network connections are exchanged between the VPN nodes and the two networks appear as one large network.


Limitations

By tunnelling all packets over the VPN interface, all network traffic and speed depend directly on this connection. If the VPN fails, internal processes are disrupted and, depending on its use, the company’s business might come to a halt.
A further point is the security of a VPN. Externally, the network is protected against dangers from the Internet by the firewall. With the VPN, a user becomes part of the network and has access to the devices and data contained in it. If an attacker overcomes the encryption of the VPN, the advantage of unrestricted access becomes its disadvantage.
Apart from the security concept, which ends at the VPN gateway and firewall, the VPN tunnel itself is also not untouchable. The number and size of packets and the VPN remote endpoint make it possible to draw conclusions about the transmitted data.
In addition to security, providing services and files and extending the network to include external services requires complex configuration. The internal network must be divided into further subnets with different access rights. This adjustment may be required for each customer and service, which demands active network management and can quickly become difficult to manage.

Network Access Control


There are various approaches to prevent the flexibility and options of a VPN, such as location and device independence, from becoming a vulnerability in the company network.
The difficulty lies in the fact that, by bridging the VPN, malicious software gains access to the internal network. So the approach is to prevent malware from entering the network in the first place. To this end, administrators can grant access only to known devices and restrict the installation of drivers and programs on these devices. These restrictions must always be weighed against productivity and must not restrict the user too much. With increasing rights, such as required port sharing, controlling each individual device becomes more difficult. Moreover, the case of an already contaminated device is not covered. This is why Cisco published the first approach to shifting network security away from the devices and into the network as early as 2003. With Network Access Control (NAC), all devices that want access to the network are subjected to a security check. NAC thus forms an additional layer between the VPN and the network, which handles access to services and resources.

System Overview Network Access Control [17:
https://blogs.getcertifiedgetahead.com/network-access-control/]

For the NAC system to grant access to a compliant device, “compliant” must first be defined in a policy rule. Depending on the NAC software and provider, the possible rules vary; with current anti-virus signatures and installed security updates and patches, Cisco created a basis for its approach. The network needs additional help to read out such information and to check whether the connected device is a new one. Installing an agent on the devices provides access to this data and, in addition, the possibility of automatically restoring a compliant state in case of non-compliance.
If a device wants to connect to the network, the NAC health server notifies the agent to read out the necessary data and checks it against the rules for the respective user group. If the status of the device does not match the rule set, the device is quarantined and cannot access the network.
The NAC server sends the deficiencies to the NAC agent on the device, which then tries to resolve them. This ranges from simply installing updates up to removing programs and software. If the compliance of the device can be restored, the NAC server allows access to the resources. The remediation process should be as self-sufficient as possible, but can quickly become quite complex depending on the deficiencies and the user role. When creating the rules, different scenarios must therefore be considered. For example, if a customer needs access to shared resources but has an outdated operating system, the agent cannot simply upgrade it, yet access to these resources cannot simply be denied either. The resource would have to be relocated to a sub-network, which still does not answer the question of how this potentially risky resource is handled internally. Depending on the number of customers and resources, this process also becomes increasingly complex and difficult to maintain.

Software Defined Perimeter

In contrast to Network Access Control, a Software Defined Perimeter (SDP) does not establish the connection until the device and user have been authorized and authenticated. At the time of the authentication process, the location of the resource is unknown because it is not registered in the DNS. This is why SDP is also called a Black Cloud. It has several advantages over NAC: with the unique assignment of access rights and roles for each resource, segmentation of the network is no longer necessary. Resources are only accessible to the respective user roles. This simplifies management by eliminating the need to create an additional subnet for each customer or service; the customer receives a user and is assigned to the resource.

System Overview Software Defined Perimeter [14: https://procureadvisor.com/the-definitive-guide-to-software-defined-perimeter/]

If the customer now wants to access the resource, he first contacts the SDP controller, which confirms his identity and integrity via the user management. The user is then authorised and receives an authentication token. This token contains the resources the user may access. If the user now accesses a resource, a VPN connection to the respective SDP gateway is established and terminated automatically via a client software. At the SDP gateway, the user is again identified via his token and then gains access to that one resource.
By using SDP, Distributed Denial of Service, man-in-the-middle and code injection attacks are prevented or made more difficult. In addition, in most cases a successful attacker does not gain access to the entire network.
The use of a Software Defined Perimeter forms the basis of Zero Trust Network Access (ZTNA). No device, user or service inside or outside the network is trusted. Every single connection is encrypted and no resource can be reached without prior authentication. Viewing each connection as a separate environment with individual security requirements creates a minimal attack surface. ZTNA is transparent for the user: he only has to log in once via the client and can then access the resources.

To set it up, all users and resources must be assigned a user role and a predefined risk profile. This categorization of services, users and devices means a lot of effort for the company. Once completed, however, the system can easily be extended with additional roles and guidelines.

BeyondCorp

Before we take a closer look at the BeyondCorp Remote Access business model, the following chapter will first discuss Google’s initial idea. In 2011, Google started to develop its own intranet away from the VPN and towards Google BeyondCorp. 

Google’s idea was to get rid of the privileged network with single perimeter security and move to a more flexible solution similar to the Zero Trust model. An important core component was to evaluate access depending on the respective device and user. For example, a user can be authorized to access a resource from his company laptop, but if he wants to access the same resource via smartphone, this may not be allowed. Furthermore, BeyondCorp is intended to provide the same user experience independent of network location. This means that it should make no difference to the employee whether he works from home, at the company location or in a public Internet café (apart from latency, of course). The same user experience also implies that secure access must be possible for employees without a VPN.

Google’s BeyondCorp was built on the basis of these core components. To ensure these key elements are in place, every request is fully authenticated, authorized and encrypted no matter where it is made from.

Architecture

To realize Google’s goals, the network architecture was redesigned. In the following the individual architecture components are described on the basis of the diagram:

1) Even within a Google building there is a privileged network, i.e. a network in which users are trusted, and an unprivileged network. The latter, similar to an external network, is not trusted by default. From a security point of view, users who are in the Google building and on this network could just as well be sitting in a public Internet café. Therefore, access from the Google building is equivalent to remote access. The difference is that requests from the unprivileged network are made from a private address space.
Consequently, requests to the Internet are made from the unprivileged network. If a user wants to access another part of the Google network internally, this is checked against an Access Control List (ACL).

2) All user requests e.g. from the unprivileged network or enterprise applications from Google run through an Internet Access Proxy. This proxy forces an encrypted connection between the connection partners. The proxy can be specially configured for each application and offers various features such as global reachability, load balancing, access control checks, application health checks and DoS protection.

3) The basic prerequisite for granting access to the Access Proxy, from the unprivileged network as well as from the public network, is that the device has so-called “Managed Device” status. This status means that the device is actively managed by the company, and only these devices can access company applications via the Access Proxy. At the same time, Managed Device status implies that the company can track, monitor and analyze changes to the device. The goal is to be able to react dynamically to the security status of each device in order to allow or deny requests.
Technically, Managed Device status is realized by a certificate. Each device that has this status is unique and can be recognized by its certificate. The certificate is renewed periodically and serves as a key confirming that the device information is valid. In order to obtain a certificate, the respective device must be present and correctly recorded in the Device Inventory Database (DID for short). On the device, the certificate is then stored in a TPM (Trusted Platform Module) or a qualified certificate store, i.e. on the hardware or software side depending on the platform.

4) The Access Proxy is fed by the Access Control Engine so that it can decide which requests from which user and which device it allows and which it does not. Based on the Access Control Engine, the Access Proxy can act as a dynamic access layer. To provide the Access Proxy with this “advisory” support, the Access Control Engine itself draws on various sources of information. From this data, both static rules and heuristics are derived; in addition, machine learning is used. Relevant information for the Access Control Engine can be, for example, the operating system version, the device class (cell phone model, tablet, …), access from a new location, the user or user group, the device certificate, as well as further information and analyses from the Device Inventory Database.
For each request, the Access Control Engine then evaluates whether the trust level established for the requesting device and user, based on the analyzed data, meets the security level required by the requested resource. Because the required level is determined per request, it is also possible to separate parts of an application. For example: a user may be authorized to view an entry in a bug tracking tool, but if he wants to update the status of the bug or edit the ticket, this request may be blocked because the trust level of this user is not sufficient.
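
The bug tracker example can be sketched as a simple per-request decision; the tier numbers, signals and rules below are purely illustrative.

// illustrative signals the Access Control Engine might combine
function requestTier({ userGroupTier, deviceTier, newLocation }) {
  // the effective tier is bounded by the weakest signal;
  // an unusual location temporarily lowers it further
  let tier = Math.min(userGroupTier, deviceTier);
  if (newLocation) tier -= 1;
  return tier;
}

function authorize(request, resource) {
  return requestTier(request) >= resource.requiredTier;
}

const request = { userGroupTier: 3, deviceTier: 2, newLocation: false };
console.log(authorize(request, { requiredTier: 2 })); // true  -> may view the ticket
console.log(authorize(request, { requiredTier: 3 })); // false -> may not edit it

hypothetical per-request evaluation by the Access Control Engine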

5) The Access Control Engine is in turn fed by a pipeline that extracts and aggregates the dynamic information.

6) BeyondCorp also uses Single Sign-On for authentication, similar to the classic Zero Trust model. The central user authentication portal validates the primary credentials; two-factor authentication was added in the same step. After validation, short-lived tokens are generated, which then form part of the authorization process for specific resources. Depending on the trust level of the resource, the authentication measures can be more or less stringent.
Since the administration of user groups and their authorizations is relatively complex, for example because authorizations change when someone switches departments, the user/group database is closely linked to the HR (Human Resources) processes. A new hire, a new role or responsibility, or someone leaving the company is recorded in HR, and every such change also triggers an update in the database. This ensures that the employee data is always up to date, while the effort required to maintain the database remains low.
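
The short-lived tokens could, for example, be plain JWTs; the following sketch uses the jsonwebtoken npm package, and the claims, lifetime and secret handling are illustrative only.

const jwt = require('jsonwebtoken'); // npm install jsonwebtoken

// issued only after the primary credentials and the second factor were validated
function issueShortLivedToken(user, secret) {
  return jwt.sign(
    { sub: user.id, groups: user.groups },
    secret,
    { expiresIn: '15m' } // short-lived: forces frequent re-evaluation
  );
}

// the Access Proxy later verifies the token on every request
function verifyToken(token, secret) {
  try {
    return jwt.verify(token, secret);
  } catch (err) {
    return null; // expired or tampered token -> request is denied
  }
}

hypothetical issuing and checking of a short-lived token after SSO and two-factor authentication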

7) Besides Single Sign-On, Google uses a RADIUS server for network authentication. A user’s LAN or WLAN access is assigned to the corresponding network segment via the RADIUS server, so that an attacker cannot attack the entire network but only a segment. In Google’s case, the RADIUS server assigns a managed device to the unprivileged network as soon as the device has authenticated itself using its certificate and an 802.1x handshake. Another advantage besides security is that network management is not done statically via fixed VLAN areas and switch/port configurations but can be handled dynamically. Devices without certificates, for example, are assigned to a guest network. In addition, in the case of an outdated device version, the RADIUS server can move a potentially compromised device from the unprivileged network to a special quarantine network.
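
Conceptually, the decision the RADIUS server makes can be pictured like this; real deployments express it via RADIUS attributes (dynamic VLAN assignment) rather than application code, so the sketch is only an illustration.

// conceptual mapping from the 802.1x authentication result to a network segment
function assignNetwork(device) {
  if (!device.certificateValid) return 'guest-network';      // no valid device certificate
  if (device.quarantined || device.severelyOutdated) {
    return 'quarantine-network';                             // potentially compromised device
  }
  return 'unprivileged-network';                             // managed device, default segment
}

console.log(assignNetwork({ certificateValid: true, severelyOutdated: false }));
// -> 'unprivileged-network'

hypothetical segment assignment as performed by the RADIUS server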

Architecture of BeyondCorp components. Own presentation according to [6: A New Approach to Enterprise Security (BeyondCorp)] 

As can be seen in the architecture, the Access Proxy plays a central role in BeyondCorp. Since Google tried to reuse as much existing technology as possible, the Access Proxy is no exception: it is based on the HTTP/HTTPS reverse proxies, the so-called Google Front Ends (GFEs), which were already used in the front-end infrastructure and offered load balancing and TLS handshakes “as a service”. These were subsequently extended into Access Proxies with additional configuration options such as authentication and authorization policies. Since the Access Proxy is a central communication element, it supports OpenID Connect and OAuth as well as custom protocols that can be integrated. The user therefore authenticates himself to the Access Proxy; if access is granted by the Access Control Engine, the request is forwarded to the backend service without any further credentials. There are several reasons for this. First, it increases security, since no credentials can be intercepted on the backend side. Second, the Access Proxy remains transparent to the backend: if the backend service implements its own authentication, e.g. via credentials and/or cookies, confusion would arise if the proxy’s credentials were also passed on to it.
Nevertheless, the communication between the Access Proxy and the backend service must be secured, so internal communication takes place over HTTP within an encrypted channel. For this, Google uses an internal authentication and encryption framework called LOAS (Low Overhead Authentication System), which enables the service to trust all received data. The framework uses mutual authentication, meaning that both entities in a communication link authenticate each other, which also ensures that metadata cannot be spoofed. One advantage of this is that new features can be added to the Access Proxy, and individual backend services can subscribe to these features simply by parsing header fields.
The combination of the Access Proxy with access control lists via the Access Control Engine offers further advantages. The central location of these components provides a uniform access point, which makes forensic analysis more effective: since logging is controlled centrally, an attack can be responded to not just for one service but directly for all backend services. Furthermore, enforcement policies can be managed centrally and defined consistently, so changes can be rolled out more quickly. Another advantage is that backend developers do not have to worry about authorization. If the trust level of the service does not require any further authentication measures, the developer can rely on users already being homogeneously authenticated. If this is not sufficient, the coarse-grained approach can be refined with a fine-grained one: if a database application requires an additional authentication measure, for example, the service can integrate its own authentication on top. In this way the system remains maximally flexible with respect to the needs of each service. The service only has to configure the Access Proxy correctly once to ensure that the communication between the service and the Access Proxy works.

After looking at the architecture, the question arises how employees access the network from a client perspective without VPN access. Google’s BeyondCorp answers this with a Chrome extension. All access, whether in the office or on the road, is handled through this access point. This is possible at Google because, according to the internal company guideline “online first”, the majority of applications are accessible via the web and the share of local applications is kept to a minimum.

The extension automatically manages a user’s Proxy Auto-Config (PAC) files and routes the user through the Access Proxy to the appropriate destination. When a user connects to a network, the extension automatically downloads the latest PAC file and displays the “Good Connection” icon. Since all requests from the BeyondCorp extension are routed to the Access Proxy, the client cannot communicate with devices that the Access Proxy cannot reach, for example the local printer at the employee’s home. The extension’s status setting provides a solution here. When the employee enters the printer’s IP address in a new browser tab for configuration purposes, the request is sent to the Access Proxy along with all other private address space traffic, the routing fails, and the user receives an error. Customized 502 error pages were implemented to tell the employee that the extension must be switched to “Off:Direct”. The user can then configure the printer and afterwards reconnect to the Access Proxy.
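
A minimal PAC file of the kind the extension manages could look like the following; the proxy host name is made up, and the real files are of course more elaborate.

// Proxy Auto-Config: the browser calls this function for every URL
function FindProxyForURL(url, host) {
  // everything, including private address space traffic, is sent
  // through the Access Proxy, which is why a local printer only
  // becomes reachable after switching the extension to "Off:Direct"
  return "PROXY accessproxy.example.com:443";
}

minimal example of a PAC file routing all traffic through the Access Proxy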

Infrastructure components

In the sections above we have often talked about different trust levels. In the following section, we take a closer look at trust tiers and how BeyondCorp structures its infrastructure components.

Each resource is associated with a minimum trust tier that is necessary for access: the more sensitive the information, the higher the required tier and thus the more trust is needed. If an employee wants to access a resource, the trust in the employee and his device is evaluated first and a trust tier is assigned to him. It is then checked whether the employee’s trust tier is equal to or higher than the trust tier of the requested resource. The advantage of this tiering is that the maintenance costs of highly secured devices, e.g. for support and lost productivity, are kept low and usability is improved at the same time.

On the architecture side, the Trust Inferer is responsible for classifying trust: it continuously analyzes the device state and derives the trust tier from it. For this purpose, it uses the information of the Device Inventory Service, which in turn aggregates various data sources (see figure below). If, for example, a laptop has not applied an operating system security patch, this may be less severe for a laptop with a low trust tier than for one that was initially assigned a high trust tier. The laptop with the high trust tier could be temporarily downgraded because of the missing patch until the patch is applied. In this way, employees are always encouraged to keep their software up to date. If the trust tier has dropped to a minimum, consequences can also follow on the network side: a completely outdated laptop can be moved to a quarantine network until the device is rehabilitated. This restricts access to resources as far as possible and protects confidential information.
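
A sketch of how such a downgrade could be derived from inventory data; the tier values, thresholds and field names are illustrative.

// illustrative Trust Inferer rule: derive the effective tier from inventory facts
function inferTrustTier(device) {
  let tier = device.baseTier;                  // tier the device was initially assigned
  if (device.missingPatchDays > 90) {
    return { tier: 0, network: 'quarantine' }; // completely outdated -> quarantine network
  }
  if (device.missingPatchDays > 30) {
    tier = Math.min(tier, 1);                  // temporary downgrade until the patch is applied
  }
  return { tier, network: 'unprivileged' };
}

console.log(inferTrustTier({ baseTier: 3, missingPatchDays: 45 }));
// -> { tier: 1, network: 'unprivileged' }

hypothetical trust tier inference from Device Inventory Service data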

Architecture of the BeyondCorp Infrastructure Components [7:  Design to Deployment at Google (BeyondCorp)]

BeyondCorp Migration

As the previous sections have shown, it is not easy to convert a company’s own intranet to a BeyondCorp-style model, since a considerable amount of restructuring is required, both on the network side and in the overall architecture. The following section gives some advice on how to approach this and how Google implemented the restructuring.

First of all, it is important to realize that the conversion is initially less a technical effort and more an organizational one. One must be aware that it affects the entire company, including all employees, and the idea should therefore be communicated early. The goal is to get maximum support at all management levels, which also means that everyone in management must have understood the benefits of the restructuring for the company. Reducing the risk of attacks while improving productivity at the same time could be one argument, for example. A risk table can help to make this understandable, as shown in the following example:

Own figure

Once management is convinced that the changeover makes sense, supporting processes can be set up through early communication, for example in the form of formal change management.

It is also important to be aware that the changeover is a lengthy process. The renewal can only happen incrementally, as many layers are affected, such as the network, security gateways, client platforms and backend services. It therefore makes sense to set up migration teams for the different layers and to appoint a leader for each who coordinates with the leaders of the other layers.

Automatic transfer of employees to Managed Non-Privileged Network

The idea at Google was to keep the administrative effort of moving employees from the privileged network to the unprivileged network as small as possible. For this purpose, a pipeline was developed that automatically moves users away from the VPN to BeyondCorp with the Chrome extension. The pipeline consists of three phases and starts in logging mode: a traffic monitor was installed on each device, and each call from the privileged network is analyzed against an Access Control List (ACL) and classified as to whether the same call would also have been possible from the unprivileged network, i.e. whether the same service would have been reachable via the Access Proxy. The result was logged and recorded. The content of the ACL was stored centrally in a repository together with the source IP address, to identify the user, and the destination IP address, to determine which service was not reachable. In this first phase it was possible to analyze relatively quickly which services were not yet connected to the Access Proxy but were in high demand among employees. From this, a prioritization list could be created defining in which order services should be attached. Logging mode ran until the following rule came into effect: if the employee could have reached more than 99.9% of the content via the unprivileged network over 30 days, he is put into enforcement mode after an e-mail notification and with his consent. Enforcement mode differs from logging mode in that requests that would not have been reachable from the unprivileged network are captured and dropped. If the employee has then been able to reach more than 99.99% of his requests via the unprivileged network over a period of 30 days, he is transferred to the unprivileged network after another e-mail notification. If less than 99.99% of the requests are reachable from the unprivileged network, or the employee rejects the move, he is automatically downgraded back to logging mode. With this approach, more than 50% of all employees could be transferred to the unprivileged network automatically.
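
The transition rules between the modes can be sketched as follows; the window length and thresholds are taken from the description above, while the data structure and function names are illustrative.

// requests: records of the last 30 days, each marked as reachable via the Access Proxy or not
function reachableShare(requests) {
  const ok = requests.filter((r) => r.reachableViaAccessProxy).length;
  return requests.length === 0 ? 1 : ok / requests.length;
}

function nextMode(currentMode, requests, userConsented) {
  const share = reachableShare(requests);
  switch (currentMode) {
    case 'logging':
      // > 99.9% reachable over 30 days -> enforcement mode (after e-mail and consent)
      return share > 0.999 && userConsented ? 'enforcement' : 'logging';
    case 'enforcement':
      // > 99.99% reachable over 30 days -> Managed Non-Privileged network,
      // otherwise (or if the user declines) back to logging mode
      if (!userConsented) return 'logging';
      return share > 0.9999 ? 'managed-non-privileged' : 'logging';
    default:
      return currentMode;
  }
}

hypothetical implementation of the mode transitions in the migration pipeline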

The pipeline for moving Google computers to the Managed Non-Privileged (MNP) network [9: Maintaining Productivity While Improving Security (Migrating to BeyondCorp)]

BeyondCorp Remote Access

In early 2020, Google launched BeyondCorp Remote Access, a SaaS solution designed to help companies, especially during COVID-19, work securely from home without VPN access. The reason for launching it at this point, namely to offer an alternative to VPN, is that VPN had become a bottleneck: due to the sudden shift from the office to the home office, many IT departments could not provide a sufficiently large or stable VPN infrastructure for all employees. Google heard from many customers that this made it impossible to access internal web applications such as customer service systems, software bug trackers and project management dashboards that would otherwise have been easily accessible from the company’s own network via web browser.

As a result, BeyondCorp Remote Access was released as a zero-trust solution based on Google’s own BeyondCorp system. In addition to the aforementioned advantage of providing a fast solution without VPN, Google promises that customers can easily access internal web applications. The proxy service also has enforcement policies that are checked depending on location, user and device. Google provides the following example of such a policy in its blog entry: “My HR managers who work from home with their own laptops can access our web-based document management system (and nothing else), but only if they use the latest version of the operating system and use phishing resistant authentication such as security keys.”

Another advantage of BeyondCorp Remote Access is its rapid deployment. With little local technology required and the ability to migrate individual applications incrementally, the Google service can be integrated into the corporate structure quickly; Google advises that during a pandemic the key services should be connected first and further ones added incrementally to keep employee productivity high. Hardly any network-side architectural changes or additional security controls are needed, since internal web applications can remain hosted in the same location; BeyondCorp Remote Access only takes care of the connection between application and employee. Finally, with the proxy service a company can hand off time-consuming deployment, maintenance and infrastructure management tasks to the cloud and simplify licensing, which also promises easy scaling, low latencies and redundancy.

Overview BeyondCorp Remote Access Architecture [12: https://medium.com/andcloudio/remote-access-with-beyondcorp-f3bedd1432f2]

How does BeyondCorp Remote Access work?

If the user tries to open a web application, the access first goes to the Cloud Identity-Aware Proxy (IAP). In addition to load balancing and encryption, the IAP also takes care of authentication and authorization, using Google Accounts for this purpose. It is also possible to connect a local identity management system such as Active Directory. In this case, Google Cloud Directory Sync is used to synchronize user names with Cloud Identity, while passwords remain stored locally and SAML (Security Assertion Markup Language) is used as SSO to authenticate users against the existing local identity management system. The connection between client and proxy then works analogously to Google’s BeyondCorp via a Chrome extension. This extension collects and reports device information that is constantly synchronized with the Google Cloud and can be stored in a device inventory database there.

Subsequently, IAM (Identity and Access Management) roles can be used during authorization to decide whether or not the user is granted access. Behind the firewall sits the IAP Connector, which forwards the traffic secured by Cloud IAP to the local applications. This is supported by DNS entries that create public domain names for the internal local apps and point them to the IAP proxy’s IP address. This allows access to a locally hosted enterprise application; it is also possible to integrate Google Cloud apps and applications from other clouds.
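
To make the flow more concrete, the following Express sketch shows what an application behind the IAP Connector might do with the forwarded identity; the header name, the verifyAssertion helper and the role model are illustrative assumptions and not the actual Cloud IAP API.

const express = require('express'); // npm install express
const app = express();

// illustrative: in a real deployment the assertion would be a signed JWT
// added by the identity-aware proxy and verified against its public keys
function verifyAssertion(assertion) {
  return assertion ? { email: assertion, roles: ['hr-viewer'] } : null;
}

function requireRole(role) {
  return (req, res, next) => {
    const identity = verifyAssertion(req.header('x-proxy-identity-assertion'));
    if (!identity) return res.status(401).send('no valid identity assertion');
    if (!identity.roles.includes(role)) return res.status(403).send('forbidden');
    req.identity = identity;
    next();
  };
}

// only HR viewers may reach the document management endpoint
app.get('/documents', requireRole('hr-viewer'), (req, res) => {
  res.json({ user: req.identity.email, documents: [] });
});

app.listen(8080);

hypothetical backend middleware that trusts only requests carrying a proxy-asserted identity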

Connecting BeyondCorp Remote Access and local web application [12: https://medium.com/andcloudio/remote-access-with-beyondcorp-f3bedd1432f2]

To link the company’s internal network traffic with Google Cloud and remote access in the first place, Google offers three options. The first is Dedicated Interconnect, a direct connection to Google: the traffic flows directly from network to network, not over the public Internet. The second is Partner Interconnect, which offers more connection points through a supported service provider: here the traffic between the networks is routed via that provider, but likewise not via the public Internet. The last option is IPsec VPN, where the traffic is extended into a Google Cloud VPC (Virtual Private Cloud) network, which also enables the use of private IP addresses.

Reservations about BeyondCorp Remote Access

While BeyondCorp Remote Access offers many advantages, it also raises some concerns, which are discussed below:

First, BeyondCorp Remote Access is limited to web and cloud-based applications. In the long term Google plans to support local applications as well, but this is not yet possible. Another drawback is that each application must be integrated into the system individually. Google therefore recommends, in times of the pandemic, prioritizing applications by importance and deciding which should be connected first; the integration has to happen incrementally, and there is no generic solution that connects all applications at once with a single configuration.

Another point is the deep integration of the Google Cloud with the company network, which entails both technical and financial dependency. Technical, because for web applications moved to the cloud both the control plane and the data plane run through it, and in the event of a technical problem the administrators can do nothing to remedy it themselves; they have to wait until Google gets the problem under control. In March 2019, for example, there was an operational disruption in the Google Cloud that would have made the company network unreachable from the outside. The financial dependency should not be neglected either: if the entire company architecture is tied to the Google Cloud over time, the company also depends on its pricing policy, and if prices rise, moving to an alternative system will be very expensive and possibly not worthwhile.

Finally, data protection is an important issue. Depending on how sensitive the data is, a company must consider whether it should be linked to the Google Cloud at all. Since all requests run via Google’s identity proxy, it is questionable whether every company wants to give Google such deep insight into its systems. The same applies to user identification: even if an Active Directory system is integrated, user names are still synchronized via the Google Cloud. Moreover, not all institutions are allowed to integrate BeyondCorp Remote Access; for example, the HdM would not have the necessary authority to connect students to the intranet via remote access, because SSO must not be synchronized from its LDAP.

Conclusion

In summary, a zero trust approach makes sense in any case. Compared to VPN access and a firewall as a single perimeter, a Zero Trust solution greatly simplifies security and also reduces the complicated administrative overhead of integrating mobile devices and cloud systems. Each access is evaluated not only on the basis of authorization, but also in the context of the respective request, which allows a much more fine-grained decision on whether access is permitted given the time, place and device. BeyondCorp Remote Access is also very useful for small companies, especially in times of COVID-19, to enable easy and fast access from the home office without VPN. However, the dependency on Google is a risk that one must be aware of and evaluate in the company context. It may be worthwhile in the medium term to fall back on BeyondCorp Remote Access during the pandemic, but in the long term it is worth planning a strategy to set up one’s own zero trust model.

Further Reading

https://gcppodcast.com/post/episode-221-beyondcorp-with-robert-sadowski/

https://cloud.google.com/solutions/beyondcorp-remote-access?hl=de

https://cloud.google.com/beyondcorp?hl=de

https://www.computerwoche.de/a/zero-trust-verstehen-und-umsetzen,3547307

Sources

[1]: Bhattarai, Saugat & Nepal, Sushil. (2016). VPN research (Term Paper). 10.13140/RG.2.1.4215.8160. (Accessed 31.9.2020)

[2]: Sridevi, Sridevi & D H, Manjaiah. (2012). Technical Overview of Virtual Private Networks(VPNs). International Journal of Scientific Research. 2. 93-96. 10.15373/22778179/JULY2013/32. 

[3]: Minnich, S. (2020, August 13). Heise Medien GmbH & Co. KG. Retrieved September 05, 2020, from
https://www.heise.de/download/specials/Anonym-surfen-mit-VPN-Die-besten-VPN-Anbieter-im-Vergleich-3798036

[4]: Helling, P. (n.d.). Was ist VPN? Retrieved September 05, 2020, from https://www.netzorange.de/it-ratgeber/vpn-bietet-sichere-verbindungen-auf-unsicheren-kanaelen/

[5]: Török, E. (2009, August 10). NAC-Grundlagen, Teil 1: Sicheres Netzwerk durch Network Access Control. Retrieved September 13, 2020, from https://www.tecchannel.de/a/sicheres-netzwerk-durch-network-access-control,2020365,3

[6]: Ward, R., & Beyer, B. (2014, December). A New Approach to Enterprise Security (BeyondCorp). 39(6)

[7]: Osborn, B. A., Mcwilliams, J., Beyer, B., & Saltonstall, M. X. (2016). Design to Deployment at Google (BeyondCorp). 41(1)

[8]: Luca Cittadini, Batz Spear, Betsy Beyer, & Max Saltonstall. (2016). The Access Proxy (BeyondCorp Part III). 41(4)

[9]: Peck, J., Beyer, B., Beske, C., & Saltonstall, M. X. (2017). Maintaining Productivity While Improving Security (Migrating to BeyondCorp). 42(2)

[10]: Victor Escobedo, Betsy Beyer, Max Saltonstall, & Filip Żyźniewski. (2017). The User Experience (BeyondCorp 5). 42(3)

[11]: Hunter King, Michael Janosko, Betsy Beyer, & Max Saltonstall. (2018). Building a Healthy Fleet (BeyondCorp). 43(3)

[12]: Gunjetti, D. Kumar. (2020, August 28). Remote Access to Corporate Apps with BeyondCorp. Medium. Retrieved September 13, 2020, from
https://medium.com/andcloudio/remote-access-with-beyondcorp-f3bedd1432f2

[13]: Keep your teams working safely with BeyondCorp Remote Access. (n.d.). Google Cloud Blog. Retrieved September 13, 2020, from https://cloud.google.com/blog/products/identity-security/keep-your-teams-working-safely-with-beyondcorp-remote-access/

[14]: ProcureAdvisor, & *, N. (2019, February 14). The definitive guide to Software-defined perimeter. Retrieved September 13, 2020, from https://procureadvisor.com/the-definitive-guide-to-software-defined-perimeter/

[15]: „ZTNA”-Technologien: Was ist das, warum jetzt erwerben und wie wählt man die richtige Lösung? (n.d.). Retrieved September 13, 2020, from https://www.zscaler.de/blogs/corporate/ztna-technologies-what-they-are-why-now-and-how-choose

[16]: Problem mit Google Cloud: Massive Störung bei mehreren Google-Diensten. (n.d.). Retrieved September 13, 2020, from https://www.handelsblatt.com/technik/it-internet/problem-mit-google-cloud-massive-stoerung-bei-mehreren-google-diensten/24413414.html

[17]: Darril. (2015, March 19). Network Access Control. Retrieved September 14, 2020, from
https://blogs.getcertifiedgetahead.com/network-access-control/
