Cloud Gaming lässt sich mit Remote Desktops, Cloud Computing und Video on Demand Diensten vergleichen. Im Grunde beinhaltet Cloud Gaming das Streamen von Videospielen aus der Cloud zum Endkunden. Dabei erfasst und überträgt der Client seine Nutzereingaben (bspw. Maus, Tastatur, Controller) an den Server. Während dieser wiederum die Gesamtspielweltberechnung sowie Nutzereingaben-Auswertung bewältigt. Der Client stellt lediglich die Kapazität bereit, die Frames in der gewünschten Qualität gestreamt zu bekommen. Wesentliche Vorteile ergeben sich für leistungsschwache Geräte, während der offenkundige Nachteil in der Notwendigkeit einer gut ausgebauten Dateninfrastruktur liegt.
Neben Game Streaming gehört zum Cloud Gaming auch das Hosten von Serverinstanzen für Onlinespiele, sowie das Bereitstellen von Plattformdiensten (bspw. Bestenlisten, Chatsysteme, Authentifizierung etc.). Wesentlicher Bestandteil ist des Weiteren das Angebot von leistungsfähigen Downloadservern.
Plattformdienste und etablierte Nutzung der Cloud
Plattform- oder Onlinedienste bieten Schnittstellen, welche Spielemetadaten zur Verfügung stellen. Diese Dienste Umfassen in der Regel Funktionalitäten wie Bestenlisten, Chat- und Gruppensysteme sowie Onlinelobbys, aber auch Metadienste wie Authentifizierung, Analyse, Zuordnung von Spieleridentitäten und Matchmaking. Hierbei können Dienste sowohl öffentlich im Internet als auch intern für andere Dienste zur Verfügung stehen. Während diese Systeme zu Beginn ihres Aufkommens noch als einzelne Monolithen bereitgestellt wurden, bieten heutige Cloudbetreiber solche Dienste als Microservices an. Dies sorgt dafür, dass sich die hohe Skalierbarkeit der Cloud auf die Plattformdienste übertragen lässt. Diese Skalierbarkeit ist vor allem deshalb wichtig geworden, da der Spielemarkt in den letzten Jahren stark gewachsen ist und auch die Spiele selbst immer ressourcenintensiver geworden sind. Als gutes Beispiel kann man hierfür eine der größten Spieleplattformen hernehmen: Steam. Steam integriert viele der angebotenen Spiele in die eigenen Plattformdienste. Dies umfasst Beispielsweise eine Freundesliste, Chatsysteme, Matchmaking und Verbindungssysteme. Zusätzlich wird auch die dahinter liegende Infrastruktur für die Vermarktung der Spiele von Steam gestemmt. Dies umfasst einen Webshop und Downloadserver.
Der große Aufwand und die Nachfrage nach diesen Diensten zeigen sich anhand des weltweiten Datenverkehrs von Steam. So kommt zum Zeitpunkt dieses Blogeintrags allein der deutsche Datenverkehr auf über 35 Petabyte innerhalb der letzten 7 Tage. Und dies wiederum entspricht lediglich ca. 4,3% des weltweiten Datenvolumens.
Diese Zahlen steigen vor allem dann rasant an, wenn es zu speziellen Aktionen kommt, wie etwa dem Release eines neuen, stark erwarteten Spieles oder Sale-Aktionen vergleichbar mit einem Black-Friday für Games. Hierbei kommt es dazu, dass teilweise Millionen von Spielern gleichzeitig das neue Produkt erwerben und herunterladen wollen.
Diese starke Belastung spüren aber nicht nur Shop- und Downloadserver, sondern natürlich auch die klassischen dedizierten Spieleserver selbst. Gerade beim Release eines neuen Massive Multiplayer Onlinespiels (MMO) oder einer neuen großen Inhaltserweiterung versuchen sich gleichzeitig mehrere tausende Spieler auf den Spieleservern einzuloggen, während es zum Normalbetrieb meist nur ein Bruchteil der User ist. Auch hier hilft die hohe Skalierbarkeit der Cloud. Da solche Ereignisse normalerweise zu dem Betreiber bewussten und geplanten Zeiten auftreten können allerdings im Vorfeld schon Ressourcen reserviert und bereitgestellt werden.
Technische Herausforderungen des Game Streaming
Während Plattform- oder Onlinedienste auf die heute weit verbreiteten und gut etablierten Microservice Strukturen und Architektur zurückgreifen, eröffnen sich mit Games as a Service oder Game Streaming ganz neue Herausforderungen. Die Simulation des eigentlichen Spiels kann noch problemlos in einer emulierten Umgebung ablaufen und seine Inputs von außen beziehen, sowie das berechnete Resultat nach außen über einen Video stream abgeben. Das wahre Problem liegt allerdings in der Latenz. Bei Games handelt es sich im Gegensatz zu den meisten anderen Medien um ein interaktives Medium. Das heißt auf eine Aktion des Nutzers muss idealerweise eine unmittelbare Reaktion des Mediums erfolgen. Bei Game Streaming sind die Ansprüche daran besonders hoch, wenn es mit anderen interaktiven Medien wie einem Livestream mit Live Chat verglichen wird. Hier sind Verzögerungen von bis zu 1 Sekunde noch akzeptabel. Bei Spielen hingegen wird eine Latenz von wenigen Millisekunden erwartet. In dieser Zeit muss also die Eingabe beim Client registriert, an den Server geschickt, dort verarbeitet und ein neues Bild an den Client zurückgeschickt, decodiert und angezeigt werden.
Was kann Cloud Gaming?
Viele Vorteile aus Cloud Computing und Video on Demand Diensten können sich direkt auf Game Streaming übertragen lassen.
Für das Spielen von Games aus der Cloud wird keine teure, eigene Hardware benötigt. Der Streamingdienst Betreiber stellt die nötige Hardware zur Verfügung, um das gewünschte Spiel auf einer maximalen Qualitätsstufe darstellen zu können.
Um die Wartung, Instandhaltung und Modernisierung der Hardwaresysteme kümmert sich der Streaming Anbieter. Für den Endkunden fallen dadurch keine hohen Einzelkosten an.
Spiele stehen jederzeit und überall auf vielen verschiedenen Endgeräten zur Verfügung. Auch leistungsschwache Geräte wie Smartphones, Smart TVs oder einfache Laptops können somit zum Spielen Hardwareintensiver Titel verwendet werden
Das Manipulieren und Betrügen in Online- und Singleplayerspielen wird durch das Streaming erheblich erschwert. Dies resultiert direkt aus der Begrenzung der Interaktionspunkte mit dem Spiel. Lediglich die Bildausgabe und die Nutzereingabe finden auf dem lokalen Gerät statt. Jegliche weitere Informationsverarbeitung, wie beispielsweise die Position eines Mitspielers bleiben dem Nutzer verborgen.
Neben den Vorteilen gibt es jedoch auch einige Nachteile:
Für Gamestreaming ist zwingend eine durchgehende, leistungsstarke Internetverbindung von Nöten. Denn im Gegensatz beispielsweise zum Video on demand kann bei Game Streaming kein Buffering verwendet werden. Dies ist bei Spielen allerdings nicht möglich, da der weitere Verlauf des Spieles direkt von den Eingaben des Spielers abhängig ist. Ein Verbindungsverlust führt somit zwangsläufig zu einer abrupten Unterbrechung der Spielsession.
Schwankungen in der Bandbreite führen zu einer Drosselung der Bildqualität und mindern damit das Spielerlebnis.
Das Übertragen bereits in Besitz befindlicher Spiele werden von vielen Anbietern nicht unterstützt. Dies kann dazu führen, dass Spiele entweder nicht zur Verfügung stehen oder nochmals auf der Streaming Plattform gekauft werden müssen.
Das Modifizieren der eigenen Spieldaten wird unterbunden, da nicht lokal auf die Spielinhalte zugegriffen werden kann. Dies verhindert, dass Spieler Modifikationen erstellen und ihr Spielerlebnis bei Bedarf individuell anpassen können.
Der Anbieter entscheidet welche Publisher und Franchise in seinem Portfolie aufgenommen werden. Dies erschwert vor allem kleine Studios oder Indie Entwickler sich auf dem Markt zu etablieren.
Spiele stehen zusätzlich nur solange zur Verfügung so lange sie sich im Angebot des Streamingdienst befinden oder dieser die Lizenzen hierfür besitzt.
Cloud Gaming umfasst mehr als nur Game Streaming. Es ist bereits fester Bestandteil der heutigen Infrastruktur, da ein Großteil der Spiele auf Cloud-Dienste zurückgreift oder durch Plattformen wie bspw. Steam in diese integriert wird. Zwar steht Game Streaming an sich gerade erst in den Startlöchern, doch es würde mich nicht verwundern, wenn immer mehr Nutzer umsteigen oder zumindest teilweise auf dessen Vorteile zugreifen würden. Meiner Meinung nach wird es in absehbarer Zeit kein kompletter Ersatz für alle Spieler werden. Allerdings bin ich der Überzeugung, dass Game Streaming zum Netflix der Spieler wird, da es aufgrund der vorhandenen Technologien, Infrastruktur und Kunden über ein hohes Potential verfügt. Die Entwicklung der Streaming Ttechnologien steht allerdings erst an ihrem Anfang. Eine Weiterverfolgung wird sich in jedem Fall lohnen.
Microservices architectures seem to be the new trend in the approach to application development. However, one should always keep in mind that microservices architectures are always closely associated with a specific environment: Companies want to develop faster and faster, but resources are also becoming more limited, so they now want to use multiple languages and approaches. Therefore, it is difficult to evaluate a microservice architecture without reference to the environmental conditions. However, if you separate these external factors, you can still see that a microservice architecture offers much more isolation than other architectures – and can also handle security measures better than, for example, monolithic approaches due to today’s techniques and tools.
by Benedikt Gack (bg037), Janina Mattes (jm105), Yannick Möller (ym018)
This blog post is about getting started with your first large-scale software project using DevOps. We will discuss how to design your application, what DevOps is, why you need it, and a good approach for a collaborative workflow using git.
This post consists of two parts. Part one is this article which is all about finding out which topics you should consider thinking about when starting with DevOps for your own project as well as a basic introduction to these topics and recommendations for further reading. The second part is our “getting started” repository which contains a small microservice example project with additional readme-files in order to explain some of the topics above in practice and for you to try our workflow on your own.
This post is a summary of knowledge and experience we collected while accompanying and supporting a larger scale Online Platform project called “Schule 4.0” over the period of 6 months in DevOps related topics.
Before we can talk about Software Architecture, we have to talk about the life cycle of a typical software project, because a successful software project involves a lot of planning, consists of many steps and architecture is just one of them. Furthermore, actual coding is a late step in the process and can be quite time consuming when not planned properly. Without further ado, let’s have a look at a typical Software Development Life Cycle:
The first two steps, “Ideation” and “Requirements”, basically mean thinking about your project from a business and user perspective. Technical details are not important and might even be inappropriate, as you should involve all members of your small team in the process and not all of them have the technical knowledge. Also, don’t forget to write everything down!
In the Ideation phase, after you brainstormed ideas, we recommend creating a Use Case Diagram which shows how the users can interact with your software. You can learn more about that here.
Furthermore, we also recommend creating a requirements catalog for the requirements phase. You can download our template with this link. When you’re done thinking about your project and want to start with the technical stuff, but can’t describe your platform requirements in detail, why don’t you go with a more agile project life cycle? Just make sure you thought about your goal and how to achieve it beforehand.
Planning done? Then let’s start with designing your application, especially Software Architecture.
Difference between design patterns and architectural patterns
If you are familiar with the term Design Patterns like the Singleton and Iterator pattern in software development, don’t confuse them with the term architectural patterns. While design patterns are dealing with how to build the important parts of your software, the defining components, architectural patterns are all about how these parts are organized and playing together.
“[software] Architecture is about the important stuff. Whatever that is.”
So, what you’re going to think about in the design process heavily depends on what you define as important components and which architectural style you choose. Just remember that developing is mostly filling in the blanks and applying some design patterns for whatever you came up with in the design phase or, to put it short, implementing your software architecture.
Which architectural pattern to choose?
3 architectural styles – There is always a trade-off
There are many architectural patterns to begin with. In order to understand what these patterns are all about, here is an introduction to the three major architectural styles. These styles describe the overall idea behind different groups of patterns. Note that it is possible to mix these styles together for your own use case.
N-tier applications are typically divided in X different logical layers and N physical tiers. Each layer has a unique responsibility and can only communicate with its layer below, but not the other way around.
A typical architecture for web applications is the Three-tier-architecture which is described in detail below, but you could also go with a monolithic approach and separate your one tier application into different logical layers, like most game engines and the Windows NT platform architecture do (user and kernel mode).
This style is the developer friendliest and easy to understand if you feel comfortable with monolithic software development. The challenge is to end up with meaningful logical and physical layers and to deploy small changes or features while the platform is already running, which could lead to rebuilding and deploying a large part of the application. If the application is split up into multiple layers, deployment is made easier, but there is a high risk that some layers are unnecessary and are just passing requests to the next layer, decreasing overall performance and increasing complexity.
In thisarchitectural style, the application is split into multiple services, which communicate through a network protocol over a network. Each service is a black box with a well-defined interface, serves only one purpose, most likely a business activity, and can be easily designed for its purpose, for example by using different technologies like programming languages. These services are working by either chaining – service 1 calls service 2 which calls service 3 – or by one service acting as a coordinator. A typical modern implementation is the Microservice architecture, which will be covered in detail later.
Key benefits are:
maintainability: changes will only affect one specific service, not the whole application
scalability: each service can be load balanced independently
reusability: services are available system wide and in “Schule 4.0”, we often ended up reusing an already existing service as a template
The service based style is by far the most difficult architectural style. A big challenge is specifying your services to keep them as decoupled as possible from others, which is strongly dependent on the design of the service APIs. In addition it is much more complicated dealing with not only one but many applications.
This style consists of two decoupled participants, producer and consumer. The producer creates an event and puts it in a queue, typically on behalf of an incoming request from a client, and one of the available consumers consumes the event and processes it. Pure event driven architectures are mostly used for IoT scenarios, but there are variations for the web context like the Web-Queue-Worker pattern.
Their clear benefit is high and simple scalability and short response times. A challenge is how to deal with long running tasks which demand a response, and when to favor direct shortcuts instead of a queue approach.
Architectural patterns in detail
The Three Tier Architecture
This before mentioned architectural pattern is of the N-tier architecture style and consists of three tiers and three layers. This architecture is very common and straightforward when developing simple web applications. The example below is for a dynamic web application.
The topmost layer is the presentation layer, which contains the user interface displaying important information and communicates with the application layer. Whether or not you need a web application or just static web pages, the layer solely runs on the client tier (desktop pc or mobile device) or requires a web server to serve and render the static content for the client. The application tier contains your functional business logic, for example the REST API for receiving and adding user specific dynamic content like shopping cart items, and the last layer, the data tier, stores all data which needs to be persistent.
In the example graph above, the react web application is served by a small web server and runs on the client devices. Next, the NodeJS application exposes a REST API to the client and talks to the mongo document database. The node app and the database are typically deployed into different tiers, most commonly a virtual machine and a dedicated database server or service.
A microservice architecture is by far the hardest choice. This style is most often used by massive tech enterprises with many developer teams in order to run their large-scale online platform and allowing for rapid innovation. There are many big platforms like Amazon.com, Netflix and eBay that evolved from a monolithic architecture to a microservice architecture for many good reasons, but most likely none of them really apply to your project for now. To be honest, none of the reasons applied to our project “Schule 4.0” either, which only consisted of five developers, but in the never-ending river of new technologies we were able to find technologies that made things much easier, convinced us to at least give it a try and made things turn out great in the end. Those technologies will be discussed later.
The core idea is to have multiple loosely coupled services which expose an API. The biggest difference to normal service oriented architectures is that in a microservice architecture, each microservice is responsible for its own data and there is nothing like a centralized data tier. That means that only the specific service has access to its data and other services cannot access it directly other than via the service API.
Highly maintainable, rapid innovation and development
Developers can work independently on a service
Services can be deployed independently
Services are typical small applications and therefore easy to understand
Testability: small services are easy to test
Availability: Errors in one service won’t affect other services
Dynamic technology stack
You can pick whatever technology you want whenever you want
Services are products and not projects: Each developer is responsible for their code over the whole life cycle
Highly distributed system
Requires to deal with continuous integration and deployment (which will be covered below)
Nightmare without containerization or virtualization
Domain Driven Design
Many challenges can be avoided by carefully designing your microservice application. A helpful method for designing your application is the Domain Driven Design approach and its notion of bounded contexts. The idea is that the most important thing is your core business, what your application is about. Our goal is to model the core business as different contexts and designing the architecture after that. In order to do that, here are some tips:
After finishing with your requirements analysis you can start defining the domain of your application. Imagine your project as a company. What is your company about? What are its products? We will call that your domain and the products your contexts.
Let us continue with a simplified example of the “Schule 4.0” model. The platform domain is all about teachers sharing content with their students in the form of pins, boards and exercises. With this knowledge we can already distinguish three contexts:
These contexts consist of different objects which have relationships. For example, a Board can contain multiple Pins and a User can own multiple Boards. Every context can be implemented as its own microservice. The crucial point later is how the services are connected to each other. Remember, all the data is physically separated into different services but needs to be logically connected in order to query data. That is where the bounded contexts appear.
Note in the graph above, that the User object can be found in multiple contexts. The object itself is most likely modelled differently in each context, but the overall idea of a user stays the same. The different User objects are explicitly linked together over the UserId attribute. Because the concept of a User is shared across multiple Contexts, we can query all data related to the current user.
Communication in a Microservice architecture – GraphQL Federation
After we have modelled the domain of our application, everything is connected logically, but we still cannot request and change data, because we did not define the service APIs yet. We considered two options:
Specifying a RESTful API for each service using HTTP
Specifying a Graph API for each service using GraphQL
Do a mixture of both
Note: We do not recommend doing a mixture of both. Historically speaking, there are many solutions migrating from REST to Graph, but starting from scratch there is no need for that.
Regardless of which API flavor you choose, the data is still spread over multiple services. For example, if you want to display the dashboard for the current user, pin and board data, exercises and user data need to be collected from different services. That means the code displaying the dashboard needs to interact with each service. It is not a good idea to leave that task to the client application, due to various performance and security reasons. In general, you should avoid making the internals of your backend publicly available.
That is why all microservice architectures are equipped with an API Gateway, which is the only entry point to your backend. Depending on the use case, it aggregates data from different services or routes the request to the appropriate service.
The services expose simple http endpoints to the gateway for manipulating and retrieving data. The gateway itself only exposes an endpoint for retrieving the dashboard for the current user. The UserId will be included in the requests by the client. In order to create and to get specific boards, the gateway needs to be expanded with further endpoints, which might be routed directly to the underlying service.
In conclusion, there is nothing wrong with this approach, especially considering that there are great tools like Swagger which help building large REST APIs, but you still have to design, manage and implement the service APIs and the Gateway API separately.
The second option is different. The idea is to design a data graph which can be queried and manipulated with a language called GraphQL. Imagine it like SQL, but instead of talking to a database you send the statement to your backend.
The following pseudo example query me (name) boards (id, title) returns the name of the current user and its boards as a JSON object.
With GraphQL, the client can precisely ask only for the data it needs, which can reduce network traffic a lot. The best part is that you already possess a data graph if you have created a domain model beforehand. You just have to merge the different contexts and attributes of the same objects to one coherent graph.
It gets even better: By using Apollo GraphQL Federation, most of the implementation for your Graph API is done automatically. For this to work, you only have to define the data graph for each service, which are just the contexts from your domain model, and setup the GraphQL API Gateway. The implementation is straightforward:
Write down your service graph in the GraphQL Schema Definition Language (SDL)
Implement the Resolvers, which are functions for populating a single attribute/field in your graph
Note: instead of requesting every field from the database separately, you can request the whole document and Apollo GraphQL generates the resolvers automatically
Implement the Mutation Resolvers, which are functions for updating and creating data
Tell the GraphQL API Gateway where to find its services
The Gateway then automatically merges the service schemas to one coherent schema and automatically collects the requested data from the implementing services. It is even possible to reference objects in other services and the Gateway will combine the data from different services to a single object.
GraphQL Federation – Limitation
As good as Apollo GraphQL Federation sounds on paper, it is not an all-round solution. In reality, you will always have to climb many obstacles no matter which decisions you make.
One technical limitation you might come across is when you try to delete a user. To do so, you have to decide which service defines and implements the deleteUser mutation. It is not possible (yet?) to define the same mutation in multiple services.
Because deleting a user also involves deleting its referenced pins and boards, the PinBoards service needs to be notified through an additional API, which is only accessible internally and not exposed to the client API.
Furthermore this additional API makes testing your service more complicated. Testing will be covered in detail in our example repository.
In a successful project, the simple cooperation is one, if not the central point for success. What is more annoying than merge-conflicts or accidentally pushed changes that cause the program to crash… But we can make arrangements to not have problems like this during our project – let’s start:
For the local development you all need a tool to make your git commits, push them, pull the others and so on….
For Windows systems and commandline-lovers we recommend the git bash from https://gitforwindows.org/. If you better like to click and see what you do you can use https://www.sourcetreeapp.com or the implementations from e.g. IntelliJ or VSCode to see what you do and organize your collaboration via git.
In our git repository, we first go to the project settings and activate Merge Requests and Pipelines. Afterwards the navigation sidebar has two new entries.
The following step-by-step instruction will give you a good workflow to work together with the configured git-Repository.
Locally checkout the master (or Develop-branch) via git checkout master or in Sourcetree.
Create a new branch and name it sensefully. A good way to hold your repository clean is to sort it in directories by naming it like your abbreviation and the feature or fitting bug-ticket if you, for example, use the ticket-system in gitlab.
Let’s start coding, fixing bugs, developing new features, testing your code without crashing the master and only locally on your machine.
If your local changes are working as wanted, commit your changes with a namingful commit-message. Beware of committing wrongly changed files. If you do not want to use Sourcetree you can do it with the command git commit -a -m “commit-message” .
Then push your changes on your branch with Sourcetree or with git push origin <feature-branch> .
Go to the web interface and create a new merge-request. You can add not only a commit-message but more text, screenshots and files to visualize your changes to your team members.
Inform your team members so they can look at your changes, comment them, leave suggestions and finally approve it if all is fine.
Merge your feature-branch to the master or develop-branch if other team members gave their approvals.
If all tests are green and the build is ok, you can delete your extra branch to not leave so much data-waste.
To work successful with this model, think on some points:
branch new branches always from master
do not commit or push manually to the master
before a merge request merge the newest master to your feature-branch
not to much branches at the same time
small feature branches, no monsters…
name your branches senseful
CI & CD Pipeline
What is DevOps?
The term DevOps is derived from the idea of agile software development and aims to remove silos to encourage collaboration between development and operations. From this principle the term DevOps = development + operation is derived (Wikipedia, 2020).
To achieve more collaboration, DevOps promotes a mentality of shared responsibility between team members. Such a team shares responsibility for maintaining a system throughout its life cycle. At the same time, each developer takes responsibility for his own code, i.e. from the early development phase to deployment and maintenance. The overall goal is to shorten the time between the development of new code until it goes live. To achieve this goal, all steps that were previously performed manually, such as software tests, are now fully automated through the integration of a CI/CD pipeline.
The full automation allows a reduction of error-prone manual tasks like testing, configuration, and deployment. This brings certain advantages. On one hand, the SDLC (Software Development Life Cycle) is more efficient, and more automation frees team resources. On the other hand, automated scripts and tests serve as a useful, always up-to-date documentation of the system itself. This supports the idea of a pipeline as code (Fowler, 2017).
The structure of the CI/CD pipeline is defined within a YML file in a project’s root and formulates so-called actions, action blocks, or jobs. Pipeline jobs are structured as a block of shell commands which allow, for example, to automatically download all necessary dependencies for a job and automatically execute scripts. A pipeline contains at least a build, test, and deployment job. All jobs are fully or partly automated (GitLab CI/CD pipelines, 2020). A partial automation of a job includes manual activation steps. The benefits of a pipeline integration are, among others, an accelerated code cycle time, reduced human error, and a fast and automated feedback loop to the developer himself/herself. In addition, costly problems when integrating new code into the current code base are reduced.
A pipeline is the high-level construct of continuous integration, delivery, and deployment. The jobs executed in a pipeline, from a code commit until its deployment in production, can be divided into three different phases, depending on the methodology to which they can be assigned. These are the three continuous methodologies:
Continuous Integration (CI)
Continuous Delivery (CD)
Continuous Deployment (CD)
While Continuous Integration clearly stands for itself, Continuous Delivery or Continuous Deployment are related terms, sometimes even used synonymously. However, there is a difference, which is also shown in Figure 4 and will be further explained in the following.
What is Continuous Integration (CI)?
The term Continuous Integration can be traced back to Kent Beck’s definition of the Extreme Programming (XP) process, which in turn is derived from the mindset of agile software development, which not only allows short cycle times but also a fast feedback loop (Fowler, 2017). Within such a software practice, each member of a team merges his/her work at least daily into the main branch of a source code repository. All integrations are automatically verified by an automated build which also includes testing and code quality checks, e.g. linting, to detect and fix integration errors early. This approach allows a team to develop software faster and reduces the time spent manually searching for and identifying errors.
Further, each team member bears the same responsibility for maintaining and fixing bugs in the software infrastructure and for maintaining additional tools such as the integrated CI/CD pipelines. Furthermore, a pipeline is not static but has to be monitored, maintained, and optimized iteratively with a growing codebase. To achieve high-quality software products, everyone must work well together in a team culture free of the fear of failure (Fowler, 2017).
Some important CI Principles
Continuous Integration is accomplished by adhering to the following principles:
Regular integration to the mainline: New code needs to be integrated at least once per day.
Maintaining a single-source repository: Keep a stable, consistent base within the mainline.
Shared responsibility: Each team member bears the same responsibility to maintain the pipeline and project over its complete lifecycle.
Fix software in < 10min: Bugs have to be fixed as fast as possible by the responsible developer.
Automate the build: Test and validate each build.
Automate testing: Write automatically executable scripts and keep the testing pyramid in mind.
Build quickly: Keep the time to run a pipeline as minimal as possible.
Test in a clone: Always test in the intended environment to avoid false results.
What belongs in a Source Code Repository?
As the source code repository builds the base for the pipeline, it is important to keep it as complete as possible in order to be able to fulfill CI jobs. As such a source code repository for a CI/CD Pipeline should always include:
Why Containerize pipeline jobs?
Some of you may have come across this problem before – “Defect in production? Works on my machine”. Such costly issues with different environments on different machines (e.g. CI/CD server and production server) can be prevented, by ensuring that builds and tests in the CI pipeline as well as the CD pipeline are always executed in a clone of the same environment. Hereby it is recommendable to use docker containers or virtualization. For virtualization, the use of a virtual machine can be enforced by running a Vagrant file that ensures the same VM setup throughout different machines. In short, automating containerized jobs standardizes the execution and ensures that no errors slip through due to different environments in which build, and tests are executed.
Further, by combining the CI/CD Pipeline with docker, an integration between CI and CD is enabled. Advantages are that the same database software, same versions, same version of operating system, all libraries necessary, same IP addresses, and ports, as well as the same hardware setting, is provided and enforced throughout build, test as well as deployment.
How to identify CI Jobs?
To identify pipeline jobs, you should, together with the team, consider which manual steps are currently performed frequently and repetitively and are therefore good candidates for automation. Such repetitive tasks can be for example testing, building, and deployment, as well as the installation of shared dependencies, or even clean-up tasks to free memory space after a build was executed. The number of jobs can be expanded as desired but must remain self-contained. This would mean, for example, a unit test job contains only the necessary dependencies, shell commands, and unit test scripts that are needed to fulfill the unit test job.
Further, when defining the structure of a pipeline, it is important to not integrate any logicinto the pipeline. This means that all logic must be outsourced into scripts which then can be executed automatically by the pipeline. Furthermore, pipelines need to be maintained and updated over the life cycle of a project to always keep it up-to-date and prevent errors. This means that pipelines also travel the whole software development life cycle (SDLC).
How to integrate Testing?
As already mentioned in the section above (see chapter 1 – Microservices) – When writing test scripts, it is important to look at the entire test pyramid, from automated unit tests to end-to-end tests. These test scripts then can be integrated into a CI pipeline’s test jobs.
Overall, testing is very important to avoid later costs due to time and cost-effective bugs and downtimes in production. When testing architectures, such as microservices, care must be taken to test the individual services not only independently of each other, but also in their entire composition. The challenge in this case is that there are also dependencies between the individual services.
Furthermore, an integration with a real database must also be tested. Since databases are not directly mapped into source code, it makes sense to formulate them in scripts against which tests can be executed. Overall it makes sense to not only mock a database but also test with real database integration.
What is Continuous Delivery (CD)?
The term Continuous Delivery is based on the methodology of Continuous Integration and describes the ability to deliver source code changes, such as new features, configuration changes, bug fixes, etc., in a continuous, secure, and fast manner to a production-like environment or staging repository. This fast delivery approach can only be achieved by ensuring that the mainline code is always kept ready for production and as such can be delivered to an end-customer at any time. From this point on an application can be quickly and easily deployed into production. With this methodology, traditional code integration, testing, and hardening phases can be eliminated (Fowler, 2019).
The Five CD Principles
The five principles are also valid for the other methodologies.
Work in small batches
Computers perform repetitive tasks, people solve problems
Relentlessly pursue continuous improvement
Everyone is responsible
What is Continuous Deployment (CD)?
The term Continuous Deployment implies a previous step of Continuous Delivery where production-ready builds are automatically handed over to a code repository. In conclusion, Continuous Deployment also builds on the methodology of Continuous Integration.
Continuous Deployment implies that each commit should be deployed and released to production immediately. To this purpose, the Continuous Deployment block in the pipeline includes the automation of code deployment via scripts to enable continuous and secure deployment. This follows the fail-fast pattern, which makes it possible to detect errors that have slipped into production through the automated test blocks which lead to an unhealthy server cluster. By implementing the fail-fast pattern, an error is easily correlated with the latest integration and can be quickly reversed. This pattern keeps the production environment downtimes to a minimum while code with new features goes live into production as quickly as possible. In practice, this means that successful code changes, from a commit to the branch to going live will only take a few minutes (Fitz, 2008).
Automated rollbacks and incremental deployments
Even if automated end-to-end tests against a build in a CI pipeline might have passed successful, unforeseen bugs might occur in production. In such case an important capability of pipelines is the possibility of automated rollbacks. In case of an unhealthy state of deployed code in production, a fast rollback allows to return to the last, healthy working state. This includes automatically reverting a build. Another option is incremental deployments, which are deploying new software to one node at a time, gradually replacing the application to gain more control and minimise risk (GitLab, 2020).
The relationship between Continuous Integration, Delivery, and Deployment
The following graph gives a quick overview of the workflow from the completion of a new feature in a working branch until its reintegration into the mainline. The following figure 11 gives a high-level overview of the workflow depicted in a pipeline.
When a new code feature is committed to a remote feature branch, the CI will carry out an automated build. Within this build process, the source code of the feature branch will automatically be checked for code linting and compiled. The compilation is linked to an executable and all automated tests are run by the pipeline. If all builds and tests run without errors the overall build is successful.
Bugs or errors encountered throughout the CI pipeline will make the build fail and halt the entire pipeline. It is then the responsibility of the developer to fix all occurring bugs and repeat the process as fast as possible to be able to commit a new feature and merge it into the mainline to trigger the deployment process. Figure 12 shows such a CI/CD workflow between Continuous Integration, Delivery, and Deployment (Wilsenach, 2015).
Why did we pick the GitLab CI/CD Pipeline?
In total there are many products, from Jenkins, GitHub actions, DroneCi, CircleCi, to AWS Pipeline, etc. on the market. They all offer great services for DevOps integration. After research some free tools, we decided to stay with GitLab for the university project “Schule 4.0”, as Gitlab itself is a single DevOps tool which already covers all steps from project management, source code management to CI/CD, security, and application monitoring.
This choice was also made to minimize the tech stack. Since the university already offers a GitLab account on its own GitLab instance for source code management and issue tracking, we felt it made most sense to minimize the use of too many different tools. Therefore, sticking with one product to integrate a custom CI/CD Pipeline into our project meant having everything in one place, which also simplifies the complexity of the toolchain and furthermore allows us to speed up the cycle time.
What is a GitLab Runner?
To be able to execute and run a CI/CD pipeline job, a runner needs to be assigned to a project. A GitLab runner can be either specific to a certain project (Specific Runner) or serve any project in the GitLab CI (Shared Runner). Shared runners are good to use if multiple jobs must be run with similar requirements. But in our case, as HdM has an own GitLab instance (at https://gitlab.mi.hdm-stuttgart.de/), a Specific Runner installed on our own server instance was the way to go.
The GitLab Runner itself is a Go binary that can run on FreeBSD, GNU/Linux, macOS, and Windows. Architectures such as x86, AMD64, ARM64, ARM, and s390x are supported. Further, to keep the build and test jobs in a simple and reproducible environment it is also recommendable to use a GitLab runner with a Docker executor to run jobs on your own images. It also comes with the benefit of being able to test commands on the shell.
Before installing a runner, it is important to keep in mind that runners are isolated machines and therefore shouldn’t be installed on the same server where GitLab is installed. Further, in case of the need to scale out horizontally, it can also make sense to split jobs and hand them over to multiple runners, installed on different server instances which then are able to execute jobs in parallel. If the jobs are relatively small, an installation on a Raspberry Pi is also a possible solution. This comes with the benefit of more control, higher flexibility, and very important – fewer costs.
For the project “Schule 4.0” we chose the architecture in figure 13 with two independent runners on two different machines. For this purpose, two runners were installed on an AWS EC2 Ubuntu 20.04 and in a VM on the HdM server instance to run the jobs for CI and CD separately.
Challenges and Limitations
The reason for the decision to split CI and CD jobs up was less due to horizontal scaling but the fact that the HdM server used for the deployment is located in a virtual, private network. Due to this isolation of the network, it is not possible to deploy directly from outside the network. Since there were no permissions on the HdM server for customizations, the resulting challenges could be circumvented with a second runner, installed in a VM on the target machine.
Further, it is important to isolate the runner, as when installed directly on a server, which also serves as a production environment, unwanted problems can occur e.g. due to ports that are already in use or duplicate docker image tag-names between the actual build jobs and already deployed builds which can cause failure and halt the pipeline.
To reduce costs, which in our case amounted to 250 US$ (luckily virtual play money) on our educational AWS accounts, in only 2 months, the CI runner was also temporarily installed on a RaspberryPi 3 B+ with a 32 GB SSD card. Hereby it is important to state, that often slow home network speed and large jobs have an impact on RaspberryPi’s overall performance and can make build and testing jobs slow. This is okay for testing the pipeline but takes a too large amount of time when developing in a team. Therefore, to speed things up, the runner for the CI jobs was later again installed on a free tier AWS EC2 Ubuntu instance.
How to install a GitLab Runner?
To install a GitLab runner two main steps need to be followed:
A GitLab Runner can be installed on any suitable server instance e.g. on an IBM or Amazon EC2 Ubuntu instance by using the AWS Free Tier test account or an educational account. The following provides further information on how to getting started with AWS EC2.
Another possibility is to install a runner on your own RaspberryPi. The RaspberryPi 3B+ we used has a 32GB SSD card and uses the Raspbian Buster image which can be downloaded from the official distribution. Be aware that it might cause issues following the standard runner installation on GitLab as the RaspberryPi 3B+ uses a Linux ARMV7 architecture. A good tutorial to follow can be found on the blog of Sebastian Martens. After its installation, the Runner will work even after completely rebooting your Raspberry Pi.
Runner intallations on an AWS EC2 instance
Better network speed than in a private home network.
Faster build times.
Easy and quick setup of an instance.
High costs for services, even with a free tier selection.
Limitations in server configurations were due to free or educational accounts, e.g. memory < 10GB, limitations on server location that can lead to latency and time outs.
The server can be over configured and become unhealthy – all configurations are lost if no backup/clone of the instance has been made.
AWS offers quite complex and ambiguous documentation
In contrast to using a paid service, a more cost-effective solution may be to install the GitLab Runner inside a virtual machine on the HdM server, or on an own RaspberryPi or any other services.
2. Runner installation on a RaspberryPi
Full flexibility over available software and software versions
Costs are lower compared to leased servers including root access
Access data to the target infrastructure is available in the local network
Takes longer to execute the pipeline
Local home-network speed can slow job execution down due to images, dependencies, git repository, etc. that need to be downloaded
Not well suited for the execution of very large jobs
It is recommendable to take a RaspberryPi 3 or even newer
How does the GitLab Runner work?
To better understand how a GitLab Runner picks up jobs from the CI pipeline, assigns and returns the build and test results to a coordinator, the following sequence diagram in figure 13 will be explained in more detail.
A Specific Runner, as used in our project, executes all jobs in the manner of a FIFO (First-in-first-out) queue. When the GitLab Runner starts, it tries to find the corresponding coordinator (the project’s GitLab server) by contacting the GitLab URL that was provided when the runner was registered. When the Runner registers with the registration token also provided at registration, it will receive a special token to connect to GitLab. After a restart, the GitLab Runner connects and waits for a request from the GitLab CI.
A Runner registered to a source code repository listens for change events in a particular branch, which causes the runner to fetch and pull the GitLab repository. The runner then fetches and executes the CI/CD jobs defined in the .gitlab-ci.yml file for the Gitlab server. The build and test results, as well as logging information, are returned to the GitLab server, which then displays those for monitoring purposes. If all jobs were executed successfully each job in the pipeline receives a green symbol and the push or merge onto a branch can be completed.
How to build a GitLab CI/CD Pipeline?
A GitLab CI/CD Pipeline is configured by a YAML file which is named .gitlab-ci.yml and lies within each project’s root directory. The .gitlab-ci.yml file defines the structure and order of the pipeline jobs which are then executed sequentially or in parallel by the GitLab Runner.
Introduction to a Pipeline Structure
Each pipeline configuration begins with jobs that could be seen as a bundled block of command-line instructions. Pipelines contain jobs that determine what is needed to do and stages, which define when the jobs should be executed.
stages: # ------- CI ------ - build - quality - test # ------- CD ------ - staging - production
The stages use the stage-tags to define the order in which the individual pipeline blocks/jobs are executed. Blocks with the same stage-tag are executed in parallel. Usually there are at least the following stage-tags:
build – code is executed and built.
test – code testing, as well as quality checking by means of linting, etc.
deploy – deploy the code to production.
The Pipeline Architecture
The architecture shown in figure 20 is not very efficient, but easiest to maintain. As such it is shown, in combination with the following .gitlab-ci.yml file example, as a basic example, to understand the construction of a pipeline’s architecture. By defining the relationships between jobs a pipeline can be speeded up.
The basic structure of the individual jobs/blocks within a pipeline includes the block name, and subordinate stage with stage tag, script tag with the executable scripts/commands. Each job must also contain a tag for the selected runner (here: gitlab-runner-shell). This structure can easily be extended.
Jobs can be initiated in different ways. This depends on the tags which are assigned to the individual jobs. For example, it can make sense to not fully automate a job. In this case, a job e.g. deploy job, gets a tag for a manual joband thus forces the job to wait for manual approval and release by a developer.
Efficiency and speed are very important when running the jobs through a CI/CD Pipeline. Therefore it is important to think not only about the architecture but also consider the following concepts to speed things up.
Host an own GitLab Runner: Often the bottleneck is not necessarily the hardware, but the network. Whilst GitLab’s Shared Runners are quick to use, the network of a private cloud server is faster.
Pre-install dependencies: Downloading all needed dependencies for each CI job is elaborative and takes a lot of time. It makes sense to pre-install all dependencies on an own docker image and push it to a container registry to fetch it from there when needed. Another possibility is to cache dependencies locally.
Use slim docker images: Rather use a tiny Linux distribution for images to execute a CI job than a fully blown up one with dependencies that you might not even need. In our project we therefore used an Alpine Linux distribution.
Cache dynamic dependencies: If dependencies need to be dynamically installed during a job and thus can’t be pre-installed in an own docker image, it makes sense to cache those. Hereby, GitLab’s cache possibility allows to cache dependencies between job runs.
Only run a job if relevant changes happened: This is very useful, especially for a project that makes use of a microservice architecture. For example, if the front-end changed, the build and test jobs for all the others don’t need to be run as well. Such behavior can be achieved by using the only keyword in a job. The following gives a short example.
More detailed information can be found in the Example Repository’s README files.
Useful tools to be considered for building a GitLab CI/CD pipeline
Avoid syntax errors by the use of CI Linting. In order to avoid syntax errors and to get things right from the start, GitLab offers a web-based linting tool, which checks for invalid-syntax in the gitlab-ci.yml file. In order to use the web-based linting tool, simply add the extension -/ci/lint to the end of your project’s URL in GitLab.
As DevOps and CI/CD is a quite popular and complex topic we also want to introduce some further possibilities to optimize your DevOps, some are the following:
CI with Linting & Testing – Drone CI
Deployment with Jenkins
Linting with Sonarqube
Monitoring and Logging with GitLab CI/CD
Part 2 – “getting started” repository
When you’re done getting comfortable with the topics, it is time to see an example implementation of the theory above. Head over to our “getting started” repository containing a microservice application and further explanations.
DevOps is a big buzzword for many complex tools and things that can make your project easier to work with. It´s a long way from a simple project idea until you have a working infrastructure, but it will help you to work easier and more efficient on your project in the future. For all the described tools and use-cases there are so many alternatives you can use.
On a good software-project all the sections above are important. How to design your software, how to work together, how to make the development-workflow more easy to use and bring your project from your local machine to a reachable server.
With this blogpost and the additional repository content you have an overview about the possibilities with a bunch of instructions on how you can start with DevOps in your project.
This is part two of our series on how we designed and implemented a scalable, highly-available and fault-tolerant microservice-based Image Editor. This part depicts how we went from a basic Docker Compose setup to running our application on our own »bare-metal« Kubernetes cluster.
This is part one of our series on how we designed and implemented a scalable, highly-available and fault-tolerant microservice-based Image Editor. The series covers the various design choices we made and the difficulties we faced during design and development of our web application. It shows how we set up the scaling infrastructure with Kubernetes and what we learned about designing a distributed system and developing a production-grade Kubernetes cluster running on multiple nodes.
The last two years in software development and operations have been characterized by the emerging idea of “observability”. The need for a novel concept guiding the efforts to control our systems arose from the accelerating paradigm changes driven by the need to scale and cloud native technologies. In contrast, the monitoring landscape stagnated and failed to meet the new challenges our massively more complex applications pose. Therefore, observability evolved as a mission-critical property of modern systems and still attracts much attention. The numerous debates differentiated monitoring from observability and covered its technical and cultural impact on building and operating systems. At the beginning of 2019, the community reached consensus on the characteristic of observability and elaborated its core principles. Consequently, new tools and SaaS applications appeared marking the beginning of its commercialization. This post identifies the forces driving the evolution of observability, points out trends we presently perceive and tries to predict future developments.
In our fourth part of Microservices – Legolizing Software Development we will focus on our Continuous Integration environment and how we made the the three major parts – Jenkins, Docker and Git – work seamlessly together.
Today we want to give you a better understanding of the security part of our application. Therefore, we will talk about topics like security certificates and enable you to gain a deeper insight into our auth service.
The microservice structure can generate a heavy communication between many services. Worst case scenario is a long tail of dependencies, resulting in a high latency of the response for the initial request. This can get even worse, e.g. if the services were running on different servers placed in various data centers. Even if some requests can run parallel, the response time for the initial requested service will take at least the answering time of the tail it depends on.