Ynstagram – Cloud Computing with AWS & Serverless

As our semester project for the lecture “Software Development for Cloud Computing”, we set out to build a simple Instagram clone in order to teach ourselves the fundamentals of cloud computing.

Basic concept / goals of the project

Since we had already gained some experience with React in other student projects, we wanted to focus less on the functionality and the frontend of the application and put a stronger emphasis on cloud-specific features and workflows.

Concretely, we planned an Instagram clone with the following basic features:

  • Uploading images / posts
  • Adding a title & description to posts
  • Liking posts
  • Commenting on posts
  • Account management

Design decisions

Frontend

Based on prior experience and good results, we decided to implement the frontend with React. Designing it as a web app also has the advantage that “Ynstagram” is accessible across platforms.

Backend – from Firebase to AWS

We initially started implementing the project with Firebase. On the one hand, this lost its appeal in terms of the learning effect, since we were building our software project with Firebase in parallel. At the same time, the insights from the lectures made us aware of the range of functionality AWS offers.

What we found particularly interesting were the far more versatile ways of using Lambda functions. While in Firebase these can only be triggered by database entries / triggers, here we could also invoke them through API calls. The functionality that can be implemented is also much broader: among other things, we could automatically scale images on upload, and in the future it would be fairly easy to add AI-based analysis of the content. While Firebase quickly runs into limits here, AWS offers a much broader horizon of possibilities.

Nevertheless, the switch was by no means easy, because Firebase offers much better clarity and documentation.

Serverless

Since neither the AWS web console nor creating Lambda functions online appealed to us, we looked for a way to keep all configuration in GitHub whenever possible and to write it in the code editor.

That search eventually led us to Serverless. All buckets, tables and API calls are managed in a serverless.yaml file. New elements can be created much more clearly and quickly, and configurations can simply be taken over from elements that already exist.

Postman

To keep an overview of the API routes we created and to test them easily, we chose Postman. Via a file shared on GitHub, everyone involved in the project can see the current API routes and create new requests.

Implementation / architecture

AWS Services

DynamoDB

Since we had already used Firebase’s NoSQL database “Firestore Database”, we decided to keep this kind of database structure. Compared to SQL databases, the advantage lies in simpler queries thanks to a flatter data structure.

We use DynamoDB tables to store the information belonging to the images, such as title, description, author, etc. The images are linked to the records in the tables via a unique ID.

There are two tables: one in which the raw input is stored first, and another into which the processed records are transferred.

Both tables share the same structure. The central fields are a unique ID, the creation date, the account name of the author, and the title and description of the post. Comments and likes are managed as arrays.
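
To illustrate the structure, this is roughly how a record could be written to the input table with boto3. The original post does not show the actual code, so the table and attribute names here are assumptions:

import uuid
from datetime import datetime, timezone

import boto3

table = boto3.resource("dynamodb").Table("ynstagram-input")  # assumed table name

def create_post(account, title, description):
    post_id = str(uuid.uuid4())                  # unique ID, also used as the S3 object key
    table.put_item(Item={
        "id": post_id,
        "createdAt": datetime.now(timezone.utc).isoformat(),
        "account": account,
        "title": title,
        "description": description,
        "comments": [],                          # comments are managed as an array
        "likes": [],                             # likes are managed as an array
    })
    return post_id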

S3

The images belonging to the posts are stored in S3 buckets. There is one bucket with the original files and one with a reduced resolution. The file name of an image is always the unique ID of the corresponding post.

Cognito

With AWS Cognito we were able to set up our account management in just a few steps. Cognito supports current identity and access management standards such as OAuth 2.0 and SAML 2.0 and also offers the option of implementing multi-factor authentication.

Amplify

We use AWS Amplify to host the frontend of our application and to realize a CI/CD pipeline with a development and a master environment for it. A more detailed explanation can be found in the section “CI/CD Pipeline”.

Lambda

API Gateway

A large part of our Lambda functions is used via API calls. As mentioned at the beginning, an overview of these is stored in a Postman file.

API routes

POST /image-upload

Uploads an image to the S3 bucket. It is called with the image data as well as the associated information such as description and title (JSON format).

POST /image-info

Creates an entry in the DynamoDB table with all the information about a post, which is transmitted as JSON in the request body.

POST /create-file

Creates a file in the S3 bucket. The file name corresponds to the URL parameter.

GET /get-all-images

Returns all posts marked as “valid” as an array in JSON format.

GET /get-file

Returns a file by its file name.

GET /image-info

Returns the information about a single post in JSON format.

PUT /update-image-info

Used to add comments to posts. Updates entries in the DynamoDB table.

PUT /update-likes

Used to add new likes.
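
As a sketch of what the underlying Lambda function might do (runtime, table and attribute names are assumptions, not the project's actual code), a like can be appended to the post's likes array with a single DynamoDB update:

import boto3

table = boto3.resource("dynamodb").Table("ynstagram-posts")  # assumed table name

def add_like(post_id, account):
    # Append the liking account to the "likes" array of the post.
    table.update_item(
        Key={"id": post_id},
        UpdateExpression="SET likes = list_append(if_not_exists(likes, :empty), :new)",
        ExpressionAttributeValues={":new": [account], ":empty": []},
    )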

DynamoDB / S3 triggers

Besides direct API calls, we also use triggers on DynamoDB tables and S3 buckets.

Example flow of an image upload

This is best illustrated by the flow of creating a post.

A POST request first invokes the Lambda function “imageUpload”, which stores the image in the S3 bucket. A trigger then automatically invokes the Lambda function “imageResize”, which scales the image to a resolution of 400 x 400 pixels. The resized image is stored in the bucket for scaled images, so that the images in the feed load faster, especially on mobile devices.
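
The post does not show the code of “imageResize”; a hedged sketch of such an S3-triggered function, assuming a Python runtime, Pillow for the scaling and a made-up bucket name, could look like this:

import io

import boto3
from PIL import Image

s3 = boto3.client("s3")
RESIZED_BUCKET = "ynstagram-images-resized"  # assumed bucket name

def handler(event, context):
    for record in event["Records"]:              # one record per uploaded object
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]      # equals the post's unique ID

        original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        image = Image.open(io.BytesIO(original)).convert("RGB").resize((400, 400))

        buffer = io.BytesIO()
        image.save(buffer, format="JPEG")
        s3.put_object(Bucket=RESIZED_BUCKET, Key=key, Body=buffer.getvalue())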

In parallel, an entry is created in the DynamoDB table. Here, too, a trigger fires and in turn calls the function “changeText”. In a nod to the name “Ynstagram”, it replaces every “i” in the title and description with a “y”. This is merely a gimmick that grew out of our interest in trying out the various triggers and use cases of Lambda functions.
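
Again purely as a hedged sketch (runtime, table and attribute names assumed), a DynamoDB stream trigger implementing this could look roughly as follows:

import boto3
from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()
processed_table = boto3.resource("dynamodb").Table("ynstagram-processed")  # assumed name

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        # Stream records carry the item in DynamoDB's typed JSON format.
        raw = record["dynamodb"]["NewImage"]
        item = {key: deserializer.deserialize(value) for key, value in raw.items()}

        # The gimmick: replace every "i" in title and description with a "y".
        item["title"] = item["title"].replace("i", "y")
        item["description"] = item["description"].replace("i", "y")

        processed_table.put_item(Item=item)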

CI/CD Pipeline

It was also interesting for us to gain real experience with a CI/CD pipeline for the first time. We planned a strict separation into a development environment and a structurally identical production environment, so that the current state can be tested under realistic conditions before it is finally released.

We implemented this CI/CD pipeline with AWS Amplify and GitHub Actions. Changes are always pushed to a development branch first, which is automatically deployed to a development environment on Amplify. This way, all tests can run before the changes are merged into the master branch via a pull request. Once that happens, they are likewise automatically deployed to the production environment.

In addition to the tests run by GitHub Actions, we also check whether the web application scales correctly on different devices, i.e. whether the UI is displayed in a usable way for the user. Of course, the current state is only deployed if all tests pass.

Serverless

To avoid the cluttered AWS web console and to be able to create resources more easily and reproducibly, managed via Git, we chose Serverless. All AWS components are defined in a “serverless.yaml” file.

Variables

For example, environment variables can be defined without much effort. We in turn defined them via custom variables of our own, which are reused in several places.

The advantage of this is that names can be changed flexibly and are picked up everywhere right away, both in AWS and in the code via the environment variables.

Functions

Lambda functions can be created just as easily. Each is referenced via a “handler” and invoked by an “event”, which can be either an API call or, for example, a DynamoDB / S3 trigger.

Resources

All buckets and tables are also defined in the .yaml file. New elements in particular can be created very easily, since previously defined configurations can be reused directly.

Testing

In testing, we focused mainly on the API calls and the core functionality. Our tests generally run through the GitHub Actions pipeline described in the section “CI/CD Pipeline”; they are also part of the Amplify deployment process. In addition, we set up CircleCI to deploy the Serverless components automatically. For testing we generally use a local mock of our DynamoDB, because we quickly ran into the problem that our free AWS quota was used up.
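
The post does not show the test setup itself; one possible way to run such tests against a mocked DynamoDB is the moto library (version 5+), sketched here with placeholder names:

import boto3
import pytest
from moto import mock_aws

@pytest.fixture
def posts_table():
    with mock_aws():
        dynamodb = boto3.resource("dynamodb", region_name="eu-central-1")
        yield dynamodb.create_table(
            TableName="ynstagram-posts",
            KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
            AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
            BillingMode="PAY_PER_REQUEST",
        )

def test_put_and_get_post(posts_table):
    posts_table.put_item(Item={"id": "42", "title": "Test"})
    item = posts_table.get_item(Key={"id": "42"})["Item"]
    assert item["title"] == "Test"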

Outlook / conclusion

The biggest weakness of the project currently lies in the unsecured API calls; these could be protected by using API keys. In the long run, access to DynamoDB and S3 should also be managed via IAM roles.

For account management, it would make sense to enable multi-factor authentication. The feature set could of course be expanded considerably, with the use of AI components being particularly interesting to us.

Overall, starting with no prior knowledge of cloud computing at all, implementing this project as part of the lecture gave us an overview of and a basic understanding of the world of cloud computing, which provides a solid basis for deepening these approaches considerably in the future.

Deploying Random Chat Application on AWS EC2 with Kubernetes

1. Introduction

For the examination of the lecture “Software Development for Cloud Computing”, I wanted to build a simple random chat application. The idea is based on the well-known chat service Omegle, where people can meet random strangers from around the world and have a one-on-one chat. Omegle offers not only text chat but also video chat; unlike Omegle, my application only supports text messaging.

2. Technologies for the development of application

a. Frontend

React

For frontend development there is a great number of open-source libraries. React is currently one of the most popular and most widely used. There are many reasons to choose React: it is one of the most popular frontend technologies on the market, and compared to other libraries it is relatively easy to learn, so developers can quickly start practicing and build their first project. React increases productivity through reusable components, and there are many development tools available that speed up a project. The most important reason, however, is its very strong community support: there are thousands of free React tutorial videos and blog posts on the internet, which is very helpful for developers. That is why I decided to learn this library in previous semesters, and this project gave me the chance to put it into real practice.

b. Backend

Node.js

Node.js has become one of the most popular JavaScript tools. It is a JavaScript runtime environment that allows companies to make their web development process more efficient, since frontend and backend teams can work together more easily. Because Node.js runs JavaScript on Google's V8 engine, everything is executed very quickly. Node.js uses an event loop, which handles all asynchronous input/output operations, and it can speed up work with other frameworks as well. By allowing developers to write JavaScript for both the frontend and the backend, Node.js makes it easy to send data between server and client and to keep that data in sync.

Socket.io

When I decided to develop a chat application, I immediately thought of Socket.io. I got to know this JavaScript library during the course Web Development 2. Socket.io is a natural fit for real-time applications: it helps parties in different locations connect with each other and transmits data instantly through an intermediary server. It can be used in many applications such as chats, online games, or live score updates for an ongoing match. It is used a lot by the developer community because of its speed and convenience, and it provides many methods as well as useful features such as security, auto-reconnect, disconnection detection, multiplexing, and room creation.

3. Application explanation

a. Client

As mentioned, I use React for the client side. This gave me the chance to get to know the concept of a single-page application, whose content is loaded only once and then updated dynamically. Interacting with the page or with subsequent views does not reload the page. To implement this concept, React offers a package named “react-router-dom”. My application is very simple, so it only has two paths: the root path, where the user enters their name and the Join component is loaded, and the path for the Chat component, in which the user sends messages after getting a room.

The Socket.io client library is imported because it is not part of plain JavaScript; it exposes the ‘io’ namespace.

The endpoint URL is passed to ‘io’ so it can connect to the Socket.io server.

Users can then send and receive messages from the server once the room has been created.

b. Server

To set up the server, a few packages need to be imported. The server is then set up and listens on port 5000.

The server also keeps track of users temporarily, so it knows which user names already exist, and it can remove users after they end their chat. To handle all of this, I wrote a users.js file with functions such as addUser, removeUser, getUser, and so on.

I wanted to create a chat application where users don't need to already know a friend; a chat room is created for them automatically. With this application they can meet someone new, and the server gets them a chat room.

I created a variable called queue, which is an array. It temporarily stores a user who does not yet have a partner. Every user who has entered their name is connected to the socket, and the socket knows that they want to join a room. In the socket's callback, the name and socket ID of the user are saved by the addUser function from users.js. The socket then checks whether another user is already waiting for a partner in the queue. If someone is waiting, they are popped from the queue, the two sockets are connected, and their room ID is the combination of the two socket IDs. If no one is waiting for a room, the current socket is pushed onto the queue and waits for another user to join.

c. Problem

CORS:

CORS is an abbreviation for Cross-Origin Resource Sharing. By default, browsers enforce a same-origin policy, meaning scripts may only request data from the origin the page was loaded from. This is a security measure, because JavaScript could otherwise load content from other servers without the user's knowledge. The restriction can be lifted when both sides are aware of the data exchange and explicitly allow it.

I installed the CORS package on my server. The origin option configures the Access-Control-Allow-Origin CORS header. Now client and server can communicate without errors.

4. Testing

Testing is very important during the development of an application. It helps developers discover existing errors/bugs before releasing the application and therefore improves its quality. I decided to test only the server side because it is more complicated than the client side. Two tests were created:

  • A single-user test: checks that a user can connect to the server and receives a welcome message from it.
  • A two-user test: checks that a room is created when there are two users and that both receive the same welcome message from the same room.

5. Deployment

a. Docker Swarm and Kubernetes

For the deployment of this project I create a container for each of the client and server sides. Docker is the most popular container platform, and I wanted to learn how to write a Dockerfile and a docker-compose file to create containers.

As the cloud environment I chose Amazon Web Services, currently one of the most comprehensive platforms for cloud computing services. I use an EC2 virtual server to make my project available online. To manage the containers I wanted to work with Kubernetes, so I chose EKS, the managed Kubernetes service from AWS.

If you work with a lot of containers, you have to be able to manage them efficiently. An orchestration tool enables exactly that: you can integrate the containers you created with Docker and then use orchestration to manage, scale, and move them.

Although Kubernetes and Docker work well together, there is competition when it comes to Docker Swarm. I compared some features of Docker Swarm and Kubernetes:

  • Scaling: if the load on the application gets too high, Kubernetes can add more nodes to the cluster. Of course, Kubernetes has to be configured correctly so that it can create a new virtual machine, which is then added to the cluster as a node.
  • Installation: with Docker Swarm it is easy to create a new node and integrate it into the Swarm. With Kubernetes, on the other hand, you have to decide on the size of the nodes and how many master and worker nodes there should be.
  • Load balancing: Docker Swarm offers automatic load balancing for the application, whereas Kubernetes gives you the flexibility to configure load balancing manually.
  • Sharing storage volumes: since Docker Swarm manages Docker containers directly, containers can easily share data (and more). Kubernetes puts containers into pods, so a container cannot simply talk to another one; you need additional Kubernetes components, e.g. a Service, to create the connection.
  • Monitoring: while Swarm requires additional resources for monitoring and logging, these tasks are already built into Kubernetes.

b. Amazon EKS and Kops

When deploying Kubernetes on AWS, you can configure and manage the deployment yourself for full flexibility and control. There are a few options for self-management: Amazon Elastic Kubernetes Service and Kops.

EKS is a managed service offered by AWS. It uses automatically provisioned instances and offers a managed control plane for the deployment.

Kops is an open-source tool that can be used to automate the deployment and management of clusters on AWS. AWS is officially supported by Kops.

c. Docker File

To work with Kubernetes, I need to create all the necessary containers. Containers are described in Dockerfiles, which contain all the information about a container, e.g. the name of the image, the directory where the application is stored, the port, and so on. Docker follows this information and builds the container step by step. In addition, I use Docker Compose to start the process of creating the containers.

d. Kubernetes architecture on AWS cloud

I chose Kubernetes, Amazon EC2, EKS, and ECR for the deployment of my project. Shown below is the architecture of Kubernetes on the AWS cloud.

Source: https://blogs.tensult.com/2019/08/14/guide-to-setup-kubernetes-in-aws-eks-using-terraform-and-deploy-sample-applications/

The Kubernetes server is the control plane; it creates and manages the Kubernetes cluster. In the cluster there are master nodes that create and manage worker nodes. When you run deployment commands, the Kubernetes server sends messages to EKS, and EKS passes the tasks on to the worker nodes.

A worker node contains a number of pods in which the Docker containers run. I chose the Deployment controller to keep these pods running and to observe them. On the worker nodes I create one pod for the client, two pods for the server, and one pod for Redis. A load balancer is used to reach the application from outside.

I decided to have two server pods because I want to scale my application: if more people connect, requests are handled faster with two pods instead of one. The picture below shows horizontal scaling, which means there are several copies of the application that can work in parallel.

For example, for the client pod I write a client-deployment.yaml file:

  • A Deployment named client is created, indicated by the .metadata.name field.
  • The .spec.selector field defines how the Deployment finds the pods it manages.
  • The Deployment creates one replica pod, indicated by the .spec.replicas field.
  • The .spec.template.spec field indicates that the pod runs a container. The container is created from a Docker image that has been stored in ECR (Elastic Container Registry).
  • The container is named client via the .spec.template.spec.containers[0].name field.

To enable network access to this set of pods I have to create a Service, which is defined in client-service.yaml.

This specification creates a new Service object called “client” that targets TCP port 3000 on every pod labeled app=random-chat.

For pods Server and Redis, I also create Deployment and Service for each.

e. Problem

Service of the server pod:

Pods can usually reach each other via a normal Service. In my case that would mean the client pod could send requests to the server pod without an explicit ‘type’ attribute in server-service.yaml; the client's endpoint would then be ‘server:5000’, the combination of the Service name and the targetPort. But after many attempts this still did not work, so I decided to make the Service of the server pod of type LoadBalancer, as shown in the picture above. The client's endpoint is now the address of this load balancer.

6. Conclusion

During the course ‘Software Development for Cloud Computing’ and this project, I got to know the concept of Docker containers and how to manage them with Kubernetes. I gained not only theoretical knowledge but also practical experience by developing and deploying the application. Moreover, working with cloud computing was new and interesting for me. Cloud computing is used a lot in application development these days; what I applied in my project is only a small part of it, and I want to learn more about it in the future.

“Studidash” | A serverless web application

by Oliver Klein (ok061), Daniel Koch (dk119), Luis Bühler (lb159), Micha Huhn (mh334)

Abstract

You are probably familiar with the HdM SB-Funktionen. After nearly four semesters we were tired of the boring design and decided to give it a more modern look with a bit more functionality than it currently has. So we created “Studidash” in the course “Software Development for Cloud Computing”. “Studidash” shows your grades and automatically calculates the sum of your ECTS as well as your grade average.

Since this is a project for SD4CC it runs as a serverless web application at Amazon Web Services, or AWS for short. Our tech stack for this project consists of Angular, Python, Terraform and some AWS Services like Lambda or S3.

While developing this Web-App we encountered some difficulties but also learned a lot, and we hope that this blog post can give you a quick overview of what we did, what we learned, what problems we had and how we solved them, so that your next project will be a bit easier.

What did we do? 

As mentioned in the abstract, we developed a serverless Web-App called “Studidash” because of said boring design of the SB-Funktionen. First of all, we decided that we wanted to learn a new tech stack and chose Angular as a modern framework for our frontend. For our backend we decided to use Python, since it's lightweight and easy to learn. From another course we already knew Terraform, so we were somewhat familiar with it and decided to use it for our deployment to AWS. We also used AWS to host the Web-App, since we got access to AWS Student Accounts.

After we settled on a project and our tech stack, we had to think about a way to make it “cloud native”, started to research, and came across serverless. We dug a bit deeper, found some useful information, and came to realize that serverless might be the way to go. Serverless means that your application isn't running on an “on-prem” server but in the cloud instead; the application itself isn't coupled to a particular server. Servers are still there, but you don't have to deal with the administrative work around them, as that is handled by your cloud service provider. The serverless approach brings scalability, high availability and efficient resource usage and management with it. As mentioned, you can focus more on the development itself rather than thinking about servers, and a connection to a CI/CD pipeline makes it easy and fast to release a new version of your application. But serverless also has its downsides: the functions have to be as small as possible and fit a single purpose, and some Web-Apps can have higher latency due to cold starts (when a function isn't used for some time it gets destroyed and needs to be instantiated again, which takes time). Debugging is also harder than you might be used to. In the end we went with a static frontend in an S3-Bucket, a backend running as AWS Lambda Functions, and AWS API Gateway to connect them.

Architecture

Our architecture is fully hosted on AWS and our code repositories are hosted on the HdM GitLab server. Clients access our frontend via their favourite web browser. The frontend application is hosted in an AWS S3-Bucket, so we don't have to manage or deploy any web server ourselves, which reduces the management overhead and, in the end, the costs. After the frontend is served to the client, the user can enter their credentials to access their grades from the third-party service (HdM SB-Funktionen). An HTTP-Request is then sent to a Lambda Function via an API-Gateway. This Lambda Function contains a Python script which parses the user credentials provided in the HTTP-Request, uses them to log in to the SB-Funktionen platform, and scrapes the user's grades and lecture data. The scraped data is then preprocessed and returned to the frontend as a JSON-Object.

On the developer side we used Git/GitLab for version control of our code. In GitLab we created a CI/CD pipeline to build the frontend, the Python grade scraper and a Terraform image to deploy all our necessary AWS resources. Thanks to the CI/CD pipeline, the developer can simply push the newest code to the repository and it will be deployed to AWS automatically.

Architecture overview

Frontend

For our frontend we decided to build an Angular single page application. We made this decision because it’s an up-to-date framework to build fast and easy web applications.

When the user loads the website, the header only displays a login component for the HdM SB-Funktionen credentials. This component triggers a POST request containing the login data to the Lambda Function. The Lambda Function responds with several grade objects, which are defined identically in frontend and backend; the grade object exactly maps the table structure of the HdM page. The response then triggers the rendering of the table and a login message is shown. There is also error handling in case the login fails. The table can be sorted by the different values, and the grade average and ECTS sum are calculated and displayed in the header of the page.

Screenshot of our frontend after successful login

Backend

Our backend consists of a Python script hosted in a Lambda Function, with an API-Gateway to receive HTTP-Requests. The frontend sends an HTTP-Request with the user credentials in the request body to the API-Gateway. The request is forwarded to the Lambda Function, which injects it into our Python grade scraper script. The Python script parses the request body and logs in to the SB-Funktionen website of the HdM, where all the student's grades and lectures are stored.

Backend workflow

In our code example the event variable is the HTTP-Request received from the frontend. The request body arrives as a string, so its content has to be parsed to JSON again. When no login data is provided, the script sends an HTTP-Response with status code 401 and a corresponding message.

In the next step our script scrapes all the data we need and parses it into a JSON format which our frontend can handle easily. This JSON data is sent as the response of the Lambda Function, which is forwarded via the API-Gateway back to our frontend application, where the received data is processed and displayed.

Code snippet – try-except
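
The snippet itself is only included as an image in the original post; purely as a hedged sketch (field names, status codes and the scraper function are assumptions, not the project's actual code), such a handler could look roughly like this:

import json

def scrape_grades(username, password):
    # Stand-in for the real scraping logic against the SB-Funktionen site.
    return []

def lambda_handler(event, context):
    # The request body arrives as a string and has to be parsed back to JSON.
    body = json.loads(event.get("body") or "{}")
    username = body.get("username")
    password = body.get("password")

    if not username or not password:
        # No login data provided: answer with status code 401 and a message.
        return {
            "statusCode": 401,
            "headers": {"Access-Control-Allow-Origin": "*"},
            "body": json.dumps({"message": "No credentials provided"}),
        }

    try:
        grades = scrape_grades(username, password)
    except Exception:
        # If the third-party service is unreachable, report it instead of crashing.
        return {
            "statusCode": 503,
            "headers": {"Access-Control-Allow-Origin": "*"},
            "body": json.dumps({"message": "SB-Funktionen not available"}),
        }

    return {
        "statusCode": 200,
        "headers": {"Access-Control-Allow-Origin": "*"},
        "body": json.dumps(grades),
    }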

We also had to keep some other things in mind, for example: what should happen when our backend throws an exception or the third-party service isn't available? In our backend we created an error handler which takes an HTTP status code and an error message as parameters, converts the data into the right format for our frontend and then sends the response.

Code snippet – error handling
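
Again, the original snippet is an image; a minimal sketch of such an error handler, using the assumed names from the sketch above, might look like this:

import json

def error_handler(status_code, message):
    # Convert the status code and error message into the response format the frontend expects.
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*",
        },
        "body": json.dumps({"error": message}),
    }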

Our main lambda_handler function is then divided into different parts. Each part is wrapped in a try-except block to catch exceptions, for example if the third-party service is down or if no credentials were provided by the frontend. This makes our backend more reliable and also gives the user enough feedback to know what's going on. Since we use an external service, we need to think of a solution for the case when the third-party service is down, for example for maintenance. A possible solution would be a caching mechanism, which we don't provide in the current state.

CI/CD

To make our application as cloud native as possible, we implemented a CI/CD pipeline in our project. This pipeline builds our Web-App as well as our Lambda Functions, tests our Python script and deploys them to AWS. For that we use different stages (build, test, deploy) in our .gitlab-ci.yml file. The build_webapp stage first pulls a Node image and runs a few lines of script to install all dependencies and build the Angular-based frontend. While this part is running, a second instance pulls an Alpine image and also runs a few lines of script to package our Lambda Function(s) into a ZIP file.

After that, the test stage is invoked to test the application before deployment. This is a crucial part in the pipeline since it can reveal mistakes that we made during development before going “live” with the application. When the tests succeed, the next stage is invoked.

In our case we trigger the deployment stage manually, since we didn't want to push every small change to AWS, and the Student Accounts had some time limits that would have prevented that anyway. What happens in the deploy stage is fairly simple: like in the stages before, we pull an image for Terraform and run the usual Terraform commands init, validate, plan and apply. This initializes Terraform, validates the main.tf in the root of the repository, creates a plan for the resources defined in this main.tf and finally applies it.

But what exactly is in this main.tf file? This file declares every resource we need in AWS and creates it. First of all, we declared variables for our different buckets: one for the Lambda Function and one on which the Angular app is hosted. After that, we created the S3-Bucket for the Lambda Function and uploaded the ZIP file with the function to the bucket; from there it gets deployed to AWS Lambda. We also needed to create a role and policy to give the bucket the correct access rights to execute its task properly. After that, the S3-Bucket for the Angular app is created and the needed files are uploaded. This bucket hosts the frontend as a static website, which we also configured in our main.tf.

.gitlab-ci.yml file for our pipeline (1/2)
.gitlab-ci.yml file for our pipeline (2/2)

Testing

Testing is one of the most important things when implementing a CI/CD pipeline with automated deployment. If you don't implement tests, you don't really know whether your application works before deployment; after the deployment it is too late. So implementing a test stage in our project was the way to go. For our Python backend we wrote some basic unit tests to test the functionality and added a test stage for the backend to our CI/CD pipeline.
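
As an illustration of the kind of basic unit test meant here, a test against the error handler sketched above could look like this (the module name is hypothetical):

import json

from backend import error_handler  # hypothetical module name

def test_error_handler_returns_status_and_message():
    response = error_handler(401, "No credentials provided")
    assert response["statusCode"] == 401
    assert json.loads(response["body"])["error"] == "No credentials provided"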

We also managed to write an end-to-end test for our frontend which checks whether the error snackbar is shown when the user enters wrong credentials. The harder part was getting it to run in the CI/CD pipeline, which we unfortunately didn't manage to do.

What problems did we have and how did we solve them?

One of the biggest problems we encountered was due to the fact that we only had access to an AWS Student Account, which meant restricted access to AWS. For example, we needed to create different kinds of roles to deploy our Lambda Function with the correct set of rights. Due to the restrictions we were not allowed to give the roles the needed permissions, which caused our CI/CD pipeline to fail, and our project didn't get fully deployed. This could only be solved by getting a “real” AWS account, which gives you all the permissions you would need.

Another problem we faced was CORS (Cross-Origin Resource Sharing). In the first steps of our development we always got a CORS error when our frontend requested the grades and lecture data from our backend service. The reason was that our Python backend script just sent back the JSON-Object containing all the data without any HTTP headers, and since the URL of the API-Gateway differed from the URL of our frontend, the browser rejected the response. To fix this problem we had to set the Access-Control-Allow-Origin HTTP header in the response from our backend.

Code snippet – http-headers (CORS)

After that, the request worked and our frontend could receive the scraped data.

Another problem was integrating our end-to-end test into our CI/CD pipeline, which we unfortunately didn't manage to fix in time. It would have required a runner with a browser available, which we weren't able to set up. We did implement an E2E test which runs locally without any problems, so at least we have a bit of code quality assurance there; however, having to run the tests manually isn't what you want in a fully automated cloud native approach.

Conclusion

It was quite a long way from where we started, but in the end we managed to get our Web-App running on AWS the way we wanted. We made it a bit difficult for ourselves in the beginning by deciding to learn new technologies like Python and Angular, so we first had to learn those, and we also had to learn about serverless architecture. It is definitely something we look forward to working with in the future.

At the presentations we found out about AWS Amplify, which is basically a tool by AWS to get serverless Web-Apps running as fast as possible without the need for S3-Buckets. It showed us that there isn't really one “one and only” way to get something running in the cloud; there are many possible solutions.

In our opinion we learned a lot about AWS, serverless architecture and the cloud in general, but also about developing an application where you don't have to think about renting and maintaining a server. Maybe we can continue with this project in the near future and give the HdM SB-Funktionen a new look 🙂

Application Updater with Add-on Management

by Mario Beck (mb343) and Felix Ruh (fr067)

Introduction

Our goal was to build an application updater for developers that they can easily integrate into their CI/CD pipeline. For the implementation we used the IBM Cloud and a serverless architecture to achieve unlimited scalability. The serverless services we used include Cloud Functions, Db2 and an Object Storage.

The project consists of an uploader, with which the developer uploads their program to the Object Storage, and a downloader for the user, which automatically downloads the latest version.

Usage from the developer's perspective:

  • The program is registered and the corresponding API keys are issued
  • Create the config for the downloader
  • The program can be uploaded with the uploader; this step can easily be integrated into a CI/CD pipeline

Usage from the user's perspective:

  • Download the downloader and the config
  • Start the downloader
  • Before the program starts, the downloader checks for new updates and downloads them if available
  • After the update, the actual program is started

Cloud-based Password Manager

by Benjamin Schweizer (bs103) and Max Eichinger (me110)

Abstract

Can password manager providers read my passwords? We wanted to be on the safe side and built our own. This article shows which steps we had to take to get there.
We implemented our frontend with Flutter and our backend on AWS. We also cover IaC with Terraform. At the end, we share the problems we ran into during the implementation as well as possible extensions that could be added in the future.

Mobile App

For the frontend we decided on the Flutter framework. Flutter allowed us to use a single code base for all our target platforms (iOS, Android), so new features could be implemented quickly and changes could be tested immediately on different devices. An integration test also ensures that the app runs without errors on both platforms.

The app offers a simple user interface in which the following actions are possible:

  • Login and registration of a user
  • Adding and deleting a password
  • Changing passwords

These functions are reflected in the following screens.

The user interface is identical on Android and iOS.

Architecture

Our architecture is fundamentally split into a frontend and a backend. As already mentioned, the frontend is implemented with Flutter and the backend runs in the AWS cloud.
All requests to the backend are sent as HTTP requests to the API Gateway service.
Requests from users must contain a valid JWT (JSON Web Token).
The user receives this session token from the Cognito service after a successful login.
Using the token, the Lambda functions can ensure that users can only change data they are authorized to access.
The Lambda functions evaluate the requests and modify the password data in the DynamoDB service.
All password data transmitted to the backend has already been encrypted locally by the client to prevent misuse.


AWS Services

Since we had already gained experience with AWS in connection with IoT in another course this semester, AWS was our first choice. We were also confident at the start of the project that AWS would meet all our requirements.

DynamoDB
AWS DynamoDB is a fully managed NoSQL database service. All password data is stored there.
To enable fast queries, our table is split into the following fields:

  • Partition key: User_Id (a unique ID for each user)
  • Sort key: PasswordName
  • Password (in encrypted form)
  • Description

API Gateway

To forward HTTP requests to our Lambda functions, we use the API Gateway service. It validates incoming requests for required parameters and rejects requests with missing data. The token is also verified here with the help of Cognito.
We use three different HTTP methods to handle our client requests. In addition, all methods are secured with an API key.

  • The DELETE method deletes a password of a user.
  • The GET method returns all passwords of a user.
  • The PUT method creates or overwrites a password.

Lambda
Within each of our Lambda functions we have access to the Cognito authorizer. It allows us to directly access the data (User_Id, username, etc.) of the user who sent the request.

PUT method:

The PUT method stores new passwords or overwrites existing ones.

Thanks to the JSON schema defined in API Gateway, we can check whether the HTTP body contains the required parameters.
Nevertheless, the user could still send an empty string ("").
Checking for this is the first validation step performed in the Lambda function.


Using the User_Id and the password data, we can now create a new entry in DynamoDB.
If the password name already exists in the database, the password is overwritten.
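
The original shows this as screenshots; a hedged Python sketch of such a PUT handler (runtime, table and field names are assumptions) illustrates the validation and the create-or-overwrite behaviour:

import json

import boto3

table = boto3.resource("dynamodb").Table("passwords")  # assumed table name

def lambda_handler(event, context):
    body = json.loads(event["body"])

    # API Gateway's JSON schema guarantees the fields exist, but they could be empty strings.
    if not body["PasswordName"] or not body["Password"]:
        return {"statusCode": 400, "body": json.dumps({"message": "Empty fields"})}

    # The Cognito authorizer exposes the caller's identity in the request context.
    user_id = event["requestContext"]["authorizer"]["claims"]["sub"]

    # put_item overwrites an existing item with the same key: create or overwrite.
    table.put_item(Item={
        "User_Id": user_id,
        "PasswordName": body["PasswordName"],
        "Password": body["Password"],            # already encrypted on the client
        "Description": body.get("Description", ""),
    })
    return {"statusCode": 200, "body": json.dumps({"message": "Saved"})}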


GET method:

The GET method returns all passwords of a user.

Using the User_Id, we can send a query to DynamoDB that returns all the data of one single user.
This ensures that users can only access their own password data.
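
A corresponding sketch of the GET handler, under the same assumptions as above:

import json

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("passwords")  # assumed table name

def lambda_handler(event, context):
    user_id = event["requestContext"]["authorizer"]["claims"]["sub"]

    # Querying only on the partition key returns exclusively the caller's items.
    result = table.query(KeyConditionExpression=Key("User_Id").eq(user_id))
    return {"statusCode": 200, "body": json.dumps(result["Items"])}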

DELETE method:

The DELETE method deletes a password of a user.

For this method, too, a JSON schema exists in API Gateway that validates the body.

Using the authorizer and the password name, we can then delete the password for this user. To do so, we specify the partition key (User_Id) and the sort key (PasswordName) and send the request to DynamoDB.


Encryption

To ensure that all passwords are secure and can only be read by the user who created them, we use local encryption.
So that a user can use the app on several devices at the same time, there has to be a shared key.

In our case this key is based on the login password. After a successful login, the key is stored in a secure area on the device.

On iOS the “Keychain Service” is used, on Android the “Android keystore system”. The login password is first converted to a hash.

An AES algorithm is used for encryption and decryption. The encrypted bytes then only need to be encoded to a string so that the password can easily be serialized to JSON.
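
The app does this in Dart/Flutter; purely to illustrate the scheme described above, here is a Python sketch (the AES-GCM mode and the plain SHA-256 key derivation are assumptions, not the app's actual choices):

import base64
import hashlib
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def derive_key(login_password):
    # The post describes hashing the login password; a real KDF (e.g. PBKDF2) would be more robust.
    return hashlib.sha256(login_password.encode()).digest()

def encrypt_entry(plaintext, key):
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext.encode(), None)
    # Encode to a string so the value can easily be serialized to JSON.
    return base64.b64encode(nonce + ciphertext).decode()

def decrypt_entry(encoded, key):
    raw = base64.b64decode(encoded)
    nonce, ciphertext = raw[:12], raw[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode()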


Testing in Flutter

To test all functions in Flutter easily, we opted for an integration test. It lets us find out whether all important functions work correctly. The main functions we tested are:

  • Login of a user
  • Adding and deleting a password
  • Changing passwords
  • Encryption and decryption
The login procedure checks whether the app displays the correct data once the user is logged in.


Terraform

We used Terraform in our project. Our goal was that anyone could easily reproduce this project at home. It also made it possible to quickly change the AWS account or the region and to version all infrastructure changes.

Main and Cognito

We split our Terraform code into several files. The main.tf primarily sets the standard Terraform parameters and additionally sets up the Cognito service.

Variables

The variable.tf file defines the AWS account ID and the region for Terraform. Terraform needs this data to find the right account and region for the cloud infrastructure.

DynamoDB

The dynamodb.tf file contains, as you would expect, all DynamoDB settings. This is where the AWS service and the table fields are created.

API Gateway

The api.tf file first sets up the basic AWS API Gateway service. Then the methods of the API are defined. For each method you also specify which Lambda function should be executed; in addition, a template for the request can be added here, which among other things also simplifies the Lambda code. Once everything important is configured in the methods, the responses are defined in the same .tf file.
Next, Terraform defines how the API is deployed. This saves the developer from having to press the deploy button in AWS API Gateway by hand later on. At the end of the api.tf file, a model / schema for the JSON data is specified for each API method; this has the advantage that requests to the API without the required data are rejected automatically. Finally, the Cognito authorizer is set up for the API.


Lambda

All Lambda functions have to be stored locally. The lambda.tf file defines the path to each function; when Terraform runs, a zip file of each function is created and uploaded to AWS Lambda.
After that, the policies for logging with CloudWatch and the access rights for DynamoDB are granted.
Finally, permissions for API Gateway and Cognito are assigned to the corresponding Lambda functions.

Problems

Every project has its problems, and ours was no exception. Right at the start we struggled a lot with AWS, because when you develop for the cloud, debugging is no longer as easy as in an IDE. It becomes hard to trace what the AWS setup or the Lambda code is actually doing. In the end we solved this with the CloudWatch service: every action is logged there, so errors can be found and fixed quickly.

Further difficulties came up while working with Terraform. Terraform is actually supposed to create all services in the correct order on its own. Unfortunately, this often went wrong, and we had to set the order manually in the Terraform files using depends_on. Another problem was that AWS silently configures many options in the background when creating services. Finding out later which settings exactly are needed in Terraform was only possible by reading the documentation thoroughly.

While working with the AWS service Cognito, we had difficulties with confirming users. Cognito creates all users on its own, but the account confirmation has to be done by hand. We solved this quickly with a custom Lambda function that automatically verifies every user.

Extensions

The project was planned from the start within the time limit of one semester. With further time investment, however, many important features could still be implemented:

  • Flutter also runs on the web, but our code has many dependencies on iOS / Android. Still, the effort to run the app as a web app would be manageable.
  • Currently the passwords are still visible in the detail view. In the future they should be hidden and a copy button should be added, which would protect the passwords even better.
  • Power users' lives could be made easier by adding a password search, so that users could quickly find the right password among a large pile of passwords.
  • Another feature would be changing the master password, but the effort for this would be very high: since every password of the user is encrypted in the database with the master password, every password would have to be re-encrypted.

Lessons Learned

As the project shows, we were able to gain a lot of experience with the AWS services IAM, Cognito, Lambda, API Gateway, CloudWatch and DynamoDB. Thanks to Terraform, the concept of IaC (Infrastructure as Code) has become much more tangible for us; Terraform also practically forced us to know the most important settings of every AWS service we used. In addition, we gained our first experience in app development. Finally, we extended our testing knowledge with the topic of integration tests.

How do you get a web application into the cloud?

by Dominik Ratzel (dr079) and Alischa Fritzsche (af094)

For the lecture “Software Development for Cloud Computing”, we set ourselves the goal of exploring new things and gaining experience. We focused on one question: “How do you get a web application into the cloud?”. In doing so, we took a closer look at Continuous Integration / Continuous Delivery, Infrastructure as Code, and Secure Sockets Layer (SSL). In the following, we would like to share our experiences.

Overview of the content of this blog post

  • Comparison GitLab and GitHub 
    • CI/CD in GitLab 
      • Problem: Where are the CI/CD settings in the HdM Gitlab? 
      • Problem: Solve Docker in Docker by creating a runner 
    • CI/CD in GitHub 
  • Set up SSL for the web application 
    • Problem: A lot of manual effort 
      • Watchtower 
      • Terraform 
  • Testing 
    • Create a test environment 
    • Automated Selenium frontend testing in GitHub 
  • Docker Compose 
  • Problem: How to build amd64 images locally with an arm64 processor? 

Continuous Integration / Continuous Delivery

At the very beginning, we asked ourselves which platform was best suited for our approach. We limited ourselves to the best-known platforms so that the comparison would not be too complex: GitHub and GitLab.
Another point we wanted to try was setting up a runner. For this purpose, we set up a simple pipeline in both GitLab and GitHub to update Docker images on Docker Hub.

GitLab vs. GitHub

GitHub is considered the original cloud-based Git platform. The platform focuses primarily on the community. Comparatively, it is also the largest (as of January 2020: 40 million users). GitLab is the self-hosted open-source alternative to GitHub. During our research, we noticed the following differences concerning our project.

GitLab GitHub 
Free private and public repositories ✓ ✓ (since Jan. 2019)
Enterprise versions ✓ ✓ 
Self-hosted version ✓ ○ (only with paid Enterprise plan) 
CI/CD with shared or personal runners ✓ ○ (with third-party apps) 
Wiki ✓ ✓ 
Preview code changes ✓ ✓ 

The point that caught our attention in particular is that only GitLab lets you use self-hosted runners for the CI/CD pipeline. From our point of view, this is a plus for GitLab in terms of data protection. The fact that GitLab can be self-hosted is an advantage but not necessary for our project. Nevertheless, it is worth mentioning, which is why we have included the point in our list. In all other aspects, GitLab and GitHub are very similar.

CI/CD – GitLab vs. GitHub

GitHub: provides the user with so-called GitHub Actions. This way, the user does not have to set up, configure or host their own runner.
+ very easy to use
+ free of charge
– critical from a data protection perspective, as the code is executed/read “somewhere”

GitLab: To use the CI/CD, a custom runner must be configured, hosted, and integrated into the code repository.
+ code stays on own runner (e.g., passwords and source code are safe)
+ the runner can be configured according to one’s wishes
– complex to set up and configure
– Runner could cost money depending on the platform (e.g., AWS)

Additional information: HdM offers students so-called shared runners. However, Docker-in-Docker is not possible with these runners for security reasons. In the following, we explain how we configured our own GitLab runner to allow Docker-in-Docker. Another insight was that the DOCKER_HOST variable must not be specified in the pipeline, otherwise the Docker socket will not be found and the pipeline will fail.

CI/CD in GitLab 

Where are the CI/CD settings in the HdM GitLab?

We are probably not the first to notice that CI/CD is missing from the MI GitLab navigation. The “advanced features” have been disabled to avoid “overwhelming” students. However, they can easily be activated via the GitLab settings (Settings > General > Visibility, project features, permissions) (https://docs.gitlab.com/ee/ci/enable_or_disable_ci.html).

Write the .gitlab-ci.yml file

The next step is to write an individual .gitlab-ci.yml file (https://docs.gitlab.com/ee/ci/quick_start/index.html).
The script builds a Docker container and pushes it to Docker Hub. The DOCKER_USERNAME and DOCKER_PASSWORD are stored as variables in GitLab (Settings > CI/CD > Variables).
Tip: If you want to keep the images private but do not want to pay for a second private repository on Docker Hub ($5/month), you can create one private repo and push the images separated by tag (in our case, “frontend” and “backend”).

stages:
  - docker

build-push-image:
  stage: docker
  image: docker:stable
  tags:
    - gitlab-runner
  cache: {}
  services:
    - docker:18.09-dind
  variables:
    DOCKER_DRIVER: overlay2
    DOCKER_TLS_CERTDIR: ""
    # This variable DOCKER_HOST should never be set, because otherwise the default address of the Docker host will be
    # overwritten and the runner will not be able to access the socket and the pipeline will fail!
    # DOCKER_HOST: tcp://localhost:2375/
  before_script: # Install docker-compose
    - apk add --update --no-cache curl py-pip docker-compose
  script:
    - echo $DOCKER_PASSWORD | docker login --username $DOCKER_USERNAME --password-stdin
    - docker-compose build
    - docker-compose push
  only:
    - master

Configuring Gitlab

Next, we asked ourselves how we could restrict merges into the master. The goal was to only allow a branch to be merged into the master if the pipeline was successful. This setting can be found under Settings > General > Merge requests > Merge checks, item “Pipelines must succeed”.

Setting up and configuring GitLab Runner

For this, we have written a runnerSetup.sh.

#!/bin/bash

# Download the binary for your system
sudo curl -L --output /usr/local/bin/gitlab-runner https://gitlab-runner-downloads.s3.amazonaws.com/latest/binaries/gitlab-runner-linux-amd64

# Give it permission to be executed
sudo chmod +x /usr/local/bin/gitlab-runner

# Create a GitLab CI user
sudo useradd --comment 'GitLab Runner' --create-home gitlab-runner --shell /bin/bash

# Install and run as service
sudo gitlab-runner install --user=gitlab-runner --working-directory=/home/gitlab-runner
sudo gitlab-runner start
sudo gitlab-runner status

# Command to register the runner
sudo gitlab-runner register --non-interactive --url https://gitlab.mi.hdm-stuttgart.de/ \
 --registration-token asdfX6fZFdaPL5Ckna4qad3ojr --tag-list gitlab-runner --description gitlab-runner \
 --executor docker --docker-image docker:stable \
 --docker-volumes /var/run/docker.sock:/var/run/docker.sock \
 --docker-privileged

# Install Docker and give the GitLab runner permissions so that it can access the Docker socket.
echo "Installing Docker"
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce

sudo usermod -aG docker gitlab-runner

# Restart Docker and GitLab Runner Service
sudo systemctl restart gitlab-runner
sudo systemctl restart docker.service

During the step “# Command to register the runner” we fixed the problem we had with the HdM runners. “--docker-volumes /var/run/docker.sock:/var/run/docker.sock” gives the runner access to the Docker socket. “--docker-privileged” allows the runner to access all devices on the host and processes outside the container (be careful with this).

CI/CD in GitHub

In GitHub, this is done by adding the following code to a self-created .github/workflows/ci.yml file in the repository.
Like the previous .gitlab-ci.yml file, the script builds a Docker container and pushes it to Docker Hub. DOCKER_USERNAME and DOCKER_PASSWORD are stored as Action secrets in GitHub (Settings > Secrets).

name: Build and Push to Docker.io

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - name: Login to Docker.io
      run: docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }} 
 
    - name: Build Docker-Compose
      run: docker-compose build
 
    - name: Deploy Container to Docker.io
      run: docker-compose push

SSL

SSL is used to encrypt the data exchange between the web browser and web server. It thus protects against access by third parties. To set up SSL, it is necessary to have an SSL certificate.

Configuration

We decided to use “all-inkl.com” due to an existing subscription.
In the KAS admin center (after setting up the domains and subdomains), new DNS records can be created and edited (Domain > DNS Settings > Actions (Edit)). Here, a new Type-A record can be created that points to the IP address of the AWS instance running the reverse proxy. The email address (which is needed for the verification) can easily be created under Email > Email Inbox.

Server configuration 

We used the free CA Let’s Encrypt (https://letsencrypt.org/) to create and renew the SSL certificates. For the configuration, we used the following images: jwilder/nginx-proxy as the Nginx proxy and jrcs/letsencrypt-nginx-proxy-companion as its companion (the companion creates the certificates and shares them with the Nginx proxy via the mounted volumes).  
In the docker-compose.yml, the environment variables can now be added for the service “frontend”. 

  frontend:
    image: dr079/webshop:frontend
    build:
      context: ./frontend
      dockerfile: Dockerfile
    restart: always
    environment:
      API_HOST: backend
      API_PORT: 8080
      # Subdomain
      LETSENCRYPT_HOST: webshop.designmyhouse.de
      # Email for domain verification
      LETSENCRYPT_EMAIL: admin@designmyhouse.de
      # For the Nginx proxy
      VIRTUAL_HOST: webshop.designmyhouse.de
      # The Port on which the frontend responds. Tells the Nginx proxy who to send the requests to.
      VIRTUAL_PORT: 80
# Not needed when deploying with reverse proxy
#    ports:
#      - "80:80"

After that, we created the docker-compose-cert.yml file, which starts the Nginx Proxy and the Nginx Proxy Companion.

version: "3.3"
services:
  nginxproxy:
    image: jwilder/nginx-proxy
    restart: always
    volumes:
      - ./nginx/data/certs:/etc/nginx/certs
      - ./nginx/conf:/etc/nginx/conf.d
      - ./nginx/dhparam:/etc/nginx/dhparam
      - ./nginx/data/vhosts:/etc/nginx/vhost.d
      - ./nginx/data/html:/usr/share/nginx/html
      - /var/run/docker.sock:/tmp/docker.sock
    ports:
      - 80:80
      - 443:443
    labels:
      - "com.github.jrcs.letsencrypt_nginx_proxy_companion.nginx_proxy"

  nginxproxy_comp:
    image: jrcs/letsencrypt-nginx-proxy-companion
    restart: always
    depends_on:
      - nginxproxy
    volumes:
      - ./nginx/data/certs:/etc/nginx/certs:rw
      - ./nginx/conf:/etc/nginx/conf.d
      - ./nginx/dhparam:/etc/nginx/dhparam
      - ./nginx/data/vhosts:/etc/nginx/vhost.d
      - ./nginx/data/html:/usr/share/nginx/html
      - /var/run/docker.sock:/var/run/docker.sock:ro

AWS EC2 instance

In AWS, an EC2 instance (consisting of an Ubuntu server and a security group) can now be created and started via the “Review and Launch” step. 

The IP address of the created instance can now be entered as a Type-A entry under “all-inkl.com”.

Install Docker on Ubuntu

It is now possible to connect to the EC2 instance and run the following commands to make the project accessible through the domain/subdomain. (Note: it may take a few hours until the DNS changes have propagated. Workaround: use the Tor Browser, which does not resolve DNS through your local cache.)

# Add GPG key of Docker repository from APT sources.
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

# Update Ubuntu package database
sudo apt-get update

# Install Docker
sudo apt-get install -y docker-ce

# Install Docker Compose
sudo curl -L https://github.com/docker/compose/releases/download/1.18.0/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose

# Give Docker Compose the execute permission
sudo chmod +x /usr/local/bin/docker-compose

# Log in as root user
sudo -s

# Clone project
git clone https://github.com/user_name/project_name.git

# Build and launch project
docker-compose -f ./project_name/docker-compose-cert.yml up --build -d

# Pull images from DockerHub
sudo docker-compose -f ./cloud-webshop/docker-compose.yml pull

sudo docker-compose -f ./cloud-webshop/docker-compose.yml up -d

Watchtower

With Watchtower, updates to the Docker registry can be automatically detected and downloaded. The container is then restarted with the new image. Watchtower accesses the Docker repo via REPO_USER & REPO_PASS and checks in the set time interval (--interval 30) whether the Docker images have changed, updating them on the fly.
This requires adding the following code to the docker-compose.yml (replace REPO_USER and REPO_PASS with Docker.io Access Token credentials (Settings > Security)).

  watchtower:
    image: v2tec/watchtower
    environment:
      REPO_USER: REPO_USER
      REPO_PASS: REPO_PASS
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    command: --interval 30

Terraform

The preceding steps involve a considerable manual effort. However, it is possible to automate this, e.g., with Terraform. To achieve this, the following files must be written.

main.tf

resource "aws_instance" "test" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.ec2_instance_type

  tags = {
    Name = var.ec2_tags
  }

  user_data = file("docker/install.sh")
//  user_data = file("docker/setupRunner.sh")
  key_name = aws_key_pair.generated_key.key_name
  security_groups = [
    aws_security_group.allow_http.name,
    aws_security_group.allow_https.name,
    aws_security_group.allow_ssh.name]
}

output "instance_ips" {
  value = aws_instance.test.*.public_ip
}

providers.tf

provider "aws" {
  access_key = var.aws-access-key
  secret_key = var.aws-secret-key
  region = var.aws-region
}

security_groups.tf

resource "aws_security_group" "allow_http" {
  name = "allow_http"
  description = "Allow http inbound traffic"
  vpc_id = aws_default_vpc.default.id

  ingress {
    from_port = 80
    to_port = 80
    protocol = "tcp"
    cidr_blocks = [
      "0.0.0.0/0"
    ]
  }

  egress {
    from_port = 0
    to_port = 0
    protocol = "-1"
    cidr_blocks = [
      "0.0.0.0/0"]
  }
}

resource "aws_security_group" "allow_https" {
  name = "allow_https"
  description = "Allow https inbound traffic"
  vpc_id = aws_default_vpc.default.id

  ingress {
    from_port = 443
    to_port = 443
    protocol = "tcp"
    cidr_blocks = [
      "0.0.0.0/0"
    ]
  }

  egress {
    from_port = 0
    to_port = 0
    protocol = "-1"
    cidr_blocks = [
      "0.0.0.0/0"]
  }
}

resource "aws_security_group" "allow_ssh" {
  name = "allow_ssh"
  description = "Allow ssh inbound traffic"
  vpc_id = aws_default_vpc.default.id

  ingress {
    from_port = 22
    to_port = 22
    protocol = "tcp"
    # To keep this example simple, we allow incoming SSH requests from any IP. In real-world usage, you should only
    # allow SSH requests from trusted servers, such as a bastion host or VPN server.
    cidr_blocks = [
      "0.0.0.0/0"
    ]
  }
}

variables.tf

variable "ec2_instance_type" {
  default = "t2.micro"
}

variable "ec2_tags" {
  default = "Webshop"
//  default = "Gitlab-Runner"
}

variable "ec2_count" {
  default = "1"
}


data "aws_ami" "ubuntu" {
  most_recent = true
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
  owners = ["099720109477"] # Canonical
}

ssh_key.tf

variable "key_name" {
  default = "Webshop"
}

resource "tls_private_key" "example" {
  algorithm = "RSA"
  rsa_bits = 4096
}

resource "aws_key_pair" "generated_key" {
  key_name = var.key_name
  public_key = tls_private_key.example.public_key_openssh
}

resource "aws_default_vpc" "default" {
  tags = {
    Name = "Default VPC"
  }
}

variable_secrets.tf

variable "aws-access-key" {
  type = string
  default = "aws-access-key"
}

variable "aws-secret-key" {
  type = string
  default = "aws-secret-key" 
}

variable "aws-region" {
  type = string
  default = "eu-central-1"
}

install.sh for EC2 setup

To do this, we created a ./docker/install.sh file with the following content.


#!/bin/bash

# Install wget to update IP at all-inkl.com
echo "Setup all-inkl.com"
sudo apt-get install wget

# Save public IP to variable
ip="$(dig +short myip.opendns.com @resolver1.opendns.com)"

# Add all-inkl.com variables
kas_login="username"
kas_auth_data="pw"
kas_action="update_dns_settings"
sub_domain="sub"
record_id="id"

sudo sleep 10s

# Update all-inkl.com dns-settings with current IP and account data
sudo wget --no-check-certificate --quiet \
  --method POST \
  --timeout=0 \
  --header '' \
    'https://kasapi.kasserver.com/dokumentation/formular.php?kas_login='"${kas_login}"'&kas_auth_type=plain&kas_auth_data='"${kas_auth_data}"'&kas_action='"${kas_action}"'&var1=record_name&wert1='"${sub_domain}"'&var2=record_type&wert2=A&var3=record_data&wert3='"${ip}"'&var4=record_id&wert4='"${record_id}"'&anz_var=4'


echo "Installing Docker"
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce

echo "Installing Docker-Compose"
sudo curl -L https://github.com/docker/compose/releases/download/1.18.0/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

# Follow guide to create personal access token https://docs.github.com/en/github/authenticating-to-github/keeping-your-account-and-data-secure/creating-a-personal-access-token
sudo git clone https://username:token@github.com/ratzel921/cloud-webshop.git
sudo docker login -u username -p token
sudo docker-compose -f ./cloud-webshop/docker-compose-cert.yml up --build -d
sudo docker-compose -f ./cloud-webshop/docker-compose.yml pull
sudo docker-compose -f ./cloud-webshop/docker-compose.yml up

Next, run the following commands. Terraform will automatically create an EC2 instance (which runs the application), the security groups (allowing HTTPS, HTTP, and SSH connections to the instance), and an SSH key (for SSH access to the instance). At the end, the IP address of the EC2 instance is displayed in the console; it is either entered into all-inkl.com automatically by the install script or has to be added manually.

# Get terraform provider with init and use apply to start the terraform script.
terraform init
terraform apply --auto-approve

# (Optional) Delete EC2 instances
terraform destroy --auto-approve

Testing

Creating a Testing Environment

Using Terraform and an EC2 instance, it is also possible to create a testing environment. We used the GitHub pipeline for this.
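
A hypothetical sketch of such a pipeline step is shown below: a GitHub Actions job that provisions the test environment with the Terraform files from the previous section. The branch name and the secret names (AWS_ACCESS_KEY, AWS_SECRET_KEY) are assumptions and not taken from our repository, and Terraform state handling is omitted for brevity.

name: Deploy test stage

on:
  push:
    branches: [ develop ]   # assumed test branch

jobs:
  terraform-test-stage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Installs the Terraform CLI on the runner
      - uses: hashicorp/setup-terraform@v1
      - name: Terraform Init
        run: terraform init
      - name: Terraform Apply
        # Passes the AWS credentials into the variables defined in variable_secrets.tf
        run: >
          terraform apply --auto-approve
          -var="aws-access-key=${{ secrets.AWS_ACCESS_KEY }}"
          -var="aws-secret-key=${{ secrets.AWS_SECRET_KEY }}"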

Backend/Dockerfile

# Build stage
FROM maven:3.6.3-jdk-8-slim AS build
COPY src /home/app/src
COPY pom.xml /home/app
RUN mvn -f /home/app/pom.xml clean test
RUN mvn -f /home/app/pom.xml clean package

# Package stage
FROM openjdk:8-jre-slim
COPY --from=build /home/app/target/*.jar /usr/local/backend.jar
COPY --from=build /home/app/target/lib/*.jar /usr/local/lib/
EXPOSE 8080
ENTRYPOINT ["java","-jar","/usr/local/backend.jar"]

frontend/nginx/nginx.conf

server {
  listen 80;
  server_name www.${VIRTUAL_HOST} ${VIRTUAL_HOST};

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
        try_files $uri $uri/ /index.html;
        proxy_cookie_path / "/; SameSite=lax; HTTPOnly; Secure";
    }

    location /api {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;

        proxy_pass_header Set-Cookie;

        proxy_cookie_domain www.${VIRTUAL_HOST} ${VIRTUAL_HOST};
        #rewrite ^/api/?(.*) /$1 break;
        proxy_pass http://${API_HOST}:${API_PORT};
        proxy_redirect off;
    }

   error_page   500 502 503 504  /50x.html;

   location = /50x.html {
        root   /usr/share/nginx/html;
    }
}

frontend/Dockerfile

# Build stage
# Use node:alpine to build static files
FROM node:15.14-alpine as build-stage

# Create app directory
WORKDIR /usr/src/app

# Install other dependencies via apk
RUN apk update && apk add python g++ make && rm -rf /var/cache/apk/*

# Install app dependencies
# A wildcard is used to ensure both package.json AND package-lock.json are copied
# where available (npm@5+)
COPY package*.json ./

RUN npm install

# Bundle app source
COPY . .

# Build static files
RUN npm run test
RUN npm run build


# Package stage
# Use nginx alpine for minimal image size
FROM nginx:stable-alpine as production-stage

# Copy static files from build-side to build-server
COPY --from=build-stage /usr/src/app/dist /usr/share/nginx/html

RUN rm /etc/nginx/conf.d/default.conf
COPY nginx/nginx.conf /etc/nginx/templates/

# EXPOSE 80
CMD ["/bin/sh" , "-c" , "envsubst '${API_HOST} ${API_PORT} ${VIRTUAL_HOST}' < /etc/nginx/templates/nginx.conf > /etc/nginx/conf.d/nginx.conf && exec nginx -g 'daemon off;'"]

Modifying the docker-compose.yml

To do this, we created a copy of docker-compose.yml (docker-compose-testStage.yml). In this file, we changed the images and the LETSENCRYPT_HOST & VIRTUAL_HOST for the “backend” and “frontend” services.
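
For illustration, the adjusted “frontend” service in docker-compose-testStage.yml could look like the following sketch; the test image tag and the test subdomain are assumptions, not the values we actually used.

  frontend:
    image: dr079/webshop:frontend-test        # assumed tag for the test image
    restart: always
    environment:
      API_HOST: backend
      API_PORT: 8080
      # Assumed test subdomain instead of the production subdomain
      LETSENCRYPT_HOST: test.designmyhouse.de
      LETSENCRYPT_EMAIL: admin@designmyhouse.de
      VIRTUAL_HOST: test.designmyhouse.de
      VIRTUAL_PORT: 80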

Modifying the Terraform files

In the testStage.sh, we changed the record_id and replaced “sudo docker-compose -f ./cloud-webshop/docker-compose.yml pull” and “sudo docker-compose -f ./cloud-webshop/docker-compose.yml up -d” with “sudo docker-compose -f ./cloud-webshop/docker-compose-testStage.yml pull” and “sudo docker-compose -f ./cloud-webshop/docker-compose-testStage.yml up -d”.
In the main.tf, user_data = file("docker/testStage.sh") is set.
After that, the EC2 instance, the security group, and SSH can be started as usual using Terraform.

Automated Selenium frontend testing with GitHub 

To do this, create the .github/workflows/selenium.yml file with the following content.
The script is executed on every push to the repository. It installs all necessary packages, creates a screenshot folder, and runs the pre-programmed Selenium tests located in the frontend folder.
After a push or manual execution, the test results with the artifacts (screenshots) are located on the Actions tab.

name: selenium tests
on: push
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build the stack
        run: docker-compose up -d
      - name: npm install
        run: cd frontend && npm install
      - name: install jest
        run: cd frontend && npm install jest
      - name: install selenium-webdriver
        run: cd frontend && npm install selenium-webdriver
      - name: run tests
        run: mkdir -p /tmp/screenshots/ && cd frontend && npm test
      - name: Archive screenshots
        uses: actions/upload-artifact@v2
        with:
          name: selenium-screenshots
          path: /tmp/screenshots/
      - name: Shutdown
        run: docker-compose down

Note that Chromedriver must be run headless, as the GitHub-hosted runner has no display to run a browser on.

const { Builder } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

// Inside an async Jest test:
var driver = await new Builder()
        .forBrowser('chrome')
        .setChromeOptions(new chrome.Options().headless())
        .build();

Infrastructure as Code

Cloud computing is the on-demand provision of IT resources (e.g., servers, storage, databases) via the Internet. Cloud computing resources can be scaled up or down depending on business requirements, and you only pay for the IT resources you actually use. 
On July 27, 2021, Gartner published the latest “Magic Quadrant” for Cloud Infrastructure and Platform Services. Like last year, Amazon Web Services is the top performer in the Magic Quadrant, followed by Microsoft and Google (https://www.gartner.com/doc/reprints?id=1-271OE4VR&ct=210802&st=sb). Since we were interested in trying Docker Compose, we decided to use AWS for deployment.

Deployment on Amazon ECS with Docker Compose 

In early 2020, AWS and Docker started working on an open Docker Compose specification, which makes it possible to use the Docker Compose format to deploy containers on Amazon ECS and AWS Fargate. In July 2020, the first beta version for Docker Desktop was released; the first stable version has been available since September 15, 2020.

Customize docker-compose.yml

The AWS ECS CLI supports Compose versions 1, 2, and 3. By default, it looks for docker-compose.yml in the current directory. Optionally, you can specify a different filename or path to a Compose file with the --file option. The Amazon ECS CLI only supports a few parameters, so it may be necessary to adjust the yml (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cmd-ecs-cli-compose-parameters.html).

# (Optional) Create a new Docker context to point the Docker CLI to the correct endpoint. For this step you need the AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY.
docker context create ecs myecscontext

# (Optional) Use context
docker context use myecscontext

# Deploy application to AWS
docker compose up 

# Here you can see which containers were started as well as the URLs
docker compose ps

# (Optional) Shut down container. (Don't forget to change the context back to default).
docker compose down

# Convert Docker Compose file to CloudFormation to track which resources are created or updated
docker compose convert

BuildX

Building images for other processors

For example, if you have an M1 with an arm64 processor, a locally created image will not be accepted by AWS (error message “EssentialContainerExited: Essential container in task exited”). The reason is that the ECS instances only support amd64 images.

Since version 19.03, Docker has offered buildx. As of August 5, 2020, the plugin is officially no longer considered experimental. With the buildx functionality, it is relatively easy to create Docker images that work on multiple CPU architectures.

# (optional) Create a new Builder instance
docker buildx create --name mybuilder

# (optional) Use created builder
docker buildx use mybuilder    

# Show all available builder instances (here you can also see which CPU architectures are supported by the builder)
docker buildx ls

# Build and push image for example for amd64, arm64 and arm/v7
docker buildx build --platform linux/amd64,linux/arm64,linux/arm/v7 --tag username/repository_name:tag_name --push .

# Delete images
docker buildx prune --all

How to Scale Jitsi Meet


In today’s world, video conferencing is getting more and more important – be it for learning, business events or social interaction in general. Most people use one of the big players like Zoom or Microsoft Teams, which both have their share of privacy issues. However, there is an alternative approach: self-hosting open-source software like Jitsi Meet. In this article, we are going to explore the different scaling options for deploying anything from a single Jitsi server to a sharded Kubernetes cluster.


Peer2Peer Multiplayer Real-time Strategy Game “Admiral: WW2”


1 Intro

Gaming is fun. Strategy games are fun. Multiplayer is fun. That’s the idea behind this project.

In the past I developed some games with the Unity engine – mainly 2D strategy games – and so I thought it was now time for an awesome 3D multiplayer game, or rather a prototype.

The focus of this blog post is not on the game development though, but rather on the multiplayer part with the help of Cloud Computing.

However, where to start? There are many ways to implement a multiplayer game. I chose a quite simple, yet most of the time very effective and cheap approach: Peer-to-Peer (P2P).

But first, let us dive into the gameplay of Admiral: WW2 (working title).

2 Game Demo

2.1 Gameplay

Admiral: WW2 is basically like the classic board game “Battleships”. You’ve got a fleet and the enemy player has got a fleet. Destroy the enemy’s fleet before your own fleet is sunk. The big difference is that Admiral: WW2 is a real-time strategy game. So the gameplay is more like a real-life simulation where you as the admiral can command your ships via direct orders:

  • Set speed of a ship (stop, slow ahead, full ahead, …)
  • Set course of a ship
  • Set the target of the ship (select a ship in the enemy fleet)

Currently there is only one ship class (the German cruiser Admiral Hipper), so the tactical options are limited. Other classes like battleships, destroyers or even aircraft carriers would greatly improve replayability; on the other hand they would need many other game mechanics to be implemented first.

Ships have multiple damage zones:

  • Hull (decreases the ship’s hitpoints or triggers a water ingress [water level of the ship increases and reduces the hitpoints based on the amount of water in the hull])
  • Turrets (disables the gun turrets)
  • Rudder (rudder cannot change direction anymore)
  • Engine/Propeller (ship cannot accelerate anymore)

If a ship loses all hitpoints the ship will sink and is not controllable.

2.2 The Lobby Menu

Before entering the gameplay action the player needs to connect to another player to play against. This is done via the lobby menu.

Here is the place where games are hosted and all available matches are listed.

On the right hand side is the host panel. To create a game the host must enter a unique name and a port. If the IP & Port combination of the host already exists, hosting is blocked.

After entering valid information, the public IP of the host is obtained via an external service (e.g. icanhazip.com). Then the match is registered on a server and the host waits for incoming connections from other players.

On the left hand side there is the join panel. The player must enter a port before viewing the match list. After clicking “Join”, a Peer-to-Peer connection to the host is established. Currently the game only supports two players, so after both peers (host and player) are connected the game will launch.

More on the connection process later.

3 Multiplayer Communication with Peer2Peer

3.1 Peer-to-Peer

P2P allows a direct connection between the peers with UDP packets – in this case the game host and player.

So in between no dedicated server handling all the game traffic data is needed, thus reducing hosting costs immensely.

Because most peers are behind a NAT and therefore connection requests between peers are blocked, one can make use of the NAT-Traversal method Hole-Punching.

3.1.1 P2P Connection with Hole-Punching

Given peer A and peer B. A direct connection between A and B is possible if:

  • A knows the public IP of B
  • A knows the UDP port B will open
  • B knows the public IP of A
  • B knows the UDP port A will open
  • A and B initiate the connection simultaneously

This works without port-forwarding, because each peer keeps its port open as if it were contacting a simple web server and waiting for the response.
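
To make the simultaneous connection attempt a bit more concrete, here is a minimal sketch of the peer side in TypeScript using Node’s dgram module. It is purely illustrative (the actual game uses Unity/C#), and the addresses and ports are placeholders.

import * as dgram from 'dgram';

// Public endpoint of the other peer, as received from the Rendezvous-Server (placeholders)
const REMOTE_IP = '2.2.2.2';
const REMOTE_PORT = 2222;
const LOCAL_PORT = 4141; // the port this peer announced to the Rendezvous-Server

const socket = dgram.createSocket('udp4');
socket.bind(LOCAL_PORT);

// Both peers start sending at roughly the same time; the outgoing packets open a
// mapping in the local NAT so that the other peer's packets are let through.
const timer = setInterval(
  () => socket.send(Buffer.from('punch'), REMOTE_PORT, REMOTE_IP),
  500
);

socket.on('message', (msg, rinfo) => {
  clearInterval(timer);
  console.log(`Hole punched: received "${msg}" from ${rinfo.address}:${rinfo.port}`);
});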

To exchange the public IPs and ports of each peer a Rendezvous-Server behind no NAT is required.

3.1.2 Rendezvous-Server

The Rendezvous-Server needs to be hosted in the public web, so behind no NAT. Both peers now can send simple web requests as if the users would browse the internet.

If peer A tells the server he wants to host a game, the server saves the public IP and port of A.

If B now decides to join A’s game the server informs B of the IP and port of A.

A is informed of B’s public IP and port as well.

After this process A and B can now hole-punch through their NATs and establish a P2P connection to each other.

A Rendezvous-Server can be very cheap, because the workload is quite small.

But there are some cases where Hole-Punching does not succeed (“…we find that about 82% of the NATs tested support hole punching for UDP…”, https://bford.info/pub/net/p2pnat/).

In those cases a Relay-Server is needed.

3.1.3 Relay-Server

The Relay-Server is only used as a backup in case P2P fails. It has to be hosted in the public internet, so behind no NAT.

Its only task is to transfer all game data from one origin peer to all other peers. So the game data just takes a little detour via the Relay-Server before continuing its usual way to the peers.

This comes at a price though. Since all of the game traffic is now travelling through this server the workload can be quite tough depending on the amount of information the game needs to exchange. Naturally the ping or RTT (Round Trip Time: the time it takes to send a packet from peer to peer) of a packet is increased resulting in lags. And finally multiple Relay-Servers would be required in each region (Europe, America, Asia, …). Otherwise players far away from the Relay-Server suffer heavy lags. All of these lead to high demands on the server hardware. To be clear: a proper Relay-Server architecture can be expensive in time and money.

Because of that in this project I ignored the worst-case and focused on the default Peer-to-Peer mechanism.

3.1.4 Peer2Peer Conclusion

The big advantage of this method: it is mainly serverless, so the operating costs of the multiplayer are very low. Because of that, P2P is a very viable multiplayer solution for small projects and indie games. The only thing that is needed is a cheap Rendezvous-Server (of course only if no Relay-Server is used). P2P also does not require port-forwarding, which can be a difficult and/or time-consuming task depending on the player’s knowledge.

But there are disadvantages:

  • A home network bandwidth may not be enough to host larger games with much traffic; a server hosted at a server farm has much more bandwidth
  • The game stops if a P2P host leaves the game
  • No server authority
    • every player has a slightly different game state that needs to be synchronized often; a dedicated server has only one state and distributes it to the players; players only send inputs to the server
    • anti-cheat has to be performed by every peer and not just the server
    • random is handled better if only the server generates random values, otherwise seeds have to be used
    • game states may need to be interpolated between peers, which is not the case if only the server owns the game state

A dedicated server would solve these disadvantages but in return the hardware requirements are much higher making this approach more expensive. Also multiple servers would be needed in all regions of the world to reduce ping/RTT.

3.2 Game Connection Process

After starting the game the player sees the multiplayer games lobby. As described previously the player can host or join a game from the list.

3.2.1 Hosting a game

The host needs to input a unique game name and the port he will open for the connection. When the host button is clicked the following procedure is triggered:

  1. Obtain public IP address
    • Originally this should be handled by the Rendezvous-Server, because it is hosted behind no NAT and can see the public IP of requests, but limitations of the chosen hosting service prevented this approach (more on that later)
    • Instead I used a web request to free services like icanhazip.com or bot.whatismyipaddress.com as a second backup in case the first service is down; these websites respond with a plain text containing the ipv6 or ipv4 of client/request
  2. The Rendezvous-Server is notified of the new multiplayer game entry and saves the game with public IP and port, both sent to the server by the host
    • Host sends GET-Request to the server (web server) containing all the information needed /registermpgame?name=GameOne&hostIP=1.1.1.1&hostPort=4141
    • On success the game is registered and a token is returned to the host; the token is needed for further actions affecting the created multiplayer game
  3. The host now waits for incoming connections from other players/peers
    • The host sends another GET-Request to the Rendezvous-Server /listenforjoin?token=XYZ123
      • This is a long-polling request (websocket alternative): the connection is held open by the server until a player joined the multiplayer game
      • If that is the case the GET-Request is resolved with the public IP and port of the joined player, so that hole-punching is possible
      • If no player joins until the timeout is reached (I’ve set the timeout to 15 seconds), the request is resolved with http status code 204 No content and no body
      • In that case the GET-Request has to be sent again and again until a player joins
  4. On player join both peers init a connection and punch through NAT
  5. If successful the game starts
  6. (Otherwise a Relay-Server is needed; explained previously)
  7. The host closes the game with another GET /startorremovempgame?token=XYZ123
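
To illustrate these endpoints (and the join process described in the next section), here is a heavily simplified TypeScript/ExpressJS sketch of how such a Rendezvous-Server could look (see section 5). This is not the original source: token generation, in-memory storage, error handling and the exact query parameters are assumptions.

import express, { Response } from 'express';

interface MpGame {
  name: string;
  hostIP: string;
  hostPort: string;
  waitingHost?: Response; // pending long-polling request of the host
}

const games = new Map<string, MpGame>(); // token -> game
const app = express();

// 2. Host registers a new multiplayer game and receives a token
app.get('/registermpgame', (req, res) => {
  const { name, hostIP, hostPort } = req.query as Record<string, string>;
  const token = Math.random().toString(36).slice(2); // simplified token generation
  games.set(token, { name, hostIP, hostPort });
  res.json({ token });
});

// 3. Host long-polls for a joining player; resolved on join or with 204 after 15 seconds
app.get('/listenforjoin', (req, res) => {
  const game = games.get(String(req.query.token));
  if (!game) return res.sendStatus(404);
  game.waitingHost = res;
  setTimeout(() => {
    if (game.waitingHost === res) {
      game.waitingHost = undefined;
      res.sendStatus(204); // no player joined in time, the host has to poll again
    }
  }, 15000);
});

// Players retrieve the list of open games
app.get('/mpgameslist', (_req, res) => {
  const list = Array.from(games.values()).map(({ name, hostIP, hostPort }) => ({ name, hostIP, hostPort }));
  res.json(list);
});

// A player joins a game: the waiting host gets the player's endpoint, the player gets the host's
app.get('/joinmpgame', (req, res) => {
  const { name, ownIP, ownPort } = req.query as Record<string, string>;
  const game = Array.from(games.values()).find(g => g.name === name);
  if (!game || !game.waitingHost) return res.sendStatus(404);
  game.waitingHost.json({ playerIP: ownIP, playerPort: ownPort }); // resolves the host's long-poll
  game.waitingHost = undefined;
  res.json({ hostIP: game.hostIP, hostPort: game.hostPort });
});

app.listen(Number(process.env.PORT) || 3000); // Heroku provides the port via process.env.PORT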

3.2.2 Joining a game

The player first needs to input a valid port. After that he is presented with a list of multiplayer games by retrieving the information from the Rendezvous-Server with a GET-Request to the endpoint /mpgameslist. This returns a JSON list with game data objects containing the following infos:

  • name: multiplayer game name
  • hostIP: public IP of the host
  • hostPort: port the host will open for the connection

If the player clicks “Join” on a specific game list item the following process handles the connection with the host:

  1. Obtain public IP address
    • Originally this should be handled by the Rendezvous-Server, because it is hosted behind no NAT and can see the public IP of requests, but limitations of the chosen hosting service prevented this approach (more on that later)
    • Instead I used a web request to free services like icanhazip.com or bot.whatismyipaddress.com as a second backup in case the first service is down; these websites respond with a plain text containing the ipv6 or ipv4 of the client/request
  2. Inform the Rendezvous-Server of the join
    • Send a GET-Request with all the information needed /joinmpgame?name=GameOne&ownIP=2.2.2.2&hostPort=2222
    • Now the host is informed by the server if the host was listening
    • The server resolves the request with the public IP and port of the host
    • Now the player and the host try to establish a P2P connection with hole-punching
    • If successful the game starts
    • (Otherwise a Relay-Server is needed; explained previously)

3.3 Game Synchronization

Real-time synchronization of game states is a big challenge. Unlike turn-based games the game does not wait until all infos are received from the other players. The game always goes on with a desirably minimal amount of lag.

Of course the whole game state could be serialized and sent to all players, but this would have to happen very frequently and the packets would be very large, resulting in a very high bandwidth demand.

Another approach is to only send user inputs/orders, which yields far less network traffic. I used this lightweight idea, so when the player issues an order the order is immediately transmitted to the other player. There the order is executed as well.

The following game events are synchronized:

  • GameStart: After the game scene is loaded the game is paused and the peer sends this message to the other player periodically until he receives the same message from the other peer; then the game is started
  • RandomSeed: Per game a “random seed master” (the host) periodically generates a random seed and distributes that seed to the other player; this seed is then used for all random calculations
  • All 3 ship orders:
    • ShipCourse
    • ShipSpeed
    • ShipTarget
  • GameSync: All of the previous messages still led to diverging game states, so a complete game serialization and synchronization is scheduled to happen every 30 seconds
    • Projectile positions, rotations, velocities are synched
    • The whole ship state is synched
    • Both game states (the received one and the own one) are interpolated, because I don’t use an authoritative server model and so both game states are “valid”
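
As a purely illustrative sketch, the synchronized messages listed above could be modeled like this in TypeScript; the actual game serializes C# objects, so all names and fields here are assumptions.

type ShipOrder =
  | { type: 'ShipCourse'; shipId: string; courseDeg: number }
  | { type: 'ShipSpeed'; shipId: string; speed: 'stop' | 'slow' | 'full' }
  | { type: 'ShipTarget'; shipId: string; targetShipId: string };

type GameMessage =
  | { type: 'GameStart' }
  | { type: 'RandomSeed'; seed: number }
  | ShipOrder
  | { type: 'GameSync'; serializedState: string }; // full state sync every 30 seconds

// An order is serialized and sent over the P2P connection, e.g.:
const msg: GameMessage = { type: 'ShipCourse', shipId: 'ship1', courseDeg: 180 };
console.log(JSON.stringify(msg));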

The following game events should have a positive impact on game sync, but are not implemented yet:

  • ProjectileFire: Syncs projectiles being fired
  • Waves: Because the waves have a small impact on the position where projectiles are fired and hit the ship the waves should be in-sync as well

3.3.1 IDs

In game development you mostly work with references. So for example a ship has a reference to another ship as the firing target. In code this has the benefit of easy access to the target ship’s properties, fields and methods.

The problem is that these references do not work over the network. Every machine has different references, even though they may represent the same ship. So if we want to transfer the order “Ship1 course 180” we cannot use the local reference value of Ship1.

Ship1 needs a unique ID that is exactly the same on all machines. Now we can send “ShipWithID1234 course 180” and every machine knows which ship to address.

In code this is a bit more tedious, because the received ID has to be resolved to the appropriate ship reference.

The most difficult part is finding unique IDs for all gameobjects.

Ships can obtain an ID easily on game start by the host. Projectiles are a bit more tricky, because they are spawned later on. I solved this by counting the shots fired by a gun turret and combining the gun turret’s ID with the shot number to generate a guaranteed unique ID, provided the gun turret ID is unique. Gun turret IDs are combined as well: Ship ID + gun turret location (sternA, sternB, bowA, bowB, …).
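
A rough TypeScript sketch of this ID scheme (the game itself is written in C#, so names and formatting here are only illustrative):

// Turret IDs combine the ship ID with the turret location
function turretId(shipId: string, location: 'sternA' | 'sternB' | 'bowA' | 'bowB'): string {
  return `${shipId}_${location}`;
}

const shotCounters = new Map<string, number>();

// Projectile IDs combine the turret ID with the number of shots fired by that turret
function projectileId(turret: string): string {
  const shot = (shotCounters.get(turret) ?? 0) + 1;
  shotCounters.set(turret, shot);
  return `${turret}_${shot}`; // unique as long as the turret ID is unique
}

// Example: ship "ship1", bow turret A, third shot -> "ship1_bowA_3"
console.log(projectileId(turretId('ship1', 'bowA')));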

Of course with an authoritative server this gets easier as only the server generates IDs and distributes them to all clients.

3.3.2 Lockstep

Additionally there is an interesting and promising approach to discretize the continuous game time called Lockstep. It is used in prominent real-time strategy games like Age of Empires (https://www.gamasutra.com/view/feature/131503/1500_archers_on_a_288_network_.php). The basic idea is to split up the time in small time chunks, for example 200ms intervals. In this time frame every player can do exactly one action that gets transferred to all the other players. Of course this action can also be “no action”. The action is then executed in the next interval almost simultaneously for all players. This way the real-time game is transformed into a turn-based game. It is important to adjust the interval based on the connection speeds between the players, so that no player lags behind. For the players the small order input delay is usually unnoticed, if the interval is small enough.

An important requirement is that the game is deterministic and orders issued by players have the same outcome on all machines. Sure, there are ways to handle random game actions, but because Admiral: WW2 uses randomness for many important calculations and my development time frame was limited, I unfortunately did not implement this technique.
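
For illustration only, here is a minimal TypeScript sketch of the lockstep idea described above (Admiral: WW2 does not implement it; all names, the turn length and the input delay are assumptions):

type Action = { playerId: number; command: string } | null; // null = "no action"

const TURN_LENGTH_MS = 200; // interval, should be adjusted to the connection speed
const INPUT_DELAY = 2;      // actions are executed two turns after they were issued

const pending = new Map<number, Map<number, Action>>(); // turn -> playerId -> action
let currentTurn = 0;

// Called when the local player issues an order; the action would also be sent to the peers
function scheduleLocalAction(playerId: number, action: Action) {
  const turn = currentTurn + INPUT_DELAY;
  if (!pending.has(turn)) pending.set(turn, new Map());
  pending.get(turn)!.set(playerId, action);
  // sendToPeers(turn, playerId, action); // hypothetical network call
}

// Called every TURN_LENGTH_MS; only advances when every player's action for this turn arrived
function onTurnTimer(playerIds: number[]) {
  const actions = pending.get(currentTurn) ?? new Map<number, Action>();
  if (!playerIds.every(id => actions.has(id))) return; // wait for slower players
  for (const action of actions.values()) {
    if (action) console.log(`turn ${currentTurn}: execute`, action); // deterministic game logic here
  }
  pending.delete(currentTurn);
  currentTurn++;
}

// Example usage with two players (IDs 1 and 2)
scheduleLocalAction(1, { playerId: 1, command: 'ShipCourse ship1 180' });
setInterval(() => onTurnTimer([1, 2]), TURN_LENGTH_MS);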

4 Rendezvous-Server Hosting

There are almost unlimited hosting options on the internet. Usually the selection shrinks once a specific programming language is picked. But because I used NodeJS with Typescript, which transpiles to plain Javascript, there were still plenty of hosting options. Had I decided to write the server in C# and therefore run a .NET Core application, like the game itself (Unity uses C#, or some exotic programmers use Javascript), many hosting providers would have dropped out.

4.1 Alternatives

Of course there is the option of renting your own dedicated server: very expensive for a simple Rendezvous-Server and maintenance-heavy, but powerful and flexible (.NET ok).

There’s the option of a managed server: little maintenance but very, very expensive.

Then we have VPS (Virtual Private Servers): physical servers whose hardware is shared among many customers, which makes them cheaper.

Then there are the big players like AWS, Google Cloud Platform, IBM Cloud and Microsoft Azure: they can get very expensive, but in return they offer vast opportunities and flexibility; it is easy to scale and monitor your whole infrastructure and a load-balancer can increase availability and efficiency of your server(s); on the other hand the learning-curve is steeper and setting up a project needs more time.

4.2 Heroku

Heroku is a cloud-based Platform-as-a-Service (PaaS) offering hosting for many common programming languages like Javascript/NodeJS (which I used), Python and Ruby. It does not offer as many possibilities as AWS and co., but it is way simpler to learn and set up.

It also has a completely free plan, which grants over 500 hours of uptime per month. This is not enough to run the whole month with 30 * 24 = 720 hours, but the application sleeps after 1 hour with no requests and automatically wakes up again if needed. This is perfectly fine for a Rendezvous-Server, because it is not used all the time. The wake-up time is not that bad either (around 4-8 seconds).

Of course Heroku offers scaling so that the performance is massively increased and the app will never sleep, but this comes with a price tag.

In a paid plan Heroku also has a solid monitoring page with events, up- and downtimes, traffic and so on.

Server logs are easily accessible as well.

For setup you just need to create a “Procfile” in your project folder that defines what to execute after the build is completed: web: npm run start will run the npm script called start as a web service. The application is then publicly reachable on your-app-name.herokuapp.com. The NodeJS web server can then listen on the port that is provided by Heroku in the environment variable process.env.PORT.

Deployment is automated: just push to your GitHub master branch (or the branch you specified in Heroku); after that a GitHub webhook triggers the build of your app in Heroku.

But during development I discovered a big disadvantage: Heroku does not support IPv6.

This is a problem, because I wanted to use the Rendezvous-Server as a STUN-Server as well, which can determine and save the public IPs of client requests. But if a client like me only has Dual-Stack Lite (a unique IPv6 address, while the IPv4 address is shared among multiple customers), Peer-to-Peer is not possible via the shared IPv4.

As a workaround the clients obtain their public ipv4 or ipv6 via GET-Request from icanhazip.com or as a backup from bot.whatismyipaddress.com. These websites return a plain text body containing the public IP. After that the peers send their public IP to the Rendezvous-Server as explained previously.

5 Architecture Overview

Typescript usually is a very good choice for larger projects, simply because of the type safety and development-time error checking. This saves a lot of searching for bugs like typos, as is often the case in plain Javascript.

To realize the web server I used the very popular ExpressJS, which does not need any introduction and should be well-known by this time.

6 Conclusion

Real-time multiplayer games are tricky. The game states quickly diverge and much effort is needed to counteract this. Game time differences and lag drastically compound the problem. But methods such as Lockstep can help to synchronize the time across multiple players.

While developing, try to keep the game as deterministic as possible, so that player actions yield the same result on every machine. Random is usually problematic, but can be handled via a dedicated game server or seeds.

Peer-to-Peer is a simple and great solution for smaller indie multiplayer games, but comes with some disadvantages. For larger projects dedicated/authoritative servers are favourable.

Heroku offers a fast and simple setup for hosting cloud applications, and the free plan is great for smaller projects. If demand increases, scaling is no problem and the deployment is automated. But be aware of Heroku’s missing IPv6 support.

All in all: Gaming is fun. Strategy games are fun. Multiplayer is fun – for the player and an exciting challenge for developers.

Finding the right strategy for your way to the cloud

– Abstract –

This article is about finding the right strategy if you want to empower your business with cloud computing. It is written for those of you who are still new to this topic and want to get a first glimpse through the dense cloud mystery. After reading, you should have gained an orientation on how to start your first cloud project and a basic knowledge of the industry’s most important buzzwords.

“The essence of strategy is choosing what not to do”
– Prof. Dr. Michael E. Porter – Harvard Business School
Co-founder of strategic management


It’s stormy outside

It’s been over a decade now since Amazon started stirring up the market with its web services, and cloud computing is now more mainstream than ever. What for many people started with file-sharing and backup services like Dropbox or company-internal NAS systems quickly transformed into a trending global topic. Why should you always upgrade local hardware and storage to stay competitive if you can have the same or even greater performance with shared resources, reduced costs and accessibility from everywhere?

Some understood that earlier than others and can already harvest the fruits of their work – and some, hopefully not you, should seriously think about protecting their house because the storm is not over yet! With the introduction of cloud gaming services like GeForce Now and the start of projects from the major players in the market, it becomes clear that cloud computing is not just hype; quite the reverse, anything seems to be possible, and any upcoming business transformation should consider the opportunities of cloud computing for its needs. And who knows, maybe one day you can even sell your computing power as part of the worldwide cloud, as has been introduced by crypto-currency miners or solar tech companies?

Faster, Greater, Higher! The sky’s the limit!

Now that enough time has passed for the first CTOs and CIOs to look back on their early experiences with moving to cloud services, we can wrap up some key facts in case you are still looking for a few good arguments to convince you and your team to move to the cloud.

Scalability with minor effort

  • Overwhelming traffic is not a show stopper anymore
  • Auto scaling infrastructure is auto money for your pocket!
  • Reduce the multi region infrastructure challenge

Security

  • Cloud storage carries fewer risks than lost devices!
  • Stop worrying about underlying infrastructure security & maintenance
  • Let the disaster come! The well-proven disaster recovery systems of the big cloud providers are probably better than yours!

Speed & Performance

  • Increase your server-side performance on-demand as you go
  • Handle request peaks without costly overprovisioning your hardware

Reduced Costs = Greater Profit

  • Pay-as-you-go is better than buy-and-deprecate!
  • Financial advantages through transformation from CAPEX to OPEX
  • Makes your financial model a lot easier to plan and calculate
  • High chance to reduce your Time to Market 
  • Emerging markets can perhaps make you even fly higher

Agility & User Experience

  • Keep your team from drifting away from their key competences
  • Increase your team performance with the latest collaboration tools
  • Satisfy your team by giving them a state-of-the-art feeling
  • Give more freedom regarding physical presence

Ecological Footprint

  • Less e-waste from deprecated hardware
  • Savings on power supply and emissions

And if you’re doing it right, you gain a lot of FLEXIBILITY, which is nowadays more important than ever! Our technological environment changes so fast that you may be outdated before you finish your project – ok, probably not that fast – but you should be prepared! If your cloud provider can’t handle your SLA or becomes outdated itself, you may move on to a new provider, but be aware: cloud providers want to keep you as a customer, for sure! How that may turn out badly for you, we’ll discuss in the next section.

Pitfalls you better leave for others

So if you’re convinced now by the advantages of the cloud, we’ll reveal some pain points where others have stumbled, so keep them in mind and don’t make the same mistakes!

Readiness of Organization
One of the greatest mistakes you can do while dreaming of your cloud project is to underestimate the know-how that is necessary to do the job right. Many teams stumbled over the fact that their ideas and plans were great but they forgot about the people who finally have to do the job. Your change management team also has to be aware that going to the cloud can also have a negative impact on other teams, so other teams in your company may start fighting against that change. Imagine for example your customers have now direct access to your order- and payment system, will your sales team like that? Your change can also impact your hierarchical structure and long cherished employees lose their major value for the team. So it’s probably likely that not everybody in your organization will like your project, be aware of that!

Readiness of Application
It doesn’t matter if you’re developing a new or transforming an existing application, you have to keep the drawbacks in mind that can come along with cloud services. Too many technical leaders made the mistake of thinking a simple lift-and-shift strategy will work out and they had to learn by hard that some refactoring is always necessary. This can start by underestimating the threshold of your cloud provider, continuing with disregarding upcoming latency that destroys your user experience or dependency libraries that are restricted to be used in the cloud (rare, but reported). So you better make a new concept first and think from the very beginning which components can be reused in which way. Think about your build steps, your deployment, your test environment, your architecture, current infrastructure, previous concurrency issues, clusters you want to use – convince yourself first and not later!

Data Privacy & Compliance
Especially with the release of the new data privacy regulations from 2018, many companies got messed up and reported big question marks on how to continue with their services. So you can be lucky to start your cloud project after this disrupting wave but you shouldn’t think that the current data privacy agreements will last forever, this was rather just the beginning of regulations that were missed since the beginning of our digital lifestyle. Data privacy for your customers and data protection imposed by your company compliance will harden your way to the cloud – that’s for sure!

Remember your Business Idea!
It doesn’t matter if you start a new project or lift some existing functionality to the cloud, if it creates advantages, any motivated employee will create further and further ideas how to improve and scale the impact of your cloud movement. So if you impress your environment, you can likely be sure that it doesn’t take a long time until the first ideas from other teams or team members land on your desk! So keep your primary business goals in mind and finish them first before you start working on other issues. If you’re a technical leader or you’re the one who pushes the cloud movement forward, take the advice to reflect yourself frequently and check if the tasks that you’ve created also have a value to your business. Too many become starry-eyed for cloud transformations and lift them to a holy-grail, which it isn’t for sure! So think twice if the current steps are really necessary and profitable for you and your team!

Vendor Lock-In
It’s too sad to be true but in reality it’s a fact. We have two strongly diverging mindsets in our industry. There are the ones who believe everything should be free and open sourced and the others try to catch you in their cage and won’t ever let you go. So you got to be aware of the fact that it’s common under cloud providers that they will try to trap you with special offers, extra support, personal training sessions and individual implementation opportunities just to keep you as a customer as long as possible. So if you start to implement services based on proprietary interfaces it will be harder for you to relocate your codebase one day. Or if you build up a special knowledge customized to their services, you may wake up one day and realize that this knowledge is now worthless. So in other words, if their ship goes down, you’ll go down too! So be aware to avoid a vendor lock-in as much as you can!

Cloud Provider Evaluation
Especially those who are new to this topic may stumble across a thing called SLA (Service Level Agreement) which defines the contract you sign with your cloud provider. If you don’t know exactly what it contains and where the borders of your cooperation are, you may run into trouble sooner or later. So it’s a good advice to get somebody for your team who was able to collect some experience in the cloud environment before and protects you from making the same mistakes that others did. It’s not about missing some contract details, it’s more about finding the right values and parameters that fit to your application and finally to your business case. As long as your project works fine you’ll probably never get in touch with your SLA that much, but once you got your customers or your boss in your neck, if there is an unexpected downtime, drastically rising costs due to a miss-configuration or a security lack, or even a disaster with a never ending recovery time, you will understand why knowing your SLA details is important. Take it as a good advice, especially at the beginning of your cooperation, that measurement and control is better than relying on trust. So the job is to monitor and measure as much as you can until you get a good feeling of what’s going on and how much responsibility you can take for it. Expect the unexpected!

Your guide through the jungle

So as far as you are familiar with the greatest advantages and biggest show stoppers now, we can start to create a cloud strategy that fits to your needs. Get your weapons locked and loaded and follow me through the jungle!

What do you want to create?

Remember your business goals, and remember how others have lost the trail; you don’t want the same to happen to you, right? So define your project, and especially your project scope, precisely. What do you want to create? What do you want to offer to your customer? Who is your customer and what do they really need? Think about your use cases and match them accordingly. To make it a little bit more confusing: maybe you’ve heard about XaaS or *aaS, which is one of the most blown-up cloud buzzwords ever! What once started with Software as a Service (SaaS), which basically means software delivered by cloud services, later transformed into a term that is used for almost any kind of service: Firewall as a Service, Payment as a Service, Analytics as a Service and even non-technological ones. But the idea behind the service-oriented model is great! So if you want to market your later product right, think in terms of services! Remember how you changed your business model from CAPEX to OPEX; your future customers want to have that too! They don’t want to buy a product, they have a goal, and they are looking for the right service to reach their target as fast and as well as they can. And that’s a fact for any kind of customer. So you are serving them your best to let them reach their best!
The other way around, you have to think in the same terms for your own goals. But this is a matter of your skills, compliance and the amount of responsibility you want to share. Netflix, for example, thought they could fly high and even higher, so they built up their own infrastructure and data centers, until they had to realize that they could never get this job done any better than Amazon does. From that point on they focused on what they really want to do: delivering awesome series! So think about the services you want to deliver and what kind of services would be a help for you, short term and long term. Imagine your project’s growth stages from time to time: where is your starting point and where do you finally want to get? Then find a match with a cloud provider that has all the abilities you need for your scale.

If you’re willing to create services that are mainly for corporate users, think of a private cloud with outsourced hosting facilities, which is technically nothing other than a restricted public cloud for your private use. If your compliance or your business case doesn’t force you to keep your data in-house, guard it with virtual private network access (VPN) and you are ready to go. If you later need to open some services for public usage, you can still transform it into a hybrid cloud by removing the corresponding restrictions or routing them through a public cloud.

Checklist

To not leave you unarmed, we’ve created a checklist that you may consider at the start of your cloud project. Criticism is always welcome; if you’ve noticed some missing points or made different experiences throughout the years, let us know in the comments down below!

  1. What’s my business goal? What requirements does it involve?
  2. How can I accelerate my time to market?
    Is the market ready for my concept?
  3. How ready is my application or existing architecture?
  4. Is there a need for refactoring? Does it also consider potential cloud drawbacks like latency, downtime, noisy neighbours, migration, licenses and so on?
  5. How ready is my organization and team? Who can do the job?
  6. Will it change my business model? Does it disrupt my team?
  7. Which restrictions do exist? Compliance? Security?
    Data privacy policies?
  8. Where do I need help in terms of services? Long term? Short term?
  9. Where do I want to grow? How much responsibility can I share?
  10. Does the SLA match my requirements? How about Disaster Recovery?
  11. Do the project requirements match my budget? Can it be done in the given time?

And as a final word: Don’t over-engineer and don’t over-manage it all! Keep it simple and straightforward and learn from your mistakes. Never lay back, be proactive, expect the unexpected and create your plan B – then you’re prepared for whatever it takes 😀

References

Author: Cloud Technology Partners, a Hewlett Packard Enterprise Company
Publication Date: July 21, 2016
Title: Five things every CEO should know before going to the cloud
Retrieved August 08, 2020 from:
https://www.cloudtp.com/doppler/5-things-every-ceo-know-going-cloud/

Author: Jeremy Cook, Cloud Academy
Publication Date: September 2019
Title: Cloud Migration Risks & Benefits
Retrieved August 09, 2020 from:
https://cloudacademy.com/blog/cloud-migration-benefits-risks/

Author: Sharad Acharya, Ace Cloud Hosting
Publication Date: April 05, 2019
Title: Things to look out while choosing a cloud service provider
Retrieved August 09, 2020 from:
https://www.acecloudhosting.com/blog/choosing-cloud-service-provider/

Author: Eze Castle Integration
Publication Date: January 21, 2020
Title: 6 Common cloud mistakes and how to avoid them
Retrieved August 09, 2020 from:
https://www.eci.com/blog/15706-6-common-cloud-mistakes-and-how-to-avoid-them.html

Author: Stefan Luber, Cloud-Computing Insider
Publication Date: August 16, 2017
Title: Was ist eine private Cloud?
Retrieved August 12, 2020 from:
https://www.cloudcomputing-insider.de/was-ist-eine-private-cloud-a-631415/

Cloud-based Image Transformation

Introduction

As part of the lecture „Software Development for Cloud Computing“, we had to come up with an idea for a cloud-related project we would like to work on. I had just heard about Artistic Style Transfer using Deep Neural Networks in our „Artificial Intelligence“ lecture, which inspired me to choose image transformation as my project. However, having no idea about the cloud environment at that time, I didn't know where to start and what was possible. A few lectures in, I had heard about Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Function as a Service (FaaS). Out of those three I liked the idea of FaaS the most: simply upload your code and it works. Hence, I went with Cloud Functions in IBM's cloud environment. Before I present my project, I'd like to explain what Cloud Functions are and how they work.

What are Cloud Functions?

Choose one of the supported programming languages. Write your code. Upload it. And it works. Serverless computing. That's the theory behind Cloud Functions. You don't need to bother with infrastructure. You don't need to bother with load balancers. You don't need to bother with Kubernetes. And you definitely do not have to wake up at 3 am and race to work because your servers are on fire. All you do is write the code; your cloud provider manages the rest. The cloud provider of my choice was IBM.

Why IBM Cloud Functions?

Unlike Google and Amazon, IBM offers free student accounts, and there is no need to deposit any kind of payment option when creating one either. Since I had no experience using any cloud environment, I didn't want to risk accidentally accumulating a big bill. Our instructor was also very familiar with the IBM Cloud, so in case I needed support I could always ask him.

What do IBM Cloud Functions offer?

IBM offers a command line interface (CLI), a nice user interface on their cloud website, accessible with the web browser of your choice, and very detailed documentation. You can check, and if you feel like it, write or edit your code using the UI as well. The only requirement for your function is that it has to take a JSON object as input and return a JSON object as well. You can test the function directly inside the UI: simply declare an example JSON object you want to run it with, then invoke your function. Whether the call failed or succeeded, the activation ID, the response time, the results and, if enabled, the logs are then displayed directly. You can also add default input parameters or change your function's memory limit and timeout on the fly; each new instance of your function will then use the updated values.
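
To make the JSON-in/JSON-out contract concrete, here is a minimal sketch of what such a Python action could look like (the parameter name and the greeting payload are purely illustrative; only the main entry point and the dict-in/dict-out shape are given by the platform):

    # Minimal IBM Cloud Functions / OpenWhisk action in Python:
    # the platform passes the input JSON as a dict and serializes
    # the returned dict back to JSON for the caller.
    def main(params: dict) -> dict:
        name = params.get("name", "world")  # default input parameters end up in here as well
        return {"greeting": f"Hello, {name}!"}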

Another nice feature of IBM Cloud Functions is Triggers. You can connect your function to different services, and once such a trigger fires, your function is executed, whether because someone pushed new code to your GitHub repository or because someone updated your Cloudant database, IBM's database service.

You can also create a chain of Cloud Functions, a so-called sequence: the output of function 1 then becomes the input of function 2.
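
As a small sketch of the idea (function names and payload keys are made up; in practice each function lives in its own action, whose entry point is main):

    # Two actions that could be combined into a sequence: the dict returned
    # by the first becomes the params dict of the second.
    def normalize(params: dict) -> dict:      # would be the "main" of action 1
        return {"text": params.get("text", "").strip().lower()}

    def count_words(params: dict) -> dict:    # would be the "main" of action 2
        return {"words": len(params.get("text", "").split())}

    # What the sequence does, conceptually:
    print(count_words(normalize({"text": "  Hello Cloud Functions  "})))  # {'words': 3}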

IBM Cloud Functions use the Apache OpenWhisk service, which packs your code into a Docker container in order to run it. If you have more than one source file or dependencies you need, you can pack everything into a Docker image or, in some cases like Python or Ruby, into a zip archive. To do that in Python, you create a virtual environment using virtualenv and then zip the virtualenv folder together with your Python files. The resulting zip files and Docker images can only be uploaded using the CLI.

You can also enable your function as a web action, which allows it to handle HTTP events. Since the link automatically provided by enabling a function as a web action ends in .json, you might want to create an API definition instead. This can be done with just a few clicks; you can even import an OpenAPI definition in YAML or JSON format. Binding an API to a function is as simple as defining a base path for your API, giving it a name and creating an operation. For example: API name Test, base path /hello, and for the operation we define the path /world, select our action and set the response content type to application/json. Now, whenever we call <domain>/hello/world, we call our Cloud Function through our REST API. Using the built-in API explorer we can test it directly, and if someone volunteers to test the API for us, we can share the API portal link with them. Adding a custom domain is also easily done by providing the domain name, the certificate manager service and the certificate in the custom domain settings.
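
Assuming the example API above was created with a POST operation, calling it from Python could look roughly like this (the domain is a placeholder taken from the API details page, and the payload is whatever JSON your function expects):

    import requests

    API_BASE = "https://<domain>/hello"        # placeholder for the API's base URL

    # POST against the /world operation, which is bound to the cloud function
    response = requests.post(f"{API_BASE}/world", json={"name": "Ynstagram"})
    response.raise_for_status()
    print(response.json())                     # the function's JSON result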

Finally, my Project

Architecture of the Image Transformation Service

The idea was:

A user interacts with my GitHub Page, selects a filter, adds an image, tunes some parameters and clicks confirm. The result: they receive the transformed image.

The GitHub Page is written in HTML, CSS and JavaScript. It sends a POST request to the API I defined, which is bound to my Cloud Function written in Python. The request contains information about the chosen filter, the set parameters and a link to the image (for the moment, only JPEG and PNG are allowed). The function then processes the image and returns the created PNG base64 encoded. The base64-encoded data is embedded in the HTML site, and the user can then save the image.
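
A rough sketch of that handler structure follows; the parameter keys and the greyscale/upscale branches are assumptions for illustration, not the exact production code (the cartoon branch is sketched separately further down):

    import base64
    import io

    import requests
    from PIL import Image

    def main(params: dict) -> dict:
        url = params.get("image")                  # link to a JPEG or PNG
        chosen = params.get("filter", "greyscale") # which transformation to apply

        img = Image.open(io.BytesIO(requests.get(url, timeout=10).content))

        if chosen == "greyscale":
            result = img.convert("L")
        elif chosen == "upscale":
            factor = int(params.get("factor", 2))  # 2, 3 or 4
            result = img.resize((img.width * factor, img.height * factor))
        else:
            result = img                           # cartoon filter: see the sketch below

        # return the created PNG base64 encoded, wrapped in a JSON object
        buffer = io.BytesIO()
        result.save(buffer, format="PNG")
        return {"image": base64.b64encode(buffer.getvalue()).decode("ascii")}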

The function currently has three options:

You can transform an image into a greyscale representation.

Left: Original Image by Johannes Plenio, Right: black and white version

You can upscale an image by a factor of two, three or four,

Left: Original Image by Johannes Plenio, Right: Upscaled by a factor of 2

and you can transform an image into a Cartoon representation.

Left: Original Image by Helena Lopes, Right: Cartoon version

Cartoon images are characterized by clear edges and homogeneous colors. The cartoon filter first creates a grayscale image and median-blurs it, then detects the edges using an adaptive threshold, which currently still has a predefined window size and threshold value. It then median-filters the colored image and performs a bitwise AND operation between every RGBA color channel of the median-filtered color image and the found edges.
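
A rough reimplementation of that pipeline with Pillow and numpy could look like this (window size, blur sizes and the threshold constant are illustrative, and the adaptive threshold is approximated here with a box-blurred local mean):

    import numpy as np
    from PIL import Image, ImageFilter

    def cartoonize(img: Image.Image, window: int = 9, threshold: int = 5) -> Image.Image:
        # 1. grayscale + median blur
        gray = img.convert("L").filter(ImageFilter.MedianFilter(size=7))
        g = np.asarray(gray, dtype=np.float32)

        # 2. adaptive threshold: compare each pixel against its local mean
        local_mean = np.asarray(gray.filter(ImageFilter.BoxBlur(window // 2)), dtype=np.float32)
        edges = (g > local_mean - threshold).astype(np.uint8) * 255   # black lines on white

        # 3. median-filter the colored image to flatten the colors
        color = np.asarray(img.convert("RGBA").filter(ImageFilter.MedianFilter(size=7)), dtype=np.uint8)

        # 4. bitwise AND between every RGBA channel and the edge mask
        mask = np.dstack([edges] * 4)
        return Image.fromarray(np.bitwise_and(color, mask), mode="RGBA")

    # usage: cartoonize(Image.open("input.jpg")).save("cartoon.png")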

Dis-/Advantages of using (IBM) Cloud Functions

Serverless infrastructure was fun to work with. There is no need to manually set up a server, secure it, etc. Everything is done for you; all you need is your code, which scales to 10,000+ parallel instances without issues. Function calls themselves don't cost much either. IBM's base rate is currently $0.000017 per second of execution, per GB of memory allocated. 10,000,000 executions per month with 512 MB action memory and an average execution time of 1,000 ms cost only $78.20 per month, including the 400,000 GB-s free tier.
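
The numbers above can be reproduced with a few lines (rate and free tier as quoted at the time of writing):

    rate_per_gb_second = 0.000017      # USD per GB-second of execution
    executions = 10_000_000            # invocations per month
    memory_gb = 512 / 1024             # 512 MB action memory
    avg_duration_s = 1.0               # 1,000 ms average execution time
    free_tier_gb_s = 400_000           # free GB-seconds per month

    consumed = executions * memory_gb * avg_duration_s   # 5,000,000 GB-s
    billable = max(consumed - free_tier_gb_s, 0)         # 4,600,000 GB-s
    print(round(billable * rate_per_gb_second, 2))       # 78.2 USD per month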

Another good feature was being able to upload zip packages and Docker images, although those can only be uploaded using the CLI. As a Windows user that is a bit of a hassle. But one day I'll finally set up the second boot image on my desktop PC. One day. After that, there will be no need for my VM anymore.

The current code size limit for IBM Cloud Functions is 48 MB. While this seems plenty, any module you use that is not included by default in IBM's runtime needs to be packed with your source code. OpenCV was the module I used before switching over to Pillow and numpy, since OpenCV offers a bilateral filter, which would have been a better option than a median filter for the color image in the cartoon filter. Sadly it is 125 MB large, and still 45 MB packed. Since binary uploads are base64 encoded, which inflates them by a factor of about 4/3, the real limit is roughly 36 MB, so 45 MB was still too much. Neither would the 550 MB VGG16 model fit, which I initially wanted to use for an artistic style transfer neural network as a possible filter option. I didn't like the in- and output being limited to JSON either. Initially, before using the GitHub Page, the idea was to have a second Cloud Function return the website, which was sadly not possible. The limited selection of predefined runtimes and modules is also more of a negative point. One can always pack their code together with its modules into a Docker image or zip, but being able to just upload a requirements.txt and have the cloud automatically download those modules would have been way more convenient. My current solution returns a base64-encoded image. If someone tries to upscale a large image and the result exceeds 5 MB, the function returns an error saying „The action produced a response that exceeded the allowed length: –size in bytes– > 5242880 bytes.“

What’s the Issue?

Currently, because GitHub Pages does not set Cross-Origin Resource Sharing (CORS) headers, this does not work out of the box. CORS is a mechanism that allows web applications to request resources from a different origin than their own. A workaround my instructor suggested was creating a simple node.js server which adds the missing CORS headers. At first, only GET requests were logged in the Cloud API summary, and they were answered with a 500 Internal Server Error. I read up on it, found out the headers need to be set by the server, and tried to troubleshoot this for what felt like ages: adding headers to the Ajax jQuery call, enabling cross-origin on it, trying to work around it by setting the dataType to jsonp, even uploading the Cloud Function and API again and creating a test function and binding it to the API (which worked, by the way, both as POST and GET, with no CORS errors whatsoever, until I replaced the code). I'm still pretty happy it works with this little workaround now; thank you again for the suggestion!
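
The workaround itself was a small node.js proxy; purely to illustrate the idea, here is the same concept sketched in Python with Flask (URL and route are placeholders, not the actual server my instructor suggested): it forwards the browser's request to the Cloud API and attaches the CORS headers GitHub Pages doesn't set.

    import requests
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    CLOUD_API_URL = "https://<domain>/image/transform"   # placeholder for the bound API route

    @app.route("/transform", methods=["POST", "OPTIONS"])
    def transform():
        if request.method == "OPTIONS":                  # answer the CORS preflight
            resp = app.make_response(("", 204))
        else:                                            # forward the real request to the Cloud API
            upstream = requests.post(CLOUD_API_URL, json=request.get_json(), timeout=60)
            resp = jsonify(upstream.json())
        resp.headers["Access-Control-Allow-Origin"] = "*"
        resp.headers["Access-Control-Allow-Headers"] = "Content-Type"
        return resp

    if __name__ == "__main__":
        app.run(port=8080)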

Other than that, I spent more time than I'm willing to admit trying to find out why I couldn't upload my previous OpenCV-based solution. Rewriting my function as a result was also a rather interesting experience.

Future Improvements?

I could give the user more options for the cartoon filter: the adaptive threshold has a threshold value that could easily be exposed to the user, and an option to change the window size could be added as well, maybe in steps?

I could always add new filters as well. I like the resulting image of edge detection using a Sobel operator, so I thought about adding one of those.

Finding a way to host the website with a provider that adds CORS headers, so that interested people could try a live demo and play around with it, would be an option as well.

What I'd really like to see would be the artistic style transfer uploaded. I might be able to create it using IBM Watson and then add it as a sequence to my service. I dropped this idea previously because I had no time left to spare trying to get it to work.

Another option would be allowing users to upload files instead of just providing links. Similarly, I could include a storage bucket linked to my function, in which the transformed image is saved; the function would then return a link to it. This would also solve the 5 MB maximum response size issue.

Conclusion

Cloud Functions are really versatile; there's a lot one can do with them. I enjoyed working with them and will definitely make use of them in future projects. The difference in execution time between my CPU and the CPUs in the cloud environment was already noticeable for the little code I had. Being able to call the function from wherever is pretty neat too: I could, for example, create a cross-platform application which saves, deletes and accesses data in an IBM Cloudant database using Cloud Functions.

Having had no idea about cloud environments in general a semester ago, I can say I learned a lot, and it definitely opened up an interesting, yet very complex world I would like to learn more about in the future.

And lastly, all the code used is provided in my GitHub repository. If you are interested, feel free to drop by and check it out. Instructions on how to set everything up are included.