Microservices – any good?

As software solutions continue to evolve and grow in size and complexity, the effort required to manage, maintain and update them increases. To address this issue, a modular and manageable approach to software development is required. Microservices architecture provides a solution by breaking down applications into smaller, independent services that can be managed and deployed individually.

Commonly used in distributed and large-scale systems, this architectural pattern is favored for its scalability, flexibility and suitability for systems that require rapid change and innovation. Continuous delivery, high scalability, agility and modularity are all shiny buzzwords associated with microservices, but they don’t tell the whole story. While microservices offer a number of benefits, it is important to remember that there are also challenges to this approach.

What are microservices, anyway?

The term “microservice” was introduced in 2005 by Peter Rogers, founder of Resource Oriented Computing. He used “micro-web-services” to describe a more flexible, more service-oriented software architecture.

The microservices architecture is an approach to the development of software as a series of small services that can be deployed independently of each other. The basic principle of microservices, the division of software components into modular units, is nothing new, but rather based on the principle of Service Oriented Architecture (SOA) which came into use in the late 1990s. The microservice architecture is commonly considered an evolution of SOA because its services are more differentiated and run independently.

In a monolithic architecture, everything is implemented as a single, tightly coupled unit, with all components in a single code base. In contrast, a microservices architecture decomposes the application into many small, loosely coupled services, each responsible for one specific business capability.

Comparison of monolithic system architecture and Microservice architecture from [4].

Microservices are not just a technical approach. They are also an organizational approach. Conway’s law states that “Organizations which design systems […] are constrained to produce designs which are copies of the communication structures of these organizations.” Given that, it makes sense that implementing microservices usually also requires a change in the organizational structure. 

Microservices are therefore, as already mentioned, a strong modularization concept. Microservices communicate with each other via an application programming interface (API) that supports loose coupling, whereas traditional monolithic structures suffer from tight coupling between their components, introducing strong dependencies between modules. Each of these separate microservices can be deployed and tested independently. As they communicate using the same protocols, it doesn’t matter which technology is used to implement them; the individual microservices can, for example, be programmed in different languages.
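Purely as an illustration (not taken from any particular system), two hypothetical Node.js services might expose and consume such an API over HTTP like this, assuming Express and Node 18+ for the built-in fetch:

```javascript
// order-service (hypothetical): owns its data and exposes a small HTTP API
const express = require("express");
const app = express();

app.get("/orders/:id", (req, res) => {
  // in a real service this would come from the service's own database
  res.json({ id: req.params.id, status: "shipped" });
});

app.listen(3001);

// billing-service (hypothetical): depends only on the published API,
// not on the order service's implementation or technology
async function getOrderStatus(orderId) {
  const response = await fetch(`http://order-service:3001/orders/${orderId}`);
  const { status } = await response.json();
  return status;
}
```

Because the billing service only depends on the published contract, the order service could be rewritten in a completely different language without the caller noticing.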

Why do we want a microservices architecture?

In an ideal world microservices help you…

…scale.

Unlike vertical scaling, also known as scaling up, where more resources are added to a single node in the system, there are no limits (from a hardware perspective) to horizontal scaling. Horizontal scaling, also known as scaling out, involves adding more nodes to the system, such as adding more servers to a cluster. An important advantage of horizontal scalability is the ability to increase capacity during operation.

…modularize.

The strong modularization makes the software easier to understand and work with. A microservice is used for a single task and is designed to perform that task as effectively as possible. A single service is easier to maintain and can easily be replaced. The modularization also makes it easier to build in redundancy: services can be duplicated with very little effort. In addition, the individual components can easily be reused and developed further.

… create loose coupling.

Since the services communicate via an API, they are ideally only loosely coupled. Loosely coupled in this context refers to a system in which the individual microservices are designed to operate independently and do not have a tight dependency on each other. Separating the application into individual services prevents undesired dependencies.

…deploy independently.

An independent deployment allows frequent releases while the rest of the application remains available. This means that services can be modified, tested and put into production independently of each other. The individual microservices can be developed and maintained independently by business-oriented, cross-functional teams. Ideally, the teams manage their products throughout their entire lifecycle, following Amazon’s guiding principle: “You build it, you run it.”

…be technology independent.

As mentioned above, microservices can be implemented in a technology-independent way. Thus, each service can be built with whatever suits its task best. Development teams in different areas of expertise can use the language that fits their needs (e.g., AI-related parts of the application are implemented in Python, while C++ is used for critical real-time services).

…decentralize.

Ideally, each microservice has its own database, decentralizing responsibility and allowing updates to be made on an individual basis. In addition, distributing the services to independent databases avoids the problem of a Single Point of Failure (SPoF).

Are the benefits of microservices architecture overstated?

Microservices can help you scale and increase the availability of your system, but one of the main challenges is effectively managing and coordinating the communication between services; done poorly, it leads to a significant increase in complexity.

The availability of the whole system decreases as more microservices are added: to determine the availability of the whole system, the availabilities of the individual components are multiplied. If we assume 99% availability for a monolith, a system of microservices that each reach 99% availability falls short of that very quickly; ten such services in a chain, for example, yield an overall availability of 0.99^10 ≈ 90.4%.

It is easier to debug and test a single microservice compared to a monolith because they are smaller and more manageable. However, debugging multiple microservices in a system can be challenging because it can be difficult to understand which microservice is performing a particular task. In contrast, observing the behavior of the system as a whole is relatively straightforward with a monolithic architecture. Debugging microservices can be a complex and time-consuming process because it requires a more nuanced understanding of how each component interacts. 

So what is the best way to test such a complex system? Netflix, for example, implements chaos testing involving planned failures of its own services to test its systems’ ability to handle unexpected and faulty conditions. Another more conventional method would be integration testing which involves testing the interactions between microservices by creating test scenarios that simulate real-world interactions between them. The disadvantage of this method, however, is the lack of knowledge about what happens when one or more services fail. Depending on the specific requirements and characteristics of the microservices and the system as a whole, it may be helpful to combine several testing approaches.

Communication between different microservices should be kept to a minimum: a service should only call another service when it genuinely needs its functionality. If the communication between microservices frequently becomes a hindrance, it may indicate underlying architectural issues. A common problem with tightly coupled microservices is that changes in one microservice can have a domino effect on others, leading to unexpected behavior and failures. Another issue is over-reliance on synchronous communication between microservices, which can lead to deadlocks and slowdowns.

Managing the entire system can be complex, especially if the organization lacks technical expertise. In such cases, utilizing cloud providers like AWS or Azure can be a viable solution, though it may result in increased cost. Additionally, the implementation of a fail-safe API is crucial, but can be a complex task.

Another challenge is the independent deployment of microservices, which increases the operational overhead, testing challenges, and the need for specialized technical expertise. This can result in a higher level of complexity in the overall system. The decentralization of services can also increase the attack surface, making it more difficult to secure the system.

In the ideal microservices world, each microservice has its own database. In reality, it is difficult to keep the services’ data separate when some data is needed by multiple microservices, which contradicts the approach of splitting the data into separate databases. A compromise has to be found between splitting the data into separate databases and maintaining data consistency.

Compared to monolithic applications, the use of microservices requires additional skills, such as knowledge of Kubernetes, containers, logging or CI/CD pipelines. For smaller applications, a monolithic approach is more advantageous due to the lower overhead of setting up and maintaining the system, as well as simpler testing and deployment processes.

Main learnings

  • Be clear about why you want to do microservices. Is it because everyone else is doing it or because you need it? A microservice should not be the goal in itself, it can be more of a way to get to your goal.
  • Consider whether your application is too small. Microservices only make sense when your application reaches a certain size. Below that, the overhead of microservices is far too big.
  • If it is not possible to divide the project into small parts without creating a large number of dependencies, then you should leave it.
  • See if your organization has the ability to break down its structures to make microservices work and has the capacity to maintain the infrastructure needed for microservices.
  • Think about testing the whole system – We already know from monolithic applications that testing is crucial. However, it is equally important that the interactions between microservices can be tested effectively. Automated testing gives you confidence in the reliability and functionality of your system.
  • Consider whether there is a need for scaling to that extent. For a website with constant traffic or no spikes, it is possible to work well with monolithic systems as there is no need to scale resources quickly “on the fly.”

Conclusion

The question of when to prefer microservices over a monolithic system is a complex one that requires an understanding of the drawbacks and benefits of both approaches. There are certain guiding rules or criteria that can help determine when it makes sense to adopt microservices, such as the size and complexity of the system, the need for increased scalability and resilience, and the skills and resources available to manage and maintain the architecture. Understanding these factors can help organizations make informed decisions about whether to adopt microservices and how to implement them effectively.

What does the future hold for microservices?

Monitoring the health and performance of microservices can be a complex task and is likely to be a central area of interest in the future. The serverless computing approach is also expected to gain traction in the microservices space, as organizations do not have to worry about the underlying infrastructure. Finally, I would like to mention the ways in which artificial intelligence could improve microservices in the future. It is conceivable that AI algorithms could be used to improve the resilience of microservices through AI-based monitoring and management. Alternatively, AI could be used to improve communication between individual microservices. As these technologies continue to develop, it is likely that more and more new applications will emerge.

Main Sources

[1] Wolff, E. (2019). Microservices – A Practical Guide. CreateSpace Independent Publishing Platform. ISBN: 978-1-71707-590-1

[2] D. Shadija, M. Rezai and R. Hill, “Towards an understanding of microservices,” 2017 23rd International Conference on Automation and Computing (ICAC), Huddersfield, UK, 2017

[3] Disasters I’ve seen in a microservices world, www.world.hey.com/joaoqalves/disasters-i-ve-seen-in-a-microservices-world-a9137a51 (Last access: 09.02.2023)

[4] Monolithic architecture vs microservices, www.divante.com/blog/monolithic-architecture-vs-microservices  (Last access: 09.02.2023)

[5] Microservices – Not A Free Lunch!, www.highscalability.com/blog/2014/4/8/microservices-not-a-free-lunch.html (Last access: 07.02.2023)

[6] Why you should use a microservice architecture, www.infoworld.com/article/3637016/why-you-should-use-a-microservice-architecture.html (Last access: 08.02.2023)

[7] Microservices-Architekturen, www.leanix.net/de/wiki/vsm/microservices-architecture (Last access: 07.02.2023)

[8] Microservices, www.martinfowler.com/articles/microservices.html  (Last access: 09.02.2023)

[9] 8 Microservices Trends to Watch in 2022, https://scoutapm.com/blog/microservices-trends (Last access: 09.02.2023)

[10] Microservices vs. SOA: Wo liegt der Unterschied?, www.talend.com/de/resources/microservices-vs-soa/ (Last access: 09.02.2023)

Migrating a REST API to the Cloud

An article by Cedric Gottschalk and Raphael Kienhöfer

For the final submission of the lecture “Software Development for Cloud Computing”, we set ourselves the goal of migrating an existing REST API from a previous project to the cloud. We decided to use the Google Cloud for this. In the course of this project, we also worked with Infrastructure as Code using Terraform.

Architecture

Before the move to the cloud, the API lived as a container on a single server managed with Docker Compose. nginx was used as a reverse proxy to secure the transmission via TLS, and MariaDB served as the SQL database. Thanks to the shared technical foundation (Docker), connecting the individual services was very simple.

Continue reading

“Himbeer Tarte und harte Fakten”: An interview with Ansible, k3s, Infrastructure as Code and Raspberry Pi

Why so serious? – An article by Sarah Schwab and Aliena Leonhard, written for the lecture Systems Engineering and Management.

The idea of creating a fictional interview arose from the wish to present complex topics in an entertaining and understandable way.

Raspberry Fruit

Today we are visiting the tech show “Himbeer Tarte und harte Fakten”. Among other things, today’s topics are “Raspberry Pi”, “Infrastructure as Code” and “Ansible”. We have invited four experts to a panel discussion. A warm welcome to Ms. Ann Sibel, Mr. Kah Dreis, Mr. Archie Tex-Ture and Ms. Infra Struc-Ture.

Continue reading

Application Infrastructure of a Modern Web Application

An article by Nicolas Wyderka, Niklas Schildhauer, Lucas Crämer and Jannik Smidt

Project description

This blog post describes the development of the application and infrastructure of the student project sharetopia. As part of the lecture System Engineering and Management, particular attention was paid to developing the application according to current best practices while operating cost-efficiently.

Continue reading

Deploying Random Chat Application on AWS EC2 with Kubernetes

1. Introduction

For the examination of the lecture “Software Development for Cloud Computing”, I want to build a simple random chat application. The idea of this application is based on the famous chat application Omegle, where people can meet random strangers from around the world and have a one-on-one chat, either as a normal text chat or as a video chat. Unlike Omegle, my application only has a normal text chat function.

2. Technologies for the development of application

a. Frontend

React

For frontend development there is a great number of open-source libraries. React is currently one of the most popular and widely used. There are many reasons for a developer to choose React: it is one of the most popular frontend technologies on the market, and compared to other libraries it is relatively easy to learn. As it doesn’t take much time to pick up, developers can quickly start practicing and build their very first project. React helps increase productivity through reusable components, and there are many development tools available that speed up a project. The most important reason is its very strong community support: there are thousands of free React tutorial videos and blog posts on the internet, which is very helpful for developers. Therefore, I decided to learn this library during previous semesters, and this project gives me a chance to get real practice.

b. Backend

Node.js

Node.js has become one of the most popular JavaScript tools. Node.js is a JavaScript runtime environment that allows companies to improve the efficiency of their web development process, since the frontend and backend teams can work together more easily. Because Node.js runs JavaScript on Google’s V8 engine, everything is executed very quickly. Node.js uses an event loop, which handles all asynchronous input/output operations, and it can increase the speed of other frameworks as well. By allowing developers to write JavaScript code for both the frontend and the backend, Node.js makes it easy to send data between the server and the client, which in turn makes it easier to keep data synchronized.

Socket.io

When I decided to develop a chat application, I immediately thought of Socket.io. I got to know this JavaScript library during the course Web Development 2. To build a real-time application, Socket.io is a natural choice: it helps parties in different locations connect with each other, transmitting data instantly through an intermediary server. Socket.io can be used in many applications such as chat, online games or live match results. It is used a lot by the developer community because of its speed and convenience, and it provides many methods as well as useful features such as security, auto-reconnection, disconnection detection, multiplexing and room creation.

3. Application explanation

a. Client

As I mentioned, for the client side I use React. This gave me a chance to get to know the concept of a Single Page Application, whose content is loaded only once and then updated dynamically. Interaction with the page or with subsequent views does not require another round trip to the server, which means the page is not reloaded. To apply this concept, React offers a package named “react-router-dom”. My application is very simple, so it only has two paths. The root path is where the user enters their name, and it loads the Join component. The other path loads the Chat component, in which the user sends messages after being assigned a room.
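A minimal sketch of what this routing could look like, assuming react-router-dom v5 and the component names Join and Chat mentioned above:

```javascript
// App.js – hypothetical sketch of the two routes (react-router-dom v5 syntax)
import React from "react";
import { BrowserRouter, Route } from "react-router-dom";
import Join from "./components/Join";
import Chat from "./components/Chat";

const App = () => (
  <BrowserRouter>
    {/* root path: the user enters a name */}
    <Route path="/" exact component={Join} />
    {/* chat path: messages are sent once a room has been assigned */}
    <Route path="/chat" component={Chat} />
  </BrowserRouter>
);

export default App;
```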

The Socket.io client library is imported because it is not part of JavaScript itself; it exposes the ‘io’ namespace. The endpoint URL is then passed to ‘io’ to connect to the socket.io server. Once the room has been created, users can send and receive messages from the server.
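Put together, the client-side socket setup could look roughly like this (a sketch assuming socket.io-client 2.x; the endpoint and event names are placeholders):

```javascript
// client-side socket setup (hypothetical sketch)
import io from "socket.io-client";        // exposes the 'io' namespace

const ENDPOINT = "http://localhost:5000"; // endpoint URL of the socket.io server
const socket = io(ENDPOINT);

// send a chat message to the server
socket.emit("sendMessage", { text: "Hello!" });

// receive messages broadcast to the room
socket.on("message", (message) => {
  console.log(message.user, message.text);
});
```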

b. Server

To set up the server, some packages need to be imported. The server is then created and listens on port 5000.
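A minimal sketch of such a setup, assuming Express, the built-in http module and socket.io 2.x:

```javascript
// server/index.js – hypothetical sketch
const express = require("express");
const http = require("http");
const socketio = require("socket.io");

const app = express();
const server = http.createServer(app);
const io = socketio(server);

io.on("connection", (socket) => {
  console.log("new client connected:", socket.id);
});

// the server listens on port 5000
server.listen(5000, () => console.log("server running on port 5000"));
```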

The server also keeps track of users temporarily, so it knows which user names already exist, and it can remove users after they end their chat. To handle these actions, I wrote a users.js file that provides functions such as addUser, removeUser and getUser.
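The helper module could be sketched roughly as follows; the function names follow the description above, everything else is an assumption:

```javascript
// users.js – hypothetical sketch of the in-memory user store
const users = [];

const addUser = ({ id, name }) => {
  const existingUser = users.find((user) => user.name === name);
  if (existingUser) return { error: "Username is taken" };
  const user = { id, name };
  users.push(user);
  return { user };
};

const removeUser = (id) => {
  const index = users.findIndex((user) => user.id === id);
  if (index !== -1) return users.splice(index, 1)[0];
};

const getUser = (id) => users.find((user) => user.id === id);

module.exports = { addUser, removeUser, getUser };
```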

I want to create a chat application where users don’t need to know each other beforehand: the server automatically creates a chat room for them, so they can meet someone new.

I created a variable called queue, which is an array. It temporarily stores a user who does not have a partner yet. Every user who has entered a name is connected to the socket, and the socket knows that the user wants to join a room. In the socket’s callback function, the user’s name and socket ID are saved by the addUser function from users.js. The socket then checks whether another user is already waiting for a partner in the queue. If someone is waiting, they are popped from the queue, their socket and the partner’s socket are connected, and their room ID is a combination of the two socket IDs. If no one is waiting for a room, the current socket is pushed onto the queue and waits for another user to join.
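A simplified sketch of this matchmaking logic; the queue, the events and the room-ID scheme follow the description above, the rest is assumed:

```javascript
// matchmaking on the server (hypothetical sketch)
let queue = []; // sockets waiting for a partner

io.on("connection", (socket) => {
  socket.on("join", ({ name }) => {
    addUser({ id: socket.id, name }); // helper from users.js

    if (queue.length > 0) {
      // someone is already waiting: pair the two sockets in a shared room
      const partner = queue.pop();
      const room = `${partner.id}${socket.id}`; // room ID = combination of both socket IDs
      socket.join(room);
      partner.join(room);
      io.to(room).emit("roomCreated", { room });
    } else {
      // nobody is waiting yet: put this socket into the queue
      queue.push(socket);
    }
  });

  socket.on("disconnect", () => {
    queue = queue.filter((s) => s.id !== socket.id);
    removeUser(socket.id);
  });
});
```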

c. Problem

CORS: 

CORS stands for Cross-Origin Resource Sharing. By default, browsers enforce the same-origin policy, meaning that scripts may only request data from the origin the page was loaded from; this is a security measure, because JavaScript could otherwise load content from other servers without the user’s knowledge. CORS solves this problem when both sides are aware of the data exchange: the server explicitly declares which other origins are allowed to access it.

I installed the cors package on my server. The origin option configures the Access-Control-Allow-Origin CORS header. Now client and server can communicate without errors.
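A sketch of how the cors package can be wired into the Express server (the allowed origin is a placeholder):

```javascript
// enabling CORS on the Express server (hypothetical sketch)
const cors = require("cors");

app.use(
  cors({
    origin: "http://localhost:3000", // configures the Access-Control-Allow-Origin header
  })
);
```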

4. Testing

Testing is very important during the development of an application: it helps developers discover existing errors and bugs before releasing the application, which improves its quality. I decided to test only the server side because it is more complicated than the client side. Two tests were created:

  • A single-user test: checks whether a user can connect to the server and receives a welcome message from the server (a minimal sketch follows below).
  • A two-user test: checks whether a room is created when two users connect and whether both of them receive the same welcome message from the same room.
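A minimal sketch of the single-user test, assuming jest and socket.io-client as dev dependencies; the join/welcome events are simplified assumptions rather than the actual test code:

```javascript
// server.test.js – hypothetical sketch of the single-user test
const http = require("http");
const socketio = require("socket.io");
const ioClient = require("socket.io-client");

let httpServer, io, client;

beforeAll((done) => {
  httpServer = http.createServer();
  io = socketio(httpServer);
  io.on("connection", (socket) => {
    socket.on("join", ({ name }) => socket.emit("welcome", `Welcome, ${name}!`));
  });
  httpServer.listen(() => {
    client = ioClient(`http://localhost:${httpServer.address().port}`);
    client.on("connect", done);
  });
});

afterAll(() => {
  io.close();
  client.close();
});

test("a single user can connect and receives a welcome message", (done) => {
  client.on("welcome", (msg) => {
    expect(msg).toBe("Welcome, Alice!");
    done();
  });
  client.emit("join", { name: "Alice" });
});
```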

5. Deployment

a. Docker Swarm and Kubernetes

When deploying this project, I create a container for each of the client and server sides. Docker is the most popular container platform, and I wanted to learn how to write a Dockerfile and a docker-compose file to create the containers.

For the cloud environment, I chose Amazon Web Services (AWS), currently one of the most comprehensive platforms for cloud computing services. I use an EC2 virtual server to put my project online, and I wanted to work with Kubernetes to manage the containers, so I chose EKS, the managed Kubernetes service from AWS.

If you work with a lot of containers, you have to be able to manage them efficiently. An orchestration tool enables exactly that. With the orchestration tool, you can integrate containers that you created with Docker. Then you use orchestration to manage, scale, and move the containers.

Although Kubernetes and Docker can work well together, there is competition when it comes to Docker Swarm. I have considered some features of Docker Swarm and Kubernetes.

  • Scaling: if the load on our application is too high, Kubernetes can add more nodes to our cluster. Of course, we have to configure Kubernetes correctly so that it can create a new virtual machine; a node is then added to the cluster.
  • Installation: with Docker Swarm, it is easy to create a new node and then integrate it with Swarm. To configure Kubernetes, on the other hand, you have to determine the size of the nodes and how many master and worker nodes there should be.
  • Load balancing: Docker Swarm offers automatic load balancing for applications, whereas Kubernetes gives you the flexibility to configure load balancing manually.
  • Storage volume sharing: since Docker Swarm manages plain Docker containers, containers can share data (and other resources) easily. Kubernetes puts containers into pods, so a container cannot simply communicate with an arbitrary other container; you need additional Kubernetes components, e.g. a Service, to create the connection.
  • Monitoring: while Swarm requires additional tools for monitoring and logging, these tasks are already provided for in Kubernetes.

b. Amazon EKS and Kops

When deploying Kubernetes on AWS, you can configure and manage the deployment yourself for full flexibility and control. There are a few options for self-management: Amazon Elastic Kubernetes Service and Kops.

EKS is a managed service offered by AWS. EKS uses automatically provided instances and offers a managed control plane for deployment.

Kops is an open-source tool that can be used to automate the deployment and management of clusters on AWS. It is officially supported by AWS.

c. Dockerfile

To work with Kubernetes, I need to create all the necessary containers. Containers are created by writing Dockerfiles, which contain all the information about a container, e.g. the name of the base image, the directory where our application is stored, the port, and so on. Docker follows these instructions and creates the containers step by step. In addition, I use Docker Compose to start the process of creating the containers.

d. Kubernetes architecture on AWS cloud

I chose Kubernetes, Amazon EC2, EKS and ECR for the deployment of my project. Shown below is the architecture of Kubernetes on the AWS cloud.

Source: https://blogs.tensult.com/2019/08/14/guide-to-setup-kubernetes-in-aws-eks-using-terraform-and-deploy-sample-applications/

The Kubernetes server is the control plane; it creates the Kubernetes cluster. In the cluster, there are master nodes that create and manage worker nodes. When you run deployment commands, the Kubernetes server sends messages to EKS, and EKS then sends the tasks to the worker nodes.

A worker node contains pods in which the Docker containers run. I chose the Deployment controller to keep these pods running and to observe them. For the worker nodes, I create one pod for the client, two pods for the server, and one pod for Redis. A load balancer is used to reach the application from outside.

I decided to have two server pods because I want to scale my application: if more people connect to it, requests are handled faster with two pods than with one. The picture below shows horizontal scaling, which means there are several copies of the application and these copies can work in parallel.

For example, for the client pod I write a client-deployment.yaml file:

  • A deployment named client is created, indicated by the .metadata.name field.
  • The .spec.selector field defines how the deployment finds the pods to be managed.
  • The deployment creates one replica pod, indicated by the .spec.replicas field.
  • The .spec.template.spec field indicates that the pod runs a container. The container is created from a Docker image that has been pushed to ECR (Elastic Container Registry).
  • The container is named client, as specified in the .spec.template.spec.containers.name field.

To enable network access to the set of pods I have to create a Service, which is written in client-service.yaml.

This specification creates a new service object called “client” that targets TCP port 3000 on each pod, which is labeled as app = random-chat.

For the server and Redis pods, I also create a Deployment and a Service each.

e. Problem

Service of the server pod:

Pods can usually send requests to each other using a normal Service type; in my case, that means the client pod should be able to reach the server pod without an explicit ‘type’ attribute in server-service.yaml. The client’s endpoint would then be ‘server:5000’, the combination of the service name and the targetPort. But after many attempts, this still did not work, so I decided to give the server pod’s Service the type LoadBalancer, as shown in the picture above. The client’s endpoint is now the address of this load balancer.

6. Conclusion

During the course ‘Software Development for Cloud Computing’ and this project, I had the chance to learn about Docker containers and how to manage them with Kubernetes. I gained not only theoretical knowledge but also practical experience by developing and deploying the application. Moreover, working with cloud computing was new and interesting for me. Cloud computing is widely used in application development nowadays; what I applied in my project is just a small part of it, and I want to learn more about it in the future.

KISS, DRY ‘n SOLID — Yet another Kubernetes System built with Ansible and observed with Metrics Server on arm64

This blog post shows how a plain Kubernetes cluster is automatically created and configured on three arm64 devices using an orchestration tool called Ansible. The main focus is on Ansible; other components that set up and configure the cluster are Docker, Kubernetes, Helm, NGINX, Metrics Server and Kubernetes Dashboard. Individual steps are covered in more or less detail; the whole procedure follows three principles:

Continue reading

How to Scale Jitsi Meet

Person sitting on their bed, having a video conference on their laptop

In today’s world, video conferencing is getting more and more important – be it for learning, business events or social interaction in general. Most people use one of the big players like Zoom or Microsoft Teams, which both have their share of privacy issues. However, there is an alternative approach: self-hosting open-source software like Jitsi Meet. In this article, we are going to explore the different scaling options for deploying anything from a single Jitsi server to a sharded Kubernetes cluster.

Continue reading

Migrating from Heroku to Hetzner: Achieving Scalability with Docker, Kubernetes and Rancher

Written by Eva Ngo, Niklas Brocker, Benedikt Reuter and Mario Koch.

In the System Engineering and Management lecture, we had the opportunity to apply presented topics like distributed systems, CI/CD or load testing to a real project or with the help of a real application. In the following article we will share our learnings and experiences around the implementation and usage of Docker, Kubernetes, Rancher, CI/CD, monitoring and load testing.

Continue reading

GeoDarts

by Jannik Igney [ji016] & Timothy Geiger [tg079]

1. Introduction

For our course “Software development for cloud computing” we developed a little multiplayer browser game named “GeoDarts”. The goal of the game: guess where cities in Germany are located on a map and be closer than your opponents. The goal of our project: we wanted to learn what it takes to bring an application like this to the cloud and to test different solutions for this task. We also wanted to get some first hands-on experience with different technologies, including socket.io, Vue.js and Mapbox. There were many highs and lows and many lessons learned in the process. In the following, we will describe some of the problems we encountered in different areas, along with the solutions we found for them, and the new problems arising from these solutions…

To get a rough impression of what our game looks like:

The blue markers are the ones set by the players, the red one is the solution. The player that is the furthest away gets one point, the next one two points and so on. If two players have the same number of points in the end, their total distance will decide.

2. Technologies

For the application itself we decided to work with Node.js, Socket.io and Vue.js.

2.1 Why NodeJS

We decided to use NodeJS for our backend because we had already worked with it in the lecture Web Development 2. Therefore, we were able to start working on the game right away without having to deal with a new programming language and framework. Popular alternatives would have been Django or Flask; however, both frameworks require advanced knowledge of Python that neither of us has. Another alternative that is becoming more and more popular is Deno. Deno was created by the same developer who originally developed NodeJS, and he wanted to improve some things in Deno that he didn’t like about NodeJS. You can write code for Deno in JavaScript or TypeScript. Even though we have no experience with TypeScript, we have heard almost only good things about the language. Unfortunately, Deno has two major downsides: it has a relatively small community, and you cannot use npm. NodeJS, on the other hand, has a huge community; you will hardly find any question about NodeJS that has not been asked and answered before, which saves you a lot of time and headaches when debugging.

2.2 Why socket.io

In the beginning, we didn’t know how to implement a real-time game. After some research we came across a technology called websockets. Websockets allow communication in both directions: not only can the client send data to the server, the server can also send data to the client. Websockets work like this: you open a TCP connection and leave it open until you don’t need it anymore. In NodeJS you can implement websockets in different ways; we chose one of the most prominent solutions, Socket.IO. It is very easy to get started with: with only a few lines of code, you already have a functional application running. It is also widely used and you can find a lot of learning material. And those were not even all of its features. Unfortunately, we only learned about some of them when we were already at a very advanced stage of development, and they were not all advantageous for us, as we will see later.

2.3 Why Vue.js

Similar to Node.js we had some first basic experience with Vue.js but never did a major project with it. That’s why we wanted to use this opportunity to deepen our knowledge of Vue.js. Many concepts of Vue.js are quicker to learn than with competitors like React. Also there’s great support from a large community – perfect conditions for some experimenting.

3. Architecture and program flow

In the beginning our application just consisted of a single Node.js Express server, communicating with clients via socket.io. At some point we decided that we would like to be able to horizontally scale the app and have multiple instances available. The reason for that decision was of course not that we feared our server might collapse under the traffic generated by hundreds of thousands of excited users of our game; we simply wanted to gain some first experience with scaling and find out what solutions there are and what problems they come with.
The graphic below shows the architecture we ended up with. Redis and Node.js are deployed in separate Kubernetes Services. In order to make use of multiple Node.js instances we first had to overcome quite a few obstacles and still did not achieve a perfect solution, as described further below. However, we learned a lot in this fight, and in the end that was our most important goal.

Once the client has loaded the page all further communication with the server happens via the websocket. The following chart illustrates the flow of events from creating a new game to transmitting the final results:

4. How to get a map?

One of the core requirements for a geography game like ours is, of course, embedding an interactive map. Since most developers will probably come across the big topic of maps sooner or later, we were glad to use this as an opportunity to take a first look at some of the related technologies. The first thing we learned was that maps integrated in websites are usually not one big piece of data but a raster of single tiles, put together like pieces of a puzzle. That way only the required areas of a map need to be loaded, e.g. when zooming in. Traditionally these tiles were served as images (“raster tiles”), but considering that you need new tiles for every zoom level and that every image is a fairly large chunk of data, you might end up with a very slow map that is not much fun to use. Also, with raster tiles you can add custom features to the map later, but you cannot style single layers of the map, because there is just that one layer. That’s why most providers of map services support another technique by now: vector tiles. With this approach, the tiles are not images made of pixels, but vector data, an exact geometric description of every element in the map, which can then be rendered in the browser. The main advantages are shorter loading time (smaller data size), smoother zooming (no need for multiple tilesets for different zoom levels) and easy customization of the specific layers. Though this technique places higher requirements on the client’s browser and hardware, it is considered the superior approach for many use cases nowadays. That’s why we decided to go with vector tiles, though if we had to start over, we would probably try to do it differently.

When it comes to the question of what technologies to use, there are many alternatives for both client-side libraries (e.g. Leaflet, Mapbox GL) and map tile providers (Mapbox, Google, OpenStreetMap). In our case it was important to be able to style the map, so that we could remove all labels and features except country borders and give single countries their own colour. Mapbox is a big platform for all kinds of map services, including an extensive map style editor called Mapbox Studio. So we decided to try this, since Mapbox also offers a large community and good tutorials. Mapbox hosts vector tiles for free as long as you stay under 50,000 requests per month. No need to worry here.

We were happy with Mapbox since it is fairly easy to use and fulfilled all our demands. However, with the knowledge we have now about how map rendering works, we would probably try to solve the problem completely differently: the most important advantage of vector tiles is good performance even with heavy zooming and jumping around on a multi-layer map. The map we needed for our game, however, neither has many layers, nor is there any zooming or changing of position. That’s why we probably could have rendered the map from a simple geoJSON file, e.g. with the library d3.js. A geoJSON file describes all features of a map in JSON format and can be used as a starting point for vector tiles. Due to our low requirements on the map, we probably wouldn’t have needed vector tiles and could have done without the dependency on Mapbox.

5. Frontend development – SPA = Single Page Application or Socket Problems Ahead

Writing the frontend of our application with Vue.js was a new experience for us in different respects. Even though it is really easy to create a first outline of a page with Vue.js, it still takes quite some time to really understand concepts like the lifecycle hooks and communication between components, as well as the usage of the development server or UI libraries.

All these things go beyond the scope of this post. A more general aspect of frontend frameworks like Vue.js is that what you get is an SPA, a Single Page Application, which opens up new possibilities and is better in terms of performance, but it also changes some very basic assumptions that you might be used to if you come, like us, from developing old-school static web pages. The most important one is probably that your components are reused on the client side without being reloaded from the server. That sounds like no big deal, but at many points it can cause quite confusing errors if you forget about it. The fact that your entire frontend is now stateful has some big advantages, like being able to store data and objects and to keep a permanent socket connection open; you just shouldn’t forget that, unlike in a static frontend, jumping between different URIs won’t reset that state, because the page is not reloaded from the server. For instance, if you use a timer in your component and you don’t kill it when leaving the component, it might not start anew but just keep running when you revisit that page. At one point it took us a long time to figure out why certain actions happened twice: every chat message we sent appeared twice in the chat. Eventually we found out that our websocket listener functions were registered again every time a component was mounted, because we never removed them. We handled this by writing a wrapper function around socket.addListener(), which first removes all listeners for the specified event and then adds the new one. Another approach would be to use the vue-router’s beforeRouteLeave hook to remove listeners when leaving the current route.
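The wrapper looked roughly like this (a simplified sketch; the shared socket instance and the event names are assumptions):

```javascript
// socketHelper.js – hypothetical sketch of the listener wrapper
import io from "socket.io-client";

export const socket = io("http://localhost:5000");

// register a listener, but remove any handler a previous component mount left behind
export function setListener(event, handler) {
  socket.off(event);        // removes all listeners for this event
  socket.on(event, handler);
}

// usage inside a Vue component:
// setListener("receiveMsg", (msg) => this.messages.push(msg));
```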

Another point where we needed to adapt our stateful application to the stateless web platform was the game flow. We needed a purely frontend mechanism that makes sure a user can only access the subpages that represent the phase of the game they are currently in. For example, when the results are displayed, we don’t want players to be able to navigate back into the game view, and we don’t want them to see the game view of a game they have not joined. For that purpose we created a room token that is stored in a vuex store when the player successfully joins a game and deleted when the game is over. Using this token, the components can verify that a user is actually allowed to enter the view they navigated to.

6. Backend Introduction

This chapter roughly explains how our backend works. We will take a closer look at the backend later, when it comes to scaling, because many decisions for certain technologies and techniques only came up through scaling; it would therefore not make much sense to look at them right now.

Before we started the development, we thought about how to make it possible for players to play the game at the same time in different rooms. With the techniques we knew up to that point, we couldn’t find a solution, so we did some research. After a while we came across Socket.IO. Besides enabling real-time applications, socket.io also offers the possibility to create exactly the kind of rooms we need for our game. So how does it work? If a player wants to create a new game, we simply create a new room in socket.io with a unique ID in order to distinguish the rooms. Now a second player wants to join the game; to join the room from before, all he needs is that unique ID. That’s it: now we can create rooms and join them. But how can we address only the players in a specific room? With socket.io this is done via events. We can emit events from the server to the client and the other way around. An example: player A wants to send a message to all other players in the same room, so he emits an event called ‘sendMsg’ with the message as a parameter. The server receives this event and determines the room the socket is currently in. Afterwards, the server itself emits an event named ‘receiveMsg’ with the same message as a parameter, with the determined room as the destination. Every client connected to the server listens to this event, but since the event is only sent to the sockets in the same room, only those sockets receive it. The received message can then be displayed in the chat. Once you understand the logic behind it, it is actually quite simple; our entire client-server communication works with these events. Socket.io also saves us some work: it automatically deletes rooms that are no longer needed, which is the case when no player is left in a room.
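On the server, this room and event logic could be sketched like this, assuming socket.io 2.x (where socket.rooms is an object) and the event names from the example above:

```javascript
// server-side room handling (hypothetical sketch)
io.on("connection", (socket) => {
  // creating or joining a game room identified by a unique ID
  socket.on("joinRoom", (roomId) => {
    socket.join(roomId);
  });

  // a player sends a chat message to everyone in the same room
  socket.on("sendMsg", (message) => {
    // in socket.io 2.x, socket.rooms also contains the socket's own ID as a room
    const roomId = Object.keys(socket.rooms).find((r) => r !== socket.id);
    io.to(roomId).emit("receiveMsg", message);
  });
});
```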

However, one problem remains: how do we store data? For example, what about the points you win during the game? If we were to store the scores on the client side, you could manipulate your score. And what about the cities that are randomly selected by the game at the beginning and then queried during the course of the game? If we saved the cities on the client side, one could simply read out the coordinates. That’s not what we want, so this is not an option for our game. This means we have to store the data on the server side. But how? In the end we decided to use Redis. This was not always the case, but more details about this will come later when we talk about scaling.

What is Redis? Redis is an in-memory data structure store. We can easily store data structures like strings, lists and sets. It stores data under an associated key, so if we want to access a data set, we simply do so through the matching key. However, there is a problem with Redis: unlike socket.io, the data does not get deleted when all sockets in a room have disconnected. If we did nothing, our database would fill up with unnecessary data. When we first noticed the problem, there were about 1,000 entries in the database without a single game running. (The number of entries in Redis is easy to determine: you go to the Redis console and enter ‘keys *’, which returns all keys stored in Redis.) So how did we solve the problem? Our solution is to make the keys dependent on the room or the player. Each socket in socket.io has a unique socket ID and each room has a unique room ID, as mentioned before, and we use these IDs as part of the keys under which we store the data. A key that stores the player name of a player with the socket ID 12345 would then be ‘12345:playername’. Now you just have to fetch all keys starting with ‘12345’ in the disconnect event and delete them. In Javascript it looks like this:
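A sketch of this cleanup, assuming the node_redis callback API and the socket-ID key prefix described above:

```javascript
// on disconnect, remove every key that belongs to this socket (hypothetical sketch)
socket.on("disconnect", () => {
  redisClient.keys(`${socket.id}:*`, (err, keys) => {
    if (err || keys.length === 0) return;
    redisClient.del(keys, () => {
      // all player data for this socket is gone
    });
  });
});
```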

This way we delete all player data, so no unnecessary data remains in Redis. Furthermore, we check in the disconnect event whether the player was the last player in the room. If so, we also delete all room data with the same method, but now using the room ID instead of the socket ID. This way we can avoid memory leaks.

What also cost us a lot of time was programming with Redis in general. The code became very complex very quickly: you had to read it again and again to understand what individual parts did, which of course is something nobody wants. The readability of the code deteriorated so quickly because the Redis client works with callbacks. If you want to get a single entry, it looks like this:
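For a single lookup the callback style is still harmless; a sketch using the node_redis client with an assumed key name:

```javascript
// reading one value with a callback (hypothetical sketch)
client.get(`${socket.id}:playername`, (err, playername) => {
  if (err) throw err;
  console.log(playername);
});
```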

This does not look very complicated yet. But if you have a lot of queries that depend on each other, as we do, the whole thing looks a lot more complicated:
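With several dependent queries it quickly degenerates into something like the following sketch (keys and the emitted event are hypothetical):

```javascript
// nested callbacks: each query depends on the previous result (hypothetical sketch)
client.get(`${socket.id}:roomId`, (err, roomId) => {
  client.get(`${roomId}:cities`, (err, cities) => {
    client.get(`${socket.id}:playername`, (err, playername) => {
      client.get(`${roomId}:round`, (err, round) => {
        // only at this depth do we finally have everything we need
        io.to(roomId).emit("roundData", {
          playername,
          round,
          city: JSON.parse(cities)[round],
        });
      });
    });
  });
});
```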

In order to understand what’s going on here, you really have to pay attention. After a while we found out that such a thing is called ‘callback hell’: it is caused by writing complex nested callbacks, where each callback takes the previous result as an argument. Fortunately, you can easily solve this in Javascript with promises, or with async and await. To save ourselves a little work, we used the npm package “async-redis”. You just have to be careful that you can only use await inside async functions. Now we can easily rewrite the above example to make the code more readable:
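Rewritten with async-redis and await, the same (hypothetical) lookups flatten out:

```javascript
// same lookups with async/await via the async-redis package (hypothetical sketch)
const asyncRedis = require("async-redis");
const client = asyncRedis.createClient();

async function sendRoundData(socket) {
  const roomId = await client.get(`${socket.id}:roomId`);
  const cities = await client.get(`${roomId}:cities`);
  const playername = await client.get(`${socket.id}:playername`);
  const round = await client.get(`${roomId}:round`);

  io.to(roomId).emit("roundData", {
    playername,
    round,
    city: JSON.parse(cities)[round],
  });
}
```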

7. Infrastructure vs. Platform as a service

When our application had reached a certain level, we started thinking about ways to deploy it to the cloud. Using AWS EC2 instances seemed to be a quite straightforward approach, since we could operate on a common Linux VM and wouldn’t have to give too much control to some black box. So we created a Linux Ubuntu instance, added some rules to its security group (AWS’s implementation of a virtual firewall) and installed the software we needed, like our Node.js runtime. By that time we already knew that we would need a separate Redis server to allow us to run multiple instances of the app. Therefore we created another EC2 instance running Redis and connected our app instance to it via the internal IP address. This worked fine, but we saw some problems. First of all, hardcoding the IP of our Redis server for the connection did not seem like a great idea with respect to flexibility and exchangeability. We also realized that having to connect to our instances via ssh in order to monitor and operate the application is quite annoying. We came to the conclusion that we should try some solutions that lean more towards orchestration and platform as a service. That’s why we took a look at Cloud Foundry and Kubernetes in the IBM Cloud environment.

Cloud Foundry seemed to be a good choice at first, since deploying the app itself was done very quickly after choosing our version of Node.js as a runtime and pushing our source code to the platform with the help of a simple manifest.yml file. However, we couldn’t figure out how to set up a Redis server as a separate service our app could connect to. The longer we tried to make sense of the CF documentation, the more confusing and frustrating it became, so eventually we decided to focus on Kubernetes, which was a good decision. Though Docker and Kubernetes also turned out to be very complex for beginners and required a lot of research, we found the documentation and tutorials really helpful and gained a clearer understanding of how things actually work over time. So our summary for Kubernetes is: powerful and complex, but well explained and consistent in itself.

Apart from the different technical purposes of AWS and Kubernetes, and apart from all their pros and cons, we noticed huge differences in the way things are explained and presented to first-time users. With AWS we constantly felt that it’s mostly about advertising and selling a product. Kubernetes, on the other hand, as an open-source platform, really seems to want its users to understand the underlying technical concepts. As developers we liked that spirit far better than the commercial one, and that’s also why we chose not to work with Amazon’s PaaS solution Elastic Beanstalk for a start.

8. Scaling with Kubernetes

So what is this chapter about? As already mentioned in the chapter ‘Backend Introduction’, many decisions we made in the backend are based on problems we encountered during scaling. These difficulties occurred because at the beginning we weren’t sure whether we wanted to scale the game at all. We then decided to scale it after all, but the backend version at that time was not designed for scaling, and that’s how all these problems came up. Redis is one example: in the beginning we stored data in a completely different way and only switched to Redis later, when we ran into difficulties with the old method. And this is exactly what this chapter is about: the problems that arose while scaling a version that was not designed for scaling. Let’s start with the Kubernetes deployment, because without a deployment there are no bugs to talk about.

8.1 Docker

In order to deploy our game in Kubernetes, we first have to create an image of our game. This is done via Docker. In order to write our own Dockerfile, we first had to take a closer look at how Docker works. Each time we build our image, Docker steps through the instructions in the Dockerfile and executes them in the specified order. Every instruction in that file creates a new image layer. This mechanism allows image layers to be cached: when Docker steps through the instructions one after the other, it checks whether a layer has changed. If nothing has changed, Docker uses the cached image layer; otherwise the instruction gets executed and all subsequent layers are no longer cached, because something could have changed. Best practice is to order your image layers from the least frequently changed to the most frequently changed.

That’s why our first instruction pulls the Node base image: it changes rarely and is rather big. We use the Alpine variant of the Node image because it is faster, smaller and more secure compared to the other variants, and we don’t need the extras of the other images, such as ‘apt’. Here is a little size comparison for the latest version at the time (14.11):

stretch: 345.2 MB
buster: 322.65 MB
stretch-slim: 57.8 MB
buster-slim: 51.54 MB
slim: 47.33 MB
alpine: 28.33 MB

However, the image is only available locally on our computer, so we still have to publish it somehow. This can be done with the help of a registry. There are many different registries; one of the best known is the one from Docker itself, which we decided to use: Docker Hub. Alternatively, cloud providers like AWS and IBM also offer such registries.

8.2 Kubernetes

Since we now have an image of our game, we can deploy our application to Kubernetes. Kubernetes is a popular container orchestrator. It was originally released by Google, but is now developed by an open-source community.

In the beginning it was very difficult to get into Kubernetes, because there were many new concepts. But once we understood the basics, we were able to apply what we had learned very quickly.

A Pod in Kubernetes is the smallest unit of deployment. It can run one or more containers. Technically, a Pod can be deployed directly into Kubernetes; however, we mostly use controllers to deploy Pods. There are different types of controllers: Deployment, ReplicaSet, StatefulSet, and so on. In our case we chose a Deployment because it can manage several identical pods; here we can specify how many instances we want to have and which image we want to use.

Next we need a Service. A Service is a persistent endpoint in the cluster through which a set of Pods can be reached.

Finally, Ingress. Ingress manages external access to the Services in a cluster. Here you can define additional routes that lead to other backends, but in our case there is only one backend.

In summary, we now have the following structure:

https://kubernetes.io/docs/concepts/services-networking/

8.3 Problem Solving

Now comes the really interesting part, which helped us the most in understanding scaling: figuring out why our app was not doing what we wanted it to do. In this blog post we want to focus on three problems. The first problem is about data storage and the other two are about websockets and socket.io.

This brings us to the first problem: data storage. We have already mentioned Redis as our final solution, but how did we store the data before that? In the simplest way possible: we stored our data as JSON objects in an array. Here is a simplified version of a room JSON object:
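Roughly, such an object looked like this; the field names are illustrative:

```javascript
// a simplified room object as it was kept in the in-memory array (hypothetical sketch)
const room = {
  roomId: "12345",
  players: [
    { socketId: "a1b2c3", name: "Alice", points: 0 },
    { socketId: "d4e5f6", name: "Bob", points: 0 },
  ],
  cities: ["Stuttgart", "Berlin", "Hamburg"],
  round: 0,
};
```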

As you can see, we store the RoomID. Before a player can join a room via Socket.IO, we first step through the array in which all rooms are stored; only if a room with the specified RoomID exists is the socket allowed to join the Socket.IO room. However, when we scaled the game, we noticed some strange logs: apparently, a room couldn’t be found, and as a result the socket wasn’t able to join a game. Yet we were sure that the room had to exist. So we did a little research and after some time realized the mistake. Let us assume we have two instances. Alice visits our website and gets forwarded to instance 1. Now Alice creates a new game, which results in a room object being stored on instance 1. Afterwards Bob wants to join the game; he gets forwarded to instance 2. The server tries to find the room Alice has just created, but since the room was saved on instance 1, it can’t find it. As a result, Bob can’t join the game.

As you can see, we made the mistake of storing our data per instance. Such an application is called a stateful application: every instance has a different state. One possible solution would have been to edit our Ingress controller. In the Ingress controller you can add paths that point to different backends; in our example from above, the paths would have been ‘/instance1’ and ‘/instance2’, each pointing to a different backend. For example, a game is created on instance 1 with roomID=12345. The invitation link would then look like this: “/instance1/invitation/12345”, and every player who joins room 12345 gets automatically forwarded to instance 1. However, this solution has a few drawbacks. Firstly, instance 1 must know its own name, ‘instance1’, which proved to be quite tricky; with a little more time, however, we would have figured that out. There is still another problem: suppose we add five additional instances. How does the Ingress controller know? Somehow the new paths have to be added, and one would have to add them manually. We could possibly have replaced the Ingress controller with Traefik, which, from our understanding, can automatically detect such new paths. But we didn’t pursue that any longer, because Traefik very quickly became very complex. Finally, there is a third drawback: we don’t really want the players to know which instance they are on; we want it to be a black box for them. Therefore, this solution was out of the question.

Our solution to the problem was actually quite simple: we turned our stateful game into a stateless one. We simply connected a database to ensure that every GeoDarts instance has access to the same data, so it does not matter which instance you are forwarded to. As mentioned above, we chose Redis. MongoDB would have been an alternative, and we wouldn't have had to reprogram as much, because we already saved the data as JSON objects and would only have had to change the place where we save it. However, deleting the data after a disconnect would have been much harder with MongoDB. In addition, Redis has a significant advantage when it comes to WebSockets, which we will address in a moment. But first we have to create a Redis Service in Kubernetes. We do this through the following YML file:
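
(The exact file is not shown here; the following is a minimal reconstruction of what such a manifest could look like, with the names, image tag and replica count as assumptions. Redis's default port 6379 is used.)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:6          # assumed image tag
          ports:
            - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis                     # all GeoDarts instances connect to this name
spec:
  selector:
    app: redis
  ports:
    - port: 6379
      targetPort: 6379
```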

Now to the second big problem: WebSockets. Again, let's assume we have 2 instances and 2 players, Alice and Bob. Alice gets forwarded to instance 1 and Bob to instance 2. Both players join the same Socket.IO room with the same roomID. Everything seems fine, at least at first sight. Although we did not receive any error messages, it still wasn't working properly. For example, if you tried to start the game, the game only started for some players. And if one of the remaining players then clicked on "start game", the game started for all the other players for whom it had not yet started.

After several frustrating debugging sessions, we finally figured it out. WebSocket connections are stateful, which makes them not so easy to scale. But let's first look at what went wrong. When Alice joined the room on instance 1, the join event was only emitted on instance 1. The same applies to Bob: when he joined the room, his join event was only emitted on instance 2. So Alice didn't receive the event, because she is on instance 1.

This problem can be addressed by using an adapter. This Socket.IO mechanism allows us to pass messages between processes and to broadcast events to all clients. We use the socket.io-redis adapter, which takes advantage of Redis's pub/sub functionality. When Bob now tries to join a game, the other instances also get informed about this event with the help of Redis, so the event gets emitted on all instances.
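
For illustration, wiring up the adapter on the server roughly looks like the following sketch. It assumes Socket.IO 2.x with the socket.io-redis package and a Redis Service named "redis", as in the manifest above; it is not our exact code:

```js
const httpServer = require("http").createServer();
const io = require("socket.io")(httpServer);
const redisAdapter = require("socket.io-redis");

// All instances connect to the same Redis. Join/broadcast events are passed
// through Redis pub/sub, so every instance emits them to its own clients.
// "redis" is the Kubernetes Service name and resolves via cluster DNS.
io.adapter(redisAdapter({ host: "redis", port: 6379 }));

httpServer.listen(3000);
```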

Now to the last problem. From time to time we got this error message on the client side:

Error during WebSocket handshake: Unexpected response code: 400

To understand why this error occurs, we need to look at how Socket.IO establishes a connection. Since Socket.IO 1.x, the fallback algorithm has changed to an upgrade approach: by default, a long-polling connection is established first and then upgraded to "better" transports such as WebSocket. Long polling works almost everywhere, which is why the connection is established this way first.

Though this feature can be quite useful, in our case it was the root of our problem. As the socket.io documentation says: 

“If you plan to distribute the load of connections among different processes or machines, you have to make sure that requests associated with a particular session id connect to the process that originated them.”

However, since we use a load balancer, this is not always the case. A brief reminder: with WebSocket connections, players remain on the process/instance they were first routed to. With long polling, though, it's different. Every new request may be forwarded to a different instance, so a player has to be lucky enough that every long-polling request lands on the same process until the upgrade happens and the WebSocket connection is used.

There are two solutions. The first would be to use sticky connections/sessions. Sticky sessions are a feature that allows a load balancer to route requests to the same process they were first routed to. Although this solution is proposed by Socket.IO, we decided not to use it: we tried to stick to the twelve-factor app methodology during development, and it does not allow sticky sessions. In the end, we went with the method Socket.IO doesn't recommend: disabling long polling. Socket.IO advises against this because the long-polling fallback is one of its biggest advantages compared to other WebSocket implementations; if you disable it, the documentation suggests you might as well consider using raw WebSockets. And we have to agree with that. In retrospect, it would have made more sense for us not to use Socket.IO.
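
For completeness, disabling long polling on the client roughly looks like this (a sketch using socket.io-client; the URL is a placeholder):

```js
// Allow only the WebSocket transport, so no stateful long-polling requests
// can be spread across different instances by the load balancer.
const io = require("socket.io-client");
const socket = io("https://geodarts.example", { transports: ["websocket"] });
```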

8.4 Target achieved?

Now that we have successfully scaled our game, we have realized something: the load doesn't get distributed as much as we had hoped. Back when we only had one instance, everything was forwarded to this one instance. It was responsible for storing data, emitting events and calculating game logic. After scaling, this hasn't really changed. There is still only one process responsible for almost everything; it just isn't one of the GeoDarts instances anymore, but the Redis Service. Now Redis stores the data and emits the events, and only the calculations remain within the GeoDarts instances. The problem has simply been pushed one layer back, which is somewhat sobering. To solve it we could have used Redis replication or a Redis cluster, which allows replica instances to be exact copies of a master instance. However, we stopped here. We noticed that as soon as we had solved one problem, the next one followed. The difficult thing is always to say when you're done, because there are always things you can improve. But we felt that our project had reached a point where we could say so.

Autoscaling of Docker Containers in Google Kubernetes Engine

The name Kubernetes originally comes from the Greek word for helmsman, the person who steers a ship or boat. Representing a steering wheel, the Kubernetes logo was most likely inspired by this. [1]

The choice of the name can also be interpreted to mean that Kubernetes (the helmsman) steers a ship that contains several containers (e.g. Docker containers). It is therefore responsible for bringing these containers safely to their destination (ensuring that the journey goes smoothly) and for orchestrating them.

Apparently, Kubernetes was called Seven of Nine within Google. The Star Trek fans among us should be familiar with this reference. Since the logo's wheel has seven spokes, there might be a connection between the logo and this name. [2]

This blog post was created during the master's lecture System Engineering and Management. In this lecture we deal with topics that interest us and with which we would like to experiment. We have already worked with Docker containers very often and appreciate their advantages. Of course we have also worked with several containers within a closed system, orchestrated via Docker Compose configurations. Nevertheless, especially in connection with scaling and big companies like Netflix or Amazon, you hear the buzzword Kubernetes and quickly find out that distributing a system across several nodes requires a platform such as Kubernetes.
