Distributed stream processing frameworks – what they are and how they perform

This blog aims to provide an overview about the topic of stream processing and its capabilites in a large scale environment. The post starts with an introduction to stream processing. After that, it explains how stream processing works and shows different areas of application as well as some common stream processing frameworks. Finally, this article will provide a performance comparison of several common frameworks based on benchmarking data.

Continue reading

Isolation and Consistency in Databases

by Samuel Hack and Sebastian Wachter.

Most people assume that the data coming from a database is correct. For most applications this is true, but when the databases are used in systems where the database is at its limit, this is no longer always the case. What happens if during a query of a value exactly this value is changed by another user? Which entry was first when two users create different values at the same time. Especially in systems where one database server is not enough it is very important to choose the right rules for Isolation and Consistency, if this is not the case it can lead to data loss or in worst case to incorrect values in the application. 

Most database systems promise virtually nothing by default. Therefore it is important to learn what isolation and consistency level the database promises and then adjust it to fit the application.

In this paper we will give an overview of what isolation and consistency levels are, what different levels are available in databases, and what errors can occur in databases. Afterwards we will give a short introduction to the CAP theorem and discuss the problems that it includes for distributed systems.

Continue reading

Autoscaling of Docker Containers in Google Kubernetes Engine

The name Kubernetes comes originally from the Greek word for helmsman. It is the person who steers a ship or boat. Representing a steering wheel, the Kubernetes logo was most likely inspired by it. [1

The choice of the name can also be interpreted to mean that Kubernetes (the helmsman) steers a ship that contains several containers (e.g. docker containers). It is therefore responsible for bringing these containers safely to their destination (to ensure that the journey goes smoothly) and for orchestrating them.

Apparently Kubernetes was called Seven of Nine within Google. The Star Trek fans under us should be familiar with this reference. Having 7 spikes, there might be a connection between the logo and this name. [2]

This blog post was created during the master lecture System Engineering and Management. In this lecture we deal with topics that are of interest to us and with which we would like to conduct experiments. We have already worked with docker containers very often and appreciate the advantages. Of course we have also worked with several containers within a closed system orchestrated by Docker-Compose configurations. Nevertheless, especially in connection with scaling and the big companies like Netflix or Amazon, you hear the buzzword Kubernetes and quickly find out that a distribution of a system to several nodes requires a platform such as Kubernetes.

Continue reading