{"id":22123,"date":"2022-02-21T11:57:21","date_gmt":"2022-02-21T10:57:21","guid":{"rendered":"https:\/\/blog.mi.hdm-stuttgart.de\/?p=22123"},"modified":"2023-06-18T17:48:49","modified_gmt":"2023-06-18T15:48:49","slug":"scaling-a-basic-chat","status":"publish","type":"post","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/","title":{"rendered":"Scaling a Basic Chat"},"content":{"rendered":"\n<p><strong>Authors: <\/strong><br>Max Merz \u2014 <a href=\"http:\/\/merzmax.de\" title=\"merzmax.de\">merzmax.de<\/a>, <a href=\"https:\/\/twitter.com\/MrMaxMerz\" target=\"_blank\" rel=\"noreferrer noopener\" title=\"https:\/\/twitter.com\/MrMaxMerz\">@MrMaxMerz<\/a><br>Martin Bock \u2014 <a href=\"https:\/\/martin-bock.com\">martin-bock.com<\/a>, <a href=\"https:\/\/twitter.com\/martbock\" target=\"_blank\" rel=\"noreferrer noopener\">@martbock<\/a><\/p>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>The idea of this project was to create a simple chat application that would grow\nover time. As a result, there would be more and more clients that want to chat\nwith each other, what might lead to problems in the server that have to be\nfixed. Which exact problems will occur, we were going to see along the project.<\/p>\n<p>In the center is a simple chat server that broadcasts incoming messages to all\nclients. In order to notify the clients about new messages, the connection\nshould be static and bidirectional. Therefore, we based the communication on the\nWebSocket protocol.<\/p>\n<p>Furthermore, wanted to see how the server behaves with the rising load.\nTherefore, we had the plan of performing several load tests to display the\nweak points and improvements, as the system enhances.<\/p>\n<\/div>\n\n\n\n<!--more-->\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><h2>Creating a Concept for (Monitoring) A Load Test<\/h2>\n<p>We started this project with creating an overview over all the things we will\nneed. 
First of all, we needed a client that can be used to chat and a server that is\ncapable of broadcasting messages to all connected clients.<\/p>\n<h3>Load Test Client<\/h3>\n<p>In order to perform load tests, we need a lot of clients that can send a lot of\nmessages to the server. Since we don&#8217;t have a bunch of people who are willing\nto chat, we have to automate the process of creating and operating the chat\nclients. Because we wanted to use WebSockets and no public load testing clients\nwere available, we had to implement a load test client ourselves.<\/p>\n<p>First, we implemented a basic chat client whose only job was to connect to the\nserver and to send and receive messages. The first version was just a basic\ncommand line client: the tool asks you for a username, and afterwards you can\nchat with other clients.\nIn order to automate the dispatching of the messages, we added a <code class=\"\" data-line=\"\">loadTest<\/code> flag.\nAdditionally, you can configure the message frequency and size.<\/p>\n<p>To start a bunch of clients at once, we created the load-test-client. 
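Such flags can be declared with Go&#8217;s standard <code class=\"\" data-line=\"\">flag<\/code> package. A minimal sketch, in which the flag names and defaults are illustrative rather than our exact implementation:<\/p>\n<pre><code class=\"language-go\" data-line=\"\">package main\n\nimport (\n  &quot;flag&quot;\n  &quot;fmt&quot;\n  &quot;time&quot;\n)\n\nfunc main() {\n  \/\/ Hypothetical flag set, mirroring the options of our load test client.\n  loadTest := flag.Bool(&quot;loadTest&quot;, false, &quot;send generated messages automatically&quot;)\n  numOfClients := flag.Int(&quot;numOfClients&quot;, 10, &quot;number of chat clients to start&quot;)\n  msgFrequency := flag.Duration(&quot;msgFrequency&quot;, time.Second, &quot;delay between two messages per client&quot;)\n  msgSize := flag.Int(&quot;msgSize&quot;, 64, &quot;size of each generated message in bytes&quot;)\n  flag.Parse()\n\n  fmt.Println(*loadTest, *numOfClients, *msgFrequency, *msgSize)\n}\n<\/code><\/pre>\n<p>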
Here you\ncan see a screenshot of the flags that can be used to configure the load test\nclient.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/ltc_help.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"400\" data-attachment-id=\"22487\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/ltc_help\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/ltc_help.png\" data-orig-size=\"2616,1022\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"ltc_help\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/ltc_help-1024x400.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/ltc_help-1024x400.png\" alt=\"\" class=\"wp-image-22487\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/ltc_help-1024x400.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/ltc_help-300x117.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/ltc_help-768x300.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/ltc_help-1536x600.png 1536w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/ltc_help-2048x800.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>Furthermore, we wanted to collect 
metrics from the clients that could provide\ninformation on the performance of the system in general. The collected data will\nbe written in a CSV file that we can use after each load test to analyze the\ntest by calculating latencies and the ratio of sent vs. received messages, and\nfinally render graphs of the calculation results.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/client-monitoring.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"339\" data-attachment-id=\"22488\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/client-monitoring\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/client-monitoring.png\" data-orig-size=\"1594,528\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"client-monitoring\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/client-monitoring-1024x339.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/client-monitoring-1024x339.png\" alt=\"\" class=\"wp-image-22488\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/client-monitoring-1024x339.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/client-monitoring-300x99.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/client-monitoring-768x254.png 768w, 
https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/client-monitoring-1536x509.png 1536w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/client-monitoring.png 1594w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><h3>Server Monitoring<\/h3>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Server_Monitoring.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"337\" data-attachment-id=\"22489\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/server_monitoring\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Server_Monitoring.png\" data-orig-size=\"1710,562\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Server_Monitoring\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Server_Monitoring-1024x337.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Server_Monitoring-1024x337.png\" alt=\"\" class=\"wp-image-22489\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Server_Monitoring-1024x337.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Server_Monitoring-300x99.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Server_Monitoring-768x252.png 768w, 
https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Server_Monitoring-1536x505.png 1536w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Server_Monitoring.png 1710w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>We wanted to monitor the server using the popular Prometheus\/Grafana stack.\nFirst of all, we used cAdvisor to access the hardware metrics of the Docker\ncontainer that runs the server application. Additionally, we used the Go\nclient for Prometheus to provide custom metrics. The metrics we wanted to\ncollect were the number of incoming and outgoing messages as well as the\nprocessing time (the time from receiving a message at the server to sending it\nto the last client) for each message.<\/p>\n<p>Finally, we configured two Grafana dashboards to display all the collected\nmetrics.<\/p>\n<h3>Restricting the Server<\/h3>\n<p>Since we intended to run into scaling issues quickly and only had limited\nserver capacity available at the university, we wanted to restrict the\navailable CPU time and memory. After some research, we found out that there are\ntwo possibilities to restrict the resources of Docker containers:<\/p>\n<ul>\n<li>Docker Swarm<\/li>\n<li>Kubernetes<\/li>\n<\/ul>\n<p>Unfortunately, it is not possible to restrict resources using Docker-Compose,\nwhich we had used up until now. Since we had already configured the whole server\nand the additional monitoring containers in a docker-compose.yml file, we\ndecided to try the Docker Swarm approach because Swarm can also be configured\nusing docker-compose files. This means that we did not have to create a new\nconfiguration. 
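In a compose file deployed via <code class=\"\" data-line=\"\">docker stack deploy<\/code>, such limits live under the <code class=\"\" data-line=\"\">deploy.resources<\/code> key. A sketch with illustrative values (the service and image names are placeholders):<\/p>\n<pre><code class=\"language-yaml\" data-line=\"\">services:\n  chat-server:\n    image: chat-server:latest\n    deploy:\n      resources:\n        limits:\n          cpus: &quot;1.0&quot;\n          memory: 512M\n<\/code><\/pre>\n<p>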
Additionally, deploying the server and the monitoring components\ninto a Kubernetes cluster would have resulted in even more work and would\nhave increased the complexity.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_limit_compute.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"959\" data-attachment-id=\"22490\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/swarm_limit_compute\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_limit_compute.png\" data-orig-size=\"1888,1768\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"swarm_limit_compute\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_limit_compute-1024x959.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_limit_compute-1024x959.png\" alt=\"\" class=\"wp-image-22490\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_limit_compute-1024x959.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_limit_compute-300x281.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_limit_compute-768x719.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_limit_compute-1536x1438.png 1536w, 
https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_limit_compute.png 1888w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><h2>Learning a New Programming Language<\/h2>\n<p>One of our project goals was to learn the programming language Go. As is the\ncase with every new language you learn, there are certain points where you run\ninto problems. This included getting to know some new concepts that we found\nquite interesting. Of course, we do not want to withhold these from you.<\/p>\n<h3>Goroutines<\/h3>\n<p>Within the implementation of the client, we realized that we had to parallelize\nsending and receiving messages. In other languages, you would start\nthem in two threads &#8212; but Go has something else: Goroutines.<\/p>\n<blockquote>\n<p>A goroutine is a lightweight thread in GoLang. It can continue its work\nalongside the main goroutine and thus creating concurrent execution.<\/p>\n<p>&#8212; https:\/\/golangdocs.com\/goroutines-in-golang<\/p>\n<\/blockquote>\n<p>In addition to sending and receiving messages, we used goroutines to\nparallelize the clients that will be started by the load test client. 
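Starting such a concurrent task only requires the <code class=\"\" data-line=\"\">go<\/code> keyword in front of a function call. A minimal, self-contained sketch (the channel and message names are our own invention, not the actual client code):<\/p>\n<pre><code class=\"language-go\" data-line=\"\">package main\n\nimport &quot;fmt&quot;\n\nfunc main() {\n  messages := make(chan string)\n\n  \/\/ The sender runs concurrently with the receiving loop below,\n  \/\/ similar to how the chat client separates sending and receiving.\n  go func() {\n    for i := 0; i &lt; 3; i++ {\n      messages &lt;- fmt.Sprintf(&quot;msg %d&quot;, i)\n    }\n    close(messages)\n  }()\n\n  for msg := range messages {\n    fmt.Println(&quot;received:&quot;, msg)\n  }\n}\n<\/code><\/pre>\n<p>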
Running the\ndifferent operations as goroutines was not that complicated, but when stopping\nthe applications, we stumbled across several problems.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/kill_ltc_csv_error.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"445\" data-attachment-id=\"22491\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/kill_ltc_csv_error\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/kill_ltc_csv_error.png\" data-orig-size=\"3640,1582\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"kill_ltc_csv_error\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/kill_ltc_csv_error-1024x445.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/kill_ltc_csv_error-1024x445.png\" alt=\"\" class=\"wp-image-22491\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/kill_ltc_csv_error-1024x445.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/kill_ltc_csv_error-300x130.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/kill_ltc_csv_error-768x334.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/kill_ltc_csv_error-1536x668.png 1536w, 
https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/kill_ltc_csv_error-2048x890.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>One of the errors appeared when we stopped the load test client. Most times we\nstopped it, the CSV file would contain incomplete lines. This was when we\nrealized that the goroutines don&#8217;t shut down gracefully: they are stopped\nregardless of whether they are still performing a task. Additionally, the server\nshowed a lot of error logs since the WebSocket connections of the clients\nweren&#8217;t closed gracefully.<\/p>\n<h3>Introducing WaitGroups<\/h3>\n<p>While investigating the errors, we realized that we didn&#8217;t stop the goroutines\ngracefully. We found several blog entries that used so-called\n&quot;WaitGroups&quot;. A WaitGroup is basically a synchronization primitive that\nblocks until a number of tasks have finished. Here is an example that shows how\nwe used WaitGroups to shut down the clients in the load test client in a\ncontrolled manner.<\/p>\n<p>First of all, you have to instantiate a WaitGroup. In the next step, you have to\nadd the number of tasks that you will wait for. In our case, we want to create\nall clients for the chat. Therefore, the number of tasks we have to wait for\nequals the number of clients.<\/p>\n<pre><code class=\"language-go\" data-line=\"\">waitGroup := &amp;sync.WaitGroup{}\nwaitGroup.Add(numOfClients)\n\nvar cancelFuncs []*context.CancelFunc\n<\/code><\/pre>\n<p>The cancel functions will be used to notify the clients that they have to shut\ndown. For each client we create, a new cancel function will be added to the\nlist. 
In order to notify the main goroutine that a client task is completed, the WaitGroup instance has\nto be handed over as well \u2013 so the client goroutine can call &quot;Done()&quot; on the WaitGroup instance.<\/p>\n<pre><code class=\"language-go\" data-line=\"\">for i := 0; i &lt; *numOfClients; i++ {\n  ctx, cancelFunc := context.WithCancel(context.Background())\n  cancelFuncs = append(cancelFuncs, &amp;cancelFunc)\n\n  go func() {\n    chatClient := client.Client{\n      Context:          ctx,\n      WaitGroup:        waitGroup,\n      ServerUrl:        *serverUrl,\n      CloseConnection:  closeConnection,\n      IsLoadTestClient: *loadTest,\n      MsgFrequency:     *msgFrequency,\n      MsgSize:          *msgSize,\n      MsgEvents:        msgEvents,\n      Room:             room,\n    }\n\n    err := chatClient.Start()\n    if err != nil {\n      log.Fatalf(&quot;%v&quot;, err)\n    }\n  }()\n}\n<\/code><\/pre>\n<p>Now, the client has a cancel function and a WaitGroup, but how does this help\nus when shutting down the application? To understand this, let&#8217;s assume that the load test\nclient application was cancelled by pressing &quot;CTRL+C&quot;.<\/p>\n<pre><code class=\"language-go\" data-line=\"\">\/\/ Listen to system interrupts -&gt; program will be stopped\nsysInterrupt := make(chan os.Signal, 1)\nsignal.Notify(sysInterrupt, os.Interrupt)\n\n&lt;-sysInterrupt\nlog.Println(&quot;Shutting down clients...&quot;)\n\nfor _, cancelFunc := range cancelFuncs {\n  (*cancelFunc)()\n}\nwaitGroup.Wait()\n<\/code><\/pre>\n<p>As shown in the implementation above, the first step is to notify the clients\nthat they have to shut down. Therefore, the cancel functions that were stored in\nthe list will be executed. Afterwards, the <code class=\"\" data-line=\"\">waitGroup.Wait()<\/code> call blocks until all\nclients have completed their work. 
On the client side, the main goroutine blocks until the cancel function is\nexecuted:<\/p>\n<pre><code class=\"language-go\" data-line=\"\">\/\/ Waiting for shutdown...\n&lt;-client.Context.Done()\n<\/code><\/pre>\n<p>When this happens, the client can disconnect gracefully and shut down the\ngoroutines it started. When everything is done, the client executes\n<code class=\"\" data-line=\"\">waitGroup.Done()<\/code>, signaling that its task is completed. Afterwards, the goroutine\nterminates.<\/p>\n<p>After all clients have shut down, the <code class=\"\" data-line=\"\">waitGroup.Wait()<\/code> call will complete. In our\ncase, we still had to shut down the CSV writer, which we implemented in the same\nway. First, the writer is informed that it should terminate by executing its\ncancel function. Once all data has been written to the file, the writer\nexecutes the <code class=\"\" data-line=\"\">waitGroup.Done()<\/code> method. The load test client blocks until this\npoint in time and then terminates completely.<\/p>\n<p>This way, we could on the one hand terminate the WebSocket connections gracefully\nso that the error log of the server is not flooded. On the other hand, we could\nmake sure that all incoming messages were logged and written to the CSV file.<\/p>\n<h2>Plotting Some Huge CSV Files<\/h2>\n<p>To analyze our first load test, we wanted to plot the results recorded\nby our load test client. In the first iteration, we took a graph rendering\nlibrary for Go and parsed our huge CSV files with a file size upwards of\n500 MB each. Since the library was designed to render graphs that are\nresponsive and presented on the web, it was nearly impossible to get a browser\nto render the graphs we wanted with the amount of data points we had.<\/p>\n<p>Since we knew that the landscape of plotting and data analysis tools in Python\nis enormously large, we started to parse our CSV files and render the resulting\nplots in Python using Pandas and Matplotlib. 
The rendering turned out to be much\neasier, especially because Matplotlib can render static images that can be\nembedded everywhere, including this blog and our presentation slides.<\/p>\n<p>However, processing the CSV files to calculate the latency of every single\nmessage and aggregating this data took a long time. A really, really long time.\nIn fact, it took so long to process the CSV files from a single load test that\nwe re-implemented the CSV parsing, latency calculation, message counting, and\nresult aggregation in Go, let it all run, and the results were\n<strong>still<\/strong> available before Python had finished parsing a bunch of the CSV files.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-logs.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"883\" data-attachment-id=\"22492\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/go-vs-python-logs\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-logs.png\" data-orig-size=\"1066,919\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"go-vs-python-logs\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-logs-1024x883.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-logs-1024x883.png\" alt=\"\" class=\"wp-image-22492\" 
srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-logs-1024x883.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-logs-300x259.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-logs-768x662.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-logs.png 1066w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>Our Go CSV processor massively made use of parallelism with Goroutines \u2013 which\nactually caused the fans in my notebook with an M1 Pro SoC to audibly spin for\nthe first time since I bought it.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-activity.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"718\" data-attachment-id=\"22493\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/go-vs-python-activity\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-activity.png\" data-orig-size=\"1072,752\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"go-vs-python-activity\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-activity-1024x718.png\" 
src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-activity-1024x718.png\" alt=\"\" class=\"wp-image-22493\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-activity-1024x718.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-activity-300x210.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-activity-768x539.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/go-vs-python-activity.png 1072w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>Our final rendering pipeline consisted of the load test client written in Go\nthat would record a protocol of every load test in a large CSV file. The CSV\nprocessor, also written in Go, would then parse those large CSV files, calculate\nlatencies and count messages. It would then aggregate the results and store them\nin a smaller CSV file. This file would then be parsed by a Python script to\nrender the graphs you see in this blog post using Matplotlib.<\/p>\n<h2>Performing the First Load Test<\/h2>\n<p>For our first actual load test, we wanted to get a feeling for the magnitude we\nwere playing with. We started very small \u2013 with 10 clients that would send and\nreceive messages simultaneously. After that worked fine, we tried 20 clients.\nThen 30. 40. We went up like this until we reached the CPU limit to which we had\nartificially restricted the container.<\/p>\n<p>Starting at 90 clients, the message processing time jumps way up to about 10\nseconds in the 99th percentile. The 95th and 90th percentiles jitter around at\nabout 3 and 2 seconds. With 100 clients, the 95th and 90th percentiles go up to 9\nand 7 seconds. Even the 10th percentile is now at over 1 second. The next two\nimages show the message processing time and the CPU load for our first load\ntest. 
You can see for yourself how the increasing number of clients slowly erodes\nour server&#8217;s ability to handle every message in an acceptable timeframe.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-cpu.png\"><img loading=\"lazy\" decoding=\"async\" width=\"824\" height=\"303\" data-attachment-id=\"22494\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-1-cpu\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-cpu.png\" data-orig-size=\"824,303\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-1-cpu\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-cpu.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-cpu.png\" alt=\"\" class=\"wp-image-22494\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-cpu.png 824w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-cpu-300x110.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-cpu-768x282.png 768w\" sizes=\"auto, (max-width: 824px) 100vw, 824px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-mpt.png\"><img loading=\"lazy\" decoding=\"async\" 
width=\"825\" height=\"304\" data-attachment-id=\"22495\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-1-mpt\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-mpt.png\" data-orig-size=\"825,304\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-1-mpt\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-mpt.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-mpt.png\" alt=\"\" class=\"wp-image-22495\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-mpt.png 825w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-mpt-300x111.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-1-mpt-768x283.png 768w\" sizes=\"auto, (max-width: 825px) 100vw, 825px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>The next images show the evaluated results measured by the load test client. The\nfirst image shows the latency of a message. To measure the latency, the load\ntest client records when it has sent a message and also when it receives it\nback. The latency is plotted by percentiles per load test configuration. Since\neverything above 2 seconds would be unusable in reality, we decided to truncate\nthe graph at this limit. 
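<\/p>\n<p>The percentile calculation itself boils down to sorting all measured latencies and picking ranks. A simplified sketch of a rounded-rank percentile in Go (not the exact code of our CSV processor):<\/p>\n<pre><code class=\"language-go\" data-line=\"\">package main\n\nimport (\n  &quot;fmt&quot;\n  &quot;sort&quot;\n  &quot;time&quot;\n)\n\n\/\/ percentile returns the p-th percentile (0-100) of the sorted\n\/\/ latencies using a simple rounded-rank method.\nfunc percentile(sorted []time.Duration, p float64) time.Duration {\n  if len(sorted) == 0 {\n    return 0\n  }\n  rank := int(p\/100*float64(len(sorted)) + 0.5)\n  if rank &lt; 1 {\n    rank = 1\n  }\n  if rank &gt; len(sorted) {\n    rank = len(sorted)\n  }\n  return sorted[rank-1]\n}\n\nfunc main() {\n  latencies := []time.Duration{\n    120 * time.Millisecond, 80 * time.Millisecond, 2 * time.Second,\n    90 * time.Millisecond, 300 * time.Millisecond,\n  }\n  sort.Slice(latencies, func(i, j int) bool { return latencies[i] &lt; latencies[j] })\n  fmt.Println(&quot;p90:&quot;, percentile(latencies, 90))\n}\n<\/code><\/pre>\n<p>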
As you can see, the latency starts to noticeably\nincrease at 80 clients, where the 90th percentile is over a second. With 90,\n100, and 110 clients, the latencies in almost all percentiles go beyond the\n2-second limit.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"864\" height=\"504\" data-attachment-id=\"22496\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/percentiles-lim-2000_test-1\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-1.png\" data-orig-size=\"864,504\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"percentiles-lim-2000_test-1\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-1.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-1.png\" alt=\"\" class=\"wp-image-22496\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-1.png 864w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-1-300x175.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-1-768x448.png 768w\" sizes=\"auto, (max-width: 864px) 100vw, 864px\" \/><\/a><\/figure>\n\n\n\n<div 
class=\"wp-block-jetpack-markdown\"><p>The next graph shows the mean latency of every configuration with the standard\ndeviation. For 100 clients, the average latency is over 40 seconds!<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" data-attachment-id=\"22497\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/latency_test-1\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-1.png\" data-orig-size=\"1152,648\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"latency_test-1\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-1-1024x576.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-1-1024x576.png\" alt=\"\" class=\"wp-image-22497\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-1-1024x576.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-1-300x169.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-1-768x432.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-1.png 1152w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div 
class=\"wp-block-jetpack-markdown\"><p>In the last graph, we can compare the number of sent messages to the number of\nreceived ones. This is useful to determine if the server delivers every message\nor if it skips some. As you can see, the number of received messages gets lower\nthan the number of sent messages starting with 90 clients. With 100 clients, we\nonly receive about half of the messages back within the load test. Note that the\nserver would potentially deliver these &quot;lost&quot; messages after the load vanishes\nand it can catch up. However, for our test setup these late messages are not\nrelevant.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"864\" height=\"504\" data-attachment-id=\"22498\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/sent-vs-received_test-1\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-1.png\" data-orig-size=\"864,504\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"sent-vs-received_test-1\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-1.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-1.png\" alt=\"\" class=\"wp-image-22498\" 
srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-1.png 864w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-1-300x175.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-1-768x448.png 768w\" sizes=\"auto, (max-width: 864px) 100vw, 864px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><h3>Learnings From the First Load Test<\/h3>\n<p>The first load test gave us a baseline to work with \u2013 we found our first\nbottleneck. We realized that the jitter in the live monitoring makes it quite\ndifficult to recognize new plateaus. We also saw that the message processing rate\nincreases sharply when a bottleneck is within reach. From the last graph above, we saw\nthat the server does not deliver every message back to the client under heavy load\n\u2013 at least while still under load. One other thing that we took away was that the\npreparations for an actual load test took <em>way longer<\/em> than we had anticipated.<\/p>\n<h2>Improving the Server<\/h2>\n<p>The next step was to implement the changes we decided on to improve the server.\nSince we decided to change the business concept from one global chat to several\nchat rooms, we first had to create a concept. There were basically two main\nideas:<\/p>\n<ul>\n<li>Implement a handshake between the server and the client that includes adding\nthe client to a specific chat room.<\/li>\n<li>Include the chat ID in the URL, inspired by Jitsi.<\/li>\n<\/ul>\n<p>The first one would result in a huge refactoring and additional implementation\neffort. Furthermore, we might be able to balance the load later on \u2013 based on\nthe room ID that would be included in the URL with the second approach. 
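To make the URL-based idea concrete, here is a small Python sketch. The path scheme (a path like /rooms/lobby) and the function names are our own illustrative assumptions, not the project's actual code:

```python
from urllib.parse import urlparse

def room_from_url(url):
    # Hypothetical scheme inspired by Jitsi: ws://host/rooms/<roomId>.
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) != 2 or segments[0] != "rooms":
        raise ValueError("expected a path like /rooms/<roomId>")
    return segments[1]

def join_room(rooms, url, client):
    # Register the client in its room; broadcasts then only fan out
    # to the members of that one room instead of to all clients.
    room_id = room_from_url(url)
    rooms.setdefault(room_id, set()).add(client)
    return room_id
```

Because the room ID is visible in the connection URL itself, a proxy in front of the server could later route on it without any protocol-level handshake.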
Due\nto these reasons, we decided to use the URL approach.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/chat_rooms.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"889\" data-attachment-id=\"22499\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/chat_rooms\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/chat_rooms.png\" data-orig-size=\"1416,1230\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"chat_rooms\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/chat_rooms-1024x889.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/chat_rooms-1024x889.png\" alt=\"\" class=\"wp-image-22499\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/chat_rooms-1024x889.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/chat_rooms-300x261.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/chat_rooms-768x667.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/chat_rooms.png 1416w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><h2>(Docker) Networking at Its Finest<\/h2>\n<h3>VPN<\/h3>\n<p>After implementing the necessary changes, we deployed the updated server and\ntried to run our second 
load test. The first tests with two and four rooms of\n25 clients each worked totally fine. When we tried to run more than 150 clients,\nwe ran into a problem, though&#8230; Each time we started the 151st client, the load\ntest client application crashed. WHAT THE HACK?!<\/p>\n<p>In order to access and manage the server running at the university, we had to be\nlogged into a VPN. The SSH port of our VM was not accessible from the outside\nworld. While investigating the error, we also tested starting the clients on a\nlocal machine that was not connected to the VPN. The clients started flawlessly\nand more than 150 connections were no problem. Since we used port 8080 on the\nserver, traffic needed to go through the VPN to reach it, though.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/150_clients_VPN.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"575\" data-attachment-id=\"22500\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/150_clients_vpn\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/150_clients_VPN.png\" data-orig-size=\"3360,1886\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"150_clients_VPN\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/150_clients_VPN-1024x575.png\" 
src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/150_clients_VPN-1024x575.png\" alt=\"\" class=\"wp-image-22500\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/150_clients_VPN-1024x575.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/150_clients_VPN-300x168.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/150_clients_VPN-768x431.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/150_clients_VPN-1536x862.png 1536w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/150_clients_VPN-2048x1150.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>The thing is that we needed the VPN to access the Grafana dashboard. The\nfirewall didn&#8217;t allow outside traffic to other ports than 80 and 443. Thus, we\nhad to change the server configuration to run on port 80 so that a load test\nclient would be able to connect to it without traffic running through the VPN.\nSince we also needed to keep an eye on our Grafana dashboard, we forwarded port\n3000 through an SSH connection.<\/p>\n<pre><code class=\"language-bash\" data-line=\"\">ssh -L 3000:localhost:3000 root@scale-chat.mi.hdm-stuttgart.de\n<\/code><\/pre>\n<h4>Docker DNS<\/h4>\n<p>Being fully motivated after finding the issue, we wanted to finally run the\nsecond load test. Since we weren&#8217;t sure if the server&#8217;s container was the latest\nversion, we started a rebuild of the container. 
This was the moment we were knocked\nto the ground for the second time that day.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/docker_dns.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"573\" data-attachment-id=\"22501\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/docker_dns\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/docker_dns.png\" data-orig-size=\"3292,1842\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"docker_dns\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/docker_dns-1024x573.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/docker_dns-1024x573.png\" alt=\"\" class=\"wp-image-22501\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/docker_dns-1024x573.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/docker_dns-300x168.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/docker_dns-768x430.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/docker_dns-1536x859.png 1536w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/docker_dns-2048x1146.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>For some reason the dependencies 
couldn&#8217;t be downloaded anymore&#8230; Again: WHAT\nTHE HACK?!<\/p>\n<p>We tried everything that we could think of to solve the problem. We changed the\nhost server&#8217;s DNS resolver, modified the Docker host configuration, and played\naround with firewall rules. Of course, we rebooted the machine a few times as\nwell \u2013 but nothing would help. After investigating the problem and researching for\nquite some time, we found a hint that this error might occur due to a hung-up\n<code class=\"\" data-line=\"\">docker0<\/code> network bridge. After resetting all Docker networking on the machine,\nthe problem was indeed gone and we could finally start to run our second load\ntest.<\/p>\n<h2>Performing the Second Load Test<\/h2>\n<p>For the second load test, we wanted to try a few different combinations of\nclients per room and room count. We began with 25 rooms and 2 clients per room.\nWe then increased the number of clients per room until we hit our CPU limit\nagain. When we hit the limit, we had 16 clients in 25 rooms \u2013 making for 400\ntotal clients served.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mpt.png\"><img loading=\"lazy\" decoding=\"async\" width=\"815\" height=\"344\" data-attachment-id=\"22502\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-2-1-mpt\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mpt.png\" data-orig-size=\"815,344\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-2-1-mpt\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mpt.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mpt.png\" alt=\"\" class=\"wp-image-22502\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mpt.png 815w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mpt-300x127.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mpt-768x324.png 768w\" sizes=\"auto, (max-width: 815px) 100vw, 815px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mr.png\"><img loading=\"lazy\" decoding=\"async\" width=\"814\" height=\"341\" data-attachment-id=\"22503\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-2-1-mr\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mr.png\" data-orig-size=\"814,341\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-2-1-mr\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mr.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mr.png\" alt=\"\" class=\"wp-image-22503\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mr.png 814w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mr-300x126.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-mr-768x322.png 768w\" sizes=\"auto, (max-width: 814px) 100vw, 814px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-cpu.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1015\" height=\"208\" data-attachment-id=\"22504\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-2-1-cpu\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-cpu.png\" data-orig-size=\"1015,208\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-2-1-cpu\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-cpu.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-cpu.png\" alt=\"\" class=\"wp-image-22504\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-cpu.png 1015w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-cpu-300x61.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-1-cpu-768x157.png 768w\" sizes=\"auto, (max-width: 1015px) 100vw, 1015px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>Since we hit the limit with 16 clients per room, next we lowered the room size\ndown to 8 clients per room and tried 50 rooms. Then we tried 12 clients per room with 50\nrooms. We went back down to 8 clients per room and instead increased the number\nof rooms to 75. 
Now, we had hit a limit again.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-mpt.png\"><img loading=\"lazy\" decoding=\"async\" width=\"799\" height=\"332\" data-attachment-id=\"22505\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-2-2-mpt\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-mpt.png\" data-orig-size=\"799,332\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-2-2-mpt\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-mpt.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-mpt.png\" alt=\"\" class=\"wp-image-22505\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-mpt.png 799w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-mpt-300x125.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-mpt-768x319.png 768w\" sizes=\"auto, (max-width: 799px) 100vw, 799px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-cpu.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1015\" height=\"213\" data-attachment-id=\"22506\" 
data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-2-2-cpu\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-cpu.png\" data-orig-size=\"1015,213\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-2-2-cpu\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-cpu.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-cpu.png\" alt=\"\" class=\"wp-image-22506\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-cpu.png 1015w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-cpu-300x63.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-2-cpu-768x161.png 768w\" sizes=\"auto, (max-width: 1015px) 100vw, 1015px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>Finally, we wanted to see how far we can scale the room count and went back down\nto 4 clients per room. We began with 100 rooms. Then 150, where we saw a massive\nincrease in the message processing time in the 99th percentile (up to 10 seconds\nagain). When looking at the CPU usage though, we did not hit the limit yet.\nHowever, we instead now hit a memory limit because our chat server keeps a chat\nhistory that filled up the available memory. 
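A standard way to keep such a history from growing without bound is a fixed-size buffer per room. The following Python sketch is illustrative only, not what our server actually did at the time:

```python
from collections import deque

class RoomHistory:
    # Keep only the newest `limit` messages per room instead of an
    # ever-growing array, so memory stays bounded under load.
    def __init__(self, limit=1000):
        self.messages = deque(maxlen=limit)

    def append(self, message):
        # Once the cap is reached, the oldest entry is evicted.
        self.messages.append(message)
```

With a cap like this, total memory for history is bounded by rooms × limit × message size instead of growing with every message ever sent.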
We continued to test with 175 and\n200 rooms, until we hit the CPU limit as well and decided that it was enough.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-mpt.png\"><img loading=\"lazy\" decoding=\"async\" width=\"801\" height=\"333\" data-attachment-id=\"22507\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-2-3-mpt\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-mpt.png\" data-orig-size=\"801,333\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-2-3-mpt\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-mpt.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-mpt.png\" alt=\"\" class=\"wp-image-22507\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-mpt.png 801w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-mpt-300x125.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-mpt-768x319.png 768w\" sizes=\"auto, (max-width: 801px) 100vw, 801px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-cpu.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1016\" 
height=\"211\" data-attachment-id=\"22508\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-2-3-cpu\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-cpu.png\" data-orig-size=\"1016,211\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-2-3-cpu\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-cpu.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-cpu.png\" alt=\"\" class=\"wp-image-22508\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-cpu.png 1016w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-cpu-300x62.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-cpu-768x159.png 768w\" sizes=\"auto, (max-width: 1016px) 100vw, 1016px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-memory.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1013\" height=\"426\" data-attachment-id=\"22509\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-2-3-memory\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-memory.png\" data-orig-size=\"1013,426\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-2-3-memory\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-memory.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-memory.png\" alt=\"\" class=\"wp-image-22509\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-memory.png 1013w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-memory-300x126.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-2-3-memory-768x323.png 768w\" sizes=\"auto, (max-width: 1013px) 100vw, 1013px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>Now, let&#8217;s have a look at the visualized load test results. The latency\npercentile graph confirms what we saw earlier in the Grafana dashboard. 
The\nconfigurations where we hit the bottlenecks show latencies above 2 seconds\nfor everything upward of the 60th percentile.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"864\" height=\"504\" data-attachment-id=\"22510\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/percentiles-lim-2000_test-2\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-2.png\" data-orig-size=\"864,504\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"percentiles-lim-2000_test-2\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-2.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-2.png\" alt=\"\" class=\"wp-image-22510\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-2.png 864w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-2-300x175.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-2-768x448.png 768w\" sizes=\"auto, (max-width: 864px) 100vw, 864px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>The average latency graph looks very strange for this 
test. The mean latency never\nrises much above 4 seconds, but the standard deviation is enormous (at times \u00b1 26 seconds).<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" data-attachment-id=\"22511\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/latency_test-2\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-2.png\" data-orig-size=\"1152,648\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"latency_test-2\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-2-1024x576.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-2-1024x576.png\" alt=\"\" class=\"wp-image-22511\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-2-1024x576.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-2-300x169.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-2-768x432.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-2.png 1152w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>For the second load test, we see that no messages 
are getting lost when\ncomparing the counts of sent vs. received messages \u2013 except for the\nconfiguration with 75 rooms with 8 clients each.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"864\" height=\"504\" data-attachment-id=\"22512\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/sent-vs-received_test-2\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-2.png\" data-orig-size=\"864,504\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"sent-vs-received_test-2\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-2.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-2.png\" alt=\"\" class=\"wp-image-22512\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-2.png 864w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-2-300x175.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-2-768x448.png 768w\" sizes=\"auto, (max-width: 864px) 100vw, 864px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><h3>Learnings From the Second Load Test<\/h3>\n<p>In the second load test, we saw 
that splitting clients into multiple chat rooms\nreduces server load dramatically \u2013 without even scaling the server itself. With\nmore, smaller chat rooms, more total clients can be served. We also saw that the\nmessage processing time in the 90th, 95th and 99th percentile was already &gt;2 seconds\nlong before the CPU limit was reached. We identified memory as another\nbottleneck with 150 rooms and 4 clients per room. Of course, we provoked the\nserver to run into a memory bottleneck by storing the message history in an array.\nThe idea here was to see when this implementation error becomes noticeable.\nLater in the analysis, we noticed that our Prometheus data had a gap in the\nmessage rate for 25 rooms and 16 clients per room. At the time of the gap, the\nCPU usage of the server container reached the CPU limit. We think the data gap\nmight be caused by the huge load which prevented the server from responding to\nPrometheus&#8217; scrape requests on its internal monitoring port.<\/p>\n<h2>Scaling the Server<\/h2>\n<p>The next task was to scale the server horizontally. Scaling the actual container\nto two replicas was the easiest task, since Docker Swarm can create replicas on\nits own. The next step was to split the load between the two server instances.\nIn addition to creating the replicas, Docker Swarm deploys a load balancer that\ndistributes all requests to the containers using round robin.<\/p>\n<p>Since we had the idea to load balance the clients based on the chat room that\nis included in the URL, we thought about changing the load balancer. One of the\nproblems we had was that a lot of proxies didn&#8217;t support the possibility of\npinning a specific URI to one server instance. Therefore, we couldn&#8217;t find\nexamples of previous proxy configurations that helped us. 
The only example we found was the\nHAProxy configuration of a horizontally scaled Jitsi deployment \u2013 but when we\ntried to adapt their configuration to our setup, we failed as well.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_scale.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"668\" data-attachment-id=\"22513\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/swarm_scale\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_scale.png\" data-orig-size=\"2418,1578\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"swarm_scale\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_scale-1024x668.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_scale-1024x668.png\" alt=\"\" class=\"wp-image-22513\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_scale-1024x668.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_scale-300x196.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_scale-768x501.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_scale-1536x1002.png 1536w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/swarm_scale-2048x1337.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 
1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>In the end, we decided to use the load balancer that is included in Docker Swarm,\neven if it would loadbalance the traffic round robin. Since the requests now\nwill be loadbalanced round robin, clients in the same chat room might connect to\ndifferent servers. With the current implementation, this would result in two\nclients that would&#8217;t receive messages from each other. In order to make this\npossible, the two server instances had to exchange all messages they receive with\neach other.<\/p>\n<p>To solve the problem, we added a component to the server that we named\n<code class=\"\" data-line=\"\">Distributor<\/code>. Its task is to send all messages received from clients to the\nother server instances. Additionally, the <code class=\"\" data-line=\"\">Distributor<\/code> is capable of receiving\nmessages that were sent by the other server&#8217;s <code class=\"\" data-line=\"\">Distributors<\/code>. To exchange\nmessages between the server instances, we use the <em>publish &amp; subscribe<\/em> functionality\nprovided by Redis. Both of us have been working with Redis as a key-value store\nbefore. Therefore, we wanted to try out this additional feature.<\/p>\n<h2>The Moment We Broke Up With Docker Swarm<\/h2>\n<p>At this stage, we thought that we solved all problems and that we are now capable\nof scaling the server for additional load. Therefore, we started another\nload test. First, everything went fine \u2013 but after some minutes we realized that\nthe values we saw in the monitoring dashboard did&#8217;t make sense. The number of incoming and\noutgoing messages didn&#8217;t meet our expectations. The values matched the\nexpectations for one server instance but not two&#8230;<\/p>\n<p>After taking a closer look at how Prometheus scrapes the servers and with an eye\non the Docker Swarm deployment, we realized that Prometheus ist just requesting\none server. 
More specifically, it requested the metrics alternately from the two\nserver instances \u2013 because the requests were load balanced round robin through\nthe Docker Swarm load balancer as well&#8230;<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/prom_round_robin.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"706\" data-attachment-id=\"22514\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/prom_round_robin\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/prom_round_robin.png\" data-orig-size=\"2288,1578\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"prom_round_robin\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/prom_round_robin-1024x706.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/prom_round_robin-1024x706.png\" alt=\"\" class=\"wp-image-22514\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/prom_round_robin-1024x706.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/prom_round_robin-300x207.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/prom_round_robin-768x530.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/prom_round_robin-1536x1059.png 1536w, 
https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/prom_round_robin-2048x1412.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>While researching a solution, we found out that Prometheus is capable of\ndiscovering targets in a Docker Swarm. This would have had the advantage that we could\nscale the server and the monitoring would adapt automatically without the need\nfor further configuration. The only problem was that we didn&#8217;t manage to get the\nauto discovery to work&#8230;<\/p>\n<p>What now? After some discussions and the fact that the project had to come to an\nend, we decided to go the dirty path. This included setting up a second server\ninstance manually and adding the second instance&#8217;s DNS name to the Prometheus\nscraping list. Since we couldn&#8217;t use the Docker Swarm load balancer anymore, we\nhad to deploy one by ourselves. Here we decided to use Traefik since we had\ngained some experience with it in the past \u2013 and we liked the possibility of\nconfiguring it with Docker labels.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/lb_traefik.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"706\" data-attachment-id=\"22515\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/lb_traefik\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/lb_traefik.png\" data-orig-size=\"2288,1578\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"lb_traefik\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/lb_traefik-1024x706.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/lb_traefik-1024x706.png\" alt=\"\" class=\"wp-image-22515\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/lb_traefik-1024x706.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/lb_traefik-300x207.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/lb_traefik-768x530.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/lb_traefik-1536x1059.png 1536w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/lb_traefik-2048x1412.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>We know that this solution is not the nicest, but since we really wanted to run\na load test with both server instances, this was the fastest option for us.<\/p>\n<h2>The Third Load Test<\/h2>\n<p>Now that we finally had a working deployment with two server instances, we were\nexcited to run the third load test. Like in the last round of the last test, we\nused 4 clients per room and started with 150 rooms. We scaled the number of\nrooms up to 175 without a problem. As you can see in the Grafana graph below,\nthe message processing time increased and got more jitter with 200 rooms. 
With\n250 rooms, it increases in the 10th and 50th percentile, but stays on the same\nlevel for the 99th percentile. What&#8217;s weird is that the graph gets extremely\nflat \u2013 the jitter is completely gone.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mpt.png\"><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"353\" data-attachment-id=\"22518\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-3-mpt\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mpt.png\" data-orig-size=\"936,353\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-3-mpt\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mpt.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mpt.png\" alt=\"\" class=\"wp-image-22518\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mpt.png 936w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mpt-300x113.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mpt-768x290.png 768w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>When we had a look at the message rate, we saw a similar effect. 
The rates for\n200 rooms bounce around so that it almost looks like a chalk painting done by a\nsmall child. With 250 rooms, the jitter is almost gone.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr.png\"><img loading=\"lazy\" decoding=\"async\" width=\"932\" height=\"352\" data-attachment-id=\"22519\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-3-mr\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr.png\" data-orig-size=\"932,352\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-3-mr\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr.png\" alt=\"\" class=\"wp-image-22519\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr.png 932w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr-300x113.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr-768x290.png 768w\" sizes=\"auto, (max-width: 932px) 100vw, 932px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>When we filtered for messages that were coming from our Redis message\nsubscription, we saw almost nothing. 
The incoming message rate from Redis was at\nabout 1.4 messages per 10 seconds. Since we both were quite surprised by this\nbehavior, we suspected that our Redis container might be working at full capacity.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr-distributor.png\"><img loading=\"lazy\" decoding=\"async\" width=\"933\" height=\"353\" data-attachment-id=\"22520\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-3-mr-distributor\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr-distributor.png\" data-orig-size=\"933,353\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-3-mr-distributor\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr-distributor.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr-distributor.png\" alt=\"\" class=\"wp-image-22520\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr-distributor.png 933w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr-distributor-300x114.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-mr-distributor-768x291.png 768w\" sizes=\"auto, (max-width: 933px) 100vw, 933px\" \/><\/a><\/figure>\n\n\n\n<div 
class=\"wp-block-jetpack-markdown\"><p>But when looking at the CPU usage of the redis container and both server containers, everything was fine. They all were below 15%.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-cpu.png\"><img loading=\"lazy\" decoding=\"async\" width=\"935\" height=\"353\" data-attachment-id=\"22521\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-3-cpu\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-cpu.png\" data-orig-size=\"935,353\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-3-cpu\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-cpu.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-cpu.png\" alt=\"\" class=\"wp-image-22521\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-cpu.png 935w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-cpu-300x113.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-cpu-768x290.png 768w\" sizes=\"auto, (max-width: 935px) 100vw, 935px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>After some digging around we finally opened <code class=\"\" data-line=\"\">htop<\/code> on the host server to see how our\nhost 
machine was doing. The CPU usage was at 100%, the load average at 7.75. Dammit!<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-htop.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"771\" data-attachment-id=\"22522\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-3-htop\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-htop.png\" data-orig-size=\"1720,1295\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-3-htop\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-htop-1024x771.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-htop-1024x771.png\" alt=\"\" class=\"wp-image-22522\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-htop-1024x771.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-htop-300x226.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-htop-768x578.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-htop-1536x1156.png 1536w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-htop.png 1720w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div 
class=\"wp-block-jetpack-markdown\"><p>We realized that we hadn&#8217;t really looked at the hardware specs of the machine\nthat we loaned from the university for these tests. One glance at the CPU info\nwas enough for us to plunge into disbelief. What we had here in front of us was\na virtual machine with only one CPU core with 2.4 GHz.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-sysinfo.png\"><img loading=\"lazy\" decoding=\"async\" width=\"993\" height=\"1024\" data-attachment-id=\"22523\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/load-test-3-sysinfo\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-sysinfo.png\" data-orig-size=\"1154,1190\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"load-test-3-sysinfo\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-sysinfo-993x1024.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-sysinfo-993x1024.png\" alt=\"\" class=\"wp-image-22523\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-sysinfo-993x1024.png 993w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-sysinfo-291x300.png 291w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-sysinfo-768x792.png 768w, 
https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/load-test-3-sysinfo.png 1154w\" sizes=\"auto, (max-width: 993px) 100vw, 993px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>We simply did not think far enough. We had not once looked at the hardware specs\nof our testing machine when planning a load test like this. Ouch!\nThat won&#8217;t ever happen to us again.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/facepalm.gif\"><img loading=\"lazy\" decoding=\"async\" width=\"480\" height=\"266\" data-attachment-id=\"22524\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/facepalm\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/facepalm.gif\" data-orig-size=\"480,266\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"facepalm\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/facepalm.gif\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/facepalm.gif\" alt=\"\" class=\"wp-image-22524\"\/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>For the sake of completeness, let&#8217;s have a quick look at the load test client&#8217;s\nvisualized results. 
As expected, we can see that the latency exceeds our 2\nsecond threshold in the 80th percentile for 200 rooms with 4 clients per room.\nFor 250 rooms, even the 10th percentile exceeds the 2 second mark.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"864\" height=\"504\" data-attachment-id=\"22525\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/percentiles-lim-2000_test-3\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-3.png\" data-orig-size=\"864,504\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"percentiles-lim-2000_test-3\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-3.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-3.png\" alt=\"\" class=\"wp-image-22525\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-3.png 864w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-3-300x175.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/percentiles-lim-2000_test-3-768x448.png 768w\" sizes=\"auto, (max-width: 864px) 100vw, 864px\" \/><\/a><\/figure>\n\n\n\n<div 
class=\"wp-block-jetpack-markdown\"><p>As you can see in the mean latency graph, the average latency for 250 rooms was\nat about 80 seconds with a standard deviation of over 40 seconds.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" data-attachment-id=\"22526\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/latency_test-3\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-3.png\" data-orig-size=\"1152,648\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"latency_test-3\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-3-1024x576.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-3-1024x576.png\" alt=\"\" class=\"wp-image-22526\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-3-1024x576.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-3-300x169.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-3-768x432.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/latency_test-3.png 1152w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<div 
class=\"wp-block-jetpack-markdown\"><p>Finally, it won&#8217;t come to a surprise that some messages got lost when comparing\nmessage counts for 200 and 250 rooms.<\/p>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"864\" height=\"504\" data-attachment-id=\"22527\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/21\/scaling-a-basic-chat\/sent-vs-received_test-3\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-3.png\" data-orig-size=\"864,504\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"sent-vs-received_test-3\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-3.png\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-3.png\" alt=\"\" class=\"wp-image-22527\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-3.png 864w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-3-300x175.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/sent-vs-received_test-3-768x448.png 768w\" sizes=\"auto, (max-width: 864px) 100vw, 864px\" \/><\/a><\/figure>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><h2>Conclusion<\/h2>\n<p>This project really took an unexpected 
turn. In the beginning, we wanted to see\nhow stateful WebSocket connections can be scaled and thought we would just set up a\nsimple load test. And while we&#8217;re at it, we could do it in Go and learn a new\nprogramming language along the way.\nWe now know that load testing software is a lot more work than we originally\nanticipated. And when you&#8217;re simultaneously learning a new programming language\nwith new concepts and paradigms, it can sometimes be a little overwhelming.<\/p>\n<p>To compensate for our newfound knowledge gaps, we shifted the focus of our\nproject to monitoring and load testing in general. We learned <em>a\nlot<\/em> from this project \u2013 just not what we first thought we would when we set out\nto do this. Scaling issues with WebSocket connections, which were our initial\nimpulse for the project, never became a problem since we never reached this\nbottleneck. However, we encountered lots of other challenges during the project.\nOftentimes, problems arise that are outside of your comfort zone when\ndoing something like this. But we overcame every single one. And we took away a\nlot of knowledge along the way.<\/p>\n<p>It would have been interesting to execute load tests with varying message sizes and frequencies, or with different loads for each client, creating &quot;hot keys&quot;.\nUnfortunately, we didn&#8217;t have enough time to try all of that.<\/p>\n<p>If there is one thing to remember from this blog post though: <strong>always<\/strong> inspect\nyour hardware specs thoroughly to see what your machines are capable of\n<strong>before<\/strong> planning a load test of any kind.<\/p>\n<p>At the very end, a quick word on that new language we learned: Go is an awesome\nlanguage and we like it very much. Although you have to give up some convenience\nyou might be accustomed to. And you have to get used to things like explicit\nerror handling everywhere. 
Looking at you, <code class=\"\" data-line=\"\">if err != nil<\/code>. But other than that,\nGo is great and we encourage everyone to try it.<\/p>\n<\/div>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Authors: Max Merz \u2014 merzmax.de, @MrMaxMerzMartin Bock \u2014 martin-bock.com, @martbock<\/p>\n","protected":false},"author":922,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1,22,651,2],"tags":[574,3,342,15,573,575],"ppma_author":[836],"class_list":["post-22123","post","type-post","status-publish","format-standard","hentry","category-allgemein","category-student-projects","category-system-designs","category-system-engineering","tag-chat","tag-docker","tag-docker-compose","tag-docker-swarm","tag-scale","tag-websocket"],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":282,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2016\/03\/10\/docker-running-on-a-raspberry-pi-hypriot\/","url_meta":{"origin":22123,"position":0},"title":"Docker on a Raspberry Pi: Hypriot","author":"Jonathan Peter","date":"10. March 2016","format":false,"excerpt":"Raspberry Pis are small, cheap\u00a0and easy to come by. But what if you want to use Docker on them? Our goal was to run Docker on several Raspberry Pis and combine them to a cluster with Docker Swarm. 
To achieve this, we first\u00a0needed to get Docker running on the Pi.\u2026","rel":"","context":"In &quot;System Designs&quot;","block_context":{"text":"System Designs","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/system-designs\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1924,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/02\/28\/microservices-legolizing-software-development-4\/","url_meta":{"origin":22123,"position":1},"title":"Microservices \u2013 Legolizing Software Development IV","author":"Calieston Varatharajah, Christof Kost, Korbinian Kuhn, Marc Schelling, Steffen Mauser","date":"28. February 2017","format":false,"excerpt":"An automated development environment will save you. We explain how we set up Jenkins, Docker and Git to work seamlessly together.","rel":"","context":"In &quot;System Designs&quot;","block_context":{"text":"System Designs","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/system-designs\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/02\/draw_io_docker_small-1024x439.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/02\/draw_io_docker_small-1024x439.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/02\/draw_io_docker_small-1024x439.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":170,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2016\/01\/07\/more-docker-more-power-part-2-setting-up-nginx-and-docker\/","url_meta":{"origin":22123,"position":2},"title":"More docker = more power? \u2013 Part 2: Setting up Nginx and Docker","author":"Moritz Lottermann","date":"7. January 2016","format":false,"excerpt":"This is Part 2 of a series of posts. 
You can find Part 1 here:\u00a0https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2016\/01\/03\/more-docker-more-power-part-1-setting-up-virtualbox\/ In the first part of this series we have set up two VirtualBox machines. One functions as the load balancer and the other will house our services. As the next step we want to install\u2026","rel":"","context":"In &quot;System Designs&quot;","block_context":{"text":"System Designs","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/system-designs\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2016\/01\/1429543497dockerimg.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2016\/01\/1429543497dockerimg.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2016\/01\/1429543497dockerimg.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2016\/01\/1429543497dockerimg.png?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":27,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2015\/12\/17\/27\/","url_meta":{"origin":22123,"position":3},"title":"Docker- dive into its foundations","author":"Benjamin Binder","date":"17. December 2015","format":false,"excerpt":"Docker has gained a lot of attention over the past several years.\u00a0But not only because of its cool logo or it being\u00a0the top buzzword of managers, but also because of its useful features.\u00a0We talked about Docker quite a bit without really\u00a0understanding why it's so\u00a0great to use. 
So we decided to\u2026","rel":"","context":"In &quot;Databases&quot;","block_context":{"text":"Databases","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/scalable-systems\/databases\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1299,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2016\/08\/16\/exploring-docker-security-part-2-container-flaws\/","url_meta":{"origin":22123,"position":4},"title":"Exploring Docker Security &#8211; Part 2: Container flaws","author":"Patrick Kleindienst","date":"16. August 2016","format":false,"excerpt":"Now that we've understood the basics, this\u00a0second part will\u00a0cover the most relevant container threats, their possible impact as well as\u00a0existent countermeasures. Beyond that, a short overview\u00a0of the most important sources for container threats will be provided. I'm pretty sure you're not counting on most\u00a0of them. Want to know more? Container\u2026","rel":"","context":"In &quot;Secure Systems&quot;","block_context":{"text":"Secure Systems","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/system-designs\/secure-systems\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/article-1301858-0ABD7881000005DC-365_964x543.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/article-1301858-0ABD7881000005DC-365_964x543.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/article-1301858-0ABD7881000005DC-365_964x543.jpg?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/article-1301858-0ABD7881000005DC-365_964x543.jpg?resize=700%2C400&ssl=1 
2x"},"classes":[]},{"id":1373,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2016\/09\/13\/exploring-docker-security-part-3-docker-content-trust\/","url_meta":{"origin":22123,"position":5},"title":"Exploring Docker Security &#8211; Part 3: Docker Content Trust","author":"Patrick Kleindienst","date":"13. September 2016","format":false,"excerpt":"This third and last part of this series intends to give an overview of Docker Content Trust, which in fact combines different frameworks and tools, namely Notary and Docker Registry v2, into a rich and powerful feature set making Docker images more secure.","rel":"","context":"In &quot;Secure Systems&quot;","block_context":{"text":"Secure Systems","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/system-designs\/secure-systems\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/Notary.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/Notary.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/Notary.jpg?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/Notary.jpg?resize=700%2C400&ssl=1 2x"},"classes":[]}],"jetpack_sharing_enabled":true,"authors":[{"term_id":836,"user_id":922,"is_guest":0,"slug":"mm312","display_name":"Max 
Merz","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/77136ad5eeba989d7f4d31557c36083fb3469fa5eaeb53871273c67e8f6f0b1f?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/22123","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/users\/922"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/comments?post=22123"}],"version-history":[{"count":7,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/22123\/revisions"}],"predecessor-version":[{"id":22532,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/22123\/revisions\/22532"}],"wp:attachment":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/media?parent=22123"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/categories?post=22123"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/tags?post=22123"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/ppma_author?post=22123"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}