How to scale an IoT platform

Written by Marvin Blessing, Michael Partes, Frederik Omlor, Nikolai Thees, Jan Tille, Simon Janik, Daniel Heinemann – for System Engineering And Management


The aim of the project was to develop a system that determines and stores the power consumption of any socket in a household in real time and makes it readable for users with the corresponding access data. Users of the system should be able to read out the data via a web application to which they can log in with an account, which also ensures mobile access to the data. Furthermore, the measured data should be stored permanently in a database, so that users also have the option of viewing measured values over a longer period of time.

The measurement of power consumption should be made possible with a hardware element that is attached to the socket and writes the measured values to the database described.

Fig. 1: Basic system structure

The finished system includes a hardware element and forwards the measured power values to a database via a message broker. In addition, the system includes a front end (web page) through which users can query this data via an individual user account.

A more detailed description of the architecture can be found in the following chapter “Architecture”.

According to estimates, there were about 3.6 million homes in Germany that used energy management applications in 2020, and for 2025 about 15.3 million homes using energy management systems are forecast. Therefore, taking competition and market growth into account, it's not unlikely that a system like the one we created could reach 2 million homes in the near future.


For the architecture of the project, we have to take a look at the system requirements which we defined based on the use case:

  • Read power data
  • Send power data from user to our system
  • Store power data
  • Store user data
  • Read power data based on the user
  • Visualize data for users
  • Monitoring and logging
  • (Authentication)

To implement these requirements, it is crucial to understand the needs of each requirement and figure out an efficient way to connect all the necessary components.

In the first step, we made a list of the technologies that we needed to implement those components:

  • Read power data: To measure the power data we need a Raspberry Pi that has access to the user's sockets. A Python script running on the Raspberry Pi collects the power data.
  • Send data from the Raspberry Pi: To send the power data from the Raspberry Pi to our main system, we decided to use a message broker, since we have to be able to receive a lot of data from many different users at the same time.
  • Store power data: To store the power data we use a simple time-series database, because every data point carries a timestamp.
  • Store user data: For user authentication we use a separate database that stores the user data.
  • Read power data based on the user: To access the stored data we decided to use multiple small services that perform tasks on the databases. These include a user authentication service that checks the correctness of the provided user data, as well as a read service and a write service that read stored power data and write new power data to the time-series database.
  • Visualize data for users: To visualize the data for the user we use a simple web app that uses the different services to access and present the power data.
  • Monitoring and logging: To check the status of our system we use self-defined metrics that we observe through dashboards.
  • (Authentication: We want to authenticate users securely and have therefore planned an authentication service based on Keycloak. However, this is not implemented at the current time.)

The connection of these components is shown in the following architecture diagram.

Fig. 2: The general architecture of the Energymonitor application

This diagram shows that each reading or writing access only occurs through a service defined by us. There is no direct access to a database, so that we have full control over the access options that we provide.
It also allows us to run those services as microservices: each service is independent of the others and can be developed, tested and scaled on its own. The last aspect is especially important for the performance of our system, because it gives us the option to scale up only as needed. If many users just want to read their data, only the read service needs to be scaled up; the write service is not affected, and we do not waste unnecessary resources on it.

This approach follows the CQRS (Command and Query Responsibility Segregation) pattern, which separates reading and writing data from a database. Interestingly, we didn’t know about this pattern when we implemented it, and through our own deliberations we realized the benefits of separating the services and only later discovered that this approach was already known as a pattern.
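The separation can be sketched in a few lines. The following Python snippet is illustrative only (our actual services are written in Go, and the class and method names here are made up): commands and queries live in separate services that share nothing but the backing store, so each side can be scaled on its own.

```python
# CQRS in miniature: writes and reads go through separate services.
# A plain dict stands in for the time-series database.

class WriteService:
    """Command side: appends new power measurements."""
    def __init__(self, store):
        self._store = store

    def write_measurement(self, device_id, watts, timestamp):
        self._store.setdefault(device_id, []).append((timestamp, watts))

class ReadService:
    """Query side: returns measurements for one device."""
    def __init__(self, store):
        self._store = store

    def measurements_for(self, device_id):
        return list(self._store.get(device_id, []))

db = {}
WriteService(db).write_measurement("socket-1", 42.0, 1672531200)
print(ReadService(db).measurements_for("socket-1"))  # [(1672531200, 42.0)]
```

Because the two services are separate objects with separate interfaces, nothing stops us from running ten read instances next to a single write instance.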

Scaling & Load Test

Estimations and predicted bottlenecks

To predict bottlenecks early and to estimate the overall requirements we created a scenario to check our resources and decisions against.

First off, the resources of our virtual machine are the following:

Fig. 3: Overview of the systems resources

We also created an overview of the data that would be sent throughout our system.

This helped us evaluate which components to choose and how to test our system in order to find bottlenecks.

Estimated users (households)           2,000,000
Power values per user                  ~30 values (constantly changing)
Data per user                          ~300 bytes / min
Max. messages per minute               1 / user / min = 2,000,000 / min
Max. throughput @ full utilization     ~600 MB / min

The system's users would generate the following data:

User                Value per time increment
1 user              ~157.68 MB / year
2 million users     ~315.36 TB / year
2 million users     ~86.4 billion messages / month
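The figures in the tables above follow directly from the ~300 bytes per user per minute assumption; a quick sanity check (decimal units, 1 MB = 10**6 bytes):

```python
# Reproduces the capacity tables: data volume and message counts at scale.
BYTES_PER_USER_PER_MIN = 300
USERS = 2_000_000

per_user_year_mb = BYTES_PER_USER_PER_MIN * 60 * 24 * 365 / 10**6
total_year_tb = BYTES_PER_USER_PER_MIN * 60 * 24 * 365 * USERS / 10**12
messages_per_month = USERS * 60 * 24 * 30  # one message per user per minute

print(per_user_year_mb)    # 157.68  (MB per user per year)
print(total_year_tb)       # 315.36  (TB per year for 2 million users)
print(messages_per_month)  # 86400000000 (86.4 billion messages per month)
```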

With these values in mind we could reevaluate our systems components and make an informed prediction about possible bottlenecks.

These are shown in the diagram below.

Fig. 4: Predicted bottlenecks in the systems components

To tackle the bottlenecks, we listed possible solutions for scaling and for reducing the disproportionate utilization of single components. These included dedicated load-balancing instances, database sharding and spreading load over time.


Before we conducted a load test we made sure that we were able to see the utilization of the critical components in our system. We therefore implemented the ‘TICK’ stack.

Telegraf → Data collecting agent

InfluxDB → Time series DB

Chronograf → Data visualization

Kapacitor → (Data processing) & alerting

Fig. 5: Monitoring inside InfluxDB's 'Telegraf'

Fig. 6: Monitoring inside InfluxDB's 'Telegraf'

Fig. 7: Monitoring inside InfluxDB's 'Telegraf'

We created different dashboards to monitor the current utilization and throughput, as seen above. To connect our own services we implemented logging while using a shared data format which can be easily read by Telegraf. This allowed us to add most of our components to the monitoring system.

Load Test

Initially, we used JSON as our data format. However, due to the relatively large size of JSON messages, we decided to switch to Protocol Buffers (protobuf), which promised a considerably smaller transmission size.

In our protobuf message, we utilized a container object holding multiple measurements. Each measurement contained the measurement name, device ID, energy measurement, and timestamp. This allowed us to transmit multiple measurements in a single message while keeping the overall message size to a minimum.

Fig. 8: General data format used for the application

In order to determine the potential traffic requirements of our system, we measured the size of our messages. A container object holding eleven measurements was approximately 470 bytes, while a single measurement was around 37 bytes. We also measured the improvement over JSON: a JSON container was around 1075 bytes, so protobuf cut the message size by more than half, which in terms of network throughput can go a long way.

Using this information, we calculated the amount of traffic that would be required if we were to scale up to 2 million messages per minute, each containing a container object. Broken down to seconds, this means scaling up to 33 thousand messages per second, which gave us insight into the potential traffic demands our system might face at scale. The expected traffic was around 14.95 MB/s, or roughly 119.6 Mbps of network throughput. Knowing these network requirements and the poor upload performance of German home networks, we had to conduct load tests either with the combined forces of several home networks, on the virtual machine itself, or at least from within the same local network. We chose the same-machine approach.
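These throughput numbers can be reproduced in a few lines (a sketch; the constants are the measured container size and target message rate from above, with small rounding differences against the figures in the text):

```python
# Network sizing: 2 million container messages per minute, ~470 bytes each.
MESSAGES_PER_MINUTE = 2_000_000
CONTAINER_BYTES = 470

messages_per_second = MESSAGES_PER_MINUTE / 60
mb_per_second = messages_per_second * CONTAINER_BYTES / 2**20  # MiB/s
mbit_per_second = mb_per_second * 8

print(round(messages_per_second))  # 33333 messages/s
print(round(mb_per_second, 2))     # 14.94 MB/s
print(round(mbit_per_second, 1))   # 119.5 Mbps
```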

Our general approach to load testing the system involved deploying it on our virtual machine and conducting performance tests. Initially, we tested the system as a whole but found that its performance was not meeting our expectations. To identify the root cause, we checked our monitoring and validated the performance of RabbitMQ both as a standalone component and as a part of our write service.

Fig. 9: Load testing the whole system and load testing the message broker

To conduct our load tests we used the RabbitMQ Performance Test Tool (PerfTest), simulating the traffic of the several Raspberry Pi instances in our system architecture.

RabbitMQ Standalone

Data                        470 bytes (PerfTest default)   Our data    Our data (manual ack)
Send rate (messages/s)      ~51,000                        ~46,000     ~21,000
Receive rate (messages/s)   ~50,000                        ~45,000     ~20,000

The default PerfTest result for standalone RabbitMQ is higher than with our data format, which is probably due to the internal workings of the performance tool. Comparing standalone RabbitMQ with and without manual acknowledgements, there is a large difference: a performance loss of over 50%. Going for maximum throughput would therefore mean dropping our data consistency.

Full Application (RabbitMQ, WriteService & InfluxDB)

Data                        Our data    Our data    Our data    Our data
Send rate (messages/s)      ~10,000     ~20,000     ~13,000     ~20,000
Receive rate (messages/s)   ~5,000      ~19,000     ~6,000      ~20,000

For our full application we see an even further decrease in messages per second. Using auto-ack and four receiving threads, we only achieved half the performance of the testing tool's receiver. The number of threads used for our Go application also did not seem to make any difference in throughput. For scaling, we would thus rather have to deal with missing data and interpolation techniques.

Achieving only half the performance kept us wondering, which is why we took a look at our monitoring dashboards. At first glance, either the WriteService or InfluxDB had somehow become our bottleneck.

Fig. 10: InfluxDB writes during load test

The InfluxDB dashboard clearly shows a write peak of ~61,000 writes per second around the performance test.

Looking at the dashboard for our RabbitMQ, we can see the performance of around 20,000 messages per second stated before, with a slight difference of around 400 messages per second between the delivery and ack rate. The number of acked messages also did not match the messages sent through RabbitMQ: the acked messages are equivalent to 11 × 20,000 ⇒ 220,000 rows per second, which our InfluxDB clearly did not reach.

Fig. 11: Queue deliver and ack rate during load test.

This also validated the performance of our write service and InfluxDB, and led us to determine that the write service and InfluxDB were slowing RabbitMQ down. We then conducted additional tests to validate the performance of the write service in isolation.

Fig. 12: Load testing the Timeseries DB component

Using a customized Go client, we conducted a write test outside our application, writing 2 million data points first single-threaded and then multi-threaded to check whether any performance gain could be reached. We reached similar speeds, leading us to the conclusion that our Go client was probably not the issue; InfluxDB became our main suspect for the performance loss. Thanks to the custom testing tool we also discovered that the convenience of the Point data object from the InfluxDB Go client library actually cost us time: internally, the library simply converts the Point object to the InfluxDB line protocol format. By removing the transformation from protobuf to Point and writing line protocol ourselves, we increased the performance of our writes.
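To illustrate what "writing line protocol ourselves" means: an InfluxDB line is just a short piece of text, so the write service can format it directly instead of building a Point object first. A minimal sketch in Python (the measurement and field names here are made up; our write service does this in Go):

```python
# InfluxDB line protocol: measurement,tag_set field_set timestamp
def to_line_protocol(measurement, device_id, watts, ts_ns):
    return f"{measurement},device={device_id} watts={watts} {ts_ns}"

line = to_line_protocol("power", "socket-1", 42.5, 1672531200000000000)
print(line)  # power,device=socket-1 watts=42.5 1672531200000000000
```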

Despite our efforts, even when matching the throughput of InfluxDB to our RabbitMQ, we could not have accommodated our use case of 2 million users with 30 devices each. Achieving 46,000 messages per second for 11 metrics, we would expect around 15,000 messages per second for 30 metrics. Scaled to a per-minute figure, this means we could support up to 900,000 households, roughly half of our goal. Either way, more improvements to RabbitMQ and InfluxDB would be required to reach our target.
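The arithmetic behind that estimate, plugging in the conservative ~15,000 messages per second for 30 metrics:

```python
# One message per household per minute, so messages/s * 60 = households served.
msgs_per_second = 15_000  # estimated sustainable rate at 30 metrics per message
households_supported = msgs_per_second * 60
print(households_supported)  # 900000, about half of the 2-million goal
```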


Our focus during system optimization was primarily on RabbitMQ, which resulted in neglecting the optimization of our writes to InfluxDB. As we were more accustomed to relational databases, we were unaware of the performance-relevant configuration options of InfluxDB. Our load tests revealed that the main issue was with our writes to the database, but we cannot be certain that we optimized our single InfluxDB instance to its full potential.

Given our VM constraints, it would have been better to place InfluxDB on a separate VM with an SSD and more RAM for better write performance. Additionally, we should have revisited the write design of our Go client, which we could then have verified with InfluxDB's own performance test tool, inch. Moreover, we could have considered using multiple InfluxDB instances or sharding to further increase our write performance.

RabbitMQ would have required scaling as well: having maxed out the throughput of a single instance, an improvement could only have been achieved with more instances and load balancing.

Unfortunately, most of these options were out of scope for our student project, but they can be explored with more time and computational resources.

Lessons Learned

Right from the beginning we were aware that running our observability solution on the same system we want to monitor is a bad idea. If there were a complete system outage there would be no way for us to see what happened as the monitoring would crash along with the system. 

We did it anyway because we were limited to our single VM on which we needed to deploy our entire application. We expected that we were not going to be able to monitor our system during network outages, deployments, etc. However, what we hadn’t fully considered is that this would also cause problems for us when trying to evaluate the performance of our services during the stress tests.

While performing a stress test, our Chronograf dashboards would first become slow and unresponsive because the system was under so much load. What we were able to see though was that the RabbitMQ and our write service would slowly eat up the entire system RAM, at which point the monitoring would only report back errors and Chronograf became completely unreachable. We did not anticipate that the other components would start stealing resources from our observability stack. Then again, part of that very stack, our InfluxDB, was being stress tested, which explains why the monitoring became slow to update. 

All this supports our plan to run the monitoring on and as a separate system in a market-ready version of the application.

Monitoring was also somewhat handy in finding bugs in our application: At one point our write service reported CPU usage of over 180 %. This was a clear indicator that there was a bug in our implementation. Thankfully though, it was just putting high demand on the CPU at that moment and hadn’t crashed our system yet. We rolled the service back to an older version where the bug hadn’t been observed, which gave us the time to find and fix the bug and quickly deploy the updated version without causing substantial disruption to the application. 

This, we think, is an excellent example of how monitoring paired with alerting can prevent outages, or at least minimize downtime.

Development Process

Despite reaching a solution that works, there are quite a few things where we struggled along the way. Some of the lessons almost feel like they’re part of a book entitled “Wouldn’t it be cool… and other bad design approaches”. Making these mistakes can be embarrassing to admit, but struggles always bring an opportunity to learn with them. 

In retrospect, we lost ourselves in creating too much unnecessary functionality too early in the project. An example would be attempting to create a sharding resolver that looks up user tokens to distribute writes across different InfluxDB instances and buckets. Deep down we knew that it would probably be out of scope, but we still fell victim to our curiosity. The proper way would have been to implement a minimal working prototype first, conduct a performance analysis, identify the bottleneck, and only then spend resources to resolve it.

While we recognize that anticipation of future problem sources is a valuable trait in developing software, not being able to resist these impulses massively harms overall productivity. In our case this meant we were constantly switching between problems, creating more and more zones of construction in the process. This made it harder to orient in the code and difficult to decide what to work on next.

What is so frustrating is that we are aware of this tendency and know the solution is to practice an iterative and incremental approach to development, yet we tend to repeat these rookie mistakes. Unfortunately, we realized too late that hexagonal architecture has a natural workflow to it that helps with the problem of getting lost.

First, you start by getting a clear understanding of the problem domain. We recommend sketching out the components of the application and how they interact. Next, you define the application's ports. Ports should be defined upfront, as they represent the key abstractions of the application and ensure that the application is designed to be technology-agnostic. The definition of port interfaces often happens in unison with the definition of domain objects, because domain objects inform the design of the interface and are often used as input and output parameters.

Once ports are defined, develop the core business logic of the application. After that, it is good practice to test the core business logic to validate that it meets the functional requirements of the application. Testing the business logic before implementing the adapters also helps to ensure that any issues or bugs in the core functionality are detected and resolved early in the development process.

Once the business logic is implemented and tested, proceed to implement the adapters that connect the application to the external environment. Test the adapters to ensure they conform to the ports and correctly transform data between the external environment and the business logic.

After adapters are implemented and tested, you can test the system as a whole, to ensure that all components work together as expected. This testing should include functional and integration testing. Last, refactor and continue to improve the design. In comparison, our unstructured and unfocused approach to implementation cost us valuable time.
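The workflow above can be condensed into a small sketch (Python for brevity, with hypothetical names; our real services are written in Go): the port is defined first, the core logic depends only on the port and can be tested on its own, and the adapter comes last.

```python
from abc import ABC, abstractmethod

class MeasurementStorePort(ABC):
    """Port: a key abstraction, deliberately technology-agnostic."""
    @abstractmethod
    def save(self, device_id: str, watts: float) -> None: ...

class EnergyService:
    """Core business logic: depends only on the port, not on any database."""
    def __init__(self, store: MeasurementStorePort):
        self._store = store

    def record(self, device_id: str, watts: float) -> None:
        if watts < 0:
            raise ValueError("negative wattage is not plausible")
        self._store.save(device_id, watts)

class InMemoryStore(MeasurementStorePort):
    """Adapter: an in-memory stand-in; a real one would wrap the database."""
    def __init__(self):
        self.rows = []

    def save(self, device_id, watts):
        self.rows.append((device_id, watts))

store = InMemoryStore()
EnergyService(store).record("socket-1", 42.0)
print(store.rows)  # [('socket-1', 42.0)]
```

Because EnergyService only sees the port, its logic can be tested against InMemoryStore long before any real database adapter exists.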

Without following the workflow described above, you tend to push testing towards the end of the project. The combination of acquired technical debt (needing to refactor code to make it easier to test) and a self-inflicted lack of time meant we were not able to take advantage of what hexagonal architecture offers in the domain of testing.

func WriteEvent(msg interface{}, target interface{}) error {
    // let the type checks begin …

func WriteWattage(m *domain.WattageMeasurement, token string) error {
    // no such problems

Fig. 13: Different approaches with very generic interfaces (top) and specific types (bottom)

Another problem we encountered was finding the right degree of abstraction. A generic solution is typically designed to be flexible and adaptable so that it can be used in a wide range of use cases without requiring significant modification. These adjectives sound highly desirable to the ears of a developer and, as a result, we wanted to build our code in a way that would allow completely changing out the message format, injecting adapters via configuration instead of touching the source code, and so on. It's fair to say that our initial implementation was way too generic. More and more it felt like we were not building a solution to our specific use case, but almost a framework that could be used for similar problems.

Eventually, we realized that generic doesn't mean good. Adding lots of adaptability naturally adds more abstraction and indirection to the code. The most glaring indicators that something was going wrong were that our interfaces were inexpressive and vague, exhibited weak type safety, and forced us to write unnecessarily complex code to deal with that. When you build a highly adaptable system that can handle a whole bunch of cases that never happen, all of that adaptability just ends up being a big waste of time. In hindsight, it is difficult to analyze what caused this obsession with developing something generic. Once we rewrote everything to be as specific to our problem domain as possible, the code became expressive and readable, and the need for using empty interfaces as function parameters vanished.

Identifying and improving bottlenecks

As software developers, we've all experienced the frustration of trying to identify and resolve bottlenecks in a system. While it's easy to detect the general location of a bottleneck, it can be hard to determine the actual issue. It's also challenging to predict how software will perform with different hardware specifications, or how all the components will interact with each other on a virtual machine. As we had no knowledge of the actual performance of our virtual machine, we started by putting everything on the machine and first tested the system as a whole. While this approach can work, it makes it harder to determine the actual issue once a bottleneck appears. The numbers provided by software vendors can help to get a rough performance estimate, but every use case and setup is different, so they can't replace making your own measurements when going for performance.

One valuable lesson we've learned is to take a more methodical approach to identifying bottlenecks: first check whether there is a load testing tool that can produce load for a single component, then verify the results with a custom implementation. In our case, as soon as we noticed a bottleneck, we started to check each component individually on the target machine before adding additional components. This helped us isolate the source of the issue. Without our monitoring in place we would never have detected the bottleneck, so we've realized that monitoring is crucial in detecting performance issues. By tracking performance metrics, we could identify areas of concern and take action to optimize away things such as unnecessary convenience data transformations and, depending on the use case, unrequired data consistency, both of which can significantly impact system performance.

In summary, while the process of identifying and resolving bottlenecks can be challenging, we’ve learned that taking a structured approach and paying close attention to performance metrics can help us identify and address issues efficiently.

HyperRace – finding the right architecture approach for an online racing game


You all know racing games. From the realistic, simulated ones like Formula 1 or Forza Motorsport to the playful, arcade-heavy ones like Super Mario Kart or Need For Speed.

The range is large, but somehow pretty standard. So we wanted to make it a little crazier. After all, who says that it always has to be cars as vehicles? Why not a turbine? Or yes, directly a whole rocket engine? Yes! And the driver sits directly on the drive train! And players shoot each other down with hilarious and mean attacks and traps. Something like this:

I know it’s not strictly a racing game – more of a flying game, but from a technical point of view we have the same parameters per player such as the position or speed of the players and other objects on the track and the collisions of these objects with each other.

Ok, now that we have our crazy game idea, we also need to somehow ensure that our players can enjoy playing together on different devices.

But how are we going to do this? What multiplayer architecture approaches are there? What are their pros and cons? What data do we need to exchange between players? And how do we handle different latencies to give every player a smooth, seemingly zero-latency gaming experience?

Exactly these questions are answered in this blog entry. Thus, this blog entry can be seen as the first starting point into the world of online multiplayer architecture.


To narrow things down a bit, let’s first establish a few rules for our game:

  • Each race takes place on a closed race track / There is no open world
  • There are 10 different racetracks and 7 different vehicles.
  • A maximum of 10 players play together in a lobby or on a race track
  • The graphic representation takes place exclusively on the end device of the user
  • The gaming experience should take place in real time
  • Players are matched to a lobby
  • There is a global player ranking
  • Each player can accelerate, brake and steer
  • Each player can pick up items on the racetrack and use them to attack other players or place traps on the racetrack
  • Each player must be authenticated in order to play
  • The system should initially be able to handle around 250,000 users per hour and scale accordingly
  • The system should be operated as cheaply as possible; however, we also want to offer our players a smooth, fun and immersive gaming experience

As can be seen in the requirements, in this article we will focus exclusively on the pure racing game experience – for example driving, overtaking and collision between vehicles and objects. For the time being, we neglect the use of universally popular social functions such as managing friend lists or joining friend lobbies.

Architecture approaches 

Now that we’ve defined some basic requirements, let’s first look at the most popular architectural paradigms that connect players around the world. Underlying these approaches is a common goal: to provide players with a synchronous, latency-free and fluid gaming experience in near real-time. 

Hosted Peer-to-Peer (Hosted P2P) 

In times when the cloud is touted as the all-purpose weapon, it is almost unbelievable that an old approach like peer-to-peer is still so widespread. But popular games like Demon's Souls, Dark Souls, Bloodborne, GTA Online, Red Dead Online, Animal Crossing: New Horizons, Super Smash Bros. Ultimate and F1 2021 partly operate their online worlds via peer-to-peer networks 1.

Img.1: Hosted Peer-to-peer architecture (Hosted P2P)

The principle of a hosted P2P network seems simple at first: one player is the host of the game, i.e. that player's computer acts as the "server". The host's own inputs, as well as the inputs of the other players, arrive at the host, where the current game state is calculated and sent back to the players (Img. 1).

As a company, we therefore do not need a server at this point and save those costs, which can add up to an enormous sum at a targeted load of 250,000 players. Scaling for a rush of incoming players can also easily be distributed among several host players. Unlike a dedicated server, thousands of games are not calculated on one machine; instead, a selected player, the host, calculates exactly one game and transmits the results to the other players (in our example a maximum of 10 players per lobby, so 25,000 host players must be found at a planned load of 250,000 players).

Conversely, this also means that the other players in a lobby are dependent on the host in many ways. Host migration, i.e. the question of which player is the best host, is crucial: if the host has a miserable download or upload speed, this creates high latencies, delays and a bad gaming experience for the other players. One solution would be to make the player with the lowest average ping time to all other participating players the host, or to only assign players to a lobby who are geographically close to each other. But what do we do if a player from Tokyo wants to play with a player in Berlin? In addition, the host must provide enough free hardware resources, such as CPU and RAM, to be able to calculate the entire gameplay while also rendering his own game. But what if that's not the case? Likewise, the host receives the results on his PC first, simply because of the short geographic and physical transmission path and the resulting near-zero ping, which can lead to asynchronous gameplay. This must also be prevented so that all players get a synchronous, simultaneous gaming experience.
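The "lowest average ping" heuristic mentioned above can be sketched in a few lines (hypothetical data; real matchmaking would measure round-trip times continuously):

```python
# Pick the host with the lowest average round-trip time to all other players.
def pick_host(ping_matrix):
    """ping_matrix[a][b] is the measured RTT in ms from player a to player b."""
    def avg_ping(player):
        rtts = list(ping_matrix[player].values())
        return sum(rtts) / len(rtts)
    return min(ping_matrix, key=avg_ping)

pings = {
    "berlin": {"tokyo": 250, "paris": 30},
    "tokyo":  {"berlin": 250, "paris": 260},
    "paris":  {"berlin": 30,  "tokyo": 260},
}
print(pick_host(pings))  # berlin
```

Note that this heuristic cannot fix the Tokyo/Berlin case: the best available host may still have a high ping to some players.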

Another challenge in a hosted P2P network is host cheating or host hacking. How can we trust the host and make sure they aren't cheating and manipulating gameplay? After all, all player events are calculated on the host's PC. Solving this requires something like a separate authoritative server 2.

In summary, there are the following reasons for or against a hosted peer-to-peer approach:

For:

  • No / low costs
  • Simple scaling through the computing power of the users
  • Load balancing across multiple players (not 1000 games on one server)

Against:

  • Host performance affects other players' gaming experience
  • Host cheating / hacking
  • Host migration
  • Global distribution

In particular, the problem of the trustworthiness of the host poses a major challenge for the P2P approach. Another popular approach is the “dedicated server” architecture. This approach solves the trust problem through a central authority. Let’s take a closer look.

Dedicated Server

With a “dedicated server” architecture, all game events – the inputs of the players – are sent to a central server. The server calculates the resulting state for each player and sends this state back to the players (Img. 2). The server is neither part of the game nor a separate player as in the hosted P2P approach. A dedicated server is only responsible for calculating the gameplay. Other social functions such as matchmaking are usually outsourced to other servers, since a dedicated server and all its resources are only provided for this one task – calculating the current state. Accordingly, dedicated servers usually do not have a GPU, since a graphical game display is not necessary. Instead, they have a high number of CPUs and RAM, since game lobbies and their state have to be kept and calculated during the game. 
A large number of currently very popular multiplayer games such as Fortnite, Minecraft, Apex Legends, Rocket League, Among Us, Rainbow Six Siege, Ghost of Tsushima: Legends or Roblox use this principle 1.

Img. 2: Dedicated server architecture
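The authoritative tick described above can be reduced to a toy loop (all names and the movement model are assumptions for illustration): the server collects the inputs of all clients, computes the new state exactly once, and broadcasts that same state to every client.

```python
# Toy sketch of one authoritative dedicated-server tick (names assumed):
# inputs in -> one state computation -> identical state out to all clients.

def simulate(state: dict[str, float], inputs: dict[str, float]) -> dict[str, float]:
    # apply each player's input (here: a 1D position delta) to the state
    return {pid: pos + inputs.get(pid, 0.0) for pid, pos in state.items()}

def tick(state: dict[str, float],
         inbox: dict[str, float],
         outboxes: dict[str, list]) -> dict[str, float]:
    new_state = simulate(state, inbox)
    for outbox in outboxes.values():
        outbox.append(new_state)  # broadcast the same authoritative state
    return new_state

state = {"p1": 0.0, "p2": 5.0}
outboxes = {"p1": [], "p2": []}
state = tick(state, {"p1": 1.0}, outboxes)  # only p1 sent an input this tick
print(state)  # {'p1': 1.0, 'p2': 5.0}
```

The key property is that no client ever computes the shared state itself – which is exactly what removes the host-trust problem of the P2P approach.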

Compared to the hosted P2P approach, operating dedicated servers definitely incurs costs; depending on the infrastructure and scaling pattern, these costs can vary widely 3. Furthermore, scaling can be a challenge if the servers are operated on-premises: enough servers must be available, and they must be maintained and operated. In addition, these servers are sometimes “dead capital” if they are not fully utilized.

On the other hand, there are advantages that eliminate some of the disadvantages of the hosted P2P approach: When operating a dedicated server, you can count on consistently high performance, because we know about the performance of our servers and know how many users can play on one unit at the same time. This is usually not the case with hosted P2P, since the performance and networking capability can vary greatly depending on the host’s end device. This also makes it easier to estimate and plan capacities. If we operate our servers using the cloud, we can relatively easily scale horizontally (more servers) and vertically (more resources per server) depending on the corresponding demand from the players. Operation in the cloud also ensures consistent global performance, since the servers are distributed around the world and are not in a central location. This is mostly not the case in the hosted P2P approach, since a gaming lobby is dependent on the performance and location of the host, which does not need to be close to the other players.

However, one of the biggest advantages of the dedicated server approach is the reliability and security of the game state, as it is calculated on an independent, secure instance, making it very difficult for players to cheat. While the hosted P2P approach allows a host to manipulate, for example, his position on the racetrack, this is made significantly more difficult with approaches such as server-side reconciliation, because the server always knows the player’s last position. If a player delivers unrealistic values as input, the server can compare them with the value of the last frame, detect the abnormal value and send corrections back to the player accordingly 4.
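Such a server-side plausibility check can be sketched as follows. This is a minimal illustration, not any engine’s real API: the constants, function name and 2D movement model are assumptions. The server compares the client’s reported position with the last validated one and clamps any move that exceeds the physically possible distance per tick.

```python
# Illustrative server-side sanity check (all names assumed): reject moves
# that would require exceeding the game's maximum speed within one tick.

MAX_SPEED = 50.0   # assumed game constant: units per second
TICK = 1 / 60      # assumed server tick length in seconds

def validate_move(last_pos: tuple[float, float],
                  new_pos: tuple[float, float]) -> tuple[float, float]:
    dx = new_pos[0] - last_pos[0]
    dy = new_pos[1] - last_pos[1]
    dist = (dx * dx + dy * dy) ** 0.5
    max_dist = MAX_SPEED * TICK          # farthest legal distance per tick
    if dist <= max_dist:
        return new_pos                   # plausible: accept as-is
    # implausible (speed hack?): clamp to the farthest reachable point,
    # which then gets sent back to the client as a correction
    scale = max_dist / dist
    return (last_pos[0] + dx * scale, last_pos[1] + dy * scale)

print(validate_move((0.0, 0.0), (0.5, 0.0)))    # accepted unchanged
print(validate_move((0.0, 0.0), (100.0, 0.0)))  # clamped to ~0.83 units
```

A real implementation would also account for latency jitter and legitimate bursts (boosts, respawns) before flagging a player, to avoid punishing honest clients.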

In summary, there are the following reasons for or against the dedicated server approach:

Pros:
  • Security & reliability (game state is calculated on an independent authority)
  • Constant (high) performance
  • Plannable capacities
  • Easy scaling (cloud)
  • Global distribution (cloud)

Cons:
  • High costs
  • Scaling effort (on-premises)

Hybrid solutions & services 

In addition to the approaches just mentioned, there are also mixed approaches. It is not uncommon for a dedicated server or hosted P2P architecture to be expanded by additional, separate game services. Such services are operated on standalone servers and deal with things like matchmaking, grouping up players and picking a dedicated server or (for peer to peer games) sharing IP addresses between players so they can connect together. They handle other things such as leaderboards, player progress, unlocks, chatting and friend lists etc. In this way, areas of responsibility are clearly separated and the load is distributed to different machines (Img. 3). 

Well-known examples of such a solution are the games GTA Online, Warframe and Among Us, whose players are connected to each other via a hosted P2P network while the matchmaking runs on a separate server. This model has drawn some criticism due to the potential for host cheating. However, the developers have mostly implemented several separate countermeasures against cheating, such as client-side validation, which includes checks for speed hacks, aimbots and other forms of cheating, and activity and system monitoring, which checks for suspicious activity on the player’s system such as memory manipulation, injection attacks and other forms of hacking 4.

Img. 3: Hosted P2P with gaming services (left), Dedicated Server with gaming services (right)

Game States: Racing World & Player Data

Now that we’ve gotten to know the common architecture approaches in online gaming, we have to ask what data we need to exchange in order to determine the current state of the game. But before we get into the data: what exactly is the state anyway? To shed more light on this, let’s look at the multiplayer approach of one of the most well-known game engines – the Unreal Engine 5. Here, for example, a distinction is made between three different units: GameMode, GameState and PlayerState 5 (see Img. 4). Other game engines like Unity use a similar approach, but the implementation varies 6.

Img. 4: GameMode, GameState and PlayerState in the example of a client-server architecture (dedicated server) 5

Let’s explain the orange part of the image above (Img. 4) using our racing game 7:

The GameMode manages the global rules of a race. In our case these are for example: 

  • The minimum number of players needed to start a race (min. 2)
  • The maximum number of players in a race (10 players)
  • The number of laps each player must complete to finish the race
  • The spawn location and order of players on the racetracks
  • The time of the countdown
  • The allowed weapons and traps

The GameState manages global information about the current gameplay that is relevant for all players, such as: 

  • The number of currently connected players
  • The start and finish time of each individual player
  • The current ranking (based on each player’s current position)
  • The position of all players
  • The location and state of all fired weapons and traps
  • The position and state of each checkpoint on the track.

The PlayerState exists for every player connected to the game, on both the server and the clients. This class can be used for replicated properties that all clients, not just the owning client, are interested in, such as the current:

  • car model
  • round on the race track
  • position of a player  
  • speed of a player 
  • direction of a player
  • steering angle of a player
  • down force of a player
  • amount of a specific collected item of a player
  • status of a specific collected item (fired or not and target player)
  • collisions of a player with another player or object

By continuously updating the game state and all player states based on these inputs, we can provide a seamless and engaging experience for all players involved.
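The three state units above can be modelled as plain data structures. The following is an engine-agnostic sketch (all class fields, defaults and the toy ranking rule are assumptions for our racing game, not Unreal Engine’s actual classes):

```python
# Engine-agnostic sketch of GameMode / GameState / PlayerState (names assumed).
from dataclasses import dataclass, field

@dataclass
class GameMode:              # global rules of a race, server-only
    min_players: int = 2
    max_players: int = 10
    laps_to_finish: int = 3
    countdown_seconds: int = 5

@dataclass
class PlayerState:           # replicated to every client, one per player
    player_id: str
    car_model: str
    lap: int = 0
    position: tuple[float, float] = (0.0, 0.0)
    speed: float = 0.0
    steering_angle: float = 0.0

@dataclass
class GameState:             # global gameplay info relevant for all players
    connected: dict[str, PlayerState] = field(default_factory=dict)

    def ranking(self) -> list[str]:
        # toy ranking: by completed laps, then by distance along the x-axis
        return sorted(self.connected,
                      key=lambda pid: (self.connected[pid].lap,
                                       self.connected[pid].position[0]),
                      reverse=True)

state = GameState()
state.connected["p1"] = PlayerState("p1", "speedster", lap=2, position=(40.0, 0.0))
state.connected["p2"] = PlayerState("p2", "roadrunner", lap=2, position=(55.0, 0.0))
print(state.ranking())  # p2 leads within the same lap
```

A real engine would derive the ranking from track progress along the racing line rather than a raw coordinate, but the split of responsibilities stays the same.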

In addition to identifying what data makes up the state, it is important to understand how the player state is computed taking into account different latencies so that all players feel like they are sharing a synchronous gaming experience with other players at all times. How this works is explained in the next section.

State Simulation

Almost all modern 3D online games face an essential challenge: How do we balance the latency of the client-server communication (or client-host communication in P2P) so that each player experiences smooth, latency-free gameplay locally while still staying consistent with the current state of the server?

Client prediction 

During a multiplayer online game, each player communicates with the server (dedicated server) or the host player (hosted P2P). This communication happens asynchronously so that the player can have a smooth gaming experience: For example, the player sends his new position to the server. Instead of waiting until the answer comes back from the server and freezing the player’s screen for the response time (latency) of, for example, 300 ms (that would be synchronous), the next frame is rendered directly and the player (in our case the racing car) is shown directly at the new position. This approach is known as client prediction. Assuming that the game world is deterministic, the player’s input can lead directly to the desired action locally (for example a new position, an animation or a physics calculation), and this will in most cases also correspond to the state of the server 8,9.
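Stripped to its essence, client prediction looks like this. The sketch is illustrative (the 1D movement model and all names are assumptions): the client applies its input locally immediately instead of blocking on the server’s reply, relying on both sides running the same deterministic simulation step.

```python
# Minimal sketch of client-side prediction (all names assumed).

def apply_input(pos: float, move: float) -> float:
    # deterministic simulation step, identical on client and server
    return pos + move

class PredictingClient:
    def __init__(self) -> None:
        self.position = 0.0

    def press_key(self, move: float) -> None:
        # 1. send the input to the server (network call omitted here)
        # 2. predict the result locally, without waiting for the reply,
        #    so the very next frame already shows the car at the new spot
        self.position = apply_input(self.position, move)

client = PredictingClient()
client.press_key(1.0)
client.press_key(1.0)
print(client.position)  # 2.0 - rendered immediately, no round trip needed
```

Because the step is deterministic, the server will usually arrive at the same result, so the prediction rarely needs to be corrected.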

Server reconciliation 

Since in a racing game the communication between player and server should take place as quickly as possible, UDP is used as the network protocol in most online multiplayer games and frameworks 10,11. In contrast to TCP, the order of the incoming packets is not guaranteed with UDP. Depending on the latency between player and server, the responses from the server can arrive at the player at different times and in a different order, so that the position held by the server may no longer match the new local position of the player (because the player has already moved on in the game, for example). Thus, the player’s game state and the server’s game state are desynchronized. However, since the server game state is the trusted game state 12, the game now has to process the new information from the server and make an adjustment, for example reset the player locally to the server’s position. But since the player has already moved on locally, this would lead to an unclean gaming experience (glitches) and break immersion – and we want to avoid that at all costs! So what do you do in such a case?

One solution to this problem is server reconciliation. A sequential number is added to each input sent by the player to the server to ensure the order of actions. This input is stored both on the player and on the server. If the player now receives a response from the server that matches the player state, the associated number can be removed locally and the next number follows in order. However, if the server’s answers follow in a different order, the player now knows locally that an answer to a specific number is still missing. However, since the player knows the answer to the actions before and after the missing answer, he can simulate the current state, which is then validated again by the server afterwards. This means that the server and player always synchronize with a slight delay in comparison to the next frame. However, this also ensures that the gameplay is represented graphically fluently 8.
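The sequence-number mechanism just described can be sketched as follows (a toy illustration with a 1D movement model; all names are assumptions): the client keeps every unacknowledged input and, when an authoritative state arrives, resets to it and replays the inputs the server has not yet processed.

```python
# Sketch of server reconciliation with sequence numbers (names assumed).

def apply_input(pos: float, move: float) -> float:
    return pos + move  # same deterministic step as on the server

class ReconcilingClient:
    def __init__(self) -> None:
        self.position = 0.0
        self.seq = 0
        self.pending: list[tuple[int, float]] = []  # (sequence number, input)

    def press_key(self, move: float) -> None:
        self.seq += 1
        self.pending.append((self.seq, move))             # keep until acknowledged
        self.position = apply_input(self.position, move)  # client prediction

    def on_server_state(self, server_pos: float, last_acked_seq: int) -> None:
        # drop inputs the server has already processed ...
        self.pending = [(s, m) for s, m in self.pending if s > last_acked_seq]
        # ... then re-simulate the remaining ones on top of the
        # authoritative position instead of snapping back visibly
        self.position = server_pos
        for _, move in self.pending:
            self.position = apply_input(self.position, move)

client = ReconcilingClient()
client.press_key(1.0)                           # seq 1
client.press_key(1.0)                           # seq 2
client.on_server_state(1.0, last_acked_seq=1)   # server confirmed seq 1 only
print(client.position)  # 2.0 - seq 2 was replayed, no visible glitch
```

Without the replay step, the client would snap back to 1.0 on every server update and then jump forward again, which is exactly the glitching the text warns about.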


Entity interpolation

So that the simulated state between two validated states (the result of the server-side reconciliation) also leads to a graphically smooth transition movement, we use the concept of interpolation. Interpolation algorithms are part of every common game engine 14,15. There is linear interpolation to calculate states between two positions and spherical interpolation to calculate states between two rotations. Such an algorithm always requires two states (for example two positions on the race track) in order to calculate the positions in between and display them as a fluid movement. In our case, these are the two positions that have been successfully validated and transmitted by the server before and after a certain input from the player. Based on these positions, the intermediate positions required for the graphical representation are calculated 13.
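Linear interpolation between two validated snapshots can be sketched in a few lines (a 1D toy example; the snapshot values and frame count are assumptions): for each rendered frame between two server states, the position is blended proportionally to the elapsed time.

```python
# Minimal sketch of entity interpolation between two server snapshots.

def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

# two server-validated positions, e.g. 100 ms apart
pos_t0, pos_t1 = 10.0, 20.0

# render the frames in between instead of teleporting the car:
# here, 6 intermediate steps between the two snapshots
frames = [round(lerp(pos_t0, pos_t1, t / 6), 2) for t in range(7)]
print(frames)  # [10.0, 11.67, 13.33, 15.0, 16.67, 18.33, 20.0]
```

For rotations, the same idea is applied with spherical interpolation (slerp) on quaternions rather than on raw angles, to avoid distorted in-between orientations.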

The combination of client-side prediction, server-side reconciliation and interpolation can help ensure a smooth and responsive online multiplayer game experience. By using these techniques, game developers can minimize network latency issues and keep players synchronized across multiple devices and network connections. To make these three approaches easier to understand, a live demo is highly recommended at this point.


With this article we took the first step into the world of online-multiplayer-architecture. We discussed architectural approaches for multiplayer games, got to know the game and player state and understood which data we have to exchange between our players. Finally, we looked at concepts that compensate the latencies between player and server by simulating an intermediate state.

When choosing a suitable architecture for an online game, it has been shown that P2P, dedicated servers and hybrid approaches are technical solutions that present developers with different challenges. As it turned out, hybrid solutions in particular are preferred in order to keep costs as low as possible while still offering players an immersive and fun gaming experience through server-based, separate services such as matchmaking or activity monitoring. P2P solutions are ideal for small, unknown games with little to no budget, while games with predictable, stable sources of income have more money for their own server infrastructure. However, by using free tiers, smaller games with few players at launch can also take advantage of dedicated cloud servers to launch their game in the first place 2,16.

In order to make a final decision regarding the architecture of our racing game, further research is required, such as a detailed cost comparison, comparison of different frameworks or cloud providers and implementation details.


[1] R. Rhett, What is Peer-to-Peer Gaming, and How Does it Work?, December 2021 [Online]. Available:
[2] Photon, Authoritative Server FAQ, 2023 [Online]. Available:
[3] AWS, Amazon GameLift – Prices, 2023 [Online]. Available:
[4] G. Gambetta, Fast-Paced Multiplayer (Part I): Client-Server Game Architecture, 2022 [Online]. Available:
[5] R. Mayya, Leveraging UE4’s Gameplay Framework for our Multiplayer game, 2020 [Online]. Available:
[6] Unity, Unity Documentation: NetworkBehaviour, 2021 [Online]. Available:
[7] Epic Games, UE5 Documentation: Game Mode and Game State – Overview of the Game Mode and Game State, 2023 [Online]. Available:
[8] G. Gambetta, Fast-Paced Multiplayer (Part II): Client-Side Prediction and Server Reconciliation, 2022 [Online]. Available:
[9] Wikipedia, Client-side prediction, 2021 [Online]. Available:
[10] Photon, Binary Protocol, 2023 [Online]. Available:
[11] M. Carroll, UDP vs TCP: Why to Run Gaming Servers Separate from Chat, December 2022 [Online]. Available:
[12] Epic Games, UE3 Documentation: Unreal Networking Architecture, 2012 [Online]. Available:
[13] G. Gambetta, Fast-Paced Multiplayer (Part III): Entity Interpolation, 2022 [Online]. Available:
[14] Unity, Unity Documentation: Vector3.Lerp, 2021 [Online]. Available:
[15] Epic Games, UE5 Documentation: Lerp, 2023 [Online]. Available:
[16] Photon, Photon Pricing, 2023 [Online]. Available: