Imagine you’re developing a live chat application in the cloud that needs to serve a growing number of users simultaneously and in real time across multiple chat rooms. Sounds like a challenge? It is. But with proven approaches and valuable insights from real-world experience, this task can be successfully and efficiently mastered.
In this article, we share our experiences in developing a scalable, serverless live chat application on AWS. Our goal was to create a ‘mini Twitch’ with support for multiple chat rooms – as simple as possible and without having to worry about server management. We’ll show which tools and AWS services are best suited for specific tasks, how they interact, and present a possible architecture. Additionally, we place a strong focus on testing the application, addressing the biggest challenges and pitfalls we encountered along the way.
This article provides a hands-on introduction to building scalable live chat and real-time applications. It doesn’t cover every aspect but serves as a solid starting point for your own projects. Whether you already have experience with serverless architectures, are a complete beginner, or are simply curious about how modern chat systems are built in the cloud – here, you’ll find practical insights, concrete approaches, and valuable learnings.
Requirements
At the beginning of the project, we defined the minimum requirements. Functionally, our application should allow users to log in with a username, create and join chat rooms, and send and receive messages within them. However, the non-functional requirements are particularly relevant, as they influence our architectural decisions.
Since a real-world live chat application operates as a distributed system, the CAP theorem can be applied. This theorem is a fundamental concept in distributed systems that describes the trade-off between consistency, availability, and partition tolerance. According to the theorem, a distributed system can guarantee at most two of these three properties at the same time.
In our live chat system, consistency, availability, and partition tolerance have different levels of importance:
- Partition Tolerance (Essential): Network partitions or connection disruptions are unavoidable in distributed systems. The application must remain stable in such cases, ensuring that messages can still be sent and received even during temporary network failures.
- Availability (Very High): Users expect a seamless real-time communication experience, where messages are sent and received instantly with minimal delay. A non-available chat system would significantly degrade the user experience.
- Consistency (Negligible): Strict consistency is not crucial for a live chat application. Users generally tolerate slight delays in message synchronization or messages appearing out of order due to eventual consistency.
Our well-designed live chat architecture prioritizes availability (A) and partition tolerance (P) while relaxing strict consistency in favor of these two properties. This AP approach ensures stable real-time communication, even in the presence of network issues.
Beyond availability and partition tolerance, additional non-functional requirements are defined for the system:
- Latency: Chat messages should be displayed within 150 milliseconds to ensure a smooth real-time communication experience.
- Scalability: The system must support scaling to handle traffic spikes and a high number of concurrent users, without compromising real-time performance.
Realizing Real-Time Communication in the Backend on AWS
As part of our project, we aimed to find a solution that was fully integrated within the AWS ecosystem while ensuring scalability and cost efficiency as the number of users increased. To achieve real-time communication in our chat application, we explored two possible backend architectures, each with its own advantages and disadvantages. Regardless of the chosen approach, using the WebSocket protocol was essential to enable real-time message exchange between users.
Option 1: WebSocket API Gateway + Lambda
The serverless approach of this solution offers the advantage that no dedicated server management is required and that AWS takes care of automatic scaling, which significantly simplifies infrastructure management. However, we also encountered some critical limitations, above all the performance limits of API Gateway and Lambda under high load: a hard limit of approximately 3.6 million concurrent connections, restrictions on the rate of new connections, and limits on Lambda invocations (1,000 concurrent executions per region and a maximum of 10 requests per second per execution environment). These limitations showed that this option is not well suited for high-frequency communication and a growing number of users. Additionally, routing every message through Lambda adds latency, and costs can escalate quickly at very high user numbers, since both API Gateway and Lambda are billed by the number of connections and invocations.
Option 2: Containerized Socket.IO Application
The alternative of running a containerized Socket.IO application proved to be much more promising in practice. Initially, we assumed that we would need to use Elastic Kubernetes Service (EKS) or even EC2 servers, which would have been associated with significantly more complexity and administrative overhead compared to Lambda. However, we then discovered Elastic Container Service (ECS) with AWS Fargate. This serverless orchestration service seemed to be a very promising option. Direct WebSocket communication allows for significantly lower latency, and the solution can be implemented and scaled flexibly. In addition, this model proves to be more cost-efficient under consistently high connection loads. Of course, this option entails increased infrastructure effort, as scaling and monitoring must be organized independently. The necessary Redis integration for synchronization across multiple instances also leads to a more complex setup. With lower loads, this model might not have offered the same economic advantage as the approach in Option 1.
In summary, Option 2 provided us with the flexibility and scalability we needed to meet the demands of a growing user base, while simultaneously optimizing latency and long-term operating costs. For these reasons, we decided to go with the containerized Socket.IO application using ECS and Fargate.
Tech Stack
The following section describes all relevant technologies, frameworks, and tools used in the development of our application, grouped into Frontend, Backend, Infrastructure, and Testing. The AWS services we used and how they interact are outlined in the architecture sections below.
- Frontend:
  - React in combination with Tailwind CSS for building the web UI.
  - Socket.IO client API for real-time communication between client and server.
- Backend:
  - Node.js with Express.js as the server-side web framework.
  - Socket.IO as the library for bidirectional real-time communication between web clients and the server.
- Infrastructure: Terraform as the Infrastructure-as-Code (IaC) tool.
- Testing: Artillery as the load-testing framework.
Laying the Foundation: Our Initial Architecture
We initially developed an architecture that ran without scaling on a single instance; all components are shown in the following diagram. We were aware that running on multiple instances and implementing scaling would require an additional mechanism for message synchronization, so we chose this simplified setup for the first phase.
The entire infrastructure spans one region – in our case, eu-central-1. At its center is a Virtual Private Cloud (VPC), which serves as an isolated network and forms the foundational framework for all other components. IAM is used to control access to AWS resources such as DynamoDB, CloudWatch, ECR, and S3. It ensures that only authorized services and roles have the necessary permissions.
Originally, all our services ran in a public subnet. However, following best practices for enhanced security, we adjusted the architecture and moved the application to a private subnet; the details are covered in a separate blog post. The Internet Gateway enables communication between instances in the public subnet and the internet.
The user flow is as follows: A user enters the URL https://www.crowdconnect.fun, at which point Route 53 resolves the domain and forwards the request to the associated CloudFront distribution, through which our frontend is served. Once the request reaches CloudFront, the connection is encrypted using the SSL/TLS certificate provided by ACM. CloudFront checks whether the requested files are already in the cache. If they are, the files are delivered directly to the user; if not, they are fetched from the S3 bucket, which serves as the origin for the static files of our React app and also caches them.
Once the React application is executed in the browser, it establishes a WebSocket connection to the backend to enable real-time communication with other clients. This connection runs through an Application Load Balancer (ALB), which forwards incoming connections to our Socket.IO server – a server we run in a container on Amazon ECS with Fargate. The container image is pulled from the Elastic Container Registry. Information about all usernames used and chat rooms created is stored in Amazon DynamoDB, a highly scalable, serverless NoSQL database that is ideal for chat applications due to its automatic scaling, high read/write rates and elimination of server management. CloudWatch allows us to monitor the resources used by our services.
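To make the server side more concrete, here is a minimal sketch of a Socket.IO room server of this kind. The event names ("join-room", "chat-message") and the payload shape are illustrative assumptions, not the project's actual code.

```js
// server.js: minimal sketch of a Socket.IO room server (event names and payloads are assumptions)
const express = require("express");
const http = require("http");
const { Server } = require("socket.io");

const app = express();
const server = http.createServer(app);
const io = new Server(server, { cors: { origin: "*" } });

io.on("connection", (socket) => {
  // a client asks to join a chat room under a chosen username
  socket.on("join-room", (roomId, username) => {
    socket.join(roomId);
    socket.data.username = username;
  });

  // relay a message to everyone else in the same room
  socket.on("chat-message", (roomId, text) => {
    socket.to(roomId).emit("chat-message", {
      user: socket.data.username,
      text,
      sentAt: Date.now(),
    });
  });
});

server.listen(3000, () => console.log("Socket.IO server listening on :3000"));
```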
The Restart Loop: Debugging Constant Task Restarts
With this setup, we deployed and launched our application. Initially, everything seemed to be working as expected—we could access the application over the internet without any issues. However, while monitoring the system in AWS CloudWatch, we noticed unexpected behavior: new tasks were being registered continuously (see figure). Although our Elastic Container Service (ECS) was configured to run only one task at a time, we observed that a new task was periodically started while another was stopped.

A closer analysis of the logs revealed that the Target Group always displayed a faulty target marked as “unhealthy”. The root cause of the problem was that the health checks of our application were failing. Originally, we assumed that using the root endpoint “/” as a health check would be sufficient, which turned out to be incorrect. The Application Load Balancer (ALB) ignores the health check status and forwards traffic to a target marked as “unhealthy” if no other healthy targets are available (a “failover mechanism”). In such situations, the ALB assumes that a potentially unstable target is better than no target at all, meaning that our application remained accessible, albeit in an unstable state.
The solution was to implement a dedicated health check endpoint and register it correctly in the Target Group. This ensured that only healthy instances received traffic, thereby preventing the constant re-registration of tasks. As shown in the figure, from 13:15 onwards only a single task is running, a clear sign that the issue was resolved. With the application finally running as expected, it was time to load test it and see how it handles different loads.
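A dedicated endpoint of this kind can be as simple as the following Express sketch; the path name "/health" and the response body are assumptions rather than the project's actual implementation, and the ALB target group's health check path has to be pointed at it.

```js
// health.js: sketch of a dedicated health check endpoint (the "/health" path is an assumption)
const express = require("express");
const app = express();

app.get("/health", (req, res) => {
  // The ALB target group probes this path instead of "/".
  // Respond with 200 only while the process considers itself healthy.
  res.status(200).json({ status: "ok", uptime: process.uptime() });
});

app.listen(3000);
```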

Load Testing Our Initial Architecture
A crucial aspect of load testing is determining the peak load, which represents the maximum number of users or operations a system can handle before experiencing slowdowns or failures. This is typically done by gradually increasing the load while monitoring key performance metrics such as response times and error rates. Identifying these thresholds helps detect potential bottlenecks and optimize the system for better scalability and stability.
For our tests, we took two different approaches:
- Baseline Test: We tested our application under moderate usage to understand its behavior and resource consumption. This helped us estimate how many users and messages the system could handle comfortably.
- Limit Test: We pushed the system to its absolute limits to determine the maximum capacity it could support before breaking down.
We used Artillery as the testing framework for our load tests. Artillery is a powerful, scalable and flexible platform for professional load testing, known for its ease of configuration. Test scenarios can be created using YAML or JSON scripts, allowing for quick adjustments and extensions, which can also be versioned in Git. It supports a wide range of protocols and technologies, including HTTP APIs, WebSocket services, and Socket.IO. In contrast, JMeter, a commonly used alternative for load testing, does not natively support WebSockets and requires additional plug-ins, making it more complex to set up and maintain. For that reason we decided against JMeter. By choosing Artillery, we were able to quickly run realistic load tests, especially for our Socket.IO-based application. The seamless execution of tests in AWS and the native support for WebSockets made it the ideal solution for our needs.
To monitor our application, we integrated the Amazon CloudWatch agent into our ECS task definition as a separate sidecar container (cwagent). This allowed us to collect key metrics related to CPU, memory, and network utilization and visualize them in AWS CloudWatch. The cwagent container wrote all metric collection logs to the container’s default log system. The collected metrics were stored under the namespace “LiveChatCustom” and could be accessed in AWS CloudWatch. To ensure precise monitoring, we set the measurement interval to 1 second, allowing us to capture fine-grained data. With this setup, we were able to track the following metrics:
- CPU Utilization (CPUUtilization): Displays the active CPU usage as a percentage (cpu_usage_active).
- Memory Utilization (MemoryUtilization): Indicates how much RAM the container is using (mem_used_percent).
- Network Traffic (bytes_recv, bytes_sent): Measures the incoming and outgoing network traffic in bytes.
After setting up our environment with a Fargate task allocated 0.25 vCPU, we conducted our first load tests to evaluate the performance of our system. However, we encountered several issues:
- Many tests failed.
- The UI became unresponsive.
- Our CloudWatch metrics for CPU utilization (cpu_usage_active) only showed around 25% usage.
- Some virtual users (vUsers) failed during the test.
This discrepancy raised questions for us. To perform a more detailed analysis, we enabled ECS Container Insights for our tasks. Container Insights automatically collects performance-related metrics and logs from our ECS Fargate tasks with a frequency of 1 Hz and stores them in CloudWatch. When analyzing the ECS CPUUtilization metric, we observed 100% CPU utilization, indicating that our server was already at its maximum capacity. This also explained why the UI became unresponsive under heavy load.
We wondered why our previous CPU metrics showed different values. After thoroughly reviewing the AWS documentation, we discovered that the calculation of CPU metrics differs. The key difference between cpu_usage_active and ECS CPUUtilization lies in what they measure relative to. The cpu_usage_active metric measures the actual CPU usage of the container relative to the total physical CPU of the Fargate host machine. For example, if our container uses 0.05 vCPU and the host (the underlying EC2 instance in Fargate) has 1 vCPU, the calculation is as follows:
Result: 0.05 vCPU / 1 vCPU = 0.05, i.e. the container uses 5% of the total host CPU.
The CPUUtilization metric in ECS, on the other hand, measures actual CPU usage relative to the CPU reserved for the task. For example, if our task again uses 0.05 vCPU, but 0.25 vCPU was reserved for it in the ECS task definition, the calculation is as follows:
Result: 0.05 vCPU / 0.25 vCPU = 0.20, i.e. the task uses 20% of its allocated CPU capacity.
With this understanding, we began running the baseline and limit tests. During the following tests, our application was run on a single instance (1 Fargate task with 1 vCPU).
Test: Baseline
The first test aimed to evaluate system performance under moderate user load. Over a 240-second period, two new users per second were added to the application, with each user remaining active for 60 seconds. This allowed us to simulate a maximum of 120 concurrent active users. To analyze the impact of message transmission on system performance, we extended the test with an additional variation. Each user joined a chat room upon entering and sent seven messages. The goal was to understand how increased network traffic affects system resource consumption.
During the tests, we observed differences in resource utilization. Initially, when no messages were sent, CPU usage only showed a slight increase. However, once users started sending messages, CPU usage increased by 20%, reaching a total of 30%. In contrast, memory consumption remained stable at 9% throughout the test, regardless of whether messages were sent or not.
As expected, network traffic increased significantly when users started sending messages. In the first test run without messages, 80 kB of data was received, and 40 kB was sent. However, in the second test with message transmission, these values jumped to 500 kB received and 700 kB sent. These numbers confirmed that sending and receiving messages had a substantial impact on network traffic. The results for the low user count test and the low user count and sent messages test are visualized in the figure.
In terms of user experience, we found that the application remained responsive throughout the test. The message latency averaged 50 milliseconds, meaning a sent message was visible to other users within this timeframe. Additionally, the server’s event loop delay was measured at 20 milliseconds, indicating that incoming requests were processed efficiently.
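The article does not show how these numbers were collected; one way to sample the event loop delay in Node.js is the built-in perf_hooks module, sketched below. The reporting interval and percentile are arbitrary choices, not the measurement setup we necessarily used.

```js
// Sketch: sampling the Node.js event loop delay with perf_hooks
const { monitorEventLoopDelay } = require("perf_hooks");

const histogram = monitorEventLoopDelay({ resolution: 10 }); // sample every 10 ms
histogram.enable();

setInterval(() => {
  // histogram values are reported in nanoseconds
  const meanMs = histogram.mean / 1e6;
  const p99Ms = histogram.percentile(99) / 1e6;
  console.log(`event loop delay: mean ${meanMs.toFixed(1)} ms, p99 ${p99Ms.toFixed(1)} ms`);
  histogram.reset();
}, 10_000);
```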
Based on these results, we can conclude that our application performs well with up to 120 concurrent users, each sending seven messages without significant performance degradation.
After gaining initial insights into resource utilization through our Baseline Tests, we wanted to determine how far we could push our application on a single instance (1 ECS task with 1 vCPU) before it reached its limits. To achieve this, we designed a Limit Test, aimed at evaluating the maximum capacity under increasing user load.
Test: Limit Test
The test ran for 420 seconds and simulated a user load of up to 480 concurrent users, with each user sending seven messages. To create a realistic load distribution, we divided the test into three phases:
- Warm-Up (120 seconds): Gradually increasing the load from 2 to 8 users per second.
- Peak Load (180 seconds): A sustained 8 users per second were added to the application.
- Cooldown (120 seconds): The user load gradually decreased from 8 back to 2 users per second.
Each user remained active for 60 seconds, allowing us to reach the maximum of 480 concurrent users.
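For illustration, a limit test along these lines could be expressed in Artillery's YAML format roughly as follows. This is a sketch, not the test script we actually ran: the target URL, event names, and message pacing are assumptions, and the exact Socket.IO engine options depend on the Artillery version in use.

```yaml
# limit-test.yml: sketch of the three-phase limit test (target, events, and pacing are assumptions)
config:
  target: "https://chat-backend.example.com"   # the WebSocket endpoint behind the ALB (placeholder)
  socketio:
    transports: ["websocket"]
  phases:
    - name: warm-up       # ramp from 2 to 8 new users per second
      duration: 120
      arrivalRate: 2
      rampTo: 8
    - name: peak          # sustained 8 new users per second
      duration: 180
      arrivalRate: 8
    - name: cooldown      # ramp back down from 8 to 2 users per second
      duration: 120
      arrivalRate: 8
      rampTo: 2

scenarios:
  - engine: socketio
    flow:
      - emit:
          channel: "join-room"
          data: "room-1"
      - loop:
          - emit:
              channel: "chat-message"
              data: "hello from artillery"
          - think: 8      # pause so each user stays active for roughly 60 seconds
        count: 7          # each virtual user sends seven messages
```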
We first had to experiment with the number of users created per second at peak load; in the end, 8 users per second proved to be the maximum the application could sustain before becoming unusable. As the number of users increased, the application’s responsiveness declined significantly: the UI became extremely sluggish and nearly unusable, and latency rose from 50 ms in the Baseline Test to 5 seconds. Memory utilization remained at a similar level to the Baseline Test, indicating that CPU was the primary bottleneck. The figure shows CPU and memory utilization during the test.
A closer look at the CPU metrics in the figure revealed a significant discrepancy. cpu_usage_active showed a utilization of 60%, indicating that there should still be capacity available. However, ECS CPUUtilization remained constant at 100%, indicating that the task was actually fully utilized. Since we initially assumed that ECS CPUUtilization was overestimated and did not accurately reflect the actual workload, we conducted an in-depth analysis of various system components to pinpoint the exact cause:
- Network Bandwidth: No noticeable limitations were detected.
- DynamoDB: Running on-demand, so it should not have been a bottleneck.
- Event Loop: Remained stable at 20 ms, showing no signs of blocking.
- API Gateway: Given the number of users in our test, this should not have been a limiting factor either.
Since none of these components could explain the observed discrepancy, and the AWS documentation offered little insight into how these metrics are derived, we concluded that the issue was likely internal to AWS, specifically related to task scheduling and Fargate resource allocation. This would explain why cpu_usage_active never reached 100%, even though the application’s usability and the increased latency clearly indicated that the task was fully saturated. As a result, we decided to rely solely on the CPUUtilization metric from ECS Container Insights for future analysis, as it better reflects real-world system behavior.
Takeaways from the Baseline and Limit Tests
The tests not only helped us determine the maximum load capacity of our application on a single instance, but also provided valuable insights into which metrics are truly relevant for scaling and performance monitoring. Understanding these discrepancies allowed us to refine our monitoring strategy and ensure that future scaling decisions are based on the most accurate and meaningful data.
Scaling Our Application
In the previous diagram of the initial architecture, we saw that – although a Load Balancer is present – only a single container sits behind it. As a result, all clients connecting to the Load Balancer are handled by the same server process. This limits the number of clients that can establish a connection to the application: The more clients connect, the more they compete for the limited computing resources of the server process. To address this issue, scaling must be implemented to distribute the load across multiple instances and support a higher number of concurrently connected users.
There are fundamentally two approaches to scaling: vertical scaling and horizontal scaling. With vertical scaling, an instance receives additional computing resources, such as extra CPU cores. However, Node.js is single-threaded by default and therefore cannot directly take advantage of multiple CPU cores. For this reason, horizontal scaling is crucial in our case. This approach dynamically increases the number of application instances — especially when vertical scaling reaches its limits.
Ensuring message synchronization at scale
One issue with pure horizontal scaling is that users would only see messages from other users connected to the same application instance. However, the goal is to ensure that users within a chat room receive all messages — regardless of which instance they are connected to.
To address this issue, all application instances must communicate with each other and synchronize messages across instances. This can be achieved using the Socket.io-Redis adapter, which acts as a message broker, relaying and synchronizing messages between all connected instances. In our case, Amazon ElastiCache is used as a Redis server, running in the same VPC as the Fargate containers.
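Wiring this up in Socket.IO is compact. The following is a sketch assuming the official @socket.io/redis-adapter package and a placeholder ElastiCache endpoint, not our exact configuration.

```js
// redis-adapter.js: sketch of relaying Socket.IO events across instances via Redis
const { Server } = require("socket.io");
const { createAdapter } = require("@socket.io/redis-adapter");
const { createClient } = require("redis");

const io = new Server(3000);

// ElastiCache (Redis) endpoint inside the VPC (placeholder value)
const redisUrl = "redis://my-elasticache-endpoint:6379";
const pubClient = createClient({ url: redisUrl });
const subClient = pubClient.duplicate();

Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
  // from here on, every emit to a room is relayed through Redis to all instances
  io.adapter(createAdapter(pubClient, subClient));
});
```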
The concept is illustrated in the following figure: A client connected via the load balancer sends a message to an application instance (Node.js process) running in a container. This instance forwards the message to the Redis adapter, which then distributes it to all connected clients across every instance.
Autoscaling
To enable automatic horizontal scaling based on the current load, we used Autoscaling — specifically, Amazon Application Auto Scaling. During setup, we defined the scaling policy, metric type, and target utilization. The number of application instances is dynamically scaled based on real-time metrics — in our case, the average CPU utilization. If the metric exceeds a defined threshold (60% in our case), additional instances are automatically provisioned to maintain performance.
The following image illustrates the concept of Autoscaling in AWS: Amazon ECS collects various metrics from running containers, such as CPU and memory usage, and transmits them to Amazon CloudWatch. CloudWatch monitors these values and triggers alarms when predefined thresholds are exceeded. The autoscaler responds to these CloudWatch alarms and uses them as triggers for scaling actions.
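Since Terraform is the project's IaC tool, a target tracking policy of this kind could be declared roughly as follows. This is a sketch: the resource names, capacities, and references to an ECS cluster and service are assumptions.

```hcl
# autoscaling.tf: sketch of target tracking scaling for the ECS service (names and capacities are assumptions)
resource "aws_appautoscaling_target" "chat" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.chat.name}/${aws_ecs_service.chat.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 1
  max_capacity       = 4
}

resource "aws_appautoscaling_policy" "cpu_target" {
  name               = "chat-cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.chat.service_namespace
  resource_id        = aws_appautoscaling_target.chat.resource_id
  scalable_dimension = aws_appautoscaling_target.chat.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 60 # scale out when average CPU utilization exceeds 60 %
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
```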
Final Architecture: Optimized for Scalability
With the recently introduced changes, we have revised and optimized our original architecture. Our final architecture is illustrated in the diagram. The enhancements include the integration of an additional Availability Zone (AZ), ElastiCache, and Application Auto Scaling.
The use of multiple availability zones significantly improves fault tolerance and availability, bringing us closer to our goal in the context of the CAP theorem. The implementation follows AWS best practices to ensure a robust and scalable infrastructure.
Further Load Tests
Limit Test on two instances
After implementing multi-instance support for our application, we wanted to retest its performance using two instances under the same Limit Test conditions as described earlier to assess the impact of scaling. The test results showed significant improvements compared to running on a single instance. Latency dropped from 5 seconds to 500 ms, demonstrating a significant increase in performance. Network traffic was successfully distributed across both instances, confirming that load balancing was working correctly. In addition, message synchronization across multiple instances using Redis worked seamlessly, ensuring that users received messages regardless of which instance they were connected to. CPU utilization was no longer at 100% because the workload was effectively distributed across multiple instances, preventing bottlenecks. The figure shows the network traffic and CPU utilization for both instances during the test. These results confirmed that our application was now more resilient and able to reliably handle higher loads. However, we still did not quite reach the 150 ms target for real-time communication.
Limit Test with Auto Scaling: Evaluating Dynamic Scaling Behavior
After testing our application on multiple instances, we wanted to understand how it behaves when starting with a single instance and scaling dynamically using Auto Scaling as described earlier. The goal of this test was to determine whether scaling triggers in time and efficiently enough to prevent overload and maintain optimal performance. For this test, we used Target Scaling as the Auto Scaling strategy. Scaling was configured based on the ECSServiceAverageCPUUtilization metric, with a threshold of 60%. This meant that if the average CPU utilization of the active instance exceeded this value, a new instance would be started.
During testing, we encountered two main challenges. Although Auto Scaling correctly detected high CPU usage, the time until the new instance was ready to use was too long to effectively avoid overloading the existing instance. This behavior is shown in the diagram. Another problem was that no traffic was distributed to the new instance. Since all the users from the test definition were already connected, the newly started instance remained idle and received no traffic.
Based on these results, we identified optimization potential in both our testing strategy and our autoscaling strategy. To better test autoscaling, the load test should run longer, with a slower and more gradual increase in the number of users; a more realistic load simulation, in which users are added gradually over an extended period rather than abruptly, would better reflect the scaling behavior of the system. On the autoscaling side, proactive resource allocation can help handle expected traffic peaks: if a surge in user activity is anticipated, for example during peak hours, it makes sense to scale the system up in advance or raise the minimum number of running instances, avoiding delays caused by instance startup times. The standard AWS metrics are also only recorded once per minute, which delays scaling when the load rises quickly; finer granularity or custom metrics would allow the system to react to load changes faster. Finally, in addition to CPU utilization, other key metrics should be monitored and considered for auto-scaling (a sketch of how such a custom metric could be published follows the list), such as:
- Number of messages sent
- Active WebSocket connections
- Number of new users
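As a sketch of the first point, each instance could aggregate its message counter and publish it to CloudWatch via the AWS SDK; the namespace, metric name, and flush interval below are assumptions, not something we implemented.

```js
// metrics.js: sketch of publishing a "messages sent" custom metric for scaling decisions
const { CloudWatchClient, PutMetricDataCommand } = require("@aws-sdk/client-cloudwatch");

const cloudwatch = new CloudWatchClient({ region: "eu-central-1" });
let messagesSinceLastFlush = 0;

// call this from the chat message handler
function countMessage() {
  messagesSinceLastFlush += 1;
}

// flush the counter to CloudWatch every 10 seconds
setInterval(async () => {
  const value = messagesSinceLastFlush;
  messagesSinceLastFlush = 0;
  await cloudwatch.send(new PutMetricDataCommand({
    Namespace: "LiveChatCustom", // namespace reused from the monitoring setup (assumption)
    MetricData: [{ MetricName: "MessagesSent", Value: value, Unit: "Count" }],
  }));
}, 10_000);

module.exports = { countMessage };
```

A CloudWatch alarm or a target tracking policy with a customized metric specification could then use such a metric as a scaling trigger.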
Key Takeaways from All Our Tests
In summary, our tests have provided valuable insights into the scalability and performance of our application. The results showed that a single instance (an ECS task with 1 vCPU) can handle up to 480 concurrent users before response times increase dramatically and the application becomes unusable. By using multiple instances, we were not only able to handle a larger number of concurrent users, but also noticed an improvement in response times. The scaling using load balancing and Redis for synchronization worked reliably. A key bottleneck was CPU utilization, which proved to be a critical factor for system performance, while RAM utilization remained stable.
Challenges arose during the tests that made both the execution and interpretation of the results difficult. Artillery’s documentation was inadequate in some areas, which made it difficult to configure and analyze the tests. In addition, many AWS services proved to be complex, with the documentation on metric calculations in particular not always being clear.
Additionally, several opportunities for optimization were identified in both the autoscaling strategy and the corresponding testing approach.
Boosting Performance with Data Efficiency
When optimizing data storage and retrieval, various aspects can be considered:
- Data access strategies, such as the Global Secondary Index (GSI), to improve query performance.
- Data relevance: Which data is essential for the application logic and must be stored?
- Optimization of asynchronous data access logic at the server level to minimize blocking read operations.
- Caching mechanisms to reduce database queries and improve response times.
The next section takes a deeper dive into caching with Redis.
Redis is an in-memory database that enables extremely fast read and write access to key-value pairs. In our case, Redis serves as a cache for temporary and frequently accessed data. This relieves the persistent database (DynamoDB) and significantly reduces response times.
A key question is which data should be permanently stored in DynamoDB and which can be temporarily kept in Redis. In our case, data for all existing chat rooms is stored in both systems in parallel, with Redis synchronizing with DynamoDB every 10 seconds. As a result, the application may, in the worst case, access or display chat rooms that are up to 10 seconds old, which is still acceptable.
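The article describes a periodic synchronization between Redis and DynamoDB; a closely related way to get the same staleness bound is a cache-aside read with a 10-second TTL, sketched below. The cache key, table name, and Redis endpoint are assumptions.

```js
// rooms-cache.js: sketch of a cache-aside read for the chat room list (key, table, and TTL are assumptions)
const { createClient } = require("redis");
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocumentClient, ScanCommand } = require("@aws-sdk/lib-dynamodb");

const redis = createClient({ url: "redis://my-elasticache-endpoint:6379" });
redis.connect(); // connect once at startup

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({ region: "eu-central-1" }));

async function getChatRooms() {
  // 1. try the cache first
  const cached = await redis.get("chat-rooms");
  if (cached) return JSON.parse(cached);

  // 2. on a cache miss, fall back to DynamoDB
  const { Items } = await ddb.send(new ScanCommand({ TableName: "ChatRooms" }));

  // 3. repopulate the cache with a short TTL so stale data expires on its own
  await redis.set("chat-rooms", JSON.stringify(Items), { EX: 10 });
  return Items;
}

module.exports = { getChatRooms };
```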
The combination of Redis and DynamoDB provides a well-balanced approach to consistency, availability, and performance. If the Redis server fails or reaches its memory limit, the required data can still be read directly from or written to DynamoDB. However, the biggest advantage is that frequently requested data is retrieved directly from Redis instead of DynamoDB, making it available to the client significantly faster. The following image illustrates the interaction between Redis and persistent data storage.
Redis offers various configuration options to control how long data should be stored. For example, a time-to-live (TTL) can be set for individual entries. Additionally, Redis provides eviction policies that define which criteria determine when old entries are automatically removed to free up space for new data. In most cases — including in our application — the Least Recently Used (LRU) strategy is applied, meaning that the least recently accessed entries are removed first.
The Price of Overlooking AWS Resources
One of our key learnings is to be careful when testing new AWS services and creating them using the AWS Management Console, as it is easy to lose track of or forget what resources have been deployed, ultimately leading to unintended costs.
In our case, we set up a Redis cache in its serverless variant without immediately considering the ongoing costs. The price was $0.151 per GB-hour. Unfortunately, we forgot to delete the resource, and it ran in the background for 340 hours before we noticed. In the end, this oversight cost us $51 – just for a forgotten cache (see image).
This experience showed us how quickly you can lose track of active services when manually provisioning cloud resources through the management console, and the benefits of using Infrastructure as Code for better visibility and control.
Outlook
As mentioned earlier, there are numerous opportunities for further development, especially before deploying the application to a production environment. Below, we outline some key aspects:
1. Multi-Region Deployment for Higher Availability
At present, the application operates in a single AWS region. However, a production-ready live chat application should support multi-region deployment, allowing it to run simultaneously in multiple AWS geographic regions. This approach minimizes latency by bringing data closer to users, enhances fault tolerance and availability, and further improves scalability.
2. Utilizing Multiple CPU Cores in Node.js
Since Node.js is single-threaded by default, it does not automatically utilize multiple CPU cores. To overcome this limitation, vertical and horizontal scaling could be combined. One possible approach is Node.js’s built-in cluster module: the primary process forks one worker process per available CPU core, and the workers distribute incoming connections among themselves. This would allow a task with multiple vCPUs to leverage all of its cores, which is especially beneficial for compute-intensive work such as real-time data processing at scale.
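A minimal sketch of this approach with the cluster module is shown below. Note that running Socket.IO across several worker processes additionally requires sticky sessions and an adapter (such as the Redis adapter already in place), which the sketch omits; the "./server" entry point is an assumption.

```js
// cluster.js: sketch of forking one Node.js worker per CPU core
const cluster = require("cluster");
const os = require("os");

if (cluster.isPrimary) {
  // fork one worker process per available core
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  // restart workers that die so capacity is not silently lost
  cluster.on("exit", (worker) => {
    console.log(`worker ${worker.process.pid} exited, starting a replacement`);
    cluster.fork();
  });
} else {
  // each worker runs its own server instance (e.g. the Socket.IO server)
  require("./server");
}
```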
3. Anti-Spam Measures
Another important enhancement involves implementing anti-spam mechanisms to regulate message exchanges via Socket.io. One effective strategy is rate limiting, which restricts the number of messages or requests a user can send within a specified time frame, helping to prevent spam and abuse. Additionally, connection throttling can be introduced to limit how frequently a client can establish new connections within a short period. This measure is particularly useful in mitigating DDoS attacks or excessive connection requests, ensuring the stability and security of the application.
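A simple form of rate limiting can be implemented directly in the Socket.IO message handler, as in the sketch below; the window size, message limit, and event names are assumptions rather than a finished design.

```js
// rate-limit.js: sketch of naive per-socket rate limiting for chat messages (limits are assumptions)
const { Server } = require("socket.io");
const io = new Server(3000);

const MAX_MESSAGES = 10;  // allowed messages ...
const WINDOW_MS = 10_000; // ... per 10-second window

io.on("connection", (socket) => {
  let sentInWindow = 0;
  const timer = setInterval(() => { sentInWindow = 0; }, WINDOW_MS);

  socket.on("chat-message", (roomId, text) => {
    sentInWindow += 1;
    if (sentInWindow > MAX_MESSAGES) {
      // drop the message and tell the client to slow down
      socket.emit("rate-limited", { retryInMs: WINDOW_MS });
      return;
    }
    socket.to(roomId).emit("chat-message", { text });
  });

  socket.on("disconnect", () => clearInterval(timer));
});
```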
4. Refined Software Engineering Workflow
Another key focus of our project’s further development is the optimization of the software engineering workflow. To improve code quality and stability, automated tests for both the backend and frontend can be introduced. This includes unit tests for the backend and UI tests for the frontend to ensure the functionality and user-friendliness of the application.
Additionally, a CI/CD pipeline (Continuous Integration / Continuous Deployment) could make the deployment process more efficient. This pipeline would automatically build the application whenever changes are pushed to the Git repository, execute the necessary tests, and create a new Docker image. The updated image would then be uploaded to AWS and automatically deployed by updating the running ECS tasks. At the same time, the updated frontend code would be stored in S3. This automation would significantly reduce manual effort, accelerate deployment, and enhance overall system stability.
5. Reworking Auto Scaling and Testing Strategy
To improve the load test for evaluating autoscaling, the test duration should be adjusted to allow for a slower and more gradual increase in the number of users, better reflecting a realistic usage scenario. Regarding the autoscaling strategy, proactive resource allocation and the use of more fine-grained custom metrics besides CPU usage would be beneficial to respond more quickly to load changes and enhance autoscaling performance. Since Redis plays a central role in synchronizing multiple instances, its scalability should also be considered if the load increases. Additional tests would be necessary to analyze the system limits even more precisely. These include longer test runs to investigate how performance develops over a longer period of time, as well as tests with users in different chat rooms to evaluate how parallel conversations affect overall performance.
Conclusion
The serverless tools and services we used allowed us to set up the application quickly and with minimal effort, achieving initial results in a short time. Terraform, in particular, positively surprised us in this regard. This demonstrates that modern serverless technologies enable the efficient development of professional applications without requiring extensive server configuration.
On the other hand, during testing, we realized the importance of defining clear test cases from the outset. We underestimated the interpretation of metrics, and it became evident that a deep technical understanding of the underlying processes and resources is necessary to correctly analyze metrics and derive meaningful insights. Our tests confirmed that scalability fundamentally works, but there is still room for optimization, especially in auto-scaling, CPU utilization, and metric collection.
Ultimately, we developed a fully functional application that meets all defined functional and non-functional requirements, providing a solid foundation for a live chat service. Users can log in with a chosen username, create chat rooms, and communicate in real-time. Additionally, we implemented auto-scaling to ensure high system availability. By applying specific optimizations, we were able to maintain real-time performance.