Observability?! – Where do we go from here?

MEME: I always, always test my code. Then I test it again in production.

The last two years in software development and operations have been characterized by the emerging idea of “observability”. The need for a novel concept guiding the efforts to control our systems arose from the accelerating paradigm changes driven by the need to scale and by cloud native technologies. In contrast, the monitoring landscape stagnated and failed to meet the new challenges that our massively more complex applications pose. Therefore, observability evolved into a mission-critical property of modern systems and still attracts much attention. Numerous debates differentiated monitoring from observability and covered its technical and cultural impact on building and operating systems. At the beginning of 2019, the community reached consensus on the characteristics of observability and elaborated its core principles. Consequently, new tools and SaaS applications appeared, marking the beginning of its commercialization. This post identifies the forces driving the evolution of observability, points out trends we presently perceive, and tries to predict future developments.


Safety Culture – Improve “the way we do things around here”

The safety culture of an organization is the key indicator of its safety-related performance. It incorporates the visible rules, norms and practices as well as implicit factors such as values, beliefs and assumptions. That is why “the way we do things around here” is the most precise definition of safety culture. Safety is a universal topic since we pursue it permanently and every action is safety related. To improve safety, we first need to understand the organization’s unique safety culture before we can derive tailored actions. This post covers the basic theoretical background of safety culture and focuses on two central components: just culture and learning culture. The resulting principles can increase the resistance of an organization towards its operational hazards, but only if they are adapted to the unique situation. There is no generally applicable step-by-step manual on how to implement a safety culture.


End user monitoring – Establish a basis to understand, operate and improve software systems

Typical monitoring stack

End user monitoring is crucial for operating and managing software systems safely and effectively. Beyond operations, monitoring is a basic requirement for improving services based on facts instead of instincts. Thus, monitoring plays an important role in the lifecycle of every application. But implementing an effective monitoring solution is challenging because of the incredible velocity of change in the IT landscape and because there is no silver bullet that works for everyone. This post outlines the basics of monitoring on a strategic level and focuses on some practical facets, so you can derive a concrete solution well adapted to your unique needs and circumstances. The first part covers monitoring principles, followed by monitoring tactics for web-based applications. The post concludes by pointing out major challenges you will probably face when implementing a monitoring solution.


Continuous Integration – Move fast and don’t break things

Continuous Integration is an increasingly popular topic in modern software development. Across many industries, the companies that acknowledge the importance of IT and deliver value to their customers through great software prevail against their competitors. Many reports indicate that Continuous Integration is one of the major contributing factors to developing high quality software with remarkable efficiency. There are many excellent articles, talks and books explaining the principles of CI in theory. During the lecture System Engineering and Management, we had the opportunity to apply our abstract knowledge and gain our own experience by creating and operating a CI pipeline in an accompanying project. The following article covers the approach, the major challenges and, most importantly, the lessons learned from our Continuous Integration endeavor. By pointing out relevant issues, we want to raise awareness of the misconceptions we had and the mistakes we made so you can avoid them in the first place.
