Bidirectional Power Transfer for a Smart Load Management for Electric Vehicles

Abstract The global expansion of electromobility is progressing rapidly. The Chinese city of Shenzhen has the world's first and largest fleet of electric buses, with more than 16,000 buses. A gigantic charging infrastructure for 5805 electric buses was established to cope with this. It reaches peak loads of 464.4 megawatts, which poses an enormous challenge to the grid. Smart load management to avoid peak loads is therefore indispensable. In combination with Bidirectional Power Transfer (BPT), new perspectives open up and smart load management is efficiently enhanced.
The objective of this paper was the analysis and evaluation of BPT for smart load management of Electric Vehicles (EVs) with regard to depot charging. This paper explains the relevant technologies and standards with respect to BPT. This is followed by the extension of the Open Charge Point Protocol (OCPP) 2.0 for BPT, a prerequisite for the prototype implementation of the optimization algorithm including various strategies.
The results reveal that load management for depot charging profits substantially from BPT and that optimized planning in advance is a key factor, albeit one that increases complexity. Currently, the number of BPT-enabled EVs is marginal and certain relevant standards have not been adapted yet. The results of this paper contribute to efficient and smart load management and to the necessary adaptations of the standards towards the future growth of BPT-enabled EVs.

Index Terms E-Mobility, Smart Load Management, Bidirectional Power Transfer, ISO 15118-20 DIS, ISO 15118-2, OCPP 2.0, OSCP 2.0, OpenADR 2.0, Optimization


I. Introduction

THE global growth in e-mobility is proceeding rapidly. The number of EVs sold almost doubled to two million in 2018 compared with the previous year. The Chinese market, followed by Europe and the United States, is primarily responsible for these sales with 1.1 million EVs [10].
In 2011, a bus fleet electrification initiative was launched in the Chinese city of Shenzhen. Shenzhen was the first city to operate a fleet of approximately 16,300 electric buses in 2017. A charging infrastructure of 86 depots for a total of 5805 buses was set up to cope with such an enormous number of EVs, reaching a peak load of 464.4 megawatts. The essence of the Shenzhen program is to completely rethink the perception of electricity and vehicles. Experts in both the energy and transport sectors must embrace the insight that electric vehicles are more than mere vehicles [14].
Smart load management to minimize peak loads is therefore indispensable. In conjunction with BPT, new perspectives open up and smart load management is effectively enhanced.
A large variety of opportunities motivates the integration of BPT into a smart load management system. Conventional load management enables the power consumption of EVs at Charging Stations (CSs) to be controlled. As far as overload situations in the power grid are concerned, only the load of the EVs may be dropped, whereas BPT allows active stabilization. In addition, smart load management with BPT-enabled EVs eliminates overload situations by drawing additional power locally from other EVs. Cost-optimized load management is intensified by BPT, as electricity can be charged inexpensively and discharged or sold profitably. In the residential sector, smart home load management offers the option of operating BPT-capable EVs as emergency power generators. The full potential of renewable energy sources is leveraged by load management with BPT, since EVs provide flexible energy storage and sources. Furthermore, increasing research in accumulator technology will lead to advances in capacity as well as charging and discharging speeds. As a result, the relevance and scope of load management will expand in the future.
Smart load management with BPT is only one key aspect towards achieving the aforementioned scenarios. The overall system depends on a large number of actors requiring coordination. Currently, this process remains incomplete; therefore, simplifications are made within this paper. The smart-grid capability of measuring power consumption in low-voltage grids is taken as given. In addition, electro-technical aspects are simplified to a certain extent. Interaction among the affected actors is crucial to the overall function, and load management takes a key role in the respective use cases. Various use cases exist for load management, all of which lead to different solutions. In this paper, smart load management with BPT support is targeted for the use case of depot charging.
The objective of this paper is the analysis and evaluation of this smart load management. Load management is a component of the Charging Station Management System (CSMS) or Charge Point Operator (CPO), which is illustrated in Figure 1.

PIC Figure 1.  Overview of the project objective

BPT-enabled EVs and CSs are essential prerequisites for bidirectional charging. Additionally, direct communication with the CSMS must support BPT. Communication to the CSs via OCPP 2.0 and to the Distribution System Operator (DSO) via the Open Smart Charging Protocol (OSCP) 2.0 is particularly relevant. The CHArge de MOve (CHAdeMO) and Guóbiao tuijiàn Standard (GB/T) communication standards are neglected despite their large market share, as they are not part of OCPP or similar protocols. Generally, the use case of depot charging is the main focus and serves as a foundation to optimize load management. Accordingly, the primary task of load management is to calculate the optimum power distribution of charging and discharging of electric buses considering the demands of the DSO.

II. State of Technology

The e-mobility ecosystem consists of a variety of interrelated actors. Figure 2 depicts the relationship among the individual actors and the available communication standards with respect to BPT.

PIC Figure 2.  Overview of the e-mobility ecosystem with respect to BPT

Communication between EV and Electric Vehicle Supply Equipment (EVSE) is crucial for BPT. Currently, it is only supported by the standards CHAdeMO and International Organization for Standardization (ISO) 15118-20 Draft International Standard (DIS). According to ISO 15118-20 DIS, so-called PowerSchedules and PowerDischargeSchedules can be exchanged between an EV and an EVSE before and during charging. Figure 3 indicates the relationship between these Schedules.

PIC Figure 3.  ISO 15118-20 DIS PowerSchedules and PowerDischargeSchedules applying for Flexible Schedule Mode

Moreover, the two control modes Flexible Schedule Mode and Dynamic Control Mode are specified. In Flexible Schedule Mode, the Battery Management System (BMS) of the EV can calculate its own charge or discharge curve based on these Schedules. This curve must remain within the limits of the PowerSchedules and PowerDischargeSchedules. In Dynamic Control Mode, the EVSE is the master of the charging procedure: it dictates a PowerSchedule to the EV, leaving the EV no other choice. Before the current flows, the EV can inform the EVSE about its charging needs and limits. The parameters EVTargetEnergyRequest, EVMaximumEnergyRequest, EVMinimumEnergyRequest, EVMaximumChargePower, EVMinimumChargePower, EVMaximumDischargePower and EVMinimumDischargePower are critical for BPT in load management. Figure 4 illustrates the relations of the actual EVTargetEnergy, EVMaximumEnergy and EVMinimumEnergy as well as the corresponding Requests [11]. The blue curve in Figure 4 represents the EVCurrentEnergy over time. The differences between the respective energy levels and the EVCurrentEnergy constitute the parameters EVTargetEnergyRequest, EVMaximumEnergyRequest and EVMinimumEnergyRequest.

PIC Figure 4.  ChargingEnergyLimitations

A negative EVMinimumEnergyRequest is a characteristic unique to BPT. As long as the EVCurrentEnergy level lies below the EVMinimumEnergy, only charging is possible; BPT is allowed only while the EVCurrentEnergy level lies between EVMaximumEnergy and EVMinimumEnergy [11].
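The relations between these energy levels and the corresponding Requests can be made concrete in a short sketch. The class and helper names below are our own illustration, not part of the standard; only the parameter semantics follow ISO 15118-20 DIS:

```python
from dataclasses import dataclass

@dataclass
class ChargingNeeds:
    """Illustrative model of the ISO 15118-20 DIS energy parameters (kWh)."""
    ev_current_energy: float   # current energy level of the battery
    ev_target_energy: float    # desired energy level at departure
    ev_maximum_energy: float   # upper battery limit
    ev_minimum_energy: float   # level below which only charging is allowed

    @property
    def ev_target_energy_request(self) -> float:
        # EVTargetEnergyRequest: difference between target and current level
        return self.ev_target_energy - self.ev_current_energy

    @property
    def ev_maximum_energy_request(self) -> float:
        return self.ev_maximum_energy - self.ev_current_energy

    @property
    def ev_minimum_energy_request(self) -> float:
        # Becomes negative once the current level exceeds EVMinimumEnergy
        return self.ev_minimum_energy - self.ev_current_energy

    def bpt_allowed(self) -> bool:
        # BPT only while the current level lies between minimum and maximum
        return self.ev_minimum_energy <= self.ev_current_energy <= self.ev_maximum_energy

needs = ChargingNeeds(ev_current_energy=120.0, ev_target_energy=324.0,
                      ev_maximum_energy=534.0, ev_minimum_energy=50.0)
print(needs.ev_minimum_energy_request, needs.bpt_allowed())  # -70.0 True
```

For an EV at 120 kWh with a minimum level of 50 kWh, the EVMinimumEnergyRequest is negative and BPT is permitted.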
The communication between EVSE and CPO is decisive for load management. It can be implemented with International Electrotechnical Commission (IEC) 61850-90-8, OCPP, IEC 63110 or proprietary protocols, of which OCPP is the most common. Essentially, the tasks consist of controlling charging procedures, configuration, maintenance, payment and monitoring of EVSEs. Currently, these protocols do not support BPT; thus, within this paper, OCPP 2.0 has been extended to include BPT. OCPP 2.0 supports the standard ISO 15118-2 for the first time, so the necessary adaptations are manageable [1], [3], [8], [13].

Communication between CPO and DSO enables the grid integration of load management. Especially in the context of BPT, new possibilities arise. A number of protocols exist in this environment, of which Open Automated Demand Response (OpenADR) 2.0 is the most common. On the other hand, OSCP has been developed by the Open Charge Alliance (OCA) with respect to OCPP. In general, both protocols propagate the power provided by the DSO to affect the active charging procedures managed by the CPO. In addition, the CPO can send signals to the DSO for monitoring purposes. The integration of OpenADR 2.0 with OCPP is described in a white paper published by the OCA. OpenADR 2.0 defines so-called Virtual Top Nodes (VTNs) and Virtual End Nodes (VENs), where one node can act as a VEN towards one VTN and as a VTN towards other VENs. Thereby a hierarchy can be constructed. It is recommended to define the CPO as a VEN in relation to the DSO acting as VTN. If the DSO detects an overload situation in the power grid, it triggers EiEvent messages carrying the events LOAD_DISPATCH and LOAD_CONTROL. The CPO translates this information into SetChargingProfile messages to be sent to the CSs [2], [4], [5], [6].

The communication between the CPO and other actors allows user management and payment. This communication is important in order to achieve user-optimized load management, i.e., to favor or penalize certain user groups. In terms of BPT and depot charging, this is less relevant.

The electro-technical fundamentals have an enormous impact on both the requirements and the functionality of load management. The most important aspects are Kirchhoff's circuit laws.
Kirchhoff's first circuit law, known as the current law, defines the behavior of the currents at a node. A node is a point in an electrical circuit with at least three connections to circuit elements, at which the current can branch. The current law states that the sum of the incoming currents equals the sum of the outgoing currents at a node of an electrical circuit.
Kirchhoff's second circuit law, known as the voltage law, defines the behavior of the voltages in a mesh. A mesh is a closed loop in an electrical circuit with at least two branches. The voltage law states that the sum of the partial voltages around a mesh of an electrical circuit is zero.
Additional crucial elements are bus bars within switchgear. Switchgear is a key component of the power grid, forming the interface between feed-in and feed-out of a network node. Its bus bars represent the network nodes connecting grids of different voltage levels. Switchgear provides the actual power distribution as well as the aggregation of consumers and generators [16].

The load management of this paper has been influenced by an existing load management system, which, however, did not take these bus bar concepts into account. Nevertheless, the concept of a so-called Topology was adapted and reinterpreted. A Topology abstracts the electrical circuit elements from the grid access point to the CSs and EVs. A simplified view of an example Topology with EVs is illustrated in Figure 5.

This Topology provides an abstract model for optimization. The links between the nodes in Figure 5 are displayed in one direction only.

Figure 5.  Top down view of a sample Topology with EVs including three charging stations and five EVSEs

In reality, the links are bidirectional in order to map BPT. Figure 6 demonstrates a possible model of a bus bar (BB).

In order to improve the existing load management, the problem is reconsidered with the integration of BPT in mind. It can be formulated mathematically as a Maximum Flow Problem, which is modelled on a digraph whose edges carry maximum flow capacities between nodes.

Figure 6.  Model mapping of a bus bar into two nodes

The objective of the Maximum Flow Problem is to propagate a maximum flow within this digraph from a source s to a sink t. Figure 7 illustrates such a digraph.

Figure 7.  Example digraph representing the Maximum Flow Problem

The mathematical assignment of flow is governed by the so-called Capacity Constraints and Flow Conservation Constraints. Capacity Constraints are formulated as inequalities defining the capacity for each edge of the digraph. Flow Conservation Constraints are equations defining the flow rate for each node except the source and the sink: the sum of the flow values entering a node must be equal to the sum of the flow values leaving that node. By means of these Flow Conservation Constraints, the first Kirchhoff circuit law is fulfilled.
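These two constraint families can be checked mechanically on a candidate flow. A minimal sketch (the graph encoding and function names are our own):

```python
# Edges of a small digraph with capacities and a candidate flow assignment.
# Keys are (tail, head) pairs; values are (capacity, flow).
edges = {
    ("s", "a"): (10, 7),
    ("s", "b"): (5, 4),
    ("a", "t"): (8, 7),
    ("b", "t"): (6, 4),
}

def capacity_constraints_hold(edges):
    # Capacity Constraint: 0 <= flow <= capacity on every edge.
    return all(0 <= f <= c for c, f in edges.values())

def flow_conserved(edges, node):
    # Flow Conservation Constraint (first Kirchhoff circuit law):
    # inflow equals outflow at every node except source s and sink t.
    inflow = sum(f for (u, v), (c, f) in edges.items() if v == node)
    outflow = sum(f for (u, v), (c, f) in edges.items() if u == node)
    return inflow == outflow

print(capacity_constraints_hold(edges))                    # True
print(all(flow_conserved(edges, n) for n in ("a", "b")))   # True
```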

Different methods exist to solve the Maximum Flow Problem, including Linear Optimization, Dinic's Algorithm and the Ford-Fulkerson Algorithm. This paper focuses on Linear Optimization.

max.       cᵀx
     s.t.    Ax ≤ b
               x ≥ 0                                         (1)

The vector x represents the variables to be determined. The vectors c and b and the matrix A contain known coefficients. The objective function cᵀx is maximized subject to the conditions Ax ≤ b and x ≥ 0. The solution set of this linear program forms a convex polytope or simplex [17].
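Applied to the Maximum Flow Problem, the capacities become variable bounds and the Flow Conservation Constraints become the equality conditions. A small sketch using SciPy's linprog (the solver choice and the example digraph are our assumptions; the paper does not name a tool):

```python
from scipy.optimize import linprog

# Edge order: s->a, s->b, a->b, a->t, b->t (illustrative digraph)
capacities = [10, 5, 4, 8, 7]

# Maximize the total flow leaving the source s (= x_sa + x_sb);
# linprog minimizes, so the objective coefficients are negated.
c = [-1, -1, 0, 0, 0]

# Flow Conservation Constraints at the inner nodes a and b.
A_eq = [
    [1, 0, -1, -1, 0],   # node a: x_sa - x_ab - x_at = 0
    [0, 1,  1,  0, -1],  # node b: x_sb + x_ab - x_bt = 0
]
b_eq = [0, 0]

# Capacity Constraints 0 <= x_e <= cap_e expressed as variable bounds.
bounds = [(0, cap) for cap in capacities]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
max_flow = -res.fun
print(max_flow)  # 15.0
```

Since linprog minimizes, negating the source-edge coefficients turns the problem into the maximization of equation (1).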

III. Results

The extension of OCPP 2.0 for BPT is the prerequisite for the integration of BPT into load management. The messages NotifyEVChargingNeedsRequest, NotifyCentralChargingNeedsRequest and NotifyEVChargingScheduleRequest were altered and enhanced to exchange the essential BPT-specific parameters between CS and CPO.
The message NotifyEVChargingNeedsRequest was extended by the EV parameters mentioned in section II, such as EvMaximumEnergyRequest and EvMaximumDischargePower. Utilizing the extensions to the message NotifyCentralChargingNeedsRequest, load management can communicate the Control Mode and one or more charging profiles for charging and discharging, including price information, to the EV via the CS. With the extended message NotifyEVChargingScheduleRequest, the EV informs the load management not only about the charging profile being used, but also about its Control Mode.

After laying the foundation with this protocol extension, a generator for load management test scenarios was created. The generator allows Topologies of any complexity and any number of charging procedures to be created. BPT-specific parameters relevant to load management, such as the maximum charge or discharge power of the EVs or charging stations or the desired energy amount at departure, can be generated using configurable distribution functions.
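A generator of this kind can be sketched in a few lines. All names, parameter ranges and distribution choices below are illustrative assumptions, not the actual implementation:

```python
import random

def generate_charging_sessions(n, seed=42):
    """Generate n charging sessions with configurable random distributions."""
    rng = random.Random(seed)  # fixed seed makes scenarios reproducible
    sessions = []
    for i in range(n):
        max_charge_kw = rng.choice([11, 22, 50, 150])
        sessions.append({
            "evse_id": i,
            "max_charge_kw": max_charge_kw,
            # only some EVs are BPT-enabled; the rest get no discharge power
            "max_discharge_kw": rng.choice([0, 0, max_charge_kw / 2]),
            # desired energy at departure, normally distributed around 60 kWh
            "target_energy_kwh": max(5.0, rng.gauss(60, 20)),
            # standing duration in hours, uniform between 1 and 12
            "duration_h": rng.uniform(1, 12),
        })
    return sessions

scenario = generate_charging_sessions(12)
print(len(scenario))  # 12
```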

By means of these generated test scenarios the strategies Priority-, Equal- and Planning-Strategy for load management developed in this paper were evaluated.
A Strategy calculates the power distribution of all active charging processes of a Topology for a certain time frame. The result of a Strategy is indicated by two lists. The first list represents the execution priority. The second list embodies the percentage of available maximum power. The charging process with the highest execution priority is processed first. The power is assigned using the percentage of the maximum power. A percentage of the maximum charging power of 100% with the highest execution priority results in the maximum charging power for this charging process in this period. A percentage of the maximum charge power of 0% with the highest execution priority results in no power in that period. Similarly, a negative percentage of the maximum discharge power will provide the corresponding discharge power for this period.
The Equal-Strategy is the simplest Strategy. It realizes an equal distribution of the power over all active charging processes. The Equal-Strategy always provides the same execution priority and a maximum power percentage of 100%. This Strategy can be extended by reducing the maximum power percentages below 100%. This reduction can be used to control the total power consumption, thus responding to current price fluctuations and grid situations. In addition, aggregation of charging processes into groups is possible, with each group having a different maximum power percentage. The charging processes within a group share the same maximum power percentage.
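The two result lists and the Equal-Strategy semantics described above can be sketched as follows (function names are our own; a negative percentage would yield discharge power):

```python
def equal_strategy(session_ids, percentage=100):
    # Equal distribution: identical execution priority and the same
    # percentage of maximum power for every active charging process.
    priorities = [1] * len(session_ids)            # all processes ranked equally
    percentages = [percentage] * len(session_ids)  # may be reduced below 100
    return priorities, percentages

def assign_power(max_powers_kw, priorities, percentages):
    # Apply the percentage of the maximum power per charging process.
    return [p_max * pct / 100 for p_max, pct in zip(max_powers_kw, percentages)]

# Throttle all three charging processes to 50% of their maximum power.
prio, pct = equal_strategy(["ev1", "ev2", "ev3"], percentage=50)
print(assign_power([150, 22, 11], prio, pct))  # [75.0, 11.0, 5.5]
```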

The Priority-Strategy is primarily based on execution priorities. Any external dependencies can be mapped to the power distribution, in particular existing planning and scheduling systems. Analogous to the Equal-Strategy, it may be extended by altering the maximum power percentages. In general, the Priority-Strategy and the Equal-Strategy can be mapped to each other. The grouping of the Equal-Strategy allows different maximum power percentages to be defined. The smaller the groups, the greater the similarity between the Equal-Strategy and the Priority-Strategy, and vice versa.

As its name suggests, the Planning-Strategy assigns power based on a pre-calculated plan. This plan represents a time-discrete power assignment of the parallel charging processes of a Topology. The plan is calculated in a two-step procedure using Linear Optimization.
In the first step, the objective function is formulated such that the energy charged at the beginning of the charging processes is maximized. Negative energy amounts corresponding to discharging are excluded in this step by means of constraints. In addition, further constraints ensure that the charging and discharging limits of the Topology and the EVs, the charging power already assigned, and the EVMaximumEnergyRequest and EVTargetEnergyRequest are not exceeded. On the basis of this result, the partial energy amounts required to reach the EVMinimumEnergy of the charging processes with positive EVMinimumEnergyRequest are determined. In the next step, the partial energy amounts of the discrete time steps are marked as constant until the EVMinimumEnergy is reached. This prevents discharging before the EVMinimumEnergy is reached.
The second Linear Optimization then calculates the partial energy amounts additionally charged and discharged in favor of other EVs.
The optimization of this planning offers numerous degrees of freedom, such as the factors of the objective function, the constraints or the number of variables. The factors of the objective function are crucial due to maximizing or minimizing the energy amounts to be assigned to the charging processes as well as the partial amounts of energy to be controlled within them.
For each charging process the desired power curve is adjustable, thus flat or steep power gradients are possible. Different criteria for prioritizing energy amounts of the charging processes can be used. The start or end time as well as the duration of the charging processes are possible criteria.
A higher resolution of the charging processes by means of additional time-discrete steps results in more variables. On the one hand, this increases the problem complexity and the number of possible factors of the objective function. On the other hand, the power distribution of the charging processes is controlled more precisely over time. Depending on the application, constraints can be added or removed. Another factor is the planning horizon: the uncertainty of the forecasts and the number of variables increase with it, which adds to the problem complexity and decreases the quality of the results.
The planning allows a day-ahead price forecast of the European Energy Exchange (EEX) to be considered. In contrast to the Equal-Strategy and the Priority-Strategy, planning makes it possible to react to future price fluctuations by modeling them through the factors of the objective function.
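One possible way to map such a price forecast onto objective-function factors is sketched below; the linear scaling around the mean price is our own illustrative choice, not the paper's actual model:

```python
def objective_factors(prices_eur_per_mwh, base_weight=1.0):
    """Map day-ahead prices to per-slot objective-function factors.

    Cheap hours get large positive factors (favoring charging there),
    expensive hours small or negative ones (favoring discharging there).
    """
    mean_price = sum(prices_eur_per_mwh) / len(prices_eur_per_mwh)
    return [base_weight * (mean_price - p) for p in prices_eur_per_mwh]

# Hypothetical day-ahead prices in EUR/MWh for four time slots.
day_ahead = [30.0, 25.0, 50.0, 95.0]
factors = objective_factors(day_ahead)
print(factors)  # [20.0, 25.0, 0.0, -45.0]
```

Multiplying the per-slot energy variables by these factors in the objective steers the Linear Optimization towards charging in cheap slots and discharging in expensive ones.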

The key results are presented in Figures 9, 10 and 11. They illustrate the calculated charging profiles of the Strategies based on a generated test scenario with twelve EVSEs divided among ten charging stations. Figures 8 and 9 display two possible charging profiles referring to one charging process of an EV.
The sample EV is capable of charging for 11.1 hours and has a maximum charging power of 150 kilowatts (kW). Charging power is limited to 140 kW due to the Topology. The EV is capable of BPT with a maximum discharge power of 80 kW, it can charge a maximum energy of 534 kilowatt hours (kWh) and requires a total energy of 324 kWh at departure time.

PIC Figure 8.  Example charging profile based on a simple Priority-Strategy

Figure 8 depicts a charging profile for this EV based on the Priority-Strategy: based on its priority, the EV is assigned the maximum power until the desired energy amount is reached. In comparison, Figure 9 depicts a charging profile for this EV calculated using the Planning-Strategy.

PIC Figure 9.  Example charging profile based on the Planning-Strategy with additional BPT

In this case, the EV is first charged up to the maximum energy amount, which allows discharging later. Thus, another EV receives additional power and completes charging earlier. Finally, just enough energy is discharged so that the desired energy amount remains at departure time.

Summed over all EVs, the impact of this additional energy increases significantly. The Topology has a maximum power of 300 kW, enabling nine EVs to charge or discharge simultaneously. Analogous to the charging profile in Figure 8, each incoming EV is assigned the maximum power until the desired energy amount is reached. In Figure 10, the maximum power consumption of the Topology is exceeded from the eighth hour onwards, after the first EV has been charged. This is caused by excessively large power assignments to the EVs.

PIC Figure 10.  Aggregated power and energy of the example Topology without load management

Figure 11 illustrates the accumulated charging profiles and charged energy of the identical Topology using load management with the Planning-Strategy.

PIC Figure 11.  Aggregated power and energy of the example Topology with load management using the Planning-Strategy

The maximum power consumption of the Topology of 300 kW is reached, but never exceeded. The power curve fluctuates less and yet the same total energy amount is reached due to BPT.

IV. Discussion and future work

This section provides a summary and compares related load management systems with the load management of this paper. Subsequently, an outline of future work is presented.

The developed approach and its realization emphasize the fundamental feasibility and optimization of load management through the support of BPT. The results indicate not only that significant optimizations can be achieved within a load management system using BPT, but also that the necessary requirements can be conceptually combined with OCPP 2.0.
The study of the standardization of the e-mobility ecosystem reveals a large number of actors involved as well as a high degree of complexity in supporting BPT for load management. As a consequence, the electro-technical fundamentals, the protocol procedures and the optimization methods play a decisive role. The extension of OCPP 2.0 is a prerequisite for the feasibility of load management including the support of BPT. Based on these findings, the special characteristics regarding the use case of depot charging are formulated. In addition, the standardization has not yet been completed, new technologies are opening up further opportunities and further research work is ongoing. Due to this uncertainty, the solution presented in this paper is deliberately flexible. Suitable extensions for OCPP 2.0 as well as optimization strategies have been developed in the meantime.
The implementation confirms the feasibility of the software design, the strategies and the extension of the OCPP. The results underline the potential of BPT in load management for the use case of depot charging. In addition, the relevance of planning ahead in load management is emphasized.

The load management developed by Detzler is based on an evolutionary algorithm. Test scenarios with varying numbers of vehicles are examined. The evolutionary algorithm, consisting of recombination, selection and mutation, is applied in the following form. First, a starting population of random individuals is generated. An individual represents a sequence of charging processes including power assignments for a certain period of time. Minimum and maximum values of the EVs ensure that the assigned charging power of an individual always lies within a valid value range. The evaluation of individuals is based on a cost function involving all EVs and the energy price. The recombination process depicted in Figure 12 begins by selecting two parent individuals of the current generation. A random cross-over point is selected, which is associated with a particular charging process in the sequence of charging processes. The child individuals are created by dividing the parent individuals at the cross-over point and assembling each one with the complementary part of the other parent individual. The mutation distinguishes between three variants, changing only the order of particular charging processes, only the charging power, or both [9].
In the test scenarios of Detzler it was found that the fitness value of the algorithm converges after 100 generations. In addition, charging profiles for 200 vehicles at 100 generations could be calculated within 14 seconds [9].
Due to randomness, the runtime of the evolutionary algorithm is not deterministic. The memory consumption is constant, because the number of individuals remains constant over time. Moreover, nonlinear constraints can be mapped.
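The recombination and one mutation variant described above can be sketched as follows. The representation of an individual as a list of (process, power) pairs is our simplification of Detzler's encoding [9]:

```python
import random

def recombine(parent_a, parent_b, rng):
    # One-point cross-over: choose a point tied to a position in the
    # charging-process sequence, then swap the tails of the two parents.
    point = rng.randrange(1, len(parent_a))
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

def mutate_power(individual, rng, p_max_kw):
    # Mutation variant that changes only the charging power of one process,
    # staying within the valid value range [0, p_max_kw].
    i = rng.randrange(len(individual))
    process_id, _ = individual[i]
    mutated = list(individual)
    mutated[i] = (process_id, rng.uniform(0, p_max_kw))
    return mutated

rng = random.Random(0)
a = [("p1", 11.0), ("p2", 22.0), ("p3", 50.0)]
b = [("p4", 7.0), ("p5", 3.7), ("p6", 22.0)]
c1, c2 = recombine(a, b, rng)
print(len(c1), len(c2))  # 3 3
```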

PIC Figure 12.  Recombination of two individuals (Source: [9])

The use case of BPT is not considered, so a direct comparison is not possible. In addition, Topology conditions are modelled by means of a valid value range. The Topology conditions of this paper provide dependencies between the assigned EV powers. This increases the complexity during the creation of valid individuals, thus also increasing runtime.
Basically, the approach of Detzler is similarly promising to the one presented in this paper. However, only a direct comparison, to be examined in future work, provides certainty. This comparison can be realized with a dedicated Strategy based on the evolutionary algorithm.

The load management of Lee, Chang, Jin, et al. relies on an infrastructure consisting of a 50 kW Direct Current (DC) Fast Charger, consumers from a garage, and its own three-phase alternating current transformer to which 54 type-2 EVSEs of 6.6 kW are subordinated. The load management procedure is an online scheduling procedure and requires discrete time steps. In addition, the set of available EVs, the set of active EVs and an EV state must be defined. The state of an EV is represented by a tuple consisting of the requested energy amount, the remaining charging time, the maximum power of the EV and a metering of the energy amount charged up to the current time. Based on this data, the algorithm is performed in the following three steps:

  1. The set of active EVs is determined as the plugged-in EVs with a remaining energy demand greater than zero.
  2. It is checked whether a new schedule has to be calculated, which can be triggered by events or a timeout.
  3. A new optimal schedule is calculated by maximizing an objective function using an optimization horizon.

The optimization horizon represents the constraints. The procedure is iterative, meaning that the state parameters for the next point in time are recalculated at each point in time for each EV. The objective function is determined by operator objectives in the form of regularization and weighting factors. This optimization offers various options: charging according to a certain curve, e.g. the generation of renewable energies, smoothing of the charging curve by minimizing the differences between the partial powers, as well as fastest possible or evenly distributed charging. In addition, constraints ensure that the charging power lies between zero and the EV maximum power, that the requested energy amount is charged, that the power limits of the infrastructure are respected and that no charging takes place after the departure time. Furthermore, Lee, Chang, Jin, et al. define so-called Second Order Cone conditions that utilize specific properties of the three-phase AC transformer. The Second Order Cone conditions improve the phase imbalance in the three-phase grid. Moreover, non-ideal charging behavior is counteracted by increasing the calculation frequency [12].
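The three steps above can be sketched as a loop skeleton. The data model is our own, and the proportional power split in step 3 is only a placeholder for the actual maximization of the objective function over the optimization horizon [12]:

```python
def active_evs(evs):
    # Step 1: plugged-in EVs with remaining energy demand greater than zero.
    return [ev for ev in evs if ev["plugged"] and ev["remaining_kwh"] > 0]

def needs_reschedule(event_pending, seconds_since_last, timeout_s=300):
    # Step 2: recompute on an event (arrival/departure) or on a timeout.
    return event_pending or seconds_since_last >= timeout_s

def schedule(evs, site_limit_kw):
    # Step 3 (placeholder): a proportional split stands in for the
    # objective maximization over the optimization horizon.
    act = active_evs(evs)
    total_max = sum(ev["max_kw"] for ev in act)
    if total_max == 0:
        return {}
    scale = min(1.0, site_limit_kw / total_max)
    return {ev["id"]: ev["max_kw"] * scale for ev in act}

evs = [
    {"id": "ev1", "plugged": True,  "remaining_kwh": 20, "max_kw": 6.6},
    {"id": "ev2", "plugged": True,  "remaining_kwh": 0,  "max_kw": 6.6},
    {"id": "ev3", "plugged": False, "remaining_kwh": 30, "max_kw": 6.6},
]
print(schedule(evs, site_limit_kw=5.0))  # only ev1 active, capped at 5 kW
```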
The iterative procedure is similar to the one presented in this paper, but differs in its absence of planning. On the one hand, this results in the advantage of reduced complexity, as the calculations only need to cover a short period of time. On the other hand, planned changes are difficult to consider.

The primary concern of the project by Projektplaner LEW is the use of locally produced Photovoltaic (PV) power for charging electric vehicles. BPT is thus represented by the PV system. The intended fields of application are fleets and car parks. In the project, eight Alternating Current (AC) CSs with two charge points of 22 kW each were installed, yielding an aggregated maximum power of 352 kW. The aggregated maximum power was limited to 100 kW by the load management. 78% of the EVs had a standing duration between 8 and 14 hours.
The load management calculates the charging profiles using Linear Optimization based on a cost function depending on the EEX electricity prices and the PV energy, taking into account the minimum and maximum power of the EVs. By means of an additional uncertainty factor, it is ensured that the EVs are fully charged even if they depart earlier. An underestimation of the actual charging duration compensates for the modelling error of a permanently constant power; due to the charging characteristics of lithium-ion batteries, a permanently constant power is not feasible. The project involved 56 volunteers over a period of two years. During the test phase, various findings were obtained: not all EVs support ISO 15118-2, some EVs switch off at too low a power, and the creation of the charging profiles took longer than three minutes and thus too long. The use of PV energy was increased by more than 40% [15], [7].
This load management does not consider Topology conditions. In addition, the problem of unbalanced load (phase imbalance) is not addressed. One option to counteract this is the use of OCPP in combination with charging stations that allow selecting the phase to be used. Possibly, load imbalance was counteracted during installation by a rotating phase assignment of the charging stations to the grid; in this way, if the charging stations are used uniformly, all phases are balanced. The lesson of considering a minimum output power can be derived from this.

The paper by Wolpert and Macready proves that, considering the set of all problems, no universal method for solving optimization problems exists. This proposition is presented in the form of the following two theorems:

  1. The average performance of two algorithms over a set of optimization problems is independent of the algorithm selected.
  2. If one algorithm performs better than another for one particular cost dynamics, the opposite is true for all other cost dynamics [18].
Essentially, these theorems confirm the results of this paper and this discussion. Depending on the use case, there are different requirements and therefore different optimization problems. Developing a universal method that is optimal for all use cases is not feasible. However, several diverse, specially adapted algorithms can provide reliable and efficient solutions. The flexibility to exchange these algorithms at runtime is key; it is ensured by the adaptability of the software design and implementation presented in this paper.
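The runtime exchangeability argued for here can be sketched as a simple strategy pattern (class and method names are illustrative, not those of the actual prototype):

```python
# Minimal sketch of exchangeable optimization strategies (illustrative names,
# not the actual classes of the prototype implementation).

class Strategy:
    def plan(self, demands):
        raise NotImplementedError

class EqualShare(Strategy):
    """Split the available power equally among all EVs."""
    def __init__(self, available_kw):
        self.available_kw = available_kw
    def plan(self, demands):
        share = self.available_kw / len(demands)
        return [min(share, d) for d in demands]

class FirstComeFirstServed(Strategy):
    """Serve EVs in arrival order until the power budget is exhausted."""
    def __init__(self, available_kw):
        self.available_kw = available_kw
    def plan(self, demands):
        left, plan = self.available_kw, []
        for d in demands:
            p = min(d, left)
            plan.append(p)
            left -= p
        return plan

class LoadManager:
    """The strategy can be swapped at runtime without touching the manager."""
    def __init__(self, strategy):
        self.strategy = strategy
    def plan(self, demands):
        return self.strategy.plan(demands)

manager = LoadManager(EqualShare(30.0))
print(manager.plan([22.0, 22.0, 22.0]))   # [10.0, 10.0, 10.0]
manager.strategy = FirstComeFirstServed(30.0)
print(manager.plan([22.0, 22.0, 22.0]))   # [22.0, 8.0, 0.0]
```

Swapping the strategy object is all that is needed to change the optimization behavior, which is the adaptability property the No Free Lunch theorems motivate.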

Finally, it should be noted that load management with the support of BPT holds enormous potential for a variety of applications, especially for depot charging. Furthermore, a successful integration into the existing e-mobility ecosystem is possible despite high complexity. Currently, there is uncertainty concerning standardization due to the ongoing development of the standard ISO 15118-20 DIS and additional factors, such as the integration of other protocols like CHAdeMO or Deutsches Institut für Normung eV (DIN) 70121. The developed software design as well as the prototypical implementation therefore offer high adaptability to potential changes in the standards. The intention in ISO 15118-20 to introduce an additional data type for the charging profiles in Dynamic Control Mode is an example of such a recent new feature.
Presently, this solution provides promising results for the use case of depot charging. In particular, the BPT represents a valuable enhancement in terms of optimization compared to other load management systems. In the future, the generator for test scenarios could be extended in order to produce further use cases and detailed sample data. Additional strategies can be developed and existing ones optimized; by means of a simulation, these can be validated in the long term. Moreover, they can be employed in a future collaboration with a DSO in order to gain practical insights.


[1] Open Charge Alliance. 2015a. Open Charge Point Protocol 2.0: Part 2 – Specification. Open Charge Alliance.

[2] ———. 2015b. Open Smart Charging Protocol 1.0 – Interface Description Between DSO and Central System. Open Charge Alliance.

[3] ———. 2017. Open Charge Point Protocol 1.5. Open Charge Alliance.

[4] ———. 2019. Open Smart Charging Protocol 2.0 – Specification Draft Use Case Proposal. Open Charge Alliance.

[5] OpenADR Alliance. 2012. OpenADR 2.0 Profile Specification A Profile. OpenADR Alliance.

[6] ———. 2015. OpenADR 2.0 Profile Specification B Profile. OpenADR Alliance.

[7] Carron, Virgile. 2014. Untersuchung Geeigneter Optimierungsstrategien Zur Umsetzung Eines Intelligenten Lademanagements Für Elektrofahrzeuge. Hochschule für angewandte Wissenschaften München.

[8] International Electrotechnical Commission. 2016. Technical Report 61850-90-8: Communication Networks and Systems for Power Utility Automation – Part 90-8: Object Model for E-Mobility. International Electrotechnical Commission.

[9] Detzler, Sarah Katharina. 2017. “Lademanagement Für Elektrofahrzeuge.” PhD thesis, Karlsruher Institut für Technologie (KIT); KIT Scientific Publishing, Karlsruhe. doi:10.5445/KSP/1000057827.

[10] IEA. 2019. Global EV Outlook 2019. IEA.

[11] ISO/IEC. 2018. “ISO/IEC DIS 15118-20: Road vehicles – Vehicle to grid communication interface – Part 20: 2nd generation network layer and application layer requirements.”

[12] Lee, Z. J., D. Chang, C. Jin, G. S. Lee, R. Lee, T. Lee, and S. H. Low. 2018. “Large-Scale Adaptive Electric Vehicle Charging.” In 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 863–64. doi:10.1109/GlobalSIP.2018.8646472.

[13] Mültin, Marc. 2017. IEC 63110 – Standardizing the Management of Electric Vehicle (Dis-)Charging Infrastructures. V2G Clarity.

[14] Poon, Linda. 2019. Why U.S. Cities Aren’t Using More Electric Buses. CityLab.

[15] Projektplaner LEW, LVN und FfE. 2017. Lademanagement an Park and Ride Parkplätzen. Innovations- und Technologiezentrum.

[16] Schwab, Adolf J. 2009. Elektroenergiesysteme – Erzeugung, Transport, Übertragung und Verteilung elektrischer Energie. 2. Aufl. Berlin, Heidelberg, New York: Springer-Verlag.

[17] Suhl, Leena, and Taïeb Mellouli. 2013. “2 Lineare Optimierungsmodelle.” In Optimierungssysteme: Modelle, Verfahren, Software, Anwendungen, 31–76. Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-642-38937-5_3.

[18] Wolpert, D. H., and W. G. Macready. 1997. “No Free Lunch Theorems for Optimization.” IEEE Transactions on Evolutionary Computation 1 (1): 67–82. Piscataway, NJ, USA: IEEE Press. doi:10.1109/4235.585893.

Consensus protocols – A key to cluster management?


Today, applications require increasing robustness and scalability, since otherwise they collapse under the burden of a vast number of users. Cluster managers like Kubernetes, Nomad or Apache Marathon are a central factor in this resilience and scalability. A closer look at the internals of cluster managers reveals consensus protocols to be the crucial component. This blog post gives an overview of the most common consensus protocols and their workflows. Furthermore, the concept of a consensus protocol is questioned and potential improvements of consensus protocols in cluster management are discussed.

List of figures

  1. Container Management Platforms Preferences
  2. Distributed System
  3. Concurrency
  4. State transition
  5. Consensus algorithm
  6. FLP impossibility
  7. Two-phase commit, fault-free execution, phase one
  8. Two-phase commit, fault-free execution, phase two
  9. Two-phase commit, with coordinator failure, phase one
  10. Two-phase commit, with coordinator failure, phase two
  11. Procedure ZooKeeper
  12. Performance test


Container management platforms are the preferred choice when it comes to orchestrating containers with high availability, reliability and scalability. The diagram in Figure 1 shows the distribution of container management platforms from 2016 to 2017.

Figure 1: Container Management Platforms Preferences [1]

In comparison to the other platforms, a clear trend towards Kubernetes is evident. In order to understand why this trend exists, it is necessary to look behind the scenes. Are there differences or similarities between these platforms in terms of consensus protocols? Is this a critical factor in the emergence of this trend? To answer this, a number of essential terms need to be addressed.

Distributed system

A distributed system comprises a set of distinct processes sending messages to each other and coordinating to achieve a common objective.
Even a single computer can be viewed as a distributed system:
memory units, input-output channels and the central control unit also represent separate processes collaborating to complete an objective.
In this blog post the focus is on distributed systems where the processes are spatially distributed across computers [4].

Figure 2: Distributed System [4]

Properties of a distributed system

The following features are particularly important in relation to a distributed system.

1. Concurrency

Nodes or processes running simultaneously require coordination. As illustrated in Figure 3, this concurrency results in different events at certain points in time.

Figure 3: Concurrency [2]
2. Lack of global clock

The order of the events must be determined. However, there is no global clock that can be used to determine the order of events on the computer network.
The following two factors can be considered to determine whether an event happened before another.

  • Messages are sent before they are received
  • Every computer has a sequence of events

This results in a partial order of the system's events. To obtain a total order of the system's events, an algorithm is required, which in turn requires communication between the computers in the system.
If an algorithm relies only on the order of events, abnormal behavior may occur.
Such abnormal behavior can be avoided by synchronizing physical clocks.
However, the coordination of independent clocks is rather complex and clock drift can still occur.
The time and order of events are fundamental obstacles in a distributed system with spatially distributed processes [4].
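The partial ordering sketched above is exactly what Lamport's logical clocks capture: each process keeps a counter that ticks on local events and is pushed forward by incoming messages, so a send is always ordered before the corresponding receive (a minimal sketch, not tied to any particular system):

```python
# Minimal sketch of a Lamport logical clock: the counter increments on every
# local event, and a receiver fast-forwards past the timestamp of any
# incoming message, so "message sent before received" is always reflected.

class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # A send is an event; the message carries the current timestamp.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Jump past the sender's timestamp, then count the receive event.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_send = a.send()           # a: 1
b.local_event()             # b: 1
t_recv = b.receive(t_send)  # b: max(1, 1) + 1 = 2
print(t_send, t_recv)       # the receive is ordered after the send
```

Logical clocks only give the partial order described above; a total order still needs a tie-breaker such as process ids.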

3. Independent failure of components

A key aspect is the insight that components can be faulty in a distributed system.
It is impossible to have an error-free distributed system due to the large number of potential failures.
Faults can be divided into the following three groups.

  • Crash-fail: The component stops immediately without warning.
  • Omission: The component sends a message which does not arrive.
  • Byzantine: The component behaves arbitrarily, it sometimes exhibits regular behavior but also malicious Byzantine behavior. This variant is irrelevant in controlled environments like data centers.

Based on this assumption, protocols should be designed to allow a system to have faulty components, yet achieve a common objective and provide a meaningful service.

Since every system has failures, a major consideration is the ability of the system to survive if its components deviate from normal behavior regardless of whether they are malicious or not.
Basically, a distinction is made between simple fault-tolerance and Byzantine fault-tolerance.

  • Simple fault-tolerance: In systems with simple fault-tolerance, one assumes that components either exactly follow the protocol or fail completely. Arbitrary or malicious behavior is not considered.
  • Byzantine fault-tolerance: In uncontrolled environments, a system with simple fault-tolerance is not particularly useful. In a decentralized system, where components are controlled by independent actors communicating on the public, permissionless Internet, malicious components must also be expected.

The BAR fault-tolerance extends the Byzantine fault-tolerance and defines the following three classes.

  • Byzantine: Components that are malicious
  • Altruistic: Components that always follow the protocol
  • Rational: Components that only follow the protocol when it is convenient [4].
4. Message transmission

Messages are sent either synchronously or asynchronously.

  • Synchronous: In a synchronous system, messages are assumed to be delivered within a fixed time window. Conceptually, synchronous messaging is less complex because it guarantees a response.
    However, this variant is often impracticable in a genuine distributed system, with computers crashing, messages being delayed, or not arriving.
  • Asynchronous: In an asynchronous system, it is assumed that the network may delay, duplicate, or reorder messages, and that delivery can take an unbounded amount of time.

Replicated State Machine

A replicated state machine is a deterministic state machine that is distributed across many computers yet acts as a single state machine. The state machine keeps working even if an arbitrary computer fails.
A valid transaction in a replicated state machine results in a transition to another state.
A transaction represents an atomic operation on a database, complying with the ACID principle.

Figure 4: State transition [4]

A replicated state machine is a set of distributed computers all starting with the same initial value.
Each of the processes decides on the next value for each state transition, illustrated in Figure 4.
Achieving consensus implies the collective agreement of all computers on an output value based on the current value. A consistent transaction log thus is obtained for each computer in the system.
A replicated state machine must continuously accept new transactions in the log, even when:

  • Some computers fail,
  • The network fails to send messages reliably,
  • No global clock exists to determine the order of events.

This is the fundamental intent of any consensus algorithm as depicted in Figure 5.
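The core idea can be condensed into a toy key-value sketch: replicas that apply the same agreed log of transactions in the same order end up in the same state, which is exactly the guarantee consensus must provide for the log.

```python
# Toy sketch of a replicated state machine: replicas that apply the same
# command log in the same order converge to the same state, which is what
# a consensus algorithm must guarantee about the log.

def apply_log(log):
    """Deterministically apply a list of (key, value) set-commands."""
    state = {}
    for key, value in log:
        state[key] = value
    return state

agreed_log = [("x", 1), ("y", 2), ("x", 3)]   # order fixed by consensus
replica_a = apply_log(agreed_log)
replica_b = apply_log(agreed_log)
print(replica_a == replica_b)  # True: same log, same order, same state
```

Determinism is essential here: if applying a command could produce different results on different replicas, identical logs would no longer imply identical states.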

Figure 5: Consensus algorithm [4]

A consensus algorithm achieves consensus when the following three conditions are satisfied:

  • Agreement (or safety): All non-faulty nodes decide on the same output value.
  • Validity: The value decided on must have been proposed by a node in the network.
  • Termination (or liveness): All non-faulty nodes eventually decide on an output value.

Typically a consensus algorithm has three types of actors in the system.

  1. Proposers, often referred to as leaders or coordinators
  2. Acceptors or followers, who listen to the requests of the proposers and respond
  3. Learners, who learn the output value resulting from the vote

A consensus algorithm generally has the following procedure.

  1. Elect: In this phase the leader is selected. A leader makes the decisions and proposes the next valid output.
  2. Vote: All non-faulty components listen to the proposed value of the leader, validate it and propose it as the next valid value.
  3. Decide: All non-faulty components must come to a consensus on a correct output value, otherwise the procedure is repeated [4].
FLP impossibility

As described above, there are differences between synchronous systems and asynchronous systems. In synchronous environments, messages are delivered within a fixed time frame. In asynchronous environments, there's no guarantee of a message being delivered.

Figure 6: FLP impossibility [3]

Reaching consensus in a synchronous environment is possible due to assumptions about the maximum time required to deliver messages. In such a system, different nodes are allowed to alternately propose new transactions, search for a majority vote, and skip each node if they do not offer a proposal within the maximum time period.

"In a fully asynchronous system there is no consensus solution that can tolerate one or more crash failures even when only requiring the non-triviality property" [4].

The FLP impossibility describes the fact that no maximum message delivery time can be assumed in an asynchronous environment. Termination thus becomes much more difficult, if not impossible. This matters because the termination condition must be satisfied in order to reach consensus, meaning every non-faulty node must eventually decide on an output value [4].

To circumvent the FLP impossibility there are two options.

  • Use synchrony assumptions
  • Use non-determinism

Consensus protocols

Consensus protocols can be divided into two basic approaches: those using synchrony assumptions and those using non-determinism.

Approach 1: Use Synchrony Assumptions

If messages are sent asynchronously, termination cannot be guaranteed.
How can it then be guaranteed that every non-faulty node will choose a value?
Due to asynchronicity, a consensus cannot be reached within a fixed time window, which leads to the conclusion that consensus cannot always be reached.
One way to mitigate this is the use of timeouts: if there is no progress, the system waits until the timeout and restarts the process. Consensus algorithms like Paxos and Raft apply this method [4].

Simple fault-tolerance

The following section describes algorithms of the simple fault-tolerance category. These assume that nodes either follow the protocol or fail completely, whereas Byzantine nodes may also exhibit malicious behavior.

Two-phase commit

The two-phase commit is the simplest and most commonly used consensus algorithm. As the name suggests, it consists of two phases.
The first phase is the proposal phase: a value is proposed to each participant in the system and their answers are collected, as shown in Figure 7.

Figure 7: Two-phase commit, fault-free execution, phase one [5]

The second phase is the commit or abort phase. In this phase, the result of the vote is communicated to all participants in the system, together with the instruction whether to continue and commit or to erase the log, as depicted in Figure 8.

The node proposing the values is referred to as the coordinator. The coordinator is not required to be selected by means of a special procedure. Each node may act as a coordinator and thus start a new round of the two-phase commit.

It is important that the nodes do not reach a consensus on what a value should be, but reach a consensus on whether or not to accept it.

Figure 8: Two-phase commit, fault-free execution, phase two [5]

In phase 2, each node decides on the value proposed by the coordinator in phase 1 if and only if instructed to do so by the coordinator. The coordinator sends the same decision to each node, so if one node is instructed to commit a value, they all are. Therefore the agreement condition is satisfied.

The two-phase commit always aborts unless every node approves. In all cases, the final value was voted on by at least one node. Thus the validity condition is satisfied.

Finally, termination is guaranteed if each node makes progress and eventually returns its vote to the coordinator, who passes the result on to each node. There are no loops in the two-phase commit, so there is no possibility of it continuing forever [5].
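A fault-free round of the two phases can be condensed into a few lines (an in-process sketch; real participants are separate nodes communicating over the network):

```python
# In-process sketch of a fault-free two-phase commit round. Real participants
# are remote nodes; here each participant is just a vote function.

def two_phase_commit(participants, value):
    # Phase 1 (proposal): ask every participant to vote on the value.
    votes = [vote(value) for vote in participants]
    # Phase 2 (commit/abort): commit only if every single vote was "yes".
    decision = "commit" if all(votes) else "abort"
    return decision

accepts_all = lambda v: True
rejects_big = lambda v: v < 100

print(two_phase_commit([accepts_all, rejects_big], 42))   # commit
print(two_phase_commit([accepts_all, rejects_big], 500))  # abort
```

Note that the nodes agree on whether to accept the value, not on what the value should be, exactly as described above. All of the failure handling discussed next is what this sketch leaves out.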

Crashes and failure

To understand the failures, it is necessary to consider each state the protocol may take and contemplate what occurs if either the coordinator or one of the participants crashes in this state.

In phase one, before messages are sent, the coordinator could crash, as illustrated in Figure 9. This is not too troublesome, as it simply means that the two-phase commit is never started.
If a participant node crashes before writing its log, no harm results: the proposal message simply does not reach the crashed node, so this fault can be corrected later.

Figure 9: Two-phase commit, with coordinator failure, phase one [5]
If the coordinator crashes after some but not all proposal messages have been sent, some nodes have received a proposal and started a two-phase commit round, while others are unaware that anything has happened. If the coordinator does not recover for a long time, the nodes that received the proposal block while waiting for a result that may never arrive. Nodes may also have sent their votes back to the coordinator without knowing that it failed. Therefore, the protocol cannot simply be aborted, because the coordinator might awaken, see the "commit" votes, and start phase 2 of the protocol with a commit message.

Therefore, the protocol blocks on the coordinator and cannot make any progress. The problem of waiting for a participant to fulfill its part of the protocol cannot be completely resolved.

To counteract this, another participant can take over the coordinator's work once the coordinator is determined to have crashed. If a timeout occurs at a node, that node can be forced to complete the protocol the coordinator started.
As with a phase 1 message, this node can contact all other participants and discover their votes. However, this requires persistent state in each node.

It is also possible that only one node knows the result of the transaction, namely if the coordinator fails in phase 2 before the abort/commit decision has been transmitted to all nodes, as shown in Figure 10.

Figure 10: Two-phase commit, with coordinator failure, phase two [5]

However, if another node crashes before the recovery node can end the protocol, the protocol cannot be restored to its original state.

The same applies to phase 2. If the coordinator crashes before the commit message has reached all participants, a recovery node is required to take over the protocol and safely complete it.

The worst-case scenario is when the coordinator is itself a participant and grants itself a vote on the result of the protocol. If it then crashes, both the coordinator and a participant are down, ensuring that the protocol remains blocked as a result of a single fault [5].


Paxos

The Paxos protocol consists of the following phases.

Phase 1: Prepare request

The proposer picks a new number n and sends a prepare request (prepare, n) to all acceptors.
An acceptor receiving a prepare request (prepare, n) responds with (ack, n, n′, v′) or (ack, n, ⊥, ⊥). In order for an acceptor to respond with such a promise, n must be greater than any number it has ever received before.
The acceptor reports the value v′ of the highest-numbered proposal n′ it has accepted, if any; otherwise, it replies with ⊥.

Phase 2: Accept request

When the proposer receives responses from a majority of the acceptors, it sends an accept request (accept, n, v) with the number n and the value v to the acceptors. The number n is the number from the phase 1 prepare request.
The value v is the value of the highest-numbered proposal among the responses, or the proposer's own value if all responses were ⊥.
If an acceptor receives an accept request (accept, n, v), it accepts the proposal unless it has already responded to a prepare request with a number greater than n.

Phase 3: Learning phase

Whenever an acceptor accepts a proposal, it responds to all learners with (accept, n, v).
Learners receive (accept, n, v) from a majority of acceptors, decide v, and send (decide, v) to all other learners. These learners receive (decide, v) and decide the value v.

Every distributed system contains faults. To counteract them, the decision is delayed in Paxos if a proposer fails, and a new round is started in phase 1 with a new number, even if previous attempts have not yet completed.

Paxos is difficult to understand due to its many deliberately open implementation details.
Questions such as "When can one be certain that a proposer has failed?" or "Must synchronized clocks be used to set the timeouts?" are some of these implementation details.
Leader election, failure detection and log management are also purposely left open to ensure greater flexibility. However, exactly these design decisions are the biggest disadvantages of Paxos [4].
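The two phases can be made concrete from the acceptor's point of view (a sketch of the acceptor state machine only; proposers, majorities and learners are omitted, and None stands in for ⊥):

```python
# Sketch of a single Paxos acceptor's state machine (phases 1 and 2 seen
# from the acceptor). None stands in for the "bottom" value ⊥.

class Acceptor:
    def __init__(self):
        self.promised_n = -1     # highest prepare number promised
        self.accepted_n = None   # number of the accepted proposal, if any
        self.accepted_v = None   # value of the accepted proposal, if any

    def prepare(self, n):
        """Phase 1: promise n if it is higher than any number seen before."""
        if n > self.promised_n:
            self.promised_n = n
            return ("ack", n, self.accepted_n, self.accepted_v)
        return None  # ignore / reject a stale prepare

    def accept(self, n, v):
        """Phase 2: accept unless a higher-numbered prepare was promised."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n, self.accepted_v = n, v
            return ("accept", n, v)
        return None

a = Acceptor()
print(a.prepare(1))      # ('ack', 1, None, None)
print(a.accept(1, "x"))  # ('accept', 1, 'x')
print(a.prepare(2))      # ('ack', 2, 1, 'x') - reports accepted proposal
print(a.accept(1, "y"))  # None - stale accept is rejected
```

The last two calls show the mechanism that makes Paxos safe: a later prepare learns about the already accepted value, and an accept with an outdated number is refused.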

A leader election mechanism in Paxos might be realized by a simple algorithm like the bully algorithm.
A node starts by sending its server id to all other nodes. A node receiving an id sends a response containing its own id.
The initiating node then checks whether all responses carry lower ids than its own. If this is the case, the node is the new leader.
If a node's own id is higher than a received id, the node starts its own election.
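The outcome of this exchange can be sketched in a few lines (a toy model of the bully idea, ignoring message passing and crashes during the election itself):

```python
# Toy sketch of the bully idea: every node compares its id with the
# responses of the others; the node that sees no higher id from any alive
# node declares itself leader.

def elect_leader(node_ids, alive):
    """Return the highest alive id, mimicking the bully outcome."""
    candidates = [n for n in node_ids if alive(n)]
    leader = None
    for node in candidates:
        # A node wins if no other alive node answers with a higher id.
        if all(other <= node for other in candidates):
            leader = node
    return leader

alive = lambda n: n != 5                  # pretend node 5 has crashed
print(elect_leader([1, 3, 5, 4], alive))  # 4: highest id still alive
```

The real algorithm arrives at the same result through pairwise election messages rather than a global view, which is what makes it usable in a distributed setting.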

The procedure of performing multiple Paxos decisions consecutively based on a log is called Multi-Paxos.


Raft

Unlike Paxos, Raft is designed with a focus on understandability.
For the first time, Raft introduced the concept of shared timeouts to deal with termination. If an error occurs in Raft, a restart is performed. Nodes wait at least one timeout period until they try to declare themselves leader again. This guarantees progress.

The shared status in Raft is typically represented by a replicated log data structure. Like Paxos, Raft requires a majority of servers that are available to operate correctly. In general, Raft consists of the following three elements.

  • Leader election: If the current leader fails, a new leader must be elected. The leader is responsible for accepting client requests and managing the replication logs of the other servers. Data always flows from the leader to the other servers.
  • Log replication: The leader synchronizes the logs of all other servers by replicating its own log.
  • Safety: If a server commits a log entry with a particular index, other servers cannot set a different log entry for that index.

Raft servers can have the following three states.

  • Leader: Typically only one leader exists; all other servers in the cluster are followers. Client requests received by followers are forwarded to the leader.
  • Follower: A follower is passive. It only responds to requests from leaders or candidates, or forwards client requests.
  • Candidate: A computer in this state wants to be elected as the new leader.

If a candidate wins an election, it remains leader for a period of time referred to as the term. Each term is identified by a term number, which is incremented after each term. Each server must persistently store the current term number.

Raft uses the remote procedure calls RequestVotes and AppendEntries.
RequestVotes are used by candidates during elections.
AppendEntries are used by leaders for replication log entries and as heartbeat [6].

Leader election

Leaders periodically send heartbeats to the followers. A leader election is triggered when a follower does not receive a heartbeat from the leader for a certain period of time.
The follower then becomes a candidate and increments its term number.
It now sends RequestVotes to all other participants in the cluster, resulting in one of the following three outcomes.

  • The candidate receives the majority of votes and becomes leader.
  • If the candidate receives an AppendEntries message, it must check whether the received term number is greater than its own. If its own term number is greater, the server remains in the candidate state and the AppendEntries message is rejected. If its own term number is smaller, the server switches back to the follower state.
  • Several servers became candidates at the same time and the vote did not produce a clear majority. In this case one of the candidates times out and a new election starts [6].
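The term check from the second outcome above can be sketched as a small function (illustrative names; a real Raft implementation handles this inside its RPC handler):

```python
# Sketch of the term check a candidate performs when it receives an
# AppendEntries message from another server claiming to be leader.

def on_append_entries(my_term, my_state, leader_term):
    """Return the new (term, state) of a candidate receiving a heartbeat."""
    if leader_term >= my_term:
        # A legitimate leader exists: step back to follower, adopt its term.
        return leader_term, "follower"
    # Stale leader: keep campaigning and reject the message.
    return my_term, my_state

print(on_append_entries(3, "candidate", 4))  # (4, 'follower')
print(on_append_entries(3, "candidate", 2))  # (3, 'candidate')
```

Comparing term numbers on every message is what lets Raft nodes recognize stale leaders without any global clock.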
Log replication

Client requests can initially be regarded as write-only. Each request consists of a command, which is ideally executed by the replicated state machine of every server. A leader receiving a new client request adds it to its log in the form of a new entry. Each log entry contains a client-specified command, an index to identify the position in the log, and the term number to maintain a logical order.
To ensure consistency, a new log entry must be replicated to all followers.
The leader sends the AppendEntries message to all servers until all followers have safely replicated the entry. Once all followers have replicated the entry, it can be considered committed together with all previous entries.
The leader stores the highest index of committed logs. This index is sent to the followers with every AppendEntries message, so they can check whether their state machine is still in the correct order.
If two different logs have an entry with the same index and the same term number, they store the same command and all previous log entries are identical.
If a follower, on receiving an AppendEntries message, does not find a matching position for the log entry based on the index and the term number, it rejects the message.
Through this mechanism the leader can be certain that, after a successful response to the AppendEntries request, a follower's log is identical to its own.

In Raft, inconsistencies are resolved by overwriting the follower logs. The leader first tries to find the last index at which its log matches the follower's log. Once found, the follower deletes all newer entries and appends the leader's entries [6].
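The consistency check and the overwrite of conflicting entries can be sketched together (a simplified model; entries are (term, command) pairs and network details are omitted):

```python
# Sketch of the AppendEntries consistency check: a follower accepts new
# entries only if its log matches the leader's at the previous index, and
# conflicting newer entries are overwritten with the leader's entries.

def append_entries(log, prev_index, prev_term, entries):
    """log is a list of (term, command); returns True if accepted."""
    if prev_index >= 0:
        if prev_index >= len(log) or log[prev_index][0] != prev_term:
            return False          # no matching position: reject the message
    # Delete conflicting entries after the match point, then append.
    del log[prev_index + 1:]
    log.extend(entries)
    return True

follower = [(1, "a"), (1, "b"), (2, "z")]      # (term, command) pairs
ok = append_entries(follower, 1, 1, [(3, "c")])
print(ok, follower)  # the stale entry (2, 'z') was overwritten
```

A rejection makes the leader retry with an earlier prev_index, walking back until the logs match, which is exactly the search for the last matching index described above.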


Safety

Raft ensures that the leader of a term has all committed entries of all previous terms in its log. This is necessary for all logs to be consistent and for the state machines to execute the same commands.
During a leader election, the RequestVote message contains information about the candidate's log. If a voter's log is more up to date than the log of the candidate sending the RequestVote message, it does not vote for that candidate.
Which log is more up to date is determined by the term number and the length of the log: the higher the term number and the longer the log, the more up to date it is [6].
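The up-to-date comparison used during voting can be sketched as a small predicate (illustrative names; last term first, log length as tie-breaker):

```python
# Sketch of the up-to-date check from Raft's voting rule: a voter refuses a
# candidate whose log is behind its own, comparing the term of the last
# entry first and using the log length to break ties.

def log_is_up_to_date(cand_last_term, cand_len, my_last_term, my_len):
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_len >= my_len

print(log_is_up_to_date(2, 5, 2, 7))  # False: same term, shorter log
print(log_is_up_to_date(3, 5, 2, 7))  # True: higher last term wins
```

Because a candidate needs votes from a majority, and every committed entry is stored by a majority, this check guarantees the elected leader already holds all committed entries.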

ZooKeeper Atomic Broadcast protocol (ZAB)

Like Raft, the ZooKeeper Atomic Broadcast protocol (ZAB) achieves high availability by distributing data across multiple nodes, allowing clients to read data from any node. Client writes are forwarded to the leader. An important design criterion is that each state change is incremental with respect to the previous state, resulting in an implicit dependency on the order of the states. Besides guaranteeing replication in order, ZAB defines procedures for leader election and for the recovery of faulty nodes.

What Raft calls a term is defined in ZAB as the epoch of a leader. An epoch is likewise identified by a number generated by the leader,
and each new epoch number must be larger than all previous epoch numbers.
A transaction represents a state change that the leader propagates to its followers.
Furthermore, analogous to the index in Raft, a sequence number is generated by the leader, starting at 0 and incrementing with each transaction.
The epoch number and the sequence number together are important to ensure the order of the state changes.
Analogous to the replication log in Raft, each follower in ZAB has a history queue, in which all incoming transactions are committed in the order received.

In order for ZAB to be executed correctly, the following prerequisites must be met.

  • Replication guarantees reliable delivery, total order and causal order.
    • If a transaction is committed by one server, it must eventually be committed by all servers.
    • If a transaction A is committed by one server before a transaction B, all servers must commit transaction A before transaction B.
    • If a transaction A is committed by a server and another transaction B is then sent, transaction A must be ordered before transaction B. If a transaction C is subsequently sent, C must be ordered after B.
  • Transactions are replicated as long as the majority of nodes are available.
  • If a node fails and restarts, it must catch up on the missed transactions.

The general procedure outlined below is depicted in Figure 11.

When a leader receives a change update from a client, it generates a transaction with a sequence number and the epoch number and sends this transaction to its followers. A follower adds the transaction to its history queue and confirms this to the leader via an ACK. Once the leader has received ACKs from a majority of the nodes, it sends a COMMIT for this transaction. A follower receiving a COMMIT commits the transaction unless transactions with smaller sequence numbers are still pending in its history queue; in that case it waits for the COMMITs of the previous transactions before committing.
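The ordered commit on the follower side can be sketched with a queue (a toy model; the (epoch, sequence) pair corresponds to what ZooKeeper calls the zxid, and the network round-trips are omitted):

```python
# Sketch of the ordered commit in ZAB: a follower queues proposed
# transactions and only commits one when everything before it in the
# history queue has been committed, preserving the (epoch, sequence) order.

from collections import deque

class Follower:
    def __init__(self):
        self.history = deque()   # proposed, not yet committed transactions
        self.committed = []

    def propose(self, zxid, txn):
        self.history.append((zxid, txn))      # append in receive order

    def commit(self, zxid):
        # Only commit if the transaction is next in line; otherwise wait
        # for the COMMITs of all earlier transactions first.
        if self.history and self.history[0][0] == zxid:
            self.committed.append(self.history.popleft())
            return True
        return False

f = Follower()
f.propose((1, 0), "set x")     # zxid = (epoch, sequence number)
f.propose((1, 1), "set y")
print(f.commit((1, 1)))        # False: must wait for (1, 0) first
print(f.commit((1, 0)))        # True
print(f.commit((1, 1)))        # True: now in order
```

The queue enforces exactly the total and causal order listed in the prerequisites above: nothing is applied out of zxid order.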

Figure 11: Procedure ZooKeeper [9]

If the leader crashes, the nodes run a recovery protocol, both to agree on a common consistent state before resuming regular operation and to establish a new leader to transfer state changes.

Since nodes can fail and recover, multiple leaders can emerge over time, and the same node can take on a given role more than once.
The life cycle of a node is defined by the following four phases. Each node performs an iteration of the protocol, and at any time a node can cancel the current iteration and start a new one by transitioning to phase 0.
Phases 1 and 2 are especially important for establishing a consistent state during recovery after a failure [7].

0. Leader election phase

Nodes are initialized in this phase. No particular leader election protocol is required, as long as the protocol terminates and, with high probability, chooses a node that is available and that a majority of the nodes have voted for. After termination of the leader election, a node stores its vote in local volatile memory. If a node n voted for node n0, then n0 is called the prospective leader for n. Only at the beginning of phase 3 does a prospective leader become an established leader, when it also becomes the primary process [8].

1. Discovery phase

In this phase the followers communicate with their prospective leader, so that the leader collects information about the most recent transactions its followers have accepted. The purpose of this phase is to determine the most recent sequence of accepted transactions among a majority of the nodes. In addition, a new epoch is established so that previous leaders cannot make new proposals. Since a majority of the followers have accepted all changes from the previous leader, at least one follower has all of the previous leader's changes in its history queue, which guarantees that the new leader will obtain them too.

2. Synchronization phase

The synchronization phase completes the recovery part of the protocol and synchronizes the replicas in the cluster using the leader's history as updated in the discovery phase. The leader proposes the transactions from its history to its followers. The followers acknowledge a proposal if their own history lags behind the leader's. Once the leader receives acknowledgements from a majority of the nodes, it sends them a commit message. At this point the leader is established and no longer merely a prospective leader.

3. Broadcast phase

If no crashes occur, the nodes remain in this phase indefinitely and send transactions as soon as a ZooKeeper client sends a write request.

To detect failures, ZAB employs periodic heartbeat messages between followers and their leader. If a leader does not receive heartbeats from a majority of its followers within a certain timeout, it resigns leadership and switches to the leader election in phase 0. A follower likewise switches to phase 0 if it does not receive heartbeats from its leader within a timeout [7].
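The failure detection just described can be sketched as follows. The timeout value and the `Leader` class are illustrative assumptions, not part of the ZAB specification:

```python
import time

HEARTBEAT_TIMEOUT = 2.0  # seconds; illustrative value, not from the ZAB spec

class Leader:
    def __init__(self, follower_ids):
        # Record the last time each follower was heard from.
        self.last_heartbeat = {f: time.monotonic() for f in follower_ids}

    def on_heartbeat(self, follower_id):
        self.last_heartbeat[follower_id] = time.monotonic()

    def has_quorum(self, now=None):
        """The leader keeps leadership only while a majority answers in time."""
        now = time.monotonic() if now is None else now
        alive = sum(1 for t in self.last_heartbeat.values()
                    if now - t < HEARTBEAT_TIMEOUT)
        return alive > len(self.last_heartbeat) // 2

leader = Leader(["f1", "f2", "f3"])
assert leader.has_quorum()  # all heartbeats are fresh
# Simulate a timeout: pretend 3 seconds pass without any heartbeat.
future = time.monotonic() + 3.0
assert not leader.has_quorum(now=future)  # leader would return to phase 0
```

A real implementation would run this check periodically and trigger the transition to phase 0 when the quorum is lost; the same timeout logic applies symmetrically on the follower side.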

Byzantine Algorithms

A Byzantine fault-tolerant protocol should be able to achieve a common objective even when individual nodes behave maliciously.
The paper "The Byzantine Generals Problem" by Leslie Lamport, Robert Shostak and Marshall Pease provided the first solution to the problem: it showed that a system with x Byzantine nodes must comprise at least 3x + 1 nodes in total to reach consensus.
The reasoning is as follows: if x nodes are faulty, the system must be able to reach a decision after coordinating with only n − x nodes, since x nodes may never respond. Among those n − x responders, up to x could themselves be Byzantine, so the honest responders must still outnumber the faulty ones: n − x − x > x, hence n > 3x, and n = 3x + 1 is the minimum.
However, the algorithm from that paper only works in synchronous environments.
Byzantine algorithms designed for asynchronous environments, such as DLS and PBFT, are significantly more complex [4].
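The bound can be checked numerically with a small helper; `reaches_consensus` simply encodes the counting argument above:

```python
def min_nodes(x: int) -> int:
    """Smallest cluster size that tolerates x Byzantine nodes (n > 3x)."""
    return 3 * x + 1

def reaches_consensus(n: int, x: int) -> bool:
    # The system must proceed with n - x replies; of those, up to x may
    # be Byzantine, so the honest replies must still outnumber them.
    return (n - x) - x > x

assert reaches_consensus(4, 1)       # 4 nodes tolerate 1 Byzantine node
assert not reaches_consensus(3, 1)   # 3 nodes cannot
assert min_nodes(2) == 7             # tolerating 2 faults needs 7 nodes
```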

DLS Algorithm

The DLS algorithm was presented by Dwork, Lynch and Stockmeyer in the paper "Consensus in the Presence of Partial Synchrony" as a significant advancement of Byzantine algorithms.
It defines models for reaching consensus in a partially synchronous system. Partial synchrony lies between a synchronous and an asynchronous system and is characterized by one of the following two assumptions.

  • There is a fixed bound on message delivery time, but it is not known in advance. The aim is to reach consensus regardless of the actual bound.
  • The bound is known, but it only holds from some unknown point in time onward. The goal is to design a system that reaches consensus regardless of when this point occurs.

The DLS algorithm proceeds in rounds, each divided into a "trying" and a "lock-release" phase.

  1. Each round has a proposer and begins with every node announcing a value it considers correct.
  2. The proposer suggests a value if at least n − x nodes have announced it.
  3. When a node receives the proposed value from the proposer, it must lock on this value and broadcast this information to the network.
  4. If the proposer learns from x + 1 nodes that they have locked the value, it commits the value as the final value.
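The round structure above can be condensed into a deliberately simplified, honest-nodes-only sketch. All names here (`dls_round`, the vote dictionary) are illustrative assumptions, and the lock-release phase and Byzantine behavior are omitted:

```python
from collections import Counter

def dls_round(values: dict, n: int, x: int):
    """One simplified DLS round; values maps node id -> announced value."""
    # Step 2: the proposer may only suggest a value announced by >= n - x nodes.
    counts = Counter(values.values())
    candidates = [v for v, c in counts.items() if c >= n - x]
    if not candidates:
        return None  # no value is safe to propose in this round
    proposed = candidates[0]
    # Step 3: nodes that receive the proposal lock on the value
    # (all nodes are honest in this sketch, so everyone locks).
    locked = list(values)
    # Step 4: the proposer commits once x + 1 nodes report the lock.
    return proposed if len(locked) >= x + 1 else None

votes = {"n1": "A", "n2": "A", "n3": "A", "n4": "B"}
assert dls_round(votes, n=4, x=1) == "A"  # "A" was announced by n - x = 3 nodes
```

With no value reaching the n − x threshold, the round decides nothing and the protocol moves on to the next round with a new proposer.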

DLS was the first to introduce the terms safety and liveness, which correspond to agreement and termination.

In addition to the DLS algorithm, there is also the Practical Byzantine Fault Tolerance (PBFT) algorithm, which is likewise designed for asynchronous environments. Due to the limited scope of this blog post, however, it is not discussed here [4].

Approach 2: Non-Deterministic

In addition to using synchrony assumptions to circumvent the FLP impossibility result, non-deterministic algorithms can also be used to achieve consensus.

Nakamoto Consensus (Proof of Work)

In traditional consensus algorithms, a function f(x) is defined such that a proposer and a set of acceptors must all coordinate and communicate to decide on the next value.
These algorithms therefore often scale poorly, since each node must know and communicate with every other node in the network. With Nakamoto Consensus, the nodes do not agree on a value directly; instead, f(x) works in such a way that all nodes agree on the probability that the value is correct.
Instead of electing a leader and coordinating with all nodes, consensus is reached on which node can solve a computational puzzle fastest. Nakamoto Consensus assumes that nodes invest computational effort for the chance to decide the next block. This proof-of-work consensus is simpler than previous consensus algorithms, as it eliminates the complexity of point-to-point connections, leader election, and quadratic communication overhead.
However, Nakamoto Consensus requires time and energy for every new block in order to solve the computational puzzle [4].
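A minimal sketch of such a computational puzzle, in the style of Bitcoin's proof of work: find a nonce whose SHA-256 hash starts with a given number of zeros. The difficulty value is chosen purely for illustration:

```python
import hashlib

def proof_of_work(block_data: str, difficulty: int = 4) -> int:
    """Find a nonce whose hash starts with `difficulty` hex zeros."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

nonce = proof_of_work("block-42", difficulty=3)
digest = hashlib.sha256(f"block-42{nonce}".encode()).hexdigest()
assert digest.startswith("000")  # anyone can verify the solution cheaply
```

The asymmetry is the point: finding the nonce takes many hash attempts on average, while verifying it takes a single hash, so every node can cheaply check which node won the round.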

Proof of Stake and Proof of Authority

Besides reaching consensus through resources and mining power as in PoW, the mechanisms Proof of Stake (PoS) and Proof of Authority (PoA) proceed differently.
The PoS mechanism operates with an algorithm that selects the participants with the highest stakes as validators, on the assumption that the largest stakeholders have the strongest incentive to ensure that transactions are processed correctly. PoA, by contrast, uses identity as the sole proof of the authority to validate, so no mining is needed [10].
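One common PoS variant selects the next validator with probability proportional to its stake. The following sketch illustrates that idea; the names and stake values are made up:

```python
import random
from collections import Counter

def select_validator(stakes: dict, rng: random.Random) -> str:
    """Pick a validator with probability proportional to its stake."""
    nodes = list(stakes)
    weights = [stakes[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]

stakes = {"alice": 60, "bob": 30, "carol": 10}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
picks = Counter(select_validator(stakes, rng) for _ in range(10_000))
# alice holds 60% of the stake, so she is selected most often.
assert picks["alice"] > picks["bob"] > picks["carol"]
```

Unlike PoW, no computational puzzle is solved here; the economic weight of the stake replaces the expended energy as the scarce resource.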

Cluster manager

With the consensus protocols covered, we can now look at how consensus algorithms are used in existing cluster managers.

Cluster Manager                                   Distributed key-value store   Consensus Protocol
Google Borg                                       Chubby                        Paxos
Kubernetes/OpenShift                              Etcd                          Raft
Nomad                                             –                             Raft
Docker Swarm                                      –                             Raft
Apache Marathon (based on Mesos), Apache Aurora   ZooKeeper                     ZAB
Kontena                                           Etcd                          Raft
Amazon Elastic Container Service (ECS)            Consul                        Raft
Table 1: Overview of consensus protocols in cluster managers

As Table 1 illustrates, there is a strong tendency towards Raft [11-14].
This is likely due to its good comprehensibility and the many available implementations compared to Paxos. Google does use Paxos, but for rather historical reasons in Borg. ZooKeeper's ZAB protocol is very similar to Raft and therefore a reasonable choice for Apache Marathon.

Cluster management without a consensus protocol

In contrast to cluster management with a consensus algorithm, the paper "Subordination: Cluster management without distributed consensus" proposes cluster management without one.
In this paper, cluster management is achieved by organizing the nodes of the cluster hierarchically via subordination. The main goal is to distribute the workload over a large number of nodes, similar to a load balancer. Cluster management without distributed consensus relies on the following conditions.

  • Network configuration changes are infrequent.
  • It is not intended for managing updates of a distributed database.
  • Node performance and node latency must remain stable.
  • Constant network traffic is assumed [15].

Node Mapping

A fundamental idea of this variant of cluster management is node mapping, i.e. the evaluation of nodes.
In general, node mapping is defined as a function that maps a node to a number, which allows nodes to be compared with each other.
Following Amdahl's law, the higher the link performance, the higher the speedup.
Instead of including node performance and latency separately in the mapping, the two can be correlated: only their ratio is included in the mapping [15].
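The ratio-based mapping can be sketched directly; the measurement values below are hypothetical:

```python
def node_mapping(performance: float, latency: float) -> float:
    """Map a node to a single comparable number: performance/latency ratio."""
    return performance / latency

# Hypothetical measurements per node: (performance, latency in ms).
nodes = {"a": (100.0, 5.0), "b": (80.0, 2.0), "c": (120.0, 10.0)}
ranking = sorted(nodes, key=lambda n: node_mapping(*nodes[n]), reverse=True)
assert ranking[0] == "b"  # best performance-to-latency ratio: 80 / 2 = 40
```

Note that node "b" wins despite having the lowest raw performance: because performance and latency enter the mapping only as a ratio, a fast link can outweigh slower hardware.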

Subordination Tree

In the subordination tree, each node is uniquely identified by a level and an offset, from which a distance can be calculated.
To select a leader, a node evaluates all nodes in the network using the mapping and calculates the distances. The node with the smallest distance and the best rank is selected.
Level and offset are only used for linear topologies (at a switch); for non-linear topologies, latency is used for the mapping.
Instead of maximizing performance, the main goal of the algorithm is to minimize network traffic per unit of time when the leader is chosen and the number of nodes is unknown.
Thanks to the condition that network configuration changes are infrequent, the algorithm can be executed once when the cluster is installed and its result persisted.
The paper includes a performance test and a subordination tree test.
In the performance test, a full network scan was performed as a baseline and a leader was determined.
The IP mapping was then performed using the node mapping algorithm and a leader was determined again. Figure 12 displays the result [15].

Figure 12: Performance test [15]
With the node mapping algorithm, the subordination tree is built up about 200% faster.

The subordination tree test examined whether the resulting trees reach a stable state. For 100 to 500 nodes, the construction of the subordination tree was repeated in order to obtain meaningful results; each run was aborted after 30 seconds to set an upper time limit. The results reveal that a stable state was reached well below 30 seconds.


In conclusion, consensus algorithms are a central aspect of cluster management and distributed systems. Older protocols like Paxos have revealed their strengths and weaknesses over time. The experience gained has been incorporated into newer protocols such as Raft and ZooKeeper to improve understanding and robustness.
Byzantine consensus algorithms represent a particularly exciting application. Especially in times of increasing Internet crime and organized hacking, manipulated behavior of individual nodes in the public Internet must be assumed.
New technologies such as blockchain offer a new perspective on the matter. For instance, a key-value store could be developed using Proof of Stake or Proof of Authority as its consensus algorithm. Such a key-value store could be used in a public Kubernetes cluster instead of Etcd, providing additional protection against Byzantine nodes. On the other hand, there are also arguments against cluster management with a consensus algorithm: the paper "Subordination: Cluster management without distributed consensus" shows a way to manage a cluster without one.
However, this approach has many implicit conditions which cannot be guaranteed in a real distributed system. Another approach supporting this hypothesis is K3s, which shows that cluster management on a small scale is also possible without a consensus algorithm. K3s uses SQLite3 as its database, although Etcd3 can also be used. Its field of application is IoT, where fixed IPs are often assigned and therefore hardly any changes take place.
Ultimately, there is no generic solution. Currently there seems to be a tendency towards Raft in controlled data centers. Nevertheless, depending on the application, it remains a decision whether to use cluster management with simple consensus, with Byzantine consensus, or without consensus.


  1. Survey Shows Kubernetes Leading As Orchestration Platform
  2. The Techglider, Kartik Singhal
  3. A Cursory Introduction To Byzantine Fault Tolerance and Alternative Consensus, Alexandra Tran
  4. Let's Take a Crack At Understanding Distributed Consensus, Preethi Kasireddy
  5. Consensus Protocols: Two-phase Commit, Henry Robinson
  6. Understanding the Raft Consensus Algorithm: an Academic Article Summary, Shubheksha
  7. Architecture Of Zab – ZooKeeper Atomic Broadcast Protocol, Guy Moshkowich
  8. ZooKeeper's atomic broadcast protocol: Theory and practice, André Medeiros, March 20, 2012
  9. A simple totally ordered broadcast protocol, Benjamin Reed, Flavio P. Junqueira
  10. Proof Of Authority: Consensus Model with Identity At Stake, POA Network
  11. Consensus Protocol
  12. Raft Consensus in Swarm Mode
  13. The Mesos Replicated Log
  14. The Developer Friendly Container & Microservices Platform, Kontena, Inc
  15. Gankevich, Ivan & Tipikin, Yury & Gaiduchok, Vladimir (2015). Subordination: Cluster management without distributed consensus. 639-642. 10.1109/HPCSim.2015.7237106.
