Cloud security tools and recommendations for DevOps in 2018

Introduction

Over the last five years, the use of cloud computing services in German companies has increased rapidly. According to a 2018 statistic from Bitkom Research, the acceptance of cloud computing services keeps growing.

Cloud computing brings many advantages for a business. For example, expenses for internal infrastructure and its administration can be saved, and resources can be scaled more easily whenever the appropriate capital is available. In general, it also raises the level of innovation within companies and can thus support new technologies and business models. Cloud computing can even improve data security and data protection, because the big providers must comply with the EU General Data Protection Regulation (EU-DSGVO), which came into force in May 2018, and obtain corresponding certifications. There are even specialised cloud providers that offer such services and work according to the rules of the TCDP (Trusted Cloud Data Protection) certification.

However, not all that glitters is gold. According to a study by KPMG and Bitkom Research, 19 percent of the interviewed companies reported that data security incidents had happened in their company through the use of public cloud solutions within the last 12 months. In my opinion, 19 percent is far too high; as experts, it is our task to raise this level of security even further!

Figure 1: Secure cloud computing (meme)

In the end, the question is:
Which areas or components within cloud computing technologies are affected in terms of data security?

To be honest, there are a lot of components involved:

  1. Cloud-infrastructures:
    1. SaaS – Software as a Service
    2. PaaS – Platform as a Service
    3. IaaS – Infrastructure as a Service
  2. Devices that use cloud services, and therefore also
  3. The user / developer / administrator (the human)

That’s many, right?


Usability and Security

Usability and Security – Is a tradeoff necessary?

Usability is one of the main factors behind successful software with user interaction, but it is often degraded by strict security standards. At the same time, many use cases require authentication, authorisation and system access, where reducing security measures risks serious damage. This article shows how these two areas depend on each other and presents typical mistakes together with possible solutions, in order to bury a common fallacy in IT: “There needs to be a tradeoff between security and usability”.

“Too secure” services

“… security is only as good as its weakest link, and people are the weakest link in the chain.” – Bruce Schneier

Security can depend on usability, especially in an enterprise context. To make an environment as secure as possible, very strict guidelines are introduced: e.g. at least 13 characters, two special characters, three numbers, upper and lower case characters, the password needs to be changed every month, every access needs a different password, new passwords may not be too similar to old ones, and so on. Sure, no brute-force attack could crack such passwords and hackers will have their difficulties, but what most admins establishing these rules are not aware of is that users try to bypass these security measures.

The typical non-IT user is not aware of security threats and tries to get his or her work done as easily and quickly as possible. Such strict password guidelines will not just make users angry; they may start writing their passwords on notes stuck to their monitors because the passwords are hard to remember. When passwords need to be changed every month, users start using patterns so they do not forget them. A friend who was a system administrator told me of employees talking about such patterns: they had to change their password every quarter and used the season with the year (Spring2017, Summer2017, etc.) to remember them.

Fortunately, there are solutions that help to balance usability and security:

Single sign-on: One secure password for all services prevents users from making notes.

Reducing forced changes: Changing a password only once a year demands less from a user.

User motivation: When users decide for their own sake to improve security, they are more likely to cooperate. A great option for consumer software is a password strength bar. The user can still choose a pretty simple password, but he or she is pointed to its weakness and gets proposals to improve it. Positive feedback (like green colours, animations and icons) increases motivation further.
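
One possible way to implement such a strength bar, as a minimal sketch: the thresholds and feedback texts below are made up for illustration, and real products often rely on dedicated estimators (such as zxcvbn) instead of hand-written rules.

```python
# Minimal sketch of a password strength score with positive feedback.
# Thresholds and messages are illustrative, not a vetted policy.
import re

def password_feedback(password: str) -> tuple[int, str]:
    score = 0
    score += min(len(password), 16) // 4            # length counts most
    score += bool(re.search(r"[a-z]", password))
    score += bool(re.search(r"[A-Z]", password))
    score += bool(re.search(r"\d", password))
    score += bool(re.search(r"[^a-zA-Z0-9]", password))
    if score <= 3:
        return score, "Weak: try a longer passphrase, e.g. four random words."
    if score <= 5:
        return score, "Okay: adding length helps more than extra symbols."
    return score, "Great password!"                  # positive reinforcement

print(password_feedback("summer2017"))
print(password_feedback("correct horse battery staple"))
```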

Password managers: Password managers generate very secure, dissimilar, cryptic passwords that can be retrieved with one master password.

Fingerprints: A hardware solution where instant authentication and authorisation is possible without remembering a password. However, systems that match only partial fingerprints can be cracked.

YubiKey: Two-factor authentication through a physical device that is plugged into a USB slot. For every service a public-private key pair is generated, and the service only has access to the public key. It supports one-time passwords, where every new log-in uses another passcode, and it is phishing resistant because key pairs are bound to the domain they were generated for (e.g. https://facbook.com is not valid due to the missing e). By pressing the button on the key before logging in, the user confirms that he or she really wants to authenticate with this device, which prevents man-in-the-middle attacks. Stolen YubiKeys can be disabled when using the YubiCloud service.
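
The phishing resistance comes from binding credentials to the exact origin they were registered with. The following minimal Python sketch is not the FIDO/U2F protocol (which uses per-service asymmetric key pairs); it only illustrates the origin-binding idea with a hypothetical token object.

```python
# Minimal sketch of origin-bound credentials (simplified: one symmetric secret
# per origin instead of real asymmetric key pairs). Illustration only.
import hashlib
import hmac
import os

class OriginBoundAuthenticator:
    """Hypothetical token: one secret per registered origin."""
    def __init__(self):
        self._keys = {}  # origin -> secret key

    def register(self, origin: str) -> None:
        self._keys[origin] = os.urandom(32)

    def sign_challenge(self, origin: str, challenge: bytes) -> bytes:
        # A key only exists for the exact origin it was registered with,
        # so a look-alike phishing domain cannot obtain a valid response.
        key = self._keys.get(origin)
        if key is None:
            raise ValueError(f"no credential registered for {origin}")
        return hmac.new(key, challenge, hashlib.sha256).digest()

token = OriginBoundAuthenticator()
token.register("https://facebook.com")
token.sign_challenge("https://facebook.com", b"nonce-123")       # works
try:
    token.sign_challenge("https://facbook.com", b"nonce-123")    # missing "e"
except ValueError as err:
    print(err)   # no credential registered for https://facbook.com
```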

Using alternative software

Secure services must be as easy to use as insecure ones, or users will tend to use insecure alternatives. We are used to the great consumer products of big companies like Google or Apple, but business solutions are often not that easy to work with, especially old systems. So employees may switch to simpler file-sharing services (like Google Drive or Dropbox), private mail services (Hillary’s mistake) and office software (like Google Slides). As a result, the uploaded data can end up on the servers of potential competitors.

Usability wins

Consumer software does not become successful because it is incredibly secure; it becomes successful when customers like to use it. For example, WhatsApp and Facebook went without encryption for a long time, and many people do not know how to send their mails securely, even when highly sensitive data is shared. Personally, I had this experience with a bank that sent account data without encrypting it (although we had requested encryption) because the contact person was not that familiar with this “computer stuff”.

Typical errors and their solutions

Security does not always depend on usability, and there are some use cases where usability can be increased without destabilising the system.

Validation

Validation is important to prevent users from entering wrong input or executing SQL injection and cross-site scripting attacks. Backend validation is absolutely necessary for a reliable system; frontend validation should be used to show instant feedback. But validation should not be too strict, and error messages should not be imprecise.

Xi Wu wants to register with her real name even though her fore- and surname contain fewer than three characters. René and Søren want to use their special characters too. Only code-relevant characters like < > & “ can be used to inject attacks. Furthermore, error messages should be precise and help the user correct the input. Hints like “input not valid” will not make users happy. And of course: DO NOT clear the form if one or more inputs are incorrect when using backend validation only.
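
A minimal sketch of such lenient but safe validation follows; the field names and messages are made up for illustration, and output escaping is shown as well, since validation alone does not stop cross-site scripting.

```python
# Minimal sketch of lenient but safe name validation with precise messages:
# accept short and accented names, reject only injection-relevant characters.
import html
import re

# Letters (including é, ø, 吴), spaces, hyphens and apostrophes are allowed.
NAME_PATTERN = re.compile(r"^[^\W\d_]+([ '\-][^\W\d_]+)*$", re.UNICODE)
INJECTION_CHARS = set('<>&"')

def validate_name(field: str, value: str) -> list[str]:
    errors = []
    if not value.strip():
        errors.append(f"{field}: please enter a value.")
    elif INJECTION_CHARS & set(value):
        errors.append(f"{field}: the characters < > & \" are not allowed.")
    elif not NAME_PATTERN.match(value.strip()):
        errors.append(f"{field}: only letters, spaces, hyphens and apostrophes are allowed.")
    return errors

def render_safely(value: str) -> str:
    # Escape on output as well; validation alone is not enough against XSS.
    return html.escape(value)

print(validate_name("First name", "Xi"))        # [] – two characters are fine
print(validate_name("First name", "Søren"))     # [] – special characters are fine
print(validate_name("Last name", "<script>"))   # precise error message
```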

Frontend validation that provides instant feedback is very pleasant for users. The location of the error is easier to find, the context of the input field is fresh in mind, required fields are less likely to be skipped unintentionally, the user is satisfied (keyword: gamification) and efficiency improves.

There are also some common errors when developing live inline validation. An error should not be shown while the user is typing into the field for the first time, and it should not appear only after the submit button has been pressed either. Instead, a hint should be shown the first time the field loses focus and fade out as soon as the input is correct.

Updates

Updates are fundamental for security. New security flaws need to be fixed and the fixes deployed on every device, but most IoT appliances and routers have no automatic update function. It has to be done by the users themselves, which rarely happens.

Even IT experts have problems carrying out these necessary actions in their private environment, as a study on home router security shows:

Source: http://www.properaccess.com/docs/Tripwire_SOHO_Router_Insecurity_white_paper.pdf

 

Performance

Performance is one of the most crucial usability parameters, but security can degrade it. Especially encrypting large amounts of data is bad for runtime. A tradeoff should be made: decide what impact a leak of the data could have and use encryption only for critical information. When performance is bad, users will tend to use insecure alternatives for all of their data.

Security Messages

They are ignored, they seem annoying, but they are very important. Before installing apps with access to sensitive parts like the camera, the file system or the microphone, the user should be made aware of that, even though he or she does not want to read the warnings. The good thing is: thanks to neuroscience, users can be influenced to pay more attention to such messages. In a talk for security specialists, Bonnie Anderson defines three parameters that affect the perception of such dialogs: dual-task interference (DTI), habituation and generalisation.

Dual-task interference (DTI)

Users will ignore warnings when they are doing something “more important”. The brain has problems doing several tasks at the same time. When showing a security message (like “browser detected unusual behaviour”), a suitable time slot should be used to maximise the user’s attention, ideally before or after a task is done, for example during loading time or after a video has been watched.

Habituation

The brain links a new visual input to an older, similar-looking one, comparable to a cache. When users are shown identically looking messages, they are more likely to click them away or ignore them. When different designs and animations are used, they are more likely to pay attention. According to Bonnie Anderson’s studies, even four alternating message designs or animations reduce the rate of ignored warnings.
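
As a tiny illustration of the idea (not taken from the cited studies), warning dialogs could simply rotate through a handful of visual designs; show_warning below is a hypothetical stand-in for the real UI call.

```python
# Minimal sketch: rotate between a few warning designs to counter habituation.
import itertools

DESIGNS = ["red-banner", "modal-dialog", "animated-icon", "inline-highlight"]
design_cycle = itertools.cycle(DESIGNS)

def show_warning(text: str) -> None:
    design = next(design_cycle)        # a different look on each warning
    print(f"[{design}] {text}")        # stand-in for the real UI call

for _ in range(5):
    show_warning("Browser detected unusual behaviour")
```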

Generalisation

Because the brain generalises between similar-looking dialogs, a warning message should look clearly different from an ordinary info message. Frequent notifications also decrease the attention paid to security messages.

Team structure and Management

One of the most important and effective ways to combine great usability with high security standards in software is communication. Cross-functional teams, in which a usability specialist works in the same team as a security expert, lead to much better harmony between these two IT areas than separating teams into special disciplines.

Another important point regarding management is “security by design”. Security needs to be embedded into sprints during the planning, design and implementation phases from the beginning.

Final thoughts and research questions

Security and usability are both fundamentally important for developing great software and can even depend on each other, for example when users start to circumvent impractical security guidelines. The tradeoff between these two subjects can be minimised by avoiding the errors shown above, providing usable alternatives and having experts of both subjects work together. This means more effort and higher costs, but it results in better, more successful software and secure digital company environments.

Build Bridges!

The fusion of these two areas is not yet complete, and there are some research questions for future investigations:

Are there possibilities to simplify user interactions regarding security that we do not know yet?

With studies examining user behaviour, creating personas, interviewing people and doing research in cognitive psychology, entirely new possibilities could emerge.

Can we increase security awareness in our society?

Should “behaving securely with software systems” be integrated into IT curricula in schools? And how do we teach people more efficiently to create such awareness?

Is it possible to build maximally secure systems without any limitations regarding usability?

Authorization and authentication processes still reduce the usability of systems because users must enter an ID and a password. Can this step be simplified by using external hardware? Smartphones in particular have great potential to be used as a key for PC applications. Unfortunately, it takes a lot of adaptation to make this approach common in the IT world.

Can we make software management more effective using new methods?

Traditionally, software development, security and usability have been studied separately, and each has evolved special development processes. Few development cycles for one address the interests and concerns of the other two. In order to design truly usable and secure systems we must integrate these three disciplines and examine more efficient management methods.

Security in Smart Cities

Today cities are growing bigger and faster than ever before. This results in various negative effects for the citizens, such as increased traffic, pollution, crime and cost of living, just to name a few. Governments, city administrations and authorities need to find solutions in order to alleviate these drawbacks. Over the past years, one solution that arose and has grown continuously is the concept of the smart city.

The concept of smart cities is based on the application of connected systems to manage a city efficiently. There are various areas on which smart cities focus, such as transport control, energy and water distribution or public health and safety management. The broad distribution of Internet of Things (IoT) technologies favours the development of smart cities. IoT devices are considered the backbone of a smart city, as they function as sensors and can be applied in many environments.

In some areas, today’s cities are already really smart. For example, many large cities use a traffic and transport control system, which can control the flow of traffic and make it more efficient, reducing or even avoiding congestion. Smart cities are becoming reality. But as smart city technologies touch more and more aspects of the citizens’ everyday life, they draw increased attention from cyber attackers. Since many of the smart city technologies control safety-critical systems like the already mentioned traffic and transport control system, those systems are worthwhile and, because of the security concerns about the underlying IoT technology, often weak targets.

Issues, Threats and Challenges

Security of Hardware

The IoT sensors are probably the biggest issue in smart cities. These devices are often updated infrequently or not at all, and they are poorly tested or not tested at all. Unfortunately, these devices are out in the wild; they are literally everywhere in a smart city. Even worse, to fulfil their purpose they may need to send and receive data over wireless communication channels such as WiFi or cellular networks, which means they are easily accessible on the network layer, and they are often interconnected and part of a larger network. The lack of standardisation makes it easy for attackers to hack such IoT devices and to feed fake data into the system, causing errors, failures and shutdowns.

Smartphones used by the citizens to access the services of a smart city are also a security issue. Although these devices are generally better tested and updated, there are many legacy models which lack security updates and are thus easy to attack.

Security of Communication

The data generated by the citizens is a valuable good which an attacker might be interested in. A security issue in the communication between a citizen’s client and a smart city service could lead to theft of the citizen’s data. There are various technical issues an attacker could exploit to breach the security of the communication channel, such as incompatible file formats, weak or faulty encryption protocols and irregularities in response capacity.

IoT devices and the networks they use likely come from many different vendors and manufacturers, and the interoperability of different groups of devices can also cause security issues. An additional issue arises from the interconnected nature of the devices in a smart city: if only a single device gets corrupted, it can threaten the entire network.

Large Attack Surface

The number of entry points into a smart city’s systems is enormous. A variety of sensors, IoT devices and smartphones are interconnected into large and complex networks to access or provide various services. Each of these devices could have vulnerabilities and thus be attacked and finally hacked. Because of the interconnected nature of these devices, a single compromised device constitutes a threat to the entire network and system it is connected with.

There are multiple approaches an attacker could use to compromise such devices. The security protocols encrypting the data sent and received by the device over wireless communication channels may be unsafe. The device’s hardware components could have bugs and be exploitable. The firmware could be attackable because it lacks updates or is badly configured.

Bandwidth Consumption

A myriad of devices in a smart city send and receive data. In most cases, this data is transmitted over wireless communication channels, most commonly WiFi and cellular networks. Although the data stream generated by a single device may be small, the sheer number of devices continuously generating data at the same time accumulates to a massive amount of data which needs to be transmitted over the wireless communication channels. The bandwidth of these channels, however, is limited. Especially when many devices use the same communication channel, data rates decrease and the protocol overhead increases, making these transmissions more and more inefficient. Also, the extensive usage of wireless communication channels can affect other wireless channels on a different spectrum, causing interference with services such as radio or television.

Another issue arises if a great number of devices communicate with one single server or system. Again, because of the sheer number of devices present in smart cities, the accumulated data can overload the system and lead to service failures.

Application Risks

The citizens of a smart city interact with the services of the city mainly via their smartphones. Applications installed on these smartphones represent the interface between the citizens and the services of the smart city. Developers can create apps which access the services of the smart city in order to provide added value for the citizens. It is easy to distribute apps via app stores, and a hacker could exploit this. Malicious apps developed by hackers could be used to violate the privacy of the citizens or could contain security holes like backdoors. The more apps the citizens install on their smartphones, the more likely it is that some of them contain malicious code.

On the other hand, it does not take a hacker to create an app with security issues. Apps from legitimate developers sometimes contain security problems as well, which are then vulnerable to attacks.

Possible Solutions

There are plenty of possible technical and organisational solutions for the issues and challenges mentioned above and for securing a smart city. The solutions listed here may be incomplete, but they are nevertheless important ones that should be implemented as a foundation of a smart city.

Basic Security

Smart city solutions such as IoT devices, sensors, smartphones as well as data centers should implement basic security mechanisms:

  • Strong cryptography: Data should be encrypted using up-to-date encryption protocols and standards. This concerns all communication channels (wired and wireless) and all data at rest and in transit (a minimal sketch of authenticated encryption follows after this list).
  • Authentication: A username and password should be required to use the functionality of any system. Mechanisms like certification or biometric authentication can also be used to increase security.
  • Authorization: Permission-based usage of functionality.
  • Automatic updates: Software as well as firmware should be updated frequently and automatically in a secure manner.
  • Auditing, alerting and logging: Mechanisms to audit and log any security relevant event. The logs should be saved so that they cannot be manipulated.
  • Anti-tampering: Systems should implement mechanisms to prevent tampering with their data by unauthorized access.
  • No built-in accounts: Systems should not have backdoor, undocumented or hardcoded accounts. Such built-in accounts constitute severe security issues if known to malicious persons.
  • Non-basic functionality disabled: Only functionality really needed for the system’s purpose should be enabled. All other functionality should be disabled.
  • Fail safe: The system should remain secure in case of a malfunction or crash.
  • Secure by default: Secure default configurations in each system.
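
To make the “strong cryptography” item above concrete, here is a minimal sketch using authenticated encryption from the third-party Python cryptography package (assuming it is installed); key management, key rotation and the transport protocol are deliberately out of scope.

```python
# Minimal sketch of authenticated encryption for data at rest or in transit,
# using the third-party "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet, InvalidToken

key = Fernet.generate_key()          # in practice: kept in a key management system
cipher = Fernet(key)

reading = b'{"sensor": "traffic-cam-42", "vehicles_per_min": 31}'
token = cipher.encrypt(reading)      # confidentiality + integrity (AES-CBC + HMAC)

try:
    print(cipher.decrypt(token))     # original reading
except InvalidToken:
    print("data was tampered with or the wrong key was used")
```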

Tests

Systems and solutions used in smart cities must be properly tested before they are deployed. This includes auditing the solutions for security vulnerabilities, weak security protection and compliance with basic security requirements. Beyond the basic security requirements mentioned before, the solutions should pass some advanced security checks. Penetration tests ensure the security of the solutions by revealing attack vectors. Hardening verifies that systems are properly separated and run in isolated spaces. Certification should be used to evaluate solutions and to support decision-making.

Maintenance

During operation the solutions implemented in smart cities must be supported, tracked and monitored. There are several requirements to ensure security during operation:

  • Monitoring: Systems need to be monitored with regard to system stability, suspicious activities, abnormal behaviour and bad performance, in order to get information about events which could threaten the correct operation of services.
  • Patching: Systems should be updated continuously via well-tested patches. Updating the firmware of IoT devices can be difficult due to poor or missing standardisation. The update procedures of the systems themselves must be secure.
  • Assessments and auditing: Systems are tested to verify that they comply with security standards and policies. After a patch has been deployed to a system, it needs to be tested again.
  • Protection of logging environments: Logs are crucial for identifying service-threatening events. They must be stored and transmitted in a secure manner, so that their information cannot be manipulated (see the hash-chain sketch after this list).
  • Access control: Every access to a system of a smart city must be monitored and include information about identification, time and access type.
  • Cyber-threat intelligence: To identify and react quickly to new threats and attacks, responsible organizations can use cyber-threat intelligence. Since many attacks use the same or similar vulnerabilities, they can be prevented before they occur in a system of a smart city.
  • Compromise reaction and recovery: Well-defined procedures for the case that a system of a smart city gets compromised. If such an event happens, for example, certificates and keys must be revoked. In the aftermath, organisations must retrace the incident and draw conclusions from it, so that the incident cannot happen again under similar circumstances.
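
As a sketch of the log-protection idea mentioned above (one possible approach, not a prescribed standard), each log entry can carry a hash over its own content and its predecessor’s hash, so any later modification breaks the chain; shipping the entries to a separate, write-once store is assumed.

```python
# Minimal sketch of a tamper-evident, hash-chained audit log.
import hashlib
import json
import time

def append_entry(log: list[dict], message: str) -> None:
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {"time": time.time(), "message": message, "prev_hash": prev_hash}
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)

def verify_chain(log: list[dict]) -> bool:
    prev_hash = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

audit_log: list[dict] = []
append_entry(audit_log, "operator logged in to traffic controller")
append_entry(audit_log, "firmware update applied")
print(verify_chain(audit_log))                     # True
audit_log[0]["message"] = "nothing happened here"
print(verify_chain(audit_log))                     # False: tampering detected
```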

Governance and Management

Smart cities depend on data directly or indirectly generated by their citizens. The more data a smart city can use, the better the quality of the services it can deliver to its citizens. Thus, it is important that the citizens trust their city’s governance. If they lose this trust, they will stop using the services of the smart city, generating less data and weakening the quality of the services for the remaining users.

To date, there are no documented universal governance and management structures for smart cities that consider the privacy of their citizens. Instead, cities build and use their own structures, without coordination. With respect to the potential risks, a strategic and coordinated approach to forming universal governance and management structures is needed to build up and maintain the citizens’ trust in their smart city.

A potential governance and management structure could consist of three parts [4]: advisory boards, transparent data policies and emergency response teams. Advisory boards assess in which ways the smart city authorities generate, store and use data. They also address issues like confidentiality, anonymity, deletion, sharing and publishing as open data. Transparent data policies define and publish how the authorities handle the data they gather and use, e.g. what personal data is held, why and how it was collected and in what way it is used. Finally, the emergency response teams are groups within the privacy and security department or IT services that react to security incidents within the smart city systems. Their purpose is to reduce the impact of incidents and to get the systems up and running again in case of hacks or system failures.

Conclusion

Smart cities touch many fields of their citizens’ everyday life. There are two major points from which issues can arise and which therefore need special handling: smart city services control a lot of safety-critical infrastructure, and they generate and use an enormous amount of personal data of their citizens. This results in several issues on different levels. To maintain the citizens’ trust and thus keep the smart city working well, it is necessary to solve the issues such cities encounter. Like the issues, the solutions must be implemented on different levels, too. In fact, the issues smart cities face and their appropriate solutions are very similar to the ones companies may be confronted with.

References

  1. Mohamad Amin Hasbini, Martin Tom-Petersen: The Smart Cities Internet of Access Control, opportunities and cybersecurity challenges. https://securingsmartcities.org/wp-content/uploads/2017/09/SSC-IAC.pdf (28.06.2018)
  2. Cesar Cerrudo, Mohamad Amin Hasbini, Brian Russell: Cyber Security Guidelines for Smart City Technology Adoption. https://securingsmartcities.org/wp-content/uploads/2016/03/Guidlines_for_Safe_Smart_Cities-1.pdf (28.06.2018)
  3. Mohamad Amin Hasbini, Cesar Cerrudo, David Jordan: The Smart City Department Cyber Security role and implications. https://securingsmartcities.org/wp-content/uploads/2016/03/SCD-guidlines.pdf (28.06.2018)
  4. Ernst & Young LLP: Cyber Security; A necessary pillar of Smart Cities. https://www.ey.com/Publication/vwLUAssets/ey-cyber-security-a-necessary-pillar-of-smart-cities/%24FILE/ey-cyber-security-a-necessary-pillar-of-smart-cities.pdf (29.06.2018)

Preserving Anonymity

Since the amount and value of data are constantly increasing, more and more data about each individual is collected and processed. Moreover, Facebook’s recent data leak involving Cambridge Analytica shows that collected data cannot be treated and stored with absolute security.

In 2014 and 2015, the Facebook platform allowed an app … that ended up harvesting 87m profiles of users around the world that was then used by Cambridge Analytica in the 2016 presidential campaign and in the referendum.

This is one of the reasons why we will take a look at our digital identity, how it can be linked to our real identity and how we can restrict that. Understanding what data is collected while surfing the web is a first step towards preserving anonymity.

Anonymity has a lot of advantages and disadvantages, and the discussion about whether it should be strengthened or not is roughly as old as the internet itself. We will concentrate on technical aspects here. But wait, what exactly does anonymity mean?

Definition

Anonymity is derived from the Greek word ἀνωνυμία, anonymia, meaning “without a name” or “namelessness”.

That is more of a historical point of view. Today we do not need a name to identify a user on the internet; just think of the IP address. Thus, a modern definition of anonymity is being unreachable: you can communicate, access or publish information and use services without revealing your identity.

Common identification methods

A very common method of identification on the internet is the user revealing his information on his own, such as registering on Facebook with the real name or purchasing clothes on Amazon.

There are several methods to track your usage behaviour, the websites you visit and the services you use, so that a clear picture of you can be created. These identification methods are split up into two categories: cross-domain tracking and cross-device tracking.

Figure 1: Common identification method categories.

Cross-domain tracking means tracking on one device, in one browser, across different websites and applications, whereas cross-device tracking relates to multiple devices and applications.

Cross-domain tracking techniques include:

  • Tracking cookies
  • Virtual fingerprints
    • DNS profile
    • Browser profile

Tracking cookies are sent in a header field of HTTP requests and responses, so that a server can recognise recurring users. These cookies are mostly not limited to one website but are shared across multiple domains through ad networks. Virtual fingerprints are individual characteristics which distinguish one user from another, for example the combination of browser type, operating system, installed fonts and plugins. On https://panopticlick.eff.org/ you can run a short test to see whether your browser is unique and safe against cross-domain tracking.
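
To illustrate how little it takes, the following minimal sketch combines a few such attributes into a stable identifier; the attribute values are invented, and real trackers use many more signals (canvas, audio, time zone, and so on).

```python
# Minimal sketch of deriving a browser fingerprint from "harmless" attributes.
import hashlib
import json

attributes = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/61.0",
    "screen": "1920x1080x24",
    "timezone_offset": -120,
    "fonts": ["DejaVu Sans", "Liberation Serif", "Noto Color Emoji"],
    "plugins": ["PDF Viewer"],
}

# The more unusual the combination, the more uniquely it identifies a browser,
# even without cookies or a login.
fingerprint = hashlib.sha256(
    json.dumps(attributes, sort_keys=True).encode()
).hexdigest()
print(fingerprint[:16])  # stays stable across visits as long as the setup is unchanged
```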

Cross-device tracking techniques:

  • User permission
  • Device fingerprint
  • IP address
  • Eavesdropping
  • Compromised system
  • Ultrasonic cross-device tracking

As mentioned above, obtaining the user’s permission is a great tool for cross-device tracking. If you are logged into your Google account on several devices, one can perfectly trace which services you used on which device. Another method, similar to cross-domain tracking with browser fingerprints, are device fingerprints. These consist of your phone type, operating system, WiFi and browser information and several other attributes. If no proxy or VPN is used, the IP address is also a good indicator. A user’s data traffic can be obtained through eavesdropping on the provider or proxy side and through compromised systems, consequently revealing the user’s identity. Ultrasonic cross-device tracking is a method where a device emits a unique “ultrasonic audio beacon” that can be picked up by a mobile application on your device containing a receiver. It does not require any connection, only access to the microphone.

With these techniques a lot of separate pieces of information can be collected, which, once connected, often lead to a user’s identity.

Defending your anonymity

Defending your anonymity is not an easy task, because anonymity is not a discrete condition but a spectrum. This section targets users with normal internet behaviour and should give a first insight into the tools and methods for preserving anonymity.

Figure 2: Anonymity spectrum.

An unwritten law on the internet is using pseudonyms instead of real names in online forums and games, so you don’t give out your identity to everyone. Exceptions are social networks like Facebook, where the real identity is a central part of the application.

To come back to the two mentioned categories, cross-domain tracking and cross-device tracking, there are tools and techniques to prevent this kind of tracking.

Tracking cookies can either be marked as unwanted through the Do Not Track (DNT) header or be completely disabled in most browsers’ settings. The user experience on websites can suffer from disabled tracking cookies, as the individualisation of websites relies on these cookies.
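
Setting the DNT header programmatically is trivial, as this minimal sketch with the third-party requests package shows; note that honouring the header is entirely up to the server, it is a preference rather than an enforcement mechanism.

```python
# Minimal sketch: send the Do Not Track header with an HTTP request.
# Assumes the "requests" package is installed (pip install requests).
import requests

response = requests.get("https://example.org", headers={"DNT": "1"})
print(response.status_code)
```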

Virtual fingerprints can be reduced by using a common, less individualised browser, e.g. Safari on an iPhone. Disabling JavaScript prevents the detection of installed fonts and plugins, but at the expense of usability. The Tor Browser includes a feature for reducing the browser fingerprint and is immune to most conventional browser fingerprinting techniques.

Prevention of cross-device tracking techniques includes:

  • Anonymizer
    • Proxy
    • VPN
    • Onion routing (Tor, I2P)
  • Anonymous remailers (Mixmaster) and filesharing (Freenet)
  • Friend-to-friend networks (Ripple)

With an anonymizer, the real IP address can be hidden through a connection to a broker, which forwards the requests and responses. There are different architectural types, including proxies, VPNs and onion routing. Anonymous remailers and filesharing tools are based on the peer-to-peer (P2P) principle, where systems are connected with each other without the need for a central instance. Friend-to-friend networks work in a similar way, but the peers are solely people you know.
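
As a minimal sketch, HTTP traffic can be routed through a locally running Tor client via its SOCKS proxy; this assumes the requests package is installed with SOCKS support (pip install requests[socks]) and that Tor is listening on port 9050. The check.torproject.org endpoint used below simply reports whether the request arrived via Tor.

```python
# Minimal sketch: route requests through a local Tor SOCKS proxy.
import requests

TOR_PROXY = "socks5h://127.0.0.1:9050"   # socks5h: DNS is resolved inside Tor too

session = requests.Session()
session.proxies = {"http": TOR_PROXY, "https": TOR_PROXY}

# The target now sees an exit-node address instead of the user's real IP.
print(session.get("https://check.torproject.org/api/ip").json())
```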

User permissions are critical for preserving anonymity and should be granted carefully, so that unnecessary information and rights are not handed out. As described above, there are several ways to protect your digital identity, but mostly at the expense of usability.

Further research

There are several unaddressed topics which are gaining relevance in the future and thus require further research:

  • How can the positive aspects of anonymity be provided without strengthening cyber crime?
  • How would the internet change if every user needed a clear identification for access?
  • How does the rising connectivity of all kinds of objects (IoT) influence anonymity?

References

[1] https://www.theguardian.com/technology/2018/jul/11/facebook-fined-for-data-breaches-in-cambridge-analytica-scandal

[2] https://www.dictionary.com/browse/anonymous

[3] https://thehackernews.com/2017/05/ultrasonic-tracking-signals-apps.html

[4] https://www.eff.org/issues/do-not-track

Differential Privacy – Privacy-preserving data analysis

It is widely known that tech companies like Apple or Google and their partners collect and analyse an increasing amount of information. This includes information about the person themselves, their interactions and their communication. It happens for seemingly good motives such as:

  • Recommendation services: e.g. word suggestions on smartphone keyboard
  • Customizing a product or service for the user
  • Creation of and targeting in personalised advertising
  • Further development of their product or service
  • Simply monetary reasons: selling customer data (which the customer sometimes does not know about)

In data collection processes like this, the clients’ or users’ privacy is often at risk. In this case privacy includes confidentiality and secrecy. Confidentiality means that no party or person other than the recipient of a sent message can read the message. In the special case of data collection: to achieve proper confidentiality, no third party, or ideally no one but the individual, not even the analysing company, should be able to read the individual’s information. Secrecy here means that individual information should remain known only to the user.

Databases may not be simply accessible to other users or potential attackers, but for the company collecting the data they probably are. Despite anonymisation or pseudonymisation, information can often be associated with one product, installation, session and/or user. In this way, fairly definite conclusions about a single individual can be drawn, even though the underlying information is nominally anonymised or not even available. Thus, individual users are identifiable and traceable, and their privacy is violated.

The approach of differential privacy aims specifically at solving this issue, protecting privacy and making information non-attributable to individuals. It tries to give users plausible deniability of the data they send as a right. The following article will give an overview of the approach of differential privacy and its effects on data collection.
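
To make the idea concrete before diving deeper, here is a minimal, illustrative sketch of randomized response, one of the simplest mechanisms in this space: each user randomises their answer locally, so any individual report is deniable, while the aggregate can still be estimated. The population and the 30 percent "true" rate below are invented for the example.

```python
# Minimal sketch of randomized response, a simple local differential privacy
# mechanism: each user flips coins before answering, so no individual answer
# can be attributed to them with certainty, yet the true rate is recoverable.
import random

def randomized_response(truth: bool) -> bool:
    if random.random() < 0.5:      # first coin: tell the truth ...
        return truth
    return random.random() < 0.5   # ... or answer with a random coin flip

def estimate_true_rate(reports: list[bool]) -> float:
    # Reported "yes" rate = 0.5 * true_rate + 0.25, solve for true_rate.
    reported_yes = sum(reports) / len(reports)
    return (reported_yes - 0.25) / 0.5

random.seed(42)
population = [random.random() < 0.3 for _ in range(100_000)]   # 30% true "yes"
reports = [randomized_response(answer) for answer in population]
print(round(estimate_true_rate(reports), 3))   # close to 0.30
```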


Beyond Corp – Google’s approach to enterprise security

What is Beyond Corp?

Beyond Corp is a concept which was developed and is used by Google and has by now been adopted by some other companies. The idea behind it was to get away from the intranet and its perimeter defense, where, once you breach the perimeter, you can access much of the enterprise data. With Beyond Corp, your enterprise applications are not hidden behind a perimeter defense but are instead exposed to the internet, only accessible via a centralized access proxy. With the deployment of the enterprise applications to the internet, Google establishes a zero trust policy: anyone who tries to access an enterprise application, no matter from which IP, has to have sufficient rights, determined through device and user data.

The trigger for this was “Operation Aurora” in 2009, an advanced persistent threat (APT) supposedly originating from China, in which data from Google and around 35 other companies in the USA was stolen. An APT is hard to detect through monitoring, because the many single steps are uncritical in themselves and hard to relate if the attackers take their time (we are talking about several weeks), yet they are easy to carry out once the intranet has been entered successfully. Google therefore decided to start the Beyond Corp project to find a more secure architecture for their enterprise.

 

Components of the Beyond Corp infrastructure

Securely identifying the device

Device Inventory Database & Device Identity

Beyond Corp uses the concept of “managed devices”. A managed device is managed, maintained and monitored by the enterprise IT. It has to have an entry in the device inventory database and receives a “managed device certificate” after fulfilling several security requirements. Only managed devices can access corporate applications. The certificate provides the device identity to the system; it is renewed periodically if the device keeps fulfilling the security requirements, or revoked if it does not. The certificate itself does not provide any access rights; it only serves as a key to a set of information about the device stored in the device inventory database, such as its patch and installation history.

 

Securely identifying the user

User and group database

The user and group database is closely connected to the HR management and stores data about (as the name suggests) users and groups. It provides processes for managing job categories, usernames and group memberships, needs to be updated whenever an employee leaves or a new one starts working at the enterprise, and supplies all required information about a user who wants to access enterprise resources to the Beyond Corp system.

Single sign on

The single sign-on system only works for specific resources and usually provides a short-lived token, depending on the trust tier the respective device and user are given.

 

Removing trust from the network

Deployment of an unprivileged network

A core concept of Beyond Corp is the zero trust policy and, with it, the deployment of an unprivileged network. This network is physically located inside the Google enterprise and connects managed devices to the internet and to limited infrastructure services (DNS, DHCP, NTP, …). Devices which are not managed devices or do not have a sufficient trust level are instead connected to a guest network. To authenticate to the unprivileged network, 802.1x authentication with several RADIUS servers is used. This setup can handle authentications dynamically instead of relying on a switch with a static configuration, and it can also tell the switch which network (VLAN) to connect the managed device to.

 

The Access Proxy

All Google enterprise applications are exposed to internal and external clients through the access proxy. The access proxy enforces encryption between client and application and serves as a reverse proxy. It has to be configured for each application which is exposed through it; the configuration contains rules which are used to determine whether a device is granted access to the application. The access proxy also provides common features like global reachability, load balancing, access control, application health checks and DDoS protection.

 

Inventory based access control

Trust Inference for devices and users

As mentioned before, Beyond Corp works with a so-called “trust level” or “trust tier”. This is the level of access a device or user is given; it is dynamically determined through monitoring and interrogation of multiple data sources, namely the databases (device inventory and user and group database) and the managed device certificate. The level of trust given to a user or device can change over time and is determined by the “trust inference”, which can also use data like the access location of the device and its patch level.

Pipeline into the Access-Control Engine

The pipeline component provides the data for the access control engine’s decision making. It accumulates data like the trust tier of user and device, and inventory details about the user, their group and the device the request is made with. It also holds a certificate whitelist.

The pipeline divides the data it receives into two categories:

Observed Data (programmatically generated):

Observed data is the data the Beyond Corp system generates when a device requests a resource behind the access proxy. It is data that changes periodically, like the operating system and its version, the installed software, and the time and result of the last security scan.

Prescribed Data (manually maintained by IT Operations):

Prescribed data is the data the Beyond Corp system can access at any time because it is stored in the databases. This data is maintained by the IT department. Prescribed data is, for example, the assigned owner of a device, the users and groups allowed to access a device, and explicit access to particular VLANs in the unprivileged network.

 

If there are differences in the data, the system tries to merge the records for the device, or creates a new record if there is none yet. Since merging records could mean that new data about the device is available, the trust inference has to evaluate the trust level again. After the information has been evaluated, overrides like whitelists are applied and the data is passed to the access control engine.

Access Control Engine

The access control engine is usually physically located in the access proxy and decides whether a device receives access to the requested resource or not. The decision is based on the trust tier defined by the trust inference, on the configuration of the access proxy for the respective application, which defines the rules that need to be fulfilled to receive access, and on data from the databases and the certificate, which needs to match the requirements of the access proxy configuration. It is also possible to grant partial access to an application (for example, only showing data lists and hiding more critical things like search boxes).
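
The following minimal Python sketch is not Google’s implementation; it only illustrates the decision flow described above, with hypothetical applications, rule sets and trust tiers standing in for the real access proxy configuration and pipeline data.

```python
# Minimal sketch of the described decision flow (illustration only):
# per-application rules are checked against the trust tier and the attributes
# the pipeline supplies for the requesting user and device.
from dataclasses import dataclass

@dataclass
class Request:
    user_group: str
    device_trust_tier: int      # e.g. 0 = untrusted ... 3 = fully trusted
    cert_whitelisted: bool

# Hypothetical access proxy configuration: one rule set per exposed application.
APP_RULES = {
    "expense-tool":  {"min_tier": 1, "groups": {"employees", "finance"}},
    "source-review": {"min_tier": 3, "groups": {"engineering"}},
}

def access_decision(app: str, req: Request) -> str:
    rules = APP_RULES.get(app)
    if rules is None or not req.cert_whitelisted:
        return "deny"
    if req.device_trust_tier < rules["min_tier"]:
        return "deny"
    if req.user_group not in rules["groups"]:
        return "deny"
    # A real engine could also grant partial access (e.g. read-only views).
    return "allow"

print(access_decision("source-review",
                      Request(user_group="engineering",
                              device_trust_tier=2,
                              cert_whitelisted=True)))   # deny: trust tier too low
```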

 

Challenges

The challenges of migrating from an intranet-based enterprise network to the Beyond Corp system, and of using it, differ in origin and effect. To start off: migrating from an intranet that has been developed and grown over years to Beyond Corp is most likely an immense challenge in itself. Furthermore, incorrect data about devices might enter the network and has to be filtered out by putting effort into maintaining the device inventory database.

Another challenge arises if device records get corrupted because components within a device change (like motherboards), which then has to be resolved yet again by a maintaining instance. Since the Beyond Corp system is in charge of a smart, real-time decision-making system that decides whether a device can access the required resource or not, it is important to have a disaster recovery plan, so that it is possible to regain control over the system in any case.

Another challenge of the Beyond Corp system is to provide a user experience that is user-friendly yet secure.

 

Advantages

There are several advantages the Beyond Corp system has over classic intranet systems. It is much harder for anyone to carry out identity phishing against the Beyond Corp system, because the authentication process is tied to the devices: it is not enough to phish a username/password combination, you would also have to get hold of a managed device with a valid certificate that is not marked as stolen in the system.

Since the Beyond Corp system is accessible via the internet, no VPN is needed for e.g. home office anymore, reducing the effort for the IT department, which no longer has to set up VPN connections to the intranet.

Several other advantages revolve around the fact that the Beyond Corp system does real-time trust evaluation and intelligent decision making and enforces security controls for users and devices. The last advantage I want to mention is that, since all enterprise applications are exposed via the access proxy, there is a single point to focus on securing. Of course, this might also be a disadvantage, since any mistake made in the configuration of the access proxy might affect all applications.

 

Conclusion

Beyond Corp has many advantages compared to the classic intranet approach and is, in a time of extensive usage of mobile devices and home office, the right step into the future. Still, I can see many enterprises never migrating to Beyond Corp because of old legacy applications that cannot really be imported into the new system. The migration is an immense task that is most likely harder the bigger the enterprise is. That aside, Beyond Corp solves many of the intranet’s security and usability problems and is not restricted to a physical space, which eases remote access because you no longer need to set up a VPN.

 

Possible Research Questions

How to secure the access proxy? / Access proxy as most vulnerable part of Beyond Corp?

As discussed in the advantages section, the access proxy is the core element to secure in the Beyond Corp system. So how can we secure it correctly? What should we focus on when setting it up, and which things might we not think about that could turn out to be crucial? Furthermore, how can we assure that the prescribed data (like the whitelist) has not been corrupted at some point?

How to hack Beyond Corp? / Social engineering vs. Beyond Corp?

It is a fact that no system (with everyday usage by persons) is perfectly secure and immune to being hacked, given enough skill and knowledge about the system. So how could you approach hacking a Beyond Corp system? Would it be enough to convince someone with your social engineering skills to hand over his or her device without flagging it as stolen for a longer period of time? Where is the most valuable enterprise data you want to access, and what are the steps to get there?

How does Beyond Corp work with IoT?

Since all devices accessing the enterprise applications have to provide a managed certificate, how could the Internet of Things work with a Beyond Corp system? Many IoT devices are not capable of providing a certificate, so would it be a correct approach to simply whitelist all of those devices? How would that impact enterprise security?

 

Sources

If you are interested in learning more about Beyond Corp, have a read through some of the sources. The Google documentation is well written, quite easy to understand and gives a good insight into their approach. The other sources I used are mostly blogs which describe the subject from a different perspective and help you not to get lost in Google’s shiny world of their own technology.

Google Sources:

Beyond Corp: The Access Proxy

Beyond Corp – A New Approach To Enterprise Security

Beyond Corp: Design To Deployment At Google

Migrating To Beyond Corp: Maintaining Productivity While Improving Security

Beyond Corp: The User Experience

Beyond Corp

 

Other Sources

thinkst thoughts – Farseeing: a look at Beyond Corp

The Newstack – Beyond Corp: How Google ditched VPNs for remote employee access

DZone – Fundamentals of the Beyond Corp ‘Zero Trust’ Security Framework

Disaster prevention in Germany

This article has two main topics: first, a short overview of the organisation of disaster prevention in Germany; second, the consequences of a nationwide blackout and how to prepare for this disaster.

1. Definitions

Today the word “disaster” is used often in everyday life, for example to describe the latest failed exam or the result of the German football team at the last World Cup. But from the point of view of, say, a member of the German Red Cross, a disaster is something different:

1.1 Disaster

The World Health Organisation [1] defines a disaster as follows:

Situation or event, which overwhelms local capacity, necessitating a request to national or international level for external assistance (CRED).

A disaster is usually classified into one of the following three categories:

Natural disaster

Whenever extreme natural events lead to a disaster, it is called a natural disaster. Examples of a natural disaster would be a flood, an earthquake or extreme heat with a following drought.

Technical disaster

A technical disaster occurs when a technical error leads to a disaster. Examples are all kinds of accidents, from train crashes to incidents in a nuclear power plant.

Man-Made disaster

When a disaster is caused by an intentional or unintentional human action, it is called a man-made disaster, e.g. a terrorist attack or a wildfire caused by a dumped cigarette.

1.2 Disaster prevention

Under German law, a disaster is a situation which the normal rescue forces cannot handle anymore, e.g. because of a high number of injured people. In this case special laws come into effect, and the responsible disaster authority coordinates the rescue operations.

In Germany, every community can declare a disaster to start this process. If more than one community is affected, the next-higher disaster authority takes over command, e.g. the government of the federal state the affected communities belong to. The German federal government takes command if a disaster threatens the whole country, or if the disaster response needs to be coordinated with other countries.

2. Organization

Germany has special structures to deal with disasters.

2.1 Civil protection

Disaster prevention is not organised centrally in Germany. Owing to the federal structure of Germany, every federal state has its own disaster prevention laws and forces. In special situations, however, the German federal government is responsible:

Responsibilities of the federal states:

  • Fire protection
  • Disasters threatening only the specific federal state
  • Police
  • Emergency care/supply (at federal state level)

Responsibilities of the German government:

  • Civil protection
  • Military defence
  • Disasters threatening several states
  • Federal police
  • Emergency care/supply (in case of international events such as war or pandemic)

2.2 Involved Organisations

A lot of different organisations are part of disaster prevention, and most of their members are volunteers. That is quite special, since no other country in Europe has as many volunteers involved in its disaster prevention strategy. Another particularity is that there are several organisations which sometimes have the same tasks. As a consequence, in a disastrous situation the coordination becomes more difficult, because there is not just one organisation for all tasks, but several organisations splitting the responsibilities. On the other hand, these different organisations are familiar with their local areas.

Some of the involved organizations are:

  • German Red Cross
  • Johannites
  • Malteser

And there is a third remarkable point: most of these organisations are registered associations, so they are partially funded by donations and are forbidden from generating economic profit.

2.3 Critical Infrastructure

Besides the disaster authorities and these aid organisations, there is another important member of the German disaster prevention concept: the providers of critical infrastructure like hospitals, water supply, power plants, but also petrol stations. All these facilities are important for people’s lives and need to keep functioning in case of a disaster. Because of this, the providers of such facilities have strict official instructions to secure their infrastructure and to prepare to run autonomously over a certain period.

3. Blackout

In the opinion of many experts, a nationwide blackout would be the worst-case scenario for Germany. The consequences would hit the people hard, especially the financially least secure. Because of the lack of energy, the following problems would occur and could become quite dangerous:

No communication

Without electricity, digital telephones would not work, and neither would the technology needed to route a call. Smartphones have batteries, but they will also lose their power after a certain time. The only possibility for an authority to inform the people would be via radio, but only via receivers with an external power source (battery, solar, etc.).

No public transport

Public transport like trains also needs electricity to run. Car drivers will get into trouble as well, because traffic lights, for example, also run on electricity. And only a few petrol stations are prepared to run self-sufficiently without electricity, so people will run out of petrol too.

No water

The public water system also needs electricity to pump the water into the households.

No food

Only a small number of households in Germany have stored enough durable food at home to sustain themselves over a longer time. Without electricity it is impossible to cool food, so a lot of it will become inedible after a short time.

No medical care

Hospitals are facilities which are prepared to handle a period without electricity, but their measures do not work over a longer time either. And hospitals will get into trouble without water, as mentioned before. Waste could also become a problem, because someone needs to carry it away; otherwise hospitals will run into big hygiene problems.

Hard to bring electricity back

It could take weeks to bring electricity back to every citizen in Germany. The problem is that most power plants need electricity to start; only a few of them are able to start by themselves. With these power plants, small “energy islands” would be created to bring the big power plants back to life. Because of the complexity of the German electricity system, this process could take a long time.

So the consequences of a blackout would be horrible for the people in Germany. But the biggest problem is that nobody is prepared for this scenario. A study by the Allianz insurance company shows that most people simply trust their local government and do not make any private preparations to become more independent of the rescue forces. These people forget that the rescue forces also need electricity to work over a longer period, so after a certain time the rescue forces themselves become victims of the blackout.

4. Self-Help

In 2010 the German government released new laws and recommendations for the German people. These also contained some hints on how everybody can prepare better for upcoming disasters. The goal is to be independent of the public infrastructure for two weeks, until help arrives.

4.1 Food

Everybody should store the following supplies:

  • Water: 28 litres
  • Corn, bread, potatoes, noodles, rice: 4.9 kg
  • Vegetables, legumes: 5.6 kg
  • Fruits, nuts: 3.6 kg
  • Milk, milk products: 3.7 kg
  • Fish, meat, eggs: 2.1 kg
  • Fat, oil: 0.5 kg

4.2 Equipment

Besides supplies, everybody should have the following things to be prepared for a disaster:

  • Radio (battery driven)
  • Important medicine
  • Hygiene products
  • Candles, flashlight
  • Gas cooker
  • Important documents
  • Possibility to heat
  • Extinguisher
  • Respiratory protection

4.3 Knowledge

Being prepared with food and equipment is quite important, but knowledge about the correct behaviour in a disaster situation can also save lives:

  • First-Aid
  • Behaviour in an extreme-situation
  • Basic survival skills

5. Discussion

What are the consequences for disaster prevention if the number of volunteers continues to decline?

As mentioned before, a huge part of disaster prevention in Germany is done by volunteers. In recent years the number of people in Germany who are willing to spend their free time in an association has declined. But without these people it is not possible to protect the population in Germany in case of a disaster. So new ways must be found to motivate people to participate in disaster prevention, or the number of paid staff in the rescue organisations must rise. But who will pay for this?

How can people be made more aware of possible disasters and of self-help in case of a disaster?

A study by Allianz shows that most people in Germany do not care about disasters, or rather trust the German government to protect them from the consequences. The reasons for this behaviour vary; perhaps it is the fact that most people only know disasters from TV. But even if disaster prevention works, the rescue forces need some time to reach everybody, and during this time every citizen must cope alone. So, as described in the chapter “Self-Help”, people should keep a certain amount of water, food and other equipment to supply themselves. But as the question implies, most people do not prepare for this situation.

Is the disaster prevention organisation in Germany ready for a big disaster?

The organisation of disaster prevention in Germany is quite complex, so it is questionable whether this organisation, spread across 16 federal states, is really prepared for a big disaster. The Allianz insurance company is of the opinion that there is a fixed number of injured people which the system can still handle, but this is just the result of a study. Luckily there has never been a disaster in Germany big enough to put the system to the test.

Does disaster prevention still work when, in many important facilities such as hospitals, jobs have been cut to save money?

Besides the mostly volunteer-run rescue organisations, there are other facilities in Germany, such as hospitals, which also struggle with low staffing levels. To save money, hospitals, for example, operate with only the minimum number of people, especially at night and on weekends. This raises the question of whether these hospitals can handle a sudden large number of injured people in case of a disaster.

Is privatization harmful to the supply of the population in case of a disaster?

To raise money, several cities and municipalities have sold their critical infrastructure to private companies. These companies want to earn money with this infrastructure and are therefore also interested in reducing its operating costs. This could lead to a reduced operational capability of the infrastructure. There are indeed rules and laws that providers of critical infrastructure have to follow, but is this enough?

Are digitalisation and the associated centralisation a threat to disaster prevention in Germany?

To optimise operations, providers have started to digitalise and centralise some infrastructures in Germany. This leads to a higher risk of hacking attacks, because digitalisation usually means a connection to the internet. Digitalisation often also means centralisation, which creates a few central nodes in an infrastructure. These nodes are a more worthwhile target for attacks and are also more at risk in case of an accident or human error.

Sources

[1] http://www.who.int/hac/about/definitions/en/ (4.07.18)

[2] Katastrophenschutz auf dem Prüfstand (German Source): https://www.bbk.bund.de/SharedDocs/Downloads/BBK/DE/FIS/DownloadsInformationsangebote/DownloadsKritischeInfrastrukturen/DownloadsProjekte/Katastrophenschutz%20Studie.pdf?__blob=publicationFile (4.07.18)

[3] Experten: Flächendeckender Stromausfall wäre nationale Katastrophe mit vielen Toten (German Source): https://www.focus.de/politik/deutschland/zivilverteidigungskonzept-experten-flaechendeckender-stromausfall-waere-nationale-katastrophe-mit-vielen-toten_id_5856252.html (07.07.18)

[4] So schlecht ist unser Stromnetz vorbereitet (German Source): https://www.focus.de/immobilien/energiesparen/regenerative_energie/blackout-so-schlecht-ist-..unser-stromnetz-vorbereitet_id_7821522.html (07.07.18)

[5] Katastrophenmanagement: Katastrophenschutz in Deutschland (German Source): https://www.youtube.com/watch?v=glrX-t79gm8 (07.07.18)

[6] Blackout – Deutschland ohne Strom (German Source): https://www.zdf.de/dokumentation/zdfinfo-doku/blackout-deutschland-ohne-strom-108.html (23.07.18)

Server “less” Computing vs. Security

The term ‘serverless’ suggests systems with no back-end or that no servers are used. This terminology is very misleading, because serverless architecture certainly includes a back-end. The difference is that the users or programmers who are supposed to develop an application no longer have to deal with the servers.

Serverless computing is a form of cloud computing. There are different levels of cloud computing; the highest level is serverless computing, also called Function as a Service (FaaS). In FaaS, everything below the business logic is handled by the platform: servers, network, database, possibly virtualization layers, operating system, runtime environment, data and the application itself. Only the business logic, in the form of functions, has to be implemented by the user. The difference to traditional computing models is that the users or programmers who develop an application no longer have to deal with the servers. It does not matter to them what happens on the lower OS levels or how the servers are managed or protected. Likewise, the user does not have to worry about aspects such as scalability or questions about specific hardware or middleware services; the underlying environment takes care of this.

The typical structure of a serverless computing architecture is shown in the diagram below:

A function is triggered by an event, starts in a specific context, and calls business logic or other back-end services; its result is then returned. Functions are invoked either synchronously via the classic request/response model or asynchronously via events. To avoid tight coupling between the individual functions and to optimize resource usage at runtime, the asynchronous variant should be preferred.

 

How does it help security?

1. Management of OS patches is not necessary

With FaaS, the underlying platform handles the servers for the user, relieving them of server deployment, management and monitoring.

FaaS also assumes responsibility for “patching” these servers, which means updating the operating system and its dependencies to secure versions when they are affected by newly disclosed security vulnerabilities. Known vulnerabilities in unpatched servers and applications are the main cause of system exploitation.

‘Serverless’ therefore shifts the risk of the unpatched server from the user to the “professionals” who operate the platform.

2. Short-lived servers are less at risk of external interference

An important constraint is that serverless functions are stateless. In a FaaS environment, the user does not know, and does not have to care, which server is responsible for executing a function. The platform provisions servers and shuts them down as it sees fit.

Serverless does not give attackers the luxury of time. Because the machines are constantly recycled, an attacker has to compromise them again and again, each time facing the risk of failure or exposure. Stateless and short-lived systems, which include all FaaS functions, are therefore inherently less exposed to external interference.

3. Denial-of-Service resistance through extreme elasticity

FaaS provisions functions immediately and seamlessly. This automated setup leads to extreme elasticity, without the user having to run or pre-provision any servers.

This scalability also protects against denial of service, because attackers often try to take systems down by submitting a large number of compute- or memory-intensive operations that exhaust server capacity and prevent legitimate users from using the application.

More requests, whether legitimate or malicious, simply lead the platform to provision more servers ad hoc, which ultimately counters the problem of denial of service and server overload.

 

How does it hurt security?

1. Stronger dependence on external services

Serverless apps are practically never built on FaaS alone. They are usually based on a network of services connected by events and data. While some of these services are the user’s own functions, many are operated by third parties. In fact, the small size and statelessness of functions lead to a significant increase in the use of third-party services, both cloud platform services and external ones.

Any third-party service is a potential compromise point. These services receive and deliver data, influence workflows, and provide extensive and complex input into our system. If such a service proves to be malicious, it can often cause significant damage.

2. Each function expands the attack surface

While functions are technically independent, most are called in a sequence. As a result, many functions are written on the assumption that another function has run before them and has sanitized the data in some way. In other words, functions start to trust their input, believing that it comes from a trusted source.

This assumption makes the system extremely vulnerable. Firstly, such functions can be invoked directly by an attacker. Secondly, a function may later be added to a new flow that does not sanitize the input. And thirdly, an attacker can compromise one of the other functions and then have easy and direct access to a poorly defended peer.

3. Simple deployment leads to an explosion of functions

Deploying a function is very simple. It is automated and costs nothing as long as the function is not heavily used. At such a low cost, we no longer ask where we should use functions, but rather why we shouldn’t. As a result, many functions get deployed, even if they are rarely used. And once deployed, functions are very hard to remove, because you never know what depends on their existence.

Combined with excessive privileges, which are similarly difficult to scale back, this results in an explosion of hard-to-remove, overly powerful functions that ultimately provide attackers with a rich and ever-growing attack surface.

 

Summary

Like any system, serverless computing has its strengths and weaknesses. On the one hand, it makes the user’s work easier through automatic scaling, extreme elasticity and automatic management of OS patches. The user does not have to worry about the servers, which is very convenient.

On the other hand, like any system, serverless computing has security weaknesses that can lead to anything from minor to severe damage. Above all, the large and ever-growing attack surface of functions and third-party services leaves the application vulnerable to external interference.

Users should therefore be aware that security issues are not automatically taken care of by using serverless computing. Serverless computing is certainly not suitable when fine-grained monitoring and individual control of the servers is required, but for simple control and management of the servers it is a very good solution. Users should simply keep in mind that serverless computing is not necessarily the most secure option.

 

Research Questions

  1. How high is the elasticity?
    1. Does it really have endlessly expandable capacity?
  2. Are data at risk during transmission?
    1. Can they be safely protected?
  3. Are granular permissions even being managed for hundreds or thousands of functions?
    1. Is it feasible?
  4. Are users at all concerned with security issues when FaaS automatically handles server-level security concerns?

Web Performance Optimization for Continuous Deployment – Move fast and don’t lose performance

The performance of websites today is a decisive factor in how many users visit them and thus how much money can be earned from them. This effect is further amplified by the widespread use of mobile devices and the often limited speed of mobile Internet connections.
To counteract the development of heavyweight websites, web performance optimizations should be integrated into the development process as early as possible.
As part of this blog post I want to address this topic in the context of Continuous Deployment using the following sections.

Motivation


To avoid starting a continuous deployment environment from scratch, I used a previous project as a basis.
The Debts² project is a distributed application used to jointly manage expenses in groups. The application consists of the following three components.

  • A backend server
  • A native Android app
  • A simple admin frontend

The backend was hosted on a server at HdM. On this server, both the admin frontend and Jenkins, which serves as the continuous deployment environment, could be hosted. Within this environment the native Android app was also built. Additional resources related to Debts² can be found here.
So Jenkins and the admin frontend formed the starting point for integrating web performance optimizations into a continuous deployment environment.
To enable a before-and-after comparison, a snapshot of the admin frontend and the Jenkins pipeline was taken first. In order to measure the performance of the admin frontend, a lighthouse report was generated. The following figures represent the initial state of the Jenkins pipeline and the performance of the admin frontend.

Initial pipeline snapshot
Figure 1: Initial pipeline snapshot
Initial lighthouse report
Figure 2: Initial lighthouse report

Based on this initial assessment, I formulated the following goals.

  • Extend the Jenkins pipeline to automatically measure the performance of the admin frontend and modify its pipeline status accordingly
  • Make use of the measurements to optimize the performance of the admin frontend

In the context of this blog entry, the former is the main goal. The second serves to illustrate what the benefits of the first goal could look like, where many websites still have optimization potential, and what actions are necessary for this.

Implementation


All source code is available at GitHub. A presentation on this topic is available here. Code concerning this blog post can be found in the following subdirectories of the mono repository.

The following course of action is based on the self-defined goals described above.

Goal 1: Extend the Jenkins pipeline

In order to achieve the first goal, I decided to integrate the generation of a lighthouse report into the pipeline using Docker and Docker Compose.
After some research, I found several possible ways to proceed.

  1. Making use of this web service to generate the lighthouse report while providing the URL to be tested as a URL parameter.
  2. Running a Docker container on the Jenkins server to generate the report locally.
  3. Running my own web service on another server.

Since option one requires minimal effort, I kept it as my fallback solution and decided to try option two first. It took several attempts before I arrived at a working solution.
First, I wanted to make use of a preexisting Docker image. After trying several images without success, I was able to run the lighthouse-ci image on my local system. Next, it was supposed to run on the Jenkins server. Unfortunately, the image could not be executed on the server due to the missing UI, and the container could not be run with its settings for the headless mode of Google Chrome either.
As a further attempt, I installed lighthouse and Chrome directly on the server without Docker to reduce complexity. However, this attempt failed because lighthouse waited for Chrome on a specific port, even though Chrome had already started in headless mode. In hindsight, knowing at that point that Chrome has to be started with remote debugging would have saved a lot of effort.
For the last attempt, I obtained a remote server with a UI in order to have all the prerequisites in place.
Unfortunately, when starting Google Chrome I received a misleading error stating that no display connection could be established. To overcome this difficulty, I took a closer look at Chrome’s headless mode and how it interacts with lighthouse. I learned that Chrome must be started in headless mode with remote debugging enabled on a specific port in order to work with lighthouse.
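As an illustration, a minimal sketch of such a step in a Jenkinsfile could look like the following (the port 9222, the tested URL and the output path are illustrative assumptions, not values taken from the project):

stage('Lighthouse report') {
    steps {
        sh '''
            # start Chrome in headless mode with remote debugging on a fixed port
            google-chrome --headless --disable-gpu --remote-debugging-port=9222 &
            # give Chrome a moment to start up
            sleep 3
            # let lighthouse reuse the already running Chrome instance via --port
            lighthouse http://localhost:8080 --port=9222 --output html --output-path ./lighthouse-report.html
        '''
    }
}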
Based on this insight I was first able to create a working local solution. Subsequently, I managed to build my own Docker image based on chrome-headless-trunk by manually installing Node.js and the lighthouse npm package. The resulting Dockerfile is accessible here.
Initially, I had assumed that the generation of the lighthouse report would be integrated into the pipeline via a separate build stage in the Jenkinsfile. However, by using this Dockerfile in combination with Docker Compose health checks, I was able to fit the build and run of my custom lighthouse container into the execution order of the build and run of the actual distributed application. This only requires a few lines in the Docker Compose file, and no separate build stage is needed.
Next, to make the lighthouse report available in Jenkins, I used the Jenkins HTML Publisher Plugin to publish the HTML version of the report.
For this purpose I created a new stage in the Jenkinsfile, sketched below.
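A minimal sketch of this publishing stage, assuming the report is written to a directory called lighthouse (the directory, file and report names are illustrative, not taken from the project):

stage('Publish lighthouse report') {
    steps {
        // publishHTML is provided by the HTML Publisher Plugin
        publishHTML([allowMissing: false,
                     alwaysLinkToLastBuild: true,
                     keepAll: true,
                     reportDir: 'lighthouse',
                     reportFiles: 'lighthouse-report.html',
                     reportName: 'Lighthouse Report'])
    }
}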
To verify the performance of the website under load, an additional stage was appended. By means of Taurus, the “Automation-friendly framework for Continuous Testing”, load tests can be specified declaratively in a YAML file. Simply put, this file consists of the following five sections.

  • Scenarios
  • Execution
  • Reporting
  • Provisioning
  • Modules

The scenarios section describes the actual HTTP requests to be executed.
The execution section describes load test parameters such as concurrency, locations, ramp-up and hold-for time. Its scenario subsection references the scenario to be run.
The reporting section consists of modules that aggregate the results of the executors and feed them into a report. Using the passfail module, rules for certain metrics such as the average response time or latency can be defined to make the test either succeed or fail.
To run the load tests in the cloud, the provisioning section must be set accordingly. By default, Taurus uses local provisioning.
BlazeMeter’s free plan allows only one location for cloud testing, so be sure to configure only a single location when enabling cloud provisioning.
In the modules section, you can provide the credentials and further settings to connect to BlazeMeter or other cloud testing platforms. In addition, data worth protecting can be defined in the modules section of the current user’s .bzt-rc file. A more comprehensive breakdown of the YAML file can be found here.

In conjunction with Taurus, I used the testing platform BlazeMeter for load test execution. In order to connect the Jenkins server to BlazeMeter, the Taurus command line tool bzt has to be installed on the Jenkins host machine. Make sure to install version 1.12.1 to avoid the “AssertionError: monitoring” bug for cloud tests. A well-structured tutorial with detailed information about the installation is available here. Next, bzt has to connect to BlazeMeter. For this, an API key and API secret have to be generated in BlazeMeter’s account settings. To avoid exposing the credentials, it is recommended to write these into the .bzt-rc file in the home directory of the Jenkins user. Afterwards, everything is ready for use in the Jenkinsfile. To make the report accessible on BlazeMeter, the ‘report’ option has to be applied. To better distinguish the load tests, the Jenkins build number can also be integrated into the test name. During each build, a link to the newly created load test report on BlazeMeter is now displayed in the console. The following picture shows an overview of a sample report of a cloud test; a sketch of the corresponding load test stage follows below it.

Overview Cloud Test on BlazeMeter
Figure 3: Overview Cloud Test on BlazeMeter
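To give an idea of how this looks in the Jenkinsfile, here is a minimal sketch of the load test stage. The YAML file name and the report name are assumptions for illustration; -report enables the BlazeMeter reporter, and the -o override is taken here as the way to inject the build number into the report name:

stage('Load test') {
    steps {
        // run the Taurus YAML config and send the results to BlazeMeter (-report);
        // including the build number in the report name makes the runs easier to tell apart
        sh "bzt loadtest.yml -report -o modules.blazemeter.report-name=loadtest-${env.BUILD_NUMBER}"
    }
}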

In order to influence the pipeline status according to the results of the lighthouse report, I initially created a script section in the Jenkinsfile. Using the JSON version of the lighthouse report, certain values could be extracted. Analogous to the Taurus passfail module, rules could then be formulated based on these values, and depending on whether these rules were met, the pipeline status was set. Although this solution worked well, the Jenkinsfile quickly became confusing because declarative code was mixed with imperative code. A simplified version of this approach is sketched below.
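The following sketch assumes that readJSON from the Pipeline Utility Steps plugin is available; the file name, the JSON path and the threshold of 60 simply mirror the plugin example further below:

stage('Check lighthouse score') {
    steps {
        script {
            // parse the JSON version of the lighthouse report
            def report = readJSON file: 'report.json'
            // same nested property as in the plugin example: reportCategories[0].score
            def performanceScore = report.reportCategories[0].score
            if (performanceScore < 60) {
                // rule violated: mark the build as unstable instead of failing it
                currentBuild.result = 'UNSTABLE'
            }
        }
    }
}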

To counteract this problem, I decided to develop my own Jenkins plugin. The starting point for me was this article in the Jenkins wiki. Additionally, this link was especially helpful for implementing the pipeline support. Instead of the empty-plugin archetype, I used the hello-world-plugin archetype in order to better understand its structure.
The goal of the Jenkins plugin is very straightforward. As input it receives a JSON lighthouse report, a path to a nested property inside the JSON file, a limit value, a type of comparison and a pipeline status. The nested value is looked up in the JSON file and compared with the limit value. When the comparison holds, the pipeline status is set to success; otherwise the predefined pipeline status is applied. To ensure correct execution, I defined some unit tests. The most challenging part was implementing the recursive descent to the nested property along its path. A working example is shown in the following code snippet.

step([$class: 'LighthousePlugin',
      filepath: 'report.json',
      // Performance score
      path: 'reportCategories/Array/0/Object/score/Double',
      action: 'lt',
      value: '60',
      failStatus: 'UNSTABLE'])

For my initial concept I wanted to pass an array of input data so that multiple rules could be checked in sequence. However, due to sparse documentation I wasn’t able to integrate an extendable list into the Jelly UI. Therefore, I simplified the concept to validate only one rule per plugin call. The plugin can also be integrated into Freestyle projects, where it is configured and executed as a build step. The following figure illustrates the configuration from above as a build step in the UI.

Lighthouse plugin UI configuration
Figure 4: Lighthouse plugin UI configuration

The possible values for the data fields, as well as the whole plugin source code, can be examined in more detail here.

Goal 2: Optimize the performance

Now that my first goal was achieved, I could focus on optimizing the admin frontend. Based on the results of the initial lighthouse report, a number of things were in need of improvement. Below, I have listed a subset of the optimizations that were most important to me.

  1. Image and video compression
  2. Gzip compression
  3. Uglify/Minify source files
  4. Unused CSS
  5. Critical CSS path
  6. Cache Control
  7. SSL certificates

Due to the simple use case of the admin frontend, there is no image or video content, so techniques for image and video compression could not be applied.
The gzip compression in nginx, on the other hand, could be activated. For this purpose, a new file was simply added to the nginx configuration that enables compression for the relevant MIME types.
Next, the source code files were minified and uglified via webpack. Unfortunately, an older webpack version and a lot of plugins were in use, so this task, which previously seemed so simple, became more difficult than expected. In order to minify HTML files, the minify property of the HtmlWebpackPlugin had to be set. Minifying CSS files was configured via a style-loader. Uglifying JS files additionally required a plugin called UglifyjsWebpackPlugin.
Removing unused CSS can be very performance-enhancing, but it is also risky, as code that is initially invisible can be removed, especially in single page applications. To remove unnecessary CSS automatically, there are several free websites such as jitbit.
It requires only a URL as input and starts analyzing the given website. The result is a list of unused CSS selectors. However, after removing these selectors you should check manually whether the styles of the sub-pages of your single page application still work. Of course, this is not a viable solution for larger websites, as the risk of accidentally removing code that is needed later is too high. In such cases it is best to eliminate unnecessary code during development, e.g. through code reviews, or to fall back on higher-quality, possibly proprietary tools.
In order to achieve a fast first meaningful paint, a critical CSS path is crucial. Online tools such as the ones from sitelocity.com or jonassebastianohlsson.com are available for this purpose. I tried both and noticed that the generated internal stylesheets were identical. In addition, sitelocity.com advises loading the remaining CSS files asynchronously using JavaScript, which prevents a CSS file in the head tag from being loaded in a blocking manner.
Caching data can be very performance-enhancing, but it can also break the functionality of the website if used incorrectly, so it should be applied carefully. Adding a cache control header is fairly easy with the ngx_http_headers_module. An expiration date can be assigned to each MIME type or location by means of a simple key-value mapping.
Finally, I intended to switch the individual system components to HTTPS in order to comply with today’s minimum security requirements, using self-signed certificates. With an intermediate Docker container, a new certificate could be generated in the multi-stage build and integrated into nginx, for example. In the backend, including a certificate was harder, because the certificate had to be in a specific format due to the older Node.js version 6. Unfortunately, after reaching a working state, I had to realize that the lighthouse report could no longer be created. The reason for this is the self-signed certificate, whose error messages can usually be suppressed via the flag --ignore-certificate-errors. Unfortunately, this flag has no effect in conjunction with the Chrome headless mode, as stated here. This is why the final version of the source code does not include SSL support. Alternatively, this problem could have been solved with a properly signed certificate, but due to time constraints this solution could not be pursued further.

Results


Since the performance and condition of the pipeline were measured initially, it is now possible to measure again and compare the results.
First, we look at the lighthouse reports. The following figure depicts the final result of the lighthouse report, which is fully available here.

Optimized lighthouse report
Figure 5: Optimized lighthouse report

Compared to the initial report, the optimizations described in the previous section achieved an improvement of 58 points in Performance, 10 points in PWA, and 3 points in Best Practices.
These reports were created on a local computer for better comparability; if we look at the lighthouse report generated by Jenkins, we notice that its performance rating is significantly lower.
The explanation for this is the performance gap between the machines, especially since the Jenkins server is under greater load from running the backend, the frontend and the database of the application.
To avoid this problem, the generation of the lighthouse report could be moved to a dedicated server so that it produces consistent results.
Nevertheless, I find the performance increase achieved with the optimizations more than sufficient. Without integrating the performance measurements into the distributed application, and thus into the continuous deployment pipeline, I would not have received regular feedback to keep improving it.
Second, let us take a closer look at the Jenkins pipeline. The following image illustrates its final state.

Final Jenkins snapshot
Figure 6: Final Jenkins snapshot

In contrast to the initial snapshot, the new build stages stand out instantly. They ensure that the lighthouse report is generated and the load test is executed, but they also extend the execution time considerably. The primary cause of this increase is the load test. Load tests are integration tests and therefore should not be executed on every build. To shorten the execution time, the load test could be run less frequently, controlled by a condition that is set manually, as sketched below.
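A minimal sketch of such a condition in a declarative Jenkinsfile, assuming a manually set boolean build parameter (the parameter name RUN_LOAD_TEST and the file name are illustrative):

// declared at the top level of the declarative pipeline
parameters {
    booleanParam(name: 'RUN_LOAD_TEST', defaultValue: false, description: 'Execute the load test stage in this build')
}

// inside the stages block
stage('Load test') {
    // skip the expensive load test unless the parameter was set manually
    when {
        expression { return params.RUN_LOAD_TEST }
    }
    steps {
        sh 'bzt loadtest.yml -report'
    }
}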
For each build that executes a load test, a new record is created in a performance trend via the BlazeMeter Jenkins plugin, allowing you to view the performance curve of the load tests over the course of the development process. The following figure shows the performance trend of my Jenkins server as an example.

Jenkins BlazeMeter Performance Trend
Figure 7: Jenkins BlazeMeter Performance Trend

On the other hand, the lighthouse report generation is lightweight and has a rapid execution time. By means of configuration, the scope of the audits can be limited, so that the generation proceeds even faster.
The lighthouse plugin I created is extremely compact. Since it is largely based on the hello-world-plugin archetype, it probably still contains a few unnecessary files that could be refactored away. An additional improvement would be to support arrays of rules; this way, the JSON report file would only have to be parsed once. More detailed information on the Jenkins pipeline can be found here.

Conclusion


So far, we have focused on the motivation, realization and outcome of web performance optimization for continuous deployment environments using a concrete scenario. I would like to conclude this post by identifying the challenges you are most likely to encounter if you pursue similar goals, and by highlighting ways to address them.

Integrating the lighthouse report generation into a Docker container, or running it on the host machine of the Jenkins server, proved to be complicated due to the missing GPU and UI, especially as there was little information available about the virtualization software of the virtual machine, so possible conflicts with lighthouse’s requirements could not be ruled out. My fundamental recommendation is therefore to familiarize yourself with the target environment and the requirements of lighthouse or comparable software. Keep the target environment as lean as possible, so that errors are easy to reproduce and fewer conflicts with other software can arise. Through this process I dealt more closely with the individual components of lighthouse and gradually worked out my own solution. Although this took longer, I gained a more comprehensive understanding of how lighthouse works than I would have by just using a ready-made Docker container.

During the development of the Jenkins plugin, the biggest challenge was getting an overview of the individual components and their interaction. In particular, the requirement that the plugin be compatible with the pipeline plugin made it harder to find samples and documentation. The documentation for the Jelly UI components regarding data binding was also not intuitive, which is why I have not yet been able to implement support for arrays of rules.
As a tip, I can only refer to the links mentioned above, where you will find a detailed description of how to develop a Jenkins plugin with pipeline support.

While optimizing the admin frontend, the workflow with webpack was challenging. Since an older version and many dependencies were in use, it was harder to get an overview. In addition, I noticed that certain dependencies offer identical functionality but occasionally conflict with each other. As a result, integrating the optimizations took significantly more time. To avoid this problem in the future, it would be advisable to use a current version and to keep the number of dependencies low during development.

All in all, automating web performance measurements is an extensive and demanding task. I am aware that Debts² was not an ideal starting point, since the distributed application already existed before the pipeline was extended with the web performance measurements.
Nevertheless, I was able to show that it is worthwhile to establish this even in applications that are already quite mature.
What I find particularly fascinating is the idea that, by using a solution like the one described in this post, non-functional requirements regarding web performance can be stored directly in the pipeline in the form of rules. This way they are continuously validated and made visible to the development team.

Finally, I would like to emphasize that measuring web performance as part of a continuous deployment system only points out problems or best practices by itself. The value of such a solution depends heavily on the resulting business value and on the commitment and acceptance of the development team.

Quantum and Post-Quantum Cryptography

BB84 Protocol key generation

In a world where political activists and dissidents are persecuted by authoritarian governments, strong cryptography is more necessary than ever. But the general public benefits from it as well: identity theft, banking fraud and cyber bullying can happen to anybody. The most effective protection is not to make sensitive material available to anybody. Unfortunately, some people have an “I have nothing to hide” mentality. But would you pin your opened mail to your garden fence? Even though most people are not doing anything illegal, some information is better kept private to stay safe from the aforementioned crimes.

Continue reading