GDPR and Information Security: A practical guide for Startups

Let me start with a story. My first contact with GDPR (General Data Protection Regulation) and the topic of information security came during my bachelor's degree, through an app project. We had set ourselves the goal of uploading the app to the Google Play Store by the end of the semester. That meant we were inevitably confronted with data protection and privacy, a topic that was still relatively fresh at the time.
Since we had no previous experience or background knowledge in this area, the available information and the very vague wording surrounding GDPR rather intimidated us. The intrinsic desire to take care of personal and sensitive data was largely absent. It was overshadowed by the fear of doing something wrong and facing legal consequences. We turned to professors and lawyers at the university, who were (in theory) responsible for GDPR and information security, but the responses resembled a game of “hot potato”. Everyone we approached tossed the hot potato (aka GDPR) to the next person, saying something along the lines of “Ah yes, I think Mr. X would be more suitable for that”. In the end, we more or less patched together a privacy statement and implemented some protective measures. That was okay for the time being, but not particularly good or worthwhile. Overall, it left a rather unsatisfactory feeling and aftertaste.

Two things rekindled my interest in the topic: founding my own Startup and attending the lecture “Secure Systems” during my master's. I decided to take matters into my own hands and shine a new light on this rather dry and unattractive, but also very important and meaningful subject.

Therefore, with this blog entry, I hope to provide you with a more practical and satisfying approach to information security and GDPR. I will answer questions like “Why should I even strive for GDPR compliance or security in general?” and “What can I, as a programmer, do to achieve information security?”. Furthermore, I will explain terms like Privacy By Design, Privacy By Default and Security By Design. This guide is addressed to everyone who wants a better general understanding of this topic. It is also meant for startups, smaller companies and freelancers who are looking for specific information on implementing it in their own applications, with a focus on inexpensive but effective measures. However, it should not be considered a complete and sufficient solution for information security.

My forecast for the future is: In the future we will talk much more about information security instead of talking about IT security and data protection / GDPR separately!
Eric Weis (CISO and auditor of ISO/IEC 27001)

With this fitting quote in the back of our minds, let’s dive right into it. 🤿

Why should you even strive for GDPR compliance?

If your organisation violates data privacy rules, the fine depends on the severity of the violation. For most violations it can reach up to €10 million or 2% of the total worldwide annual turnover of the previous financial year. For the more severe violations it can reach up to €20 million or 4%. In each case, whichever amount is higher applies (GDPR Article 83, paragraphs 4 and 5).
For example, Google (Sweden) was fined €7 million in March 2020 for failing to remove personal information from various individuals who had requested exclusion from Google search results.
An Italian telephone and network operator (TIM SpA) was hit even harder, with a fine of €27 million in January 2020. The reason was several legal violations in marketing and advertising campaigns. Unsolicited calls were made and people were entered into competitions without consent. In one case, a person was called 155 times after asking to be excluded from calls.
Even in law-abiding Germany, there was a high penalty in December 2019. 1&1 Telekom was fined €9 million because callers to its customer service could obtain extensive information about a customer simply by giving that customer's name and date of birth.

The vagueness and individuality of GDPR

Reading statements and guidelines of the GDPR often leaves you with more questions and uncertainties than before. Take the following excerpt from Article 32, which covers the security of processing:

Art. 32 (1): “Taking into account the state of the art, the costs of implementation and the nature, scope, context and purposes of processing as well as the risk of varying likelihood and severity for the rights and freedoms of natural persons, the controller and the processor shall implement appropriate technical and organizational measures to ensure a level of security appropriate to the risk…”.

Many of the wordings, components and safety measures in this sentence vary according to context and organization. The bigger picture only starts to form once you break down the separate parts and enrich them with background knowledge from information security. With that being said, here's my breakdown of the separate parts:

“…the state of the art…”

This mainly refers to technology, and it makes sense: the safety and security measures you implement today might be outdated in three to five years. Technology and software evolve at a fast pace, so you should review your infrastructure and design choices regularly.

“…nature, scope, context and purposes of processing…”

The required safety measures vary a lot depending on which field your organization operates in and what kind of sensitive information you're processing. How and where data is processed also matters a lot. Do you handle all data completely on your own, or are third parties involved? Do you process highly sensitive information like racial and ethnic origin, political opinions, religious beliefs, health data, or information regarding the sex life or sexual orientation of your users?

“…risk of varying likelihood and severity for the rights and freedoms of natural persons…”
“…appropriate technical and organizational measures…”
“…level of security appropriate to the risk…”.

This is basically risk management. If you're striving for ISO 27001 compliance, this is handled by your Risk Treatment Plan (RTP) and your Statement of Applicability (SoA). There are different approaches to application threat modeling. A popular and widespread one is STRIDE for identifying threats, combined with DREAD for rating them.
It's about analyzing which vulnerabilities or weaknesses in your infrastructure / architecture create risks of violating the core pillars of information security: confidentiality, integrity and availability.
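The following minimal Python sketch makes the STRIDE/DREAD idea more tangible. It is a small threat register that tags each threat with a STRIDE category and rates it with DREAD. It assumes the common variant of DREAD in which each of the five factors is scored from 0 to 10 and the overall risk is their average. The example threats, their scores and the 7.0 threshold are purely hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Stride(Enum):
    """STRIDE threat categories and the security property each one attacks."""
    SPOOFING = "authentication"
    TAMPERING = "integrity"
    REPUDIATION = "non-repudiation"
    INFORMATION_DISCLOSURE = "confidentiality"
    DENIAL_OF_SERVICE = "availability"
    ELEVATION_OF_PRIVILEGE = "authorization"


@dataclass(frozen=True)
class Threat:
    description: str
    category: Stride
    # DREAD factors, each scored 0-10 (assumed scale).
    damage: int
    reproducibility: int
    exploitability: int
    affected_users: int
    discoverability: int

    def dread_score(self) -> float:
        """Average of the five DREAD factors (one common way to aggregate them)."""
        factors = (self.damage, self.reproducibility, self.exploitability,
                   self.affected_users, self.discoverability)
        if any(not 0 <= f <= 10 for f in factors):
            raise ValueError("DREAD factors must be between 0 and 10")
        return sum(factors) / len(factors)


def prioritize(threats: list[Threat], threshold: float = 7.0) -> list[Threat]:
    """Return threats at or above the (hypothetical) threshold, highest risk first."""
    high = [t for t in threats if t.dread_score() >= threshold]
    return sorted(high, key=lambda t: t.dread_score(), reverse=True)


# Hypothetical threat register for a small web app.
register = [
    Threat("User IDs in URLs can be changed to view other customers' data",
           Stride.INFORMATION_DISCLOSURE, 9, 10, 9, 10, 8),
    Threat("Login endpoint has no rate limit", Stride.SPOOFING, 7, 8, 7, 6, 9),
    Threat("Admin actions are not logged", Stride.REPUDIATION, 5, 3, 3, 2, 2),
]

for threat in prioritize(register):
    print(f"{threat.dread_score():.1f}  {threat.category.name}: {threat.description}")
```

The scores themselves are a judgment call. The value lies in writing threats down, linking each one to the pillar it endangers and treating the highest-rated ones first.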

Depending on your background knowledge and the field you specialize in, even these more detailed explanations might contain many unfamiliar terms. I'll try to provide you with some basic knowledge in the following paragraphs.

Before taking a closer look at the core pillars of GDPR, a quick side note about the LFDI. The LFDI is a German state-level authority whose name roughly translates to “state commissioner for data protection and freedom of information”. It supervises and advises public authorities and companies in its federal state on data protection and information security issues. One of its tasks is to impose fines on companies that violate data protection rules. In June 2020, for example, it fined the health insurer AOK €1.2 million for mishandling personal data in connection with competitions between 2015 and 2019. Following a fine, the LFDI also works with the organisation to improve its technical and organisational measures.
However, if you're looking for advice and specific recommendations, you don't have to wait until a fine is imposed on you. You can also contact the LFDI proactively to receive advice and gain valuable insights. As part of my Startup, I did just that, and I will incorporate the advice and insights that arose from this cooperation. So, if I say something along the lines of “the LFDI recommended using encryption at rest”, you'll know what and who I'm talking about.

The core pillars of GDPR

Hint: This is not an official classification; it is simply how I personally structured GDPR into different sections. It might help you form a better understanding, too.

🔐 Security

Prevent unauthorized access, including physical access, to (personal) data. Use appropriate infrastructure and technology to ensure that only authenticated and authorized users can access data.

Safety measures might include:

no openly accessible databases
no default (admin) users for databases

🤝 Accountability

As an organisation, one must clearly and comprehensively explain how data is processed, for what reason, for what purpose, etc.
An awareness of who is responsible should be created. Am I? My company? A third-party company? Someone else? In general, you should be aware of what happens with the data and have an understanding of the complete flow of data in your system. The importance of a proper sense of responsibility was strongly emphasised by the LFDI’s technical manager.

Accountability also includes that a privacy statement is available, complete and easy to find.

👤 Individual Rights

You should respect and implement the user rights set out in the GDPR. This includes Privacy By Design and Privacy By Default. Ask yourself what the absolute minimum of data is that your service / product needs in order to work, then try to stick to that. Be as data-efficient and data-minimizing as possible. It's also essential to process data only for as long as needed and to delete it from your system as soon as possible.
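Here is a minimal Python sketch of data minimization and storage limitation. When a sign-up is stored, it keeps only an allowlisted set of fields. It also purges records once a retention period has passed. The field names and the 365-day retention period are hypothetical. The right period depends on your purpose and on any legal retention obligations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical: the only fields this service actually needs.
REQUIRED_FIELDS = {"email", "display_name"}
# Hypothetical retention period for inactive accounts.
RETENTION = timedelta(days=365)


@dataclass
class UserRecord:
    data: dict
    last_active: datetime


def minimize(submitted: dict) -> dict:
    """Drop every submitted field that the service does not strictly need."""
    return {k: v for k, v in submitted.items() if k in REQUIRED_FIELDS}


def purge_inactive(records: list[UserRecord], now: datetime) -> list[UserRecord]:
    """Keep only records active within the retention period; the rest are dropped."""
    return [r for r in records if now - r.last_active <= RETENTION]


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
signup = {"email": "a@example.com", "display_name": "Ana", "birthday": "1990-05-01"}
records = [
    UserRecord(minimize(signup), last_active=now - timedelta(days=30)),
    UserRecord({"email": "b@example.com"}, last_active=now - timedelta(days=800)),
]
records = purge_inactive(records, now)
print(records)
```

In a real system, the purge would be a scheduled job that deletes rows from the database (and, eventually, from backups) instead of filtering a list in memory.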

Content of the GDPR

Content of the EU General Data Protection Regulation	Articles
General provisions	1-4
Principles	5-11
Rights of the data subject	12-23
Controller and processor	24-43
Transfers of personal data to third countries or international organisations	44-50
Independent supervisory authorities	51-59
Cooperation and consistency	60-76
Remedies, liability and penalties	77-84
Provisions relating to specific processing situations	85-91
Delegated acts and implementing acts	92-93
Final provisions	94-99

We'll focus mainly on the developer and technical side of things, which is covered by articles 5-50 (chapters II to V). Some key words and important components are shown in the visual below.

Putting the user at the centre

One cornerstone of the GDPR is that any processing of personal data is forbidden by default unless there is a legal basis for it (Article 6). The most prominent legal basis is the user's explicit consent. For consent to be meaningful, the user must be able to clearly recognize the purpose and added value of the data processing.
If the user gives their consent, it must be given voluntarily, explicitly and verifiably. According to the LFDI, an opt-out or a pop-up is not effective consent! It is essential that the user can revoke this consent at any time. The user must also be informed of this right at the time they give consent.
An example of the correct use of consent is asking the user for permission to use their e-mail address for sending newsletters and updates. It's key to adhere to the coupling prohibition, meaning that refusing consent must not put the user at any significant disadvantage! Consent may only be mandatory if the disclosure of the data is absolutely necessary to provide the service.
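The requirements that consent be voluntary, explicit, verifiable and revocable translate quite naturally into a data structure. The following Python sketch uses hypothetical names throughout. It stores one record per user and purpose with timestamps, so that consent can be proven later. It treats a missing or revoked record as “no consent”. It is an illustration, not a complete consent management system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    user_id: str
    purpose: str                 # e.g. "newsletter"; one record per purpose
    policy_version: str          # which privacy text the user saw
    granted_at: datetime
    revoked_at: Optional[datetime] = None


class ConsentLedger:
    """In-memory store of consent decisions (a real system would persist them)."""

    def __init__(self) -> None:
        self._records: dict[tuple[str, str], ConsentRecord] = {}

    def grant(self, user_id: str, purpose: str, policy_version: str) -> ConsentRecord:
        # Only call this after an explicit user action (e.g. ticking an
        # unticked checkbox), never from a pre-selected default.
        record = ConsentRecord(user_id, purpose, policy_version,
                               granted_at=datetime.now(timezone.utc))
        self._records[(user_id, purpose)] = record
        return record

    def revoke(self, user_id: str, purpose: str) -> None:
        # Revoking must be possible at any time; the timestamp keeps it verifiable.
        record = self._records.get((user_id, purpose))
        if record and record.revoked_at is None:
            record.revoked_at = datetime.now(timezone.utc)

    def has_consent(self, user_id: str, purpose: str) -> bool:
        # No record, or a revoked one, means no consent: processing stays off.
        record = self._records.get((user_id, purpose))
        return record is not None and record.revoked_at is None


ledger = ConsentLedger()
print(ledger.has_consent("u1", "newsletter"))   # False: nothing granted yet
ledger.grant("u1", "newsletter", policy_version="2020-06")
print(ledger.has_consent("u1", "newsletter"))   # True
ledger.revoke("u1", "newsletter")
print(ledger.has_consent("u1", "newsletter"))   # False again
```

Keying records by purpose follows the idea that consent is given for a specific purpose. Agreeing to a newsletter says nothing about analytics.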

To add to the point “Information obligation and transparency”:
Data privacy statements must be worded in a way that minors and persons without legal capacity can understand them.

The following examples of imposed fines illustrate what you should not do:

Violated right	Description
Right to data deletion	In October 2019, the real estate company “Deutsche Wohnen” was fined €14 million for storing tenants' data in an archive system that offered no way to delete data at all. As a result, the system still held confidential information about former tenants who had long since moved out.
Right to restriction of processing	Delivery Hero was fined just under €200,000 in September 2019 for failing to delete the data of long-inactive customers and for continuing to send them unsolicited marketing e-mails.
Right to protection of personal data	An insurance company in France was fined €180,000 in July 2019 because confidential data of other customers could be accessed simply by changing the number (user ID) at the end of the URL. The disclosed data included driving licences, registration cards and bank documents.
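The French insurance case is a textbook example of an insecure direct object reference: the server trusted the ID in the URL. The countermeasure is to check, on every request, that the requested object belongs to the authenticated user, instead of hoping the ID is hard to guess. Here is a minimal Python sketch of that check, using a hypothetical in-memory document store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: int
    owner_id: str
    name: str


class NotFound(Exception):
    """Raised for missing AND foreign documents, so IDs cannot be probed."""


# Hypothetical storage standing in for a database table.
DOCUMENTS = {
    1: Document(1, owner_id="alice", name="driving_licence.pdf"),
    2: Document(2, owner_id="bob", name="bank_statement.pdf"),
}


def get_document(requested_id: int, authenticated_user: str) -> Document:
    """Return a document only if it belongs to the authenticated user.

    authenticated_user must come from the server-side session, never from
    the request itself.
    """
    doc = DOCUMENTS.get(requested_id)
    # Object-level authorization: knowing (or incrementing) an ID is not enough.
    if doc is None or doc.owner_id != authenticated_user:
        raise NotFound(requested_id)
    return doc


print(get_document(1, "alice").name)  # driving_licence.pdf
try:
    get_document(2, "alice")          # alice changes the ID in the URL
except NotFound:
    print("404 Not Found")
```

Answering “not found” for documents that belong to someone else, rather than “forbidden”, also avoids revealing which IDs exist.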

Privacy By Design

Too many entrepreneurs, in the interest of building the product as quickly as possible, think that security is a “freeze all the code, do an assessment, and write all the policies” project they can do later. It isn’t. Think about security from the very beginning. It’s actually not that hard to anticipate what needs you’ll have to deal with in the future.
Michael Borohovski – Cyber Security Expert

Data protection must be included from the beginning of an app's design and development. You should NOT develop the app, add functionality, acquire customers and then, at some point, realize “Oh, maybe I should take a look at privacy and information security”, possibly when there are already millions of users on the system. This approach may have worked in the past, but since GDPR a bare minimum of information security is required by law.
In any case, it is much easier and more sustainable to develop a safety mindset and culture from the very beginning and then continuously improve and expand it as you grow.

Privacy By Design also involves a risk assessment, which overlaps with Security By Design. Privacy By Design can also be viewed as Data Protection By Design, the wording used in Article 25 GDPR, so this overlap is unavoidable and reasonable. The assessment covers the varying likelihood of occurrence and the damage potential of the risks associated with processing data. Information security and data protection are simply closely tied together, as the quote at the beginning already illustrates.
Another key element that should be targeted by a thorough analysis is data minimization. During data processing, only as much personal data should be collected as is absolutely necessary for the respective application.
Authentication, anonymisation, pseudonymisation and encryption of data are common safety measures. Pseudonymisation and encryption are even explicitly listed in Article 32 GDPR. According to the LFDI, using TLS 1.2 or above for data in transit is mandatory, and additionally encrypting your data at rest is highly recommended. The reason for the latter is that servers are often hosted by a provider. If, for example, technical errors occur or the contract is terminated, encrypted data on those servers remains protected. This reduces the risk of violating confidentiality.
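Two of these measures can be sketched with Python's standard library alone. The snippet below builds a client-side TLS context that refuses anything older than TLS 1.2. It also pseudonymises an identifier with a keyed hash (HMAC-SHA256), so records can still be linked without storing the e-mail address itself. Keep in mind that pseudonymised data is still personal data under GDPR. The key handling shown is an assumption: in practice the key would live in a secrets manager, separate from the pseudonymised data.

```python
import hashlib
import hmac
import secrets
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client TLS context with certificate checks and TLS 1.2 as the minimum."""
    ctx = ssl.create_default_context()          # verifies certificates and hostnames
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def pseudonymise(identifier: str, key: bytes) -> str:
    """Keyed hash: stable for the same input, but not reversible without the key."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


# Assumption: in production this key comes from a secrets manager, not the code.
PSEUDONYM_KEY = secrets.token_bytes(32)

ctx = make_tls_context()
print(ctx.minimum_version.name)  # TLSv1_2
print(pseudonymise("Ana@Example.com", PSEUDONYM_KEY))
```

A plain, unkeyed hash would be a weaker choice, because anyone can hash a list of known e-mail addresses and compare the results. Encryption at rest is usually best left to your database, disk or cloud provider, or to a vetted library, rather than hand-rolled.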

In summary, the system should be conceptualized and developed so that user rights, such as access to, deletion of and correction of data, are addressed from the very beginning.

Privacy By Default

When using an application, the preconfigured settings must always offer the highest possible level of security and data protection. Users should only be able to reduce security or data protection by opting out or changing the settings manually, for example to gain simplifications or usability benefits. The aim of this requirement is to protect less tech-savvy users who are not able to adjust their data protection settings themselves.

Users should therefore be able to decide for themselves what data they make available to companies beyond what is necessary.

In my opinion, Airbnb has a really good and interesting approach. The privacy section of their mobile app lists all the services and tools they use, and you can decide which ones to enable or disable. Only four SDKs are strictly necessary and therefore can't be disabled (Braintree, Facebook, Google Maps and Google reCAPTCHA).
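In code, Privacy By Default can be as simple as making the most protective value the default of every optional setting. The following Python sketch is loosely modelled on the Airbnb approach. Strictly necessary components are always on and cannot be switched off, while every optional one starts disabled until the user opts in. The component names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical component lists; necessary ones are required for the service to work.
NECESSARY = frozenset({"payments", "maps"})
OPTIONAL = frozenset({"analytics", "crash_reporting", "ad_attribution"})


@dataclass
class PrivacySettings:
    # Privacy by default: every optional component starts disabled.
    enabled_optional: set[str] = field(default_factory=set)

    def opt_in(self, component: str) -> None:
        if component not in OPTIONAL:
            raise ValueError(f"unknown optional component: {component}")
        self.enabled_optional.add(component)

    def opt_out(self, component: str) -> None:
        if component in NECESSARY:
            raise ValueError(f"{component} is strictly necessary")
        self.enabled_optional.discard(component)

    def is_enabled(self, component: str) -> bool:
        return component in NECESSARY or component in self.enabled_optional


settings = PrivacySettings()
print(settings.is_enabled("analytics"))  # False until the user opts in
settings.opt_in("analytics")
print(settings.is_enabled("analytics"))  # True
```

The important part is that an SDK is only initialized when is_enabled returns True. Otherwise it may already send data while the user is still reading the settings screen.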

Security By Design

Applications without security architecture are as bridges constructed without finite element analysis and wind tunnel testing. Sure, they look like bridges, but they will fall down at the first flutter of a butterfly’s wings. The need for application security in the form of security architecture is every bit as great as in building or bridge construction.
OWASP, Secure Coding Principles

As already mentioned earlier, a proper implementation of information security is now basically mandatory and legally required due to GDPR.

Unfortunately, this is not (yet) always the case. One example is the £99 million (about $123 million) fine that the UK's Information Commissioner's Office (ICO) announced it intended to impose on Marriott in July 2019. After acquiring its competitor Starwood, Marriott discovered that Starwood's guest reservation database had been hacked. The exposed data included about 5 million unencrypted passport numbers and more than 8 million (encrypted) payment card records. The breach dated back to 2014 but was not discovered until 2018. Around 30 million of the affected records related to residents of the European Economic Area.

I hope that the violations and respective fines listed in this blog have already given you some insight into what you should NOT do if you intend to correctly apply privacy and information security in your company and processes.
However, I find that illustrative examples help a lot in understanding a complex topic, so that is exactly what we'll look at now to deepen the understanding. Guided by the core pillars of information security, we'll go through some concrete measures you can implement to increase security and robustness.

The core pillars of information security

🤫 Confidentiality

In short: Sensitive or personal data should not be disclosed to outsiders. Countermeasures include (strong) passwords, access control lists and authentication procedures. It’s beneficial to use encryption so information that may be accessed despite the previous controls is still protected.
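Strong passwords only help if they are stored properly. Here is a minimal sketch using the scrypt implementation in Python's standard library. Each password gets a random salt, only the salt and the derived hash are stored, and verification uses a constant-time comparison. The cost parameters (n=2**14, r=8, p=1) are an assumption chosen to stay within hashlib's default memory limit. Current recommendations, such as the OWASP Password Storage Cheat Sheet, favour higher costs, so check them for your own setup.

```python
import hashlib
import hmac
import secrets

# Assumed scrypt cost parameters (about 16 MiB of memory per hash).
N, R, P = 2**14, 8, 1


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash); store both, never the password itself."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=32)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=32)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("password123", salt, stored))                   # False
```

scrypt is deliberately slow and memory-hard, so a leaked password table is much more expensive to brute force than one hashed with a fast function like plain SHA-256.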

👌 Integrity

Integrity means on the one hand that data may not be changed from the outside and manipulation is impossible, but on the other hand it also means protection against unintentional changes, such as through user error or data loss due to a system error. Changes should only be made by authorized persons.
In short: the correctness and completeness of data must be guaranteed. Countermeasures include access controls and strict authentication. Administrative controls such as separation of duties and training are also beneficial.
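Access controls protect data inside your system. When data has to leave it and come back, as with a token, a cookie or an exported file, a message authentication code makes tampering detectable. Here is a minimal sketch with HMAC-SHA256 from Python's standard library. The payload format and the key handling are assumptions.

```python
import hashlib
import hmac
import json
import secrets

# Assumption: a server-side secret loaded from secure configuration.
INTEGRITY_KEY = secrets.token_bytes(32)


def sign(payload: dict) -> str:
    """Serialize deterministically and append an HMAC tag."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(INTEGRITY_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{body}.{tag}"


def verify(token: str) -> dict:
    """Return the payload if the tag matches, otherwise raise ValueError."""
    body, _, tag = token.rpartition(".")   # the hex tag itself contains no "."
    expected = hmac.new(INTEGRITY_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed: data was modified")
    return json.loads(body)


token = sign({"user_id": 42, "role": "member"})
print(verify(token))  # {'role': 'member', 'user_id': 42}
tampered = token.replace('"member"', '"admin"')
try:
    verify(tampered)
except ValueError as err:
    print(err)
```

Note that an HMAC only protects integrity. The payload is still readable by anyone who holds the token, so it must not contain confidential data unless it is also encrypted.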

🙋 Availability

An example of an availability violation is the loss of data through malware. However, most threats to availability are non-malicious in nature and include hardware failures, unscheduled software downtime and network bandwidth issues.

Countermeasures include redundant systems in separate physical locations and data backups. Systems with high requirements for continuous uptime, in particular, should have significant hardware redundancy, with backup servers and data storage immediately available.

Additional principles are:

Authentication ⇒ Recipient must be able to determine the origin of the message
Non-Repudiation ⇒ The authorship of a message/action must not be deniable
Anonymity ⇒ Protection of the confidentiality of the identity
Accountability ⇒ Ensuring that subjects can be assigned to their actions
Auditability ⇒ Ensuring that previous system states can be reconstructed and processes can be traced
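Accountability and auditability are often implemented with an append-only audit log. The Python sketch below uses hypothetical field names. It chains each entry to the hash of the previous one, so that later modifying or deleting an entry breaks the chain and becomes detectable. This makes tampering evident rather than impossible. On its own it does not provide non-repudiation, which would additionally require digital signatures.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64


class AuditLog:
    """Append-only log in which every entry commits to its predecessor."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, actor: str, action: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor": actor,        # who did it (accountability)
            "action": action,      # what was done (auditability)
            "prev_hash": prev_hash,
        }
        entry["hash"] = self._hash(entry)
        self.entries.append(entry)
        return entry

    @staticmethod
    def _hash(entry: dict) -> str:
        # Hash every field except the hash itself, in a stable key order.
        content = {k: v for k, v in entry.items() if k != "hash"}
        data = json.dumps(content, sort_keys=True).encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def verify(self) -> bool:
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._hash(entry):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("alice", "exported customer list")
log.append("bob", "deleted user 17")
print(log.verify())                      # True
log.entries[0]["actor"] = "mallory"      # someone rewrites history
print(log.verify())                      # False
```

One limitation: cutting off the most recent entries is not detected unless the latest hash is also stored somewhere else, for example in a separate system.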

Security – Practical Measures

Monitoring

In this example, specifically Event Loop Monitoring.
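Event loop monitoring comes from the Node.js world, where a single blocked event loop stalls every request. The same idea can be sketched with Python's asyncio. A background task repeatedly sleeps for a fixed interval and measures how late it wakes up. If something blocks the loop, the measured lag grows. The interval, the threshold and the simulated blocking call are assumptions.

```python
import asyncio
import time


class LoopLagMonitor:
    """Measures how much later than scheduled the event loop lets us run."""

    def __init__(self, interval: float = 0.05, threshold: float = 0.1) -> None:
        self.interval = interval      # assumed sampling interval in seconds
        self.threshold = threshold    # assumed "too busy" lag in seconds
        self.last_lag = 0.0

    async def run(self) -> None:
        while True:
            start = time.perf_counter()
            await asyncio.sleep(self.interval)
            # Anything beyond the requested sleep is time the loop was busy.
            self.last_lag = max(0.0, time.perf_counter() - start - self.interval)

    def too_busy(self) -> bool:
        # A request handler could answer "503 Service Unavailable" while this is True.
        return self.last_lag > self.threshold


async def main() -> bool:
    monitor = LoopLagMonitor()
    task = asyncio.create_task(monitor.run())
    await asyncio.sleep(0.1)   # let the monitor take a few samples
    time.sleep(0.3)            # simulate a blocking call (e.g. a slow regex)
    await asyncio.sleep(0.01)  # give the monitor a chance to notice
    busy = monitor.too_busy()
    task.cancel()
    return busy


print(asyncio.run(main()))  # True: the blocking call was detected
```

Rejecting new requests while the loop is overloaded keeps the service responsive for the requests it already accepted.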

Safety measures against Brute Forcing
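A common measure against brute forcing is to throttle failed login attempts per account and, in practice, also per IP address. The following Python sketch locks a key temporarily once a hypothetical limit of 5 failures within 15 minutes is reached. The clock is passed in so that the logic can be tested. It is an in-memory illustration; a real deployment would keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

MAX_FAILURES = 5          # hypothetical limit
WINDOW_SECONDS = 15 * 60  # hypothetical sliding window


class LoginThrottle:
    """Sliding-window counter of failed login attempts per key (user or IP)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def _recent(self, key: str) -> Deque[float]:
        attempts = self._failures[key]
        cutoff = self._clock() - WINDOW_SECONDS
        while attempts and attempts[0] <= cutoff:
            attempts.popleft()        # forget failures outside the window
        return attempts

    def is_locked(self, key: str) -> bool:
        return len(self._recent(key)) >= MAX_FAILURES

    def record_failure(self, key: str) -> None:
        self._recent(key).append(self._clock())

    def record_success(self, key: str) -> None:
        self._failures.pop(key, None)


# Fake clock for demonstration purposes.
now = [0.0]
throttle = LoginThrottle(clock=lambda: now[0])
for _ in range(MAX_FAILURES):
    throttle.record_failure("alice")
print(throttle.is_locked("alice"))   # True: further attempts are rejected
now[0] += WINDOW_SECONDS + 1
print(throttle.is_locked("alice"))   # False: the window has passed
```

A login handler would call is_locked before even checking the password, and answer with the same generic error message in every failure case.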

Application Activity Logging

Affects

Confidentiality, Integrity, Availability

What

Must have. Insufficient Logging & Monitoring is still in the OWASP Top 10. Not only can you detect errors at runtime, but attacks can be identified early or even prevented.
As an advanced setup you can feed all your logs into a SIEM (Security Information and Event Management System) and enable Intrusion Detection / Prevention.
To be prepared and know what to do once an attack or breach is detected, you should set up an Incident Response Plan.
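Here is a minimal Python sketch of application activity logging with the standard logging module. Security-relevant events such as failed logins are written as structured JSON lines, which a SIEM can ingest easily. Only allowlisted fields are written, so secrets that are passed by mistake never reach the log. The event names, the fields and the allowlist are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

security_log = logging.getLogger("security")
_handler = logging.StreamHandler()
_handler.setFormatter(logging.Formatter("%(message)s"))
security_log.addHandler(_handler)
security_log.setLevel(logging.INFO)

# Only these keys may appear in a log line (hypothetical allowlist).
ALLOWED_FIELDS = {"event", "user_id", "source_ip", "reason", "time"}


def log_security_event(event: str, user_id: int, source_ip: str, **details: str) -> str:
    """Write one JSON line per security event; unknown keys (e.g. passwords) are dropped."""
    record = {"event": event, "user_id": user_id, "source_ip": source_ip,
              "time": datetime.now(timezone.utc).isoformat(), **details}
    record = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
    line = json.dumps(record, sort_keys=True)
    level = logging.WARNING if event.endswith("_failed") else logging.INFO
    security_log.log(level, line)
    return line


# The password passed by mistake is filtered out before anything is written.
log_security_event("login_failed", user_id=17, source_ip="203.0.113.7",
                   reason="bad_password", password="hunter2")
```

Logging an internal user ID instead of an e-mail address keeps the log from becoming yet another store of readable personal data.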

Limit data flow

Keep your packages and dependencies up to date

Stay clear of unfavourable regexes

Affects

Availability

What

Some regular expressions, typically those with nested or overlapping quantifiers, can hit extreme cases in backtracking regex engines. There, matching becomes very slow (exponential in relation to input size). An attacker can exploit such a regular expression with crafted input to freeze or crash an application. This is called a Regular Expression Denial of Service (ReDoS).
Some tools check whether a regex could cause a denial of service; one example is vuln-regex-detector. Besides that, applying input validation in general is already a good and meaningful approach.
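Here is a short Python illustration of the problem and two cheap mitigations. The pattern `^(a+)+$` is a textbook case of nested quantifiers. On an input consisting of many “a” characters followed by “!”, a backtracking engine such as Python's `re` tries exponentially many ways to split the a's before failing. The snippet never runs that pattern on a long input, because that is exactly the hang you want to avoid. Instead, it shows an equivalent pattern without nesting and a length limit that is checked before any regex runs. The username rule and the 32-character limit are assumptions.

```python
import re

# Vulnerable: nested quantifiers cause catastrophic backtracking on inputs
# like "a" * 30 + "!" (never run this against untrusted input).
EVIL = re.compile(r"^(a+)+$")
# Matches the same strings without a nested quantifier: no exponential blow-up.
SAFE = re.compile(r"^a+$")

MAX_LEN = 32  # hypothetical upper bound for this field


def is_valid_username(value: str) -> bool:
    """Check the cheap length limit first, then a simple, non-nested pattern."""
    if len(value) > MAX_LEN:
        return False
    return re.fullmatch(r"[a-z0-9_]{3,32}", value) is not None


print(bool(SAFE.match("a" * 10_000 + "!")))   # False, and it returns quickly
print(is_valid_username("alice_01"))          # True
print(is_valid_username("a" * 10_000))        # False, rejected before the regex runs
```

Since Python 3.11, `re` also supports atomic groups and possessive quantifiers, which can prevent this kind of backtracking when a pattern cannot simply be rewritten.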

Security Linters and Code Checking

Input Validation

Affects

Integrity, Availability

What

The security principle Reluctance to Trust applies here. When building an application, you should always anticipate malformed input from unknown users. Even known users are an easy target for social engineering attacks, which makes them potential threats to a system as well. Correct input validation helps prevent widespread and popular attacks like (SQL) injection or XSS.

Any integer between -2 billion and 2 billion is seldom a good representation of anything.

One interesting approach to input validation is using Domain Primitives. For example, instead of using a plain string as the type for a username, you define a class called UserName. This class bundles all domain rules related to a username, e.g. minimum and maximum length, allowed characters, etc. Therefore, if an instance exists, it's automatically valid!
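Here is a minimal Python version of such a UserName domain primitive. The concrete rules (3 to 32 characters, lowercase letters, digits and underscores) are assumptions. The point is that validation happens once, in the constructor, and the rest of the code can rely on every UserName instance being valid.

```python
import re
from dataclasses import dataclass

_USERNAME_PATTERN = re.compile(r"[a-z0-9_]{3,32}")  # hypothetical domain rules


@dataclass(frozen=True)
class UserName:
    """Domain primitive: if an instance exists, it is a valid username."""
    value: str

    def __post_init__(self) -> None:
        if not isinstance(self.value, str):
            raise TypeError("UserName must be created from a string")
        if _USERNAME_PATTERN.fullmatch(self.value) is None:
            raise ValueError("username must be 3-32 chars: a-z, 0-9 or _")

    def __str__(self) -> str:
        return self.value


def create_account(name: UserName) -> str:
    # No re-validation needed here: the type guarantees validity.
    return f"registered {name}"


print(create_account(UserName("alice_01")))   # registered alice_01
try:
    UserName("<script>alert(1)</script>")
except ValueError as err:
    print("rejected:", err)
```

Because the dataclass is frozen, a valid instance cannot later be mutated into an invalid one.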

Transactions in NoSQL Databases

Affects

Integrity

What

Say you're using MongoDB as your database. Out of the box, MongoDB only guarantees atomicity for operations on a single document, so the ACID properties are not given across several writes. If your application has logic chains that consist of more than one write command, you're in trouble. Your server might restart during one of those chains, so that only a fraction of many dependent writes is executed. As a result, you'd end up with corrupted and incorrect data in your database. Depending on the context of your application and the severity of that risk, there are two options. You can switch to a database that innately supports the all-or-nothing principle (transactions), or you can set up transactions for your MongoDB database. Multi-document transactions are available since version 4.0 when MongoDB runs as a replica set.
The LFDI's opinion on this topic is actually quite strict and limiting. Their advice is to always use a “proper” database, like PostgreSQL. They argue that only sequential databases can guarantee mathematical correctness and thus the integrity of the data. However, it's totally reasonable to choose a “non-optimal” database for your projects, for reasons like staying lean or simply having more experience with it. This is absolutely valid. You should still try to get the best possible security out of the choice you have made. By doing so, you will probably end up with a higher security level anyway than if you had chosen the theoretically best fit that you have little or no experience with.

The LFDI's statement was: “Only if a lack of technology leads to a protection goal being violated, then it can become a problem”.
In other words: there might only be repercussions if a violation or breach could have been prevented by choosing another technology, like SQL instead of NoSQL.
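The all-or-nothing principle itself is easy to demonstrate. The following sketch deliberately uses Python's built-in sqlite3 module instead of MongoDB, so that it runs without a database server. Two dependent writes (a hypothetical balance transfer) are wrapped in one transaction. A failure between them rolls back the first write. With MongoDB 4.0 or later on a replica set, PyMongo's `ClientSession.with_transaction` offers a comparable pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(amount: int, fail_midway: bool = False) -> None:
    """Move money between two rows: both writes happen, or neither does."""
    # Used as a context manager, the connection commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'alice'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'bob'", (amount,))


def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


transfer(30)
print(balances())  # {'alice': 70, 'bob': 30}
try:
    transfer(50, fail_midway=True)
except RuntimeError:
    pass
print(balances())  # {'alice': 70, 'bob': 30}: the half-done transfer was rolled back
```

Without the transaction, the failed call would have left alice 50 poorer and bob no richer. That is exactly the kind of corrupted state described above.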

Prevent data (e.g. IP Address) leakage

Affects

Confidentiality

What

This goes hand in hand with being conscious of the data flow in your application. Chances are high that you are carelessly handing your users' data to strangers. The simplest example is embedding an image in your app that is hosted by someone else. If the image is not physically located within your own infrastructure, the external host can read the IP addresses of all users who load that embedded image.
Another popular example is using SDKs. The very polarizing opinion of the LFDI is that one would have to forego using ANY third-party components or libraries, if the objective pursued is to be as data protection friendly and correct as possible.
However, the LFDI also realizes that this runs contrary to the entire open source movement and is ultimately simply not feasible. If you followed that guideline, you would constantly have to reinvent the wheel as a developer. The key takeaway is, again, Reluctance to Trust. Be really conscious about which libraries or SDKs you add to your project. If you want to be really sure, check each library for potential data leakage before adding it. And if you're planning on adding something like the Facebook or Google SDK, ask yourself whether it's worth it. Are your users okay with their data being shared? Does the benefit outweigh the drawbacks? At the end of the day, there's always a business model behind something. Facebook and Google don't offer their SDKs for free because they are such kind-hearted people. They want to gather as much data as possible, and that's exactly what happens once you add those SDKs to your project. Be aware of that.
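One way to back up the “host it yourself” advice technically is a Content Security Policy (CSP) response header. Browsers that support it refuse to load images, scripts or fonts from origins you have not explicitly allowed. A forgotten third-party embed then fails visibly instead of silently sharing your users' IP addresses. The Python sketch below only builds the header value; how you attach it depends on your web framework. The allowed origins are hypothetical.

```python
# Hypothetical policy: everything from our own origin, plus one payment provider
# that we have consciously decided to trust.
POLICY = {
    "default-src": ["'self'"],
    "img-src": ["'self'", "data:"],
    "font-src": ["'self'"],
    "script-src": ["'self'", "https://js.payments.example"],
    "connect-src": ["'self'", "https://api.payments.example"],
    "object-src": ["'none'"],
}


def build_csp(policy: dict[str, list[str]]) -> str:
    """Serialize a directive -> sources mapping into a CSP header value."""
    return "; ".join(f"{directive} {' '.join(sources)}"
                     for directive, sources in policy.items())


headers = {"Content-Security-Policy": build_csp(POLICY)}
print(headers["Content-Security-Policy"])
```

To find out which third-party requests an existing site makes without breaking it, you can first send the policy as `Content-Security-Policy-Report-Only`. Violations are then reported, but nothing is blocked.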

Conclusion

GDPR and information security can be daunting and overwhelming at first. I totally get it, since I've been there and, quite frankly, still am. However, I think it is essential to recognize the importance of the issue. In the end, it is not just a matter of taking action out of fear of fines, but of seeing the bigger picture. First of all, you establish trust. If you're honest and authentic, your users and customers will notice and appreciate it. Secondly, the complexity of software systems is constantly increasing, and connectivity between systems and devices is growing. Add weaknesses due to errors in requirements, architecture, design, implementation, operation and organization, and this could break your neck financially if you don't take safety and security into account from the start. According to the IBM System Science Institute, the relative cost of fixing defects can be up to 100 times higher in production than in the design & planning phase (see the figure linked below).

https://www.researchgate.net/figure/IBM-System-Science-Institute-Relative-Cost-of-Fixing-Defects_fig1_255965523

Security incidents regularly affect companies of all sizes, often putting them on public display and causing irreversible damage to the reputation of the companies involved.
On top of this, our society is more technologically reliant than ever before, and there is no sign that this trend will slow down.

If you plan for your software to exist for more than five years, start developing a data and information security mindset. Be mindful of the tools you use and of where data flows in your application, and learn to think ahead. Ask yourself what risks or vulnerabilities might arise and what inconsistencies could appear. Be careful, anticipatory and conscientious. But don't overdo it; after all, it's about your (and your company's) priorities. Decide what is best for you right now and plan a little into the future. There's no need to try to anticipate everything that might happen and to build a Fort Knox infrastructure right from the start. Information security should be seen as a continuous process in which you iterate and evolve in many small, incremental steps.
I truly hope this blog post helps you get started and gives you some insight into the possibilities and opportunities. As further help to get you started, I attached a small cheat sheet and some useful resources.

Helpful & Interesting Websites

References and further reading

Waidner, M., Backes, M., and Müller-Quade, J. (2013), Entwicklung sicherer Software durch Security by Design, Technical Report SIT-TR-2013-01, Fraunhofer-Institut für Sichere Informationstechnologie, Darmstadt, Germany, URL: http://www.kastel.kit.edu/downloads/Entwicklung_sicherer_Software_durch_Security_by_Design.pdf
Adkins, H., Beyer, B., Blankinship, P., Lewandowski, P., Oprea, A., Stubblefield, A. (2020), Building Secure and Reliable Systems, O’Reilly Media, Inc. ISBN: 9781492083122
User Privacy Protection Cheat Sheet by OWASP, https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/User_Privacy_Protection_Cheat_Sheet.md
Third Party JavaScript Management Cheat Sheet by OWASP, https://cheatsheetseries.owasp.org/cheatsheets/Third_Party_Javascript_Management_Cheat_Sheet.html
Authentication Cheat Sheet by OWASP, https://cheatsheetseries.owasp.org/cheatsheets/Authentication_Cheat_Sheet.html
NodeJS security Cheat Sheet by OWASP, https://cheatsheetseries.owasp.org/cheatsheets/Nodejs_Security_Cheat_Sheet.html#monitor-the-event-loop
GDPR: From confusion to chaos by Halak Mehta, https://www.datacenterdynamics.com/en/opinions/gdpr-confusion-chaos/
GDPR: General Data Protection Regulation, https://advisera.com/eugdpracademy/gdpr/
ICO: Statement: Intention to fine Marriott International, Inc more than £99 million under GDPR for data breach, https://ico.org.uk/about-the-ico/news-and-events/news-and-blogs/2019/07/statement-intention-to-fine-marriott-international-inc-more-than-99-million-under-gdpr-for-data-breach/
Was ist Privacy By Design? By TUEV Nord, https://www.tuev-nord.de/explore/de/erklaert/was-ist-privacy-by-design/
