GDPR: What should I take into account when developing or using AI?
When you develop artificial intelligence based on personal data, or process personal data using artificial intelligence in your research, the General Data Protection Regulation (GDPR) applies. AI applications must therefore uphold privacy and data protection throughout their lifecycle, including the principles of 'privacy by design' and 'privacy by default'.
Do you or will you process personal data?
AI systems generally use large amounts of data that allow these systems to learn and become intelligent. This does not necessarily involve personal data.
But when personal data is used at any point in the lifecycle of AI, you need to comply with the provisions of the GDPR. AI systems may contain and/or use personal data to develop, train or run the AI.
The following questions may provide direction:
- What category of personal data is being processed?
Are you going to process special categories of personal data (sensitive data), such as health data, audio recordings or video images?
Although the processing of special categories of personal data is in principle prohibited, the GDPR provides grounds for exception. For instance, you may be able to invoke the exception for scientific research purposes, for projects where the processing of special categories of personal data is necessary. In that case, you must take appropriate and specific measures to protect the interests and privacy of the data subjects in your research.
- Are you going to process “raw” personal data? Or are you going to process pseudonymised or anonymous data?
If you receive or collect raw personal data or pseudonymised personal data, you should check whether there is a legal basis for using this data in your research.
Anonymous data does not fall within the scope of the GDPR. If you believe the data is anonymous or anonymised, it is best to double-check. If you establish that it is reasonably possible to trace the data back to the persons from whom it originated, for example because additional information exists somewhere that could lead to (re)identification, the data are not anonymous and the GDPR will apply. Big data increases the possibility of re-identification through the possible combination of different data sets.
If you are going to anonymise the data yourself, you must also comply with the GDPR until the data are effectively anonymised. More information on this can be found in this research tip.
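Before treating a data set as anonymous, it can help to measure how identifying a combination of attributes actually is. The sketch below (plain Python; the survey records and field names are made up for the example) computes the smallest group size over a set of quasi-identifiers, in the spirit of a k-anonymity check: a result of 1 means at least one record is unique on those attributes and could be re-identified by linking with another data set.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest group size when records are grouped by the quasi-identifiers.

    A value of 1 means at least one person is unique on these attributes
    and could be re-identified by combining the data with another source.
    """
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return min(groups.values())

# Hypothetical survey records: no names, but postcode + birth year + gender
# together can still single someone out.
records = [
    {"zip": "9000", "birth_year": 1987, "gender": "f", "score": 7},
    {"zip": "9000", "birth_year": 1987, "gender": "f", "score": 4},
    {"zip": "9050", "birth_year": 1990, "gender": "m", "score": 9},
]

print(k_anonymity(records, ["zip", "birth_year", "gender"]))  # 1 -> not anonymous
```

This is only a first sanity check, not proof of anonymity: the GDPR test is whether identification is reasonably possible given all means likely to be used.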
- Are you going to collect personal data yourself? Or are you going to reuse the personal data from another or previous research? Or maybe you will receive the personal data from another internal/external colleague/partner?
Even if, in a research project, the personal data was not collected by you directly from the data subjects, you must comply with the GDPR.
- Or maybe you will collect the data online via social media? Information on social media and other online data is considered “pseudo-public”: the data subject shares his or her data on the social medium for certain social media purposes. The information usually remains publicly accessible to private and/or professional persons if the data subject does not shield their account. The principles of the GDPR however apply to both publicly accessible and “closed” personal data, i.e. regardless of the visibility settings of the platform (e.g. data only visible to other users, friends, the entire public, …).
How to develop or use ‘GDPR-compliant’ AI?
‘GDPR-compliant’ first and foremost requires the correct development and use of AI, based on the principles of ‘privacy by design’ and ‘privacy by default’.
Privacy by design means considering privacy aspects such as purpose limitation and data minimisation as early as possible in your research, and therefore already in the development phase of AI. With every new process or development, you should consider from the design phase onwards whether and how it (may) affect the way personal data are being processed. On that basis, the necessary (technical) measures should then be built into the process or product. Examples of such measures include pseudonymisation (the replacement of identifiable personal data with pseudonyms) and encryption (a method in which data are rendered unreadable to anyone without the decryption key).
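As an illustration of such a measure, the sketch below pseudonymises a direct identifier with a keyed hash, using only the Python standard library. The key name and record fields are hypothetical; the essential point is that the secret key must be stored separately from the research data, because anyone holding it can recompute the pseudonyms.

```python
import hashlib
import hmac

# Hypothetical secret key: store it separately from the research data set.
SECRET_KEY = b"store-me-separately-from-the-data"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a stable pseudonym.

    The same input always yields the same pseudonym, so records about the
    same person can still be linked across the data set, but reversing the
    pseudonym requires the secret key.
    """
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

record = {"participant": "jan.peeters@example.org", "reaction_time_ms": 312}
record["participant"] = pseudonymise(record["participant"])
```

Note that pseudonymised data remain personal data under the GDPR, since the original identifiers can still be recovered with the key.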
Privacy by design is often mentioned in the same breath as privacy by default. They are related concepts: according to privacy by default, you should build the maximum degree of data protection into your AI by default. This ensures that users' privacy is protected from the beginning of the study, without requiring any extra effort from the users.
Both privacy by design and privacy by default should be embedded in AI systems and developments proactively and pre-emptively. If they are considered late (or too late) in the process, this can have detrimental consequences for users, researchers and the AI system itself.
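A minimal way to express privacy by default in code is to make the most protective option the default setting, so that participants must actively opt in to anything less protective. The settings below are hypothetical examples for an imagined data-collection study.

```python
from dataclasses import dataclass

@dataclass
class CollectionSettings:
    """Hypothetical study settings: the most protective option is the default.

    Participants start fully protected and must actively opt in to
    anything less protective (privacy by default).
    """
    store_raw_audio: bool = False      # keep only derived features by default
    share_with_partners: bool = False  # no onward sharing unless opted in
    retention_days: int = 30           # shortest retention that serves the goal

settings = CollectionSettings()  # a new participant starts fully protected
```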
In addition, the other basic principles of the GDPR must be followed.
Developing and using ‘GDPR-compliant’ AI therefore starts with you.
The following questions might help guide you:
- Design of the research / study set-up:
- Do you really need personal data? If not, use anonymous data or do not collect personal data at all.
- What personal data do you strictly need to achieve the research goal (data minimisation)? Try to limit yourself to only those personal data that contribute to answering the research question.
- What legal basis do you rely on to process personal data?
- Have you created a GDPR record to register the processing of personal data (in DMPonline.be)?
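The data-minimisation question above can be made concrete in code: derive from the research question which fields you actually need, and drop everything else before storage. The field names below are illustrative, not prescribed.

```python
# Fields justified by the (hypothetical) research question; everything else
# is dropped before the record is stored (data minimisation).
NEEDED_FIELDS = {"age_group", "condition", "response"}

def minimise(record: dict) -> dict:
    """Keep only the fields that contribute to answering the research question."""
    return {k: v for k, v in record.items() if k in NEEDED_FIELDS}

raw = {"name": "An", "email": "an@example.org",
       "age_group": "25-34", "condition": "B", "response": 4}
print(minimise(raw))  # {'age_group': '25-34', 'condition': 'B', 'response': 4}
```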
- Information obligation:
- Are data subjects adequately informed in advance about data collection, the purpose of the processing of personal data and their rights (transparency)? Even when reusing data, you need to inform data subjects.
Informing data subjects throughout the life cycle of AI is essential. You must be transparent about the workings of the technology used or envisaged. In the context of AI, a distinction is made between external and internal transparency:
External: this refers to the transparency you must provide to the outside world regarding the processing of personal data. This comes down to translating what happens technically in an AI system into terms and reasoning the data subjects can understand.
Internal: this refers to transparency regarding the operation of the AI system within your own research. Not only IT teams should be able to understand how the AI system works; other people involved in your research should also have access to relevant information, so that these systems are used in an appropriate and informed manner.
- How will you inform users? Is there a project/AI-specific privacy statement? Do you foresee any other way?
Even if you reuse data, you should in principle inform data subjects individually. However, if providing the information proves impossible or would require disproportionate effort, you may deviate from this information obligation. In that case, you must still take appropriate measures to protect the rights and freedoms of the data subjects, including making the information publicly available at a more general level, such as on a website.
- Data subjects’ rights:
- What rights do data subjects have and how can they exercise them?
- Do you need additional mechanisms for data subjects to access, modify or delete their data?
- How can participants in your research withdraw their consent, view their data or even have their data deleted?
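A sketch of how a withdrawal of consent could translate into code (simplified: a real study would also need to cover backups, derived data sets and logging of the request; the record structure is made up for the example):

```python
def withdraw_consent(participant_id: str, records: list) -> list:
    """Remove a participant's records after they withdraw consent.

    Only illustrates the core idea; in practice the erasure must also
    reach backups and any derived data sets.
    """
    return [r for r in records if r["participant"] != participant_id]

data = [{"participant": "p01", "score": 7}, {"participant": "p02", "score": 5}]
data = withdraw_consent("p01", data)
print(data)  # [{'participant': 'p02', 'score': 5}]
```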
AI & ethics
In Europe, the 'EU ethics guidelines for trustworthy AI' have been broadly recognized as the guiding ethics principles on AI. This document defines the basic principles that you should observe during the development or application of AI. For more information, see this page.
Last modified Sept. 19, 2023, 9:31 a.m.