GDPR: What should I consider when using social media data for scientific research?
Social media are an ever-growing source of information and data. Users themselves publish more and more (sensitive) personal data on their social media profiles. But to what extent may this data be used, for instance as the basis for research? In any case, you must consider whether you will be processing personal data, because if so, the General Data Protection Regulation (GDPR) applies.
Data on social media: personal data?
The term 'personal data' has a broad definition, also in an online context. Online personal data includes not only, for example, a name, an identification number, location data (e.g. geotagging), images and online identifiers, but also less explicit information, such as likes and comments on an online article, cookies and web traffic, if the person concerned can be directly or indirectly identified by it.
On the internet, for example, people often use usernames or social media handles that give them a (false) 'sense of anonymity'. However, such usernames or handles, which appear anonymous, may be sufficient to uniquely identify a person (even if this identification only relates to a person's "online" identity). A username makes it possible to distinguish one (online) person from another. Usernames are therefore not anonymous or anonymised data.
Social media also contain image and voice recordings, such as online photos, online interviews or YouTube videos depicting people. In some circumstances, such recordings may even constitute a special category of personal data.
Online data therefore often contains information about natural, living persons - in short, personal data.
Consequently, do not be too quick to conclude that the data you process online falls outside the GDPR; a cautious approach is safest. Even when working with pseudonymous data, (re-)identification remains possible. The GDPR does not apply only when you are genuinely using anonymous data (data that does not relate to an identified or identifiable natural person, or personal data rendered anonymous in such a way that the data subject is not or no longer identifiable). However, the legislator has set a very high threshold for data to qualify as anonymous or anonymised, so in most cases your data will not meet it.
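To make the difference between pseudonymous and anonymous data more concrete, here is a minimal Python sketch, assuming a researcher replaces social media handles with a keyed hash (HMAC-SHA256 from the standard library). The key, the function name `pseudonymise` and the sample posts are hypothetical and only serve as an illustration. The point it shows: the output is still personal data, because posts by the same person stay linkable and whoever holds the key can re-identify a known handle.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key; in a real project it would be stored separately from the dataset.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymise(handle: str, key: bytes = SECRET_KEY) -> str:
    """Replace a social media handle with a keyed hash (HMAC-SHA256), truncated to 16 hex characters."""
    return hmac.new(key, handle.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


# Hypothetical sample posts.
posts = [
    {"handle": "@jane_doe", "text": "Great concert last night"},
    {"handle": "@jane_doe", "text": "Back at work today"},
    {"handle": "@other_user", "text": "Nice weather"},
]

pseudonymised = [{"author": pseudonymise(p["handle"]), "text": p["text"]} for p in posts]

# The same handle always yields the same token, so both posts by @jane_doe
# remain linkable to one person in the pseudonymised dataset.
print(pseudonymised[0]["author"] == pseudonymised[1]["author"])  # True

# Whoever holds the key can re-identify a known handle by recomputing its token.
print(pseudonymise("@jane_doe") == pseudonymised[0]["author"])  # True
```

Replacing the handle removes the name from view, but it does not meet the threshold for anonymous data described above.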
(Personal) data on social media: public data?
Information on social media and other online data is considered "pseudo-public": the data subject shares his/her data on the social media platform for specific social media purposes. Although the information is publicly accessible to private and/or professional individuals (when the data subject has not restricted access to his/her account), the posts or information shared are not necessarily intended for a general audience.
Thus, the fact that some data is public on social media does not mean that there are no limits to its use. The principles of the GDPR apply to both publicly accessible and "closed" personal data, i.e. regardless of the visibility settings of the platform (e.g. data visible only to other users, friends, the entire public, etc.).
However, the fact that some data may be considered "public" is not completely irrelevant: it may provide insight into the data subject's reasonable expectations. Before any secondary processing of personal data, it is advisable for the researcher to weigh up the interests involved in the compatibility test (see below), in which the data subject's expectations (which you can derive, for example, from the visibility settings of the data or profile) may be an important element. A data subject will have different expectations about the reuse of data shared in an open forum than about the reuse of data shared in a closed forum.
As a researcher carrying out secondary processing of personal data, you will therefore have to assess whether the data subjects actually intended their data to be made public. It is not enough that the data is accessible; it must have been made public to such an extent that the data subjects can no longer expect the same degree of privacy.
(Personal) data on social media: how do you handle it?
You can collect data in different ways:
- You can contact the data subject(s) directly via the social media platform and collect the necessary data from them. This is primary processing of personal data.
- You can obtain the data you need via the social media platform itself, e.g. through an application programming interface (API), an automated process through which the platform provides you with the requested data that users shared on it, or by scraping the data from the platform. This is secondary processing of personal data.
Primary processing of personal data: you contact the data subject(s) through the social media platform and collect personal data directly from them
In certain cases, you may contact the data subject(s) via a social media platform and thus collect personal data directly from them (for example, by distributing a survey via social media). To process these personal data, you must identify the legal basis on which the processing rests.
The processing of personal data in the context of your research must be based on one of the six legal grounds in the GDPR. You can find more information on the legal grounds here.
Not every legal ground is relevant for research purposes; for that reason, we discuss below only consent and public interest as grounds for processing social media data.
- Consent
If you want to rely on the data subject's consent for the (planned) processing of personal data, that consent must meet specific conditions: it must be freely given, specific, informed and unambiguous.
When it comes to social media, one might ask whether users have given implicit consent by accepting the platform's general terms and conditions and/or by making the data (semi-)public themselves. They have not.
The fact that users can choose, through the settings of a social media platform, whether or not to display their data publicly does not mean that they have consented to your use of that data. Keep in mind that social media users are not always well informed about the possible consequences of their social media use and may therefore find themselves in a "vulnerable" position.
!! The fact that data is (publicly) available does not mean you have consent to use it. You will have to obtain active consent from the users yourself.
Please note: when you process special categories of personal data, an exception ground provided in Article 9 GDPR is also needed. This can also be consent, but it must be explicit.
- Public interest
If you believe that the (planned) processing of personal data is necessary for a task carried out in the public interest, the research must lead to an increase in knowledge and insight that benefits society (directly or indirectly). You must demonstrate that the processing of personal data in your research is necessary to fulfil this task and serves a societal interest. In addition, this legal basis requires that a public interest task has effectively been assigned to the controller, and this task must be defined in national law. This ground can only be invoked if such a legal basis for the research exists. More information on this can be found here.
Secondary processing of personal data: you receive the personal data via the social media platform
When you extract data from social media platforms, and thus do not collect it directly from the data subjects themselves, this constitutes "secondary processing of personal data". To collect data from social media, researchers often rely on so-called application programming interfaces (APIs), which connect researchers directly to the platform. Another possible technique is web scraping, which uses software to extract ("scrape") information and data from websites. The scraped information can then be structured into a dataset of its own and analysed.
First and foremost, the GDPR requires that, even with secondary processing of personal data, the processing in the context of your research must be based on one of the six legal grounds in the GDPR. Also in this case, if you process special categories of personal data, an exception ground provided in Article 9 GDPR is needed.
Secondly, collected data may not be further processed in a manner that is incompatible with the purposes for which they were originally collected. As a researcher carrying out secondary processing, you therefore need to conduct a compatibility test.
The GDPR does provide a "presumption of compatibility" when data is further processed for the purpose of scientific research. This means that when personal data are further processed for scientific research, compatibility is presumed. The compatibility test can then be limited to assessing applicable safeguards; namely, sufficient technical and organisational measures should be implemented (e.g. anonymous data should preferably be used and data should be as soon as possible).
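Safeguards of this kind can be sketched as a small preprocessing step applied to each collected record before analysis. The minimal Python sketch below assumes a hypothetical record layout and a hypothetical research question that needs only the post text and the date: it keeps only those fields, replaces the handle with a keyed pseudonym (HMAC-SHA256) and coarsens the timestamp to a date. The field names, `NEEDED_FIELDS`, `minimise` and the key are all illustrative assumptions; real platform exports use their own schemas. The result is pseudonymised, not anonymous, data.

```python
import hashlib
import hmac
from datetime import datetime

# Hypothetical: the only fields the (hypothetical) research question needs.
NEEDED_FIELDS = ("text", "created_at")


def minimise(record: dict, key: bytes) -> dict:
    """Keep only the needed fields, pseudonymise the author and coarsen the timestamp."""
    out = {field: record[field] for field in NEEDED_FIELDS if field in record}
    # Coarsen the exact posting time to the date only.
    if "created_at" in out:
        out["created_at"] = datetime.fromisoformat(out["created_at"]).date().isoformat()
    # Replace the handle with a keyed hash; the key should be stored separately.
    out["author"] = hmac.new(key, record["handle"].encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    return out


# Hypothetical raw record as it might come out of a collection step.
raw = {
    "handle": "@jane_doe",
    "display_name": "Jane Doe",
    "location": "Brussels",
    "created_at": "2023-10-04T15:47:00",
    "text": "Great concert last night",
}

minimised = minimise(raw, key=b"hypothetical-key-kept-separately")
print(minimised)
```

Dropping fields such as the display name and location at collection time, rather than later, keeps the dataset limited to what the research actually needs.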
If you are allowed to use the collected personal data for your research, you can, depending on the social media platform, contact the "holder of the source material" (i.e. the operator of the social network itself). This way, you can contractually agree on the terms of your processing and clearly define how it will take place. However, such an agreement does not replace your GDPR obligations (such as the information obligation and the need for a legal basis). After all, an operator cannot (fully) freely dispose of the personal data in question either. This does not mean, of course, that seeking the operator's cooperation is pointless; in view of intellectual property rights, it may even be necessary to negotiate an agreement. Any such agreement must, however, respect all applicable principles of the GDPR.
(Personal) data on social media: information obligation
If you process personal data, you are obliged to inform the data subject about the (planned) processing of his or her personal data. This applies regardless of where the data come from, and therefore also to social media data. It applies to both primary and secondary use of personal data.
Data subjects should thus be informed individually about the specific research in which data about them will be used. However, for secondary use of personal data, in particular processing for the purpose of scientific research, there is an "exception": if providing the information proves impossible or would require a disproportionate effort, you may deviate from the information obligation. In that case, you must still take measures to protect the rights and freedoms of the data subject, including making the information publicly available, and keep the processing of data to a minimum. The information can then be provided at a more general level, for example on a website or in a brochure.
Data subjects not only have the right to information, but also have other rights they can exercise under the GDPR, which may have a potential impact on your processing of personal data. For more information, click here.
You should also document the processing in your GDPR record (via dmponline.be).
Last modified Oct. 4, 2023, 3:47 p.m.