Towards an ethics of social media data analysis

by Lykle de Jong and Richard Rogers, UvA


The ethics of collecting data from social media and other web spaces is a contentious topic. As a case in point, in his analysis of 380 studies using Twitter data, Michael Zimmer has pointed out the overwhelming lack of ethical reflection on data collection and research design in the vast majority of them (256). On the whole, these studies assume that online data is already public and that its collection therefore needs no particular ethical consideration. Although this may sound reasonable, there is more to it than first meets the eye.

Ethical decision-making in the online sphere is complex, context-dependent and full of grey areas in which individual researchers are expected to assume responsibility. As a 2012 report of the Association of Internet Researchers emphasizes, social media research relies heavily on the individual researcher’s judgment of context and situation, even more so than is the case in other fields of research. Rather than following universally applicable ethical rules, every internet scholar dealing with human subjects is expected to consider, at every step of the research, its specific context and its consequences for the subjects involved.

Although the scholarly discussion on ethics on the Web is often ambiguous about particular cases, some general guidelines can nonetheless be distilled. Based on several critical yet rather pragmatic concerns about ethical internet research and privacy-related matters (see e.g. Bruns et al.; Crawford et al.; Ess; Markham and Buchanan; Metcalf and Crawford; Zimmer), we suggest the following points of consideration:

(1) How to (re)define the nature, desirability and possibility of (informed) consent on the Web?

(2) How to understand the privacy of the subject within a given context? Is it, for example, reasonable to assume that the subject can expect their data to be public (as, arguably, is the case with public figures)?

(3) How vulnerable are the subjects studied, in terms of the sensitivity of the data and the repercussions its publication may have? Is there potential for harm?

(4) Is anonymization desirable? If so, what techniques could be used, and how to deal with the possibility of re-identification after anonymization?

(5) Relatedly, what methods can be applied to ensure the validity and reliability (through data management, storing, sharing and disclosure) of the research while securing anonymity?

(6) How to balance the urgency and social benefits of the research against the privacy and rights of subjects?

Although it should be clear that no general policy or predetermined standard can be articulated, we can offer some first thoughts on the impact these issues have on the ODYCCEUS project on European right-wing populism.

Informed consent on the Web stands at the center of many of these ethical discussions and is highly contested. Although online data is undeniably already public, a social media user may reasonably expect their data to stay in the context in which it was originally posted. Even apart from the issues surrounding informed consent, adherence to this so-called contextual privacy is crucial to any ethical internet research. Even though users have agreed to the terms and conditions, which give Facebook, for example, the right to use and sell data to third parties, extracting this data (for whatever purpose) remains a matter of ethical concern.

Depending on the particular study, informed consent may thus be desirable for extensive case studies on individuals, perhaps with the exception of public figures. It could very well be argued that informed consent does not apply to public figures such as politicians. Being already situated in the wider public domain, they have every reason to expect their data to be public and to be analyzed by academics and journalists alike. In addition, their impact on society makes for an urgent and compelling case to study them regardless of their consent. Nevertheless, the contextual meaning of their data should always be acknowledged and respected. Conversely, we have to assume that obtaining informed consent is simply infeasible for larger data sets, such as those used in big data studies, if only because of the practical difficulty of dealing with such enormous quantities of data.

As a result, we have to think about the anonymization of the data. One of the biggest concerns here is that isolated posts, stripped of the context of the totality of data from that specific user’s profile, are of little analytical use, because they cannot reveal the intention and background of the post. One technique to circumvent this issue involves paraphrasing the data. But this raises problems of its own: by modifying the original framework of meaning (and through the interpretative choices this necessarily involves), it makes hermeneutical and discourse-oriented research hardly possible. In addition, it opens the door to fraud, because reliable ways of detecting fabricated data may be hard to realize.
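As an illustration of one alternative to paraphrasing, the sketch below replaces account names with keyed pseudonyms, so that all posts by the same user remain linked to each other while the original handle is no longer stored. This is a minimal sketch and not a method prescribed by the sources cited here: the key, the `pseudonymize` helper and the sample posts are hypothetical. It also does not amount to anonymization in a strict sense, since the unaltered text of a post can often still be traced back to its author through a simple search.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption for this sketch). In practice it
# would be generated randomly and stored separately from the data set.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a user identifier to a stable pseudonym using a keyed hash."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Shorten the digest for readability; the same input and key always
    # yield the same pseudonym, so per-user context is preserved.
    return "user_" + digest[:12]


# Hypothetical sample data, invented for illustration only.
posts = [
    {"user": "@alice", "text": "first post"},
    {"user": "@bob", "text": "another post"},
    {"user": "@alice", "text": "second post"},
]

pseudonymized = [
    {"user": pseudonymize(p["user"]), "text": p["text"]} for p in posts
]
```

Because the pseudonym depends on a secret key, someone without the key cannot simply hash a list of known handles and compare the results. Whoever holds the key, however, can re-identify users, which makes its storage and disclosure part of the ethical consideration.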

Given these ethical problems, we must ask ourselves to what extent the end justifies the means. The pressing need to study right-wing populism on the Web could arguably justify a liberal attitude towards informed consent. However, this means that we should be clear about how the research serves the interest of society. An ethics of social media analysis thus essentially places the central responsibility with the individual researcher, simply because ethical judgment is fundamentally context-dependent and specific to the situation. In sum, we should acknowledge the issues at stake, describe the process of ethical consideration and remain attentive to potentially harmful implications at all times.


