There has been increased focus on the ethics associated with data work (traditional analytics, machine learning, AI, research, etc.) in recent years. There are a few good reasons for this. Data has become so embedded in our daily lives that the risk to individuals from improper data collection, data handling, and deployment of data products has grown sharply. Also, there are more people doing data work (more data professionals, more researchers, and more hobbyists) than ever before, so there are more eyes on the problem. What is clear is that real harm can be done by malicious data practices as well as by simply failing to prioritize ethical data work. At CommonWealth Data Solutions, we aim to lead by example in this area and hold ourselves to a high ethical standard.
A 2021 post by the Harvard Business School outlines five principal areas of data ethics: ownership (an individual owns their own personal information and has the right to decide how it is used), transparency (data practitioners have a responsibility to clearly state how they are using personal data), privacy (an individual’s personally identifiable data should not be publicly accessible, even if they have consented to its collection and use), intention (the intended use of data must be ethical), and outcomes (outcomes must conform to an ethical standard regardless of good intentions; disparate impact matters).
In our consultancy work, all of these areas of data ethics matter, and here is how we think about our ethical duties in each:
Ownership: The vast majority of the data that we use in our business belongs to a third party to begin with. We always ask whether any data entrusted to us was collected in an ethical manner, and by default, we ask that personally identifiable information be stripped from all datasets before we receive them, to the extent possible for any given project. That said, there are times when we need to work with personally identifiable data. In these cases, we review all privacy and consent agreements beforehand and enter into contractual data use agreements as necessary. We only store data on secure, company-owned devices or encrypted cloud accounts, and we ensure that only people who are directly involved in a project are able to access project-related data. When it comes to our own self-initiated projects, we only use publicly available data or synthetic data that we have generated ourselves.
Transparency: Entering into any project, we make clear statements (typically contractual) about how we will use and where we will store any data that a client shares with us. If the project involves personally identifiable data, we ensure that our use is consistent with what the individuals represented in the dataset have consented to.
Privacy: We place the highest priority on personal privacy. Unless there is a clearly stated (and ethical) business case for collecting and retaining personally identifiable information, we typically advise against it. Saving it just in case it might be useful later is not a good enough reason. If a client needs to share a dataset containing personally identifiable information with us, the storage practices and contractual agreements outlined above govern this use.
Intention: Obviously, we need to ensure that we are only involved in projects with positive ethical intentions. We are in the business of doing what is right above all else, and we will never compromise on that principle. As we define the scope of a project with our clients, we make sure we have a clear understanding of the intended outcome and the values that are driving it.
Outcomes: Finally, it is of utmost importance that, no matter how pure our intentions, the outcomes associated with our projects are ethically sound. Our process for assessing this will vary from project to project, and no method is completely foolproof, but there are a few key areas to highlight. We are always worrying about sources of bias. We have plenty of experience in academic research design, and we’ve spent a lot of time understanding how to build representative samples. Bias in data collection does not lead to valuable data projects, and it can do real harm once a data tool is deployed. As such, we obsess over project designs that eliminate potential sources of bias from the start. Additionally, we spend time brainstorming all the ways that the end product of our work could be misused and how this misuse can be guarded against. It is very easy to get so narrowly focused on the INTENDED use case that we fail to pay attention to the UNINTENDED uses that may arise down the line. Knowing this, we think through these possibilities as an intentional and planned step at the outset of a project.
Data has become more fundamentally ingrained in business practices as the years have gone by, and this trend will only continue. Standard ethical practices around data collection, storage, and use are still trying to catch up, and there are often perceived conflicts between what is ethically appropriate and what is best for the bottom line of the business. We propose that the ethical choice will always be the best business decision in the long run. Even if this stance results in less than maximum profitability in the short term, it builds long-term trust and minimizes the damage to credibility and reputation that can arise when ethical data practices are not a priority.
If this approach fits with your project needs, we would love to collaborate! Contact us so that we can get started on a high-quality data project for you!