The Journey Continues: Starting as a Fellow at data.org!

Process 16 (Software 2) by Casey Reas

I am thrilled to announce that I am starting a one-year fellowship at data.org this April! For those unfamiliar, data.org is a platform for partnerships to build the field of data science for social impact. Having stepped down as CEO of DataKind last year, I wanted to find an organization where I could focus on field-building activities in the data science and AI for good space. data.org is the perfect place for this work, as it aims to provide a neutral platform where different actors — from funders to social impact orgs to private sector organizations — can learn from one another and build the field. Moreover, I’m impressed with the team there and their thinking about how to build this platform, and I look forward to working with them.

So what will I be working on? Generally, I want to help folks make sense of the scattered and fractured data science/AI for social impact landscape so that more fruitful and strategic collaboration can take place. The first project I'm stewing on is an ontology of the data science for social impact space. To see the confusion we face today, look no further than my clunky use of both "data science" and "AI" in my opening sentences in an attempt to convey what I'm working on broadly enough for all audiences. I see this confusion sapping focus and direction everywhere. Organizers will hold "Data for Good" conferences meant to bring people from a shared discipline together, only to find that attendees who are western data privacy experts must pick their way past folks building machine learning algorithms in Kenya, who are in turn alienated from the folks building better data infrastructure in government. Foundations will proclaim that they are launching a new data science portfolio, but it's unclear whether that means they'll be supporting critical AI ethics research, helping nonprofits use their data better, creating their own machine learning products, or anything in between. Nonprofits seek to become "data-driven", but this term can signal everything from building an AI innovation team to compiling data for their funders more effectively.

The ontology seeks to untangle this web and categorize the different end goals, processes, and cultures that people are working towards in clear, evergreen terms. Moreover, it will include a landscape of the actors in the space and how they’re operating given this ontology so that people can more easily find one another, fund one another, and identify problems yet to be solved together. For example, do you want to find or fund research on the ways in which track-and-trace technologies built for COVID are affecting society? You should check out Data & Society or the Engine Room. Do you need technologists to help you build your own data innovations? Head over to DataKind. Are you looking to train your non-profit staff in technical skills? TechChange has a great program for that.
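To make the matchmaking idea concrete, here is a toy sketch (in Python) of what an ontology-backed landscape lookup could eventually enable. The category names and data structure here are entirely hypothetical, invented only for illustration; the real ontology's terms are exactly what this project sets out to define.

```python
from dataclasses import dataclass

@dataclass
class Actor:
    """One organization in the data-for-good landscape."""
    name: str
    offers: set[str]  # hypothetical ontology categories this org works in

# Example orgs from the paragraph above, tagged with made-up categories
LANDSCAPE = [
    Actor("Data & Society", {"research"}),
    Actor("The Engine Room", {"research"}),
    Actor("DataKind", {"build"}),
    Actor("TechChange", {"train"}),
]

def find_actors(need: str) -> list[str]:
    """Return the names of orgs whose offerings match a category of need."""
    return [actor.name for actor in LANDSCAPE if need in actor.offers]
```

With shared, evergreen categories, a funder or nonprofit could run a query like `find_actors("build")` and land on DataKind rather than wading through the whole fractured landscape.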

While the ontology is the project I want to focus on first, there are a lot of other field-level topics that have been stacking up since my time at DataKind that I hope to delve into this year. Here are just a few:

  • Ethical AI: A Constructive Lens: Much of the debate I read on the ethics of AI is narrowly focused on criticism of western capitalist uses of AI. That's an important strain of work, as western corporations are disproportionately shaping the use of AI. However, while that conversation is centered on the question "How is this corporate-built AI creating unwanted harms and how do we mitigate them?", there is less conversation centered on the question "How might we build AI that is intended to help advance human prosperity?" Ethical AI conversations about corporate use of AI are bound by the institutional incentives of capitalism and mostly US-based business norms. International NGOs, foundations, and some governments have entirely different incentive structures and may be able to answer that question in practice. I am interested in illuminating the different systems, institutions, and incentive structures that become possible for AI when you're not constrained solely by western business law and culture.
  • The Data Production Pipeline: Data is a peculiar thing. In some ways it is a commodity, traded in exchange for access to services. In other ways, it is a resource, transformed and aggregated to create new capabilities. Through it all, it is often linked indelibly to our personal identities. I believe much of the confusion around the AI / data science debate stems from these odd properties of data. Building on the ontology work, I think it would be a useful exercise to describe the ways data is similar to or different from other production processes, e.g., harvesting lumber to build products from wood.
  • Designing scalable technology for social impact: At DataKind, it was clear that the path to major change wouldn't come from single, one-off projects with one non-profit at a time. Yet the vast majority of data-driven tech is still built in this manner, identical to how for-profit companies build tech: build it for one company, hope it scales across the customer base. At DataKind, data.org's founding partners funded us to develop Impact Practices, a process for building data-driven innovations that could potentially scale across entire issue areas. Though still in its infancy, I am quite bullish on the model and want to support data.org and others in their efforts to create these collective design models.

If these ideas intrigue you too or, more importantly, if you've already worked on them or solved them, drop me a line at jake@data.org. I intend all of this work to be highly collaborative with the other data-science-, ML-, AI-, quantum-techno-whatever-for-gooders out there, and I'd love to work with you on it. During my fellowship with data.org there will be opportunities to come together (online at first, and eventually in person), and I very much welcome community members interested in contributing to and advancing these conversations.

We know that AI and data can inadvertently cause harm, and we hear about those cases daily. However, we also have so many wonderful opportunities to use this technology thoughtfully and safely if we're willing to redesign the way the game is played. That is the core inspiration that drove me to start DataKind, and it's the core inspiration that drives me now to do this work. I'm so eager to get started shaping a more prosperous and organized data-for-good field, and I look forward to doing this work alongside all of you.

Keep your eyes on this spot for updates and see you out on the front lines!