Mapping the Data for Good Landscape at Good Tech Fest

This is a part of a series about a project defining “Data for Good” and “AI for Good” in a more helpful and nuanced way.

I had the great honor of running a workshop at Good Tech Fest last week on Mapping the Data for Good Landscape. I recently joined data.org on a year-long fellowship, determined to make the big tent of “Data for Good” a little more granular and easy to navigate for nonprofits, foundations, and the public at large. This workshop was a great opportunity to get out of my Zoom bubble and talk with folks on the frontlines of using tech for good about what organizations they interact with and what problems exist with the current “data for good” landscape.

The exercise we ran started with six common issues that social impact organizations often have when trying to work with data, data science, or AI:

  • I need more data
  • I need a way to store / share / access data better
  • I need data science talent
  • I need a data science output built from some data
  • I need a data strategy
  • I need data funding

Participants were then asked to add as many organizations as they could think of that helped address each problem for the sector. About 30–50 participants joined in over the course of the Zoom session. Participants were also asked what issues social impact organizations faced in using data that were not addressed above. The results were added to a Jamboard (publicly available here), from which we then had a discussion about what we observed, what was missing, and what we should make of this little slice of the data for good ecosystem.

The group had some telling observations once all the stickies were up. Here are just a few of the thoughtful observations people made:

The space is very fractured. There’s no clear throughline between organizations at each stage. It’s natural for a new space to be fragmented, especially when the six needs above require different solutions. However, this observation should prompt us to ask if we want the data for good landscape to remain this way. Should the stages of building data science for social impact be more coherent, allowing people to go to a “one-stop-shop” for using data? Or is it more effective for the space to be modular? Either might be fine, but only if designed that way intentionally. Today the space simply seems chaotic.

There is a “missing middle” for data sources. The data sources listed for the “I need data” question were very broad, aggregate, national data. NGOs, in turn, have hyper-specific data about their local operations. Is there a solution for organizations somewhere in the middle? Might datasets or data collectors specialize in getting data at a finer-grained scale? Could these datasets be designed for collectives, sectors, or other middle-level participants?

The training programs are largely located in the global north. Given the conversations about how much data science for good work is centered on the global south, shouldn’t there be a preponderance of training programs there? The majority of organizations on our Jamboard were from the global north. This comment did prompt a few participants to add talent programs focused outside the US and Europe, like Zindi. The point remains, however, that a disproportionate number of the training programs on our whiteboard were in the global north.

As good statisticians we must note that these results are a function of the sample of people in this workshop. Good Tech Fest skews toward the US (from my observation), and I would expect many people who would come to a session I’m hosting are in my network in North America, so we’re likely to have a fairly strong North American bias in these results.

Beyond the comments about the individual issues, participants also raised interesting meta-issues when discussing other challenges the social sector faces when using data. Maintaining data science solutions came up as a major weak point that many nonprofits face but that few service providers support. One participant pointed out that the answers to these questions could differ if taken from the individual organization level (“My nonprofit needs data”) to the sector level (“Everyone working to stop illegal fishing needs X data”), raising the question of what it means for “data for good” organizations to function at a sector level. And of course, the challenges of trying to accomplish data science in a nonprofit context (without funding, without time, without a team) loom large. Nonprofits are no strangers to resource constraints, but the newness of data science and its relatively high cost may mean nonprofits have a higher hurdle than usual in getting the funding they need to do this work. You can read below the fold for a longer list of the observations participants made for each issue (along with some of my editorialization).

So what are we to make of this conversation? On a tactical level, there seems to be a need for a more in-depth directory of data for good groups by their function. At a minimum that would allow us to critically reflect on opportunities and gaps in the sector and have a deeper conversation about what is needed. At a higher level, it’s clear that we need some clarity about all the ways we could “do data for good”. There is surely a difference between helping every nonprofit hire a data scientist and creating more tools so that nonprofit executives can build their own visualizations. What could we create to help drive this clarity? Perhaps a vision or manifesto for the “data for good” movement would cause groups to distinguish themselves and organize around how they each contribute to that vision. Or perhaps a prescribed set of “packages” for organizations trying to utilize data science would help cut through the confusion. Imagine that instead of a list of “data for good” groups writ large, there was a “data-driven policy change pack” that guides organizations to a set of partners to work with them on data strategy, collection, and use specifically for that outcome.

While we may not have found a magic bullet for strengthening the data for good ecosystem in this workshop, the good news is that the work continues on in many forms. These workshop results will inform a landscape I’m building for the data for good space. You can watch this account for more info on that, or reach out to me at jake@data.org if you want to get involved. We here at data.org are also thinking about ways to help fill some of these gaps in the data for good landscape. We’ve got a long way to go, but workshops like these show the need, the white space, and the opportunity we all have to mature and build this space together!

A huge thanks to everyone who participated in the workshop. Some notes on the conversation around each slide in the Jamboard are below.

I don’t have enough data

  • There is a lot of macro-level data
  • Many of the organizations on the stickies above provide data, but not necessarily the means by which NGOs can collect their own data. FulcrumApp shows up here, which is a tool that allows for data collection, but is one of the few. How can orgs get data customized to their needs? Should they have to collect it themselves?

I don’t have a way to store / share / access data

  • A lot of commercial tools appeared for this issue — Airtable, Excel, sheets, cloud services. This issue seems primarily solved by for-profit offerings. Are we OK with that?

I need to hire/train data people for my organization

  • Many of these organizations are from the US and the global north. There need to be more organizations working in the global south, if that’s where the work is focused.
  • The range of skills taught by these programs varies widely, from teaching data analysis to software development to executive training on the business of data science. Are mission-driven organizations able to identify which training they need?

I need to create something from my data (e.g. a report, analysis, algorithm)

  • There is a mix of platforms vs providers — does the organization have capabilities/resources for “self-service” platforms?
  • The service providers deliver different outcomes from data, e.g. Tableau will visualize your data but not write an algorithm to automate your intake. DrivenData will build you a machine learning solution but won’t run a competition on your data strategy. How do we make it more clear what types of data outcomes these providers deliver?

I need a data strategy

  • “Data strategy” isn’t well defined and is very general. We’re missing teams that provide a specific type of data strategy (for example on a specific issue like GDPR, governance, or product strategy), or strategy for a sector.
  • The hidden issue with data strategy is that data infrastructure is treated as part of NGO IT budgets, which are seen as a utility whose costs need to be reduced. Getting support to expand data strategy in this political environment is a challenge.
  • Data strategy has the fewest stickies.
  • There are a lot of professional service firms on this list. If they are the go-to strategists for the social sector, then who are the SMEs guiding the engagements? Is there a small network of SMEs that are common, or is this ad hoc? If the latter, what worries do we have about one-offs?

I need data funding

  • The funders on this page fund very different outcomes from data — some fund the use of data to drive policy, others fund machine learning projects. Do we know which does which? Can mission-driven organizations easily connect their needs to the funding?

Other comments

  • There is a throughline of frustration and angst through some of the post-its. For example, under training, there is a comment that “we learn on the job”. In another section, people talked about needing time and “sanity” to do this taxing work. In each frustrated comment one can read a sticking point in the data for good system, from lack of funding, to lack of executive buy-in, to lack of clean data that is fit for purpose.
  • “Knowing what we don’t know” came up a lot. It’s one thing to find data that you need, it’s another to even know if you need data, or if the type of data you think you need is right. It seems many organizations need a way to assess what they need to apply data and computing to their goals.

Believer in tech+data for beautiful purposes || Director @DataKind || TV nerd @NatGeoChannel || Fiercely optimistic