NOTE: The following blog is Part 3 of a five-part series discussing areas of the AI field that the Atlas AI team is passionate about and working to advance in 2024. You can read Part 1 and Part 2 here.
Causal inference has long been a backbone of science – and, in turn, a backbone of society! Methods for figuring out how changing one phenomenon changes another have been central to the development of everything from new medicines and agricultural technologies to perfecting the shade of blue for links in a popular search engine in order to maximize ad revenue.
The gold standard for causal inference has long been the randomized controlled trial. Medicine and agriculture were early adopters. A random half of a study population gets a new drug, the other gets a placebo, and the subsequent difference in outcomes between the two tells you whether the new drug worked. Ten randomly selected quadrants of a field plot get fertilizer, ten others don’t, and the difference in crop yields tells you how well the fertilizer performed.
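The logic of the randomized trial can be made concrete in a few lines. Below is a minimal sketch using simulated data (the population, effect size, and noise levels are all invented for illustration): because treatment is randomly assigned, the simple difference in group means is an unbiased estimate of the treatment effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
treated = rng.integers(0, 2, n).astype(bool)   # coin-flip assignment
baseline = rng.normal(50.0, 10.0, n)           # outcome absent any treatment
true_effect = 3.0
outcome = baseline + true_effect * treated     # treatment shifts outcomes by +3

# The RCT estimator: difference in mean outcomes between arms.
ate_hat = outcome[treated].mean() - outcome[~treated].mean()
se = np.sqrt(outcome[treated].var(ddof=1) / treated.sum()
             + outcome[~treated].var(ddof=1) / (~treated).sum())
print(f"estimated effect = {ate_hat:.2f} +/- {1.96 * se:.2f}")
```

With randomization, no model of who got treated is needed; the control arm directly supplies the counterfactual, which is exactly the ingredient that observational settings lack.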
A few decades ago, social scientists figured out they too could do randomized trials, spurring a revolution in development economics and related fields, and providing a new benchmark against which standards of scientific evidence could be measured. But many important questions of interest are not easily amenable to a randomized trial. How do you figure out whether an already-implemented country-wide anti-poverty program worked? How do you figure out whether a changing climate is reducing agricultural productivity?
To answer these sorts of questions credibly, social scientists and statisticians have worked for decades to hone methods for causal inference in “observational” settings – i.e., settings where randomization is not possible or ethical, but where understanding the effect of one variable on another is of critical interest. The key challenge is figuring out how to accurately predict the counterfactual – i.e., what would have happened to the groups who got the poverty program, or the crop fields that experienced climate change, in the absence of those treatments.
In the past few years, work in causal inference has increasingly borrowed insights and tools from machine learning, and this intersection is rapidly changing how we do causal inference. I want to highlight two primary ways that approaches from machine learning are improving causal inference, both of which we are advancing here via our AI research at Atlas AI.
The first is methods to better predict counterfactuals – i.e., what would have happened in the absence of some treatment. The tools deemed most reliable in observational causal inference have long relied on observing both treated and untreated units for many periods before and after treatment; causal effects are then estimated by comparing what happened to treated units before and after treatment relative to untreated units over the same period. This works well so long as treated and untreated units were actually trending similarly before the treated ones got treated. Machine learning offers both new methods (e.g. matrix completion) and improvements on existing tools (e.g. synthetic control) that make it more likely that untreated units provide valid counterfactuals for treated units.
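To give a flavor of the counterfactual-prediction idea, here is a toy sketch in the spirit of synthetic control, on simulated panel data. It is deliberately simplified: the weights on control units are fit by ordinary least squares, whereas the canonical synthetic control method constrains them to be non-negative and sum to one, and matrix completion takes a different (low-rank) route to the same goal.

```python
import numpy as np

rng = np.random.default_rng(1)
T_pre, T_post, n_controls = 20, 10, 8

# Simulated panel: control units follow common trends; the treated unit
# is (unknown to us) a weighted mix of controls, plus a +5 effect after
# treatment begins at time T_pre.
controls = np.cumsum(rng.normal(0, 1, (T_pre + T_post, n_controls)), axis=0) + 100
true_w = rng.dirichlet(np.ones(n_controls))
treated = controls @ true_w + rng.normal(0, 0.1, T_pre + T_post)
treated[T_pre:] += 5.0

# Fit weights using ONLY pre-treatment periods...
w, *_ = np.linalg.lstsq(controls[:T_pre], treated[:T_pre], rcond=None)
# ...then project the controls forward to predict the counterfactual:
# what the treated unit would have looked like absent treatment.
counterfactual = controls[T_pre:] @ w
effect = (treated[T_pre:] - counterfactual).mean()
print(f"estimated post-treatment effect = {effect:.2f}")
```

The credibility of the estimate rests on how well the weighted controls track the treated unit in the pre-period, which is exactly the "similar trends" condition described above.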
But what if you don’t actually have the data to measure what happened to people or areas that got some treatment and others that didn’t? This is a common problem: a government or agency rolls out a program, but outcomes are not systematically measured for populations that got the program and others that didn’t. Here, machine learning can help as well, and recent developments in this area are core to what we do at Atlas. In particular, we’ve shown in a number of papers that the combination of satellites and machine learning can be used to accurately measure outcomes of interest over time, in settings where data on those outcomes (e.g. poverty levels, levels of wealth or consumption) are otherwise unavailable. David highlighted in his post a few weeks ago how LLMs can be similarly leveraged to make useful predictions where ground data might be unavailable.
Using these machine-learning-derived data responsibly for causal inference tasks requires care. Predictions from these models are never perfect, and imperfections can bias “downstream” inferences you make, potentially leading to incorrect assessments of causal effects. How best to fix these biases is a very active area of research, and is filling the pages (Prediction-powered inference; Methods for correcting inference based on outcomes predicted by machine learning) of top scientific journals. A combined team of Atlas AI and Stanford scientists has added to this literature with a recent paper in Nature, where we showed how appropriately-debiased satellite-and-machine-learning predictions of local-level wealth could be used to measure the causal impact of a nationwide electrification program in Uganda. This paper combined machine learning tools for predicting counterfactuals (in particular, matrix completion) with a new approach to using custom loss functions in a convolutional neural network to ensure that errors in model prediction did not bias inference.
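The core move behind prediction-powered inference can be shown with a toy example. Below, a "model" produces many predictions with a systematic bias, and a small labeled sample is used to measure and subtract that bias. Everything here is simulated for illustration; this is the basic PPI mean estimator, not a claim about the specific pipeline used in the Nature paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 100_000, 500                        # many predictions, few ground-truth labels
truth = rng.normal(10.0, 2.0, N)           # true outcome (mostly unobserved)
pred = truth + 1.5 + rng.normal(0, 1, N)   # ML predictions with a +1.5 bias

labeled = rng.choice(N, n, replace=False)  # small sample where truth is measured

naive = pred.mean()                                   # inherits the model's bias
rectifier = (truth[labeled] - pred[labeled]).mean()   # bias measured on the labeled set
ppi = naive + rectifier                               # debiased estimate
print(f"naive={naive:.2f}  ppi={ppi:.2f}  truth={truth.mean():.2f}")
```

The naive estimate is off by roughly the model's bias, while the rectified estimate lands near the truth; the same logic extends to regression coefficients and treatment effects, which is where it matters for the causal applications described above.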
Conclusion: Implications for Monitoring a Complex and Rapidly Changing Planet
In the realm of socioeconomic monitoring and analysis, the intersection of causal inference and machine learning will continue to be a critical area of research. Most of the significant challenges of the coming decades will involve understanding and shifting complex systems – environmental, social, commercial, or otherwise. Machine learning for causal inference will allow for a deeper comprehension of these complex global systems and will offer predictive capabilities crucial for making informed decisions in geopolitics, commercial investment, and sustainable development. By advancing and applying these techniques, Atlas AI will continue to play a leadership role in the field, and I’m excited to share some of our latest research throughout 2024.
At the same time, the responsible use of these technologies will be imperative, ensuring their benefits are ethically and equitably distributed. The fusion of cutting-edge machine learning with social science research and public engagement will continue to be key to the responsible advancement of the field. This commitment to a balanced and inclusive approach highlights Atlas AI's role not just as a technology leader, but as a conscientious steward of the AI field as we continue to advance our mission to guide organizations through the complexities of our rapidly changing planet.
Marshall Burke is a Co-Founder of Atlas AI, an Associate Professor of Global Environmental Policy in the Doerr School of Sustainability, and Deputy Director at the Center on Food Security and the Environment, both at Stanford University. He is also a Research Fellow at the National Bureau of Economic Research. His research focuses on the social and economic impacts of environmental change, and on measuring and understanding economic livelihoods across the developing world. His work regularly appears in both economics and scientific journals, including recent publications in Nature, Science, the Quarterly Journal of Economics, and The Lancet. He holds a PhD in Agricultural and Resource Economics from UC Berkeley, and a BA in International Relations from Stanford.