Causal inference is of great importance in the international development space. A government might be interested in distributing fortified rice to improve health outcomes among the food insecure, but it first needs to understand the impact fortification might have on the target population. If an investor wants to invest in emerging markets, they might want to understand which geographies would yield maximum impact per dollar invested.
As is widely understood, establishing causality is hard. Gold-standard approaches to showing causation, such as randomized controlled trials (RCTs), deliver highly credible causal estimates but can be time-consuming and expensive to run. Other approaches using observational (or non-experimental) data, such as difference in differences, synthetic control, or matrix completion, can approach the rigor of RCTs in certain settings. But these approaches require, at the very least, trustworthy and comprehensive data on outcomes of interest, and such data are often unavailable in much of the world.
This is where we see a unique opportunity in using machine learning and remote sensing for causal inference. At Atlas AI, we produce socio-economic data layers at a temporal and spatial scale previously unavailable from any of the traditional surveys conducted on the African continent. We train machine learning (ML) models that combine remotely sensed data with nationally representative survey data to predict socio-economic outcomes at high spatial and temporal resolutions. These data can then be used in conjunction with natural experiments and/or RCTs for impact evaluation. In doing this we address two big problems in establishing causality: (1) we provide high-quality data to be used as the dependent or independent variable in causal analyses, and (2) we provide substantial statistical power due to the sheer volume and completeness of our data.
The use of ML and satellite data for causal inference provides unique possibilities, but also comes with a unique set of challenges. It is important to understand the statistical impact of each of these challenges and how they might be overcome. Here, we highlight a few of these challenges and their potential solutions.
The first potential challenge lies in the features used in the ML model. One or more of the features used in the predictive model may be correlated with another variable (independent, dependent, or covariate) in the causal analysis. For example, let’s say that we are trying to understand the impact of a deworming intervention for children on household economic well-being, where economic well-being is a machine-learned variable, and that our satellite/ML model uses proximity to a public school as one of its features. This could be an issue for causal inference if proximity to a public school is also correlated with the probability of being dewormed: the predicted outcome would then differ between treated and untreated households even if deworming had no effect at all. Hence, it is important to ensure that variables produced by an ML model do not “bake in” a treatment effect of interest.
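The deworming example can be made concrete with a small simulation. The sketch below is illustrative only: the probabilities, the coefficients, and the function names (ols_slope, simulate_baked_in_effect) are assumptions made for this example, not a description of any production model. It assumes the true effect of deworming is zero, that deworming is more likely near a school, and that a hypothetical ML prediction leans on school proximity.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_baked_in_effect(n=20000, seed=0):
    """Return (slope on true well-being, slope on predicted well-being)."""
    rng = random.Random(seed)
    dewormed, true_wb, predicted_wb = [], [], []
    for _ in range(n):
        near_school = 1 if rng.random() < 0.5 else 0
        # Assumption: deworming is more likely for households near a school.
        p_dewormed = 0.8 if near_school else 0.2
        d = 1 if rng.random() < p_dewormed else 0
        # Assumption: the true effect of deworming is exactly zero.
        wb = rng.gauss(0.0, 1.0)
        # Hypothetical ML prediction that leans on school proximity.
        wb_hat = 0.5 * wb + 0.6 * near_school
        dewormed.append(d)
        true_wb.append(wb)
        predicted_wb.append(wb_hat)
    return ols_slope(dewormed, true_wb), ols_slope(dewormed, predicted_wb)


slope_true, slope_predicted = simulate_baked_in_effect()
print(f"effect on true well-being:      {slope_true:.3f}")
print(f"effect on predicted well-being: {slope_predicted:.3f}")
```

Under these assumptions the regression on true well-being should return a slope near zero, while the regression on the predicted variable should return a clearly positive slope that comes entirely from the school-proximity feature.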
The next issue is that of bias and measurement error. We understand that the predictions from an ML model contain some error: measures of performance such as accuracy or R-squared on held-out test data are never equal to 100%. In traditional settings of causal inference, this error is referred to as measurement error. Different considerations apply depending on whether this error is random or systematic.
If the measurement error is random (classical), i.e. it has zero mean and is unrelated to the dependent variable, the independent variables, and any of the covariates, its implications for the causal analysis depend on whether the ML variable is used as a dependent or an independent variable. When used as a dependent variable, random measurement error leads to larger standard errors on the causal estimate but does not bias it. This is a problem that can be tackled with a larger volume of data, which provides greater statistical power. When used as an independent variable, it leads to attenuation bias in the causal estimate, i.e. it leads us to underestimate the magnitude of the causal impact but does not change its direction. Such an estimate can still be useful as a lower bound on the impact of an intervention.
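The two cases can be checked with a short simulation. This is a minimal sketch under stated assumptions: a single regressor, a linear outcome, and independent Gaussian noise; the function names and parameter values are hypothetical. In this simple setting, noise in the regressor is expected to shrink the slope by the ratio var(x) / (var(x) + var(noise)), while noise in the outcome leaves the slope centered on the true value.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_classical_error(n=20000, beta=2.0, noise_sd=1.0, seed=0):
    """Compare slopes when classical noise is added to the outcome or the regressor."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    # Zero-mean noise that is independent of everything else (classical error).
    y_noisy = [yi + rng.gauss(0.0, noise_sd) for yi in y]
    x_noisy = [xi + rng.gauss(0.0, noise_sd) for xi in x]
    return {
        "clean": ols_slope(x, y),
        "noisy_outcome": ols_slope(x, y_noisy),      # expected near beta
        "noisy_regressor": ols_slope(x_noisy, y),    # expected near beta / (1 + noise_sd**2)
    }


for name, slope in simulate_classical_error().items():
    print(f"{name:16s} {slope:.3f}")
```

With unit-variance x and unit-variance noise, the attenuation factor is one half, so the slope on the noisy regressor should land near half of the true coefficient.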
More often than not, measurement error in ML variables is systematic or nonclassical, i.e. it is related to one or more variables in the analysis. This is a harder problem to deal with, as it can bias the estimate of the causal impact. For example, if we again consider the impact of deworming on economic well-being, let’s say that our ML predictions are biased with respect to time, i.e. the model overpredicts well-being in years after the intervention but not in the years prior. If deworming in reality did have a positive impact on economic well-being, a before-and-after comparison based on the ML variable would lead us to overestimate this positive impact. Similar issues could arise if the predicted well-being was (differentially) biased with respect to any other variable, such as the independent variable (deworming), the dependent variable (true well-being), or a covariate that is correlated with the independent variable, such as geographical area (urban/rural) or elevation. Thus it is important to understand whether such systematic biases exist in the ML variables. If they do, it is important that estimates be properly debiased, either during the training phase or after inference. Simulation-based methods can help us understand how bias affects the causal estimate of interest and can offer insights into how to account for the effect of known bias.
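A simulation of this kind might look like the following sketch. It assumes a two-group, two-period difference-in-differences design with made-up values for the true effect, the time trend, and the size of the prediction bias; all names are hypothetical. It compares an unbiased outcome, an outcome that is overpredicted for every unit after the intervention, and an outcome that is overpredicted only for treated units after the intervention.

```python
import random


def mean(values):
    return sum(values) / len(values)


def did_estimate(rows, outcome_key):
    """Two-by-two difference-in-differences estimate from a list of dict rows."""
    def cell(treated, post):
        return mean([r[outcome_key] for r in rows
                     if r["treated"] == treated and r["post"] == post])
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


def simulate_time_bias(n=20000, true_effect=0.5, bias=0.3, seed=0):
    """Return DiD estimates under no bias, common bias, and differential bias."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        treated = i % 2
        post = (i // 2) % 2
        # Assumed data-generating process: common time trend plus a treatment effect.
        y = 0.2 * post + true_effect * treated * post + rng.gauss(0.0, 1.0)
        rows.append({
            "treated": treated,
            "post": post,
            "y_true": y,
            # Overprediction for every unit in the post period.
            "y_common_bias": y + bias * post,
            # Overprediction only for treated units in the post period.
            "y_differential_bias": y + bias * treated * post,
        })
    return {key: did_estimate(rows, key)
            for key in ("y_true", "y_common_bias", "y_differential_bias")}


for name, estimate in simulate_time_bias().items():
    print(f"{name:22s} {estimate:.3f}")
```

In this setup a post-period bias shared by treated and control units cancels in the difference-in-differences estimate, whereas a bias that differs by treatment status is added directly to the estimated effect. A simple before-and-after comparison has no control group to absorb even the shared bias.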
Another consideration is the validity of the predictions themselves. Let’s say an ML model trained in South Africa was used to predict economic well-being in Tanzania, or that a model trained on 2013 labels was used to predict 2018 outcomes. It is important to understand whether those predictions are valid, that is, whether the ML model generalizes in space and time. These reliability checks can be performed by deliberately designing the training, validation, and testing splits in the ML pipeline, for example by holding out entire countries or years. Contextual understanding of the area of interest can also help validate such predictions.
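One way to design such splits is to hold out an entire country or an entire later year, so that test performance reflects transfer to unseen places or periods. The sketch below shows only the splitting logic; the record fields, the helper names, and the toy records (including the third country) are placeholders invented for illustration.

```python
def leave_one_group_out(records, group_key):
    """Yield (held_out_group, train, test) with one whole group held out per fold."""
    groups = sorted({record[group_key] for record in records})
    for held_out in groups:
        train = [r for r in records if r[group_key] != held_out]
        test = [r for r in records if r[group_key] == held_out]
        yield held_out, train, test


def split_by_year(records, last_training_year, year_key="year"):
    """Train on labels up to last_training_year, test on later years."""
    train = [r for r in records if r[year_key] <= last_training_year]
    test = [r for r in records if r[year_key] > last_training_year]
    return train, test


# Toy, made-up records: only the grouping fields matter for the split logic.
records = [
    {"id": i, "country": country, "year": year}
    for i, (country, year) in enumerate(
        (c, y)
        for c in ("South Africa", "Tanzania", "Kenya")
        for y in (2013, 2018)
    )
]

# Spatial check: each fold tests on a country the model never saw in training.
for held_out, train, test in leave_one_group_out(records, "country"):
    print(held_out, len(train), len(test))

# Temporal check: train on 2013 labels, test on 2018 outcomes.
train, test = split_by_year(records, last_training_year=2013)
print(len(train), len(test))
```

A large gap between performance on random splits and performance on these grouped splits would suggest that the model does not transfer well across space or time.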
Finally, the training data itself may have sampling bias, or it might contain noise. For example, it may not be adequately representative of the population in question: one district might be overrepresented in the training data while another is underrepresented. Assigning appropriate weights to the training data may help mitigate this bias.
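One simple weighting scheme, sketched below, gives each training example a weight equal to its district's population share divided by its district's share of the sample. This is an illustrative assumption about how weights might be built, with hypothetical district names and shares; it requires that population shares are known for every district in the sample.

```python
from collections import Counter


def district_weights(sample_districts, population_shares):
    """Weight each example by population share / sample share of its district."""
    n = len(sample_districts)
    counts = Counter(sample_districts)
    weights = []
    for district in sample_districts:
        sample_share = counts[district] / n
        weights.append(population_shares[district] / sample_share)
    return weights


# Hypothetical sample: district A is overrepresented relative to the population.
sample = ["A"] * 8 + ["B"] * 2
shares = {"A": 0.5, "B": 0.5}
weights = district_weights(sample, shares)

# After weighting, each district's share of total weight matches its population share.
total = sum(weights)
weighted_share_b = sum(w for d, w in zip(sample, weights) if d == "B") / total
print(f"weighted share of district B: {weighted_share_b:.2f}")
```

The resulting weights can then be passed to any training routine that accepts per-example weights.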
At Atlas AI, we have spent much of our COVID lockdown developing solutions to overcome each of the challenges described above. As such, we are uniquely positioned to leverage the tools of machine learning and remote sensing to advance the field of impact evaluation in the development sector.