Machine learning on remote sensing? What does that even mean?

Gabe Cadamuro

A quick overview

Machine learning on satellite imagery has a long history, but an understanding of its possible role in the developing world came a little later than it did for other data sources. Early attempts to detect objects in satellite imagery applied a range of machine learning algorithms (AdaBoost, SVMs, etc.) after featurizing each image into pixel histograms or other such features. This led to projects that extracted roads and detected buildings, as well as machine learning approaches to crop yield estimation based on specialized spectral data.

A seminal paper that matched spatially tagged economic survey data with a convolutional neural network (CNN) approach produced accurate wealth predictions across several countries. While deep nets had been used on satellite imagery before, this work showed that CNNs were powerful enough to infer economic variables previously considered too complex to estimate from remote sensing data alone. Several works have followed in this vein, applying deep learning to novel, difficult tasks such as mapping slums or assessing the quality of a road. Not all recent work involves deep nets: predictions of agricultural yield in particular have a long history, with several different approaches that continue to succeed. Even here, however, concepts from deep learning have proved useful, for example LSTMs (a type of deep net) for temporal vegetation modelling in the context of crop identification.
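To make the early, pre-deep-learning pipeline mentioned above a little more concrete, here is a minimal sketch of the "featurize into pixel histograms, then classify" idea. Everything in it is an illustrative assumption: the tiny 2x2 grayscale tiles, the labels "water" and "built_up", and the function names are all made up, and a nearest-centroid rule stands in for the AdaBoost or SVM classifiers that real projects would have used.

```python
from typing import Dict, List, Sequence

# Hypothetical input type: a grayscale tile with pixel values in 0..255.
Image = Sequence[Sequence[int]]


def pixel_histogram(image: Image, bins: int = 4) -> List[float]:
    """Return a normalized histogram of pixel intensities."""
    counts = [0] * bins
    total = 0
    for row in image:
        for value in row:
            # Map intensities 0..255 onto bin indices 0..bins-1.
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def centroid(vectors: List[List[float]]) -> List[float]:
    """Average a list of feature vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(features: List[float], centroids: Dict[str, List[float]]) -> str:
    """Nearest-centroid rule: a simple stand-in for an SVM or AdaBoost model."""
    return min(centroids, key=lambda name: squared_distance(features, centroids[name]))


# Toy training tiles (made-up values): dark tiles vs. bright tiles.
dark_tiles = [[[10, 20], [30, 40]], [[5, 60], [15, 25]]]
bright_tiles = [[[200, 220], [240, 250]], [[190, 210], [230, 255]]]

centroids = {
    "water": centroid([pixel_histogram(t) for t in dark_tiles]),
    "built_up": centroid([pixel_histogram(t) for t in bright_tiles]),
}

# Classify a new tile from its histogram alone.
new_tile = [[12, 35], [50, 22]]
label = predict(pixel_histogram(new_tile), centroids)
print(label)
```

The point of the sketch is that the classifier never sees the spatial layout of the tile, only a summary of its pixel values. That limitation is part of why CNNs, which learn features directly from the pixels, were such a step forward for harder targets like wealth.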

Advantages of ML on Remote Sensing data

Remote sensing, particularly from satellite platforms, has a number of advantages in the context of developing world ML. The first is that satellites are truly global: obtaining imagery of any part of the globe incurs no transportation costs. This is a significant advantage over measurements that require transporting specialized equipment or professionals, since developing regions, and especially rural or remote districts, are harder to access; limited transportation infrastructure and rugged terrain are comparatively bigger problems there. The second advantage is that satellites can persistently measure an area without having to deal with local factors such as security or supply. This is invaluable for studies of areas in the midst of conflict or natural disasters. The third advantage, a natural by-product of the other two, is that satellite measurement can, if desired, be carried out free of sampling bias. Other passively collected data sources, such as social media or call detail records (CDR), rely on a set of users who may skew younger or more urban. Manual measurements or surveys may focus on more densely populated or easy-to-access areas, to the detriment of rural communities. There is no reason for either of these biases to occur when using satellite sensing data.