
PV survey from satellite data

Just how much solar power generation is actually deployed today?  We know there is more and more, but PV generation is highly decentralized: anybody can do it, no permission needed.  Furthermore, PV technology is deployed at many scales, in large utility-scale farms, in local and campus arrays, and on individual buildings and homes.  (This is one of the great virtues of PV technology.)  So, reports from utilities or other industrial parties are surely an incomplete inventory.

This month a research team at Stanford reports a new survey that they say has recorded the size and GPS coordinates of every PV installation in the continental US [1].  They used image analysis of a large dataset of satellite imagery (15 cm resolution, available via Google), and constructed a machine learning model to recognize PV installations in the imagery.  They further correlated the presence of PV installations with socioeconomic variables and available sunlight.  This supplements other estimates, including the self-reported Open PV Survey from NREL.

The resulting Stanford dataset is available here.

Obviously, this particular task is a good target for satellite imagery, because PV panels are easy to spot from the air, and have regular characteristics (an array of rectangular panels, set in open sunlight, etc.).  Only the very smallest PV installation would be hard to see, and only the most marginal site would be even partly obscured from overhead observation.

I must say that it’s not completely clear to me why this survey is necessary, other than for research.  The researchers indicate that it will be of use to planners, developers, and utilities.  They report a so-called “predictive solar deployment model”, which is a complex regression analysis of social and economic variables that “predict” the installation of PV.  Of course, this is not a causal model, and it is also backward looking.  I’d bet it’s a poor predictor, because these things are all changing.

The correlations confirm widely held intuitions about PV.  Wealthy areas have more PV installed, places with a lot of available sun have more PV, and so on.  The correlations with income, housing, and so on surely represent complicated political economics, and the availability of sunlight represents the underlying physics.

There is one slightly surprising finding: the density of PV installations increases with average income up to $150,000 and then levels off.  Evidently, (a) rich people are not influenced by the modest economic carrots of PV, (b) rich people couldn’t care less where their power comes from, and/or (c) Republicans believe that PV is an anti-American plot.

The research paper makes a curious claim about “an ‘activation’ threshold triggering the increase of solar deployment” when solar radiation is above 4.5–5 kWh/m²/day ([1], p. 2609).  This is obviously some kind of economic break-even point, which will surely change as technology, costs, and public policy evolve.  One thing it is not is a causal “trigger”.  (Sigh.)

The data is available, organized by census tracts.

At a high level, there are few surprises: California has a lot of PV, as does the rest of the Sunbelt.  At the level of counties, dense urban areas have lots of PV, along with outliers like my home county, a university town with an average of 14.3 years of education and 2.16 PV per 1000 people.

At a finer grain, there are relatively large differences in installed PV, which reflect local variables.  Near my house there is a range from < 1 to more than 7 PV per 1000 people.  This reflects the wealth of local neighborhoods, the peculiarities of a university campus, and, at the high end, what I’m pretty sure is the presence of industrial PV arrays on the edge of town.
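To make the “PV per 1000 people” arithmetic concrete, here is a minimal sketch.  Everything in it is an assumption for illustration: the tract names, populations, and counts are invented, the `Tract` record is hypothetical rather than the dataset’s actual schema, and I’m assuming the figure counts installations per resident.  It only shows how a county-wide average can hide a wide spread among tracts.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    # Hypothetical record; NOT the actual DeepSolar schema.
    tract_id: str
    population: int
    pv_systems: int


def pv_per_1000(tract: Tract) -> float:
    """PV systems per 1000 residents (0.0 for an unpopulated tract)."""
    if tract.population <= 0:
        return 0.0
    return 1000.0 * tract.pv_systems / tract.population


def pooled_density(tracts) -> float:
    """Density over all tracts pooled together (e.g. a county)."""
    total_pop = sum(t.population for t in tracts)
    total_pv = sum(t.pv_systems for t in tracts)
    return 1000.0 * total_pv / total_pop if total_pop else 0.0


# Invented numbers, chosen only to mimic a "< 1 to more than 7" spread.
tracts = [
    Tract("A", 4000, 3),
    Tract("B", 2500, 18),
    Tract("C", 5000, 10),
]

for t in tracts:
    print(t.tract_id, round(pv_per_1000(t), 2))
print("pooled", round(pooled_density(tracts), 2))
```

With these made-up numbers the pooled figure lands between the tract extremes, which is the point: the single county number says little about any one neighborhood.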

The point being that, however clever this survey, it gives me little I didn’t know from on-the-ground experience.  Furthermore, the areas with similarly low levels of installed PV are not really similar.  Some are poorer neighborhoods.  Some are old neighborhoods (not necessarily poor).  Some are largely rental properties, with large numbers of transient student tenants.  And so on.  So increasing installations will require meeting local needs, which are not revealed in this dataset.

Overall, it’s pretty cool to be able to reliably detect PV installations.  I’m not sure how deeply useful this information will be.

I’ll add one potential use: PV installations have limited lifetimes (and also may be knocked out by storms).  If this survey can be continued over many years, it may be able to give some estimate of how the PV base is aging, and where it will need to be refreshed or refurbished.  That’s actually an important problem, especially for privately owned arrays on homes.
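Here is a rough sketch of what that bookkeeping might look like, assuming the survey were repeated and each detected installation could be matched across years by some stable key.  Both are assumptions on my part: as far as I can tell the published dataset is a single snapshot, and the years, site names, and helper functions below are invented for illustration.

```python
def first_seen(snapshots):
    """Map each installation key to the earliest survey year it appears in."""
    seen = {}
    for year in sorted(snapshots):
        for key in snapshots[year]:
            seen.setdefault(key, year)
    return seen


def vanished(snapshots):
    """Keys seen in some earlier survey but absent from the latest one."""
    years = sorted(snapshots)
    latest = snapshots[years[-1]]
    earlier = set().union(*(snapshots[y] for y in years[:-1]))
    return earlier - latest


# Invented survey snapshots: year -> set of hypothetical installation keys.
snapshots = {
    2018: {"site-1", "site-2"},
    2021: {"site-1", "site-2", "site-3"},
    2024: {"site-1", "site-3", "site-4"},
}

latest_year = max(snapshots)
# Observed age is a lower bound: a site may predate the first survey.
min_age = {key: latest_year - year for key, year in first_seen(snapshots).items()}

print(min_age)
print(vanished(snapshots))
```

An installation that disappears between surveys might have been removed, damaged, or simply missed by the detector, so the “vanished” set would be a list of places to go look, not a verdict.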


  1. Jiafan Yu, Zhecheng Wang, Arun Majumdar, and Ram Rajagopal, DeepSolar: A Machine Learning Framework to Efficiently Construct a Solar Deployment Database in the United States. Joule, 2 (12):2605-2617, 2018. http://www.sciencedirect.com/science/article/pii/S2542435118305701
  2. Vicky Stein, How artificial intelligence spotted every solar panel in the U.S., in PBS News Hour – Science. 2018. https://www.pbs.org/newshour/science/how-artificial-intelligence-spotted-every-solar-panel-in-the-u-s