Data, data everywhere, but not enough to train (a model)

Written by Dan Travers, co-founder of Open Climate Fix

As we move to net zero and electricity replaces oil and natural gas as energy vectors, roughly three times as much energy will pass through the electricity grid. Meanwhile, grid complexity is increasing rapidly as weather-dependent generation, batteries and EVs become commonplace. This presents a huge challenge, as the supply and demand of electricity must be balanced at every point on the grid, at every point in time.

Surely AI could smooth the integration of variable renewables into the grid? Well, while there is huge potential for advanced ML models to play a role in decarbonising the power grid, the way forward presents challenges in accessing enough data to train and test those models.

In a sector built by physicists and engineers, and one of the earliest to digitise, data is everywhere but models are not always trainable. To riff on Coleridge's "The Rime of the Ancient Mariner": there's data, data everywhere, but not enough to train (a model).

It could be helpful to look at the digitalisation of the electricity industry as happening in two waves. The first wave took place in the second half of the 20th century as countries invested in the infrastructure for national electrical power grids. The second wave has unfolded over the last decade, with the emergence of energy businesses meeting new demand for solar panel installations, EV charge points, smart homes, etc.

These two waves have led to two general reasons why training data for AI is difficult to come by in the electricity industry. Grid data from the 20th-century wave is centralised, but it was collected using older technologies and is stored in less modern data structures maintained in outdated programming languages, making it hard to access and of lower quality. The majority of second-wave data, meanwhile, is proprietary, fragmented and not shared.

But this doesn't mean that AI isn't playing an important role in decarbonising the power grid, or that there aren't readily available datasets that can be used to train models to that end. Renewable energy forecasting is critical for decarbonising the grid, and it depends on exactly such widely accessible datasets.

Open Climate Fix's Quartz Solar, developed with the UK's National Grid Electricity System Operator (NG-ESO), uses AI to forecast clouds. Quartz Solar helps NG-ESO understand how much solar energy will be available to the UK electricity grid each day. It is built on AI trained on publicly available datasets such as weather forecast data from major national weather services, satellite imagery from geostationary European satellites, and live PV data from partners (still the hardest data to procure).

Having fast and accurate information is crucial to NG-ESO's operations. Relying on weather forecasts alone leaves information "on the table" – in particular, the satellite and live PV data provide additional real-time information which a trained AI model can assimilate in seconds.
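To make the value of live data concrete, here is a toy illustration – not OCF's actual model – of why real-time PV readings add information beyond a weather forecast alone. A persistence estimate from the latest live PV reading is blended with a weather-model (NWP) forecast, with the live data weighted heavily at short horizons and the NWP forecast at long ones. The function name, the exponential weighting, and the `tau` constant are all hypothetical choices for illustration.

```python
import numpy as np

def blend(live_pv_now, nwp_forecast, horizon_hours, tau=2.0):
    """Blend a persistence forecast (the latest live PV reading) with an
    NWP-derived forecast. The persistence weight decays exponentially
    with forecast horizon; tau is a hypothetical tuning constant in hours."""
    w = np.exp(-np.asarray(horizon_hours, dtype=float) / tau)
    return w * live_pv_now + (1 - w) * np.asarray(nwp_forecast, dtype=float)

# Live reading says 100 MW right now; the weather model expects these values.
horizons = [0.5, 2.0, 6.0]   # hours ahead
nwp = [120.0, 150.0, 80.0]   # NWP-derived PV estimates, MW
print(blend(100.0, nwp, horizons))
# Short horizons stay close to the live 100 MW reading;
# longer horizons converge to the NWP forecast.
```

A learned model can do this assimilation far more flexibly than a fixed blend, but the intuition is the same: fresh observations correct the weather-only view at the horizons where they matter most.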

Why is AI for renewable forecasting so useful and transferable? Firstly, the weather and satellite datasets we are using span much of the globe with over 10 years of relatively consistent data. Secondly, while climates do vary, these models are highly transferable compared with models that rely on specific grid topologies and local energy consumption behaviours.

What are the challenges in developing the next generation AI-weather forecast?

Firstly, the data volumes are very large. At Open Climate Fix we are consuming terabytes of data every day, and it could be more – we could start to consume sky-imaging cameras, for instance. And once we have this data, the models could always make use of more. The limiting factor becomes partly just the speed at which we can read from multi-dimensional data structures. Most recent AI advances have been in processing text, which is one-dimensional by nature; atmospheric data is three-dimensional, with multiple variables at each point. One area of our research is speeding up the reading of these multi-dimensional arrays in Zarr, the Python package.
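A rough sketch of why read speed from these multi-dimensional stores matters: Zarr splits an array into fixed-size chunks that are stored and compressed independently, so a read only has to touch (and decompress) the chunks its slice intersects. The toy function below – illustrative only, not Zarr's internals – counts how many chunks a slice of a 3-D (time, lat, lon) array touches, showing why chunk-aligned access patterns are so much cheaper.

```python
# Toy model of a chunked 3-D array (time, lat, lon), the layout Zarr uses.
# Each chunk is stored and compressed independently, so a read only
# touches the chunks that intersect the requested slice.
shape = (24, 100, 100)   # hypothetical array: 24 time steps, 100x100 grid
chunk = (6, 50, 50)      # hypothetical chunk size along each axis

def chunks_touched(selection, chunk):
    """Count the chunks a slice selection intersects.
    selection: list of (start, stop) half-open ranges, one per axis."""
    n = 1
    for (start, stop), c in zip(selection, chunk):
        n *= (stop - 1) // c - start // c + 1
    return n

# A chunk-aligned read of one block touches exactly one chunk...
aligned = [(0, 6), (0, 50), (0, 50)]
# ...while a same-sized read straddling chunk boundaries on every axis
# forces eight chunks to be fetched and decompressed.
straddling = [(3, 9), (25, 75), (25, 75)]

print(chunks_touched(aligned, chunk))     # 1
print(chunks_touched(straddling, chunk))  # 8
```

Choosing chunk shapes to match the training access pattern (for example, chunking along time when models read long time series per location) is one of the simpler levers for making these reads faster.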

Secondly, new research papers continually emerge from this nascent field. We have implemented Google's and others' models in the search for the best technique, a time-consuming process.

While no AI is likely to manage the power grid single-handedly in the near future, OCF is banking on innovative ways for AI to help decarbonise it.