Q&A: Open Climate Fix’s Jack Kelly on the ‘beautifully simple’ transformer model and nowcasting

Image: Open Climate Fix.

Earlier in April, non-profit Open Climate Fix announced that it had received a two-year, £500,000 grant from Google.org. The organisation was set up to explore whether the transformer model can be used to forecast cloud formations – and ultimately levels of solar irradiation – more accurately.

The work builds on the previous use of transformer models to predict the shape of proteins, which formed part of breakthroughs such as Google DeepMind’s AlphaFold-2.

Solar Power Portal caught up with Open Climate Fix co-founder Dr Jack Kelly – a former engineer at Google DeepMind – to discuss where transformer models came from, how they can apply to solar forecasting and how they could change the energy sector.

 

Could you tell me a little bit more about Open Climate Fix and where the idea for the nonprofit came from?

One of the aims of Open Climate Fix is to hopefully demonstrate what's possible using modern technology in terms of reducing carbon emissions and reducing costs for end users.

I've worked in the energy industry for a while – I did some consulting for National Grid Electricity System Operator – and they do an amazing job of keeping the lights on and building physical infrastructure. But hopefully we can demonstrate what's possible more cheaply and more quickly than some of these existing legacy energy companies can.

And because we're a nonprofit, we're not here to make lots of money. So our aim is, once we've demonstrated what's possible, to do our best to help as many other people as possible implement that technology. So in the case of solar forecasting – solar nowcasting – that will mean helping other forecasting companies implement the technology, and helping the ESO implement it on their own systems if they want to do that.

What led you to set up Open Climate Fix?

I worked at Google DeepMind on wind power forecasting. I really loved working there – great working culture, a really amazing team – and I'm really thankful I got the chance to work there. I'm still friends with a bunch of my old teammates.

What led me to leave is this general idea that one of the blockers to reducing emissions quickly using technology is that we need lots of companies to work together. At the moment, a real challenge with that is IP contracts. Whenever any two companies want to work together, they will inevitably pull in the lawyers and argue about who owns all the intellectual property. That feels like a problem we can maybe help solve by having a nonprofit that, from the get-go, is all about not owning any IP and releasing everything as openly as possible.

I don't think it's worked out exactly how I expected – we still have to sign IP contracts with collaborators. But it's just very clear to them that we're not interested in owning the IP; we're interested in trying to reduce emissions as quickly as possible.

Could you tell me a little about the transformer model and how it works?

Absolutely. I love getting into some of the geeky details. The transformer model was first introduced by a team at Google Research back in 2017, in a paper called ‘Attention Is All You Need’. At its heart, it's a really beautifully simple idea: it turns out to be really useful to allow your model to literally attend to different bits of the input, and to decide for itself which bits to attend to, dynamically, based on the inputs.

So there's this idea of self-attention. It was first developed for processing English sentences, where it turns out to be really useful, given a sentence, to find relationships between specific words – say, a sentence that contains ‘it’, where you want to figure out what ‘it’ refers to. When fed lots of sentences, these models are amazingly powerful at learning that structure and learning the relationships between different words.
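For readers who want to see the mechanism in code, below is a minimal sketch of the scaled dot-product self-attention described in ‘Attention Is All You Need’. It is illustrative only – the array sizes and weight matrices are made up for the example, and this is not Open Climate Fix's code.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a set of input vectors.

    x: (n_tokens, d_model) input embeddings (e.g. one row per word).
    w_q, w_k, w_v: (d_model, d_head) learned projection matrices.
    Returns (n_tokens, d_head): each output is a weighted mix of all inputs,
    with the weights decided dynamically from the inputs themselves.
    """
    q = x @ w_q                      # queries: "what am I looking for?"
    k = x @ w_k                      # keys:    "what do I contain?"
    v = x @ w_v                      # values:  "what do I contribute?"
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over inputs
    return weights @ v               # attend to the most relevant inputs

# Toy example: 5 "words", 16-dimensional embeddings, one 8-dimensional head.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 8)
```

Each row of the output is a weighted mix of every input, with the weights computed from the inputs themselves – the model deciding which bits of the input to attend to, as Kelly describes.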

A particularly famous example of this working well is OpenAI's GPT-3 model, which has been trained on an enormous amount of text. The end result is that you can prime it with a sentence and it will spit out a whole bunch of text that naturally follows from that sentence. And it does a really good job of capturing both the short-term structure of English, like basic syntax, and the longer-term structure. That's all learnt from the data, based on this idea of learning how different bits of the input relate to each other – so in language, learning how words relate to each other.

And how will that be applied to solar forecasting?

I should emphasise that this is research at the moment, and we haven't conclusively proved that this will work. But we have good reason to believe that given a satellite image, it's probably really important to attend to specific bits of that satellite image more than others. So the basic idea is you want to attend to the cloud that's between the sun and the solar PV system.

We believe that this self-attention mechanism, which is at the heart of the transformer model, will be really useful for figuring out which bits of a satellite image the model should attend to. And then hopefully we can go further than that.

What I've described so far is just taking a single satellite image at a specific time and trying to figure out what the solar PV power output should be at the centre of that image, and to figure out which bits of the image to focus on. But we're also hoping this will be useful for looking at a sequence of satellite images. So if we give it the last hour of satellite images, then hopefully it can learn to attend to the interesting features in that last hour.
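As a rough illustration of what a sequence of satellite images looks like to a transformer, the sketch below turns an hour of hypothetical satellite frames into a set of space-time patch tokens that an attention step like the one above could operate on. The frame rate, crop size, channel count and patch size are assumptions for the example, not details of Open Climate Fix's pipeline.

```python
import numpy as np

# Hypothetical input: 12 five-minute satellite frames (one hour), each a
# 24x24 pixel crop centred on the PV system, with 2 spectral channels.
n_frames, height, width, channels = 12, 24, 24, 2
frames = np.random.default_rng(1).normal(size=(n_frames, height, width, channels))

# One common way to feed images to a transformer is to cut each frame into
# patches and treat every (frame, patch) pair as one input token.
patch = 8
tokens = (frames
          .reshape(n_frames, height // patch, patch, width // patch, patch, channels)
          .transpose(0, 1, 3, 2, 4, 5)
          .reshape(-1, patch * patch * channels))
print(tokens.shape)  # (108, 128): 12 frames x 3 x 3 patches, 128 values each

# A transformer would then learn which of these 108 space-time tokens to
# attend to when predicting PV output at the centre of the crop; the
# attention computation itself is the same as in the earlier sketch.
```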

And this has been demonstrated in a paper published last year called MetNet, which also came out of Google Research. They were looking at precipitation nowcasting, and we're pretty optimistic that it will be possible to take a very similar approach to solar PV nowcasting. So we definitely can't claim that this is a really novel approach – we're taking the best bits of modern machine learning and applying them to solar PV nowcasting.

Could you tell me a little about how it has been used for protein folding?

These transformer models are sometimes called set-to-set models: they take one set of things, such as a set of words for language processing, and map it to another set of things. For language models, the input and the output are both sentences, and the sets are sets of words. Because these transformer models don't inherently understand the order of the inputs, you have to explicitly tell them: this word comes after this word, and this one comes after that one. That turns out to be really useful for protein folding, where your input set is the sequence in the DNA – the sequence of genetic instructions – and the output is the 3D structure of that protein.
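To make the point about explicitly telling the model the order of its inputs concrete, here is a minimal sketch of the sinusoidal position encodings used in the original transformer paper; the token count and embedding size are arbitrary example values.

```python
import numpy as np

def sinusoidal_positions(n_tokens, d_model):
    """Sinusoidal position encodings from 'Attention Is All You Need'.

    Because attention itself is order-agnostic, each token's embedding is
    offset by a deterministic pattern that encodes its position, so the
    model can tell that one input comes after another.
    """
    positions = np.arange(n_tokens)[:, None]            # (n_tokens, 1)
    dims = np.arange(d_model)[None, :]                   # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encodings = np.zeros((n_tokens, d_model))
    encodings[:, 0::2] = np.sin(angles[:, 0::2])         # even dimensions
    encodings[:, 1::2] = np.cos(angles[:, 1::2])         # odd dimensions
    return encodings

# Hypothetical use: add the encodings to token embeddings before attention.
embeddings = np.random.default_rng(2).normal(size=(10, 16))
embeddings = embeddings + sinusoidal_positions(10, 16)
```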

Just as in language modelling and just as in solar forecasting, it's super important to learn the relationships between different bits of that input. And it's particularly important in protein folding, because you get this one-dimensional input – just a sequence of DNA bases – and what you ultimately want to do is figure out the really complex 3D structure of the folded protein. So you want to learn that this bit of DNA and that bit of DNA talk to each other, because in the folded structure they're going to be right next to each other.

How will you use the solar nowcasting you’re developing in the UK?

We're mostly focused on helping transmission system operators, and National Grid ESO in particular. It's really hard to quantify precisely what the advantages will be, but we believe there are two main ways it will help the ESO.

One is in helping them decide how much spinning reserve they need – how many of these typically natural-gas-fired generators they'll need operating ramped down, so they have headroom to ramp up. We're planning to work closely with National Grid ESO as design partners on this to help figure out exactly what they need.

And the other use case is much closer to real time, where the people in the control room are sometimes looking just minutes ahead, trying to figure out what instructions to send to dispatchable generators. And they definitely need better forecasts to help them figure out what the rest of the system is going to be doing.