Deploying Europe’s largest VPP

Olivier Deckers
December 16, 2024

One of the most critical moments in the software development life cycle is when a change is rolled out to the production environment. It is notoriously hard to get this right, and mistakes in this phase can lead to large outages. One approach to limiting this risk is to deploy as rarely as possible. At Beebop, however, we prefer to maintain a high velocity, and we have found that deployments containing many changes are more likely to fail than smaller ones anyway. This is why we deploy often and early.

At Beebop our mission is to build a scalable virtual power plant (VPP) that connects distributed, otherwise idle flexibility from batteries, PV, EVs and heat pumps on one side to traders active on the energy markets on the other. Using our technology, end customers can save up to 50% on their energy bills.

Our platform needs to ingest and aggregate large volumes of sensor data and continuously update a model for the trader, so that they always have an accurate view of the available flexibility in the pool of assets. When the trader decides to activate a certain amount of flexible energy, our platform tracks that schedule accurately using closed-loop control to correct for any deviations, as those would lead to imbalance costs.
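To make the schedule-tracking part concrete, here is a minimal sketch of the closed-loop idea, assuming a simple proportional correction; the function name and gain are illustrative, not our actual control code.

```python
# Minimal sketch of closed-loop schedule tracking (illustrative only; the names and
# the proportional gain are assumptions, not Beebop's actual controller).

def control_step(scheduled_kw: float, measured_kw: float, gain: float = 0.5) -> float:
    """Return a setpoint correction that pushes the pool back towards the traded schedule."""
    deviation = scheduled_kw - measured_kw   # positive: the pool is delivering too little
    return gain * deviation                  # proportional correction sent to the dispatcher

# Example: the trader activated 1000 kW, but the pool currently delivers 920 kW.
correction = control_step(scheduled_kw=1000.0, measured_kw=920.0)
print(f"Dispatch an extra {correction:.0f} kW to close the gap")
```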

All of this imposes strict non-functional requirements on the platform:

  • Scalability: large volumes of flexible energy are required to do meaningful trades on the energy markets.
  • Correctness: any errors in flexibility representations or dispatching directly lead to higher imbalance costs.
  • Availability: not delivering energy that was traded means the difference is settled at the imbalance price.
  • Fault tolerance: the larger the pool, the higher the probability that something goes wrong with part of it. The system should be resilient to this and continue working with the remaining assets.

We have an extensive suite of tests that runs before a change can be merged. Investing in these tests is important, as it allows the author of a change to detect problems before they affect anyone else. The author is best placed to resolve the problem, and the rest of the team doesn't need to waste cycles on it.

When a change passes automated tests and code review, it is merged and automatically deployed to a test environment. In this environment a full end-to-end test runs continuously: virtual assets are simulated on one side, while a virtual trader continuously optimises schedules based on day-ahead price information. The platform dispatches these schedules and monitors performance, and alerts notify us of any regressions before we bring these changes to production.
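To give an idea of what this looks like, the sketch below is a toy version of that setup; the helper names, prices and thresholds are invented, and the real harness is considerably more involved.

```python
# Toy sketch of the continuous end-to-end test (hypothetical names, simplified logic):
# simulated assets follow their setpoints, a virtual trader schedules against day-ahead
# prices, and a failed check would raise an alert in the real system.
import random

DAY_AHEAD_PRICES = [random.uniform(20, 120) for _ in range(24)]  # EUR/MWh, fake data

def virtual_trader(prices: list[float], flexible_kw: float) -> list[float]:
    """Activate the full flexibility during the four cheapest hours, nothing elsewhere."""
    cheapest = sorted(range(len(prices)), key=prices.__getitem__)[:4]
    return [flexible_kw if hour in cheapest else 0.0 for hour in range(len(prices))]

def simulated_asset(setpoint_kw: float) -> float:
    """A virtual battery that follows its setpoint with a little measurement noise."""
    return setpoint_kw + random.gauss(0, 1)

schedule = virtual_trader(DAY_AHEAD_PRICES, flexible_kw=500.0)
for hour, setpoint in enumerate(schedule):
    delivered = simulated_asset(setpoint)
    assert abs(delivered - setpoint) < 50, f"regression at hour {hour}"  # alert in production
```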

Changes that pass these end-to-end tests are deployed to production automatically after a manual approval. We aim to minimise the difference between the version deployed in production and the main branch. This way we avoid a number of problems that come with big-bang deployments:

  • It is hard to keep track of all the changes in a big deployment.
  • When multiple interdependent services are involved, the order in which the deployment happens needs to be carefully planned.
  • If a regression occurs, it is much harder to pinpoint the exact problem in the big pile of changes that were made.
  • It is harder to debug and hotfix problems in production when that version of the code is old.

Not every problem can be caught in an end-to-end test. Some features should be evaluated over an extended period of time, or tested with real assets. Feature flags allow us to deploy new features to production without enabling them. A flag can be enabled in a specific environment to evaluate the feature over a longer period, or for a smaller pool to test it with real assets while limiting the impact of unexpected behaviour. Flags also let different people test the features they are working on without disrupting each other.
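As an illustration of the mechanism, here is a minimal sketch of such a flag, assuming flags can be scoped per environment and per pool; the class, flag and pool names are hypothetical.

```python
# Minimal feature-flag sketch (illustrative; the names and scoping model are assumptions).
# A flag can be enabled for a whole environment or only for specific pools of assets.
from dataclasses import dataclass, field

@dataclass
class FeatureFlag:
    name: str
    enabled_environments: set[str] = field(default_factory=set)
    enabled_pools: set[str] = field(default_factory=set)

    def is_enabled(self, environment: str, pool_id: str) -> bool:
        return environment in self.enabled_environments or pool_id in self.enabled_pools

# A new dispatch strategy: evaluated in the test environment and on one small real pool.
new_dispatch = FeatureFlag(
    name="new-dispatch-strategy",
    enabled_environments={"test"},
    enabled_pools={"pilot-pool"},
)

if new_dispatch.is_enabled(environment="production", pool_id="pilot-pool"):
    pass  # run the new code path for this pool only
else:
    pass  # fall back to the existing behaviour
```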

Database migrations and code changes are made in a backward-compatible way using the expand/contract pattern. This allows us to deploy services independently and roll them back easily in case of problems. Even though evolving the system using expand/contract takes additional effort and discipline, we have found that it pays for itself by avoiding unexpected problems down the line that would have to be fixed in a high-stress situation.
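As a schematic example of the pattern (the table and column names are invented for illustration), a change is split into phases that are each backward compatible on their own:

```python
# Schematic expand/contract migration (illustrative names; not an actual Beebop schema).
# Each phase keeps old and new service versions working side by side.

EXPAND = """
ALTER TABLE asset_measurements ADD COLUMN power_w BIGINT;   -- new column next to the old one
UPDATE asset_measurements SET power_w = power_kw * 1000;    -- backfill from the old column
"""

# Transition phase: deploy code that writes both columns but reads only the new one.
# Only once every instance runs that code is it safe to contract.

CONTRACT = """
ALTER TABLE asset_measurements DROP COLUMN power_kw;        -- finally remove the old column
"""
```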

As a bonus, it enables us to do rolling deployments: when an instance of a service is replaced by a newer version, we verify it is healthy before continuing to upgrade the other instances. Potential problems with the new version are contained to a single partition of assets. The rest of the system can interact with both the old and the new version thanks to the expand/contract principle. If our monitoring detects an issue with the new version, it is trivial to roll back to the latest stable version.
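A rough sketch of that rollout loop, with hypothetical helper functions standing in for the real deployment and monitoring calls:

```python
# Rolling deployment sketch (hypothetical helpers; the real orchestration is more involved).
# Instances are upgraded one at a time and rolled back on the first failed health check.
import time

def deploy(instance: str, version: str) -> None:
    print(f"deploying {version} to {instance}")  # stand-in for the real deployment call

def healthy(instance: str) -> bool:
    return True                                  # stand-in for monitoring and alerting checks

def rolling_deploy(instances: list[str], new_version: str, old_version: str) -> bool:
    for instance in instances:
        deploy(instance, new_version)            # replace this instance only
        time.sleep(30)                           # let it start up and report metrics
        if not healthy(instance):
            deploy(instance, old_version)        # roll back to the latest stable version
            return False                         # stop the rollout; other instances untouched
    return True
```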

Building and operating a VPP at scale is not an easy task. We have found that by deploying often and early we avoid many problems and the unexpected downtime that comes with them. It is critical to have high-quality test suites that catch problems early, when they are still cheap to resolve. In addition, a strategy for rolling out changes gradually helps to limit the impact of potential problems, and monitoring and alerting give a view on the health of the system without depending on manual checks.

With these practices in place, we’re well-equipped to deliver a highly reliable and stable VPP platform.

Step into the power system of the future.