Carbon Clouds

 

Introduction

Before COVID-19 turned the world upside down, the theme of 2020 was meant to be the Climate Emergency. If there is anything good to come from COVID it is perhaps that CO2 emissions for 2020 will be lower than expected (although still far too high); on the other hand, COVID has been a distraction from what is almost certainly the bigger issue to address in historical terms. As we approach the end of the year, with some good news of vaccines finally on the horizon, I’m seeing CO-two rather than CO-vid appearing in my tech news feed more often, with E&T Magazine devoting most of its November issue to "Growing Back Green" and this week’s Azure Podcast focusing on sustainability.

 

Cloud … Part of the problem

Although many sectors have seen their carbon footprint fall during the pandemic, IT is a definite outlier. As the output from aviation and other forms of transport cratered with everyone spending more time at home, the use of both cloud-based work systems (Teams, Zoom, etc.) and the cloud required to keep everyone entertained (Netflix, gaming platforms) went the other way. This means our use of compute continues unabated, and there’s a surprising amount of carbon produced by IT. The UN Environment Programme estimates that ICT as a whole accounts for between 2% and 6% of global greenhouse gas emissions – a bigger share than aviation (although aviation emitting directly into the upper atmosphere is a whole issue in itself). If ICT is this big a producer of carbon it’s not something we can ignore – particularly as big data, machine learning, and the growing number of digital citizens in developing countries continue to drive demand. The same UN source shows that, of that total, 45% and 24% of emissions come from data centers and networks respectively – meaning that the internet and cloud compute are a huge part of the problem.


Cloud … Part of the solution

Fortunately, the cloud can also help to solve this problem in a number of ways:

  • More efficient than on-prem: The first way the cloud can help is that cloud providers are more energy efficient, in terms of power consumption, than running applications on-premises or even in co-located data centers. This is for a variety of reasons:
    • economies of scale in mega data centers allow more efficiency in non-compute overheads such as cooling
    • virtualisation makes better use of servers, which, coupled with extensive automation tools, allows customers to scale back when demand is lower or turn things off entirely when not needed (e.g. test environments overnight – see the first sketch after this list)
    • virtualisation/automation at scale efficiently distributes virtual machines, and controls things like power and cooling better than smaller providers can (e.g. GCP uses machine learning to predict and optimize cooling)
  • OpEx Pricing: Perhaps the biggest change businesses need to tackle when moving to the cloud is the move away from Capital Expense (CapEx) to Operational Expense (OpEx) – namely paying by the minute/second for a server rather than paying for it up front and then running it for 4 or 5 years. Similarly, data storage and data transfer are charged by the megabyte rather than as a fixed price. This mindset shift is a good thing in terms of efficiency as it creates a market which encourages businesses to reduce their consumption. If you run more efficient code you pay less; if you turn off servers when not needed you pay less; if you are more active in deleting old data you... well, you get the idea. It also encourages shifting workloads to quieter times of the day (e.g. through spot instances) to help iron out the peaks in demand. This works well for the planet because generally cost = carbon, so reducing cost means reducing carbon, although unfortunately this isn’t always the case. There is no incentive built into the system to move workloads to regions with lots of hydro rather than regions with lots of coal-fired power stations, for instance, and there’s no guarantee that the energy mix is greener during quieter hours (perhaps at night when the sun doesn’t shine) – so reducing power is good, but we still need green clouds (see below).
  • Serverless: Serverless is a concept born in the cloud and is the next iteration of the above trends (improved efficiency and OpEx pricing). For this discussion it’s also spectacularly badly named. Of course there are servers: big power-hungry servers made from minerals mined from the ground, which take electricity to build and have to be shipped across the world to be installed in our cloud region. That said, Serverless is still beneficial in two major ways:
    1. Serverless isn’t just about only paying when something is being used; it’s about only running it when it’s needed (with some real-world caveats about how long it takes to scale down). Essentially, serverless products like functions-as-a-service and serverless database engines can scale down to zero when not in use, meaning that the power needed to run them drops too (see the second sketch after this list).
    2. Serverless functions are (almost always) smaller than virtual machines and relatively short-lived, which means all that magical cloud provider automation can more efficiently move them around the data center, filling gaps to make the most efficient use of the underlying hardware.
  • Clouds going green: Of course being more efficient isn’t the same as being carbon neutral; fortunately all the major public cloud providers have targets to be net zero (although one might argue that some are taking it more seriously than others). As with most things in the world of cloud, everyone is going in the same direction, but that’s not to say they’re equal (as noted in this now year-old article in Wired). At present a lot of this is achieved through Renewable Energy Credits (RECs), meaning the cloud is not 100% clean all the time. That said, it’s a step in the right direction, and the fact that cloud providers are committing to buy so much from renewable sources helps the business case for financing those renewable energy projects in the first place (they can access credit because of their guaranteed revenue streams). This is an area of very positive competition:
    • Azure: Microsoft have been net renewable since 2014 (including RECs) and have sustainability targets to use 100% renewable energy by 2025, in addition to targets on waste, deforestation, and water usage. Microsoft also made headlines earlier this year when it announced it would become carbon negative by 2030, and aims to go further by 2050 and remove its entire historical footprint since its foundation in 1975.
    • Google Cloud: Google advertise themselves as running the “cleanest cloud in the industry” and hit their target of matching 100% of their energy use with renewable purchases in 2017 – although this is an annual target, not a minute-by-minute commitment to use only renewable energy. They have subsequently announced plans to go greener, committing to run on carbon-free energy 24/7 by 2030. This includes new initiatives such as building co-located power plants and installing batteries in GCP data centers.
    • AWS: In so many areas of cloud AWS are the leaders, with the others playing catch-up, but that’s not true here. Much of AWS’s focus on sustainability talks of the efficiencies of moving to the cloud (see above) rather than of being carbon neutral. Amazon has (after public pressure from employees and shareholders) announced a target to be carbon neutral, but not until 2040. Although this is not entirely comparable with GCP/Azure, as it’s a target for Amazon as a whole and not just AWS, it’s definitely not as good. As far as I can see, there isn’t a specific AWS target to switch entirely to renewables, but rather some fairly vague talk of a “long-term commitment to use 100% renewable energy”. In addition, Greenpeace have criticized Amazon in the past for breaking their own commitments.
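
To make the “turn things off when not needed” point concrete, here is a minimal sketch of an overnight shutdown job. It assumes the Azure SDK for Python (azure-identity and azure-mgmt-compute), and the environment=dev tag and resource group name are hypothetical; run on a schedule each evening, it deallocates development VMs so they stop drawing compute (and money, and carbon) until the morning:

```python
# Minimal sketch: deallocate tagged dev/test VMs overnight.
# Assumes the Azure SDK for Python (azure-identity, azure-mgmt-compute);
# the "environment: dev" tag and resource group name are hypothetical.
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]
resource_group = os.environ.get("RESOURCE_GROUP", "rg-dev")

compute = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

for vm in compute.virtual_machines.list(resource_group):
    if (vm.tags or {}).get("environment") == "dev":
        print(f"Deallocating {vm.name} for the night...")
        # Deallocating (rather than just powering off) releases the underlying
        # hardware, so the VM stops incurring compute charges - and drawing power.
        compute.virtual_machines.begin_deallocate(resource_group, vm.name).wait()
```

A matching job in the morning (or a self-service “start my environment” button) completes the picture.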
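
And to illustrate the serverless point, here is a minimal sketch of an HTTP-triggered Azure Function written in Python (the accompanying function.json binding file is omitted and the greeting is purely illustrative). On a consumption plan nothing runs – and this code draws no power – until a request actually arrives, and scaling back down to zero is the platform’s problem, not ours:

```python
# __init__.py - minimal sketch of an HTTP-triggered Azure Function (Python).
# On a consumption plan the platform scales this to zero between requests,
# so idle time costs neither money nor electricity.
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    name = req.params.get("name", "world")  # illustrative only
    return func.HttpResponse(f"Hello, {name}")
```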

 

What more can we do?

Although all this is good news from the cloud providers, what can we Engineers and Architects do to help (other than choosing a nice green cloud provider)?

Inefficient Software Engineering

One of the downsides of Moore’s law is that over the decades software has become significantly less efficient. To run Word on Windows I now need what would have been a supercomputer only a few decades ago (when, remember, my computer back then could already run a version of Word on Windows!).

Although it's easy to criticize, this inefficiency is in part understandable and even justifiable. Reusable components, frameworks, and abstractions all significantly reduce development effort, even if they usually come with a cost in efficiency. At a time when a day of development costs more than a month of running a virtual machine, why spend hours reinventing the wheel or optimising code when we can just scale out or use a slightly bigger SKU? Similarly, with compute and network so cheap and fast, there are real benefits to breaking software up and distributing it over the network, even if that adds overhead and more network traffic, because the resulting software is far more fault tolerant.

Sadly this is death by a thousand shortcuts; the flip side, though, is that a few small changes, made often enough, can have equally big results.

Efficient Software Engineering

Microsoft have published some thoughts on sustainable software engineering principles, both as a short learning path and as a manifesto of sorts at https://principles.green/. The latter even includes some concrete ideas (although this is an area which needs expanding, and as it's on GitHub I feel a PR may be in my future).

Essentially, off the top of my head, here are a few ways we could improve energy efficiency – many of these are good practice for other reasons too (cost, reliability, simplicity, latency, etc.). Unfortunately, it does take some thought, as almost all of these are – as is so often the case – a trade-off between reducing energy usage and something else:

  • Efficient code: Writing better code in more efficient languages will make things faster and smaller – particularly when moving to microservices (e.g. packaging a whole JVM and app server with a Java microservice to run 10 lines of code is obviously less efficient than writing that 10-line service in Go or C). If that service is being used a lot it could need many times the compute to run. That said, frameworks improve development speed and reduce the risk of introducing security vulnerabilities or other bugs, and prepackaged code may already be optimized.
  • Efficient distribution: Should we use a whole virtual machine, containers built from big images, containers built from small minimal images, or serverless functions? It depends a lot on the scenario, but if I’m running a small microservice which processes a few messages from a queue once a day, I know which I’d choose.
  • Efficient deployments: Co-locating code avoids network transits. Pushing assets to a CDN reduces the use of both network and compute. Caching, both at the edge and at the client side, can improve user experience and reduce electricity usage (see the first sketch below). These are all good practice even before we think about electricity, but other choices may not be so obvious or clear cut: can a compute-heavy process (which doesn't require low latency) be run in a low-carbon region? And if so, does a lot of data need to be shipped there just to run the compute (in which case it might not be worth it)?
  • Efficient data: Tuning the retention of data can save on electricity (and ensure GDPR compliance!), but an even bigger saving can be made by not capturing data in the first place. This is true of business data, but even more true of technical data. We now drown in the wonderful data which tools like AppDynamics and Splunk provide, but every metric and every log line piped into a monitoring tool means data being sent over the network, processed in the tool, and stored (even if only for a few days). Generally there are a lot of reasons why collecting more data is better, but carbon footprint certainly isn’t one! (A small example of trimming telemetry at source is in the second sketch below.)
  • Efficient sizing & scaling: There is an obvious cost benefit to rightsizing compute for the workload, which means it should be on everyone's radar. This comes into play both when deploying (not selecting too large an instance) and when running (using autoscaling to handle variable load and turning off instances when not needed). That said, this is easily forgotten: large instances are often selected to start with and then never reassessed once the load is understood. It's perhaps the easiest area in which to make savings.
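
To make the caching point above concrete, here is a minimal sketch – assuming Flask, with a hypothetical endpoint and an illustrative one-hour max-age – of serving a rarely-changing response with a Cache-Control header, so that CDNs, proxies and browsers can answer repeat requests without touching our compute at all:

```python
# Minimal sketch: let CDNs and browsers cache a response instead of hitting
# our compute for every request. Assumes Flask; the endpoint and the
# one-hour max-age are illustrative only.
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/api/catalogue")
def catalogue():
    response = jsonify(["widget", "gadget"])  # hypothetical, rarely-changing data
    # "public" lets shared caches (CDN/proxy) store it; "max-age=3600" lets
    # clients and CDNs reuse it for an hour before re-fetching.
    response.headers["Cache-Control"] = "public, max-age=3600"
    return response


if __name__ == "__main__":
    app.run()
```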
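
And for the efficient data point, here is a sketch of trimming telemetry at source: a standard-library logging filter that drops most debug-level lines before they are ever serialised, shipped over the network and indexed by a monitoring tool. The 10% sample rate is just an example; warnings, errors and business events always pass through:

```python
# Minimal sketch: sample noisy debug logs at source so only a fraction is
# shipped to (and stored by) the monitoring stack. The 10% rate is
# illustrative; INFO and above always pass through.
import logging
import random


class DebugSampler(logging.Filter):
    def __init__(self, sample_rate: float = 0.1):
        super().__init__()
        self.sample_rate = sample_rate

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno > logging.DEBUG:
            return True  # keep INFO, WARNING, ERROR, ...
        return random.random() < self.sample_rate  # keep ~10% of DEBUG lines


logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()  # stand-in for a Splunk/AppDynamics shipper
handler.addFilter(DebugSampler(0.1))
logger.addHandler(handler)

logger.debug("chatty detail that is mostly dropped")
logger.info("business events still get through")
```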


Overall it seems like there is good news on the horizon for the efficiency of the cloud and the move to renewables, but the shift cannot come soon enough! As the demand for storage, compute and networks continues to grow, it's a good idea for all of us to be thinking about how we can make efficiencies in our cloud usage, just as we think about our carbon footprint in every other part of our lives.

Postscript

Just after I posted this, Mike Dodds pointed out that there is an Azure Sustainability Calculator for working out the carbon footprint of your Azure services, which is well worth a look.
