Azure SLAs: What customers need to know

Service Level Agreements (SLAs) can give customers some peace of mind regarding uptime guarantees. But there are lots of potential gotchas of which they should be aware.

Mary Jo Foley
March 20, 2024

Microsoft's Azure status chart tracking service availability

Microsoft offers Service Level Agreements (SLAs) on many of its online services, including Azure, Microsoft 365/Office 365, Dynamics 365, and Intune. Microsoft publishes separate SLAs for more than 130 Azure services, but most use similar percentage thresholds, calculation formulas, and service credit tiers. SLAs are Microsoft's commitments to uptime and other service qualities. If the specified levels are not met, customers can submit claims to the company and potentially receive credits (but not monetary compensation) toward future service usage.
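To make the arithmetic concrete, here is a minimal Python sketch of how a customer might compute a monthly uptime percentage and look up a matching credit. The formula (available minutes minus downtime, divided by available minutes) follows the general shape most Azure SLAs use. The function names and the credit tiers are illustrative assumptions, not the terms of any particular Azure SLA; the real thresholds live in the SLA for each service.

```python
def monthly_uptime_percent(total_minutes: int, downtime_minutes: float) -> float:
    """Return uptime as a percentage of the minutes the service should have run."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return (total_minutes - downtime_minutes) / total_minutes * 100.0


# Hypothetical tiers for illustration only: (uptime threshold, credit percent),
# ordered from the worst uptime band to the best.
ILLUSTRATIVE_TIERS = [(99.0, 25), (99.9, 10)]


def service_credit_percent(uptime_percent: float, tiers=ILLUSTRATIVE_TIERS) -> int:
    """Return the credit percent for the first tier the uptime falls below."""
    for threshold, credit in tiers:
        if uptime_percent < threshold:
            return credit
    return 0  # SLA met under these assumed tiers, so no credit


# A 30-day month has 43,200 minutes; assume one hour of qualifying downtime.
uptime = monthly_uptime_percent(43_200, 60)
credit = service_credit_percent(uptime)
```

Under these assumed tiers, an hour of qualifying downtime in a 30-day month lands just below 99.9% and maps to the lower credit tier. The credit is a percentage of that service's fees applied to a future bill, not a refund.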

Sounds relatively straightforward? Not really. Here’s what customers need to know about Azure SLAs, in particular.

There are many limitations on the types of service interruptions that can be claimed against Azure SLAs. Customers also need to jump through quite a few hoops to make a claim for an unmet SLA.

“Customers often assume SLAs guarantee uptime for some ‘number of nines’ and when that uptime is interrupted, you aren’t charged, analogous to when your electric goes out, the meter stops running, and you aren’t billed during the outage. The reality of Azure SLAs is quite different: it’s up to the customer to determine whether SLAs are met, submit claims if not, and then at best, receive some future free usage of the same service,” explained Directions on Microsoft analyst Rob Sanfilippo. “Furthermore, some Azure SLAs only provide maximum service credits of 25% of the amount charged, even if the service never worked.”

Not every ‘outage’ qualifies

Not too surprisingly, Microsoft does not proactively notify customers when their SLAs have not been met. It's up to customers to figure this out for themselves.

Not every "outage" qualifies as an unmet SLA. For example, Microsoft-planned maintenance time for services and downtime caused by non-Microsoft factors, such as network providers and equipment, do not qualify. If an issue is caused by a customer's own systems, processes, configurations, or methods of implementation or deployment, it also does not qualify.

Also outside the scope of what qualifies as an unmet SLA: downtime caused by security breaches that could have been mitigated by the customer; natural disasters and government actions; and downtime in a single Microsoft datacenter where geo-resiliency could have headed off the problem.
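A customer keeping an incident log could apply those exclusions before doing any uptime math. The sketch below is one way to do that in Python. The category labels, the `Incident` type, and the `qualifying_downtime` helper are all hypothetical names invented for illustration; deciding which category a real incident belongs to is a judgment call the code cannot make.

```python
from dataclasses import dataclass

# Exclusion categories paraphrased from the article; the labels are hypothetical.
EXCLUDED_CAUSES = {
    "planned_maintenance",
    "third_party_network_or_equipment",
    "customer_configuration",
    "mitigable_security_breach",
    "natural_disaster_or_government_action",
    "single_datacenter_without_geo_resiliency",
}


@dataclass
class Incident:
    cause: str               # one of the labels above, or anything else
    downtime_minutes: float  # how long the service was unavailable


def qualifying_downtime(incidents) -> float:
    """Sum downtime only for incidents whose cause is not excluded."""
    return sum(i.downtime_minutes for i in incidents if i.cause not in EXCLUDED_CAUSES)


log = [
    Incident("planned_maintenance", 120),
    Incident("platform_fault", 45),
    Incident("customer_configuration", 30),
]
minutes_that_count = qualifying_downtime(log)
```

In this made-up log, only the 45-minute platform fault would count toward an SLA calculation, even though the service was unavailable for more than three hours in total.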

Microsoft does not provide any services, tools, or automated processes that specifically track whether SLAs are met or generate claims. Some customers deem it worth the time to try to keep tabs on historical Azure health data and health events and generate their own alerts.

However, organizations can end up facing costs for processes they use to track SLA adherence. For example, using a heartbeat monitor on a service could increase charges for the service depending on the heartbeat frequency.
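A rough way to size that cost is to count the probes a heartbeat generates in a month and multiply by a per-request price. The Python sketch below does only that. The price is a placeholder assumption, not a published Azure rate, and the function names are hypothetical; actual charges depend on how the monitored service bills requests.

```python
import math


def heartbeat_probes_per_month(interval_seconds: int, days: int = 30) -> int:
    """Count how many probes fire in a month at a fixed interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return days * 24 * 60 * 60 // interval_seconds


def estimated_probe_cost(probes: int, price_per_million: float) -> float:
    """Estimate cost from a per-million-requests price (an assumed billing model)."""
    return probes / 1_000_000 * price_per_million


# One probe per minute, at a placeholder price of 0.50 per million requests.
probes = heartbeat_probes_per_month(60)
cost = estimated_probe_cost(probes, 0.50)
```

Shortening the interval raises the probe count proportionally: probing every 10 seconds instead of every 60 means six times as many billable requests for the same month.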

There's also the time factor. Customers have two months after an incident occurs to determine that an SLA was not met and submit a claim — and Microsoft has 45 days after that to process the claim. Even if customers do end up qualifying for service credits for unmet SLAs, service fees still accrue during downtimes, and Microsoft does not refund them.
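The timeline can be sketched as simple date arithmetic. The Python example below assumes "two months" means two calendar months from the incident date and that the 45 days run from the day the claim is submitted; both are interpretations of the summary above, so check the wording of the relevant SLA before relying on them. The helper names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of shorter months."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def claim_timeline(incident: date, submitted: date) -> dict:
    """Return the assumed claim deadline and the latest expected decision date."""
    deadline = add_months(incident, 2)  # assumption: two calendar months
    return {
        "claim_deadline": deadline,
        "submitted_in_time": submitted <= deadline,
        "latest_decision": submitted + timedelta(days=45),  # assumption: from submission
    }


timeline = claim_timeline(incident=date(2024, 1, 15), submitted=date(2024, 2, 10))
```

For an incident on January 15, 2024, with a claim filed on February 10, this gives a deadline of March 15 and a decision by March 26, more than two months after the outage itself.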

It’s good that Azure SLAs exist. They vary considerably across Azure services (see the SLA for Cosmos DB, which is a major selling point for the service), and mostly don’t exist for preview services. If you’re trying to decide which of several overlapping Microsoft services to use in a system, SLAs could be a factor, as they indicate in which services Microsoft itself is most confident.

But customers should be aware that SLAs don’t guarantee that every Azure outage will put money, or even service credits, in their pockets.