Prev: latency-and-throughput Next: caching

Availability

  • Customers wouldn’t be happy if a website was down — we would lose money.

  • Some products are life or death, like airplane systems.

  • Cloud providers like AWS and GCP also have high availability (99.95%)

  • We measure availability by how much uptime they have.

  • 99% - Two Nines - 3.65 days of downtime

  • 99.9% - Three Nines - 8 hours of downtime

  • 99.99% - Four Nines - 1 hour of downtime

  • 99.999% - Five Nines - 5 minutes of downtime (High Availability)

  • Service Level Agreement (SLA) are a guarantee of how much uptime is required.

  • Service Level Objective (SLO) are similar.

  • How do we increase availability?

  • eliminate single points of failure.

  • Make sure there’s redundancy.

    • If you have one server, have multiple.
    • If you have multiple servers, get more load balancers.
  • Active redundancy is like leader election, when a node crash reconfigures itself.

Prev: latency-and-throughput Next: caching