High Availability (HA):#

This refers to a system’s ability to remain accessible and operational for a desired period without significant downtime. High availability is achieved through redundancy, fault tolerance, and failover systems that ensure continuous operation even in the event of component failures.
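
To see why redundancy raises availability: n independent redundant components, each available a fraction a of the time, are all down simultaneously only with probability (1 − a)^n. A minimal Python sketch with made-up component availabilities:

```python
# Sketch: estimated availability of a system with redundant components.
# Assumes component failures are independent (a simplification).

def redundant_availability(a: float, n: int) -> float:
    """Probability that at least one of n redundant components is up."""
    return 1 - (1 - a) ** n

single = 0.99  # a hypothetical component with "two nines" availability
for n in (1, 2, 3):
    print(f"{n} component(s): {redundant_availability(single, n):.6f}")
# 1 -> 0.990000, 2 -> 0.999900, 3 -> 0.999999
```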

Disaster Recovery (DR):#

This involves the processes, policies, and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster. DR focuses on minimizing downtime and data loss.
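
Two standard DR metrics quantify downtime and data loss: the recovery point objective (RPO, how much data loss is tolerable) and the recovery time objective (RTO, how much downtime is tolerable). A small sketch with illustrative timestamps and targets:

```python
# Sketch: checking a recovery against RPO/RTO targets.
# All timestamps and objectives below are illustrative.
from datetime import datetime, timedelta

rpo = timedelta(hours=1)   # Recovery Point Objective: max tolerable data loss
rto = timedelta(hours=4)   # Recovery Time Objective: max tolerable downtime

last_backup = datetime(2024, 1, 1, 11, 30)
disaster_at = datetime(2024, 1, 1, 12, 15)
restored_at = datetime(2024, 1, 1, 14, 0)

data_loss_window = disaster_at - last_backup   # 45 minutes
downtime = restored_at - disaster_at           # 1 hour 45 minutes

print("RPO met:", data_loss_window <= rpo)  # True
print("RTO met:", downtime <= rto)          # True
```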

Uptime:#

This term refers to the time a system or service is operational and available to users. Uptime is often expressed as a percentage, such as 99.9% (“three nines”) availability.
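
To make the “nines” concrete, here is a quick calculation of the downtime each availability level permits per year:

```python
# Sketch: how an availability percentage translates into a downtime budget.

MINUTES_PER_YEAR = 365 * 24 * 60

for nines in (99.0, 99.9, 99.99, 99.999):
    downtime_min = MINUTES_PER_YEAR * (1 - nines / 100)
    print(f"{nines}% -> about {downtime_min:.0f} minutes of downtime per year")
# 99.9% ("three nines") allows roughly 526 minutes, or ~8.8 hours, per year
```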

Service Level Agreement (SLA):#

An SLA is a contract between a service provider and the end user that clearly states the level of service expected, including availability metrics. SLAs often specify downtime tolerances, response times for support requests, and penalties for failing to meet agreed-upon levels.
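
As a rough illustration, here is a sketch that checks a month of measured downtime against a hypothetical 99.9% SLA target (the outage figures are made up):

```python
# Sketch: checking measured availability against an SLA target.

def availability(total_minutes: int, downtime_minutes: float) -> float:
    return 100 * (total_minutes - downtime_minutes) / total_minutes

MINUTES_PER_MONTH = 30 * 24 * 60         # 43,200
sla_target = 99.9                        # percent, per the hypothetical SLA
outages = [12.0, 25.5]                   # minutes of downtime this month

measured = availability(MINUTES_PER_MONTH, sum(outages))
print(f"measured {measured:.3f}% vs target {sla_target}% ->",
      "met" if measured >= sla_target else "breached, penalties may apply")
```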

Scalability:#

The ability of a cloud service to handle increasing (or decreasing) workloads by adding (or removing) resources either vertically or horizontally. It ensures that the application can support growth without compromising performance.

Horizontal Scaling (Scaling Out/In):#

This involves adding more instances of the same hardware or software to a system, or removing them, to adjust capacity. For example, adding more virtual machines to handle increased load.
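
A minimal sketch of the idea, assuming a hypothetical per-instance capacity and fleet bounds:

```python
# Sketch: horizontal scaling as a change in instance count.
# Capacity per instance and the min/max bounds are assumed values.
import math

def desired_instances(requests_per_sec: float,
                      capacity_per_instance: float = 100.0,
                      min_instances: int = 2,
                      max_instances: int = 20) -> int:
    """How many identical instances are needed for the current load."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(needed, max_instances))

print(desired_instances(250))   # 3 -> scale out
print(desired_instances(90))    # 2 -> scale in, but no lower than the floor
```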

Vertical Scaling (Scaling Up/Down):#

This means upgrading the existing hardware or software capacity of an instance to handle more load, or downgrading it. For example, increasing the CPU, RAM, or storage of a server.
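
A sketch of the same idea as a ladder of instance sizes; the size names and specs are invented, not any provider’s catalogue:

```python
# Sketch: vertical scaling as moving up or down a ladder of instance sizes.

SIZES = [  # (name, vCPUs, RAM in GiB) -- hypothetical sizes
    ("small",  2,  4),
    ("medium", 4,  8),
    ("large",  8, 16),
    ("xlarge", 16, 32),
]

def smallest_size_for(cpus_needed: int, ram_needed: int) -> str:
    """Pick the smallest size that satisfies the resource requirements."""
    for name, cpus, ram in SIZES:
        if cpus >= cpus_needed and ram >= ram_needed:
            return name
    raise ValueError("workload exceeds the largest size; consider scaling out")

print(smallest_size_for(3, 6))   # medium -> scale up from small
print(smallest_size_for(1, 2))   # small  -> scale down if currently larger
```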

Elasticity:#

A system’s ability to automatically scale resources up or down as needed, often in real time, based on workload demands. This is crucial for optimizing resource usage and maintaining performance without human intervention.
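
A minimal sketch of an elastic scaling rule; the 70%/30% thresholds are illustrative only:

```python
# Sketch: an elastic scaling rule driven by a utilization metric.

def scaling_decision(cpu_percent: float) -> str:
    """Map current utilization to an action, with a dead zone in between
    so the system does not oscillate around a single threshold."""
    if cpu_percent > 70:
        return "scale out"   # add capacity
    if cpu_percent < 30:
        return "scale in"    # release unused capacity
    return "hold"

for cpu in (85, 50, 12):
    print(cpu, "->", scaling_decision(cpu))
```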

Load Balancing:#

The process of distributing incoming network traffic across multiple servers to ensure no single server becomes overwhelmed. This is key to maintaining high availability and scalability in cloud environments.
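
Round-robin is one of the simplest distribution strategies. A small sketch with placeholder server names:

```python
# Sketch: round-robin load balancing across a pool of servers.
import itertools

servers = ["server-a", "server-b", "server-c"]
rotation = itertools.cycle(servers)  # endless round-robin iterator

for request_id in range(7):
    server = next(rotation)          # each request goes to the next server
    print(f"request {request_id} -> {server}")
# Requests spread evenly: a, b, c, a, b, c, a
```

Production load balancers typically add health checks and smarter strategies, such as least-connections or weighted routing, on top of this basic rotation.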

Auto-Scaling:#

A feature offered by many cloud providers that automatically adjusts the number of computing resources based on the current demand. This is a subset of elasticity and is critical for applications with variable workloads.
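
A sketch of a target-tracking policy, similar in spirit to what cloud auto-scalers offer; the 60% CPU target and the metric samples are assumptions:

```python
# Sketch: a target-tracking auto-scaling loop.
import math

def autoscale(instances: int, avg_cpu: float, target_cpu: float = 60.0,
              min_instances: int = 1, max_instances: int = 10) -> int:
    """Resize the fleet so average utilization moves toward the target."""
    desired = math.ceil(instances * avg_cpu / target_cpu)
    return max(min_instances, min(desired, max_instances))

fleet = 4
for observed_cpu in (90.0, 75.0, 40.0):   # simulated metric samples
    fleet = autoscale(fleet, observed_cpu)
    print(f"cpu={observed_cpu}% -> fleet size {fleet}")
# 90% -> 6 instances, then 75% -> 8, then 40% -> 6
```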

Availability Zones:#

These are distinct locations within a cloud provider’s region that are engineered to be isolated from failures in other availability zones (AZs). They offer physical redundancy and failover capabilities within the same region, enhancing high availability by allowing resources to be distributed across multiple, geographically separated data centers.
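
A sketch of the placement idea: spreading replicas round-robin across zones so a single AZ failure cannot take out every copy (the zone names are placeholders):

```python
# Sketch: distributing replicas evenly across availability zones.
import itertools

zones = ["zone-a", "zone-b", "zone-c"]

def place_replicas(n_replicas: int) -> list[tuple[str, str]]:
    """Assign replicas to zones round-robin for even distribution."""
    rotation = itertools.cycle(zones)
    return [(f"replica-{i}", next(rotation)) for i in range(n_replicas)]

for replica, zone in place_replicas(5):
    print(replica, "->", zone)
# replica-0 -> zone-a, replica-1 -> zone-b, replica-2 -> zone-c, then wraps
```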

Regions:#

A region is a specific geographical location where cloud services are hosted. Regions consist of one or more availability zones. Selecting the appropriate region(s) for deploying applications can reduce latency, comply with data residency requirements, and increase resilience.
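
A sketch of region selection under a data-residency constraint, then by latency; the regions and latency figures are entirely made up:

```python
# Sketch: choosing a region by residency requirement, then by latency.

regions = [  # (name, jurisdiction, measured latency in ms from our users)
    ("eu-west",    "EU", 20),
    ("eu-central", "EU", 35),
    ("us-east",    "US", 95),
]

def pick_region(required_jurisdiction: str) -> str:
    """Filter by residency requirement, then take the lowest latency."""
    candidates = [r for r in regions if r[1] == required_jurisdiction]
    if not candidates:
        raise ValueError("no region satisfies the residency requirement")
    return min(candidates, key=lambda r: r[2])[0]

print(pick_region("EU"))   # eu-west: compliant and closest to users
```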

Failover Strategies#

Failover:#

The process of automatically switching to a redundant or standby system, component, or network upon the failure or abnormal termination of the currently active operation. Failover mechanisms are crucial for maintaining continuous operations and minimizing downtime.
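
A minimal active/standby sketch; the health states are simulated here, where a real system would probe endpoints:

```python
# Sketch: active/standby failover driven by a health check.

def healthy(node: str, down: set[str]) -> bool:
    """Stand-in for a real health check (ping, HTTP probe, heartbeat)."""
    return node not in down

def active_node(primary: str, standby: str, down: set[str]) -> str:
    """Serve from the primary; fail over to the standby when it is down."""
    if healthy(primary, down):
        return primary
    if healthy(standby, down):
        return standby
    raise RuntimeError("both nodes are down; no failover target left")

print(active_node("db-primary", "db-standby", down=set()))          # db-primary
print(active_node("db-primary", "db-standby", down={"db-primary"})) # db-standby
```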

Backup and Recovery#

Point-in-Time Recovery:#

This is a feature that allows the restoration of data or a system to a specific point in time, helping to recover from data corruption or loss. It’s essential for disaster recovery plans.
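
Conceptually, point-in-time recovery restores the latest snapshot taken before the target time and then replays the change log up to that moment. A toy sketch with made-up data:

```python
# Sketch: point-in-time recovery as "snapshot + log replay up to time T".

snapshot_time, snapshot_state = 100, {"balance": 50}
change_log = [  # (timestamp, key, new value) recorded after the snapshot
    (110, "balance", 70),
    (120, "balance", 20),   # suppose this write was a corruption
]

def restore_to(target_time: int) -> dict:
    """Rebuild state as it existed at target_time."""
    state = dict(snapshot_state)                 # start from the snapshot
    for ts, key, value in change_log:
        if ts <= target_time:                    # replay only earlier changes
            state[key] = value
    return state

print(restore_to(115))   # {'balance': 70} -> state just before the bad write
```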

Snapshot:#

A snapshot captures the state of a system at a particular point in time. Snapshots can be used to back up the data and state of a virtual machine or database, allowing for easier restoration.
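
A toy sketch of the capture-and-restore idea; real VM or database snapshots are far more involved (copy-on-write, crash consistency), but the principle is the same:

```python
# Sketch: a snapshot as an immutable copy of state at a moment in time.
import copy

vm_state = {"disk": ["app-v1"], "memory_mb": 2048}
snapshot = copy.deepcopy(vm_state)        # capture the current state

vm_state["disk"].append("bad-upgrade")    # a change we later regret
vm_state = copy.deepcopy(snapshot)        # roll back to the snapshot

print(vm_state)   # {'disk': ['app-v1'], 'memory_mb': 2048}
```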

Affinity:#

Affinity, in the context of cloud computing, refers to a strategy or policy that encourages or requires certain components, applications, or workloads to be located on the same physical hardware, virtual machine, or within the same data center. The goal of affinity rules is to improve performance, reduce latency, or maintain data locality by keeping related tasks and their dependencies close to each other.
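
A sketch of an affinity-aware placement decision; the hosts, workloads, and the rule itself are illustrative:

```python
# Sketch: a scheduler that co-locates workloads in the same affinity group.

hosts: dict[str, list[str]] = {"host-1": [], "host-2": []}
affinity_groups = {"web": "frontend", "cache": "frontend"}  # same group ->
                                                            # co-locate

def place(workload: str) -> str:
    """Put a workload next to its affinity group if one is already placed;
    otherwise pick the least-loaded host."""
    group = affinity_groups.get(workload)
    for host, placed in hosts.items():
        if group and any(affinity_groups.get(w) == group for w in placed):
            hosts[host].append(workload)
            return host
    host = min(hosts, key=lambda h: len(hosts[h]))
    hosts[host].append(workload)
    return host

print(place("web"))     # host-1 (least loaded at this point)
print(place("cache"))   # host-1 again, pulled in by the affinity rule
```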