Worker deployment patterns for Active-Passive and Active-Active HA/DR


When an outage strikes, a Namespace with High Availability fails over to another region automatically, but it does not move the rest of the architecture. Workers, Workflow starters, Codec Servers, databases, and the external systems that Workflows depend on each need their own failover story.

A major factor in the recovery time you achieve during a real-world outage is the Worker deployment pattern: where Worker fleets run and which region (or regions) processes Workflows at any given moment. This page describes common Active/Passive (also written Active-Passive) and Active/Active (also written Active-Active) patterns for deploying Workers and the rest of the architecture as part of an overall high availability and disaster recovery (HA/DR) strategy across regions.

What needs a failover story

Beyond the Namespace itself, these components live in the application environment and must be planned for:

Workers (the focus of this page): execute Workflows and Activities.
Workflow starters and Clients: start and signal Workflows.
Codec Servers: encode and decode payloads for Workers, the Web UI, and the CLI.
Proxies between Workers and Temporal Cloud: any forward proxy or mTLS terminator in the connection path from Workers, Workflow starters, or Clients to the Namespace.
Databases and queues: the systems that Activities read and write.

Some systems must be active wherever Workers are running (for example, Codec Servers), while others might follow a different failover sequence (for example, databases). Because the right choice for each of these usually depends on where Workers run, this page focuses on Worker deployment patterns.

tip

See High Availability for Temporal Cloud Namespaces to learn more about Namespace replicas, replication, and failover.

Worker deployment patterns

These Worker deployment patterns are how you extend Namespace High Availability into a full disaster recovery (DR) and business continuity plan across regions. This page covers three main patterns — Active/Passive (Cold), Active/Passive (Hot), and Active/Active — plus a rarely needed Regionally Sharded Namespaces variant. They trade off recovery time after an outage, cost during normal operation, and operational complexity, and differ in where the Workers run and where Workflows process:

Active/Passive (also written Active-Passive) — Workflows process in one region at a time, the "active" region. The other region is "passive" and ready for failover. This pattern has two variants:
- Active/Passive (Cold), a.k.a. Active/Cold: Workers run in only one region at a time, and that region is always the region where Workflows process. After a failover, Workers must be started in the secondary region from nothing, a "cold start."
- Active/Passive (Hot), a.k.a. Active/Hot: Workers run in both regions simultaneously, but Workflows still process in only one region at any given time. The other region's Workers wait on "hot" standby.
Active/Active (also written Active-Active) — Workflows process in both regions at the same time. Necessarily, Workers run in both regions at all times.

info

Namespaces are always Active/Passive, but can support an Active/Active pattern.

A Temporal Cloud Namespace with High Availability has exactly one active region at a time. The other region holds a replica that passively receives replicated state.

However, since both regions can serve requests and Worker polls, Workers don't need to run in the same region as the active replica, and Temporal Cloud Namespaces can still fit into a broader "Active/Active" strategy, as described below.
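To make the single-active-replica model concrete, here is a minimal sketch that decides which region's replica ends up serving a Worker poll. It models only the behavior described on this page: the Namespace Endpoint always routes to the active region, while a Regional Endpoint lands on its own region's replica and reaches the active replica only when forwarding is on. The function, its parameters, and the region names are hypothetical; this is not a Temporal SDK or Temporal Cloud API.

```python
from typing import Optional


def serving_region(
    endpoint: str,                   # "namespace" or "regional" (hypothetical labels)
    endpoint_region: Optional[str],  # region of a Regional Endpoint; ignored otherwise
    active_region: str,              # region that currently holds the active replica
    forwarding: bool = True,         # whether a passive replica forwards polls
) -> Optional[str]:
    """Return the region whose active replica serves the poll, or None if it gets no work."""
    if endpoint == "namespace":
        # The Namespace Endpoint always resolves to the active region.
        return active_region
    if endpoint == "regional":
        if endpoint_region == active_region:
            # The poll lands directly on the active replica.
            return active_region
        # The poll lands on a passive replica: forwarded if enabled, otherwise idle.
        return active_region if forwarding else None
    raise ValueError(f"unknown endpoint kind: {endpoint!r}")


print(serving_region("namespace", None, active_region="us-west-2"))
print(serving_region("regional", "us-east-1", active_region="us-west-2", forwarding=False))
```

The first call shows why Workers need not share a region with the active replica; the second shows the idle standby behavior that the Active/Passive (Hot) pattern relies on.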

These patterns work across two cloud regions, which could be in the same cloud provider or different cloud providers:

Primary region — the region where the Namespace is active during normal operation, also called the "preferred region."
Secondary region — the region the Namespace fails over to. It can be any Temporal Cloud region that supports replication from the primary region.

tip

Multi-region Replication and Multi-cloud Replication generally use the same set of Worker deployment patterns, so this page will not distinguish between multi-region and multi-cloud.

Compare Worker deployment patterns at a glance (benefits and tradeoffs)

Pattern	Best for	Major benefits	Major tradeoffs
Active/Passive (Cold)	Easy initial deployment	Acts like a single region; no special setup required	Failing over Workers is the user's responsibility
Active/Passive (Hot)	Low RTO with strict single-region behavior	Fast Worker failover; guaranteed to act like a single region	More configuration and higher cost for the Worker fleet
Active/Active	Low RTO with Workers active in multiple regions	Fast Worker failover; uses Worker fleet capacity (no standby Workers)	Cross-region requests add Workflow latency

Active/Passive (Cold)

Also known as "Active/Cold Standby", "Active/Cold", or simply "Active/Passive".

Active/Cold Pattern: Normal operation

Workers run in only one region. A single Worker fleet runs in the primary region and processes all Workflows. No Workers run in the secondary region.
The Namespace replicates to the secondary region. A Namespace with High Availability has an active replica in the primary region and a passive replica in the secondary region. Temporal Cloud continuously replicates Workflow state to the passive replica, so it stays ready to become active.
Your databases and queues replicate too, if needed. Workers read and write systems such as databases and queues. If your Workflows depend on that data, replicate it to the secondary region so it's available after a failover. Workflows that don't touch external state may not need this.
Setup is minimal. Turn on Replication for your Namespace (see High Availability for Temporal Cloud Namespaces) and enable replication on any databases or queues your Workflows use. At that point you're technically already running Active/Passive (Cold): the secondary region holds a ready replica, and failing over is a matter of bringing your Workers up there.

Active/Cold Pattern: On failover

The Namespace fails over automatically. Temporal Cloud promotes the secondary region's replica to active. No action is needed to fail over the Namespace itself.
You bring the Workers up in the secondary region. Because no Workers were running there, they start from nothing — a "cold" start. Starting and scaling that fleet is your responsibility, ideally through tested automation. Until the Workers are running, no Workflows make progress.
Promote your databases and queues, if needed. If your Workflows depend on external data, make the secondary region's copy active so the new Workers can read and write it.
Recovery time is dominated by Worker startup. After Temporal detects the outage and triggers failover, the Namespace is active almost immediately, but throughput returns to normal only after container or VM startup, image pulls, and application warm-up complete.
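A rough way to budget Active/Passive (Cold) recovery time is to add up the phases listed above. The sketch below does that arithmetic under the simplifying assumption that the phases run back to back; every duration is a hypothetical placeholder to replace with measurements from your own failover drills, not a Temporal figure.

```python
from dataclasses import dataclass


@dataclass
class ColdStartBudget:
    """Hypothetical phase durations, in seconds, for a cold Worker start."""
    detect_and_fail_over: float  # outage detection plus Namespace failover
    provision: float             # container or VM startup
    image_pull: float            # pulling Worker images in the new region
    warm_up: float               # application warm-up until Workers poll

    def worker_recovery_seconds(self) -> float:
        # Assuming the phases run back to back, recovery time is their sum.
        return self.detect_and_fail_over + self.provision + self.image_pull + self.warm_up


budget = ColdStartBudget(detect_and_fail_over=60, provision=120, image_pull=90, warm_up=30)
print(budget.worker_recovery_seconds())  # 300
```

In this example, 240 of the 300 seconds are Worker startup, which is why pre-pulled images or a warm standby fleet shorten recovery far more than faster detection alone.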

Active/Cold Pattern: Benefits

Easy to reason about.
- Only one region is active at a time, so traffic routing and interactions with systems (such as databases and queues) are simpler to understand, and the pattern pairs naturally with other active / passive systems. Active/Active, by contrast, requires deciding how Workers reach an active database: either a local active database in each region, or a single active / passive database that some Workers must reach cross-region.
Simple to operate.
- During normal operation it resembles a single-region deployment.
Lowest overall architecture cost.
- The size of the Worker fleet is simply the capacity needed to operate in one region. There are no standby Workers during steady state.

Active/Cold Pattern: Tradeoffs

Highest overall recovery time of the three patterns, due to cold starting the Worker fleet after failover.
Depends on tested automation to bring up the secondary-region fleet quickly.

Active/Cold Pattern: Recommendations and important constraints

Failing over the Workers is the operator's responsibility. The Namespace fails over automatically, but bringing up the Workers in the secondary region is up to you. Plan for these sub-considerations:
- How do you detect an outage and decide to fail over? Define the failover conditions and the signals (alerts, health checks) that trigger them.
- How do you scale up the Workers? Bring up the secondary-region fleet, ideally with tested automation, and scale down the primary region's fleet so Workers run in only one region at a time.
- Do you need to enforce single-region processing? The Cold pattern relies on the operator to keep Workers in one region. To have Temporal enforce single-region processing instead, use the Active/Passive (Hot) pattern.

Use the Namespace Endpoint.
- Connect Workers through the Namespace Endpoint, which always connects to the Namespace in its active region and automatically fails over to the new region.
- Rationale: If a Temporal Cloud incident requires the Namespace to fail over while the rest of the primary region is healthy, the Workers in the primary region can still connect through the Namespace Endpoint and process Workflows. If the Workers use the Regional Endpoint for the primary region, they will not reliably connect to the Namespace during a Temporal Cloud incident in the primary region.

Set up cross-region private connectivity.
- If you use private connectivity, give the primary region's Workers a network route to the VPC Endpoint in the other region, so they can reach the active replica after a Namespace-only failover. If you can't provide that cross-region route, use the Active/Passive (Hot) pattern instead, where each region's Workers connect to their local replica.
- For the full setup of Regional Endpoints, VPC Endpoints, and cross-region routing, see Connectivity for High Availability.

Route Workers to the active region's Codec Server. Two common approaches:
- Put DNS or a load balancer in front of the Codec Server address, and update it on failover to point at the new region's instance.
- Pass each Worker the Codec Server address for its own region as configuration, so a Worker always uses the service local to it. This is common in Kubernetes or with service discovery.

Route Workers to the active region's proxy. Two common approaches:
- Put DNS or a load balancer in front of the proxy address, and update it on failover to point at the new region's instance.
- Pass each Worker the proxy address for its own region as configuration, so a Worker always uses the service local to it. This is common in Kubernetes or with service discovery.
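The operator duties above can be captured in automation. The following is a minimal sketch of an Active/Passive (Cold) failover runbook written against a hypothetical `Orchestrator` that stands in for your real tooling (Kubernetes, autoscaling groups, a DNS provider); none of these names are Temporal APIs, and detecting the outage and deciding to run the runbook remain your responsibility. This sketch brings the new fleet up before draining the old one so capacity never drops to zero; reverse the order if your environment must never run two fleets at once.

```python
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    """Hypothetical stand-in for your tooling. It only records desired state."""
    replicas: dict[str, int] = field(default_factory=dict)  # region -> Worker replicas
    aliases: dict[str, str] = field(default_factory=dict)   # DNS alias -> target region

    def scale(self, region: str, count: int) -> None:
        self.replicas[region] = count

    def point_alias(self, alias: str, region: str) -> None:
        self.aliases[alias] = region


def cold_failover(orch: Orchestrator, old: str, new: str, fleet_size: int) -> None:
    """Move an Active/Passive (Cold) deployment from `old` to `new`."""
    # 1. Cold-start the Worker fleet in the new region.
    orch.scale(new, fleet_size)
    # 2. Repoint the Codec Server and proxy aliases at the new region's instances.
    for alias in ("codec", "proxy"):
        orch.point_alias(alias, new)
    # 3. Drain the old region so Workers run in only one region.
    orch.scale(old, 0)
    running = [region for region, count in orch.replicas.items() if count > 0]
    if running != [new]:
        raise RuntimeError(f"Workers running in {running}, expected only {new}")


orch = Orchestrator(replicas={"us-east-1": 10},
                    aliases={"codec": "us-east-1", "proxy": "us-east-1"})
cold_failover(orch, old="us-east-1", new="us-west-2", fleet_size=10)
print(orch.replicas)  # {'us-east-1': 0, 'us-west-2': 10}
```

The final check encodes the Cold pattern's single-region rule, since Temporal does not enforce it for you in this pattern.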

Active/Cold Pattern: Component behavior

Workers — run only in the primary region; brought up in the secondary region during a failover.
Workflow starters and Clients — run with the Workers; brought up in the secondary region during a failover.
Codec Servers and proxies — run alongside the active Workers; scaled up in the secondary region as part of a failover.
Databases and queues — single-region-active; fail over to the secondary region alongside the Workers.

Active/Passive (Hot)

Also known as "Active/Hot Standby" or "Active/Hot".

Active/Hot Pattern: Normal operation

Workers run in both regions. A full Worker fleet runs in each region. The primary region's Workers are active and process all Workflows; the secondary region's Workers stay connected and warm, but on standby, doing no work.
Workflows process in only one region at a time. The Namespace has a single active replica, so even though Workers run in both regions, Workflows execute only in the active (primary) region.
Forwarding is disabled for Worker polls. Each fleet connects to its local replica through a Regional Endpoint or VPC Endpoint with forwarding off, so polls that reach the passive replica are not sent to the active region. The standby fleet does no work and adds no cross-region overhead.
The Namespace replicates to the secondary region. A Namespace with High Availability keeps an active replica in the primary region and a passive replica in the secondary region, continuously replicating Workflow state so the standby is ready to take over.
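The division of labor between the two fleets can be sketched as a small simulation. Each fleet polls its local replica with forwarding off, so only the fleet co-located with the active replica receives Tasks; when the active region flips, the standby fleet starts receiving Tasks with no reconfiguration. The `Fleet` type and region names are illustrative assumptions, not Temporal types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fleet:
    region: str  # region the fleet runs in, and whose Regional Endpoint it uses


def fleets_receiving_tasks(fleets: list[Fleet], active_region: str) -> list[str]:
    """With forwarding off, a poll gets Tasks only if it lands on the active replica."""
    return [f.region for f in fleets if f.region == active_region]


fleets = [Fleet("us-east-1"), Fleet("us-west-2")]
print(fleets_receiving_tasks(fleets, active_region="us-east-1"))  # ['us-east-1']
# After failover the secondary replica is active, and its warm fleet takes over.
print(fleets_receiving_tasks(fleets, active_region="us-west-2"))  # ['us-west-2']
```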

Active/Hot Pattern: On failover

The Namespace and Workers fail over together, automatically. When the primary region fails, Temporal Cloud promotes the secondary replica to active, and the secondary region's standby Workers — already connected and warm — begin processing immediately.
No cold start and no DNS wait. Because a full Worker fleet was already running in the secondary region, there's nothing to start or scale up before processing resumes. This pattern achieves the lowest recovery time of the three.
Promote your databases and queues, if needed. If your Workflows depend on external data, make the secondary region's copy active so the now-active Workers can read and write it.

Active/Hot Pattern: Benefits

Easy to reason about.
- Only one region is active at a time, so traffic routing and interactions with systems (such as databases and queues) are simpler to understand, and the pattern pairs naturally with other active / passive systems. Active/Active, by contrast, requires deciding how Workers reach an active database: either a local active database in each region, or a single active / passive database that some Workers must reach cross-region.
Lowest overall recovery time of the three patterns.
- The secondary-region Workers are already connected and warm, so failover involves no cold start.
Low latency during normal operation.
- Tasks are processed only in the active region, with no cross-region forwarding.

Active/Hot Pattern: Tradeoffs

Highest overall architecture cost: a full standby Worker fleet runs in the secondary region at all times, even during steady state.

Active/Hot Pattern: Recommendations and important constraints

Use Regional or VPC Endpoints and disable forwarding.
- Connect each Worker fleet through its region's Regional Endpoint (or VPC Endpoint) and disable forwarding for Worker polls. Using the Namespace Endpoint by mistake routes the standby Workers to the active region and defeats the pattern.
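Because a single wrong endpoint quietly turns the standby fleet into an active one, it can help to validate Worker configuration at deploy time. This sketch checks one fleet's settings against the rule above. The `FleetConfig` fields and all addresses are hypothetical placeholders; substitute the addresses shown for your Namespace in Temporal Cloud, and add your VPC Endpoint addresses to `REGIONAL_ENDPOINTS` if you use private connectivity. The `forwarding` field models whatever forwarding behavior is in effect for that fleet's polls, however you configure it.

```python
from dataclasses import dataclass

# Hypothetical placeholder addresses, not real Temporal Cloud hostnames.
NAMESPACE_ENDPOINT = "namespace-endpoint.example:7233"
REGIONAL_ENDPOINTS = {
    "us-east-1": "us-east-1.regional-endpoint.example:7233",
    "us-west-2": "us-west-2.regional-endpoint.example:7233",
}


@dataclass
class FleetConfig:
    region: str       # region the Worker fleet runs in
    target: str       # address the fleet's Workers connect to
    forwarding: bool  # whether polls landing on a passive replica are forwarded


def hot_standby_problems(cfg: FleetConfig) -> list[str]:
    """List the ways a fleet's config breaks the Active/Passive (Hot) rules."""
    problems = []
    if cfg.target == NAMESPACE_ENDPOINT:
        problems.append("uses the Namespace Endpoint, so standby Workers reach the active region")
    elif cfg.target != REGIONAL_ENDPOINTS.get(cfg.region):
        problems.append(f"target is not the Regional Endpoint for {cfg.region}")
    if cfg.forwarding:
        problems.append("forwarding is enabled for Worker polls")
    return problems


print(hot_standby_problems(FleetConfig("us-west-2", NAMESPACE_ENDPOINT, True)))
```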

Active/Hot Pattern: Component behavior

Workers — run in both regions; only the active region processes Workflows.
Workflow starters and Clients — run in both regions alongside the Workers.
Codec Servers and proxies — run in both regions continuously, not just after a failover.
Databases and queues — typically single-region-active; fail over alongside the active Workers.

Active/Active

Active/Active Pattern: Normal operation

Run Workers in as many regions as you want. A Worker fleet can run in any region — the fleets don't have to match the Namespace's regions, and you don't even need Workers in the same region as the active Namespace. Spreading fleets across regions is like spreading machines across Availability Zones: if one region's fleet goes down, the others keep processing.
Every fleet connects through the Namespace Endpoint. By default, all fleets use the single Namespace Endpoint, which always routes to whichever region currently holds the active Namespace. Temporal Cloud transparently forwards any request that lands on a passive replica to the active region.
The Namespace has one active replica. A Temporal Cloud Namespace is not "active/active" in the database sense — one region holds the active replica and another holds a passive replica that receives replicated state. Workflows process wherever your Workers run, all against that one active replica.

Active/Active Pattern: On failover

The Namespace fails over automatically. Temporal Cloud promotes a passive replica in another region to active.
Every Worker fleet follows automatically. Because all fleets connect through the Namespace Endpoint, they immediately reach the new active region — no reconfiguration, no DNS changes to manage, nothing to bring up. It "just works."
Surviving fleets keep processing. Only the fleet in the failed region is affected; fleets in every other region keep running with no cold-start gap. Scale up surviving regions if needed to carry the full load.
Promote your databases and queues, if needed. If your Workflows depend on external data, make the active region's copy available to the Workers there.
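How much to scale up surviving regions is simple arithmetic. The sketch below assumes load spreads evenly across surviving regions and that `headroom` is a safety margin you choose; both are assumptions for illustration, not Temporal guidance.

```python
import math


def workers_per_surviving_region(total_workers_needed: int, regions: int,
                                 lost: int = 1, headroom: float = 0.0) -> int:
    """Workers each surviving region needs to carry the full load after losing `lost` regions."""
    surviving = regions - lost
    if surviving < 1:
        raise ValueError("no surviving regions to carry the load")
    # Spread the total load evenly, add the headroom margin, round up to whole Workers.
    return math.ceil(total_workers_needed * (1 + headroom) / surviving)


# 30 Workers' worth of load across 3 regions (10 each in steady state):
print(workers_per_surviving_region(30, regions=3))                 # 15
print(workers_per_surviving_region(30, regions=3, headroom=0.25))  # 19
```

Running each region at that size in steady state trades some idle capacity for no scale-up step at all during a regional outage.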

Active/Active Pattern: Benefits

Hands-off Worker failover.
- With the Namespace Endpoint, Workers in every region follow the Namespace to the new active region automatically — there's no Worker failover step to run.
Low recovery time, no standby fleet.
- Surviving regions keep processing, so there's no cold start, and capacity is spread across regions instead of parked in a dedicated standby fleet.
Resilient to losing a region.
- Like spreading across Availability Zones, losing one region's fleet leaves the others running.

Active/Active Pattern: Tradeoffs

Workers outside the active region reach it across regions (directly or via forwarding), which adds latency that can matter for latency-sensitive Workflows.
External systems are harder: Workers are active in multiple regions at once, so any databases and queues they touch need a cross-region consistency story.

Active/Active Pattern: Recommendations and important constraints

Default to the Namespace Endpoint.
- All fleets, in any region, connect through the single Namespace Endpoint. It always routes to the active region and follows failovers automatically, so Workers in every region keep reaching the active Namespace with no reconfiguration. Using one endpoint everywhere also keeps configuration and management simple.
Use a Regional Endpoint only when you need the lowest recovery time.
- Connecting each fleet to its region's Regional Endpoint (or VPC Endpoint) removes the DNS step from the connection path, which can shave time off failover for the lowest possible RTO. The tradeoffs: more setup, and a real risk of misconfiguration (such as routing a fleet to the wrong region). Reach for it only when you absolutely need low recovery time. With Regional Endpoints, keep forwarding enabled so passive-region polls still reach the active replica.
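The endpoint guidance above reduces to a small decision per fleet. This is a minimal sketch with hypothetical names and placeholder addresses: it defaults to the Namespace Endpoint and only opts into the fleet's own Regional Endpoint when the lowest RTO is required. In contrast to the Hot pattern, forwarding stays enabled in both cases.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    target: str       # address the fleet connects to
    forwarding: bool  # Active/Active keeps forwarding enabled


def active_active_connection(fleet_region: str, namespace_endpoint: str,
                             regional_endpoints: dict[str, str],
                             need_lowest_rto: bool = False) -> Connection:
    if not need_lowest_rto:
        # Default: one endpoint everywhere, which follows failovers on its own.
        return Connection(namespace_endpoint, forwarding=True)
    # Opt-in: the fleet's own region's Regional Endpoint. A KeyError here surfaces
    # a missing entry instead of silently routing the fleet to another region.
    return Connection(regional_endpoints[fleet_region], forwarding=True)


eps = {"us-east-1": "us-east-1.regional-endpoint.example:7233"}
print(active_active_connection("us-east-1", "namespace-endpoint.example:7233", eps))
# Connection(target='namespace-endpoint.example:7233', forwarding=True)
```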

Active/Active Pattern: Component behavior

Workers — run and process in any number of regions; all follow the Namespace's active region.
Workflow starters and Clients — run wherever convenient and connect through the Namespace Endpoint, like the Workers.
Codec Servers and proxies — run in every region where Workers run.
Databases and queues — accessed from every Worker region; cross-region consistency must be designed for.

Regionally Sharded Namespaces

Regionally Sharded Namespaces: Overview

A few workloads are so latency-sensitive, or so tied to region-specific data, that they need Active/Active behavior in each region at once. You can build this by sharding across multiple Namespaces — one Namespace active per region — and routing each region's traffic to its local shard. Each Namespace ("shard") serves low-latency, region-bound work in its own region and replicates to another region for disaster recovery. The same idea extends to as many regions as you need.
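Routing each region's traffic to its local shard can be as simple as a lookup keyed by the caller's region. In this sketch the Namespace names and regions are hypothetical; each entry also records where that shard replicates for disaster recovery.

```python
from typing import NamedTuple


class Shard(NamedTuple):
    namespace: str       # hypothetical Namespace name
    home_region: str     # region where this shard's Namespace is active
    replica_region: str  # region it replicates to for disaster recovery


SHARDS = {
    "us-east-1": Shard("orders-use1", "us-east-1", "us-east-2"),
    "eu-west-1": Shard("orders-euw1", "eu-west-1", "eu-central-1"),
}


def shard_for(region: str) -> Shard:
    """Return the local shard for a caller in `region`; fail loudly if none exists."""
    try:
        return SHARDS[region]
    except KeyError:
        raise LookupError(f"no shard serves region {region!r}") from None


print(shard_for("eu-west-1").namespace)  # orders-euw1
```

Failing loudly on an unknown region is a deliberate choice here: silently falling back to a remote shard would hide exactly the cross-region latency this pattern exists to avoid.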

Regionally Sharded Namespaces: On failover (Region 1 outage)

Each shard fails over independently, following the Active/Passive sequence for the Worker pattern you chose for it.

Regionally Sharded Namespaces: Choosing a pattern per shard

Each shard's Workers can use Single-region, Active/Cold, or Active/Hot, depending on that shard's availability and recovery-time needs.
A shard that must survive a regional outage should use Active/Cold or Active/Hot, which get automatic Namespace failover. A shard that only needs low-latency local processing can run Single-region.
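The rule above can be enforced as a check over shard definitions, so a Single-region shard that needs to survive an outage is caught before it ships. The shard names, the `Pattern` enum, and the `(pattern, must_survive_outage)` tuple layout are hypothetical.

```python
from enum import Enum


class Pattern(Enum):
    SINGLE_REGION = "single-region"
    ACTIVE_COLD = "active/cold"
    ACTIVE_HOT = "active/hot"


def shards_missing_failover(shards: dict[str, tuple[Pattern, bool]]) -> list[str]:
    """Return shards that must survive a regional outage but use Single-region."""
    return sorted(name for name, (pattern, must_survive) in shards.items()
                  if must_survive and pattern is Pattern.SINGLE_REGION)


shards = {
    "payments-us": (Pattern.ACTIVE_HOT, True),
    "reports-eu": (Pattern.SINGLE_REGION, False),  # low-latency local work only
    "orders-ap": (Pattern.SINGLE_REGION, True),    # needs Active/Cold or Active/Hot
}
print(shards_missing_failover(shards))  # ['orders-ap']
```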

Regionally Sharded Namespaces: Benefits

Low-latency, region-bound data in each region.
- Each shard serves its own region from a local active Namespace during normal operation, and the model extends to as many regions as you need.
Per-shard control.
- Each shard fails over independently and uses the Worker pattern that fits it.

Regionally Sharded Namespaces: Tradeoffs

By far the most time-consuming architecture to operate, with the highest cost and the highest risk of misconfiguration: a separate active Namespace, Worker fleet, and failover story per region.
Single-region shards don't get high availability out of the box. Unlike the Active/Passive and Active/Active patterns — where Temporal Cloud fails the Namespace over for you — failing a Single-region shard over to another region is entirely your responsibility. Use Active/Cold or Active/Hot for any shard that needs automatic failover.

Regionally Sharded Namespaces: Component behavior

Workers — one fleet per shard, each active in its shard's region using that shard's chosen pattern.
Workflow starters and Clients — run with each shard's Workers.
Codec Servers and proxies — run in every region that hosts an active shard.
Databases and queues — region-bound per shard; each fails over with its shard.

The rest of the architecture

The Worker deployment pattern sets the approach; the supporting pieces follow it.

Workflow starters and Clients. Deploy these with the same regional pattern as the Workers, since a starter or Client often shares the same in-region dependencies (databases, queues, upstream services) and should fail over alongside them. Point Clients at the Namespace Endpoint so they follow the active region automatically with no configuration change on failover, and use a Regional Endpoint only when a Client must be pinned to a region.
Codec Servers and proxies. Anything in the connection path between Workers and Temporal Cloud must be reachable from every region where Workers connect. In Active/Passive (Cold), scale them up in the secondary region as part of a failover; in the Active/Passive (Hot) and Active/Active patterns, run them in both regions at all times.
Databases and queues. These remain the application's responsibility, and the right approach depends on the Worker deployment pattern: a single-region-active datastore pairs naturally with Active/Passive, while running Workers active in both regions raises consistency questions that must be designed for. Detailed guidance is out of scope for this page.
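The reachability rule for Codec Servers and proxies can be checked mechanically: compare the regions where Workers connect against the regions where each supporting service runs. A minimal sketch with hypothetical service and region names:

```python
def missing_support_services(worker_regions: set[str],
                             deployed: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each service to the Worker regions where it is not running."""
    gaps = {}
    for service, regions in deployed.items():
        missing = worker_regions - regions
        if missing:
            gaps[service] = sorted(missing)
    return gaps


# Active/Passive (Hot) or Active/Active: Workers run in both regions at all times.
deployed = {"codec-server": {"us-east-1", "us-west-2"}, "proxy": {"us-east-1"}}
print(missing_support_services({"us-east-1", "us-west-2"}, deployed))  # {'proxy': ['us-west-2']}
```

For Active/Passive (Cold), pass only the region where Workers currently run, and run the check again as part of the failover once Workers are up in the secondary region.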

Serverless Workers failover

In every pattern above, the Worker fleet is something you run, so failing it over — a cold start, a standby fleet, or a second active region — is the application's responsibility. Serverless Workers move that responsibility to Temporal Cloud.

Instead of long-lived Workers that poll a Task Queue, Serverless Workers invert the model: Temporal Cloud pushes Task invocations to a customer-owned compute function (AWS Lambda today). Because Temporal Cloud is the component that starts the Workers, it can also start them in the secondary region after a failover, with no action from you.

One Worker Deployment spans both regions. You register a compute function per region under a single Build ID, so the deployment is ready to run in either region.
Failover is automatic. When the Namespace fails over, Temporal Cloud invokes the function in the new active region — there's no fleet to detect the outage and bring up.
The whole system fails over hands-off. Both the Namespace and the Workers move automatically, lowering overall recovery time by removing the manual Worker-failover step that the patterns above require.
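Conceptually, the Worker Deployment described above maps one Build ID to one compute function per region, and the function that runs is the one registered for the active region. The sketch models only that lookup. The Build ID, the AWS account number in the ARNs, and the `ServerlessDeployment` type are hypothetical and do not reflect Temporal Cloud's actual configuration format.

```python
from dataclasses import dataclass


@dataclass
class ServerlessDeployment:
    build_id: str
    functions: dict[str, str]  # region -> compute function identifier

    def function_for(self, active_region: str) -> str:
        # Models Temporal Cloud choosing the function registered for the active region.
        try:
            return self.functions[active_region]
        except KeyError:
            raise LookupError(f"no function registered for {active_region}") from None


dep = ServerlessDeployment("build-42", {
    "us-east-1": "arn:aws:lambda:us-east-1:123456789012:function:worker",
    "us-west-2": "arn:aws:lambda:us-west-2:123456789012:function:worker",
})
# After the Namespace fails over to us-west-2:
print(dep.function_for("us-west-2"))  # arn:aws:lambda:us-west-2:123456789012:function:worker
```

A region missing from the mapping is the serverless equivalent of a missing standby fleet, so every region the Namespace can fail over to needs an entry.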

On failover, Temporal Cloud promotes the secondary replica to active and invokes the Worker function there — no fleet to bring up and nothing for you to do. The Worker failover is hands-off.

Related

To add a replica and turn on High Availability features, see Enable and manage High Availability.

To choose between the Namespace Endpoint and Regional Endpoints and to set up private connectivity, see Connectivity for High Availability.

To stop forwarding Worker polls to the active region for the Active/Passive (Hot) pattern, see Change the forwarding behavior.

To trigger and manage failovers, see Failovers.

To understand the recovery objectives each pattern is measured against, see RPO and RTO.

Frequently asked questions

What is the difference between Active/Passive and Active/Active?

In Active/Passive (also written Active-Passive), Workflows process in one region at a time and the other region stands by for failover. In Active/Active (also written Active-Active), Workers run in both regions and process Workflows in both at once. See Worker deployment patterns for the full comparison.

How do I fail over Workers to another region?

A Namespace with High Availability fails over automatically, but bringing up or activating Workers in the secondary region is your responsibility — unless you use Serverless Workers, which Temporal Cloud starts for you. The exact steps depend on your pattern; see Active/Passive (Cold), Active/Passive (Hot), and Active/Active.

Which pattern has the lowest recovery time (RTO)?

Active/Passive (Hot) achieves the lowest recovery time, because a standby Worker fleet already runs in the secondary region and begins processing the moment it becomes active — no cold start. See Active/Passive (Hot) and RPO and RTO.

Do I have to run Workers in both regions for high availability?

No. Active/Passive (Cold) runs Workers in one region at a time and is the simplest starting point for disaster recovery. Running Workers in both regions — Active/Passive (Hot) or Active/Active — lowers recovery time at higher cost.
