
What happens when a Docker Swarm manager node dies

When a Docker Swarm manager node dies, the impact on the Swarm depends chiefly on how many manager nodes the Swarm has, i.e., whether you set up high availability with multiple managers. Here’s what happens and how Docker Swarm handles it:

High Availability in Docker Swarm

To ensure high availability, it is recommended to have an odd number of manager nodes (typically 3 or 5) in your Docker Swarm. This setup allows the Swarm to tolerate failures and continue operating correctly.
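
As a rule of thumb, a Swarm with N managers maintains a Raft quorum of a majority of managers (N/2 + 1, rounding down) and can therefore tolerate the loss of at most (N-1)/2 managers:

  Managers   Quorum   Failures tolerated
  1          1        0
  3          2        1
  5          3        2
  7          4        3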

Manager Node Responsibilities

Manager nodes in Docker Swarm are responsible for:

  1. Orchestrating and Scheduling Tasks: Deciding where and when to run tasks (containers) across the Swarm.
  2. Maintaining Cluster State: Managing the state of the Swarm, including services, tasks, and nodes.
  3. Service Discovery and Networking: Managing the internal DNS and overlay networks for service discovery.
  4. Health Monitoring: Monitoring the health of nodes and services.
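
To see how a given node participates in this management layer, you can query its Raft status from the node itself. A minimal check, run on a manager node (the JSON output shown is illustrative):

docker node inspect self --format '{{ json .ManagerStatus }}'
# => {"Leader":true,"Reachability":"reachable","Addr":"10.0.0.1:2377"}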

Single Manager Node Failure

If you have a single manager node and it dies, the entire Swarm is affected, as there’s no fallback manager to take over its responsibilities. This can cause:

    • Loss of Orchestration: No new tasks can be scheduled, and services cannot be created, updated, or scaled.
    • Loss of Cluster State: The Raft data lives only on that manager, so if its disk is lost, the Swarm’s services, configs, and secrets are lost with it.
    • Unmanaged Workloads: Containers already running on worker nodes keep running, but failed containers are no longer rescheduled.
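
For example, with the sole manager down there is no node left that can serve cluster-state commands, while containers on workers keep running. An illustrative session on a worker node:

# Orchestration commands are refused on workers:
docker service ls
# Error response from daemon: This node is not a swarm manager. ...
# Existing containers, however, are still up:
docker ps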

Multiple Manager Nodes (High Availability)

If you have set up multiple manager nodes, Docker Swarm handles the failure of a single manager node gracefully, as long as a majority (a quorum) of the managers remains reachable.

Scenario: Multiple Manager Nodes

  1. Initial Setup: Suppose you have a Swarm with 3 manager nodes (Manager A, Manager B, Manager C) and several worker nodes.

  2. Manager Node Failure: If Manager A dies:

    • Leader Election: Docker Swarm uses the Raft consensus algorithm to keep the cluster state consistent; when the current leader dies, the remaining managers (Manager B and Manager C) automatically elect a new leader between themselves.
    • Task Continuity: The new leader continues managing and scheduling tasks, maintaining the cluster’s operational state.
    • Fault Tolerance: The Swarm remains functional and tolerates the failure of one manager node (out of three) without service disruption, as the example output below illustrates.
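
Here is what docker node ls, run on Manager B after the failover, might report (node IDs and hostnames are made up for illustration):

docker node ls
ID             HOSTNAME    STATUS   AVAILABILITY   MANAGER STATUS
aaaaaaaaaaaa   manager-a   Down     Active         Unreachable
bbbbbbbbbbbb * manager-b   Ready    Active         Leader
cccccccccccc   manager-c   Ready    Active         Reachable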

Adding and Removing Manager Nodes

Adding a New Manager Node

To restore redundancy after a manager node failure, you can add a new manager node to the Swarm:

# On the new node: join with the *manager* token (this makes the node a manager immediately)
docker swarm join --token <MANAGER-TOKEN> <MANAGER-IP>:2377
# Only needed if the node joined with a worker token instead:
docker node promote <NEW-NODE-ID>
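
If you do not have the manager join token at hand, any healthy manager can print it, along with the full join command:

docker swarm join-token manager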

Removing a Failed Manager Node

You can then remove the failed node’s entry from the Swarm. A manager must be demoted before it can be removed; once the dead node is listed as Down, a plain docker node rm succeeds (docker node rm --force is a last resort):

docker node demote <FAILED-NODE-ID>
docker node rm <FAILED-NODE-ID>

Practical Example

Here’s a step-by-step example illustrating the process:

  1. Create Swarm and Add Managers:

    # On node A: initialize the Swarm (A becomes the first manager)
    docker swarm init --advertise-addr <MANAGER-A-IP>
    # On nodes B and C: join as workers ("docker swarm join-token worker" on A prints this token)
    docker swarm join --token <WORKER-TOKEN> <MANAGER-A-IP>:2377
    # Back on node A: promote B and C so the Swarm has three managers
    docker node promote <NODE-B-ID>
    docker node promote <NODE-C-ID>
    
  2. Deploy a Service:

    # Deploy a service with 3 replicas (my-service-image is a placeholder image name)
    docker service create --name my-service --replicas 3 my-service-image
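
    To confirm where the replicas landed, you can additionally list the service’s tasks (an optional check, not required by the flow):

    docker service ps my-service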
    
  3. Manager Node Failure:

    If Manager A fails, the Swarm cluster will elect a new leader (Manager B or Manager C).

  4. Monitor and Manage Nodes:

    docker node ls
    

    Use this command to see the status of all nodes and identify the failed manager node.
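
    For a more script-friendly view, the same data can be formatted, for example:

    docker node ls --format '{{.Hostname}}: {{.Status}} {{.ManagerStatus}}'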

  5. Recover and Add a New Manager:

    # On the replacement node: join via a healthy manager (B is the new leader here), using the manager token
    docker swarm join --token <MANAGER-TOKEN> <MANAGER-B-IP>:2377
    # Only needed if the node joined with a worker token instead:
    docker node promote <NEW-NODE-ID>
    

    Replace <NEW-NODE-ID> with the ID of the new node you are promoting to a manager.
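
    Finally, once the replacement manager is in place, remove the dead node’s entry (as described earlier) and verify that all managers are listed as Leader or Reachable again:

    docker node demote <FAILED-NODE-ID>
    docker node rm <FAILED-NODE-ID>
    docker node ls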

Published on: Jun 13, 2024, 10:58 AM