Make a readiness probe to fail when there is a newer version of the app being rolled out



  • How to make a readiness probe fail when there is a newer version of the app available

    Stack: Node.js, GKE

    Context

    Clients are connected to pods via websockets. During a rolling update, clients that disconnect from the old pods can connect to either old pods or new pods. I want to minimize client disconnections and force them to connect only to the newest version of the pods. Making sure that ONLY one disconnection/connection takes place is crucial, I want to maximize the uptime of client connections, I cannot afford clients to connect/disconnect multiple times.

    For this purpose I want to use the readinessProbe. I need to make it fail for old pods when a rolling update is taking place.

    Recreate strategy cannot be used since the re-connection of all clients at once creates a big load on the system.

    Ideas

    1. ConfigMap that holds the latest version of the app.

    1. The current version of the app is passed as an env var and it is stored in memory.
    2. The newest version of the app lives in a ConfigMap.
    3. Health endpoint returns the oldVersion and newVersion.
    4. Readiness probe GETs the /health and compares the two versions (exec/curl), if not equal exit 1 and the probe fails.

    Drawbacks:

    • Configmaps can take some minutes to sync which is a lot in my case. Is there a way to accelerate it?

    the total delay from the moment when the ConfigMap is updated to the moment when new keys are projected to the Pod can be as long as the kubelet sync period + cache propagation delay

    https://kubernetes.io/docs/concepts/configuration/configmap/

    What happens during a rollback?

    • Logic to update the config map again and have the probes failing. Helm will not do the trick (helm upgrade --install)

    The purpose of this question is to find a way to do it in a timely manner. Do you have other ideas?



  • If your pods start fast, perhaps use https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#recreate-deployment , that will make sure to terminate "old" pods and "recreate" new ones.

    All existing Pods are killed before new ones are created when .spec.strategy.type==Recreate.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2