Azure Kubernetes pod is inaccessible from another pod



  • Internally we have a Spark Executor and Spark Driver running in Kubernetes.

    When you run a Spark job in Kubernetes, Spark creates two types of pods: Spark Drivers and Spark Executors. The Spark Executors communicate with the Spark Driver over TCP, using the internal DNS entries of each other's pods.

    Suddenly today, the Spark Executor is unable to communicate with the Spark Driver. We get a DNS Unknown Host error, meaning it cannot resolve pod-name.namespace.

    During troubleshooting, we noticed that no pod in the cluster can reach this pod via pod-name.namespace.

    We also created a Kubernetes Service for the pod. We can successfully communicate with the pod using the Service's DNS entry.

    What is the recourse for this sort of DNS lookup issue in Azure? We are opening a ticket with Microsoft for starters.
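
For anyone trying to reproduce the symptom, here is a minimal sketch of the check described above, run from inside another pod: resolve the pod's DNS name and the Service's DNS name and compare the results. The names `pod-name.namespace` and `service-name.namespace` are placeholders, and the helper functions are hypothetical, not part of Spark or Kubernetes.

```python
import socket


def resolve(name, lookup=socket.getaddrinfo):
    """Return the sorted IP addresses for name, or [] if resolution fails."""
    try:
        infos = lookup(name, None)
    except socket.gaierror:
        # This is the Python counterpart of an "unknown host" error.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def compare(pod_dns, service_dns, lookup=socket.getaddrinfo):
    """Resolve both names so the pod entry and the Service entry can be compared."""
    return {name: resolve(name, lookup) for name in (pod_dns, service_dns)}


if __name__ == "__main__":
    # Placeholder names; substitute the real pod and Service DNS entries.
    for name, addrs in compare("pod-name.namespace", "service-name.namespace").items():
        print(name, "->", addrs if addrs else "UNRESOLVED")
```

An empty list for the pod name alongside a populated list for the Service name would match the behavior in this post. The `lookup` parameter exists only so the resolver can be swapped out for testing.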



  • Restarting CoreDNS fixed the issue. Apparently it can get into a stale state in which these lookups fail, and a restart puts it back into a working state. So if you see this behavior, restart CoreDNS. This is quite rare, so it should be OK.
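
A rough sketch of one way to script that restart, assuming CoreDNS runs as a Deployment named `coredns` in the `kube-system` namespace (the usual layout, but verify it on your cluster) and that `kubectl` is installed and pointed at the right cluster. The function names are hypothetical; the post above does not say how the restart was actually performed.

```python
import subprocess


def coredns_restart_command(namespace="kube-system", deployment="coredns"):
    """Build the kubectl command for a rolling restart of the Deployment.

    The namespace and deployment defaults are assumptions; adjust as needed.
    """
    return ["kubectl", "-n", namespace, "rollout", "restart", f"deployment/{deployment}"]


def restart_coredns(dry_run=True):
    """Return the command line; execute it only when dry_run is False."""
    cmd = coredns_restart_command()
    if not dry_run:
        # check=True raises CalledProcessError if kubectl exits non-zero.
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    # Dry run by default: print the command instead of touching the cluster.
    print(restart_coredns(dry_run=True))
```

A rolling restart replaces the CoreDNS pods one at a time, so cluster DNS should stay available while the stale state is cleared.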



