Newly installed k3s cluster on a fresh OS install cannot resolve external domains or connect to external resources?



  • I'm following the Rancher DNS troubleshooting guide at https://rancher.com/docs/rancher/v2.5/en/troubleshooting/dns/. After step 2, "Add KUBECONFIG for user", if I run this command,

    kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup www.google.com
    

    I get this error,

    nslookup: can't resolve 'www.google.com'
    pod "busybox" deleted
    pod default/busybox terminated (Error)
    

    However, I'm running k3s. It's a single-node cluster, and on the same machine where k3s is installed I can run nslookup www.google.com and everything works. The tutorial doesn't say where to go from there. What could cause external DNS resolution to fail inside k3s but not outside of k3s?

    My CoreDNS logs show,

    [ERROR] plugin/errors: 2 google.com. AAAA: read udp 10.42.0.6:40115->1.1.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 google.com. A: read udp 10.42.0.6:54589->1.1.1.1:53: i/o timeout
    

    And when I run curl against an external server, I get the following (curl exit code 6 means it could not resolve the host),

    command terminated with exit code 6

    While DNS was the first symptom for me, it turns out that I also can't ping or curl/wget external websites by IP address. For these reasons I think the problem goes beyond DNS and perhaps involves iptables.

    I uploaded my iptables dump: https://github.com/k3s-io/k3s/files/8921761/k3siptables.log



  • My problem was that I had multiple default routes with the same metric. From ip route,

    default via 172.16.42.1 dev ens5 proto dhcp src 172.16.42.135 metric 100 
    default via 172.16.42.1 dev ens3 proto dhcp src 172.16.42.95 metric 100 
    default via 10.2.64.1 dev ens4 proto dhcp src 10.2.67.51 metric 100 
    10.2.64.0/19 dev ens4 proto kernel scope link src 10.2.67.51 
    169.254.169.254 via 172.16.42.2 dev ens5 proto dhcp src 172.16.42.135 metric 100 
    169.254.169.254 via 172.16.42.2 dev ens3 proto dhcp src 172.16.42.95 metric 100 
    169.254.169.254 via 10.2.64.11 dev ens4 proto dhcp src 10.2.67.51 metric 100 
    172.16.42.0/24 dev ens5 proto kernel scope link src 172.16.42.135 
    172.16.42.0/24 dev ens3 proto kernel scope link src 172.16.42.95
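
    A quick way to spot this condition is to count default routes per metric. The following is a minimal sketch, not part of the original setup: the function name default_routes_by_metric is hypothetical, and it assumes ip route style text on stdin so it can also be run against saved output.

    ```shell
    # Count default routes per metric. More than one route at the same
    # metric means the routing table has competing default gateways.
    # Reads `ip route` style lines on stdin; prints "<metric> <count>" pairs.
    default_routes_by_metric() {
      awk '
        $1 == "default" {
          metric = 0                    # a route listed without a metric has metric 0
          for (i = 2; i < NF; i++)
            if ($i == "metric") metric = $(i + 1)
          count[metric]++
        }
        END {
          for (m in count) print m, count[m]
        }
      ' | sort -n
    }

    # Typical use on the host:
    #   ip -4 route show | default_routes_by_metric
    ```

    For the routing table above this prints 100 3, that is, three default routes all at metric 100.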
    

    The cause was that I had not set no_gateway = true in my Terraform subnet stanza,

    resource "openstack_networking_subnet_v2" "subnet_project" {
      name       = "subnet_project"
      network_id = openstack_networking_network_v2.net_project.id
      cidr       = "172.16.42.0/24"
      ip_version = 4
    }
    

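    Without no_gateway = true, this subnet gets a gateway, and DHCP hands out an extra default route via 172.16.42.1 on each interface attached to it. I can work around this on the host by giving the default route via ens4 a lower metric,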

    sudo ip route replace default via 10.2.64.1 dev ens4 metric 90
    

    This adds a new route to the table,

    default via 10.2.64.1 dev ens4 metric 90
    

    Now running nslookup google.com as in the question works fine. I can re-break it with

    sudo ip route del default via 10.2.64.1 dev ens4 metric 90
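
    To confirm which interface the host's routing table selects before and after these changes, it may help to ask the kernel directly with ip route get. This is a rough sketch under assumptions: route_dev is a hypothetical helper name, and 1.1.1.1 is used only because it is the upstream resolver shown in the CoreDNS logs in the question.

    ```shell
    # Print the outgoing interface from `ip route get` style output on stdin.
    # It looks for the first "dev <name>" pair and prints <name>.
    route_dev() {
      awk '{
        for (i = 1; i < NF; i++)
          if ($i == "dev") { print $(i + 1); exit }
      }'
    }

    # Typical use on the host:
    #   ip route get 1.1.1.1 | route_dev
    ```

    If the printed interface changes between the broken and the working state, that points at route selection rather than at k3s itself.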
    

    Other diagnostics

    These were taken before bringing up k3s. My ip -o addr shows,

    1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
    1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
    2: ens3    inet 172.16.42.95/24 brd 172.16.42.255 scope global dynamic ens3\       valid_lft 40534sec preferred_lft 40534sec
    2: ens3    inet6 fe80::f816:3eff:fecd:6722/64 scope link \       valid_lft forever preferred_lft forever
    3: ens4    inet 10.2.67.51/19 brd 10.2.95.255 scope global dynamic ens4\       valid_lft 24333sec preferred_lft 24333sec
    3: ens4    inet6 2620:0:28a4:4140:f816:3eff:fed2:72c7/64 scope global dynamic mngtmpaddr noprefixroute \       valid_lft 2591979sec preferred_lft 604779sec
    3: ens4    inet6 fe80::f816:3eff:fed2:72c7/64 scope link \       valid_lft forever preferred_lft forever
    4: ens5    inet 172.16.42.135/24 brd 172.16.42.255 scope global dynamic ens5\       valid_lft 40534sec preferred_lft 40534sec
    4: ens5    inet6 fe80::f816:3eff:fe3d:cde8/64 scope link \       valid_lft forever preferred_lft forever
    

    And ip link shows

    1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: ens3:  mtu 1492 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
        link/ether fa:16:3e:cd:67:22 brd ff:ff:ff:ff:ff:ff
    3: ens4:  mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
        link/ether fa:16:3e:d2:72:c7 brd ff:ff:ff:ff:ff:ff
    4: ens5:  mtu 1492 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
        link/ether fa:16:3e:3d:cd:e8 brd ff:ff:ff:ff:ff:ff
    
