I have been facing several issues with one EKS cluster:
- Pods are stuck at `FailedMount`, even though the PV and PVC are both in `Bound` state, the PVC shows as bound to the specific pods, and the EFS file system on AWS is also healthy.
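Before digging into the events, these are the basic checks worth running when a pod sits in `FailedMount` despite `Bound` claims (a sketch; the pod name `mypod1` and namespace `test` are taken from the events below, adjust to your own):

```shell
# Confirm the PV and PVC really are Bound and matched to each other
kubectl get pv
kubectl get pvc -n test

# The pod's events usually name the exact volume that failed to mount
kubectl describe pod mypod1 -n test

# Verify the volume in the pod spec references the right PVC claimName
kubectl get pod mypod1 -n test -o jsonpath='{.spec.volumes}'
```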
## Relevant error messages
```
Warning FailedMount 1m (x3 over 6m) kubelet, ..... Unable to mount volumes for pod "mypod1_test(c038c571-00ca-11e8-b696-0ee5f3530ee0)": timeout expired waiting for volumes to attach/mount for pod "test"/"mypod1". list of unattached/unmounted volumes=[aws]
Normal Scheduled 5m45s default-scheduler Successfully assigned /test3 to ....
Warning FailedMount 83s (x2 over 3m42s) kubelet, ... Unable to mount volumes for pod "test3_....(fe03d56e-aa26-11ea-9d9c-067c9e734f0a)": timeout expired waiting for volumes to attach or mount for pod ""/"test3". list of unmounted volumes=[mypd]. list of unattached volumes=[mypd default-token-mznhh]
```
- Related: the events show the EFS CSI driver is not available on the node.
```
Warning FailedMount 9m1s (x4 over 15m) kubelet, ....Unable to mount volumes for pod "test3_...(f0d65986-aa27-11ea-a7b2-022dd8ed078a)": timeout expired waiting for volumes to attach or mount for pod ""/"test3". list of unmounted volumes=[mypd]. list of unattached volumes=[mypd default-token-mznhh]
Warning FailedMount 2m35s (x6 over 2m51s) kubelet, ... MountVolume.MountDevice failed for volume "scratch-pv" : driver name efs.csi.aws.com not found in the list of registered CSI drivers
Warning FailedMount 2m19s kubelet, .... MountVolume.SetUp failed for volume ".." : rpc error: code = Internal desc = Could not mount "fs-43b99802:/" at "/var/lib/kubelet/pods/f0d65986-aa27-11ea-a7b2-022dd8ed078a/volumes/kubernetes.io~csi/scratch-pv/mount": mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs fs-43b99802:/ /var/lib/kubelet/pods/f0d65986-aa27-11ea-a7b2-022dd8ed078a/volumes/kubernetes.io~csi/scratch-pv/mount
Output: Failed to resolve "fs-43b99802....amazonaws.com" - check that your file system ID is correct.
See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail.
```
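The two errors above point at two separate things: the `driver name efs.csi.aws.com not found` message means the EFS CSI node plugin is not registered on that worker node, and the `Failed to resolve` output is a DNS problem inside the VPC. A sketch of how I would check both (the `app=efs-csi-node` label and the `<region>` placeholder are assumptions, adjust for your install):

```shell
# Is efs.csi.aws.com registered at all, and is the node daemonset
# pod actually running on the affected worker node?
kubectl get csidriver
kubectl get pods -n kube-system -l app=efs-csi-node -o wide

# From the worker node (or a debug pod on it), check that the EFS
# DNS name resolves inside the VPC. <region> is a placeholder.
nslookup fs-43b99802.efs.<region>.amazonaws.com

# Resolution requires enableDnsSupport/enableDnsHostnames on the VPC
# and a mount target in the node's availability zone:
aws efs describe-mount-targets --file-system-id fs-43b99802
```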
- Pods are stuck in `Terminating` state: even though the Service, Deployment, and ReplicaSet have all been deleted, the pods remain stuck at termination.
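For pods wedged in `Terminating`, a force delete usually clears them from the API server (a sketch; `test3`/`test` are the pod and namespace from the events above, and note the kubelet may still hold the volume mount on the node):

```shell
# Skip the graceful termination period and remove the pod object
kubectl delete pod test3 -n test --grace-period=0 --force

# If it still lingers, a stuck finalizer is often the culprit
kubectl patch pod test3 -n test -p '{"metadata":{"finalizers":null}}'
```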
The final workaround that sorted all of this out was to reboot the EKS worker node.
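If you have to reboot the node, it is safer to drain it first so workloads reschedule cleanly (`<node-name>` and `<instance-id>` are placeholders for your own node and its EC2 instance):

```shell
# Evict workloads before rebooting
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Reboot the backing EC2 instance
aws ec2 reboot-instances --instance-ids <instance-id>

# Once the node reports Ready again, allow scheduling
kubectl uncordon <node-name>
```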