# k3s

- Code: kubernetes/k3s

`k3s` installs k3s on VMs/bare-metal hosts. When a plugin like `hcloud_vms` is used, a file called `inventory.yml` will be available for `k3s` to determine the connection data for the machines to provision Kubernetes to. Alternatively, a custom inventory can be provided.
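If you bring your own inventory, it can be passed to the playbook explicitly; a minimal sketch, assuming the default playbook name from the configuration below and a hypothetical inventory path:

```bash
# run the k3s playbook against a custom inventory
# (./my-inventory.yml is a hypothetical example path)
ansible-playbook -i ./my-inventory.yml k8s_k3s.yml
```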
## Directories used

- `/usr/local/bin/` (k3s binary)
- `/etc/systemd/system` (k3s systemd units)
- `/var/lib/kubelet` (kubelet data)
- `/var/log/` (logs)
- `/var/lib/rancher/k3s` (k3s data, containers, images)
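To see how much space k3s takes up on a node, the data directories from the list above can be inspected directly:

```bash
# disk usage of the main k3s data directories
sudo du -sh /var/lib/rancher/k3s /var/lib/kubelet
```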
## Configuration

```yaml
# defaults
commands:
  install:
    script:
      - ansible-playbook k8s_k3s.yml
  uninstall:
    script:
      - ansible-playbook k8s_k3s.yml
version: v1.21.1+k3s1
config:
  agent:
    args: ""
  api:
    endpoint: ""
    token: ""
  server:
    args:
      --kube-scheduler-arg 'bind-address=0.0.0.0' --kube-scheduler-arg 'address=0.0.0.0'
      --kube-proxy-arg 'metrics-bind-address=0.0.0.0' --kube-controller-manager-arg
      'bind-address=0.0.0.0' --kube-controller-manager-arg 'address=0.0.0.0' --kube-controller-manager-arg
      'allocate-node-cidrs' --etcd-expose-metrics --disable traefik,local-storage
      --disable-cloud-controller --kubelet-arg=image-gc-high-threshold=85 --kubelet-arg=image-gc-low-threshold=80
      --kubelet-arg container-log-max-files=4 --kubelet-arg container-log-max-size=50Mi
  systemd_dir: /etc/systemd/system
```
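After the install command has run, a quick sanity check on a node could look like this:

```bash
# confirm the installed binary and version
k3s --version

# confirm the systemd unit is active (the server unit is called k3s;
# agent nodes run k3s-agent instead)
systemctl status k3s
```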
**Note:** To use Cilium instead of Flannel, set `enabled` to `true` in Cilium's config in the Stackfile.
## Troubleshooting

### etcd

For commands to inspect k3s' embedded etcd, see https://gist.github.com/superseb/0c06164eef5a097c66e810fe91a9d408.
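As a starting point, a health check against the embedded etcd might look like this sketch; it assumes `etcdctl` is available on the node and uses the default k3s certificate locations:

```bash
# query the health of k3s' embedded etcd
# (cert paths assume a default k3s server installation)
ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert /var/lib/rancher/k3s/server/tls/etcd/client.crt \
  --key /var/lib/rancher/k3s/server/tls/etcd/client.key \
  endpoint health
```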
### No space left in tmpfs `/run`

#### Indicators

- Pod-level events saying `No Space left on device: unknown`, even though the host still has enough space left on `/`
#### Remedies

- Log in to the respective node via SSH (to know which node to log in to, check on which node the Pod is currently running): `cloudstack ssh $NODENAME` (where `$NODENAME` is the name of the node as defined in `inventory.yml`)
- Check available/used disk storage: `df -h | grep "/run"`
- Check available/used inodes: `df -i | grep "/run"`
`/run` is a virtual/temporary filesystem (tmpfs) created from memory where k3s stores a Pod's lifecycle artifacts like the rootfs and logs. If `/run` is at 100% usage, you need to analyze which Pod might be causing the filesystem to fill up.
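To confirm the mount and its current size, the tmpfs can be inspected directly:

```bash
# show the tmpfs mount backing /run, including its size and options
findmnt /run
```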
No space left¶
df -h | grep "/run"
# gives the following output
tmpfs 797M 797M 0M 100% /run
For quick remedy, you can re-mount /run
with more available space at runtime by executing the following command: sudo mount -o remount,size=25%,noatime /run
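Afterwards, `df` should report free space on the mount again:

```bash
# verify the new size of /run
df -h /run
```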
**Note:** Starting with Cloudstack v1.31.0, `/run` will be configured with 25% of the available memory (as opposed to the k3s defaults, which are around 10%).
For a deeper analysis down to Pod level, run the following commands:

```
cd /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io
du -hsxc * | sort -h
# gives the following output
44K   de8335deb8f9e52cc0a7962266488abd77b7b008820ad8e2ad9dbc6c78c73206
116K  69f1d90483bba5c5df91a85dc98853f4132017d65f036e50e9d5e528e5936b47
136K  30eda047e4de4d61fd84251adbed99ac7d59c32a5df5b76ad7dd85f955dd06f2
141M  f98877f126118d7eb4e2e11fc5b3fa567ce1f025bc30b65285e273a5b31d3bdc
144M  f7b0f3e6441c152037e41dff3063da44b99d186c83c5d793b45ba0ee5b36cf0d
286M  total
```
In this example, `f7b0f3e6441c152037e41dff3063da44b99d186c83c5d793b45ba0ee5b36cf0d` is the biggest directory, so we `cd` into it and check what's in it:

```
cd f7b0f3e6441c152037e41dff3063da44b99d186c83c5d793b45ba0ee5b36cf0d
ls -lh
# gives the following output
total 427M
-rw-r--r-- 1 root root   4K Dec  9 16:00 address
-rw-r--r-- 1 root root  24K Dec  9 16:00 config.json
-rw-r--r-- 1 root root   4K Dec  9 16:00 init.pid
prwx------ 1 root root    0 Dec  9 16:00 log
-rw-r--r-- 1 root root 144M Apr 19 11:08 log.json
-rw------- 1 root root   4K Dec  9 16:00 options.json
drwxr-xr-x 1 root root 4.0K Dec  9 16:00 rootfs
-rw------- 1 root root    0 Dec  9 16:00 runtime
```
Here you can see that the Pod's log file uses 144M. Now we have two options:

- remove the log file without restarting the Pod:

  ```
  mv log.json /tmp/${PWD##*/}-log.json && touch log.json
  ```

- restart the Pod:
  1. get the Pod's ID/hostname: `cat config.json | grep -i HOSTNAME`
  2. delete that Pod in Lens/with kubectl (see the sketch below). The whole directory will be deleted and a new one will be created, instantly freeing up the used space.
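A minimal sketch of the kubectl variant, assuming the hostname found in `config.json` matches the Pod name (`$POD_HOSTNAME` and `$NAMESPACE` are placeholders):

```bash
# locate the Pod across all namespaces by its hostname/name
kubectl get pods --all-namespaces | grep "$POD_HOSTNAME"

# delete it; its controller (e.g. a Deployment) will recreate it
kubectl delete pod "$POD_HOSTNAME" -n "$NAMESPACE"
```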
**Note:** Starting with Cloudstack v1.31.0, k3s will be configured with strict log rotation policies (see the `container-log-max-files` and `container-log-max-size` kubelet args in the configuration above) to make sure old logs cannot fill up the disk anymore.
#### No free inodes

```
df -i | grep "/run"
# gives the following output
tmpfs  1019092  1019092  0  100% /run
```

For a deeper analysis down to Pod level, run the following commands:

```
cd /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io
ls -ltrah
# gives the following output
drwx--x--x 3 root root 2,8K Aug  1 15:01 3aceae04271e4cc183a6b2d2dab143c5288427b03c7187031272bf07ec5f5811
drwx--x--x 3 root root 5,3M Aug  1 17:03 6c7288eabd2d6e66beb4f66ebae67b768fe8c4806c832a86c18e1985834c84ae
drwx--x--x 3 root root 4,9M Aug  1 17:04 08fb834658d44ed4b6839fb9bc3ac04e322fb885e9a72ac80b4e42f5cc656aec
drwx--x--x 3 root root 5,3M Aug  1 17:04 be421e99114024c631530bcc9217bbea081e193a22108b2dfd0b2f4d4e082a1c
drwx--x--x 3 root root 4,1M Aug 24 14:20 6645d4b83b43f3b45de22a0ac5d7dd4c4ef0b25177b18c8dbe05cad138ab06c2
```
Now we can scan these directories for `.pid` files, as these are most of the time the reason for filled-up inodes:

```
cd 6645d4b83b43f3b45de22a0ac5d7dd4c4ef0b25177b18c8dbe05cad138ab06c2
ls -la | grep ".pid" | wc -l
# gives the following output
277206
```

This tells us that there are 277206 `.pid` files in this directory, which clearly contributes to the shortage of free inodes.
To deal with this, we clean up all files older than 30 days and then check if that's enough to mitigate the issue:

```
find . -name ".*.pid" -type f -mtime +30 -delete
ls -la | grep ".pid" | wc -l
# gives the following output
201223

##### OPTIONAL: delete all files older than 10 days
find . -name ".*.pid" -type f -mtime +10 -delete
ls -la | grep ".pid" | wc -l
# gives the following output
15432
```
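Finally, re-check the inode usage to confirm the cleanup had the desired effect:

```bash
# inode usage of /run should be well below 100% again
df -i /run
```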
### Issues with `resolv.conf`

#### Indicators

- `Failed to create pod sandbox: open /run/systemd/resolve/resolv.conf: no such file or directory`
#### Remedies

Usually this means that `systemd-resolved` has issues on that node. There are two fixes for that:

- automated: run `systemctl restart systemd-resolved`
- manual: create `/run/systemd/resolve/resolv.conf` with the following content (see the command sketch after this list):

  ```
  # This file is managed by man:systemd-resolved(8). Do not edit.
  #
  # This is a dynamic resolv.conf file for connecting local clients directly to
  # all known uplink DNS servers. This file lists all configured search domains.
  #
  # Third party programs must not access this file directly, but only through the
  # symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
  # replace this symlink by a static file or a different symlink.
  #
  # See man:systemd-resolved.service(8) for details about the supported modes of
  # operation for /etc/resolv.conf.

  nameserver 1.1.1.1
  nameserver 8.8.8.8
  nameserver 8.8.4.4
  ```

  ATTENTION: this could interfere with `systemd-resolved` trying to create this file after a node reboot and should be monitored.
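A sketch of the manual fix as shell commands, using the nameservers from the file content above:

```bash
# recreate the stub resolv.conf by hand
# (systemd-resolved normally manages this file; see the ATTENTION note above)
sudo mkdir -p /run/systemd/resolve
printf 'nameserver 1.1.1.1\nnameserver 8.8.8.8\nnameserver 8.8.4.4\n' \
  | sudo tee /run/systemd/resolve/resolv.conf >/dev/null
```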
After creating the file or restarting the service, it might take a while for Pods to properly detect the change and restart gracefully.
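If you don't want to wait, affected workloads can be restarted explicitly; a hypothetical example, assuming the failing Pods belong to a Deployment called `my-app`:

```bash
# trigger a graceful, rolling restart of the affected Deployment
kubectl rollout restart deployment/my-app -n my-namespace
```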