After a power outage, one of the servers in my homelab did not come back gracefully.
This server uses Docker Compose to run a couple of services I use locally.
Logging into the server, I noticed that bringing Docker Compose back up caused an error.
$ sudo docker-compose up -d
Error response from daemon: Conflict. The container name "/foo" is already in use by container "edjd98889d8dfddd9090998ddd09898ddd234234". You have to remove (or rename) that container to be able to reuse that name....
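The error itself points at the conventional fix: remove or rename the conflicting container. For reference, that would look something like the following, using the container name from the error above (foo-old is just an example name); in my case, though, I wanted to understand why the conflict appeared in the first place.
$ sudo docker rm foo
$ sudo docker rename foo foo-old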
The top posts from a Google search suggested checking whether anything else was already using the service's port, which was probably not the underlying cause here.
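For completeness, checking for a port conflict only takes a second; something like this works (8080 here is just a placeholder for whatever port your service exposes):
$ sudo ss -tlnp | grep ':8080'
$ sudo lsof -i :8080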
So next came a set of troubleshooting steps.
First up was to check the docker-compose.yaml file to see if anything was out of the ordinary.
$ vi docker-compose.yaml
Nothing unusual in the file, but when I closed it, I noticed a quick red flash of some kind of error in vi's command status bar.
I re-opened the file, then ran the :wq vi command to save and quit, and received an "unable to save...out of disk space" error.
Alright, let's check the disk usage.
$ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 392M 1.5M 390M 1% /run
/dev/mapper/server--vg-root 193G 193G 0 100% /
tmpfs 2.0G 4.0K 2.0G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sda1 470M 254M 192M 57% /boot
nas.local:/volume1/serverbackups/server 14T 3.0T 11T 22% /mnt/nfs/serverbackups/server
tmpfs 392M 4.0K 392M 1% /run/user/1000
/home/user/.Private 193G 193G 0 100% /home/user
Yep, 100% of the filesystem was used up, which was very unexpected.
After several attempts to delete files, the filesystem was still showing 100% in use.
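As an aside, one common reason deleting files doesn't free space is that a running process still holds the deleted files open; a command like this can reveal that (just a diagnostic sketch, and not what turned out to be the problem here):
$ sudo lsof +L1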
To narrow down where the space was going, I ran the following command in a series of directories:
$ sudo du -sch *
I concluded that the /var/lib/docker folder was using an unusual amount of disk space.
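To see which part of that folder was responsible, a quick breakdown helps (the exact subdirectories vary by Docker version and storage driver):
$ sudo du -sh /var/lib/docker/*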
So it made sense to clean up docker.
After years of use, I knew I had plenty of old images that could be removed.
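A plain listing shows what has piled up over time:
$ sudo docker images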
So I ran the following command.
$ sudo docker system prune
...
deleted: sha256:715a1b962166ede06c7a0e87d068a4b686e6066e0eca5ecab6f4d6cfab2121fe
deleted: sha256:97ab3baee34d0c75ee10e65c63a06cbc87d20d695c17d14ad565d4ff1b8dc2ca
deleted: sha256:9f54eef412758095c8079ac465d494a2872e02e90bf1fb5f12a1641c0d1bb78b
Total reclaimed space: 15.67GB
The output of the command showed that I reclaimed 15.67GB of space.
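Two related commands worth knowing: docker system df breaks down how much space images, containers, local volumes, and the build cache are using, and a more aggressive prune also removes unused (not just dangling) images and, optionally, unused volumes. Use the --volumes flag carefully, since it deletes volume data.
$ sudo docker system df
$ sudo docker system prune -a --volumes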
Awesome, time to start everything back up.
$ sudo docker-compose up -d
Creating network "user_default" with the default driver
Creating service1 ... done
Creating service2 ... done
Creating service3 ... done
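A quick status check confirms the containers are actually up (standard command, output omitted):
$ sudo docker-compose ps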
After everything restarted successfully, I could see that my disk usage was back to normal, down almost 100GB.
$ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 392M 1.5M 390M 1% /run
/dev/mapper/server--vg-root 193G 91G 92G 50% /
tmpfs 2.0G 4.0K 2.0G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sda1 470M 254M 192M 57% /boot
nas.local:/volume1/serverbackups/server 14T 3.0T 11T 22% /mnt/nfs/serverbackups/server
tmpfs 392M 4.0K 392M 1% /run/user/1000
/home/user/.Private 193G 91G 92G 50% /home/user
It's hard to say exactly what happened, but given that this occurred right after a power outage, it's likely that the servers in my homelab did not start back up in the correct order.
Specifically, I think my network attached storage (NAS) server was off when the broken server tried to mount a drive used by the docker services.
This likely caused some sort of runaway crash loop that filled up log files in all the docker volumes.
Then, after everything started successfully, the docker services probably cleaned up the log files or any large mounted files they were hanging onto.
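If runaway container logs were indeed the culprit, Docker Compose can cap log size per service so a crash loop can't fill the disk again. A sketch of what that looks like in docker-compose.yaml (the service name and limits are just examples):
services:
  service1:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"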
If you found value in this post, consider following me on X @davidpuplava for more valuable information about Game Dev, OrchardCore, C#/.NET and other topics.