100 days of homelab Activity Log
I’ll just log mundane things here and update this post. Cool stuff will get a dedicated post. Do I go top to bottom, or bottom to top? I guess we’ll find out.
day 002 20250306
- made this blog post (baby steps okay?)
- logged into one of my ingress boxes and looked at logs. 99% bot traffic. no surprise.
- one of my lanecloud hypervisor hosts has 550 days of uptime… that’s something I guess.
- started on an ansible role to add custom startup scripts to clammy-ng (rough shape sketched after this list)
- got distracted with some hugo theme tweaks that I couldn’t quite make work
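The startup-scripts role is still rough, but the shape of it is roughly this: copy the scripts in, then wire them to a oneshot systemd unit. A minimal sketch, where the role name, paths, and unit name are placeholders rather than what I actually have:

# roles/custom_startup/tasks/main.yml -- rough sketch, names are placeholders
- name: Copy custom startup scripts
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: /usr/local/sbin/
    mode: "0755"
  with_fileglob:
    - startup-scripts/*

- name: Install a oneshot systemd unit that runs them at boot
  ansible.builtin.template:
    src: custom-startup.service.j2
    dest: /etc/systemd/system/custom-startup.service
    mode: "0644"

- name: Enable the unit
  ansible.builtin.systemd:
    name: custom-startup.service
    enabled: true
    daemon_reload: true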
day 003 20250308
Ended up being kind of an Armbian Day
- submitted the removal of an RK3568 bugfix patch, since it got merged into the LTS kernels and was conflicting on build.
- Ran a bunch of glmark2 and sbc-bench runs on boards, comparing kernels. No real conclusions other than I think an RK3588 DMA patch someone added caused a slight performance regression.
day 004 20250309
My Synology backups that go to my eSATA drive have been dead for a month. Got the drive online… now it says it’s full. I really need to add back push notifications for that. More work to be done to get this resolved.
Made a new playbook for my Provision KVM guest ansible role. It just uses vars_prompt to make it easy to spin up a few VMs in a hurry. I like it.
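The playbook itself is barely anything: vars_prompt wrapped around the role. Roughly like this, where the role name, host group, and variable names are placeholders rather than my exact ones:

# provision-vm-prompt.yml -- interactive wrapper around the role (sketch)
- name: Spin up a KVM guest in a hurry
  hosts: hypervisors
  gather_facts: false
  vars_prompt:
    - name: vm_name
      prompt: "VM name"
      private: false
    - name: vm_memory_mb
      prompt: "Memory (MB)"
      default: "4096"
      private: false
    - name: vm_vcpus
      prompt: "vCPUs"
      default: "2"
      private: false
  roles:
    - role: provision_kvm_guest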
Wasted a lot of time trying to debug performance issues with my librespeed instance. Part of the problem is simply that VM performance on the Synology is poor, even when using the appropriate acceleration. Another part is that it seems to be a known thing that nginx/ingress-nginx introduces some performance loss when it proxies librespeed. I managed to do some tuning that made it better, but it’s still unable to hit wire speed on gigabit. I’m extra amused that librespeed is more performant on a VM running on my Rock 5B than it is running on my Synology DS1821+.
Here are some settings for the service’s ingress that sort of helped with nginx, but it needs more work TBH:
annotations:
  nginx.ingress.kubernetes.io/proxy-body-size: "21M"
  nginx.ingress.kubernetes.io/proxy-buffering: "on"
  nginx.ingress.kubernetes.io/proxy-buffers-number: "50"
day 005 20250310
Just kind of did my rounds and looked at some things.
- Verified Synology backups are happy now that I cleaned up my external drive
- Updated Netbox
- deleted retired LET deal VMs
- updated renewal date field on remaining LET deal VMs
- Verified db dumps and borgmatic backups are running on my Netbox server
- Looked at some healthchecks.io helm charts. Would be nice to deploy that in the future and integrate with borgmatic (the borgmatic side is sketched below)
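For future me: the borgmatic half of that integration should just be a ping URL in its config, so each backup run reports start/finish/failure to healthchecks. A sketch with a placeholder URL and UUID; the exact layout depends on the borgmatic version (older releases nest this under hooks:):

# /etc/borgmatic/config.yaml (excerpt) -- placeholder ping URL
# borgmatic pings this on start/finish/failure of each backup run
healthchecks:
  ping_url: https://healthchecks.example.net/ping/00000000-0000-0000-0000-000000000000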
day 006 20250311
Pretty beat today so just did some fiddling. I fired up some temporary Debian pods, one on my Synology monocluster VM and one on my Rock 5B k3s cluster, and ran quick yabs.sh benchmarks on them. The pod on the 6-core ARM VM was faster in both single- and multi-threaded tests than the one on the 4-core VM on my Synology DS1821+. Between this and the librespeed benchmarks, I think I’ve convinced myself the monocluster is moving off the Synology and onto a Rock 5B.
WTF is the monocluster
I have 2 k3s deployments at home. The first one is the “armlab” cluster. It’s 6 VMs running on 3 Rock 5B boards, and it’s meant to be torn down and recreated a lot. It has an HA control plane via kube-vip, plus MetalLB, Cilium, and some basic services deployed via Flux. I really wanted to figure out using BGP in the HA cluster before deploying to it, but I decided I should do that later and first just focus on getting my nomad cluster migrated.
The monocluster was my KISS approach: just focus on a “best practices” IaC setup for k8s with FluxCD, and on bootstrapping all the extra things like external-dns, external-secrets, etc. One detail is that I was also trying to design it as a solution I could run on a single-node VPS. It took me a while to solve getting nginx to listen on node ports 80/443, but I did it. I’ve been redeploying it like crazy while I add functionality, but it’s time to set up its final (temporary) home… and move stuff over to it.
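For the curious, the node-port approach boils down to two pieces: widening the apiserver’s service node port range in k3s, and pinning the ingress-nginx service’s nodePorts to 80/443. Roughly like this; the values are illustrative and not necessarily exactly what I run:

# /etc/rancher/k3s/config.yaml -- allow NodePort services down to port 80
kube-apiserver-arg:
  - "service-node-port-range=80-32767"

# ingress-nginx Helm values (excerpt)
controller:
  service:
    type: NodePort
    nodePorts:
      http: 80    # HTTP exposed directly on the node
      https: 443  # HTTPS exposed directly on the node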
day 007 20250313
I found one of my Rock 5B boards that I used as an Armbian build node a year or so ago with the Googalator kernel. It has a nice NVMe in it, which makes it a great fit for the home of my monocluster. So I decided to build an image and get it installed… Naturally I hit some unexpected curveballs:
- My other Synology cache drive alerted that it had 1% lifespan left… I had to replace it. Now I’m running my cache on a pair of $20 NVMes… RIP Crucial P3.
- I booted the edge kernel 6.14.0-rc4… it was weird: the NVMe was intermittent and eventually disappeared. I went back and forth a few times and then decided to build -current, aka LTS kernel 6.12.y.
- That worked better, but something with armbian-install or U-Boot on SPI flash isn’t booting from NVMe, and I don’t feel like debugging it.
- Just using the SD card for the bootloader for now. Seems to work.
day 008 20250314
I’m surprised by my burst of productivity given I’ve been on like 4 hours of sleep today.
- The NVMe boot problem seems to partially have something to do with the NVMe drive I was using: it would init fine on a reboot, but not on a cold boot. Storage controllers and SBCs can be picky. I pulled a 512GB SK hynix out of an external enclosure and now it’s happy.
- I re-installed the monocluster as bare-metal k3s on the above-mentioned Rock 5B.
- I also tested librespeed… looks good!
root@ronny-1:~# librespeed-cli --skip-cert-verify --local-json librespeed.json --concurrent 2 --server 1
Using local JSON server list: librespeed.json
Selected server: mtest [librespeed.mtest]
You're testing from: {"processedString":"172.17.20.115 - private IPv4 access","rawIspInfo":""}
Ping: 0.64 ms Jitter: 0.38 ms
Download rate: 925.08 Mbps
Upload rate: 953.28 Mbps
day 009 20250315
Time to start getting some actual workloads moved over… or in my case, test some workloads.
I focused on some basic nginx servers that are just directory indexes for data on an NFS mount. Was a great starting point.
I lost a lot of time trying to use app-template to do the things. It’s cool, but also extremely overkill and ended up being a bad fit for my use case.
I dumbed things down to a unified manifest file for each server + service. I took advantage of Flux’s configMapGenerator support so my configuration gets a content hash, which causes the pods to redeploy when I update it. Works well.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: web-static
resources:
  - ./namespace.yaml
  - ./nginx-linuxmirror.yaml
  - ./nginx-otherserver.yaml
configMapGenerator:
  - name: nginx-linuxmirror
    files:
      - default.conf=nginx-configs/default-linuxmirror.conf
  - name: nginx-default
    files:
      - default.conf=nginx-configs/default.conf
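For context, each of those nginx-*.yaml files is just a Deployment plus a Service in one manifest. A trimmed-down sketch; the image tag, NFS server, and mount paths are illustrative, not my exact values:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-linuxmirror
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-linuxmirror
  template:
    metadata:
      labels:
        app: nginx-linuxmirror
    spec:
      containers:
        - name: nginx
          image: nginx:1.27-alpine
          ports:
            - containerPort: 80
          volumeMounts:
            - name: conf
              mountPath: /etc/nginx/conf.d
            - name: data
              mountPath: /srv/mirror
              readOnly: true
      volumes:
        - name: conf
          configMap:
            name: nginx-linuxmirror   # hash suffix is handled by the generator
        - name: data
          nfs:
            server: 172.17.20.10      # illustrative NFS server
            path: /volume1/mirror
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-linuxmirror
spec:
  selector:
    app: nginx-linuxmirror
  ports:
    - port: 80
      targetPort: 80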
day 010 20250316
- moved the monocluster rock5b into my rack
- wired the fan on the heatsink to 3.3v for now
- ordered some Molex Picoblade 1.25mm connector crimps so that I can wire the fan to the actual fan header later
- naturally the switch port I used was configured wrong, and I had to fix the PVID before it would pull the properly assigned static DHCP address.
day 011 20250317
A rude awakening
Checked my email in the morning and saw an abuse report from Hetzner in my inbox. Turns out people were exploiting an old pastebin server I’d set up a long time ago for Armbian-related stuff. My Hetzner box has haproxy doing TCP reverse proxying over a WireGuard tunnel to the fabio load balancer for my nomad cluster, and I stupidly have a mixture of public and internal services running on it. Anyway, I terminated the service and ripped out the DNS records… nobody was using it, myself included. Not how I wanted to start my week, but a good reminder to spend some more energy on my security posture. Looks like a bit of extra bandwidth got burned on my Hetzner account as well. They changed policies from 20TB included a month to 1TB :( Boo… but let’s be real, 20TB of bandwidth a month in Ashburn for $5 was too good to be true.
Hopefully a better homelab day tomorrow.
day 012 20250319
My rackmate is working on a new core router. We’re upgrading hardware and going from VyOS 1.3 to 1.4, so I had to rewrite my NAT rules. I’m also trying to use a dummy interface for my NAT interface, since we sometimes have multiple WAN links; hopefully that will simplify things. Regardless, I put the translation interface in an interface group, which should make it easy to move things around in a pinch.
day 013 20250324
Mostly poked around k9s and Grafana today. Just wanted to keep up the rhythm at least. I upgraded k3s on my armcluster; it was a minor release or two behind. Ansible did all the work, of course.
day 014 20250406
Had some other stuff going on the past 2 weeks, but I haven’t given up.
I did my normal kernel build and update routine for my router. I also spent a little time trying to see if I could speed up the reconciliation time of my FluxCD deployment, but no solutions yet. I have a dependency chain that gets held up waiting on external-secrets to reconcile back into a ready state, and the annoying thing is that external-secrets drops out of Ready on any update I push.
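For reference, the chain is expressed with dependsOn on the Flux Kustomizations, so anything downstream waits for external-secrets to be Ready again. Roughly this shape, with illustrative names and paths:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps              # illustrative name
  namespace: flux-system
spec:
  interval: 10m
  retryInterval: 1m       # how quickly a blocked reconcile is retried
  path: ./apps
  prune: true
  wait: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: external-secrets   # blocks until that Kustomization is Ready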
day 015 20250412
Busy day. Changed from NVMe-cached LVM to ZFS on one of my lanecloud servers. It was quite a bit to unwind: had to move VMs, make checklists, move filesystems, back up stuff, etc. The box has 256 gigs of RAM, so ZFS cheats performance pretty well on top of the underlying 4 spindles. I made my 2 NVMe drives their own mirror pool, and configured the 4 rust drives as the ZFS RAID10 equivalent:
#!/usr/bin/env bash
# WWN IDs of the four spinning disks
SDA="wwn-0x5000c50091843eae"
SDB="wwn-0x5000c50091906b31"
SDC="wwn-0x5000c50092509d76"
SDD="wwn-0x5000c500926006cc"

# two mirror vdevs striped together = the RAID10 equivalent
zpool create rustpool \
  -o ashift=12 \
  -o autotrim=on \
  -O compression=zstd \
  -O dedup=off \
  -O xattr=sa \
  -O acltype=posixacl \
  -O atime=on \
  -O relatime=on \
  -O sync=standard \
  mirror ${SDA} ${SDB} mirror ${SDC} ${SDD}
day 016 20250413
Around 18 months ago I ran into a bug with the Ansible Netbox inventory plugin when querying my Netbox server, which runs on a subpath. I made a local fix but never upstreamed it. This weekend I decided to upstream it. Rebasing my patch just worked, which was cool. Doing the actual dance to properly submit the patch on GitHub was more involved. I was going to write a dedicated blog post on it, but maybe later… or not at all. I’m not complaining, it’s just part of open source… and projects need good PR hygiene and process so that devs don’t get beaten down.
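For the record, the setup that hits the bug is just the nb_inventory plugin pointed at a Netbox instance served on a subpath. Something like this, where the endpoint URL and options are illustrative rather than my real config:

# netbox.yml -- inventory config sketch; the token comes from the
# NETBOX_TOKEN environment variable
plugin: netbox.netbox.nb_inventory
api_endpoint: https://example.net/netbox   # Netbox lives on a subpath
validate_certs: true
group_by:
  - device_roles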
day 017 20250415
Woohoo, my bugfix PR for the Ansible Netbox inventory plugin was accepted. FOSS works! :P
I haven’t redeployed the VMs on the server that got the ZFS switch. The nomad tasks I stopped disappeared, and the VMs that had been running were from my oldest iteration of the lanecloud IaC. TL;DR: I either have to bugfix the old playbooks to get them to redeploy… or follow through with switching to libvirt deployments. I opted to modernize… let’s just say I’m moving slow.
day 018 20250505
Man… really spacing this stuff out. In my defense, last week I was on vacation and I made sure to avoid computering.
Just a basic maintenance night to kind of get the juices flowing: some apt updates, and I built some fresh 6.14 kernels for my RK3588 cluster.
Apt update on my poor Helios4 trixie box took forever… but that’s normal. The Helios4 is still a champ!