Introduction

Warning

These docs contain information that relates to my setup. They may or may not work for you.



My Home Operations Repository :octocat:

... managed with Flux, Renovate, and GitHub Actions 🀖



πŸ“– Overview

This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Ansible, Terraform, Kubernetes, Flux, Renovate, and GitHub Actions.


⛡ Kubernetes

My Kubernetes cluster is deployed with Talos. This is a semi-hyper-converged cluster: workloads and block storage share the same available resources on my nodes, while a separate server with ZFS handles NFS/SMB shares, bulk file storage and backups.
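
For context, the generic Talos flow looks roughly like this; a minimal sketch with a placeholder cluster name and IPs, not my actual bootstrap process:

    # Generate machine configs for a new cluster
    talosctl gen config home-cluster https://192.168.42.10:6443

    # Apply the control plane config to a node booted into Talos maintenance mode
    talosctl apply-config --insecure --nodes 192.168.42.10 --file controlplane.yaml

    # Bootstrap etcd on the first control plane node and fetch a kubeconfig
    talosctl bootstrap --nodes 192.168.42.10 --endpoints 192.168.42.10
    talosctl kubeconfig --nodes 192.168.42.10 --endpoints 192.168.42.10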

There is a template over at onedr0p/cluster-template if you want to try and follow along with some of the practices I use here.

Core Components

  • actions-runner-controller: Self-hosted GitHub runners.
  • cert-manager: Creates SSL certificates for services in my cluster.
  • cilium: Internal Kubernetes container networking interface.
  • cloudflared: Enables Cloudflare secure access to certain ingresses.
  • external-dns: Automatically syncs ingress DNS records to a DNS provider.
  • external-secrets: Manages Kubernetes secrets using 1Password Connect.
  • ingress-nginx: Kubernetes ingress controller using NGINX as a reverse proxy and load balancer.
  • rook: Distributed block storage for persistent storage.
  • sops: Manages secrets for Kubernetes and Terraform which are committed to Git.
  • spegel: Stateless cluster-local OCI registry mirror.
  • volsync: Backup and recovery of persistent volume claims.

GitOps

Flux watches the clusters in my kubernetes folder (see Directories below) and makes the changes to my clusters based on the state of my Git repository.

The way Flux works for me here is that it recursively searches the kubernetes/${cluster}/apps folder until it finds the top-most kustomization.yaml per directory and then applies all the resources listed in it. That kustomization.yaml will generally only have a namespace resource and one or many Flux kustomizations (ks.yaml). Those Flux kustomizations in turn apply a HelmRelease or other resources related to the application.
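
For illustration, here is roughly what that structure looks like for a single hypothetical app; the app name, paths and GitRepository name are placeholders, not copied from this repo:

    # kubernetes/main/apps/default/kustomization.yaml (top level for the namespace)
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - ./namespace.yaml
      - ./echo-server/ks.yaml
    ---
    # kubernetes/main/apps/default/echo-server/ks.yaml (Flux Kustomization for the app)
    apiVersion: kustomize.toolkit.fluxcd.io/v1
    kind: Kustomization
    metadata:
      name: echo-server
      namespace: flux-system
    spec:
      interval: 30m
      path: ./kubernetes/main/apps/default/echo-server/app
      prune: true
      sourceRef:
        kind: GitRepository
        name: flux-system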

Renovate watches my entire repository looking for dependency updates; when they are found a PR is automatically created. When PRs are merged, Flux applies the changes to my cluster.

Directories

This Git repository contains the following directories under Kubernetes.

πŸ“ kubernetes
β”œβ”€β”€ πŸ“ main            # main cluster
β”‚   β”œβ”€β”€ πŸ“ apps           # applications
β”‚   β”œβ”€β”€ πŸ“ bootstrap      # bootstrap procedures
β”‚   β”œβ”€β”€ πŸ“ flux           # core flux configuration
β”‚   └── πŸ“ templates      # re-useable components
└── πŸ“ storage         # storage cluster
    β”œβ”€β”€ πŸ“ apps           # applications
    β”œβ”€β”€ πŸ“ bootstrap      # bootstrap procedures
    └── πŸ“ flux           # core flux configuration

Flux Workflow

This is a high-level look at how Flux deploys my applications with dependencies. Below there are 3 apps: postgres, glauth and authelia. postgres needs to be running and healthy before glauth and authelia. Once postgres and glauth are healthy, authelia will be deployed.

graph TD;
  id1>Kustomization: cluster] -->|Creates| id2>Kustomization: cluster-apps];
  id2>Kustomization: cluster-apps] -->|Creates| id3>Kustomization: postgres];
  id2>Kustomization: cluster-apps] -->|Creates| id6>Kustomization: glauth]
  id2>Kustomization: cluster-apps] -->|Creates| id8>Kustomization: authelia]
  id2>Kustomization: cluster-apps] -->|Creates| id5>Kustomization: postgres-cluster]
  id3>Kustomization: postgres] -->|Creates| id4[HelmRelease: postgres];
  id5>Kustomization: postgres-cluster] -->|Depends on| id3>Kustomization: postgres];
  id5>Kustomization: postgres-cluster] -->|Creates| id10[Postgres Cluster];
  id6>Kustomization: glauth] -->|Creates| id7(HelmRelease: glauth);
  id8>Kustomization: authelia] -->|Creates| id9(HelmRelease: authelia);
  id8>Kustomization: authelia] -->|Depends on| id5>Kustomization: postgres-cluster];
  id9(HelmRelease: authelia) -->|Depends on| id7(HelmRelease: glauth);
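
In Flux terms, the "Depends on" edges above are expressed with dependsOn, both on the Kustomization and on the HelmRelease. A hedged sketch of what that could look like; namespaces, paths and intervals are illustrative:

    # The authelia Flux Kustomization waits for the postgres-cluster Kustomization
    apiVersion: kustomize.toolkit.fluxcd.io/v1
    kind: Kustomization
    metadata:
      name: authelia
      namespace: flux-system
    spec:
      dependsOn:
        - name: postgres-cluster
      interval: 30m
      path: ./kubernetes/main/apps/security/authelia/app
      prune: true
      sourceRef:
        kind: GitRepository
        name: flux-system
    ---
    # The authelia HelmRelease waits for the glauth HelmRelease
    apiVersion: helm.toolkit.fluxcd.io/v2
    kind: HelmRelease
    metadata:
      name: authelia
      namespace: security
    spec:
      dependsOn:
        - name: glauth
      interval: 30m
      chart:
        spec:
          chart: authelia
          sourceRef:
            kind: HelmRepository
            name: authelia
            namespace: flux-system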

Networking

Click to see a high-level network diagram.

☁️ Cloud Dependencies

While most of my infrastructure and workloads are self-hosted, I do rely upon the cloud for certain key parts of my setup. This saves me from having to worry about three things: (1) dealing with chicken/egg scenarios, (2) services I critically need whether my cluster is online or not, and (3) the "hit by a bus" factor: what happens to critical apps (e.g. email, password manager, photos) that my family relies on when I am no longer around.

Alternative solutions to the first two of these problems would be to host a Kubernetes cluster in the cloud and deploy applications like HCVault, Vaultwarden, ntfy, and Gatus; however, maintaining another cluster and monitoring another group of workloads is more work and would probably end up costing the same as, or more than, the services described below.

| Service     | Use                                                                | Cost    |
|-------------|--------------------------------------------------------------------|---------|
| 1Password   | Secrets with External Secrets                                      | ~$65/yr |
| Cloudflare  | Domain and S3                                                      | ~$30/yr |
| GCP         | Voice interactions with Home Assistant over Google Assistant      | Free    |
| GitHub      | Hosting this repository and continuous integration/deployments    | Free    |
| Migadu      | Email hosting                                                      | ~$20/yr |
| Pushover    | Kubernetes Alerts and application notifications                   | $5 OTP  |
| UptimeRobot | Monitoring internet connectivity and external facing applications | ~$60/yr |

Total: ~$20/mo

🌐 DNS

Home DNS

On my PiKVM I have bind9, blocky and dnsdist deployed. In my cluster, external-dns is deployed with the RFC2136 provider, which syncs DNS records to bind9.
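
As a rough sketch, the relevant external-dns settings for the RFC2136 provider look like this; the host, zone and TSIG values below are placeholders, not my real ones:

    # Container args for the external-dns instance that syncs to bind9
    args:
      - --provider=rfc2136
      - --rfc2136-host=192.168.1.2         # bind9 address (placeholder)
      - --rfc2136-port=53
      - --rfc2136-zone=example.internal    # zone bind9 is authoritative for (placeholder)
      - --rfc2136-tsig-keyname=externaldns
      - --rfc2136-tsig-secret-alg=hmac-sha256
      - --rfc2136-tsig-secret=REPLACE_ME   # TSIG key shared with bind9
      - --source=ingress
      - --registry=txt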

dnsdist is a DNS load balancer with "downstream" DNS servers configured, such as bind9 and blocky. All my clients use dnsdist as their upstream DNS server. This allows for more granularity in configuring DNS across my networks, such as forwarding all requests for my domain to bind9 on certain networks, or using 1.1.1.1 instead of blocky on networks where ad-blocking isn't required.

Public DNS

Besides the external-dns instance mentioned above, another instance is deployed in my cluster and configured to sync DNS records to Cloudflare. The only ingresses this external-dns instance looks at to gather DNS records for Cloudflare are ones that have an ingress class name of external and contain the ingress annotation external-dns.alpha.kubernetes.io/target.
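
For example, an ingress that this instance would pick up looks something like this (hostnames are placeholders):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: app
      annotations:
        # Tells the Cloudflare external-dns instance which record to create
        external-dns.alpha.kubernetes.io/target: external.example.com
    spec:
      ingressClassName: external
      rules:
        - host: app.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: app
                    port:
                      number: 80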


πŸ”§ Hardware

Click to see the rack!
| Device                      | Count | OS Disk Size | Data Disk Size               | RAM  | Operating System | Purpose                 |
|-----------------------------|-------|--------------|------------------------------|------|------------------|-------------------------|
| Intel NUC8i5BEH             | 3     | 1TB SSD      | 1TB NVMe (rook-ceph)         | 64GB | Talos            | Kubernetes Controllers  |
| Intel NUC8i7BEH             | 3     | 1TB SSD      | 1TB NVMe (rook-ceph)         | 64GB | Talos            | Kubernetes Workers      |
| PowerEdge T340              | 1     | 2TB SSD      | -                            | 64GB | Ubuntu 22.04     | NFS + Backup Server     |
| Lenovo SA120                | 1     | -            | 10x22TB ZFS (mirrored vdevs) | -    | -                | DAS                     |
| PiKVM (RasPi 4)             | 1     | 64GB (SD)    | -                            | 4GB  | PiKVM (Arch)     | Network DNS & KVM       |
| TESmart 8 Port KVM Switch   | 1     | -            | -                            | -    | -                | Network KVM (for PiKVM) |
| Unifi UDMP Max              | 1     | -            | 2x12TB HDD                   | -    | -                | Router & NVR            |
| Unifi US-16-XG              | 1     | -            | -                            | -    | -                | 10Gb Core Switch        |
| Unifi USW-Enterprise-24-PoE | 1     | -            | -                            | -    | -                | 2.5Gb PoE Switch        |
| Unifi USP PDU Pro           | 1     | -            | -                            | -    | -                | PDU                     |
| APC SMT1500RM2U w/ NIC      | 1     | -            | -                            | -    | -                | UPS                     |

⭐ Stargazers

Star History Chart


🀝 Gratitude and Thanks

Thanks to all the people who donate their time to the Home Operations Discord community. Be sure to check out kubesearch.dev for ideas on how to deploy applications or get ideas on what you could deploy.

NAS

My NAS configuration, documented here. Currently running Ubuntu 22.04.

Packages

  1. Add Fish PPA

    sudo apt-add-repository ppa:fish-shell/release-3
    
  2. Install Packages

    sudo apt install -y apt-transport-https ca-certificates containernetworking-plugins curl ffmpeg figlet fish fzf gettext git htop ifenslave iputils-ping net-tools lolcat mailutils msmtp msmtp-mta nano neofetch ntpdate podman psmisc rclone software-properties-common tmux tree uidmap unzip zfs-zed zfsutils-linux dmraid gdisk hdparm lvm2 nfs-common nfs-kernel-server nvme-cli open-iscsi samba samba-vfs-modules smartmontools socat
    

Networking

  1. Add or replace file /etc/netplan/00-installer-config.yaml

    network:
      version: 2
      ethernets:
        enp6s0f0:
          dhcp4: yes
          mtu: 9000
        enp6s0f1:
          dhcp4: yes
          mtu: 9000
    

ZFS

Mirrored Zpool

  1. Create initial pool and set configuration

    sudo zpool create -o ashift=12 -f eros mirror \
        /dev/disk/by-id/scsi-SATA_WDC_WD120EDGZ-11_9LHWA5KG \
        /dev/disk/by-id/scsi-SATA_WDC_WD120EMFZ-11_9MG0AHZA
    sudo zfs set atime=off eros
    sudo zfs set compression=lz4 eros
    
  2. Attach more mirrors

    sudo zpool add eros mirror \
        /dev/disk/by-id/scsi-SATA_ST12000VN0007-2G_ZCH0F1CH \
        /dev/disk/by-id/scsi-SATA_WDC_WD120EMFZ-11_X1G3B01L
    
  3. Add spares

    sudo zpool add -f eros spare \
        /dev/disk/by-id/scsi-SATA_WDC_WD120EMFZ-11_QGGETR5T
    

Datasets

  1. Create datasets

    sudo zfs create eros/Apps
    sudo zfs create eros/Apps/Frigate
    sudo zfs create eros/Media
    
  2. Share dataset over NFS

    sudo zfs set \
        sharenfs="no_subtree_check,all_squash,anonuid=568,anongid=100,rw=@192.168.42.0/24,rw=@192.168.1.0/24,ro=192.168.150.21,ro=192.168.150.28" \
        eros/Media
    sudo zfs set \
        sharenfs="no_subtree_check,all_squash,anonuid=568,anongid=100,rw=@192.168.42.0/24,rw=@192.168.1.0/24" \
        eros/Apps/Frigate
    
  3. Dataset Permissions

    sudo chmod 770 /eros/Media
    sudo chown -R devin:users /eros/Media
    

Snapshots

  1. Install zrepl by following these instructions.

  2. Add or replace the file /etc/zrepl/zrepl.yml

    global:
      logging:
        - type: syslog
          format: human
          level: warn
      monitoring:
        - type: prometheus
          listen: :9811
          listen_freebind: true
    
    jobs:
      - name: daily
        type: snap
        filesystems:
          "eros<": true
        snapshotting:
          type: cron
          cron: "0 3 * * *"
          prefix: zrepl_daily_
          timestamp_format: dense
        pruning:
          keep:
            - type: last_n
              count: 7
              regex: "^zrepl_daily_.*$"
    
  3. Start and enable zrepl

    sudo systemctl enable --now zrepl.service
    
  4. Give a local user access to a specific dataset's snapshots

    sudo zfs allow -u jeff send,snapshot,hold eros/Media
    

NFS

Force NFS 4 and update threads

  1. Add or replace file /etc/nfs.conf.d/local.conf

    [nfsd]
    vers2 = n
    vers3 = n
    threads = 16
    
    [mountd]
    manage-gids = 1
    

Non ZFS NFS Shares

  1. Add or replace file /etc/exports.d/local.exports

    /share/PVCs 192.168.1.0/24(sec=sys,rw,no_subtree_check,all_squash,anonuid=568,anongid=100)
    /share/PVCs 192.168.42.0/24(sec=sys,rw,no_subtree_check,all_squash,anonuid=568,anongid=100)
    
  2. Dataset Permissions

    sudo chmod 770 /share/PVCs
    sudo chown -R devin:users /share/PVCs
    
  3. Reload exports

    sudo exportfs -arv
    

Time Machine

  1. Install required tools

    sudo apt install samba samba-vfs-modules
    
  2. Create ZFS datasets and update permissions

    sudo zfs create eros/TimeMachine
    sudo zfs create eros/TimeMachine/devin
    sudo zfs create eros/TimeMachine/louie
    sudo chown -R devin:users /eros/TimeMachine
    sudo chmod -R 770 /eros/TimeMachine
    
  3. Set a smb password for user

    sudo smbpasswd -a devin
    
  4. Add or replace file /etc/samba/smb.conf

    [global]
    min protocol = SMB2
    ea support = yes
    vfs objects = fruit streams_xattr
    fruit:aapl = yes
    fruit:metadata = stream
    fruit:model = MacSamba
    fruit:posix_rename = yes
    fruit:veto_appledouble = no
    fruit:nfs_aces = no
    fruit:wipe_intentionally_left_blank_rfork = yes
    fruit:delete_empty_adfiles = yes
    spotlight = no
    
    [devin]
    comment = Devin's Time Machine
    fruit:time machine = yes
    fruit:time machine max size = 1050G
    path = /eros/TimeMachine/devin
    browseable = yes
    write list = devin
    create mask = 0600
    directory mask = 0700
    case sensitive = true
    default case = lower
    preserve case = no
    short preserve case = no
    
    [louie]
    comment = Louie's Time Machine
    fruit:time machine = yes
    fruit:time machine max size = 1050G
    path = /eros/TimeMachine/louie
    browseable = yes
    write list = devin
    create mask = 0600
    directory mask = 0700
    case sensitive = true
    default case = lower
    preserve case = no
    short preserve case = no
    
  5. Restart samba

    sudo systemctl restart smbd.service
    
  6. Set up Time Machine on MacOS

    sudo tmutil setdestination -a smb://devin:${smbpasswd}@expanse.turbo.ac/devin
    

System

  1. Disable apparmor

    sudo systemctl stop apparmor
    sudo systemctl mask apparmor
    
  2. Disable mitigations and apparmor in grub

    sudo nano /etc/default/grub
    # GRUB_CMDLINE_LINUX="apparmor=0 mitigations=off"
    sudo update-grub
    sudo reboot
    
  3. Disable swap

    sudo swapoff -a
    sudo sed -i '/ swap / s/^/#/' /etc/fstab
    

Notifications

Info

Restart msmtpd after changing: sudo systemctl restart msmtpd.service

  1. Add or replace file /etc/aliases

    mailer-daemon: postmaster@
    postmaster: root@
    nobody: root@
    hostmaster: root@
    usenet: root@
    news: root@
    webmaster: root@
    www: root@
    ftp: root@
    abuse: root@
    noc: root@
    security: root@
    root: devin@buhl.casa
    

Info

Restart msmtpd after changing: sudo systemctl restart msmtpd.service

  1. Add or replace file /etc/msmtprc

    defaults
    auth off
    tls  off
    tls_trust_file /etc/ssl/certs/ca-certificates.crt
    logfile /var/log/msmtp
    
    account        maddy
    host           smtp-relay.turbo.ac
    port           25
    from           devin@buhl.casa
    tls_starttls   off
    
    account default: maddy
    
    aliases /etc/aliases
    

Info

Restart smartd after changing: sudo systemctl restart smartd.service

  1. Add or replace file /etc/smartd.conf

    DEVICESCAN -a -o on -S on -n standby,q -s (S/../.././02|L/../../6/03) -W 4,35,40 -m devin+alerts@buhl.casa
    

Info

Restart zed after changing: sudo systemctl restart zed.service

  1. Add or replace file /etc/zfs/zed.d/zed.rc

    ZED_DEBUG_LOG="/var/log/zed.debug.log"
    ZED_EMAIL_ADDR="devin@buhl.casa"
    ZED_EMAIL_PROG="mail"
    ZED_EMAIL_OPTS="-s '@SUBJECT@' @ADDRESS@ -r devin+alerts@buhl.casa"
    ZED_NOTIFY_VERBOSE=1
    ZED_NOTIFY_DATA=1
    ZED_USE_ENCLOSURE_LEDS=1
    

Misc

Replace a Drive

Info

Enable autoexpand on the pool with sudo zpool set autoexpand=on eros

sudo zpool offline eros /dev/disk/by-id/scsi-SATA_ST12000VN0007-2G_ZCH0F1CH
sudo zpool replace eros /dev/disk/by-id/scsi-SATA_ST12000VN0007-2G_ZCH0F1CH /dev/disk/by-id/scsi-SATA_ST22000NM001E-3H_ZX201HJC

When resilvering is complete detach the drive...

sudo zpool detach eros /dev/disk/by-id/scsi-SATA_ST12000VN0007-2G_ZCH0F1CH

Badblocks

Warning

This command is very destructive and should only be used to check for bad sectors. It also takes a while, so be sure to start it in a screen session.

sudo badblocks -b 4096 -wsv /dev/disk/by-id/scsi-SATA_ST12000VN0007-2G_ZJV01MC5

Shred

Warning

This command is very destructive and should only be used to completely wipe the drive. It also takes a while, so be sure to start it in a screen session.

sudo shred -vfz -n 4 /dev/disk/by-id/scsi-SATA_ST12000VN0007-2G_ZJV01MC5

Lenovo SA120

The fans are loud; their speed can be adjusted using AndrewX192/lenovo-sa120-fanspeed-utility.

Opnsense

Warning

I no longer use Opnsense, therefore this document will likely not be updated anymore.

BGP

Instead of using MetalLB for L2/L3 load balancer IPs, I am using the Calico CNI with BGP, which allows me to advertise load balancer IPs directly over BGP. This has benefits like equal-cost multipath (ECMP) routing for scaled workloads in my cluster.
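
For reference, a hedged sketch of the Calico side of this setup; the AS number matches the steps below, while the peer address and load balancer CIDR are placeholders:

    apiVersion: projectcalico.org/v3
    kind: BGPConfiguration
    metadata:
      name: default
    spec:
      asNumber: 64512
      serviceLoadBalancerIPs:
        - cidr: 192.168.69.0/24   # load balancer pool advertised over BGP (placeholder)
    ---
    apiVersion: projectcalico.org/v3
    kind: BGPPeer
    metadata:
      name: opnsense
    spec:
      peerIP: 192.168.42.1        # router address (placeholder)
      asNumber: 64512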

  1. Routing > BGP | General
    1. enable = true
    2. BGP AS Number = 64512
    3. Network = 192.168.42.0/24 (Subnet your Kubernetes nodes are on)
    4. Save
  2. Routing > BGP | Neighbors
    • Add a neighbor for each Kubernetes node
      1. Enabled = true
      2. Peer-IP = 192.168.42.x (Kubernetes Node IP)
      3. Remote AS = 64512
      4. Update-Source Interface = HOME_SERVER (VLAN of Kubernetes nodes)
      5. Save
      6. Continue adding neighbors until all your nodes are present
  3. Routing > General
    1. Enable = true
    2. Save
  4. System > Settings > Tunables
    1. Add net.route.multipath and set the value to 1
    2. Save
  5. Reboot
  6. Verify
    1. Routing > Diagnostics | Summary

Warning

Without the configuration described in step 4, routes from a client will only take a single path to your Kubernetes workloads, even if they are scaled to more than one replica.

HAProxy

While kube-vip is very nice for having an API server VIP ready to go in your cluster, I had issues mixing layer 2 and layer 3 between Calico in BGP mode and kube-vip using L2 ARP. You also cannot run Calico in BGP alongside kube-vip in BGP; they will fight and you will lose. Instead I chose to use HAProxy, which you can install from the Opnsense plugins.

  1. Services > HAProxy | Real Servers
    • Add a server for each master node in your Kubernetes cluster
      1. Enabled = true
      2. Name or Prefix = k8s-apiserver-x
      3. FQDN or IP = 192.168.42.x
      4. Port = 6443
      5. Verify SSL Certificate = false
      6. Apply/Save
      7. Continue adding servers until all your master nodes are present
  2. Services > HAProxy | Rules & Checks > Health Monitors
    1. Name = k8s-apiserver-health
    2. SSL preferences = Force SSL for health checks
    3. Port to check = 6443
    4. HTTP method = GET
    5. Request URI = /healthz
    6. HTTP version = HTTP/1.1
    7. Apply/Save
  3. Services > HAProxy | Virtual Services > Backend Pools
    1. Enabled = true
    2. Name = k8s-apiserver-be
    3. Mode = TCP (Layer 4)
    4. Servers = k8s-apiserver-x ... (Add one for each server you created. Use TAB key to complete typing each server)
    5. Source address = 192.168.1.1 (Your Opnsense IP address)
    6. Enable Health Checking = true
    7. Health Monitor = k8s-apiserver-health
    8. Apply/Save
  4. Services > HAProxy | Virtual Services > Public Services
    1. Enabled = true
    2. Name = k8s-apiserver-fe
    3. Listen Addresses = 192.168.1.1:6443 (Your Opnsense IP address. Use TAB key to complete typing a listen address)
    4. Type = TCP
    5. Default Backend Pool = k8s-apiserver-be
    6. Apply/Save
  5. Services > HAProxy | Settings > Service
    1. Enable HAProxy = true
    2. Apply/Save
  6. Services > HAProxy | Settings > Global Parameters
    1. Verify SSL Server Certificates = disable-verify
    2. Apply/Save
  7. Services > HAProxy | Settings > Default Parameters
    1. Client Timeout = 4h
    2. Connection Timeout = 10s
    3. Server Timeout = 4h
    4. Apply/Save

Receive Side Scaling (RSS)

RSS is used to distribute packets over CPU cores using a hashing function, either with support in the hardware (which offloads the hashing for you) or in software. Click here to learn more about it.

  1. System > Settings > Tunables
    1. Add net.inet.rss.enabled and set the value to 1
    2. Add net.inet.rss.bits and set to 2
    3. Add net.isr.dispatch and set to hybrid
    4. Add net.isr.bindthreads and set to 1
    5. Add net.isr.maxthreads and set to -1
    6. Save
  2. Reboot
  3. Verify with sudo netstat -Q
    Configuration:
    Setting                        Current        Limit
    Thread count                         8            8
    Default queue limit                256        10240
    Dispatch policy                 hybrid          n/a
    Threads bound to CPUs          enabled          n/a
    

Syslog

Firewall logs are sent to Vector, which runs in my Kubernetes cluster. Vector then ships the logs to Loki, which also runs in my cluster.
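
A minimal sketch of what that Vector configuration might look like; the listen port matches the logging target below, but the Loki endpoint is a placeholder:

    # vector.yaml (sketch)
    sources:
      opnsense_syslog:
        type: syslog
        mode: udp
        address: 0.0.0.0:5140

    sinks:
      loki:
        type: loki
        inputs: ["opnsense_syslog"]
        endpoint: http://loki-gateway.monitoring.svc.cluster.local:80   # placeholder
        encoding:
          codec: json
        labels:
          job: opnsense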

  1. System > Settings > Logging / targets
    • Add new logging target
      1. Enabled = true
      2. Transport = UDP(4)
      3. Applications = filter (filterlog)
      4. Hostname = 192.168.69.111 (Loki's Load Balancer IP)
      5. Port = 5140
      6. rfc5424 = true
      7. Save

SMTP Relay

To ease application configuration I have an SMTP relay running on Opnsense using the Postfix plugin. Applications deployed in my Kubernetes cluster, my NAS, and my printer all use the same SMTP configuration without authentication.

  1. System > Services > Postfix > General
    1. SMTP Client Security = encrypt
    2. Smart Host = [smtp.fastmail.com]:465
    3. Enable SMTP Authentication = true
    4. Authentication Username = devin@<email-domain>
    5. Authentication Password = <app-password>
    6. Permit SASL Authenticated = false
    7. Save
  2. System > Services > Postfix > Domains
    • Add new domain
      1. Domainname = <email-domain>
      2. Destination = [smtp.fastmail.com]:465
      3. Save
    • Apply
  3. System > Services > Postfix > Senders
    • Add new sender
      1. Enabled = true
      2. Sender Address = admin@<email-domain>
      3. Save
    • Apply
  4. Verify
    swaks --server opnsense.turbo.ac --port 25 --to <email-address> --from <email-address>
    

PiKVM

Load TESmart KVM

  1. Add or replace the file /etc/kvmd/override.yaml

    ---
    
    nginx:
      https:
        enabled: false
    
    kvmd:
      auth:
        enabled: false
      prometheus:
        auth:
          enabled: false
      atx:
        type: disabled
      streamer:
        desired_fps:
          default: 20
        h264_bitrate:
          default: 2500
        h264_gop:
          default: 30
        quality: 75
      gpio:
        drivers:
          tes:
            type: tesmart
            host: 192.168.1.10
            port: 5000
          wol_dev2:
            type: wol
            mac: 1c:69:7a:0d:8d:99
          wol_dev3:
            type: wol
            mac: 1c:69:7a:0e:f7:ed
          wol_dev4:
            type: wol
            mac: 1c:69:7a:0d:62:d4
          wol_dev5:
            type: wol
            mac: 94:c6:91:a7:7b:2b
          wol_dev6:
            type: wol
            mac: 94:c6:91:af:15:3d
          wol_dev7:
            type: wol
            mac: 1c:69:7a:09:bf:39
          reboot:
            type: cmd
            cmd: ["/usr/bin/sudo", "reboot"]
          restart_service:
            type: cmd
            cmd: ["/usr/bin/sudo", "systemctl", "restart", "kvmd"]
        scheme:
          dev0_led:
            driver: tes
            pin: 0
            mode: input
          dev0_btn:
            driver: tes
            pin: 0
            mode: output
            switch: false
          dev1_led:
            driver: tes
            pin: 1
            mode: input
          dev1_btn:
            driver: tes
            pin: 1
            mode: output
            switch: false
          dev2_led:
            driver: tes
            pin: 2
            mode: input
          dev2_btn:
            driver: tes
            pin: 2
            mode: output
            switch: false
          dev2_wol:
            driver: wol_dev2
            pin: 0
            mode: output
            switch: false
          dev3_led:
            driver: tes
            pin: 3
            mode: input
          dev3_btn:
            driver: tes
            pin: 3
            mode: output
            switch: false
          dev3_wol:
            driver: wol_dev3
            pin: 0
            mode: output
            switch: false
          dev4_led:
            driver: tes
            pin: 4
            mode: input
          dev4_btn:
            driver: tes
            pin: 4
            mode: output
            switch: false
          dev4_wol:
            driver: wol_dev4
            pin: 0
            mode: output
            switch: false
          dev5_led:
            driver: tes
            pin: 5
            mode: input
          dev5_btn:
            driver: tes
            pin: 5
            mode: output
            switch: false
          dev5_wol:
            driver: wol_dev5
            pin: 0
            mode: output
            switch: false
          dev6_led:
            driver: tes
            pin: 6
            mode: input
          dev6_btn:
            driver: tes
            pin: 6
            mode: output
            switch: false
          dev6_wol:
            driver: wol_dev6
            pin: 0
            mode: output
            switch: false
          dev7_led:
            driver: tes
            pin: 7
            mode: input
          dev7_btn:
            driver: tes
            pin: 7
            mode: output
            switch: false
          dev7_wol:
            driver: wol_dev7
            pin: 0
            mode: output
            switch: false
          reboot_button:
            driver: reboot
            pin: 0
            mode: output
            switch: false
          restart_service_button:
            driver: restart_service
            pin: 0
            mode: output
            switch: false
        view:
          header:
            title: Devices
          table:
            - ["#pikvm", "pikvm_led|green", "restart_service_button|confirm|Service", "reboot_button|confirm|Reboot"]
            - ["#0", "dev0_led", "dev0_btn | KVM"]
            - ["#1", "dev1_led", "dev1_btn | KVM"]
            - ["#2", "dev2_led", "dev2_btn | KVM", "dev2_wol | WOL"]
            - ["#3", "dev3_led", "dev3_btn | KVM", "dev3_wol | WOL"]
            - ["#4", "dev4_led", "dev4_btn | KVM", "dev4_wol | WOL"]
            - ["#5", "dev5_led", "dev5_btn | KVM", "dev5_wol | WOL"]
            - ["#6", "dev6_led", "dev6_btn | KVM", "dev6_wol | WOL"]
            - ["#7", "dev7_led", "dev7_btn | KVM", "dev7_wol | WOL"]
    
  2. Restart kvmd

    systemctl restart kvmd.service
    

Monitoring

Install node-exporter

pacman -S prometheus-node-exporter
systemctl enable --now prometheus-node-exporter

Install promtail

  1. Install promtail

    pacman -S promtail
    systemctl enable promtail
    
  2. Override the promtail systemd service

    mkdir -p /etc/systemd/system/promtail.service.d/
    cat >/etc/systemd/system/promtail.service.d/override.conf <<EOL
    [Service]
    Type=simple
    ExecStart=
    ExecStart=/usr/bin/promtail -config.file /etc/loki/promtail.yaml
    EOL
    
  3. Add or replace the file /etc/loki/promtail.yaml

    server:
      log_level: info
      disable: true
    
    client:
      url: "https://loki.devbu.io/loki/api/v1/push"
    
    positions:
      filename: /tmp/positions.yaml
    
    scrape_configs:
      - job_name: journal
        journal:
          path: /run/log/journal
          max_age: 12h
          labels:
            job: systemd-journal
        relabel_configs:
          - source_labels: ["__journal__systemd_unit"]
            target_label: unit
          - source_labels: ["__journal__hostname"]
            target_label: hostname
    
  4. Start promtail

    systemctl daemon-reload
    systemctl enable --now promtail.service
    

Proxmox Considerations

I am using bare metal nodes, but here are some considerations when running Kubernetes on Proxmox. These are just my opinions gathered from experience I've witnessed first- or second-hand. I will always advocate for bare metal Kubernetes due to the overhead of VMs and disk concerns; however, following along below will net you a very stable Kubernetes cluster on PVE.

Warning

Preface: etcd needs 3 master/control plane nodes for quorum, and it is really read/write intensive and requires low IOPS/latency. If the same disk backs all master nodes, then due to the way etcd works, every commit to etcd (which happens probably hundreds of times per second) will flood that same filesystem with 3x the amount of reads and writes.

Now if you layer Longhorn or rook-ceph on top of that same filesystem you are just asking for trouble, because that storage is also replicated.

Single Node PVE Cluster

  1. Use physically separate disks for the PVE install, k8s VMs and Longhorn/rook-ceph
  2. Don't put k8s VMs or Longhorn/rook-ceph on HDDs, only use SSDs or NVMe
  3. Use k3s with a single master node (4CPU/8GB RAM/50GB disk) that uses SQLite instead of etcd, and taint it (see the sketch after this list).
  4. Use as many worker nodes as you want but start with 3 and add more later on if you need them.
  5. Consider using local-path-provisioner over Longhorn or rook-ceph if you aren't able to physically separate the disks.
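
A minimal sketch of item 3, assuming the upstream k3s install script and a master node named k8s-master-0 (node name and disabled components are illustrative):

    # Install a single k3s server; with no --cluster-init or external datastore it uses embedded SQLite
    curl -sfL https://get.k3s.io | sh -s - server --disable traefik --disable servicelb

    # Taint the master so normal workloads are scheduled onto the worker nodes only
    kubectl taint nodes k8s-master-0 node-role.kubernetes.io/control-plane:NoSchedule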

Dual node PVE Cluster

Buy another node for your PVE cluster or refer to Single Node PVE Cluster; however, if you must...

  1. Use k3s with two master nodes (2vCPU/8GB RAM/50GB disk each) using postgresql/mariadb/mysql (in that order) instead of etcd (see the sketch after this list).
  2. Put the postgresql/mysql/mariadb database on a VM on your first PVE cluster. However, without some architecting this means that your cluster store is not highly available and is a single point of failure.
  3. Evenly spread out your k8s masters and workers across each PVE node
    • In a 2 master/3 worker setup put one master on each PVE node and try to even out the workers on each PVE node.
  4. Consider using local-path-provisioner for application config data over Longhorn or rook-ceph if you aren't able to physically separate the disks between Proxmox, VMs and Longhorn/rook-ceph.
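
For item 1, a hedged sketch of pointing both k3s masters at the same external database; the connection string is a placeholder:

    # Run on each master; both point at the same external PostgreSQL database instead of etcd
    curl -sfL https://get.k3s.io | sh -s - server \
      --datastore-endpoint="postgres://k3s:CHANGEME@192.168.1.50:5432/k3s?sslmode=disable"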

Triple node PVE Cluster

  1. Use physically separate disks for the PVE install, k8s VMs and Longhorn
  2. Don't put k8s VMs or Longhorn on HDDs, only use SSDs or NVMe
  3. Evenly spread out your k8s masters and workers across each PVE node
    • In a 3 master/3 worker setup put one master on each PVE node and one worker on each PVE node.
  4. Instead of Longhorn, consider setting up a Ceph cluster on your PVE nodes and use Rook to consume it for stateful applications. Due to the way Ceph works in this scenario, it is fine to use HDDs over SSDs or NVMe here.

S3 buckets

Alternatively, creating S3 buckets can be automated with Terraform.

b2

Info

This requires installing the Backblaze b2 CLI tool

Creating a bucket

  1. Create master key-id and key on Account > App Keys

  2. Export settings

    export B2_APPLICATION_KEY_ID="<key-id>"
    export B2_APPLICATION_KEY="<key>"
    export B2_BUCKET_NAME="<bucket-name>"
    
  3. Create the bucket

    b2 create-bucket "${B2_BUCKET_NAME}" allPrivate \
      --defaultServerSideEncryption "SSE-B2"  \
      --lifecycleRules '[{"daysFromHidingToDeleting": 1,"daysFromUploadingToHiding": null,"fileNamePrefix": ""}]'
    
  4. Create the bucket username and password

    b2 create-key --bucket "${B2_BUCKET_NAME}" "${B2_BUCKET_NAME}" \
      listBuckets,readBuckets,listFiles,readFiles,writeFiles,readBucketEncryption,readBucketReplications,readBucketRetentions,readFileRetentions,writeFileRetentions,readFileLegalHolds
    

Minio

Info

This requires installing the Minio mc CLI tool

Creating a Bucket

  1. Create the Minio CLI configuration file (~/.mc/config.json)

    mc alias set minio "https://s3.<domain>.<tld>" "<access-key>" "<secret-key>"
    
  2. Export settings

    export BUCKET_NAME="<bucket-name>" # also used for the bucket username
    export BUCKET_PASSWORD="$(openssl rand -hex 20)"
    echo $BUCKET_PASSWORD
    
  3. Create the bucket username and password

    mc admin user add minio "${BUCKET_NAME}" "${BUCKET_PASSWORD}"
    
  4. Create the bucket

    mc mb "minio/${BUCKET_NAME}"
    
  5. Create the user policy document

    cat <<EOF > /tmp/user-policy.json
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "s3:ListBucket",
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject"
                ],
                "Effect": "Allow",
                "Resource": ["arn:aws:s3:::${BUCKET_NAME}/*", "arn:aws:s3:::${BUCKET_NAME}"],
                "Sid": ""
            }
        ]
    }
    EOF
    
  6. Apply the bucket policies

    mc admin policy add minio "${BUCKET_NAME}-private" /tmp/user-policy.json
    
  7. Associate private policy with the user

    mc admin policy set minio "${BUCKET_NAME}-private" "user=${BUCKET_NAME}"
    

Allow public access to certain objects in the bucket

Info

This step is optional and not needed unless you want to make certain objects public to the internet

  1. Create the bucket policy document and update the folders that should be public

    cat <<EOF > /tmp/bucket-policy.json
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        "*"
                    ]
                },
                "Action": [
                    "s3:GetBucketLocation"
                ],
                "Resource": [
                    "arn:aws:s3:::${BUCKET_NAME}"
                ]
            },
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        "*"
                    ]
                },
                "Action": [
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::${BUCKET_NAME}"
                ],
                "Condition": {
                    "StringEquals": {
                        "s3:prefix": [
                            "avatars",
                            "public"
                        ]
                    }
                }
            },
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        "*"
                    ]
                },
                "Action": [
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::${BUCKET_NAME}/avatars*",
                    "arn:aws:s3:::${BUCKET_NAME}/public*"
                ]
            }
        ]
    }
    EOF
    
  2. Associate public policy with the bucket

    mc anonymous set-json /tmp/bucket-policy.json "minio/${BUCKET_NAME}"
    

Sharing an object in a bucket

mc share download --expire=7d "minio/<bucket-name>/<file>.<ext>" --json  | jq -r .share | pbcopy

Secret variations with Flux

There are several different ways to utilize Kubernetes secrets when using Flux and SOPS; here's a breakdown of some common methods.

I will not be covering how to integrate SOPS into Flux; for that, be sure to check out the Flux documentation on integrating SOPS.

Example Secret

Info

The following three methods will use this secret as an example.

apiVersion: v1
kind: Secret
metadata:
  name: application-secret
  namespace: default
stringData:
  AWESOME_SECRET: "SUPER SECRET VALUE"

Method 1: envFrom

Use envFrom in a deployment or a Helm chart that supports the setting; this will pass all items from the secret into the container's environment.

envFrom:
  - secretRef:
      name: application-secret

Example

View example Helm Release and corresponding Secret.

Method 2: env.valueFrom

Similar to the above, but with env it's possible to pick a single item from a secret.

env:
  - name: WAY_COOLER_ENV_VARIABLE
    valueFrom:
      secretKeyRef:
        name: application-secret
        key: AWESOME_SECRET

Example

View example Helm Release and corresponding Secret.

Method 3: spec.valuesFrom

The Flux HelmRelease option valuesFrom can inject a secret item into the Helm values of a HelmRelease.

  • Does not work with merging array values
  • Care needed with keys that contain dot notation in the name
valuesFrom:
  - targetPath: config."admin\.password"
    kind: Secret
    name: application-secret
    valuesKey: AWESOME_SECRET

Example

View example Helm Release and corresponding Secret.

Method 4: Variable Substitution with Flux

Flux variable substitution can inject secrets into any YAML manifest. This requires the Flux Kustomization to be configured to enable variable substitution. Correctly configured, this allows you to use ${GLOBAL_AWESOME_SECRET} in any YAML manifest.

apiVersion: v1
kind: Secret
metadata:
  name: cluster-secrets
  namespace: flux-system
stringData:
  GLOBAL_AWESOME_SECRET: "GLOBAL SUPER SECRET VALUE"
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
# ...
spec:
# ...
  decryption:
    provider: sops
    secretRef:
      name: sops-age
  postBuild:
    substituteFrom:
      - kind: Secret
        name: cluster-secrets

Example

View example Fluxtomization, Helm Release, and corresponding Secret.

Final Thoughts

  • For the first three methods consider using a tool like stakater/reloader to restart the pod when the secret changes (see the sketch after these notes).

  • Using reloader on a pod that consumes a secret provided by Flux variable substitution will lead to the pod being restarted on any change to that secret, whether the change is related to the pod or not.

  • The last method should be used when the other methods are not an option, or when you have a "global" secret used by a bunch of YAML manifests.
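
For reference, a hedged sketch of the reloader approach mentioned in the first note: annotating a Deployment that consumes application-secret so it is restarted when the secret changes (image and labels are placeholders):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: awesome-app
      namespace: default
      annotations:
        # stakater/reloader watches Secrets/ConfigMaps referenced by this Deployment
        # and performs a rolling restart when any of them change
        reloader.stakater.com/auto: "true"
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: awesome-app
      template:
        metadata:
          labels:
            app.kubernetes.io/name: awesome-app
        spec:
          containers:
            - name: app
              image: ghcr.io/example/awesome-app:1.0.0  # placeholder image
              envFrom:
                - secretRef:
                    name: application-secret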

YAML Madness

YAML anchors, aliases and overrides are a great way to keep your manifests DRY (Don't Repeat Yourself), but only at a very basic level.

Anchors and Aliases

Note

The anchor operator & is a way to define a variable and the alias character * is a way to reference the value defined in the anchor.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: &app "awesome-app"
  namespace: default
  labels:
    app.kubernetes.io/name: *app

this will be rendered out to...

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: awesome-app
  namespace: default
  labels:
    app.kubernetes.io/name: "awesome-app"

Overrides

Note

The << operator allows referencing a block of YAML as many times as needed.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: &app "awesome-app"
  namespace: default
  labels: &labels
    app.kubernetes.io/instance: *app
    app.kubernetes.io/name: *app
spec:
  selector:
    matchLabels:
      <<: *labels

this will be rendered out to...

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: "awesome-app"
  namespace: default
  labels:
    app.kubernetes.io/instance: "awesome-app"
    app.kubernetes.io/name: "awesome-app"
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: "awesome-app"
      app.kubernetes.io/name: "awesome-app"

Important Notes

  • An anchor, alias or override defined in one YAML document cannot be referenced from a separate YAML document, whether or not it is in the same file.
  • You absolutely cannot concatenate or apply any string functions to anchors, aliases or overrides.
  • Try to make sure your YAML is comprehensible, don't get hung up on making DRY an absolute rule to follow.