Reordering systemd services

Use case

As I still only have one public IP address, I run my private mail server behind an HAProxy instance. At the same time, I use Postfix on my servers to provide me with system information (anything from notes on system updates to hardware failures). Naturally the mail service listeners in HAProxy collide with those of the local Postfix installation on the reverse proxy server. Every now and then this caused issues: Postfix would manage to start before HAProxy and steal the network ports from under its feet.

Solution

In systemd-based distributions, one “right way” to get around this issue is to make sure the Postfix service doesn’t attempt to start until after HAProxy has started. We don’t want to mess with the package maintainer’s service files, as they can change over time by necessity; instead we override the service defaults.

sudo systemctl edit postfix.service

The above command does the magic involved in creating an override: it creates a file /etc/systemd/system/servicename.service.d/override.conf, and runs systemctl daemon-reload once you’re done editing, so the changes take hold on the next service start.

Inside the override configuration file we just add a Unit block and add an After clause:

[Unit]
After=haproxy.service

That’s all, really. Save the file and on the next system reboot the services should start in the correct order.
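To verify the new ordering without rebooting, ask systemd for the merged unit properties; haproxy.service should now show up in the After list:

systemctl show postfix.service --property=After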

(As I write this, we’re approaching the tenth anniversary of World IPv6 Launch Day, and most ISPs in Sweden still don’t hand out native IPv6 subnets to their clients; instead they increasingly move them to IPv4 CGNAT, despite the obvious issues this creates when attempting to present anything to the Internet, from “serious” web services to game servers!)

Restoring an accidentally migrated mail user to On-Prem Exchange

We recently migrated most of our users to Office 365, and due to a miscommunication, three users that should have stayed on premises were migrated, converted to the RecipientTypeDetails RemoteUserMailbox, and had their local mailboxes disconnected.

Reconnecting their mailboxes failed as they were of the wrong user type:

Connect-Mailbox -Identity "Name Surname" -Database "DB10" -User "Name Surname"
This task does not support recipients of this type. The specified recipient Name Surname is of type User. Please make sure that this recipient matches the required recipient type for this task.
    + CategoryInfo          : InvalidArgument: (Name Surname:UserIdParameter) [Connect-Mailbox], RecipientTaskException
    + FullyQualifiedErrorId : E9DDBACA,Microsoft.Exchange.Management.MapiTasks.ConnectMailbox

The solution was to remove the Exchange properties from the user object:

Get-Recipient -Identity "Name Surname" | Disable-RemoteMailbox

After confirming this was what we wanted to do, the mailboxes could be safely reconnected.
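To sanity-check the recipient type before retrying the reconnect (the identity and database here are the same placeholders as in the example above):

Get-Recipient -Identity "Name Surname" | Format-List RecipientType,RecipientTypeDetails
Connect-Mailbox -Identity "Name Surname" -Database "DB10" -User "Name Surname"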

However: after reconnecting the mailboxes, the users still couldn’t start Outlook. The program failed with the error message “The set of folders cannot be opened”.

The issue here is that DSAccess caches an erroneous query, and the solution is to update the status of disconnected or soft-deleted mailboxes:

Clean-MailboxDatabase -Identity "DB10"
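To see which mailboxes a database still considers disconnected – useful both before and after the cleanup – a query along these lines does the trick:

Get-MailboxStatistics -Database "DB10" | Where-Object { $_.DisconnectReason } | Format-Table DisplayName,DisconnectReason,DisconnectDate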

After this, the users could continue working as usual.

Build your own router with nftables – Part 1

Introduction

A few years ago, Jim Salter wrote a number of articles for Ars Technica related to his “homebrew routers”. Much of what he wrote then still stands, but time marches on, and now that I’ve rebuilt my home router, I figured the lessons should be translated to a modern Ubuntu installation and the more approachable nftables syntax.

The hardware

Any old thing with a couple of network interfaces will do fine. In my case I already had a nice machine for the purpose: a solid-state 4-NIC mini PC from Qotom.

The goal

What I wanted to achieve was to replicate my current pfSense functionality with tools completely under my control. This includes being able to access the Internet (router), convert human-readable names into IP addresses and vice versa (DNS), and automatically assign IP addresses to devices on my networks (DHCP) – all of these of course are standard functionality you get with any home router. Since I run some web services from home, I also need to allow select incoming traffic to hit the correct server in my house.

Base installation

I chose the latest LTS release of Ubuntu server for my operating system. Other systems are available, but this is an environment in which I’m comfortable. The installation is mostly a matter of pressing Next a lot, with a couple of exceptions:

First of all, there’s a network configuration screen that fulfills an important purpose: Connect your network cable to a port in the computer and take note of which logical network interface reacts in the user interface. In my case the NIC marked 1 (which I intended to use for my Internet connection or WAN) is called enp1s0, and Interface 4 (which I intended to use for my local network or LAN) is called enp2s0. This will become important further down.

Second, we want to make sure to enable the Secure Shell service here in the installer, to allow remote access after the router goes headless.

After installation has finished, it’s good practice to patch the computer by running sudo apt update && sudo apt upgrade and then rebooting it.

Basic network configuration

The first thing to do after logging in is to configure the network. The WAN port usually gets its address information automatically from your ISP, so for that interface we want to enable DHCP. The LAN port, on the other hand, will need a static configuration. All this is configured using Netplan in Ubuntu. The installer leaves a default configuration file in /etc/netplan, so let’s just edit that one:

network:
  ethernets:
    enp1s0:
      dhcp4: true
    enp2s0:
      dhcp4: false
      addresses: [10.199.200.1/24]
      nameservers:
        search: [mydomain.com]
        addresses: [10.199.200.1]
    enp3s0:
      dhcp4: false
    enp5s0:
      dhcp4: false
  version: 2

At this point it’s worth noting that if you already have something on the IP address 10.199.200.1, the two devices will fight it out and there’s no telling who will win – that’s why I chose an uncommon address in this howto.

To perform an initial test of the configuration, run sudo netplan try. To confirm the configuration, run sudo netplan apply.

A router will also need to be able to forward network packets from one interface to another, which means telling the kernel to allow it. Editing /etc/sysctl.conf makes the change permanent, and reloading it with sysctl -p makes it take effect immediately.

(Bonus knowledge: the sed command line below edits the file in place (-i), substituting (s) the commented-out string (starting with #) for the active one. We could edit the file by hand instead – and if we don’t know exactly what we’re looking for, that’s probably the faster way to get it right – but since I had just done this, I knew exactly the change I wanted to make.)

sudo sed -i 's/#net.ipv4.ip_forward=1/net.ipv4.ip_forward=1/' /etc/sysctl.conf
sudo sysctl -p
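To confirm the flag took effect, query it directly; it should now report net.ipv4.ip_forward = 1:

sudo sysctl net.ipv4.ip_forward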

Great, so our computer can get an IP address from our ISP, it has an IP address on our local network, and it can technically forward packets but we haven’t told it how yet. Now what?

Router

As mentioned, routing functionality in this case will be provided by nftables:

sudo apt install nftables

This is where things get interesting. Below is my current /etc/nftables.conf file, thoroughly commented to show how the various instructions fit together:

#!/usr/sbin/nft -f

# Clear out any existing rules
flush ruleset

# Our future selves will thank us for noting what cable goes where and labeling the relevant network interfaces if it isn't already done out-of-the-box.
define WANLINK = enp1s0 # NIC1
define LANLINK = enp2s0 # NIC4

# I will be presenting the following services to the Internet. You perhaps won't, in which case the following line should be commented out with a # sign similar to this line.
define PORTFORWARDS = { http, https }

# We never expect to see the following address ranges on the Internet
define BOGONS4 = { 0.0.0.0/8, 10.0.0.0/8, 10.64.0.0/10, 127.0.0.0/8, 127.0.53.53, 169.254.0.0/16, 172.16.0.0/12, 192.0.0.0/24, 192.0.2.0/24, 192.168.0.0/16, 198.18.0.0/15, 198.51.100.0/24, 203.0.113.0/24, 224.0.0.0/4, 240.0.0.0/4, 255.255.255.255/32 }

# The actual firewall starts here
table inet filter {
    # Additional rules for traffic from the Internet
    chain inbound_world {
        # Drop obviously spoofed inbound traffic
        ip saddr { $BOGONS4 } drop
    }
    # Additional rules for traffic from our private network
    chain inbound_private {
        # We want to allow remote access over ssh, incoming DNS traffic, and incoming DHCP traffic
        ip protocol . th dport vmap { tcp . 22 : accept, udp . 53 : accept, tcp . 53 : accept, udp . 67 : accept }
    }
    # Our funnel for inbound traffic from any network
    chain inbound {
        # Default deny
        type filter hook input priority 0; policy drop;
        # Allow established and related connections: lets Internet servers respond to requests from our internal network
        counter ct state vmap { established : accept, related : accept, invalid : drop }

        # ICMP is - mostly - our friend. Limit incoming pings somewhat but allow necessary information.
        icmp type echo-request counter limit rate 5/second accept
        # (echo-request is deliberately left out of the set below, or the rate limit above would be moot)
        ip protocol icmp icmp type { destination-unreachable, echo-reply, source-quench, time-exceeded } accept
        # Drop obviously spoofed loopback traffic
        iifname "lo" ip daddr != 127.0.0.0/8 drop

        # Separate rules for traffic from the Internet and from the internal network
        iifname vmap { lo : accept, $WANLINK : jump inbound_world, $LANLINK : jump inbound_private }
    }
    # Rules for sending traffic from one network interface to another
    chain forward {
        # Default deny, again
        type filter hook forward priority 0; policy drop;
        # Accept established and related traffic
        ct state vmap { established : accept, related : accept, invalid : drop }
        # Let traffic from this router and from the internal network get out onto the Internet
        iifname { lo, $LANLINK } accept
        # Only allow specific inbound traffic from the Internet (only relevant if we present services to the Internet).
        # Note the accept at the end: a counter alone would just count the packets before the chain policy drops them.
        tcp dport { $PORTFORWARDS } counter accept
    }
}

# Network address translation: What allows us to glue together a private network with the Internet even though we only have one routable address, as per IPv4 limitations
table ip nat {
    chain prerouting {
        type nat hook prerouting priority -100;
        # Send specific inbound traffic to our internal web server (only relevant if we present services to the Internet).
        iifname $WANLINK tcp dport { $PORTFORWARDS } dnat to 10.199.200.10
    }
    chain postrouting {
        type nat hook postrouting priority 100; policy accept;
        # Pretend that outbound traffic originates in this router so that Internet servers know where to send responses
        oif $WANLINK masquerade
    }
}

To enable the firewall, we’ll enable the nftables service and load our configuration file (the shebang line lets the file be executed directly as long as it’s marked executable; sudo nft -f /etc/nftables.conf works regardless):

sudo systemctl enable nftables.service && sudo systemctl start nftables.service
sudo /etc/nftables.conf

To look at our active ruleset, we can run sudo nft list ruleset.

At this point we have a working router and perimeter firewall for our network. What’s missing is DHCP, so that other devices on the network can get an IP address and access the network, and DNS, so that they can look up human-readable names like duckduckgo.com and convert them to IP addresses like 52.142.124.215. The basic functionality is extremely simple and I’ll detail it in the next few paragraphs, but doing it well is worth its own article, which will follow.

DNS

The simplest way to achieve DNS functionality is to install what much of the Internet runs on:

sudo apt install bind9
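Out of the box, BIND happily performs recursive lookups for clients on directly attached networks, but it doesn’t hurt to be explicit about who may use it. Here is a minimal sketch of /etc/bind/named.conf.options, assuming the addresses from the Netplan configuration above – proper DNS configuration is the subject of the follow-up article:

acl lan {
        10.199.200.0/24;
        127.0.0.0/8;
};

options {
        directory "/var/cache/bind";
        recursion yes;
        allow-query { lan; };
        listen-on { 127.0.0.1; 10.199.200.1; };
};

Remember to run sudo systemctl restart bind9 after any configuration change.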

DHCP

We’ll run one of the most common DHCP servers here too:

sudo apt install isc-dhcp-server

DHCP not only tells clients their IP address, but it also tells them which gateway to use to access other networks, and it informs them of services like DNS. To set up a basic configuration, let’s edit /etc/dhcp/dhcpd.conf:

subnet 10.199.200.0 netmask 255.255.255.0 {
    range 10.199.200.100 10.199.200.254;
    option subnet-mask 255.255.255.0;
    option routers 10.199.200.1;
    option domain-name-servers 10.199.200.1;
}
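On Ubuntu, the isc-dhcp-server service also reads /etc/default/isc-dhcp-server to decide which interfaces it should serve. Binding it to the LAN interface only (enp2s0 in my case) keeps it from answering on the WAN port:

INTERFACESv4="enp2s0"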

Load the new settings by restarting the DHCP server:

sudo systemctl restart isc-dhcp-server

And that’s it, really. Check back in for the next article which will describe how to make DNS and DHCP cooperate to enhance your local network quality of life.

Set up TPM support in vCenter on Dell R7515

Quick HowTo/reminder to myself on how to activate TPM on ESXi hosts connected to vCenter.

The smoothest way is to configure the servers before they are connected to vCenter: Otherwise they must be removed from the inventory and re-added.

The BIOS security settings must be correctly configured:

[Screenshot: Dell R7515 BIOS menu with System Security highlighted]

Select System Security.

[Screenshot: Dell R7515 BIOS System Security submenu, TPM Security section]

TPM Security must be turned On.

[Screenshot: Dell R7515 BIOS TPM Advanced Settings submenu]

Under the TPM Advanced Settings menu, TPM2 Algorithm Selection must be set to SHA256.

[Screenshot: Dell R7515 System Security submenu, Secure Boot section]

Back in the System Security menu, Secure Boot must be Enabled.

Boot the server and add it to vCenter.

Enable the SSH service and log on to the server. Check the TPM status:

# esxcli system settings encryption get | grep Mode
   Mode: NONE

Set the mode to TPM:

# esxcli system settings encryption set --mode TPM

Get the encryption keys and store them somewhere safe, like a password manager:

# esxcli system settings encryption recovery list
Recovery ID                             Key
--------------------------------------  ---
{....}                                  ....

In vCenter, you’ll see a warning for each host about the encryption key backup status – that warning refers to this last step. If you’re confident the recovery ID and key for each host are securely stored, reset the warning to green. The hosts are now utilizing their TPM capability.

Fixing vSAN driver compatibility on Dell R7515

A while back, we purchased some vSAN Ready nodes for a new cluster. The machines came with ESXi installed in an all-NVMe configuration, but when setting up vSAN, Skyline Health kept complaining that the driver used for the write-intensive cache drives wasn’t certified for this purpose.

I opened support cases with both VMware and Dell, as I was in a hurry to get the machines running but didn’t know where the problem lay – we had an identically specced cluster that had been manually installed with vSphere 7 earlier, where this issue did not occur. Unfortunately neither support case ended with a viable resolution: I seem to have gotten stuck with first-line support in both cases and didn’t have time to nag my way to higher levels of support – the shibboleet code word never seems to work in real life.

I finally compared which drivers actually were in use on the new servers versus the old ones, and realized the cache disks on the new servers erroneously used the intel-nvme-vmd driver, while on the older hosts all disks used VMware’s own nvme-pcie driver. The solution, then, was very simple:

For each host, I first set the machine in Maintenance Mode, enabled the ssh service, and logged in.

I then verified my suspicion:

esxcli software vib list | grep nvme
(...)
intel-nvme-vmd                 2.5.0.1066-1OEM.700.1.0.15843807     INT      VMwareCertified   2021-04-19
nvme-pcie                      1.2.3.11-1vmw.702.0.0.17630552       VMW      VMwareCertified   2021-05-29
(...)
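Note that the VIB list only shows what’s installed, not what’s actually bound to the hardware. To see which driver each storage adapter is using, this command prints the driver per adapter:

esxcli storage core adapter list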

I removed the erroneously used driver:

esxcli software vib remove -n intel-nvme-vmd

And finally I rebooted the server. Rinse and repeat for each machine in the cluster.

After I was done, I re-checked Skyline Health for the cluster, and was greeted with the expected green tickmarks:

[Screenshot: Skyline Health showing green tickmarks for all tested items]

Reflections on Proxmox VE

I’ve now been using Proxmox VE as a hypervisor in my home lab for a couple of years, and as I’ve reverted to plain Ubuntu Server + KVM, I figured I would try to summarize my thoughts on the product.

Proxmox VE can be described as a low-cost and open-source alternative to VMware vSphere with aspects of vSAN and NSX. The prospect is excellent, and the system scales beautifully all the way from a single (home) lab server with a single traditionally formatted hard drive up to entire clusters with distributed object storage via Ceph; all in a pretty much turnkey solution. If I was involved in setting up an on-prem IT environment for a small- to medium-sized business today, Proxmox VE would definitely be on my shortlist.

So if it’s so good, what made me go back to a regular server distribution?

Proxmox VE, like all complete solutions, works best when you understand the developers’ design paradigm and follow it – at least roughly. It is nominally based on a Debian core, but the additional layers of abstraction want to take over certain functionality, and it’s simply best to let them. Trying to apply a configuration that somehow competes with Proxmox VE will introduce occasional papercuts into your life: containers that fail to come back up after a restart now and then, ZFS pools that occasionally don’t mount properly, and so on. Note that I’m sure I caused these problems myself through various customizations, so I’m not throwing any shade on the product per se; but the fact remains that I wanted to manage my specific physical hosts in ways that differed from how Proxmox VE would like me to manage them, and that combination made the environment less than optimal.

As these servers are only used and managed by me and I do perfectly fine in a command line interface or using scripts and playbooks, I’ve come to the conclusion that I prefer a minimalist approach and so I’m back to running simple Ubuntu servers with ZFS storage pools for virtual machines and backups, and plain KVM for my hypervisor layer. After the initial setup – a weekend project I will write up for another post – I have the best kind of server environment at home: One I more or less never have to touch unless I want to.

Email address tags in Postfix and Dovecot

What if you could tag the mail address you provide when registering for various services, to simplify the management of the inevitable stream of unsolicited mail that follows? If you could register myname+theservicename@mydomain.tld, it would make it very easy to recognize mail from that service – and it would make it easy to pinpoint common leaks, whether the service had its customer database cracked or just sold it to the highest bidder.

The most famous provider of such a service might be Google’s Gmail. But if you run a Postfix server, this functionality is included and may actually already be turned on out-of-the-box. In your main.cf it looks like this:

recipient_delimiter = +

The delimiter can basically be any character that’s valid in the local part of an email address, but obviously you want to avoid using characters that actually are in use in your environment (dots (.) and dashes (-) come to mind).
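To check what your installation currently uses, query Postfix directly:

postconf recipient_delimiter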

By default, though, such mail won’t actually get delivered if you use Dovecot with a relatively default configuration for storing mail. The reason is that the + character needs to be explicitly allowed. To fix this, find the auth_username_chars setting and add the + character to it (remembering to uncomment the line):

auth_username_chars = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890.-_@+

That’s it: A single step to enable some additional useful functionality on your mail server.
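As a bonus: if your Dovecot delivery goes through the Pigeonhole Sieve plugin, the tag can also be used to file mail automatically. Here is a sketch, assuming the subaddress, variables, fileinto and mailbox extensions are available – the folder naming depends on your namespace separator:

require ["envelope", "subaddress", "variables", "fileinto", "mailbox"];

# File tagged mail into a folder named after the tag, creating the folder if needed
if envelope :detail :matches "to" "*" {
        fileinto :create "INBOX/${1}";
}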

ZFS backups in Proxmox – Part 2

A while ago I wrote about trying out pve-zsync for backing up some Proxmox VE entities. I kept using regular Proxmox backups for the other machines, though: it is a robust way to get recoverable machine backups, but it’s not very elegant. For example, all backups are full: there’s no logic for managing incremental or differential backups. The last straw was a bug in the Proxmox web interface where these native full backups kept landing on my SSD-backed disk pool, which is stupid for two reasons: a) it gave me no on-site protection from disk failures, which – after user error – is the most likely reason to need a backup, and b) it used up valuable space on my most expensive pool. Needless to say, I scrapped that backup solution (and pve-zsync) completely.

My new solution is based entirely on Jim Salter’s excellent tools sanoid and syncoid. Sanoid now gives me hourly ZFS snapshots of all of my virtual machines and containers and of my base system, with timely purging of old snapshots. On my production server, syncoid makes sure these snapshots are cloned to my backup pool, and on my off-site server, syncoid fetches snapshots from the backup pool on the production server to its own backup pool. This means I have a better, cleaner, faster and most importantly working backup solution with considerably less clutter than before: A config file for sanoid and a few cron jobs to trigger syncoid in the right way.
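For the curious, the moving parts are small. Here is a sketch of what the setup can look like – the pool and dataset names are examples rather than my actual layout:

# /etc/sanoid/sanoid.conf
[rpool/vms]
        use_template = production
        recursive = yes

[template_production]
        hourly = 36
        daily = 30
        monthly = 3
        autosnap = yes
        autoprune = yes

# /etc/cron.d/syncoid on the production server: clone snapshots to the local backup pool
0 * * * * root /usr/sbin/syncoid -r rpool/vms backup/vms

# /etc/cron.d/syncoid on the off-site server: pull from the production server's backup pool
30 * * * * root /usr/sbin/syncoid -r root@prod:backup/vms backup/vms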

Troubleshooting vSphere update woes

It’s 2020 and I still occasionally stumble on products that can’t handle international characters.

I’ve been running my update rounds on our vSphere environment, but one host simply refused to perform its update compliance check.

To troubleshoot, I enabled the ssh service and remoted in to the host, looking for errors in /var/log/vua.log. Sure enough, I found an interesting error message:

--> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 33643: ordinal not in range(128)

The byte 0xc3 is the lead byte of UTF-8 encoded Nordic characters like å, ä and ö, so I grep’d the output of esxcfg-info until I found the culprit:

esxcfg-info | grep å
               |----Name............................................Virtual Lab Tången vSAN
                     |----Portset Name..............................Virtual Lab Tången vSAN
                     |----Virtual Switch............................Virtual Lab Tången vSAN
                     |----Virtual Switch............................Virtual Lab Tången vSAN
                     |----Virtual Switch............................Virtual Lab Tången vSAN
                     |----Virtual Switch............................Virtual Lab Tången vSAN
                     |----Virtual Switch............................Virtual Lab Tången vSAN
                     |----Virtual Switch............................Virtual Lab Tången vSAN
                     |----Virtual Switch............................Virtual Lab Tången vSAN
                  |----Name.........................................Virtual Lab Tången vSAN
                        |----Portset Name...........................Virtual Lab Tången vSAN
                        |----Virtual Switch.........................Virtual Lab Tången vSAN
                        |----Virtual Switch.........................Virtual Lab Tången vSAN
                        |----Virtual Switch.........................Virtual Lab Tången vSAN
                        |----Virtual Switch.........................Virtual Lab Tången vSAN
                        |----Virtual Switch.........................Virtual Lab Tången vSAN
                        |----Virtual Switch.........................Virtual Lab Tången vSAN
                        |----Virtual Switch.........................Virtual Lab Tången vSAN
            |----World Command Line.................................grep å

A vLab I created for a couple of my Veeam SureBackup jobs had a Nordic character in its name, and blocked updates. After removing all traces of the virtual lab and the Standard Switch it had created on the host, the same command showed no traces of characters outside of the limited ASCII set, and updating the host went as smoothly as it usually does.

Lesson learned: Client-side issues with localization may have mostly been solved for a decade or two, but server-side there are still reasons – not good ones, but reasons – to stick to plain English descriptors for everything.

Enabling the booking of Teams meetings in Outlook on Mac

This issue had me scratching my head for a while: With the latest version of Microsoft Office and Microsoft Teams installed on my Mac running Catalina, I couldn’t enable the booking of Teams meetings from Outlook.

The solution turned out to be to remove the regular Office programs and replace them with Office 365. The official instructions for how to do that said to log on to https://www.office.com or to https://aka.ms/office-install. Well, tough luck: There was no way to find a download link there.

Instead the correct way seems to be to download Microsoft 365 from the App Store. There was no obvious way to connect the Office suite to my work account, so I started Outlook and tried adding an account. This triggered a dialog offering to activate a trial or connect to an existing subscription, with the perhaps ill-chosen options Activate and Cancel. It turns out that if you press Activate, you get to choose whether you actually want to activate the trial or activate Microsoft 365 with an existing account.

While the gods of good UX and the Law of Least Astonishment cry alone in a cave, I now do have a button to schedule a Teams meeting in Outlook. If only I could get the Calendar and Datadog apps installed in Teams, my life would be complete…

Oh, and speaking of great user experience: Incoming calls in Teams on the Mac do not quite steal focus – thanks for that, at least – but they hog cmd+shift+D so that attempting to send a mail from Mail.app will decline the incoming call. That’s not a great design choice, Microsoft. Now why would anybody want to use Mail.app instead of Outlook? Simple: Snappiness and good search. I can accept jumping through some hoops for things I rarely do, if my day-to-day tasks aren’t nerfed by software that feels slow and bloated.