Archive for June, 2024

How to Fix Ceph Error “cluster_uuid file exists with value X, != our uuid Y”

Saturday, June 8th, 2024

This error can occur when you are recovering a cluster from its OSDs and the cluster_uuid extracted during the recovery does not match the FSID stored in the monmap. To fix it, rewrite the FSID in the monitor's monmap:

# Replace 'pve1' with the name of your monitor
# Stop the monitor
systemctl stop ceph-mon@pve1
# Extract the monitor map to a file called monmap
ceph-mon -i pve1 --extract-monmap monmap
# Change the FSID (set NEW_FSID beforehand to the UUID the monmap should carry)
monmaptool --clobber --fsid "$NEW_FSID" monmap
# Make any other changes via monmaptool, such as rewriting the monitor list
# Inject the new monmap
ceph-mon -i pve1 --inject-monmap monmap
# Start service
systemctl start ceph-mon@pve1
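
Before injecting the edited monmap, it can be worth checking that NEW_FSID is well-formed and that the monmap file really carries it. The following is a minimal sketch, not part of the original procedure: is_uuid and monmap_fsid are hypothetical helper names, and the parser assumes that `monmaptool --print` output contains a line of the form "fsid <uuid>".

```shell
# Sketch only: is_uuid and monmap_fsid are hypothetical helpers, not Ceph tools.

# Succeeds if $1 has the 8-4-4-4-12 hexadecimal shape of a UUID.
is_uuid() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Reads `monmaptool --print` output on stdin and prints the FSID value.
# Assumption: the output contains a line of the form "fsid <uuid>".
monmap_fsid() {
    awk '$1 == "fsid" { print $2; exit }'
}

# Example usage (needs the Ceph tools and the extracted monmap file):
#   is_uuid "$NEW_FSID" || { echo "NEW_FSID is not a UUID" >&2; exit 1; }
#   [ "$(monmaptool --print monmap | monmap_fsid)" = "$NEW_FSID" ] \
#       && echo "monmap carries the new FSID"
```

Running the check between the monmaptool step and the inject step catches an unset or mistyped NEW_FSID before the monitor store is modified.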


Mini-Review: Supermicro X11SDV-16C-TP8F

Saturday, June 8th, 2024

With a whopping 16C32T CPU, this board probably has the most powerful embedded CPU I’ve used. I scored a nice deal on one, intending to eventually replace one of my older X10SDV quad-core boards.

The Good

  • 16C32T
  • Lower power (100W TDP CPU)
  • Supports up to 512GB RAM in LRDIMMs, or 256GB of RDIMMs
  • Good I/O – One PCIe x16 slot, one x8 slot, two x4 ports for U.2, an x4 M.2, an x2 M.2, and a miniPCIe slot
  • Compact (Micro-ATX)
  • Six fan headers should be enough even for many server chassis designs
  • BMC can monitor temperature of NVMe drives on the U.2 ports

The Bad

  • Intel X710 for the 10G LANs, despite the block diagram in the manual showing an X557-AT2 like the X10SDV series. This is my first time using the X710, and I can see why they have their reputation.
  • U.2 ports have a few issues, most of which are likely due to the fact that they are using flex I/O lanes rather than normal PCIe lanes:
    • They support coordinated hot add/removal, but only if a device was plugged into them at boot.
    • If no device was connected at boot, they don’t even get a bus number, which can cause other devices to change PCI addresses (definitely the BMC, probably the B-key M.2 and miniPCIe as well).
    • While they can measure NVMe temperature, the reading is sometimes wrong, leading to fans spinning up for no reason.
    • No VMD support.

Haven’t Tried

  • Haven’t attempted to use the U.2 ports with a backplane to see if LEDs work.
  • The BIOS says that the x16 slot supports VMD and hotplug. I haven’t tried either directly in the slot, but hotplug events do seem to work through a PCIe switch card with known-good hotplug support, so I don’t see why the slot itself wouldn’t.