Switch-Based NVMe Hotplug – a Few Attempts, and one Success

Let’s say you’ve just bought a chassis with an NVMe backplane, or retrofitted one into your existing chassis. Now it’s time to see whether you can get hotplugging and backplane management working.

First of all, PCIe hotplugging is hard. It’s nothing new – after all, PCI hotplug has been around in the form of CardBus (PCMCIA) cards for decades, and PCIe got the same treatment with the later ExpressCard standard. But the reality is that whether it’s a laptop with a card slot, a system with Thunderbolt, or a server with an NVMe backplane, it’s one of those things that you can only expect to work seamlessly if you buy a full OEM system validated for that purpose. If you cobble together a machine from parts, it’s much more difficult to get any sort of PCIe hotplugging working.

I went through this recently after adding a U.2 backplane. Here are a few things I tried, some of which worked better than others.

For reference, I’m doing this on an X9DRD board, with a BPN-SAS3-826EL1-N4 backplane, and two KCM51VUG1T60 drives. Unlike newer Supermicro boards where you can use a simple redriver or retimer card to get NVMe hotplug, a switch/bridge is likely the only way to get hotplugging working on these.

First Attempt: Oracle NVMe Switch

This is the Oracle NVMe switch card. It goes by a few different model numbers, including 706434 and NVMSW8.

This one’s pretty short: I bought it because someone was selling them for $35 on eBay. It sounded too good to be true, and it was. It didn’t work at all for me, but for that price, I had to try. The card and its downstream ports showed up, but refused to recognize any drives. Supposedly, the card uses some kind of “lane sharing” to squeeze out more performance, but that is only supported on Oracle Linux and RHEL. On other OSes, it’s supposed to still work as a plain old switch, but I could never get it to work. I found some references to an oprom that might have helped, but my card seemed to have neither a UEFI nor BIOS oprom, nor did it have an HII option in the BIOS. Tested it on multiple hosts, same story every time.

Attempt Two: Ceacent ANU24PE08

Bought it here. This one actually worked, but hotplugging was... weird. I couldn’t get it to work right at first. Then I remembered this very informative article about getting Thunderbolt to work in an unsupported system. I saw some lines in dmesg about bus numbering, so I decided to try it in the highest-numbered slot. Unfortunately, on my X9DRD, the BMC VGA controller always took the highest “slot” number on CPU1. The good news is that CPU2 was wide open, since none of the onboard peripherals are connected to it – it’s only connected to raw PCIe slots. Moving the card to a CPU2 slot actually got it working!
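If you want to see how bus numbers were handed out before and after moving a card, the kernel exposes each bridge's secondary and subordinate bus numbers in sysfs. Here is a rough Python 3 sketch that collects them. The function name `bridge_bus_ranges` and the `root` parameter are just things I made up for illustration (the parameter is there so it can be pointed at a test directory), and this only reports what the kernel assigned, it doesn't fix anything.

```python
from pathlib import Path


def bridge_bus_ranges(root="/sys/bus/pci/devices"):
    """Return {pci_address: (secondary_bus, subordinate_bus)} for every bridge.

    Only PCI bridges have these two sysfs attributes, so ordinary
    endpoints (NVMe drives, NICs, ...) are skipped automatically.
    """
    ranges = {}
    base = Path(root)
    if not base.is_dir():
        return ranges
    for dev in sorted(base.iterdir()):
        sec = dev / "secondary_bus_number"
        sub = dev / "subordinate_bus_number"
        if sec.is_file() and sub.is_file():
            # The kernel prints both values in decimal.
            ranges[dev.name] = (int(sec.read_text()), int(sub.read_text()))
    return ranges


if __name__ == "__main__":
    for addr, (sec, sub) in bridge_bus_ranges().items():
        print("%s: buses %02x-%02x" % (addr, sec, sub))
```

A downstream port whose secondary and subordinate numbers are identical has exactly one bus behind it, which is enough for a single drive but leaves no room for anything deeper.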

However, it’s not perfect. For some reason, /sys/bus/pci/slots only shows one or two slots, rather than the four you’d expect. It also had issues with surprise hotplugging and removal. It would sometimes kind of work on one port, but I wouldn’t count on it. You can still “safely eject” a drive by using echo 1 > /sys/bus/pci/devices/x/remove, and can trigger a rescan with echo 1 > /sys/bus/pci/rescan to make it recognize new devices.
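The two echo commands above can be wrapped in a tiny Python 3 helper if you'd rather script them. This is only a sketch: `remove_device`, `rescan_bus` and the `root` parameter are names I picked for illustration, the PCI address in the usage comment is a placeholder, and it has to run as root on a real system.

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci")


def remove_device(address, root=SYSFS_PCI):
    """Detach one PCI function (same as echo 1 > .../devices/<address>/remove)."""
    (Path(root) / "devices" / address / "remove").write_text("1")


def rescan_bus(root=SYSFS_PCI):
    """Ask the kernel to re-enumerate the PCI tree (same as echo 1 > .../rescan)."""
    (Path(root) / "rescan").write_text("1")


# Example use (placeholder address, substitute your drive's own):
#   remove_device("0000:85:00.0")
#   ...swap the drive...
#   rescan_bus()
```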

Attempt Three: Supermicro AOC-SLG3-2E4

To absolutely nobody’s surprise, using an SMC PCIe switch on an SMC motherboard with an SMC backplane worked the best. I bought one for $41 (and then with my luck, the seller dropped the price to $32 shortly thereafter). I get slot power control, hotplugging works fine in any PCIe slot on either CPU, surprise insertion works, and surprise removal seems to work despite not being officially supported. The only real downside is that it only has two ports rather than four.

The Hard Part: Backplane Management

SMC U.2 backplanes are supposed to support locate/fail as well as telling you when a drive is safe to remove. The AOC-SLG3-2E4 even has a program from SMC made specifically for this card. Let’s give it a shot!

As you can probably guess, it didn’t work. The SMC utilities are built with the PLX SDK, so you have to use a script from the SDK which insmods the PlxSvc.ko and creates the device nodes. Problem is, the bundled version is built against what seems to be a 2015-era kernel, so it wouldn’t load on a 6.x kernel. I downloaded a new version of the PLX SDK, built the driver, and loaded it up with the script. Despite all that, the SMC programs claim the card doesn’t exist.

A bit of Ghidra and GDB later, and I found that yep, it really thinks the card isn’t there. I tried some of the test programs included with the SDK, and they worked fine. Perhaps you need to match the driver version and SDK version? Since the utilities statically link the PLX library, I can’t just run it against a newer version. I changed gears, and instead decided to just pick through the decompiled AOCSLG32E4EnclosureCli code. I found the PLX API calls it was doing, and created a Python script to do the same:

#!/usr/bin/python2

import sys

# The PLX SDK's Python wrappers live in ../code relative to this script
sys.path.insert(0,"../code")

from PlxSdk import *
from const import *
from misc import *
from bar import *

def main():
        SdkLib = LoadLibrary()
        if SdkLib is None:
                print "Library Not loaded"
                sys.exit(1)
        pkey = PLX_DEVICE_KEY()
        pkey.set_bus(84)
        pkey.set_slot(0)
        pkey.set_function(0)
        rc, dp = DeviceOpen(SdkLib,pkey)
        print "rc,dp", rc, dp
        if rc != PLX_STATUS_OK:
                print "Open Failed: %s" % rc
                sys.exit(0)
        rc, data = PlxMappedRegisterRead(SdkLib, dp, 0x2b8)
        print "%s, %X" % PlxMappedRegisterRead(SdkLib, dp, 0x2c0)
        print "rc %s, data %X" % (rc,data)
        # This check specifically doesn't work because 'rc' returns some kind of wrapped integer rather than a normal python type
        if rc != PLX_STATUS_OK:
                pass
                #print "Read Failed: %s" % rc
                #sys.exit(0)
        print "Data old: %X" % data
        #data = (data & 0xff00) | 0x51040000
        data = (data & 0xff0000) | 0x51000100
        print "Data new: %X" % data
        print PlxMappedRegisterWrite(SdkLib, dp, 0x2b8, data)
        print "Wrote part 1"
        # Idk what this is, maybe some kind of 'commit' command?
        print PlxMappedRegisterWrite(SdkLib, dp, 0x2c0, 0x86033)
        print "Wrote part 2"

        DeviceClose(SdkLib, dp)

if __name__ == "__main__":
    main()

I got successful responses from all the API calls, yet no LED. I probably won’t mess around with it more, since there’s not really much left I could try here.
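For anyone trying to follow the register math in that script, here is the read-modify-write step on its own in Python 3, with no SDK involved. I don't know what the individual fields mean, so the function name `build_register_word` is just a label I made up; the mask and constant are copied straight from the script.

```python
def build_register_word(old):
    """Keep bits 16-23 of the old register value and OR in the script's constant."""
    preserved = old & 0x00FF0000      # everything outside bits 16-23 is dropped
    return preserved | 0x51000100     # constant taken from the decompiled utility


if __name__ == "__main__":
    sample = 0x12345678               # arbitrary sample value, not a real readback
    print("old: %08X" % sample)
    print("new: %08X" % build_register_word(sample))   # prints new: 51340100
```

So whatever was in bits 16-23 survives, and the rest of the word is replaced by the constant before it gets written back to 0x2b8.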

However, I’d argue that you don’t really need these to work to be able to locate drives. If you disable the device (echo 1 > /sys/bus/pci/devices/x/remove), the activity LED turns off (it is normally on and blinks off to signal activity).
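To use that trick you need to know which PCI address belongs to which NVMe controller. One way to get that mapping is to follow the `device` symlink under /sys/class/nvme, as in this Python 3 sketch. The function name `nvme_pci_addresses` and the `root` parameter are mine, and I'm assuming plain PCIe-attached drives (fabric-attached controllers won't resolve to a PCI address).

```python
import os
from pathlib import Path


def nvme_pci_addresses(root="/sys/class/nvme"):
    """Return {controller_name: pci_address}, e.g. {"nvme0": "0000:85:00.0"}."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result
    for ctrl in sorted(base.iterdir()):
        device = ctrl / "device"
        if device.exists():
            # The symlink points at the parent PCI function's sysfs directory,
            # whose directory name is the PCI address.
            result[ctrl.name] = os.path.basename(os.path.realpath(device))
    return result


if __name__ == "__main__":
    for name, addr in nvme_pci_addresses().items():
        print(name, addr)
```

The address it prints is the `x` you'd plug into /sys/bus/pci/devices/x/remove.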

But Would It Work on Another System?

No idea. It’s very difficult to say and depends on a large number of factors.

Things to Test

If you’re looking at doing similar testing in another system, here is a non-exhaustive checklist of things you should check:

  1. Boot the system with all the drives attached to make sure that at least basic functionality is working.
  2. Boot the system with no drives attached, then make sure they hotplug correctly. If #1 works but #2 doesn’t, it’s likely that there’s an issue with reserving bus numbers or some other PCI topology issue.
  3. Check that each switch port is correctly exposed in /sys/bus/pci/slots. If this is your only hotplug-supporting bridge, you should see slots ‘0’ and ‘0-1’ for a two-port switch. A four-port switch would also have ‘0-2’ and ‘0-3’. You can additionally confirm by reading /sys/bus/pci/slots/x/address.
  4. Coordinated hotplug – check that echo 1 > /sys/bus/pci/devices/x/remove and echo 0 > /sys/bus/pci/slots/x/power remove the device from the PCIe topology, and that echo 1 > /sys/bus/pci/slots/x/power and echo 1 > /sys/bus/pci/rescan re-add it.
  5. If surprise insertion is desired – with slot power on (echo 1 > /sys/bus/pci/slots/x/power), do you see messages in dmesg and does the drive function after inserting the drive?
  6. Surprise removal – remove the drive unexpectedly – does it remove itself from the topology, and work once again if you re-insert it? You should wait for a while after removing the drive, as rapid removal and insertion of a device is known to confuse some setups.
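Items 3 and 4 of the checklist can be partly scripted. Below is a Python 3 sketch that dumps every slot's address and power state and toggles slot power. The names `read_slots` and `set_slot_power` and the `root` parameter are my own inventions for illustration; not every slot has a `power` file, so a missing one is reported as None, and writing to it needs root.

```python
from pathlib import Path

SLOTS = "/sys/bus/pci/slots"


def read_slots(root=SLOTS):
    """Return {slot_name: {"address": ..., "power": ...}} for each slot directory."""
    slots = {}
    base = Path(root)
    if not base.is_dir():
        return slots
    for slot in sorted(base.iterdir()):
        info = {}
        for attr in ("address", "power"):
            path = slot / attr
            # A slot without hotplug power control simply has no "power" file.
            info[attr] = path.read_text().strip() if path.is_file() else None
        slots[slot.name] = info
    return slots


def set_slot_power(name, on, root=SLOTS):
    """Same as echo 1 (or 0) > /sys/bus/pci/slots/<name>/power."""
    (Path(root) / name / "power").write_text("1" if on else "0")


if __name__ == "__main__":
    for name, info in read_slots().items():
        print("slot %s: address=%s power=%s" % (name, info["address"], info["power"]))
```

If a two-port switch only shows one entry here, you're in the same situation I was with the Ceacent card.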

Other Devices

There are a few other candidates for NVMe PCIe switches:

  1. Linkreal switches (I believe DiLinKer/DiLiVing switches are also rebrands of these): I found a test report that seems to indicate that at least one of their switch models (LRNV9349-8I) supports coordinated insertion. However, the test report doesn’t have quite enough information to make me take a gamble on a card that costs more than double what a similarly-specced Ceacent switch would cost. Plus, the model for which I found the test report is an x16 card, and I need an x8 at most.
  2. Supermicro AOC-SLG3-4E2P: It would probably work fine. The problem is that it’s $200 – as long as I have spare PCIe slots, it’s cheaper to just buy more AOC-SLG3-2E4 cards. Even without getting a great deal, the usual going rate for the AOC-SLG3-2E4 is about $70, and I ended up buying three of them for $100 total.
  3. Just get a newer system. You can get a newer motherboard with NVMe ports on the motherboard, or at least one that supports hotplugging on a redriver/retimer card, which will typically be significantly cheaper. If you’re running an ancient system, you’ll potentially run into bottlenecks anyway when using NVMe. Even a single-socket first gen Epyc would be a huge upgrade compared to a system like this, and doesn’t cost all that much.
