SPDK

xNVMe provides a kernel-bypassing backend implemented using SPDK. SPDK is built and embedded in the main static xNVMe library libxnvme.a.

Compiling and linking your code with xNVMe

To compile, for example, the following hello.c which uses xNVMe:

#include <stdio.h>
#include <libxnvme.h>

int
main(int argc, char **argv)
{
	struct xnvme_opts opts = xnvme_opts_default();
	struct xnvme_dev *dev;

	opts.nsid = 1;

	dev = xnvme_dev_open("0000:03:00.0", &opts);
	if (!dev) {
		perror("xnvme_dev_open");
		return 1;
	}
	xnvme_dev_pr(dev, XNVME_PR_DEF);
	xnvme_dev_close(dev);

	return 0;
}

Then invoke the compiler with the following linker flags:

gcc ../backends/xnvme_be_spdk/hello.c \
	-Wl,--whole-archive -Wl,--no-as-needed \
	-lxnvme \
	-Wl,--no-whole-archive -Wl,--as-needed \
	-laio -luuid -lnuma -pthread \
	-o hello

Note

You do not need to link with SPDK/DPDK/liburing, as these are bundled with xNVMe. However, do take note of the linker flags surrounding -lxnvme; these are required because SPDK makes use of __attribute__((constructor)). Without them, none of the SPDK transports will work, as the constructors will be “linked out”, and xNVMe will give you errors such as device not found.
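
The linker behaviour the note refers to can be illustrated with a small, self-contained C program (not part of xNVMe): a function marked __attribute__((constructor)) runs before main(), and if the linker discards the object holding it, whatever registration it performs simply never happens.

#include <stdio.h>

/* SPDK registers its transports from constructor functions like this one;
 * if the linker drops the archive member containing such a function, the
 * registration never runs. The --whole-archive flag prevents that. */
__attribute__((constructor))
static void
register_transport(void)
{
	printf("constructor ran before main()\n");
}

int
main(void)
{
	printf("main() running\n");
	return 0;
}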

Running this:

chmod +x hello
./hello

Should yield output similar to:

xnvme_dev:
  xnvme_ident:
    trgt: '0000:03:00.0'
    schm: 'pci'
    opts: '?nsid=1'
    uri: 'pci:0000:03:00.0?nsid=1'
  xnvme_be:
    async: {id: 'nvme_driver', enabled: 1}
    sync: {id: 'nvme_driver', enabled: 1}
    attr: {name: 'spdk', enabled: 1}
  xnvme_cmd_opts:
    mask: '00000000000000000000000000000001'
    iomd: 'SYNC'
    payload_data: 'DRV'
    payload_meta: 'DRV'
    csi: 0x0
    nsid: 0x1
    ssw: 9
  xnvme_geo:
    type: XNVME_GEO_CONVENTIONAL
    npugrp: 1
    npunit: 1
    nzone: 1
    nsect: 16777216
    nbytes: 512
    nbytes_oob: 0
    tbytes: 8589934592
    mdts_nbytes: 524288
    lba_nbytes: 512
    lba_extended: 0

Note that the device identifier is hardcoded in the examples. You can use xnvme enum to enumerate your devices and their associated identifiers.
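
If you prefer not to hardcode the identifier, a minimal variation of hello.c can take it from the command line; this sketch reuses only the calls already shown above:

#include <stdio.h>
#include <libxnvme.h>

int
main(int argc, char **argv)
{
	struct xnvme_opts opts = xnvme_opts_default();
	struct xnvme_dev *dev;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <device-identifier>\n", argv[0]);
		return 1;
	}

	dev = xnvme_dev_open(argv[1], &opts);
	if (!dev) {
		perror("xnvme_dev_open");
		return 1;
	}
	xnvme_dev_pr(dev, XNVME_PR_DEF);
	xnvme_dev_close(dev);

	return 0;
}

Link it with the same flags as before and pass e.g. pci:0000:03:00.0?nsid=1 as the argument.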

Unbinding devices and setting up memory

By running the command below, 4GB of hugepage memory will be configured (HUGEMEM is given in megabytes) and the device detached from the kernel NVMe driver:

HUGEMEM=4096 xnvme-driver

The xnvme-driver script is a merge of the SPDK setup.sh script and its dependencies.

The command above should produce output similar to:

0000:03:00.0 (1b36 0010): nvme -> vfio-pci
0000:00:02.0 (1af4 1001): Active mountpoints on /dev/vda, so not binding

If anything other than the above is output from setup.sh, for example:

0000:01:00.0 (1d1d 2807): nvme -> uio_generic

Or:

Current user memlock limit: 16 MB

This is the maximum amount of memory you will be
able to use with DPDK and VFIO if run as current user.
To change this, please adjust limits.conf memlock limit for current user.

## WARNING: memlock limit is less than 64MB
## DPDK with VFIO may not be able to initialize if run as current user.

Then consult the section Enabling VFIO without limits.
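
The memlock warning above reflects the RLIMIT_MEMLOCK resource limit. As a quick, xNVMe-independent sanity check, the limit your process actually gets can be queried with getrlimit(); a minimal sketch using standard POSIX calls:

#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
	struct rlimit rlim;

	if (getrlimit(RLIMIT_MEMLOCK, &rlim)) {
		perror("getrlimit");
		return 1;
	}
	if (rlim.rlim_cur == RLIM_INFINITY) {
		printf("memlock limit: unlimited\n");
	} else {
		printf("memlock limit: %llu MB\n",
		       (unsigned long long)rlim.rlim_cur / (1024 * 1024));
	}

	return 0;
}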

Re-binding devices

Run the following:

xnvme-driver reset

Should produce output similar to:

0000:03:00.0 (1b36 0010): vfio-pci -> nvme
0000:00:02.0 (1af4 1001): Already using the virtio-pci driver

Device Identifiers

Since devices are no longer available in /dev, PCI identifiers are used instead, such as pci:0000:03:00.0?nsid=1, e.g. using the CLI:

xnvme-driver
xnvme info pci:0000:03:00.0?nsid=1

And using the API, it would look similar to:

...
struct xnvme_opts opts = xnvme_opts_default();
struct xnvme_dev *dev = xnvme_dev_open("pci:0000:01:00.0?nsid=1", &opts);
...
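
A complete version of the fragment above could look like the sketch below; it assumes the geometry accessor xnvme_dev_get_geo() from the public API and prints a couple of the fields seen in the xnvme_geo output earlier:

#include <stdio.h>
#include <libxnvme.h>

int
main(void)
{
	struct xnvme_opts opts = xnvme_opts_default();
	struct xnvme_dev *dev;
	const struct xnvme_geo *geo;

	dev = xnvme_dev_open("pci:0000:01:00.0?nsid=1", &opts);
	if (!dev) {
		perror("xnvme_dev_open");
		return 1;
	}

	geo = xnvme_dev_get_geo(dev);	/* assumed accessor for 'struct xnvme_geo' */
	printf("lba_nbytes: %u\n", (unsigned)geo->lba_nbytes);
	printf("tbytes: %llu\n", (unsigned long long)geo->tbytes);

	xnvme_dev_close(dev);

	return 0;
}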

Enabling VFIO without limits

If the NVMe device is rebound to uio_generic instead of vfio-pci, then VT-d is probably either unsupported or disabled. In either case, try these steps:

  1. Verify that your CPU supports VT-d and that it is enabled in BIOS.

  2. Enable the IOMMU in your kernel by providing the kernel option intel_iommu=on. If you have a non-Intel CPU, then consult the documentation on enabling VT-d / IOMMU for your CPU.

  3. Increase the memlock limits: open /etc/security/limits.conf and add:

*    soft memlock unlimited
*    hard memlock unlimited
root soft memlock unlimited
root hard memlock unlimited

Once you have gone through these steps, this command:

dmesg | grep "DMAR: IOMMU"

Should output:

[    0.021117] DMAR: IOMMU enabled

And this command:

find /sys/kernel/iommu_groups/ -type l

Should have output similar to:

/sys/kernel/iommu_groups/7/devices/0000:01:00.0
/sys/kernel/iommu_groups/5/devices/0000:00:05.0
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/8/devices/0000:03:00.0
/sys/kernel/iommu_groups/8/devices/0000:02:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:1f.2
/sys/kernel/iommu_groups/6/devices/0000:00:1f.0
/sys/kernel/iommu_groups/6/devices/0000:00:1f.3
/sys/kernel/iommu_groups/4/devices/0000:00:04.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0

And user-space driver setup:

HUGEMEM=4096 xnvme-driver

Should rebind the device to vfio-pci, e.g.:

0000:03:00.0 (1b36 0010): nvme -> vfio-pci
0000:00:02.0 (1af4 1001): Active mountpoints on /dev/vda, so not binding

Inspecting and manually changing memory available to SPDK aka HUGEPAGES

The SPDK setup script provides the HUGEMEM and NRHUGE environment variables to control the amount of memory available via hugepages. However, if you want to manually change or just inspect the hugepage configuration, then have a look below.

Inspect the system configuration by running:

grep . /sys/devices/system/node/node0/hugepages/hugepages-2048kB/*

If you have not yet run the setup script, then it will most likely output:

/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0

And after running the setup script it should output:

/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:1024
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:1024
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0

This tells us that 1024 hugepages, each of size 2048kB, are available; that is, a total of two gigabytes can be used.
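
For reference, the same accounting can be done programmatically by reading the sysfs counter; this is a minimal C sketch, assuming node0 and the 2048kB page size used in the paths above:

#include <stdio.h>

int
main(void)
{
	const char *path =
		"/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages";
	FILE *fp = fopen(path, "r");
	unsigned long long nr;

	if (!fp) {
		perror("fopen");
		return 1;
	}
	if (fscanf(fp, "%llu", &nr) != 1) {
		fprintf(stderr, "could not parse %s\n", path);
		fclose(fp);
		return 1;
	}
	fclose(fp);

	/* Each hugepage is 2048kB = 2MiB */
	printf("%llu hugepages, %llu bytes total\n", nr, nr * 2048ULL * 1024ULL);

	return 0;
}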

One way of increasing the memory available to SPDK is to increase the number of 2048kB hugepages. E.g. increase from two to eight gigabytes by setting nr_hugepages to 4096:

echo "4096" > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

After doing this, inspecting the configuration should output:

/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:4096
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:4096
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0