SPDK
xNVMe provides a kernel-bypassing backend implemented using SPDK. SPDK is built and embedded in the main static xNVMe library, libxnvme.a.
Compiling and linking your code with xNVMe
To compile, for example, the following hello.c, which uses xNVMe:
#include <stdio.h>
#include <libxnvme.h>

int
main(int argc, char **argv)
{
	struct xnvme_opts opts = xnvme_opts_default();
	struct xnvme_dev *dev;

	opts.nsid = 1;

	dev = xnvme_dev_open("0000:03:00.0", &opts);
	if (!dev) {
		perror("xnvme_dev_open");
		return 1;
	}

	xnvme_dev_pr(dev, XNVME_PR_DEF);
	xnvme_dev_close(dev);

	return 0;
}
Then invoke the compiler with the following linker flags:
gcc ../backends/xnvme_be_spdk/hello.c \
-Wl,--whole-archive -Wl,--no-as-needed \
-lxnvme \
-Wl,--no-whole-archive -Wl,--as-needed \
-laio -luuid -lnuma -pthread \
-o hello
Note
You do not need to link with SPDK/DPDK/liburing, as these are bundled with xNVMe. However, do take note of the linker flags surrounding -lxnvme; these are required because SPDK makes use of __attribute__((constructor)). Without these flags, none of the SPDK transports will work, as the constructors will be "linked out", and xNVMe will give you errors such as device not found.
Running this:
chmod +x hello
./hello
Should yield output similar to:
xnvme_dev:
  xnvme_ident:
    trgt: '0000:03:00.0'
    schm: 'pci'
    opts: '?nsid=1'
    uri: 'pci:0000:03:00.0?nsid=1'
  xnvme_be:
    async: {id: 'nvme_driver', enabled: 1}
    sync: {id: 'nvme_driver', enabled: 1}
    attr: {name: 'spdk', enabled: 1}
  xnvme_cmd_opts:
    mask: '00000000000000000000000000000001'
    iomd: 'SYNC'
    payload_data: 'DRV'
    payload_meta: 'DRV'
    csi: 0x0
    nsid: 0x1
    ssw: 9
  xnvme_geo:
    type: XNVME_GEO_CONVENTIONAL
    npugrp: 1
    npunit: 1
    nzone: 1
    nsect: 16777216
    nbytes: 512
    nbytes_oob: 0
    tbytes: 8589934592
    mdts_nbytes: 524288
    lba_nbytes: 512
    lba_extended: 0
Note that the device identifier is hardcoded in the examples. You can use xnvme enum to enumerate your devices and their associated identifiers.
Unbinding devices and setting up memory
By running the command below, 4GB of hugepages will be configured and the device detached from the kernel NVMe driver:
HUGEMEM=4096 xnvme-driver
The xnvme-driver script is a merge of the SPDK setup.sh script and its dependencies.
The command above should produce output similar to:
0000:03:00.0 (1b36 0010): nvme -> vfio-pci
0000:00:02.0 (1af4 1001): Active mountpoints on /dev/vda, so not binding
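For reference, HUGEMEM is given in megabytes, and on x86-64 the default hugepage size is 2048 kB, so the number of pages the setup script requests can be computed as:

```shell
# HUGEMEM is in MB; each hugepage is 2048 kB (2 MB) on x86-64.
HUGEMEM=4096
PAGE_KB=2048
echo "hugepages: $((HUGEMEM * 1024 / PAGE_KB))"
```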
If anything other than the above is output from setup.sh, for example:
0000:01:00.0 (1d1d 2807): nvme -> uio_generic
Or:
Current user memlock limit: 16 MB
This is the maximum amount of memory you will be
able to use with DPDK and VFIO if run as current user.
To change this, please adjust limits.conf memlock limit for current user.
## WARNING: memlock limit is less than 64MB
## DPDK with VFIO may not be able to initialize if run as current user.
Then consult the section Enabling VFIO without limits.
Re-binding devices
Run the following:
xnvme-driver reset
This should produce output similar to:
0000:03:00.0 (1b36 0010): vfio-pci -> nvme
0000:00:02.0 (1af4 1001): Already using the virtio-pci driver
Device Identifiers
Since devices are no longer available in /dev, PCI ids are used instead, such as pci:0000:03:00.0?nsid=1, e.g. using the CLI:
xnvme-driver
xnvme info pci:0000:03:00.0?nsid=1
And using the API it would be similar to:
...
struct xnvme_opts opts = xnvme_opts_default();
struct xnvme_dev *dev = xnvme_dev_open("pci:0000:01:00.0?nsid=1", &opts);
...
Enabling VFIO without limits
If nvme is rebound to uio_generic, and not vfio, then VT-d is probably not supported or disabled. In either case, try these two steps:
1. Verify that your CPU supports VT-d and that it is enabled in BIOS.
2. Enable it in your kernel by providing the kernel option intel_iommu=on. If you have a non-Intel CPU, then consult the documentation on enabling VT-d / IOMMU for your CPU.
Increase limits by opening /etc/security/limits.conf and adding:
* soft memlock unlimited
* hard memlock unlimited
root soft memlock unlimited
root hard memlock unlimited
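Note that changes to limits.conf only take effect for new login sessions. Two quick sanity checks, assuming a Linux shell:

```shell
# Is the IOMMU option present on the running kernel's command line?
grep -o 'intel_iommu=[^ ]*' /proc/cmdline || echo "intel_iommu not set"

# What is the effective memlock limit for this session?
ulimit -l
```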
Once you have gone through these steps, then this command:
dmesg | grep "DMAR: IOMMU"
Should output:
[ 0.021117] DMAR: IOMMU enabled
And this command:
find /sys/kernel/iommu_groups/ -type l
Should have output similar to:
/sys/kernel/iommu_groups/7/devices/0000:01:00.0
/sys/kernel/iommu_groups/5/devices/0000:00:05.0
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/8/devices/0000:03:00.0
/sys/kernel/iommu_groups/8/devices/0000:02:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:1f.2
/sys/kernel/iommu_groups/6/devices/0000:00:1f.0
/sys/kernel/iommu_groups/6/devices/0000:00:1f.3
/sys/kernel/iommu_groups/4/devices/0000:00:04.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
And user-space driver setup:
HUGEMEM=4096 xnvme-driver
Should rebind the device to vfio-pci, e.g.:
0000:03:00.0 (1b36 0010): nvme -> vfio-pci
0000:00:02.0 (1af4 1001): Active mountpoints on /dev/vda, so not binding
Inspecting and manually changing memory available to SPDK aka HUGEPAGES
The SPDK setup script provides the HUGEMEM and NRHUGE environment variables to control the amount of memory available via hugepages. However, if you want to manually change, or just inspect, the hugepage config, then have a look below.
Inspect the system configuration by running:
grep . /sys/devices/system/node/node0/hugepages/hugepages-2048kB/*
If you have not yet run the setup script, then it will most likely output:
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
And after running the setup script it should output:
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:1024
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:1024
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
This tells us that 1024 hugepages, each of size 2048 kB, are available; that is, a total of two gigabytes can be used.
One way of increasing the memory available to SPDK is by increasing the number of 2048 kB hugepages. E.g. increase from two to eight gigabytes by increasing nr_hugepages to 4096:
echo "4096" > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
After doing this, then inspecting the configuration should output:
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:4096
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:4096
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
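As a sanity check on the totals above, the available memory is simply the page count times the 2048 kB page size:

```shell
# pages x 2048 kB per page, reported in GiB.
for pages in 1024 4096; do
    echo "$pages pages -> $((pages * 2048 / 1024 / 1024)) GiB"
done
```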