SPDK#
xNVMe provides a kernel-bypassing backend implemented using SPDK. SPDK has a lot to offer; however, xNVMe only makes use of the SPDK user-space NVMe driver. That is, the reactor, threading model, and application framework of SPDK are not used by xNVMe.
Device Identifiers#
When using user-space NVMe drivers, such as the SPDK NVMe PMD (poll-mode driver), the operating-system kernel NVMe driver is “detached” and the device bound to vfio-pci or uio-generic. Thus, the device-files in /dev/, such as /dev/nvme0n1, are not available. Devices are instead identified by their PCI id (0000:03:00.0) and namespace identifier.
This information is retrievable via xnvme enum:
xnvme enum
xnvme_enumeration:
- {uri: '0000:03:00.0', dtype: 0x2, nsid: 0x1, csi: 0x0}
- {uri: '0000:03:00.0', dtype: 0x2, nsid: 0x2, csi: 0x2}
This information is usable via the CLI, such as here:
xnvme-driver
xnvme info pci:0000:03:00.0?nsid=1
And using the API it would be similar to:
...
struct xnvme_dev *dev = xnvme_dev_open("0000:03:00.0", opts);
...
Notice that multiple URIs can use the same PCI id but carry different xNVMe options (?opts=<val>). This is provided as a means to tell xNVMe that you want to use the NVMe controller at 0000:03:00.0 and the namespace identified by nsid=1.
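Putting the pieces together in C, a minimal sketch of opening that namespace could look like the one below. The be and nsid fields of struct xnvme_opts are assumptions here; check the definitions in your libxnvme headers.
#include <stdio.h>
#include <libxnvme.h>

int
main(void)
{
	// Select the SPDK backend and namespace 1; field names are assumed,
	// see 'struct xnvme_opts' in the libxnvme headers
	struct xnvme_opts opts = xnvme_opts_default();
	opts.be = "spdk";
	opts.nsid = 1;

	struct xnvme_dev *dev = xnvme_dev_open("0000:03:00.0", &opts);
	if (!dev) {
		return 1;
	}

	xnvme_dev_close(dev);
	return 0;
}
The equivalent inspection via the CLI, after binding the driver, along with its output: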
xnvme-driver
xnvme info 0000:03:00.0 --dev-nsid=1
0000:03:00.0 (1b36 0010): Already using the vfio-pci driver
0000:00:02.0 (1af4 1001): Active mountpoints on /dev/vda, so not binding
xnvme_dev:
  xnvme_ident:
    uri: '0000:03:00.0'
    dtype: 0x2
    nsid: 0x1
    csi: 0x0
  xnvme_be:
    admin: {id: 'nvme'}
    sync: {id: 'nvme'}
    async: {id: 'nvme'}
    attr: {name: 'spdk'}
  xnvme_opts:
    be: 'spdk'
    mem: 'FIX-ID-VS-MIXIN-NAME'
    dev: 'FIX-ID-VS-MIXIN-NAME'
    admin: 'nvme'
    sync: 'nvme'
    async: 'nvme'
    oflags: 0x4
  xnvme_geo:
    type: XNVME_GEO_CONVENTIONAL
    npugrp: 1
    npunit: 1
    nzone: 1
    nsect: 2097152
    nbytes: 4096
    nbytes_oob: 0
    tbytes: 8589934592
    mdts_nbytes: 524288
    lba_nbytes: 4096
    lba_extended: 0
    ssw: 12
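Once the device is open, the same information is available from the API. A small sketch, assuming a dev handle opened as in the previous example; the geometry field names follow the xnvme_geo section of the output above.
#include <stdio.h>
#include <libxnvme.h>

// Print a subset of the geometry shown above; assumes 'dev' was opened as in
// the previous example
static void
print_geo(struct xnvme_dev *dev)
{
	const struct xnvme_geo *geo = xnvme_dev_get_geo(dev);

	printf("nsect: %zu, nbytes: %zu, lba_nbytes: %zu\n",
	       (size_t)geo->nsect, (size_t)geo->nbytes, (size_t)geo->lba_nbytes);

	// Or pretty-print the full device representation, as 'xnvme info' does
	xnvme_dev_pr(dev, XNVME_PR_DEF);
}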
Instrumentation#
…
System Configuration#
…
Driver Attachment and memory#
By default, it is the operating system's responsibility to provide device drivers; thus, by default, the operating system has bound its NVMe driver to the NVMe devices attached to your system. For a user-space driver to operate, the operating-system driver must be detached and the device bound to a stub driver such as vfio-pci or uio_pci_generic, such that the NVMe driver can operate in user space.
Running the command below configures 4GB of hugepages and detaches the device from the Kernel NVMe driver:
HUGEMEM=4096 xnvme-driver
The xnvme-driver script is a merge of the SPDK setup.sh script and its dependencies.
The command above should produce output similar to:
0000:03:00.0 (1b36 0010): nvme -> vfio-pci
0000:00:02.0 (1af4 1001): Active mountpoints on /dev/vda, so not binding
If anything other than the above is output from xnvme-driver, for example:
0000:01:00.0 (1d1d 2807): nvme -> uio_generic
Or:
Current user memlock limit: 16 MB
This is the maximum amount of memory you will be
able to use with DPDK and VFIO if run as current user.
To change this, please adjust limits.conf memlock limit for current user.
## WARNING: memlock limit is less than 64MB
## DPDK with VFIO may not be able to initialize if run as current user.
Then consult the section Enabling VFIO without limits.
Memory Issues#
If you see a message similar to the below while unbinding devices:
Current user memlock limit: 16 MB
This is the maximum amount of memory you will be
able to use with DPDK and VFIO if run as current user.
To change this, please adjust limits.conf memlock limit for current user.
## WARNING: memlock limit is less than 64MB
## DPDK with VFIO may not be able to initialize if run as current user.
Then you should do as suggested, that is, adjust limits.conf; for an example, see System Configuration.
Re-binding devices#
Run the following:
xnvme-driver reset
Should output similar to:
0000:03:00.0 (1b36 0010): vfio-pci -> nvme
0000:00:02.0 (1af4 1001): Already using the virtio-pci driver
Enabling VFIO without limits#
If nvme is rebound to uio_generic, and not vfio-pci, then VT-d is probably not supported or disabled. In either case, try these two steps:
Verify that your CPU supports VT-d and that it is enabled in BIOS.
Enable your kernel by providing the kernel option intel_iommu=on. If you have a non-Intel CPU then consult documentation on enabling VT-d / IOMMU for your CPU.
Increase limits: open /etc/security/limits.conf and add:
* soft memlock unlimited
* hard memlock unlimited
root soft memlock unlimited
root hard memlock unlimited
Once you have gone through these steps, this command:
dmesg | grep "DMAR: IOMMU"
Should output:
[ 0.021117] DMAR: IOMMU enabled
And this command:
find /sys/kernel/iommu_groups/ -type l
Should have output similar to:
/sys/kernel/iommu_groups/7/devices/0000:01:00.0
/sys/kernel/iommu_groups/5/devices/0000:00:05.0
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/8/devices/0000:03:00.0
/sys/kernel/iommu_groups/8/devices/0000:02:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:1f.2
/sys/kernel/iommu_groups/6/devices/0000:00:1f.0
/sys/kernel/iommu_groups/6/devices/0000:00:1f.3
/sys/kernel/iommu_groups/4/devices/0000:00:04.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
And user-space driver setup:
HUGEMEM=4096 xnvme-driver
Should rebind the device to vfio-pci, e.g.:
0000:03:00.0 (1b36 0010): nvme -> vfio-pci
0000:00:02.0 (1af4 1001): Active mountpoints on /dev/vda, so not binding
Inspecting and manually changing memory available to SPDK aka HUGEPAGES#
The SPDK setup script provides the HUGEMEM and NRHUGE environment variables to control the amount of memory available via HUGEPAGES. However, if you want to manually change or just inspect the HUGEPAGE configuration, then have a look below.
Inspect the system configuration by running:
grep . /sys/devices/system/node/node0/hugepages/hugepages-2048kB/*
If you have not yet run the setup script, then it will most likely output:
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
And after running the setup script it should output:
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:1024
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:1024
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
This tells you that 1024 hugepages, each of size 2048kB, are available, that is, a total of two gigabytes can be used.
One way of increasing the memory available to SPDK is by increasing the number of 2048kB hugepages. E.g. increase from two to eight gigabytes by increasing nr_hugepages to 4096:
echo "4096" > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
After doing this, inspecting the configuration should output:
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:4096
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:4096
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
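As a side note on how this memory is consumed: when the SPDK backend is in use, I/O payload buffers should be allocated with the xNVMe buffer allocator rather than plain malloc, since they must come from DMA-able, hugepage-backed memory. A minimal sketch, assuming a dev handle opened as in the earlier examples:
#include <stdio.h>
#include <libxnvme.h>

// Allocate and release a 4KB I/O buffer; with the SPDK backend this memory
// is drawn from the hugepages configured above. Assumes 'dev' was opened as
// in the earlier examples.
static int
buffer_example(struct xnvme_dev *dev)
{
	void *buf = xnvme_buf_alloc(dev, 4096);
	if (!buf) {
		fprintf(stderr, "xnvme_buf_alloc() failed\n");
		return -1;
	}

	// ... use 'buf' as the payload for NVMe commands ...

	xnvme_buf_free(dev, buf);
	return 0;
}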
No devices found#
When running xnvme enum and the output listing is empty, then no devices were found. When running with vfio-pci, this can occur when your devices share an iommu-group with other devices which are still bound to in-kernel drivers. These could be NICs, GPUs, or other kinds of peripherals.
The division of devices into groups is not something that can be easily switched, but you can try to manually unbind the other devices in the iommu group from their kernel drivers.
If that is not an option, then you can try to re-organize the physical connectivity of the devices, e.g. move devices around.
Lastly, you can try using uio_pci_generic instead; this is most easily done by disabling the IOMMU, that is, by adding the kernel option iommu=off to the kernel command-line and rebooting.
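From the API side, a missing or unattached device simply shows up as a failure to open it: xnvme_dev_open() returns NULL and, by convention, sets errno. A small diagnostic sketch, reusing the example PCI id from above:
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <libxnvme.h>

int
main(void)
{
	struct xnvme_opts opts = xnvme_opts_default();
	opts.be = "spdk";

	// Example PCI id from above; open fails with NULL when the device is
	// not found or not bound to a user-space capable driver
	struct xnvme_dev *dev = xnvme_dev_open("0000:03:00.0", &opts);
	if (!dev) {
		fprintf(stderr, "xnvme_dev_open(): %s\n", strerror(errno));
		return 1;
	}

	xnvme_dev_close(dev);
	return 0;
}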
User Space#
Linux provides the Userspace I/O (uio) and Virtual Function I/O (vfio) frameworks for writing user-space I/O drivers. Both interfaces work by binding a given device to an in-kernel stub driver. The stub driver in turn exposes device memory and device interrupts to user space, thus enabling the implementation of device drivers entirely in user space.
Although Linux provides a capable NVMe driver with flexible IOCTLs, a user-space NVMe driver serves those who seek the lowest possible per-command processing overhead or want full control over NVMe command construction, including command payloads.
Fortunately, you do not need to go and write a user-space NVMe driver, since a highly efficient, mature, and well-maintained driver already exists: namely, the NVMe driver provided by the Storage Performance Development Kit (SPDK).
Another great fortune is that xNVMe bundles the SPDK NVMe Driver with the xNVMe library. So, if you have built and installed xNVMe then the SPDK NVMe Driver is readily available to xNVMe.
The following subsections go through a configuration checklist, then show how to bind and unbind drivers, and lastly how to utilize non-devfs device identifiers by enumerating the system and inspecting a device.
Config#
What remains is checking your system configuration, enabling IOMMU for use by the vfio-pci driver, and possibly falling back to the uio_pci_generic driver in case vfio-pci is not working out. vfio is preferred, as hardware support for IOMMU allows for isolation between devices.
Verify that your CPU supports virtualization / VT-d and that it is enabled in your board BIOS.
To enable your kernel for an Intel CPU, provide the kernel option intel_iommu=on. If you have a non-Intel CPU, then consult documentation on enabling VT-d / IOMMU for your CPU.
Increase limits: open /etc/security/limits.conf and add:
* soft memlock unlimited
* hard memlock unlimited
root soft memlock unlimited
root hard memlock unlimited
Once you have gone through these steps and rebooted, this command:
dmesg | grep "DMAR: IOMMU"
Should output:
[ 0.021117] DMAR: IOMMU enabled
And this command:
find /sys/kernel/iommu_groups/ -type l
Should have output similar to:
/sys/kernel/iommu_groups/7/devices/0000:01:00.0
/sys/kernel/iommu_groups/5/devices/0000:00:05.0
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/8/devices/0000:03:00.0
/sys/kernel/iommu_groups/8/devices/0000:02:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:1f.2
/sys/kernel/iommu_groups/6/devices/0000:00:1f.0
/sys/kernel/iommu_groups/6/devices/0000:00:1f.3
/sys/kernel/iommu_groups/4/devices/0000:00:04.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
Unbinding and binding#
With the system configured, you can use the xnvme-driver script to bind and unbind devices. The xnvme-driver script is a merge of the SPDK setup.sh script and its dependencies.
Running the command below configures 4GB of hugepages, unbinds the Kernel NVMe driver, and binds vfio-pci to the device:
HUGEMEM=4096 xnvme-driver
The command above should produce output similar to:
0000:03:00.0 (1b36 0010): nvme -> vfio-pci
0000:00:02.0 (1af4 1001): Active mountpoints on /dev/vda, so not binding
To unbind from vfio-pci and rebind the Kernel NVMe driver, run:
xnvme-driver reset
Should output similar to:
0000:03:00.0 (1b36 0010): vfio-pci -> nvme
0000:00:02.0 (1af4 1001): Already using the virtio-pci driver