Flexible Data Placement#
This section is a guide to FDP, i.e. Flexible Data Placement, support in xNVMe. It walks through examples of all the supported FDP log pages, Set/Get Features, and I/O Management commands, and also covers the Write command with placement hints from the perspective of fio's xNVMe ioengine.
Concepts and Prelude#
FDP enhances the NVM Command Set by enabling host-guided data placement. It introduces the Reclaim Unit (RU), a logical representation of non-volatile storage within a Reclaim Group that the controller can physically erase without disturbing any other Reclaim Units.
A Placement Identifier is a data structure that specifies a Reclaim Group Identifier and a Placement Handle that references a Reclaim Unit.
A Placement Handle is a namespace-scoped handle that maps to an Endurance Group scoped Reclaim Unit Handle, which references a Reclaim Unit in each Reclaim Group.
A Reclaim Unit Handle (RUH) is a controller resource that references a Reclaim Unit in each Reclaim Group.
For complete information on FDP, see the ratified technical proposal TP4146 Flexible Data Placement (2022.11.30 Ratified), which can be found at https://nvmexpress.org/wp-content/uploads/NVM-Express-2.0-Ratified-TPs_20230111.zip
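To make the packing concrete: per our reading of TP4146, when the Reclaim Group Identifier Format (RGIF) field is non-zero, the most significant RGIF bits of the 16-bit Placement Identifier hold the Reclaim Group Identifier and the remaining low bits hold the Placement Handle. The following is a minimal C sketch of that packing; the helper name is hypothetical and not part of the xNVMe API:

#include <stdint.h>

/*
 * Minimal sketch, per our reading of TP4146: pack a Reclaim Group
 * Identifier and a Placement Handle into a 16-bit Placement Identifier.
 * rgif is the Reclaim Group Identifier Format field reported in the FDP
 * configurations log page (retrieved below). Hypothetical helper, not
 * an xNVMe API.
 */
static inline uint16_t
fdp_pid_pack(uint16_t rgid, uint16_t phndl, uint8_t rgif)
{
	if (!rgif) {
		/* No Reclaim Group bits; the Placement Identifier is the handle */
		return phndl;
	}

	return (uint16_t)(rgid << (16 - rgif)) | phndl;
}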
Get log page#
There are four new log pages associated with FDP: FDP Configurations, Reclaim Unit Handle Usage, FDP Statistics, and FDP Events. All of these log pages are Endurance Group scoped, so you need to specify the Endurance Group Identifier in the Log Specific Identifier field.
Each of the four log pages can be retrieved with the xNVMe CLI, as shown below.
The FDP Configurations log page requires dynamic memory allocation, as there can be multiple configurations, each with multiple Reclaim Unit Handles. You will have to specify the data size in bytes. The command can be run like:
xnvme log-fdp-config /dev/nvme3n1 --data-nbytes=512 --lsi 0x1
The command should produce output similar to:
# Allocating and clearing buffer...
# Retrieving FDP configurations log page ...
xnvme_spec_log_fdp_conf:
ncfg: 0
version: 0
size: 112
config_desc: 0
ds: 96
fdp attributes: { rgif: 6 fdpvwc: 0 fdpcv: 1 val: 0x86 }
vss: 0
nrg: 32
nruh: 8
maxpids: 127
nns: 256
runs: 40960
erutl: 0
- ruht[0]: 1
- ruht[1]: 1
- ruht[2]: 1
- ruht[3]: 1
- ruht[4]: 1
- ruht[5]: 1
- ruht[6]: 1
- ruht[7]: 1
For the FDP Statistics log page, the command can be run like:
xnvme log-fdp-stats /dev/nvme3n1 --lsi 0x1
The command should produce output similar to:
# Allocating and clearing buffer...
# Retrieving FDP statistics log page ...
xnvme_spec_log_fdp_stats:
hbmw: [2097152, 0]
mbmw: [2342912, 0]
mbe: [0, 0]
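Each of these statistics is a 128-bit counter, printed as a [low 64 bits, high 64 bits] pair: hbmw is Host Bytes with Metadata Written, mbmw is Media Bytes with Metadata Written, and mbe is Media Bytes Erased. A minimal sketch of how the halves combine, assuming a compiler with __int128 support (the helper name is hypothetical):

#include <stdint.h>

/*
 * Minimal sketch: combine the [low, high] halves of a 128-bit FDP
 * statistic. The hbmw pair above combines to 2097152, i.e. 2 MiB of
 * host writes. Hypothetical helper; assumes GCC/Clang __int128 support.
 */
static inline unsigned __int128
fdp_stat_combine(uint64_t lo, uint64_t hi)
{
	return ((unsigned __int128)hi << 64) | lo;
}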
Similar to the FDP Configurations log page, you will have to specify the number of Reclaim Unit Handle Usage descriptors to fetch. The command can be run like:
xnvme log-ruhu /dev/nvme3n1 --lsi 0x1 --limit 4
The command should produce output similar to:
# Allocating and clearing buffer...
# Retrieving ruhu-log ...
# 4 reclaim unit handle usage:
xnvme_spec_log_ruhu:
nruh: 8
- ruhu_desc[0]: 0x1
- ruhu_desc[1]: 0
- ruhu_desc[2]: 0
- ruhu_desc[3]: 0
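Each ruhu_desc value is a Reclaim Unit Handle attribute. Per our reading of TP4146, 0 means the handle is unused, 1 means it is host specified, and 2 means it is controller specified; in the output above, only the first handle is in use by the host. An illustrative decode (this enum is not an xNVMe type):

/*
 * Illustrative Reclaim Unit Handle attribute values, per our reading of
 * TP4146; this enum is not part of the xNVMe API.
 */
enum ruh_attribute {
	RUH_ATTR_UNUSED = 0x0,         /* Not referenced by a placement handle */
	RUH_ATTR_HOST_SPECIFIED = 0x1, /* In use, placement specified by host */
	RUH_ATTR_CTRL_SPECIFIED = 0x2, /* In use, placement chosen by controller */
};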
The FDP Events log page can contain multiple events. You will have to specify the number of events you want to fetch, and whether you want host or controller events; the latter is selected through the Log Specific Parameter field. The complete command can be run like:
xnvme log-fdp-events /dev/nvme3n1 --nsid 0x1 --limit 2 --lsi 0x1 --lsp 0x1
The command should produce output similar to:
# Allocating and clearing buffer...
# Retrieving fdp-events-log ...
# 2 fdp events log page entries:
xnvme_spec_log_fdp_events:
nevents: 1
- {type: 0, fdpef: 0x7, pid: 0, timestamp: 564682036659163, nsid: 1, rgid: 0, ruhid: 0, }
- {type: 0, fdpef: 0, pid: 0, timestamp: 0, nsid: 0, rgid: 0, ruhid: 0, }
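The fdpef field carries the FDP event flags. Per our reading of TP4146, bit 0 indicates that the Placement Identifier field is valid, bit 1 that the NSID field is valid, and bit 2 that the event location is valid, so fdpef: 0x7 in the first entry marks all three as valid. An illustrative decode (the struct and helper are hypothetical, not xNVMe types):

#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative decode of the FDP event flags (fdpef), per our reading
 * of TP4146; the struct and helper are not part of the xNVMe API.
 */
struct fdp_event_flags {
	bool piv;   /* Bit 0: Placement Identifier Valid */
	bool nsidv; /* Bit 1: Namespace Identifier Valid */
	bool lv;    /* Bit 2: Location Valid */
};

static inline struct fdp_event_flags
fdp_event_flags_decode(uint8_t fdpef)
{
	return (struct fdp_event_flags){
		.piv = fdpef & 0x1,
		.nsidv = fdpef & 0x2,
		.lv = fdpef & 0x4,
	};
}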
Set and get-feature#
FDP introduces two new Feature Identifiers for the Set Features and Get Features commands. These are Flexible Data Placement, which controls the operation of the FDP capability in the specified Endurance Group, and FDP Events, which controls whether a controller generates FDP Events associated with a specific Reclaim Unit Handle.
xNVMe does not support Namespace Management commands. Thus, we cannot enable or disable FDP by sending a Set Features command to the Endurance Group, as doing so requires the deletion of all namespaces in that Endurance Group. However, you can check the FDP capability by sending a Get Features command. The command can be run like:
xnvme feature-get /dev/nvme3n1 --fid 0x1d --cdw11 0x1
The command should produce output similar to:
# cmd_gfeat: {nsid: 0x1, fid: 0x1d, sel: 0x0}
feat: { fdpe: 1, fdpci: 0 }
Command Dword 12 controls whether FDP Events are enabled or disabled. You will have to specify the number of event types to enable or disable and the Placement Handle they are associated with; both are packed into the feat field (Command Dword 11). You will also have to specify the size in bytes of the data buffer, which holds the list of event types. To enable all the FDP Events, you can run the command like:
xnvme set-fdp-events /dev/nvme3n1 --fid 0x1e --feat 0x60000 --cdw12 0x1
The command should produce output similar to:
# cmd_sfeat: {nsid: 01, fid: 0x1e, save: 0x0, feat: 0x60000, cdw12: 0x1}
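To see where the --feat value comes from: per our reading of TP4146, Command Dword 11 of the FDP Events feature packs the Placement Handle into bits 15:0 and the number of FDP event types into the bits directly above, so six event types with Placement Handle 0 yields 0x60000. A minimal sketch (the helper name is hypothetical):

#include <stdint.h>

/*
 * Minimal sketch, per our reading of TP4146: build Command Dword 11 for
 * the FDP Events feature. fdp_events_cdw11(0, 6) yields 0x60000, the
 * --feat value used above. Hypothetical helper, not an xNVMe API.
 */
static inline uint32_t
fdp_events_cdw11(uint16_t phndl, uint8_t noet)
{
	return ((uint32_t)noet << 16) | phndl;
}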
You can get the status of the supported FDP Events. Command Dword 11 remains the same as for the Set Features command. You can run the command like:
xnvme feature-get /dev/nvme3n1 --fid 0x1e --cdw11 0xFF0000 --data-nbytes 510
The command should produce output similar to:
# cmd_gfeat: {nsid: 0x1, fid: 0x1e, sel: 0x0}
nevents: 6 }
{ type: 0, event enabled: 1 }
{ type: 0x1, event enabled: 1 }
{ type: 0x2, event enabled: 1 }
{ type: 0x3, event enabled: 1 }
{ type: 0x80, event enabled: 0 }
{ type: 0x81, event enabled: 0 }
I/O Management#
Two I/O Management commands are introduced with FDP. These are I/O Management Send and I/O Management Receive.
I/O Management Receive supports the Reclaim Unit Handle Status operation. You will have to specify the number of Reclaim Unit Handle Status descriptors to fetch. You can run the command like:
xnvme fdp-ruhs /dev/nvme3n1 --limit 4
The command should produce output similar to:
# Allocating and clearing buffer...
# Retrieving ruhs ...
# 4 reclaim unit handle status:
xnvme_spec_ruhs:
nruhsd: 128
- ruhs_desc[0] : { pi: 0 ruhi: 0 earutr: 0 ruamw: 10}
- ruhs_desc[1] : { pi: 1024 ruhi: 0 earutr: 0 ruamw: 10}
- ruhs_desc[2] : { pi: 2048 ruhi: 0 earutr: 0 ruamw: 10}
- ruhs_desc[3] : { pi: 3072 ruhi: 0 earutr: 0 ruamw: 10}
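The pi values are Placement Identifiers and follow the packing described in the concepts section: with rgif: 6 from the configurations log page, the low 10 bits hold the Placement Handle, so pi: 1024 decodes to Reclaim Group 1 with Placement Handle 0, pi: 2048 to Reclaim Group 2, and so on. A minimal sketch of the inverse of the packing shown earlier (hypothetical helper):

#include <stdint.h>

/*
 * Minimal sketch: split a 16-bit Placement Identifier into its Reclaim
 * Group Identifier and Placement Handle, given the RGIF value from the
 * FDP configurations log page. With rgif = 6, pid 1024 yields rgid 1
 * and phndl 0. Hypothetical helper, not an xNVMe API.
 */
static inline void
fdp_pid_unpack(uint16_t pid, uint8_t rgif, uint16_t *rgid, uint16_t *phndl)
{
	*rgid = rgif ? pid >> (16 - rgif) : 0;
	*phndl = pid & ((1u << (16 - rgif)) - 1);
}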
I/O Management Send supports the Reclaim Unit Handle Update operation. You will have to specify a Placement Identifier for it. You can run the command like:
xnvme fdp-ruhu /dev/nvme3n1 --pid 0x0
The command should produce output similar to:
# Updating ruh ...
FIO xnvme ioengine#
fio's xNVMe ioengine has provided FDP support since the fio 3.35 release.
This support is only available with the NVMe character device, i.e. /dev/ng0n1, and with user-space drivers such as SPDK.
Since the kernel support is limited to the NVMe character device, you can only use the FDP functionality with the xnvme_sync=nvme or xnvme_async=io_uring_cmd backends.
To enable FDP mode, you will have to specify the fio option fdp=1.
Two additional, optional FDP-specific fio options can be specified:
fdp_pli=x,y,..
This can be used to specify an index or comma-separated indices of placement identifiers. The index or indices refer to the placement identifiers returned by the Reclaim Unit Handle Status command. If you don't specify this option, fio will use all the available placement identifiers from the Reclaim Unit Handle Status command.
fdp_pli_select=str
You can specify random or roundrobin as the string literal. This tells fio which placement identifier to select next after every write operation. If you don't specify this option, fio will round robin over the available placement identifiers.
Have a look at the example configuration file at: axboe/fio; a sketch of such a job file is shown below.
This configuration tells fio to use the placement identifiers present at index 4 and 5 of the Reclaim Unit Handle Status descriptors. By default, fio uses a round robin mechanism for selecting the next placement identifier.
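The following is a minimal sketch of such a job file. The FDP options (fdp, fdp_pli, fdp_pli_select) are real fio options; the remaining values are illustrative assumptions chosen to match the run output shown below:

; Hypothetical job file along the lines of the example above
[default]
rw=randwrite
size=2M
bs=4k
iodepth=1
thread=1
; Enable FDP mode and write to the placement identifiers at index 4
; and 5 of the Reclaim Unit Handle Status descriptor list
fdp=1
fdp_pli=4,5
; Optional; fio round robins over the selected identifiers by default
; fdp_pli_select=roundrobin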
Using the above-mentioned configuration, you can run the fio command like this:
fio ../tutorial/fdp/examples/xnvme-fdp.fio --section=default --ioengine=xnvme --xnvme_async=io_uring_cmd --filename=/dev/ng3n1
The command should produce output similar to:
default: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=xnvme, iodepth=1
fio-3.36
Starting 1 thread
default: (groupid=0, jobs=1): err= 0: pid=117232: Wed Nov 20 06:13:56 2024
write: IOPS=9846, BW=38.5MiB/s (40.3MB/s)(2048KiB/52msec); 0 zone resets
slat (nsec): min=161, max=1892, avg=386.06, stdev=116.28
clat (usec): min=69, max=172, avg=99.96, stdev=10.21
lat (usec): min=69, max=172, avg=100.34, stdev=10.25
clat percentiles (usec):
| 1.00th=[ 80], 5.00th=[ 87], 10.00th=[ 92], 20.00th=[ 94],
| 30.00th=[ 96], 40.00th=[ 98], 50.00th=[ 99], 60.00th=[ 101],
| 70.00th=[ 102], 80.00th=[ 104], 90.00th=[ 109], 95.00th=[ 117],
| 99.00th=[ 141], 99.50th=[ 159], 99.90th=[ 174], 99.95th=[ 174],
| 99.99th=[ 174]
lat (usec) : 100=55.86%, 250=44.14%
cpu : usr=5.88%, sys=94.12%, ctx=1, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,512,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=38.5MiB/s (40.3MB/s), 38.5MiB/s-38.5MiB/s (40.3MB/s-40.3MB/s), io=2048KiB (2097kB), run=52-52msec
Note
If you see no output, then try running it as super-user or via sudo.