Deduplication is a technique for reducing the consumption of storage resources by eliminating multiple copies of duplicate blocks. Compression takes the individual unique blocks and shrinks them. These reduced blocks are then efficiently packed together into physical blocks. Thin provisioning manages the mapping from logical blocks presented by VDO to where the data has actually been physically stored, and also eliminates any blocks of all zeroes.
With deduplication, instead of writing the same data more than once, VDO detects and records each duplicate block as a reference to the original block. VDO maintains a mapping from Logical Block Addresses (LBA) (used by the storage layer above VDO) to physical block addresses (used by the storage layer under VDO). After deduplication, multiple logical block addresses may be mapped to the same physical block address; these are called shared blocks and are reference-counted by the software.
With compression, VDO compresses multiple blocks (or shared blocks) with the fast LZ4 algorithm and bins them together where possible, so that multiple compressed blocks fit within a single 4 KB block on the underlying storage. The mapping from an LBA then points to a physical block address and an index within that block that identifies the desired compressed data. All compressed blocks are individually reference-counted for correctness.
Block sharing and block compression are invisible to applications using the storage, which read and write blocks as they would if VDO were not present. When a shared block is overwritten, a new physical block is allocated for storing the new block data to ensure that other logical block addresses that are mapped to the shared physical block are not modified.
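As a rough illustration of the block sharing described above, the following sketch counts how many 4096-byte blocks of a file are unique. It is not how VDO itself detects duplicates: the helper name count_blocks is hypothetical, and sha256sum is used only as a stand-in fingerprint for this example.

```shell
# Sketch only: sha256sum is a stand-in fingerprint, not VDO's internal hash.
# count_blocks FILE prints "<total> <unique>" for the 4096-byte blocks in FILE.
count_blocks() {
    file=$1
    size=$(( $(wc -c < "$file") ))
    total=$(( (size + 4095) / 4096 ))
    i=0
    unique=$(
        # Fingerprint each block, then count the distinct fingerprints.
        while [ "$i" -lt "$total" ]; do
            dd if="$file" bs=4096 skip="$i" count=1 2>/dev/null | sha256sum
            i=$((i + 1))
        done | sort -u | wc -l
    )
    echo "$total $((unique))"
}
```

A large gap between the two numbers suggests that the data contains many duplicate blocks. The loop reads one block per dd call, so it is only practical for small sample files.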
To use VDO with lvm(8), you must install the standard VDO user-space tools vdoformat(8) and the currently non-standard kernel VDO module "kvdo".
The "kvdo" module implements fine-grained storage virtualization, thin provisioning, block sharing, and compression. The "uds" module provides memory-efficient duplicate identification. The user-space tools include vdostats(8) for extracting statistics from VDO volumes.
Note: TRIM/Discard operations are slow on large VDO volumes. Avoid sending discard requests unless necessary, because a discard operation might take a considerable amount of time to finish.
lvcreate --type vdo -n VDOLV -L DataSize -V LargeVirtualSize VG/VDOPoolLV
lvcreate --vdo -L DataSize VG
Example
# lvcreate --type vdo -n vdo0 -L 10G -V 100G vg/vdopool0
# mkfs.ext4 -E nodiscard /dev/vg/vdo0
lvconvert --type vdo-pool -n VDOLV -V VirtualSize VG/VDOPoolLV
lvconvert --vdopool VG/VDOPoolLV
Example
# lvconvert --type vdo-pool -n vdo0 -V10G vg/ExistingLV
Example
# cat <<EOF > /etc/lvm/profile/vdo_create.profile
allocation {
    vdo_use_compression=1
    vdo_use_deduplication=1
    vdo_use_metadata_hints=1
    vdo_minimum_io_size=4096
    vdo_block_map_cache_size_mb=128
    vdo_block_map_period=16380
    vdo_check_point_frequency=0
    vdo_use_sparse_index=0
    vdo_index_memory_size_mb=256
    vdo_slab_size_mb=2048
    vdo_ack_threads=1
    vdo_bio_threads=1
    vdo_bio_rotation=64
    vdo_cpu_threads=2
    vdo_hash_zone_threads=1
    vdo_logical_threads=1
    vdo_physical_threads=1
    vdo_write_policy="auto"
    vdo_max_discard=1
}
EOF

# lvcreate --vdo -L10G --metadataprofile vdo_create vg/vdopool0
# lvcreate --vdo -L10G --config 'allocation/vdo_cpu_threads=4' vg/vdopool1
lvchange --compression [y|n] --deduplication [y|n] VG/VDOPoolLV
Example
# lvchange --compression n vg/vdopool0
# lvchange --deduplication y vg/vdopool1
Note: vdostats(8) currently understands only /dev/mapper device names.
Example
# lvcreate --type vdo -L10G -V20G -n vdo0 vg/vdopool0
# mkfs.ext4 -E nodiscard /dev/vg/vdo0
# lvs -a vg

  LV               VG Attr       LSize  Pool     Origin Data%
  vdo0             vg vwi-a-v--- 20.00g vdopool0        0.01
  vdopool0         vg dwi-ao---- 10.00g                 30.16
  [vdopool0_vdata] vg Dwi-ao---- 10.00g

# vdostats --all /dev/mapper/vg-vdopool0-vpool
/dev/mapper/vg-vdopool0 :
  version                             : 30
  release version                     : 133524
  data blocks used                    : 79
  ...
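Because vdostats(8) expects a /dev/mapper name, it can help to derive that name from the VG and VDOPoolLV names. The sketch below uses a hypothetical helper, vdo_dm_path. It assumes the "-vpool" suffix seen in the example above and the usual device-mapper convention of doubling any hyphen inside a VG or LV name. Verify the result against the actual contents of /dev/mapper before relying on it.

```shell
# Hypothetical helper: vdo_dm_path VG VDOPOOLLV
# Prints the assumed /dev/mapper path of the VDO pool device.
vdo_dm_path() {
    # Device-mapper names join VG and LV with a single hyphen,
    # so hyphens inside either name are doubled.
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    # The "-vpool" suffix is taken from the example above (assumption).
    printf '/dev/mapper/%s-%s-vpool\n' "$vg" "$lv"
}
```

For instance, the output of vdo_dm_path vg vdopool0 could be passed to vdostats as its device argument.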
You can also enable automatic size extension of a monitored VDOPoolLV with the activation/vdo_pool_autoextend_percent and activation/vdo_pool_autoextend_threshold settings.
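To get a feel for how the two autoextend settings interact, the following sketch computes the pool size after one automatic extension. The function name and its arguments are hypothetical. It assumes that an extension is triggered once pool usage exceeds the threshold, and that the pool then grows by the given percentage of its current size. Check lvm.conf(5) for the exact semantics on your system.

```shell
# Hypothetical helper:
#   pool_size_after_autoextend SIZE_GIB USED_PERCENT THRESHOLD PERCENT
# Prints the pool size in GiB after at most one automatic extension.
pool_size_after_autoextend() {
    size=$1
    used=$2
    threshold=$3
    percent=$4
    if [ "$used" -gt "$threshold" ]; then
        # Assumed behavior: grow by PERCENT of the current size.
        echo $(( size + size * percent / 100 ))
    else
        # Usage has not passed the threshold, so the size is unchanged.
        echo "$size"
    fi
}
```

With a 10 GiB pool, a threshold of 70, and a percent value of 20, a pool that is 75% full would grow to 12 GiB under these assumptions, while a pool that is 50% full would stay at 10 GiB.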
Note: You cannot reduce the size of a VDOPoolLV.
Note: You cannot change the size of a cached VDOPoolLV.
lvextend -L+AddingSize VG/VDOPoolLV
Example
# lvextend -L+50G vg/vdopool0
# lvresize -L300G vg/vdopool1
Note: A size reduction must process TRIM for the removed disk area in order to unmap its used data blocks from the VDOPoolLV, which might take a long time.
lvextend -L+AddingSize VG/VDOLV
lvreduce -L-ReducingSize VG/VDOLV
Example
# lvextend -L+50G vg/vdo0
# lvreduce -L-50G vg/vdo1
# lvresize -L200G vg/vdo2
Example
# lvchange -ay vg/vpool0_vdata
# lvchange -an vg/vpool0_vdata
Example
# lvcreate --type raid1 -L 5G -n vdopool vg
# lvconvert --type vdo-pool -V 10G vg/vdopool
A cached VDO data LV currently cannot be resized, and threshold-based automatic resizing does not work for it.
Example
# lvcreate --type vdo -L 5G -V 10G -n vdo1 vg/vdopool
# lvcreate --type cache-pool -L 1G -n cachepool vg
# lvconvert --cache --cachepool vg/cachepool vg/vdopool
# lvconvert --uncache vg/vdopool
Example
# lvcreate --type vdo -L 5G -V 10G -n vdo1 vg/vdopool
# lvcreate --type cache-pool -L 1G -n cachepool vg
# lvconvert --cache --cachepool vg/cachepool vg/vdo1
# lvconvert --uncache vg/vdo1
When blocks of a block device are rewritten, they are automatically reused for new data. Discard is useful when the user knows that a given portion of a VDO LV is no longer going to be used, so that the discarded space can be used for block provisioning in other regions of the VDO LV. For the same reason, avoid running mkfs with discard on a freshly created VDO LV: the device is already expected to be empty, and skipping the discard saves a lot of time.
UDS requires a minimum of 250 MiB of RAM, which is also the default amount that deduplication uses.
The memory required for the UDS index is determined by the index type and the required size of the deduplication window. The index type is controlled by the allocation/vdo_use_sparse_index setting.
When sparse indexing is enabled, UDS relies on the temporal locality of data and attempts to retain only the most relevant index entries in memory. A sparse index can maintain a deduplication window that is ten times larger than that of a dense index while using the same amount of memory.
Although the sparse index provides the greatest coverage, the dense index provides more deduplication advice. For most workloads, given the same amount of memory, the difference in deduplication rates between dense and sparse indexes is negligible.
A dense index with 1 GiB of RAM maintains a 1 TiB deduplication window, while a sparse index with 1 GiB of RAM maintains a 10 TiB deduplication window. In general, 1 GiB is sufficient for 4 TiB of physical space with a dense index and 40 TiB with a sparse index.
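The window sizes quoted above can be turned into a quick estimate. The sketch below uses a hypothetical helper, uds_window_gib. It assumes that the window scales linearly with index memory, which the text only states for the 1 GiB case, so treat the result as a rough guide.

```shell
# Hypothetical helper: uds_window_gib MEM_MIB [dense|sparse]
# Prints the estimated deduplication window in GiB.
# Assumes linear scaling from "1 GiB RAM -> 1 TiB window" (dense).
uds_window_gib() {
    mem_mib=$1
    case ${2:-dense} in
        # 1024 MiB of RAM maps to 1024 GiB (1 TiB) of window.
        dense)  echo "$mem_mib" ;;
        # A sparse index covers a window ten times larger.
        sparse) echo $(( mem_mib * 10 )) ;;
        *)      echo "unknown index type: $2" >&2; return 1 ;;
    esac
}
```

For example, uds_window_gib 1024 sparse reports 10240 GiB, which matches the 10 TiB window quoted for a sparse index with 1 GiB of RAM.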
The VDO target requires storage for two types of VDO metadata and for the UDS index: