Nutanix Erasure Coding (EC-X) is a software-defined, post-process data reduction technology that increases usable storage capacity by replacing traditional replication (RF2/RF3) with parity-based protection. It works best on "write-cold" data (inactive for >7 days), providing significant capacity savings for backups, archives, and file servers. It requires a minimum of 4 nodes.
A typical Nutanix cluster has RF2 enabled by default. This means
2 copies of data are kept on the cluster (one on the local node where the
guest VMs are running, and one remote), hence the name RF2.
The layout below has a replication factor 2 (RF2), whose
primary copies are local and whose replicas are distributed to other nodes
throughout the cluster.
When Curator runs a full scan, it finds write-cold extent
groups based on their age. Write-cold data is data that’s unlikely to be
modified further. For all workloads other than Objects Storage, Curator
considers data that hasn't been written to or overwritten in the last seven
days to be write-cold and eligible for encoding. For Objects Storage, the
postprocess period is reduced to three days due to the immutable nature of
object storage. After the Curator process finds the eligible candidates,
Chronos distributes and throttles the encoding tasks.
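To make the age rule a bit more concrete, here is a rough Python sketch of how write-cold candidates could be picked. This is only my illustration of the seven-day and three-day thresholds described above, not how Curator is actually implemented; `ExtentGroup`, `is_write_cold` and `encoding_candidates` are names I made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Age thresholds taken from the description above.
# Everything else here (names, structure) is hypothetical.
WRITE_COLD_AGE = {
    "default": timedelta(days=7),
    "objects": timedelta(days=3),
}


@dataclass
class ExtentGroup:
    egroup_id: int
    workload: str         # "objects", or anything else for the default rule
    last_write: datetime  # time of the last write or overwrite


def is_write_cold(eg: ExtentGroup, now: datetime) -> bool:
    """True if the extent group has not been written for the threshold period."""
    threshold = WRITE_COLD_AGE.get(eg.workload, WRITE_COLD_AGE["default"])
    return now - eg.last_write >= threshold


def encoding_candidates(egroups, now):
    """Return the IDs of extent groups that are eligible for encoding."""
    return [eg.egroup_id for eg in egroups if is_write_cold(eg, now)]


now = datetime(2024, 1, 10)
groups = [
    ExtentGroup(1, "vm", now - timedelta(days=10)),      # cold
    ExtentGroup(2, "vm", now - timedelta(days=2)),       # still being written
    ExtentGroup(3, "objects", now - timedelta(days=4)),  # cold under the 3-day rule
]
print(encoding_candidates(groups, now))
```

In this toy example, groups 1 and 3 qualify and group 2 does not, because it was written to two days ago.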
After the system creates the strips and calculates parity, it removes the replica extent groups to save on storage. The following figure shows the environment and storage savings after AOS finishes EC-X.
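To see why dropping the replicas saves space, it helps to compare the raw capacity that replication and parity each need for the same data. The sketch below is just arithmetic: replication stores RF full copies, while a strip of D data blocks plus P parity blocks stores (D + P) / D times the data. The strip sizes used (4/1 and 3/1) are examples I picked for illustration; the actual strip size depends on the cluster and is not something I have measured here. The function names are my own.

```python
def replication_overhead(rf: int) -> float:
    """Raw bytes stored per logical byte with RF copies (RF2 -> 2.0)."""
    return float(rf)


def ec_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per logical byte for a data/parity strip."""
    return (data_blocks + parity_blocks) / data_blocks


def raw_needed(logical_tib: float, overhead: float) -> float:
    """Raw capacity consumed by a given amount of logical data."""
    return logical_tib * overhead


logical = 10.0  # TiB of write-cold data, an illustrative figure
print(f"RF2:      {raw_needed(logical, replication_overhead(2)):.2f} TiB raw")
print(f"EC 4/1:   {raw_needed(logical, ec_overhead(4, 1)):.2f} TiB raw")
print(f"EC 3/1:   {raw_needed(logical, ec_overhead(3, 1)):.2f} TiB raw")
```

Keep in mind that only write-cold data gets encoded, so the saving across a whole cluster will be smaller than these per-strip numbers suggest.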
That’s all the theory, but does it actually work in practice
on a production cluster?
Well, this week I had to turn on EC-X on a production
cluster running over 100 VM workloads that started to give warnings on low space – I
don’t like seeing yellow or red on my clusters!
Below is what the storage stats were displaying in the
cluster.
As you can see from the above screen capture, my 5-node
cluster had total space usage at around 64% (28.9 TiB) with RF2 configured.
Once I had enabled EC-X and let Curator do its thing, the
results after a few days speak for themselves.
Total space usage is now 49.5% (22.38 TiB) and still dropping. So far, I’ve managed to reclaim approx. 15% of the cluster’s total capacity (about 6.5 TiB),
and this should continue to improve over the coming days.
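For anyone who wants to check the numbers, here is the quick arithmetic behind that figure, using only the before and after values from my storage stats. The variable names are just mine.

```python
# Before and after figures from the cluster's storage stats.
before_tib, before_pct = 28.9, 64.0
after_tib, after_pct = 22.38, 49.5

reclaimed_tib = before_tib - after_tib                   # space freed so far
points_dropped = before_pct - after_pct                  # drop in % of total capacity
relative_reduction = reclaimed_tib / before_tib * 100.0  # drop relative to used space

print(f"Reclaimed:          {reclaimed_tib:.2f} TiB")
print(f"Usage dropped by:   {points_dropped:.1f} percentage points")
print(f"Used space reduced: {relative_reduction:.1f}%")
```

So the headline figure is a drop of about 14.5 percentage points of total capacity, which works out to roughly a 22.6% reduction in the space that was actually in use.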
Thoroughly impressed with this feature from Nutanix. I am definitely going to keep this feature in my back pocket for when I need to ‘magic-up’ some space on clusters that start running low on storage.