Datrium Architecture - Differentiator Series

Since coming to Datrium over a month ago, I have been amazed at the level of talent and product feature advancement that has been orchestrated in such a short time. This has kick started me to write a series of blog posts (in no particular order) on why the Datrium architecture makes so much sense for the modern datacenter. I will be discussing key differentiators and features that make up some of my favorite parts of the Datrium DVX software platform.  

Part 1: Split Architecture

One of the benefits of open convergence, is that you are no longer tied to the traditional two controller storage system, where scale is solely based on available storage and network resources. Datrium’s flexibility goes even beyond the traditional HCI stack, (where compute and data operations share resources in the same appliance). By decoupling performance and data services from capacity a whole new world of possibilities is introduced in the separation of compute and data storage. 

It’s with this that Datrium’s split architecture pioneers a new breed of datacenter designs where flexibility is at heart of what makes the DVX platform so remarkable. For example, you can... 

•    Use commodity-based Flash resources in the host/server where it’s less expensive and more flexible for I/O processing. 
•    Use your own x86 hosts/servers with ESXi, RHV, or Docker as a platform of choice and flexibility for growth.
•    Upgrade performance easily with the latest and greatest flash technology in the hosts/servers. 
•    Lower East/West chatter and remove the vast majority of storage traffic from the network where continued scale can be realized and application performance isolation provides true data locality. 
•    Scale Compute and Storage truly independently.
•    Take advantage of under-utilized CPU Cores for IO processing and data services (Compression, Dedupe, Erasure Coding, Encryption) on your hosts/servers.
•    Utilize stateless hosts while still achieving data integrity, fault and performance isolation between compute and data. No quorum needed, minimum host/servers needed is only one. 
•    Get Secondary Storage in the same platform for a lower TCO.
•    Use multiple based storage controllers by virtue of every host/server introduced into the DVX platform. 

Storage Operations in a Split Architecture:

One example of these benefits in a split architecture is realized when disks die or disk (bitrot or Latent sector) errors occur on a primary or secondary storage system. A rebuild (system) operation is the process that is needed for any data system to be in an healthy fault tolerant state again. Every storage system needs to have a process to rebuild data when data corruption occurs, however, there can be side effects when rebuilds are needed. One of the most common consequences with any rebuild operation is the resource utilization needed to complete successfully. During such an operation, your primary workloads could be slowed and hampered with higher latency.  
Data rebuilds will result in lower performance, and the time to finish the rebuild can be lengthy depending on resource availability, architecture and how much data is resident for a completed healthy state again.  
Another common storage operation is the rebalancing of data when additional capacity or nodes are added to a storage system. Proper rebalancing tasks can drain system resources based on the amount of data that needs to be rebalanced. This sometimes-timely task is important to keep the pool of data available and avoid hot spots from occurring. 

Datrium’s architecture is based on a Distributed Log Structure Filesystem. This filesystem provides a platform where workloads are not hindered from decreased performance during rebuilds or other system operations. The decoupling of performance and data services from capacity has given customers the freedom to finally realize the benefits of true cloud like experience right in their own datacenter. (More on this in a future post) By moving I/O operations to the hosts and keeping data at rest in what we call a data node, we achieve the best I/O latency for applications while keeping data safe and resilient on durable/secondary storage. So, when problems or systems operations occur on durable storage, our intelligent DVX software utilizes underused CPU cores on the hosts for these operations while not stealing any CPU resources from running workloads. You can think of your hosts/servers each as a separate storage controller all working together to facilitate storage operations and system processes. 

As one adds more hosts/nodes to the DVX system, the faster rebuild and/or rebalance operations ensue. During reconstruction of data, multiple hosts/servers can help with system operations. No host to host communication is needed. Each host has its own task and operation to facilitate a faster more efficient rebuild/rebalance on the Datrium data nodes. 
One very cool example of the intelligence in the DVX software is the built-in QOS during a rebuild operation on the data node(s). This is based on the amount of failures or number disks to rebuild. For example, if only one disk fails then less resources from the hosts are needed for the rebuild operation. This is a dynamic process to facilitate the amount of urgency needed for small or larger failures and errors.  

Datrium took the best of traditional 3-tier and HCI architectures into a modern world where performance and system operations are in disagreement. A customer can now utilize open convergence to achieve the performance they never had on-premises and will never get in the cloud. Think of it as future proofing your datacenter for years to come. 

In part two I will discuss Datrium’s Encryption uniqueness and powerful encryption capabilities.