Global Data Efficiency

In today’s datacenters and hybrid cloud deployments, a new breed of data efficiency is required. As organizations scale their storage architecture, it becomes imperative to take advantage of economies of scale as more and more data enters the data lake. It’s important to monitor not only rising egress costs but also the amount of active data, so that performance stays consistent even during ingest. In this next post of my series I will look at Datrium’s Global Deduplication technology and the power it brings to the infrastructure stack to handle data growth while achieving data efficiency at the same time.

It’s now table stakes to have localized deduplication in a storage architecture, but if customers want the next level of efficiency, Global Deduplication is required to compete in a multi-cloud world. Datrium decided early on to make Global Deduplication the foundation of its architecture for data integrity and efficiency. It’s with this framework that customers can have assurance for their data without compromising on performance and stability for their applications.

You may be asking, with all this Global Deduplication talk, what can I expect for my own data efficiency? Every customer and data set is different, but looking at our call-home data, we average 4.5x data reduction across our customer base. We anticipate this going even higher as our new Cloud DVX service becomes more widely used.

Within the Datrium DVX dashboard UI, a customer can look at the Global Deduplication ratio and compare it with the data reduction each host is getting on flash. As new data is received in RAM on each DVX-enabled host, we fingerprint each block inline for local deduplication on flash, in addition to compression and encryption. Then, as new blocks are written to the Datrium data node for persistence, DVX performs the global part of the deduplication by comparing data blocks across all hosts, regardless of whether the data originated from Linux, ESXi, Docker, and so on. In the small lab example above, we are getting 3.9x efficiency with Global Deduplication and compression. Keep in mind that as your DVX cluster grows larger and wider, on a single site or across multiple sites, the amount of referenceable data increases, which further improves deduplication efficiency.
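To make the mechanics concrete, here is a minimal sketch of content-addressed deduplication in Python. This is not Datrium’s implementation; the block size, hash choice (SHA-256), compression with zlib, and the `DedupStore` class name are my own assumptions for illustration only.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupStore:
    """Toy content-addressed store: one physical copy per unique fingerprint."""

    def __init__(self):
        self.blocks = {}          # fingerprint -> compressed block
        self.logical_bytes = 0    # bytes written by clients
        self.physical_bytes = 0   # bytes actually stored

    def write(self, data: bytes):
        """Split data into blocks, fingerprint inline, store only new blocks."""
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()   # crypto-hash fingerprint
            self.logical_bytes += len(block)
            if fp not in self.blocks:                # only unique blocks persist
                compressed = zlib.compress(block)
                self.blocks[fp] = compressed
                self.physical_bytes += len(compressed)
            refs.append(fp)
        return refs                                  # metadata referencing blocks

    def read(self, refs):
        """Rebuild data from fingerprints, verifying integrity on the way out."""
        out = bytearray()
        for fp in refs:
            block = zlib.decompress(self.blocks[fp])
            assert hashlib.sha256(block).hexdigest() == fp  # end-to-end check
            out.extend(block)
        return bytes(out)

    def reduction_ratio(self):
        return self.logical_bytes / max(self.physical_bytes, 1)
```

The point of the sketch is that every writer lands in the same fingerprint namespace: a block already stored for one host is never stored again for another, which is what makes the deduplication "global" rather than per-host.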

Let’s now look at some additional points on what makes Datrium’s Global Deduplication technology so powerful for organizations. 

•    Datrium uses blockchain-like (crypto-hashing) technology to provide verifiable correctness when determining exactly what needs to be transmitted. This level of data integrity ensures that all deduplicated data is in a correct state at rest and in transit. (This is a separate post all in and of itself for a later date.)

•    Built into the new Datrium Cloud DVX fully managed service is a completely multi-site and cloud-aware solution. Datrium fingerprints all data with a crypto-hash on ingest. Let’s say, for example, you have two sites, primary and DR, plus an archive on AWS. When data needs to be moved between sites, the DVX software first exchanges fingerprints to figure out what data needs to be copied to the remote sites. Then only the unique data is sent over. This automatically provides WAN optimization and a decrease in RTOs, and the result is dramatic savings, especially on cloud egress costs. (See the replication sketch after this list.)

•    Always-on Global Deduplication. Datrium provides the software intelligence to handle all data types and workloads, delivering data efficiency locally and globally without you having to decide whether dedupe should be on or off for a particular workload.

•    Datrium can seed remote sites automatically and can also use existing data sets for metadata exchange to optimize replication. For example, a current DR environment no longer has to worry about pre-seeding and can instantly take advantage of Datrium’s Global Deduplication for replication savings.

•    Datrium Cloud DVX can replicate multiple sites to the cloud for backup and archival. In that use case, deduplication of the data in the cloud is truly global: all data across all sites is deduplicated against everything already stored in Amazon S3.

•    Datrium never needs to send fulls (VMs or files) over the wire on a periodic basis. It’s a forever-incremental process: because we always compare against what has already been sent to or received from other sites, we never send a full again; every transfer is incremental to what’s already there.
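To illustrate the fingerprint exchange described in the bullets above, here is a minimal sketch of dedup-aware replication. It reuses the hypothetical `DedupStore` class from the earlier sketch; the function name and the simple set-difference protocol are my own assumptions, not Datrium’s actual wire protocol.

```python
def replicate(source: DedupStore, target: DedupStore, refs):
    """Copy the objects named by `refs` from source to target,
    sending only blocks the target does not already hold."""
    # 1. Exchange fingerprints: determine which blocks the target is missing.
    wanted = set(refs)
    missing = wanted - set(target.blocks)       # cheap, metadata-only comparison

    # 2. Send only the unique (missing) blocks over the "wire".
    bytes_on_wire = 0
    for fp in missing:
        compressed = source.blocks[fp]
        target.blocks[fp] = compressed
        target.physical_bytes += len(compressed)
        bytes_on_wire += len(compressed)

    # 3. Later replications are incremental by construction: blocks already
    #    at the target are never re-sent, so a "full" never crosses the WAN again.
    return bytes_on_wire
```

Because the target’s fingerprint set only ever grows, the same three steps cover seeding, WAN optimization, and forever-incremental replication with one mechanism.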

Most solutions only look at deduplication from an at-target perspective or with purely localized fingerprints. It’s nice to see that Datrium took the extra time to put together an always-on solution that provides local, global, and over-the-wire deduplication.