Time Scale: Latency References

In a world where speed is of the utmost importance, it has become apparent to me that latency is relative. In other words, one needs a form of measurement, a time scale, to understand how fast something really is.

With high-frequency, low-latency trading, as depicted in this video, milliseconds are the name of the game. A loss of 2 milliseconds, for example, can mean the difference between losing millions of dollars and preventing a catastrophic financial event.

Using this example, how can one feel what 2 milliseconds is like? Can one tell the difference between 2 and 3 milliseconds? I find it fascinating that we as humans sometimes judge what is fast or slow by what it feels like. How do you measure a feeling anyway? We usually have to compare against a baseline to determine whether something is really faster or slower. I would argue that it's often the results, or the effects, of latency that we measure against. In low latency trading, the effect can be devastating, so there is a known threshold not to go past. That threshold, however, is constantly being lowered or challenged by competitive pressure. This means it's important to constantly have latency references to measure against in order to determine whether the combined effect will be positive or negative.

This is why testing synthetic workloads to determine performance can give an inaccurate picture of what is truly fast or slow. Testing a single workload doesn't capture the combined effect of all the disparate workloads and their interactions as a whole. Another inaccurate way to measure is to base decisions solely on what end users feel is faster or slower. It can be interesting to see what the end user thinks, but it's not an accurate way to measure the system as a whole. The results of all the work done (over a project, for example) are a better way to measure the effect. This can obviously complicate the measuring process, but there are places to focus that will give a more accurate view of latency effects as a whole, if one follows what I call the time scale reference.

Contrasting what we have historically deemed fast with what is on the horizon is not just interesting but important for baselines. Proper latency measurements become important milestones for feeling the full effect of speed and acceleration.

Let’s say, for example, you had to take a trip from Atlanta, GA to San Francisco, CA in a truck carrying peaches, and you had two routes to choose from. One would take 3 days and the other 6 months. Now, if you wanted the scenic route, had tons of time, and weren't hauling peaches, you might want to take the longer route. But if you took 6 months, those peaches would smell quite bad by the time you got to San Francisco! Using real-world analogies like this, on a time scale we can decipher, is important for seeing the differences and the effects they may have.

Why did I choose 3 days vs. 6 months for this example? A typical solid state disk has an average latency of around 100 microseconds. Compare that to a rotational standard hard drive at about 5 milliseconds. If I scale these to show how drastic the time difference is between the two, it's 3 days for the SSD and 6 months for the standard hard drive. Now I can really see and feel the variance between these two media, and why a simple choice like this can make a gigantic difference in the outcome. Let's take it up another level. What if we could travel with our truckload of peaches to San Francisco in 6 minutes instead of 3 days, or better yet, in 40 seconds? Today, 6 minutes is possible, as it corresponds to standard DRAM, and 40 seconds isn't too far off, as that is representative of the Intel and Micron announcement of 3D XPoint memory.
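If it helps to see the arithmetic, here is a minimal Python sketch of the scaling. The latency figures are rough, order-of-magnitude assumptions, so the scaled results land near, not exactly on, the rounded figures above:

# Rescale device latencies so that a ~100 us SSD access corresponds
# to the 3-day truck trip from Atlanta to San Francisco.
# All latency figures are rough, order-of-magnitude assumptions.

SSD_LATENCY_S = 100e-6          # ~100 microseconds
TRIP_S = 3 * 24 * 3600          # 3 days, in seconds
SCALE = TRIP_S / SSD_LATENCY_S  # "truck seconds" per second of latency

latencies = {
    "Rotational HDD (~5 ms)": 5e-3,
    "SSD (~100 us)": 100e-6,
    "DRAM (~100 ns)": 100e-9,
}

def humanize(seconds):
    """Render a duration in the largest sensible unit."""
    for unit, size in (("months", 30 * 24 * 3600), ("days", 24 * 3600),
                       ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"

for name, latency in latencies.items():
    print(f"{name}: {humanize(latency * SCALE)}")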

If I take these latency numbers and plug them into my datacenter, I can start to see how simple choices can really have a negative or positive impact. You may now be saying to yourself, “Well, if I go with SSDs today, then tomorrow I'll basically need to rip and replace my entire stack to take advantage of newer latency thresholds, like the new 3D XPoint memory, or whatever comes next!” The exciting part is that you don't have to replace your entire stack to take advantage of the latest and greatest. Your truck carrying those peaches just needs a turbo boost applied to its engine; you don't need to buy a new truck. This is why choosing the proper platform becomes very important. Choosing the right truck the first time means you aren't tied down by vendor lock-in when it comes to performance.

In conclusion, I hope that one can now understand why proper baselines need to be established and real-world measurements need to be referenced. It's not just a feeling of what is faster; we are now past the point where the true speed of something can be recognized by feel. It's the cause and effect, the result, that determines the effective threshold. And tomorrow the threshold could drop with new innovations, which is all the more reason to have a time scale of reference. Without a reference point, people become accustomed to the world around them, missing out on what it's really like to travel from Atlanta to San Francisco in 40 seconds or less. Don't miss out on the latency innovations of today and tomorrow; choose your platform wisely.

My daughter was inspired to draw this for me based on this post! :)


FVP Tip: Change Storage Device Display Name

As you might know, you have the ability to change a storage device's display name on a particular ESXi host. This can be useful when you have several different devices installed on a given host and/or different RAID controllers backing the devices.

When you want to test several different flash device models, with different controllers and configurations, under PernixData FVP, it can become difficult to remember which identifier belongs to which device.

My recommendation is to add the name of the controller as an extension to a friendlier device name. This way you can monitor performance by SSD and its assigned controller. An example could be “Intel 520 – H310”: the SSD model is represented, and the controller is identified as an H310 on a Dell host.

vSphere Web Client Steps:

  1. Browse to the host in the vSphere Web Client navigator. Click the Manage tab and click Storage.
  2. Select the device to rename and click Rename. 
  3. Change the device name to a name that reflects your needs.
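
If you'd prefer to script the rename, the vSphere API exposes this through the host storage system's UpdateScsiLunDisplayName method. Below is a minimal pyVmomi sketch, assuming you know the device's naa identifier; the vCenter address, credentials, host name, and device ID are all placeholders:

# Minimal pyVmomi sketch: change a storage device's display name.
# Connection details, host name, and device ID below are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="password")
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == "esxi01.example.com")
    storage = host.configManager.storageSystem

    # Find the LUN by its canonical (naa.*) name, then rename it.
    for lun in storage.storageDeviceInfo.scsiLun:
        if lun.canonicalName == "naa.XXXXXXXXXXXX":  # your device ID
            storage.UpdateScsiLunDisplayName(lun.uuid, "Intel 520 - H310")
            break
finally:
    Disconnect(si)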

 

Now that you have renamed your flash device, you will see the changed device names show up in the FVP Plugin UI.

Features of an Enterprise SSD

When looking for a flash device to use with PernixData FVP or for other enterprise use cases, performance and reliability are important aspects to factor in. Just because a drive is spec'd with high IOPS and low latency numbers doesn't mean that it will keep up at that rate over time under enterprise workloads.

I would guess that most of you would prefer a consistently performing, reliable flash device over higher IOPS or lower latency. This is one reason why I like the Intel S3700 SSD. This drive does a good job of producing repeatable results and withstands heavy workloads over time. I'm not saying this drive or others like it are slow; these drives are still very fast, but they favor consistency and reliability by design.

  

A little over a year ago, Intel introduced a technology that enhances the reliability of MLC flash. Intel calls it HET, High Endurance Technology. It is essentially a combination of firmware enhancements, controller improvements, and high-cycling NAND, tuned for endurance and performance. The optimizations are in error-avoidance techniques and write-amplification-reduction algorithms. The result is a line of enterprise SSDs that are inexpensive and deliver good performance with predictable behavior. Keep in mind, though, that not all Intel drives have HET; it is what separates the consumer drives from the enterprise-class ones.

This is one reason why Intel can claim “10 full drive writes per day over the 5-year life of the drive.” You will also notice that other manufacturers and vendors OEM Intel's 25nm MLC HET NAND and incorporate it into their products. The incorporation of HET sets Intel apart from the rest, but that doesn't mean there are no alternatives to choose from. It's when you factor in price, reliability, performance, and customer satisfaction that many are currently led to the S3700.
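
To put that endurance claim in perspective, here's a quick back-of-the-envelope calculation in Python, using a hypothetical 200 GB capacity as the example:

# Endurance arithmetic for "10 full drive writes per day for 5 years",
# assuming a hypothetical 200 GB drive.
capacity_gb = 200
writes_per_day = 10
years = 5

total_tb = capacity_gb * writes_per_day * 365 * years / 1000
print(f"{total_tb:,.0f} TB written over the drive's life")  # ~3,650 TB

That works out to roughly 3.65 petabytes of writes before the drive is expected to wear out.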

The other important aspect to consider when looking for an enterprise SSD is read/write performance consistency. Some drives are architected only for read performance consistency. So if your workloads are balanced between reads and writes, or are write heavy, you want a drive that provides consistency for both reads and writes.

As an example, the Intel S3500 gives better read performance consistency, while the Intel S3700 gives consistency for both reads and writes. (Keep in mind that the Intel S3500 doesn't use HET.)

Intel S3500 (performance consistency chart)

Intel S3700 (performance consistency chart)
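
If you want a rough feel for what consistency means in practice, the sketch below times a series of small synchronous writes and compares the average latency to the 99th percentile; a wide gap between the two is the signature of an inconsistent drive. This is only an illustration; a real benchmark would use a purpose-built tool such as fio with direct I/O against the raw device:

# Rough illustration: compare average write latency to the 99th percentile.
import os
import time

SAMPLES = 2000
BLOCK = b"\0" * 4096  # 4 KB writes

latencies = []
fd = os.open("latency_test.bin", os.O_WRONLY | os.O_CREAT)
try:
    for _ in range(SAMPLES):
        start = time.perf_counter()
        os.write(fd, BLOCK)
        os.fsync(fd)  # force each write down to the device
        latencies.append(time.perf_counter() - start)
finally:
    os.close(fd)
    os.remove("latency_test.bin")

latencies.sort()
avg = sum(latencies) / len(latencies)
p99 = latencies[int(len(latencies) * 0.99)]
print(f"avg: {avg * 1e6:.0f} us   p99: {p99 * 1e6:.0f} us")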

 

I recommend taking a look at Frank Denneman's current blog series, which goes into some other aspects of flash performance with FVP.

 


Changing SSD "Drive Type" Error

I was recently reading William Lam's and Duncan Epping's blog posts about vSphere not correctly reading local SSD drive types on a host. If an SSD is installed behind a RAID controller, the drive type will often show "Non-SSD".

I followed the steps for changing the drive type, or as Duncan calls it, "Faking an SSD in your virtualized vSphere lab"! During this process I received an error message when running the add rule command.

 

esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device naa.60024e805cb75f001886b896132d69ff --option=enable_ssd

 

"Error adding SATP user rule: Duplicate user rule found for SATP VMW_SATP_LOCAL"

 

After doing some digging, I realized that this drive already had a user rule set up. To see all user-defined rules, run this command:

esxcli storage nmp satp rule list

(You will notice that the rules defined by you are labeled "user".)

 

All I had to do was run the remove command to delete the duplicate user-defined rule.

 

esxcli storage nmp satp rule remove --satp VMW_SATP_LOCAL --device naa.60024e805cb75f001886b896132d69ff

 

Then you can re-run the add rule command as before, followed by the reclaim command, and all will be successful!

 

esxcli storage core claiming reclaim -d naa.60024e805cb75f001886b896132d69ff

 

I also found out that you don't have to create a VMFS datastore to run these commands; they worked on blank, available SSDs as well.
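
If you want to double-check the result programmatically, the SSD flag is also visible through the vSphere API. Here is a minimal pyVmomi sketch, with placeholder connection details and host name; ScsiDisk objects expose an ssd flag on vSphere 5.x and later:

# Check whether the host now reports the device as an SSD.
# Connection details and host name are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="password")
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == "esxi01.example.com")
    for lun in host.configManager.storageSystem.storageDeviceInfo.scsiLun:
        if lun.canonicalName == "naa.60024e805cb75f001886b896132d69ff":
            # Only ScsiDisk objects carry the flag, hence the getattr.
            print("Is SSD:", getattr(lun, "ssd", "n/a"))
finally:
    Disconnect(si)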

 

Synology RS10613XS+ and RX1213sas

At VMworld 2012, I was able to get a glimpse of the next-generation SANs from Synology. The most exciting part of these new SANs is the ability to use an SSD read cache. And since VAAI is now available, these units have become enterprise worthy. They are also expected to be available in a couple of months for under $10k.

Check out some of the specs and options: 

  • Throughput of more than 2,000 MB/s, with ultra-high transmission efficiency of more than 200,000 IOPS
  • Capacity expandable to more than 400 TB with the Synology RX1213sas
  • Supports two 10GbE network ports (via an add-in PCIe network adapter)
  • Compatible with VMware®, Citrix®, and Microsoft® Hyper-V®
  • Passively cooled CPU and redundant system fans
  • Scalable ECC memory (up to 8 GB)
  • Synology DiskStation Manager (DSM) operating system