VMworld 2016 Session Vote

This year I decided to make my first application into the VMworld Call for Papers. I have been wanting to do this for some time but my time and commitments never aligned. It's been a learning experience and so we will see how it goes! So, if you have find interest in learning more about migrating workloads to VVols and all the details that go with this, then please vote for this session. 

Migrating Workloads to VMware VVols [9059]
Careful planning is needed for successful workload migration to VVol based storage. Depending on the scenario and the datacenter environment it’s important to understand what is expected and required when migrating Virtual Machines in a mixed environment of VMFS and VVols. We will look at the steps and options available in a virtualized heterogeneous storage infrastructure, in addition to available VMware partner solutions.

Tech Evangelism Commentary

Since joining DataCore as Technology Evangelist, I have been hearing many in the community asking me, “What exactly is a Tech Evangelist?” “What does a Tech Evangelist do?” I know every company most likely has a tweak or difference in how they articulate a job description or title. However, I believe there are some commonalities, which I wish to address in today’s post. 

Word Study

It is interesting that many people understand the word, “Evangelist,” to have some religious connection to it. That would be true, as I can attest to this first-hand, since my formative years were spent in a Pastor-Evangelist family. This led me later in my life to pursue a “Theology” degree where I was given additional exposure as to how an Evangelist communicates, listens, teaches, preaches, and relates in such a way as to inspire and influence the audience or Community. 

The word, Ευαγγελιστής, actually originated in ancient Greece prior to biblical times, where it was understood to mean a “reward given to a messenger for good news” or “bringer of good news.” Therefore, I can conclude the way “Evangelist” is used in the phrase “Technology Evangelist,” as laden with many values or connotations that I believe are necessary to bring about many positive benefits and results to a company wanting to educate its community.   


Another common belief that I have heard is that a Tech Evangelist is just someone in marketing. In my opinion for a Tech Evangelist to become a trusted advocate, he/she needs to have one foot in engineering and the other foot in marketing. This helps build a bridge where a deeper technical understanding of a product and/or service can properly be disseminated to the general public. However, this isn’t the only place where a Tech Evangelist roams. It’s common to see a Tech Evangelist support the goals and visioning found in Corporate, Support, Sales, Engineering, and Marketing departments. It’s important to understand and interface with each area and apply learning’s where the most help can be provided.  I like “floater” as a term in which a good Evangelist is gifted with listening skills to internally and externally both gain and craft creative content, so as to tell the story most effectively. To be most successful, I believe that a Tech Evangelist needs to work in an organization, hierarchy, or department, that allows for flexibility of thought and experimentation, so as to test the best new ways to communicate the content or message.

I love the lofty Wikipedia definition of a Tech Evangelist:

“A technology evangelist is a person who builds a critical mass of support for a given technology, and then establishes it as a technical standard in a market that is subject to network effects.”

I also believe the core part of the role of a Tech Evangelist is not just educating or communicating to the public, customers, or clients, about a particular technology, but about building an army of believers. One person cannot have a big impact in building critical mass in the industry without an army of supporters all passionately preaching the same message. This is one reason why I think one typically observes startups hiring a Tech Evangelist, or even a large corporation hiring a Tech Evangelist to focus on one product or service in a myriad of products or features. 

Types of Activities

There are many activities that a Tech Evangelist would be involved with; in fact others perform some of the same activities with different titles. The key is to explain how some form of tech works and why it matters rather than just describing a given feature. Below are just a few examples. 

  • Speak at technical conferences
  • Manage an external technical awards programs
  • Produce technical blog posts
  • Produce technical informative white papers
  • Lead social media engagements
  • Participate in the industry community 
  • Company ambassador at technical events
  • Liaison between engineering and external technical experts
  • Drive internal awareness and conversation

In order to do some of these things well, I think there are several important qualities that someone needs to obtain or naturally possess. This by no means is an exhaustive list below: 

  • Informed about the industry and community around them
  • Passionate about what they are doing
  • Authentic & Humble about what they communicate
  • Listening skills to focus on priorities
  • Speaking & Communication skills to capture the essence of an exposition

If I missed any important areas that you think would add to this list, please feel free to comment or contact me

I am listing below in alphabetic order other noted Tech Evangelists who are focused in the Virtualization or Storage industry: If I have missed someone please let me know. I will add them to the list

Update: Since many of you have sent me names to include, I have decided to create a Google Sheet. This sheet will list those that hold the "Tech Evangelist" title and those that don't hold the title but have been nominated and/or conduct activities of a Tech Evangelist in the community. If you find any errors, please let me know, I pulled details from Twitter and LinkedIN. Click Here... 

Additional reading:

Official Announcement: New Calling

I’m excited to officially announce that I joined DataCore Software as their Tech Evangelist! This is a position that I’m truly excited about as it aligns with my goals and aspirations while still engaging in technical conversations. I will be working alongside engineering and with many other great individuals at DataCore. I can’t wait to tell you about some of the hidden gems that I have found in my short time already! There is some great innovation happening at DataCore that I think will amaze many people. 

For those who are unfamiliar, DataCore provides software that abstracts storage and server data services. This abstraction provides enhanced performance, availability, storage management and much more. DataCore was the first to develop storage virtualization and has proven to be very successful. They have over 11,000 customers with over 30,000 deployments around the world. Currently DataCore’s SANsymphony-V software product line has seen 10 generations of innovations and hardening to make it truly enterprise worthy. 

There is so much to tell and so much in the works that I will have to save it for another time! 

New Book Released

I’m proud to announce that I am a co-author on a new book just released. It’s available now for download and later via print edition. Frank Denneman announced on his blog the official release, as he was the main organizer and author for this project. This is not only my first published book, but also a first as a main co-author. It was indeed exciting and challenging at the same time. I can now at some level appreciate those that have tackled such a feat, as it wasn’t the easiest thing I have ever done! :) 

Being a 300-page book that talks about the many architectural decisions when designing for storage performance, is not something for the faint of heart. The focus and desire was to articulate a deeper technical understanding of the software layers that intersect and interact with the differing hardware designs.

I would like to thank Frank for giving me the opportunity to work with him on this project. It was truly a delight and a rewarding experience!

You can directly download the book here. Enjoy!

Examining AeroFS

During VMworld 2015, I took time to learn more about AeroFS. This is a company that really caught me off guard, as I was surprised and excited to hear about a free version available with enterprise features, but before I jump into the details, here is quick company snapshot. 

Founded: 2010
Founders: Yuri Sagalov – CEO & Weihan Wang – CTO
Funding: 15.5 Million from several private investors and firms. Including the likes of Ashton Kutcher
Location: Palo Alto, CA
Customers: 2,000 Organizations – As I understand it even Apple uses them for file sharing! 

AeroFS is a file sharing/syncing software appliance that is deployed and operates locally on-premises. Think of it as your Dropbox but is run totally behind your firewall and without the added monthly expense. I was intrigued because AeroFS gives customers the flexibility to expand their storage without paying additional fees while gaining access to all the enterprise features. 

There is one feature that I think many will like. AeroFS can be deployed in two different ways. One option is to deploy an AeroFS Team Server, where a centralized copy of the data lives on dedicated server. Think of this as your traditional file server but with modern file sync capabilities. The other option is a decentralized approach, where there is no centralized copy of data, each copy is synced with peers when changes are instituted. With this deployment method, no users will notice if your centralized file server is down for any reason. 

Besides the fact that AeroFS is deployed in our own private cloud, you will happy to learn that additional security measures are resident. For one all data transmitted client-to-client and client-to-server (whether on the LAN or on the Internet) is encrypted using AES-256 with 2048-bit RSA. In addition you can manage your own PKI (public key infrastructure) with AeroFS as the root CA. 
I also found it very useful to manage public links with expiration limits and the options to password protect links. This is in addition to the ability to manage more permanent users in the AeroFS UI with two-factor authentication options. 

I found the installation very straightforward as I downloaded and installed their virtual appliance in my vSphere 6.0 U1 lab. In fact, the hardest part is making sure DNS is setup properly internally and externally. Once you have DNS setup it's important that you open the proper firewall ports for connectivity from your devices or endpoints. However this is an area where I think AeroFS could spend some more time on in my opinion. It would help new users if they were provided a blueprint of use case deployment best practices. For example, are you going to use AeroFS for internal sharing or is it going to be public facing, what should the customer think about concerning DNS, Security and redundancy?

After installation, you will have access to HTML5 management interface and the ability to deploy your end points. You will notice in my screenshot, that a lot of functionality is built into the Mac client. 

Supported Platforms:

  • OVA – vSphere & VirtualBox
  • QCow2 - OpenStack, EC2
  • VHD – Hyper-V

When it comes to supported devices, Windows/OSX and Apple/Google apps are available! This is another area where there could be some improvement. I only tried the iPhone app, but it was very basic in functionality and would have liked to see some more advanced functionality. It did work as expected and was easy to deploy just with quick scan of a QR code from my virtual appliance setup screen!

It was also with delight that I noticed they have a preview that you can download to run AeroFS on CoreOS and Docker. You can also use other flavors of Linux by using a short command line bash script to run AeroFS. 

I strongly recommend giving it a try in your lab. AeroFS has a free version with most of the features included up to 30 users. I commend AeroFS for thinking outside the box on this, as it opens the doors for lab users, and the SMB market to take advantage of a low cost, secure alternative to the Drobbox’s of the world. 





FVP Freedom Edition Launch

As you know we shipped PernixData FVP 3.0 version yesterday, but what you might not know is that we also shipped PernixData FVP Freedom Edition. This in my opinion is an exciting addition to the product family and based on the feedback we have already received it’s taking off in a major way!! Keep in mind this is totally free software with no restrictions or time limits. 


For those unfamiliar with the Freedom Edition I have outlined the supported features that come with this release. 

Supported Features
•    vSphere 5.1, 5.5 and 6.0
•    Maximum 128GB of Memory (DFTM) per FVP Cluster
•    Unlimited VM’s and Hosts
•    Write Through Configuration

If you want DFTM-Z (Memory Compression) or the ability to configure Write Back for your virtual machines then you can easily upgrade to our standard and enterprise licensing options. 

Freedom Community Forum
We are launching with the Freedom edition a brand new community forum. This is to provide support and collaboration among the Freedom users. As you might guess, we are planning to add a lot of content over the next few weeks, so the more questions or interaction you have on the forum, the more it will make it useful for the Freedom community. In order to access this forum, you can visit https://community.pernixdata.com and click sign-in. We have enabled SSO support, so all you have to use is your same PernixData download portal account and we will redirect you back into the community forum. 

If you haven’t already requested the Freedom edition, you can request access here. Once registered you will automatically receive an email with instructions on how to gain access to the software and portal. This is totally an automated process, so you will get your Freedom license key the same day you request it!!

PernixData FVP 3.0 - What's New

I’m pleased to announce that PernixData FVP 3.0 has been released to the masses! This has been a combination of many long hours by our engineering and staff in order to reach this unprecedented milestone.

Some of the highlighted features in this release are a result of a seasoned approach to solving storage performance problems while keeping a keen outlook toward what the future holds! In this post I will mention at a high-level what some of the new features are but look for more detailed posts coming soon.

Support for vSphere 6.0
We now have support for vSphere 6.0 using FVP 3.0! If you are running a previous version of FVP, you will need to upgrade to this release in order to gain full vSphere 6 support. If you are in a process of migrating to vSphere 6, we now have support for a migration plan from previous versions of ESXi running FVP. For example, FVP will support mixed environments of vCenter 6.0 with hosts running ESXi 5.1 or newer.  However keep in mind that FVP 3.0 will no longer be supporting vSphere 5.0 as a platform.

New HTML5 based User Interface
FVP 3.0 offers a completely new user experience. FVP 3.0 introduces a brand new standalone webclient where you will be able to configure and monitor all your FVP clusters. In addition, the new standalone webclient now gives you visibility into other FVP clusters that may reside in a different vCenter or vSphere cluster!!

This doesn’t mean you won’t have visibility in the vSphere webclient; we still have a plugin available that will give you the basic FVP analytics. However, all configurations and detailed analytics will only be available in the new standalone webclient.

Some may ask why we built our own webclient which I think is a valid question. The truth is that in order for us to control the full user experience for FVP we had to grow our own while still supporting the vSphere client for those quick looksee’s. I think you will be pleasantly surprised how robust and extensible the new standalone webclient is.

New Audit Log

In addition to providing FVP actions and alarms through vCenter tasks/events, FVP 3.0 now has a separate audit log. This is where you can easily see all FVP related actions and alarms for a given FVP cluster. The part I like is the ease of just doing a quick review of what’s changed without having to visit each host in vCenter.


Redesigned License Activation Process

The license activation process has been streamlined to offer greater simplicity and ease of use.  You can now activate and manage all of your licensing online through the new PernixData UI. All you need is a license key while the new FVP licensing activation process will do the rest. You also have the ability to see more details on what is licensed and what isn’t in the new UI. 

As you can see a lot of innovation has gone into this new release. In fact there is so much to reveal, I'm going to do a series posts over the next few weeks. To learn more and download FVP 3.0 release please visit: http://www.pernixdata.com/products or start a trial at: https://get.pernixdata.com/FVPTrial

Time Scale: Latency References

In a world where speed is of the utmost importance, it has become apparent to me that there is a notion of relativity in latency. In other words, one needs a form of measurement - a time scale to understand how fast something really was and/or is.

With high frequency low latency trading as depicted in this video, milliseconds is the name of the game. A loss of 2 seconds, as an example, can be a matter of losing millions of dollars or prevention of a catastrophic financial event.  

In using this example, how can one feel what 2 milliseconds feels like? Can one tell the difference between 2 or 3 milliseconds? I find it fascinating that we as humans sometimes base what is fast or slow on what it feels like. In fact, how do you measure a feeling anyway? We usually have to compare (base line) to determine if something is really faster or slower. I would argue that it’s sometimes the results or effect of latency that we measure against. In low latency trading, the effect or result can be devastating, and so there is a known threshold to not go past. However, this threshold is constantly being lowered or being challenged via competitive pressure. This means it’s important to constantly have latency references to measure against in order to determine if the combined effect will have positive or negative results.

This is why testing synthetic workloads (in order to determine what the performance is) can result in inaccuracies of what is truly fast or slow. When one tests only one workload, it’s not depicting the combined effect of all disparate workloads and their interactions as a whole.  Another inaccurate way to measure is to base decisions solely on what the end users feel is faster or slower. I know it can be interesting to see what the end-user thinks, but it’s not an accurate way to look at the whole system, as a way to measure. The results seen for all work done (may be based on a project) is a better way to measure the effect. This can obviously complicate the process of measuring, but there are places to focus that will give a more accurate look on latency effects as a whole if one follows what I call the time scale reference.

Contrasting what we deem fast historically and what is on the horizon is not just interesting but important for baselines. Proper latency measurements become important milestones to feel the full effect of speed and acceleration.

Let’s say for example you had to take a trip from Atlanta, GA to San Francisco, CA in a truck carrying peaches. You had two routes to choose from. One would take 3 days and the other route would take 6 months. Now, if you wanted to take the scenic route, and you had tons of time, without the peaches, you might want to take the longer route. However, if you took 6 months, those peaches would smell quite bad by the time you got to San Francisco!! Using real world analogies like this on a time scale that we can decipher is important in order to see the differences and the effect it may have. Now why did I choose 3 days vs. 6 months for this example? A typical Solid State Disk has an average latency around 100 microseconds. Compare that to a rotational standard hard drive at about 5 milliseconds. If I scale these as to compare how drastic the time difference is between the two, it’s 3 days for the SSD and 6 months for the standard hard drive. Now, I can really see and feel the variance between these two mediums, and why a simple choice like this can make a gigantic difference in the outcome. Let’s now take it up to another level. What if we now had the capability to travel with our truckload of peaches to San Francisco in 6 minutes instead of 3 days or better yet how about 40 seconds? Today 6 minutes is possible as it applies to standard DRAM, but 40 seconds isn’t too far off, as this is representative of the Intel and Micron announcement for 3DXPoint NAND.

If I take these latency numbers and plug them into my datacenter, then I can start to see how simple choices can really have a negative or positive impact. You may now be saying to yourself, “Well if I go with SSD’s today, then tomorrow I basically need to rip and replace my entire stack to take advantage of the newer thresholds of latency, like the new 3DXPoint NAND, or even whatever is next!!” The exciting part is that you don’t have to replace your entire stack to take advantage of the latest and greatest. Your truck carrying those peaches just needs a turbo boost applied to the engine. You don’t need to buy a new truck, which is why choosing the proper platform becomes very important. Choosing the right truck the first time doesn’t tie your hands with vendor lock-in when it comes to performance.

In conclusion, I hope that one can now understand why proper base lines need to be established, and real world measurements need to be referenced. It’s not just a feeling of what is faster. We now are past the point of recognizing the true speed of something as a feeling. It’s the cause and effect or result that determines the effective threshold. Then tomorrow the threshold could lower with new innovations, which is all the more reason to have a time scale of reference. Without a reference point people become accustomed to the world around them, missing out on what it really is like to travel from Atlanta to San Francisco in 40 seconds or less. Don’t miss out on the latency innovations of today and tomorrow; choose your platform wisely

                        My Daughter was inspired to draw this for me based on this post! :) 

FVP Color Blindness Accessibility

With much respect and detail our PernixData engineers are interested in every facet of the customer experience. Something that may seem small to others can be a big deal to some. It’s with this that PernixData thinks about every feature in a holistic manner that all can appreciate. One such feature that fits this model is providing visual accessibility to those with color blindness. With 1 in 12 men and 1 in 200 women having some form of color blindness, it becomes important that the FVP UI is readable and understandable no matter the impediment.

 It was in FVP 2.0 that we made modifications to the colors in our UI to deal with the most common forms of color blindness: Deuteranopia (~5% of males), Protanopia (~2.5% of males), and Tritanopia (~.3% of males and females). For example, the “Network Acceleration” line graph was made to be  lime green. In addition all colors were tested with “Color Oracle” and application that simulates different forms of color blindness.

In addition, we made it easy to recognize each line on a chart uniquely identifiable. This was accomplished by providing the ability to toggle lines on/off. For example, if you aren't sure which line is referring to the datastore, just toggle the others off or toggle the datastore selection off/on, and this will clearly show the datastore line.

When designing the FVP interface it was also recognized that color is used as a secondary source of information that provides further insight and impactful information to the primary source. For example, in the host/flash device visualization, the color of the tiles (red, green, yellow) indicates the state of the relevant object. If there is a problem however, alarms/warnings also show an exclamation point on the tile in addition to the coloring of the tiles.


In memory computing - a survey of implementations to date

In Memory Computing Blog Series
Part 1: In Memory Computing - A Walk Down Memory Lane

In the series’ previous post we took a walk down memory lane and reviewed the evolution of In Memory Computing (IMC) over the last few decades. In this blog post we will deep dive into a couple of specific implementations of IMC to better understand both the innovations that have happened to date and their limitations.

The Buffer Cache

Buffer caches, also sometimes referred to as page caches, have existed in operating systems and databases for decades now. A buffer cache can be described a piece of RAM set aside for caching frequently accessed data so that one can avoid disk accesses. Performance, of course, improves the more one can leverage the buffer cache for data accesses as opposed to going to disk.

Buffer caches are read-only caches. In other words only read accesses can potentially benefit from a buffer cache. In contrast, writes must always commit to the underlying datastore because RAM is a volatile medium and an outage will result in the data residing in the buffer cache getting lost forever. As you can imagine this can become a huge deal for an ACID compliant database. ☺ 

The diagrams below depict how the buffer cache works for both reads and writes.

For reads, the first access must always go to the underlying datastore whereas subsequent accesses can be satisfied from the buffer cache

For writes we always have to involve the underlying datastore. The buffer cache provides no benefits.

The fact that the buffer cache only helps reads is severely limiting. OLTP applications, for example, have a good percentage of write operations. These writes will track the performance of the underlying datastore and will most probably mask any benefits the reads are reaping from the buffer cache. 

Secondly, sizing the buffer cache is not trivial. Size the buffer cache too small and it is of no material benefit. Size it too large and you end up wasting memory. The irony of it all, of course, is that in virtualized environments the server resources are shared across VM and are managed by the infrastructure team. Yet, sizing the buffer cache is an application specific requirement as opposed to an infrastructure requirement. One therefore ends up doing this with no idea of what other applications are running on the host or sometimes even without an awareness of how much RAM is present in the server.

API based in memory computing

Another common way to leverage IMC is to leverage libraries or API within one’s applications. These libraries or API allow the application to use RAM as a cache. A prevalent example for this is memcached. You can learn more about memcached here. Here’s is an example of how one would use memcached to leverage RAM as a cache:

function get_foo(foo_id)     
foo = memcached_get("foo:" . foo_id)     
return foo if defined foo      
foo = fetch_foo_from_database(foo_id)     
memcached_set("foo:" . foo_id, foo)     
return foo 

This example shows you how you must rewrite your application to incorporate the API at the appropriate places. Any change to the API or the logic of your application means a rewrite.

Here is a similar example from SQL. 


In the SQL example note that you, as a user, need to specify which tables need to be cached in memory. You need to do this when the table is created. One presumes that if the table already exists you will need to use the ALTER TABLE command. As you know DDL changes aren’t trivial. Also, how should a user decide which tables to keep in RAM and which not to? And does such static, schema based definitions work in today’s dynamic world? Note also that it is unclear if these SQL extensions are ANSI SQL compliant.

In both cases enabling IMC means making fundamental changes to the application. It also means sometimes unnecessarily upgrading your database or your libraries to a version that supports the API you need. And, finally, it also means that each time the API behavior changes you need to revisit your application.

In Memory Appliances

More recently ISV have started shipping appliances that run their applications in memory. The timing for these appliances makes sense given servers with large amounts of DRAM are becoming commonplace. These appliances typically use RAM as a read cache only. In the section on buffer caches we’ve already discussed the limitations of read only caches. 

Moreover since many of these appliances run ACID compliant databases consistency is a key requirement. This means that writes must be logged in non-volatile media in order to guarantee consistency. If there are enough writes in the system then overall performance will begin to track the performance of the non-volatile media as opposed to RAM. The use of flash in place of mechanical drives mitigates this to some extent but if one is going to buy a purpose built appliance for in memory computing it seems unfortunate that one can only be guaranteed flash performance instead.

Most of these appliances also face tremendous startup times. They usually work by pulling all of their data into RAM during startup so that it can be cached. Instrad of using an ‘on demand’ approach they preload the data into RAM. This means that startup times are exorbitantly high when data sizes are large. 

Finally, from an operational perspective rolling in a purpose built appliance usually means more silos and more specialized monitoring and management that takes away from the tremendous value that virtualization brings to the table.


IMC as it exists today put the onus on the end user/application owner to make tweaks for storage performance when in fact the problem lies within the infrastructure. The result is tactical and proprietary optimizations that either make sub-optimal use of the infrastructure or simply break altogether over time. Wouldn’t the right way instead be to fix the storage performance problem, via in memory computing, at the infrastructure level? After all, storage performance is an infrastructure service to begin with. For more information, check out http://www.pernixdata.com 

Infrastructure Level In Memory Computing Video

Watch Satyam Vaghani, CTO and Bala Narasimhan, VP of Products, at PernixData discuss Infrastructure level In-Memory Computing.
Learn how a new standard for in memory computing is forming in the industry where existing applications can take full advantage with zero changes, while future applications can become much easier to develop once they leverage the power of infrastructure level in memory computing.
Grab a cup a tea and enjoy!

FVP Upgrades Using VUM

Starting with FVP version 2.5, a new upgrade process was introduced. As always the vSphere Update Manager could be used to deploy the FVP host extension in the perspective vSphere cluster of hosts. However prior to 2.5 the FVP upgrade process needed to be performed using the host CLI. This required the removal of the old host extension before the new host extension could be installed. Now we have a new supported method where VUM can be used to deploy a new FVP host extension and also upgrade an existing one already installed without the process of manually removing the host extension first! 

Before you begin the upgrade process for FVP, make sure you have downloaded the appropriate VIB from the PernixData support portal. These are VIBs signed and designed for only FVP upgrades using VUM. 

The upgrade also includes putting the host in maintenance mode as required for certified extension level installs. This becomes much more seamless since VUM can handle the transition in and out of maintenance mode. Additionally VUM needs to completely satisfy the compliance of the upgrade, this means a reboot is required for FVP upgrades when using the vSphere Update Manager. 

Using VUM for upgrades is different than using the simple uninstall and install method at a CLI prompt. Essentially VUM installations can not run /tmp/prnxuninstall.sh to uninstall a previous host extension version, as there is no API or scripting capabilities built-in to the VUM product. 

This is why there is a dedicated VIB strictly for the upgrade of FVP. There is no current way to perform a live installation on a running ESX boot partition. This means that a reboot is required since the backup boot partition /altbootbank is used to update the host extension. Then after a host reboot,  the new host extension will be installed to the primary boot partition /bootbank for a compliant running ESX host. 

Once the host extension has been uploaded into the patch repository, it then can be added to a custom VUM baseline, while making sure it’s selected for “Host Extension”, since any other selection would prevent the completion of the upgrade. 

Once VUM has finished scanning, staging against the custom “Host Extension” baseline, (I called mine PRNX) then remediation of the hosts can take place. This is based on each host that is labeled with an X as “non-compliant”. Once a reboot has finished the remediation process will check for host extension compliance, this will ensure that the new host extension has been fully deployed, and if that is the case VUM will report back a check mark for “compliancy” 
As you can see the new method of using VUM for not only new installations but upgrades has made it that much more seamless to have FVP start transforming your environment into an accelerated platform. 

Why I Decided Not To Put Flash In The Array

My story starts about 3 years ago, where at the time I was the Information Systems director for a large non-profit in Atlanta, GA. One of the initiatives at the time was to become 100% virtualized in 6 months; and there were obviously many tasks that needed to be accomplished before reaching that milestone. The first task was to upgrade the storage platform, as we had already surpassed the performance characteristics for the current workloads. As with any project, we looked at all the major players in the market, we ran trials, talked to other customers, and did our due-diligence for the project. It was not only important for us to be mindful of costs being a non-profit but we wanted also to be good stewards in everything we did. 

The current storage system that we were looking to upgrade was a couple 7.2K RPM, 24 TB chassis’. We had plenty of storage for our needs but latency was in the 50ms range encompassing only about 3000 IOPs. Obviously not the best to run a virtualized environment on as you can see!! We looked at the early All Flash Arrays that were just coming out and we also looked at the Hybrid Arrays, all of them promising increased IOPs and lower latency. The problem was that they were not an inexpensive proposition. So, the dilemma of being good stewards and at the same time needing single digit latency with more than 50K IOPs was a challenge to say the least. 

About the same I met a gentleman that told me some magical stories that sounded almost too good to be true! This man’s name is Satyam Vaghani, the PernixData CTO, creator of VVOLS, VAAI and VMFS.  Soon after meeting Satyam, I was given the privilege of getting my hands on an alpha build of PernixData FVP. I ran and tested the product during the alpha and beta stages at which I in turn immediately purchased and became PernixData’s first paying customer. I had never purchased a product in Beta before, but I felt this product was out of the ordinary. The value and the promise were proved even in beta, where I didn’t have to buy new storage just for performance reasons and thus saved the organization collectively over $100,000. This wasn’t a localized problem; it was an architecture problem that no array or multiple storage systems could solve. So, if I were in that position today, I’m sure the calculation over 3 years would be close to $500,000 worth of savings, do to the scale-out nature of the FVP solution. As the environment grew and became 100% virtualized I no longer would have had to think about storage performance in the same way. I no longer would have had to think about the storage fabric connections in the same way as well. Talk about a good feeling of not only being a good steward but also astonishing the CFO on what was achieved. 

This to me validated the waste and inefficiencies that occur when flash is being used at the storage layer. Disk is cheap when used for capacity and so it has never made sense to me to cripple flash performance by putting it behind a network in a monolithic box that can have it’s own constraints and bottlenecks. 

Fast forward to today where flash is now much more prominent in the industry. The story is even stronger today, how can anyone not be conscientious about spending over 100K on a single array that can only achieve 90,000 IOPs with single digit millisecond latency? When someone can buy a single enterprise flash drive for $500 that does over 50K IOPs with microsecond latency, then the question that must be asked, can you defend your decision from the CFO or CIO and feel good about it?

Don’t get me wrong; I’m not saying FVP replaces storage capacity, if you need storage capacity, then go and purchase a new storage array. However, this doesn’t mean that you have to buy an AFA for capacity reasons. There are many cost effective options out there that makes more economic sense, no matter what the dedupe or compression rates that are promised!  

My personal advice to everyone is to be a conscientious objector when deciding to put flash in the array. It didn’t make sense for me 3 years ago and still doesn’t make sense today. 

In Memory Computing - A Walk Down Memory Lane

I'm excited once again to be hosting guest blogger Bala Narasimhan VP of Products at PernixData. He will be doing a new series dedicated to Infrastructure Level In Memory Computing. This is a new way to think and design your environment for the future. Enjoy! 

In Memory Computing (IMC) has seen a revival in recent years. In this blog post we will walk down memory lane (no pun intended) and study the evolution of IMC. We will specifically focus on the role of IMC in conjunction with storage systems.

Here’s a quick timeline capturing the evolution of IMC.

IMC initially manifested itself as buffer caches in operating systems and databases. Buffer caches remain integral parts of these software systems to date. A buffer cache is a read cache for most recently accessed data. A hit in the buffer cache makes reads go faster because it avoids a disk access. Writes however must be synchronized with the underlying datastore. This means a buffer cache help with reads but never help with writes. In Memory Databases (IMDB) that hold all of the data in memory for faster query processing were first introduced in 1990s but didn’t make much of a dent. Recently IMDB have seen a revival and have become an important discussion topic in the database industry. One of the reasons for this revival has been the dramatic increase in the amount of RAM that servers can hold; surpassing a terabyte of RAM per server. IMDB, like buffer caches, work well for reads but require synchronization with a persistent media for writes and therefore incur performance penalties. The early 2000’s saw the introduction of memory caching libraries such as Memcached that allow applications to leverage the performance of RAM as a cache within the application logic. Application owners, via API, can invoke these libraries at well defined locations to use RAM as a cache. Unfortunately, any change in the underlying memory caching libraries or to the application behavior would mean a rewrite of the application altogether.

All of these approaches leverage RAM as a volatile cache. This means all these approaches are read caches and only accelerate parts of an application, if at all. More importantly, all of these approaches push the in memory computing into the application. This has a number of unfortunate side effects:

  • Applications become unnecessarily complicated. They have to maintain complicated caching logic and ensure consistency with the backing store.
  • The onus is now on the end user to figure out how best to leverage IMC. Everything from determining the appropriate buffer cache size to deciding which tables to pin in RAM is now an application owner decision.
  • The application owners do not always control the infrastructure on which the application is deployed. It is therefore unfortunate that they must decide how much memory an application should use for caching without understanding how the infrastructure (for example, servers) are going to be used and will evolve. In addition, applications owners do not always have a holistic view of other applications are running on the same server.
  • Leveraging memory management innovations become challenging. NUMA is a good example of an innovation that cannot be leveraged until an application is enhanced to leverage it even though NUMA itself is an infrastructure level innovation.

All of this has meant that the strengths of RAM as a data layer have not been fully exploited. While this may not have been an issue until now because most servers had only little amounts of RAM, say 32GB or 64GB, this is quickly changing. Servers with terabytes of RAM are not uncommon these days and this trend is only going to continue. The time has therefore come to rethink IMC from the ground up. Enhancing applications to do better read caching and putting the onus on the end user is not the answer. Future blog posts in this series will introduce a high performance, highly scalable way to leverage IMC across one’s entire infrastructure.

My Role Transition

On May 1st, I started a new role in my career path at PernixData. It is with great delight that I’m now a Product Manager within the Product Management group. I look forward to working closely with Bala Narasimhan VP of Product Management. He and I have worked together on many other projects since the early alpha stages of FVP. So, in some ways this transition will feel like the early days!

With that said, I will be running many different projects in my new role. A renewed energy around blogging and staying close to the community is at utmost importance for me. In addition I will be helping with product collateral, documentation, roadmap items, and much more. It’s with this that I thank the wonderful teams and individuals that I have had the privilege to work with! Becoming the 1st customer to Product Manager is something that I could have only dreamed of.

Feel free to reach out and connect with me anytime, as part of my focus is to stay connected to a relevancy in the field, as this can only help any product-focused endeavors!

On a side note, I also thought it was very ironic that on May 1st, my father officially retired. He has been a minister and marriage and family therapist for many years. The timing couldn’t have been predicted, so I humbly congratulate him on his promotion to retirement and thank him for the many years of taking care of his family!


Nashville VMUG Experience


Yesterday, I spent the day at the Nashville VMUG Converge event. I have to say that I been to a lot VMUG’s over the years and all have their strengths and weaknesses, but in IMHO the leadership team in Nashville have a model VMUG organization, that depicts forward thinking and focus.

This particular event wasn’t your typical user conference. It was a more focused event that the Nashville VMUG group organizes ever year. With close to 150 attendees, it’s not user conference size but also no small event either. I talked to some that drove over 4 hours to be part of the day activities!!

This year what really caught my attention was the pilot program the VMUG leadership team put together around educating the community. They invited ITT Technical Institute students & staff to attend and learn about virtualization. We ended up having over 20 students show up just to experience what a VMUG was all about and learn more about virtualization and the ecosystem that surrounds it. The plan is to expand this to other technical colleges and organizations and maybe even setup a mentoring program. This to me is one example of what the VMUG should be all about. I believe we all learn more when we mentor others and interact; it’s not all about who knows the most or who is the most popular speaker at today’s event. If you ask me, it’s about sharing with each other and becoming better at what do each day, no matter who we are.

Great organization and great event!! Thanks for the hospitality yesterday!! Oh, the food was good too!! Who doesn’t like an Ice Cream Sundae in the afternoon!!! :) 



How Can Database Users Benefit - PernixData FVP

I'm pleased to introduce you to Bala Narasimhan, VP of Products at PernixData. He has a wealth of knowledge around databases, and has authored 2 patents for memory management in relational databases. It's my pleasure to have him featured in today's post. He is officially my first guest blogger! Enjoy! 

Databases are a critical application for the enterprise and usually have demanding storage performance requirements. In this blog post I will describe how to understand the storage performance requirements of a database at the query level using database tools. I’ll then explain why PernixData FVP helps not only to solve the database storage performance problem but also the database manageability problem that manifests itself when storage performance becomes a bottleneck. Throughout the discussion I will use SQL Server as an example database although the principles apply across the board.

Query Execution Plans

When writing code in a language such as C++ one describes the algorithm one wants to execute. For example, implementing a sorting algorithm in C++ means describing the control flow involved in that particular implementation of sorting. This will be different in a bubble sort implementation versus a merge sort implementation and the onus is on the programmer to implement the control flow for each sort algorithm correctly.

In contrast, SQL is a declarative language. SQL statements simply describe what the end user wants to do. The control flow is something the database decides. For example, when joining two tables the database decides whether to execute a hash join, a merge join or a nested loop join. The user doesn’t decide this. The user simply executes a SQL statement that performs a join of two tables without any mention of the actual join algorithm to use. 

The component within the database that comes up with the plan on how to execute the SQL statement is usually called the query optimizer. The query optimizer searches the entire space of possible execution plans for a given SQL statement and tries to pick the optimal one. As you can imagine this problem of picking the most optimal plan out of all possible plans can be computationally intensive.

SQL’s declarative nature can be sub-optimal for query performance because the query optimizer might not always pick the best possible query plan. This is usually because it doesn’t have full information regarding a number of critical components such as the kind of infrastructure in place, the load on the system when the SQL statement is run or the properties of the data. . One example of where this can manifest is called Join Ordering. Suppose you run a SQL query that joins three tables T1, T2, and T3. What order will you join these tables in? Will you join T1 and T2 first or will you join T1 and T3 first? Maybe you should join T2 and T3 first instead. Picking the wrong order can be hugely detrimental for query performance. This means that database users and DBAs usually end up tuning databases extensively. In turn this adds both an operational and a cost overhead.

Query Optimization in Action

Let’s take a concrete example to better understand query optimization. Below is a SQL statement from a TPC-H like benchmark. 

select top 20 c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= ':1' and o_orderdate < dateadd(mm,3,cast(':1'as datetime)) and l_returnflag = 'R' and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue;

The SQL statement finds the top 20 customers, in terms of their effect on lost revenue for a given quarter, who have returned parts they bought. 

Before you run this query against your database you can find out what query plan the optimizer is going to choose and how much it is going to cost you. Figure 1 depicts the query plan for this SQL statement from SQL Server 2014 [You can learn how to generate a query plan for any SQL statement on SQL Server at https://msdn.microsoft.com/en-us/library/ms191194.aspx]

Figure 1
You should read the query plan from right to left. The direction of the arrow depicts the flow of control as the query executes. Each node in the plan is an operation that the database will perform in order to execute the query. You’ll notice how this query starts off with two Scans. These are I/O operations (scans) from the tables involved in the query. These scans are I/O intensive and are usually throughput bound. In data warehousing environments block sizes could be pretty large as well. 

A SAN will have serious performance problems with these scans. If the data is not laid out properly on disk, you may end up with a large number of random I/O. You will also get inconsistent performance depending on what else is going on in the SAN when these scans are happening. The controller will also limit overall performance.

The query begins by performing scans on the lineitem table and the orders table. Note that the database is telling what percentage of time it thinks it will spend in each operation within the statement. In our example, the database thinks that it will spend about 84% of the total execution time on the Clustered Index Scan on lineitem and 5% on the other. In other words, 89% of the execution time of this SQL statement is spent in I/O operations! It is no wonder then that users are wary of virtualizing databases such as these.

You can get even more granular information from the query optimizer. In SQL Server Management Studio, if you hover your mouse over a particular operation a yellow pop up box will appear showing very interesting statistics. Below is an example of data I got from SQL Server 2014 when I hovered over the Clustered Index Scan on the lineitem able that is highlighted in Figure 1.

Notice how Estimated I/O cost dominates over Estimated CPU cost. This again is an indication of how I/O bound this SQL statement is. You can learn more about the fields in the figure above here.

An Operational Overhead

There is a lot one can learn about one’s infrastructure needs by understanding the query execution plans that a database generates. A typical next step after understanding the query execution plans is to tune the query or database for better performance.  For example, one may build new indexes or completely rewrite a query for better performance. One may decide that certain tables are frequently hit and should be stored on faster storage or pinned in RAM. Or, one may decide to simply do a complete infrastructure redo.

All of these result in operational overheads for the enterprise. For starters, this model assumes someone is constantly evaluating queries, tuning the database and making sure performance isn’t impacted. Secondly, this model assumes a static environment. It assumes that the database schema is fixed, it assumes that all the queries that will be run are known before hand and that someone is always at hand to study the query and tune the database. That’s a lot of rigidity in this day and age where flexibility and agility are key requirements for the business to stay ahead.

A solution to database performance needs without the operational overhead 

What if we could build out a storage performance platform that satisfies the performance requirements of the database irrespective of whether query plans are optimal, whether the schema design is appropriate or whether queries are ad-hoc or not? One imagines such a storage performance platform will completely take away the sometimes excessive tuning required to achieve acceptable query performance. The platform results in an environment where SQL is executed as needed by the business and the storage performance platform provides the required performance to meet the business SLA irrespective of query plans.

This is exactly what PernixData FVP is designed to do. PernixData FVP decouples storage performance from storage capacity by building a server side performance tier using server side flash or RAM. What this means is that all the active I/O coming from the database, both reads and writes, whether sequential or random, and irrespective of block size is satisfied at the server layer by FVP right next to the database. You are longer limited by how data is laid out on the SAN, or the controller within the SAN or what else is running on the SAN when the SQL is executed.

This means that even if the query optimizer generates a sub optimal query plan resulting in excessive I/O we are still okay because all of that I/O will be served from server side RAM or flash instead of network attached storage. In a future blog post we will look at a query that generates large intermediate results and explain why a server side performance platform such as FVP can make a huge difference.


FVP Management Database Design Decisions

When deciding which database model to use for FVP, it’s important to understand what the goals are in using FVP and the growth potential for the platform. Upon installation, FVP management service builds and connects to a “prnx” SQL database instance. This database is responsible for receiving, storing and presenting performance data. All time series data for all performance charts displayed in the FVP UI are stored in this database, in addition to management metadata as it relates to configurations. Keep in mind however neither the management server nor the FVP database needs to be operational for read/write acceleration to continue during downtime. 

PernixData management server is also responsible for managing fault domain configurations and the host peer selection process for write back fault tolerance. This information is also kept current in the “prnx” database so that any host or cluster changes can be kept accurate for FVP policy changes. This is why it’s imperative that FVP maintain a connection with the vCenter server, so that inventory information can be collected and maintained. 

It was decided early in the FVP design phase not to recreate the wheel and take advantage of already robust operations in SQL server. One of these decisions was to implement SQL rollup jobs into practice for FVP. The SQL rollup job is responsible for keeping only the current valuable data while providing an average for historical reference. Instituting the SQL rollup process lowers the latency and overhead of FVP having to implement the averaging operations. This means all data stored in SQL is not moved nor massaged outside the context of SQL, this provides the security and performance benefits to FVP as an acceleration platform. 

Since part of the SQL server responsibility is to store FVP performance data, it’s important to only store as much data that is relevant and useful. Currently FVP management server only requests 20-second performance samples on all FVP clustered VM’s on each enabled host. This is run using multiple threads so that multiple CPU cores can be utilized for efficiency. During a 24-hour period a large amount of data could be archived. In this case, FVP has a purging schedule that runs every hour to purge all 20-second samples older than 24 hours. This only happens after a SQL rollup has completed within each minute and hour time period averaging the 20-second samples. 

Every minute there are 3 samples (20 seconds each) that are averaged. At the 1 Hour mark, a SQL rollup job runs and at completion FVP will purge all 20-second samples older than 24 hours. In order to view the 20-second samples before the rollup, then look at the performance statistics that are 1 hour or less in the FVP performance UI.  After the 1-hour interval all 20-second samples are discarded after the first SQL rollup and then permanently removed after the purging operation 24 hours later. 

In order to determine a proper SQL capacity for this amount of data, one needs to know how many VM’s they plan to accelerate with FVP and what the potential is for continued expansion. Currently over 80% of the “prnx” database is used to store performance related metrics and this 80% also makes up the majority of data churn within the platform. This means calculating for the 80% will provide ample room for FVP’s operations. 

The PernixData Management Server will insert 1 row (record) in the DB table every 20 seconds for each VM. This can be approximated that each VM will store ~ 1.6KB amount of data every 20 seconds. This data also takes into account the index size for each VM that is referenced. 

If considering SQL Express with a 10GB limitation, knowing the effective data added each day becomes an important piece of information. This design decision could hamper long-term storage or the acceleration of a large number of VM’s. Whether SQL Express is chosen or not, it’s a best practice to either choose “Simple” Mode or have a regular scheduled SQL backups so that log truncation can help limit the continued growth of the SQL log. 

Knowing the approximate data added to the DB each day for said number of VM’s will provide the expectancy when one would reach a 10GB capacity for SQL Express. If for example you have 100 VM’s accelerated with FVP, it will take about 400 days, but for a 1000 VM’s the limitation will be reached in as little as 40 days! 

To understand how our UI displays the averages based on the samples and purging process, below is a chart that illustrates the number samples taken and the average based on the time displayed. Keep in mind whether choosing a custom time range or using the predefined time ranges in the FVP UI, all result in the same samples and averages as indicated in the chart below. 

As you can see it’s important to not only understand the metrics that you are referencing but design appropriately for database sizing and retention, taking into account PernixData FVP’s growth within your virtual environment.