FVP Management Database Design Decisions

When deciding which database model to use for FVP, it’s important to understand both your goals for FVP and the growth potential of the platform. Upon installation, the FVP management service builds and connects to a “prnx” SQL database instance. This database is responsible for receiving, storing and presenting performance data. All time series data for the performance charts displayed in the FVP UI is stored in this database, along with management metadata related to configurations. Keep in mind, however, that neither the management server nor the FVP database needs to be operational for read/write acceleration to continue; acceleration carries on during management downtime.

The PernixData management server is also responsible for managing fault domain configurations and the host peer selection process for write back fault tolerance. This information is kept current in the “prnx” database so that host or cluster changes are accurately reflected when FVP policies change. This is why it’s imperative that FVP maintain a connection with the vCenter server, so that inventory information can be collected and maintained.

It was decided early in the FVP design phase not to reinvent the wheel and instead to take advantage of the already robust operations in SQL Server. One of these decisions was to use SQL rollup jobs in FVP. The SQL rollup job is responsible for keeping only the data that is currently valuable while providing averages for historical reference. Delegating the rollup process to SQL spares FVP the latency and overhead of implementing the averaging operations itself. It also means that data stored in SQL is neither moved nor massaged outside the context of SQL, which provides security and performance benefits to FVP as an acceleration platform.

Since part of SQL Server’s responsibility is to store FVP performance data, it’s important to store only as much data as is relevant and useful. Currently the FVP management server requests only 20-second performance samples for all FVP clustered VMs on each enabled host. This collection runs on multiple threads so that multiple CPU cores can be utilized for efficiency. Over a 24-hour period a large amount of data can accumulate. For this reason, FVP has a purging schedule that runs every hour to purge all 20-second samples older than 24 hours. The purge only happens after a SQL rollup has completed for each minute and hour time period, averaging the 20-second samples.

Every minute, 3 samples (20 seconds each) are averaged. At the 1-hour mark, a SQL rollup job runs, and at its completion FVP purges all 20-second samples older than 24 hours. To view the 20-second samples before the rollup, look at performance statistics covering 1 hour or less in the FVP performance UI. Beyond the 1-hour interval, the 20-second samples are no longer used once the first SQL rollup has run, and they are permanently removed by the purging operation 24 hours later.
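
The averaging and purging described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not FVP's actual SQL rollup job: the function names, the sample values and the use of plain lists are all assumptions made for the example.

```python
from statistics import mean

SAMPLE_INTERVAL_S = 20
SAMPLES_PER_MINUTE = 60 // SAMPLE_INTERVAL_S  # 3 samples make up one minute


def rollup(samples, group_size):
    """Average consecutive groups of `group_size` samples (hypothetical helper)."""
    return [
        mean(samples[i:i + group_size])
        for i in range(0, len(samples), group_size)
    ]


def purge(timestamped_samples, now_s, max_age_s=24 * 3600):
    """Keep only (timestamp, value) pairs that are not older than `max_age_s`."""
    return [(t, v) for (t, v) in timestamped_samples if now_s - t <= max_age_s]


# Made-up latency readings in ms, one every 20 seconds, covering two minutes.
raw = [2.0, 4.0, 6.0, 1.0, 1.0, 4.0]
per_minute = rollup(raw, SAMPLES_PER_MINUTE)  # one average per minute

# A sample taken at t=0 is older than 24 hours at t=86450 and is dropped.
kept = purge([(0, 1.0), (100, 2.0)], now_s=86450)
```

The same rollup helper could be applied again to the per-minute averages with a group size of 60 to produce an hourly average, which mirrors the minute and hour periods mentioned earlier.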

To determine a proper SQL capacity for this amount of data, you need to know how many VMs you plan to accelerate with FVP and what the potential is for continued expansion. Currently over 80% of the “prnx” database is used to store performance-related metrics, and this 80% also accounts for the majority of data churn within the platform. This means that sizing for that 80% will provide ample room for FVP’s operations.

The PernixData Management Server inserts 1 row (record) into the DB table every 20 seconds for each VM. As an approximation, each VM stores about 1.6KB of data every 20 seconds. This figure also takes into account the index size for each VM that is referenced.
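
As a rough worked example, the figures above (1 row per VM every 20 seconds, about 1.6KB per row) give the volume of raw 20-second samples written per day. This is a minimal sketch with hypothetical function names; it covers only the raw samples inside the 24-hour window and ignores the rolled-up averages and metadata.

```python
SECONDS_PER_DAY = 24 * 3600


def samples_per_day(interval_s=20):
    """Number of samples one VM produces in 24 hours."""
    return SECONDS_PER_DAY // interval_s


def raw_kb_per_day(vm_count, kb_per_sample=1.6, interval_s=20):
    """Approximate KB of raw 20-second samples written per day."""
    return vm_count * samples_per_day(interval_s) * kb_per_sample


one_vm_kb = raw_kb_per_day(1)        # 4320 rows at ~1.6KB each
hundred_vms_kb = raw_kb_per_day(100)
print(f"1 VM:    ~{one_vm_kb / 1000:.1f} MB of raw samples per day")
print(f"100 VMs: ~{hundred_vms_kb / 1000:.0f} MB of raw samples per day")
```

Because the hourly purge removes raw samples older than 24 hours, this number is closer to a steady-state footprint for the raw samples than to a daily growth rate.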

If you are considering SQL Express with its 10GB limitation, the effective amount of data added each day becomes an important piece of information. Choosing SQL Express could hamper long-term storage or the acceleration of a large number of VMs. Whether SQL Express is chosen or not, it’s a best practice to either choose the “Simple” recovery model or schedule regular SQL backups, so that log truncation can limit the continued growth of the SQL log.

Knowing the approximate amount of data added to the DB each day for a given number of VMs tells you when to expect to reach the 10GB capacity of SQL Express. For example, with 100 VMs accelerated with FVP it will take about 400 days, but with 1000 VMs the limit will be reached in as little as 40 days!
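
The estimate above can be turned into a small calculator for other VM counts. Note the assumption: the per-VM growth rate below is back-calculated from the 100 VM / 400 day example in this post (10GB spread over 100 VMs and 400 days), it is not a measured or official figure, and the sketch assumes growth is linear in the number of VMs.

```python
SQL_EXPRESS_LIMIT_GB = 10

# Assumption: effective growth per VM per day, derived from the example
# of 100 VMs reaching 10GB in roughly 400 days.
GB_PER_VM_PER_DAY = SQL_EXPRESS_LIMIT_GB / (100 * 400)


def days_until_limit(vm_count, limit_gb=SQL_EXPRESS_LIMIT_GB,
                     gb_per_vm_per_day=GB_PER_VM_PER_DAY):
    """Estimate the days until the database reaches `limit_gb`."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return limit_gb / (vm_count * gb_per_vm_per_day)


for vms in (100, 250, 1000):
    print(f"{vms:>5} VMs: about {days_until_limit(vms):.0f} days to reach 10GB")
```

Swapping in a growth rate observed in your own environment would give a more trustworthy estimate than the derived constant used here.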

To understand how our UI displays the averages based on the samples and purging process, below is a chart that illustrates the number of samples taken and the average based on the time displayed. Keep in mind that whether you choose a custom time range or use the predefined time ranges in the FVP UI, the result is the same samples and averages as indicated in the chart below.

As you can see, it’s important not only to understand the metrics that you are referencing but also to design appropriately for database sizing and retention, taking into account PernixData FVP’s growth within your virtual environment.

Basic Primer: IOPS

If you are a Virtualization Admin then you have most likely had to get your feet wet learning the ever-expanding storage market. As the virtualization market has grown, so has the amount of storage in the datacenter. This has put a renewed focus on understanding how storage operates in a virtualized cluster.

In the past, Memory and CPU have been what most people gravitated toward when performance problems arose, while the ticking time bomb in the growth of the virtualized datacenter has been storage performance.

The goal of this post is to give a snapshot understanding of why IOPS are one important metric to evaluate when looking at storage performance.

I’m not going to get into detailed performance characteristics, as that’s not the intent of this post.

Basic Primer:

  • IOPS means: input/output operations per second (A way to measure storage performance on a disk, SAN, SSD, etc.)
  • As a general rule, the higher the IOPS, the faster the storage is operating.
  • The closer a disk is to CPU/Memory, the faster the processing time. Network latency can be a huge factor in performance.
  • IOPS are not the only performance metric to look at: Throughput and Latency are also very important and can affect performance. The best scenario is high IOPS, low latency and high throughput.
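
To make the last point in the list above concrete, here is a small sketch of how the three metrics relate. It uses two common rules of thumb: throughput is roughly IOPS multiplied by the I/O size, and the IOPS a device can sustain is roughly the number of outstanding I/Os divided by the latency of each one. The function names and all of the numbers are made up for illustration.

```python
def throughput_mb_s(iops, io_size_kb):
    """Approximate throughput in MB/s for a given IOPS rate and I/O size."""
    return iops * io_size_kb / 1024


def max_iops(outstanding_ios, latency_ms):
    """Approximate IOPS ceiling for a given queue depth and per-I/O latency."""
    return outstanding_ios * 1000 / latency_ms


# The same 5000 IOPS moves very different amounts of data per second.
small_blocks = throughput_mb_s(5000, 4)    # 4KB I/Os
large_blocks = throughput_mb_s(5000, 64)   # 64KB I/Os

# With 32 I/Os in flight, lower latency allows more IOPS.
fast = max_iops(32, 1.0)    # 1 ms per I/O
slow = max_iops(32, 10.0)   # 10 ms per I/O

print(f"5000 IOPS at 4KB:  {small_blocks:.1f} MB/s")
print(f"5000 IOPS at 64KB: {large_blocks:.1f} MB/s")
print(f"32 outstanding I/Os at 1 ms:  {fast:.0f} IOPS")
print(f"32 outstanding I/Os at 10 ms: {slow:.0f} IOPS")
```

This is why an IOPS figure quoted without an I/O size or a latency figure says little on its own.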

One can think of an input or output operation as a chunk of data that needs to be written to or read from disk. As an example, suppose an Exchange database needs to retrieve a list of mailbox objects (Get-MailboxDatabase). This transaction/workload requires information to be accessed from the disk/vmdk in the form of a set number of inputs and outputs, which the CPU/Memory of the host system will process. How fast this happens depends on many things, but it can be measured by the number of operations per second that are acknowledged back to the application.

Hopefully this simple example gives you a better understanding of I/O and why it is an area that is easily overlooked with regard to application performance.

If you would like to do a deep dive into storage performance metrics, there is no need for me to reinvent the wheel, as others have done an awesome job of telling this part of the story.