Azure Stack Ignite Notes

As I noted in yesterday’s post, I have been intrigued by Microsoft’s approach to Azure Stack. I took a lot of notes during sessions and conversations at Ignite, so I decided to list them out in a new post. For those who are unfamiliar with Azure Stack, I would suggest this whitepaper as a starting point. 

In a nutshell, Azure Stack is Microsoft’s hyper-converged integrated system for flexible service delivery using Azure-based management and automation. It’s Microsoft’s cloud in a box, using Azure’s framework to ultimately keep the experience consistent. Keep in mind this product is not GA yet, so anything I state below may or may not come to fruition. 

Availability: 
At Ignite 2016, Microsoft announced the second technical preview of Azure Stack, with general availability expected in the second half of 2017. I also heard an expectation of TP3 during the first quarter of 2017. 
Currently Azure Stack offers IaaS, but later this month Microsoft plans to release an update that makes its PaaS “App Service Resource Provider” available on Azure Stack TP2. 
 

Architecture: 
•    Built on top of Windows Server 2016 Server Core (Nano Server will be supported in the future); however, TP2 currently uses Server 2012 R2 Server Core. 
•    Hyper-converged deployment model with a pair of ToR switches and a BMC switch to manage the integrated server stack. 
•    Hyper-V is sealed under Azure Stack; the only access to Hyper-V is through the API. 
•    Windows Server Failover Clustering is used along with Storage Spaces Direct. 
•    Minimum resource requirements: dual socket, 8 cores per socket, 256 GB memory. 
•    RDMA will be supported via a converged NIC (SDN + storage), or two 10 Gb NICs with Switch Embedded Teaming for port/link redundancy. 
•    Storage support: CSVFS with ReFS, SMB3, SMB Direct, SATA, SAS, NVMe
•    Multi-resilient volumes: writes are mirrored for performance; as the data gets cold, large chunks are written to parity for a balance of performance and capacity. 

Caching Capabilities:
•    All writes up to 256 KB are cached. 
•    Reads of 64 KB or less are cached on first miss. 
•    Reads larger than 64 KB are cached on second miss. 
•    Writes are de-staged to HDD in an optimal order. 
•    Sequential reads of 32 KB or larger are not cached. 
•    In an all-flash system, only writes are cached. 
•    Minimum requirements: 2 cache devices and 4 capacity devices 
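To make the admission rules above a bit more concrete, here is a rough, hypothetical Python sketch of a cache-admission check along these lines. The thresholds mirror the bullets above, but the function name and structure are my own illustration, not Microsoft's implementation.

# Hypothetical sketch of the cache admission rules listed above.
KB = 1024

def should_cache(io_type, size_bytes, miss_count, is_sequential, all_flash):
    """Return True if an I/O would land in the cache under the rules above."""
    if io_type == "write":
        # All writes up to 256 KB are cached (and on an all-flash system,
        # writes are the only thing cached).
        return size_bytes <= 256 * KB

    # Reads are never cached on an all-flash system.
    if all_flash:
        return False

    # Sequential reads of 32 KB or larger bypass the cache.
    if is_sequential and size_bytes >= 32 * KB:
        return False

    if size_bytes <= 64 * KB:
        return miss_count >= 1   # 64 KB or less: cached on first miss
    return miss_count >= 2       # larger reads: cached on second miss

# Example: a random 32 KB read that has already missed once would be cached.
print(should_cache("read", 32 * KB, miss_count=1, is_sequential=False, all_flash=False))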
 

Integrated Stack:
I heard a lot of people say that they didn’t like the direction of an integrated system requirement for Azure Stack. The belief is that it will lock them into a hardware vendor that may not provide enough flexibility or choice. I attended a few sessions where Microsoft heard this feedback loud and clear, and it was stated that the current plan is to continue partnering with Dell, HPE, and Lenovo to provide the hardware for a complete certified Azure Stack integrated system. After that, once additional testing and assurance can be done against a defined HCL, Microsoft hopes to offer Azure Stack as a production-ready software solution for a wider array of hardware platforms. I personally had hoped for a software-only solution, but I do see why an integrated system will allow Microsoft to control the experience and deliver on the expected promises. I just hope that the integrated system is not priced out of reach of most customers, as we have seen similar examples of this in the past! 

An integrated system is part of what Microsoft calls a Scale Unit. Each scale unit must include homogeneous hardware; however, separate scale units can use heterogeneous hardware. Each scale unit must include at least 4 Azure Stack nodes, with one node dedicated to patching and updates. Each scale unit must also include the following: 

•    A pair of ToR Switches for availability and internal networking. 
•    One BMC switch for hardware management. 
•    An aggregate switch for external connections. 

Each scale unit is part of one region (the same physical location); multiple scale units can be part of the same region as well and are therefore part of the same fault domain. 

The cool part is that you can designate your Azure Stack deployment as a region, and in the future this could become part of Azure itself: when deploying new workloads, you would have the option of selecting not only Azure regions but your own on-premises Azure Stack regions. 

One surprise that I found was that the Azure Stack software framework actually runs in virtual machines on top of Hyper-V. I guess this surprised me because I thought that Azure Stack would have been developed as an application with native hooks into the server operating system. I can certainly understand why they chose this path, but it also makes me wonder about performance in this type of design. This of course can be easily rectified by an optimization product like the one from DataCore Software! :) 

Currently authentication is handled through Azure, but Facebook, Twitter, Microsoft Account, and AD are planned to be supported for authentication and authorization on Azure Stack. 

If and when you use Azure Stack App Service, you will have the ability to deploy new internal assemblies, like Go, Ruby, and Java, and even components like Oracle clients. These are the types of things that Azure Stack will support that the public Azure App Service won't. 

As you can see there is a lot going on here, but I'm only scratching the surface. I hope to provide more interesting notes about the Azure Stack architecture as I have more time to play with TP2. 

 

 

Microsoft Ignite 2016 Experience

Last week I attended Microsoft Ignite 2016 here in Atlanta. It was nice to have this event in my own backyard, as it's always a bonus to sleep in your own bed at night! This was the first Microsoft Ignite/TechEd I’ve been to in a while, and here are a couple of things that I found motivating… 

Here come the Women!!

I have never seen so many women in attendance at an IT convention. Not that I have been to all of them, but I was encouraged to see so many women participating and attending infrastructure, virtualization, and cloud native sessions. Yes, we still had long restroom lines, but it was clear we weren’t alone. I don’t know the percentages, but I wouldn’t be surprised if at least 10% of the attendees were women! I find this extremely encouraging, as this industry definitely needs a better balance! 

The new Microsoft – 

With 23,000-plus attendees at Ignite, it was noticeable that the new Microsoft (under Satya Nadella) has some new faces under the cloud native umbrella. It was also evident that a lot of the old faces of Microsoft’s yesteryear were present. Microsoft knows that it has a lot to do to convince and educate those that are resistant or reluctant to change. For example, there were a few sessions geared to organizational change and digital transformation to the cloud. The sessions were not only introductory but also educational, in hopes of easing fears of the public cloud. 

Since Microsoft has such a wide base of traditional applications and is deeply ingrained in on-premises infrastructures, this obviously creates a unique challenge but also provides an opportunity for Microsoft in the new era of cloud. 

Introducing the opportunity: Azure Stack. I believe Azure Stack has the potential to not only put Microsoft on top of cloud revenues but also change the competitive direction of AWS and GCP. However, with longer-than-planned delays and/or technical missteps, Microsoft could leave enough room and time for competitive challengers to enter this new product category. 

Stay tuned for my next post on the Azure Stack architecture, and why I think this new product could accelerate the move to cloud native applications. There are some exciting and noteworthy architectural features that I think will surprise you!  

Great job, Microsoft, on a well-organized event this year! I'm looking forward to attending next year in Orlando, FL. 

FVP Management Database Design Decisions

When deciding which database model to use for FVP, it’s important to understand what the goals are in using FVP and the growth potential for the platform. Upon installation, the FVP management service builds and connects to a “prnx” SQL database. This database is responsible for receiving, storing, and presenting performance data. All time-series data for all performance charts displayed in the FVP UI is stored in this database, in addition to management metadata as it relates to configurations. Keep in mind, however, that neither the management server nor the FVP database needs to be operational for read/write acceleration to continue during downtime. 

The PernixData management server is also responsible for managing fault domain configurations and the host peer selection process for write-back fault tolerance. This information is also kept current in the “prnx” database so that any host or cluster changes are accurately reflected in FVP policy changes. This is why it’s imperative that FVP maintain a connection with the vCenter server, so that inventory information can be collected and maintained. 

It was decided early in the FVP design phase not to reinvent the wheel and instead take advantage of already robust operations in SQL Server. One of these decisions was to put SQL rollup jobs into practice for FVP. The SQL rollup job is responsible for keeping only the current, valuable data while providing an average for historical reference. Instituting the SQL rollup process lowers the latency and overhead of FVP having to implement the averaging operations itself. This means data stored in SQL is never moved or massaged outside the context of SQL, which provides security and performance benefits to FVP as an acceleration platform. 

Since part of the SQL Server responsibility is to store FVP performance data, it’s important to store only as much data as is relevant and useful. Currently the FVP management server only requests 20-second performance samples for all FVP-clustered VMs on each enabled host. This is run using multiple threads so that multiple CPU cores can be utilized for efficiency. During a 24-hour period a large amount of data can accumulate, so FVP has a purging schedule that runs every hour to purge all 20-second samples older than 24 hours. This only happens after a SQL rollup has completed, averaging the 20-second samples into each minute and hour time period. 

Every minute there are 3 samples (20 seconds each) that are averaged. At the 1-hour mark, a SQL rollup job runs, and at completion FVP will purge all 20-second samples older than 24 hours. To view the 20-second samples before the rollup, look at the performance statistics that are 1 hour or less in the FVP performance UI. After the 1-hour interval, the 20-second samples are discarded after the first SQL rollup and then permanently removed by the purging operation 24 hours later. 
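To put the sampling, rollup, and purge cadence together in one place, here is a small, hypothetical Python sketch of the retention schedule described above; the constant and function names are mine, not PernixData's.

from datetime import timedelta

SAMPLE_INTERVAL = timedelta(seconds=20)   # one raw row per VM every 20 seconds
ROLLUP_INTERVAL = timedelta(hours=1)      # SQL rollup job runs hourly
RAW_RETENTION = timedelta(hours=24)       # raw samples are purged after 24 hours

def rollup_minute(samples):
    # Average the three 20-second samples that fall within one minute.
    return sum(samples) / len(samples)

def purge_raw(samples, now):
    # Hourly purge (run only after the rollup has completed): drop any
    # 20-second sample older than 24 hours.
    cutoff = now - RAW_RETENTION
    return {ts: value for ts, value in samples.items() if ts >= cutoff}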

In order to determine proper SQL capacity for this amount of data, you need to know how many VMs you plan to accelerate with FVP and what the potential is for continued expansion. Currently over 80% of the “prnx” database is used to store performance-related metrics, and this 80% also makes up the majority of data churn within the platform. This means calculating for the 80% will provide ample room for FVP’s operations. 

The PernixData management server will insert 1 row (record) into the DB table every 20 seconds for each VM. Each VM can be approximated to store ~1.6 KB of data every 20 seconds. This figure also takes into account the index size for each VM that is referenced. 
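Spelling that out per day (my own arithmetic, based only on the ~1.6 KB figure above):

KB_PER_SAMPLE = 1.6            # approximate row + index size per VM per sample
SAMPLES_PER_DAY = 3 * 60 * 24  # a 20-second interval means 3 samples per minute

raw_mb_per_vm_per_day = KB_PER_SAMPLE * SAMPLES_PER_DAY / 1024
print(f"~{raw_mb_per_vm_per_day:.1f} MB of raw 20-second samples per VM per day")
# ~6.8 MB per VM per day, most of which is trimmed back by the rollup and purge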


If you are considering SQL Express with its 10 GB limitation, knowing the effective data added each day becomes an important piece of information. This design decision could hamper long-term storage or the acceleration of a large number of VMs. Whether SQL Express is chosen or not, it’s a best practice to either choose the “Simple” recovery model or have regularly scheduled SQL backups so that log truncation can help limit the continued growth of the SQL log. 

Knowing the approximate data added to the DB each day for a given number of VMs will tell you when you would reach the 10 GB capacity of SQL Express. If, for example, you have 100 VMs accelerated with FVP, it will take about 400 days, but for 1,000 VMs the limitation will be reached in as little as 40 days! 
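As a rough sanity check on those figures, here is a small back-of-the-envelope calculation in Python; the ~0.25 MB of retained data per VM per day is simply what the 100-VM/400-day figure above implies, not an official PernixData constant.

SQL_EXPRESS_LIMIT_MB = 10 * 1024       # SQL Express database cap (10 GB)
RETAINED_MB_PER_VM_PER_DAY = 0.25      # implied by ~400 days for 100 VMs

def days_until_full(vm_count):
    return SQL_EXPRESS_LIMIT_MB / (vm_count * RETAINED_MB_PER_VM_PER_DAY)

print(days_until_full(100))    # ~410 days for 100 accelerated VMs
print(days_until_full(1000))   # ~41 days for 1,000 accelerated VMs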

To understand how the UI displays the averages based on the samples and purging process, below is a chart that illustrates the number of samples taken and the average based on the time displayed. Keep in mind that whether you choose a custom time range or use the predefined time ranges in the FVP UI, both result in the same samples and averages as indicated in the chart below. 

As you can see, it’s important not only to understand the metrics that you are referencing but also to design appropriately for database sizing and retention, taking into account PernixData FVP’s growth within your virtual environment. 

Microsoft's V-Tax

Yes, that's right: after Microsoft called VMware's licensing a V-Tax, Microsoft has decided to follow VMware and license Windows Server 2012 for Hyper-V in a similar way. Now both VMware and Microsoft license their hypervisors per processor. Now, I know, to be completely fair, Microsoft doesn't have a memory entitlement like VMware does, but most users will never reach the memory limitations per processor that have been outlined by VMware.

Basically, if you want to have more than 10 VMs running on Hyper-V, you will need the new Datacenter edition, which is licensed per processor and gives you unlimited VMs. If you are running under 10 VMs, then it's probably cheaper to purchase the Standard edition, which gives you 2 VMs per processor. This means most enterprise customers will need to purchase the Datacenter edition, which retails for a whopping $4,809.00, plus you need to purchase any needed CALs.

This is really smart on Microsoft's part, because if you are a current VMware customer it now costs a lot more to run VMware if you are going to use Windows Server 2012 VMs. On the other end of the spectrum, it's not really smart for Microsoft when considering their customers, because it now costs more to run Windows on vSphere.

So, in summary, if you have a dual-processor physical server and you purchase VMware vSphere Enterprise Plus at $4,543.50 x 2, and then want to have Windows Server 2012 with at least two VMs, you will need to fork out $4,809.00 x 2 for the Datacenter edition to be properly licensed.

Total Retail cost: $27,720.00
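For anyone who wants to plug in their own numbers, here is a tiny, hypothetical Python sketch of the per-processor license math from the paragraph above. It covers only the two license line items quoted here; CALs, Software Assurance, and support subscriptions come on top of this.

SOCKETS = 2
VSPHERE_ENT_PLUS = 4543.50     # VMware vSphere Enterprise Plus, per processor
WS2012_DATACENTER = 4809.00    # Windows Server 2012 Datacenter, per processor

licenses_only = SOCKETS * (VSPHERE_ENT_PLUS + WS2012_DATACENTER)
print(f"Hypervisor + Windows licenses alone: ${licenses_only:,.2f}")  # $18,705.00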

As you can see, this can get very expensive and really isn't good for Microsoft or VMware in the long run, because the customer is the one caught in the middle of this battle.

What say you...???

Download Microsoft's Server 2012 License Datasheet