I'm pleased to introduce you to Bala Narasimhan, VP of Products at PernixData. He has a wealth of knowledge around databases, and has authored 2 patents for memory management in relational databases. It's my pleasure to have him featured in today's post. He is officially my first guest blogger! Enjoy!
Databases are a critical application for the enterprise and usually have demanding storage performance requirements. In this blog post I will describe how to understand the storage performance requirements of a database at the query level using database tools. I’ll then explain why PernixData FVP helps not only to solve the database storage performance problem but also the database manageability problem that manifests itself when storage performance becomes a bottleneck. Throughout the discussion I will use SQL Server as an example database although the principles apply across the board.
Query Execution Plans
When writing code in a language such as C++ one describes the algorithm one wants to execute. For example, implementing a sorting algorithm in C++ means describing the control flow involved in that particular implementation of sorting. This will be different in a bubble sort implementation versus a merge sort implementation and the onus is on the programmer to implement the control flow for each sort algorithm correctly.
In contrast, SQL is a declarative language. SQL statements simply describe what the end user wants to do. The control flow is something the database decides. For example, when joining two tables the database decides whether to execute a hash join, a merge join or a nested loop join. The user doesn’t decide this. The user simply executes a SQL statement that performs a join of two tables without any mention of the actual join algorithm to use.
The component within the database that comes up with the plan on how to execute the SQL statement is usually called the query optimizer. The query optimizer searches the entire space of possible execution plans for a given SQL statement and tries to pick the optimal one. As you can imagine this problem of picking the most optimal plan out of all possible plans can be computationally intensive.
SQL’s declarative nature can be sub-optimal for query performance because the query optimizer might not always pick the best possible query plan. This is usually because it doesn’t have full information regarding a number of critical components such as the kind of infrastructure in place, the load on the system when the SQL statement is run or the properties of the data. . One example of where this can manifest is called Join Ordering. Suppose you run a SQL query that joins three tables T1, T2, and T3. What order will you join these tables in? Will you join T1 and T2 first or will you join T1 and T3 first? Maybe you should join T2 and T3 first instead. Picking the wrong order can be hugely detrimental for query performance. This means that database users and DBAs usually end up tuning databases extensively. In turn this adds both an operational and a cost overhead.
Query Optimization in Action
Let’s take a concrete example to better understand query optimization. Below is a SQL statement from a TPC-H like benchmark.
select top 20 c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= ':1' and o_orderdate < dateadd(mm,3,cast(':1'as datetime)) and l_returnflag = 'R' and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue;
The SQL statement finds the top 20 customers, in terms of their effect on lost revenue for a given quarter, who have returned parts they bought.
Before you run this query against your database you can find out what query plan the optimizer is going to choose and how much it is going to cost you. Figure 1 depicts the query plan for this SQL statement from SQL Server 2014 [You can learn how to generate a query plan for any SQL statement on SQL Server at https://msdn.microsoft.com/en-us/library/ms191194.aspx]