Performance tuning is often considered a complex skill to master. However, it is not as difficult as it is often perceived to be. In this blog post, we will discuss one of the most infamous operators – the Lazy Spool – and how to optimize a query that contains it.

The Common Practice

One of the most common practices in tuning queries is to enable the actual (or estimated) execution plan and optimize whichever operator has the highest cost. You can enable the execution plan for any query by pressing CTRL + M (keyboard shortcut) or by going to Toolbar >> Query >> Include Actual Execution Plan in SQL Server Management Studio (SSMS).
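If you prefer scripting over the SSMS toolbar, the actual execution plan can also be captured with T-SQL. A minimal sketch (the SELECT here is just a placeholder query):

```sql
-- Return the actual execution plan as XML alongside the query results.
SET STATISTICS XML ON;
GO

SELECT TOP (10) * FROM [Sales].[Invoices];
GO

SET STATISTICS XML OFF;
GO
```

Each statement executed while STATISTICS XML is ON produces an extra result set containing the plan, which SSMS renders as a clickable graphical plan.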

Once the query renders its execution plan, users visually inspect the plan, find the operator that carries the highest cost, and try to optimize that operator. This method is very common among SQL Server performance tuning experts, and it usually works well in most scenarios.

However, if you come across the lazy spool operator in your query, this common practice will not be as effective.

Lazy Spool

Many people confuse the Lazy Spool with the Eager Spool operator, but they differ in an important way: an Eager Spool reads and stores its entire input at once, while a Lazy Spool stores rows only as they are requested. Let us first understand what the Lazy Spool operator is.

The spool operators in a query plan indicate that the query needs to store data for reuse during execution. To store this data for the lifetime of the query, SQL Server uses a worktable, which is essentially a temporary table created in TempDB. This creates additional read/write overhead on disk, and heavy use of TempDB can become very expensive and often leads to poor query performance.
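You can observe this TempDB overhead yourself: spool worktables are counted as "internal objects" in the session space usage DMV. A quick sketch, run from the same session immediately after the query:

```sql
-- Worktables are "internal objects" in TempDB. This DMV shows how many
-- TempDB pages the current session has allocated and deallocated for them.
SELECT session_id,
       internal_objects_alloc_page_count,
       internal_objects_dealloc_page_count
FROM sys.dm_db_session_space_usage
WHERE session_id = @@SPID;
```

A query that spools heavily will show a correspondingly large internal object page allocation count.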

The Lazy Spool is a non-blocking operator, but when a large amount of data must be processed, it can become very expensive. It does not consume its entire input up front; instead, it loads rows into its worktable "on request", as its parent operator asks for them. Hence it is called a "Lazy" Spool.
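Before you can tune a Lazy Spool, you have to find the queries that contain one. A rough way to hunt for them across the plan cache is to search the cached plan XML as text (a sketch; note that scanning plan XML this way can be slow on a busy server, so run it with care):

```sql
-- Find cached plans whose XML mentions a Lazy Spool,
-- ordered by total CPU time consumed.
SELECT TOP (20)
       qs.total_worker_time,
       qt.text AS query_text,
       qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%Lazy Spool%'
ORDER BY qs.total_worker_time DESC;
```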

Real World Example

One of my recent customers had a query that had been running for over an hour, and after careful research, we figured out that the query had more than one Lazy Spool operator, which were causing the poor performance.

I have recreated a similar query using a sample database, WideWorldImporters. Let’s take a look at the query and its execution plan along with the Statistics IO.

SET STATISTICS IO ON
GO

USE WideWorldImporters
GO

-- Sample Query
SELECT [InvoiceID], [OrderID]
FROM [Sales].[Invoices] o
WHERE [TotalDryItems] = (SELECT AVG([TotalDryItems])
                         FROM [Sales].[Invoices] o1
                         WHERE o.[CustomerID] = o1.[CustomerID]
                         GROUP BY [CustomerID])
GO

When you run the above query, it will give you the following execution plan.

If you look at the execution plan carefully, you will find three different Lazy Spool operators, each of which shows a cost of 0% relative to the entire execution plan.

Upon looking at the Messages tab, you can see that the query is using a worktable (a worktable is a temporary table in TempDB) and spooling quite a lot of data into TempDB.

(17214 rows affected)
Table 'Invoices'. Scan count 9, logical reads 11994, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 24, logical reads 143680, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Lazy Spool Optimization

If you follow the common practice for tuning queries, you will focus on the most expensive operator, which here is the Clustered Index Scan. However, as you have read in this blog post, the real culprit in our case is the Lazy Spool operator – and there are three instances of it in the plan.

To tune this query, we need to make sure that it does not have to spool data into TempDB, whether lazily or eagerly. There are many different ways to improve a query that contains a Lazy Spool, but I prefer to fix it with the help of indexes.
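Before we turn to indexes, it is worth showing one such alternative: rewriting the correlated subquery with a window function, so each customer's average is computed in a single pass over the table. This is a sketch I have not benchmarked here, but this shape of rewrite commonly removes the spool as well:

```sql
-- Compute each customer's average once with a window function,
-- then filter on it, instead of re-evaluating a correlated subquery.
SELECT [InvoiceID], [OrderID]
FROM (SELECT [InvoiceID], [OrderID], [TotalDryItems],
             AVG([TotalDryItems]) OVER (PARTITION BY [CustomerID]) AS AvgDryItems
      FROM [Sales].[Invoices]) AS x
WHERE [TotalDryItems] = AvgDryItems;
```

Note that AVG over an integer column performs integer division here, just as it does in the original correlated subquery, so the two forms return the same rows.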

Please note that when you create a new index, you need to be careful not to create too many indexes on your table. Every additional index must be maintained, which can eventually hurt the performance of your insert/update/delete operations.

In our query, there are two filtering conditions: one on CustomerID and another on TotalDryItems. Let us create a new index with both of these as key columns, keep the remaining columns in the INCLUDE clause, and see how the query behaves.

As we have two candidate key columns, we are not sure which to place first in the index. Hence we will run the following command to see which column has more distinct values.

SELECT COUNT(DISTINCT [TotalDryItems]) AS [CountTotalDryItems],
       COUNT(DISTINCT [CustomerID]) AS [CountCustomerID]
FROM [Sales].[Invoices]
GO

The above query returns the following results:

CountTotalDryItems CountCustomerID
------------------ ---------------
6                  663

As the CustomerID column has more distinct values (essentially, higher cardinality), we will make it the first key column in our new index.

CREATE NONCLUSTERED INDEX [IX_FirstTry]
ON [Sales].[Invoices] ([CustomerID] ASC, [TotalDryItems] ASC)
INCLUDE ([InvoiceID], [OrderID])
GO


Now that we have created the new index, let us run our query again and see how it performs.

First we’ll examine the execution plan. 


It is clear from the execution plan that although an index scan is still happening, the query now uses our nonclustered index, which is narrower than the clustered index, instead of the clustered index. More importantly, there are no Lazy Spool operators at all.

Now let us go and inspect the IO statistics.

Table 'Invoices'. Scan count 2, logical reads 396, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


From the statistics, you can see that there is now no worktable, so the query no longer has to touch TempDB. Additionally, the logical reads on the Invoices table have dropped from 11,994 to just 396. The overall IO performance of this query has improved many times over.
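If you were only experimenting with this index on a test system and do not want to keep it, remember to remove it so it does not add write overhead to the table. A quick cleanup sketch:

```sql
-- Remove the experimental index created above.
DROP INDEX [IX_FirstTry] ON [Sales].[Invoices];
GO
```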


Final Note

In this blog post, we have seen that, sometimes, improving query performance requires thinking outside the box. If we had followed the common practice of optimizing the most expensive operator, we might have taken a much longer route to tune the query. In this example, we fixed an operator that showed almost no cost in the plan yet produced a large number of IO reads, and overall performance improved.

Just like this example, I plan to discuss many other interesting cases in my pre-con session, 21 Essential Scripts: Jump-Start Performance Tuning for Accidental DBAs, and in a regular session at PASS Summit 2018.

If you are facing any issues with SQL Server Performance, you can always reach out to me at pinal@sqlauthority.com.  I am always willing to help resolve your SQL Server performance mysteries.