The official PASS Blog is where you’ll find the latest blog posts from PASS community members and the PASS Board. Contributors share their thoughts and discuss a wide variety of topics spanning PASS and the data community.

How To Start Troubleshooting SQL Server Performance

As a Solutions Engineer for SentryOne, I get to meet SQL Server data professionals daily and talk to them about the challenges that they face. One central theme among all of these conversations is performance troubleshooting. I’d like to take this opportunity to share some thoughts on how to approach a SQL Server performance issue.

When I was formulating this article, I wondered “what do data professionals do FIRST” when faced with a report of an immediate production issue? I took to Twitter and ran this informal poll:

Caption: https://twitter.com/SQLBek/status/1164554586885185537

In addition to the poll results, I received several dozen “Other” answers, most of which involved custom DIY scripts. I appreciated all of the fantastic responses and it helped to anecdotally reinforce the popularity of Adam Machanic’s sp_whoisactive (http://whoisactive.com/).

Okay – Now What?

While answering the question of “what does one do first” is easy, things get much more difficult thereafter. What if the results of your first inquiry do not reveal an obvious root cause? Will your initial findings definitively point you in the right direction? What if your initial findings leave you asking more questions than you started with? Or even worse, what if they lead you towards a false positive and dead end? That’s wasted time and, when Production is suffering, time is money.

In my conversations with other data professionals, I find that many will resort to “hunt & peck” if their initial findings do not yield a definitive culprit - and I can relate to that. I remember when I was a junior DBA, I knew how to gather a wide variety of metrics but had not yet learned a robust process.

Nowadays, I have a preferred approach which I advocated for in my 2018 PASS Summit lightning talk called, “SQL Server Performance Troubleshooting: Bottom to Top… or Top to Bottom?”. In that talk, I present a sample scenario where CPU is running hot, and I first use sp_whoisactive to discover what is running.

Caption: sp_whoisactive – Note two SPIDs that have run for +47 minutes with high CPU usage.

The results show a few long running queries and many other SPIDs with fast running queries. Many will immediately focus on those two SPIDs and spend time dissecting them further, but in this case, the two long running queries were not the source of the high CPU usage. This is a red herring.

Caption: PerfMon - Note an average of 790 Batch Requests/sec.

Switching gears, I next take a top-down approach, evaluating overall server resource utilization. Use of PerfMon helps to identify a critical clue – very high Batch Requests/sec. In the end, we uncover that the initially ignored fast running queries are the true root cause of the CPU pressure. My sample workload was driving hundreds of fast running queries. While each individual query was fast and consumed only a small amount of CPU, the huge volume of those fast queries is what caused high CPU.  

Moral of the Story?

There are two takeaways from this example scenario. First, many will only focus on single, resource intensive queries, but I argue that you need to assess your workload as a whole. You may have a high volume, fast query, workload that also drives tremendous resource overhead.

Second, it's great to have a set of tools like sp_whoisactive, perfmon, DMV queries, etc., but do you have a process to properly bring it all together?  Which do you go to first, second, then third? Imagine a flow chart; if you see X, do you know immediately to go to Y then Z? Or if Not X, then proceed A then B? For example, when I am teaching people about SentryOne, I don't simply show how to use our monitoring solution. I teach a workflow, that coincides with my preferred holistic, top-down approach.

Call to Action

Take a few minutes to stop and consider what your processes are today. Consider not only what tools are at your disposal, but what tools you may be missing as well? If you work with a team, is everyone proficient with all available tools? Do you all follow the same steps or does everyone have their own workflow? Can they all be merged together to create a more robust troubleshooting workflow?

Those are some key questions that you should ask yourself and your colleagues. Next, if you're part of a team, meet, get feedback, then codify it.

Your preparation today will empower you to more rapidly and effectively solve problems you encounter tomorrow.

Andy Yun
About the author

Andy Yun is a SentryOne Senior Solutions Engineer and a Microsoft MVP. He has been working with SQL Server for over 15 years as both a Database Developer and Administrator. Leveraging knowledge of SQL Server Internals and extensive experience in highly transactional environments, he strives to make T-SQL leaner and meaner. Andy is extremely passionate about passing knowledge onto others, regularly speaking at User Groups, SQL Saturdays, and PASS Summit. Andy is a co-founder of the Chicago SQL Association, co-Chapter Leader of the Chicago Suburban User Group, and part of the Chicago SQL Saturday Organizing Committee.

Please login or register to post comments.

Back to Top
cage-aids
cage-aids
cage-aids
cage-aids