The official PASS Blog is where you’ll find the latest blog posts from PASS community members and the PASS Board. Contributors share their thoughts and discuss a wide variety of topics spanning PASS and the data community.

What Are the Different Types of User Defined Functions in SQL Server?

A few years ago, I did quite a few query tuning engagements. Many of my clients were small software companies who typically did not have a SQL Server DBA on staff. I looked for several anti-patterns, or what I like to call “red flags,” to help find ways that I could help improve performance. One common pattern I saw was the use of user defined functions (UDF).

Just the mention of UDFs in the SQL Server community makes folks cringe. UDFs can be “bad things,” and, to the optimizer, they are often black boxes. It’s also not easy to tell when they are bad by looking at the typical tools used for query tuning like execution plans. In fact, the bad UDFs hide what is really going on from the plan.

There are three types of UDFs: Scalar, Inline Table-Valued (ITVF), and Multi-Statement Table-Valued (MSTVF). When you right-click on the Function node in SQL Server Management Studio, you will see the three types that can be created.

Only the ITVFs are innocent, at least as far as telling you the truth in the plan. I’ll begin by explaining it.

Inline Table-Valued UDFs

You can think of ITVFs as a parameterized view. It returns the data in a tabular format. Like any function, you can pass in parameters, and these can be hardcoded values, variables, expressions, or columns from your query. Say that you want to create a function that returns the number of units sold for a product within a set of dates. In this case, it’s written as a function, but nothing is allowed inside the function except for the SELECT statement. Note that examples in this article are from the AdventureWorks2017 database. Here is the first example. In this case, the function accepts two dates as arguments.

You can call the function in a SELECT statement like this:

The first ten rows show that the function returns the list of products and the number sold.

 

Typically, this will be used to join to other tables like this:

Here is a sample of the results:

The execution plan shows that the database engine treated the function as if it were a typical join to the two tables.

Notice that most of the effort to run the query comes from the code running in the UDF.

You can also create an ITVF that takes a value from the outer query like this:

When joining to a table-valued function that takes a parameter from the outer query, you must use the CROSS APPLY or OUTER APPLY syntax instead of the ON clause:

Since the query in the function returns a NULL row when there is not a match, the results look like this after scrolling down:

When using an ITVF, the optimizer treats the query in the UDF as if it was part of the query in the normal way. It’s not treated as a black box. Now, let’s look at what happens when you add more statements to the function.

Multi-Statement Table-Valued Functions

You have quite a bit of leeway when creating functions as long as they do not affect anything outside of their scope. That means they cannot call stored procedures, create objects, or perform DDL statements outside of populating table variables. You can, however, call other functions, set up conditional logic and even loops. Even though these functions return tabular data, they are black boxes to the optimizer and will look “innocent” in the execution plan. Here’s an example:

This function first checks to see if any rows exist for the product and dates. If not, then the table variable is populated with a zero. If there are rows for the product, then the sum is inserted. This seems like a silly thing to do, but, trust me, I have seen this and worse in the real world.

Besides replacing NULL with 0, you can see another difference in the execution plan.

Instead of seeing that the function queries two tables (twice!), it just shows up as a low-cost operator. Recall that the function took up most of the effort in the ITVF. In this case, it is even worse since the tables are queried twice.

Another thing to watch out for is the estimated number of rows. When you look at the operator’s properties, you’ll see that the estimate is 100.

This will be the case even if there are millions of rows returned by the function! Obviously, this can cause some problems when the optimizer is trying to come up with a good plan.

Microsoft has a new feature in SQL Server 2017 that helps in some cases. As long as the function can be joined in the query with INNER JOIN, in other words nothing from the outer query is used as a parameter, the table will be materialized so that the correct row count is used. This is not a panacea; you can still do some bad things with these functions.

Now, I’ll explain Scalar UDFs.

Scalar User Defined Functions

Unlike table-valued functions, scalar functions return one value. They are similar to functions used in other languages and can be used just about anywhere in the query, but most often in the SELECT list. In my opinion, they might not be too bad when they just replace a complex formula, but they can also be used to query data. Again, it’s possible to add conditional logic and looping.

Here is an example:

This function is simple enough, but the performance will be so much worse than just including the tables in the query. The worst thing I’ve seen is when the first function calls another function, which calls another function. That makes sense for procedural programming, but is a big problem for database code.

You can call the function just like many other built in functions:

Again, the execution plan doesn’t show you that the function queries the two tables, and it looks like the function took just a small effort.  

Microsoft added new functionality in 2019 to help with some scalar UDFs by inlining the function code into the SELECT list. It doesn’t work for everything though, and one exception is WHILE loops. This can make a big difference in performance for queries using scalar UDFs that return large result sets.

If you create an extended event session to capture the T-SQL statements, you’ll see that the code inside the function is called over and over.

Conclusion

While it might seem logical to add many user defined functions to T-SQL code to make queries simpler and prettier, they can be the source of performance issues. The bad thing is that it’s not easy to see the impact when using execution plans to troubleshoot. All is not lost, however, because one type, ITVFs, are like regular joins, and Microsoft continues to add functionality that can help UDFs perform better in some situations.

Kathi Kellenberger
About the author

Kathi Kellenberger is the editor of Simple Talk at Redgate and a Data Platform MVP. She has worked with SQL Server for over 20 years and has authored, co-authored, or tech edited over a dozen technical books. Kathi is the co-leader of the PASS Women in Technology Virtual Group and a volunteer at LaunchCode. When Kathi isn’t working, she enjoys spending time with family and friends, cycling, singing, and climbing the stairs of tall buildings. Be sure to check out her courses on Pluralsight.

Please login or register to post comments.

Theme picker

Back to Top
cage-aids
cage-aids
cage-aids
cage-aids