The official PASS Blog is where you’ll find the latest blog posts from PASS community members and the PASS Board. Contributors share their thoughts and discuss a wide variety of topics spanning PASS and the data community.

What’s in my Stage? Automating Data Comparison with Biml

When you build an ETL Solution, at some point you will most likely feel the need to compare the data between your source and your staging (or datawarehouse) database. There may be various reasons for them to be out of sync, such as delta loads, aggregations or added business logic, but one day your phone will ring and whoever is on the other end will tell you that the numbers are wrong. While this doesn’t necessarily have to be an issue within your ETL process, it might just as well be the report itself. In many cases, this is a good starting point to look at.

This article focusses on SSIS as your orchestrator, but the same principles could, obviously, also be applied to Azure Data Factory for example. Also, as we want to solve this task as lightweight as possible, we will, demonstrably, use Biml to implement it!

So what is Biml?

I don’t want to get too deep into the fundamentals of Biml, but in case you’ve never heard of it: In a nutshell, Biml (Business Intelligence Markup Language) is a way to describe Microsoft BI Objects (SSIS Packages, ADF pipelines, T-SQL, SSAS Cubes etc.) through XML. In addition to static XML, it comes with an object model around those entities which can be accessed using BimlScript (using C# and VB.NET). This means, if you need to load 100 tables using a certain pattern, instead of writing 100 manual packages, you only describe the process in XML once and then feed metadata to it. Biml does the rest – automagically.

If you are working with SSIS, Azure Data Factory or just T-SQL, and regularly find yourself repeating yourself by doing the same thing over and over again, you definitely want to look into Biml! To get a full overview of the Power of Biml, The Biml Book might be the most elaborate source, but to get started, here are a few online ressources:

While this article assumes, that you’re already familiar with Biml, even if you haven’t had any Biml exposure yet, you may appreciate how we can add a lightweight logging framework to existing solutions with just one line of code.

Collecting the Data to be Compared

As mentioned previously, we’re trying to do this very lightweight so our solution will only collect numerical data resulting from aggregation functions. Whenever collecting current values, we will collect them for all configured columns and tables.

If this is not applicable, or sufficient for your environment, you may need to extend the code samples a bit.

We will need two tables: One to configure the data that we want to collect and one for the actual data we’ve collected. Just create those in a separate database (we’ll call it BimlDemo_Compare):

Let’s add some sample data to our config table. If you don’t have the AdventureWorks2017 databases yet, you can download them at https://docs.microsoft.com/en-us/sql/samples/adventureworks-install-configure.

These records assume that you have already created a Control (pointing to the database you’ve just created), AW2017_SRC (pointing to an AdventureWorks2017 database) and AW2017_DWH (pointing to an AdventureWorksDW2017 database) connection in your Biml code.

Now for the Biml part, we’ll just add a new file “CBS_GetLogContainer.biml” which can later be used with CallBimlScript:

The file will get 2 parameters: a Connection type (Source or Target), as well as a step name. It will then return a container with a dataflow for every value that should be compared, writing it’s current value as well as the name of the step, the connection type and a time stamp to the log table.

How Can I Integrate This Into My Biml Solution?

As we’re using CallBimlScript, all you need to do is call this new file, whenever you want to collect the aggregations:

A full Biml file could look like this:

How Can I Get Insights From This?

Well, that depends on your needs. If you simply want to get the latest count for each step, a simple query like this might be suifficient:

But of course, you can run any kind of report over your log table, which could also be used to illustrate how your data is growing over time.

I hope this example showed you, once again, the beauty of Biml. With just one line of code, added to your files here and there, you’re adding functionality and value that would normally be so much manual work that you’d probably never even consider it otherwise!

Any questions? Feel free to reach out, and if you’ll be a PASS Summit this year, please say hello. 😊

Ben Weissman
About the author

Ben Weissman has been working with SQL Server since SQL Server 6.5, mainly in the BI/Datawarehousing field. He is a Data Platform MVP, MCSE Data Management and Analytics, MPP Big Data, MPP AI, MPP Data Science, and MPP Data Analytics. He is also the first BimlHero Certified Expert in Germany, a co-author of The Biml Book, and a Certified Data Vault Data Modeler.

Ben has been involved in about 150 BI Projects and is always looking for ways to become more productive and make SQL Server even more fun!

Together with his team at Solisyon, Ben provides training, implementation, and consultancy for SQL/BI developers and data analysts in (upper-) mid-market companies around the globe.

Please login or register to post comments.

Back to Top
cage-aids
cage-aids
cage-aids
cage-aids