The official PASS Blog is where you’ll find the latest blog posts from PASS community members and the PASS Board. Contributors share their thoughts and discuss a wide variety of topics spanning PASS and the data community.

Getting Started With Big Data Clusters – Part 3

So far, we’ve deployed a Big Data Cluster and explored its Data Virtualization capabilities.

Now it’s time to look at how you can use your Big Data Cluster to consume file-based data from CSV or Parquet files. The Big Data Cluster component used for this is the Storage Pool.
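To give a flavor of what that looks like, here is a minimal T-SQL sketch. The table, columns, and HDFS path are hypothetical, and the SqlStoragePool location can differ between BDC releases:

```sql
-- Minimal sketch: expose a CSV file in the Storage Pool's HDFS as an external table.
-- Table name, columns, and the HDFS path are made-up examples.
IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'SqlStoragePool')
    CREATE EXTERNAL DATA SOURCE SqlStoragePool
    WITH (LOCATION = 'sqlhdfs://controller-svc/default');

CREATE EXTERNAL FILE FORMAT csv_file
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2,           -- skip the header row
        USE_TYPE_DEFAULT = TRUE));

CREATE EXTERNAL TABLE dbo.Sales_csv (
    SaleId   INT,
    SaleDate DATE,
    Amount   DECIMAL(10, 2))
WITH (
    DATA_SOURCE = SqlStoragePool,
    LOCATION    = '/sales/2019', -- HDFS directory inside the Storage Pool
    FILE_FORMAT = csv_file);

-- Query it like any other table:
SELECT TOP (10) * FROM dbo.Sales_csv;
```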

Read more

Getting Started With Big Data Clusters – Part 2

In the previous post, we deployed a Big Data Cluster. Now it’s time to put this system to use and get some work done on it.

Let’s start with a feature called “Data Virtualization”, which is basically PolyBase 2.0. While PolyBase (unlike other BDC features) is also available for standalone SQL Server 2019 installations, it plays an essential role in the BDC architecture. It allows you to bring in data from other data sources like SQL Server, Oracle, and Teradata, as well as from SharePoint, CRM, or Cloudera, all through the concept of an external table.
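As a rough illustration of the mechanism, here is what an external table over a remote SQL Server looks like in T-SQL; the server, credential, database, and table names are made-up placeholders:

```sql
-- Sketch of a PolyBase external table pointing at another SQL Server.
-- All names and the password below are placeholders.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE DATABASE SCOPED CREDENTIAL RemoteSqlCred
WITH IDENTITY = 'polyuser', SECRET = '<strong password>';

CREATE EXTERNAL DATA SOURCE RemoteSql
WITH (LOCATION = 'sqlserver://remotehost:1433', CREDENTIAL = RemoteSqlCred);

-- LOCATION uses the remote three-part name: database.schema.table.
CREATE EXTERNAL TABLE dbo.RemoteOrders (
    OrderId    INT,
    CustomerId INT,
    OrderDate  DATETIME2)
WITH (LOCATION = 'SalesDb.dbo.Orders', DATA_SOURCE = RemoteSql);

-- Queries against the external table are pushed to the remote server.
SELECT COUNT(*) FROM dbo.RemoteOrders;
```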

Read more

Getting Started With Big Data Clusters – Part 1

So, you’ve heard about Big Data Clusters in SQL Server 2019 and want to get some hands-on experience? Let’s get you started today! This article will guide you step-by-step on how to deploy a Big Data Cluster in Azure Kubernetes Service (AKS) from a Windows client. The idea is to get your Big Data Cluster up and running fast without going into detail on every step involved, so don’t expect a deep dive on every component along the way.
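In broad strokes, the deployment boils down to a handful of CLI calls. This is only a sketch, assuming the Azure CLI and azdata are installed; all resource names below are placeholders:

```bash
# Sketch of the deployment flow; resource names are placeholders.
# 1. Create an AKS cluster sized for a BDC dev/test deployment.
az aks create --resource-group bdc-rg --name bdc-aks \
  --node-count 3 --node-vm-size Standard_D8s_v3 --generate-ssh-keys
az aks get-credentials --resource-group bdc-rg --name bdc-aks

# 2. Set the admin credentials the deployment will pick up
#    (on a Windows client, set $env:AZDATA_USERNAME etc. in PowerShell).
export AZDATA_USERNAME=bdcadmin
export AZDATA_PASSWORD='<strong password>'

# 3. Deploy the Big Data Cluster with the built-in AKS dev/test profile.
azdata bdc create --config-profile aks-dev-test --accept-eula yes

# 4. Log in and list the endpoints (SQL master instance, HDFS gateway, ...).
azdata login --namespace mssql-cluster
azdata bdc endpoint list --output table
```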

Read more

What’s in my Stage? Automating Data Comparison with Biml

When you build an ETL solution, at some point you will most likely feel the need to compare the data between your source and your staging (or data warehouse) database. There may be various reasons for them to be out of sync, such as delta loads, aggregations, or added business logic, but one day your phone will ring and whoever is on the other end will tell you that the numbers are wrong. While the discrepancy doesn’t necessarily have to be an issue within your ETL process (it might just as well be the report itself), the staging data is a good starting point to look at in many cases.
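At its core, such a comparison boils down to checks like the one below; the database, table, and column names are hypothetical, and both databases are assumed to be reachable from one instance. Biml’s job is then to generate one such check per table instead of you writing them all by hand:

```sql
-- Hypothetical check: compare row counts and a sum between source and stage.
-- Returns a row only when the two sides are out of sync.
SELECT
    src.cnt   AS source_rows,
    stg.cnt   AS stage_rows,
    src.total AS source_amount,
    stg.total AS stage_amount
FROM (SELECT COUNT(*) AS cnt, SUM(Amount) AS total
      FROM SourceDb.dbo.Sales) AS src
CROSS JOIN
     (SELECT COUNT(*) AS cnt, SUM(Amount) AS total
      FROM StageDb.dbo.Sales)  AS stg
WHERE src.cnt <> stg.cnt
   OR src.total <> stg.total;
```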

This article focuses on SSIS as your orchestrator, but the same principles could, obviously, also be applied to Azure Data Factory, for example. And as we want to solve this task in as lightweight a way as possible, we will use Biml to implement it!

Read more
