The official PASS Blog is where you’ll find the latest blog posts from PASS community members and the PASS Board. Contributors share their thoughts and discuss a wide variety of topics spanning PASS and the data community.

Load Streaming Data into Azure Synapse with Azure Stream Analytics

Once streaming data from devices, sensors, applications, etc. has been loaded into an Azure Event Hub or IoT Hub, you are ready to configure the Azure Synapse database and the Stream Analytics job that will load it.  This part of the series walks you through the setup and configuration of the Azure Synapse and Stream Analytics services.  We will run the streaming job to pull data from an Event Hub and load it into Synapse, confirming the speed and efficiency of the near real-time data load.

We will be working with weather telemetry data loaded into an Event Hub.  The data consists of temperature, wind chill, visibility, and wind speed readings.  Understanding the data tells us what the schema should be for the Synapse table we will load.
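For illustration, an individual telemetry event might look something like the following. The field names and values here are hypothetical, chosen only to match the readings described above, and are not the exact payload:

```json
{
  "EventTime": "2020-06-15T14:32:07Z",
  "DeviceId": "weather-sensor-042",
  "TemperatureF": 71.6,
  "WindChillF": 68.2,
  "VisibilityMi": 10.0,
  "WindSpeedMph": 12.5
}
```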

First, we must create the Azure Resource for Synapse Analytics from the Azure Portal:

portal.azure.com

Here I created the SQL Pool (data warehouse).  An Azure Synapse workspace resource (currently in public preview) is also available.  The workspace allows SQL Pools and other services to be created and integrated together.  When I built this PoC, Azure Synapse streaming jobs and workspaces were in private preview.

Once we select Create, we configure the Azure Synapse Analytics resource as follows.  Feel free to name the data warehouse accordingly; I am using USAccidentsDW because this data is associated with weather conditions at the time of accidents.  Be careful when selecting the performance level, as Azure Synapse can get very costly at higher tiers.  It is always best to start at lower DWUs, monitor performance, and then determine whether you need to scale.

From here, we can use SQL Server Management Studio to connect to the Synapse SQL Pool (data warehouse).  If you created an Azure Synapse Workspace and a SQL Pool, you can use Azure Synapse Studio to work with the data, develop code, and manage and monitor the services.  For demo purposes, I just used the default table construct to create my schema.  When implementing a production data warehouse, your table configurations may differ based on size, functionality, and requirements in order to achieve proper optimization and performance.
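As a sketch, a table for the telemetry described above could look like the following T-SQL. The column names and types are assumptions based on the fields mentioned earlier; round-robin distribution and a clustered columnstore index are the Synapse defaults, which is roughly what the default table construct gives you:

```sql
-- Hypothetical schema for the weather telemetry; adjust names and types to your payload.
CREATE TABLE dbo.WeatherTelemetry
(
    EventTime     DATETIME2     NOT NULL,
    DeviceId      NVARCHAR(50)  NOT NULL,
    TemperatureF  DECIMAL(5, 2) NULL,
    WindChillF    DECIMAL(5, 2) NULL,
    VisibilityMi  DECIMAL(5, 2) NULL,
    WindSpeedMph  DECIMAL(5, 2) NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,   -- Synapse default; fine for a PoC
    CLUSTERED COLUMNSTORE INDEX   -- default storage option, well suited to analytics
);
```

For a production load, you would revisit the distribution (for example, HASH on a join key) and indexing choices based on data volume and query patterns.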

Once you have the Synapse SQL Pool and schema created and configured, we are ready to create the Stream Analytics job that reads from the Event Hub.  This can be done in many ways, with different interfaces and options; we will create it from the SQL Pool.

Once we create the Stream Analytics job, we configure the resource as follows.  There are three sections to the configuration.  The Basics section requires a unique name with no spaces and a storage account to stage the data.

From the Input section, you can select an Event Hub or an IoT Hub.  We will use an Event Hub, since that is where we are sending data for this PoC.  I will cover setting up and sending data to Event Hubs in another post in my Streaming Analytics series.

In the Output section, you will configure the connection to the SQL Pool (data warehouse).  Although we created our table earlier, right after provisioning the Synapse SQL Pool, you can also create the table here.

Now that we have all of the necessary services configured, we have to start the job and push data to the Event Hub so the Stream Job can pick it up.

Let’s look at the message monitor of the Event Hub to see the data loaded. 

We then have the option to query the Event Hub directly from the streaming job or the SQL Pool.  Below, we query the streaming data in the Event Hub.  We can also perform aggregations or other manipulations on the data here before loading it into the SQL Pool.
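A minimal pass-through job query might look like the following. The input and output aliases in brackets are placeholders for whatever you named them in the job, and the column list assumes the hypothetical schema discussed earlier:

```sql
-- Pass-through: read each event from the Event Hub input and write it to the SQL Pool output.
SELECT
    EventTime,
    DeviceId,
    TemperatureF,
    WindChillF,
    VisibilityMi,
    WindSpeedMph
INTO
    [synapse-output]
FROM
    [eventhub-input] TIMESTAMP BY EventTime;

-- Alternative sketch: aggregate before loading, e.g. a per-device average
-- over one-minute tumbling windows.
-- SELECT
--     DeviceId,
--     AVG(TemperatureF) AS AvgTemperatureF,
--     System.Timestamp() AS WindowEnd
-- INTO [synapse-output]
-- FROM [eventhub-input] TIMESTAMP BY EventTime
-- GROUP BY DeviceId, TumblingWindow(minute, 1);
```

A Stream Analytics job runs a single query, so you would pick one of these shapes, not both.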

Finally, let’s query the SQL Pool from SSMS.  If you configured a Synapse Workspace, you can query this from Synapse Studio.
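A quick sanity check from SSMS might look like the following, again assuming the hypothetical table name used earlier:

```sql
-- Confirm rows are arriving in the SQL Pool.
SELECT COUNT(*) AS LoadedRows
FROM dbo.WeatherTelemetry;

-- Inspect the most recently loaded events.
SELECT TOP (10) *
FROM dbo.WeatherTelemetry
ORDER BY EventTime DESC;
```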

As you can see, our streaming job has read the Event Hub and loaded these records within 15 to 20 seconds.  Azure Stream Analytics is a great option for loading streaming data into Azure Synapse in near real time.  You can now store and use this data for reporting and advanced analytics such as machine learning.

Jeremy Frye
About the author

Jeremy Frye is the manager of the Business Intelligence and Data Warehousing team at RDX. He is a Business Intelligence architect and developer with a background in database administration. Jeremy has over 9 years of experience in SQL Server. He has spoken on BI topics at many SQLSaturdays throughout the country, in webinars, and at local and remote SQL user groups. In his spare time, Jeremy often tries to bridge the gap between keyboard strokes and workout reps.

