The official PASS Blog is where you’ll find the latest blog posts from PASS community members and the PASS Board. Contributors share their thoughts and discuss a wide variety of topics spanning PASS and the data community.

A Game of Hierarchies: Graph DB with SQL Server 2019

Graph DB, a feature for the SQL Server relational database engine was introduced in version 2017. With this addition Microsoft made it very easy to maintain and query graphs via enhancements to T-SQL and to allow access to the graph in combination with “normal” tables.

Before we take a look at new features available with SQL Server 2019, let’s take a look at what a graph is and how you can create a graph in SQL Server 2017.

Graph Theory 101

A graph consists of nodes and edges. Nodes are concrete entities, like “Eddard Stark”, “Winterfell”, or “Medieval Pastry Dough”, and edges connect those nodes with each other.

In my example database (which is based on George R. R. Martin’s “A Game of Thrones”) the node “Eddard Stark” is connected to the node “Robb Stark” via edge “IsFather”; the node “Winterfell” is connected with the node “King’s Landing” via edge “Route” and nodes “Medieval Pastry Dough” and “Medieval Beef and Bacon Pie” are connected via edge “Recipe” (as the dough is an ingredient for the bacon pie). There are many more nodes and you and can download theentire database from: https://AGameOfHierarchies.wordpress.com.

Graph DB with SQL 2017

Both nodes and edges are created as tables in SQL Server. You just add “AS NODE” or “AS EDGE” at the end of the “CREATE TABLE” statement:

SELECT
    LAST_VALUE(b.POIName)
        WITHIN GROUP (GRAPH PATH) AS [POIName]
    ,a.POIName + '->' + STRING_AGG(b.POIName, '->')
        WITHIN GROUP (GRAPH PATH) AS [POIPath]
    ,SUM([route].Miles)
        WITHIN GROUP (GRAPH PATH) AS [Miles]
FROM
    dbo.POI                  a,
    dbo.[Route]     FOR PATH [Route],
    dbo.POI         FOR PATH b
WHERE
    a.POIName = 'King''s Landing' and
    MATCH( SHORTEST_PATH(a(-([Route])->b)+) )
ORDER BY
    1 asc;

Both table types (nodes and edges) are getting extra columns automatically added, which are used to build up the graph. You can find more information at: https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-overview?view=sql-server-2017#create-graph-objects.

SQL Server 2017 extended the WHERE clause of a SELECT statement with MATCH. (https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-overview?view=sql-server-2017#query-language-extensions)

What’s new in SQL 2019

Shortest-Path

While in version 2017 we hade to write complex code (T-SQL loops or Common Table Expressions) to query a path within a graph, version 2019 allows us to find the shortest path with the help of the “SHORTEST_PATH” keyword in the MATCH clause. This is the single-best addition to Graph DB!

The following query lists the shortest paths from “King’s Landing” to all other reach-able nodes. To facilitate “SHORTEST_PATH” you have to add “FOR PATH” after the edge (“dbo.[Route]”) and the second node (“dbo.POI”) in the FROM clause and make use of aggregations and add “WITHIN GROUP (GRAPH PATH)” per column in the projection of SELECT.

SELECT
    LAST_VALUE(b.POIName)
        WITHIN GROUP (GRAPH PATH) AS [POIName]
    ,a.POIName + '->' + STRING_AGG(b.POIName, '->')
        WITHIN GROUP (GRAPH PATH) AS [POIPath]
    ,SUM([route].Miles)
        WITHIN GROUP (GRAPH PATH) AS [Miles]
FROM
    dbo.POI                  a,
    dbo.[Route]     FOR PATH [Route],
    dbo.POI         FOR PATH b
WHERE
    a.POIName = 'King''s Landing' and
    MATCH( SHORTEST_PATH(a(-([Route])->b)+) )
ORDER BY
    1 asc;

Edge-Constraints

All the nodes and edges defined in a database are part of the same single graph. Therefore, you can connect every node with all other nodes over every edge. However, some combinations will not make sense: Adding “Eddard Stark” as an ingredient in a recipe will most certainly be an error (as cannibalism is not part of the Game of Thrones world). To tell SQL Server what combinations are allowed we can add constraints on an edge, like on dbo.IsFather in the following example:

ALTER TABLE dbo.IsFather
ADD CONSTRAINT EC_IsFather CONNECTION (dbo.Family TO dbo.Family);

Afterwards nodes from table dbo.POI can’t be connected via edge [dbo].[IsFather] anymore:

INSERT INTO dbo.IsFather
SELECT  (SELECT $node_id FROM dbo.Family WHERE FirstName='Eddard'),
                        (SELECT $node_id FROM dbo.POI WHERE POIName='Winterfell')
Msg 547, Level 16, State 0, Line 82
The INSERT statement conflicted with the EDGE constraint "EC_IsFather". The conflict occurred in database "AGameOfHierarchies", table "dbo.IsFather".
The statement has been terminated.

Furthermore, a constraint prevents orphaned edges by preventing the deletion of connected nodes. The following statement will lead to an error as long as the node with Id 100 is used in an edge:

DELETE FROM dbo.Family WHERE Id=100;
Msg 547, Level 16, State 0, Line 73
The DELETE statement conflicted with the EDGE REFERENCE constraint "EC_IsFather". The conflict occurred in database "AGameOfHierarchies", table "dbo.IsFather".
The statement has been terminated.

Edge constraints make your graph more fool-proof. Two new system views allow you to query existing edge constraints:

SELECT
     EC.name                          AS EdgeConstraintName,
     OBJECT_NAME(EC.parent_object_id) AS EdgeTable,
     OBJECT_NAME(ECC.from_object_id)  AS FromNodeTable,
     OBJECT_NAME(ECC.to_object_id)    AS ToNodeTable,
     EC.type_desc,
     EC.create_date
FROM
     sys.edge_constraints EC
INNER JOIN
     sys.edge_constraint_clauses ECC ON
          EC.object_id = ECC.object_id;

MATCH clause on derived tables

In SQL Server 2019 a sub-query can now be referenced in MATCH. In the following code I extract all the fathers and all the mothers from the dbo.Family table (by checking if they are referenced in dbo.IsFather or dbo.IsMother) and then find the spouses they are married to via MATCH.

SELECT
            parent.FirstName, [partner].Firstname
FROM
            (
            SELECT  FirstName, 'isFather' [status] FROM dbo.Family n WHERE EXISTS (SELECT TOP 1 1 FROM dbo.IsFather e WHERE e.$from_id=n.$node_id)
            UNION ALL
            SELECT  FirstName, 'isMother' [status] FROM dbo.Family n WHERE EXISTS (SELECT TOP 1 1 FROM dbo.IsMother e WHERE e.$from_id=n.$node_id)    
            ) parent,
            dbo.isMarried,
            dbo.Family [partner]
WHERE
            MATCH(parent-(isMarried)->[partner]);

While sub-queries are allowed, we still need a workaround for common-table expressions (CTE).

MATCH clause in MERGE

You can now MERGE your current graph data with new data utilizing MATCH predicate. In the following, I add new information about who bought what in table edges.bought by either updating the attribute PurchasedCount or inserting a new edge (for the customer and product combination).

MERGE edges.bought
    USING ((SELECT @CustomerID, @ProductID) AS T (CustomerID, ProductID)
            JOIN Nodes.Customers ON T.CustomerID = Customers.CustomerID
            JOIN Nodes.StockItems ON T.ProductID = StockItems.StockItemID)
    ON MATCH (Customers-(bought)->StockItems)
WHEN MATCHED THEN
    UPDATE SET PurchasedCount = PurchasedCount + @purchased_count
WHEN NOT MATCHED THEN
    INSERT ($from_id, $to_id, PurchasedCount)
    VALUES (Customers.$node_id, StockItems.$node_id, @purchased_count);

If you want to learn more about SQL Server Graph DB, check out my session at the PASS Summit 2019 (which will be available as a recording a few weeks later).

Don’t wait and download SQL Server 2019 from https://www.microsoft.com/en-us/sql-server/sql-server-2019  today to make use of the brand new features!

Markus Ehrenmueller-Jensen
About the author

Markus Ehrenmueller-Jensen, the founder of Savory Data, has a long history of providing customer solutions in the areas of data engineering, data science, and Business Intelligence. He is a certified software engineer, a graduated business educator, a professor of Databases & Project Management at HTL Leonding, and is certified as an MCSE & MCT. He is a published author and writes articles for well-known journals. He co-founded PASS Austria and organizes SQLSaturdays in Austria. Since 2017 Markus was awarded as a Microsoft Data Platform MVP.

Please login or register to post comments.

Back to Top
cage-aids
cage-aids
cage-aids
cage-aids