SQL Server Graph Databases – Part 4: Working with hierarchical data in a graph database

The series so far:

This first three articles in this series focused on using SQL Server graph databases to work with data sets that contained relationships not easily handled in a typical relational structure, the types of relationships you might find in a social network site or product recommendation engine. One of the topics that the articles touched upon was how to work with hierarchical data and their relationships, as they applied to the FishGraph database used in the examples.

This article digs deeper into the topic of hierarchies, particularly those that contain more complex relationships than what you saw in the examples. The article first provides an overview of how to create and populate the tables necessary to support a hierarchy and then focuses on how to query hierarchical data that contains multiple types of relationships.

Limitations of the HierarchyID Data Type

Working with hierarchical data is certainly not new in SQL Server. Since the product’s early days, you could represent a hierarchical structure by including a foreign key in a table that pointed to the primary key in the same table. In this way, you were able to represent the parent-child relationships that existed within the data set.

Since the release of SQL Server 2008, you’ve been able to use the hierarchyid data type to represent a record’s hierarchical position by defining values that reflect the relationships between a table’s rows. For example, the data type makes it possible to represent how employees are organized within a company’s hierarchical structure, such as the one shown in the following figure.

Although this is a very simple example, it demonstrates the types of relationships that typically make up a basic hierarchy:

  • Dora sits at the top of the hierarchy.
  • Fred and Nalini report to Dora.
  • Rita, Filip, and Adil report to Fred.
  • Ben, Barb, Chen, Gus, and Ane report to Nalini.
  • Mao and Miguel report to Adil.
  • Don and Joyce report to Chen.

As with the above figure, a hierarchy is often represented as an inverted tree structure in which all branches lead to a common root, in this case, Dora.

Unfortunately, not all hierarchies are this simple. For example, an employee might work part time in two positions, which means reporting to two different managers. Or an employee might report to one manager but actually work for another, that is, the employee is managed by someone different from whom the employee reports to.

A hierarchy that reflects real-world situations will likely be a lot less straightforward that the one shown in the previous figure. Even an example as simple as this one can end up being a lot more complex after adding a few exceptions, as shown in the following figure.

Although many of the basic relationships are the same, the hierarchy now includes people who report to more than one manager or who report to one manager but work for another:

  • Dora sits at the top of the hierarchy.
  • Fred and Nalini report to Dora.
  • Rita, Filip, and Adil report to Fred, but Rita works for Ben.
  • Ben, Barb, Chen, Gus, and Ane report to Nalini, but Gus and Ane work for Ben.
  • Mao and Miguel report to both Adil and Chen, working half-time in each position.
  • Don and Joyce report to Chen.

The hierarchyid data type is not equipped to handle anything that does not fit neatly into a basic structure. And even if the hierarchy does fit this model, too many levels can still be difficult to manipulate and maneuver. Graph databases, on the other hand, are made for more complex hierarchies, regardless of the number of levels or types of relationships, making it easier to accommodate the hierarchy, rather than trying to force the hierarchy into a relational structure.

Defining the Graph Tables

The previous articles in this series used the FishGraph database to demonstrate various graph concepts. This article also uses the database, but only to add tables based on the employee hierarchy shown in the above figure. The tables are unrelated to the existing FishGraph tables, so if you’re trying out these examples, you can use any database you want. That said, including the tables in the FishGraph database opens up the potential for creating relationships between the new and original tables with very little effort, should you be inclined to do so.

To implement the employee hierarchy, you need to create and populate three tables, one node table to store the list of employees and two edge tables to store the reports to and works for relationships defined between those nodes.

The first table, FishEmployees, is a node table that contains the ID and first name of each employee, along with the auto-defined $node_id column. The following T-SQL script creates and populates that table:

USE FishGraph;
GO
DROP TABLE IF EXISTS FishEmployees;
GO
CREATE TABLE FishEmployees (
  EmpID INT IDENTITY PRIMARY KEY,
  FirstName NVARCHAR(50) NOT NULL
) AS NODE;
INSERT INTO FishEmployees (FirstName) VALUES
('Fred'), ('Rita'), ('Filip'), ('Adil'), ('Dora'),
('Mao'), ('Miguel'), ('Nalini'), ('Ben'), ('Barb'),
('Chen'), ('Gus'), ('Ane'), ('Don'), ('Joyce');

We won’t spend too much time on how to create and populate graph tables because these concepts are covered extensively in the first article. The main point to keep in mind is that you must include the ASNODE clause when creating a node table and the ASEDGE clause when creating an edge table. The database engine will then add the necessary auto-defined columns.

The next table is ReportsTo, an edge table that records each reports to relationship between employees. You can create and populate the table without including any user-defined columns, as shown in the following T-SQL script:

DROP TABLE IF EXISTS ReportsTo;
GO
CREATE TABLE ReportsTo AS EDGE;
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 1), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 5));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 2), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 1));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 3), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 1));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 4), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 1));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 6), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 4));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 7), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 4));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 8), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 5));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 9), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 8));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 10), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 8));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 11), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 8));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 6), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 11));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 7), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 11));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 12), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 8));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 13), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 8));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 14), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 11));
INSERT INTO ReportsTo ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 15), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 11));

Each INSERT statement retrieves the $node_id values from the FishEmployees table in order to provide the relationship’s originating and terminating nodes (the $from_id and $to_id values, respectively). You can take a similar approach for the third table, FishEmployees, which records each works for relationship between employees:

DROP TABLE IF EXISTS WorksFor;
GO
CREATE TABLE WorksFor AS EDGE;
INSERT INTO WorksFor ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 2), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 9));
INSERT INTO WorksFor ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 12), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 9));
INSERT INTO WorksFor ($from_id, $to_id) VALUES (
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 13), 
  (SELECT $node_id FROM FishEmployees WHERE EmpID = 9));

That’s all there is to defining the employee hierarchy shown in the above figure. Although this is a simple example, it demonstrates many of the concepts associated with building a more complex hierarchy that contains numerous levels and multiple types of relationships. For a single hierarchy, you need a node table for the primary entities and an edge table for each type of relationship within that hierarchy. In this way, you can define as many types of relationships as necessary and define multiple relationships within a single type that share the same child node but different parent nodes. (If you have any questions about creating or populating graph tables, refer back to the first article in this series.)

Returning employee data

Querying graph tables that support a hierarchy is just like querying any tables in a graph database, something we covered in detail in the second article. For example, to view all the reports to relationships in the employee hierarchy, you can run the following SELECT statement:

SELECT emp1.FirstName Employee, emp2.FirstName Manager
FROM FishEmployees emp1, ReportsTo, FishEmployees emp2
WHERE MATCH(emp1-(ReportsTo)->emp2)
ORDER BY Employee, Manager;

The statement uses the MATCH function to return a list of employees and the people they report to. Because the originating and terminating nodes reside in the same table, you must specify the table twice in the FROM clause, providing a different alias for each instance. You can then use those aliases in the MATCH clause, along with the name of the edge table, to specify the relationship employee 1 reports to employee 2. The SELECT statement returns the results shown in the following figure.

You can take a similar approach to view all the works for relationships:

SELECT emp1.FirstName Employee, emp2.FirstName Manager
FROM FishEmployees emp1, WorksFor, FishEmployees emp2
WHERE MATCH(emp1-(WorksFor)->emp2)
ORDER BY Employee, Manager;

This time the query returns the results shown in the next figure, which indicate that three employees work for Ben.

So far, this is all fairly basic. The MATCH function makes it easy to find the relationships defined between nodes, whether those nodes are in the same table or in different tables. To return more specific information, you can modify the WHERE clause to include additional conditions. For example, the following SELECT statement returns the employee ID and name of the manager that Barb reports to:

SELECT emp2.EmpID MgrID, emp2.FirstName Manager
FROM FishEmployees emp1, ReportsTo, FishEmployees emp2
WHERE MATCH(emp1-(ReportsTo)->emp2)
  AND emp1.FirstName = 'barb';

When adding this type of condition to the WHERE clause, be sure to reference the correct node. In this case, you should use the emp1 table alias, rather than emp2, because you need to reference the originating node, not the terminating one. The SELECT statement returns the results shown in the following figure.

As you can see, retrieving the name of an employee’s manager is fairly straightforward. However, not all queries are this simple. For example, if you want to know which employees report to more than one manager, you must take a different approach. One solution is to create a common table expression (CTE) that retrieves the node IDs of the employees with more than one manager. You can then use the CTE as one of the conditions in the WHERE clause of the outer SELECT statement to return those relationships:

WITH rt AS
(
  SELECT $from_id FromID
  FROM ReportsTo
  GROUP BY $from_id
  HAVING COUNT(*) > 1
)
SELECT emp1.FirstName Employee, emp2.FirstName Manager
FROM FishEmployees emp1, ReportsTo, FishEmployees emp2
WHERE MATCH(emp1-(ReportsTo)->emp2)
  AND ReportsTo.$from_id IN (SELECT FromID FROM rt)
ORDER BY Employee, Manager;

The CTE retrieves the $from_id values from the ReportsTo table, using the GROUPBY and HAVING clauses to return only those IDs that include multiple instances. The WHERE clause in the outer SELECT statement then uses an IN expression to compare the $from_id values to those returned by the CTE, giving us the results shown in the following figure.

As you can see, the results show the two employees, Mao and Miguel, and their managers, Adil and Chen. Now suppose you want to view the name of employees who report to one manager, but work for another manager. In this case, you don’t need to use a CTE because the MATCH function can give you the information you need:

SELECT emp1.FirstName Employee, 
  emp2.FirstName ReportsTo, emp3.FirstName WorksFor
FROM FishEmployees emp1, ReportsTo, FishEmployees emp2, 
  WorksFor, FishEmployees emp3
WHERE MATCH(emp2<-(ReportsTo)-emp1-(WorksFor)->emp3)
ORDER BY Employee;

As you’ll recall from the second article, you can point the MATCH relationships in either direction. In this case, the function includes two relationships with both of them originating with the emp1 node. The first relationship is a reports to relationship that terminates with the emp2 node. The second relationship is a works for relationship that terminates with the emp3 node. As a result, the SELECT statement will return only those rows in which an employee both reports to an individual and works for an individual, as shown in the following results.

As expected, only three employees fit this scenario. If you instead want to view only those employees who work for Ben, you can simplify your query even further:

SELECT emp1.EmpID, emp1.FirstName Employee
FROM FishEmployees emp1, WorksFor, FishEmployees emp2
WHERE MATCH(emp1-(WorksFor)->emp2)
  AND emp2.FirstName = 'Ben';

In this case, the SELECT statement returns the ID and name of each employee who works for Ben. To get this data, you need include only a simple MATCH relationship and a second WHERE clause condition limiting the relationships to those that terminate with Ben, giving you the following results.

You can do something similar to find the employees who report to Nalini:

SELECT emp1.EmpID, emp1.FirstName Employee
FROM FishEmployees emp1, ReportsTo, FishEmployees emp2
WHERE MATCH(emp1-(ReportsTo)->emp2)
  AND emp2.FirstName = 'Nalini';

Now the SELECT statement returns the results shown in the next figure.

The flexibility of the MATCH clause, along with the ability to use CTEs when necessary, allows you to return different types of data sets from a hierarchy. It might take some trial-and-error to get it right, but eventually you should be able to retrieve the results you want. In some cases, however, the MATCH clause—with or without a CTE—will not return the data you need, and you’ll have to look to other strategies.

Bumping up against graph limitations

Because the graph features are so new to SQL Server, it’s not surprising that they have several limitations. For example, you cannot use the MATCH function on derived tables, which means you cannot use the function in the recursive member of a CTE. Graph databases also don’t support transitive closure—the ability to search recursively through graph tables beyond the first level.

To get around these limitations, you must use more traditional T-SQL. For example, you can retrieve a list of employees who report directly or indirectly to a specific manager by creating a recursive CTE, without using the MATCH function:

WITH emp AS
(
  SELECT $node_id NodeID, FirstName Employee, 
    CAST(NULL AS NVARCHAR(50)) Manager
  FROM FishEmployees
  WHERE FirstName = 'Nalini'
  UNION ALL
  SELECT fe.$node_id, fe.FirstName Employee, emp.Employee Manager
  FROM FishEmployees fe INNER JOIN ReportsTo rt
      ON fe.$node_id = rt.$from_id 
   INNER JOIN emp
      ON rt.$to_id = emp.NodeID
)
SELECT Employee, Manager FROM emp
WHERE Manager IS NOT NULL;

The CTE uses the UNIONALL operator to recurse through the employees and determine who reports to Nalini. The first SELECT statement establishes the form of the result set, based on the Nalini record, and provides the starting point for recursing through the data. The Manager column acts as a placeholder for the corresponding column returned by the second SELECT statement, which joins the FishEmployees and ReportsTo tables to the emp CTE to provide the recursive element. The outer SELECT statement then retrieves the data from the CTE, filtering out the Nalini record, as shown in the following results.

As you can see, the first five employees report directly to Nalini, and the remaining four employees report directly to Chen, who is one of the five who reports to Nalini.

You can also use a recursive CTE to return a list of employees and the levels, or tiers, in which those employees are positioned within the hierarchy:

WITH emp AS
(
  SELECT $node_id NodeID, FirstName Employee, 
    CAST('N/A' AS NVARCHAR(50)) Manager, 1 AS Tier
  FROM FishEmployees
  WHERE FirstName = 'Dora'
  UNION ALL
  SELECT fe.$node_id, fe.FirstName Employee, emp.Employee Manager, 
    (Tier + 1) AS Tier
  FROM FishEmployees fe INNER JOIN ReportsTo rt
      ON fe.$node_id = rt.$from_id 
    INNER JOIN emp
      ON rt.$to_id = emp.NodeID
)
SELECT Employee, Tier, Manager 
FROM emp
ORDER BY Tier, Manager, Employee;

This time around, the first SELECT statement in the CTE returns a row for Dora, who sits at the top of the hierarchy. The statement also returns an additional column, Tier, which is assigned a value of 1 to represent the first tier. The second SELECT statement then adds 1 to the Tier value with each recursion. Most of the other T-SQL elements are the same as in the preceding example, except that now the results look much different, as shown in the following figure.

You can also use this statement structure to return a single tier of employees, in this case, the third tier:

WITH emp AS
(
  SELECT $node_id NodeID, FirstName Employee, 
    CAST('N/A' AS NVARCHAR(50)) Manager, 1 AS Tier
  FROM FishEmployees
  WHERE FirstName = 'Dora'
  UNION ALL
  SELECT fe.$node_id, fe.FirstName Employee, emp.Employee Manager, 
    (Tier + 1) AS Tier
  FROM FishEmployees fe INNER JOIN ReportsTo rt
      ON fe.$node_id = rt.$from_id 
	INNER JOIN emp
      ON rt.$to_id = emp.NodeID
)
SELECT Employee, Manager 
FROM emp
WHERE Tier = 3
ORDER BY Tier, Manager, Employee;

The most important difference here, when compared to the preceding example, is that the WHERE clause now specifies which tier to return, giving you the results shown in the following figure.

As you can see, only the employees in the third tier are included in the results, along with the names of their managers, Fred and Nalini. You can, of course, come up with other ways to slice and dice the data, depending the type of information you’re trying to retrieve.

Viewing a Hierarchy

There might be times when you want to get a less ‘recursive’ view of the data so you can see everyone in the management chain under a specific manager in a single view. To demonstrate how to do this, you can take your cue from Phil Factor’s informative article SQL Server Closure Tables, where he describes how to use closure tables to represent hierarchies.

To make his approach work on the FishEmployees hierarchy, you should create a function that retrieves the ID and name of each employee who reports to a specific manager. The function should also assign the tier level to each employee, relative to that manager, as shown in the following example:

DROP FUNCTION IF EXISTS GetEmployees;
GO
CREATE FUNCTION GetEmployees (@empid int)  
RETURNS TABLE  
AS  
RETURN   
(  
WITH emp AS
(
  SELECT $node_id NodeID, EmpID, FirstName Employee, 0 AS Tier
  FROM FishEmployees
  WHERE EmpID = @empid
  UNION ALL
  SELECT fe.$node_id, fe.EmpID, fe.FirstName Employee, (Tier + 1) AS Tier
  FROM FishEmployees fe INNER JOIN ReportsTo rt
      ON fe.$node_id = rt.$from_id 
   INNER JOIN emp ON rt.$to_id = emp.NodeID
)
SELECT EmpID, Employee, Tier FROM emp
);  
GO

The function takes a single parameter, the manager’s employee ID, and returns all the employees who report directly or indirectly to that manager. You will run the function for each employee so you have a mapping of the entire employee chain, as you’ll see shortly.

The challenge with this approach is that it does not handle employees who report to more than one manager, as is the case with the FishEmployees hierarchy. For this example, you can run the following two DELETE statements to remove the conflict, keeping in mind that ultimately you would have to include the logic necessary to handle this situation:

DELETE ReportsTo
WHERE $from_id = (SELECT $node_id FROM FishEmployees WHERE EmpID = 6)
  AND $to_id = (SELECT $node_id FROM FishEmployees WHERE EmpID = 11);
DELETE ReportsTo
WHERE $from_id = (SELECT $node_id FROM FishEmployees WHERE EmpID = 7)
  AND $to_id = (SELECT $node_id FROM FishEmployees WHERE EmpID = 11);

Next, you should create a temporary table, retrieve the first employee ID from the FishEmployees table, and then run a WHILE loop that uses the function to populate the table:

DROP TABLE IF EXISTS #temp;
GO
CREATE TABLE #temp(
  MgrID INT,
  MgrName VARCHAR(50),
  EmpID INT,
  EmpName NVARCHAR(50),
  Tier INT);
DECLARE  @mgrid INT = 
  (SELECT MIN(EmpID) FROM FishEmployees);
WHILE @mgrid IS NOT NULL
BEGIN
  DECLARE @mgrname NVARCHAR(50) = 
    (SELECT FirstName FROM FishEmployees WHERE EmpID = @mgrid);
  INSERT INTO #temp
  SELECT DISTINCT @mgrid, @mgrname, EmpID, Employee, Tier 
    FROM GetEmployees(@mgrid);
  SELECT @mgrid = MIN(EmpID) FROM FishEmployees WHERE EmpID > @mgrid;
END;

The WHILE loop runs the GetEmployees function for each employee ID until there is none left, adding the returned data to the #temp table with each iteration.

The #temp table essentially acts as the type of closer table described by Phil Factor. I’ve included the employee and manager names in the results as an easy way to verify the data, but they’re not necessary to the final product. One thing to note about this process is that the tier levels assigned to each employee are relative to the specific employee being called when the function runs, rather than applying universally across the entire hierarchy. This is essential to creating a proper closure table.

Once the #temp table is populated with the employee data, you can run a SELECT statement that retrieves the management chain by using a combination of self-joins and careful sorting:

SELECT STUFF((SELECT '<-' + fe.FirstName
  FROM #temp t1 INNER JOIN #temp t2 ON t2.EmpID = t1.EmpID
    INNER JOIN FishEmployees fe ON fe.EmpID = t2.MgrID
  WHERE t1.MgrID = t.MgrID  
    AND t1.EmpID <> t1.MgrID
    AND t1.EmpID = t.EmpID
  ORDER BY t2.Tier DESC
  FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS ReportTrail
FROM #temp t
WHERE t.MgrID = 
  (SELECT EmpID FROM FishEmployees 
    WHERE FirstName LIKE 'Dora') 
  AND t.EmpID <> t.MgrID
ORDER BY ReportTrail;

The query creates the self-joins within a subquery, specifying several WHERE conditions to limit the results. The subquery uses the FORXML output type, along with the value function available to the XML data type, to get the data in a linear format. The outer WHERE clause applies additional filters, along with another ORDERBY clause, providing the results shown in the following figure.

For more information about how the closure table and query work, refer to the Phil Factor article, which goes into far more detail about the different ways you can use closure tables with hierarchical data.

Working with hierarchical relationships

SQL Server graph databases have a number of limitations, especially when it comes to the types of advanced querying available to more mature graph technologies. As a result, you might find yourself having to revert to the sort of workarounds demonstrated in the last section. However, because the SQL Server graph database features are based on traditional relational tables, you can usually figure out some way to get the data you need, even if that data lies within a complex hierarchical structure. You might have to work a bit to return the desired results, but eventually you should be able to get at the correct information.

The post SQL Server Graph Databases – Part 4: Working with hierarchical data in a graph database appeared first on Simple Talk.

SQL Server Graph Databases – Part 3: Modifying Data in a Graph Database

SQL Server 2017 makes it possible to implement graph databases within a relational table structure, allowing you define complex relationships between data much easier than with traditional approaches. Under this model, nodes (entities) and edges (relationships) are implemented as tables in a user-defined database, with the graph features integrated into the database engine. As a result, you can use familiar T-SQL statements to work with the graph tables, and you can use the graph features in conjunction with other SQL Server components and tools, such as columnstore indexes, Machine Learning Services, and SQL Server Management Studio (SSMS).

This article is the third in a series that covers different aspects of SQL Server graph databases. In the first two articles, you learned how to create, populate and query node and edge tables. In this article, you’ll learn how to delete and update graph data, as well as how to insert data in ways we have not yet covered.

Like the previous articles, this article includes a number of examples that target the FishGraph database, which is based on a fictitious fish-lovers forum. The database includes the FishSpecies, FishLover, and FishPost node tables and the Likes, Posts, and LinksTo edge tables. The following figure shows the data model used to build the database.

If you worked through the first two articles, this should all look familiar to you. The rectangles represent the nodes, and the arrows connecting the nodes represent the edges, with the arrows pointing in the direction of the relationships. You can download the T-SQL script used to create and populate the database at the bottom of this article.

Because we covered how to insert data in the first article, we’ll start this article with how to delete graph data and then move on to some new tricks for inserting data. We’ll leave data updates till the end because they’re not quite as straightforward as with inserting or deleting data.

Deleting Graph Data

For the most part, deleting data from a node or edge table is just like deleting data from any other table. Things only get sticky if you base your deletion on any of the auto-defined columns (node_id, $edge_id, $from_id, or $to_id), but even then, it’s not all that complicated.

Before we get into the nitty gritty of data deletion, first insert a row into the Likes table:

INSERT INTO Likes ($from_id, $to_id) VALUES (
    (SELECT $node_id FROM FishLover WHERE FishLoverID = 2), 
    (SELECT $node_id FROM FishPost WHERE PostID = 3));

The INSERT statement adds the relationship fish lover 2 likes fish post 3. You can verify that the relationship has been properly created by running the following SELECT statement:

SELECT Lover.Username, Post.Title
  FROM FishLover Lover, Likes, FishPost Post
  WHERE MATCH(Lover-(Likes)->Post)
    AND Lover.FishLoverID = 2
    AND Post.PostID = 3;

The SELECT statement uses the MATCH function to return only those rows in which a fish lover likes a fish post. The other two WHERE clause conditions refine the search to the specific user and post, giving us the following results.

I won’t go into much detail about the INSERT and SELECT statements because I covered how to add and query data in the first two articles, at least to the extent that these statements are used here. If you have any questions about what’s going on, be sure to refer back to those articles.

With that in mind, let’s look at how to delete the relationship you just added. One approach is to use subqueries in the WHERE clause to retrieve the $node_id values from the originating and terminating nodes (to try out the DELETE statements, just rerun the INSERT statement as needed):

DELETE Likes
  WHERE $from_id = (SELECT $node_id FROM FishLover WHERE FishLoverID = 2)
    AND $to_id = (SELECT $node_id FROM FishPost WHERE PostID = 3);

Because the deletion is based on the values in the $from_id and $to_id columns in the Likes table, you need to compare them to $node_id values from the originating and terminating records. As you’ll recall from the first article, the database engine automatically generates the $node_id values in the node tables, creating each one as a JSON string that provides the type (node or edge), schema, table, and a BIGINT value unique to each row.

Of course, you can pass in the $node_id values as literal stings in the WHERE clause, just like you can pass in a literal string if comparing the $edge_id column (which we’ll get to shortly). However, building the JSON strings is not always as convenient as constructing subqueries. The key to using subqueries is being able to target the correct records in the originating and terminating nodes. In this case, we’re able to use the FishLoverID and PostID values from the node tables.

However, this is not the only approach you can take to deleting the data. You can also use the MATCH function to specify which relationship to delete, as shown in the following example:

DELETE Likes
  FROM FishLover Lover, Likes, FishPost Post
  WHERE MATCH(Lover-(Likes)->Post)
    AND Lover.FishLoverID = 2
    AND Post.PostID = 3;

As you’ll recall from the second article, the MATCH function lets you define a search pattern based on the relationships between nodes. You can use the function only in the WHERE clause when querying a node or edge table. In this case, the function filters out all rows except for those in which a fish lover likes a fish post. The other WHERE clause conditions ensure that that the correct relationship is deleted.

You can achieve the same results by joining the tables and foregoing the MATCH function altogether:

DELETE lk
  FROM Likes lk INNER JOIN FishLover fl
      ON lk.$from_id = fl.$node_id
    INNER JOIN FishPost fp
      ON lk.$to_id = fp.$node_id
  WHERE fl.FishLoverID = 2 AND fp.PostID = 3;

All we’re doing here is joining the two node tables and one edge table. The FROM clause joins the tables based on $node_id values, and the WHERE clause filters the data based on the FishLoverID and PostID values. Although this statement is a bit more cumbersome that the preceding example, it uses a traditional T-SQL statement to delete the data, an approach that’s familiar to many.

System Functions for Graph Database

As mentioned above, you can also delete a relationship from an edge table by comparing the $edge_id column to a literal JSON value, as shown in the following example:

DELETE Likes
  WHERE $edge_id = '{"type":"edge","schema":"dbo","table":"Likes","id":15}';

This approach works fine as long as you don’t mind having to construct a JSON snippet every time you want to delete a row. However, SQL Server 2017 also comes with a set of built-in system functions that help simplify this process. For example, you can use the EDGE_ID_FROM_PARTS function to construct the edge ID based on the SQL Server object ID and graph ID, as shown in the following DELETE statement:

DECLARE @obj INT = OBJECT_ID('dbo.Likes');
  DELETE Likes
  WHERE $edge_id = EDGE_ID_FROM_PARTS(@obj, 15);

The object ID is the unique integer assigned to each object in a SQL Server database. The graph ID is the unique integer assigned to each $edge_id value (the id element). The id value works much like the values assigned to an IDENTITY column, with the value incremented by one with each insertion.

For this example, I used the OBJECT_ID function to retrieve the object ID for the Likes table and then assigned that value to the @obj variable, which I then passed in as the first argument when calling the EDGE_ID_FROM_PARTS function. You do not need to do this in a variable, but it can help to make the code more readable.

For the second argument, I passed in a value of 15 as the id portion of the JSON value. The key to make sure you’re passing in the correct id value. Otherwise, that’s all there is to using the EDGE_ID_FROM_PARTS function, and the other graph functions are just as basic. SQL Server 2017 includes six graph functions in all, as described in the following table.

Built-in function Description

OBJECT_ID_FROM_NODE_ID

Extracts the object ID from a $node_id value.

GRAPH_ID_FROM_NODE_ID

Extracts the graph ID from a $node_id value.

NODE_ID_FROM_PARTS

Constructs a JSON node ID from an object ID and graph ID.

OBJECT_ID_FROM_EDGE_ID

Extracts the object ID from an $edge_id value.

GRAPH_ID_FROM_EDGE_ID

Extracts the graph ID from an $edge_id value.

EDGE_ID_FROM_PARTS

Constructs a JSON edge ID from an object ID and graph ID.

Microsoft currently provides little information about these functions and offers no examples, but if you play around with them, you’ll quickly get a sense of how they work. No doubt we’ll eventually see more information about each function, and we might even see additional functions. Until then, the existing ones represent a good starting point, not only for deleting data, but also for building other types of queries, as you’ll see in the next section.

Revisiting Data Inserts and Data Queries

In the first article in this series, you saw several examples of how to insert data into node and edge tables. Inserting data into a node table works just like any other SQL Server table. You specify the target columns and their values and leave it to the database engine to populate the $node_id column.

If you want to remove the new row so you can try different ways to insert the same data, you can use the following DELETE statement as needed, specifying the correct FishLoverID and PostID values:

DELETE Likes
  FROM FishLover Lover, Likes, FishPost Post
  WHERE MATCH(Lover-(Likes)->Post)
    AND Lover.FishLoverID = 2
    AND Post.PostID = 3;

Edge tables are similar in this respect, except that the database provides values for the $edge_id column. However, edge tables also include the $from_id and $to_id columns, which you must manually populate when you insert data. One way to do this is to use subqueries that retrieve the $node_id values from the originating and terminating node tables, as you saw earlier in the article (duplicated here for your convenience):

INSERT INTO Likes ($from_id, $to_id) VALUES (
    (SELECT $node_id FROM FishLover WHERE FishLoverID = 2), 
    (SELECT $node_id FROM FishPost WHERE PostID = 3));

Although this approach works fine, you can instead use the NODE_ID_FROM_PARTS function to construct the $node_id values when inserting the data:

DECLARE @obj1 INT = OBJECT_ID('dbo.FishLover');
  DECLARE @obj2 INT = OBJECT_ID('dbo.FishPost');
  INSERT INTO Likes ($from_id, $to_id) VALUES (
    NODE_ID_FROM_PARTS(@obj1, 1), 
    NODE_ID_FROM_PARTS(@obj2, 2));

The example first obtains the object IDs from the relationship’s originating and terminating node tables and saves the IDs to the @obj1 and @obj2 variables, which are then used when calling the NODE_ID_FROM_PARTS function. (Again, you do not need to use variables.)

For the second function argument, you should use the graph IDs from the $node_id values in the originating and terminating node tables. These values are different from the FishLoverID and PostID columns because SQL Server starts at 0 when assigning the graph ID to a $node_id or $edge_id value.

You can verify that the relationship has been correctly added to the Likes table by running the same query as earlier:

SELECT Lover.Username, Post.Title
  FROM FishLover Lover, Likes, FishPost Post
  WHERE MATCH(Lover-(Likes)->Post)
    AND Lover.FishLoverID = 2
    AND Post.PostID = 3;

If everything is working as expected, you should see the results shown in the following figure.

Up to this point in the series, the example INSERT statements left it to the database engine to generate the $node_id and $edge_id values; however, you can specify those values when inserting data. For example, the following T-SQL retrieves the object ID for the FishPost table and uses it to construct the $node_id value for the new row:

DECLARE @obj INT = OBJECT_ID('dbo.FishPost');
  INSERT INTO FishPost ($node_id, Title, MessageText) VALUES
  (NODE_ID_FROM_PARTS(@obj, 106), 'A day in the life of fish', 
    'Donec pede justo, porttitor eu, consequat vitae, eleifend ac, enim.');

By taking this approach, you can specify what integer value to assign as the graph ID used to build the node ID (in this case, 106). This can be handy if you’re pulling data in from another source and want to use the existing ID assigned to each row as the graph ID. Just be sure to use a unique value for each graph ID, or the database engine will return an error.

Also note that SQL Server documentation briefly mentions that you cannot insert data into the $node_id or $edge_id column of a graph table. If you do, according to Microsoft, you’ll receive an error. This has not been my experience. I’ve had no problem inserting the ID values, as long as I pass them in using the correct JSON form. Because graph databases are so new to SQL Server, it’s not surprising to run into these types of inconsistencies.

With that in mind, you can verify that the row has been inserted into the FishPost statement by running the following SELECT statement:

SELECT GRAPH_ID_FROM_NODE_ID($node_id) AS NodeID, PostID, Title 
  FROM FishPost;

Notice that the SELECT clause uses the GRAPH_ID_FROM_NODE_ID function to return only the graph ID from each $node_id value, making it easier to confirm the value we just added. The following figure shows what the FishPost table should look like at this point.

Suppose you now want to return only the row that contains the last graph ID value inserted into the table. This can be a little tricky because SQL Server sometimes treats this type of table as though it includes two IDENTIY columns, in this case, $node_id and PostID, even though a SQL Server table can theoretically include with only one such column.

One approach you might consider to return the row is to use the @@IDENTITY system function in your WHERE clause:

SELECT GRAPH_ID_FROM_NODE_ID($node_id) AS NodeID, PostID, Title 
  FROM FishPost
  WHERE GRAPH_ID_FROM_NODE_ID($node_id) = 
    (SELECT @@IDENTITY);

The @@IDENTITY function returns the last identity value inserted into an identity column in the current session, regardless of where the value was inserted. In this case, the function is used to compare the graph ID to the last inserted identity value. If the values match, the row is returned, as shown in the following figure.

Because the SELECT statement returned the expected row, you know that the @@IDENTITY function returned a value of 106, the last inserted graph ID. (Note that you might see a different value for the PostID column if you inserted other data in the FishPost table.) You can verify that 106 is being returned by using the following statement:

SELECT @@IDENTITY;

Assuming you didn’t run any other INSERT statements in your last session, you should receive the correct value.

You might also consider using the IDENT_CURRENT function rather than the @@IDENTITY function because you can target a specific table or view and it’s not limited to the current session. When calling the IDENT_CURRENT function, you must provide the name of the target table, as shown in the following example:

SELECT GRAPH_ID_FROM_NODE_ID($node_id) AS NodeID, PostID, Title 
  FROM FishPost
  WHERE GRAPH_ID_FROM_NODE_ID($node_id) = 
    (SELECT IDENT_CURRENT('FishPost'));

Unfortunately, this approach generates no results because, in this case, the database engine returns the last value inserted into the PostID column. You can confirm this by running the following SELECT statement:

SELECT IDENT_CURRENT('FishPost')

The statement returns a value of 6 rather than 106, which is why the previous SELECT statement returns no rows.

When I tried all this out, I expected the @@IDENTITY and IDENT_CURRENT functions to return the same value and could find nothing in the Microsoft fine print to suggest why they might be different. Things got even stranger when I created a node table that did not include an IDENTITY column. This time both the @@IDENTITY function and IDENT_CURRENT function returned a NULL value after I inserted a row. Next, I created a table that included the IDENTITY column but inserted a row without specifying the $node_id value. In this case, both functions returned a value of 1. In addition, my results were always the same whether or not I called the identify functions within the same batch as the INSERT statement.

Apparently, Microsoft still has some details to work out with graph databases, or I’m missing something important about how identity functions work. Just in case it’s not me, you might want to proceed with caution when using identity functions in conjunction with graph databases.

In the meantime, it you want to delete the row you just added, you can use the GRAPH_ID_FROM_NODE_ID function to extract the graph ID when comparing it to the value 106:

DELETE FishPost WHERE GRAPH_ID_FROM_NODE_ID($node_id) = 106;

Clearly, the graph-related functions can come in handy at times, depending on the type of queries you’re trying to perform.

Data Updates Not Allowed

Unfortunately, the process of updating data in a graph database is not as straightforward as adding or deleting data. That’s not to say you can’t perform any updates, you just can’t perform them on the auto-defined graph columns ($node_id, $edge_id, $from_id, and $to_id).

To test this out, start with the following INSERT statement, which adds a row to the FishPost table:

DECLARE @obj INT = OBJECT_ID('dbo.FishPost');
  INSERT INTO FishPost ($node_id, Title, MessageText) VALUES
  (NODE_ID_FROM_PARTS(@obj, 107), 'Another day, another fish', 
    'Donec pede justo, porttitor eu, consequat vitae, eleifend ac, enim.');

If you now want to modify data in one of the user-defined columns, you can run an UPDATE statement just like you would on any table. For example, the following UPDATE statement modifies the title of the row you just inserted:

UPDATE FishPost
  SET Title = 'Another data of fishing'
  WHERE GRAPH_ID_FROM_NODE_ID($node_id) = 107;

If you were to query this row, you would see that the title has been updated. However, suppose you now want to update the $node_id value by changing the graph ID to 207:

UPDATE FishPost
  SET $node_id = NODE_ID_FROM_PARTS(OBJECT_ID('FishPost'), 207)
  WHERE GRAPH_ID_FROM_NODE_ID($node_id) = 107;

This time, the UPDATE statement returns the following error on my system:

The same goes for trying to update the $edge_id value in an edge table:

UPDATE Likes
  SET $edge_id = EDGE_ID_FROM_PARTS(OBJECT_ID('Likes'), 216)
  WHERE GRAPH_ID_FROM_EDGE_ID($edge_id) = 16;

The statement generates the same type of error as the previous statement, and the same limitation holds true for the $from_id and $to_id columns, even though these are not computed columns and you provide the values yourself. For example, the following UPDATE statement tries to update a relationship in the Likes table:

UPDATE Likes
  SET $from_id = NODE_ID_FROM_PARTS(OBJECT_ID('FishLover'), 3)
  WHERE GRAPH_ID_FROM_EDGE_ID($edge_id) = 16;

Once again, you get the same error. In fact, the only way you can update an auto-defined graph column is to first delete the applicable record and then insert a new record with the correct data, as shown in the following example:

DELETE FishPost
  WHERE GRAPH_ID_FROM_NODE_ID($node_id) = 107; 
  INSERT INTO FishPost ($node_id, Title, MessageText) VALUES
  (NODE_ID_FROM_PARTS(OBJECT_ID('dbo.FishPost'), 207), 'Another day, another fish', 
    'Donec pede justo, porttitor eu, consequat vitae, eleifend ac, enim.');

Although the graph ID value is the only data that is changing, the entire record must be deleted and a new one added. To confirm that the record has been updated, you can use the following SELECT statement:

SELECT GRAPH_ID_FROM_NODE_ID($node_id) AS NodeID, PostID, Title 
  FROM FishPost
  WHERE GRAPH_ID_FROM_NODE_ID($node_id) = 207;

The statement returns the results shown in the following figure, which indicates a NodeID value of 207.

As long as you know how to delete and insert records, you should have no problem performing your updates. Although this approach adds a bit of complexity to the process, it’s not too bad overall. Just be sure you delete and add the data within a single transaction.

Modifying Data in a Graph Database

For the most part, working with graph databases is fairly straightforward, once you figure out the basics of querying node and edge tables. The graph-related functions can help, but so can understanding how these tables work together to define complex relationships.

Be aware, however, that the graph database features do not include a mechanism for validating relationships. For example, you can create relationships where they don’t belong, such as a fish species liking a fish post. You can also delete a relationship’s originating or terminating record without deleting the relationships itself. For a more complete discussion of these issues, check out Dennes Torres’s excellent article SQL Graph Objects in SQL Server 2017: the Good and the Bad.

Because the tables in a graph database are similar to typical SQL Server tables, you can usually get around most limitations, as long as you understand the basics of querying and modify graph data. And the best way to learn those basics is to try the graph database features for yourself, using the type of real-world data that drives the applications you’re supporting.

The post SQL Server Graph Databases – Part 3: Modifying Data in a Graph Database appeared first on Simple Talk.

Exactly how graph data modeling can help evaluate data source tools

material, consisting of E-Guides, news, ideas and also more. Step 2 of 2: to bridge the voids between naturally linked information

as well as apply chart analytics to locate

brand-new understandings not typically afforded by traditional relational database administration systems.There is an expanding pool of chart database

systems and also

items, yet while lots of use test licenses or community variations, the example applications are typically really casual as well as offer only a little inkling of the prospective power of graph information modeling.

While these sample applications demonstrate the core essentials of exactly how the chart database functions, they just touch on the possible breadth of information administration as well as logical capabilities these tools provide.In some situations, the graph strategy is confusing, and also some individuals don’t completely understand how to map their information to a graph version . In others, an absence of expertise about the execution of the chart framework can end up being a traffic jam. The outcome is that people may be able to dabble with those tiny chart examples, yet they might be prevented when trying to craft an affordable prototype or proof of concept. However, numerous relational databases have concealed graph representations hidden within the tabular structure, as well as there are some concrete steps that an expert can take to find the data’s inner chart. As a matter of fact, these chart information modeling steps are not much various than those absorbed establishing a relational design. Exactly how chart databases work Chart databases utilize an alternate technique to data depiction by capturing information regarding entities as well as their features, in addition to the partnerships amongst those entities as excellent things. The structure of chart databases is mathematical graph concept. Charts contain a collection of vertices– which are likewise referred to as nodes or points– that represent the modeled entities, linked by edges– which are also described as links, links or connections– that capture the way that two entities are related.For instance, your data source may refer to customers— which is one entity– who live at details addresses– which become a different entity. Lives-At is one sort of connection that connects a customer to an area as well as would be the tag assigned to the edge between the specific consumer node as well as the certain area node. Without the loss of abstract principle, we can think that every connection can be stood for as a three-way including a topic– the source of the edge– the partnership and also the item– the target of the side. As an example:132 Key Ave. Tenley, Md., 29817 This example is one instance of a more basic triple partnership: A lot of, if not all graph database systems are engineered to be able to ingest a depiction of a chart containing 2 artifacts: The listing of nodes and the list of sides

in between those nodes. Utilizing this foundation, we can adhere to these chart information modeling steps to create those 2 artefacts: Discover the entities. Review

your data sets to determine those core nouns that can be either the topics or things of a partnership. Some instances include customer, staff member, vendor, company, place, acquisition deal, insurance policy case, workflow step, item, part, film, author, book, etc.Find each entity’s residential properties. Entity

  • homes resemble entity qualities in the relational model. As an example, a motion picture’s residential or commercial properties might include year, format and copyright information. However, when you determine features that are likewise entities, you can start to identify relationships.Find each relationship’s properties. These are the attributes related to the links between entities
  • . To continue our instance, a star may contribute in a phase bet a limited engagement, making the start day and also end date properties of the play’s relationship between an actor as well as a function. The auto mechanics of creating the prototype entail a series of resource data set scans. The initial collection of scans discover the unique set
  • of entities of each entity course, in addition to the buildings for every of those entities. When the complete collection of entities is collected, that collection of vertices can be result to a consistent file

that can subsequently be ingested by the graph data source system.The second collection of scans will remove the connections between the various entities, in addition to the properties of those web links. Because these partnerships connect recognized entities, see to it that you collect just partnership triples that refer to the entities logged during the prior collection of scans. When those sides have actually been collected, output the triples to a relentless sides file.Together, those two files– the vertices and the sides– represent the graph.

This workout integrates a repeatable procedure for removing a graph from the resource information with the real artifacts that can be packed into a graph data source system. The outcome is a simplified approach for producing prototypes for the examination of different chart database tools.

Dig Deeper on Information stockroom software program-ADS BY GOOGLE