I have to count the edges between the neighbors of each node in a graph in SQL Server. I have two tables, GraphNodes and GraphEdges; their structure is described in my previous question.

This question relates to the previous one but covers a different aspect. I need to perform these steps:

  1. Take a node, say V, from GraphNodes.
  2. Build the DISTINCT list of neighbors of V (e.g. in a SQL TABLE variable).
  3. Count (COUNT DISTINCT) the links between the neighbors of V in GraphEdges.
  4. Output V with the number of DISTINCT links between its neighbors.

The query I tried works fine for a single node:

SELECT GN.id, COUNT(DISTINCT(CONCAT(GE.Source_Node,'-', GE.Target_Node))) AS NeighborLinks
FROM GraphEdges GE
JOIN GraphNodes GN ON GN.id = 512
WHERE Source_Node IN (SELECT DISTINCT Target_Node FROM GraphEdges WHERE Source_Node = 512
                      UNION ALL
                      SELECT DISTINCT Source_Node FROM GraphEdges WHERE Target_Node = 512
                     ) 
  AND Target_Node IN (SELECT DISTINCT Target_Node FROM GraphEdges WHERE Source_Node = 512
                       UNION ALL
                       SELECT DISTINCT Source_Node FROM GraphEdges WHERE Target_Node = 512
                     )
GROUP BY GN.id  

I used id = 512 as a sample; it is an id in GraphNodes. The query returns:

+-------+-----------------+
|  id   |   NeighborLinks |
+-------+-----------------+
|  512  |   6             |
+-------+-----------------+  

The UNION ALL in the WHERE clause is needed because the id (512) can appear in both the Source_Node and Target_Node columns, so the DISTINCT neighbors have to be collected from both columns. The same list is used for GE.Source_Node and GE.Target_Node because only links between neighbors of V (i.e. 512) should be counted.
The question is how to use a TABLE variable, or any other method, to supply a long list of values instead of the single value 512.

I came up with the following attempt using table variables, but I get an error when I use the table variables inside the query:
Try 1

DECLARE @ID TABLE(id INT)
DECLARE @S_Neighbor TABLE (id INT)
DECLARE @T_Neighbor TABLE (id INT)

INSERT INTO @ID SELECT id FROM GraphNodes

INSERT INTO @S_Neighbor SELECT DISTINCT Source_Node 
                        FROM GraphEdges 
                        WHERE Target_Node IN (SELECT id FROM @ID)
--UNION ALL
INSERT INTO @T_Neighbor SELECT DISTINCT Target_Node 
                        FROM GraphEdges 
                        WHERE Source_Node IN (SELECT id FROM @ID)

SELECT GN.id,COUNT(DISTINCT(CONCAT(GE.Source_Node,'-', GE.Target_Node))) AS Mutual_Links
FROM GraphEdges GE
JOIN GraphNodes GN ON GN.id = @ID
WHERE Source_Node IN (SELECT DISTINCT Target_Node 
                      FROM GraphEdges 
                      WHERE Source_Node IN @T_Neighbor

                      UNION ALL

                      SELECT DISTINCT Source_Node 
                      FROM GraphEdges 
                      WHERE Target_Node IN @S_Neighbor)
   AND Target_Node IN (SELECT DISTINCT Target_Node 
                       FROM GraphEdges 
                       WHERE Source_Node IN @S_Neighbor

                       UNION ALL

                       SELECT DISTINCT Source_Node 
                       FROM GraphEdges 
                       WHERE Target_Node IN @T_Neighbor)
GROUP BY GN.id  

I also tried this:
Try 2

DECLARE @ID_COUNTER INT
DECLARE @MAX_ID INT

SET @ID_COUNTER = 1
SET @MAX_ID = 148410

WHILE @ID_COUNTER <= @MAX_ID
BEGIN
  (
    SELECT GN.id, 
    COUNT(DISTINCT(CONCAT(GE.Source_Node,'-', GE.Target_Node))) AS Mutual_Links
    FROM GraphEdges GE
    JOIN GraphNodes GN ON GN.id = @ID_COUNTER
    WHERE Source_Node IN (SELECT DISTINCT Target_Node 
                          FROM GraphEdges WHERE Source_Node = @ID_COUNTER
                          UNION ALL
                          SELECT DISTINCT Source_Node 
                          FROM GraphEdges WHERE Target_Node = @ID_COUNTER
                         ) 
       AND Target_Node IN (SELECT DISTINCT Target_Node 
                           FROM GraphEdges WHERE Source_Node = @ID_COUNTER
                           UNION ALL
                           SELECT DISTINCT Source_Node 
                           FROM GraphEdges WHERE Target_Node = @ID_COUNTER
                          )
    GROUP BY GN.id
   )
  SET @ID_COUNTER += 1
END

With @MAX_ID = 3 it took 56 seconds to return output, whereas the real @MAX_ID is 148410. The returned NeighborLinks values are correct, but the output is shown as three separate result sets:

id  NeighborLinks
1   53

 id NeighborLinks
 2  318

id  NeighborLinks
3   297
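
Each SELECT executed inside a WHILE loop returns its own result set, which would explain the three separate outputs. A minimal sketch of one way to get a single result set is to accumulate each iteration's row in a table variable and select from it once at the end. The name @Results is hypothetical. Note two differences from Try 2: this version does not join GraphNodes, and it returns a row with 0 for an id that has no links between its neighbors. It only changes the shape of the output; the loop still runs one query per id, so it is not expected to be faster.

```sql
-- Hypothetical table variable that collects one row per node.
DECLARE @Results TABLE (id INT NOT NULL PRIMARY KEY, NeighborLinks INT NOT NULL);

DECLARE @ID_COUNTER INT = 1;
DECLARE @MAX_ID INT = 3;

WHILE @ID_COUNTER <= @MAX_ID
BEGIN
    -- Same counting logic as Try 2, but the row is stored instead of returned.
    INSERT INTO @Results (id, NeighborLinks)
    SELECT @ID_COUNTER,
           COUNT(DISTINCT CONCAT(GE.Source_Node, '-', GE.Target_Node))
    FROM GraphEdges AS GE
    WHERE GE.Source_Node IN (SELECT Target_Node FROM GraphEdges WHERE Source_Node = @ID_COUNTER
                             UNION
                             SELECT Source_Node FROM GraphEdges WHERE Target_Node = @ID_COUNTER)
      AND GE.Target_Node IN (SELECT Target_Node FROM GraphEdges WHERE Source_Node = @ID_COUNTER
                             UNION
                             SELECT Source_Node FROM GraphEdges WHERE Target_Node = @ID_COUNTER);

    SET @ID_COUNTER += 1;
END;

-- One result set with one row per id.
SELECT id, NeighborLinks
FROM @Results
ORDER BY id;
```

Inside an IN list, UNION and UNION ALL give the same filter result, so UNION is used here only to keep the subquery short.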

  • ...you wrote all this to ask how to pass list of int values as a parameter? Commented Feb 28, 2016 at 10:09
  • You are already suggesting a table variable yourself. What is keeping you from using table variables? An alternative would be to use a temporary table for storing the id's. Commented Feb 28, 2016 at 10:16
  • @IvanStarostin Not entirely; it's a two-fold issue: one part is how to store and pass int values as a parameter repeatedly, and the other is how to JOIN the same column twice. Commented Feb 28, 2016 at 10:21
  • @TT. I have tried a table variable but was unable to sort it out. Commented Feb 28, 2016 at 10:22
  • Show us the error message and the line which it corresponds to. Most likely, instead of WHERE Source_Node IN @T_Neighbor you should write WHERE Source_Node IN (SELECT ID FROM @T_Neighbor AS TN). JOIN GraphNodes GN ON GN.id = @ID is also wrong and it is completely unclear to me what you are trying to achieve here. Commented Feb 28, 2016 at 11:25
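
The point made in the last comment can be sketched as follows. A table variable cannot follow IN directly; it has to be read through a subquery or joined like any other table. The sample ids inserted below are placeholders for illustration only.

```sql
-- Table variable holding the node ids of interest (sample values are placeholders).
DECLARE @T_Neighbor TABLE (id INT NOT NULL PRIMARY KEY);
INSERT INTO @T_Neighbor (id) VALUES (512), (513);

-- Not valid:  WHERE Source_Node IN @T_Neighbor
-- Valid: read the table variable through a subquery.
SELECT GE.Source_Node, GE.Target_Node
FROM GraphEdges AS GE
WHERE GE.Source_Node IN (SELECT TN.id FROM @T_Neighbor AS TN);

-- Also valid: join to it. Because id is the primary key here,
-- the join does not duplicate edge rows.
SELECT GE.Source_Node, GE.Target_Node
FROM GraphEdges AS GE
     INNER JOIN @T_Neighbor AS TN ON TN.id = GE.Source_Node;
```

The same applies to JOIN GraphNodes GN ON GN.id = @ID: a column can be compared with a scalar variable, but not with a table variable as a whole.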

1 Answer

I think you're looking for something like the snippet below.

I've created a temporary table for the graph edges (#graph_edges). The nodes for which you want to look up the number of (distinct) neighbors are in the temporary table #nodes.

CREATE TABLE #graph_edges(source_node INT NOT NULL,target_node INT NOT NULL);

CREATE TABLE #nodes(id INT NOT NULL PRIMARY KEY);
--INSERT INTO #nodes(id)VALUES(512),(513),(514); -- specific nodes to look up in the graph
INSERT INTO #nodes(id)
SELECT source_node FROM #graph_edges UNION SELECT target_node FROM #graph_edges; -- lookup for all distinct nodes ID's in the graph


SELECT id,neighbor_links=COUNT(*)
FROM
    (
        SELECT n.id,l=ge.source_node,r=ge.target_node
        FROM #nodes AS n
             INNER JOIN #graph_edges AS ge ON
                 ge.source_node=n.id
        UNION -- union of the two sets, this filters duplicate rows (ie no duplicate source_node,target_node row will appear in the derived table)
        SELECT n.id,l=ge.target_node,r=ge.source_node
        FROM #nodes AS n
             INNER JOIN #graph_edges AS ge ON
                 ge.target_node=n.id
    ) AS l
GROUP BY id
ORDER BY id;

DROP TABLE #nodes;
DROP TABLE #graph_edges;
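
As its description says, the query above returns the number of distinct neighbors per node. If the goal is the count from the question (edges in GraphEdges whose two endpoints are both neighbors of V), a set-based sketch along the following lines may help. It assumes the GraphNodes(id) and GraphEdges(Source_Node, Target_Node) tables from the question; the CTE names Neighbors and Links are hypothetical. It has not been run against the asker's data, and on a graph of that size it may need indexes on Source_Node and Target_Node to perform acceptably.

```sql
WITH Neighbors AS (
    -- Every (node, neighbor) pair, treating edges as undirected.
    -- UNION removes duplicate pairs.
    SELECT Source_Node AS id, Target_Node AS neighbor FROM GraphEdges
    UNION
    SELECT Target_Node, Source_Node FROM GraphEdges
),
Links AS (
    -- Edges whose source and target are both neighbors of the same node.
    -- DISTINCT mirrors the COUNT(DISTINCT ...) used in the question.
    SELECT DISTINCT n1.id, GE.Source_Node, GE.Target_Node
    FROM Neighbors AS n1
         INNER JOIN GraphEdges AS GE ON GE.Source_Node = n1.neighbor
         INNER JOIN Neighbors AS n2 ON n2.id = n1.id
                                   AND n2.neighbor = GE.Target_Node
)
SELECT GN.id, COUNT(L.id) AS NeighborLinks
FROM GraphNodes AS GN
     LEFT JOIN Links AS L ON L.id = GN.id
GROUP BY GN.id
ORDER BY GN.id;
```

Because of the LEFT JOIN, nodes with no links between their neighbors appear with a count of 0, whereas the single-node query in the question returns no row for them. Switching to an INNER JOIN would drop those rows.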

12 Comments

@TT. I executed your answer but it returns 0 rows. Is that because you are inserting the values into #nodes manually?
@Taufel Did you substitute #graph_nodes with your GraphNodes table in the script? The results will be zero if there are no nodes in the graph.
@TT. I think there is no #graph_nodes entry in the script.
@Taufel I meant to say #graph_edges with GraphEdges.
@TT. Since you created these tables right above the script, I didn't modify your script; and you drop them at the end because they are temporary tables? And what does INSERT INTO #nodes(id)VALUES(512),(513),(514); mean? Just 3 values?