18.4. PageRank Algorithm

Graph Algorithms (similarity, centrality and clustering) in APOC are deprecated and about to be removed. Please use the algorithms in neo4j-graph-algorithms instead.

18.4.1. Setup

Let’s create some test data to run the PageRank algorithm on.

// create 100 nodes
FOREACH (id IN range(0,1000) | CREATE (:Node {id:id}))

// over the cross product (1M) create 100.000 relationships
MATCH (n1:Node),(n2:Node) WITH n1,n2 LIMIT 1000000 WHERE rand() < 0.1

CREATE (n1)-[:TYPE_1]->(n2)

18.4.2. PageRank Procedure

PageRank is an algorithm used by Google Search to rank websites in their search engine results.

It is a way of measuring the importance of nodes in a graph.

PageRank counts the number and quality of relationships to a node to approximate the importance of that node.

PageRank assumes that more important nodes likely have more relationships.

Caution: nodes specifies the nodes for which a PageRank score will be projected, but the procedure will always compute the PageRank algorithm on the entire graph. At present, there is no way to filter/reduce the number of elements that PageRank computes over.

A future version of this procedure will provide the option of computing PageRank on a subset of the graph.

MATCH (node:Node)
WHERE node.id %2 = 0
WITH collect(node) AS nodes
// compute over relationships of all types
CALL apoc.algo.pageRank(nodes) YIELD node, score
RETURN node, score
ORDER BY score DESC
MATCH (node:Node)
WHERE node.id %2 = 0
WITH collect(node) AS nodes
// only compute over relationships of types TYPE_1 or TYPE_2
CALL apoc.algo.pageRankWithConfig(nodes,{types:'TYPE_1|TYPE_2'}) YIELD node, score
RETURN node, score
ORDER BY score DESC
MATCH (node:Node)
WHERE node.id %2 = 0
WITH collect(node) AS nodes
// peroform 10 page rank iterations, computing only over relationships of type TYPE_1
CALL apoc.algo.pageRankWithConfig(nodes,{iterations:10,types:'TYPE_1'}) YIELD node, score
RETURN node, score
ORDER BY score DESC