Documentation / Algorithms
The PageRank Algorithm
The PageRank is an algorithm that measures the “importance” of the nodes in a
graph. It assigns to each node a rank. This rank corresponds to the
probability that a “random surfer” visits the node. The surfer goes from node
to node in the following way: with probability d she chooses a
random outgoing arc and with probability 1 - d she “teleports” to a
random node (possibly not connected to the current). The probability
d is called damping factor. By default it is 0.85 but it can be
customized using the setDampingFactor(double)
method. The ranks are real
numbers between 0 and 1 and sum up to one.
Usage
This implementation uses a variant of the power iteration algorithm to compute the node ranks. It computes the approximate ranks iteratively going closer to the exact values at each iteration. The accuracy can be controlled by a precision parameter . When the L1 norm of the difference between two consecutive rank vectors becomes less than this parameter, the result is considered precise enough and the computation stops.
This implementation works with both directed and undirected edges. An undirected edge acts as two directed arcs.
The graph dynamics is taken into account and the ranks are not computed from
scratch at each modification in the structure of the graph. However, the
ranks become less and less accurate after each modification. To establish the
desired precision, one must either explicitly call compute()
or ask
for a rank of a node by calling getRank(Node)
.
The computed ranks are stored in node attribute. The name of this attribute
can be changed by a call to setRankAttribute(String)
but only before
the call to init(Graph)
. Another way to obtain the ranks is to call
getRank(Node)
. The second method is preferable because it will
update the ranks if needed and will always return values within the desired
precision.
Example
Complexity
Each iteration takes O(m + n) time, where n is the number of nodes and m is the number of edges. The number of iterations needed to converge depends on the desired precision.
Reference
- Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd. The PageRank citation ranking: Bringing order to the Web. 1999