Detect Community Structures in Networks
Formations JEDI – April 21 2016
#Community structure We will try to detect community structure in networks.
Intuitively, communities are groups of nodes in a network, where:
#Community structure Lots of complex networks exhibit community structure.
Let’s present a method to detect structures and handle network dynamics.
#Agenda ###In this tutorial we will:
This is not not an academic, but more a way to show you how to combine the various building blocks of GraphStream to experiment on dynamic networks.
#Determining community structure
Most often we use two kinds of criteria:
We are focused here in the first one.
#Determining communities
Once we have such a measure, several techniques can be used to find the communities:
#Modularity One of the most used measure is the modularity \(Q\).
Intuition: \(Q\) measures the fraction of intra-communities edges minus the same fraction if the network had edges at random (with the same communities divisions). M. E. J. Newman (2006)
Modularity gives results in \(\left[-\frac{1}{2} .. 1\right]\).
#Modularity Suppose a given network with modules:
How to determine its modularity ?
#Modularity
We could compare the proportion of internal links \(I_c\) in each community \(c\) to the number of edges \(m\). Links in green \(O_c\) go out of the community \(c\).
\(Q = \sum_c\frac{I_c}{m}~~~~~~~\)
This would not be sufficient, since putting all nodes in the same community would produce a perfectly modular network !
#Modularity
Instead we compare the ratio \(\frac{I_c}{m}\) with the expected value in the same network but with all its links randomly rewired, that is:
\[\frac{(2 I_c + O_c)^2}{(2m)^2}~~~~~~~~\]
\(Q = \sum_c\frac{I_c}{m} - \sum_c\frac{(2 I_c + O_c)^2}{(2m)^2}\)
#Network dynamics?
If the network under analysis evolves, it becomes impossible to recompute in real-time the whole modules each time a change occurs in the graph.
#Graph layouts
A novel approach to determine modules uses graph layouts.
Most layout algorithms are force based:
#The Lin-Log layout
Most force based algorithms try to the minimize energy.
Lin-Log is based on a \((a,r)\)-energy model.
#The Lin-Log layout and network dynamics
After a change in the network the algorithm computes the layout from its previous equilibrium state.
Chances are that reusing previous state costs less than a complete re-computation (c.f. re-optimization).
The Lin-Log layout was proposed by Andreas Noack (2007).
#Practical session
We will see how to:
#How GraphStream handles display
GraphStream puts the display of the graph in a separate thread or process or host.
Usually the display will evolve in parallel of the main application running on the graph.
#How GraphStream handles graph layouts
By default the viewer creates another thread to handle the layout. The default Layout algorithm is a derivative of the Frutcherman-Reingold one.
Open the modularity
package (src/org/graphstream/demo/modularity
), then the LinLogLayout.java
file.
public class LinLogLayout {
// ...
private Graph graph;
private Viewer viewer;
public void findCommunities(String fileName)
throws IOException, GraphParseException {
graph = new SingleGraph("communities");
viewer = graph.display(true);
graph.read(fileName);
}
}
It creates a graph, displays, launches an automatic layout on it (the display(true)
argument) and then reads it.
#Before proceeding
Before proceeding, to avoid compilation problems copy and paste the following imports to your program, just under the already present imports:
import org.graphstream.algorithm.ConnectedComponents;
import org.graphstream.algorithm.measure.Modularity;
import org.graphstream.graph.Edge;
import org.graphstream.stream.ProxyPipe;
import org.graphstream.ui.graphicGraph.GraphPosLengthUtils;
import org.graphstream.ui.graphicGraph.stylesheet.StyleConstants.Units;
import org.graphstream.ui.layout.springbox.implementations.LinLog;
import org.graphstream.ui.spriteManager.Sprite;
import org.graphstream.ui.spriteManager.SpriteManager;
#Step 2
##Step 2
private LinLog layout; // 2
private double a = 0; // 3
private double r = -1.3; // 3
private double force = 3; // 3
public void findCommunities(String fileName) throws ... {
graph = new SingleGraph("communities");
viewer = graph.display(false); // 1
layout = new LinLog(false); // 2
layout.configure(a, r, true, force); // 3
layout.addSink(graph); // 4
graph.addSink(layout); // 5
graph.read(fileName);
while(true) { // 6
layout.compute(); // 6
} // 6
}
Let’s run it and observe the difference with the former layout.
#Step 3
Problem: The viewer runs in its own thread any interaction with it is not reflected on the graph. Try to grab a node with the mouse and move it… See?
Let’s fix this:
##Step 3
private ProxyPipe fromViewer; // 1
public void findCommunities(String fileName) throws ... {
graph = new SingleGraph("communities");
viewer = graph.display(false);
fromViewer = viewer.newThreadProxyOnGraphicGraph(); // 1
layout = new LinLog(false);
layout.configure(a, r, true, force);
layout.addSink(graph);
graph.addSink(layout);
fromViewer.addSink(graph); // 2
graph.read(fileName);
while(! graph.hasAttribute("ui.viewClosed")) { // 4
fromViewer.pump(); // 3
layout.compute();
}
}
Now try to grab a node in the display.
##Finding community structure with a Lin-Log layout Lin-Log creates long edges outside communities and short edges between nodes in the same communities.
Let’s use this property to obtain a quick (and hopefully good enough) approximation of the communities.
#Step 4
#Step 4
private double cutThreshold = 1; // 4
public void findCommunities(String fileName) throws ... {
// ...
graph.addAttribute("ui.stylesheet", styleSheet); // 1
graph.addAttribute("ui.antialias"); // 2
graph.read(fileName);
while(! graph.hasAttribute("ui.viewClosed")) {
fromViewer.pump();
layout.compute();
showCommunities(); // 3
}
}
public void showCommunities() { // 3
// ...
}
protected static String styleSheet = // 1
"node { size: 7px; fill-color: rgb(150,150,150); }" +
"edge { fill-color: rgb(255,50,50); size: 2px; }" +
"edge.cut { fill-color: rgba(200,200,200,128); }";
#Step 4 a From the stylesheet:
ui.class
set to "cut"
are grey.First part of the showCommunities()
method :
public void showCommunities() {
int nEdges = graph.getEdgeCount();
double length, averageLength = 0;
for(Edge edge : graph.getEachEdge()) { // 1
length = GraphPosLengthUtils.edgeLength(edge);
edge.setAttribute("length", length); // 2
averageLength += length; // 3
}
averageLength /= nEdges; // 3
// ...
}
Second part of the showCommunities()
method: select which edge is inter-community.
"cut"
attribute.public void showCommunities() {
// ...
for(Edge edge : graph.getEachEdge()) { // 1
length = edge.getNumber("length");
if(length > averageLength * cutThreshold) { // 2
edge.addAttribute("ui.class", "cut"); // 3.1
edge.addAttribute("cut");
} else {
edge.removeAttribute("ui.class"); // 3.2
edge.removeAttribute("cut");
}
}
}
#The Zachary Karate Club The graph we use as a demo comes from a well known social study in a Karate Club.
Nodes represent members and edges their friendship ties.
At a given time one of the member left the club to create its own club. Some members stayed in the old one, while others quit to join the new one.
This graph usually admits at least two communities (the two clubs), although smaller sub-communities can be observed.
#Computing the number of communities
GraphStream contains a algorithm that compute and update the number of connected components of a graph.
This algorithm can ignore edges with a specific attribute (say "cut"
).
Sprites in the viewer help visual the number of communities.
#Step 5 a
Compute the number of connected components
#Step 5 a
// ...
private ConnectedComponents cc; // 1
public void findCommunities(String fileName) throws ... {
graph = new SingleGraph("communities");
viewer = graph.display(false);
fromViewer = viewer.newThreadProxyOnGraphicGraph();
layout = new LinLog(false);
cc = new ConnectedComponents(graph); // 2
layout.configure(a, r, true, force);
cc.setCutAttribute("cut"); // 3
// ...
}
// ...
Display the number of connected components using a sprite:
#Step 5 b
// ...
private SpriteManager sm; // 1
private Sprite ccCount; // 1
public void findCommunities(String fileName) throws ... {
// ...
cc = new ConnectedComponents(graph);
sm = new SpriteManager(graph); // 1
ccCount = sm.addSprite("CC"); // 1
// ...
cc.setCutAttribute("cut");
ccCount.setPosition(Units.PX, 20, 20, 0); // 2
// ...
while(! graph.hasAttribute("ui.viewClosed")) {
//...
showCommunities();
ccCount.setAttribute("ui.label", // 3
String.format("Modules %d", cc.getConnectedComponentsCount()));
}
}
// ...
protected static String styleSheet = // 4
"node { size: 7px; fill-color: rgb(150,150,150); }" +
"edge { fill-color: rgb(255,50,50); size: 2px; }" +
"edge.cut { fill-color: rgba(200,200,200,128); }" +
"sprite#CC { size: 0px; text-color: rgb(150,100,100); text-size: 20; }";
#Computing the modularity Now that communities are identified, we can measure the quality of the partition.
GraphStream has a modularity algorithm that follows each update on the graph.
"module"
attribute on nodes."module"
attribute.#Step 6 a
// ...
private Modularity modularity; // 1
public void findCommunities(String fileName) throws ... {
cc = new ConnectedComponents(graph);
sm = new SpriteManager(graph);
ccCount = sm.addSprite("CC");
modularity = new Modularity("module"); // 2
modularity.init(graph); // 3
layout.configure(a, r, true, force);
cc.setCutAttribute("cut");
ccCount.setPosition(Units.PX, 20, 20, 0);
cc.setCountAttribute("module"); // 4
// ...
}
#Step 6 b
Display the modularity value using a sprite:
#Step 6 b
// ...
private Sprite ccCount, modValue; // 1
public void findCommunities(String fileName) throws ... {
// ...
modularity = new Modularity("module");
modValue = sm.addSprite("M"); // 1
// ...
cc.setCutAttribute("cut");
ccCount.setPosition(Units.PX, 20, 20, 0);
cc.setCountAttribute("module");
modValue.setPosition(Units.PX, 20, 40, 0); // 2
// ...
while(! graph.hasAttribute("ui.viewClosed")) {
//...
ccCount.setAttribute("ui.label",
String.format("Modules %d", cc.getConnectedComponentsCount()));
modValue.setAttribute("ui.label", // 3
String.format("Modularity %f", modularity.getMeasure()));
}
}
protected static String styleSheet =
// ... // 4
"sprite#CC { size: 0px; text-color: rgb(150,100,100); text-size: 20; }" +
"sprite#M { size: 0px; text-color: rgb(100,150,100); text-size: 20; }";
#That’s it!
Some other graphs are provided in the data
directory of the eclipse project.
You can experiment on them, and play with the cutThreshold
parameter as well as the \((a, r)\) and force
parameters. Here is a set of good values for these graphs:
Graph | a | r | force | cutThreshold |
---|---|---|---|---|
karate.gml | 0 | -1.3 | 3 | 1 |
dolphins.gml | 0 | -1.2 | 8 | 0.8 |
polbooks.gml | 0 | -1.9 | 5 | 0.8 |
Let’s try the method on a really dynamic graph.
Consider a Boids Simulation.
The LinLogLayoutAndBoids.java
file in the modularity
package contains a working example of LinLog analysis of a Boids simulation.