Network Analyses for Official Statistics

Mark van der Loo | uRos2025 tutorial

Getting Ready

Download the zipped RStudio project

https://www.markvanderloo.eu/files/share/uros2025networks.zip

Or go to markvanderloo.eu/publications, find the tutorial under conference contributions, and click on the materials link.

Unzip and open the RStudio project file

Reading, investigating, and plotting networks

Try the following code

See exercise_1.R

library(igraph)
d <- read.csv("data/moreno.csv")
moreno <- graph_from_data_frame(d, directed=FALSE)

Try the following functions

Each time, check the output and interpret it together with your neighbour.

summary(moreno)
diameter(moreno)
V(moreno)
E(moreno)
# play with the 'breaks' parameter.
hist(degree(moreno))
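If the data file is not at hand, the same functions can be tried on a small graph built in memory. The sketch below is only an illustration: the edge list `d_toy` and the graph `g_toy` are hypothetical and not part of the tutorial materials.

```r
library(igraph)

# Hypothetical toy edge list; not part of the tutorial data.
d_toy <- data.frame(
  from = c("a", "b", "c", "b"),
  to   = c("b", "c", "d", "d")
)
g_toy <- graph_from_data_frame(d_toy, directed = FALSE)

vcount(g_toy)    # number of nodes
ecount(g_toy)    # number of links
diameter(g_toy)  # length of the longest shortest path, counted in links
degree(g_toy)    # number of links per node, named by node
```

Because the graph is small enough to draw by hand, the output can be checked against a sketch on paper before moving on to the larger network.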

Plotting

layout <- layout_with_fr(moreno)
plot(moreno, layout=layout, vertex.size=4, vertex.label=NA)

See ?igraph.plotting for all parameters.

Exercise

Try different layout functions. See ?layout_ for options.

Challenge

Create a ‘nice’ plot of the airport connections.

Clustering

What is a cluster?

Intuition

Nodes within a cluster are more connected to each other than to others.

Formalization

A clustering is a partition of nodes that maximizes modularity.

Modularity

Partition the nodes into \(m\) subsets. The modularity of this partition is given by

\[ Q = \sum_{i=1}^m (C_i - C^0_i), \]

where \(C_i\) is the fraction of links that fall within subset \(i\), and \(C^0_i\) is a correction: the value of \(C_i\) that would be expected if the links were assigned at random.

Properties

  • Larger is better (\(Q\in[-\tfrac{1}{2},1]\))
  • \(Q\approx0\) means: no meaningful communities.
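A small worked case may help to see these properties. The sketch below assumes the usual definition, where the random baseline for a subset is the squared fraction of link ends attached to it; this is what `igraph::modularity()` computes, but whether it matches the slide's \(C^0_i\) exactly is an assumption. The graph `g_toy` is hypothetical.

```r
library(igraph)

# Two triangles (1-2-3 and 4-5-6) joined by the single link 3-4.
g_toy <- make_graph(c(1,2, 2,3, 1,3, 4,5, 5,6, 4,6, 3,4), directed = FALSE)

# Partition that follows the two triangles.
q_split <- modularity(g_toy, membership = c(1, 1, 1, 2, 2, 2))

# Trivial partition: all nodes in one subset.
q_one <- modularity(g_toy, membership = rep(1, 6))

# By hand, for the split: each triangle holds 3 of the 7 links and
# 7 of the 14 link ends, so Q = 2 * (3/7 - (7/14)^2) = 5/14.
c(split = q_split, one_subset = q_one, by_hand = 5/14)
```

The trivial partition has modularity zero, which is why a value near zero signals that a partition is no better than chance.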

Community detection

Problem

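The number of ways to partition the nodes of a graph is the Bell number \(B_{|V|}\), which grows faster than exponentially in \(|V|\). Trying all partitions is infeasible for all but the smallest graphs.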

Solutions

Many heuristic algorithms, including

  • fast and greedy
  • Louvain
  • Leiden
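The three heuristics listed above are available in igraph. The sketch below runs them on the built-in Zachary karate club graph rather than on the tutorial data. Two points are assumptions worth checking against your igraph version: `cluster_leiden()` does not optimise modularity by default, so the objective is set explicitly, and the randomised algorithms can return different partitions on different runs.

```r
library(igraph)

set.seed(1)  # Louvain and Leiden involve random choices
karate <- make_graph("Zachary")

cl_fg <- cluster_fast_greedy(karate)
cl_lv <- cluster_louvain(karate)
cl_ld <- cluster_leiden(karate, objective_function = "modularity")

# Modularity of each partition; higher means a better split by this criterion.
q <- c(
  fast_greedy = modularity(karate, membership(cl_fg)),
  louvain     = modularity(karate, membership(cl_lv)),
  leiden      = modularity(karate, membership(cl_ld))
)
print(q)
```

Comparing the modularity values gives a first impression of how much the choice of heuristic matters for a given network.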

Your turn

See exercise_2.R

cl <- cluster_louvain(moreno)
plot(cl, moreno)

Exercises

  • Experiment with different clustering algorithms
  • Investigate the cl object. What are its elements?

Node importance

Closeness

The closeness of a node is the reciprocal of the sum over the distances to all other nodes.

\[ \textrm{closeness}(v) = \frac{1}{\sum_{w\in V(g)}\textrm{distance}(v,w)} \]

igraph::closeness(g)

Note. For directed networks, there is a version for incoming and outgoing distances.
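As a quick check of the definition, the sketch below computes closeness on a hypothetical three-node path and compares it with the reciprocal of the summed distances. It assumes a connected graph, so that all distances are finite.

```r
library(igraph)

# Path graph 1 - 2 - 3.
g_path <- make_graph(c(1,2, 2,3), directed = FALSE)

cl_igraph <- closeness(g_path)

# By hand: reciprocal of the row sums of the distance matrix.
cl_manual <- 1 / rowSums(distances(g_path))

# The middle node is closest to the others: 1/2 versus 1/3 for the end nodes.
rbind(igraph = cl_igraph, manual = cl_manual)
```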

Betweenness

The betweenness of a node measures how often the node lies on a shortest path between two other nodes.

\[ \textrm{betweenness}(v) = \sum_{s\not=v\not=t\in V(g)} \frac{\textrm{\# shortest paths from }s\textrm{ to }t\textrm{ via }v } {\textrm{\# shortest paths from }s\textrm{ to }t} \]

igraph::betweenness(g)
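A hypothetical four-node path makes the counting concrete. Note that for undirected graphs igraph counts each pair of end points once, not once per direction.

```r
library(igraph)

# Path graph 1 - 2 - 3 - 4.
g_path <- make_graph(c(1,2, 2,3, 3,4), directed = FALSE)

bt <- betweenness(g_path)

# Node 2 lies on the shortest paths for the pairs (1,3) and (1,4);
# node 3 lies on those for (1,4) and (2,4). End nodes lie on none.
bt  # 0 2 2 0
```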

Eigenvector centrality

The centrality of a node is proportional to the sum of the centralities of its neighbours. This leads to the following equation.

\[ \mathbf{A}\mathbf{c}=\lambda\mathbf{c}. \]

Here \(A_{ij}=1\) when \(i\) and \(j\) are connected (0 otherwise), \(\mathbf{c}\) is the vector of centralities, and \(\lambda\) is the largest eigenvalue of \(\mathbf{A}\).

igraph::eigen_centrality(g)$vector

Note. There are variants for directed graphs
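The link with the eigenvalue equation can be checked directly on a small graph. The sketch below uses a hypothetical graph (a triangle with one extra node attached) and assumes that `eigen_centrality()` scales its result so that the largest score equals one, which is its default behaviour.

```r
library(igraph)

# Triangle 1-2-3 with node 4 attached to node 3.
g_toy <- make_graph(c(1,2, 2,3, 1,3, 3,4), directed = FALSE)

# By hand: eigenvector for the largest eigenvalue of the adjacency matrix.
A <- as_adjacency_matrix(g_toy, sparse = FALSE)
e <- eigen(A, symmetric = TRUE)      # eigenvalues come in decreasing order
c_manual <- abs(e$vectors[, 1])      # the sign of an eigenvector is arbitrary
c_manual <- c_manual / max(c_manual) # scale so the largest score is 1

c_igraph <- eigen_centrality(g_toy)$vector

rbind(igraph = c_igraph, manual = c_manual)
```

Node 3 has the most neighbours and the best-connected ones, so it receives the highest score in both computations.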

Pagerank

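A node is important when important nodes point to it.

\[ \textrm{pagerank}(v) = (1-d) + d\sum_{w\to v}\frac{\textrm{pagerank}(w)}{\textrm{outdegree}(w)} \]

Here \(d\in[0,1]\) is the damping factor and \(\textrm{outdegree}(w)\) is the number of links leaving \(w\). Originally \(d=0.85\).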

igraph::page_rank(g)

Note. Originally developed for directed graphs (like the internet).
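A minimal directed example, with a hypothetical edge list `d_toy`, shows the output. One difference from the formula to keep in mind: igraph returns scores that sum to one, which corresponds to a constant term of \((1-d)/N\) for \(N\) nodes rather than \((1-d)\).

```r
library(igraph)

# Hypothetical directed links: A -> C, B -> C, C -> A.
d_toy <- data.frame(from = c("A", "B", "C"), to = c("C", "C", "A"))
g_dir <- graph_from_data_frame(d_toy, directed = TRUE)

pr <- page_rank(g_dir, damping = 0.85)$vector

pr                    # one score per node, named by node
sum(pr)               # scores are normalised to sum to one
names(which.max(pr))  # C: it receives links from both A and B
```

Node B receives no links at all, so its score is just the constant term \((1-d)/N = 0.05\).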

Your turn

See exercise_3.R

Create the following plot