Betweenness Centrality#

For partitioning approaches and for evaluating node and edge usage, we need to calculate betweenness centrality. Because we must respect the Partition Requirements, we calculate it in a special way, using the shortest paths computed in Restricted Distance Calculation. We show this conceptually using pair-wise dependencies, as introduced by Brandes (2001), with code modified from networkx.algorithms.centrality.betweenness.
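For reference, the pair-wise dependencies can be written down as follows. This is the standard notation from Brandes (2001), not something taken from the superblockify source: \(\sigma_{st}\) is the number of shortest paths from \(s\) to \(t\), \(\sigma_{st}(v)\) the number of those passing through \(v\), and \(P_s(w)\) the set of predecessors of \(w\) on shortest paths from \(s\).

```latex
\begin{align}
\delta_{st}(v) &= \frac{\sigma_{st}(v)}{\sigma_{st}} \\
\delta_{s}(v) &= \sum_{t \neq s, v} \delta_{st}(v) \\
c_B(v) &= \sum_{s \neq v} \delta_{s}(v) \\
\delta_{s}(v) &= \sum_{w \,:\, v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \left(1 + \delta_{s}(w)\right)
\end{align}
```

The last line is the recursion that makes it possible to accumulate dependencies from the far end of the shortest-path tree back towards the source.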

For each row of the predecessor matrix we can span a tree that represents the shortest paths from the source node to all other nodes. Starting from the leaves, we accumulate the dependency of each node onto its parent, going up the tree. This way, we collect the dependencies of all nodes (and edges!) along the way.
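A minimal sketch of this accumulation for a single source, assuming one predecessor per node and strictly positive edge weights. The function name `accumulate_tree` and the toy tree are made up for illustration and are not part of superblockify.

```python
import numpy as np


def accumulate_tree(pred_row, dist_row, s):
    """Accumulate node and edge dependencies for one source ``s``.

    Assumes exactly one predecessor per reachable node (a tree) and
    strictly positive edge weights, so that children are always farther
    from ``s`` than their parent.
    """
    node_dep = np.zeros(len(pred_row))
    edge_dep = {}
    # Visit nodes from farthest to nearest, so leaves are handled first
    for w in np.argsort(dist_row)[::-1]:
        w = int(w)
        if w == s or not np.isfinite(dist_row[w]):
            continue  # skip the source and unreachable nodes
        v = int(pred_row[w])
        # The tree edge (v, w) carries the path to w plus all paths through w
        edge_dep[(v, w)] = 1.0 + node_dep[w]
        node_dep[v] += edge_dep[(v, w)]
    node_dep[s] = 0.0  # the source is an endpoint, not an intermediate node
    return node_dep, edge_dep


# Toy tree rooted at 0:  0 -> 1 -> 2  and  1 -> 3
toy_pred_row = np.array([-9999, 0, 1, 1])
toy_dist_row = np.array([0.0, 1.0, 2.0, 2.0])
toy_node_dep, toy_edge_dep = accumulate_tree(toy_pred_row, toy_dist_row, s=0)
print(toy_node_dep)  # node 1 lies on the paths to 2 and to 3
print(toy_edge_dep)
```

Summing such per-source dependencies over all sources gives the unnormalized betweenness values.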

First, we make targeted modifications to the NetworkX implementation of Brandes' algorithm to see the concept in action. Then we simplify it to our needs. Lastly, we compare the performance.

Modified implementation#

The special part: we replace the single-source shortest-path discovery step with the given predecessor and distance matrices, which prescribe the shortest paths. Finding them again would be redundant and costly; moreover, this lets us prescribe the filtered paths.

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
from networkx.algorithms.centrality.betweenness import \
    _single_source_dijkstra_path_basic, \
    _rescale
from scipy.sparse.csgraph import dijkstra

from superblockify.metrics.distances import shortest_paths_restricted

2026-03-23 16:12:20,324 |     INFO | __init__.py:11 | superblockify version 1.0.2

Let’s use the same graph as in the second example of Restricted Distance Calculation.

# Create planar graph, similar to a street network
G = nx.MultiDiGraph(nx.Graph(
    [
        (10, 11, {"weight": 1}),
        (11, 12, {"weight": 1}),
        (12, 13, {"weight": 1}),
        (13, 0, {"weight": 1.5}),
        (13, 14, {"weight": 1}),
        (14, 0, {"weight": 1}),
        (0, 10, {"weight": 1}),
        (0, 1, {"weight": 1}),
        (10, 1, {"weight": 1}),
        (1, 2, {"weight": 1}),
        (2, 3, {"weight": 1}),
        (3, 4, {"weight": 1.5}),
        (4, 5, {"weight": 1}),
        (5, 9, {"weight": 1}),
        (5, 6, {"weight": 1}),
        (7, 2, {"weight": 1}),
        (8, 7, {"weight": 0.5}),
        (7, 1, {"weight": 1}),
        (8, 9, {"weight": 0.7}),
        (6, 9, {"weight": 1}),
        (8, 4, {"weight": 1}),
        (9, 1, {"weight": 1}),
        (0, 18, {"weight": 0.4}),
        (18, 2, {"weight": 0.4}),
        (6, 15, {"weight": 0.8}),
        (15, 16, {"weight": 1}),
        (16, 17, {"weight": 1}),
        (17, 6, {"weight": 1}),
    ]
))
G.add_node(19)  # isolated node
# Delete directed edges (1, 9), (6, 17), (10, 1)
G.remove_edges_from([(1, 9), (6, 17), (10, 1)])
# Add longer edge 0 -> 13
G.add_edge(0, 13, weight=G[0][13][0]["weight"] * 2)

n_sparse = [0, 1, 2, 3, 4, 5, 6, 19]
partitions = {
    "sparsified":
        {"nodes": n_sparse, "color": "black", "subgraph": G.subgraph(n_sparse)},
    "G_r": {"nodes": [7, 8, 9], "color": "crimson"},
    "G_g": {"nodes": [10, 11, 12, 13, 14], "color": "mediumseagreen"},
    "G_b": {"nodes": [15, 16, 17], "color": "dodgerblue"},
    "G_o": {"nodes": [18], "color": "darkorange"},
}
for name, part in partitions.items():
    if "subgraph" not in part:
        # subgraph for all edges from or to nodes in partition
        part["subgraph"] = G.edge_subgraph(
            # [(u, v) for u, v in G.edges if u in part["nodes"] or v in part["nodes"]]
            [e for e in G.edges if e[0] in part["nodes"] or e[1] in part["nodes"]]
        )
    part["nodelist"] = part["subgraph"].nodes
    for node in part["nodes"]:
        G.nodes[node]["partition"] = part["color"]

nx.draw(G, with_labels=True,
        node_color=[G.nodes[n]["partition"] for n in G.nodes],
        font_color="white",
        pos=nx.kamada_kawai_layout(G),
        ax=plt.figure(figsize=(8, 5)).gca(),
        connectionstyle="arc3,rad=0.1",
        )

../../_images/937d3ccb51f99905422d214b12b25e35766c7fa4d855feda2e1a80cca09483d2.png

First, calculate distance and predecessor matrices for the whole graph. Once plain, once restricted to the sparsified nodes.

node_order = list(range(len(G.nodes)))
G_sparse = nx.to_scipy_sparse_array(G, nodelist=node_order, weight="weight")
G_sparse.indices, G_sparse.indptr = G_sparse.indices.astype(
    np.int32), G_sparse.indptr.astype(np.int32)
dist, pred = dijkstra(G_sparse, directed=True, indices=node_order,
                      return_predecessors=True)
dist_restr, pred_restr = shortest_paths_restricted(G, partitions, weight="weight",
                                                   node_order=node_order)
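To illustrate what such distance and predecessor matrices encode, here is a small sketch on a toy graph. The helper `reconstruct_path` and the `toy_*` names are hypothetical and only serve this example; the SciPy call is the same `dijkstra` function used above.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Toy directed graph: 0 -> 1 -> 2, plus an isolated node 3
toy_weights = np.array([
    [0, 1, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=float)
toy_dist, toy_pred = dijkstra(csr_matrix(toy_weights), directed=True,
                              return_predecessors=True)


def reconstruct_path(pred_matrix, source, target):
    """Walk the predecessor row of ``source`` backwards from ``target``."""
    if source == target:
        return [source]
    if pred_matrix[source, target] < 0:
        return []  # unreachable, SciPy marks this with -9999
    path = [target]
    while path[-1] != source:
        path.append(int(pred_matrix[source, path[-1]]))
    return path[::-1]


print(reconstruct_path(toy_pred, 0, 2))  # path from 0 to 2
print(toy_dist[0, 3], toy_pred[0, 3])    # unreachable: inf and -9999
```

Each row of the predecessor matrix thus stores one shortest-path tree, with exactly one predecessor per reachable node.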

Now we want to calculate node and edge betweenness centrality for the whole graph, using three methods:

a new Dijkstra pass, just as in networkx.edge_betweenness_centrality()
with given distance and predecessor matrices
with restricted distance and predecessor matrices

We need a function that generates the same output as the NetworkX internal function networkx.algorithms.centrality.betweenness._single_source_dijkstra_path_basic(), so that we can swap it out.

from numpy import argsort

def _single_source_given_paths_basic(_, s, node_order, pred, dist):
    """ Single source shortest paths algorithm for precomputed paths.

    Parameters
    ----------
    _ : np.array
        Graph. For compatibility with other functions.
    s : int
        Source node id.
    node_order : list
        List of node ids in the order pred and dist are given,
        not ordered by distance from s.
    pred : np.array
        Predecessor matrix for source node s.
    dist : np.array
        Distance matrix for source node s.

    Returns
    -------
    S : list
        List of nodes in order of non-decreasing distance from s.
    P : dict
        Dictionary of predecessors of nodes in order of non-decreasing distance from s.
    sigma : dict
        Dictionary of number of shortest paths to nodes.
    D : dict
        Dictionary of distances to nodes.

    Notes
    -----
    Modified from :mod:`networkx.algorithms.centrality.betweenness`.

    Does not include endpoints.
    """
    # Order node_order, pred_row, and dist_row by distance from s
    dist_order = argsort(dist[s])
    # Remove unreachable indices (-9999),
    # check from back which is the first reachable node
    try:
        while pred[s][dist_order[-1]] == -9999:
            dist_order = dist_order[:-1]
    except IndexError:
        # If all nodes are unreachable, return empty lists
        return [], {}, {}, {}
    # Get node ids from order indices
    S = [node_order[i] for i in dist_order]
    P = {node_order[i]: [pred[s][i]] for i in dist_order}
    P[s] = []  # not -9999
    # Because the given paths are unique, the number of shortest paths is 1.0
    sigma = dict.fromkeys(S, 1.0)
    D = {node_order[i]: dist[s][i] for i in dist_order}
    return S, P, sigma, D

To calculate not only node betweenness but also edge betweenness, as well as length-scaled and linearly scaled betweenness, we need to modify the accumulation. The following function returns the different kinds of betweenness in a dict. For the node \(t = 17\), we plot the predecessor graphs for the three methods.

t = 17

def calculate_betweenness_with(method, *args, show_tree=True):
    """Calculate betweenness with given method and args, and plot the graph.
    Show tree graph of predecessors for node ``t``."""
    betweenness = dict.fromkeys(G, 0.0)
    betweenness_len = betweenness.copy()  # Length scaled betweenness
    betweenness_lin = betweenness.copy()  # Linear scaled betweenness
    betweenness_edge = betweenness.copy()
    betweenness_edge.update(dict.fromkeys(G.edges(), 0.0))
    betweenness_edge_len = betweenness_edge.copy()
    betweenness_edge_lin = betweenness_edge.copy()
    # b[v]=0 for v in G and b[e]=0 for e in G.edges
    # Loop over nodes to collect betweenness using pair-wise dependencies
    for s in G:
        S, P, sigma, D = method(G, s, *args)
        # betweenness, _ = _accumulate_basic(betweenness, S.copy(), P, sigma, s)
        # betweenness_edge = _accumulate_edges(betweenness_edge, S.copy(), P, sigma, s)
        delta = dict.fromkeys(S, 0)
        delta_len = delta.copy()
        while S:
            w = S.pop()
            coeff = (1 + delta[w]) / sigma[w]
            coeff_len = (1 / D[w] + delta[w]) / sigma[w] if D[w] != 0 else 0
            for v in P[w]:
                c = sigma[v] * coeff
                c_len = sigma[v] * coeff_len
                if (v, w) not in betweenness_edge:
                    betweenness_edge[(w, v)] += c
                    betweenness_edge_len[(w, v)] += c_len
                    betweenness_edge_lin[(w, v)] += D[w] * c_len
                else:
                    betweenness_edge[(v, w)] += c
                    betweenness_edge_len[(v, w)] += c_len
                    betweenness_edge_lin[(v, w)] += D[w] * c_len
                delta[v] += c
                delta_len[v] += sigma[v] * coeff_len
            if w != s:
                betweenness[w] += delta[w]
                betweenness_len[w] += delta_len[w]
                betweenness_lin[w] += D[w] * delta_len[w]
        if s == t and show_tree:
            plot_graph_from_predecessors(P, s, method.__name__)
    # Normalize betweenness values
    betweenness = _rescale(betweenness, len(G), normalized=True, directed=True)
    betweenness_len = _rescale(betweenness_len, len(G), normalized=True, directed=True)
    betweenness_lin = _rescale(betweenness_lin, len(G), normalized=True, directed=True)
    for n in G:  # Remove nodes
        del betweenness_edge[n]
        del betweenness_edge_len[n]
        del betweenness_edge_lin[n]
    betweenness_edge = _rescale(betweenness_edge, len(G), normalized=True,
                                directed=True, endpoints=True)
    betweenness_edge_len = _rescale(betweenness_edge_len, len(G), normalized=True,
                                    directed=True, endpoints=True)
    betweenness_edge_lin = _rescale(betweenness_edge_lin, len(G), normalized=True,
                                    directed=True, endpoints=True)
    return {
        "Node": betweenness,
        "Edge": betweenness_edge,
        "Node_len": betweenness_len,
        "Edge_len": betweenness_edge_len,
        "Node_lin": betweenness_lin,
        "Edge_lin": betweenness_edge_lin
    }


cb = calculate_betweenness_with(_single_source_dijkstra_path_basic, "weight")
cb_paths = calculate_betweenness_with(_single_source_given_paths_basic,
                                      node_order, pred, dist)
cb_restr = calculate_betweenness_with(_single_source_given_paths_basic,
                                      node_order, pred_restr, dist_restr)

../../_images/e413e105327220eedb7967bfcaf556820efd7c95d1598ace1f7dfa561097b4f0.png

../../_images/cdbef95830d69b83aa7a68bb63da05994f75a6f3b2da990fc212da890bf5d51e.png

../../_images/5b84362cb97c92ee004356bfdba78b6396605eede3117eb80f7925298ab59382.png

The most obvious difference between the predecessor graphs is that, with the restricted matrices, the paths to node 17 go through the sparsified (black) nodes. Less obvious is the difference between the first two trees: the first approach, with a fresh Dijkstra pass, reveals that there are two shortest paths between 17 and 18, as 18 has two predecessors. The second approach (unrestricted, using the predecessor matrix) shows only one path, because a predecessor matrix can store only one path per node pair.
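The single-path limitation can be reproduced on a tiny example. The following sketch uses a made-up diamond graph (all `diamond*` names are illustrative): NetworkX enumerates both equally short paths, while the SciPy predecessor matrix keeps just one predecessor for the target.

```python
import networkx as nx
from scipy.sparse.csgraph import dijkstra

# Diamond: two equally long paths 0 -> 1 -> 3 and 0 -> 2 -> 3
diamond = nx.DiGraph()
diamond.add_weighted_edges_from(
    [(0, 1, 1.0), (0, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0)]
)

# NetworkX can enumerate every shortest path between a node pair
diamond_paths = list(nx.all_shortest_paths(diamond, 0, 3, weight="weight"))

# A predecessor matrix has a single entry per (source, target) pair
diamond_adj = nx.to_numpy_array(diamond, nodelist=[0, 1, 2, 3], weight="weight")
_, diamond_pred = dijkstra(diamond_adj, directed=True, return_predecessors=True)

print(len(diamond_paths))   # number of shortest paths found by NetworkX
print(diamond_pred[0, 3])   # the one predecessor SciPy stored for node 3
```

Which of the two predecessors ends up in the matrix depends on the traversal order inside SciPy, so it should not be relied upon.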

The comparison of the node betweenness values shows the same. The first two approaches yield mostly the same results, but nodes 2 and 0, the predecessors of 18, differ. Apart from node 14 (0.000 versus 0.050), the values in the table differ by at most 0.013.

display(pd.DataFrame({
    ("", "C_B"): cb["Node"], ("", "C_B_paths"): cb_paths["Node"],
    ("", "C_B_restr"): cb_restr["Node"],
    ("Len scales", "C_B"): cb["Node_len"],
    ("Len scales", "C_B_paths"): cb_paths["Node_len"],
    ("Len scales", "C_B_restr"): cb_restr["Node_len"],
    ("Lin scales", "C_B"): cb["Node_lin"],
    ("Lin scales", "C_B_paths"): cb_paths["Node_lin"],
    ("Lin scales", "C_B_restr"): cb_restr["Node_lin"]
}
).sort_values(by=[("", "C_B")], ascending=False)
        .style.background_gradient(cmap="Blues", axis=None)
        .format(precision=3)
        .map_index(lambda x: f"color: {G.nodes[x]['partition']}" if x in G
                   .nodes else ""). \
        set_table_attributes('style="font-size: 12px"'),
        )

				Len scales			Lin scales
	C_B	C_B_paths	C_B_restr	C_B	C_B_paths	C_B_restr	C_B	C_B_paths	C_B_restr
9	0.289	0.289	0.053	0.233	0.233	0.040	0.538	0.538	0.052
0	0.287	0.274	0.255	0.209	0.229	0.212	0.327	0.383	0.420
2	0.262	0.268	0.324	0.221	0.222	0.272	0.382	0.387	0.850
6	0.239	0.239	0.239	0.201	0.201	0.200	0.445	0.445	0.551
18	0.229	0.229	0.000	0.217	0.217	0.000	0.352	0.352	0.000
7	0.224	0.216	0.071	0.205	0.197	0.055	0.437	0.414	0.092
8	0.217	0.213	0.068	0.176	0.180	0.037	0.424	0.423	0.046
1	0.153	0.153	0.355	0.115	0.115	0.291	0.258	0.258	0.823
15	0.126	0.126	0.126	0.093	0.093	0.092	0.248	0.248	0.301
10	0.103	0.100	0.105	0.075	0.073	0.077	0.141	0.139	0.200
13	0.055	0.058	0.053	0.043	0.044	0.043	0.047	0.050	0.046
16	0.045	0.045	0.045	0.009	0.009	0.008	0.036	0.036	0.037
5	0.042	0.042	0.232	0.032	0.032	0.202	0.058	0.058	0.675
11	0.032	0.032	0.039	0.008	0.008	0.011	0.023	0.023	0.031
4	0.030	0.034	0.271	0.017	0.020	0.237	0.029	0.032	0.812
3	0.017	0.021	0.284	0.009	0.005	0.254	0.015	0.013	0.882
12	0.008	0.011	0.013	0.004	0.006	0.007	0.005	0.007	0.009
14	0.000	0.050	0.042	0.000	0.021	0.014	0.000	0.047	0.035
17	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
19	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000

When sorting by the linearly scaled edge betweenness for restricted paths, we see mostly edges between sparsified nodes at the top.

					Len scales			Lin scales
		C_B_e	C_B_e_paths	C_B_e_restr	C_B_e	C_B_e_paths	C_B_e_restr	C_B_e	C_B_e_paths	C_B_e_restr
3	4	0.029	0.042	0.163	0.020	0.026	0.144	0.037	0.051	0.632
3	2	0.036	0.026	0.168	0.035	0.026	0.156	0.043	0.026	0.623
4	5	0.026	0.026	0.145	0.025	0.025	0.124	0.042	0.042	0.623
2	1	0.007	0.008	0.150	0.005	0.006	0.135	0.007	0.008	0.602
5	6	0.032	0.032	0.126	0.028	0.028	0.103	0.059	0.059	0.572
4	3	0.024	0.018	0.161	0.012	0.006	0.148	0.027	0.018	0.530
2	3	0.041	0.050	0.171	0.025	0.035	0.154	0.058	0.085	0.509
6	15	0.129	0.129	0.129	0.100	0.100	0.099	0.393	0.393	0.498
0	1	0.017	0.016	0.189	0.009	0.008	0.182	0.017	0.016	0.421
1	2	0.033	0.042	0.158	0.022	0.031	0.146	0.055	0.083	0.421
5	4	0.026	0.026	0.145	0.020	0.020	0.138	0.046	0.046	0.345
15	16	0.089	0.089	0.089	0.057	0.057	0.056	0.264	0.264	0.316
1	10	0.079	0.079	0.095	0.063	0.063	0.074	0.182	0.182	0.306
1	0	0.064	0.055	0.095	0.054	0.044	0.074	0.161	0.132	0.306
6	5	0.032	0.032	0.126	0.027	0.027	0.122	0.051	0.051	0.231
0	14	0.041	0.089	0.082	0.016	0.066	0.055	0.041	0.184	0.221
10	11	0.070	0.068	0.076	0.043	0.043	0.049	0.148	0.146	0.213
13	0	0.076	0.076	0.076	0.074	0.074	0.074	0.146	0.146	0.146
1	7	0.024	0.024	0.055	0.024	0.024	0.045	0.024	0.024	0.129
10	0	0.080	0.079	0.076	0.079	0.078	0.075	0.116	0.113	0.111
7	8	0.203	0.189	0.068	0.190	0.178	0.057	0.499	0.461	0.109
15	6	0.084	0.084	0.084	0.084	0.084	0.084	0.108	0.108	0.108
8	9	0.179	0.179	0.061	0.160	0.160	0.042	0.493	0.493	0.059
14	13	0.007	0.058	0.050	0.007	0.029	0.021	0.007	0.076	0.056
12	13	0.046	0.047	0.050	0.045	0.046	0.047	0.047	0.050	0.053
9	8	0.034	0.034	0.034	0.028	0.028	0.028	0.051	0.051	0.051
6	9	0.126	0.126	0.032	0.122	0.122	0.027	0.231	0.231	0.051
11	12	0.036	0.037	0.045	0.013	0.015	0.018	0.036	0.037	0.047
16	17	0.047	0.047	0.047	0.012	0.012	0.011	0.047	0.047	0.047
9	6	0.126	0.126	0.032	0.106	0.106	0.029	0.402	0.402	0.047
16	15	0.045	0.045	0.045	0.045	0.045	0.045	0.045	0.045	0.045
17	6	0.045	0.045	0.045	0.045	0.045	0.045	0.045	0.045	0.045
7	1	0.032	0.032	0.037	0.029	0.029	0.036	0.047	0.047	0.045
11	10	0.043	0.042	0.042	0.042	0.041	0.041	0.043	0.042	0.042
14	0	0.041	0.039	0.039	0.041	0.039	0.039	0.041	0.039	0.039
8	7	0.063	0.068	0.045	0.056	0.061	0.038	0.059	0.068	0.033
2	18	0.078	0.087	0.032	0.071	0.074	0.018	0.100	0.109	0.032
9	1	0.145	0.145	0.026	0.137	0.137	0.026	0.326	0.326	0.026
2	7	0.184	0.171	0.018	0.172	0.159	0.016	0.447	0.414	0.021
13	12	0.020	0.021	0.016	0.011	0.012	0.011	0.021	0.024	0.018
4	8	0.028	0.037	0.013	0.027	0.035	0.012	0.030	0.041	0.017
0	10	0.028	0.026	0.016	0.021	0.020	0.012	0.037	0.034	0.017
0	18	0.199	0.189	0.016	0.191	0.187	0.014	0.326	0.317	0.016
7	2	0.037	0.042	0.013	0.035	0.040	0.012	0.051	0.063	0.014
18	2	0.204	0.205	0.032	0.201	0.203	0.036	0.366	0.367	0.014
12	11	0.009	0.011	0.011	0.007	0.007	0.007	0.009	0.011	0.011
8	4	0.022	0.013	0.011	0.011	0.010	0.009	0.022	0.013	0.011
5	9	0.032	0.032	0.008	0.032	0.032	0.008	0.032	0.032	0.008
13	14	0.007	0.008	0.008	0.004	0.005	0.005	0.007	0.008	0.008
18	0	0.072	0.071	0.016	0.071	0.070	0.020	0.095	0.094	0.008
9	5	0.032	0.032	0.008	0.011	0.011	0.005	0.032	0.032	0.008
17	16	0.003	0.003	0.003	0.003	0.003	0.003	0.003	0.003	0.003
0	13	0.050	0.000	0.000	0.023	0.000	0.000	0.063	0.000	0.000

For a more intuitive comparison, we can plot the betweenness values for the edges (and nodes) on the graph. The size of the nodes is proportional to the node betweenness and the color of the edges is proportional to the edge betweenness.

../../_images/c62ad9c1c19108e7c768f575703ff6a021b6f5f36d037f8bda0c67581c579b4f.png

../../_images/dd56982a2d9e0611261cae39e778f602ffd85f0b2f622a77a7275d3ad08157e7.png

../../_images/1765c41912eaf393dc715be9473d1da91c1aad1106340037f36bc045b8d49755.png

Simplify algorithm#

For our case of given paths, the algorithm can be simplified, as we always have exactly one path between two nodes. This means that P does not need to be a dictionary of lists, as each list would only ever have one entry. All outputs can be lists. We do not need sigma, as it would always be 1, so we can omit it from the calculation entirely. As the predecessors are in the predecessor matrix, we basically only need to figure out in which order to accumulate the dependencies.
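Written out, and assuming a single predecessor \(\mathrm{pred}_s(w)\) per node so that every \(\sigma\) equals 1, the general dependency recursion of Brandes (2001) reduces to the following. The notation is chosen here for illustration.

```latex
\begin{align}
\delta_{s}(v)
  &= \sum_{w \,:\, v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \left(1 + \delta_{s}(w)\right) \\
  &= \sum_{w \,:\, \mathrm{pred}_s(w) = v} \left(1 + \delta_{s}(w)\right)
\end{align}
```

Each node therefore just adds one plus its own dependency to its single parent, provided that it is processed after all of its children.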

def _single_source_given_paths_simplified(dist_row):
    """Sort nodes, predecessors and distances by distance.

    Parameters
    ----------
    dist_row : np.array
        Distance row of the source node; it is sorted inside the function.

    Returns
    -------
    S : list
        List of node indices in order of distance.

    Notes
    -----
    Does not include endpoints.
    """
    dist_order = argsort(dist_row)
    try:
        # Remove unreachable indices (inf), check from back which is the first
        # reachable node
        while dist_row[dist_order[-1]] == np.inf:
            dist_order = dist_order[:-1]
        # Remove immediately reachable nodes with distance 0, including s itself
        while dist_row[dist_order[0]] == 0:
            dist_order = dist_order[1:]
    except IndexError:
        # If all nodes are unreachable, return empty list
        return []
    return list(dist_order)
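As an aside, the same trimming can be sketched with a boolean mask instead of the two while loops. This is an alternative formulation for illustration, not the function used by superblockify; `reachable_order` is a hypothetical name, and non-negative distances are assumed.

```python
import numpy as np


def reachable_order(dist_row):
    """Indices sorted by distance, without unreachable and zero-distance entries.

    Assumes non-negative distances, with ``np.inf`` marking unreachable nodes.
    """
    dist_row = np.asarray(dist_row, dtype=float)
    order = np.argsort(dist_row)
    sorted_dist = dist_row[order]
    # Keep only finite, strictly positive distances
    keep = np.isfinite(sorted_dist) & (sorted_dist > 0)
    return order[keep].tolist()


example_row = np.array([2.0, 0.0, np.inf, 1.0])
print(reachable_order(example_row))  # [3, 0]
```

Popping from the end of the returned list then yields the farthest reachable node first, which is the order the accumulation needs.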

The iteration over the nodes then becomes:

def simplified_betweenness(node_order, edge_list, dist, pred):
    """Simplified betweenness centrality calculation."""
    node_indices = list(range(len(node_order)))
    betweenness = dict.fromkeys(node_indices, 0.0)
    betweenness_len = betweenness.copy()  # Length scaled betweenness
    betweenness_lin = betweenness.copy()  # Linear scaled betweenness
    betweenness_edge = betweenness.copy()
    betweenness_edge.update(dict.fromkeys(
        [(node_order.index(u), node_order.index(v)) for u, v in edge_list],
        0.0))
    betweenness_edge_len = betweenness_edge.copy()
    betweenness_edge_lin = betweenness_edge.copy()
    # b[v]=0 for v in G and b[e]=0 for e in G.edges
    # Loop over nodes to collect betweenness using pair-wise dependencies
    for s in node_indices:
        S = _single_source_given_paths_simplified(dist[s])
        # betweenness, _ = _accumulate_basic(betweenness, S.copy(), P, sigma, s)
        # betweenness_edge = _accumulate_edges(betweenness_edge, S.copy(), P, sigma, s)
        delta = dict.fromkeys(node_indices, 0)
        delta_len = delta.copy()
        # S is a list of node indices; loop while it is not empty
        while S:
            w = S.pop()
            # No for loop over multiple predecessors, only one path per node pair
            v = pred[s, w]  # P[w]
            d = dist[s, w]  # D[w]
            # Calculate dependency contribution
            coeff = 1 + delta[w]
            coeff_len = (1 / d + delta[w])
            # Add edge betweenness contribution
            if (v, w) not in betweenness_edge:
                betweenness_edge[(w, v)] += coeff
                betweenness_edge_len[(w, v)] += coeff_len
                betweenness_edge_lin[(w, v)] += d * coeff_len
            else:
                betweenness_edge[(v, w)] += coeff
                betweenness_edge_len[(v, w)] += coeff_len
                betweenness_edge_lin[(v, w)] += d * coeff_len
            # Add to dependency for further nodes/loops
            delta[v] += coeff
            delta_len[v] += coeff_len
            # Add node betweenness contribution
            if w != s:
                betweenness[w] += delta[w]
                betweenness_len[w] += delta_len[w]
                betweenness_lin[w] += d * delta_len[w]
    # Normalize betweenness values and rename node index keys to ids.
    # New dicts are built, as reassigning the loop variable would not rename keys.
    scale = 1 / ((len(node_order) - 1) * (len(node_order) - 2))
    betweenness, betweenness_len, betweenness_lin = (
        {node_order[v]: value * scale for v, value in bc_dict.items()}
        for bc_dict in (betweenness, betweenness_len, betweenness_lin)
    )  # u_idx -> u_id
    for n in node_indices:  # Remove nodes
        del betweenness_edge[n]
        del betweenness_edge_len[n]
        del betweenness_edge_lin[n]
    scale = 1 / (len(node_order) * (len(node_order) - 1))
    betweenness_edge, betweenness_edge_len, betweenness_edge_lin = (
        {(node_order[u], node_order[v]): value * scale
         for (u, v), value in bc_e_dict.items()}
        for bc_e_dict in (betweenness_edge, betweenness_edge_len,
                          betweenness_edge_lin)
    )  # (u_idx, v_idx) -> (u_id, v_id)

    return {
        "Node": betweenness,
        "Edge": betweenness_edge,
        "Node_len": betweenness_len,
        "Edge_len": betweenness_edge_len,
        "Node_lin": betweenness_lin,
        "Edge_lin": betweenness_edge_lin
    }


cb_paths_2 = simplified_betweenness(node_order, G.edges(keys=False), dist, pred)
cb_restr_2 = simplified_betweenness(node_order, G.edges(keys=False), dist_restr,
                                    pred_restr)

Node: False
keys: True
0.09999999999999999 0.1111111111111111
0.031578947368421054 0.03508771929824561
0.010526315789473684 0.011695906432748537
0.05789473684210526 0.06432748538011696
0.2736842105263158 0.30409356725146197
0.049999999999999996 0.05555555555555555
0.15263157894736842 0.1695906432748538
0.26842105263157895 0.2982456140350877
0.021052631578947368 0.023391812865497075
0.034210526315789476 0.038011695906432746
0.042105263157894736 0.04678362573099415
0.2894736842105263 0.3216374269005848
0.2394736842105263 0.26608187134502925
0.21578947368421053 0.23976608187134502
0.2131578947368421 0.23684210526315788
0.22894736842105262 0.2543859649122807
0.12631578947368421 0.14035087719298245
0.04473684210526316 0.049707602339181284
Edge: True
keys: True
Node_len: False
keys: True
0.07313767820943767 0.0812640868993752
0.008219771107034904 0.00913307900781656
0.0061403508771929825 0.00682261208576998
0.044295461952093856 0.049217179946770946
0.22946370816687553 0.2549596757409728
0.021116233284387647 0.023462481427097386
0.11539523149980213 0.12821692388866904
0.22181564165124942 0.24646182405694375
0.005480001421123074 0.006088890467914524
0.019813615507248455 0.02201512834138717
0.03220881150244031 0.03578756833604479
0.2325841273344125 0.2584268081493472
0.2014165967171499 0.223796218574611
0.19667861852237278 0.218531798358192
0.18006572332036724 0.20007302591151915
0.21724997606004473 0.24138886228893858
0.09276404798359016 0.10307116442621128
0.008966952492744458 0.009963280547493842
Edge_len: True
keys: True
Node_lin: False
keys: True
0.13896758494845704 0.15440842772050783
0.023359176261386148 0.025954640290429053
0.007017543859649122 0.007797270955165692
0.049915064363695624 0.05546118262632846
0.38272005205036447 0.4252445022781828
0.0473048193471913 0.05256091038576811
0.25828897902651365 0.28698775447390407
0.3871473764597995 0.43016375162199955
0.01283262944726276 0.014258477163625289
0.03185852410754838 0.0353983601194982
0.05752803060282284 0.06392003400313649
0.5378626268058401 0.5976251408953779
0.44474809271817645 0.49416454746464056
0.41385367319495386 0.4598374146610599
0.42254482067639965 0.4694942451959995
0.35204737799703467 0.39116375333003855
0.24752834967722848 0.27503149964136503
0.0357698896125187 0.039744321791687444
Edge_lin: True
keys: True

Performance comparison#

We compare the performance of the simplified and the original betweenness centrality calculation.

%timeit calculate_betweenness_with(_single_source_dijkstra_path_basic, "weight", show_tree=False)
%timeit calculate_betweenness_with(_single_source_given_paths_basic, node_order, pred, dist, show_tree=False)
%timeit simplified_betweenness(node_order, G.edges(keys=False), dist, pred)

2.34 ms ± 4.54 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

1.22 ms ± 11.8 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

1.09 ms ± 10.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

At first glance, the simplified version is only slightly faster than the version with given paths (1.09 ms versus 1.22 ms); this is because the example graph is too small. For larger graphs, the implemented function betweenness_centrality() is considerably faster.

In the Implementation we also use arrays for the betweenness and dependency values \(\delta\), which is more efficient than dictionaries. For acceleration, we also use the numba package to compile the functions to machine code. numba provides the njit() decorator, which can be used to compile functions and makes parallelization easy.
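To give an impression of the array-based style, here is a plain NumPy sketch of the node accumulation over a whole matrix. It is not the superblockify implementation: the name `node_betweenness_arrays` is made up, only unnormalized node betweenness is computed, and strictly positive edge weights are assumed. Loops of this shape, working on arrays only, are the kind of code that numba's njit() is designed to compile.

```python
import numpy as np


def node_betweenness_arrays(dist, pred):
    """Unnormalized node betweenness from single-path dist/pred matrices.

    Assumes strictly positive edge weights, ``np.inf`` for unreachable
    pairs and one predecessor per reachable node pair.
    """
    n = dist.shape[0]
    betweenness = np.zeros(n)
    for s in range(n):
        delta = np.zeros(n)  # dependencies of s on every node
        # Farthest nodes first, so delta[w] is complete when w is visited
        for w in np.argsort(dist[s])[::-1]:
            if w == s or not np.isfinite(dist[s, w]):
                continue
            delta[pred[s, w]] += 1.0 + delta[w]
            betweenness[w] += delta[w]
    return betweenness


# Path graph 0 <-> 1 <-> 2 with unit weights, matrices written out by hand
path_dist = np.array([[0.0, 1.0, 2.0],
                      [1.0, 0.0, 1.0],
                      [2.0, 1.0, 0.0]])
path_pred = np.array([[-9999, 0, 1],
                      [1, -9999, 1],
                      [1, 2, -9999]])
path_bc = node_betweenness_arrays(path_dist, path_pred)
print(path_bc)  # only node 1 lies between other nodes
```

Replacing the dictionaries by preallocated arrays avoids hashing in the inner loop and keeps all values in contiguous memory.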

Real world example#

To compare further, we use a real city: not especially large, but large enough for us to see a difference.

2026-03-23 16:12:48,306 |     INFO | tessellation.py:101 | Calculating edge cells for graph with 1256 edges.

2026-03-23 16:12:50,375 |     INFO | tessellation.py:155 | Tessellated 785 edge cells in 0:00:02.

2026-03-23 16:12:50,690 |     INFO | ghsl.py:129 | Using the GHSL raster tiles for the bounding box (10529206.185883993, 3665258.196608499, 10534037.079623077, 3670687.995752596).

2026-03-23 16:12:57,865 |  WARNING | features.py:148 | CPLE_AppDefined in DeprecationWarning: 'Memory' driver is deprecated since GDAL 3.11. Use 'MEM' onwards. Further messages of this type will be suppressed.

2026-03-23 16:12:58,904 |     INFO | utils.py:265 | Highway counts (type, count, proportion): 
                             count  proportion
highway                                       
residential                    711    0.572003
tertiary                       184    0.148029
primary                        132    0.106195
secondary                      122    0.098150
unclassified                    45    0.036203
living_street                   18    0.014481
primary_link                    10    0.008045
secondary_link                   9    0.007241
tertiary_link                    8    0.006436
[tertiary, unclassified]         2    0.001609
[unclassified, residential]      2    0.001609

2026-03-23 16:12:58,908 |     INFO | utils.py:299 | Graph stats: 
                                             0
Number of nodes                            539
Number of edges                           1256
Average degree                        4.660482
Circuity average                      1.099485
Street orientation order              0.306443
Date created               2026-03-23 16:12:47
Projection                          EPSG:32650
Area by OSM boundary (m²)      18634010.566506

2026-03-23 16:12:58,909 |     INFO | base.py:185 | Initialized MissionTown_main(ResidentialPartitioner) with 532 nodes and 1243 edges.

2026-03-23 16:12:59,220 |     INFO | checks.py:86 | The partitioning MissionTown_main is valid.

from superblockify.metrics.measures import betweenness_centrality
from superblockify.metrics.distances import calculate_path_distance_matrix
%timeit betweenness_centrality(part.graph, node_list, *calculate_path_distance_matrix(part.graph, weight = "travel_time", node_order = node_list), weight="travel_time")
%timeit nx.betweenness_centrality(part.graph, weight="travel_time")

86.2 ms ± 914 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

1.54 s ± 3.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Now the speedup is obvious. With the timings above (86.2 ms versus 1.54 s), the simplified version is roughly 18 times faster than the original NetworkX implementation, already for a street graph that can be considered small.

The runtime scales with the number of nodes and the number of edges. Because in real-world street networks the number of edges grows only about linearly with the number of nodes, rather than quadratically, the runtime remains bearable for simplified graphs of metropolitan cities.

Implementation#

superblockify.metrics.measures.betweenness_centrality(graph, node_order, dist_matrix, predecessors, weight='length', attr_suffix=None, k=None, seed=None, max_range=None)

Calculate several types of betweenness centrality for the nodes and edges.

Uses the predecessors to calculate the betweenness centrality of the nodes and edges. The normalized, the length-scaled, and the linearly scaled betweenness centrality are calculated for the nodes and edges. When passing a k, the summation is only done over k random nodes. [1] [2] [3]

Parameters:

graph (nx.MultiDiGraph): The graph to calculate the betweenness centrality for; distances and predecessors must be calculated for this graph.
node_order (list): Indicates the order of the nodes in the distance matrix.
dist_matrix (np.ndarray): The distance matrix for the network measures, as returned by superblockify.metrics.distances.calculate_path_distance_matrix().
predecessors (np.ndarray): Predecessors matrix of the graph, as returned by superblockify.metrics.distances.calculate_path_distance_matrix().
weight (str, optional): The edge attribute to use as weight to decide which multi-edge to attribute the betweenness centrality to, by default "length". If None, the first edge of the multi-edge is used.
attr_suffix (str, optional): The suffix to append to the attribute names, by default None.
k (int, optional): The number of nodes to calculate the betweenness centrality for, by default None.
seed (int, random_state, or None (default)): Indicator of random number generation state. See Randomness for additional details.
max_range (float, optional): The maximum path length to consider, by default None, which means no maximum path length. It is measured in the unit of the weight attribute.

Raises:

ValueError: If weight is not None, and the graph does not have the weight attribute on all edges.

Notes

Works in-place on the graph.

Does not include endpoints.

Modified from networkx.algorithms.centrality.betweenness.

The weight attribute is not used to determine the shortest paths, these are taken from the predecessor matrix. It is only used for parallel edges to decide which edge to attribute the betweenness centrality to.

If there are \(\leq\) 2 nodes, node betweenness is 0 for all nodes. If there are \(\leq\) 1 edges, edge betweenness is 0 for all edges.

References

[1]

Linton C. Freeman: A Set of Measures of Centrality Based on Betweenness. Sociometry, Vol. 40, No. 1 (Mar., 1977), pp. 35-41 https://doi.org/10.2307/3033543

[2]

Brandes, U. (2001). A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(2), 163–177. https://doi.org/10.1080/0022250X.2001.9990249

[3]

Brandes, U. (2008). On variants of shortest-path betweenness centrality and their generic computation. Social Networks, 30(2), 136–145. https://doi.org/10.1016/j.socnet.2007.11.001

superblockify.metrics.measures.__accumulate_bc(s_idx, pred_row, dist_row, edges_uv, edge_padding, max_range)

Calculate the betweenness centrality for a single source node.

Parameters:

s_idx (int): Index of the source node.
pred_row (np.ndarray): Predecessors row of the graph.
dist_row (np.ndarray): Distance row of the graph.
edges_uv (np.ndarray, 1D): Array of concatenated edge indices, sorted in ascending order.
edge_padding (int): Number of digits to pad the edge indices with, max_len of the nodes.
max_range (float): Maximum range to calculate the betweenness centrality for.

Returns:

node_bc (np.ndarray): Array of node and edge betweenness centralities.
