superblockify.population package

Submodules

superblockify.population.approximation module

Population approximation for the superblockify package.

See the reference notebook for a detailed description of the population approximation.

superblockify.population.approximation.add_edge_population(graph, overwrite=False, **tess_kwargs)

Add edge population to edge attributes in the graph.

Calculates the population and area of each edge. The edges are tessellated first, then the population of each cell is determined from GHSL data. The function writes the edge attributes population and area to the graph in place. It also adds cell_id to the edge attributes, which makes it easier to summarize statistics later, and sets the graph attribute edge_population to True. With this information, population densities can be calculated for arbitrary subsets of edges.

Parameters:
graph : networkx.MultiDiGraph

The graph to tessellate.

overwrite : bool, optional

If True, overwrite existing population and area attributes. The check relies only on the graph attribute edge_population, not on the edge attributes themselves.

**tess_kwargs

Keyword arguments for the superblockify.population.tessellation.get_edge_cells() function.

Raises:
ValueError

If the graph already has population and area attributes and overwrite is False.

ValueError

If the graph is not in a projected coordinate system.

ValueError

If the limit and the edge points are disjoint.

Notes

The graph must be in a projected coordinate system.
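
The overwrite rule described above can be sketched without the library. This is a minimal illustration under assumptions: check_overwrite is a hypothetical helper, and a plain dict stands in for the graph-level attribute mapping.

```python
def check_overwrite(graph_attrs, overwrite=False):
    """Raise if population data is flagged as present and overwrite is False.

    Hypothetical helper: only the graph-level flag ``edge_population`` is
    consulted, never the individual edge attributes.
    """
    if graph_attrs.get("edge_population", False) and not overwrite:
        raise ValueError("Graph already has population and area attributes.")
    return True


graph_attrs = {"edge_population": True}  # stand-in for the graph attributes

try:
    check_overwrite(graph_attrs)
    blocked = False
except ValueError:
    blocked = True

allowed = check_overwrite(graph_attrs, overwrite=True)
```

Because only the flag is checked, a graph whose edges carry population values but whose flag is unset would not trigger the error in this sketch.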

superblockify.population.approximation.get_edge_population(graph, batch_size=10000, **tess_kwargs)

Get edge population for the graph.

Calculates the population and area of each edge. The edges are tessellated first, then the population is determined from GHSL data. The population distribution step is parallelized with multiprocessing, in batches of edges.

Parameters:
graph : networkx.MultiDiGraph

The graph to tessellate.

batch_size : int, optional

Number of edges to process in one batch, by default 10000. Must be greater than 0. If it exceeds the number of edges, all edges are processed in one batch.

**tess_kwargs

Keyword arguments for the superblockify.population.tessellation.get_edge_cells() function.

Returns:
geopandas.GeoDataFrame

A GeoDataFrame with the tuple of edge keys as index and the population and area of the edge as columns, as well as the tessellation cells as geometry. The CRS will be in World Mollweide.

Raises:
ValueError

If the batch size is not greater than 0.

ValueError

If the graph is not in a projected coordinate system.

ValueError

If the limit and the edge points are disjoint.

Notes

The graph must be in a projected coordinate system. The output CRS is World Mollweide. An STRtree index is used to speed up the intersection [1].

References

[1]

Leutenegger, Scott T.; Edgington, Jeffrey M.; Lopez, Mario A. (February 1997). “STR: A Simple and Efficient Algorithm for R-Tree Packing”. https://ia600900.us.archive.org/27/items/nasa_techdoc_19970016975/19970016975.pdf
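
The batch_size semantics can be illustrated in isolation. This is a sketch, not the package's code: batches is a hypothetical helper and the multiprocessing step is left out.

```python
def batches(items, batch_size=10000):
    """Yield consecutive slices of at most ``batch_size`` items (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be greater than 0")
    items = list(items)
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Seven made-up edge keys (u, v, key) split into batches of three.
edge_keys = [(u, u + 1, 0) for u in range(7)]
chunks = list(batches(edge_keys, batch_size=3))
batch_lengths = [len(chunk) for chunk in chunks]

# A batch size larger than the number of edges yields a single batch.
single = list(batches(edge_keys, batch_size=100))
```

Each chunk could then be handed to a worker process. How the package distributes the batches is not specified here.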

superblockify.population.approximation.get_population_area(graph)

Calculate the population and area of a graph or subgraph.

Parameters:
graph : networkx.MultiDiGraph

Graph or subgraph. Must have edge attributes population, area and cell_id.

Returns:
population : float

Population of the subgraph.

area : float

Area of the subgraph.

Raises:
ValueError

If the edges of the graph do not have the required population attributes.
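
A minimal sketch of how such a summary might work, using a plain list of attribute dicts in place of a graph. Counting every cell_id only once is an assumption made for this illustration (edges that share a tessellation cell would otherwise be double counted). It is not confirmed to be the package's behaviour.

```python
def population_area(edge_attrs):
    """Sum population and area over edges, counting each cell_id once.

    Hypothetical helper; ``edge_attrs`` is an iterable of attribute dicts.
    """
    seen_cells = set()
    population = 0.0
    area = 0.0
    for attrs in edge_attrs:
        if not all(key in attrs for key in ("population", "area", "cell_id")):
            raise ValueError("Edges need population, area and cell_id attributes.")
        if attrs["cell_id"] in seen_cells:
            continue  # assumption: a shared cell contributes only once
        seen_cells.add(attrs["cell_id"])
        population += attrs["population"]
        area += attrs["area"]
    return population, area


edges = [
    {"population": 120.0, "area": 4000.0, "cell_id": 0},
    {"population": 120.0, "area": 4000.0, "cell_id": 0},  # same cell, other direction
    {"population": 30.0, "area": 1000.0, "cell_id": 1},
]
population, area = population_area(edges)
density = population / area  # population per unit area of the subset
```

The same function applied to any subset of edges gives the density of that subset.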

superblockify.population.approximation.load_ghsl_as_polygons(file, window=None)

Get polygonized GHSL data.

Polygonizes the GHSL population raster data and returns the population in a GeoDataFrame. Areas without population are not included.

Parameters:
file : str

Path to the raster file. It can be a tile or the whole raster.

window : rasterio.windows.Window, optional

Window of the raster to load. If None, the whole raster will be loaded.

Returns:
geopandas.GeoDataFrame

A GeoDataFrame derived from the GHSL population raster data. Includes geometry and population columns.

Notes

When not passing a window, the whole raster will be loaded. Make sure the raster is not too big.
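
To make the idea of polygonizing a raster concrete, the following sketch turns every populated pixel of a small nested-list raster into a square. It is a simplification under assumptions: raster_to_squares, the origin and the pixel size are made up for the example, and the real function works on raster files and returns a GeoDataFrame.

```python
def raster_to_squares(raster, x_min, y_max, pixel_size):
    """Return one square (corner list) per pixel with a positive value.

    Hypothetical helper: row 0 is the top row, so y decreases with the row index.
    """
    cells = []
    for row, values in enumerate(raster):
        for col, population in enumerate(values):
            if population <= 0:
                continue  # pixels without population are skipped
            left = x_min + col * pixel_size
            top = y_max - row * pixel_size
            square = [
                (left, top),
                (left + pixel_size, top),
                (left + pixel_size, top - pixel_size),
                (left, top - pixel_size),
            ]
            cells.append({"geometry": square, "population": population})
    return cells


raster = [
    [0.0, 12.5],
    [3.0, 0.0],
]
cells = raster_to_squares(raster, x_min=0.0, y_max=200.0, pixel_size=100.0)
```

Only two of the four pixels produce a polygon here, which mirrors the statement that unpopulated area is left out.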

superblockify.population.approximation.population_fraction(ghsl_polygon, population, road_cell)

Return the part of a GHSL cell's population that falls into the overlap between road_cell and ghsl_polygon.

Parameters:
ghsl_polygon : shapely.geometry.Polygon

Polygon of GHSL cell.

population : float

Population of GHSL cell.

road_cell : shapely.geometry.Polygon

Polygon of road cell.

Returns:
float

Fractional population count of the overlap between road_cell and ghsl_polygon.
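
One plausible reading of the fractional count is area weighting: the GHSL cell's population is scaled by the share of the cell covered by the road cell. The sketch below assumes exactly that, and uses axis-aligned rectangles (minx, miny, maxx, maxy) instead of shapely polygons so that it stays self-contained. The function names are hypothetical.

```python
def rect_area(rect):
    """Area of an axis-aligned rectangle (minx, miny, maxx, maxy)."""
    return max(0.0, rect[2] - rect[0]) * max(0.0, rect[3] - rect[1])


def overlap_area(rect_a, rect_b):
    """Area of the intersection of two axis-aligned rectangles."""
    width = min(rect_a[2], rect_b[2]) - max(rect_a[0], rect_b[0])
    height = min(rect_a[3], rect_b[3]) - max(rect_a[1], rect_b[1])
    return max(0.0, width) * max(0.0, height)


def population_fraction_rect(ghsl_rect, population, road_rect):
    """Population of the GHSL cell weighted by the covered share of its area.

    Assumption: population is spread uniformly over the GHSL cell.
    """
    return population * overlap_area(ghsl_rect, road_rect) / rect_area(ghsl_rect)


ghsl_rect = (0.0, 0.0, 100.0, 100.0)   # one GHSL cell holding 80 people
road_rect = (50.0, 0.0, 150.0, 100.0)  # road cell covering the right half
share = population_fraction_rect(ghsl_rect, 80.0, road_rect)
```

Summing such shares over all GHSL cells that intersect a road cell would give that road cell's population estimate.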

superblockify.population.ghsl module

GHSL IO functions for the population submodule of the superblockify package.

superblockify.population.ghsl.download_ghsl(urls, save_dir='/home/runner/work/superblockify/superblockify/data/ghsl', timeout=60)

Download the GHSL population raster tile(s).

Check if the raster tiles are already downloaded, and if not, download and unpack them. Create the save directory if it does not exist.

Parameters:
urls : str or list

URL(s) of the raster tile(s).

save_dir : str, optional

Directory to save to and look for the raster tile(s), by default GHSL_DIR.

timeout : int or float, optional

Timeout in seconds for the download, by default config.DOWNLOAD_TIMEOUT.

Returns:
str or list

Path(s) to the downloaded raster tile(s).

Raises:
ValueError

A given URL does not exist.

ValueError

A given URL does not return a zip file.

Notes

The GHSL raster tiles are between 1.4 MB and 99 MB in size. All tiles together add up to 4.9 GB.
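
The check-then-download flow can be sketched with the standard library alone. Everything named here is an assumption for illustration: fetch_tile is a hypothetical helper, the example URL is not a real tile address, and the rule that an archive name.zip unpacks to name.tif is made up.

```python
import os
import tempfile
import urllib.error
import urllib.request
import zipfile


def fetch_tile(url, save_dir, timeout=60):
    """Return the local path of a raster tile, downloading it only if missing."""
    os.makedirs(save_dir, exist_ok=True)  # create the save directory if needed
    archive_name = url.rsplit("/", 1)[-1]
    # Assumption: the archive "<name>.zip" unpacks to a raster "<name>.tif".
    tile_path = os.path.join(save_dir, os.path.splitext(archive_name)[0] + ".tif")
    if os.path.exists(tile_path):
        return tile_path  # already downloaded, nothing to do
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = response.read()
    except urllib.error.URLError as err:
        raise ValueError(f"Could not fetch {url}") from err
    archive_path = os.path.join(save_dir, archive_name)
    with open(archive_path, "wb") as handle:
        handle.write(payload)
    if not zipfile.is_zipfile(archive_path):
        os.remove(archive_path)
        raise ValueError(f"{url} did not return a zip file")
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(save_dir)
    os.remove(archive_path)
    return tile_path


# Cache hit: the tile already exists, so no network request is made.
with tempfile.TemporaryDirectory() as tmp_dir:
    cached = os.path.join(tmp_dir, "tile_R1_C1.tif")
    open(cached, "wb").close()
    found = fetch_tile("https://example.invalid/tile_R1_C1.zip", tmp_dir)
    cache_hit = found == cached
```

Both failure cases are mapped to ValueError to mirror the Raises section above.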

superblockify.population.ghsl.get_ghsl(bbox_moll=None)

Get the GHSL population raster path(s) for the given bounding box.

There are two working modes:
  1. config.FULL_RASTER is set: the path to the whole GHSL raster is returned.

  2. Otherwise, with bbox_moll given, the required raster tile(s) are determined and their paths are returned. Tiles that are not yet in config.GHSL_DIR are downloaded from the JRC FTP server.

Parameters:
bbox_moll : list, optional

Boundary of the place in Mollweide projection, given as [minx, miny, maxx, maxy]. Required if the full raster is not available at config.FULL_RASTER.

Returns:
str or list

Path(s) to the GHSL raster tile(s).

Raises:
ValueError

If bbox_moll is not given and config.FULL_RASTER is not set.

ValueError

If config.FULL_RASTER is invalid.

ValueError

If the bounding box has invalid coordinates.
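
The decision between the two working modes, together with the three error cases, might look roughly like the sketch below. choose_mode is a hypothetical helper. What counts as an invalid raster path or invalid coordinates is an assumption (a missing file, and a box with non-positive width or height).

```python
import os
import tempfile


def choose_mode(bbox_moll=None, full_raster=None):
    """Return which working mode applies, validating the inputs first."""
    if full_raster is not None:
        if not os.path.isfile(full_raster):  # assumption: invalid means missing
            raise ValueError("The full raster path is invalid.")
        return "full_raster"
    if bbox_moll is None:
        raise ValueError("bbox_moll is required when no full raster is set.")
    minx, miny, maxx, maxy = bbox_moll
    if minx >= maxx or miny >= maxy:  # assumption: degenerate boxes are invalid
        raise ValueError("The bounding box has invalid coordinates.")
    return "tiles"


# Mode 2: no full raster configured, bounding box given.
tile_mode = choose_mode(bbox_moll=[0.0, 5_000_000.0, 20_000.0, 5_030_000.0])

# Mode 1: a full raster is configured (a temporary file stands in for it).
with tempfile.NamedTemporaryFile(suffix=".tif") as raster_file:
    full_mode = choose_mode(full_raster=raster_file.name)
```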

superblockify.population.ghsl.get_ghsl_urls(bbox_moll)

Get the URLs of the GHSL population raster tiles that contain the boundary.

Parameters:
bbox_moll : list

Boundary of the place in Mollweide projection, given as [minx, miny, maxx, maxy].

Returns:
list

URLs of the GHSL population raster tiles that contain the boundary.

Raises:
ValueError

If the bounding box spans more than two tiles in each dimension.

ValueError

If the bounding box spans an empty tile.

Notes

Bounding boxes spanning areas larger than two tiles in each dimension are not supported. Use the whole raster instead.

superblockify.population.ghsl.resample_load_window(file, resample_factor=1, window=None, res_strategy=None)

Load and resample a window of a raster file.

Parameters:
file : str

Path to the raster file. It can be a tile or the whole raster.

resample_factor : float, optional

Factor to resample the window by. Values > 1 increase the resolution of the raster, values < 1 decrease the resolution of the raster by that factor in each dimension.

window : rasterio.windows.Window, geopandas.GeoDataFrame, optional

Window of the raster to resample, by default None. When given a GeoDataFrame, the window is the bounding box of the GeoDataFrame.

res_strategy : rasterio.enums.Resampling, optional

Resampling strategy, by default Resampling.nearest if resample_factor > 1 (up-sampling) and Resampling.average if resample_factor < 1 (down-sampling).

Returns:
raster_rescaled : numpy.ndarray

Resampled raster.

res_affine : rasterio.Affine

Affine transformation of the resampled raster.
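
The two default strategies can be illustrated on a nested-list raster. This sketch is not the rasterio-based implementation: the helper names are hypothetical, only integer factors and their reciprocals are handled, and the values are copied or averaged without any rescaling, so pixel sums are not preserved.

```python
def upsample_nearest(raster, factor):
    """Repeat every pixel ``factor`` times along both axes (nearest neighbour)."""
    out = []
    for row in raster:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def downsample_average(raster, block):
    """Average non-overlapping ``block`` x ``block`` groups of pixels."""
    rows, cols = len(raster), len(raster[0])
    out = []
    for r in range(0, rows - rows % block, block):
        out_row = []
        for c in range(0, cols - cols % block, block):
            values = [raster[r + i][c + j] for i in range(block) for j in range(block)]
            out_row.append(sum(values) / len(values))
        out.append(out_row)
    return out


def resample(raster, resample_factor):
    """Pick the strategy from the factor: > 1 up-samples, < 1 down-samples."""
    if resample_factor > 1:
        return upsample_nearest(raster, int(resample_factor))
    if resample_factor < 1:
        return downsample_average(raster, round(1 / resample_factor))
    return [list(row) for row in raster]  # factor 1: unchanged copy


raster = [
    [1.0, 3.0],
    [5.0, 7.0],
]
finer = resample(raster, 2)      # 4 x 4 raster, each value repeated
coarser = resample(raster, 0.5)  # 1 x 1 raster holding the mean
```

With a factor of 2 each pixel covers a quarter of its former area, so the affine transformation of the result would need a pixel size half as large in each dimension.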

superblockify.population.ghsl.row_col(y_moll, x_moll)

Resolves the row and column of the GHS-POP raster tile that contains the given point.

Parameters:
y_moll : float

y-coordinate of the point in Mollweide projection.

x_moll : float

x-coordinate of the point in Mollweide projection.

Returns:
col, row : int, int

Column and row of the tile.

Notes

The GHS-POP raster tiles are each 1000 km x 1000 km in the Mollweide projection. Latitude has its origin at the equator, but longitude has an offset of -41 km. This function was reverse-engineered from the GHS-POP raster tile names found on the JRC FTP server (see dataset overview).

superblockify.population.tessellation module

Graph Tessellation for the population submodule of the superblockify package.

superblockify.population.tessellation.add_edge_cells(graph, **tess_kwargs)

Add edge tessellation cells to edge attributes in the graph.

Tessellates the plane into edge cells using a Voronoi cell approach. The function writes the edge attribute cells to the graph in place. It also adds cell_id to the edge attributes, which makes it easier to summarize statistics later.

The approach is inspired by the momepy.Tessellation class and tessellates with scipy.spatial.Voronoi.

Parameters:
graph : networkx.MultiDiGraph

The graph to tessellate.

**tess_kwargs

Keyword arguments for the superblockify.population.tessellation.get_edge_cells() function.

Raises:
ValueError

If the graph is not in a projected coordinate system.

ValueError

If the limit and the edge points are disjoint.

Notes

The graph must be in a projected coordinate system.

superblockify.population.tessellation.edges_to_points(edges, segment=25)

Convert edges to points.

Parameters:
edges : geopandas.GeoDataFrame

The edges to convert to points.

segment : float

The maximum distance for the point interpolation. Default is 25.

Returns:
points : geopandas.GeoDataFrame

The points.

indices : list of int

The indices of the points in the edges.

Notes

The points are interpolated along the edges with a maximum distance of segment.
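
The interpolation can be sketched on plain coordinate lists. This is an illustration under assumptions: the helper names are hypothetical, points are spaced evenly so that no gap exceeds segment, and both end points are kept. The library may place points differently.

```python
import math


def point_at(coords, lengths, distance):
    """Return the point at ``distance`` along the polyline ``coords``."""
    for (x0, y0), (x1, y1), length in zip(coords, coords[1:], lengths):
        if length > 0 and distance <= length:
            t = distance / length
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        distance -= length
    return tuple(coords[-1])  # rounding pushed us past the end


def interpolate_line(coords, segment=25.0):
    """Evenly spaced points along a polyline, at most ``segment`` apart."""
    lengths = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
    total = sum(lengths)
    n_parts = max(1, math.ceil(total / segment))
    return [point_at(coords, lengths, k * total / n_parts) for k in range(n_parts + 1)]


# Two made-up edges: one 60 units long, one 10 units long.
edges = [
    [(0.0, 0.0), (60.0, 0.0)],
    [(60.0, 0.0), (60.0, 10.0)],
]
points, indices = [], []
for edge_index, coords in enumerate(edges):
    edge_points = interpolate_line(coords, segment=25.0)
    points.extend(edge_points)
    indices.extend([edge_index] * len(edge_points))  # remember the source edge
```

The indices list is what later allows Voronoi regions of individual points to be grouped back into one cell per edge.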

superblockify.population.tessellation.get_edge_cells(graph, limit=None, segment=25, show_plot=False)

Get edge tessellation cells for the graph.

Tessellates the plane into edge cells using a Voronoi cell approach.

The approach is inspired by the momepy.Tessellation class and tessellates with scipy.spatial.Voronoi.

Parameters:
graph : networkx.MultiDiGraph

The graph to tessellate.

limit : shapely.geometry.Polygon or None

The limit of the tessellation. Must be in the same CRS as the graph. If None, it is calculated as the exterior of the unary union of the graph’s edges, buffered by 100 m.

segment : float

The maximum distance for the point interpolation. Default is 25.

show_plot : bool

If True, a plot of the tessellation will be shown. Default is False.

Returns:
geopandas.GeoDataFrame

A GeoDataFrame with the tuple of edge keys as index and the tessellation cells as geometry.

Raises:
ValueError

If the graph is not in a projected coordinate system.

ValueError

If the limit and the edge points are disjoint.

Notes

The graph must be in a projected coordinate system.
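
The idea behind edge cells (every location belongs to the edge whose interpolated points are closest) can be shown with a rasterized stand-in for the Voronoi diagram. This sketch deliberately avoids scipy.spatial.Voronoi. The helper, the sample points and the square limit are all made up for the example.

```python
import math


def nearest_edge(location, points, indices):
    """Return the edge index of the sample point closest to ``location``."""
    best = min(range(len(points)), key=lambda i: math.dist(location, points[i]))
    return indices[best]


# Sample points of two parallel edges, one at y = 0 and one at y = 20.
points = [
    (0.0, 0.0), (10.0, 0.0), (20.0, 0.0),
    (0.0, 20.0), (10.0, 20.0), (20.0, 20.0),
]
indices = [0, 0, 0, 1, 1, 1]

# Limit: the square from (-10, -10) to (30, 30), scanned in 1 x 1 grid cells.
cell_area = {0: 0.0, 1: 0.0}
for gx in range(-10, 30):
    for gy in range(-10, 30):
        centre = (gx + 0.5, gy + 0.5)
        cell_area[nearest_edge(centre, points, indices)] += 1.0
```

The boundary between the two cells runs along y = 10, halfway between the edges, so both cells cover the same area inside the limit. An exact Voronoi diagram yields the same partition as polygons rather than grid cells.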

superblockify.population.tessellation.get_edge_polygons(graph)

Prepare edge polygons for tessellation.

Returns a GeoDataFrame in which edges with the same start and end nodes are merged if their geometries are equal or one is the reverse of the other.

Parameters:
graph : networkx.MultiDiGraph

The graph to tessellate.

Returns:
edges : geopandas.GeoDataFrame

The edges with their polygons.
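
The merge rule can be sketched with plain tuples. The sketch assumes that opposite directions between the same two nodes count as the same node pair, which is one reading of the description above. merge_parallel_edges is a hypothetical helper.

```python
def merge_parallel_edges(edges):
    """Group edge keys whose node pair and geometry match, ignoring direction.

    ``edges`` maps (u, v, key) to a list of coordinates.
    """
    groups = {}
    for (u, v, key), coords in edges.items():
        coords = tuple(coords)
        # A geometry and its reverse share one canonical form.
        canonical = min(coords, coords[::-1])
        group_key = (frozenset((u, v)), canonical)
        groups.setdefault(group_key, []).append((u, v, key))
    return list(groups.values())


edges = {
    (1, 2, 0): [(0, 0), (5, 0), (10, 0)],
    (2, 1, 0): [(10, 0), (5, 0), (0, 0)],  # same street, opposite direction
    (1, 2, 1): [(0, 0), (5, 5), (10, 0)],  # parallel edge with another shape
}
merged = merge_parallel_edges(edges)
```

The two directions of the straight street end up in one group, while the differently shaped parallel edge keeps its own entry.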

superblockify.population.tessellation.reconstruct_edge_cells(voronoi_diagram, indices, crs)

Reconstruct edge cells from a Voronoi diagram.

Regions with the hull index -1 are discarded.

Parameters:
voronoi_diagram : scipy.spatial.Voronoi

The Voronoi diagram to reconstruct.

indices : list

The indices of the points in the Voronoi diagram.

crs : value

The CRS used for the GeoDataFrame. Must be the same as the CRS of the graph. Accepts anything compatible with pyproj.CRS.from_user_input().

Returns:
cells : geopandas.GeoDataFrame

The Voronoi cells by their indices.
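
The grouping step can be sketched on hand-written data shaped like the regions and point_region attributes of scipy.spatial.Voronoi, where a -1 inside a region marks a vertex outside the diagram. group_regions is a hypothetical helper that keeps the regions as vertex index lists instead of building polygons.

```python
def group_regions(point_region, regions, indices):
    """Collect the bounded Voronoi regions of each edge index."""
    cells = {}
    for point, region_index in enumerate(point_region):
        region = regions[region_index]
        if not region or -1 in region:
            continue  # empty or unbounded region: discarded
        cells.setdefault(indices[point], []).append(region)
    return cells


# Hand-written stand-ins for Voronoi.regions and Voronoi.point_region.
regions = [[], [0, 1, 2, 3], [1, 2, -1], [2, 3, 4, 5], [4, 5, -1, 0]]
point_region = [1, 3, 2, 4]  # region index of each input point
indices = [0, 0, 1, 1]       # edge index of each input point

cells = group_regions(point_region, regions, indices)
```

Edge 0 keeps two bounded regions, which would be merged into one cell. Edge 1 only has unbounded regions and therefore receives no cell in this example.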

Module contents

Population init, subpackage for the GHSL Population data