superblockify.population package#
Submodules#
superblockify.population.approximation module#
Population approximation for the superblockify package.
See reference notebook for a detailed description of the population approximation.
- superblockify.population.approximation.add_edge_population(graph, overwrite=False, **tess_kwargs)[source]#
Add edge population to edge attributes in the graph.
Calculates the population and area of the edges. First tessellates the edges and then determines the population with GHSL data. Function writes to edge attributes population and area of the graph in-place. Furthermore, cell_id is added to the edge attributes, for easier summary of statistics later. The graph attribute edge_population is set to True. With this information, population densities can be calculated for arbitrary subsets of edges.
- Parameters:
- graphnetworkx.MultiDiGraph
The graph to tessellate.
- overwritebool, optional
If True, overwrite existing population and area attributes. Whether attributes count as existing is decided only by the graph attribute edge_population, not by inspecting the edge attributes themselves.
- **tess_kwargs
Keyword arguments for the superblockify.population.tessellation.get_edge_cells() function.
- Raises:
- ValueError
If the graph already has population and area attributes and overwrite is False.
- ValueError
If the graph is not in a projected coordinate system.
- ValueError
If the limit and the edge points are disjoint.
Notes
The graph must be in a projected coordinate system.
- superblockify.population.approximation.get_edge_population(graph, batch_size=10000, **tess_kwargs)[source]#
Get edge population for the graph.
Calculates the population and area of the edge. First tessellates the edges and then determines the population with GHSL data. The population distribution process is parallelized with multiprocessing in batches of edges.
- Parameters:
- graphnetworkx.MultiDiGraph
The graph to tessellate.
- batch_sizeint, optional
Number of edges to process in one batch. By default, 10000. It must be greater than 0. If it is greater than the number of edges, all edges are processed in one batch.
- **tess_kwargs
Keyword arguments for the superblockify.population.tessellation.get_edge_cells() function.
- Returns:
- geopandas.GeoDataFrame
A GeoDataFrame with the tuple of edge keys as index and the population and area of the edge as columns, as well as the tessellation cells as geometry. The CRS will be in World Mollweide.
- Raises:
- ValueError
If the batch size is not greater than 0.
- ValueError
If the graph is not in a projected coordinate system.
- ValueError
If the limit and the edge points are disjoint.
Notes
The graph must be in a projected coordinate system. Output CRS is World Mollweide. It uses the STRtree index to speed up the intersection. [1]
References
[1] Leutenegger, Scott T.; Edgington, Jeffrey M.; Lopez, Mario A. (February 1997). “STR: A Simple and Efficient Algorithm for R-Tree Packing”. https://ia600900.us.archive.org/27/items/nasa_techdoc_19970016975/19970016975.pdf
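The batch_size rules described for get_edge_population() can be illustrated with a small standalone sketch. The helper name iter_batches and the edge keys below are hypothetical and not part of superblockify; the sketch only shows how a positive batch size splits a sequence and how an oversized batch size yields a single batch.

```python
from itertools import islice


def iter_batches(items, batch_size=10000):
    """Yield consecutive lists of at most ``batch_size`` items (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be greater than 0")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical edge keys (u, v, key); a real graph would supply these.
edges = [(u, u + 1, 0) for u in range(25)]

# 25 edges in batches of 10 give batches of 10, 10 and 5 edges.
batch_lengths = [len(batch) for batch in iter_batches(edges, batch_size=10)]

# A batch size larger than the number of edges gives one batch with all edges.
single_batch = list(iter_batches(edges, batch_size=100))
```

Each batch could then be handed to a worker process; how superblockify distributes the batches is not shown here.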
- superblockify.population.approximation.get_population_area(graph)[source]#
Calculate the population of a graph or subgraph.
Calculates the population and area of the graph.
- Parameters:
- graphnetworkx.MultiDiGraph
Graph or subgraph. Must have edge attributes population, area and cell_id.
- Returns:
- populationfloat
Population of the subgraph.
- areafloat
Area of the subgraph.
- Raises:
- ValueError
If the graph does not have the population attributes.
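One plausible reason for the cell_id requirement is that several edges can share a tessellation cell, for example the two directions of a street. The sketch below is an assumption about how such a summary could work, not the package's implementation: it uses plain attribute dictionaries instead of a networkx graph and counts each cell_id only once. The name population_area is hypothetical.

```python
def population_area(edges):
    """Sum population and area over unique tessellation cells (hypothetical helper).

    ``edges`` is an iterable of attribute dicts with the keys
    ``population``, ``area`` and ``cell_id``.
    """
    required = {"population", "area", "cell_id"}
    cells = {}
    for attrs in edges:
        missing = required - attrs.keys()
        if missing:
            raise ValueError(f"Edge is missing attributes: {sorted(missing)}")
        # Assumption: edges with the same cell_id describe the same cell,
        # so the cell is stored once and later duplicates overwrite it.
        cells[attrs["cell_id"]] = (attrs["population"], attrs["area"])
    population = sum(pop for pop, _ in cells.values())
    area = sum(cell_area for _, cell_area in cells.values())
    return population, area


edges = [
    {"population": 12.0, "area": 400.0, "cell_id": 0},  # u -> v
    {"population": 12.0, "area": 400.0, "cell_id": 0},  # v -> u, same cell
    {"population": 3.5, "area": 250.0, "cell_id": 1},
]
population, area = population_area(edges)
density = population / area  # population density of this subset of edges
```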
- superblockify.population.approximation.load_ghsl_as_polygons(file, window=None)[source]#
Get polygonized GHSL data.
Polygonizes the GHSL population raster data and returns the population in a GeoDataFrame. Area with no population is not included.
- Parameters:
- filestr
Path to the raster file. It can be a tile or the whole raster.
- windowrasterio.windows.Window, optional
Window of the raster to resample. If None, the whole raster will be loaded.
- Returns:
- geopandas.GeoDataFrame
A GeoDataFrame derived from the GHSL population raster data. Includes geometry and population columns.
Notes
When not passing a window, the whole raster will be loaded. Make sure the raster is not too big.
- superblockify.population.approximation.population_fraction(ghsl_polygon, population, road_cell)[source]#
Function returns fractional population count between road_cell and ghsl_polygon.
- Parameters:
- ghsl_polygonshapely.geometry.Polygon
Polygon of GHSL cell.
- populationfloat
Population of GHSL cell.
- road_cellshapely.geometry.Polygon
Polygon of road cell.
- Returns:
- float
Fractional population count between road_cell and ghsl_polygon.
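A minimal sketch of one way to compute such a fractional count, assuming the population is spread uniformly over the GHSL cell so that the share is proportional to the overlapping area. To stay dependency-free, it uses axis-aligned rectangles given as (minx, miny, maxx, maxy) instead of shapely polygons; the function names are hypothetical.

```python
def rect_area(rect):
    """Area of a rectangle (minx, miny, maxx, maxy); empty rectangles give 0."""
    minx, miny, maxx, maxy = rect
    return max(0.0, maxx - minx) * max(0.0, maxy - miny)


def rect_intersection(a, b):
    """Intersection of two rectangles; may be empty (min > max)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def population_fraction_rect(ghsl_rect, population, road_rect):
    """Area-weighted share of ``population`` that falls inside ``road_rect``.

    Assumption: population is uniformly distributed within the GHSL cell.
    """
    overlap = rect_area(rect_intersection(ghsl_rect, road_rect))
    return population * overlap / rect_area(ghsl_rect)


ghsl_cell = (0.0, 0.0, 100.0, 100.0)   # 100 m x 100 m cell with 80 inhabitants
road_cell = (50.0, 0.0, 150.0, 100.0)  # covers the right half of the GHSL cell
half = population_fraction_rect(ghsl_cell, 80.0, road_cell)

far_away = (500.0, 500.0, 600.0, 600.0)  # disjoint road cell
none = population_fraction_rect(ghsl_cell, 80.0, far_away)
```

With shapely polygons, the same idea would use the area of the intersection of the two geometries divided by the area of the GHSL polygon.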
superblockify.population.ghsl module#
GHSL IO functions for the population submodule of the superblockify package.
- superblockify.population.ghsl.download_ghsl(urls, save_dir='/home/runner/work/superblockify/superblockify/data/ghsl', timeout=60)[source]#
Download the GHSL population raster tile.
Check if the raster tiles are already downloaded, and if not, download and unpack them. Create the save directory if it does not exist.
- Parameters:
- urlsstr or list
URL(s) of the raster tile(s).
- save_dirstr, optional
Directory to save to and look for the raster tile(s), by default GHSL_DIR.
- timeoutint or float, optional
Timeout in seconds for the download, by default config.DOWNLOAD_TIMEOUT.
- Returns:
- str or list
Path(s) to the downloaded raster tile(s).
- Raises:
- ValueError
A given URL does not exist.
- ValueError
A given URL does not return a zip file.
Notes
The GHSL raster tiles are between 1.4M and 99M in size. The sum of all tiles is 4.9G.
- superblockify.population.ghsl.get_ghsl(bbox_moll=None)[source]#
Get the GHSL population raster path(s) for the given bounding box.
There are two working modes:
- config.FULL_RASTER is set. This path, to the whole GHSL raster, is returned.
- Otherwise: With bbox_moll given, the needed raster tile(s) are determined, and their paths are returned. If they are not yet in config.GHSL_DIR, they are downloaded from the JRC FTP server.
- Parameters:
- bbox_molllist, optional
Boundary of the place in Mollweide projection, [minx, miny, maxx, maxy]. Needs to be given if the full raster is not available at config.FULL_RASTER.
- Returns:
- str or list
Path(s) to the GHSL raster tile(s).
- Raises:
- ValueError
If bbox_moll is not given and config.FULL_RASTER is not set.
- ValueError
If config.FULL_RASTER is invalid.
- ValueError
If the bounding box has invalid coordinates.
- superblockify.population.ghsl.get_ghsl_urls(bbox_moll)[source]#
Get the URLs of the GHSL population raster tiles that contain the boundary.
- Parameters:
- bbox_molllist
Boundary of the place in Mollweide projection. [minx, miny, maxx, maxy]
- Returns:
- list
URLs of the GHSL population raster tiles that contain the boundary.
- Raises:
- ValueError
If the bounding box spans more than two tiles in each dimension.
- ValueError
If the bounding box spans an empty tile.
Notes
Bounding boxes spanning areas larger than two tiles in each dimension are not supported. Use the whole raster instead.
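The two-tile limit can be illustrated with a generic tile grid. The sketch below is not the GHSL tiling scheme: the grid origin, the tile size, the zero-based (row, column) indices and the function name tiles_for_bbox are all assumptions chosen for illustration. It lists the tiles a bounding box touches and rejects boxes that span more than two tiles in either dimension.

```python
import math


def tiles_for_bbox(bbox, tile_size, origin_x, origin_y):
    """Return zero-based (row, col) indices of grid tiles touched by ``bbox``.

    The grid is hypothetical: its top-left corner is (origin_x, origin_y),
    columns grow with x and rows grow downwards with decreasing y.
    """
    minx, miny, maxx, maxy = bbox
    if minx >= maxx or miny >= maxy:
        raise ValueError("Bounding box has invalid coordinates")
    col_min = math.floor((minx - origin_x) / tile_size)
    col_max = math.floor((maxx - origin_x) / tile_size)
    row_min = math.floor((origin_y - maxy) / tile_size)
    row_max = math.floor((origin_y - miny) / tile_size)
    if col_max - col_min > 1 or row_max - row_min > 1:
        raise ValueError("Bounding box spans more than two tiles in a dimension")
    return [
        (row, col)
        for row in range(row_min, row_max + 1)
        for col in range(col_min, col_max + 1)
    ]


# Illustrative grid: tiles of 100 units, top-left corner at (0, 1000).
tiles = tiles_for_bbox((150, 820, 230, 880), tile_size=100, origin_x=0, origin_y=1000)

try:
    tiles_for_bbox((50, 50, 350, 80), tile_size=100, origin_x=0, origin_y=1000)
    too_wide = False
except ValueError:
    too_wide = True
```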
- superblockify.population.ghsl.resample_load_window(file, resample_factor=1, window=None, res_strategy=None)[source]#
Load and resample a window of a raster file.
- Parameters:
- filestr
Path to the raster file. It can be a tile or the whole raster.
- resample_factorfloat, optional
Factor to resample the window by. Values > 1 increase the resolution of the raster, values < 1 decrease the resolution of the raster by that factor in each dimension.
- windowrasterio.windows.Window, geopandas.GeoDataFrame, optional
Window of the raster to resample, by default None. When given a GeoDataFrame, the window is the bounding box of the GeoDataFrame.
- res_strategyrasterio.enums.Resampling, optional
Resampling strategy, by default Resampling.nearest if resample_factor > 1 (up-sampling), Resampling.average if resample_factor < 1 (down-sampling).
- Returns:
- raster_rescalednumpy.ndarray
Resampled raster.
- res_affinerasterio.Affine
Affine transformation of the resampled raster.
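The effect of resample_factor can be sketched without rasterio. The helper below is hypothetical and only plans a resampling: it derives the output shape, the new pixel size and the default strategy name from the rules stated above. The behaviour for a factor of exactly 1 is not specified in this reference; the sketch assumes nearest in that case, and strategy names are plain strings rather than rasterio.enums.Resampling members.

```python
def resample_plan(height, width, pixel_size, resample_factor=1, res_strategy=None):
    """Plan a resampling step (hypothetical helper, no raster is read).

    Returns the output shape, the output pixel size and the strategy name.
    """
    if resample_factor <= 0:
        raise ValueError("resample_factor must be positive")
    if res_strategy is None:
        # Up-sampling defaults to nearest, down-sampling to average.
        # Assumption: a factor of exactly 1 is treated like up-sampling.
        res_strategy = "nearest" if resample_factor >= 1 else "average"
    out_shape = (int(height * resample_factor), int(width * resample_factor))
    # More pixels over the same extent means each pixel covers less ground.
    out_pixel_size = pixel_size / resample_factor
    return out_shape, out_pixel_size, res_strategy


# Halve the resolution of a 100 x 200 raster with 100 m pixels.
down = resample_plan(100, 200, 100.0, resample_factor=0.5)

# Double the resolution of the same raster.
up = resample_plan(100, 200, 100.0, resample_factor=2)
```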
- superblockify.population.ghsl.row_col(y_moll, x_moll)[source]#
Resolves the row and column of the GHS-POP raster tile that contains the given point.
- Parameters:
- y_mollfloat
y-coordinate of the point in Mollweide projection.
- x_mollfloat
x-coordinate of the point in Mollweide projection.
- Returns:
- col, rowint, int
Column and row of the tile.
Notes
The GHS-POP raster tiles are each 100km x 100km on the Mollweide projection. Latitude has its origin at the equator, but longitude has an offset of -41km. This function was reverse-engineered from the GHS-POP raster tile names found on the JRC FTP server (see dataset overview).
superblockify.population.tessellation module#
Graph Tessellation for the population submodule of the superblockify package.
- superblockify.population.tessellation.add_edge_cells(graph, **tess_kwargs)[source]#
Add edge tessellation cells to edge attributes in the graph.
Tessellates the plane around the graph's edges using a Voronoi cell approach. The function writes to the edge attribute cells of the graph in-place. Furthermore, cell_id is added to the edge attributes, for easier summary of statistics later.
The approach was inspired by the momepy.Tessellation class and tessellates with scipy.spatial.Voronoi.
- Parameters:
- graphnetworkx.MultiDiGraph
The graph to tessellate.
- **tess_kwargs
Keyword arguments for the superblockify.population.tessellation.get_edge_cells() function.
- Raises:
- ValueError
If the graph is not in a projected coordinate system.
- ValueError
If the limit and the edge points are disjoint.
Notes
The graph must be in a projected coordinate system.
- superblockify.population.tessellation.edges_to_points(edges, segment=25)[source]#
Convert edges to points.
- Parameters:
- edgesgeopandas.GeoDataFrame
The edges to convert to points.
- segmentfloat
The maximum distance for the point interpolation. Default is 25.
- Returns:
- pointsgeopandas.GeoDataFrame
The points.
- indiceslist of int
The indices of the points in the edges.
Notes
The points are interpolated along the edges with a maximum distance of segment.
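A minimal sketch of interpolation with a maximum spacing, assuming the points are spread evenly along each edge and that both end points are included. It works on a plain list of (x, y) coordinates rather than a GeoDataFrame, and the function name interpolate_points is hypothetical.

```python
import math


def interpolate_points(coords, segment=25):
    """Return evenly spaced points along a polyline, at most ``segment`` apart."""
    if segment <= 0:
        raise ValueError("segment must be greater than 0")
    # Cumulative length at every vertex of the polyline.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]
    # Smallest number of equal pieces whose length does not exceed ``segment``.
    pieces = max(1, math.ceil(total / segment))
    points = []
    leg = 0
    for i in range(pieces + 1):
        distance = total * i / pieces
        # Advance to the leg of the polyline that contains this distance.
        while leg < len(coords) - 2 and cumulative[leg + 1] < distance:
            leg += 1
        (x0, y0), (x1, y1) = coords[leg], coords[leg + 1]
        leg_length = cumulative[leg + 1] - cumulative[leg]
        t = 0.0 if leg_length == 0 else (distance - cumulative[leg]) / leg_length
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


# A straight 100 m edge with the default spacing of 25 m.
straight = interpolate_points([(0, 0), (100, 0)], segment=25)

# An L-shaped edge of length 70 m is split into three pieces of about 23.3 m.
bent = interpolate_points([(0, 0), (30, 0), (30, 40)], segment=25)

# One index per point, recording which edge the point came from.
indices = [0] * len(straight) + [1] * len(bent)
```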
- superblockify.population.tessellation.get_edge_cells(graph, limit=None, segment=25, show_plot=False)[source]#
Get edge tessellation cells for the graph.
Tessellates the plane around the graph's edges using a Voronoi cell approach.
The approach was inspired by the momepy.Tessellation class and tessellates with scipy.spatial.Voronoi.
- Parameters:
- graphnetworkx.MultiDiGraph
The graph to tessellate.
- limitshapely.geometry.Polygon or None
The limit of the tessellation. Must be in the same CRS as the graph. If None, it will be calculated as the exterior of the 100m buffered unary union of the graph’s edges.
- segmentfloat
The maximum distance for the point interpolation. Default is 25.
- show_plotbool
If True, a plot of the tessellation will be shown. Default is False.
- Returns:
- geopandas.GeoDataFrame
A GeoDataFrame with the tuple of edge keys as index and the tessellation cells as geometry.
- Raises:
- ValueError
If the graph is not in a projected coordinate system.
- ValueError
If the limit and the edge points are disjoint.
Notes
The graph must be in a projected coordinate system.
- superblockify.population.tessellation.get_edge_polygons(graph)[source]#
Prepare edge polygons for tessellation.
This returns a GeoDataFrame where edges with the same start and end node are merged if their geometries are equal or one is the reversed version of the other.
- Parameters:
- graphnetworkx.MultiDiGraph
The graph to tessellate.
- Returns:
- edgesgeopandas.GeoDataFrame
The edges with their polygons.
- superblockify.population.tessellation.reconstruct_edge_cells(voronoi_diagram, indices, crs)[source]#
Reconstruct edge cells from a Voronoi diagram.
Regions with the hull index -1 are discarded.
- Parameters:
- voronoi_diagramscipy.spatial.Voronoi
The Voronoi diagram to reconstruct.
- indiceslist
The indices of the points in the Voronoi diagram.
- crsvalue
The CRS used for the GeoDataFrame. Must be the same as the graph. Anything compatible with pyproj.CRS.from_user_input().
- Returns:
- cellsgeopandas.GeoDataFrame
The Voronoi cells by their indices.
Module contents#
Population init, subpackage for the GHSL Population data