As you already learned in this cookbook, raw spatial data, such as irregular street segments and scattered intersection points, cannot be fed directly into most standard Machine Learning algorithms. To make this data meaningful for ML models, we need to transform it into a structured format. In this exercise we will do this by dividing the area of interest into regular cells using segmentation and computing aggregate statistics for each of those cells. This creates a standardized matrix of features, made up of our grid cells, that ML models can easily process.
So our next processing step will be to create two different grids for the area inside our bounding box:
A hexagonal grid with a side length of 250m
A square grid with a side length of 500m
What is Segmentation and why do we need it?¶
Segmentation refers to dividing a large geographic area into smaller, regularly shaped cells, typically squares or hexagons.
We need segmentation because raw spatial data doesn’t work well with most Machine Learning models. In order to be useful for the ML model, we need to pre-process the data in a way it can easily understand. By placing a grid over the area, we can group (aggregate) the data inside each cell. For example, we can count the number of intersections within each cell. This turns messy spatial data into a clean table where each cell is one row and the calculated values are features.
Square grids are simple and easy to work with, while hexagonal grids often give a more balanced view of neighboring areas. Therefore, the choice of grid type can influence the results which should be taken into consideration when choosing a certain grid.
Another important factor is the grid size. Using different grid sizes (like our 250m hexagons and 500m squares) helps capture patterns at different levels, and while finer grids will catch more details, coarser grids will be better for displaying general trends in the data.
Imports and Configuration¶
import sys
from pathlib import Path
import branca.colormap as cm
import folium
import geopandas as gpd
import pandas as pd
import requests
from shapely import wkb
from shapely.geometry import box
# ruff: noqa: D205
# Importing custom functions
sys.path.append(str(Path("..").resolve()))
from src.geoai.cookbook_functions import (
create_hexagonal_grid,
create_polygon_grid,
get_base_map,
)
# Configuration
CITY_NAME = "Vienna"
BBOX = "16.335005,48.187854,16.400923,48.209995"
DATA_DIR = "./osm_data"
OUTPUT_DIR = "./output"
# Computing the center of the bounding box
min_lon, min_lat, max_lon, max_lat = map(float, BBOX.split(","))
bounds = [[min_lat, min_lon], [max_lat, max_lon]]
center_lat = (min_lat + max_lat) / 2
center_lon = (min_lon + max_lon) / 2Now we can read in the data from our CSV file containing the results from the last notebook about feature engineering.
# Load the CSV file
csv_url = "https://gitlab.tuwien.ac.at/api/v4/projects/13972/repository/files/intersections.csv/raw?ref=main&lfs=true"
PROJECT_ROOT = Path.cwd()
csv_path = PROJECT_ROOT / "output" / "Vienna" / "intersections.csv"
# download the file in case it does not extist
# ensure directory exists
csv_path.parent.mkdir(parents=True, exist_ok=True)
# download only if file doesn't exist
if not csv_path.exists():
response = requests.get(csv_url, timeout=30)
response.raise_for_status()
csv_path.write_bytes(response.content)
else:
pass
df_int = pd.read_csv(csv_path)
# Convert the 'geom' hex string back into Shapely Point objects.
df_int["geometry"] = df_int["geom"].apply(lambda x: wkb.loads(bytes.fromhex(x)))
# Create the GeoDataFrame using the new 'geometry' column
intersections_gdf = gpd.GeoDataFrame(df_int, geometry="geometry", crs="EPSG:4326")Grids¶
In the next cell, we set up our two different grid. As already mentioned, those are:
A hexagonal grid with a side length of 250m
A square grid with a side length of 500m
# Determine Map Bounds from the intersections (with 100m buffer)
intersection_meters = intersections_gdf.to_crs("EPSG:3857")
minx, miny, maxx, maxy = intersection_meters.total_bounds
total_area = box(minx, miny, maxx, maxy)
# Hex Grid (250m)
hex_grid = create_hexagonal_grid(total_area, side_length=250)
hex_grid.set_crs("EPSG:3857", inplace=True)
hex_grid_geo = hex_grid.to_crs("EPSG:4326")
# Rect Grid (500m)
minx, miny, maxx, maxy = total_area.bounds
rect_grid = create_polygon_grid(
width=maxx - minx,
height=maxy - miny,
cell_size=(500, 500),
origin=(minx, miny),
crs="EPSG:3857",
)
rect_grid_geo = rect_grid.to_crs("EPSG:4326")Now we can plot the two grids we just created on the map to have a look at them and see how they divide the area.
# Creating the hexagonal grid
m_hex_grid = get_base_map(center_lat, center_lon)
folium.GeoJson(hex_grid_geo).add_to(m_hex_grid)
m_hex_grid# Creating the rectangular grid
m_rect_grid = get_base_map(center_lat, center_lon)
folium.GeoJson(rect_grid_geo).add_to(m_rect_grid)
m_rect_gridAggregation¶
Now that we have created our grids and calculated the metrics for the intersections, the next step is to bring them together.
We need to figure out which intersections fall within which grid cell. This process is called a “Spatial Join”. Once we know which intersections belong to which cell, we can summarize the data for that specific area.
For every cell in our grid, we will calculate:
Density: The total number of intersections inside the cell.
Average Irregularity: The average “skewness” of the intersections in that cell.
Since we want to do this exact same calculation for both the Hexagonal and the Rectangular grids, we will first set up a helper function. This allows us to apply the logic to both grids without having to write the code twice.
def aggregate_intersection_stats(
grid_gdf: gpd.GeoDataFrame, intersections_gdf: gpd.GeoDataFrame
) -> gpd.GeoDataFrame:
"""Link intersections to grid cells using a Spatial Join.
Returns a GeoDataFrame with these statistics merged into the grid geometry.
"""
joined = gpd.sjoin(grid_gdf, intersections_gdf, how="left", predicate="contains")
stats = (
joined.groupby(joined.index)
.agg({"osm_id": "count", "delta": "mean"})
.rename(columns={"osm_id": "total_count", "delta": "avg_delta"})
)
if "i_tpe" in joined.columns:
t_counts = joined.pivot_table(
index=joined.index, columns="i_tpe", aggfunc="size", fill_value=0
)
t_counts.columns = [f"count_{c}" for c in t_counts.columns]
stats = stats.join(t_counts, how="left")
grid_out = grid_gdf.merge(stats, left_index=True, right_index=True)
cnt_cols = ["total_count"] + [c for c in grid_out.columns if "count_" in c]
grid_out[cnt_cols] = grid_out[cnt_cols].fillna(0)
return grid_out
hex_data = aggregate_intersection_stats(hex_grid_geo, intersections_gdf)
rect_data = aggregate_intersection_stats(rect_grid_geo, intersections_gdf)Visualisation¶
Now that we have processed the data and calculated the statistics, we can finally visualise the results on our map.
We will take the aggregated values and plot them onto the hexagonal and rectangular grids. To do this efficiently, we will define a helper function. This allows us to easily add different layers to the map we created at the start without having to rewrite the plotting code every time.
def add_grid(
data: pd.DataFrame,
col: str,
name: str,
cmap_colors: list[str],
m: folium.Map,
) -> None:
"""Add a Choropleth layer to the map."""
actual_colors = cmap_colors
show = True
# Create Legend
vmin = data[col].min()
vmax = data[col].max()
# Handle completely empty data
if pd.isna(vmin) or pd.isna(vmax):
vmin, vmax = 0, 1
colormap = cm.LinearColormap(colors=actual_colors, vmin=vmin, vmax=vmax)
colormap.caption = name
m.add_child(colormap)
# Create Layer
geojson_layer = folium.GeoJson(
data,
style_function=lambda feature: {
"fillColor": colormap(feature["properties"][col])
if feature["properties"][col] is not None
else "transparent",
"color": "gray",
"weight": 0.5,
"fillOpacity": 0.7 if feature["properties"][col] is not None else 0,
},
tooltip=folium.GeoJsonTooltip(
fields=["total_count", "avg_delta"], aliases=["Count:", "Avg Delta:"]
),
)
fg = folium.FeatureGroup(name=name, show=show)
geojson_layer.add_to(fg)
fg.add_to(m)Now, we create a feature group for our intersection points so we can add them to our visualisations.
# Create a feature group for all intersection points
fg_pts = folium.FeatureGroup(name="Points", show=True)
for _, row in intersections_gdf.iterrows():
# Color logic
color = "blue"
if row.get("i_tpe") == "Car":
color = "crimson"
elif row.get("i_tpe") == "Path":
color = "green"
# Popup
popup_html = f"""
<div style="font-family: sans-serif; font-size: 12px;">
<b>ID:</b> {row.get("osm_id")}<br>
<b>Type:</b> {row.get("i_tpe")}<br>
<b>Arms:</b> {row.get("num_ways")}<br>
<b>Delta:</b> {row.get("delta"):.2f}<br>
</div>
"""
folium.CircleMarker(
[row.geometry.y, row.geometry.x],
radius=1,
color=color,
fill=True,
popup=folium.Popup(popup_html, max_width=200),
).add_to(fg_pts)Visualising Intersection Density (Hex Grid)¶
Now we will create the visualisation for the intersection density on our hexagonal grid. This layer will act as a heatmap, showing us “hotspots” where there are many intersections in a small area. We add it to both our final map that we will save in the end as well as a temporary map so we can display it now.
# Creating a new map and adding the layers
m_hex_density = get_base_map(center_lat, center_lon)
add_grid(
data=hex_data,
col="total_count",
name="Hex Density",
cmap_colors=["#ffffb2", "#bd0026"],
m=m_hex_density,
)
fg_pts.add_to(m_hex_density)
m_hex_densityVisualising Intersection Density (Rectangular Grid)¶
We will now do the exact same thing for the rectangular grid.
This allows us to compare the results and see how using squares instead of hexagons changes the visualisation of the “hotspots.” As before, we add this layer to our final map to save it for later, and we also display it on a temporary map to inspect it immediately.
# Creating a new map and adding the layers
m_rect_density = get_base_map(center_lat, center_lon)
add_grid(
data=rect_data,
col="total_count",
name="Rect Density",
cmap_colors=["#ffffb2", "#bd0026"],
m=m_rect_density,
)
fg_pts.add_to(m_rect_density)
m_rect_densityVisualising Intersection Irregularity (Hex Grid)¶
Next, we visualise the average irregularity (or ‘delta’) for the hexagonal grid.
Unlike the previous step, which showed us how many intersections there are in a cell, this layer shows us how those intersections look. We are looking at the geometry here: areas with higher values contain intersections with more irregular, skewed angles, while lower values suggest a cleaner, more standard grid layout.
We again add this layer to our final map and display it on a temporary map below.
# Creating a new map and adding the layers
m_hex_delta = get_base_map(center_lat, center_lon)
add_grid(
data=hex_data,
col="avg_delta",
name="Hex Delta",
cmap_colors=["#f2f0f7", "#54278f"],
m=m_hex_delta,
)
fg_pts.add_to(m_hex_delta)
m_hex_deltaVisualising Intersection Irregularity (Rectangular Grid)¶
Finally, we visualise the average irregularity for the rectangular grid.
This completes our set of visualisations. By mapping the irregularity on the square grid, we can see if the patterns of complex or skewed intersections look different compared to the hexagonal grid. As always, we add this layer to our final map and display it below on a temporary map.
# Creating a new map and adding the layers
m_rect_delta = get_base_map(center_lat, center_lon)
add_grid(
data=rect_data,
col="avg_delta",
name="Rect Delta",
cmap_colors=["#f2f0f7", "#54278f"],
m=m_rect_delta,
)
fg_pts.add_to(m_rect_delta)
m_rect_deltaSummary¶
In this notebook you learned:
What Segementation is and why it is necessary
Types of grids and how to create them
How to implement a Spatial Join to aggreggate the features for the grid