After learning about the theory of OpenStreetMap and Overpass Turbo in the last notebook, we can now apply this knowledge in a practical exercise. This notebook shows you how to download data directly from OSM via the Overpass API.
Imports
First, we import all the libraries we need.
import time
from pathlib import Path
import folium
import requests
# Allow print() calls in this notebook (ruff rule T201)
# ruff: noqa: T201
Then, we set up our configuration. We start by specifying our bounding box, which is defined by the coordinates of two opposite corners of the area we are interested in: the minimum longitude and latitude (south-west corner), followed by the maximum longitude and latitude (north-east corner). The area used here lies inside Vienna and covers roughly 12 square kilometers. We also set the paths to our data and output directories, and compute the center of the bounding box, which we will need later to center the maps for our visualisations.
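To see where the figure of roughly 12 square kilometers comes from, the area of the bounding box can be estimated directly from its coordinates. This is a minimal sketch, assuming a spherical Earth with a mean radius of 6371 km and an equirectangular approximation (one degree of longitude shrinks by the cosine of the latitude), which is accurate enough for a box a few kilometers across. The function name `bbox_area_km2` is our own and not part of any library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def bbox_area_km2(bbox: str) -> float:
    """Approximate area of a "min_lon,min_lat,max_lon,max_lat" box in km²."""
    min_lon, min_lat, max_lon, max_lat = map(float, bbox.split(","))
    mid_lat = math.radians((min_lat + max_lat) / 2)
    # East-west extent: a degree of longitude shrinks with cos(latitude)
    width_km = EARTH_RADIUS_KM * math.radians(max_lon - min_lon) * math.cos(mid_lat)
    # North-south extent: a degree of latitude has (almost) constant length
    height_km = EARTH_RADIUS_KM * math.radians(max_lat - min_lat)
    return width_km * height_km


area = bbox_area_km2("16.335005,48.187854,16.400923,48.209995")
print(f"Approximate area: {area:.1f} km²")  # roughly 12 km²
```

For this box the estimate is about 4.9 km from west to east and 2.5 km from south to north.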
# Configuration
CITY_NAME = "Vienna"
# Bounding box as "min_lon,min_lat,max_lon,max_lat" (west, south, east, north)
BBOX = "16.335005,48.187854,16.400923,48.209995"
DATA_DIR = "./osm_data"
OUTPUT_DIR = "./output"
# Parse the bounding box and compute its center
min_lon, min_lat, max_lon, max_lat = map(float, BBOX.split(","))
# Folium expects the corners as [lat, lon] pairs: [[south, west], [north, east]]
bounds = [[min_lat, min_lon], [max_lat, max_lon]]
center_lat = (min_lat + max_lat) / 2
center_lon = (min_lon + max_lon) / 2
Setting up the Folium Map
We will be using Folium throughout this cookbook to visualise our results on a map. If you want to learn more about this library, you can check out the chapter on Visualisation.
As we will be creating many maps in this cookbook, we first define a helper function so that we don't have to repeat the same setup code for each of them. We will use this function for all Folium maps throughout the cookbook.
def get_base_map() -> folium.Map:
    """Initialize the interactive map."""
    # Center the map on the center of the bounding box
    return folium.Map(
        location=[center_lat, center_lon],
        zoom_start=14,
        tiles="CartoDB positron",
        zoom_control="bottomright",
    )
As our first visualisation, we now look at our bounding box on the map. All of the data we will use in this exercise is contained within this area.
m_bbox = get_base_map()
folium.Rectangle(bounds=bounds, color="#ff0000", fill=False, weight=2).add_to(m_bbox)
m_bbox
Downloading the data
Now that we have set everything up, we can download our data in XML format from OpenStreetMap using the Overpass API.
You might notice that the code lists several mirrors in addition to the main Overpass API server. We do this because the main server is sometimes overloaded and times out during peak hours. If a request fails, we fall back to the next mirror in the list.
What is a mirror?
A mirror in this context is an alternative server that provides the same data and service. For OpenStreetMap’s Overpass API, multiple organizations run their own copies of the service around the world. They all give you the same kind of map data, just from different machines.
In our case, we use the mirrors as backups: instead of relying on a single server (which might be slow, busy, or temporarily down), we try several of them in turn. If one fails for whatever reason, another one may still work and return the data we are looking for. This makes the download for this notebook much more reliable.
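The fallback strategy itself is independent of the networking library. As a sketch, the version below takes a hypothetical `fetch(url)` callable that returns an HTTP status code and the response body (in the notebook, `requests.get` plays this role), which makes the retry logic easy to try out offline. The names `download_with_fallback` and `fetch`, and the default retry and wait values, are illustrative assumptions rather than part of any library. It catches `OSError`, which also covers the exceptions raised by `requests`, since `requests.RequestException` derives from `IOError`.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Iterable

HTTP_OK = 200
HTTP_TOO_MANY_REQUESTS = 429


def download_with_fallback(
    urls: Iterable[str],
    fetch: Callable[[str], tuple[int, bytes]],
    max_attempts_per_mirror: int = 2,
    wait_seconds: float = 2.0,
) -> tuple[str, bytes] | None:
    """Try each mirror in order; return (url, body) of the first success, else None."""
    for url in urls:
        for _attempt in range(max_attempts_per_mirror):
            try:
                status, body = fetch(url)
            except OSError:
                # Connection problem or timeout: wait, then retry the same mirror
                time.sleep(wait_seconds)
                continue
            if status == HTTP_OK:
                return url, body
            if status == HTTP_TOO_MANY_REQUESTS:
                # Rate-limited: wait, then retry the same mirror
                time.sleep(wait_seconds)
                continue
            # Any other status code: give up on this mirror and try the next one
            break
    return None


# Offline usage example: the first mirror is rate-limited, the second one answers
responses = {
    "https://mirror-a.example/map": (429, b""),
    "https://mirror-b.example/map": (200, b"<osm/>"),
}
result = download_with_fallback(responses, lambda url: responses[url], wait_seconds=0)
print(result)
```

Passing the fetch function in as a parameter keeps the retry policy testable without network access; the mirror URLs in the usage example are placeholders.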
file_path = Path(DATA_DIR) / f"{CITY_NAME}.xml"
# Ensure the data directory exists
Path(DATA_DIR).mkdir(parents=True, exist_ok=True)
# Remove any spaces so the bbox can be used directly in the URL
bbox_ = BBOX.replace(" ", "")
# Main Overpass API instance first, then mirrors as fallbacks
urls = [
    f"https://overpass-api.de/api/map?bbox={bbox_}",
    f"https://overpass.kumi.systems/api/map?bbox={bbox_}",
    f"https://overpass.private.coffee/api/map?bbox={bbox_}",
    f"https://api.openstreetmap.fr/oapi/map?bbox={bbox_}",
]
HTTP_OK = 200
HTTP_TOO_MANY_REQUESTS = 429
MAX_ATTEMPTS_PER_MIRROR = 2
headers = {
    "User-Agent": "Mozilla/5.0 (OSM downloader)",
    "Accept": "*/*",
}
success = False
for url in urls:
    print(f"Attempting download from: {url}")
    for _attempt in range(MAX_ATTEMPTS_PER_MIRROR):
        try:
            # Connect timeout 10 s, read timeout 300 s (large responses take a while)
            response = requests.get(url, timeout=(10, 300), headers=headers)
            if response.status_code == HTTP_OK:
                file_path.write_bytes(response.content)
                print("Download complete.")
                success = True
                break
            elif response.status_code == HTTP_TOO_MANY_REQUESTS:
                # Rate-limited: wait briefly, then retry the same mirror
                print("Server busy (429). Waiting 2 seconds...")
                time.sleep(2)
            else:
                # Any other status code: give up on this mirror and try the next one
                print(f"Failed with status code: {response.status_code}")
                break
        except requests.RequestException:
            # Network error or timeout: wait briefly, then retry
            print("Error connecting to mirror")
            time.sleep(1)
    if success:
        break
if not success:
    msg = f"All download attempts failed for bbox: {BBOX}"
    raise FileNotFoundError(msg)
xml_file_path = file_path.resolve()
if file_path.exists():
    print(f"You can take a look at the file at: {xml_file_path}")
else:
    msg = "Download failed."
    raise FileNotFoundError(msg)
Attempting download from: https://overpass-api.de/api/map?bbox=16.335005,48.187854,16.400923,48.209995
Download complete.
You can take a look at the file at: /builds/cookbooks/private/intersection-analysis-cookbook/notebooks/osm_data/Vienna.xml
The file we just downloaded is in XML format.
What is XML?
XML (Extensible Markup Language) is a hierarchical markup language designed to store and transport data in a structured way. Its key strength is that it is both machine-readable and human-readable. This means you can open the file in a standard text editor and read the content without needing any special software.
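Because XML is structured plain text, a few lines of Python are enough to read it. As a minimal sketch, the snippet below parses a small OSM-style document with the standard library's `xml.etree.ElementTree` and walks its hierarchy: the `osm` root element contains a `node`, which in turn contains `tag` children. The sample is a shortened copy of the first node in the file we downloaded, with most attributes left out.

```python
import xml.etree.ElementTree as ET

# Shortened OSM-style document based on the first node of our download
SAMPLE = """<osm version="0.6">
  <node id="199732" lat="48.2084326" lon="16.3600662">
    <tag k="crossing" v="no"/>
    <tag k="highway" v="traffic_signals"/>
  </node>
</osm>"""

root = ET.fromstring(SAMPLE)
print(root.tag, root.get("version"))
for node in root.iter("node"):
    # Attributes hold the id and coordinates; child <tag> elements hold key-value pairs
    tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
    print(node.get("id"), node.get("lat"), node.get("lon"), tags)
```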
Before we process it in the next notebooks, we can inspect the raw file to understand how OSM data is organized.
To do this, we print the first few lines of our file. When you look at the output, you can see that it has two distinct sections: the metadata and the actual content. The first few lines contain metadata, i.e. data about the file itself: the API version, the generator (Overpass API), the license note, the timestamp of the OSM data the file was extracted from, and the bounding box coordinates. After that, the actual content starts. It is structured into distinct elements: nodes (single points with coordinates), ways (ordered lists of nodes, for example a street) and relations (groups of other elements). Each element can carry tags, which are key-value pairs such as highway=traffic_signals.
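Once the file gets larger, reading it element by element is often preferable to loading it into memory all at once. A minimal sketch, assuming the OSM XML layout described above: `xml.etree.ElementTree.iterparse` streams through the document, picks up the `bounds` metadata and counts nodes, ways, relations and tags. The function name `summarize_osm` is our own; in this notebook you could call it as `summarize_osm(xml_file_path)`.

```python
from __future__ import annotations

import io
import os
import xml.etree.ElementTree as ET
from collections import Counter
from typing import BinaryIO

ELEMENTS = {"node", "way", "relation"}


def summarize_osm(source: str | os.PathLike[str] | BinaryIO) -> tuple[dict[str, str], Counter[str]]:
    """Return the <bounds> attributes and counts of nodes, ways, relations and tags."""
    bounds: dict[str, str] = {}
    counts: Counter[str] = Counter()
    # With the default events, each element is reported once it is fully parsed
    for _event, elem in ET.iterparse(source):
        if elem.tag == "bounds":
            bounds = dict(elem.attrib)
        elif elem.tag in ELEMENTS or elem.tag == "tag":
            counts[elem.tag] += 1
        # Free the finished element's children (its tags were already counted)
        if elem.tag in ELEMENTS:
            elem.clear()
    return bounds, counts


# Usage example on a small in-memory sample taken from the output shown in this notebook
SAMPLE = b"""<osm version="0.6">
  <bounds minlat="48.1878540" minlon="16.3350050" maxlat="48.2099950" maxlon="16.4009230"/>
  <node id="199732" lat="48.2084326" lon="16.3600662">
    <tag k="highway" v="traffic_signals"/>
  </node>
  <node id="199734" lat="48.2071432" lon="16.3596770"/>
</osm>"""
bounds, counts = summarize_osm(io.BytesIO(SAMPLE))
print(bounds["minlat"], counts)
```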
# Print the first 15 lines of the raw XML file
with xml_file_path.open(encoding="utf-8") as f:
    for _ in range(15):
        print(f.readline().strip())
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API 0.7.62.11 87bfad18">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base="2026-06-09T22:14:27Z"/>
<bounds minlat="48.1878540" minlon="16.3350050" maxlat="48.2099950" maxlon="16.4009230"/>
<node id="199732" lat="48.2084326" lon="16.3600662" version="8" timestamp="2023-09-29T16:36:11Z" changeset="141921840" uid="571107" user="Nielkrokodil">
<tag k="crossing" v="no"/>
<tag k="highway" v="traffic_signals"/>
</node>
<node id="199734" lat="48.2071432" lon="16.3596770" version="22" timestamp="2012-03-21T15:31:30Z" changeset="11053419" uid="473925" user="caigner"/>
<node id="199735" lat="48.2068359" lon="16.3558556" version="11" timestamp="2023-09-29T17:10:22Z" changeset="141923090" uid="571107" user="Nielkrokodil"/>
<node id="199736" lat="48.2076973" lon="16.3553748" version="6" timestamp="2011-12-02T08:10:05Z" changeset="10014813" uid="26818" user="David &amp; Christine Schmitt"/>
<node id="199754" lat="48.2115243" lon="16.3585430" version="30" timestamp="2024-11-13T21:31:01Z" changeset="159108304" uid="18608884" user="Dornbacher"/>
Summary
In this notebook you have learned:
What Folium is and how to set up a Folium map
What an API mirror is and why it is useful
How to use the Overpass API to download data from OSM