Course project for Databases II.
Project's Grade: 10/10.
Team 3 scope: restaurants, theaters, libraries in Milan (Milano) using the Foursquare Places API.
This repository contains code only. The folders data/ and reports/ are generated locally (they are not included in the repository because they can be large and may contain raw API responses).
The project implements:
- POI collection from Foursquare Places API using adaptive spatial tiling (grid/tiles) and raw-response storage.
- Data cleaning/preprocessing (duplicates, invalid coordinates, Milan-area checks) and production of a clean dataset.
- Indexing and benchmarking: KD-tree, Quad-tree, R-tree (multiple parameterizations) + comparison with SQLite RTree.
- Spatial queries: Nearest Neighbor, Bounding Box, Radius (500m), Composite (spatial + attribute filter).
- Interactive visualization of query results using Folium.
- Requirements
- How to get your own Foursquare API Key
- Installation
- Configuration
- Run pipeline (from scratch)
- Generated outputs (local)
- Troubleshooting
- References
- Python 3.11+ (or any modern Python 3.x)
- Windows / Linux / macOS (tested mainly on Windows)
- Python packages:
  - requests, pandas, numpy, scipy
  - rtree (libspatialindex backend)
  - pyqtree
  - folium
Note: rtree depends on libspatialindex. On Windows it usually works via pip wheels. See Troubleshooting if needed.
This project uses the Foursquare Places API. To run the extraction script, you must generate your own API key from the Foursquare Developer Console.
- Create / log in to your Foursquare Developer account.
- Create a new Project in the Developer Console.
- Generate a Service API Key for that project (this is the key you will use as a Bearer token).
- Copy and securely store the key (you will typically see it only once).
- Use the key in requests as an HTTP header:
  Authorization: Bearer <YOUR_KEY>
Security note: Do not commit API keys to GitHub. Use environment variables or a local .env file that is excluded via .gitignore.
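The header and the security note above can be combined in a few lines. This is a minimal sketch, assuming the key is stored in the FSQ_API_KEY environment variable used in the configuration step; the helper name build_auth_headers and the Accept header are illustrative assumptions, not code taken from the project scripts.

```python
import os


def build_auth_headers(env=None):
    """Return HTTP headers that carry the Foursquare key as a Bearer token.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to test without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("FSQ_API_KEY", "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError("FSQ_API_KEY is not set; export it before running.")
    return {
        "Authorization": f"Bearer {key}",
        "Accept": "application/json",  # assumed; check the API docs for required headers
    }
```

The returned dictionary can be passed as the headers argument of an HTTP client call, so the key never appears in the source code.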
Windows (PowerShell):
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip
Linux/macOS (bash/zsh):
python3 -m venv .venv
source .venv/bin/activate
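python -m pip install -U pip
Install the dependencies (all platforms):
pip install requests pandas numpy scipy rtree pyqtree folium
Set the API key as an environment variable.
Windows (PowerShell):
$env:FSQ_API_KEY="PUT_YOUR_KEY_HERE"
Linux/macOS (bash/zsh):
export FSQ_API_KEY="PUT_YOUR_KEY_HERE"
Create the local output folders.
Windows (PowerShell):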
mkdir data, reports -ErrorAction SilentlyContinue
mkdir data\raw, data\clean -ErrorAction SilentlyContinue
mkdir reports\maps -ErrorAction SilentlyContinue
Linux/macOS:
mkdir -p data/raw data/clean reports/maps
The scripts below are the final project entry points:
- extractfromFoursquare.py
- cleaningReport.py
- indexesBenchmark.py
- sqliteBenchmark.py
- foliumVisualize.py
python extractfromFoursquare.py
Expected (locally):
- raw responses under data/raw/
- SQLite database created/updated (e.g., data/milano_places.sqlite)
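The adaptive spatial tiling used by the extraction step can be illustrated as follows. This is a sketch of the general idea only, not the code in extractfromFoursquare.py: the fetch callback, the "id" field, the limit of 50 results per request, and the depth cap are all assumptions. A tile whose response is full is assumed to be truncated, so it is split into four quadrants and each quadrant is queried again.

```python
from typing import Callable, Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def split_tile(tile: BBox) -> List[BBox]:
    """Split a tile into four equal quadrants."""
    min_lat, min_lon, max_lat, max_lon = tile
    mid_lat = (min_lat + max_lat) / 2
    mid_lon = (min_lon + max_lon) / 2
    return [
        (min_lat, min_lon, mid_lat, mid_lon),
        (min_lat, mid_lon, mid_lat, max_lon),
        (mid_lat, min_lon, max_lat, mid_lon),
        (mid_lat, mid_lon, max_lat, max_lon),
    ]


def collect_places(
    bbox: BBox,
    fetch: Callable[[BBox], List[Dict]],  # hypothetical: one API request per tile
    limit: int = 50,                      # assumed per-request result cap
    max_depth: int = 8,                   # safety cap on subdivision
) -> List[Dict]:
    """Collect places over bbox, subdividing tiles that hit the result cap."""
    found: Dict[object, Dict] = {}
    stack = [(bbox, 0)]
    while stack:
        tile, depth = stack.pop()
        places = fetch(tile)
        for place in places:
            found[place["id"]] = place  # de-duplicate across overlapping tiles
        # A full response suggests truncation, so refine this tile.
        if len(places) >= limit and depth < max_depth:
            stack.extend((child, depth + 1) for child in split_tile(tile))
    return list(found.values())
```

Dense areas such as a city center end up covered by many small tiles, while sparse areas need only one request each.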
python cleaningReport.py
Expected (locally):
- reports/cleaning_report.json
- clean export under data/clean/ (e.g., JSONL/CSV depending on implementation)
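The three cleaning checks (duplicates, invalid coordinates, Milan-area check) can be sketched like this. The record fields "id", "lat", and "lon", the report keys, and the bounding box values are illustrative assumptions; cleaningReport.py may use different names and bounds.

```python
# Assumed rough bounds for the Milan area: (min_lat, min_lon, max_lat, max_lon).
MILAN_BBOX = (45.38, 9.04, 45.54, 9.28)


def clean_places(records, bbox=MILAN_BBOX):
    """Return (clean_records, report) after the three cleaning checks."""
    min_lat, min_lon, max_lat, max_lon = bbox
    report = {"input": 0, "invalid_coords": 0, "outside_bbox": 0,
              "duplicates": 0, "kept": 0}
    seen, clean = set(), []
    for rec in records:
        report["input"] += 1
        try:
            lat, lon = float(rec["lat"]), float(rec["lon"])
        except (KeyError, TypeError, ValueError):
            report["invalid_coords"] += 1  # missing or non-numeric coordinates
            continue
        # NaN fails both comparisons, so it is also counted as invalid.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            report["invalid_coords"] += 1
            continue
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            report["outside_bbox"] += 1
            continue
        if rec.get("id") in seen:
            report["duplicates"] += 1
            continue
        seen.add(rec.get("id"))
        clean.append({**rec, "lat": lat, "lon": lon})
    report["kept"] = len(clean)
    return clean, report
```

The report dictionary has the same role as reports/cleaning_report.json: it records how many records each check removed.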
Recommended with 5 repeats:
python indexesBenchmark.py --repeats 5
Expected (locally) under reports/:
- index_build_times_allthemes.csv
- index_query_times_allthemes.csv
- index_query_summary_allthemes.csv
- query_sets_allthemes.json
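When comparing indexes, a brute-force baseline is a convenient way to check that every index returns the same answers for the four query types. The sketch below is such a baseline, not the benchmark code itself; the point fields "lat", "lon", and "theme" are assumed names, and distances use the haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest(points, lat, lon):
    """Nearest Neighbor: the single closest point."""
    return min(points, key=lambda p: haversine_m(lat, lon, p["lat"], p["lon"]))


def in_bbox(points, min_lat, min_lon, max_lat, max_lon):
    """Bounding Box: points inside the rectangle (borders included)."""
    return [p for p in points
            if min_lat <= p["lat"] <= max_lat and min_lon <= p["lon"] <= max_lon]


def within_radius(points, lat, lon, radius_m=500.0):
    """Radius: points at most radius_m meters from the query location."""
    return [p for p in points
            if haversine_m(lat, lon, p["lat"], p["lon"]) <= radius_m]


def composite(points, lat, lon, radius_m, theme):
    """Composite: radius query followed by an attribute filter."""
    return [p for p in within_radius(points, lat, lon, radius_m)
            if p.get("theme") == theme]
```

The baseline scans every point on each query, so it is only suitable for correctness checks, not for timing comparisons.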
python sqliteBenchmark.py --repeats 5
Expected (locally) under reports/:
- db_build_times_allthemes.csv
- db_query_times_allthemes.csv
- db_query_summary_allthemes.csv
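For readers unfamiliar with the SQLite RTree module used as the database baseline, the sketch below shows a bounding-box query against an in-memory R*Tree virtual table. The table and column names are assumptions and are not taken from sqliteBenchmark.py; it also assumes the local SQLite build has the RTree module enabled, which is common but not guaranteed.

```python
import sqlite3


def build_rtree(points):
    """Create an in-memory R*Tree table from (id, lat, lon) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE VIRTUAL TABLE poi_rtree "
        "USING rtree(id, min_lon, max_lon, min_lat, max_lat)"
    )
    # A point is stored as a degenerate rectangle (min equals max).
    con.executemany(
        "INSERT INTO poi_rtree VALUES (?, ?, ?, ?, ?)",
        [(pid, lon, lon, lat, lat) for pid, lat, lon in points],
    )
    return con


def bbox_query(con, min_lat, min_lon, max_lat, max_lon):
    """Return ids of points that fall inside the query rectangle."""
    rows = con.execute(
        "SELECT id FROM poi_rtree "
        "WHERE min_lon >= ? AND max_lon <= ? AND min_lat >= ? AND max_lat <= ? "
        "ORDER BY id",
        (min_lon, max_lon, min_lat, max_lat),
    )
    return [row[0] for row in rows]
```

The RTree module stores coordinates as 32-bit floats, so points that lie extremely close to a query border may be classified differently than in a double-precision in-memory index.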
Run visualization for a specific query id (qid ranges from 0 to 9):
python foliumVisualize.py --qid 0
python foliumVisualize.py --qid 9
If your visualization script supports choosing backend:
python foliumVisualize.py --backend indexes --qid 9
python foliumVisualize.py --backend db --qid 9
Expected (locally):
- HTML maps under reports/maps/...
- Open the produced index.html (if generated) in a browser.
Because data/ and reports/ are not committed, you will generate these locally by running the pipeline.
- data/raw/: raw API responses (JSON) per request
- data/clean/: cleaned dataset exports
- data/milano_places.sqlite: SQLite database (deliverable copy)
- reports/cleaning_report.json: cleaning summary (duplicates, invalid coords, bbox checks)
- reports/query_sets_allthemes.json: 10 queries per theme per query type
- reports/index_*_allthemes.csv: index benchmark outputs
- reports/db_*_allthemes.csv: DB benchmark outputs
- reports/maps/: Folium HTML maps for NN/BBOX/Radius/Composite
If you encounter an error similar to:
NearMinimumOverlapFactor must be ... less than both index and leaf capacities
Then near_minimum_overlap_factor must be strictly less than both index_capacity and leaf_capacity.
Fix by lowering the overlap factor or increasing capacities consistently (the benchmark script includes safe parameter choices).
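The constraint can be enforced before an index is built. The helper below is a hypothetical sketch (it is not part of the benchmark script): it clamps the overlap factor so that it stays strictly below both capacities. The default of 32 is an assumption about the library default.

```python
def safe_rtree_params(leaf_capacity, index_capacity, overlap_factor=32):
    """Return R-tree parameters with the overlap factor clamped to a valid value.

    The factor must be strictly less than both capacities, so the largest
    allowed value is min(leaf_capacity, index_capacity) - 1.
    """
    limit = min(leaf_capacity, index_capacity)
    if limit < 2:
        raise ValueError("capacities must be at least 2")
    return {
        "leaf_capacity": leaf_capacity,
        "index_capacity": index_capacity,
        "near_minimum_overlap_factor": min(overlap_factor, limit - 1),
    }
```

For example, safe_rtree_params(16, 16) yields an overlap factor of 15. The resulting values can then be assigned to the attributes of the same names on an rtree index.Property object before the index is created.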
- Try a clean venv: pip install rtree
- If it fails, use an appropriate wheel for your OS/Python version or use conda.
Folium maps visualize results of a single query (selected by qid), not the full dataset.
Change --qid to visualize a different location, or implement an “all points” overview map (cluster/heatmap).
- Foursquare Places API documentation (Get Started, API keys)
- SciPy scipy.spatial.KDTree documentation
- rtree / libspatialindex documentation
- Pyqtree documentation
- SQLite RTree module documentation
- Folium documentation