Our road network representation for the San Francisco road network is available for download. What that means is that we have taken all the road segments for the map shown in the figure and converted it into a single database called asdo that you can load into your PostgreSQL database.
The latitude/longitude bounding box for the road network is [37.2,-122.6, 38, -121.7]. The asdo table is about 7.4GB in text format without any compression. After compressing to tar.gz format, the size is 1.9 GB. We now describe how you can download this representation and install it on your local PostgreSQL.
Download the Bay Area Oracle
aws s3 cp s3://spatialdatabase/bayarea <your_local_path> --recursive
If you do not want to install the AWS CLI, simply download the files by clicking on the following URLs:
https://s3.amazonaws.com/spatialdatabase/bayarea/Readme.md https://s3.amazonaws.com/spatialdatabase/bayarea/bayarea_plsql.sql https://s3.amazonaws.com/spatialdatabase/bayarea/oracles_bayarea_0_127.out.tar.gz https://s3.amazonaws.com/spatialdatabase/bayarea/datasets.tar.gz
Uncompress the files: “oracles_bayarea_0_127.out.tar.gz”, “dataset.tar.gz”. Read the “Readme.md” file.
tar -zxvf oracles_bayarea_0_127.out.tar.gz tar -zxvf datasets.tar.gz cat Readme.md
Load Oracle into PostgreSQL
Load the dataset in PostgreSQL by issuing the following commands. The file bayarea_plsql.sql contains the table creation commands as well as the function definitions for the dist() functions.
psql -U <user_name> -h <server_address> <db_name> < ./bayarea_plsql.sql
The file bayarea_plsql.sql performs the following operations.
- Creates a new schema called bayarea
- Creates a table called “asdo” with a schema (code bigint, d real)
- Loads “oracle_0_127.out” dataset into the oracle table, and build the necessary indices
- Defines basic functions Z2(), Z4(), and dist(). Z2(lat, lon) produces a code that is unique to a particular pair of latitude and longitude. Z4() similarly produces a code for a particular pair of source and destination. dist() function produces the actual road distance by simply looking up the asdo relation.
Now that the loading is complete, we are ready to perform our first query query. We first connect with the database using psql (or pgadmin) and then issue a simple SELECT query.
# First connect to the database using psql psql -U <user_name> -h <server_address> <db_name>
set search_path = bayarea; -- Change active schema to bayarea \timing on; -- Displays time taken by a query SELECT dist(37.405073750176946, -122.0745849609375, 37.52551993630743, -121.94480895996094);
The oracle returns a result of 28343.6 meter in a few microseconds.
Note about Road Network
The road network we used for the bay area is from http://www.dis.uniroma1.it/challenge9/download.shtml. One egregious error in this road network is that State Route 84 and 92 are not connected. This means that we cannot go from the Foster City to the Hayword using State Route 92. We are aware of this problem and will soon fix the problem by using a road network from a reliable source.
Bay Area Datasets
We have provided three datasets for download so that you can try queries on real datasets. Please look inside the dataset/ directory. We have a dataset of 1396 parking spots in San Francisco. Another dataset of 5001 registered restaurants in San Francisco. Finally, we have a much larger dataset consisting of 12 million GPS crumbs produced by 500 taxis in the bay area collected that was collected over a period of one month. We now describe the procedure to load each of these datasets into the PostgreSQL database.
Parking dataset of 1396 parking spots in San Francisco. This dataset was obtained from San Francisco Open Data Initiative at https://data.sfgov.org/Transportation/Off-Street-parking-lots-and-parking-garages/uupn-yfaw
This table contains several attributes, but only the following few attributes are interesting for our sample query:
id => Primary key
lat, lon => Latitude and longitude
code => Z2(lat, lon)
capacity => Capacity (i.e., how many spots are available)
To install this dataset please issue the following commands:
cd datasets/ psql -U <user_name> -h <hostname> <db_name> < parking.sql
This dataset contains the 5001 registered restaurants in San Francisco city and was obtained from here:
The interesting attributes are:
- id => Primiary key
- lat, lon => Latitude and Longitude of the restaurant
- code => Z2(lat, lon)
To load the dataset, please issue the following command:
psql -U <user_name> -h <hostname> <db_name> < restaurants.sql
Taxi Dataset is from http://cabspotting.org/api and we use the dataset collected by http://crawdad.cs.dartmouth.edu/~crawdad/epfl/mobility/. This dataset contains over 12 million points, 500 taxis
The format of each mobility trace file is the following – each line contains [id, taxiid, tripid, lat, lon, occupancy, time], e.g.: [1, 37.75134 -122.39488 0 1213084687],
- id => Unique id for each GPS crumb
- taxi_id => each taxi has a unique id
- tripid => Globally unique trip id
- lat and lon => Latitude and longitude
- code => Z2(lat, lon)
- occupancy => (1 = occupied, 0 = free)
- ts => UNIX epoch format when this event occurred.To install this dataset, first install the oracle relation
psql -U <user_name> -h <hostname> <db_name> < taxi.sql
Please look at this article for several complex queries one can do with taxi trajectories.
Here is an example of how to use our technology to perform a complex analytical query. The following query finds the top 100 restaurants that has the maximum number of parking spots within 500 meters from them. This query on 5001 restaurants and 1396 parking spots executes in less than 3 seconds!
set search_path = bayarea; \timing on; SELECT foo.id1 as id, count(foo.capacity) as howmany FROM (SELECT x.id AS id1, y.id AS id2, dist(x.code, y.code) AS dist, capacity FROM restaurants x, parking y WHERE x.id != y.id AND x.lat <= y.lat + 0.5 / 110 AND x.lat >= y.lat - 0.5 / 110 AND x.lon <= y.lon + 0.5 / 110 AND x.lon >= y.lon - 0.5 / 110) AS foo WHERE dist <= 500 GROUP BY id ORDER BY howmany DESC LIMIT 100;