Short Term Rental
Tracking Airbnb in Toronto using Registration Data
The City of Toronto maintains several open datasets related to city operations - updated at a range of frequencies. This project is focused on the Short Term Rental Registration data (updated daily). This data is useful in tracking active Airbnb and similar short term accomodations in Toronto. In order to retrieve insights from these applications, we’d need to gather the data for analysis. However, the data has some limitations - mainly that only active listings are available, not all the historical data.
Per the given metadata, A short-term rental is all or part of a dwelling unit rented out for less than 28 consecutive days in exchange for payment. Short-term rentals include bed and breakfasts, but exclude hotels, motels, student residences owned or operated by publicly funded or non-profit educational institutions. See the City’s page for more information.
In this article, we walk through a practical example of managing and updating a database of short-term rental listings using and SQLite. The goal is to maintain an up-to-date record of rental properties by regularly incorporating new data and updating existing records. Here’s a step-by-step guide on how this can be achieved.
Setting Up the Environment
Package Imports: To begin, we import necessary packages including:
pandas
for data manipulation,sqlite3
for database interactions,hashlib
for hashing, and- a custom script (
ShortTermRental
) to fetch from Toronto’s Open Data.
Fetching from Toronto’s Open Data
Setting Up the Environment
To get started, we need to set up our environment by importing the necessary packages. In this case, we’ll use requests
to make API calls, pandas
for data manipulation, and datetime
to handle date and time information.
import requests
import pandas as pd
from datetime import datetime, date
Fetching Rental Data
The core of our data process involves an API to retrieve short-term rental data:
Define the API Endpoint: We first define the base URL for the API provided by the City of Toronto’s open data portal. Much of this will be provided by the developper tab on the Open Data Portal. Feel free to do a high level request and navigate around the response to find what you may be looking for.
Make the API Call: We send a GET request using the requests library to the API to retrieve the data package information
Process the Data: The API response contains various resources related to the dataset. We loop through these resources, checking if they are active data stores (
datastore_active
). For each active resource, we fetch the data in CSV format:Return the Data: Finally, the function returns the combined DataFrame containing all the rental data.
def fetch_rental_data():
# To hit the API, you'll be making requests to:
base_url = "https://ckan0.cf.opendata.inter.prod-toronto.ca"
# Datasets are called "packages". Each package can contain many "resources"
url = base_url + "/api/3/action/package_show"
params = { "id": "short-term-rentals-registration"}
package = requests.get(url, params = params).json()
# Placeholder df to hold the extracted information
df_big = pd.DataFrame()
# To get resource data:
for idx, resource in enumerate(package["result"]["resources"]):
# for datastore_active resources - returns a bool:
if resource["datastore_active"]:
# To get all records in CSV format:
url = base_url + "/datastore/dump/" + resource["id"]
df_temp= pd.read_csv(url)
df_temp['filename'] = url
df_temp['first_seen'] = datetime.now().strftime('%Y-%m-%d')
df_temp['last_active'] = datetime.now().strftime('%Y-%m-%d')
df_big = pd.concat([df_big,df_temp])
return df_big
This script provides a streamlined approach to fetching and processing short-term rental data. This method is not only applicable to rental data but can be adapted to various other data sources and types. Feel free to browse Toronto’s Open Data Portal
Database Setup and Initialization
SQLite Database Creation: We create an SQLite database (
ShortTermRental.db
) and define a tableshort_term_rentals
to store rental data. The table includes columns for a unique hash, rental details, and date information.Populating Initial Data: Using a CSV file with initial rental data, we calculate the unique hash for each record and insert the data into the
short_term_rentals
table. This sets up our base dataset.
Handling New Data
Hashing Function: A hash function is used to create a unique identifier for each rental record based on specific columns such as
operator_registration_number
,address
,unit
,postal_code
,property_type
,ward_number
, andward_name
. This identifier helps in quickly comparing new records with existing ones to identify updates or new entries.Creating Temporary Table: A temporary table (
temp_rental
) is created to hold new rental data fetched periodically. This table mirrors the structure of the main table but is used to manage and compare new entries.Fetching and Inserting New Data: New rental data is fetched using the custom script and inserted into the
temp_rental
table. Hash values are generated for this new data to facilitate efficient comparison with existing records.Identifying New Records: A query is executed to identify records in
temp_rental
that are not present inshort_term_rentals
. This helps in pinpointing new listings that need to be added to the main database.Updating the Main Database: New records are appended to the
short_term_rentals
table. Additionally, thelast_active
field is updated to reflect the most recent activity date for active listings.
Finalizing Changes
Commit and Save: Changes are committed to the database to ensure that all updates are saved. The connection to the database is closed to finalize the session.
Export New Data: A CSV file (
new_information_only.csv
) is generated to keep a record of newly added entries for further analysis or reporting.
Testing and Validation
- Verification: Various queries are used to verify that the updates and new entries are correctly reflected in the database. This includes checking for records with updated
last_active
dates and ensuring that the data inshort_term_rentals
matches expectations.
By following these steps, we ensure that our rental data remains current and accurate. This approach can be adapted to manage other types of datasets and databases, demonstrating a powerful method for data integration and management using and SQLite.
Visualization
Once we are able to reliably collect historical information, we can start to visualize the information through Power BI for example