Visualising Population Growth and New Developments
We recently completed a piece of work to better understand where pockets of new development are occurring and, subsequently, what that means for population change and service demand at a granular level across Greater Melbourne. As part of the analysis, we sourced permit approval data from the Victorian Building Authority and performed DBSCAN clustering to identify the areas of activity and how they have changed over the past decade. Each dot represents an approved project that has created additional residential dwellings, with the red, yellow and green colouring representing clusters of high construction activity in descending order of intensity. A more detailed methodology is included below.
Methodology
Data Preparation
Data was first sourced from the Victorian Building Authority (VBA). As the format can vary year-on-year, work was completed upfront in Python to ingest, collate and standardise the data, which can be found here:
Geocoding
Once the data was in a useful format, the next step was to geocode it using Geoscape’s Geocoded National Address File (GNAF), which is essentially a dataset containing the latitude and longitude of every address in Australia. This allows us to take the long-form address of the planned building and assign it a set of coordinates through a series of table joins. As the VBA data does not contain numbered street addresses, random sampling was used to select the street number for each project. Geoscape’s GNAF dataset and associated documentation can be found here:
Identifying Clusters
After the data was geocoded, the next task was to find a way of determining the hotspots of activity. We decided to use density-based spatial clustering of applications with noise (DBSCAN) with progressively greater Euclidean distance limits to define the high, medium and low spatial clusters. DBSCAN was chosen because it is particularly effective at capturing organically arising clusters that can be missed when using pre-determined geometries, and because it can represent the dynamic, shifting nature of development. The algorithm was implemented using Python’s scikit-learn machine learning library, and the coordinates were transformed to a Mercator projection to facilitate the calculation of the Euclidean distance between points. Projects were also weighted by the number of new dwellings constructed, so that an apartment building made a greater contribution to its cluster than an individual house.
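The clustering step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the spherical Web Mercator formula, the `eps` distances, the `min_dwellings` threshold, the helper names and the sample coordinates are all assumptions made for the example. It relies on scikit-learn's `DBSCAN`, whose `fit` method accepts a `sample_weight` argument, so a project's dwelling count can count towards the density threshold.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (assumed projection)

def to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator metres."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * np.log(np.tan(np.pi / 4 + lat / 2))
    return np.column_stack([x, y])

def tiered_clusters(xy, dwellings, eps_by_tier, min_dwellings):
    """Run DBSCAN once per tier and return {tier: labels}; -1 marks noise."""
    labels = {}
    for tier, eps in eps_by_tier.items():
        model = DBSCAN(eps=eps, min_samples=min_dwellings)
        # sample_weight lets a 40-dwelling project count as 40 towards min_samples
        labels[tier] = model.fit(xy, sample_weight=dwellings).labels_
    return labels

# Made-up projects: four close together near the CBD and one isolated house.
lon = np.array([144.9600, 144.9605, 144.9610, 144.9603, 145.2000])
lat = np.array([-37.8100, -37.8103, -37.8098, -37.8105, -37.9000])
dwellings = np.array([1, 40, 2, 1, 1])

xy = to_web_mercator(lon, lat)
# Hypothetical distance limits in projected metres, tightest for the "high" tier.
eps_by_tier = {"high": 250.0, "medium": 500.0, "low": 1000.0}
tiers = tiered_clusters(xy, dwellings, eps_by_tier, min_dwellings=20)
```

One caveat with this sketch: distances in Mercator metres are stretched away from the equator, so at Melbourne's latitude an `eps` value corresponds to a somewhat shorter distance on the ground.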
Once the clusters were identified, GeoPandas was used to spatially join the cluster centre points to suburbs, giving viewers a reference point for each cluster so they can better understand where the activity was taking place. Suburb geometries came from the Australian Bureau of Statistics State Suburbs (SSC). To finish the cluster visualisation, Python’s SciPy was used to obtain the convex hull of each cluster.
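The convex hull step might look something like the sketch below, which uses `scipy.spatial.ConvexHull` on the projected points of each cluster. The function name, the handling of degenerate clusters and the toy coordinates are assumptions for illustration only.

```python
import numpy as np
from scipy.spatial import ConvexHull

def cluster_hulls(xy, labels):
    """Return {cluster_label: hull vertex coordinates} for each non-noise cluster."""
    hulls = {}
    for label in np.unique(labels):
        if label == -1:
            continue  # DBSCAN marks noise points with -1
        points = np.unique(xy[labels == label], axis=0)
        # A 2-D hull needs at least three points that are not all on one line.
        if len(points) < 3 or np.linalg.matrix_rank(points - points[0]) < 2:
            continue
        hull = ConvexHull(points)
        hulls[int(label)] = points[hull.vertices]
    return hulls

# Toy data: a square with one interior point (cluster 0) and one noise point.
xy = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [5, 5], [100, 100]], dtype=float)
labels = np.array([0, 0, 0, 0, 0, -1])
hulls = cluster_hulls(xy, labels)
```

Each entry in `hulls` is an array of outline vertices that can be drawn as a polygon around its cluster; the interior point at (5, 5) is not part of the outline.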
Base Map
The open-source GIS software QGIS was used to create the base map. It combines Bing Maps satellite imagery, served via the API tile server, overlaid with rail and major road centrelines from OpenStreetMap shapefile exports. This allowed us to fully customise the base map and produce something that would let viewers recognise Melbourne’s geography without overpowering the data that needed to be plotted on top.
Animation
To illustrate the dynamic nature of the construction, we used a rolling 6-month window over the projects’ permit dates. We used the Python libraries Matplotlib and Basemap to plot and customise the output for each date. Each date became a single frame, and the frames were subsequently combined into a video using MoviePy.
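The windowing logic behind each frame can be sketched with pandas as below. This is an illustrative assumption of how a trailing 6-month window could be selected; the column names, the monthly frame spacing and the sample permits are hypothetical, and the plotting and video assembly are left out.

```python
import pandas as pd

def window_for_frame(permits, frame_date, months=6):
    """Rows whose permit date falls in the `months` leading up to frame_date."""
    end = pd.Timestamp(frame_date)
    start = end - pd.DateOffset(months=months)
    # Half-open window: after `start`, up to and including `end`.
    mask = (permits["permit_date"] > start) & (permits["permit_date"] <= end)
    return permits.loc[mask]

# Hypothetical permits table with one row per approved project.
permits = pd.DataFrame({
    "permit_date": pd.to_datetime(
        ["2015-01-15", "2015-03-01", "2015-07-15", "2015-09-30"]
    ),
    "dwellings": [1, 12, 3, 40],
})

# One frame per month start here; the spacing between frames is an assumption.
frame_dates = pd.date_range("2015-07-01", periods=4, freq="MS")
counts = [len(window_for_frame(permits, d)) for d in frame_dates]
```

In a full pipeline, each windowed subset would be clustered and plotted to produce the image for that frame.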