Fun with Open Data: Splunking Bike Share Toronto

With the New Year, and a cold winter, now upon us here in Toronto, we thought it would be fun to kick things off by revisiting our award-winning Hackathon entry from last year’s Splunk Partner Technical Symposium and adapting it to provide insights into Toronto’s very own Bike Share platform using its Open Data.

Our approach

We took the following steps to transform and enhance the application to leverage Toronto’s Bike Share data:

  • Compare the app’s original Ford GoBike data feeds with the Toronto Bike Share feeds to assess the updates required for the app
  • Leverage the Splunk Add-on Builder to develop modular inputs that standardize sources such as system information, station status, etc.
  • Include additional data feeds, such as OpenWeatherMap data for Toronto and upcoming Festivals & Events in Toronto, to provide additional contextual information
  • Update existing dashboards and develop net-new ones

Application interface enhancements

One of the more exciting and noticeable updates to the original application is the user interface, which now provides a much better experience when navigating the data, thanks to the following updates:

  • Shiny and cool custom tabs view!
  • New colouring and styles applied to panels!
  • Status Icons including a progress indicator!

Station Insights dashboard showing details of a specific bike share station

How we did it

Step 1: Data Comparison

The first step was to compare data from both platforms to assess how portable the app’s existing configurations were. This is where the General Bikeshare Feed Specification (GBFS) provided a head start. The specification defines a set of guidelines for formatting, presenting and maintaining the data feeds, and it is being implemented by an increasing number of bike sharing platforms. More information about GBFS is available on the North American Bikeshare Association’s (NABSA) GitHub page at https://github.com/NABSA/gbfs.

The Toronto Bike Share public data is available from the City of Toronto’s website through its Open Data Catalogue (https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue). Fortunately for our use case, both the Ford GoBike and Toronto Bike Share platforms are GBFS compliant.
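Because GBFS systems publish an auto-discovery file (gbfs.json) that lists their feeds by name, a first-pass comparison of two platforms can be as simple as diffing those names. The sketch below assumes the GBFS 1.x/2.x layout, where feeds are grouped under a language key, and that both payloads have already been downloaded and parsed; the function names are hypothetical and the sample payloads are illustrative, not the actual GoBike or Toronto feeds.

```python
from typing import Dict, Set


def feed_names(gbfs_index: dict, language: str = "en") -> Set[str]:
    """Return the feed names listed in a parsed GBFS auto-discovery (gbfs.json) document."""
    return {feed["name"] for feed in gbfs_index["data"][language]["feeds"]}


def compare_feeds(original: dict, candidate: dict, language: str = "en") -> Dict[str, Set[str]]:
    """Split feed names into those both systems publish and those unique to each one."""
    a = feed_names(original, language)
    b = feed_names(candidate, language)
    return {"shared": a & b, "only_original": a - b, "only_candidate": b - a}


if __name__ == "__main__":
    # Illustrative, trimmed payloads; the URLs are placeholders, not real endpoints.
    gobike = {"data": {"en": {"feeds": [
        {"name": "station_information", "url": "https://example.com/a/station_information.json"},
        {"name": "station_status", "url": "https://example.com/a/station_status.json"},
        {"name": "system_regions", "url": "https://example.com/a/system_regions.json"},
    ]}}}
    toronto = {"data": {"en": {"feeds": [
        {"name": "station_information", "url": "https://example.com/b/station_information.json"},
        {"name": "station_status", "url": "https://example.com/b/station_status.json"},
    ]}}}
    print(compare_feeds(gobike, toronto)["only_original"])  # {'system_regions'}
```

Feeds that show up only on the original side point to dashboard panels that may need a replacement data source.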

Step 2: Data On-boarding

Since the GBFS feeds are standardized, we decided to leverage the Splunk Add-on Builder to develop modular inputs for each of the data feeds, allowing easier integration with other GBFS-based bike sharing platforms in the future.
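Every GBFS feed wraps its records in the same envelope (`last_updated`, `ttl`, `data`), so one flattening routine can serve any compliant system. The sketch below is not Add-on Builder code; it only shows the transformation a modular input might apply to a parsed `station_status` payload, emitting one JSON event per station. The function name and the extra `feed_last_updated` field are hypothetical.

```python
import json
from typing import Iterator

# Per-station fields kept from each station_status record (all defined by GBFS).
STATUS_FIELDS = ("station_id", "num_bikes_available", "num_docks_available",
                 "is_installed", "is_renting", "is_returning", "last_reported")


def station_status_events(payload: dict) -> Iterator[str]:
    """Yield one JSON event string per station from a parsed station_status feed."""
    feed_time = payload.get("last_updated")
    for station in payload["data"]["stations"]:
        # Missing optional fields become null rather than raising KeyError.
        event = {field: station.get(field) for field in STATUS_FIELDS}
        # Keep the feed-level timestamp as a fallback if last_reported is absent.
        event["feed_last_updated"] = feed_time
        yield json.dumps(event, sort_keys=True)
```

In an actual input, each string would be handed to the input's event writer; one event per station keeps per-station searches and time charts simple.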

Also included in the on-boarding process was the historical trip information, which is not part of GBFS and is available as CSV files. Because trip data is not standardized, the field names and values differed between the GoBike and Toronto Bike Share trip data files. After some date format adjustments, the trip data was ready for Splunk. The Toronto trip data also contained fewer fields than the Ford GoBike data; for example, user gender and year of birth were not available in the Toronto dataset. Because of these differences in the available information, additional correlation through automatic lookups was required within the app for ease of use.
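The trip-file clean-up amounts to renaming columns to a common schema and rewriting timestamps in one unambiguous format. In the sketch below, the Toronto column names, the `%m/%d/%Y %H:%M` source date format and the target field names are all assumptions for illustration; check them against the headers of each CSV release. Fields with no Toronto equivalent, such as gender and birth year, are simply never produced.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List

# ASSUMED mapping from Toronto trip-file columns to the field names the app
# already used for GoBike data; verify against each CSV release.
TORONTO_TO_COMMON = {
    "trip_id": "trip_id",
    "trip_start_time": "start_time",
    "trip_stop_time": "end_time",
    "trip_duration_seconds": "duration_sec",
    "from_station_name": "start_station_name",
    "to_station_name": "end_station_name",
    "user_type": "user_type",
}

# ASSUMED source timestamp format, e.g. "7/1/2016 14:05".
SOURCE_TIME_FORMAT = "%m/%d/%Y %H:%M"


def normalize_row(row: Dict[str, str]) -> Dict[str, str]:
    """Rename columns to the common schema and rewrite timestamps as ISO-style strings."""
    out = {common: row[src] for src, common in TORONTO_TO_COMMON.items() if src in row}
    for key in ("start_time", "end_time"):
        if out.get(key):
            parsed = datetime.strptime(out[key], SOURCE_TIME_FORMAT)
            out[key] = parsed.strftime("%Y-%m-%d %H:%M:%S")
    return out


def normalize_csv(text: str) -> List[Dict[str, str]]:
    """Parse a trip CSV string and normalize every row."""
    return [normalize_row(row) for row in csv.DictReader(io.StringIO(text))]
```

Writing timestamps in a single year-first format means one timestamp extraction setting can cover both cities' trip files.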

Continuing with the data feed integration flow, we also decided to augment the data with additional feeds such as weather information (www.openweathermap.org) and “Festival & Events” (from the Open Data Catalogue). These feeds can add context when correlating or assessing a trend at a particular station, such as an increased need for free docks due to an upcoming nearby event, or a change in rental volume due to unfavourable weather.
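One way to give a station reading its weather context is an as-of join: attach the most recent weather observation taken at or before each reading. This is a minimal sketch of that technique under assumed inputs, not the app’s implementation; timestamps are Unix seconds, and the condition strings are whatever the weather feed reports (for OpenWeatherMap, for example, the `main` value from the `weather` array, such as "Rain").

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# A weather observation: (unix_timestamp, condition), e.g. (1467380000, "Rain").
Observation = Tuple[int, str]
# A station reading: (unix_timestamp, num_bikes_available).
Reading = Tuple[int, int]


def annotate_with_weather(readings: List[Reading],
                          observations: List[Observation]) -> List[Tuple[int, int, Optional[str]]]:
    """Attach to each reading the latest weather condition observed at or before it.

    Readings taken before the first observation get None.
    """
    obs = sorted(observations)
    times = [t for t, _ in obs]
    annotated = []
    for ts, bikes in readings:
        idx = bisect_right(times, ts)  # number of observations with time <= ts
        condition = obs[idx - 1][1] if idx else None
        annotated.append((ts, bikes, condition))
    return annotated
```

Sorting once and using binary search keeps the join at O(log n) per reading, which matters when station status is polled every few seconds.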

Step 3: Dashboard Updates

At this point, we had sufficient data to start working on the dashboards. It was exciting to see some of the existing dashboard panels already displaying results. As expected, these were the panels using common GBFS-compliant data such as live station status information. Panels relying on the historical trip data, however, needed some attention. After some tweaking of searches and additional logic, the dashboards were fully operational again.

Updates to the fleet operations map

The main map was updated to use more granular scoring logic, which shows more status levels and marker colors for station data.
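The post does not publish the app’s scoring rules, so the levels and thresholds below are hypothetical. The sketch only illustrates the idea of a more granular score: combining `num_bikes_available` and `num_docks_available` from `station_status` to distinguish several levels, each with its own marker colour, instead of a simple empty/not-empty flag.

```python
from typing import Tuple


def station_level(bikes_available: int, docks_available: int) -> Tuple[str, str]:
    """Return a (status level, marker colour) pair for one station.

    HYPOTHETICAL scoring: the level names and 25%/75% thresholds are illustrative.
    Capacity is approximated as bikes + docks, ignoring disabled bikes and docks.
    """
    capacity = bikes_available + docks_available
    if capacity == 0:
        return ("no data", "grey")
    if bikes_available == 0:
        return ("no bikes", "red")
    if docks_available == 0:
        return ("no docks", "purple")
    fill_ratio = bikes_available / capacity
    if fill_ratio < 0.25:
        return ("few bikes", "orange")
    if fill_ratio > 0.75:
        return ("few docks", "blue")
    return ("balanced", "green")
```

Checking the empty and full cases first keeps the two states that most affect riders distinct from the ratio-based levels.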

Updates to the Station Insights Dashboard

Closest Stations:

A new panel on the Station Insights dashboard displays the three closest stations with a summary of their latest status, for a quick glance and comparison with the currently selected station.
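Finding the closest stations only needs the `lat` and `lon` fields from the GBFS `station_information` feed. A minimal sketch, assuming straight-line (great-circle) distance via the haversine formula rather than walking distance; the function names are hypothetical.

```python
import heapq
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_stations(selected_id: str, stations: List[dict], k: int = 3) -> List[Tuple[float, str]]:
    """Return (distance_km, station_id) for the k stations nearest the selected one.

    stations are station_information records with station_id, lat and lon.
    """
    origin = next(s for s in stations if s["station_id"] == selected_id)
    candidates = (
        (haversine_km(origin["lat"], origin["lon"], s["lat"], s["lon"]), s["station_id"])
        for s in stations if s["station_id"] != selected_id
    )
    return heapq.nsmallest(k, candidates)
```

The returned station IDs can then be joined to the latest `station_status` events to show each neighbour’s current bikes and docks.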

Upcoming events:

The “Events and Festivals” feed was integrated into the Station Insights dashboard to display a list of scheduled events close to a selected station.
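The events panel can be sketched as a filter on date and distance. The event record layout below (`name`, `start`, `end`, `lat`, `lon`) and the 1 km default radius are assumptions, not the Open Data Catalogue’s actual schema. Events already underway are kept as long as they have not ended.

```python
import math
from datetime import date
from typing import List


def _distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance on a spherical Earth of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def upcoming_events_near(station: dict, events: List[dict], today: date,
                         radius_km: float = 1.0) -> List[dict]:
    """Events that have not yet ended and lie within radius_km of the station, soonest first.

    Each event is an ASSUMED dict with name, start and end (datetime.date), lat and lon.
    """
    nearby = [
        e for e in events
        if e["end"] >= today
        and _distance_km(station["lat"], station["lon"], e["lat"], e["lon"]) <= radius_km
    ]
    return sorted(nearby, key=lambda e: e["start"])
```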


Fun Data-driven Observations

While working with the Toronto Bike Share Open Data, we also made some fun observations and noted a few trends. The following observations come from the live station feed and the historical trip data for 2016.

  • The usage statistics based on user type indicated that approximately 75% of the total trips were completed by members
    • Based on the total volume of trips for Q3-Q4 2016 from the Open Data Catalogue, approximately 75% of trips were attributed to users with a membership. Toronto Bike Share offers an annual membership plan for $99 plus tax, which makes it an attractive and often quicker option for short commutes to and from work, or simply for traveling a short distance from point A to point B.
  • The Union Station location recorded the highest number of transactions (trips starting or ending) of any station.
    • The trip data for the Q3-Q4 2016 timeframe showed that the Union Station location recorded the most transactions, where the user either started or ended a trip. A major portion of these trips was attributed to members, with high utilization around 8:00-9:00 AM and 4:00-5:00 PM on weekdays. This suggests members riding between Union Station and nearby workplaces. The column chart below shows an example of the trip volume trend for a weekday at the Union Station location, with volume rising around the rush hour periods.
  • Members were observed to use the service more often during weekdays, while casual riders used it more during weekends.
    • In line with the previous observation that members may be using the service to commute to or from work, ridership by members was higher during weekdays. Casual riders, on the other hand, made more trips during weekends, suggesting use by families and tourists exploring the area. This is consistent with the shorter average trip duration for members versus the longer duration for casual riders. The charts below illustrate this trend for July 2016.
  • The most trips with the same starting and ending station were observed at the “Bay St / Queens Quay W (Ferry Terminal)” location.
    • The “Bay St / Queens Quay W (Ferry Terminal)” station is located right by the Toronto Islands Ferry Terminal. Along with its regular trips, it recorded the highest number of trips, among all locations, that started and ended at the same station. The majority of these return trips were made by non-members, with trips lasting up to 4 hours, suggesting use by tourists exploring the area and possibly keeping the bikes for visits to the Islands. The following chart illustrates this trend.
  • The consistently lowest dock availability throughout the week was observed at the “Jimmie Simpson Park (Queen St E)” station.
    • The chart below shows an example of the number of docks available throughout a week. No recent trip data was available to correlate this trend further. However, the available 2016 trip data for July, charted below, indicates a fairly even balance of rentals and returns at this location. Returns that roughly replace rentals would keep a well-stocked station close to full, which could explain the consistently low dock availability.
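The aggregates behind these observations are simple counts over trip records and station snapshots. A minimal sketch, assuming trips have already been normalized into dicts with `start_station`, `end_station`, `user_type` (values `"Member"` and `"Casual"` assumed) and a `start_time` datetime, and that snapshots are `(station_id, num_docks_available)` pairs; the function names are hypothetical, and the results depend entirely on the data supplied.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def is_weekend(ts: datetime) -> bool:
    """datetime.weekday() is 0-4 for Monday-Friday and 5-6 for Saturday-Sunday."""
    return ts.weekday() >= 5


def trip_summary(trips: List[dict]) -> Dict[str, object]:
    """Member share, busiest station, top round-trip station and weekday/weekend split."""
    total = len(trips)
    members = sum(1 for t in trips if t["user_type"] == "Member")

    # A "transaction" counts a trip once at its start station and once at its end station.
    transactions: Counter = Counter()
    for t in trips:
        transactions[t["start_station"]] += 1
        transactions[t["end_station"]] += 1

    # Round trips start and end at the same station.
    round_trips = Counter(t["start_station"] for t in trips
                          if t["start_station"] == t["end_station"])

    split = Counter((t["user_type"], "weekend" if is_weekend(t["start_time"]) else "weekday")
                    for t in trips)

    def top(counter: Counter) -> Optional[str]:
        return counter.most_common(1)[0][0] if counter else None

    return {
        "member_share": members / total if total else 0.0,
        "busiest_station": top(transactions),
        "top_round_trip_station": top(round_trips),
        "weekday_weekend_split": dict(split),
    }


def lowest_average_docks(snapshots: List[Tuple[str, int]]) -> str:
    """Station with the lowest mean num_docks_available across (station_id, docks) snapshots."""
    sums: Counter = Counter()
    counts: Counter = Counter()
    for station, docks in snapshots:
        sums[station] += docks
        counts[station] += 1
    return min(counts, key=lambda s: sums[s] / counts[s])
```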

Future Opportunities

To continue the fun of analyzing Open Data in 2019, we have already identified additional areas of opportunity. For example, it would be quite interesting to analyze public bike share data alongside additional historical information, such as weather conditions or organized events in the vicinity, to assess their impact on rental trends. Toronto’s Open Data Catalogue also provides data on various other areas, such as TTC ridership and housing permits, which can be leveraged to gain better insights into a specific area of interest.

Conclusion

As an occasional user of the Toronto Bike Share system, I found it exciting to work with the public data, see it in action, and know that my own rides may have contributed to the trip data.

The observations and trends noted here are cursory, offering just a window into the wealth of insights that can be gained from machine data in general. Data correlation can be leveraged to understand trends, improve services and optimize costs within an organization. The best place to start is to “Listen to Your Data” (as Splunk likes to put it).


© Discovered Intelligence Inc., 2019. Unauthorised use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.