Today on Mean, Median and Moose, we have data on climate, disasters, and the weather.
Car accidents and the weather
The Transport Canada National Collision Database provides detailed statistics on Canadian motor vehicle accidents from 1999 to 2019. There are a variety of measures you can slice and dice the data on (though, interestingly, none related to geography). Let’s take a look at the weather-related data.
First, let’s look at collisions based on weather events:
Surprised? Surely it’s easier to get into a car accident when it’s raining or snowing outside? That may be the case, but it’s important to consider that there may be more clear and sunny days within the time period you’re looking at. The per-day average may look very different. Unfortunately for us there’s no location data and it’s not sunny everywhere in Canada all at once – so we can’t calculate that out. What a shame.
One thing we can do, though, is look at other dimensions, like severity. To normalize this data, we can take the severity measures and divide by the number of accidents in each category. For example, we can see whether weather has a noticeable effect on the number of vehicles per collision:
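The per-collision normalization can be sketched in a few lines of Python. This is only an illustration, not the calculation behind our charts: the record layout (a weather label plus a vehicle count per collision) and the sample values are invented, and the real National Collision Database files use their own coded columns.

```python
from collections import defaultdict

# Hypothetical sample records: (weather condition, vehicles involved).
# These values are invented for illustration only.
collisions = [
    ("Clear and sunny", 2),
    ("Clear and sunny", 3),
    ("Raining", 1),
    ("Raining", 2),
    ("Snowing", 1),
]

def vehicles_per_collision(records):
    """Return the average number of vehicles per collision for each weather condition."""
    totals = defaultdict(int)  # total vehicles per condition
    counts = defaultdict(int)  # number of collisions per condition
    for weather, vehicles in records:
        totals[weather] += vehicles
        counts[weather] += 1
    # Dividing by the collision count removes the effect of some
    # conditions simply having many more collisions than others.
    return {weather: totals[weather] / counts[weather] for weather in totals}

averages = vehicles_per_collision(collisions)
for weather, avg in sorted(averages.items()):
    print(f"{weather}: {avg:.2f} vehicles per collision")
```

The same pattern works for people, injuries, or fatalities per collision: swap the second field for the measure you want to normalize.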
Interestingly, clear and sunny still beats out inclement weather for the number of vehicles per collision. What might be happening here is that individual crashes are so numerous that they drown out the 20-car pileups caused by freezing rain.
What about people per collision?
Interestingly, rain starts to show up here. Now for injuries and fatalities. Limited visibility is the most dangerous condition when it comes to injuries – perhaps a result of drivers having less time to slow down?
Fatalities tell an even more stark story:
Deadliest and Costliest Natural Disasters
Our panel is fortunate not to encounter natural disasters too frequently in Windsor ON, but natural disasters can hit anywhere in Canada, and over the years, some have been very deadly and some very costly. The Canadian Disaster Database holds data on all disasters in Canada, including natural, technological, and conflict events that have happened since 1900. It classifies a significant disaster event as one in which one or more of these criteria are met:
- 10 or more people killed
- 100 or more people affected/injured/infected/evacuated or homeless
- An appeal for national/international assistance
- Historical significance
- Significant damage/interruption of normal processes such that the community affected cannot recover on its own
The data it holds describes where and when a disaster occurred; the number of injuries, evacuations, and fatalities; and an estimate of the costs. We took a look at meteorological – hydrological events from 1900 to 2023 to see which disasters were the deadliest and the costliest over this time period. This category includes avalanches, cold events, droughts, floods, geomagnetic storms, heat events, hurricanes/typhoons/tropical storms, other storms, storm surges, storms and severe thunderstorms, tornadoes, wildfires, and winter storms.
According to the database, the top 3 deadliest natural disasters in Canadian history were all heat events. The deadliest was a cross-Canada heat event that began on July 5, 1936 and lasted until July 17, 1936, causing 1,180 fatalities as temperatures reached greater than 32°C in most regions through that period. The second deadliest occurred in Vancouver and Fraser BC in July 2009, causing 455 fatalities, and the third deadliest occurred in Ontario and Quebec in 2010, causing 280 fatalities.
It is interesting to see the evolution of data collection: the 1936 event data provides very little context, while the 2010 event data comes with additional statistics. In Toronto, paramedics received 51% more complaints about breathing problems and 39% more calls related to fainting; in Montreal, heat-related deaths doubled; and across 8 health regions of Quebec, there was a 33% increase in the mortality rate.
The fourth and fifth deadliest natural disaster events are much more historically interesting than the first three. The fourth deadliest was a storm and severe thunderstorms event over Lakes Huron, Erie, and Ontario from November 7, 1913 to November 13, 1913, in which 270 sailors drowned when 34 ships went down in winds of up to 140 km/h – the entire crews of eight ships were lost! The fifth deadliest was a wildfire event in Cochrane and Matheson ON on July 29, 1916 that caused 233 fatalities and forced 8,000 people to evacuate the area. Both towns were entirely destroyed by a fire that began as a small blaze started by lightning and was made worse by fires started by sparks from a passing locomotive.
As for the costliest natural disasters in Canadian history, a winter storm from January 4, 1998 to January 10, 1998 takes the cake, causing an estimated total of $4,635,720,433 in costs, which includes federal and provincial disaster financial assistance arrangements (DFAA) payments, provincial department payments, municipal costs, other government department costs, insurance payments, and NGO payments. The second costliest event was the April 2016 wildfires in Fort McMurray, costing an estimated total of $4,068,678,000 and resulting in 2 fatalities and 90,000 people evacuated from their homes. The third costliest was the June 2013 flooding across southern Alberta at an estimated total of $2,715,742,000, which also saw 4 fatalities and 100,000 people evacuated from their homes.
It’s important to note the database displays cost data in the dollar amount of the year that the event took place or the year the specific payment was made, so it can be difficult to compare events, but the database does provide a “Consumer Price Index Normalization” conversion tool to help with this. Additionally, there is no standardized guideline for collecting cost and loss data, and financial data can take years to finalize, so estimates are sometimes provided in the interest of keeping the database current. Overall, there is a lot of missing or “unknown” data in the database, so this has to be kept in mind when considering the accuracy and completeness of the data.
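To make the idea of CPI normalization concrete, here is a rough sketch of the usual ratio method: scale a cost by the target year's index divided by the event year's index. We are assuming this is broadly what such a tool does; the database's own converter may work differently. The function name, the cost, and the index values below are placeholders, not real CPI figures.

```python
def to_target_year_dollars(nominal_cost, cpi_event_year, cpi_target_year):
    """Scale a cost by the ratio of two Consumer Price Index values."""
    if cpi_event_year <= 0:
        raise ValueError("CPI for the event year must be positive")
    return nominal_cost * cpi_target_year / cpi_event_year

# Placeholder values for illustration only, NOT real CPI figures or costs.
CPI_EVENT_YEAR = 100.0
CPI_TARGET_YEAR = 150.0
nominal_cost = 1_000_000

adjusted = to_target_year_dollars(nominal_cost, CPI_EVENT_YEAR, CPI_TARGET_YEAR)
print(f"${adjusted:,.0f} in target-year dollars (placeholder CPI values)")
```

With real index values for both years, two events from different decades can be put on the same dollar basis before ranking them.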
US Extreme Weather Events
The National Oceanic and Atmospheric Administration (NOAA) in the US oversees weather forecasting and data collection. They keep a dataset of extreme storms and weather events from 1950 to 2022 (updated annually). There are three types of data. The first is location data, with latitude- and longitude-based locations.
Events in this official NOAA database are selected based upon the following criteria:
- Storm has sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce.
- Storm event contains rare or unusual weather phenomena, or significant meteorological events such as extreme temperature.
The other two datasets cover these same events: one contains fatalities, and the other contains longer-form detail data, including estimates of damage costs. I tried to find an equivalent dataset for Canada; the closest I could find was the disasters dataset that Katie discussed, and as you can see from the criteria, they are close but not quite apples to apples. Within this data there are different types of data being collected, which also creates challenges:
In 2021 there were over 61,000 unique extreme weather events in the United States.
Another interesting piece of this data is the sourcing. They track where each of these events occurs and who spots it. Social media is an emerging source for active short-term events like debris flows (landslides), flash floods, etc., while traditional weather forecasting and spotting tend to handle warnings for thunderstorms or hurricanes.
Finally, latitude and longitude data is available for approximately 53,000 of the 61,000 events that occurred in 2021, allowing a map like this to be created.
Each dot is one storm event, geolocated to its latitude and longitude point. The colouring is based on the month in which the storm event occurred.
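For anyone who wants to build a similar map, the preparation step might look like the sketch below: keep only events that have both coordinates, and pull out the month to use as the colour. This is not the code behind our map. The column names (`BEGIN_YEARMONTH`, `BEGIN_LAT`, `BEGIN_LON`, `EVENT_TYPE`) are assumptions about the file layout, and the sample rows are made up.

```python
import csv
import io

# Made-up sample in the shape of a storm events file.
# Column names are assumptions; check them against the real download.
SAMPLE_CSV = """EVENT_TYPE,BEGIN_YEARMONTH,BEGIN_LAT,BEGIN_LON
Flash Flood,202107,35.10,-90.05
Debris Flow,202101,,
Thunderstorm Wind,202112,41.88,-87.63
"""

def geolocated_points(csv_text):
    """Yield (lat, lon, month) for rows that have both coordinates."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat, lon = row["BEGIN_LAT"].strip(), row["BEGIN_LON"].strip()
        if not lat or not lon:
            continue  # skip events with no usable location
        month = int(row["BEGIN_YEARMONTH"][-2:])  # last two digits are the month
        yield float(lat), float(lon), month

points = list(geolocated_points(SAMPLE_CSV))
print(points)
```

The resulting list can be handed to whatever plotting or mapping tool you prefer, with the month as the colour value for each dot.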
Consuming Canadian Climate Data
Canada has data from weather stations across the country going back to the 19th century. The earliest observations in the data set go back to 1840, with a decent number of data series starting in 1870. Weather data is often distributed in binary formats that require special software tools to deal with, but Environment and Climate Change Canada publishes weather station data series with data available down to the hour in some cases. The main challenge for a non-specialist in working with this data set is that each individual data series within the larger series is provided as a separate CSV file. For example, the daily data series is available as a set of 12 CSV files per year, one for each month, each containing a row summarizing the daily observations at that station. This multiplies quickly and is very hard to manage if you are looking for data across a larger geographic area than a single station.
To help with this problem, Doug hacked together a Python script that automates the process of downloading monthly data files, processing them, and populating a single database table with the result. It uses a slightly modified version of Environment and Climate Change Canada’s Station Inventory spreadsheet to source data for individual stations.
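To give a feel for the download, process, and load loop, here is a minimal sketch. It is not Doug's script. It assumes SQLite as the database, a made-up table called `observations`, and simplified CSV headers (`date`, `mean_temp`); the real files have many more columns. The `fetch` argument is a hypothetical callable that returns one monthly CSV as text, and `fake_fetch` stands in for a real HTTP download so the example runs offline.

```python
import csv
import io
import sqlite3

def load_monthly_csv(conn, station_id, csv_text):
    """Insert each row of one monthly CSV file into the observations table."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Missing readings arrive as empty strings; store them as NULL.
        temp = float(row["mean_temp"]) if row["mean_temp"] else None
        rows.append((station_id, row["date"], temp))
    conn.executemany(
        "INSERT INTO observations (station_id, obs_date, mean_temp) VALUES (?, ?, ?)",
        rows,
    )
    return len(rows)

def collect(conn, fetch, station_id, year):
    """Fetch and load all 12 monthly files for one station and year."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations "
        "(station_id INTEGER, obs_date TEXT, mean_temp REAL)"
    )
    total = 0
    for month in range(1, 13):
        total += load_monthly_csv(conn, station_id, fetch(station_id, year, month))
    conn.commit()
    return total

def fake_fetch(station_id, year, month):
    """Stand-in for a real download; returns two invented rows per month."""
    return (
        "date,mean_temp\n"
        f"{year}-{month:02d}-01,1.5\n"
        f"{year}-{month:02d}-02,\n"
    )

conn = sqlite3.connect(":memory:")
loaded = collect(conn, fake_fetch, station_id=1, year=2020)
print(f"Loaded {loaded} rows")
```

Passing the fetcher in as a parameter keeps the loading logic testable without touching the network, and a real downloader can be swapped in later.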
By selectively removing rows from the Station Inventory spreadsheet, you can specify any subset of stations to collect observations for, and modifying the start and end dates of each data series in the spreadsheet will serve to limit the data consumed to the specified period. The code is available on GitHub and is dedicated to the public domain for anyone to use and modify.
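The effect of trimming the inventory can be sketched as follows. Note that this does the clamping in code rather than by editing the spreadsheet, so it is a different route to the same result, not a description of how the script works. The tuple layout (station ID, first year, last year) and the sample stations are hypothetical.

```python
def download_tasks(inventory, period_start, period_end):
    """Expand inventory rows into (station_id, year, month) download tasks."""
    tasks = []
    for station_id, first_year, last_year in inventory:
        # Clamp each station's series to the period of interest.
        start = max(first_year, period_start)
        end = min(last_year, period_end)
        for year in range(start, end + 1):  # empty if the series misses the period
            for month in range(1, 13):
                tasks.append((station_id, year, month))
    return tasks

# Hypothetical inventory rows: (station ID, first year, last year).
inventory = [(101, 1950, 2020), (202, 2019, 2023), (303, 1900, 1950)]

tasks = download_tasks(inventory, 2018, 2020)
print(f"{len(tasks)} monthly files to download")
```

Station 303 drops out entirely because its series ended before the requested period, which is the same outcome as deleting its row from the spreadsheet.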
Downloading the data one file at a time takes a decent amount of time – a few seconds per monthly report. The data volume is also a consideration: after processing the first 90 or so stations on the list (out of a little less than 9,000), the table sits at around 13 million rows. It’s a good idea to limit your data intake to only the stations and years you’re interested in.
Once you’ve got the data you want in the table, you can use SQL queries to slice and dice the data any way you’d like. The project’s SQL folder contains a few sample queries to get you started.
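As a taste of what that slicing might look like, here is a small self-contained example that computes a yearly average temperature for one station. It is not one of the project's sample queries. It assumes SQLite and a made-up `observations` table with invented rows; the real table and column names will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (station_id INTEGER, obs_date TEXT, mean_temp REAL)"
)
# Invented rows for illustration only.
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [
        (1, "2019-07-01", 20.0),
        (1, "2019-07-02", 22.0),
        (1, "2020-07-01", 24.0),
        (1, "2020-07-02", None),  # a day with no reading
    ],
)

# AVG and COUNT(column) both skip NULLs, so missing days do not skew the mean.
query = """
    SELECT substr(obs_date, 1, 4) AS year,
           AVG(mean_temp)         AS avg_mean_temp,
           COUNT(mean_temp)       AS days_with_data
    FROM observations
    WHERE station_id = ?
    GROUP BY year
    ORDER BY year
"""
yearly = conn.execute(query, (1,)).fetchall()
for year, avg_temp, days in yearly:
    print(f"{year}: {avg_temp:.1f} °C over {days} day(s) with data")
```

Reporting the count of days with data alongside the average is a cheap way to spot years where a station's record is too sparse to trust.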