This time, we’re talking about political data in Canada: election results, data on our representatives, and the data that political parties collect. If you haven’t listened yet, you can do so using the player above or by searching for Mean, Median, and Moose in your favourite podcast app.
John took a look at the demographics of the Canadian federal MPs that were elected in 2019. You used to have to do this with sources like Wikipedia. A little while ago, John did this for the 2015 parliament by developing a script to access Wikipedia’s list of MPs who won election and then following the links within that list to get demographic information inside each MP’s info box. Since then the Canadian government has made this easier. You can now find a list of MPs with basic demographics like age, gender and occupation here.
Here are some interesting things about them. First, their ages. Almost all MPs are between the ages of 40 and 70, and only five MPs are between the ages of 20 and 29. (Note that the data is self-reported, and 29 MPs did not list their age or date of birth.)
As you probably know, the MPs are mostly male. In fact, there are more than twice as many male MPs as female MPs. What you might not know is that the female MPs who did get elected skew younger.
In the podcast, Katie talked about why that might be. She covered how media coverage, gendered roles and perceptions, and even the electoral system may affect representation. She wrote a paper on it a little while ago if you’re interested.
What about across parties? The age distribution looks like this:
Let’s look at the ages by gender and regions:
Grids of bar graphs like this are great alternatives to stacked bar charts and can make it easier to compare categories to each other. These are called trellis plots, and Vega-Lite makes them easy to build. Check out the relatively flat age distribution of the Western Canadian group of Conservative MPs. None of the MPs in that group are 20-29, though.
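For anyone curious what a trellis looks like in Vega-Lite, here is a minimal sketch of a spec, built as a Python dictionary and printed as JSON. The field names (region, gender, age_group, mps) and the numbers are made up for illustration; they are not the columns or values from the parliament data, and this is not the spec behind the graphs in this post. The key idea is the "row" and "column" encoding channels, which split one bar chart into a grid of small ones.

```python
import json

# Made-up, pre-aggregated rows: NOT real MP data.
rows = [
    {"region": "West", "gender": "Female", "age_group": "40-49", "mps": 3},
    {"region": "West", "gender": "Male", "age_group": "40-49", "mps": 7},
    {"region": "East", "gender": "Female", "age_group": "50-59", "mps": 4},
    {"region": "East", "gender": "Male", "age_group": "50-59", "mps": 9},
]

# A Vega-Lite spec: one bar chart, faceted into a grid by gender and region.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "mark": "bar",
    "encoding": {
        "x": {"field": "age_group", "type": "ordinal", "title": "Age group"},
        "y": {"field": "mps", "type": "quantitative", "title": "Number of MPs"},
        "row": {"field": "gender", "type": "nominal"},      # one row of charts per gender
        "column": {"field": "region", "type": "nominal"},   # one column of charts per region
    },
}

# Paste the printed JSON into the Vega-Lite online editor to render it.
print(json.dumps(spec, indent=2))
```

Because every small chart shares the same axes, it is easy to compare the shape of the age distribution from one panel to the next.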
How about self-reported occupation? This is a tough one to nail down for a few reasons. When an occupation is reported, there are often several for each MP. Also, the data is free-form, which can cause issues. For example, two labour lawyers might list “Lawyer” and “Labour Lawyer” respectively. With unstructured and informal data like this, sometimes less formal visualizations can be useful. They can give you an idea about the nature of the data without providing specific numbers or the impression that the data is perfectly accurate. Here’s a word cloud of all the self-reported occupations in the list of MPs from the parliament website:
We decided to also make this an interactive visualization. You can click on one of the words and see the percentage of MPs whose reported occupation included that word, along with what they reported.
You can find that interactive version of the word cloud as well as all the other graphs and the code to generate them at this link.
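The linked code is the real thing. As a rough illustration of the idea behind the word cloud and its click-through percentages, here is a minimal sketch in Python. It assumes one free-form occupation string per MP (empty when nothing was reported); the function name word_shares and the sample strings are made up for this example.

```python
import re
from collections import Counter


def word_shares(occupations):
    """Return {word: share of reporting MPs whose occupation text includes it}."""
    reported = [text for text in occupations if text]
    if not reported:
        return {}
    mps_per_word = Counter()
    for text in reported:
        # Use a set so a word repeated by one MP is only counted once for that MP.
        mps_per_word.update(set(re.findall(r"[a-z]+", text.lower())))
    return {word: count / len(reported) for word, count in mps_per_word.items()}


# Made-up entries, including the "Lawyer" vs "Labour Lawyer" problem.
sample = ["Lawyer", "Labour Lawyer", "Teacher, Farmer", ""]
shares = word_shares(sample)
for word, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{word}: {share:.0%}")
```

Counting by word rather than by full occupation is what lets “Lawyer” and “Labour Lawyer” land in the same bucket, at the cost of losing the distinction between them.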
Elections Canada publishes poll-by-poll election results for every federal riding in CSV format. They also publish map data at the riding and national level. This makes it fairly easy to create a map using QGIS or another GIS tool. Doug has made his share of this kind of map, and you can find them all over the internet on news and political sites.
Probably the best one out there that uses a traditional geographic representation is http://www.election-atlas.ca/. They have comprehensive results of federal elections and by-elections going back to 1896, as well as referendums and provincial elections going back over a hundred years.
One of the challenges in visualizing election results in Canada is the way the population is clustered. Canada is a big, sparsely populated country with a few more densely-populated areas. Using the map of Canada and riding boundaries gives a pretty misleading picture. One single federal riding (Nunavut) is more than 20% of the whole country by area. The three northernmost ridings are roughly 40% of the area of the country.
A way to address this challenge is to use a cartogram giving each federal riding equal size. A few people have done some work on this. Luke Andrews built and open sourced a website using a hexagon grid. You can check out the website at https://electoralcartogram.ca and see the source at https://github.com/attaboy/electoralcartogram. Dewey Dunnington built a different visualization in R using the geogrid package: https://fishandwhistle.net/post/2019/canada-ridings-hex/. The one Doug ended up using is the R package mapcan, which was used to build these visualizations showing riding-by-riding change in vote share for each party.
To build these visualizations, Doug needed to get all of the riding-by-riding results into a single data frame, which brings us to another challenge of dealing with Canadian election data. Canada does not have a single election for Prime Minister or other key federal officials. It has 338 individual elections for MP, and the party that wins a majority of seats gets to govern (or a plurality in the case of a minority government). Elections Canada publishes each riding’s result in a separate CSV file. To support complex queries down to the level of individual polls, Doug built an SSIS package that imports each riding’s poll-by-poll result into a relational database. You can check out the SSIS package as well as get a copy of the complete database and supporting queries here: https://github.com/dsartori/CanadianElectionResults
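Doug did this step with SSIS, and the repository above has the real package. If you just want the general idea of stacking one-file-per-riding results into a single table, here is a minimal sketch in Python instead. The column names (poll, candidate, votes) are placeholders, not the actual Elections Canada headers, and the two in-memory "files" are made-up stand-ins.

```python
import csv
import io


def combine_riding_results(sources):
    """Stack per-riding CSVs into one list of rows, tagging each row with its riding.

    sources: iterable of (riding_name, file-like object) pairs.
    """
    combined = []
    for riding, handle in sources:
        for record in csv.DictReader(handle):
            record["riding"] = riding  # remember which file the row came from
            combined.append(record)
    return combined


# Tiny in-memory stand-ins for two riding files (hypothetical columns and values).
riding_a = io.StringIO("poll,candidate,votes\n1,Candidate X,51\n1,Candidate Y,49\n")
riding_b = io.StringIO("poll,candidate,votes\n1,Candidate X,70\n")

combined = combine_riding_results([("Riding A", riding_a), ("Riding B", riding_b)])
print(len(combined), "rows from 2 ridings")
```

With real files you would pass open file handles instead of StringIO objects, and you would want to check that every file really does share the same header before stacking them.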
Relational databases atomize data into the smallest possible related chunks and connect them to each other via relationships. This is useful when you frequently update a database and want to make sure it stays consistent. It’s also handy in a case like this when you don’t know ahead of time all the ways that you might like to query your database. You can write queries (sometimes complex ones) to get at any arrangement of data that you like, with confidence that your results will be consistent and accurate. This is an entity-relationship diagram of the Canadian Election Results database:
One important thing to note is that poll identifiers are not stable between elections. Polls change shape and the poll number is not guaranteed to be the same from one election to the next. If you want to analyze riding results from one election to the next you need to use maps and do it visually, or transpose poll results onto a more stable geography like census districts (within a single census), municipal polls, or arbitrary neighbourhood boundaries. Riding boundaries and names can also change between elections although this is less common. The database design reflects these challenges: polls and electoral districts are unique for each election year.
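To make "unique for each election year" concrete, here is a minimal sketch using SQLite from Python. The table and column names are assumptions for illustration and are not the schema of the Canadian Election Results database; the rows are made up. The point is only that the election year is part of every key, so poll 12 in one election and poll 12 in the next are different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE electoral_district (
        election_year   INTEGER NOT NULL,
        district_number INTEGER NOT NULL,
        name            TEXT NOT NULL,
        PRIMARY KEY (election_year, district_number)
    );
    CREATE TABLE poll (
        election_year   INTEGER NOT NULL,
        district_number INTEGER NOT NULL,
        poll_number     TEXT NOT NULL,
        electors        INTEGER,
        PRIMARY KEY (election_year, district_number, poll_number),
        FOREIGN KEY (election_year, district_number)
            REFERENCES electoral_district (election_year, district_number)
    );
    """
)

# Made-up rows: the same district and poll number in two different elections.
conn.executemany(
    "INSERT INTO electoral_district VALUES (?, ?, ?)",
    [(2015, 1, "Example Riding"), (2019, 1, "Example Riding")],
)
conn.executemany(
    "INSERT INTO poll VALUES (?, ?, ?, ?)",
    [(2015, 1, "12", 400), (2019, 1, "12", 380)],
)

rows = conn.execute(
    "SELECT election_year, electors FROM poll WHERE poll_number = '12' ORDER BY election_year"
).fetchall()
print(rows)  # two separate polls that happen to share a number

# Within a single election, the same poll cannot be entered twice.
try:
    conn.execute("INSERT INTO poll VALUES (2019, 1, '12', 999)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```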
This structure allows you to generate summary data and visualizations for the whole country, visualize individual ridings by poll, and to compare riding results at the poll level. These queries can help you turn up interesting spots for further study, like these 12 ridings where the candidate who won the most polls did not win the seat:
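A query of this kind can be sketched roughly as follows. This is not Doug's query (his supporting queries are in the repository linked earlier). It is a minimal illustration against a hypothetical single-table layout, poll_result, with made-up numbers, and it ignores ties. It finds the winner of each poll, counts polls won per candidate, and compares the candidate with the most polls to the candidate with the most votes overall.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified layout: one row per candidate per poll.
conn.execute(
    "CREATE TABLE poll_result (riding TEXT, poll TEXT, candidate TEXT, votes INTEGER)"
)
# Made-up results, NOT real election data.
conn.executemany(
    "INSERT INTO poll_result VALUES (?, ?, ?, ?)",
    [
        ("Riding A", "1", "X", 51), ("Riding A", "1", "Y", 49),
        ("Riding A", "2", "X", 51), ("Riding A", "2", "Y", 49),
        ("Riding A", "3", "X", 10), ("Riding A", "3", "Y", 90),
        ("Riding B", "1", "X", 60), ("Riding B", "1", "Y", 40),
        ("Riding B", "2", "X", 55), ("Riding B", "2", "Y", 45),
    ],
)

QUERY = """
WITH poll_max AS (          -- highest vote count in each poll
    SELECT riding, poll, MAX(votes) AS top_votes
    FROM poll_result GROUP BY riding, poll
),
poll_winner AS (            -- candidate holding that count
    SELECT r.riding, r.poll, r.candidate
    FROM poll_result AS r
    JOIN poll_max AS m
      ON r.riding = m.riding AND r.poll = m.poll AND r.votes = m.top_votes
),
polls_won AS (              -- polls won per candidate
    SELECT riding, candidate, COUNT(*) AS polls
    FROM poll_winner GROUP BY riding, candidate
),
vote_total AS (             -- total votes per candidate
    SELECT riding, candidate, SUM(votes) AS votes
    FROM poll_result GROUP BY riding, candidate
),
most_polls AS (
    SELECT p.riding, p.candidate FROM polls_won AS p
    WHERE p.polls = (SELECT MAX(q.polls) FROM polls_won AS q WHERE q.riding = p.riding)
),
seat_winner AS (
    SELECT v.riding, v.candidate FROM vote_total AS v
    WHERE v.votes = (SELECT MAX(w.votes) FROM vote_total AS w WHERE w.riding = v.riding)
)
SELECT m.riding, m.candidate AS most_polls, s.candidate AS seat_winner
FROM most_polls AS m
JOIN seat_winner AS s ON m.riding = s.riding
WHERE m.candidate <> s.candidate
ORDER BY m.riding;
"""

mismatches = conn.execute(QUERY).fetchall()
print(mismatches)  # [('Riding A', 'X', 'Y')]
```

In the made-up data, X narrowly wins two of Riding A's three polls, but Y wins the third by a wide margin and takes the seat on total votes.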
What is interesting about this list is that all but 3 of the ridings are rural ridings with low population density. In urban and suburban areas, voter turnout and party support tend to cluster in neighbourhoods, and this effect is amplified by typical campaign tactics, which focus on areas of strongest support first. You can see the reason campaigns tend to do this by calculating the probability of encountering a voter who will support a particular party in a specific poll: divide the number of votes for the party by the total number of electors in the poll. This gives you the rough probability of earning a vote at each door.
Doug wrote a query to assess the value of every poll in the country for each party. The first thing you notice in looking at these results is that there are a fair number of polls with a total turnout higher than 1. These are polls where the number of expected electors was lower than the number who voted. Most of them are mobile polls, although some are not listed as mobile polls. There is some noise in Elections Canada’s data at the poll level: putting aside polls that report 0 electors, fewer than 1% of polls report a total number of electors that is lower than the total of votes cast.
Putting these anomalies aside, the most “valuable” poll for any party, based on the results of the 2019 election, was Pinsent Arm in Labrador, which had a value of 87%: of 39 eligible voters, 35 cast a vote and 34 of them voted for the Liberal candidate. In 2019, for every 10 doors a campaign knocked on in Pinsent Arm, almost 9 belonged to someone who went on to vote Liberal.
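The arithmetic is simple enough to sketch in a few lines of Python. The function names here (poll_value, turnout, is_anomalous) are made up for this example and are not taken from Doug's query; the Pinsent Arm numbers are the ones quoted above, and the second poll is invented to show the anomaly check.

```python
def poll_value(party_votes, electors):
    """Rough chance that a door in this poll leads to a vote for the party."""
    if electors <= 0:
        return None  # polls reporting 0 electors cannot be scored
    return party_votes / electors


def turnout(total_votes, electors):
    """Share of listed electors who voted, or None if no electors are listed."""
    if electors <= 0:
        return None
    return total_votes / electors


def is_anomalous(total_votes, electors):
    """Flag polls where more votes were cast than electors were listed."""
    rate = turnout(total_votes, electors)
    return rate is not None and rate > 1


# Pinsent Arm, 2019: 39 eligible voters, 35 votes cast, 34 of them Liberal.
value = poll_value(34, 39)
print(f"value: {value:.0%}")             # value: 87%
print(f"turnout: {turnout(35, 39):.0%}")  # turnout: 90%

# An invented poll with more votes than listed electors gets flagged.
print(is_anomalous(total_votes=120, electors=100))  # True
```

Filtering out the flagged polls (and the ones with no electors) before ranking is what keeps a noisy mobile poll from showing up as the most valuable door-knocking target in the country.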
What you Missed From Stats Can
In 2019, the total fertility rate (TFR), or the number of children that a woman would have over the course of her reproductive life, declined to 1.47 births per woman from 3.94 in 1959. Canada’s TFR has been below the replacement rate of 2.1 births per woman since 1971, meaning that the number of babies being born is not enough for the current population to replace itself.
Over the last six decades, the average age of first-time mothers increased from 23.2 years in 1959 to 29.4 years in 2019. This trend, common in other countries including the United States, coincides with increased participation for women aged 25 to 54 years in the workforce and a rise in university-educated women. According to data from the Labour Force Survey, the percentage of women in the workforce increased from 22% in 1950 to 84% in 2019, and the proportion of women with a university degree nearly tripled from 14% in 1990 to 40% in 2019.
Data is also available that breaks out births by time of year, marital status, and characteristics of mother and child (mother’s age and birth weight, for example).
Lumber production decreased 7.4% from June to 4 491.1 thousand cubic metres in July. Production was 1.5% higher than in July 2019.
Sawmills shipped 4 553.0 thousand cubic metres of lumber in July, down 4.2% from June and down 0.7% from July 2019.
- I know nothing of the lumber industry but it was hard to get some wood for home improvement projects this summer.
- Also, the report is just titled Sawmills July 2020 and it kind of made me laugh. Obviously this is a monthly report.
The estimated gross domestic product (GDP) at market prices for underground economic activity in Canada reached $61.2 billion, or 2.7% of total GDP, in 2018.
The underground economy decreased 0.8% in real terms in 2018 on a year-over-year basis, compared with a year-over-year growth of 1.9% in 2017.
Four industries accounted for more than half of underground economic activity: residential construction (26.2%), retail trade (12.3%), finance, insurance, real estate, rental and leasing and holding companies (10.3%), and accommodation and food services (9.1%).
Wages that are not accounted for in payroll records and tips on undeclared transactions were $26.0 billion, equivalent to 2.3% of official compensation of employees.
In 2018, underground economic activity associated with household final consumption expenditure was $40.4 billion, which accounted for 66.0% of the underground economy. Of the household final consumption expenditure in the underground economy, 29.2% was associated with household purchases of alcoholic beverages, tobacco and cannabis, 20.6% was spent on food, beverage and accommodation services, and a further 17.2% was used on housing, water, electricity, gas and other fuels. The household-purchased alcoholic beverages, tobacco and cannabis from the underground economy accounted for 25.1% of the official total economy household expenditure. The food, beverage and accommodation services accounted for 9.3%.
That’s it! I hope you enjoyed this month’s episode and blog post.