In this post we have a grab-bag of election-related topics to go through. The first two deal with data on the results of the 2021 election. The third is about Indigenous representation in the House of Commons. And the last is on methods for comparing poll-by-poll voting data between elections.
Combining Election and Census Data
Frazier took the riding-by-riding election results and compared them to the 2016 Census. The ridings were sorted by party, and each party's vote percentage was compared to the median income of the riding, both for all of its candidates and for the candidates who won.






There are two big takeaways from this data. First, of the big three national parties, only the Conservatives have a significant positive trendline between the median income of a riding and their vote percentage. Both the Liberals and the NDP have flat or slightly declining trendlines.
The second takeaway is each party's winning floor. The Liberals' lowest winning share was just over 31% of the vote, in Windsor-Tecumseh, the lowest of any current Liberal MP. The NDP had the lowest vote percentage of any winning candidate, at 28% in Nanaimo-Ladysmith in BC, where the Conservatives, Greens, and NDP all took over 25%. For the most part, the NDP's wins cluster above 40%, which makes it hard for the party to win at scale. Finally, the Conservatives' lowest winning percentage was 36%, with the clear majority of their wins coming in at over 40%. All of this illustrates the vote efficiency of the major parties.
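Both takeaways rest on simple calculations: the slope of a trendline (positive, flat, or declining) and the smallest vote share among a party's winning candidates. Here is a minimal Python sketch, assuming the trendlines are ordinary least-squares fits (the charts don't say how they were drawn). The function names and all riding names and numbers are hypothetical, not the census or election data.

```python
# Hypothetical sketch: a least-squares trendline slope and each party's
# "winning floor". All riding names and numbers below are made up.

def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through the (x, y) points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    return sxy / sxx


def winning_floors(results):
    """results: iterable of (riding, party, vote_share_pct, won) tuples.

    Returns {party: (lowest winning vote share, riding)}.
    """
    floors = {}
    for riding, party, share, won in results:
        if won and (party not in floors or share < floors[party][0]):
            floors[party] = (share, riding)
    return floors


# Median riding income ($ thousands) vs. one party's vote share (%).
incomes = [55, 65, 75, 85]
vote_share = [30, 35, 40, 45]
print(ols_slope(incomes, vote_share))  # 0.5 points per extra $1,000

sample = [
    ("Riding A", "Party X", 31.5, True),
    ("Riding B", "Party X", 48.0, True),
    ("Riding C", "Party Y", 28.0, True),
    ("Riding C", "Party X", 27.5, False),
]
print(winning_floors(sample))
# {'Party X': (31.5, 'Riding A'), 'Party Y': (28.0, 'Riding C')}
```

A positive slope corresponds to the Conservatives' pattern, and a slope near zero or below to the Liberal and NDP patterns. A party whose floor sits well below its typical winning share wins many close races efficiently.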
One of the dominant issues of the election was housing affordability. All parties had major platform planks focused on affordability, from increasing supply to shaping demand. The next question was how the percentage of residents in each riding who were housing insecure in 2016 lined up with the eventual vote for each big-three party's winning candidates.



If we assume that housing affordability has gotten worse since 2016 (surprise: it has), the positive trendline between Liberal and NDP votes and housing insecurity makes sense, since it reflects urban areas where housing was, and continues to be, most affected. If you squint pretty hard, you could see this as a rough predictor of how those housing policies landed. It is possible that the Conservatives' poorer performance in urban areas was partly because their housing policy failed to connect with voters there. Liberal and NDP candidates and MPs have been talking about these issues for some time. By contrast, the Conservatives' rural base has only seen housing prices rise in the past year or two and may not have been as motivated by the issue.
Differences in Geographic Voting Patterns: 2019 to 2021
The 2021 election is often described as a “Groundhog Day” election where nothing changed and the parties fought each other to a draw. There’s some truth to that: the seat counts hardly changed and the parties’ popular vote percentages didn’t move much. That said, for those paying attention, the election was very different from 2019. The issues were different, the Conservative Party ran to the left of its 2019 campaign, and there were surprises like the uproar the English-language debate caused in Quebec. So, did anything change?
Instead of looking at the total number of seats won or lost and the popular vote, we can look at the per-riding changes from 2019 to 2021. We built a tool (https://observablehq.com/@johnhaldeman/vote-differences-canadian-federal-election-2019-to-2021) to help you visualize and investigate these changes between the two elections. Let’s start with the Liberals:




The Liberals weakened in some parts of Ontario, Quebec, and the Maritimes, but strengthened in Alberta and BC. For the Conservatives, the geographic shifts are even starker, with a large number of votes migrating east and west out of Alberta and Saskatchewan:

The Conservatives running toward the center did indeed have an effect, just not enough to push them over the edge.
Our tool also lets you look at the top 20 ridings by vote percentage gains and losses. Here’s the NDP, showing a large east/west split in gains and losses:

And this picture shows what the Green collapse looked like (except for the anomaly in Kitchener Centre, which was affected by the Liberal candidate dropping out):

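The per-riding view comes down to subtracting each party's 2019 vote share from its 2021 share in the same riding and then sorting. This is straightforward here because both elections used the same riding boundaries. Below is a minimal Python sketch of that calculation, not the tool's actual code. The function names, riding names, and numbers are hypothetical.

```python
def vote_share_changes(share_2019, share_2021):
    """Percentage-point change per riding for one party.

    Both arguments map riding name -> vote share (%). Only ridings present
    in both elections are compared.
    """
    common = share_2019.keys() & share_2021.keys()
    return {riding: share_2021[riding] - share_2019[riding] for riding in common}


def top_changes(changes, n=20):
    """Return (top n gains, top n losses) as lists of (riding, change)."""
    gains = sorted((item for item in changes.items() if item[1] > 0),
                   key=lambda item: item[1], reverse=True)[:n]
    losses = sorted((item for item in changes.items() if item[1] < 0),
                    key=lambda item: item[1])[:n]
    return gains, losses


share_2019 = {"Riding A": 40.0, "Riding B": 35.5, "Riding C": 20.0}
share_2021 = {"Riding A": 42.5, "Riding B": 30.0, "Riding C": 20.0}
gains, losses = top_changes(vote_share_changes(share_2019, share_2021))
print(gains)   # [('Riding A', 2.5)]
print(losses)  # [('Riding B', -5.5)]
```

Measuring in percentage points rather than raw votes keeps ridings with different turnout comparable.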
Indigenous MPs Since 1871
September 30th was the National Day for Truth and Reconciliation, and in light of this, Katie explored data from the Parliament of Canada Information Service on the Indigenous origins of Members of Parliament since 1871. Indigenous suffrage has been a hard-fought battle, with many Indigenous people disenfranchised throughout Canada’s history. Inuit gained the right to vote in federal elections in 1950, and status First Nations people only gained it in 1960. They have continued to face barriers to exercising that right. It’s good to keep this in mind when looking at the data.
There have been 48 MPs of Indigenous origin since 1871: 19 First Nations, 8 Inuit, and 21 Métis. Up until 1963, only MPs of Métis origin had been elected. The first First Nations MP was Len Marchand, elected in 1968, and the first Inuit MP was Peter Ittinuar, elected in 1979.

In which provinces and territories have Indigenous MPs been elected? Manitoba has had the most, with 12 MPs of Indigenous origin elected there since 1871. It is followed by the Northwest Territories with 8, and BC, Quebec, and Saskatchewan are tied at 5. Yukon, New Brunswick, and Prince Edward Island have yet to elect an MP of Indigenous origin. Note that the provincial and territorial totals add up to 49 instead of 48 MPs. One MP was elected first in the Northwest Territories and then in Nunavut after it was created from part of the Northwest Territories in 1999, so she is counted in both.

And what about political affiliation? The Liberal Party has had the most Indigenous MPs by a large margin. It has elected 26 since 1871, compared to only 14 from all conservative parties (there have been a lot over the years!), 9 from the NDP, 4 Independents, and 2 from the Bloc Québécois. Again, these numbers don’t add up to 48, because some MPs were elected more than once under different parties and some crossed the floor while in office.
Of the 4,573 MPs who have ever been elected, just 48 have been Indigenous. They have made up only 1.05% of MPs, while Indigenous people currently make up 4.9% of the population. Considering this, Canada has a long way to go to increase Indigenous representation in Parliament.
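The 1.05% figure follows directly from the counts in that paragraph. A quick check in Python:

```python
indigenous_mps = 48
total_mps = 4573
share_pct = 100 * indigenous_mps / total_mps
print(f"{share_pct:.2f}%")  # 1.05%
```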
Updating the Canadian Election Results Database
Last year, Doug built a relational database of Canadian election results using Microsoft SQL Server. This year, while waiting for the poll-by-poll results of the 2021 federal election, he added geospatial data for each poll in the 2015 and 2019 elections. Besides supporting map-based applications of the database, this change lets users compare poll-by-poll data geographically.
The GIS data for Canadian electoral districts and the polls within them is available on the Canadian government’s open data website: 2015 and 2019. Unfortunately, the site does not supply the data in a format that is easy to use with SQL Server. Using the handy open-source GIS software QGIS, the data can be exported to a CSV file with the geography stored as WKT (“Well-Known Text”), which SQL Server’s spatial functions can process directly. The table for the 2019 data looks like this:
CREATE TABLE [dbo].[pd_2019](
[WKT] [varchar](max) NULL,
[PDNUMSFX] [nvarchar](255) NULL,
[PDTYPE] [nvarchar](255) NULL,
[FED_NUM] [nvarchar](255) NULL,
[ADVPOLLNUM] [nvarchar](255) NULL,
[PD_NUM] [nvarchar](255) NULL,
[PDNUMSFXCO] [nvarchar](255) NULL,
[CIVICNUM] [nvarchar](255) NULL,
[CIVICNUMSF] [nvarchar](255) NULL,
[STREETNAME] [nvarchar](255) NULL,
[STREETTYPE] [nvarchar](255) NULL,
[STREETDIR] [nvarchar](255) NULL,
[BLDGNAMEEN] [nvarchar](255) NULL,
[BLDGNAMEFR] [nvarchar](255) NULL,
[POLLNAME] [nvarchar](255) NULL
)
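WKT is plain text, which is why the CSV route works: a polygon is just one or more rings of x/y coordinate pairs. The Python parser below is a minimal sketch that shows what STGeomFromText() has to read in the simplest case. It assumes 2D POLYGON text only. Real boundary files can also contain MULTIPOLYGON values, which this sketch rejects. The function name is hypothetical.

```python
import re


def parse_wkt_polygon(wkt):
    """Parse 2D WKT POLYGON text into a list of rings of (x, y) floats.

    The first ring is the outer boundary; any others are holes.
    MULTIPOLYGON, Z/M coordinates and EMPTY geometries raise ValueError.
    """
    match = re.fullmatch(r"POLYGON\s*\(\s*(.*)\s*\)", wkt.strip(),
                         flags=re.IGNORECASE | re.DOTALL)
    if not match:
        raise ValueError("not a 2D WKT POLYGON")
    rings = []
    for ring_text in re.findall(r"\(([^()]*)\)", match.group(1)):
        ring = []
        for pair in ring_text.split(","):
            coords = pair.split()
            if len(coords) != 2:
                raise ValueError(f"expected an x y pair, got {pair!r}")
            ring.append((float(coords[0]), float(coords[1])))
        rings.append(ring)
    if not rings:
        raise ValueError("polygon has no rings")
    return rings


rings = parse_wkt_polygon("POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))")
print(rings[0][:2])  # [(0.0, 0.0), (4.0, 0.0)]
```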
The next step is importing the 2015 and 2019 poll boundary CSV files into SQL Server as interim tables, with the WKT data stored as text. Once this is done and the Poll table has been extended with a new “geo” column for spatial data, a simple update query uses a spatial function to interpret the WKT data and add each poll’s boundary:
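-- Convert each poll's WKT boundary text into a geometry value (SRID 0)
UPDATE p SET geo = GEOMETRY::STGeomFromText(pd.WKT, 0)
FROM Poll p
-- Only 2019 electoral districts
INNER JOIN ED e ON e.EDID = p.EDID AND e.ElectionYear = 2019
-- Non-numeric poll numbers are assumed to carry a one-character suffix
-- (e.g. 12A); strip it so they match the base PD_NUM in the boundary table
INNER JOIN pd_2019 pd ON CASE WHEN ISNUMERIC(p.PollNumber) = 1 THEN p.PollNumber
                              ELSE LEFT(p.PollNumber, LEN(p.PollNumber) - 1)
                         END = pd.PD_NUM
                     AND pd.FED_NUM = e.ECNumber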
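The CASE expression in that join is what lines up a poll number like 12A with polling division 12 in the boundary file. The Python sketch below follows the same matching rule. Like the query, it assumes that any non-numeric poll number carries a one-character suffix. Python's str.isdigit() is stricter than SQL Server's ISNUMERIC(), which also accepts strings such as '1e5' or '$'. The function names and sample boundary are hypothetical.

```python
def base_poll_number(poll_number):
    """Mirror the query's CASE: keep all-digit poll numbers, otherwise
    drop the final character (assumed to be a suffix like the 'A' in '12A')."""
    if poll_number.isdigit():
        return poll_number
    return poll_number[:-1]


def match_boundaries(polls, boundaries):
    """polls: iterable of (fed_num, poll_number) pairs.
    boundaries: {(fed_num, pd_num): wkt}, e.g. built from the pd_2019 table.
    Returns {(fed_num, poll_number): wkt}; unmatched polls are skipped,
    just as the INNER JOIN drops them."""
    matched = {}
    for fed_num, poll_number in polls:
        key = (fed_num, base_poll_number(poll_number))
        if key in boundaries:
            matched[(fed_num, poll_number)] = boundaries[key]
    return matched


boundaries = {("35117", "12"): "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"}
polls = [("35117", "12A"), ("35117", "12B"), ("35117", "13")]
print(sorted(match_boundaries(polls, boundaries)))
# [('35117', '12A'), ('35117', '12B')]
```

Polls that share a base number (12A and 12B here) both receive the same boundary, which is also what the UPDATE does.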
As we mentioned last year, it’s currently difficult to compare results at the poll level across elections because Elections Canada does not guarantee that polls will stay the same shape or have the same number from one election to another:
One important thing to note is that poll identifiers are not stable between elections. Polls change shape and the poll number is not guaranteed to be the same from one election to the next. If you want to analyze riding results from one election to the next you need to use maps and do it visually, or transpose poll results onto a more stable geography like census districts (within a single census), municipal polls, or arbitrary neighbourhood boundaries. Riding boundaries and names can also change between elections although this is less common. The database design reflects these challenges: polls and electoral districts are unique for each election year.
Unstable identifiers are a real problem for data analysis. Loading geographic data into the SQL database lets us use the spatial functionality of Microsoft SQL Server to identify polls that match the boundaries of a poll in a previous election. Working with the data, we found that poll boundaries are often static between elections. A prototype query that identifies geographically similar polls between the 2015 and 2019 elections finds that most polls are more than 90% coterminous with a poll from the previous election.
In a riding where poll boundaries stayed mostly the same but poll numbers changed, this query is all you need to map polling division numbers across elections:
declare @ECNUMBER varchar(10) = '35117'  -- Elections Canada riding number
print 'processing ' + @ECNUMBER
select
  e.ECNameEn,
  f.pollNumber as [2015 poll],
  n.pollNumber as [2019 poll],
  -- share of the 2015 poll's area that lies inside the 2019 poll
  CASE WHEN f.geo.STArea() > 0 THEN
    round((f.geo.STIntersection(n.geo).STArea() / f.geo.STArea()),4)
  ELSE 0 END as [overlap],
  c2015.candidatelastname as [2015 Candidate],
  r2015.votes as [2015 votes],
  c2019.candidatelastname as [2019 Candidate],
  r2019.votes as [2019 votes],
  r2019.votes - r2015.votes as [change]
from poll f  -- f = 2015 polls
inner join poll n on f.geo.STIntersects(n.geo) = 1  -- n = 2019 polls touching f
inner join pollresult r2015 on f.pollID = r2015.pollID
inner join pollresult r2019 on n.pollID = r2019.pollID
inner join candidate c2015 on r2015.candidateid = c2015.candidateid
-- pair each 2015 candidate with the 2019 candidate from the same party
inner join candidate c2019 on r2019.candidateid = c2019.candidateid and c2019.partyid = c2015.partyid
left join ed e on e.edid = f.edid
where
  f.edid = (select edid from ed where electionyear = 2015 and ecnumber = @ECNUMBER)
  and n.edid = (select edid from ed where electionyear = 2019 and ecnumber = @ECNUMBER)
  -- drop slivers: keep pairs where more than 20% of the 2015 poll overlaps;
  -- NULLIF avoids a divide-by-zero error for polls with no area
  and round(f.geo.STIntersection(n.geo).STArea() / NULLIF(f.geo.STArea(), 0), 4) > 0.2
order by e.edid, f.pollNumber
There’s a lot going on here to line up the different tables we need, but the key column in the query is this one:
round((f.geo.STIntersection(n.geo).STArea() / f.geo.STArea()),4)
It first uses STIntersection() to find the polygon where a 2015 poll (f.geo) and a 2019 poll in the same riding (n.geo) overlap. It then uses the STArea() spatial method to divide the area of that intersection by the total area of the 2015 poll, giving the degree of overlap. If the ratio is greater than about 0.95 (95%), the poll boundaries are likely not substantially different.
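To make the ratio concrete outside SQL Server, here is a small Python sketch of the same calculation: intersect the two polls, then divide the intersection's area by the 2015 poll's area. It uses Sutherland-Hodgman clipping, which requires a convex clipping polygon, and the shoelace formula for area. It therefore assumes the 2019 poll is convex and listed counter-clockwise. STIntersection() and STArea() handle arbitrary shapes, so treat this as an illustration rather than a replacement. The function names are hypothetical.

```python
def ring_area(ring):
    """Unsigned area of a simple polygon given as a list of (x, y) vertices
    (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_to_convex(subject, clipper):
    """Sutherland-Hodgman clipping: the part of `subject` inside `clipper`.
    `clipper` must be convex with counter-clockwise vertex order."""

    def inside(p, a, b):
        # True if p is on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def crossing(p, q, a, b):
        # Point where segment p -> q crosses the line through a and b.
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        current, output = output, []
        prev = current[-1]
        for point in current:
            if inside(point, a, b):
                if not inside(prev, a, b):
                    output.append(crossing(prev, point, a, b))
                output.append(point)
            elif inside(prev, a, b):
                output.append(crossing(prev, point, a, b))
            prev = point
    return output


def overlap_ratio(poll_2015, poll_2019):
    """Share of the 2015 poll's area that lies inside the 2019 poll,
    rounded to 4 places like the SQL query; 0 for a zero-area poll."""
    area_2015 = ring_area(poll_2015)
    if area_2015 == 0:
        return 0.0
    return round(ring_area(clip_to_convex(poll_2015, poll_2019)) / area_2015, 4)


old = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(overlap_ratio(old, [(2, 0), (6, 0), (6, 4), (2, 4)]))  # 0.5
print(overlap_ratio(old, old))  # 1.0
```

Because the ratio is taken against the 2015 poll only, a small 2015 poll sitting entirely inside a large 2019 poll also scores 1.0. That is why a single high score does not necessarily mean the boundaries are identical.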
Running this query for my home riding of Windsor West, I found that all polls were represented in the result set and this provides an easy way to compare results across elections. Here’s a small sample of the output for Windsor West:

Some ridings have pretty convoluted poll boundaries, like these in the riding of Long Range Mountains in Newfoundland.

Complicated polygons can cause problems for SQL Server’s spatial functions when calculating the degree of overlap between polls. To get around this, we can use the Reduce() method that SQL Server provides, which simplifies a shape while keeping its overall outline and area roughly the same. Wherever the geo column appears in the query, apply Reduce() to it. The method takes one argument: the error tolerance, in the units of the data’s coordinate system (meters for this data).
round((f.geo.Reduce(1).STIntersection(n.geo.Reduce(1)).STArea() / f.geo.Reduce(1).STArea()),4)
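SQL Server's documentation describes Reduce() as running an extension of the Douglas-Peucker algorithm with the given tolerance. The classic algorithm is short enough to sketch. It keeps the endpoints of a line and finds the vertex farthest from the straight line between them. It recurses only if that vertex is farther away than the tolerance. This Python version works on an open polyline and illustrates the idea; it is not SQL Server's implementation. The tolerance is in the same units as the coordinates, and the function names are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping its endpoints and any vertex that lies
    farther than `tolerance` from the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # avoid repeating the split vertex


line = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (4, 5)]
print(douglas_peucker(line, 1))  # [(0, 0), (3, 0), (4, 5)]
```

The small wiggles near the x-axis fall within the tolerance and are dropped, while the sharp turn at (3, 0) survives. This is the same trade-off Reduce(1) makes on a convoluted poll boundary.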
There are still a few wrinkles to iron out. Sometimes Elections Canada returning officers split a poll for administrative reasons, so a poll in one election maps to two smaller polls in the other. Those polls appear twice in the result set. And sometimes poll boundaries change by a significant margin.
Even when poll or riding boundaries change, you can use techniques like the one described in the paper Logical and Physical Design of Spatial Non-Strict Hierarchies in Relational Spatial Data Warehouse for relating and comparing overlapping geographies: use the degree of overlap as a weighting factor when distributing votes among polls.
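The distribution idea can be sketched in a few lines of Python. This is plain areal weighting, which assumes votes are spread evenly across each old poll's area; real populations are not. It is not necessarily the exact method in the cited paper. The overlap shares play the role of the ratio computed by the SQL query (intersection area over the older poll's area). For votes to be conserved, each old poll's shares should sum to 1. The poll IDs and function name are hypothetical.

```python
from collections import defaultdict


def redistribute_votes(votes_by_old_poll, overlaps):
    """Areal-weighting sketch.

    votes_by_old_poll: {old_poll: votes}
    overlaps: {(old_poll, new_poll): share of old_poll's area inside new_poll}
    Returns {new_poll: estimated votes}.
    """
    estimated = defaultdict(float)
    for (old_poll, new_poll), share in overlaps.items():
        estimated[new_poll] += votes_by_old_poll.get(old_poll, 0) * share
    return dict(estimated)


votes_2015 = {"101": 200, "102": 100}
overlaps = {("101", "201"): 0.5, ("101", "202"): 0.5, ("102", "202"): 1.0}
print(redistribute_votes(votes_2015, overlaps))
# {'201': 100.0, '202': 200.0}
```

Here old poll 101 was split in half between new polls 201 and 202, and old poll 102 fell entirely inside 202. This is the kind of split-poll case that shows up twice in the comparison query's result set.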
Later this month Doug will be doing a deep-dive tech talk on all this, so be sure to watch our YouTube channel for updates. To get a CSV file of the results of a 2015 to 2019 poll comparison, click here.