Forecast of the 2022 Midterm Elections

Finally, the prediction

Hello everyone! My name is Ethan Kelly & as of this writing, I am a sophomore living in Leverett House, studying Government & Computer Science. In this final blog post, we will be exploring the 2022 midterm elections & my model for the congressional elections. This model is the product of an entire semester spent studying the various factors that shape election outcomes, & I am excited to finalize my prediction just one day before the elections.

To begin, this model is based on forecast ratings from 3 major political forecasters: The Cook Political Report, Larry Sabato’s Crystal Ball, and Inside Elections. Each of these forecasters weighs a wide range of variables in making its predictions (inflation, presidential approval, the generic ballot, previous voting history, turnout expectations, and GDP growth, among many, many others) and condenses them into a system of ratings running from Safe Democratic to Safe Republican. My model translates these ratings into a 7-point scale, broken down as follows:

1 - Safe Democratic victory
2 - Likely Democratic victory
3 - Lean Democratic victory
4 - Pure toss-up
5 - Lean Republican victory
6 - Likely Republican victory
7 - Safe Republican victory

  • One exception to this 7-point scale was Inside Elections’ use of Tilt ratings (Tilt Democratic was coded as 3.5 & Tilt Republican as 4.5).
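The translation above can be sketched as a simple lookup table. This is a minimal sketch only: the label spellings (e.g. "Lean D", "Tilt R") and the `rating_to_score` helper are hypothetical, since each forecaster words its categories slightly differently and the original code is not shown in this post.

```python
# Hypothetical label spellings; each forecaster words its categories a little differently.
RATING_SCALE = {
    "Safe D": 1.0,
    "Likely D": 2.0,
    "Lean D": 3.0,
    "Tilt D": 3.5,  # Inside Elections only
    "Toss-up": 4.0,
    "Tilt R": 4.5,  # Inside Elections only
    "Lean R": 5.0,
    "Likely R": 6.0,
    "Safe R": 7.0,
}


def rating_to_score(label: str) -> float:
    """Convert a rating label to its numeric score (raises KeyError for unknown labels)."""
    return RATING_SCALE[label.strip()]
```

Raising on an unknown label, rather than silently skipping it, makes it easy to spot a spelling variant that still needs to be added to the table.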

The expert ratings used here come from the past decade, 2012-2020. This window was chosen for the consistency of the congressional districts over that period, while also reflecting the more recent trend toward hyper-partisanship across the nation. Additionally, these forecasters have made their ratings far more publicly available in recent years than in the past. Essentially, the data only goes so far back. My primary method of acquiring this data was scraping the websites of the 3 major forecasters, which required digging through the Wayback Machine (a website time traveler). I also received data directly from a senior analyst at Larry Sabato’s Crystal Ball: I contacted J. Miles Coleman via Twitter & received the Crystal Ball data from 2012-2020.
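For readers curious about the Wayback Machine side, a sketch of locating an archived ratings page might look like the following. It assumes the Internet Archive’s public availability API (`https://archive.org/wayback/available`), which returns the snapshot closest to a given `YYYYMMDD` timestamp. The helper names are hypothetical, this is not necessarily how the ratings for this model were scraped, and the archived page itself would still need to be parsed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url: str, timestamp: str) -> str:
    """Build an availability-API query for the snapshot closest to timestamp (YYYYMMDD)."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url, 'timestamp': timestamp})}"


def closest_snapshot_url(response: dict):
    """Pull the closest archived snapshot URL out of an API response, or return None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def fetch_closest_snapshot(page_url: str, timestamp: str):
    """Query the live API (requires network access) and return the snapshot URL, if any."""
    with urlopen(availability_query(page_url, timestamp)) as resp:
        return closest_snapshot_url(json.load(resp))
```

For example, asking for a timestamp of `20181105` (the day before the 2018 midterms) would point to the archived copy of a forecaster’s ratings page closest to Election Eve.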

The core procedure of this model lies in the dataset: for each district, I calculated the average of all of the forecast ratings & used it as a single “average rating.” This average lets us work with the combination of the three ratings rather than any single one, and it avoids treating any district as a unanimous toss-up, since no district had all 3 ratings as “toss-ups.”
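A minimal sketch of the averaging step, assuming each district-year is stored as a dictionary mapping each forecaster to its numeric rating. Skipping a missing rating (`None`) is an assumption for districts a forecaster did not rate; the post does not say how such gaps were handled.

```python
from statistics import mean


def average_rating(ratings: dict) -> float:
    """Average one district-year's numeric ratings, skipping forecasters with no rating (None)."""
    scores = [score for score in ratings.values() if score is not None]
    if not scores:
        raise ValueError("district has no ratings")
    return mean(scores)
```

For instance, a district rated Lean D (3), Toss-up (4) and Tilt R (4.5) would average to about 3.83, just on the Democratic side of a pure toss-up.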

To see how accurate our model is, we will look at the correlation between the expert ratings & Democratic Party vote share. My decision to build the model this way grew out of the weeks of forecasts based on different predictors in past blog posts. During one of those weeks, my class partner Julia & I gave a joint presentation on the risks of using expert predictions in a model. We talked about the risk of overfitting, since expert forecasts already take a multitude of factors into account. Essentially, if Cook uses GDP growth in its House district forecasts and I then build a model that includes both GDP growth & expert forecasts, I am accounting for the same variable twice.

Additionally, much of my work in political analysis centers on understanding the accuracy of these expert forecasters; organizations that earn hundreds of thousands of dollars should make pretty accurate predictions, right? Well, this is a question I am excited to find the answer to after tomorrow evening (or maybe Election Week 2.0)!

To begin this forecast, we will look at a linear regression of Democratic vote share on the average expert rating across past elections, plotted onto a graph.

As seen above, there is a clear correlation between the Democratic Party’s vote share & expert ratings across districts. The data points in that plot combine the 2012-2020 congressional elections. To quantify this relationship, we find the R-squared value to be 0.697, meaning the average rating explains roughly 70% of the variation in Democratic vote share. This indicates a moderately strong relationship between the two variables, suggesting that the forecasts track the actual results with a meaningful level of accuracy.
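For a regression with a single predictor, R-squared is simply the square of the Pearson correlation, so an R-squared of 0.697 corresponds to a correlation of about -0.835 (negative, since higher ratings mean more Republican districts). A short sketch of the calculation, written from scratch for transparency rather than taken from the original analysis:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared(xs, ys):
    """For simple (one-predictor) linear regression, R-squared equals r squared."""
    return pearson_r(xs, ys) ** 2
```

Squaring discards the sign, which is why R-squared alone does not show that Democratic share falls as ratings rise; the slope in the next paragraph carries that information.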

There is only one coefficient in this linear regression model besides the constant: the average rating. The model summary shows that as the average rating goes up by 1, the expected Democratic Party vote share decreases by about 7 percentage points. In other words, the closer a district’s rating gets to a Republican victory, the smaller the Democratic share of the final results. Makes sense!
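Written out, the fitted model has the form below, where $\bar{r}_i$ is district $i$’s average rating. Only the slope is reported here, so the intercept is left symbolic, and the worked step assumes vote share is measured in percentage points.

```latex
\widehat{\text{DemShare}}_i = \hat{\beta}_0 + \hat{\beta}_1 \, \bar{r}_i, \qquad \hat{\beta}_1 \approx -7

\text{Lean D } (\bar{r} = 3) \;\to\; \text{Lean R } (\bar{r} = 5): \quad
\Delta \widehat{\text{DemShare}} \approx (5 - 3)(-7) = -14 \text{ points}
```

So moving a district from Lean Democratic to Lean Republican costs Democrats roughly 14 points of expected vote share.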

Let’s move on to see what the predictions are saying across all of these forecasts.

== Predictions by Individual Expert Forecast ==

Each of these models was created by training an expert forecast model on data from 2012-2020. As mentioned previously, we combined the expert ratings across all of the historical elections and compared them with the final Democratic Party vote share. For these models, we trained the linear regression model, then fed in the new 2022 forecasts, applying the fitted model to each district with a current rating. Spoiler alert: this is the most uninteresting thing I’ve done so far for this prediction… sadly.
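The train-then-predict procedure can be sketched in plain Python. The district labels and numbers in the usage lines are made up purely for illustration, and counting a seat as Democratic when its predicted share exceeds 50% is an assumption about how a seat total would be derived (218 of 435 seats is a House majority).

```python
HOUSE_MAJORITY = 218  # seats needed for a majority of the 435-seat House


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_shares(model, ratings_2022):
    """Apply the fitted line to each district's 2022 average rating."""
    intercept, slope = model
    return {district: intercept + slope * rating for district, rating in ratings_2022.items()}


def democratic_seats(predicted_shares):
    """Count districts whose predicted Democratic share is above 50%."""
    return sum(1 for share in predicted_shares.values() if share > 50)


# Made-up training data: average rating and Democratic vote share (%)
history_ratings = [1.0, 2.0, 3.0]
history_shares = [60.0, 53.0, 46.0]
model = fit_line(history_ratings, history_shares)  # (67.0, -7.0)
predictions = predict_shares(model, {"XX-01": 2.0, "XX-02": 5.0})
```

Here the hypothetical district rated 2.0 is predicted at 53% and the one rated 5.0 at 32%, so only one of the two would count toward a Democratic seat total.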

We begin with the Cook Political Report:

  • The districts appear to be numbered alphabetically by state postal abbreviation, then by district number (e.g., 1 is Alaska At-Large and 8 is Alabama-07).
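The apparent numbering can be reproduced with a sort key. This sketch assumes district labels of the form `AK-AL` (at-large) and `AL-07`; the label format and the `district_sort_key` helper are hypothetical.

```python
def district_sort_key(label: str):
    """Sort by state postal abbreviation, then district number (at-large counts as 0)."""
    state, district = label.split("-")
    return (state, 0 if district == "AL" else int(district))


# A small hypothetical subset of district labels
districts = ["AL-07", "AK-AL", "AL-01", "AZ-01", "AL-02"]
numbered = {i: d for i, d in enumerate(sorted(districts, key=district_sort_key), start=1)}
```

With the full list of districts, Alabama-07 lands at position 8, after Alaska At-Large and Alabama-01 through Alabama-06, matching the numbering described above.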

As you can see, the gap between the lower and upper bounds of these predictions is QUITE wide. This ties directly into the model’s limitations, which likely drive much of this uncertainty: with just 5 elections to base accuracy on, it is very possible that the predictions could be wildly off from what is expected. If the upcoming midterms turn out to be a fluke election, we could see results close to the lower or upper bounds of Democratic Party vote share.
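Assuming the lower and upper bounds are standard 95% prediction intervals from ordinary least squares (the post does not state exactly how they were computed), each district’s interval would be as follows, where $n$ is the number of training district-years, $\bar{r}_0$ is the district’s 2022 average rating, $m$ is the mean rating across the training data, and $s$ is the residual standard error.

```latex
\widehat{\text{DemShare}}_0 \pm t_{0.975,\,n-2} \; s \sqrt{1 + \frac{1}{n} + \frac{(\bar{r}_0 - m)^2}{\sum_{i=1}^{n} (\bar{r}_i - m)^2}},
\qquad
s = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \left( \text{DemShare}_i - \widehat{\text{DemShare}}_i \right)^2}
```

The 1 under the square root reflects district-level scatter around the regression line, while the other two terms shrink as more training observations are added.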

Altogether, there is much that could be improved upon given the proper resources (or potentially 10-20 years from now). The problem with conducting this project now is the limited data and the limited understanding (in the modern age) of political data at the congressional level. Moving forward, I hope to update this forecast in 2024, with additional time to address some of these limitations, cross-check data, and more.

== Conclusion ==

In conclusion, the model finds a result that would probably get any political pundit in trouble if they were to publish it. Frankly, I am counting on being wrong after tomorrow. But who knows? This model was focused on seeing how accurate & knowledgeable the political predictors were when it came to congressional races, and honestly, I expected more Republican seats than what we have. That said, because of the high uncertainty and the large gap between the lower and upper bounds, there is still A LOT of wiggle room for Democrats and Republicans in these races.

Though this model predicts a Democratic majority, the Democratic Party should still likely expect to lose the midterm elections. The good thing about having a human voice behind a model such as this one is that I recognize this model will likely be wrong. I am looking forward to reflecting on the accuracy of this model (in a sense, a combination of other models, variables, etc.).

Thank you so much for taking the time to read my final forecast – let’s see what happens :).

