Building a sumo wrestling match predictor using machine learning
Purpose of this article
This article is a narrative designed to enhance the analysis I conducted for my final capstone project in the Udacity course on Data Science.
For this final project, we were given quite a bit of liberty in our chosen domain, and since I’ve developed a heartfelt interest in sumo over the past few years I decided to merge my interest in machine learning with my interest in sumo.
Now, this probably sounds pretty hipster to my fellow westerners (maybe anyone outside of Japan and Mongolia) but it’s a fascinating sport that goes far beyond two big dudes trying to push each other out of a ring. This article will just skim the surface of what sumo is all about but I hope any reader will have their curiosity piqued.
Intended audience
This article is written primarily for a technical audience. It’s specifically designed to complement my analysis and provide some color and insight into what I have there.
To understand the analysis I will provide some background in sumo but for the sake of time, it will not be a comprehensive sumo guide. The point here is to provide some story and detail to accompany my code.
Additional information and resources
Understanding the mechanics of this project requires only a basic understanding of the sport. If you have absolutely no idea what sumo is, this definition from Wikipedia can provide the general gist.
You can get a flavor of what actual sumo fighting looks like by watching one of the most beloved wrestlers of our time (circa 1980s), Chiyonofuji, in a series of matches. Note that this doesn’t give any idea of the ritual, pageantry, and build-up that make this sport great; it is just the actual clash, which is ultimately what my project is focused on.
This is just the tip of the iceberg so if this seems interesting to you I’ll recommend additional resources below.
Small disclaimer
I work in tech as a software manager and my opinions are strictly my own. I’m taking a Udacity course on Data Science for fun and make no claim of being an expert in data science or sumo. This is a learning experience and I would truly welcome any feedback or corrections.
What are we trying to accomplish here?
Appreciation of sumo aside, the goal of this analysis is to tackle the following:
- To create better visuals that help an interested fan understand the history of a particular wrestler. While there are some statistics available on a given wrestler in official sumo channels I personally find these stats difficult to digest. They are mostly writing, or plain dates, with no sense of context. Sure this wrestler weighs 300lbs now but is that more or less than what they have weighed in the past? Their current rank is M6 (see below) but how does their overall career look? I’d like to create a few visuals that answer questions of interest to me, specifically a heatmap which I find leads to a lot of insight.
- To use some form of machine learning to predict who will win a given match. The ideal algorithm will take some information about two wrestlers and spit out who is most likely to win. Even after watching off and on for a few years, I still find that in many matches I have no clue who either wrestler is. If we could predict with some accuracy (let’s say even 55% of the time) that wrestler A would beat wrestler B, that would give me some indication of who to cheer on (the likely winner or the underdog, depending on my mood). Some initial reading suggests that Gaussian Naive Bayes, Logistic Regression, and Decision Tree Classifier are the most likely candidates.
Data Understanding & Data Preparation
As mentioned above, this article is meant to enhance the solution that can be found in my SumoPredictor Jupyter notebook.
The article and notebook are designed with the same headers to make it easy to follow along. I will leave the code in the notebook and keep the majority of the flavor and story in the article.
Data Source
The analysis draws on two data sets from data.world that had exactly what I was looking for: a history of individual wrestlers (banzuke.csv) and the matches between them (results.csv).
For the most part, I will use English words in place of the Japanese counterpart but I think it makes for more interesting reading to sprinkle in a few Japanese terms.
Rikishi
The first of these terms is rikishi which is a sumo wrestler.
I found that rikishi was easy to remember when I was first learning sumo and will use it hereafter to discuss a particular wrestler.
A rikishi can come from anywhere in the world (mostly Japan, Mongolia, but plenty of other countries) but lives in Japan. They typically start their journey early in their life and they are the star of this article.
We care about several aspects of a given rikishi as explained below.
Banzuke.csv
To help understand more about rikishi let’s explore the banzuke.csv data set by each column.
Note after the lower divisions were dropped there were 0 NA values for all columns.
Column One — basho
Renamed: No
A “basho” is a tournament and is another Japanese term that I found easy to remember and will use throughout this article. Each basho takes place in an odd-numbered month of the year (Jan, March, May, July, Sept, Nov) and is held in various cities around Japan.
For each basho, there is a document released called a banzuke that outlines the wrestlers and their rankings. The data set used here is also called banzuke but outside of that, there is no need to remember this term.
The basho data we have goes from 1983 to 2020 (37 years) and is denoted in a “year.month” format (e.g. 2020.01). For my analysis, I was actually curious if the month played any part in a rikishi’s chance of winning and I split basho up into basho_year and basho_month.
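A minimal sketch of that split, assuming the basho column is read as text in the “year.month” form described above (this is an illustration, not the notebook’s actual code, and `split_basho` is a name made up for this example):

```python
from typing import Tuple


def split_basho(basho: str) -> Tuple[int, int]:
    """Split a 'year.month' basho label such as '2020.01' into (year, month)."""
    year_text, month_text = str(basho).split(".")
    return int(year_text), int(month_text)


basho_year, basho_month = split_basho("2020.01")
print(basho_year, basho_month)  # 2020 1
```

Keeping the column as text avoids any float rounding surprises when the label is split.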
Column Two — id
Renamed: No
Each rikishi has a unique id in this dataset. This is our only immutable column that identifies a given rikishi (even their name can change).
Column Three — rank
Renamed: No
Rikishi are ranked in a hierarchy of divisions and rank.
For the purpose of this analysis, I reduced the dataset to the top division only (called Makuuchi which readers do not have to memorize). The lower divisions are perfectly respectable but this top division is the best-of-the-best and the only division of interest to me.
It is possible for rikishi to rise and fall between divisions and ranks but for our purposes, a rikishi is only relevant when they are in the top division.
The top division rankings work as follows:
- The bottom ranks are all called Maegashira (hereafter abbreviated to “M”; no need to memorize this term), and each rikishi is given a number to indicate their relative ranking, from the lowest, M16, up to M1. These are the rank and file of the top division. To get to this level a rikishi needs to fight through five other divisions, which is no small feat! But for our purposes, the M’s are the lowest-ranking members.
- There are always two of each M ranking at a given time (2 x M16, 2 x M15…etc). M’s rank today typically stops at M16 but in our dataset, this actually goes all the way down to M18 which was an interesting insight. It appears the rules changed in 2004.
- Next up from the M’s is the rank of Komusubi, which is colloquially referred to as “the meat grinder.” It is the most grueling rank, where you are pitted against all of the top-ranked rikishi, and is a sort of testing ground to see if you are ready for the top tier.
- The top tier of the top division is split between three ranks: Sekiwake to Ozeki to Yokozuna (the top).
- Once a rikishi has made it to the rank of Yokozuna they are that rank for the rest of their career and cannot be demoted. All other ranks including Ozeki are ephemeral and a rikishi must maintain a certain number of wins to stay at that rank.
For the rank column I performed several data cleaning and transformations:
- All rankings that did not fit in the highest division were removed. This is an analysis for the top ranking only and even though a given rikishi can drop into lower divisions and pop back up we consider all divisions below as a black box.
- Each rank is divided into an idea of “east” and “west” which technically speaking denotes a slight difference in rank (east is higher than west) but practically speaking it’s irrelevant and the idea of east/west is dropped in this analysis.
- This column contains alphanumeric codes like Y1e, Y2eHD that require too much thinking to digest so all ranks were converted to either MX (e.g. M16 to M1) or their proper name (e.g. Yokozuna).
- All ranks are hierarchical and I created another column specifically for comparing rank. In this new column, each rank was assigned a number, starting with M18 (the absolute lowest) at 0 and going all the way up to Yokozuna at 21. This ranking value was a safe way to assign a number to categorical data, as there is a clear linear hierarchy between the ranks.
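The rank clean-up above can be sketched roughly as follows. This is a hedged illustration, not the notebook’s code: it assumes top-division codes look like the Y1e and Y2eHD examples (a division letter, a number, then e or w), and that lower-division codes such as “J5w” or “Ms1e” do not fit that pattern. The helper name `clean_rank` is made up; `rank_as_value` mirrors the column name used later in this article.

```python
import re
from typing import Optional

# Assumed code letters for the named ranks (an assumption based on the Y1e example).
NAMED_RANKS = {"Y": "Yokozuna", "O": "Ozeki", "S": "Sekiwake", "K": "Komusubi"}
NAMED_RANK_VALUES = {"Komusubi": 18, "Sekiwake": 19, "Ozeki": 20, "Yokozuna": 21}
# Division letter, rank number, east/west marker; any suffix such as "HD" is ignored.
RANK_CODE = re.compile(r"^([YOSKM])(\d+)[ew]")


def clean_rank(code: str) -> Optional[str]:
    """Return 'M16', 'Yokozuna', etc., or None when the code is not top division."""
    match = RANK_CODE.match(code)
    if match is None:
        return None
    letter, number = match.group(1), int(match.group(2))
    if letter == "M":
        return f"M{number}"
    return NAMED_RANKS[letter]


def rank_as_value(rank: str) -> int:
    """Map M18 -> 0 up to M1 -> 17, then Komusubi -> 18 up to Yokozuna -> 21."""
    if rank.startswith("M"):
        return 18 - int(rank[1:])
    return NAMED_RANK_VALUES[rank]


print(clean_rank("Y2eHD"), rank_as_value("Yokozuna"))  # Yokozuna 21
print(clean_rank("M16w"), rank_as_value("M16"))        # M16 2
```

Dropping the east/west marker in the regular expression is what discards the east/west distinction mentioned above.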
Column Four — rikishi
Renamed: No
The rikishi column contains the wrestler’s name. This name is not their birth name but more of a stage name similar to what you might see in western wrestling (e.g. Hulk Hogan). These names are unique and serve as a sort of identifier. However, it cannot be relied upon as an immutable identifier as rikishi can change their name for various reasons. I like the idea of keeping this column as rikishi to throw some Japanese flair in there so this column is unaltered.
Column Five — heya
Renamed: Yes — Stable
A heya is where the wrestlers live and train and can be translated as stable. Life in a stable is super interesting but outside the scope of this analysis. If you’re curious about life in a stable I would recommend this documentary.
For our purposes, I altered the column name to stable as I constantly forget what heya means.
My assumption for the stable was that it would have a massive impact on their performance. Stables are run by retired rikishi and this is where current wrestlers live, train, and learn how to become the best wrestlers they can be. A rikishi does not typically wrestle against members of his stable unless there is some sort of playoff situation.
I found the stable to be the most difficult and disappointing aspect of this analysis. It just seems natural that where you train, where you live, where you learn would have a massive impact on your likelihood of winning a match. Because there are quite a number of stables I went through several ways of handling this categorical data as we learned and as found in this helpful article.
One-hot encoding presented too many columns and made it difficult to get any insight. Label encoding showed no correlation to winning which was a complete surprise, and just for good measure I tried hash encoding which showed zero correlation as well. I took this to mean that the stable really didn’t have as much impact as I would have thought.
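To make the difference between two of those encodings concrete, here is a small plain-Python sketch. The list of stables is made up for illustration (Miyagino appears in this article; the other two names are simply examples), and real code would more likely use a library encoder.

```python
# Illustrative stable column; the values are an assumption, not rows from the dataset.
stables = ["Miyagino", "Isegahama", "Miyagino", "Kokonoe"]

# Label encoding: one integer per distinct stable (alphabetical order here).
label_of = {name: index for index, name in enumerate(sorted(set(stables)))}
label_encoded = [label_of[name] for name in stables]

# One-hot encoding: one 0/1 column per distinct stable, in the same order.
one_hot = [[int(name == column) for column in label_of] for name in stables]

print(label_encoded)  # [2, 0, 2, 1]
print(one_hot[0])     # [0, 0, 1]
```

One thing worth keeping in mind: the integers produced by label encoding are arbitrary, so a near-zero linear correlation between them and winning does not by itself prove the stable is irrelevant.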
In all honesty, this was an aspect of sumo I was very excited to deep dive but after seeing the low correlation I decided to exclude it from the analysis as it just complicated the code and increased time to run.
Column Six — shusshin
Renamed: Yes — Hometown
Shusshin is the rikishi’s hometown. This column was unused throughout the analysis and only the column name was changed to make it easier to remember.
It’s possible that coming from a particular place would impact a given wrestler but I made a conscious choice to omit this from the analysis. This is more a matter of pride for locals and is actually a sore point for some Japanese sumo fans as Mongolians have been dominating sumo over the last few years.
Maybe an individual’s hometown or nationality has an impact on their ability to win but it feels like an ethical grey area to me so I decided to not use it even if that is at the expense of accuracy.
Column Seven — birth_date
Renamed: No
A typical rikishi begins their career quite young and this sport is grueling physically. My suspicion was that a rikishi’s age would play a big part in predicting their win. The birth date vs. the basho date was used to determine their age at the time of that particular tournament and stored in a separate column called age.
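A rough sketch of the age calculation, under two assumptions that are mine and not the notebook’s: the birth date is already parsed into a `date`, and the basho is treated as starting on the first day of its month. The function name `age_at_basho` and the example birth date are hypothetical.

```python
from datetime import date


def age_at_basho(birth_date: date, basho: str) -> int:
    """Age in whole years on the first day of the basho month ('year.month')."""
    year_text, month_text = basho.split(".")
    basho_start = date(int(year_text), int(month_text), 1)
    # Subtract one year if the birthday has not yet happened by the basho start.
    had_birthday = (basho_start.month, basho_start.day) >= (birth_date.month, birth_date.day)
    return basho_start.year - birth_date.year - (0 if had_birthday else 1)


example_birth_date = date(1985, 3, 11)  # hypothetical example value
print(age_at_basho(example_birth_date, "2004.05"))  # 19
print(age_at_basho(example_birth_date, "2020.11"))  # 35
```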
I also recently read the book Outliers, which demonstrated how a person’s birth month can impact their standing in a particular sport. That was super interesting, so I split birth_date into year and month to see if there was any correlation between birth month and the ability to win. There wasn’t, and because finding the birth month for each row added quite a bit of run time, I removed it from the final analysis.
Columns Eight and Nine — height and weight
Renamed: Yes — height_cm and weight_kg
I added two columns that convert these values to feet/inches and pounds because those units are easier for me to digest. Often when watching matches someone will discuss a rikishi’s weight in kg and that just doesn’t mean anything to me. I either have to do some estimates using mental math or pull up a calculator, so having these conversions handy is just a time saver.
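The conversions themselves are simple arithmetic. A minimal sketch (the function names are made up for this example, and rounding to the nearest inch and to one decimal place are my choices):

```python
CM_PER_INCH = 2.54
LB_PER_KG = 2.20462


def cm_to_feet_inches(height_cm: float) -> tuple:
    """Convert centimetres to (feet, inches), rounded to the nearest inch."""
    total_inches = round(height_cm / CM_PER_INCH)
    return divmod(total_inches, 12)


def kg_to_lb(weight_kg: float) -> float:
    """Convert kilograms to pounds, rounded to one decimal place."""
    return round(weight_kg * LB_PER_KG, 1)


print(cm_to_feet_inches(183))  # (6, 0)
print(kg_to_lb(100))           # 220.5
```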
Columns Ten, Eleven, and Twelve — prev, prev_w, prev_l
Renamed: Yes — previous_rank, previous_wins, previous_losses
What I really love about sumo is the amount of pageantry and build up associated with a basho. There is a very methodical and progressive build-up throughout each day, and equally throughout the entire tournament, that gives it a really epic feeling.
While this build-up is fun for fans, I imagine it takes a psychological toll on the rikishi. A given match can last only seconds, but a rikishi has around 24 hours to think about it beforehand. There is a lot of silent downtime, a lot of time to get psyched up…or psyched out.
There is also a lot of pressure on each wrestler, as they are always in danger of dropping in rank unless they are Yokozuna. If a wrestler fails to get at least 8 wins they are demoted (the number of ranks they drop is determined by the Sumo Association and can vary based on the number of losses). Even Yokozuna, who cannot be demoted, are asked to retire if they have enough losses over many basho.
So the wrestlers grind it out, battling one another every other month for 15 days straight, with constant pressure to be there (if you are injured and can’t fight, that’s a loss). In addition, the Sumo Association doesn’t give anyone an easy ride; they constantly pit winners against one another to put them to the test. If two similarly ranked individuals are doing well in a given basho, let’s say 5–0 each, it’s likely they will be pitted against one another to define a clear leader early on (one will emerge 6–0, the other 5–1). So if you’re a young upstart down in the low M’s and you are having a good basho, your confidence is probably running pretty high when “bam!”, 24 hours before your next match, you find out you’re pitted against someone far beyond your rank! What is that going to do to your confidence?
I wanted to capture the psychological impact of how a rikishi was performing by using the previous rank, previous wins, and previous losses in my analysis. Did they just reach a new rank, and are they now forced to face a whole new class of wrestler? Did they drop from a previous rank, and is worrying about dropping further affecting their judgment?
For the analysis, the previous rank was modified to match the rank as outlined above.
Problem 1 — Better rikishi visualizations
With the data cleaned and massaged the visualizations were fairly straightforward.
The desired end state of this problem was to provide a given rikishi, by name, and have it pull up some relevant stats and visuals to give some more flavor to that wrestler.
While watching sumo, if you watch the full match and not just clips, there is a lot of time to explore both wrestlers. The rikishi go through a series of rituals as they prepare for their face-off (see image below for a small taste).
During this time, what I wanted was a place where I could enter their name and have it pull up the things I typically wonder about.
Specifically, I used some of pandas’ built-in visuals and seaborn’s visuals to answer the questions I typically have.
Here is an example of my personal favorite wrestler: Hakuho
By entering his name (since it’s a hassle to look for the id) the notebook automatically pulls up the identifier (since names can change) and pulls the data for this wrestler.
Rank History
The first thing I’m always curious about is their rank history. Looking at the official sumo page you can get an idea of this but it’s painful to digest. I want a simple time-series visualization that tells me more of a picture of the relative rankings.
To accomplish this I use the rank_as_value that I set up earlier and sorted the graph by basho_year. The time-series below gives me a pretty good sense (of what I already knew) that Hakuho climbed really fast through the ranks.
At first, I didn’t like the light blue block of color that demonstrated the rise and fall during a given year and wanted to see every rank laid out by basho. That ended up being way too challenging to digest and I got used to the light blue demonstrating the range of ranks during a given year which is all I really care about.
In this image you can see that Hakuho entered the top division in 2004. He had some ups and downs, but by 2006 he was consistently in the top three ranks. He reached the rank of Yokozuna in 2008 and, as described above, will be at that rank until he retires.
Just for fun here is another wrestler, the oldest wrestler in the upper division Kyokutenhō who had a pretty interesting career in the upper division bouncing around quite a bit in rank before retiring in 2015.
Physical Traits
The second aspect of a wrestler I’m interested in is their weight. As many cool things as there are around sumo, the main attraction is the enormity of the wrestlers. Hakuho is 6 ft 3 in and a big boy, but he’s not so big as to look out of proportion and “fat”; he’s actually quite muscular and fairly fit.
To envision his weight I like to see it from many angles: a boxplot to show the range, a time series to see how it’s changing over time, and a histogram to show the frequency of a given weight.
Ok 340lbs is pretty heavy, super fun!
Outside of that, I print out a bunch of additional information that is interesting to me: specifically, the maximum age of any wrestler (41 if you’re curious, which is Kyokutenhō from above, although the lower divisions have people as old as 50) and the age range of this particular rikishi (Hakuho has been in the upper division from age 19 to 35!).
I’m always curious about a wrestler’s height (6 ft 3 in), his stable(s) (Miyagino), and fellow stablemates (‘Koryu’, ‘Chikubayama’, ‘Kobo’, ‘Hakuho’, ‘Ryuo’, ‘Daikiho’, ‘Ishiura’, ‘Enho’).
Coincidentally my least favorite wrestler is Enho, I had no idea they were stablemates!
Results.csv
The results dataset shows the results of all of the matches that took place in all of the basho from 1983.01 to 2020.11.
There are 15 days in a given basho and each day a rikishi fights one opponent. Typically (unless you are in the meat grinder as per above) you begin the basho by fighting similarly ranked individuals. As you progress you are matched against individuals of roughly the same rank or score.
The individual with the highest score wins the entire basho (i.e. a perfect score would be 15–0, the worst would be 0–15).
Winning the basho is unimportant to this analysis, as we are more interested in the result of individual match-ups. As explained above, individuals are paired up against people of similar rank, and early winners are often pitted against one another. That means about midway through the basho you have a pretty good idea of who is contending for the cup.
But the real joy of watching sumo for me is enjoying each match regardless of who is going to win the overall cup.
Columns
The modifications to this dataset are similar to those above and should be more self-explanatory.
The basho and ids match the items in the banzuke.csv file above and together act as a unique identifier for a match.
I transformed the ranks to be more human-readable and parsed out the wins and losses to create the “psychological state” of the wrestler at that time. If they are on a winning streak are they pumped up to win again? Or perhaps feeling the pressure to keep winning? If they’re losing are they psychologically drained and in their head or are they motivated to bounce back?
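As an illustration of the win/loss parsing, here is a small sketch. It assumes the running record is stored as text such as “5-2”, possibly followed by extra text; that format is an assumption on my part, and `parse_record` is a hypothetical helper name, not the notebook’s.

```python
import re
from typing import Tuple

# First "wins-losses" pair found in the text.
RECORD = re.compile(r"(\d+)-(\d+)")


def parse_record(text: str) -> Tuple[int, int]:
    """Return (wins, losses) from the first 'W-L' pair in the text."""
    match = RECORD.search(text)
    if match is None:
        raise ValueError(f"no win-loss record in {text!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_record("5-2"))  # (5, 2)
```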
Kimarite
The kimarite column is the only new term that requires some explanation.
In short, it is the winning technique. To me, this is not that interesting: there are many terms to remember and the technique itself happens very quickly.
For example, “Maintaining close contact with the opponent’s body, usually by a grip on the mawashi, the opponent is forced backwards out of the ring (frontal force out).” is called Yorikiri (see Wikipedia for more info).
Nonetheless, it’s still interesting to group the top ten kimarite used across ALL matches vs. a given rikishi’s top ten. Below you can see Hakuho’s top ten kimarite on the right. Yorikiri represents about 30% of all winning techniques both for Hakuho and all of our upper-division.
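Counting the top techniques can be sketched with the standard library. The sample list below is made up purely for illustration, and `top_kimarite` is a hypothetical name:

```python
from collections import Counter


def top_kimarite(kimarite_list, n=10):
    """Return (technique, count, share of all matches) for the n most common."""
    counts = Counter(kimarite_list)
    total = sum(counts.values())
    return [(name, count, count / total) for name, count in counts.most_common(n)]


# Made-up sample of winning techniques, not real match data.
sample = ["yorikiri", "oshidashi", "yorikiri", "hatakikomi", "yorikiri", "oshidashi"]
for name, count, share in top_kimarite(sample, n=2):
    print(f"{name}: {count} ({share:.0%})")
```

Running the same function once over all matches and once over one rikishi’s wins gives the two lists being compared.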
Problem 2 — Match prediction
The second problem for this project was trying to see if I could build a match predictor. Given two rikishi, and some relevant stats about them, can we predict a winner with more than 50% accuracy? i.e. can we do better than a random guess?
Over time, as you watch a significant number of matches, you start to recognize the characters, you learn their names, and you generally develop an affinity for some wrestlers. However, there are many matches, even after watching for years, where I just have no idea who to root for. This makes for a number of matches where I just don’t care who wins or loses.
It would make watching each match that much more interesting to have some sort of prediction on who is most likely to win so I have an opinion and interest in more matches!
Preparation
There was some effort involved to prepare the data in the way I wanted. For each row (a match between two rikishi) I wanted to have the relevant statistics at that time.
For a given basho and two rikishi ids, I applied various functions to get their age, weight, height, previous wins, previous losses, and rank.
I also experimented with their birth month and stable, but removed these fields as they provided little value and just added computation.
I then removed all unnecessary columns to keep the dataframe leaner. This included all string columns and columns that just added noise (e.g. weight in kg since I had pounds).
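One way to picture this step is a lookup keyed on (basho, id), so each match row pulls both wrestlers’ stats for that tournament in one pass. The sketch below uses made-up rows and ids; `rank_as_value` and `age` follow the column names used in this article, while `weight_lb` and `match_features` are names invented for the example.

```python
from typing import Dict, Tuple

# Hypothetical banzuke rows: (basho, id) -> stats for that rikishi at that basho.
banzuke: Dict[Tuple[str, int], dict] = {
    ("2020.11", 1): {"rank_as_value": 21, "age": 35, "weight_lb": 340.0},
    ("2020.11", 2): {"rank_as_value": 12, "age": 26, "weight_lb": 215.0},
}


def match_features(basho: str, rikishi1_id: int, rikishi2_id: int) -> dict:
    """Build one feature row for a match by looking up both wrestlers."""
    row = {"basho": basho}
    for prefix, rikishi_id in (("rikishi1", rikishi1_id), ("rikishi2", rikishi2_id)):
        for name, value in banzuke[(basho, rikishi_id)].items():
            row[f"{prefix}_{name}"] = value
    return row


print(match_features("2020.11", 1, 2))
```

A dictionary lookup per match is cheap compared with re-filtering the whole banzuke table for every row; in pandas the analogous idea would be merging the two tables on the basho and id columns, once for each side of the match.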
Heatmap
The heatmap was the most interesting and insightful analysis from this entire project. I find heatmaps that demonstrate correlation to be a fun way to glean insight and there were a lot of surprises here as outlined below.
Iterating over each row the following observations jumped out at me.
The primary focus is the impact on rikishi1_win (or rikishi2_win) but I’ll share some general observations as well.
basho_month — As mentioned above, basho are held every odd month. They also alternate between Tokyo and other cities, so my assumption here was that there would be some impact. The crowd size changes, some rikishi must have a hometown where they have more fans, maybe the energy of a place changes. Interestingly, there doesn’t seem to be any impact.
rank, current wins, and current losses — as mentioned above rikishi are pitted against people in a similar situation. They typically fight against similar ranks and people with similar win/loss. No surprise that these are highly correlated.
age — I was a little surprised to see that age had less impact on wins, and on pretty much everything else, than I had expected.
height and weight — For anyone getting into sumo for the first time, the first myth that gets dispelled is that the biggest, heaviest wrestlers always win. The assumption is that sumo is all about getting “fat” and that the bigger you get the harder it is to push you out of the ring. It’s evident after watching an entire basho that this is simply not the case; a lot of the shorter, nimbler wrestlers make short work of their heavy counterparts. The data shows this as well.
previous_wins and previous_losses — As the data shows, there is no real correlation between a given wrestler’s previous wins or losses and their ability to win a match. I think this speaks to the Sumo Association doing a great job of pairing wrestlers up with roughly the same scores. There is an expected correlation between a rikishi’s rank and their wins/losses. This makes sense: as described above, if a wrestler does not get the required number of wins they drop in rank, and if they demonstrate they can win at their current rank they rise.
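For readers who want to see what a single cell of such a heatmap represents, here is a plain-Python sketch of the Pearson correlation coefficient. It assumes the heatmap shows Pearson correlations (the default in pandas’ DataFrame.corr); the input lists are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))    # perfectly correlated
print(pearson([1, 2, 3, 4], [1, -1, -1, 1]))  # no linear relationship
```

A value near 0 only rules out a linear relationship, which is worth remembering when reading the “no impact” observations above.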
Predictions
I ended up trying quite a few machine learning techniques to see if I could find a good predictor. There were three that gave results over my goal of 55% as outlined below.
To measure the accuracy of my models I used several techniques to get a general sense of how they were performing:
- Precision — I am curious about how many of my predicted positive results are actually positive. There is no real “cost” of having false positives since a match can really go either way based on a lot of factors. But I’d like to know roughly what the expected precision is.
- Recall — Although guessing about sumo wrestling results doesn’t have huge consequences I’m still curious about the possibility of potentially positive results being marked as negative.
- Accuracy — This is the measure of all correctly identified cases and was my go-to measurement while I was testing. True positives and true negatives are an important metric for this project and I’m using Accuracy as “my number” (although in reality as you will see below they are often similar).
- F1 — This metric provides a bit of additional peace of mind. Although I’m not especially concerned about false negatives and false positives, it gives a more rounded idea of how the algorithm performs.
- Actual basho — For each model I ran the predictor against all matches for a given basho. This was the most important (and fun) metric to see how the model performs on actual matches.
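The first four metrics can all be derived from a confusion matrix. As a sketch, the function below (a name made up for this example) recomputes them from the Gaussian Naive Bayes confusion matrix reported in this article, treating “win” as the positive class:

```python
def classification_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Precision, recall, accuracy and F1 for the positive class."""
    precision = tp / (tp + fp)   # of the predicted wins, how many were wins
    recall = tp / (tp + fn)      # of the actual wins, how many were found
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# True Loss row: 14787 predicted loss, 4379 predicted win.
# True Win row:   4317 predicted loss, 14695 predicted win.
metrics = classification_metrics(tn=14787, fp=4379, fn=4317, tp=14695)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because the two classes are almost perfectly balanced here, all four numbers land close together, which is why accuracy works fine as the headline figure.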
Gaussian Naive Bayes
Gaussian Naive Bayes (GNB) exceeded my expectations and gave some excellent scores with an overall anticipated accuracy of 77%.
Pitting this model against the 2020.11 basho yielded predictions of 77% which was extremely exciting. I tested it against various basho and it consistently performed around this rating.
              precision    recall  f1-score   support
           0       0.77      0.77      0.77     19166
           1       0.77      0.77      0.77     19012
    accuracy                           0.77     38178
   macro avg       0.77      0.77      0.77     38178
weighted avg       0.77      0.77      0.77     38178
Accuracy: 77.19108534591085%
Confusion matrix   Predicted Loss   Predicted Win
True Loss                   14787            4379
True Win                     4317           14695
Predictions for the 2020.11 basho: 77%
Logistic Regression
Logistic Regression gives a very similar prediction with 79% accuracy.
Pitting this model against the 2020.11 basho also yielded predictions of 77% which gave me more confidence in my GNB results. I tested it against various basho and it consistently performed around this rating.
              precision    recall  f1-score   support
           0       0.79      0.80      0.79     19166
           1       0.79      0.79      0.79     19012
    accuracy                           0.79     38178
   macro avg       0.79      0.79      0.79     38178
weighted avg       0.79      0.79      0.79     38178
Accuracy: 79.38604885073737%
Confusion matrix   Predicted Loss   Predicted Win
True Loss                   15237            3929
True Win                     3935           15077
Predictions for the 2020.11 basho: 77%
Decision Tree Classifier
But the real climax of this whole project was when I came across the Decision Tree Classifier (DTC). Interestingly, I originally found it while looking for a visual; the tree it drew was just a heaping mess that gave me no insight.
Rather than toss it, I kept it around and, as I was preparing to wrap up the project, decided to give it a whirl. The test accuracy was around 73%, a little below the other two models and lower than I expected.
But the real joy was when I pitted it against the 2020.11 basho and got a 93% accuracy rating! That seemed too good to be true, so I’ve triple-checked everything, but it seems to be correct. I’ve run it against all of the 2020 basho and it is consistently over a 90% prediction rating!
              precision    recall  f1-score   support
           0       0.73      0.74      0.73     19166
           1       0.73      0.73      0.73     19012
    accuracy                           0.73     38178
   macro avg       0.73      0.73      0.73     38178
weighted avg       0.73      0.73      0.73     38178
Accuracy: 72.7905972570559%
Confusion matrix   Predicted Loss   Predicted Win
True Loss                   14089            5077
True Win                     5158           13854
Predictions for the 2020.11 basho: 93%
I honestly can’t wait to try this out against the upcoming January basho to see how it fares. My concern now isn’t going to be how the wrestlers are performing but rather how my model will be performing!
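The gap between roughly 73% on the test split and 93% on a single basho is worth a sanity check. One possible explanation, which is an assumption and not something the notebook confirms, is that the 2020.11 matches were also among the rows the tree was trained on; an unpruned decision tree can reproduce its own training rows almost perfectly. A minimal sketch of a split that keeps the scored basho out of training entirely (the rows and the `split_by_basho` name are made up; `rikishi1_win` is the target column mentioned above):

```python
def split_by_basho(rows, holdout_basho):
    """Train on strictly earlier basho and test on the held-out one."""
    # 'YYYY.MM' strings sort chronologically, so plain string comparison works.
    train = [row for row in rows if row["basho"] < holdout_basho]
    test = [row for row in rows if row["basho"] == holdout_basho]
    return train, test


# Made-up match rows for illustration only.
rows = [
    {"basho": "2020.07", "rikishi1_win": 1},
    {"basho": "2020.09", "rikishi1_win": 0},
    {"basho": "2020.11", "rikishi1_win": 1},
    {"basho": "2020.11", "rikishi1_win": 0},
]
train_rows, test_rows = split_by_basho(rows, "2020.11")
print(len(train_rows), len(test_rows))  # 2 2
```

If the model is fitted only on `train_rows` and still scores above 90% on `test_rows`, the result is much more convincing.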
Tuning
After determining that DTC resulted in the best predictions I doubled down to try and fine-tune the model using GridSearchCV.
In the end GridSearchCV only deviated from the defaults for one parameter: splitter from “best” to “random.” I actually removed the fine tuning from my notebook but after submission was asked to include something about it so I’m adding it back.
I’m not sure if the defaults were simply optimal or if I wasn’t able to find the right combination of test parameters. For all of the float values, I tried various numbers including 0.0, 0.5, 1.0, 2.0, 100.0, and 1000.0, but the best-performing value was always 0.0.
For any integer parameters, I tried the Fibonacci numbers 1, 2, 3, 5, 8, 13 and tossed in a few larger numbers (100 and 1000) as well. Each time it picked the default as optimal. Because of the significant increase in run time I removed this section and just opted for the defaults, but see my notebook for the parameters that worked well and didn’t add significant run time.
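To illustrate what an exhaustive grid search does under the hood, here is a plain-Python sketch. It is not scikit-learn’s GridSearchCV API and not the notebook’s code: the scoring function is a made-up stand-in, where real code would cross-validate a model for each combination. The parameter names `max_depth` and `splitter` are Decision Tree Classifier parameters, and the candidate values echo the ones listed above.

```python
from itertools import product


def grid_search(param_grid, score):
    """Try every combination in param_grid and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params):
    """Made-up stand-in score; real code would fit and cross-validate a model."""
    penalty = 0.5 if params["splitter"] == "best" else 0.0
    return -abs(params["max_depth"] - 5) - penalty


best_params, best_score = grid_search(
    {"max_depth": [1, 2, 3, 5, 8, 13], "splitter": ["best", "random"]},
    toy_score,
)
print(best_params)
```

The number of fits grows with the product of the list lengths, which is why even a handful of candidate values per parameter adds so much run time.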
Conclusion
This entire project has been exciting, and I really enjoyed playing around with various rikishi and various basho. I set out specifically to create something I would use personally and I’ve succeeded in my endeavor.
End-to-end summary
I came into this project knowing that I wanted to try to explore sumo data. My expectations were low that I would find a good data set and even if I did that I would be able to glean interesting insight.
The two questions/problems/goals listed above: to make better visuals and to try and predict winners, were decided before any data exploration. I took the time to think about what I cared about vs. looking at the data and seeing what I could answer (to mimic a more real-world situation and do something I cared about).
After spending some time exploring Kaggle and doing random searches, the data.world datasets (banzuke.csv and results.csv) looked promising. I was thankful they provided good column descriptions and that the data went back as far as it did (although full disclosure I’m disappointed that I couldn’t see more data on Chiyonofuji who is one of the most historically significant rikishi).
Exploring the data was enjoyable and a lot of what I performed is not included in my notebook for the sake of being concise. It was fun and I spent a lot of time just looking up various wrestlers and getting a general sense of what was in the data.
The most difficult parts as outlined above were working with the stables (categorical variables) and transforming the data in ways that I wanted.
Once the data was transformed and explored it was fairly trivial to make the visualizations. It was mostly a matter of tweaking the image, playing around with various types of visuals, and then just enjoying them for various wrestlers.
Machine learning components were by far the most rewarding part of the project. My expectations were low that I would be able to get a strong prediction so achieving 70%+ was exciting. I actually put a lot of my eggs in the logistic regression basket and focused heavily on trying to make that work leaving DTC (which as outlined above I had used for a tree visual) to the side.
I’m thankful I kept it in my notebook and finally went back and tried it. To have 90%+ prediction success just felt unreal so I’ve been going back over it to find any potential error. It will be a lot of fun to try this against the upcoming basho.
I did use GridSearchCV to try and optimize but no matter what I threw at it the optimization always suggested the defaults with the one exception as listed above.
Final Metrics
In the end after discovering a high prediction rate with DTC I opted to focus only on this model. The end metrics look as follows:
              precision    recall  f1-score   support
           0       0.73      0.76      0.74     19166
           1       0.75      0.72      0.73     19012
    accuracy                           0.74     38178
   macro avg       0.74      0.74      0.74     38178
weighted avg       0.74      0.74      0.74     38178
Accuracy: 73.68643206978423%
Confusion matrix   Predicted Loss   Predicted Win
True Loss                   14549            4617
True Win                     5372           13640
Predictions for the 2020.11 basho: 92%
What this is essentially saying is that 74% of the model’s predictions, wins and losses combined, were correct; that is the accuracy. The F1 score is the harmonic mean of the precision and recall scores, which as outlined above focus on false positives and false negatives respectively, and it comes out at about the same value.
A 74% prediction far exceeded my hope of even 55% (slightly better than a guess). The Sumo Association pairs rikishi quite well to challenge them and sports contain a lot of unmeasurable variables like mood, ego, and drive that make it hard to guess with high accuracy. The fact that my model was able to predict 92% of the November 2020 basho was, and I’m not ashamed to say this, thrilling.
Moving forward
Based on my results in the future I would do the following:
- I’d like to find a better way to merge the two datasets. Running everything up to and after the merging (where I apply functions to get the age, weight, height, etc. in the results dataframe) takes a long time to run and I feel like there is an optimization in there I am missing. I did investigate and tried using raw=True but it is still quite painful.
- I think this could be a really cool web app that could automatically provide this data from the current banzuke (the ranking document for the current basho). After this course is complete I’d like to play around with publishing this in a way similar to what we did for our disaster relief pipeline project.
- I’d like to try it out over 2021 to see if the predictions hold. It’s very possible that I’ve made an error somewhere and it looks more accurate than it really is. If this works out against live matches, that could be really exciting, and I might want to play with the last dataset, “odds,” to see if I can “beat the odds” for a basho. This would be just for fun: I’m not confident in these results and would advise no reader to rely on them.
Additional Resources
If you’re interested in learning more about Sumo I recommend these resources.
- Freakonomics — My journey into sumo started six years ago when I first read about it in the book Freakonomics. Generally speaking, I’m not a big sports fan, but the writing captured my imagination and spurred me to dive a bit deeper.
- Jason’s All Sumo Channel — From there I learned most of what I know from the YouTube channel Jason’s All Sumo Channel. I highly recommend it; Jason is a good guy and does a great job, and it’s like watching some TV with your buddy. Jason got me hooked on sumo, and his commentary is geared towards people new to the sport, so it’s very accessible.
- Sumo: A Thinking Fan’s Guide — I came across a book called Sumo: A Thinking Fan’s Guide to Japan’s National Sport that really matches my personal way of appreciating sumo. It’s a “don’t take yourself too seriously” approach that really speaks to me and helped me understand the sport a little deeper. One of the last sections laments the lack of stats in sumo; that has always been in the back of my mind and was the inspiration for this project.