Hello! In this article, I am going to present a dataset about the game Hearthstone that I built a few weeks ago.
Hearthstone, what is it!?
Hearthstone is an online card game released by Blizzard in 2013 that uses the lore of the Warcraft franchise, which Blizzard started in 1994. I will not go into detail on the genesis of the game, but I invite you to watch this video from esport in a nutshell.
The principle of the game is quite simple: you start by picking one of the 9 available heroes; each hero has a specific ability and access to some hero-specific cards.
You can find more details on the heroes here if you want to learn more.
After selecting a hero, the player has to build a deck that respects a few rules:
- It must contain 30 cards from the three categories (minion, spell, and weapon)
- It cannot contain more than two copies of the same card
- It cannot include cards that are specific to another hero
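The three rules above are simple enough to check in code. Here is a minimal Python sketch, assuming each card is a dict with hypothetical `name`, `type` and `hero` keys (with `"neutral"` for cards that every hero can use); it only encodes the three rules listed above, not the full game rules.

```python
from collections import Counter

CARD_TYPES = {"minion", "spell", "weapon"}


def validate_deck(hero, cards, deck_size=30, max_copies=2):
    """Return the list of rule violations for a deck (empty list = valid deck).

    `cards` is a list of dicts with the hypothetical keys "name", "type" and "hero".
    Cards whose "hero" is "neutral" are allowed for every hero (an assumption).
    """
    errors = []
    # Rule 1 (size): exactly `deck_size` cards
    if len(cards) != deck_size:
        errors.append(f"deck has {len(cards)} cards, expected {deck_size}")
    # Rule 2: no more than `max_copies` copies of the same card
    for name, count in Counter(card["name"] for card in cards).items():
        if count > max_copies:
            errors.append(f"{name} appears {count} times (max {max_copies})")
    # Each distinct card is checked once for its type (rule 1) and its hero (rule 3)
    for card in {card["name"]: card for card in cards}.values():
        if card["type"] not in CARD_TYPES:
            errors.append(f"{card['name']} has unknown type {card['type']!r}")
        if card["hero"] not in (hero, "neutral"):
            errors.append(f"{card['name']} belongs to {card['hero']}, not {hero}")
    return errors


# Usage: 15 hypothetical neutral minions, two copies each -> 30 cards, valid deck
deck = [{"name": f"neutral_minion_{i}", "type": "minion", "hero": "neutral"}
        for i in range(15) for _ in range(2)]
print(validate_deck("mage", deck))  # []
```

Returning a list of violations rather than a boolean makes it easy to see why a scraped deck would be rejected.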
Then, when a Hearthstone match starts, each opponent has 30 health points. To play cards, you need to spend mana, which you gain each turn or when you play some specific cards.
I don’t want to go into too much detail about how a match progresses, but I invite you to watch this video that explains the principles of Hearthstone.
I like this game: I started playing it at its release in 2013, and I still find it fun. I consider myself a casual Hearthstone player because I only come back to the game periodically, mostly for a few weeks after BlizzCon, and it’s always a pain to restart because the meta has changed, etc.
A few years ago, I found this Kaggle dataset about Hearthstone containing decks scraped from a site called HearthPwn.
HearthPwn is a community website where people share content around the game (decks, cards, competitions); I highly recommend having a look at it.
In terms of data, the Kaggle dataset I mentioned contains around 360,000 decks, and it was enjoyable to start working with it, but the dataset was old, so I decided to build a system to collect the data from HearthPwn myself (same process as in my CrossFit article; reminder: when you are scraping, don’t be an idiot and don’t overload the server, etc.).
I collected around 800,000 decks and all the information on the cards (I am not sharing the dataset because I am not the owner of the data).
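The collection system itself is not described here, so the following is only a minimal sketch of the "don’t overload the server" advice: requests are sent one at a time with a fixed pause between them. The `fetch` callable and the URLs are hypothetical placeholders; in practice `fetch` could wrap `urllib.request.urlopen`, and the site’s `robots.txt` can be checked beforehand with `urllib.robotparser`.

```python
import time


def polite_crawl(urls, fetch, delay_seconds=2.0):
    """Fetch URLs sequentially, pausing between requests to limit the load on the server.

    `fetch` is any callable that takes a URL and returns the page content.
    Failures are logged and stored as None rather than retried in a tight loop.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # fixed pause between two consecutive requests
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # e.g. network or HTTP errors raised by `fetch`
            print(f"failed to fetch {url}: {exc}")
            pages[url] = None
    return pages


# Usage with a fake fetcher (no network access); the URLs are hypothetical placeholders
def fake_fetch(url):
    if url.endswith("missing"):
        raise IOError("404")
    return f"<html>{url}</html>"


pages = polite_crawl(["deck/1", "deck/missing", "deck/2"], fake_fetch, delay_seconds=0.1)
```

Injecting `fetch` keeps the throttling logic testable without touching the real website.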
Overview of the cards and decks
One interesting aspect of this dataset is the number of users (contributors) who published decks on the website and how many decks each of them published. The following boxplot shows the quartiles of the number of decks built per contributor (outliers hidden).
Most users produce only one deck (0-50%), 25% of them produce between 1 and 3 decks, and the last quartile creates between 3 and 6 decks. I made some clusters based on the number of decks built per player.
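These quartiles can be computed from the raw list of deck authors with the standard library alone. A minimal sketch, assuming one author name per deck; the cluster boundaries in `cluster` are hypothetical and only loosely follow the quartiles above, since the exact clusters are not listed in this article.

```python
from collections import Counter
import statistics


def decks_per_contributor(deck_authors):
    """Number of decks published by each contributor (one author name per deck)."""
    return Counter(deck_authors)


def quartiles(values):
    """Q1, median and Q3 of a list of numbers."""
    return statistics.quantiles(values, n=4, method="inclusive")


def cluster(deck_count):
    """Hypothetical buckets on the number of decks built by a contributor."""
    if deck_count == 1:
        return "1 deck"
    if deck_count <= 3:
        return "2-3 decks"
    if deck_count <= 6:
        return "4-6 decks"
    return "7+ decks"


authors = ["ana", "bob", "cid", "cid", "dee", "dee", "dee"] + ["eve"] * 10
counts = decks_per_contributor(authors)
print(quartiles(list(counts.values())))  # [1.0, 2.0, 3.0]
print(Counter(cluster(n) for n in counts.values()))
```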
Let’s now look at the number of decks published per week.
Most of the big content releases are followed by a wave of new decks on HearthPwn (thanks, Captain Obvious).
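Counting decks per week only needs the publication date of each deck. A minimal sketch, assuming the dates are `datetime.date` objects; the `spike_weeks` helper and its threshold (twice the median weekly count) are a hypothetical way to flag the weeks with a wave of new decks, which can then be compared with the content release calendar.

```python
from collections import Counter
from datetime import date
import statistics


def decks_per_week(publication_dates):
    """Count decks per ISO (year, week), sorted chronologically.

    Weeks without any deck simply do not appear in the result.
    """
    counts = Counter(tuple(d.isocalendar()[:2]) for d in publication_dates)
    return dict(sorted(counts.items()))


def spike_weeks(weekly_counts, factor=2.0):
    """Weeks whose deck count exceeds `factor` times the median weekly count."""
    median = statistics.median(weekly_counts.values())
    return [week for week, n in weekly_counts.items() if n > factor * median]


dates = [date(2019, 1, 7), date(2019, 1, 8), date(2019, 1, 9), date(2019, 1, 10),
         date(2019, 1, 14), date(2019, 1, 21)]
weekly = decks_per_week(dates)
print(weekly)               # {(2019, 2): 4, (2019, 3): 1, (2019, 4): 1}
print(spike_weeks(weekly))  # [(2019, 2)]
```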
From a hero perspective, some preferences emerge. The following figure shows the number of decks built for each hero.
The top 3 heroes seem to be the priest, mage, and paladin, but there is no clearly dominant one.
From a card perspective, I built a simple evaluation of the types of cards inside the decks.
The graph may not be very clear, but the idea is to illustrate the distribution of card types in a deck; for example, it shows that 50% of the decks have:
- 0 weapons
- between 0 and 10 spells
- between 0 and 17 minions
As we can see, most decks do not use weapons, and deck builders seem to include more minions than spells.
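The "50% of the decks" values above are the medians of the per-deck counts. A minimal sketch of that computation, assuming each deck is a list of card names and a hypothetical `card_types` dict maps each card name to "minion", "spell" or "weapon".

```python
from collections import Counter
import statistics

TYPES = ("minion", "spell", "weapon")


def type_counts(deck, card_types):
    """Number of minions, spells and weapons in one deck (a list of card names)."""
    counts = Counter(card_types[name] for name in deck)
    return {t: counts.get(t, 0) for t in TYPES}


def median_type_counts(decks, card_types):
    """Median number of cards of each type over all decks."""
    per_deck = [type_counts(deck, card_types) for deck in decks]
    return {t: statistics.median(c[t] for c in per_deck) for t in TYPES}


card_types = {"m": "minion", "s": "spell", "w": "weapon"}  # toy card catalogue
decks = [["m"] * 20 + ["s"] * 10,
         ["m"] * 15 + ["s"] * 13 + ["w"] * 2,
         ["m"] * 17 + ["s"] * 13]
print(median_type_counts(decks, card_types))  # {'minion': 17, 'spell': 13, 'weapon': 0}
```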
Let’s now have a look at a more advanced analysis of the data.
Heroes VS decks VS cards
From a card perspective, the heroes differ slightly in the distribution of their associated cards.
Most of the weapons are associated with the warrior, rogue, hunter, shaman, and paladin, and each hero has around 150 associated cards. Let’s now look at the content of the decks from a minion perspective.
The warlock seems to be the kind of hero whose decks contain more minions than the average deck. Let’s see the spells perspective.
The warlock stands out from the rest again (it does not follow the trend observed with the minions). Finally, let’s look at the weapons side.
The last two graphs illustrate how weapons are used by the heroes that have weapons among their cards.
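The per-hero comparisons above (more or fewer minions or spells than the average deck, weapon usage) can be summarized numerically. A minimal sketch, assuming each deck is a hypothetical `(hero, type_counts)` pair; it compares means rather than the full distributions shown in the boxplots, so it is a coarser summary.

```python
from collections import defaultdict
import statistics


def deviation_from_average(decks, card_type):
    """Mean number of `card_type` cards per hero minus the mean over all decks."""
    overall = statistics.fmean(counts[card_type] for _, counts in decks)
    by_hero = defaultdict(list)
    for hero, counts in decks:
        by_hero[hero].append(counts[card_type])
    return {hero: statistics.fmean(values) - overall for hero, values in by_hero.items()}


def weapon_usage_rate(decks):
    """Share of each hero's decks that contain at least one weapon."""
    by_hero = defaultdict(list)
    for hero, counts in decks:
        by_hero[hero].append(counts["weapon"] > 0)
    return {hero: sum(flags) / len(flags) for hero, flags in by_hero.items()}


decks = [("warlock", {"minion": 20, "spell": 10, "weapon": 0}),
         ("warlock", {"minion": 22, "spell": 8, "weapon": 0}),
         ("warrior", {"minion": 14, "spell": 12, "weapon": 4}),
         ("warrior", {"minion": 16, "spell": 14, "weapon": 0})]
print(deviation_from_average(decks, "minion"))  # {'warlock': 3.0, 'warrior': -3.0}
print(weapon_usage_rate(decks))                 # {'warlock': 0.0, 'warrior': 0.5}
```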
Let’s take some time to analyze in more detail the cards associated with each hero, starting with their cost.
As we can see, the first quartile of the cost is identical for all the heroes, with a value of 2 (quick to play). For the second quartile (the median), except for the druid, the cost of the cards is between 2 and 3. At the third quartile, the heroes split into groups:
- cost 5 (warlock/priest/warrior/hunter/paladin/mage)
- cost 4 (rogue/shaman)
- cost 6 (druid)
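These per-hero quartiles can be obtained by grouping the mana costs of the hero-specific cards. A minimal sketch, assuming a hypothetical list of `(hero, cost)` pairs; `statistics.quantiles` with `method="inclusive"` is one interpolation convention among several, so values may differ slightly from those drawn by a given plotting tool.

```python
from collections import defaultdict
import statistics


def cost_quartiles_by_hero(cards):
    """Map each hero to [Q1, median, Q3] of the mana cost of its cards.

    `cards` is an iterable of (hero, mana_cost) pairs.
    """
    costs = defaultdict(list)
    for hero, cost in cards:
        costs[hero].append(cost)
    return {hero: statistics.quantiles(values, n=4, method="inclusive")
            for hero, values in costs.items()}


cards = ([("rogue", c) for c in [1, 2, 2, 3, 4, 4, 5]]
         + [("druid", c) for c in [2, 3, 5, 6, 7]])
print(cost_quartiles_by_hero(cards))
# {'rogue': [2.0, 3.0, 4.0], 'druid': [3.0, 5.0, 6.0]}
```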
Overall, this analysis illustrates the differences in gameplay between heroes based on their abilities. Let’s now look at the attack and health points of the minion cards.
For minions (health/attack), the boxplot shows no big difference between the heroes.
This is a very light analysis of the cards because I left out a fundamental element: the text field under the name of the card, which describes properties that can be closely aligned with the hero (in this example, the card replaces your hero with Lord Jaraxxus).
Let’s now look at the total cost, attack, and health of the decks for each hero.
From a cost perspective, the warlock has a wide range of values, from about 20 to about 220. All the heroes have a median total cost close to 100 mana (maybe a good takeaway when building a deck). Let’s now see the attack points of the decks.
From an attack point of view, the druid has the broadest range of values (about 0-122), but the median total attack of a deck seems to be around 50 points. Let’s see the health of the decks.
Here, the range of health values is quite large for all the heroes, but I would say that the median total health is around 70 points.
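The deck totals are sums over the cards of each deck, and the values quoted above are per-hero medians of those sums. A minimal sketch, assuming a hypothetical `cards` dict mapping each card name to its `cost`, `attack` and `health`; stats a card lacks (spells, or health for weapons) count as 0, which is an assumption about how the totals were built.

```python
from collections import defaultdict
import statistics

STATS = ("cost", "attack", "health")


def deck_totals(deck, cards):
    """Sum cost, attack and health over the cards of a deck (a list of card names)."""
    totals = dict.fromkeys(STATS, 0)
    for name in deck:
        for stat in STATS:
            totals[stat] += cards[name].get(stat, 0)  # missing stat counts as 0
    return totals


def median_totals_by_hero(decks, cards):
    """decks: list of (hero, card names). Returns hero -> median total per stat."""
    per_hero = defaultdict(list)
    for hero, deck in decks:
        per_hero[hero].append(deck_totals(deck, cards))
    return {hero: {stat: statistics.median(t[stat] for t in totals) for stat in STATS}
            for hero, totals in per_hero.items()}


cards = {"m": {"cost": 3, "attack": 3, "health": 2},  # toy minion
         "s": {"cost": 4},                            # toy spell
         "w": {"cost": 2, "attack": 3}}               # toy weapon
decks = [("warlock", ["m", "m", "s"]), ("warlock", ["m", "w"]), ("warlock", ["s", "s", "w"])]
print(median_totals_by_hero(decks, cards))
# {'warlock': {'cost': 10, 'attack': 6, 'health': 2}}
```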
This is a very high-level analysis of the dataset, but it seems that, in general, decks built on HearthPwn:
- typically contain at most 17 minions, 10 spells, and no weapons
- have a median total cost of around 100 mana
- have a median total attack of around 50 points
- have a median total health of around 70 points.
In addition to these findings, all that data gave me some ideas.
Why not build a recommender system for cards?
How to build a (simple!) recommender of cards for Hearthstone
Let’s be honest, this is not going to be the craziest recommender system I will ever build, but let’s start small. For me, the most straightforward recommender system is the “most popular items” recommender.
The usage of each card per hero (and across all decks) is the core of this recommender. The following table shows the top 25 cards for each hero (you can find the ranking of all the cards in this Google spreadsheet).
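A “most popular items” recommender boils down to counting, for each hero, how many decks contain each card. A minimal sketch, assuming decks are hypothetical `(hero, card names)` pairs; a card is counted once per deck even if it appears twice, and `recommend` returns the most popular cards missing from the current deck, falling back to the ranking over all decks for an unknown hero.

```python
from collections import Counter, defaultdict


def popularity_ranking(decks, top_n=25):
    """Top `top_n` cards per hero, plus an "all" ranking over every deck.

    A card is counted once per deck, even if the deck holds two copies.
    """
    per_hero = defaultdict(Counter)
    overall = Counter()
    for hero, deck in decks:
        unique_cards = set(deck)
        per_hero[hero].update(unique_cards)
        overall.update(unique_cards)
    ranking = {hero: [card for card, _ in counter.most_common(top_n)]
               for hero, counter in per_hero.items()}
    ranking["all"] = [card for card, _ in overall.most_common(top_n)]
    return ranking


def recommend(ranking, hero, current_deck, k=5):
    """The k most popular cards for this hero that are not already in the deck."""
    candidates = ranking.get(hero, ranking["all"])
    return [card for card in candidates if card not in current_deck][:k]


decks = [("mage", ["a", "a", "b", "c"]), ("mage", ["a", "b"]),
         ("mage", ["a", "d"]), ("priest", ["e", "a"])]
ranking = popularity_ranking(decks)
print(recommend(ranking, "mage", ["a"], k=1))  # ['b']
```

This baseline ignores which cards are already in the deck beyond filtering them out, which is exactly what more advanced recommenders try to exploit.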
I am planning to use this dataset as the basis for articles on recommender systems. This field of machine learning is exciting in terms of processes and algorithms; a good read on this topic is Practical Recommender Systems by Kim Falk.
Honestly, it’s an excellent book, and I think I am going to use it as the backbone of my future articles.
Stay tuned, and don’t hesitate to give some feedback.