The data
We currently have available all Champion Ladder match data for seasons 1-5 [1]. This post will just be a quick rundown of what is in the dataset, so that I have a reference for when we start to do some real analysis.
The spreadsheets contain one line for each game, so the number of rows gives us the number of games and the number of columns gives us the number of parameters recorded for each match.
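library(tidyverse)  # map(), read_csv(), bind_rows() and %>%
library(knitr)      # kable()
# ccl_files is assumed to be a named vector of CSV paths, one per season
ccl_data <- map(ccl_files, read_csv)
# One row per game, one column per recorded parameter
num_games <- map(ccl_data, nrow)
num_parameters <- map(ccl_data, ncol)
bind_rows(Games = num_games, Parameters = num_parameters, .id = "") %>%
  kable(format.args = list(big.mark = ","))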
| | CCL1 | CCL2 | CCL3 | CCL4 | CCL5 |
|---|---|---|---|---|---|
| Games | 18,033 | 30,496 | 18,950 | 22,963 | 17,290 |
| Parameters | 95 | 95 | 95 | 95 | 95 |
So each season has roughly 17,000 to 23,000 games (apart from a big spike to just over 30,000 in season two), and each match has 95 stats recorded. These break down into coach, team and other categories. What each one measures is generally self-explanatory from the column heading, but some can be a bit confusing.
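The snippet above takes ccl_files as given. As a minimal sketch of one way it could be built, assuming one CSV per season in a single directory (the directory, file names and toy contents here are made up so the example runs on its own):

```r
library(readr)
library(purrr)

# Hypothetical layout: one CSV per season in one directory.
# Two tiny files are written to a temporary directory so the sketch runs as-is.
data_dir <- file.path(tempdir(), "ccl_demo")
dir.create(data_dir, showWarnings = FALSE)
write_csv(data.frame(uuid = c("a1", "a2", "a3"), teams.0.score = c(1, 2, 0)),
          file.path(data_dir, "CCL1.csv"))
write_csv(data.frame(uuid = c("b1", "b2"), teams.0.score = c(3, 1)),
          file.path(data_dir, "CCL2.csv"))

# Name each path after its file so the season labels carry through map()
ccl_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
ccl_files <- set_names(ccl_files, tools::file_path_sans_ext(basename(ccl_files)))

ccl_data <- map(ccl_files, read_csv, show_col_types = FALSE)

# Games per season in the toy data
games_per_season <- map_int(ccl_data, nrow)
```

Naming the paths up front is what lets the season labels appear as column headings in the summary table without any extra work.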
Coach data
Prefixed with coaches.[0|1]. for home/away coach:
- idcoach: Numeric coach id
- coachname
- coachcyanearned
- coachxpearned
Team data
Prefixed with teams.[0|1]. for home/away team:
- idteamlisting: Numeric team id
- idraces: Numeric id of team race
- teamname
- teamlogo: File name for team logo?
- value: TV of team (as displayed on team page, so missing journeymen?)
- score: Touchdowns
- cashbeforematch
- popularitybeforematch: Fan factor
- popularitygain
- cashspentinducements
- cashearned: Actual amount of cash gained by team
- cashearnedbeforeconcession: Equal to cashearned unless there was a concession
- winningsdice
- spirallingexpenses
- nbsupporters
- possessionball: % of game spent in possession of ball
- occupation[own|their]: % of game spent in possession with ball on own/opponent's half of the pitch
- mvp: Number of MVPs for a team (useful for figuring out who conceded?)
- inflictedpasses: Inflicted, i.e. how many passes this team completed
- inflictedcatches
- inflictedinterceptions
- inflictedtouchdowns
- inflictedtackles
- [inflicted|sustained]casualties: Armour breaks, not actual casualties. Those are called injuries
- [inflicted|sustained]ko
- [inflicted|sustained]injuries
- [inflicted|sustained]dead
- inflictedmetersrunning
- inflictedmeterspassing
- inflictedpushouts: Crowdsurfs
- sustainedexpulsions: from fouls
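As a rough sketch of how the two cash columns could be used together: since cashearnedbeforeconcession only differs from cashearned when there was a concession, comparing them should flag the affected games. The toy values and the concession_affected name are mine, purely for illustration, and I'm assuming the full column names take the form teams.0.cashearned.

```r
library(dplyr)

# Toy stand-in for the real data; values are invented
toy_games <- tibble(
  teams.0.cashearned = c(50000, 0, 70000),
  teams.0.cashearnedbeforeconcession = c(50000, 40000, 70000)
)

# Flag games where the home team's two cash columns disagree
flagged <- toy_games %>%
  mutate(concession_affected =
           teams.0.cashearned != teams.0.cashearnedbeforeconcession)

# Number of flagged games in the toy data
sum(flagged$concession_affected)
```

This only says that a concession changed the home team's earnings, not which side conceded.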
Other data
- uuid|id: Unique game id as hexadecimal/decimal number
- leaguename|competitionname
- stadium: Stadium type for the home team
- structstadium: Stadium enhancement (or NA)
- started|finished: Time game started/finished
- Several base URLs for image assets (for putting together match summary screens?)
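Because the three categories are distinguished by prefix, they are easy to separate in code. A minimal sketch on a toy data frame (the values are made up, and only a handful of the 95 columns are included):

```r
library(dplyr)

# Toy stand-in for one season; column names follow the prefix scheme above
toy_season <- tibble(
  uuid = c("1000a", "1000b"),
  coaches.0.coachname = c("A", "B"),
  coaches.1.coachname = c("C", "D"),
  teams.0.score = c(2L, 1L),
  teams.1.score = c(1L, 1L)
)

# Classify each column by its prefix
column_groups <- case_when(
  startsWith(names(toy_season), "coaches.") ~ "coach",
  startsWith(names(toy_season), "teams.") ~ "team",
  TRUE ~ "other"
)
table(column_groups)

# Pull out just the home team's columns
home_team <- select(toy_season, starts_with("teams.0."))
```

starts_with() matches the prefix literally rather than as a regular expression, so the dots need no escaping.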
Things to note
Inflicted casualties/KOs/injuries/deaths for one team do not equal the sustained casualties/KOs/injuries/deaths for the other team, because some injuries arise through failed dodges and GFIs (maybe failed blocks too?).
We can see how many players have been sent off for fouling, but the number of fouls doesn’t seem to be recorded anywhere.
Text encoding seems to have gotten messed up somewhere, so coach/team names with non-standard characters will come out garbled.
Crowdsurfs do not appear to be recorded properly, because none are recorded across all five Champion Ladder seasons.
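Two of these notes are easy to check directly. The sketch below uses invented toy data and assumes full column names such as teams.0.inflictedko; ko_gap and pushouts_per_season are illustrative names of my own. It measures the gap between one team's inflicted KOs and the other's sustained KOs, then totals recorded crowdsurfs per season.

```r
library(dplyr)
library(purrr)

# Toy stand-in for the list of season data frames; values are invented
toy_ccl_data <- list(
  CCL1 = tibble(
    teams.0.inflictedko = c(2L, 1L, 0L),
    teams.1.sustainedko = c(2L, 3L, 0L),
    teams.0.inflictedpushouts = c(0L, 0L, 0L),
    teams.1.inflictedpushouts = c(0L, 0L, 0L)
  ),
  CCL2 = tibble(
    teams.0.inflictedko = c(1L, 0L),
    teams.1.sustainedko = c(1L, 0L),
    teams.0.inflictedpushouts = c(0L, 0L),
    teams.1.inflictedpushouts = c(0L, 0L)
  )
)

# KOs the away team sustained that the home team did not inflict
season_one <- toy_ccl_data$CCL1 %>%
  mutate(ko_gap = teams.1.sustainedko - teams.0.inflictedko)
mismatch_share <- mean(season_one$ko_gap != 0)

# Total recorded crowdsurfs per season, both teams combined
pushouts_per_season <- map_dbl(
  toy_ccl_data,
  ~ sum(.x$teams.0.inflictedpushouts, .x$teams.1.inflictedpushouts, na.rm = TRUE)
)
```

On the real data, a pushouts_per_season of zero for every season would match the observation that crowdsurfs are not being recorded.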
[1] Thanks to Dode for providing the data.