{"id":122,"date":"2020-01-22T17:36:34","date_gmt":"2020-01-22T17:36:34","guid":{"rendered":"https:\/\/light-and-code.com\/?p=122"},"modified":"2020-01-23T16:43:00","modified_gmt":"2020-01-23T16:43:00","slug":"building-a-rating-system-for-team-games","status":"publish","type":"post","link":"https:\/\/light-and-code.com\/?p=122","title":{"rendered":"Building a rating system for team games"},"content":{"rendered":"\n<p>In the last months, I have been working on an app to manage the results of foosball matches.&nbsp; This was a side project for me to try out the ionic framework and to use the Firebase toolkit in a real application.<\/p>\n\n\n\n<p>Both\nprojects (ionic and firebase) have a large community and receive a lot of\npraise online. There is a big problem, though, that tends to be underestimated:\nThere is a difference between tutorial content and application!<\/p>\n\n\n\n<p>In a tutorial, all answers to questions are laid out for you, you don\u2019t have to find them. While that format has the advantage of introducing the core problems, names, entry points, etc. of a project to you, it hides one aspect: If you have a project idea and want to use say ionic or firebase in that project, how hard will it be to get it running for your special requirements? <\/p>\n\n\n\n<p>The answer to this question is complex because there is no easy way to determine it and no straight forward path to it either. One part of the answer could be \u201cHow many posts on StackOverflow do you find about a certain problem you have?\u201d. Imagine you run into a problem while setting up a given toolkit on a special operating system \u2013 for argument&#8217;s sake let\u2019s say Gentoo Linux. One of the commands you are supposed to run doesn\u2019t work. You google the error message. The number of good responses, the time it takes until you have resolved your issue, is crucial to how well a framework suits your needs and a tutorial has nothing to do with that.<\/p>\n\n\n\n<p>For that reason, I try to come up with little project ideas of my own and implement them on diverse software stacks to get to know them. The experience of applying them is very educational I feel.<\/p>\n\n\n\n<p>One of\nthese project ideas was to create an app that could store results from foosball\nmatches we play at work. However, if you hand an applied mathematician a long\nlist of game results (numbers) he will eventually want to do something with\nthose numbers. So I decided to figure out how to compute the strength of the\nplayers involved. This was especially interesting to me because I play online\ngames that also have MMR (Multiplayer Matchmaking Rating) systems and find the\ntopic interesting.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction to ELO<\/h2>\n\n\n\n<p>The logical starting point is chess. The ELO system in chess is well established and works really well in the sense, that it offers good predictive properties for matches for two players whose ELO has been computed on a sufficiently large dataset of match results. The idea is simple: Every player is assigned a rating number. The higher this number is, the stronger the player is assumed to be. For every match result between two players, an update is computed where they exchange a certain number of points so the total number of points across all players stays the same (Number of players in the system * starting rating). The \u201conly\u201d difficult part is then to compute said update.<\/p>\n\n\n\n<p>The system is especially simple because there are only three possible results for a chess match: player a wins (1 : 0), player b wins (0 : 1) or a draw (0.5 : 0.5). The basic assumption of ELO is, that the expected outcome for a player should be $$ E_a = \\frac{1}{1+10^{(R_b \u2013 R_a)\/400}} $$<\/p>\n\n\n\n<p>Where \\(R_b\\) is the rating of the opponent, \\(R_a\\) the rating of the player himself, and 400 a tuning parameter. We can then observe multiple properties about this system: <\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>The expected result of both players adds up to 1 as do all the possible outcomes above: $$ E_a + E_b = 1 $$<\/li><li>If \\( R_a = R_b \\) we find $$E_a = \\frac{1}{1+10^0} = 0.5 $$ so for two equally strong players the expected outcome is a tie which makes sense and<\/li><li>If \\( R_a > R_b \\) we find $$ c := (R_b- R_a) \/400 &lt; 0$$ and therefore<br> $$ E_a = \\frac{1}{1+ 10^{c}} > \\frac{1}{1 + 1} = 0.5 $$ and vice versa for \\( R_a &lt; R_b \\).<\/li><\/ul>\n\n\n\n<p>Graph 1\nshows how many points this system expects a player to make based on the\nassumption his rating is 1600.<\/p>\n\n\n\n<p>Infobox: You can recreate this graph in Gnuplot using the following commands:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>> Set xrange [1200:2000]\n> Set ytics 0.5\n> Set grid\n> Set yrange [0:1]\n> Ra = 1600\n> Ea(x) = 1.0 \/ ( 1.0 + 10.0**((x - Ra)\/400.0))\n> Plot Ea(x)<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Extension to multiplayer<\/h2>\n\n\n\n<p>When we play foosball in the office, however, we don\u2019t play one-against-one (1v1). Instead, we usually play 2v2. For each team, one player plays the two defensive lines and the other player plays the two offensive lines. We also switch teams, so computing a rating for the team would not suffice, so building a rating system for this setup is not as simple as for chess \u2013 especially because a player can be very good playing one position and unexperienced playing the other one, so a single rating value for a player is also not enough.<\/p>\n\n\n\n<p>To account\nfor the givens of the situation I decided to give every player two rating\nvalues: one for when they play offense and another for their performance on\ndefense. A first, easy, approach would be, to regard a match as two games with\nthe same result. \u201cOffender A against Defender B\u201d would be the one match and\n\u201cDefender A against Offender B\u201d the other and I would give both the same\nresult.<\/p>\n\n\n\n<p>To show the shortcomings of this solution imagine the following situation: 3 beginners play with a world champion. Team A has beginners in both positions. The world champion would play defense on team B. The consequence would be that team A would not score a single goal, therefore Team B would likely win independently of how strong team b\u2019s offense player would be. A 10:0 win for offense B vs. defense A, however, would mean a big rating increase for the B\u2019s offense player (assuming the three beginners were all still at their starting rating of 1600). The system is currently blind to the strength of the player that you are playing with when judging your performance \u2013 which makes no sense for a team game.<\/p>\n\n\n\n<p>For this weighing impact, I chose to go with a \u00be to \u00bc distribution, so the relevant match for you is mostly your matchup against your direct opponent but also your teammate&#8217;s matchup against their opponent. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Computing a rating update<\/h2>\n\n\n\n<p>For standard rating, the update equation is $$ R^{new}_a = R^{old}_a + 20 * (S_a \u2013 E_a) $$<\/p>\n\n\n\n<p>where \\(S_a\\) is the actual outcome of the match and \\(E_a\\) the expected outcome on the basis of the rating discussed above. So if \\(S_a\\) is bigger than \\(E_a\\) you gain rating, otherwise, you lose rating. For the 2v2 foosball case we need to make two adaptations:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>The score has to be transformed to represent the performance. The obvious choice is to say 0:10 is a performance of 0 and 10:0 is a performance of 1 (we always play until one side scores their tenth goal, no overtime). 10:10 is not possible, so there are 20 possible outcomes: 10 where team A wins and 10 where team b wins. We distribute these results evenly between 0 and 1. So if you win, your performance is $$ S_a = 1 \u2013 \\frac{g_b}{19} $$ <br> where g_a is the number of goals the opposing team scored. If you lose, your performance is $$ S_a = 0 +\\frac{g_a}{19}. $$<br> The number 19 occurs because you split the interval 0:1 into 19, not 20 intervals of even size since you are connecting the endpoints basically and 19 intervals separate 20 points.<\/li><li>To deal with the above problem of weighting the performances you compute the rating update as follows: $$ E_{d,a} = \\frac{1}{1 + 10^{\\frac{(0.75 \\cdot R^o_b + 0.25 \\cdot R_{d,b}) &#8211; (0.75 \\cdot R_{d,a} + 0.25 \\cdot R_{o,a}) &#8211; }{400}}} $$ where \\(d\\) and \\(o\\) represent defense and offense, \\(a\\) and \\(b\\] for the teams. This is one of two update values that have to be computed. The other one is $$ E_{o,a} = \\frac{1}{1 + 10^{\\frac{(0.75 \\cdot R_{d,b} + 0.25 \\cdot R_{o,b}) &#8211; (0.75 \\cdot R_{o,a} + 0.25 \\cdot R_{d,a}) &#8211; }{400}}}. $$<br> The updates are then $$ R^{new}_{d,a} = R^{old}_{d,a} + 20 \\cdot ( S_a \u2013 E_{d,a}), $$ $$ R^{new}_{o,a} = R^{old}_{o,a} + 20 \\cdot ( S_a \u2013 E_{o,a}), $$ $$ R^{new}_{o,b} = R^{old}_{d,a} &#8211; 20 \\cdot ( S_a \u2013 E_{d,a}) \\text{ and} $$ $$ R^{new}_{d,b} = R^{old}_{d,a} &#8211; 20 \\cdot ( S_a \u2013 E_{o,a}). $$<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Predicting game results<\/h2>\n\n\n\n<p>Once you\nhave acquired a lot of game results, you can also start to compute predictions\nof game results. Given two teams and if all their players have well adjusted\nrating values, such a prediction is possible as follows:<\/p>\n\n\n\n<p>We assume\nthat the results will follow a normal distribution around the expected outcome.\nWe can compute the expected outcome by averaging the ratings of the players on\na team and then computing the expected outcome and transforming that value back\nto a score. <\/p>\n\n\n\n<p>We can then\nexpect the results to have a normal distributed outcome around that value. So\nwe know what the center of the distribution is and we also know, that the sum\nof the probability of all possible results of a game has to be 100%. However,\nthis is only one condition on two unknowns, because the normal distribution has\ntwo unknowns at this point: <\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>The value at its peak, i.e. the\nprobability of the predicted result and<\/li><li>The variance of the distribution,\ni.e. how likely is it, that results deviate from the prediction, or how strong\nis the influence of luck on a result.<\/li><\/ol>\n\n\n\n<p>If we have many results in a database, we have a way to compute the variance of results based on past deviations against the prediction for those games. So for all games, we compute the expected outcome and the square of the deviation against that outcome and receive an estimate of the variance of results which we can use to compute the probability of results. The following picture shows prediction versus reality in an app I developed based on this algorithm.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"767\" src=\"https:\/\/light-and-code.com\/wp-content\/uploads\/2020\/01\/Predition-1024x767.png\" alt=\"\" class=\"wp-image-123\" srcset=\"https:\/\/light-and-code.com\/wp-content\/uploads\/2020\/01\/Predition-1024x767.png 1024w, https:\/\/light-and-code.com\/wp-content\/uploads\/2020\/01\/Predition-300x225.png 300w, https:\/\/light-and-code.com\/wp-content\/uploads\/2020\/01\/Predition-768x575.png 768w, https:\/\/light-and-code.com\/wp-content\/uploads\/2020\/01\/Predition.png 1030w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>So now we have:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>A\nstable rating system for games.<\/li><li>An\nextension to team games.<\/li><li>An\nalgorithm to predict results based on player ratings.<\/li><\/ul>\n\n\n\n<p>I will soon\nadd an article explaining how to implement this in JavaScript. Until then \u2013\nstay tuned!<\/p>\n","protected":false},"excerpt":{"rendered":"<div class=\"tmnf_excerpt\"><p>I decided to figure out how to compute the strength of players in multiplayer games. This was especially interesting to me because I play online games that also have MMR (Multiplayer Matchmaking Rating) systems.<\/p>\n<\/div>","protected":false},"author":1,"featured_media":124,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[34,2],"tags":[33,11,32,31,16],"class_list":["post-122","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-highlights","category-math","tag-elo","tag-math","tag-project","tag-rating","tag-science"],"jetpack_featured_media_url":"https:\/\/light-and-code.com\/wp-content\/uploads\/2020\/01\/kicker-4450909_1920.jpg","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/light-and-code.com\/index.php?rest_route=\/wp\/v2\/posts\/122","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/light-and-code.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/light-and-code.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/light-and-code.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/light-and-code.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=122"}],"version-history":[{"count":4,"href":"https:\/\/light-and-code.com\/index.php?rest_route=\/wp\/v2\/posts\/122\/revisions"}],"predecessor-version":[{"id":175,"href":"https:\/\/light-and-code.com\/index.php?rest_route=\/wp\/v2\/posts\/122\/revisions\/175"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/light-and-code.com\/index.php?rest_route=\/wp\/v2\/media\/124"}],"wp:attachment":[{"href":"https:\/\/light-and-code.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=122"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/light-and-code.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=122"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/light-and-code.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}