On increasing the K-factor in the Elo rating system
FIDE’s recent decision to postpone the increase of the K-factor has finally provoked a dispute over whether such an increase is needed at all. So far this dispute has contributed little to establishing the truth. The rating system is a mathematical model, and it should not be discussed only on a qualitative level. Some contributions, first of all those of Bartlomiej Macieja, do contain mathematical calculations, but they are quite naïve. Being at once a grandmaster and a graduate student in mathematical statistics, I hope to offer a qualified opinion on this problem and to turn the further dispute in a more constructive direction. I will say at the outset that increasing the K-factor to 20 strikes me as a controversial idea, to say the least. Nevertheless, in what follows I try to remain as impartial as possible and to separate objective mathematical facts from my subjective opinions.
The article is long, so let me briefly describe what a reader should expect. Questions about the value of the K-factor for players rated below 2400, as well as for those with master and grandmaster norms, will not be examined. The article answers the arguments of Macieja, a supporter of the K-factor increase, and comments on the long-standing research of Jeff Sonas, according to whom the best value of K is 24. It also gives quantitative estimates of the effects of increasing K. Where possible, mathematical terms have been replaced with everyday-language analogues, and I ask those who are good at mathematics not to be too put off by this. Finally, those who find the article too long can simply read the overview at the end.
“Compensating” for more frequent rating calculation?
Macieja’s first argument is as follows: more frequent rating calculation results in an “effective decrease of the K-factor”. Therefore, each time the calculation becomes more frequent, the value of K should be increased “in order not to change the system too much”. The value K=10 was adopted when ratings were published once a year; now they are to be recalculated every two months. Since the increase of K was “forgotten” at the previous reductions of the calculation period, it is now absolutely required.
But, generally speaking, it is assumed that at each recalculation players’ ratings, whether they rise or fall, on average reflect their level of play better (if we do not accept this, then the Elo system should not be modified but simply rejected). Subsequent games are then rated using the new ratings, which are on average closer to the “true” values than the previous ones. Clearly the accuracy of each successive rating list cannot get worse on this account, and therefore the more frequently ratings are calculated, the better. In this sense, the very idea of needing to compensate for something unambiguously useful seems strange.
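For reference, the whole discussion assumes the standard Elo update rule: after each game a player gains K times the difference between the actual score and the score the rating formula expected. A minimal sketch (function names are mine):

```python
def expected_score(rating, opp_rating):
    """Standard Elo expected score: 1 / (1 + 10^((opp - own) / 400))."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))

def elo_update(rating, opp_rating, score, k=10.0):
    """New rating after one game; score is 1 (win), 0.5 (draw) or 0 (loss)."""
    return rating + k * (score - expected_score(rating, opp_rating))

# Equal players: a win gains exactly K/2 points.
print(elo_update(2500, 2500, 1.0, k=10.0))  # 2505.0
```

The K-factor is thus simply the number of rating points at stake per game; everything below is about how large that stake should be.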
Since Macieja in any case admits that doubling the K-factor “is indeed much bigger than only to compensate the effect of more frequently published rating”, let us set this argument aside and return to it at a more appropriate moment.
Macieja advances another argument in favor of an even bigger increase of K (beyond 20). He considers two players who play 80 games during a year with exactly equal results. In his opinion it is logical to expect that if at the beginning of the year the difference between the players was 100 rating points, then by the end of the year their ratings should coincide. For this, a coefficient of K=24 is needed if ratings are published four times a year, and K=28 if they are published every two months. This is a much more serious argument. To answer it, a more general statement must first be considered.
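Macieja’s scenario can be checked with a short simulation. This is a deliberately simplified sketch with my own illustrative parameters, not Macieja’s exact setup: both players score 50% against a single opponent of average strength, and ratings stay frozen within each list period, which is what list-based calculation means in practice.

```python
def expected_score(rating, opp_rating):
    """Standard Elo expected score for one game."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))

def gap_after_year(k, games=80, lists_per_year=4, r_lo=2500.0, r_hi=2600.0,
                   r_opp=2550.0, score_rate=0.5):
    """Rating gap remaining after a year of identical results.

    Two players, 100 points apart, each score `score_rate` against the same
    opponent; rating changes accumulate within a list period and are applied
    only when the next list is published."""
    per_period = games // lists_per_year
    for _ in range(lists_per_year):
        d_lo = per_period * k * (score_rate - expected_score(r_lo, r_opp))
        d_hi = per_period * k * (score_rate - expected_score(r_hi, r_opp))
        r_lo += d_lo
        r_hi += d_hi
    return r_hi - r_lo

for k in (10, 20, 24):
    print(k, round(gap_after_year(k), 1))
```

With these assumptions, K=10 leaves a sizeable gap after the year, while K=24 with quarterly lists nearly closes it, consistent with Macieja’s figure.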
Here is the full article.
Too confusing!
Yes, I didn’t understand anything. It’s a boring issue anyway, at least for the majority of chess players. My proposal is for Susan or another World Champion to make the final decision.
Dmitriy Jakovenko hits a very important point, and let me try to avoid being “Too confusing!” in explaining it further.
In chess performances there is
(A) variation caused by skill, and
(B) variation caused by “variance”—the statistical “luck of the draw”.
A major area of statistics is called “ANOVA”, standing for “Analysis of Variance”. These are techniques to calculate what percentage EV(A) of the observed variance can be ascribed to a systematic factor (A), versus being pure chance (B). You want EV(A) to be as high as possible.
Usually the more data you have, the higher you can get EV(A), while if you have just a few data points, there’s little you can distinguish from chance. The operational question is, *how much* data do you need to get an acceptably high EV(A)? This amount of data can be called the “Horizon of Relevance”, say H(R) for short.
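A back-of-envelope sketch of how EV(A) grows with the number of games. The 100-point spread of true strengths and the linearized Elo curve are my own illustrative assumptions, not figures from the discussion above: skill variance in a player’s average score is fixed, while the chance variance of that average shrinks like 1/games.

```python
import math

def ev_skill(games, rating_sd=100.0):
    """Rough fraction of variance in a player's *average* score over `games`
    games that is due to skill (A) rather than chance (B).

    Linearizes the Elo curve near equal ratings, where the slope of the
    expected score is ln(10)/1600 per rating point."""
    slope = math.log(10) / 1600.0          # d(expected score)/d(rating point)
    var_skill = (slope * rating_sd) ** 2   # spread of true expected scores
    var_chance = 0.25 / games              # Bernoulli variance of the average
    return var_skill / (var_skill + var_chance)

for g in (5, 20, 50, 200):
    print(g, round(ev_skill(g), 2))
```

Under these assumptions EV(A) passes 50% only after a few dozen games and reaches roughly 80% around 50 games, which is one way to make an H(R) of about 50 games plausible.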
Now the key point of Jakovenko’s article is that a choice of K-factor pretty much imposes a horizon H(K) by fiat. It’s not a sharp horizon, although it’s OK that he speaks of it as one when saying that K = 20 effectively makes H(K) = 20 games. Sharp or not, the key question is: how does H(K) compare with H(R)?

It may be that such a comparison is already implicit in the analyses by Jeff Sonas and others (? Glickman? Nolan? Coulom?) showing that K = 20 is a better predictor of results than K = 10, with the “sweet spot” being K = 24. However, prediction and ANOVA are generally separate issues. ANOVA is a standard-enough buzzword that I would expect to find it in a Google search along with chess/Elo/rating, and I can say that the discussions I’ve read so far linked from ChessBase News have not treated “variance” in line with my point.
My own sense is that for chess, H(R) is at least 50 games. This seems to fall in line with what Jakovenko is arguing. Thus I’ve arrived at Jakovenko’s destination, but by a more-formal road. In any event, I feel that the proponents of a higher K-factor on mathematical grounds need to address this point, formally.
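To make factor (B) concrete, here is a minimal Monte Carlo sketch, with purely illustrative parameters of my own: a player whose true strength never changes plays opponents of exactly equal strength, so every movement of the published rating is pure chance. The spread of final ratings measures how hard a higher K punishes an established player for random rough patches.

```python
import random
import statistics

def rating_noise(k, games=1000, trials=200, true_rating=2500.0, seed=1):
    """Std of a stable player's published rating under pure chance.

    True strengths of player and opponents are all equal, so each game is a
    fair coin flip (draws omitted for simplicity); only the published rating
    r drifts, and the Elo formula computes the expected score from r."""
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        r = true_rating
        for _ in range(games):
            # Elo expects a score based on the *published* rating r ...
            expected = 1.0 / (1.0 + 10 ** ((true_rating - r) / 400.0))
            # ... but the actual result is a fair coin: true strengths are equal
            score = 1.0 if rng.random() < 0.5 else 0.0
            r += k * (score - expected)
        finals.append(r)
    return statistics.pstdev(finals)

for k in (10, 20):
    print(k, round(rating_noise(k), 1))
```

The process is mean-reverting (an inflated rating raises the expected score and so tends to fall back), but the stationary spread still grows with K, illustrating the chance component that any case for a higher K has to weigh.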
In chess terms, the issue has always been understood as a tradeoff between (a) keeping up with the ratings of rapidly improving players, versus (b) not punishing established players too much for random rough patches, and also (c) preventing declining-and-retiring players from extracting too many points from the rating pool. So politically it needs to be talked about as a tradeoff, and this requires a mass-scale evaluation. Jakovenko’s article, like many of the others, talks in terms of one “Player X”, but is there any nice summary where people have run all of the past 10-20 years with various K-factors, so that we can make a birds-eye appraisal of which system would have been most “juste” to all players?
Wow, you’re worse than Jakovenko!! Good luck with the statistics. Chess is not about that, it’s about not looking at the opposite sex playing next to your board in critical positions.
Mr. Georgios Makropoulos and Mr. Ignatius Leong are the most competent in this area. This is why they have been chosen by Mr. Ilyumzhinov to confront the urgent matter. I trust their conclusions, which were published somewhere before.