We thank IM Erik Kislik from the U.S. for sharing a comprehensive and insightful overview of the TCEC Superfinal between Stockfish and Komodo.
He analyzes the decisions made by the engines and uses concrete examples to explain why the ex-champion Komodo found itself in trouble in the critical games of the match.
We encourage you to dive into this intriguing article and join the discussion here, at our Facebook page Chessdom Friends, or in the TCEC chat!
Author: IM Erik Kislik
TCEC keeps getting better over time, with stronger engines and a constantly improving opening selection. It can attract amateurs and professionals alike because of the extremely high quality of play, the theoretical battles, and a viewer’s natural curiosity to see how the strongest engines handle different position types and pawn structures. It is nice to see that the authors of the top two programs are competitive with each other in a friendly and positive way. To the untrained eye (and to the trained eye as well), it is quite hard to tell what is going on and what separates the top two engines.
Komodo and Stockfish operate very differently, so it makes sense to look at things from a broad, objective perspective. As a serious tournament player, I use both Stockfish and Komodo 7 regularly and view them as complementing each other very nicely from an analytical point of view. Stockfish uses depth-based LMR (late move reductions, which you can read about on chessprogramming wikispaces if you’re interested), meaning that the greater the depth, the more selective it becomes. This is one explanation for why Stockfish reaches greater depths, and why each additional Stockfish ply (a ply is a single move by one side) is worth much less than an additional Komodo ply. Roughly speaking, beyond around 20 ply, each Komodo ply is almost like 2 Stockfish plies. The interesting question here is, “What is the value of a ply?” From a broad perspective, the empirical evidence I have seen (from personal testing as well as CCRL lists) suggests that Komodo gains roughly 50 elo points from depth 20 to depth 21, with diminishing returns, such that by depth 28 the gain is possibly closer to 25 elo points per ply. From that perspective, as fans we can expect an even higher quality of play in the future, as hardware improves and pruning becomes more sophisticated. As this happens, it may not necessarily mean that greater depths will be achieved; it may also mean that less will be missed by the same engines playing moves at the same depths.
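For the technically curious, here is a minimal C++ sketch of what depth-based late move reductions look like inside a conventional negamax search. Every name and constant here is my own illustration (the Position/Move stubs, the logarithmic formula, the thresholds), not Stockfish’s actual code, which is considerably more nuanced:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative stand-ins for the host engine's machinery; a real engine
// provides these elsewhere.
struct Move {};
struct Position {
    void makeMove(const Move&);
    void undoMove(const Move&);
};
std::vector<Move> generateMoves(Position&);
int search(Position& pos, int depth, int alpha, int beta);

// Depth-based reduction: both greater remaining depth and a later slot in
// the move ordering produce a larger reduction, so the search becomes
// progressively more selective as depth grows.
int reduction(int depth, int moveNumber) {
    if (depth < 3 || moveNumber < 4)
        return 0;  // never reduce near the leaves or for the first few moves
    return static_cast<int>(std::log(depth) * std::log(moveNumber) / 2.0);
}

int searchMoves(Position& pos, int depth, int alpha, int beta) {
    int moveNumber = 0;
    for (const Move& m : generateMoves(pos)) {
        ++moveNumber;
        pos.makeMove(m);
        int r = reduction(depth, moveNumber);
        // Search late moves to a reduced depth first...
        int score = -search(pos, depth - 1 - r, -beta, -alpha);
        // ...and re-search at full depth only if the reduced search
        // unexpectedly beats alpha.
        if (r > 0 && score > alpha)
            score = -search(pos, depth - 1, -beta, -alpha);
        pos.undoMove(m);
        alpha = std::max(alpha, score);
        if (alpha >= beta)
            break;  // beta cutoff
    }
    return alpha;
}
```

This reduce-then-verify pattern helps explain why a nominal ply means different things for different engines: much of the reported depth figure is reached through heavily reduced lines.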
That being said, Stockfish’s extremely deep search definitely played an important part in the SuperFinal’s result. At the end of the previous SuperFinal (which finished in November 2013), the common consensus was that a wide search based on deep positional understanding was slightly superior to a primarily deep tactical search. Of course, this is much too simplified a way to look at the subject. Many other factors play a role, such as sample size, time controls, depth, and opening selection. It is useful for the computer chess community as a whole to consider as many variables as possible before making judgments.
Nevertheless, in the most recent SuperFinal Stockfish’s play was fantastic, and I currently consider it the strongest engine, although I can’t precisely estimate by how many elo points. The number is likely between 10 and 30, but I don’t like to speculate. Stockfish’s development in the last year has been incredible, in part due to the impressive work done on fishtest.
The SuperFinal had a lot of exciting games, some of which were drawn due to superbly staunch defense. I’d like to shed some light, from a human perspective, on a small collection of decisive games from this Season’s SuperFinal. So let’s jump into some positions, starting with Game 1:
In the very first game of the SuperFinal, in this extremely critical position, Komodo played 11. …Nd7 with a score of +.37 at depth 25. I see around 40 correspondence games in this position, but none with 11. …Nd7, which I can only conclude is a bad move. Why did Komodo play it? There are a few likely explanations: 1. Komodo only spent 2 minutes and 15 seconds here, which is a very brief think. I can only guess that Komodo’s obvious move algorithm suggested this was clearly the best move. If so, there’s a lot of room for improvement. 2. Komodo seems to have misevaluated the strongest move, 11. …Qxe5. What happened next in the game was easy to understand: Black didn’t take the central pawn and was left with a cramped position and no real plan for the next 15 moves.
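We can only guess at what Komodo’s obvious-move logic looks like, but many engines implement some variant of an “easy move” rule in their time management. Here is a hedged C++ sketch of the general idea; the struct, names, and thresholds are all invented for illustration and are not Komodo’s actual code:

```cpp
#include <cstdlib>

// Hypothetical bookkeeping for a root move across search iterations.
struct RootMove {
    int score;          // centipawns, from the last completed iteration
    int previousScore;  // from the iteration before that
};

// Cut thinking time short when one move dominates every alternative by a
// wide margin and its score has stayed stable between iterations.
bool isEasyMove(const RootMove& best, const RootMove& secondBest) {
    const int dominanceMargin = 150;  // ~1.5 pawns better than any rival
    const int stabilityBound  = 20;   // score barely moved since last iteration
    return best.score - secondBest.score > dominanceMargin
        && std::abs(best.score - best.previousScore) < stabilityBound;
}
```

The danger, as this game suggests, is a false positive: if the evaluation itself is wrong (here, misjudging 11. …Qxe5), the dominance test passes for the wrong move and the engine commits to it after barely two minutes of thought.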
By 26. Qe7, the position already looked lost (or very nearly lost) for Black. Judging from Game 2, where Stockfish played 11. …Qxe5 and drew pretty easily, it seems to me that Komodo had some trouble evaluating the bishop pair in this variation. More importantly, when the White h-pawn advances (h4-h5), Komodo gives too high a score for White, since the h-pawn can’t actually go anywhere or do anything. By taking the central pawn and aiming for an exchange of queens, Black could have defended successfully; for those interested in verifying that, take a look at Game 2. What happened in Game 1 was rather lifeless and unpleasant for Black. After the last diagram, White played some preparatory moves, rounded up the c3 pawn, and won the game.
The next error I’d like to point out occurred in Game 8 and shocked me.
Here Komodo has just played 19. Qxd2 with a score of .64 at depth 28. At such a high depth, one would expect the score to be right in a position of this nature, without a whole lot of tension or direct mating attacks. Nevertheless, the score here makes no sense. Black is clearly fully equal for a wide variety of reasons: 1. The White knight has no stable central squares to occupy. 2. The White bishop on e2 has nothing to attack and nothing to do. 3. The White rooks have no targets and nothing in particular to do. 4. The Black knight on b6 is coming to e5 immediately, after which Black is free to gain space on the queenside with …b5, …a5, and …Kb7. Black carried out this rather crude and simple human plan (a natural evolution of the position – a fundamentally difficult concept for computers to grasp) while Komodo’s score steadily dropped by more than 1, from .64 after 19. Qxd2 to -.38 after 31. Be2, with no tactics having taken place in that time. Structurally, Komodo didn’t initially understand its lack of knight outposts, its lack of useful targets for its bishop, the strength of Black’s dominant e5 knight, which is never going to leave, or the fact that Black’s doubled pawn on f4 is anything but a weakness (a toy sketch of how an engine might encode an outpost term follows after this game). Here is the mess that Komodo ended up in:
My bulldog is getting very hungry just looking at this position. There’s no need to see any more of this game.
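As an aside on the structural factors just listed: engine evaluations typically encode concepts like knight outposts as explicit bonus terms. Below is a toy C++ sketch of one such term; the bitboard helpers and the 25-centipawn value are my own illustrative assumptions, not Komodo’s or Stockfish’s actual evaluation:

```cpp
#include <bit>
#include <cstdint>

using Bitboard = std::uint64_t;  // one bit per square of the board

// Helpers assumed to be provided by the host engine.
Bitboard pawnAttacks(Bitboard pawns, bool white);     // squares attacked right now
Bitboard pawnAttackSpan(Bitboard pawns, bool white);  // squares the pawns could ever attack

// A knight sits on an outpost if one of our pawns defends it and no enemy
// pawn can ever drive it away (as with Black's e5 knight in Game 8).
int knightOutpostBonus(Bitboard ourKnights, Bitboard ourPawns,
                       Bitboard theirPawns, bool white) {
    Bitboard supported = ourKnights & pawnAttacks(ourPawns, white);
    Bitboard secure    = supported & ~pawnAttackSpan(theirPawns, !white);
    return 25 * std::popcount(secure);  // made-up bonus per secure outpost
}
```

A misweighted term like this, or a missing interaction between terms, is enough to let a score drift by a full pawn over a dozen quiet moves with no tactics at all.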
Here is a position that occurred in Game 9. In this position, Komodo played 9. …Nbd7 with an evaluation of .47 at depth 26. If we follow the game’s continuation, Black clearly didn’t even have a fighting chance the rest of the way. As Stockfish played in the game with reversed colors, and as it mentioned in its PV in this game, Black’s only chance was 9. …g5 10. Bg3 Nh5, which grabs the bishop pair and gives Black superb chances of surviving the opening with a playable position. For whatever reason, Komodo never once played such an idea in the entire match, while Stockfish did so on numerous occasions. The Nxg3 hxg3 structure seems to be misevaluated by Komodo, as you can verify in the game in this opening with reversed colors. Humans and computers may have very different assessments of positions in which one side has captured towards the center with a rook pawn. As a human, it’s usually hard to count that doubled pawn as a very favorable part of the evaluation of a position unless it is very clearly connected to another positional factor. In any case, here’s what actually happened in the game:
After relatively obvious moves from the previous diagram, in which Stockfish doubled rooks and created kingside pressure, it played the crushing intermediate move Rxf7! and won easily.
Game 18 was probably the most shocking of the match. I don’t want to dwell on it too much, but I’ll mention a key position:
In this position, Komodo played 12. Nge2? with a score of 0.00 at depth 25. I’d rather not make too many comments about this move because the position is almost pure tactics, but the only reasonable plan appears to be 12. h4 followed by h5 and hxg6, to at least get some counterplay. For whatever reason, Komodo avoided making a direct pawn thrust at the opponent’s king (here and in other games as well), and instead played a move which is most likely about half a pawn weaker. This looks like a major issue with king safety and also with time management (at least one or two more plies were needed here). White seems to have completely missed Black’s attacking setup with …Be6-c4 and …Rb7, freeing Black’s major pieces to create serious pressure on the queenside. Here’s what the attack actually led to:
After Black’s last excellent move (…Nf6-d7!), White can essentially resign.
I was really struck by what occurred in Game 27 as well, so I’ll highlight the first position that caught my eye:
Here we have a very tense and complicated position with opposite-side castling and an extra pawn for Black. Now Komodo played 14. …g6?! after a relatively short think. This move surprised me at first glance, since I could never imagine the best move here to be one that advances a pawn in front of your own king, weakening the dark squares while doing nothing to solve Black’s development problems. When I looked at this game, I wondered what happens if Black plays the typical move 14. …b5 (which is thematic in similar structures in the Semi-Slav and Caro-Kann). Both Stockfish and Komodo barked at the move, in view of 15. Ne5 bxc4 16. Qxc4 Qd5 17. Qxc6 Qxc6 18. Nxc6 Nd5 19. Nc3 Bb7, which has a score of around +.3 on both Stockfish (d35) and Komodo (d25). Nevertheless, it is likely that Komodo would have drawn this position with Black by playing a clueless IM’s intuitive blitz suggestion, partially because White’s 3 isolated pawns make the advantage hard to convert. Even so, 14. …a5, with the intention of either …a4 or …b5!?, would definitely have been stronger, when Stockfish merely suggests 15. Ng3; in this case, Black gives his extra pawn back and obtains an equal position. The move 14. …g6 was most likely an example of an inaccurate assessment of Black’s king safety (and quite possibly a wrong assessment of both king positions). In the game, after 15. h4, White whipped up a very strong and natural attack and quickly obtained a winning position. Here’s what we arrived at:
Black was able to somehow stay alive for some time by uncorking the shocking move …e5!! here, but eventually White converted its large advantage.
Game 37 also provokes a definite head-scratch:
In this position, Komodo’s last move was 43. …Rec8, clocking in at 0.00 at depth 29. This is a very mysterious score. Clearly there’s no perpetual check, and White has the plan of eventually moving his rook to b7 (Ra5-b5-b7), so to human eyes it seems likely White is on the verge of winning. Stockfish replied 44. Ra5 with a score of +.95, which looks about right to me. Looking through Komodo’s PV, it suggests an extremely strange line that doesn’t make any sense to me, starting with 44. Rb1? In any case, it is quite possible that Komodo simply lacked the search depth here to properly catch the shot that occurred in the game:
In this bizarre position where Black’s king may optically look somewhat safe, Stockfish played h6!!, completely wrecking Black’s king position and winning the game easily after …Kxh6 Qf7!, intending Rh1+.
And finally, one of the biggest score discrepancies between the engines occurred in Game 47, where Stockfish scored a very quick and easy win. Take a look at this position on move 19 for White, after Komodo’s last move of 18. …Nf4, which it evaluated at .13 at depth 26:
Komodo’s score here is very hard to understand. I would have imagined that White is close to winning here, without even consulting a computer. He has the bishop pair, a completely closed kingside which Black can never realistically attack, and an eventual break on the queenside with c5 or a sacrifice on b6 (as in the game). It seems that Komodo misevaluated the bishop pair and the relative safety of both kings here. Just watch how the Stockfish score jumps to +2.15 within 5 plies.
In this position, Stockfish played the thirst-quenching Bxb6!! and eventually disrobed the Black king for a quick victory.
I would have liked to make some relevant comments about Stockfish’s time management problems or king safety misevaluations, but unfortunately I wasn’t able to spot many, indicating that the Stockfish developers have done a marvellous job with the engine.
In a basic sense, many people seem to forget that you only lose a chess game by making mistakes (assuming you aren’t playing a position that’s losing by force). A famous quote comes to mind: “the presence of intelligence is the absence of stupidity.” In the same light, the presence of chess strength is principally the absence of bad moves. It is a bit misleading to suggest that Komodo played terribly in the SuperFinal. The important difference from the past is that Stockfish made many errors in previous events and misevaluated many positions (for example, positions with a space advantage and a pawn on d6 in the Grünfeld were often given huge scores by Stockfish that were much lower on other engines). Now many of those misevaluations and poor scores are simply no longer a problem for Stockfish. If SF continues to make the fewest mistakes, there’s little doubt that it will remain champion for the foreseeable future. I can only hope there’s a close clash at the top.
Feel free to join the discussion here, at our Facebook page Chessdom Friends, or in the TCEC chat!
Where was Fritz?
>Where was Fritz?
This tournament required all participants to support the UCI protocol (or WinBoard through Polyglot). Fritz has only ever run under its own proprietary interface. Fritz has not shown up to a computer event in a decade, not since the 2004 WCCC.
Too bad: Fritz 14 recently switched engine authors from Frans Morsch to Gyula Horvath, author of Pandix, a frequent amateur competitor at the WCCC.
Also missing was HIARCS, who chose not to participate because they hadn’t updated their engine since the last TCEC event.