Automatic Tuning & Learning for Slow Chess Blitz Classic

Phase 1 (version 1.9)

In the process of exploring modern ideas in chess engines, and AI in general, I thought it would be interesting to experiment with automatic evaluation tuning and learning. I expected that with some tuning to guide me I could find some elo gain over version 1.8, but I was skeptical that it would be much better than the hand-tuning I had spent quite a bit of time on. Slow Classic 1.8 was already quite strong compared to older programs like Slow Blitz WV2.1 and almost level with Rybka 4, a chess engine that was considered an amazing advancement back in its day.

I didn't want to restart from zero; instead I was first curious whether automatically tuning values would work at all. After looking at Tuning on the Chess Programming Wiki, I downloaded the zurichess-quiet.epd suite of W/D/L scored test positions. After my first pass at creating a Tuner I could instrument a single eval term with a wrapper, i.e. if (rookOpenFile) eval += EvalTuner.Tune( SB(16,16) ). The tuner would then calculate the total squared eval error for the test suite and nudge the evaluation term by 1. If that reduced the error it kept nudging in the same direction; if not, it tried the other direction.
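
As a rough illustration of the nudge loop described above, here is a minimal C++ sketch. It is not the actual Slow Chess tuner: the Sample struct, the function names, and the single integer value (instead of an SB pair) are all made up for the example, and the mapping from centipawns to an expected result is the usual logistic curve from the Chess Programming Wiki's Texel tuning page with the scaling constant simply assumed to be 1.

```cpp
#include <cmath>
#include <initializer_list>
#include <vector>

// Hypothetical scored test position, reduced to what one tuned term needs:
// how often the term applies (white minus black), the rest of the static
// eval in centipawns, and the game result for White (1, 0.5 or 0).
struct Sample {
    int featureCount;
    int restOfEval;
    double result;
};

// Assumed mapping from a centipawn score to an expected result in [0,1].
// The scaling constant k is a placeholder, not a fitted value.
double expectedResult(double cp, double k = 1.0) {
    return 1.0 / (1.0 + std::pow(10.0, -k * cp / 400.0));
}

// Total squared error over the suite for one candidate value of the term.
double totalSquaredError(const std::vector<Sample>& suite, int value) {
    double sum = 0.0;
    for (const Sample& s : suite) {
        const double eval = s.restOfEval + s.featureCount * value;
        const double diff = s.result - expectedResult(eval);
        sum += diff * diff;
    }
    return sum;
}

// Nudge the value by 1 while the error keeps dropping; if the first
// direction never helps, try the other direction.
int tuneValue(const std::vector<Sample>& suite, int value) {
    double best = totalSquaredError(suite, value);
    for (int step : {+1, -1}) {
        bool improved = false;
        for (;;) {
            const double e = totalSquaredError(suite, value + step);
            if (e >= best) break;   // no further gain in this direction
            value += step;
            best = e;
            improved = true;
        }
        if (improved) break;        // the other direction can only be worse
    }
    return value;
}
```

Calling tuneValue(suite, 16) would return the nearest value around 16 at which a further step of 1 in either direction no longer lowers the error.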

My first impression was that the values it spit out looked surprisingly reasonable. They were usually in the ballpark of my hand-tuned values and always made sense relative to other terms (e.g. open files are the best file type for rooks). After going through some of the values I was most curious about and replacing them with tuned values, I ran 1000 games overnight, with results suggesting a 20 elo improvement. So the values didn't just look good, they were actually good! I then did a tuning pass on any variable I thought was important and released 1.9, which was about 40-50 elo stronger.

Phase 2 (version 2.0)

Given the initial success of tuning, it was time to streamline the process with additional automation and speed. I added a thread pool class to split the positions among available threads for the error calculation. After this and some other optimizations, a single pass over all 750K positions calculating error took under 0.1 seconds on my 12-core Ryzen. Next I added a GUI window to perform tuning operations and display the results. Then I worked on the sometimes tedious process of splitting all the values out of the evaluation itself and into a single table of named values for the tuner. This way everything could be tuned at once, and I could run as many passes as I wanted, since changing one value affects what is best for other values. For a full tune, it walks the variable list once calculating the error reduction for each variable, then goes down the list always trying to adjust the variable with the greatest stored error reduction. From what I can remember, the approximate elo gain after a few passes of tuning everything at once was 25 elo. (The majority of values are tunable, but there are some I still haven't bothered to split out. See Slow Chess Eval.)
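
To give an idea of how the error calculation can be split among threads, here is a small C++ sketch. It is a simplification and not the real code: it starts plain std::thread workers on every call instead of using a persistent thread pool class, and parallelTotalError and the perPositionError callback are names invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Sum perPositionError(i) for i in [0, count). The index range is cut into
// contiguous chunks, one per worker, and each worker writes only to its own
// slot of `partial`, so no locking is needed.
double parallelTotalError(std::size_t count, unsigned threads,
                          const std::function<double(std::size_t)>& perPositionError) {
    threads = std::max(1u, threads);
    std::vector<double> partial(threads, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (count + threads - 1) / threads; // round up

    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = std::min(count, t * chunk);
        const std::size_t end = std::min(count, begin + chunk);
        workers.emplace_back([&, t, begin, end] {
            double sum = 0.0;
            for (std::size_t i = begin; i < end; ++i) sum += perPositionError(i);
            partial[t] = sum;
        });
    }
    for (std::thread& w : workers) w.join();

    // Combine the per-thread sums after all workers have finished.
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

The callback would evaluate position i with the current table of values and return its squared error. Because the tuner recomputes the total after every nudge, reusing threads from a pool avoids paying the thread start-up cost on each pass.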

The zurichess suite isn't that big and was the first one I had tried, so I looked for other, bigger suites and found lichess.epd. The positions weren't quiet, so I added a threaded import step to the tuner that would call the qsearch on each position and write the quiet positions to an output epd. The gain from using lichess (+zurichess) in this way was 30 elo. Although I can't measure exact amounts, I did some retunings for tests and lichess or lichess + zurichess always scored clearly better than only zurichess.

The next step was to start generating positions on my own. I figured this would improve the results in several ways: more positions, more recent and stronger programs, and importantly, using SlowChess could give more targeted results on which positions it actually wins or loses in games. For instance, it might help iron out some poorly tuned terms that it can achieve on the board but that don't lead to wins.

For the training games I decided to include gauntlets of various strong programs (always at least 1 side stronger than my own program), and Slow Chess against various programs and itself. I had CuteChess write out match pgns, then I added an import step that could parse pgns and output quiet test positions scored based on the game results. I didn't want to have positions scored as a win only because the winning side was stronger, not because the position was good, so I read the evaluation and only included the positions if the winning side eval was > 1 and draws only if at least one side had an eval < 1.5. I started taking 5 positions per game spaced out by a minimum of 10 half-moves, though as the set grew I switched to 4 positions per game. This process gained about another 25 elo after a week of generating games, then re-tuning, generating games, tuning.
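
The selection rules above can be sketched in C++ roughly as follows. This is only an illustration under several assumptions: the Ply and Result types and the function names are invented, each position carries a single eval in pawns from White's point of view, the draw rule is simplified to an absolute eval below 1.5, and the earliest eligible positions are taken (how positions were actually picked within a game is not specified). The quiet-position check is left out.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

enum class Result { WhiteWin, BlackWin, Draw };

// Hypothetical per-position record parsed from a PGN.
struct Ply {
    double evalWhite; // engine eval in pawns, from White's point of view (assumed)
};

// Wins count only if the winner's eval was above 1; draws only if the
// eval stayed below 1.5 in absolute value (simplified draw rule).
bool eligible(const Ply& p, Result r) {
    switch (r) {
        case Result::WhiteWin: return p.evalWhite > 1.0;
        case Result::BlackWin: return -p.evalWhite > 1.0;
        case Result::Draw:     return std::fabs(p.evalWhite) < 1.5;
    }
    return false;
}

// Return the half-move indices to export: at most maxPerGame positions,
// each at least minGap half-moves after the previously picked one.
std::vector<std::size_t> pickPositions(const std::vector<Ply>& game, Result r,
                                       std::size_t maxPerGame = 5,
                                       std::size_t minGap = 10) {
    std::vector<std::size_t> picked;
    for (std::size_t i = 0; i < game.size() && picked.size() < maxPerGame; ++i) {
        if (!eligible(game[i], r)) continue;
        if (!picked.empty() && i - picked.back() < minGap) continue;
        picked.push_back(i);
    }
    return picked;
}
```

Switching from 5 to 4 positions per game is then just a matter of passing a different maxPerGame.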

Phase 2+ (version 2.0 to 2.2)

One important thing to mention is that my tuning process tracks the error reduction. This is helpful for testing out new terms in the evaluation. As an example, I might add a term "BISHOP_ENEMY_ROOK_ALIGNED" to the evaluation with 0 values, run the tuner on the group of Bishop Terms, and see how much the eval error has dropped. If it barely changes I would probably throw out the term; if I see a larger reduction I would keep the term.

Also for evaluation logic, I could just make the change in code, click Show Error, and see if it was lower, e.g. do I want to ignore attacker blocked pawns in king safety attack square coverage? The tuner says yes. Sometimes terms would start to become close to alternative terms and/or lose significance, so I might remove them. (ROOK_FILE_MINOR_OUTPOST was dropped because it was no longer significantly different from ROOK_FILE_OPEN. This probably happens from a combination of other eval/threat terms changing and additional training data being added.)

Currently I have generated over 1 million scored positions. I don't have time or computing resources to restart and pursue the from zero approach, but I suspect that given enough resources it would lead to similar results. I think it would be best to start all tuneable positional values from zero but leave initial simple material values to prevent some search/eval interaction issues starting out. From zero might even be better at preventing over-fitting by generating more counter-examples for errors/weaknesses.

The time investment to improve elo through evaluation has increased considerably, although the process still continues to work so far. So it is clear that this automatic tuning/learning is able to create a very strong chess program, and I saw improvement much more easily and quickly than in my attempts at pure hand-tuning. Automatically tuned values resulted in an almost 200 elo gain from 1.8 to 2.2 (Notes: this is a self-play estimate and would be less on rating lists. Also, the actual self-play elo gain from 1.8 to 2.2 is over 300 elo because of search improvements.)

Notes / Caveats

  1. Overfitting is definitely a real issue. For instance the eval started liking knights on edges. My guess is that the knights would often avoid the edge unless it was truly helpful, like in a king attack, so the data lacked negative examples. Some of it was ironed out by additional training games: "KNIGHT_OUTPOST_MOVE" started almost as high as "KNIGHT_OUTPOST", but came way down to more reasonable looking values. The tuning for a knight on an edge square on ranks 4, 5, 6 also came down a bit, but to a lesser extent.

  2. Fighting the tuner by changing values isn't necessarily bad, because the tuned values aren't always correct, especially for terms that are a bit less general or common. However, usually I'd eventually give up or forget to adjust them because it was quicker to retune and paste. Better training data would be a more convenient way to improve than manually adjusting every time. Also, sometimes there is a reason for a weird value, like how it fits with other eval terms, so it's hard to know which ones are bad.

  3. Local minima are also a real issue to some extent. Sometimes manually changing a value or values to something that looked better *and* re-tuning everything would actually lead to less error. I didn't do anything to automatically avoid local minima. But in general adjusting terms and re-tuning is worth trying.

  4. Sometimes the static eval is way wrong, like +6 for a draw (esp. in the endgame or maybe king safety). Search irons out enough of it to display a realistic score, but not always. Evals showing "this position is statistically likely to be winning" when there's nothing concrete can look a bit silly, but speculative evals result in stronger and more active/interesting play than older, more materialistic programs.

  5. Considering the above notes, even after all the Elo gain I didn't become confident the eval and terms were approaching any optimal truth for best play, only that the method was enough to make Slow way stronger than what I had been doing before.

SlowChess 2.3 Evaluation

So what was the result of all this tuning? I've pasted the eval table values below. The exact implementation of these terms is very important too, but that's not instantly copy/pasteable (not without releasing the full source code, and even then it's still not as easily understandable). I am planning on going back and commenting some of these terms to make their details more obvious.
	Group("MaterialV", &MaterialV);
	V("BISHOP_PAIR", SB(27, 51));
	V("MORE_PIECE_BONUS", SB(46, 114));
	V("TWO_MINORS_VS_ROOK", SB(57, 101));
	V("ROOK_V_KNIGHT_END", SB(-17, 56));
	V("ROOK_V_2_KNIGHT_END", SB(0, 31));
	V("ROOK_V_BISHOP_END", SB(-13, 33));

	Group("KnightV", &KnightV);
	V("KNIGHT_BASE_OFFSET", SB(-11, 30));
	V("KNIGHT_MOB_MIN", SB(-23, -20));
	V("KNIGHT_MOB_MAX", SB(6, 27));
	V("KNIGHT_CENTER_MOVE_BONUS", SB(8, 18)); // Knight has centralizing safe move
	V("KNIGHT_AWOL", SB(-5, -7)); // Knight far away from own king
	V("KNIGHT_OUTPOST", { SB(30,17), SB(16,8) }); // Indexed by {ranks 5-7, rank 4}. Outposts on 4th rank get lower values
	V("KNIGHT_OUTPOST_MOVE", { SB(17,13), SB(14,11) }); // Knight can move to an outpost
	V("KNIGHT_OUTPOST_AWOL", { SB(-12,0), SB(-6,-4) }); // Outpost far away from opp king

	Group("BishopV", &BishopV);
	V("BISHOP_MOB_MIN", SB(-25, -28));
	V("BISHOP_MOB_MAX", SB(9, 23));
	V("BISHOP_STUCK_BLOCKED_PAWN", SB(-43, -51)); // Stuck on back rank, any forward directions blocked by blocked pawns 
	V("BISHOP_FIANCHETTO", SB(11, 8)); // g2/b2 square on king side
	V("BISHOP_OUTPOST", SB(29, 20));
	V("BISHOP_ONLY_REACHES_ONE_SIDE", SB(-6, -11)); // Can only move on one side of the board: either our side or opp side
	V("BISHOP_NO_PAWN_TARGETS", SB(0, -23)); // In endgame, all opponents pawns are on other color
	V("BISHOP_MAJOR_ALIGNED", SB(8, 17)); // Aligned with opponent R/Q/K on diagonal without own blocked or opp supported pawn.
	V("BISHOP_CENTER_CONTROL", { SB(9,9), SB(18,16) }); // Controls { 1 center squares, 2 center squares }
	V("BISHOP_TRAPPED_OVER_5", SB(-24, -21)); // Trapped by opp pawns on a2/h2, or a3/h3 at 2/5ths value

	Group("RookV", &RookV);
	V("ROOK_BASE_OFFSET", SB(-7, 18));
	V("ROOK_OPEN_FILE_COUNT", SB(10, 4)); // Adjust base by number of open files
	V("ROOK_MOB_MIN", SB(-22, -35));
	V("ROOK_MOB_MAX", SB(15, 48));
	V("ROOK_CAN_MOVE_TO_OPEN_FILE", { SB(5,10), SB(9,9) }); // Move to open file from { non-blocked, blocked file }. (Open file bitboard stored is pawnhash)
	V("ROOK_TRAPPED_BY_KING", SB(-30, -16)); // King is on rook side of board trapping rook
	V("ROOK_TRAPPED_BY_KING_PARTIAL", SB(-15, -3)); // King is on side of board trapping rook, but can castle or has moved to 2nd rank
	V("ROOK_FILE_OPEN", SB(16, 24));
	V("ROOK_FILE_HALF_OPEN_DEFENDED_PAWN", SB(-2, 2)); // Opp pawn on file is defended by a pawn
	V("ROOK_FILE_MOBILE_PAWN", SB(-2, 3)); // Our pawn on file can be pushed forward
	V("ROOK_FILE_BLOCKED_PAWN_BY_PIECE", SB(-8, -3)); // Our pawn is blocked by a piece
	V("ROOK_FILE_BLOCKED_PAWN", SB(-11, -12)); // Our pawn is blocked by a pawn
	V("ROOKS_TWO_7_K8", SB(36, 87)); // 2 rooks on 7th rank threatening king on 8th
	V("ROOK_QUEEN_GUN", SB(16, 11)); // Queen is behind rook on an open/semi-open file

	Group("QueenV", &QueenV);
	V("QUEEN_BASE_OFFSET", SB(-33, 62));
	V("QUEEN_PAWN_SPREAD", SB(-1, 12));  
	V("QUEEN_MOB_MIN", SB(-41, -48));
	V("QUEEN_MOB_MAX", SB(15, 70));
	V("QUEEN_OPP_ROOK_ON_FILE", SB(-10, -7)); // Opp rook is on queen file
	V("QUEEN_OPP_SIDE", SB(0, 13)); // Queen has safe moves to opponent side of board
	V("QUEEN_NO_RETREAT", SB(-17, -6)); // Queen has no safe moves to own side of board
	V("QUEEN_CLOSED_FILE", SB(-5, -11)); // Queen is behind own pawn on closed file

	Group("TacticalV", &TacticalV);
	V("MINOR_ON_MINOR", SB(22, 17)); // Bishop can take knight or knight can take bishop
	V("MINOR_ON_OUTPOST", SB(10, 9)); // Extra bonus for minor threatening an outposted minor
	V("ROOK_ON_MINOR", SB(13, 18)); // Rook attacking a minor (that's not defended by pawn)
	V("WEAKLY_DEFENDED_PAWN", SB(2, 10)); // If a defender moves, the pawn can be taken
	V("WEAKLY_DEFENDED_PIECE", SB(13, 20)); // If a defender moves, the piece can be taken
	V("KING_ON_PAWN", SB(4, 18)); // King attacking a pawn
	V("QUEEN_THREATENED_BY_KNIGHT_MOVE", SB(19, 6)); // Queen not a fan of Bob Seger
	V("QUEEN_THREATENED_BY_BISHOP_ROOK_MOVE", SB(23, 13)); // A rook or bishop can move to attack the queen
	V("QUEEN_BEHIND_PIN", SB(37, 9)); // If one piece moves, the queen can be taken by bishop or rook
	V("PINNED_PIECE_THREATENED", SB(100, 163)); // Pinned piece can (probably) be safely taken
	V("PINNED_PAWN_PUSH_THREAT", SB(25, 50)); // Pinned piece threatened by a pawn push
	V("PINNED_PIECE_MOBILITY", { SB(1,21), SB(1,-14), SB(11,2), SB(0,0) }); // By piece type. Pinned pieces don't get 0 mobility, it's not necessarily end of world to be blocking a rook with a knight etc.
	V("OPP_COVERED_SQS", SB(-3, -3));

	Group("KingV", &KingV);
	V("KS_WEAK_SQ_COVER", 6); // Attacks to weak squares touching king
	V("KS_EXTENDED_WEAK_SQ_COVER", 4); // Attacks to weak squares near king but not touching
	V("KS_EXT_DOUBLE_COVER", 5); // Double attacks to squares vaguely near king
	V("KS_BASE_COVER_BY_PIECE", { 2, 2, 1, 2 });  // indexed by piece type
	V("KS_SQ_COVER_BY_PIECE", { 7, 9, 7, 10 });    // indexed by piece type
	V("KS_SAFE_CHECK_SCORE", { 10, 15, 24, 31, 29 }); // indexed by number of checks
	V("KS_SAFE_BISHOP_CHECK_ADJUST", -3); // but less for bishops
	V("KS_TOUCH_CHECK_ADJUST", 5); // and more for touch checks
	V("KS_UNSAFE_CHECK", 2); // check square is currently defended by a piece
	V("KS_DISCOVERED_CHECK", 22); // discovered checks are nice
	V("KS_PROMO_CHECK", 30); // pawn promotion to check
	V("KS_PAWN_CHECK", 3); // just any pawn check
	V("KS_COUNT_ZONE_BY_PIECE", { 6, 4, 4, 5 }); // indexed by piece type
	V("KS_COUNT_STM_BONUS", 6); // Bonus piece count for STM, they should be able to bring more pieces in
	V("KS_DEF_KING_ZONE_OCCUPIED", 14); // Have own pieces near king
	V("KS_DEF_KING_ZONE_COVERED", 7); // own pieces cover squares near king
	V("KS_ATTACK_SUB", 280, 2);
	V("KS_COVER_SUB", 40);
	V("KS_OPEN_FILE", -10); // king is on open file
	V("KS_OPEN_DIAGONAL", { -4, -7, -10 }); // no own or opp blocked pawns on king diagonal, Indexed by board openness.
	V("KS_OPEN_HORIZONTAL", { -6, -13, -18 }); // no own or opp blocked pawns on king horizontal, Indexed by board openness.
	V("KS_COVER_PAWN_ENPRISE", -30); // one or more of our cover pawns can be taken
	V("KS_TRAPPED_BACKRANK_BY_PAWN", { -17, -27, -35 }); // indexed by board openness
	V("KS_WEAK_BACK_RANK", { -1, -12, -25 });
	V("KS_MOBILITY_BIAS", { 33, 40, 58, 89 }); // mobility by piece is included in king safety score, tuned separately from regular mobility
	V("KS_MOBILITY_MULT", { 26, 20, 26, 18 });
	V("KS_MOBILITY_WEIGHT", 7); // overall weight of piece mobility
	V("KS_DEFENSE_WEIGHT", 19); // overall weight of defending pieces
	V("KS_PIECE_COUNT_WEIGHT", 8); // overall weight of attacking piece "count"
	V("KS_ROOK_TO_OPEN_ATTACK_FILE", 3); // rook can move to open file next to king

	Group("EndGameV", &EndGameV);
	V("KING_STUCK_ON_EDGE", -6); // King can't move off edge of board
	V("OUTSIDE_PASSED_PAWN_ONE_KNIGHT", 57); // one outside passed pawn versus only a knight can often lead to win
	V("OUTSIDE_PASSED_PAWN_KPK", { -23, 0, 22, 44 }); // also can lead to win in king-pawn-king
	V("KING_OUTSIDE_PAWNS_ENDGAME", { -15, -24, -37, -61, -88, -94 }); // king outside all pawns (sometimes cut off)
	V("KING_MOBILITY", { -30, -4, 6, 7, 9, 7, 7, 7, -2 }); // Why is it bad to have 8 king moves? Not near anything?
	V("KING_OUTSIDE_PAWNS_KPK", { 1, 4, 3, -57, -114, -180 }); // If the opponent king is outside of all pawns by a far distance in KPK, it might lead to loss
	V("PAWN_RACE_WIN", { 610, 410 }); // Tuner made these even bigger so who knows. Indexed by {Race win, Race tie with Promo Check}
	V("CONNECTED_PAWN_UNSTOPPABLE", { 321, 240 } ); // only against rook and knight

	// End game scale percentages (have been in since old Blitz WV, but with better tuning and a few more formulas/factors seems to really help.)
	// Max is 100, so less than that scales down evaluation.
	Group("ScaleV", &ScaleV);
	V("SCALE_1P_BASE", 86);
	V("SCALE_1P_2P", 9);
	V("OCB_BASE", 52);
	V("OCB_PAWNS", 12);
	V("OCB_ROOK", -6);
	V("OCB_QUEEN", -27);
	V("SCALE_EDGE_BASE", 73, 2);
	V("SCALE_EDGE_ONLY", -11, 2);
	V("SCALE_EDGE_PIECE_TYPE", { 26, -1, -22, -7 });

	Group("CoordV", &CoordV);
	V("MINOR_BEHIND_PAWN", SB(6, 3)); // Minor piece behind our own pawn
	V("PAWN_BLOCKED_BY_PIECE", SB(-5, -11)); // Outside or king cover pawn blocked by piece (this value sometimes becomes more different)
	V("PAWN_BLOCKED_BY_PIECE_CENTER", SB(-7, -11)); // Central pawn blocked by a piece
	V("OUR_SIDE_SAFE_MOVE", SB(17, 0)); // Safe move square count on our side and behind our pawn, computed in the pawn hash

	Group("PawnV", &PawnV);
	V("PAWN_BASE_OFFSET", SB(-13, 10));
	V("PAWN_SUPPORTED", { SB(1,3), SB(14,10) });
	V("PAWN_BACKWARD", { SB(2,-3), SB(-1,-5), SB(-3,-6), SB(-4,-6) }); // Indexed by centralness of file
	V("PAWN_ISOLATED", { SB(-4,-8), SB(-1,-10), SB(-4,-9), SB(-5,-7) });
	V("PAWN_UNCONNECTED_OPEN_FILE", SB(-10, 1)); // These bonuses are in addition to weak/backward/isolated/doubled bonuses. 
	V("PAWN_UNCONNECTED_CLOSED_FILE", SB(-9, -1)); // (Open and Closed values used to be more different)
	V("PAWN_WEAK_OPEN_FILE", SB(-6, -11));
	V("PAWN_DOUBLED", SB(-4, -14));
	V("PAWN_CONNECTED_RANK", { SB(-5,-1), SB(2,3), SB(2,1), SB(4,3), SB(17,22), SB(60,45) }); // Good to have connected pawns, better if they are nearer promo square
	V("PAWN_CONNECTED_FILE", { SB(-1,-2), SB(-1,2), SB(2,5), SB(0,5) });
	V("PAWN_CONNECTED_OPEN_RANK", { SB(3,7), SB(12,19), SB(41,25), SB(6,7) }); // Also better if a connected pawn is on an open file

	Group("PassedV", &PassedV);
	V("PP_PASSED_RANK", { SB(-3,8), SB(2,10), SB(0,19), SB(18,37), SB(46,75), SB(96,141) }); // Passed pawns rank
	V("PP_PASSED_FILE", { SB(1,1), SB(-1,1), SB(-10,-7), SB(-11,-11) }); // Passed pawns file
	V("PP_KING_DIST", { SB(47,151), SB(79,119), SB(61,79), SB(32,41), SB(10,13), SB(5,0), SB(-5,-7), SB(-9,-6) });  // Our king distance to a passed pawn
	V("PP_O_KING_DIST", { SB(-73,-87), SB(-51,-70), SB(-9,-55), SB(-1,-4), SB(8,33), SB(19,48), SB(29,59), SB(44,56) }); // Opponent king distance to a passed pawn
	V("PP_PROMO_DIST", { SB(145,112), SB(99,92), SB(47,54), SB(10,27), SB(2,8), SB(5,2) }); // Passed pawn distance to promo (multiplied with king_dist + o_king_dist and rescaled.)
	V("PP_COVERED_ADVANCE", { SB(21,75), SB(30,42), SB(12,22), SB(10,10) }); // indexed by promo distance - we cover advance square for passed pawn 
	V("PP_COVERED_ADVANCE_PATH", { SB(60,104), SB(21,62), SB(15,36), SB(14,15) }); 
	V("PP_FREE_PUSH", { SB(186,153), SB(14,51), SB(2,19), SB(3,3) }); // indexed by promo distance - opponent does not cover advance square for passed pawn
	V("PP_FREE_ADVANCE_PATH", { SB(122,298), SB(29,142), SB(0,50), SB(-4,24) });
	V("PP_CONNECTED", { SB(42,31), SB(-7,15), SB(1,5), SB(3,2) }); // extra bonus for connected passed pawns
	V("PP_HANGING", { SB(-21,-48), SB(-26,-12) }); // pawn can't safely advance and can be taken. Indexed by rank {7th,6th}
	V("PP_V_ROOK_MULT", SB(100, 121), 3); // Passed pawns worth a bit more against opp rook (maybe will have to trade the rook)
	V("PP_CANDIDATE_NEAR_MULT", SB(110, 70), 5); // Candidate pawns are not passed yet but with some pawn pushes can be
	V("PP_CANDIDATE_FAR_MULT", SB(-15, 38), 3); // "Far" candidates require more pushes to become passed