CCRL 404FRC
Downloads and Statistics
March 30, 2008
Testing summary:
Total: 26'100 games
played by 35 programs
206 CPU days (X2 4600+)

White wins: 10'705 (41.0%)
Black wins: 9'696 (37.1%)
Draws: 5'699 (21.8%)
White score: 51.9%
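
The white score follows from the three result counts, with a win worth one point and a draw half a point. A minimal Python check of the summary figures (the variable and function names are illustrative, not part of the site):

```python
# Result counts taken from the testing summary above.
white_wins, black_wins, draws = 10_705, 9_696, 5_699
total = white_wins + black_wins + draws  # 26'100 games


def pct(part, whole):
    """Percentage rounded to one decimal, as shown in the summary."""
    return round(100 * part / whole, 1)


# A win counts as 1 point and a draw as 1/2 point for White.
white_score = pct(white_wins + draws / 2, total)

print(total, pct(white_wins, total), pct(black_wins, total), pct(draws, total), white_score)
# 26100 41.0 37.1 21.8 51.9
```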

Pure list

"Pure" list removes rating distortion

The "Pure" list is computed to remove a distortion that may affect the main rating list. The distortion appears when several versions or settings of the same engine are included together in the testing study. Suppose you have engine A and several versions of engine B: B1, B2, B3. Suppose also that A is particularly strong against every version of B, which often happens in real testing because of particular characteristics of those engines. In that case A will get a higher rating than in a study where only one version of B is present. The same thing may happen in reverse: when A is weak against B, it gets a lower rating.

To remove that distortion, a separate game database is constructed from games played only by the best version in each engine "family". To save space and time, the pure database has all moves stripped out; it contains only the PGN headers and results. The "Pure list" is then computed from that "pure" database using Bayeselo.

Pure lists for all classes of engines

All engines   (32-bit)

1-2-CPU engines   (32-bit)

Single-CPU engines   (32-bit)

Free engines   (32-bit)

Free 1-2-CPU engines   (32-bit)

Free single-CPU engines   (32-bit)

Open source engines   (32-bit)

Open source 1-2-CPU engines   (32-bit)

Open source single-CPU engines   (32-bit)

Pure lists for complete database

Pure database download

To save space, the pure database has all moves stripped out; it contains only the PGN headers and results. It is therefore useful only for rating calculation or similar analysis: it does not contain the actual games, only their results.
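
As an illustration of what "moves stripped out" means, here is a minimal Python sketch that keeps the PGN tag lines of each game and replaces the movetext with the result alone. It is not the tool used to build the download; the function name `strip_moves` and the sample games are made up for this example, and it assumes every game has at least one movetext line.

```python
import re

# A PGN tag pair looks like: [Result "1-0"]
TAG = re.compile(r'^\[(\w+)\s+"(.*)"\]$')


def strip_moves(pgn_text):
    """Return PGN text with headers kept and movetext reduced to the result."""
    games, current, in_movetext = [], [], False
    for raw in pgn_text.splitlines():
        line = raw.strip()
        if TAG.match(line):
            if in_movetext:  # first tag after movetext starts a new game
                games.append(current)
                current, in_movetext = [], False
            current.append(line)
        elif line:
            in_movetext = True  # movetext line: dropped
    if current:
        games.append(current)

    out = []
    for tags in games:
        fields = dict(TAG.match(t).groups() for t in tags)
        out.append("\n".join(tags) + "\n\n" + fields.get("Result", "*") + "\n")
    return "\n".join(out)


sample = '''[White "A"]
[Black "B"]
[Result "1-0"]

1. e4 e5 2. Nf3 1-0

[White "B"]
[Black "A"]
[Result "1/2-1/2"]

1. d4 d5 1/2-1/2
'''
print(strip_moves(sample))
```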

Download pure database, 7'300 games:

CCRL 404FRC Rating List — Pure all engines

Shredder UCI GUI, Ponder off, 3-4-5 piece EGTB, 128MB hash, random openings with switched sides
Time control: Equivalent to 40 moves in 4 minutes on Athlon 64 X2 4600+ (2.4 GHz)
Computed on March 30, 2008 with Bayeselo based on 7'300 games
Rank  Name                    ELO           +     −     Score   Average Opponent   Draws   Games   LOS
1     Shredder 11             2972 (+4)     +21   −21   67.4%   −135.6             20.9%   800     99.7%
2     Naum 3 64-bit           2933 (+3)     +20   −19   64.7%   −112.8             23.9%   900     89.7%
3     Hiarcs 12               2915 (−2)     +19   −19   62.5%   −92.8              27.7%   900     100.0%
4     Loop 10.32f             2869 (+8)     +18   −17   60.5%   −85.1              26.2%   1100    61.7%
5     Fruit 051103            2865 (+1)     +18   −18   60.2%   −80.7              25.6%   1100    99.9%
6     Spike 1.2 Turin         2824 (+4)     +18   −18   54.3%   −36.0              23.4%   1100    86.6%
7     Deep Sjeng 2.7 1CPU     2810 (+3)     +17   −17   55.1%   −45.5              20.0%   1200    76.2%
8     Glaurung 2.0.1 64-bit   2801 (+1)     +17   −17   53.8%   −35.8              23.0%   1200    100.0%
9     Movei 00.8.438          2674 (+2)     +18   −18   37.2%   +101.8             19.6%   1200    95.8%
10    Pharaon 3.5.1           2652 (+3)     +18   −18   40.2%   +74.3              19.9%   1200    100.0%
11    Hamsters 0.6            2591 (−1)     +19   −19   41.2%   +67.4              19.1%   1100    64.1%
12    Ufim 8.02               2586 (−3)     +19   −19   41.0%   +72.8              15.7%   1100    100.0%
13    Hermann 2.0             2492 (−4)     +22   −22   37.3%   +105.9             15.5%   800     100.0%
14    Aice 0.99.2             2355 (−5)     +28   −29   29.2%   +172.0             12.0%   500     97.6%
15    Ayito 0.2.994           2314 (−28)    +32   −33   26.9%   +192.0             11.8%   400
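
As a cross-check, the Score, Draws and Average Opponent figures in the list can be reproduced from the crosstable further down this page. The sketch below does this for Shredder 11 in Python; the function `summarize` and the data layout are illustrative only. It assumes that Average Opponent is the game-weighted mean opponent rating minus the engine's own rating, which matches the sign convention described in the column explanations.

```python
# Shredder 11's crosstable results: (opponent ELO, won, lost, drawn)
shredder_elo = 2972
pairings = [
    (2933, 45, 30, 25),  # Naum 3 64-bit
    (2915, 37, 28, 35),  # Hiarcs 12
    (2869, 58, 24, 18),  # Loop 10.32f
    (2865, 45, 30, 25),  # Fruit 051103
    (2824, 62, 15, 23),  # Spike 1.2 Turin
    (2810, 60, 23, 17),  # Deep Sjeng 2.7 1CPU
    (2801, 66, 18, 16),  # Glaurung 2.0.1 64-bit
    (2674, 83, 9, 8),    # Movei 00.8.438
]


def summarize(own_elo, pairings):
    """Return games, score %, draw % and average-opponent offset for one engine."""
    games = sum(won + lost + drawn for _, won, lost, drawn in pairings)
    points = sum(won + drawn / 2 for _, won, _, drawn in pairings)
    draws = sum(drawn for _, _, _, drawn in pairings)
    # Opponent ratings are weighted by the number of games against each opponent.
    opp_avg = sum(elo * (won + lost + drawn) for elo, won, lost, drawn in pairings) / games
    return {
        "games": games,
        "score": round(100 * points / games, 1),
        "draws": round(100 * draws / games, 1),
        "average_opponent": round(opp_avg - own_elo, 1),
    }


print(summarize(shredder_elo, pairings))
# {'games': 800, 'score': 67.4, 'draws': 20.9, 'average_opponent': -135.6}
```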
 

Explanation of the columns

"Rank": 1 is best, 2 is second best, etc.
"Name": Name and version of an engine.
"ELO": Engine rating computed with Bayeselo. This column also has a number in brackets, which shows the difference between the "pure" rating and the rating computed for the complete database. For example, "2850 (+10)" in the ELO column means that the engine's "pure" rating is 2850, which is 10 points higher than its rating in the complete list.
"+" and "−": Bounds of the 95% confidence interval. For example, if an engine's rating is 2850, "+" is +20 and "−" is −15, there is only a 5% estimated probability that the engine's "true" rating is outside the [2850−15 .. 2850+20] range.
"Score": Number of points scored by an engine, divided by the number of games. A win is 1 point, a draw is 1/2 of a point, and a loss is 0. Please note that this is computed for the "pure" database, so the numbers are different from the main list.
"Average Opponent": Average rating of the opponents over all games played by the engine, minus the engine's own rating. (Only games from the "pure" database are counted.) A positive number means that the engine played stronger opponents on average; a negative number means weaker opponents.
"Draws": Percentage of an engine's games that ended in a draw. (Only games in the "pure" database are counted.)
"Games": Total number of games played by an engine. (Only games in the "pure" database are counted.)

A detailed explanation of how we construct the "pure" list:

1. We have to find the best version in each engine family. We can't use the "Best versions" list for that, because it may be affected by the very distortion we are trying to remove. To find the true best version in a family of engines, we create a separate game database containing only games played by engines from that family. Then we compute the ratings for that small database and take the highest-rated engine as the best, to represent that family in the "pure" list. There is also a requirement that every engine in the "pure" list must have played at least 150 games against other "pure" engines, and it must be a public release, not a beta or private version.

2. After finding the set of "pure" best versions, we extract all games in which both engines are from that set; those games form the "pure" database. The pure list is simply a rating list computed for that database using Bayeselo.
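
The two steps above can be sketched in Python as follows. This is only an outline under stated assumptions: the per-family ratings are taken as given (on this site they come from Bayeselo runs, which are not reproduced here), the engine names and games are hypothetical, and the public-release requirement is not modelled. The text does not say what happens to an engine that falls short of 150 games, so the sketch only reports such engines.

```python
from collections import Counter

MIN_PURE_GAMES = 150  # minimum games against other "pure" engines (step 1)


def best_versions(family_ratings):
    """Pick the highest-rated version from each family.

    family_ratings maps a family name to {version: rating}; the ratings are
    assumed to come from a separate rating run on that family's games.
    """
    return {max(versions, key=versions.get) for versions in family_ratings.values()}


def pure_database(games, pure_set):
    """Keep only games in which both engines belong to the pure set (step 2)."""
    return [g for g in games if g[0] in pure_set and g[1] in pure_set]


def games_per_engine(games):
    """Count how many games each engine played, as White or Black."""
    counts = Counter()
    for white, black, _result in games:
        counts[white] += 1
        counts[black] += 1
    return counts


# Hypothetical families, ratings and games: (white, black, result).
family_ratings = {
    "A": {"A 1.0": 2700},
    "B": {"B 1": 2600, "B 2": 2650, "B 3": 2640},
}
games = [
    ("A 1.0", "B 1", "1-0"),
    ("A 1.0", "B 2", "1/2-1/2"),
    ("B 2", "A 1.0", "0-1"),
    ("B 3", "B 2", "1-0"),
]

pure_set = best_versions(family_ratings)
pure = pure_database(games, pure_set)
counts = games_per_engine(pure)
too_few = {e for e in pure_set if counts[e] < MIN_PURE_GAMES}
print(sorted(pure_set), len(pure), sorted(too_few))
```

With this toy data only the two games between "A 1.0" and "B 2" survive, and both engines are flagged as having too few games.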

Features of the pure list

The first thing to realize about the "pure" list is that it is not necessarily more relevant than the big list of all versions. The "pure" list removes one kind of distortion - the distortion that may arise from multiple versions of the same engine. But the price for that is high: the "pure" database is several times smaller than the complete database (7'300 games versus 26'100 here). This results in a much larger statistical error, as you can see in the + / − columns. Also, the "pure" list can still have other types of distortion, such as distortion resulting from too few (including 0) or too many games in particular pairs.

So, don't take this list as certainly superior to the "Best versions" list. It does not replace the "Best versions" list, but simply provides a different view for those who are concerned about distortions. It is possible, though, that in time this list will become clearly superior, once the "pure" database is large enough.

Please also note that an engine version being listed in the "Best versions" list does not guarantee that the same version will be listed in the "Pure" list. Most often that will be the case, but in theory a different version may turn out to be the best in the "pure" context.


Crosstable for "pure" database

Results matrix

Pure all engines
Each line gives the result against one opponent, identified by its # in this table. Pairs that have not played each other are omitted.

#   Name                    ELO    Opp.   Score          +W−L=D       Performance

1   Shredder 11             2972   2      57.5 − 42.5    +45−30=25    +12
                                   3      54.5 − 45.5    +37−28=35    −26
                                   4      67 − 33        +58−24=18    +26
                                   5      57.5 − 42.5    +45−30=25    −53
                                   6      73.5 − 26.5    +62−15=23    +25
                                   7      68.5 − 31.5    +60−23=17    −18
                                   8      74 − 26        +66−18=16    +16
                                   9      87 − 13        +83−9=8      +40

2   Naum 3 64-bit           2933   1      42.5 − 57.5    +30−45=25    −12
                                   3      47.5 − 52.5    +33−38=29    −33
                                   4      61.5 − 38.5    +45−22=33    +12
                                   5      63 − 37        +52−26=22    +27
                                   6      66 − 34        +53−21=26    +4
                                   7      62.5 − 37.5    +52−27=21    −31
                                   8      70.5 − 29.5    +58−17=25    +16
                                   9      82 − 18        +72−8=20     −8
                                   10     87 − 13        +80−6=14     +35

3   Hiarcs 12               2915   1      45.5 − 54.5    +28−37=35    +26
                                   2      52.5 − 47.5    +38−33=29    +33
                                   4      56.5 − 43.5    +38−25=37    −3
                                   5      52.5 − 47.5    +39−34=27    −33
                                   6      63.5 − 36.5    +47−20=33    −1
                                   7      64 − 36        +52−24=24    −4
                                   8      63.5 − 36.5    +51−24=25    −17
                                   9      84.5 − 15.5    +75−6=19     +33
                                   10     80 − 20        +70−10=20    −31

4   Loop 10.32f             2869   1      33 − 67        +24−58=18    −26
                                   2      38.5 − 61.5    +22−45=33    −12
                                   3      43.5 − 56.5    +25−38=37    +3
                                   5      48.5 − 51.5    +23−26=51    −12
                                   6      57 − 43        +43−29=28    +3
                                   7      59.5 − 40.5    +48−29=23    +9
                                   8      62 − 38        +44−20=36    +11
                                   9      74.5 − 25.5    +65−16=19    −6
                                   10     76 − 24        +66−14=20    −18
                                   11     88 − 12        +83−7=10     +65
                                   12     84.5 − 15.5    +78−9=13     +9

5   Fruit 051103            2865   1      42.5 − 57.5    +30−45=25    +53
                                   2      37 − 63        +26−52=22    −27
                                   3      47.5 − 52.5    +34−39=27    +33
                                   4      51.5 − 48.5    +26−23=51    +12
                                   6      56 − 44        +46−34=20    +2
                                   7      54 − 46        +41−33=26    −26
                                   8      56.5 − 43.5    +39−26=35    −20
                                   9      78.5 − 21.5    +69−12=19    +31
                                   10     71.5 − 28.5    +58−15=27    −59
                                   11     85 − 15        +76−6=18     +8
                                   12     82 − 18        +76−12=12    −9

6   Spike 1.2 Turin         2824   1      26.5 − 73.5    +15−62=23    −25
                                   2      34 − 66        +21−53=26    −4
                                   3      36.5 − 63.5    +20−47=33    +1
                                   4      43 − 57        +29−43=28    −3
                                   5      44 − 56        +34−46=20    −2
                                   7      58.5 − 41.5    +49−32=19    +49
                                   8      58.5 − 41.5    +48−31=21    +38
                                   9      66 − 34        +51−19=30    −40
                                   10     77 − 23        +68−14=18    +37
                                   11     78 − 22        +67−11=22    −22
                                   12     75.5 − 24.5    +67−16=17    −37

7   Deep Sjeng 2.7 1CPU     2810   1      31.5 − 68.5    +23−60=17    +18
                                   2      37.5 − 62.5    +27−52=21    +31
                                   3      36 − 64        +24−52=24    +4
                                   4      40.5 − 59.5    +29−48=23    −9
                                   5      46 − 54        +33−41=26    +26
                                   6      41.5 − 58.5    +32−49=19    −49
                                   8      48.5 − 51.5    +34−37=29    −19
                                   9      72 − 28        +62−18=20    +30
                                   10     67.5 − 32.5    +57−22=21    −27
                                   11     80 − 20        +71−11=18    +17
                                   12     80.5 − 19.5    +75−14=11    +34
                                   13     79.5 − 20.5    +74−15=11    −70

8   Glaurung 2.0.1 64-bit   2801   1      26 − 74        +18−66=16    −16
                                   2      29.5 − 70.5    +17−58=25    −16
                                   3      36.5 − 63.5    +24−51=25    +17
                                   4      38 − 62        +20−44=36    −11
                                   5      43.5 − 56.5    +26−39=35    +20
                                   6      41.5 − 58.5    +31−48=21    −38
                                   7      51.5 − 48.5    +37−34=29    +19
                                   9      67 − 33        +58−24=18    +2
                                   10     70.5 − 29.5    +57−16=27    −3
                                   11     75 − 25        +63−13=24    −26
                                   12     79 − 21        +73−15=12    +26
                                   13     88 − 12        +84−8=8      +43

9   Movei 00.8.438          2674   1      13 − 87        +9−83=8      −40
                                   2      18 − 82        +8−72=20     +8
                                   3      15.5 − 84.5    +6−75=19     −33
                                   4      25.5 − 74.5    +16−65=19    +6
                                   5      21.5 − 78.5    +12−69=19    −31
                                   6      34 − 66        +19−51=30    +40
                                   7      28 − 72        +18−62=20    −30
                                   8      33 − 67        +24−58=18    −2
                                   10     58.5 − 41.5    +50−33=17    +43
                                   11     65 − 35        +55−25=20    +27
                                   12     64 − 36        +53−25=22    +14
                                   13     70.5 − 29.5    +59−18=23    −32

10  Pharaon 3.5.1           2652   2      13 − 87        +6−80=14     −35
                                   3      20 − 80        +10−70=20    +31
                                   4      24 − 76        +14−66=20    +18
                                   5      28.5 − 71.5    +15−58=27    +59
                                   6      23 − 77        +14−68=18    −37
                                   7      32.5 − 67.5    +22−57=21    +27
                                   8      29.5 − 70.5    +16−57=27    +3
                                   9      41.5 − 58.5    +33−50=17    −43
                                   11     58.5 − 41.5    +46−29=25    0
                                   12     53 − 47        +42−36=22    −45
                                   13     72.5 − 27.5    +65−20=15    +17
                                   14     86.5 − 13.5    +80−7=13     +15

11  Hamsters 0.6            2591   4      12 − 88        +7−83=10     −65
                                   5      15 − 85        +6−76=18     −8
                                   6      22 − 78        +11−67=22    +22
                                   7      20 − 80        +11−71=18    −17
                                   8      25 − 75        +13−63=24    +26
                                   9      35 − 65        +25−55=20    −27
                                   10     41.5 − 58.5    +29−46=25    0
                                   12     49 − 51        +38−40=22    −12
                                   13     72 − 28        +60−16=24    +60
                                   14     76 − 24        +70−18=12    −24
                                   15     85.5 − 14.5    +78−7=15     +18

12  Ufim 8.02               2586   4      15.5 − 84.5    +9−78=13     −9
                                   5      18 − 82        +12−76=12    +9
                                   6      24.5 − 75.5    +16−67=17    +37
                                   7      19.5 − 80.5    +14−75=11    −34
                                   8      21 − 79        +15−73=12    −26
                                   9      36 − 64        +25−53=22    −14
                                   10     47 − 53        +36−42=22    +45
                                   11     51 − 49        +40−38=22    +12
                                   13     57.5 − 42.5    +48−33=19    −37
                                   14     80 − 20        +74−14=12    +20
                                   15     80.5 − 19.5    +75−14=11    −13

13  Hermann 2.0             2492   7      20.5 − 79.5    +15−74=11    +70
                                   8      12 − 88        +8−84=8      −43
                                   9      29.5 − 70.5    +18−59=23    +32
                                   10     27.5 − 72.5    +20−65=15    −17
                                   11     28 − 72        +16−60=24    −60
                                   12     42.5 − 57.5    +33−48=19    +37
                                   14     69.5 − 30.5    +63−24=13    +17
                                   15     68.5 − 31.5    +63−26=11    −29

14  Aice 0.99.2             2355   10     13.5 − 86.5    +7−80=13     −15
                                   11     24 − 76        +18−70=12    +24
                                   12     20 − 80        +14−74=12    −20
                                   13     30.5 − 69.5    +24−63=13    −17
                                   15     58 − 42        +53−37=10    +24

15  Ayito 0.2.994           2314   11     14.5 − 85.5    +7−78=15     −18
                                   12     19.5 − 80.5    +14−75=11    +13
                                   13     31.5 − 68.5    +26−63=11    +29
                                   14     42 − 58        +37−53=10    −24

Performance color legend:
(Only pairs with at least 30 games)
-120 -100 -80 -60 -40 -20 0 20 40 60 80 100 120

History of "pure" testing

History of "pure" testing for all engines


Likelihood of superiority for "pure" database

LOS matrix

Pure all engines
#   Name                    ELO        1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
1   Shredder 11             2972          99.7 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
2   Naum 3 64-bit           2933     0.3        89.7 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
3   Hiarcs 12               2915     0.0  10.3       100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
4   Loop 10.32f             2869     0.0   0.0   0.0        61.7 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
5   Fruit 051103            2865     0.0   0.0   0.0  38.3        99.9 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
6   Spike 1.2 Turin         2824     0.0   0.0   0.0   0.0   0.1        86.6  96.6 100.0 100.0 100.0 100.0 100.0 100.0 100.0
7   Deep Sjeng 2.7 1CPU     2810     0.0   0.0   0.0   0.0   0.0  13.4        76.2 100.0 100.0 100.0 100.0 100.0 100.0 100.0
8   Glaurung 2.0.1 64-bit   2801     0.0   0.0   0.0   0.0   0.0   3.4  23.8       100.0 100.0 100.0 100.0 100.0 100.0 100.0
9   Movei 00.8.438          2674     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0        95.8 100.0 100.0 100.0 100.0 100.0
10  Pharaon 3.5.1           2652     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   4.2       100.0 100.0 100.0 100.0 100.0
11  Hamsters 0.6            2591     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0        64.1 100.0 100.0 100.0
12  Ufim 8.02               2586     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  35.9       100.0 100.0 100.0
13  Hermann 2.0             2492     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0       100.0 100.0
14  Aice 0.99.2             2355     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0        97.6
15  Ayito 0.2.994           2314     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   2.4
LOS color legend:
0 10 20 30 40 50 60 70 80 90 100
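
For intuition only, the likelihood of superiority between two neighbouring engines can be roughly approximated from the rating list alone. The Python sketch below treats each rating as an independent normal estimate whose 95% interval is given by the "+" and "−" columns. This is an assumption made for illustration, not how Bayeselo computes the matrix above, so the results agree with it only approximately. The function name `approx_los` is made up for this example.

```python
import math


def approx_los(elo_a, plus_a, minus_a, elo_b, plus_b, minus_b):
    """Rough probability that A's true rating is higher than B's."""
    # Treat each 95% interval as +/- 1.96 standard deviations of a normal.
    sd_a = (plus_a + minus_a) / 2 / 1.96
    sd_b = (plus_b + minus_b) / 2 / 1.96
    # Assumes the two rating estimates are independent (a simplification).
    sd_diff = math.hypot(sd_a, sd_b)
    z = (elo_a - elo_b) / sd_diff
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF


# Shredder 11 (2972, +21/−21) against Naum 3 64-bit (2933, +20/−19)
print(round(100 * approx_los(2972, 21, 21, 2933, 20, 19), 1))
# Loop 10.32f (2869, +18/−17) against Fruit 051103 (2865, +18/−18)
print(round(100 * approx_los(2869, 18, 17, 2865, 18, 18), 1))
```

The two printed values land within about one percentage point of the 99.7 and 61.7 shown in the matrix for these pairs.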

Created in 2005-2007 by CCRL team
Last games added on March 30, 2008