How to store a chess game efficiently?

Question

What's the most space-efficient way to store an entire chess game in a database? Considering an average game length of 50-70 moves (1 move means both players move once), I hope it takes less than 800 bits.

The initial square of the piece could be stored in 6 bits and the move could take 2-4 bits. Is there any way to store it in less than that?

Answer 1

Score: 10

You could enumerate all legal moves in a given position and sort them in an agreed way (for example by starting square, then by target square, then by promotion piece when the move is a promotion). Then identify each move by its index in this sorted list of legal moves.

The (theoretical) position found with the most legal moves has 218 legal moves. We can be confident there is no position with more than 256 legal moves, so 8 bits are enough to encode a single move.

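With this encoding a game of 70 moves would need 70 x 8 = 560 bits, if each player's turn counts as one move. (With the question's convention, where one move means both players move once, 70 moves are 140 turns and 1120 bits.)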
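As a rough illustration of the index idea, the sketch below hand-lists the 20 legal moves of the starting position (a real implementation would use a move generator), sorts them by starting square and then by target square, and looks up the index of e2-e4. The square numbering (a1 = 0, ..., h8 = 63) and all function names are assumptions made for this example, not part of the answer.

```python
def square(name: str) -> int:
    """Map a square name such as 'e4' to 0..63, with a1 = 0 and h8 = 63 (assumed convention)."""
    file = ord(name[0]) - ord("a")
    rank = int(name[1]) - 1
    return rank * 8 + file

def starting_moves() -> list[tuple[int, int]]:
    """The 20 legal moves of the initial position as (from, to) pairs, listed by hand."""
    moves = []
    for file in "abcdefgh":
        moves.append((square(file + "2"), square(file + "3")))  # single pawn step
        moves.append((square(file + "2"), square(file + "4")))  # double pawn step
    for src, dst in [("b1", "a3"), ("b1", "c3"), ("g1", "f3"), ("g1", "h3")]:
        moves.append((square(src), square(dst)))                # knight moves
    return moves

def move_index(legal_moves: list[tuple[int, int]], move: tuple[int, int]) -> int:
    """Index of the move in the agreed ordering: by from-square, then by to-square."""
    return sorted(legal_moves).index(move)

def encode_fixed(index: int, width: int = 8) -> str:
    """Fixed-width encoding of a move index as a bit string."""
    return format(index, f"0{width}b")

legal = starting_moves()
idx = move_index(legal, (square("e2"), square("e4")))
print(len(legal), idx, encode_fixed(idx))  # 20 13 00001101
```

Promotions would need a third sort key (the promotion piece), which this sketch leaves out because the starting position has none.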

Dynamic number of bits

It is however clear that in most positions the number of legal moves is more like 40 and not 218, so we could have the number of bits for the next encoded move depend dynamically on the number of legal moves that are available.

Since the start position only has 20 moves, the first white move would be encoded with just 5 bits; the same for the first black move. As the game continues, there soon will be a position that has more than 32 legal moves, so 6 bits will be used. For instance, after

1. e3 e5
2. Qg4 d6

...white has 46 legal moves, so their move will be encoded using 6 bits. We already saved 3 x 4 = 12 bits (compared to 8 bits per move) for encoding these first four moves.

While encoding, the algorithm first checks how many legal moves there are, and then uses the corresponding number of bits to encode the next move. The decoder can determine the number of legal moves in the same way and then consume that number of bits from the encoded stream.
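The variable-width packing can be sketched as follows. To stay self-contained, the example takes the number of legal moves per turn as given instead of generating them from a board. The counts 20, 20 and 46 come from the text above, while the counts 30 and 29 and all move indices are made up for illustration, as are the function names.

```python
def bits_needed(num_legal: int) -> int:
    """Smallest width that can distinguish num_legal alternatives (0 if the move is forced)."""
    return (num_legal - 1).bit_length()

def encode(choices: list[tuple[int, int]]) -> str:
    """choices holds one (move_index, num_legal_moves) pair per turn."""
    out = []
    for index, num_legal in choices:
        assert 0 <= index < num_legal
        width = bits_needed(num_legal)
        if width:
            out.append(format(index, f"0{width}b"))
    return "".join(out)

def decode(bits: str, legal_counts: list[int]) -> list[int]:
    """Read the indices back; a real decoder would derive each count by replaying the game."""
    indices, pos = [], 0
    for num_legal in legal_counts:
        width = bits_needed(num_legal)
        indices.append(int(bits[pos:pos + width], 2) if width else 0)
        pos += width
    return indices

turns = [(13, 20), (12, 20), (7, 30), (3, 29), (41, 46)]  # hypothetical indices
stream = encode(turns)
print(len(stream))                                # 26 bits: 5 + 5 + 5 + 5 + 6
print(decode(stream, [n for _, n in turns]))      # [13, 12, 7, 3, 41]
```

The decoder never needs the widths stored in the stream, because it can recompute the legal-move count from the position it has reconstructed so far.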

I don't know what would be a worst-case 70-move game in terms of number of needed bits (not feasible to compute in reasonable time), but it will be well below 500 bits.

Minimizing for reasonable games

The above derived limits are applicable to any game, even games where the players make the worst moves. If however you want to focus on reasonable games, you can further optimise the memory footprint:

Use a deterministic chess engine that produces a score for every legal move, and use the score as a probability for that move actually being played. It is important that it will always produce the same score for the same move in the same game state. Use Huffman encoding to encode the move accordingly. This will mean that good moves will take fewer bits than bad moves. For instance, if the Queen can be captured, this could even become a move with just one bit. Stupid moves will need more bits than average, maybe even more than 8 bits (when there are many legal moves and they include very good and very bad moves). A 70-move game where almost all moves are bad could take up more bits than with the non-Huffman encoding algorithm, but reasonable games would use less.
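A minimal sketch of the Huffman step, assuming the engine scores have already been turned into integer weights. The five moves and their weights below are invented for the example and are not real engine output. The function only computes code lengths, which is enough to see how many bits each move would cost.

```python
import heapq
from itertools import count

def huffman_code_lengths(weights: dict[str, int]) -> dict[str, int]:
    """Return the Huffman code length in bits for each move, given its weight."""
    if len(weights) == 1:
        return {move: 1 for move in weights}
    tie = count()  # unique tie-breaker so the heap never compares the move lists
    heap = [(w, next(tie), [move]) for move, w in weights.items()]
    heapq.heapify(heap)
    lengths = {move: 0 for move in weights}
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        for move in group1 + group2:
            lengths[move] += 1  # every merge adds one bit to the merged moves
        heapq.heappush(heap, (w1 + w2, next(tie), group1 + group2))
    return lengths

# Hypothetical weights: capturing the queen is by far the most likely move.
weights = {"Qxd8": 60, "Nf3": 20, "h4": 10, "a3": 6, "Ke2": 4}
lengths = huffman_code_lengths(weights)
print(lengths)  # {'Qxd8': 1, 'Nf3': 2, 'h4': 3, 'a3': 4, 'Ke2': 4}
```

Encoder and decoder must derive exactly the same weights for the same position, which is why the answer insists on a deterministic engine.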

Answer 2

Score: 3

Number the squares 0-63. Count the number of squares reachable by the side to move, and record which of those reachable squares was moved to, using ceiling(log_2(reachable square count)) bits.

When multiple pieces could move to the destination square, use the same approach to select which of the possible source squares for that destination square was used.

Here's an example using the first few moves from Kasparov v Topalov 4/13/13. I'm numbering with 0 on the bottom-left (white's queen rook starts at 0, white's queen knight at 1, and so on). Only White's move on each line is encoded; Black's replies are encoded the same way.

  1. e4 d6 1100 encodes that e4 is reachable square 12 of 16 (0-indexed, so 0000 would record a3); no additional bits are needed since only one piece can move there.

  2. d4 Nf6 01011 encodes that d4 is reachable square 11 of 19; again no additional bits are needed.

  3. Nc3 g6 00100 encodes that c3 is reachable square 4 of 23. Here, the move could have come from two squares (the knight on b1 or the pawn on c2), so we need 1 bit to distinguish which, for a final encoding of 001000.

  4. Be3 Bg7 00110 encodes that e3 is reachable square 6 of 23; only the bishop can move there, so no additional bits are needed.
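The destination-then-source scheme described above can be sketched like this. The legal moves of the starting position are listed by hand instead of being generated, and the function names and the a1 = 0 numbering are assumptions for the example. Promotions are not handled.

```python
def square(name: str) -> int:
    """a1 = 0, b1 = 1, ..., h8 = 63."""
    return (int(name[1]) - 1) * 8 + (ord(name[0]) - ord("a"))

def bits_needed(n: int) -> int:
    """Bits needed to distinguish n alternatives (0 when there is only one)."""
    return (n - 1).bit_length()

def encode_move(legal_moves: list[tuple[int, int]], move: tuple[int, int]) -> str:
    """Encode the destination among reachable squares, then the source among candidates."""
    src, dst = move
    destinations = sorted({to for _, to in legal_moves})
    sources = sorted(frm for frm, to in legal_moves if to == dst)
    bits = ""
    for options, chosen in ((destinations, dst), (sources, src)):
        width = bits_needed(len(options))
        if width:
            bits += format(options.index(chosen), f"0{width}b")
    return bits

# Starting position: 16 pawn moves and 4 knight moves, written out by hand.
legal = [(square(f + "2"), square(f + r)) for f in "abcdefgh" for r in "34"]
legal += [(square(s), square(d))
          for s, d in [("b1", "a3"), ("b1", "c3"), ("g1", "f3"), ("g1", "h3")]]

print(encode_move(legal, (square("e2"), square("e4"))))  # 1100
print(encode_move(legal, (square("g1"), square("f3"))))  # 01010
```

The second call shows the disambiguation bit: f3 can be reached by the knight on g1 or the pawn on f2, so one extra bit selects the source.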

The bits-per-move this requires will depend on the game. The destination can never require more than 6 bits, and I believe most moves in most games will require only 5. I don't think we should expect <= 4 except for the opening moves and some endgame situations.

The disambiguation step (which of multiple source squares moved to the destination) will come up fairly often. My guess is that most moves won't require it, but that some moves needing 1-2 extra bits, and rarely 3, will come up in most games.

So, without doing the work to implement this and analyze real chess games, I'd expect this to record games at roughly 6 bits per move on average. It'll be more efficient in games with a long stretch of limited moves (e.g. a long endgame with no rooks or bishops).

Answer 3

Score: 1

Chess with a naive encoding needs 2*6 = 12 bits per move:

number_of_moves(7 bits)
start_position (6bits) - target_position (6bits)
start_position (6bits) - target_position (6bits)
...

for example:

70
E2 - E4
D7 - D5
...

Each position needs 3 bits for the letter A..H and 3 bits for the number 1..8, and each move has 2 positions.

70 moves * 12 bits + 7 bits = 847 bits = 106 Bytes
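The layout above can be sketched as a bit string builder. The bit order within a position (letter bits first, then number bits) and the function names are assumptions for illustration; the answer only fixes the field widths.

```python
def position_bits(name: str) -> str:
    """3 bits for the letter A..H followed by 3 bits for the number 1..8."""
    file = ord(name[0].upper()) - ord("A")
    rank = int(name[1]) - 1
    return format(file, "03b") + format(rank, "03b")

def encode_game(moves: list[tuple[str, str]]) -> str:
    """7-bit move count, then 12 bits (start, target) per move."""
    assert len(moves) < 128  # the count must fit in 7 bits
    bits = format(len(moves), "07b")
    for start, target in moves:
        bits += position_bits(start) + position_bits(target)
    return bits

def size_in_bits(num_moves: int, bits_per_move: int = 12) -> int:
    """Total size of the header plus all moves."""
    return 7 + num_moves * bits_per_move

encoded = encode_game([("E2", "E4"), ("D7", "D5")])
print(encoded)  # 0000010100001100011011110011100
print(size_in_bits(70), (size_in_bits(70) + 7) // 8)  # 847 106
```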

This is already near your desired 800 bits. Now just apply compression, either Huffman or LZW.

You can get down to 10 bits per move by exploiting that there are only 16 chess pieces per side (0..15 -> 4 bits). Each game starts with white (so odd moves are white and even moves are black), so change the encoding to:

number_of_moves(7 bits)
white_piece_ix (4bits) - target_position (6bits)
black_piece_ix (4bits) - target_position (6bits)
...

This leads to:

70 moves * 10 bits + 7 bits = 707 bits = 89 bytes

This is already below your 800 bits.

If this is not enough, you could use a variable chess piece index table. Each time a piece is "deleted" from the board, you shift the indexes down so that the freed index is taken by the next piece and the gap ends up at the end of the range (so if piece 13 is removed, then pieces 14,15 become 13,14). This way, once one side has 8 or fewer pieces you can use 3 bits for its index, with 4 or fewer 2 bits, and so on, so you end up with 7 .. 10 bits per move.
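A small sketch of such a shrinking index table for one side. The class and method names are hypothetical, and the piece ids are simply 0..15 in some agreed initial order.

```python
class PieceTable:
    """Maps the surviving pieces of one side to consecutive indexes 0..n-1."""

    def __init__(self, count: int = 16) -> None:
        self.pieces = list(range(count))  # original piece ids, in index order

    def index_bits(self) -> int:
        """Bits needed to address any surviving piece."""
        return (len(self.pieces) - 1).bit_length()

    def index_of(self, piece_id: int) -> int:
        """Current index of a piece, which is what gets written to the stream."""
        return self.pieces.index(piece_id)

    def remove(self, piece_id: int) -> None:
        """Drop a captured piece; all later pieces shift down by one index."""
        self.pieces.remove(piece_id)

table = PieceTable()
table.remove(13)
print(table.index_of(14), table.index_of(15), table.index_bits())  # 13 14 4

for piece_id in (0, 1, 2, 3, 4, 5, 6):
    table.remove(piece_id)
print(len(table.pieces), table.index_bits())  # 8 3
```

Each move then costs `index_bits()` plus 6 bits for the target position, and the decoder keeps an identical table by replaying the captures.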

Now you could also apply compression (LZW or Huffman). I would go for LZW, as Huffman also needs to pass a probability distribution table, which would be:

16 * ceil(log2(2*70)) bits = 128 bits

That is a relatively large portion of the data compared to the 800-bit limit, while LZW constructs its dictionary on the fly and its size can be chosen arbitrarily (so even with little RAM it should not be a big problem).

Answer 4

Score: 0

There are 32 pieces and 64 squares. So it requires 11 bits to store a piece and its position: 5 bits for the piece and 6 bits for the position.

The starting position is fixed, so there's no need to store it. After that, you only have to store the positions of the pieces that move: 11 bits per piece. That is, your stored game is just recording the individual piece moves. You know where the piece started, so all you have to store is where the piece ended up.

Most moves, then, can be stored in 11 bits (one piece moves). Oddly enough, a capture can be encoded by updating only the capturing piece's position. Game logic could see that two pieces are occupying the same space, and remove the captured piece from the board. Which is a good thing, because there's no 65th space (i.e. "not on the board") to put it: you don't have bits for it. En passant can be encoded in much the same way: let the game logic understand it.

I had originally thought that castling would require encoding two piece moves, but it really doesn't. The king always moves two spaces, which in any other situation would be an illegal move. Your logic could easily identify the "illegal" move and do the right thing.

Pawn promotions are a problem. The move isn't the problem, but you need some way to refer to the newly-created piece. There are only 5 bits for pieces and those are already used up. Consider moving your king's knight pawn from G7 to G8. Normally, what you'd store is [14,62] (i.e. piece number 14 moves to position 62). You'd need some way to signal that piece number 14 is now a queen instead of a pawn. It can be done, but it'd take some extra bits.

So the best case in a 70-move game is 11 bits per move or 770 bits. In a 50-move game it's 550 bits.
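For illustration, the 11-bit record (5 bits of piece number, 6 bits of target square) could be packed like this. The field order and the function names are assumptions; the example reuses the [14,62] pair from the promotion paragraph.

```python
def pack(piece: int, sq: int) -> int:
    """Pack a piece number (0..31) and a target square (0..63) into 11 bits."""
    assert 0 <= piece < 32 and 0 <= sq < 64
    return (piece << 6) | sq

def unpack(word: int) -> tuple[int, int]:
    """Inverse of pack: returns (piece, square)."""
    return word >> 6, word & 0b111111

word = pack(14, 62)
print(format(word, "011b"), unpack(word))  # 01110111110 (14, 62)
print(70 * 11, 50 * 11)                    # 770 550
```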

It'd be tight, but I think what you're asking just might be possible even before adding any type of compression or complex logic. Certainly you could average 800 bits per game for games of 50 to 70 moves.

Answer 5

Score: 0

Lots of clever answers, but there are at most 16 pieces that could move on each turn, and fewer than 32 moves for each piece (count castling as a king move and promotion as a pawn move).

That's 4 bits for the piece and 5 bits for the move -- 9 bits per turn. Easy peasy.

huangapple
  • Published on June 8, 2023 at 14:33:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76429164.html