My minimax algorithm loses to me, but seems flawless
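Question

I'm trying to code a tic-tac-toe bot in Python using the minimax algorithm. As far as I can tell my code should work, but it misevaluates positions, searches too many or too few nodes, and loses to me every game.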

mainboard = ["-", "-", "-", "-", "-", "-", "-", "-", "-"]
nodes = 0

def detectwin(b):
    signs = ["O", "X"]
    for s in signs:
        for i in range(3):
            j = 3 * i
            if ((b[0 + j] == s and b[1 + j] == s and b[2 + j] == s) or
                (b[0 + i] == s and b[1 + i] == s and b[2 + i] == s)):
                if s == "O": return 1
                if s == "X": return -1
        if ((b[0] == s and b[4] == s and b[8] == s) or
            (b[2] == s and b[4] == s and b[6] == s)):
            if s == "O": return 1
            if s == "X": return -1
    return 0

def evaluate(board):
    return detectwin(board)

def fullboard(board):
    return all(cell != "-" for cell in board)

def makemove(board, move, maximizingPlayer):
    if maximizingPlayer:
        board[move] = "O"
        return board
    else:
        board[move] = "X"
        return board

def undomove(board, move):
    board[move] = "-"
    return board

def minimax(board, depth, maximizingPlayer):

    global nodes

    if depth == 0 or fullboard(board) or detectwin(board) != 0:
        nodes += 1
        return evaluate(board)

    if maximizingPlayer:
        maxEval = -1000
        for i in range(9):
            if board[i] == "-":
                board = makemove(board, i, True)
                newEval = minimax(board, depth - 1, False)
                maxEval = max(maxEval, newEval)
                board = undomove(board, i)
        return maxEval

    else:
        minEval = 1000
        for i in range(9):
            if board[i] == "-":
                board = makemove(board, i, False)
                newEval = minimax(board, depth - 1, True)
                minEval = min(minEval, newEval)
                board = undomove(board, i)
        return minEval

def findbestmove(board, maximizingPlayer):

    global nodes

    if maximizingPlayer:
        bestmove = -1
        maxEval = -1000
        for i in range(9):
            if board[i] == "-":
                board = makemove(board, i, True)
                nodes = 0
                newEval = minimax(board, 9, False)
                print(f"Eval move {i}: {newEval} ({nodes} nodes)")
                if newEval > maxEval:
                    maxEval = newEval
                    bestmove = i
                board = undomove(board, i)
        return bestmove

def printboard(b):
    signs = ["No", "O", "X"]
    win = signs[detectwin(b)] + " wins"
    print(f'{b[0]} {b[1]} {b[2]}\n{b[3]} {b[4]} {b[5]}\n{b[6]} {b[7]} {b[8]}\n{win}\n')

print("Ready!")
while True:
    move = findbestmove(mainboard, True)
    mainboard = makemove(mainboard, move, True)
    printboard(mainboard)
    yourmove = int(input())
    mainboard = makemove(mainboard, yourmove, False)
    printboard(mainboard)

I've tried evaluating different positions, expecting the search to cover every possibility and return the correct objective evaluation of each position. Instead, it searches the wrong number of nodes and evaluates positions incorrectly. Why does it do this?

Answer 1

Score: 7

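In the for loop of your detectwin function, (b[0 + j]==s and b[1 + j]==s and b[2 + j]==s) does check the rows: j = 3 * i, so each iteration moves to the start of the next row of the board. However, (b[0 + i]==s and b[1 + i]==s and b[2 + i]==s) does not check columns. Because i grows by 1 rather than 3, it tests three consecutive cells of the 1D board starting at index i: cells 0, 1, 2 (the top row again) for i = 0, then cells 1, 2, 3 and cells 2, 3, 4, which spill over into the second row and are not lines on the board at all.

So the issue is that the top row is checked twice, two triples that are not lines can be reported as wins, and the columns are never checked.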
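To make the index arithmetic concrete, here is a small sketch that lists the cell triples tested by the original second condition next to the triples that make up the real columns. The helper names buggy_triples and column_triples are made up for this illustration and do not appear in the question's code.

```python
# Cell triples tested by the original second condition, b[0+i], b[1+i], b[2+i],
# compared with the real columns of a 3x3 board stored row by row.
# Both function names are hypothetical helpers for this illustration.

def buggy_triples():
    # i = 0, 1, 2 gives three consecutive cells starting at index i
    return [(0 + i, 1 + i, 2 + i) for i in range(3)]

def column_triples():
    # a column is the same offset i taken from each of the three rows
    return [(i, i + 3, i + 6) for i in range(3)]

print("buggy  :", buggy_triples())   # [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
print("columns:", column_triples())  # [(0, 3, 6), (1, 4, 7), (2, 5, 8)]
```

Only the first buggy triple is a real line (the top row). The other two cross from the first row into the second, which is why the search can see "wins" that do not exist and miss every column win.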

The current row and column checking loop:

for i in range(3):
    j = 3 * i
    if ((b[0 + j]==s and b[1 + j]==s and b[2 + j]==s) or
        (b[0 + i]==s and b[1 + i]==s and b[2 + i]==s)):
        if s == "O": return 1
        if s == "X": return -1

This doesn't check any of the columns. Fixed:

for i in range(3):
    j = 3 * i
    if ((b[0 + j]==s and b[1 + j]==s and b[2 + j]==s) or
        (b[i]==s and b[i + 3]==s and b[i + 6]==s)):
        if s == "O": return 1
        if s == "X": return -1

Replacing (b[0 + i]==s and b[1 + i]==s and b[2 + i]==s) with (b[i]==s and b[i + 3]==s and b[i + 6]==s) makes the check go through the columns of the game board. Hope this helps.
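As an alternative to the index arithmetic, one possible way to make this kind of mistake harder is to write out all eight winning lines explicitly. The sketch below is not the code from the question: WIN_LINES and solve are names introduced here, and solve is a compact minimax over tuple boards, memoized with functools.lru_cache only so the sanity check runs quickly. It assumes the same board encoding as the question ("O" maximizes, "X" minimizes, "-" is empty).

```python
from functools import lru_cache

# All eight winning lines of a 3x3 board stored row by row (indices 0..8).
# WIN_LINES is a name introduced for this sketch.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def detectwin(b):
    """Return 1 if O has three in a line, -1 if X does, 0 otherwise."""
    for x, y, z in WIN_LINES:
        if b[x] != "-" and b[x] == b[y] == b[z]:
            return 1 if b[x] == "O" else -1
    return 0

@lru_cache(maxsize=None)
def solve(board, o_to_move):
    """Minimax value of a tuple board; hypothetical helper for the check."""
    score = detectwin(board)
    if score != 0 or "-" not in board:
        return score
    results = []
    for i in range(9):
        if board[i] == "-":
            mark = "O" if o_to_move else "X"
            child = board[:i] + (mark,) + board[i + 1:]
            results.append(solve(child, not o_to_move))
    return max(results) if o_to_move else min(results)

print(detectwin(list("X--X--X--")))  # -1: X owns the first column
print(detectwin(list("-OOO-----")))  # 0: cells 1, 2, 3 are not a line
print(solve(("-",) * 9, True))       # 0: the empty board is a draw
```

With correct win detection the empty board evaluates to 0, which is the known result for tic-tac-toe under perfect play, so a bot built on it should no longer lose. On legal positions this detectwin returns the same values as the fixed loop above; it only differs in how the lines are enumerated.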

huangapple
  • Posted on July 7, 2023, 04:39:10
  • Source: https://go.coder-hub.com/76632410.html