Compare the output of an ML model against a CSV file

Question


I have a machine learning model on Google Colab, and I have this code:

wrong_english=[
    "I has a dog",
    "They is going to the park",
    "She don't like coffee.",
    "The book belong to him.",
    "He play soccer very well.",
    "My sister and me are going to the party.",
    "I'm not sure who's car is parked outside.",
    "There is many people in the room.",
    "She sings good.",
]

tokenized=tokenizer(
  wrong_english,
  padding="longest",
  max_length=MAX_LENGTH,
  truncation=True,
  return_tensors='tf'
)
out = model.generate(**tokenized, max_length=128)
print(out)


for i in range(len(wrong_english)):
  print(wrong_english[i]+"------------>"+tokenizer.decode(out[i], skip_special_tokens=True))

The output is:

I has a dog------------>I have a dog.
They is going to the park------------>They are going to the park.
She don't like coffee.------------>She doesn't like coffee.
The book belong to him.------------>The book belongs to him.
He play soccer very well.------------>He plays soccer very well.
My sister and me are going to the party.------------>My sister and me are going to the party.
I'm not sure who's car is parked outside.------------>I'm not sure who's car is parked outside.
There is many people in the room.------------>There are many people in the room.
She sings good.------------>She sings good.

I also have a CSV file that looks like this:

[image: the CSV file]

How can I compare the output of the ML model with every record in column B of my CSV file, and write the word CORRECT if the values match or INCORRECT if they do not?

For example:

I has a dog------------>I have a dog. -> CORRECT
They is going to the park------------>They are going to the park. -> CORRECT
She don't like coffee.------------>She doesn't like coffee. -> CORRECT
The book belong to him.------------>The book belongs to him. -> CORRECT
He play soccer very well.------------>He plays soccer very well. -> CORRECT
My sister and me are going to the party.------------>My sister and me are going to the party. -> CORRECT
I'm not sure who's car is parked outside.------------>I'm not sure who's car is parked outside. -> INCORRECT
There is many people in the room.------------>There are many people in the room. -> CORRECT
She sings good.------------>She sings good. -> CORRECT

Answer 1

Score: 2

import pandas as pd

# put your model predictions into a list
predicted_correction = [tokenizer.decode(out[i], skip_special_tokens=True) for i in range(len(wrong_english))]

# read your csv
df = pd.read_csv(CSV_PATH)
# add an empty correct/incorrect result column
df.insert(0, "Result", "")

for wr, pc in zip(wrong_english, predicted_correction):
    indexes = df[df['Incorrect - input'] == wr].index
    for i in indexes:  # the same incorrect input may occur more than once
        # assign through df.loc[row, column] so the value is written to df itself;
        # df.iloc[i]['Result'] = ... would only modify a temporary copy of the row
        if df.loc[i, 'Correct - expected output'] == pc:
            df.loc[i, 'Result'] = "Correct"
        else:
            df.loc[i, 'Result'] = "Incorrect"
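
The same comparison can also be done column-wise, without an explicit loop over the rows. The following is only a sketch under some assumptions: the CSV text and the prediction list are made-up stand-ins so that the snippet runs on its own, and the "Prediction" column name is hypothetical. Only the two column names 'Incorrect - input' and 'Correct - expected output' come from the code above. In the real notebook, df would come from pd.read_csv(CSV_PATH) and predicted_correction from tokenizer.decode.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: the column names are taken from the
# answer above, the rows are invented for illustration.
csv_text = """Incorrect - input,Correct - expected output
I has a dog,I have a dog.
She sings good.,She sings well.
He play soccer very well.,He plays soccer very well.
"""
df = pd.read_csv(io.StringIO(csv_text))

# Stand-ins for wrong_english and the decoded model outputs.
wrong_english = ["I has a dog", "She sings good.", "He play soccer very well."]
predicted_correction = ["I have a dog.", "She sings good.", "He plays soccer very well."]

# Look up the prediction for each input sentence, then compare whole columns.
prediction_by_input = dict(zip(wrong_english, predicted_correction))
df["Prediction"] = df["Incorrect - input"].map(prediction_by_input)
matches = df["Prediction"] == df["Correct - expected output"]
df["Result"] = matches.map({True: "CORRECT", False: "INCORRECT"})

# Print in the format requested in the question.
for wrong, predicted, result in zip(df["Incorrect - input"], df["Prediction"], df["Result"]):
    print(f"{wrong}------------>{predicted} -> {result}")
```

Rows whose input sentence has no prediction get a missing value in "Prediction" and are therefore marked INCORRECT. The comparison is an exact string match, so a difference in whitespace or a trailing period also counts as a mismatch. If the result should end up in a file, df.to_csv can write the frame back out.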

huangapple
  • Posted on June 13, 2023 at 17:12:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76463358.html