Compare the output of an ML model with a CSV file
Question
I have a machine learning model on Google Colab, and I have this code:
wrong_english = [
    "I has a dog",
    "They is going to the park",
    "She don't like coffee.",
    "The book belong to him.",
    "He play soccer very well.",
    "My sister and me are going to the party.",
    "I'm not sure who's car is parked outside.",
    "There is many people in the room.",
    "She sings good.",
]
tokenized = tokenizer(
    wrong_english,
    padding="longest",
    max_length=MAX_LENGTH,
    truncation=True,
    return_tensors='tf'
)
out = model.generate(**tokenized, max_length=128)
print(out)
for i in range(len(wrong_english)):
    print(wrong_english[i] + "------------>" + tokenizer.decode(out[i], skip_special_tokens=True))
The output is:
I has a dog------------>I have a dog.
They is going to the park------------>They are going to the park.
She don't like coffee.------------>She doesn't like coffee.
The book belong to him.------------>The book belongs to him.
He play soccer very well.------------>He plays soccer very well.
My sister and me are going to the party.------------>My sister and me are going to the party.
I'm not sure who's car is parked outside.------------>I'm not sure who's car is parked outside.
There is many people in the room.------------>There are many people in the room.
She sings good.------------>She sings good.
I also have a CSV file with the expected output in column B.
How can I compare the output of the ML model with every record in column B of my CSV file and write the word CORRECT or INCORRECT depending on whether the values match?
For example:
I has a dog------------>I have a dog. -> CORRECT
They is going to the park------------>They are going to the park. -> CORRECT
She don't like coffee.------------>She doesn't like coffee. -> CORRECT
The book belong to him.------------>The book belongs to him. -> CORRECT
He play soccer very well.------------>He plays soccer very well. -> CORRECT
My sister and me are going to the party.------------>My sister and me are going to the party. -> CORRECT
I'm not sure who's car is parked outside.------------>I'm not sure who's car is parked outside. -> INCORRECT
There is many people in the room.------------>There are many people in the room. -> CORRECT
She sings good.------------>She sings good. -> CORRECT
Answer 1
Score: 2
import pandas as pd

# put your model predictions into a list
predicted_correction = [tokenizer.decode(out[i], skip_special_tokens=True) for i in range(len(wrong_english))]
# read your csv
df = pd.read_csv(CSV_PATH)
# add an empty correct/incorrect result column
df.insert(0, "Result", "")
for wr, pc in zip(wrong_english, predicted_correction):
    indexes = df[df['Incorrect - input'] == wr].index
    for i in indexes:  # the same incorrect input may occur more than once
        # assign through .loc; chained df.iloc[i]['Result'] = ... writes to a temporary copy
        if df.loc[i, 'Correct - expected output'] == pc:
            df.loc[i, 'Result'] = "Correct"
        else:
            df.loc[i, 'Result'] = "Incorrect"
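The same comparison can also be done without an explicit loop. The following is a minimal sketch, not the answerer's code: it assumes the column names used in the answer above, replaces the real CSV with a small in-memory stand-in (the two rows are made up for illustration), and uses hard-coded lists in place of the real model output.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the real CSV file; these rows are illustrative, not the asker's data.
csv_text = """Incorrect - input,Correct - expected output
I has a dog,I have a dog.
She sings good.,She sings well.
"""
df = pd.read_csv(io.StringIO(csv_text))

# Stand-ins for wrong_english and the decoded model outputs.
wrong_english = ["I has a dog", "She sings good."]
predicted_correction = ["I have a dog.", "She sings good."]

# Look up the prediction for each input sentence (NaN if there is no prediction for it).
prediction_by_input = dict(zip(wrong_english, predicted_correction))
df["Prediction"] = df["Incorrect - input"].map(prediction_by_input)

# Compare whole columns at once, ignoring leading/trailing whitespace.
matches = df["Prediction"].str.strip() == df["Correct - expected output"].str.strip()
df["Result"] = np.where(matches, "CORRECT", "INCORRECT")

print(df[["Incorrect - input", "Result"]])
```

Rows whose input has no prediction end up as INCORRECT, because a missing value never compares equal. To save the labels, df.to_csv with index=False writes the frame back out; the output file name is up to you.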