June 13, 2023 17:12:57


Comparing the output of an ML model against a CSV file

Question

I have a machine learning model on Google Colab, and I have this code:

wrong_english=[
    "I has a dog",
    "They is going to the park",
    "She don't like coffee.",
    "The book belong to him.",
    "He play soccer very well.",
    "My sister and me are going to the party.",
    "I'm not sure who's car is parked outside.",
    "There is many people in the room.",
    "She sings good.",
]
tokenized=tokenizer(
  wrong_english,
  padding="longest",
  max_length=MAX_LENGTH,
  truncation=True,
  return_tensors='tf'
)
out = model.generate(**tokenized, max_length=128)
print(out)
for i in range(len(wrong_english)):
  print(wrong_english[i]+"------------>"+tokenizer.decode(out[i], skip_special_tokens=True))

The output is:

I has a dog------------>I have a dog.
They is going to the park------------>They are going to the park.
She don't like coffee.------------>She doesn't like coffee.
The book belong to him.------------>The book belongs to him.
He play soccer very well.------------>He plays soccer very well.
My sister and me are going to the party.------------>My sister and me are going to the party.
I'm not sure who's car is parked outside.------------>I'm not sure who's car is parked outside.
There is many people in the room.------------>There are many people in the room.
She sings good.------------>She sings good.

I also have a CSV file that looks like this:

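How can I compare the output of the ML model with every record in column B of my CSV file and write the word CORRECT or INCORRECT depending on whether the values match?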

For example:

I has a dog------------>I have a dog. -> CORRECT
They is going to the park------------>They are going to the park. -> CORRECT
She don't like coffee.------------>She doesn't like coffee. -> CORRECT
The book belong to him.------------>The book belongs to him. -> CORRECT
He play soccer very well.------------>He plays soccer very well. -> CORRECT
My sister and me are going to the party.------------>My sister and me are going to the party. -> CORRECT
I'm not sure who's car is parked outside.------------>I'm not sure who's car is parked outside. -> INCORRECT
There is many people in the room.------------>There are many people in the room. -> CORRECT
She sings good.------------>She sings good. -> CORRECT


Answer 1

Score: 2

Put your model's predictions into a list:

predicted_correction = [tokenizer.decode(out[i], skip_special_tokens=True) for i in range(len(wrong_english))]
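Pairing each input sentence with its decoded prediction in a dict lets you look a prediction up by its text rather than by list position. A minimal sketch; the short lists below are hypothetical stand-ins for the notebook's `wrong_english` and `predicted_correction`:

```python
# Hypothetical stand-ins for the notebook's `wrong_english` list and the
# decoded model output collected in `predicted_correction`.
wrong_english = ["I has a dog", "They is going to the park", "She sings good."]
predicted_correction = ["I have a dog.", "They are going to the park.", "She sings good."]

# zip() walks both lists in step; dict() keys each prediction by its input.
# If the same input appears twice, the later prediction overwrites the earlier one.
prediction_for = dict(zip(wrong_english, predicted_correction))

print(prediction_for["They is going to the park"])  # They are going to the park.
```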

Read your CSV file:

import pandas as pd
df = pd.read_csv(CSV_PATH)
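If you'd rather not depend on pandas for this step, the standard-library `csv` module can read the same file into an input-to-expected-output lookup. This sketch assumes the headers are `Incorrect - input` and `Correct - expected output` (the names the loop below relies on) and uses a hypothetical in-memory string in place of the real file at `CSV_PATH`:

```python
import csv
import io

# Hypothetical CSV contents; with the real file, use open(CSV_PATH, newline="")
# instead of io.StringIO(csv_text).
csv_text = """Incorrect - input,Correct - expected output
I has a dog,I have a dog.
I'm not sure who's car is parked outside.,I'm not sure whose car is parked outside.
"""

with io.StringIO(csv_text) as f:
    reader = csv.DictReader(f)  # first row is used as the header
    # Map each incorrect input to the corrected sentence the CSV expects.
    expected_for = {row["Incorrect - input"]: row["Correct - expected output"] for row in reader}

print(expected_for["I has a dog"])  # I have a dog.
```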

Add an empty Result column to hold the correct/incorrect verdict:

df.insert(0, "Result", "")
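`DataFrame.insert` raises a `ValueError` if the column already exists, which is easy to hit when a Colab cell is re-run on the same `df`. A small sketch of guarding against that; the one-row frame is hypothetical and only reuses this answer's column names:

```python
import pandas as pd

# Hypothetical one-row frame using the column names from this answer.
df = pd.DataFrame({
    "Incorrect - input": ["I has a dog"],
    "Correct - expected output": ["I have a dog."],
})

# Simulate running the cell twice; the membership check makes the insert idempotent.
for _ in range(2):
    if "Result" not in df.columns:
        df.insert(0, "Result", "")  # position 0 puts the new column first

print(list(df.columns))  # ['Result', 'Incorrect - input', 'Correct - expected output']
```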

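for wr, pc in zip(wrong_english, predicted_correction):
    indexes = df[df['Incorrect - input'] == wr].index
    for i in indexes:  # the same incorrect input may appear more than once
        # .loc[row_label, column] writes into df itself; the chained form
        # df.iloc[i]['Result'] = ... may write to a temporary copy and leave df unchanged
        if df.loc[i, 'Correct - expected output'] == pc:
            df.loc[i, 'Result'] = "Correct"
        else:
            df.loc[i, 'Result'] = "Incorrect"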
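As an alternative to the row-by-row loop, the whole column can be filled in one vectorized step with `Series.map`. A sketch assuming the same column names; the data below is hypothetical, and it uses the upper-case CORRECT/INCORRECT labels and the printed format from the question:

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's data.
wrong_english = ["I has a dog", "I'm not sure who's car is parked outside."]
predicted_correction = ["I have a dog.", "I'm not sure who's car is parked outside."]
df = pd.DataFrame({
    "Incorrect - input": wrong_english,
    "Correct - expected output": ["I have a dog.", "I'm not sure whose car is parked outside."],
})

# Look up the model's prediction for every row by its input text;
# rows whose input the model never saw get NaN (and end up INCORRECT below).
predicted = df["Incorrect - input"].map(dict(zip(wrong_english, predicted_correction)))

# Compare both columns element-wise, ignoring leading/trailing whitespace.
matches = predicted.str.strip() == df["Correct - expected output"].str.strip()
df["Result"] = matches.map({True: "CORRECT", False: "INCORRECT"})

# Print in the "input------------>prediction -> RESULT" format from the question.
for src, pred, res in zip(df["Incorrect - input"], predicted, df["Result"]):
    print(f"{src}------------>{pred} -> {res}")
```

To keep the results, something like `df.to_csv("results.csv", index=False)` writes the frame back out (the filename is a placeholder). The whitespace stripping is an extra assumption; drop the `.str.strip()` calls if you want a strictly exact match.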

