April 17, 2023 11:39:59

Merge PyTorch predictions to original dataframe

Question

I have obtained predictions from my PyTorch model as a tensor of shape torch.Size([2958, 96]). My original dataset has 2958 qids; the largest has 96 documents and the smallest has 47. Predictions for the missing documents are padded with -1. The shape of my original dataframe is (221567, 7).

I would like to merge the predictions from the PyTorch model or tensor back to this dataframe, using the qid. Every row of the tensor represents a qid, while every column represents the rank of that particular document (based on the order of documents in each qid).

Below is a minimal example (after converting the tensor to a df):

import pandas as pd

tensor = {'0': ['3', '1','2'],'1': ['2', '1','2'],'2': ['2', '1','-1']}
y_pred = pd.DataFrame(tensor)

data = {'qid': ['0', '0','0','1', '1','1','2', '2'],'irrelevant_col': ['foo', 'foo','foo','foo', 'bar','bar','bar', 'bar']}
original_df = pd.DataFrame(data)

Notice that for qid==2, there are only 2 rows, and the tensor has a '-1' in row 2 and column 2 because of this. Also, the order of the tensor is correct in the sense that it matches the order of the items in the dataframe. This is the target output:

target = {'qid': ['0', '0','0','1', '1','1','2', '2'],'irrelevant_col': ['foo', 'foo','foo','foo', 'bar','bar','bar', 'bar'],'y_pred': ['3', '2','2','1', '1','1','2', '2']}
target_df = pd.DataFrame(target)

EDIT: I fixed an incorrect column (2 instead of 3) and made the last y_pred -1.

Answer 1

Score: 1

First, convert the tensor to a dataframe.
Then reshape the dataframe to match the original shape by stacking it, drop the second level of the index, and create a new index.
Finally, merge the original dataframe and the reshaped y_pred dataframe on the qid column.

import numpy as np

y_pred = pd.DataFrame(tensor).replace(-1, np.nan)

y_pred = y_pred.stack().reset_index(level=1, drop=True).to_frame('y_pred').reset_index()

merged_df = original_df.merge(y_pred, on='qid', how='left')

merged_df will contain an extra index column; if you don't want it, you can remove it with the drop() method:

merged_df = merged_df.drop('index', axis=1)
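
A merge keyed on qid alone would pair every prediction of a qid with every row of that qid, so a per-document position is likely needed as a second key. Below is a minimal sketch of one way to do that. It assumes that row i of the predictions corresponds to the i-th distinct qid in order of first appearance, and that -1 only ever marks padding. The names preds, pos, long_pred and keyed are illustrative and not from the original post, and a NumPy array stands in for the tensor.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the model output: one row per qid, padded with -1.
# (Assumption: a real tensor would first be converted to a NumPy array.)
preds = np.array([[3, 2, 2],
                  [1, 1, 1],
                  [2, 2, -1]])

original_df = pd.DataFrame({
    'qid': ['0', '0', '0', '1', '1', '1', '2', '2'],
    'irrelevant_col': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
})

# Assumption: row i of preds belongs to the i-th distinct qid,
# in order of first appearance in original_df.
qids = original_df['qid'].unique()

# Long format: one row per (qid, pos) pair, with the padding removed.
n_qids, n_docs = preds.shape
long_pred = pd.DataFrame({
    'qid': np.repeat(qids, n_docs),
    'pos': np.tile(np.arange(n_docs), n_qids),
    'y_pred': preds.ravel(),
})
long_pred = long_pred[long_pred['y_pred'] != -1]

# Position of each document within its qid in the original frame.
keyed = original_df.assign(pos=original_df.groupby('qid').cumcount())

# Join on both keys so each row receives exactly one prediction.
merged_df = keyed.merge(long_pred, on=['qid', 'pos'], how='left').drop(columns='pos')
print(merged_df)
```

Because the join uses both qid and the position within the qid, the result keeps the row count and order of original_df. If the model could legitimately output -1, the padding would need to be identified some other way, for example from the number of documents per qid.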

Answer 2

Score: 1

Thanks for the answers posted. I may not have specified the question correctly, but I was able to use your suggestions. The final step (after reshaping y_pred as you suggested) was to simply assign the column instead of using merge, like so:

merged_df['y_pred'] = y_pred.values

Thanks again!
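
For reference, the direct assignment described here can be sketched as follows. This is only an illustration under stated assumptions: the rows of original_df are grouped by qid in the same order as the rows of the predictions, and -1 is used only for padding. The names preds, flat and result_df are hypothetical, and a NumPy array stands in for the tensor.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the model output: one row per qid, padded with -1.
preds = np.array([[3, 2, 2],
                  [1, 1, 1],
                  [2, 2, -1]])

original_df = pd.DataFrame({
    'qid': ['0', '0', '0', '1', '1', '1', '2', '2'],
    'irrelevant_col': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
})

# Boolean indexing flattens the array row by row, so the remaining
# values stay grouped by qid and ordered by document position.
flat = preds[preds != -1]

# Sanity check: the number of real predictions per row should match
# the number of documents per qid (sort=False keeps first-appearance order).
docs_per_qid = original_df.groupby('qid', sort=False).size().to_numpy()
preds_per_row = (preds != -1).sum(axis=1)
if not np.array_equal(docs_per_qid, preds_per_row):
    raise ValueError('predictions do not line up with the documents per qid')

# Positional assignment: no merge keys are involved.
result_df = original_df.copy()
result_df['y_pred'] = flat
print(result_df)
```

The per-qid count check is there because positional assignment silently produces wrong results if the two orderings differ; it catches mismatched group sizes, though not two groups of equal size being swapped.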
