Merge PyTorch predictions to original dataframe

Question

I have obtained predictions from my PyTorch model as a tensor of shape torch.Size([2958, 96]). My original dataset has 2958 qids, and each qid has at most 96 documents (the minimum is 47). Where a qid has fewer than 96 documents, the missing predictions are padded with -1. The shape of my original dataframe is (221567, 7).

I would like to merge the predictions from the PyTorch model or tensor back to this dataframe, using the qid. Every row of the tensor represents a qid, while every column represents the rank of that particular document (based on the order of documents in each qid).

Below is a minimal example (after converting the tensor to a dataframe):

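import pandas as pd

# Each row of y_pred is one qid; each column is a document position; '-1' marks padding
tensor = {'0': ['3', '1', '2'], '1': ['2', '1', '2'], '2': ['2', '1', '-1']}
y_pred = pd.DataFrame(tensor)

# Original data: one row per document, grouped by qid
data = {'qid': ['0', '0', '0', '1', '1', '1', '2', '2'], 'irrelevant_col': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar']}
original_df = pd.DataFrame(data)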

Notice that for qid==2, there are only 2 rows, and the tensor has a '-1' in row 2 and column 2 because of this. Also, the order of the tensor is correct in the sense that it matches the order of the items in the dataframe. This is the target output:

target = {'qid': ['0', '0','0','1', '1','1','2', '2'],'irrelevant_col': ['foo', 'foo','foo','foo', 'bar','bar','bar', 'bar'],'y_pred': ['3', '2','2','1', '1','1','2', '2']}
target_df = pd.DataFrame(target)

EDIT: I fixed an incorrect column (2 instead of 3) and made the last y_pred -1.

Answer 1

Score: 1

  1. First, convert the tensor to a dataframe.

  2. Then reshape the dataframe to match the original shape by stacking it, drop the second level of the index, and create a new index.

  3. Finally, merge the original dataframe with the reshaped y_pred dataframe on the qid column.

import numpy as np
import pandas as pd

# Mark the -1 padding as missing (NaN)
y_pred = pd.DataFrame(tensor).replace(-1, np.nan)

y_pred = y_pred.stack().reset_index(level=1, drop=True).to_frame('y_pred').reset_index()

merged_df = original_df.merge(y_pred, on='qid', how='left')

merged_df will contain an extra index column; if you don't need it, remove it with the drop() method:

merged_df = merged_df.drop('index', axis=1)
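
For the merge in step 3 to line up one prediction per document, the reshaped frame needs a qid column, and the key has to identify a single document rather than a whole qid; merging on qid alone would pair every prediction of a qid with every row of that qid. Below is a minimal sketch of one way to do that on the question's toy data. It assumes numeric predictions (the question's example stores them as strings), that tensor rows follow the order in which qids first appear in original_df, and that -1 is never a genuine prediction. The names preds, wide, long_df, keyed and pos are illustrative and not from the original posts.

```python
import numpy as np
import pandas as pd

# Toy data from the question, with numeric predictions (an assumption).
preds = np.array([[3, 2, 2],
                  [1, 1, 1],
                  [2, 2, -1]])  # row i = i-th qid, -1 = padding
original_df = pd.DataFrame({
    'qid': ['0', '0', '0', '1', '1', '1', '2', '2'],
    'irrelevant_col': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
})

# Label the tensor rows with the qids in order of first appearance
# (assumed to match the row order of the tensor).
qids = original_df['qid'].drop_duplicates().to_numpy()
wide = pd.DataFrame(preds, index=pd.Index(qids, name='qid'))
wide.columns.name = 'pos'  # position of the document within its qid

# Long format: one row per (qid, pos); then discard the padding.
long_df = wide.stack().rename('y_pred').reset_index()
long_df = long_df[long_df['y_pred'] != -1]

# Number the documents within each qid so that (qid, pos) is a unique key.
keyed = original_df.assign(pos=original_df.groupby('qid').cumcount())
merged_df = keyed.merge(long_df, on=['qid', 'pos'], how='left').drop(columns='pos')
```

Because the merge is keyed on both qid and pos, it does not rely on the two frames having the same row order, only on the documents being in the same order within each qid.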

Answer 2

Score: 1

Thanks for the posted answers. I might not have specified the question correctly, but I was able to use your suggestions. The final step (after following your suggestions for reshaping y_pred) was that, instead of using merge, I simply had to assign the column like so:

merged_df['y_pred'] = y_pred.values
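
A direct assignment like this works when the flattened, unpadded predictions are in exactly the same order as the rows of the dataframe. The following is a minimal sketch of that idea on the question's toy data, assuming numeric predictions held in a NumPy array (a PyTorch tensor would typically be converted first, for example with tensor.detach().cpu().numpy()), rows of original_df grouped by qid in tensor-row order, and -1 used only as padding. The names preds and flat are illustrative.

```python
import numpy as np
import pandas as pd

# Toy data from the question, with numeric predictions (an assumption).
preds = np.array([[3, 2, 2],
                  [1, 1, 1],
                  [2, 2, -1]])  # row i = i-th qid, -1 = padding
original_df = pd.DataFrame({
    'qid': ['0', '0', '0', '1', '1', '1', '2', '2'],
    'irrelevant_col': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
})

# Boolean masking returns a 1-D array in row-major order:
# all of row 0, then row 1, and so on, with the padding removed.
flat = preds[preds != -1]

# Guard against a silent misalignment before assigning by position.
if len(flat) != len(original_df):
    raise ValueError(
        f"{len(flat)} predictions for {len(original_df)} dataframe rows"
    )

merged_df = original_df.copy()
merged_df['y_pred'] = flat
```

The length check only catches a mismatch in the total count; it cannot detect rows that are in a different order, so the ordering assumption still has to hold.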

Thanks again!!!!

huangapple
  • Posted on April 17, 2023 at 11:39:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76031569.html