Merge PyTorch predictions to original dataframe
Question
I have obtained predictions from my PyTorch model as a tensor of shape torch.Size([2958, 96]). My original dataset has 2958 qids, each with between 47 and 96 documents. Where a qid has fewer than 96 documents, the predictions are padded with -1. The shape of my original dataframe is (221567, 7).
I would like to merge the predictions from the tensor back into this dataframe using the qid. Every row of the tensor represents a qid, and every column holds the rank of the corresponding document (following the order of documents within each qid).
Below is a minimal example (after converting the tensor to a dataframe):
tensor = {'0': ['3', '1','2'],'1': ['2', '1','2'],'2': ['2', '1','-1']}
y_pred = pd.DataFrame(tensor)
data = {'qid': ['0', '0','0','1', '1','1','2', '2'],'irrelevant_col': ['foo', 'foo','foo','foo', 'bar','bar','bar', 'bar']}
original_df = pd.DataFrame(data)
Notice that for qid==2, there are only 2 rows, and the tensor has a '-1' in row 2 and column 2 because of this. Also, the order of the tensor is correct in the sense that it matches the order of the items in the dataframe. This is the target output:
target = {'qid': ['0', '0','0','1', '1','1','2', '2'],'irrelevant_col': ['foo', 'foo','foo','foo', 'bar','bar','bar', 'bar'],'y_pred': ['3', '2','2','1', '1','1','2', '2']}
target_df = pd.DataFrame(target)
EDIT: I fixed an incorrect column (2 instead of 3) and made the last y_pred -1.
Answer 1
Score: 1
- First, convert the tensor to a dataframe.
- Then reshape the dataframe to match the original shape by stacking it, drop the second level of the index, and create a new index.
- Finally, merge the original dataframe and the reshaped y_pred dataframe on the qid column.
y_pred = pd.DataFrame(tensor).replace(-1, np.nan)
y_pred = y_pred.stack().reset_index(level=1, drop=True).to_frame('y_pred').reset_index()
merged_df = original_df.merge(y_pred, on='qid', how='left')
merged_df will contain an extra index column; if you don't need it, use the drop() method:
merged_df = merged_df.drop('index', axis=1)
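For reference, here is a minimal runnable sketch of the stack-and-merge steps on the toy data from the question. After stacking, the reshaped frame needs both a qid column and a within-qid position to merge on, because a merge on qid alone would pair every prediction with every row of that qid. The names `preds`, `pos`, `long_pred` and `keyed` are made up for illustration. The sketch assumes that row i of the tensor belongs to the i-th distinct qid in order of first appearance, and that the tensor has already been converted to NumPy (for example with `tensor.cpu().numpy()`).

```python
import numpy as np
import pandas as pd

# Toy predictions mirroring the question: one row per qid, -1 marks padding.
preds = np.array([[3, 2, 2],
                  [1, 1, 1],
                  [2, 2, -1]])
original_df = pd.DataFrame({
    'qid': ['0', '0', '0', '1', '1', '1', '2', '2'],
    'irrelevant_col': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
})

# Assumption: row i of preds belongs to the i-th distinct qid (first appearance).
qids = original_df['qid'].unique()

# Long format: one row per (qid, pos) pair.
long_pred = (
    pd.DataFrame(preds, index=pd.Index(qids, name='qid'))
    .rename_axis(columns='pos')
    .stack()
    .rename('y_pred')
    .reset_index()
)
# Drop the padding explicitly instead of relying on stack() to drop NaN.
long_pred = long_pred[long_pred['y_pred'] != -1]

# Position of each document within its qid, used as the second merge key.
keyed = original_df.assign(pos=original_df.groupby('qid').cumcount())
merged_df = keyed.merge(long_pred, on=['qid', 'pos'], how='left').drop(columns='pos')
print(merged_df)
```

Merging on both keys keeps the result at one row per original row. With `how='left'`, any document without a prediction would show up as NaN rather than being dropped.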
Answer 2
Score: 1
Thanks for your posted answers. I might not have specified the question correctly, but I was able to use your suggestions. The final step (after following your suggestions for reshaping y_pred) was that, instead of using merge, I simply had to assign the column like so:
merged_df['y_pred'] = y_pred.values
Thanks again!!!!
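A possible way to do this direct assignment without reshaping through pandas is sketched below, again on the toy data from the question. It builds a boolean mask from the number of rows per qid, so it does not depend on -1 never being a genuine prediction. The names `preds`, `counts`, `mask`, `flat` and `result_df` are hypothetical. It assumes the dataframe rows are grouped by qid in the same order as the tensor rows, and that the tensor is already a NumPy array.

```python
import numpy as np
import pandas as pd

# Toy predictions mirroring the question: one row per qid, -1 marks padding.
preds = np.array([[3, 2, 2],
                  [1, 1, 1],
                  [2, 2, -1]])
original_df = pd.DataFrame({
    'qid': ['0', '0', '0', '1', '1', '1', '2', '2'],
    'irrelevant_col': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
})

# Number of documents per qid, in order of first appearance (sort=False).
counts = original_df.groupby('qid', sort=False).size().to_numpy()

# mask[i, j] is True when column j holds a real document for qid row i.
mask = np.arange(preds.shape[1]) < counts[:, None]

# Boolean indexing flattens row by row, which matches the dataframe order.
flat = preds[mask]
assert len(flat) == len(original_df)

result_df = original_df.assign(y_pred=flat)
print(result_df)
```

Because the assignment is purely positional, it only gives correct results when the dataframe order matches the tensor order, which the question states is the case.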