Insert values from matrix dataframe into another dataframe using indexes
Question
I have a pandas DataFrame df_matrix with shape (4, 5) (the real one is much, much bigger). Its row and column labels are integers.
df_matrix =
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 92.576 | 94.269 | 108.308 | 140.394 | 155.421 |
| 1 | 117.490 | 104.356 | 104.952 | 131.331 | 144.203 |
| 2 | 115.405 | 112.536 | 112.069 | 116.328 | 136.226 |
| 3 | 164.047 | 148.946 | 133.204 | 122.235 | 141.075 |
df_matrix contains values that I want to put into the depth column of another DataFrame, df_main, looked up by its df_main['row_i'] and df_main['col_i'] columns.
The expected result looks like this:
df_main =
| | datetime | latitude | longitude | row_i | col_i | depth |
|---|---|---|---|---|---|---|
| 0 | 2013-05-29 00:39:44 | 51.708487 | 104.366131 | 0 | 0 | 92.576 |
| 1 | 2013-05-29 00:40:44 | 51.708268 | 104.362324 | 0 | 2 | 108.308 |
| 2 | 2013-05-29 00:41:44 | 51.708036 | 104.358530 | 1 | 3 | 131.331 |
| ... | ... | ... | ... | ... | ... | ... |
| 296448 | 2022-06-14 03:39:40 | 51.876903 | 105.520172 | 3 | 4 | 141.075 |
I solved it with iterrows:
for index, row in df_main.iterrows():
    df_main.loc[index, 'depth'] = df_matrix.loc[row['row_i'], row['col_i']]
This takes a long time with 200,000+ rows. I believe pandas has an appropriate method for this, maybe merge, but I have no idea which one or how to use it.
Is there a more pythonic (pandasonic) solution?
Answer 1
Score: 0
I'm not sure whether merge will be efficient here, but you can still try it:
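import pandas as pd

# Flatten df_matrix into a Series indexed by (row, column), then left-join it on the key columns.
df_main = (
    pd.merge(df_main,
             df_matrix.stack().rename("depth"),
             left_on=["row_i", "col_i"], right_index=True, how="left")
)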
Output:
print(df_main)
                   datetime   latitude   longitude  row_i  col_i    depth
0       2013-05-29 00:39:44  51.708487  104.366131      0      0   92.576
1       2013-05-29 00:40:44  51.708268  104.362324      0      2  108.308
2       2013-05-29 00:41:44  51.708036  104.358530      1      3  131.331
296448  2022-06-14 03:39:40  51.876903  105.520172      3      4  141.075
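To check the merge approach on the question's small sample, here is a minimal self-contained sketch. The four-row df_main is a trimmed stand-in built from the rows shown in the question (only the key columns are kept), and the name lookup is introduced here purely for illustration.

```python
import pandas as pd

# Sample matrix copied from the question (the real one is much larger).
df_matrix = pd.DataFrame(
    [
        [92.576, 94.269, 108.308, 140.394, 155.421],
        [117.490, 104.356, 104.952, 131.331, 144.203],
        [115.405, 112.536, 112.069, 116.328, 136.226],
        [164.047, 148.946, 133.204, 122.235, 141.075],
    ]
)

# Assumption: a reduced df_main holding only the two key columns.
df_main = pd.DataFrame({"row_i": [0, 0, 1, 3], "col_i": [0, 2, 3, 4]})

# stack() turns the matrix into a Series keyed by (row label, column label).
lookup = df_matrix.stack().rename("depth")

# Left join on the two key columns; a pair missing from the matrix would get NaN.
merged = df_main.merge(
    lookup, left_on=["row_i", "col_i"], right_index=True, how="left"
)
print(merged["depth"].tolist())  # [92.576, 108.308, 131.331, 141.075]
```

If df_main already has a depth column (for example from an earlier iterrows run), it is probably safest to drop it before merging, since overlapping column names get suffixed.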
Answer 2
Score: 0
This should work as well:
# Positional lookup: assumes row_i and col_i are 0-based positions in df_matrix.
df_main['depth'] = df_matrix.to_numpy()[df_main['row_i'], df_main['col_i']]
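Note that the NumPy lookup is positional: it gives the right values only when row_i and col_i are 0-based positions in df_matrix, which holds for a default integer index like the one in the question. If the labels are not positions, one option is to translate them first with Index.get_indexer. A hedged sketch, where the labels 10/20 and 5/6/7 are hypothetical and chosen only to show the difference:

```python
import pandas as pd

df_matrix = pd.DataFrame(
    [[92.576, 94.269, 108.308], [117.490, 104.356, 104.952]],
    index=[10, 20],      # hypothetical non-positional row labels
    columns=[5, 6, 7],   # hypothetical non-positional column labels
)
df_main = pd.DataFrame({"row_i": [10, 20, 20], "col_i": [5, 7, 6]})

# Translate labels to integer positions; -1 marks a label that is absent.
rows = df_matrix.index.get_indexer(df_main["row_i"])
cols = df_matrix.columns.get_indexer(df_main["col_i"])
if (rows < 0).any() or (cols < 0).any():
    raise KeyError("row_i/col_i contains a label not present in df_matrix")

# Pairwise fancy indexing: element k is matrix[rows[k], cols[k]].
df_main["depth"] = df_matrix.to_numpy()[rows, cols]
print(df_main["depth"].tolist())  # [92.576, 104.952, 104.356]
```

Both variants do a single vectorized gather instead of a Python-level loop, which is why they tend to scale much better than iterrows on hundreds of thousands of rows.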