Insert values from matrix dataframe into another dataframe using indexes
Question
I have a pandas DataFrame `df_matrix` with shape (4, 5) (the real one is much bigger). Its row and column labels are integers.
df_matrix =
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 92.576 | 94.269 | 108.308 | 140.394 | 155.421 |
| 1 | 117.490 | 104.356 | 104.952 | 131.331 | 144.203 |
| 2 | 115.405 | 112.536 | 112.069 | 116.328 | 136.226 |
| 3 | 164.047 | 148.946 | 133.204 | 122.235 | 141.075 |
`df_matrix` contains the data that I want to put into the `depth` column of another DataFrame, `df_main`, according to its `df_main['row_i']` and `df_main['col_i']` columns. So the expected result looks like this:
df_main =
| | datetime | latitude | longitude | row_i | col_i | depth |
|---|---|---|---|---|---|---|
| 0 | 2013-05-29 00:39:44 | 51.708487 | 104.366131 | 0 | 0 | 92.576 |
| 1 | 2013-05-29 00:40:44 | 51.708268 | 104.362324 | 0 | 2 | 108.308 |
| 2 | 2013-05-29 00:41:44 | 51.708036 | 104.358530 | 1 | 3 | 131.331 |
| ... | ... | ... | ... | ... | ... | ... |
| 296448 | 2022-06-14 03:39:40 | 51.876903 | 105.520172 | 3 | 4 | 141.075 |
I solved it with `iterrows`:

    for index, row in df_main.iterrows():
        df_main.loc[index, 'depth'] = df_matrix.loc[row['row_i'], row['col_i']]

This takes a long time with 200,000+ rows. I believe pandas has an appropriate method, maybe `merge`, but I have no idea which one or how to use it.
Is there a more pythonic (pandasonic) solution?
Answer 1
Score: 0
I'm not sure whether `merge` will be efficient here, but you can still try it:

    df_main = (
        pd.merge(df_main,
                 df_matrix.stack().rename("depth"),
                 left_on=["row_i", "col_i"], right_index=True, how="left")
    )

Output:

    print(df_main)
                       datetime   latitude   longitude  row_i  col_i    depth
    0       2013-05-29 00:39:44  51.708487  104.366131      0      0   92.576
    1       2013-05-29 00:40:44  51.708268  104.362324      0      2  108.308
    2       2013-05-29 00:41:44  51.708036  104.358530      1      3  131.331
    296448  2022-06-14 03:39:40  51.876903  105.520172      3      4  141.075

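The merge approach can be tried on a small, self-contained example. This is only a sketch: the matrix values are copied from the question, while the four-row `df_main` (just the two index columns) and the variable names `lookup` and `result` are made up for illustration.

```python
import pandas as pd

# Matrix values copied from the question; default integer labels 0..3 and 0..4.
df_matrix = pd.DataFrame(
    [[92.576, 94.269, 108.308, 140.394, 155.421],
     [117.490, 104.356, 104.952, 131.331, 144.203],
     [115.405, 112.536, 112.069, 116.328, 136.226],
     [164.047, 148.946, 133.204, 122.235, 141.075]]
)

# Hypothetical, cut-down df_main holding only the two index columns.
df_main = pd.DataFrame({"row_i": [0, 0, 1, 3], "col_i": [0, 2, 3, 4]})

# stack() reshapes the matrix into a Series keyed by a (row, column) MultiIndex.
lookup = df_matrix.stack().rename("depth")

# Left join: every row of df_main is kept and gets the matching matrix value.
result = df_main.merge(
    lookup, left_on=["row_i", "col_i"], right_index=True, how="left"
)
print(result)
```

Because this is a left join, a `(row_i, col_i)` pair that does not exist in `df_matrix` would give `NaN` in `depth` rather than raise an error.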
Answer 2
Score: 0
This should work as well:

    df_main['depth'] = df_matrix.to_numpy()[df_main['row_i'], df_main['col_i']]

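A minimal sketch of the NumPy lookup on toy data follows. One caveat: indexing the array directly treats `row_i` and `col_i` as positions, which matches the question only because the labels of `df_matrix` are the default 0..n-1 integers. The second half, using `Index.get_indexer`, is an optional variant for other labels and is not part of the original answer. The small `df_main` and the names `rows`, `cols`, `row_pos`, `col_pos` and `depth_by_label` are illustrative.

```python
import pandas as pd

# Matrix values copied from the question; default integer labels 0..3 and 0..4.
df_matrix = pd.DataFrame(
    [[92.576, 94.269, 108.308, 140.394, 155.421],
     [117.490, 104.356, 104.952, 131.331, 144.203],
     [115.405, 112.536, 112.069, 116.328, 136.226],
     [164.047, 148.946, 133.204, 122.235, 141.075]]
)

# Hypothetical, cut-down df_main holding only the two index columns.
df_main = pd.DataFrame({"row_i": [0, 0, 1, 3], "col_i": [0, 2, 3, 4]})

# Positional lookup: pairs up rows[k] with cols[k] in a single vectorized step.
rows = df_main["row_i"].to_numpy()
cols = df_main["col_i"].to_numpy()
df_main["depth"] = df_matrix.to_numpy()[rows, cols]

# Variant for labels that are not 0..n-1: translate labels to positions first.
# Note that get_indexer returns -1 for a missing label, which NumPy would read
# as "last element", so missing labels should be checked for separately.
row_pos = df_matrix.index.get_indexer(df_main["row_i"])
col_pos = df_matrix.columns.get_indexer(df_main["col_i"])
depth_by_label = df_matrix.to_numpy()[row_pos, col_pos]

print(df_main)
```

Unlike the merge, this raises an `IndexError` for out-of-range positions instead of producing `NaN`.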