Insert values from matrix dataframe into another dataframe using indexes

Question

I have a pandas DataFrame df_matrix with shape (4, 5) (the real one is much, much bigger). Its row and column labels are integers.

df_matrix =

         0        1        2        3        4
0   92.576   94.269  108.308  140.394  155.421
1  117.490  104.356  104.952  131.331  144.203
2  115.405  112.536  112.069  116.328  136.226
3  164.047  148.946  133.204  122.235  141.075

df_matrix contains values that I want to put into the depth column of another DataFrame, df_main, looked up by its df_main['row_i'] and df_main['col_i'] columns.
So the expected result looks like this:

df_main =

                   datetime   latitude   longitude  row_i  col_i    depth
0       2013-05-29 00:39:44  51.708487  104.366131      0      0   92.576
1       2013-05-29 00:40:44  51.708268  104.362324      0      2  108.308
2       2013-05-29 00:41:44  51.708036  104.358530      1      3  131.331
...                     ...        ...         ...    ...    ...      ...
296448  2022-06-14 03:39:40  51.876903  105.520172      3      4  141.075

I solved it with iterrows:

for index, row in df_main.iterrows():
    df_main.loc[index, 'depth'] = df_matrix.loc[row['row_i'], row['col_i']]

This takes a long time with 200000+ rows. I believe pandas has an appropriate method, maybe merge, but I have no idea which one or how to use it.
Is there a more pythonic (pandasonic) solution?

Answer 1

Score: 0

I'm not sure whether merge will be efficient here, but you can still try it:

df_main = (
    pd.merge(df_main,
             df_matrix.stack().rename("depth"),
             left_on=["row_i", "col_i"], right_index=True, how="left")
)

Output:

print(df_main)

                   datetime   latitude   longitude  row_i  col_i    depth
0       2013-05-29 00:39:44  51.708487  104.366131      0      0   92.576
1       2013-05-29 00:40:44  51.708268  104.362324      0      2  108.308
2       2013-05-29 00:41:44  51.708036  104.358530      1      3  131.331
296448  2022-06-14 03:39:40  51.876903  105.520172      3      4  141.075
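
For anyone who wants to try the merge approach in isolation, here is a minimal self-contained sketch. The miniature df_main below is a hypothetical stand-in that keeps only the row_i and col_i columns, and the names depth_lookup and merged are introduced just for illustration. It also assumes df_main does not already have a depth column, since otherwise merge would add suffixed columns instead.

```python
import pandas as pd

# Small stand-in for df_matrix from the question (default integer labels).
df_matrix = pd.DataFrame(
    [[92.576, 94.269, 108.308, 140.394, 155.421],
     [117.490, 104.356, 104.952, 131.331, 144.203],
     [115.405, 112.536, 112.069, 116.328, 136.226],
     [164.047, 148.946, 133.204, 122.235, 141.075]]
)

# Hypothetical miniature df_main: only the lookup columns matter here.
df_main = pd.DataFrame({"row_i": [0, 0, 1, 3], "col_i": [0, 2, 3, 4]})

# stack() reshapes the matrix into a Series keyed by (row label, column label).
depth_lookup = df_matrix.stack().rename("depth")

# Left join: each (row_i, col_i) pair is matched against the stacked index.
merged = pd.merge(df_main, depth_lookup,
                  left_on=["row_i", "col_i"], right_index=True, how="left")
print(merged)
```

Because this is a left join, a (row_i, col_i) pair that does not exist in df_matrix ends up with NaN in depth rather than raising an error.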

Answer 2

Score: 0

This should work as well:

df_main['depth'] = df_matrix.to_numpy()[df_main['row_i'], df_main['col_i']]
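
Note that the NumPy indexing above is positional, so it relies on the labels of df_matrix being 0, 1, 2, ... in order, as in the question. If that is not guaranteed, one possible variant (a sketch, not part of the original answer) is to translate labels into positions first with Index.get_indexer. The labels 10/20 and 5/6/7 and the names row_pos and col_pos below are made up for the example.

```python
import pandas as pd

# Hypothetical matrix whose labels are NOT simply 0..n-1.
df_matrix = pd.DataFrame(
    [[92.576, 94.269, 108.308],
     [117.490, 104.356, 104.952]],
    index=[10, 20], columns=[5, 6, 7],
)
df_main = pd.DataFrame({"row_i": [10, 20, 20], "col_i": [5, 7, 6]})

# Translate labels into integer positions.
row_pos = df_matrix.index.get_indexer(df_main["row_i"].to_numpy())
col_pos = df_matrix.columns.get_indexer(df_main["col_i"].to_numpy())

# get_indexer returns -1 for a missing label; NumPy would silently treat
# -1 as "last row/column", so fail loudly instead.
if (row_pos < 0).any() or (col_pos < 0).any():
    raise KeyError("some row_i/col_i values are not present in df_matrix")

# Vectorized lookup: one (row, col) pair per row of df_main.
df_main["depth"] = df_matrix.to_numpy()[row_pos, col_pos]
print(df_main)
```

When the labels already match the positions, the one-liner from the answer does the same lookup without the extra translation step.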

huangapple
  • Posted on May 14, 2023 at 22:35:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76248021.html