February 10, 2023, 05:00:37


Mapping a pandas dataframe to an n-dimensional array, where each dimension corresponds to one of the x columns

Question


I have a dataframe with columns x0 x1 x2 x3 x4 and y0 y1 y2 y3 y4.

First ten rows:

	Id	x0	x1	x2	x3	x4	y0	y1	y2	y3	y4
0	0	-5.0	-5.0	-5.0	-5.0	-5.0	268035854.2037072	0.94956508069182	3520.7568220782514	-412868933.038522	242572043.87727848
1	1	-5.0	-5.0	-5.0	-5.0	-4.5	268035883.40390667	0.94956508069182	3482.0382462663074	-412868933.038522	242572043.87727848
2	2	-5.0	-5.0	-5.0	-5.0	-4.0	268035901.1170006	0.94956508069182	3443.3196704543634	-412868933.038522	242572043.87727848
3	3	-5.0	-5.0	-5.0	-5.0	-3.5	268035911.8642905	0.94956508069182	3404.6010946424194	-412868933.038522	242572043.87727848
4	4	-5.0	-5.0	-5.0	-5.0	-3.0	268035918.38904288	0.94956508069182	3365.882518830476	-412868933.038522	242572043.87727848
5	5	-5.0	-5.0	-5.0	-5.0	-2.5	268035922.35671327	0.94956508069182	3327.163943018532	-412868933.038522	242572043.87727848
6	6	-5.0	-5.0	-5.0	-5.0	-2.0	268035924.7800574	0.94956508069182	3288.445367206588	-412868933.038522	242572043.87727848
7	7	-5.0	-5.0	-5.0	-5.0	-1.5	268035926.27763835	0.94956508069182	3249.726791394644	-412868933.038522	242572043.87727848
8	8	-5.0	-5.0	-5.0	-5.0	-1.0	268035927.2317166	0.94956508069182	3211.0082155827004	-412868933.038522	242572043.87727848
9	9	-5.0	-5.0	-5.0	-5.0	-0.5	268035927.8858225	0.94956508069182	3172.2896397707564	-412868933.038522	242572043.87727848

I did this:

values = df_train[['y0', 'y1', 'y2', 'y3', 'y4']].values
values.shape

This gives shape (4084101, 5).

I would like shape (21, 21, 21, 21, 21, 5), so that the first axis corresponds to x0, the second to x1, and so on, as if we had a 5D graph. Basically, values[1, 0, 0, 0, 0] should return the tuple (y0, y1, y2, y3, y4) corresponding to x0=-4.5, x1=-5, ..., x4=-5.

The 21 comes from x0, ..., x4 each going from -5 to 5 with step 0.5, and the 5 from y0, y1, y2, y3, y4.

I tried values = values.reshape(21, 21, 21, 21, 21, 5), but values[1][0][0][0][0] does not give me the values corresponding to x0=-4.5, x1=-5, ..., x4=-5 that I expected.
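As a quick sanity check on the numbers above, the sketch below counts the grid points and shows how a default (C-order) reshape pairs flat row numbers with multi-indices. The grid itself is an assumption taken from the description (-5 to 5 in steps of 0.5); the variable names are illustrative only.

```python
import numpy as np

# Grid assumed from the question: -5 to 5 inclusive in steps of 0.5.
axis = np.linspace(-5.0, 5.0, 21)
n = axis.size          # number of points per axis
n_rows = n ** 5        # number of rows a full 5D grid needs

# With the default C order, reshape sends flat row r to the multi-index
# given by np.unravel_index, so the last axis (x4) varies fastest.
idx_row_1 = np.unravel_index(1, (n,) * 5)

# Conversely, values[1, 0, 0, 0, 0] reads this flat row of the 2D array.
flat_of_target = np.ravel_multi_index((1, 0, 0, 0, 0), (n,) * 5)

print(n, n_rows, idx_row_1, flat_of_target)
```

So a plain reshape only lands on the intended cell if the rows are ordered with x0 varying slowest and x4 varying fastest.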

One bad idea that I had (complexity-wise) was to build a dictionary whose keys are the tuples (x0, x1, x2, x3, x4) and whose values are the row indices where the y values can be found, and then fill an np.zeros((21, 21, 21, 21, 21, 5)) array.

import numpy as np

# Get the values
values = df_train[['y0', 'y1', 'y2', 'y3', 'y4']].values
# Create a dictionary to map the x0, x1, x2, x3, x4 grid positions to row indices
# (relies on df_train having the default 0..N-1 index, so labels equal positions)
grid = {}
for i, row in df_train.iterrows():
    key = tuple(int((row[c] + 5) / 0.5) for c in ['x0', 'x1', 'x2', 'x3', 'x4'])
    grid[key] = i
# Create the reshaped array
reshaped_values = np.zeros((21, 21, 21, 21, 21, 5))
for key, index in grid.items():
    reshaped_values[key] = values[index]

But it takes almost a minute on my computer, and it looks like the worst idea ever.
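The same index-mapping idea can probably be done without Python-level loops by computing all grid positions at once and assigning with NumPy fancy indexing. This is only a sketch: `to_grid` and its parameters `start`, `step` and `n` are hypothetical names, and filling with NaN instead of zeros is a choice made here so that missing grid points stay visible. The demo uses a small made-up 3 x 3 grid rather than the real data.

```python
import numpy as np
import pandas as pd

def to_grid(df, x_cols, y_cols, start=-5.0, step=0.5, n=21):
    """Scatter the y columns into an array of shape (n, ..., n, len(y_cols))."""
    # Convert each x value to its integer grid position; rounding guards
    # against small floating-point errors in the stored coordinates.
    idx = np.rint((df[x_cols].to_numpy() - start) / step).astype(int)
    out = np.full((n,) * len(x_cols) + (len(y_cols),), np.nan)
    # One fancy-indexing assignment replaces the dictionary and both loops.
    out[tuple(idx.T)] = df[y_cols].to_numpy()
    return out

# Tiny demo: a shuffled 3 x 3 grid with a single y column (made-up data).
xs = np.array([-5.0, -4.5, -4.0])
x0, x1 = np.meshgrid(xs, xs, indexing="ij")
demo = pd.DataFrame({"x0": x0.ravel(), "x1": x1.ravel()})
demo["y0"] = 10 * demo["x0"] + demo["x1"]
demo = demo.sample(frac=1.0, random_state=0)  # row order does not matter here

grid = to_grid(demo, ["x0", "x1"], ["y0"], n=3)
print(grid[1, 0])  # y values stored for x0=-4.5, x1=-5.0
```

For the dataframe in the question the call would presumably be to_grid(df_train, ['x0', 'x1', 'x2', 'x3', 'x4'], ['y0', 'y1', 'y2', 'y3', 'y4']). Because each row is placed by its own coordinates, this does not depend on how the rows are ordered.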

Answer 1

Score: 2


Your code works, but I think your dataframe is not sorted. Sort it by the x columns before extracting the values:

df_train = df_train.sort_values(['x0', 'x1', 'x2', 'x3', 'x4'])
values = df_train[['y0', 'y1', 'y2', 'y3', 'y4']].values
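To illustrate the sort-then-reshape approach end to end, here is a minimal sketch on a made-up, shuffled 3 x 3 x 3 grid standing in for df_train. The toy dataframe and the y0 formula are assumptions for the demo; the completeness check is an extra precaution, since the reshape is only meaningful when every grid point appears exactly once.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_train: a shuffled 3 x 3 x 3 grid (made-up data).
xs = np.array([-5.0, -4.5, -4.0])
x0, x1, x2 = np.meshgrid(xs, xs, xs, indexing="ij")
df = pd.DataFrame({"x0": x0.ravel(), "x1": x1.ravel(), "x2": x2.ravel()})
df["y0"] = 100 * df["x0"] + 10 * df["x1"] + df["x2"]
df = df.sample(frac=1.0, random_state=0)  # simulate an unsorted dataframe

x_cols = ["x0", "x1", "x2"]
# The reshape is only valid for a full grid with no duplicate points.
assert len(df) == 3 ** 3 and not df.duplicated(x_cols).any()

# Sorting by x0, then x1, then x2 makes x2 vary fastest, which matches
# the C-order layout that reshape assumes.
values = df.sort_values(x_cols)[["y0"]].to_numpy().reshape(3, 3, 3, 1)
print(values[1, 0, 0])  # y values for x0=-4.5, x1=-5.0, x2=-5.0
```

On the real data the same pattern would use the five x columns, the five y columns, and reshape(21, 21, 21, 21, 21, 5).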
