Mapping a pandas dataframe to an n-dimensional array, where each dimension corresponds to one of the x columns

Question

I have a dataframe with columns x0, x1, x2, x3, x4 and y0, y1, y2, y3, y4.

First ten rows:

```
  Id x0 x1 x2 x3 x4 y0 y1 y2 y3 y4
0 0 -5.0 -5.0 -5.0 -5.0 -5.0 268035854.2037072 0.94956508069182 3520.7568220782514 -412868933.038522 242572043.87727848
1 1 -5.0 -5.0 -5.0 -5.0 -4.5 268035883.40390667 0.94956508069182 3482.0382462663074 -412868933.038522 242572043.87727848
2 2 -5.0 -5.0 -5.0 -5.0 -4.0 268035901.1170006 0.94956508069182 3443.3196704543634 -412868933.038522 242572043.87727848
3 3 -5.0 -5.0 -5.0 -5.0 -3.5 268035911.8642905 0.94956508069182 3404.6010946424194 -412868933.038522 242572043.87727848
4 4 -5.0 -5.0 -5.0 -5.0 -3.0 268035918.38904288 0.94956508069182 3365.882518830476 -412868933.038522 242572043.87727848
5 5 -5.0 -5.0 -5.0 -5.0 -2.5 268035922.35671327 0.94956508069182 3327.163943018532 -412868933.038522 242572043.87727848
6 6 -5.0 -5.0 -5.0 -5.0 -2.0 268035924.7800574 0.94956508069182 3288.445367206588 -412868933.038522 242572043.87727848
7 7 -5.0 -5.0 -5.0 -5.0 -1.5 268035926.27763835 0.94956508069182 3249.726791394644 -412868933.038522 242572043.87727848
8 8 -5.0 -5.0 -5.0 -5.0 -1.0 268035927.2317166 0.94956508069182 3211.0082155827004 -412868933.038522 242572043.87727848
9 9 -5.0 -5.0 -5.0 -5.0 -0.5 268035927.8858225 0.94956508069182 3172.2896397707564 -412868933.038522 242572043.87727848
```

I did this:

```python
values = df_train[['y0', 'y1', 'y2', 'y3', 'y4']].values
values.shape
```

I now have shape (4084101, 5), i.e. 21^5 = 4084101 rows.

I would like to have shape (21, 21, 21, 21, 21, 5), so that the first axis is x0, the second x1, and so on, as if we had a 5D graph. Basically, values[1, 0, 0, 0, 0] should give the tuple (y0, y1, y2, y3, y4) corresponding to x0=-4.5, x1=-5, ..., x4=-5.
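The index-to-coordinate correspondence described above can be made explicit with a small helper. This is a minimal sketch assuming the grid from the question (-5 to 5 inclusive, step 0.5); the names X_MIN, STEP, to_index and to_coord are hypothetical.

```python
import numpy as np

# Assumed grid from the question: -5 to 5 inclusive, step 0.5
X_MIN, X_MAX, STEP = -5.0, 5.0, 0.5
N = int(round((X_MAX - X_MIN) / STEP)) + 1  # 21 points per axis

grid = np.linspace(X_MIN, X_MAX, N)  # -5.0, -4.5, ..., 5.0


def to_index(x):
    """Map a coordinate (scalar or array) to its integer position on the grid."""
    # np.rint rounds to the nearest integer; plain int() truncates, so a value
    # stored as -4.5000000001 would wrongly land on index 0 instead of 1
    return np.rint((np.asarray(x) - X_MIN) / STEP).astype(int)


def to_coord(i):
    """Inverse mapping: integer grid position back to the coordinate value."""
    return X_MIN + STEP * np.asarray(i)
```

With this mapping, to_index(-4.5) is 1 and to_index(-5.0) is 0, which is exactly why values[1, 0, 0, 0, 0] should correspond to x0=-4.5 and x1, ..., x4 = -5.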

21 because x0, ..., x4 each go from -5 to 5 with step 0.5, and 5 because there are five outputs y0, y1, y2, y3, y4.

I did values = values.reshape(21, 21, 21, 21, 21, 5), but when I do values[1][0][0][0][0], I expected to get the values corresponding to x0=-4.5, x1=-5, ..., x4=-5, and I don't.
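Why reshape can silently give the wrong cell is easiest to see on a toy grid. The sketch below uses a hypothetical 3 x 3 grid with one y column whose value encodes its own position (y = 10*i0 + i1): reshape reads rows in row-major (C) order, so it only lines up when rows are sorted with x0 slowest and the last x column fastest.

```python
import numpy as np

# Toy version: 2 x-axes with 3 grid points each and a single y column.
# y "encodes" its own grid position as y = 10*i0 + i1, so mix-ups are easy to spot.
i0, i1 = np.meshgrid(np.arange(3), np.arange(3), indexing="ij")
rows = np.column_stack([i0.ravel(), i1.ravel()])  # sorted: i0 slowest, i1 fastest
y = 10 * rows[:, 0] + rows[:, 1]

# reshape() reads elements in row-major (C) order: the LAST axis varies fastest.
cube = y.reshape(3, 3)
print(cube[1, 0])  # 10, i.e. i0=1, i1=0 as intended

# Same data with the rows shuffled: reshape still "works", but cells are mixed up.
rng = np.random.default_rng(0)
perm = rng.permutation(len(y))
rows_shuffled, y_shuffled = rows[perm], y[perm]
cube_bad = y_shuffled.reshape(3, 3)

# Sorting by (x0, x1) first restores the intended layout.
# np.lexsort treats its LAST key as the primary sort key.
order = np.lexsort((rows_shuffled[:, 1], rows_shuffled[:, 0]))
cube_fixed = y_shuffled[order].reshape(3, 3)
print(np.array_equal(cube_fixed, cube))  # True
```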

One bad idea that I had (complexity-wise) was to build a dictionary whose keys are the grid positions of (x0, x1, x2, x3, x4) and whose values are the row where the matching y values can be found, and then fill an np.zeros((21, 21, 21, 21, 21, 5)) array.

```python
import numpy as np

# Get the values
values = df_train[['y0', 'y1', 'y2', 'y3', 'y4']].values
# Create a dictionary mapping the grid indices of (x0, x1, x2, x3, x4) to row positions
grid = {}
# enumerate() gives the row position; the label from iterrows() only equals it
# when df_train has a default RangeIndex
for pos, (_, row) in enumerate(df_train.iterrows()):
    key = tuple(int(round((row[c] + 5) / 0.5)) for c in ['x0', 'x1', 'x2', 'x3', 'x4'])
    grid[key] = pos
# Create the reshaped array
reshaped_values = np.zeros((21, 21, 21, 21, 21, 5))
for key, pos in grid.items():
    reshaped_values[key] = values[pos]
```

But it takes almost a minute on my computer and looks like the worst idea ever.
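The dictionary idea itself is sound; the slow part is the Python-level loop over about 4 million rows. A vectorized sketch of the same idea computes every row's grid position at once and scatters the y values with NumPy fancy indexing, so row order does not matter. The function name to_grid and its default parameters are assumptions matching the grid described in the question, not an existing API.

```python
import numpy as np
import pandas as pd


def to_grid(df, x_cols, y_cols, x_min=-5.0, step=0.5, n=21):
    """Scatter each row's y values into an n x ... x n x len(y_cols) array.

    Vectorized version of the dictionary idea: no Python loop over rows.
    Row order does not matter; cells with no matching row stay NaN.
    """
    # (n_rows, n_x) integer positions; rint guards against float noise
    idx = np.rint((df[x_cols].to_numpy() - x_min) / step).astype(np.intp)
    if idx.min() < 0 or idx.max() >= n:
        raise ValueError("some x values fall outside the assumed grid")
    out = np.full((n,) * len(x_cols) + (len(y_cols),), np.nan)
    # tuple(idx.T) is one index array per x axis, so this is
    # out[i0, i1, ..., ik] = y row, for every row at once
    out[tuple(idx.T)] = df[y_cols].to_numpy()
    return out


# Small self-check on a shuffled 3 x 3 grid (x in {-1, 0, 1}, step 1).
xs = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
demo = pd.DataFrame(xs, columns=["x0", "x1"])
demo["y0"] = 10 * demo["x0"] + demo["x1"]
demo["y1"] = -demo["y0"]
demo = demo.sample(frac=1, random_state=0)  # shuffle the rows
g = to_grid(demo, ["x0", "x1"], ["y0", "y1"], x_min=-1.0, step=1.0, n=3)
print(g[2, 0])  # y0=9, y1=-9 for x0=1, x1=-1

# On the real data (column names from the question):
# arr = to_grid(df_train, ['x0', 'x1', 'x2', 'x3', 'x4'], ['y0', 'y1', 'y2', 'y3', 'y4'])
```

For the full problem the output holds 21^5 * 5 = 20420505 float64 values, roughly 163 MB, so memory rather than time becomes the main cost.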

Answer 1

Score: 2

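Your code works, but I think your dataframe is not sorted. reshape fills the array in row-major (C) order, with the last axis varying fastest, so the rows must be ordered by x0, then x1, and so on down to x4:

```python
# Sort so x0 varies slowest and x4 fastest, matching reshape's C order
df_train = df_train.sort_values(['x0', 'x1', 'x2', 'x3', 'x4'])
values = df_train[['y0', 'y1', 'y2', 'y3', 'y4']].values
values = values.reshape(21, 21, 21, 21, 21, 5)
```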
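The sort-then-reshape fix only gives the right layout if the data is a complete grid with no duplicates; otherwise reshape either fails or silently misplaces rows. A sketch that adds those checks before reshaping is below; the function name sorted_reshape is hypothetical, and it assumes each x column takes exactly n distinct values.

```python
import numpy as np
import pandas as pd


def sorted_reshape(df, x_cols, y_cols, n=21):
    """Sort so x_cols[0] varies slowest and x_cols[-1] fastest, then reshape.

    Assumes every x column takes exactly n distinct values and every
    combination appears exactly once (a complete grid).
    """
    if df.duplicated(x_cols).any():
        raise ValueError("duplicate grid points")
    if len(df) != n ** len(x_cols) or any(df[c].nunique() != n for c in x_cols):
        raise ValueError("grid is incomplete")
    # reshape() fills in C order (last axis fastest), which matches this sort
    values = df.sort_values(x_cols)[y_cols].to_numpy()
    return values.reshape((n,) * len(x_cols) + (len(y_cols),))


# Check on a shuffled 3 x 3 grid: position [1, 0] must be x0=0, x1=-1.
xs = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
demo = pd.DataFrame(xs, columns=["x0", "x1"])
demo["y0"] = 10 * demo["x0"] + demo["x1"]
demo = demo.sample(frac=1, random_state=1)  # shuffle the rows
cube = sorted_reshape(demo, ["x0", "x1"], ["y0"], n=3)
print(cube[1, 0, 0])  # x0=0, x1=-1, so y0 = -1
```

Together, the three checks guarantee the rows cover the full grid exactly once, so the sorted order is the only one reshape can interpret correctly.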

huangapple
  • Published on 2023-02-10 05:00:37
  • Source link (please keep when reposting): https://go.coder-hub.com/75404376.html