Sample 2D grid in Xarray
Question
I have a 1D array of samples, each having a corresponding x and y coordinate. I want to transform this into a 2D grid where each grid cell contains the average of all samples falling in that grid cell. Of course I could program this by hand, but I've got the impression that this is possible with multidimensional grouping.
As example data, I generate a Lissajous curve. I put this data in a DataArray and build a MultiIndex from the x and y coordinates.
my_data = <xarray.DataArray 'my_data' (time: 1200)>
array([0.000e+00, 1.000e+00, 2.000e+00, ..., 1.197e+03, 1.198e+03,
       1.199e+03])
Coordinates:
    h        (time) float64 0.0 0.5 1.0 1.5 2.0 ... 598.0 598.5 599.0 599.5
  * time     (time) object MultiIndex
  * x        (time) float64 0.0 0.3596 0.6711 0.8929 ... 0.5044 0.7812 0.9535
  * y        (time) float64 1.0 0.9498 0.8041 0.5777 ... -0.6339 -0.36 -0.04993
The full example code is as follows:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

DIM_TIME = 'time'
t = np.arange(1200.0)
da = xr.DataArray(
    name='my_data',
    data=t, dims=[DIM_TIME],
    coords={
        'x': (DIM_TIME, np.sin(t / np.e)),
        'y': (DIM_TIME, np.cos(t / np.pi)),
        'h': (DIM_TIME, t / 2)})
da = da.set_xindex(['x', 'y'])  # Add multi index
print(f"\n{da.name} = {da}")

bins = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
binned_x = da.groupby_bins("x", bins).mean().rename("bin_x_avg")
print(f"\n{binned_x.name} = {binned_x}")

da.to_dataset().plot.scatter(x='x', y='y', hue='h')
plt.show()

# Raises IndexError: too many indices
binned_xy = da.groupby_bins(("y", "x"), (bins, bins)).mean()  # Something like this.
Grouping along one dimension works fine (binned_x); it gives a 1D array with 5 elements.
bin_x_avg = <xarray.DataArray 'bin_x_avg' (x_bins: 5)>
array([603.20738636, 598.84431138, 600.03870968, 596.18823529,
       597.48876404])
Coordinates:
  * x_bins   (x_bins) object (-1.0, -0.6] (-0.6, -0.2] ... (0.2, 0.6] (0.6, 1.0]
I would like to do something similar that bins in two dimensions. It should return a 5 by 5 DataArray, something like the last statement in my code (binned_xy).
Is this possible in xarray?
Answer 1
Score: 1
You could use flox:
import flox.xarray

result_raw = flox.xarray.xarray_reduce(
    da,
    da.x,
    da.y,
    func="mean",
    expected_groups=(bins, bins),
    isbin=[True, True],
    method="map-reduce",
)
print(result_raw)
<xarray.DataArray 'my_data' (x_bins: 5, y_bins: 5)>
array([[602.05454545, 610.55769231, 597.79545455, 613.41666667,
        598.03061224],
       [612.52941176, 600.84210526, 562.61538462, 640.25      ,
        586.64705882],
       [582.6744186 , 614.19230769, 630.9375    , 591.19230769,
        602.63636364],
       [601.26923077, 569.52173913, 640.75      , 507.52631579,
        614.73076923],
       [604.28      , 615.91666667, 584.89130435, 593.90740741,
        590.16666667]])
Coordinates:
  * x_bins   (x_bins) object (-1.0, -0.6] (-0.6, -0.2] ... (0.2, 0.6] (0.6, 1.0]
  * y_bins   (y_bins) object (-1.0, -0.6] (-0.6, -0.2] ... (0.2, 0.6] (0.6, 1.0]
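As a sanity check on a grid like the one above, the same per-cell mean can be sketched with plain NumPy, without flox or xarray. This is only an illustrative cross-check, not part of the flox API: it divides a weighted 2D histogram (sum of values per cell) by an unweighted one (sample count per cell). The names `sums`, `counts` and `grid_mean` are made up for this sketch. One assumption to keep in mind: `np.histogram2d` uses left-closed bins (only the last bin includes its right edge), whereas the labels above are right-closed, so samples lying exactly on an inner bin edge could land in a different cell.

```python
import numpy as np

# Same sample data as in the question: values t at Lissajous positions (x, y).
t = np.arange(1200.0)
x = np.sin(t / np.e)
y = np.cos(t / np.pi)
bins = np.array([-1.0, -0.6, -0.2, 0.2, 0.6, 1.0])

# Per-cell sum of the sample values (weighted histogram) and per-cell sample count.
sums, _, _ = np.histogram2d(x, y, bins=(bins, bins), weights=t)
counts, _, _ = np.histogram2d(x, y, bins=(bins, bins))

# Mean per cell; cells without any samples become NaN instead of dividing by zero.
grid_mean = np.divide(sums, counts, out=np.full_like(sums, np.nan), where=counts > 0)

# First axis corresponds to the x bins, second axis to the y bins.
print(grid_mean.shape)
```

If the resulting array can be compared against `result_raw.values`, any differences would be expected only for samples sitting exactly on a bin edge.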
If you want numeric coordinates:
# Bin centres are taken from result_raw, since result is only defined below.
x_bin_center = [b.mid for b in result_raw.x_bins.values]
y_bin_center = [b.mid for b in result_raw.y_bins.values]
result = result_raw.assign_coords(
    x_bin_center=("x_bins", x_bin_center), y_bin_center=("y_bins", y_bin_center)
).swap_dims(x_bins="x_bin_center", y_bins="y_bin_center")
print(result)
<xarray.DataArray 'my_data' (x_bin_center: 5, y_bin_center: 5)>
array([[602.05454545, 610.55769231, 597.79545455, 613.41666667,
        598.03061224],
       [612.52941176, 600.84210526, 562.61538462, 640.25      ,
        586.64705882],
       [582.6744186 , 614.19230769, 630.9375    , 591.19230769,
        602.63636364],
       [601.26923077, 569.52173913, 640.75      , 507.52631579,
        614.73076923],
       [604.28      , 615.91666667, 584.89130435, 593.90740741,
        590.16666667]])
Coordinates:
    x_bins        (x_bin_center) object (-1.0, -0.6] (-0.6, -0.2] ... (0.6, 1.0]
    y_bins        (y_bin_center) object (-1.0, -0.6] (-0.6, -0.2] ... (0.6, 1.0]
  * x_bin_center  (x_bin_center) float64 -0.8 -0.4 0.0 0.4 0.8
  * y_bin_center  (y_bin_center) float64 -0.8 -0.4 0.0 0.4 0.8
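The `.mid` attribute used for the bin centres belongs to pandas `Interval` objects, which is what the `x_bins` and `y_bins` coordinates hold. A minimal sketch of that step in isolation, assuming the same bin edges as in the question (the names `intervals`, `centers_loop` and `centers_vec` are only for illustration):

```python
import numpy as np
import pandas as pd

bins = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]

# Right-closed intervals, like the labels (-1.0, -0.6], (-0.6, -0.2], ... in the output.
intervals = pd.IntervalIndex.from_breaks(bins, closed="right")

# Each pd.Interval exposes .mid; IntervalIndex.mid returns all centres at once.
centers_loop = np.array([iv.mid for iv in intervals])
centers_vec = intervals.mid.to_numpy()

print(centers_vec)
```

The vectorised `IntervalIndex.mid` form may be convenient when the bin edges are known up front, while the list comprehension mirrors what the answer does with the coordinate values.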