Sample 2D grid in Xarray

Question

I have a 1D array of samples, each having a corresponding x and y coordinate. I want to transform this into a 2D grid where each grid cell contains the average of all samples falling in that grid cell. Of course I could program this by hand, but I've got the impression that this is possible with multidimensional grouping.

As example data, I create a Lissajous curve.

[Figure: scatter plot of the example data, y versus x, colored by h]

I put this data in a DataArray and make a MultiIndex with x and y coordinates.

my_data = <xarray.DataArray 'my_data' (time: 1200)>
array([0.000e+00, 1.000e+00, 2.000e+00, ..., 1.197e+03, 1.198e+03,
       1.199e+03])
Coordinates:
    h        (time) float64 0.0 0.5 1.0 1.5 2.0 ... 598.0 598.5 599.0 599.5
  * time     (time) object MultiIndex
  * x        (time) float64 0.0 0.3596 0.6711 0.8929 ... 0.5044 0.7812 0.9535
  * y        (time) float64 1.0 0.9498 0.8041 0.5777 ... -0.6339 -0.36 -0.04993

The full example code is as follows:

import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

DIM_TIME = 'time'

t = np.arange(1200.0)

da = xr.DataArray(
    name='my_data',
    data=t, dims=[DIM_TIME],
    coords={
        'x': (DIM_TIME, np.sin(t / np.e)),
        'y': (DIM_TIME, np.cos(t / np.pi)),
        'h': (DIM_TIME, t / 2)})

da = da.set_xindex(['x', 'y'])  # Add multi index
print(f"\n{da.name} = {da}")

bins = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
binned_x = da.groupby_bins("x", bins).mean().rename("bin_x_avg")
print(f"\n{binned_x.name} = {binned_x}")

da.to_dataset().plot.scatter(x='x', y='y', hue='h')
plt.show()

# Raises IndexError: too many indices
binned_xy = da.groupby_bins(("y", "x"), (bins, bins)).mean()  # Something like this.

Grouping by one dimension works fine (binned_x); it gives a 1D array with 5 elements.

bin_x_avg = <xarray.DataArray 'bin_x_avg' (x_bins: 5)>
array([603.20738636, 598.84431138, 600.03870968, 596.18823529,
       597.48876404])
Coordinates:
  * x_bins   (x_bins) object (-1.0, -0.6] (-0.6, -0.2] ... (0.2, 0.6] (0.6, 1.0]
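
The interval labels above are right-closed, e.g. (-1.0, -0.6]. xarray's groupby_bins discretizes with pandas.cut, so a sample lying exactly on the lowest edge is left out unless include_lowest=True is passed. The following is a minimal sketch of that behaviour using pandas.cut directly; the three sample values are made up for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

bins = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
samples = np.array([-1.0, -0.8, 1.0])  # illustrative values only

# Default: bins are (a, b], so -1.0 falls outside every bin (code -1).
default_codes = pd.cut(samples, bins).codes

# include_lowest=True widens the first bin so that -1.0 is included.
inclusive_codes = pd.cut(samples, bins, include_lowest=True).codes

print(default_codes)    # [-1  0  4]
print(inclusive_codes)  # [0 0 4]
```

groupby_bins accepts the same include_lowest keyword, which may matter for data that can hit the lower bound exactly.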

I would like to do something similar that bins in two dimensions and returns a 5 by 5 DataArray, something like the last statement in my code (binned_xy).

Is this somehow possible in xarray?

Answer 1

Score: 1

You could use flox:

import flox.xarray

result_raw = flox.xarray.xarray_reduce(
    da,
    da.x,
    da.y,
    func="mean",
    expected_groups=(bins, bins),
    isbin=[True, True],
    method="map-reduce",
)
print(result_raw)

<xarray.DataArray 'my_data' (x_bins: 5, y_bins: 5)>
array([[602.05454545, 610.55769231, 597.79545455, 613.41666667,
        598.03061224],
       [612.52941176, 600.84210526, 562.61538462, 640.25      ,
        586.64705882],
       [582.6744186 , 614.19230769, 630.9375    , 591.19230769,
        602.63636364],
       [601.26923077, 569.52173913, 640.75      , 507.52631579,
        614.73076923],
       [604.28      , 615.91666667, 584.89130435, 593.90740741,
        590.16666667]])
Coordinates:
  * x_bins   (x_bins) object (-1.0, -0.6] (-0.6, -0.2] ... (0.2, 0.6] (0.6, 1.0]
  * y_bins   (y_bins) object (-1.0, -0.6] (-0.6, -0.2] ... (0.2, 0.6] (0.6, 1.0]
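
As a rough cross-check that does not need flox, the per-cell mean can also be sketched with plain NumPy by dividing a weighted 2D histogram by the unweighted one. This is only an illustration, not part of the original answer. Note that np.histogram2d uses half-open bins [a, b) with a closed last bin, whereas the intervals above are (a, b], so samples lying exactly on an inner edge would land in a different cell.

```python
import numpy as np

# Same synthetic samples as in the question.
t = np.arange(1200.0)
x = np.sin(t / np.e)
y = np.cos(t / np.pi)
bins = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]

# Per-cell sum of the sample values and per-cell sample count.
sums, _, _ = np.histogram2d(x, y, bins=[bins, bins], weights=t)
counts, _, _ = np.histogram2d(x, y, bins=[bins, bins])

# Mean per cell; cells without samples become NaN instead of warning.
with np.errstate(invalid="ignore", divide="ignore"):
    cell_mean = sums / counts

print(cell_mean.shape)  # (5, 5): first axis is x, second axis is y
```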

And if you want numeric coordinates:

# Use the midpoint of each interval as the numeric coordinate
x_bin_center = [b.mid for b in result_raw.x_bins.values]
y_bin_center = [b.mid for b in result_raw.y_bins.values]

result = result_raw.assign_coords(
    x_bin_center=("x_bins", x_bin_center), y_bin_center=("y_bins", y_bin_center)
).swap_dims(x_bins="x_bin_center", y_bins="y_bin_center")
print(result)

<xarray.DataArray 'my_data' (x_bin_center: 5, y_bin_center: 5)>
array([[602.05454545, 610.55769231, 597.79545455, 613.41666667,
        598.03061224],
       [612.52941176, 600.84210526, 562.61538462, 640.25      ,
        586.64705882],
       [582.6744186 , 614.19230769, 630.9375    , 591.19230769,
        602.63636364],
       [601.26923077, 569.52173913, 640.75      , 507.52631579,
        614.73076923],
       [604.28      , 615.91666667, 584.89130435, 593.90740741,
        590.16666667]])
Coordinates:
    x_bins        (x_bin_center) object (-1.0, -0.6] (-0.6, -0.2] ... (0.6, 1.0]
    y_bins        (y_bin_center) object (-1.0, -0.6] (-0.6, -0.2] ... (0.6, 1.0]
  * x_bin_center  (x_bin_center) float64 -0.8 -0.4 0.0 0.4 0.8
  * y_bin_center  (y_bin_center) float64 -0.8 -0.4 0.0 0.4 0.8
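
If flox is not installed, a similar 5 by 5 table can probably be built with pandas alone. The sketch below is an alternative, not the method from the answer above; it assumes the same bins and sample data as the question, and the column name "value" is made up for illustration. It uses pandas.cut, so the intervals are right-closed like those of groupby_bins.

```python
import numpy as np
import pandas as pd

# Same synthetic samples as in the question, held in a DataFrame.
t = np.arange(1200.0)
df = pd.DataFrame({"x": np.sin(t / np.e), "y": np.cos(t / np.pi), "value": t})
bins = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]

# Assign every sample to an x interval and a y interval.
x_bins = pd.cut(df["x"], bins)
y_bins = pd.cut(df["y"], bins)

# Mean per (x interval, y interval); observed=False keeps empty cells as NaN.
table = df.groupby([x_bins, y_bins], observed=False)["value"].mean().unstack()

# Replace the interval labels with numeric bin centers.
table.index = [iv.mid for iv in table.index]
table.columns = [iv.mid for iv in table.columns]

print(table.shape)  # (5, 5): rows are x bins, columns are y bins
```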

huangapple
  • Published on July 23, 2023, 18:05:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76747664.html