Interpolating values in xarray using non-indexed coordinates

Question

I'm trying to extract time series at single geographical points from the Google ERA5 Reanalysis data. The dataset looks like this:

import xarray
data = xarray.open_zarr(
    'gs://gcp-public-data-arco-era5/co/single-level-reanalysis.zarr/',
    chunks={'time': 48},
    consolidated=True,
)
print("Model wind dataset size {:.1f} TiB".format(data.nbytes/(1024**4)))
print(data)

Model wind dataset size 28.0 TiB
<xarray.Dataset>
Dimensions:              (time: 374016, values: 542080)
Coordinates:
    depthBelowLandLayer  float64 ...
    entireAtmosphere     float64 ...
    latitude             (values) float64 dask.array<chunksize=(542080,), meta=np.ndarray>
    longitude            (values) float64 dask.array<chunksize=(542080,), meta=np.ndarray>
    number               int64 ...
    step                 timedelta64[ns] ...
    surface              float64 ...
  * time                 (time) datetime64[ns] 1979-01-01 ... 2021-08-31T23:0...
    valid_time           (time) datetime64[ns] dask.array<chunksize=(48,), meta=np.ndarray>
Dimensions without coordinates: values
Data variables: (12/38)
    cape                 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    d2m                  (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    hcc                  (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    istl1                (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    istl2                (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    istl3                (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    ...                   ...
    tsn                  (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    u10                  (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    u100                 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    v10                  (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    v100                 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    z                    (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
Attributes:
    Conventions:               CF-1.7
    GRIB_centre:               ecmf
    GRIB_centreDescription:    European Centre for Medium-Range Weather Forec...
    GRIB_edition:              1
    GRIB_subCentre:            0
    history:                   2022-09-23T18:56 GRIB to CDM+CF via cfgrib-0.9...
    institution:               European Centre for Medium-Range Weather Forec...
    pangeo-forge:inputs_hash:  5f4378143e9f42402424280b63472752da3aa79179b53b...
    pangeo-forge:recipe_hash:  0c3415923e347ce9dac9dc5c6d209525f4d45d799bd25b...
    pangeo-forge:version:      0.9.1

What is the best way to interpolate a time series at a single geographical point?

Methods like .sel and .interp don't work:

data['cape'].interp(dict(latitude=60, longitude=20))

ValueError: Dimensions {'longitude', 'latitude'} do not exist. Expected one or more of Frozen({'values': 542080, 'time': 374016})

Answer 1

Score: 1

sel and interp won't work because the data are not located on a structured, regular grid: latitude and longitude are non-indexed coordinates that both run along the single values dimension. A scatter plot of the lon/lat coordinates makes this visible:

[Figure: scatter plot of the longitude/latitude coordinates]

You have to deal with these unstructured coordinates. One way is to take the value at the nearest neighbor. For instance, you can do it with code like this:

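import numpy as np
import xarray as xr
from sklearn.neighbors import NearestNeighbors


class NearestInterpolator:
    """Nearest-neighbor lookup on unstructured longitude/latitude coordinates."""

    def __init__(self, ds, x='longitude', y='latitude'):
        # Load the coordinate arrays and stack them as (n_points, 2) in (x, y) order.
        coords = np.c_[ds[x].values, ds[y].values]
        self.nn = NearestNeighbors().fit(coords)

    def interpolate(self, ds, coords, values='values'):
        # coords: one or more (x, y) pairs, i.e. (longitude, latitude).
        index = self.nn.kneighbors(
            X=np.atleast_2d(coords), n_neighbors=1, return_distance=False
        ).ravel()
        # Pick the nearest grid point for each query along the `values` dimension.
        return ds.isel({values: index})


ds = xr.open_zarr(
    "gs://gcp-public-data-arco-era5/co/single-level-reanalysis.zarr",
    chunks={'time': 48},
    consolidated=True,
)

ni = NearestInterpolator(ds=ds)
# Query points are given as [longitude, latitude].
ds_interpolated = ni.interpolate(ds, [[2, 47], [5, 50]])
ds_interpolated['d2m']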

>>> <xarray.DataArray 'd2m' (time: 374016, values: 2)>
>>> dask.array<getitem, shape=(374016, 2), dtype=float32, chunksize=(48, 2), chunktype=numpy.ndarray>
>>> Coordinates:
>>>     depthBelowLandLayer  float64 ...
>>>     entireAtmosphere     float64 ...
>>>     latitude             (values) float64 dask.array<chunksize=(2,), meta=np.ndarray>
>>>     longitude            (values) float64 dask.array<chunksize=(2,), meta=np.ndarray>
>>>     number               int64 ...
>>>     step                 timedelta64[ns] ...
>>>     surface              float64 ...
>>>   * time                 (time) datetime64[ns] 1979-01-01 ... 2021-08-31T23:0...
>>>     valid_time           (time) datetime64[ns] dask.array<chunksize=(48,), meta=np.ndarray>
>>> Dimensions without coordinates: values
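The search above measures distance directly in longitude/latitude space, so a grid point at 359.5° and a query at 0.1° look far apart. Here is a minimal sketch of one possible workaround, assuming longitudes and latitudes are in degrees: convert every point to a 3-D unit vector and pick the grid point with the largest dot product, which corresponds to the smallest great-circle distance. The function names and the toy grid are hypothetical and not part of the original answer. The brute-force dot product is only meant for a handful of query points.

```python
import numpy as np


def lonlat_to_unit_xyz(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to 3-D points on the unit sphere."""
    lon = np.deg2rad(np.asarray(lon_deg, dtype=float))
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
        axis=-1,
    )


def nearest_on_sphere(grid_lon, grid_lat, query_lon, query_lat):
    """Return, for each query point, the index of the closest grid point.

    The largest dot product between unit vectors corresponds to the
    smallest great-circle distance, so the 0/360 degree seam needs no
    special handling.
    """
    grid_xyz = lonlat_to_unit_xyz(grid_lon, grid_lat)  # (n_grid, 3)
    query_xyz = lonlat_to_unit_xyz(
        np.atleast_1d(query_lon), np.atleast_1d(query_lat)
    )  # (n_query, 3)
    return np.argmax(query_xyz @ grid_xyz.T, axis=1)


# Hypothetical toy grid: four points on the equator.
grid_lon = np.array([0.5, 90.0, 180.0, 359.5])
grid_lat = np.zeros(4)

# Queries close to the seam, including a negative longitude.
index = nearest_on_sphere(grid_lon, grid_lat, [359.9, -0.2, 0.1], [0.0, 0.0, 0.0])
print(index)
```

With the real dataset, the resulting indices could be passed to ds.isel(values=index) in the same way as in the class above.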

This code is not perfect, since it doesn't account for the fact that points near 0° longitude and points near 360° longitude are spatially close, but it works. If you want to go further and use linear interpolation, you would have to compute a Delaunay triangulation, which can be expensive for these 542,080 coordinates.
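As a rough illustration of the linear-interpolation idea, here is a minimal sketch using scipy.spatial.Delaunay on a small synthetic point cloud. It is an assumption-laden example, not the answer's own method: the helper name barycentric_weights and the test field are hypothetical, and the points are treated as planar, so it ignores both the longitude seam and the curvature of the sphere. The triangulation only has to be built once. The vertex indices and weights can then be reused for every time step.

```python
import numpy as np
from scipy.spatial import Delaunay


def barycentric_weights(tri, points):
    """Return (vertex_indices, weights) for linear interpolation at `points`.

    Both arrays have shape (n_points, 3); each row of weights sums to 1.
    Points outside the triangulation get NaN weights.
    """
    points = np.atleast_2d(points).astype(float)
    simplex = tri.find_simplex(points)        # -1 when outside the convex hull
    transform = tri.transform[simplex]        # (n, 3, 2) affine maps
    offset = points - transform[:, 2]         # (n, 2)
    # First two barycentric coordinates; the third makes them sum to 1.
    bary = np.einsum("nij,nj->ni", transform[:, :2], offset)
    weights = np.c_[bary, 1.0 - bary.sum(axis=1)]
    weights[simplex == -1] = np.nan
    return tri.simplices[simplex], weights


# Synthetic scattered points (hypothetical stand-in for longitude/latitude).
rng = np.random.default_rng(0)
corners = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
xy = np.vstack([corners, rng.uniform(0.0, 1.0, size=(200, 2))])

# A field that is linear in x and y, so linear interpolation is exact.
field = 2.0 * xy[:, 0] + 3.0 * xy[:, 1]

tri = Delaunay(xy)
vertices, weights = barycentric_weights(tri, [[0.5, 0.5]])
interpolated = (field[vertices] * weights).sum(axis=1)
print(interpolated)
```

For the dataset in the question, the same three indices and weights per query point would be applied along the values dimension, for example by selecting the three vertices with isel and taking the weighted sum.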

huangapple
  • Posted on May 13, 2023 at 12:10:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76241028.html