Interpolating values in xarray using non-indexed coordinates
Question
I'm trying to fetch a time series at a single geographical point from the Google ERA5 Reanalysis data. The dataset is as follows:
import xarray
data = xarray.open_zarr(
    'gs://gcp-public-data-arco-era5/co/single-level-reanalysis.zarr/',
    chunks={'time': 48},
    consolidated=True,
)
print("Model wind dataset size {:.1f} TiB".format(data.nbytes/(1024**4)))
print(data)
Model wind dataset size 28.0 TiB
<xarray.Dataset>
Dimensions: (time: 374016, values: 542080)
Coordinates:
depthBelowLandLayer float64 ...
entireAtmosphere float64 ...
latitude (values) float64 dask.array<chunksize=(542080,), meta=np.ndarray>
longitude (values) float64 dask.array<chunksize=(542080,), meta=np.ndarray>
number int64 ...
step timedelta64[ns] ...
surface float64 ...
* time (time) datetime64[ns] 1979-01-01 ... 2021-08-31T23:0...
valid_time (time) datetime64[ns] dask.array<chunksize=(48,), meta=np.ndarray>
Dimensions without coordinates: values
Data variables: (12/38)
cape (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
d2m (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
hcc (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
istl1 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
istl2 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
istl3 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
... ...
tsn (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
u10 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
u100 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
v10 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
v100 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
z (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
Attributes:
Conventions: CF-1.7
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forec...
GRIB_edition: 1
GRIB_subCentre: 0
history: 2022-09-23T18:56 GRIB to CDM+CF via cfgrib-0.9...
institution: European Centre for Medium-Range Weather Forec...
pangeo-forge:inputs_hash: 5f4378143e9f42402424280b63472752da3aa79179b53b...
pangeo-forge:recipe_hash: 0c3415923e347ce9dac9dc5c6d209525f4d45d799bd25b...
pangeo-forge:version: 0.9.1
What is the best way to interpolate a time series at a single geographical point? Methods like .sel and .interp don't work:
data['cape'].interp(dict(latitude=60, longitude=20))
ValueError: Dimensions {'longitude', 'latitude'} do not exist. Expected one or more of Frozen({'values': 542080, 'time': 374016})
Answer 1
Score: 1
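.sel and .interp won't work because the data are not located on a structured, regular grid: latitude and longitude are coordinates along the single values dimension, not dimensions of their own. A scatter plot of the lon/lat coordinates makes this visible (plot not reproduced here).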
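To make the layout concrete, here is a minimal sketch on a tiny synthetic array that mimics the structure of the ERA5 dataset. The four points and their values are made up for illustration; only the dimension and coordinate names come from the dataset printout in the question.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the ERA5 layout: a single "values" dimension, with
# latitude and longitude stored as 1-D non-index coordinates along it.
lat = np.array([60.0, 60.0, 59.7, 59.7])
lon = np.array([19.5, 20.5, 19.6, 20.4])
da = xr.DataArray(
    np.arange(8, dtype="float32").reshape(2, 4),
    dims=("time", "values"),
    coords={"latitude": ("values", lat), "longitude": ("values", lon)},
    name="cape",
)

# latitude/longitude are coordinates but not dimensions, so label-based
# selection has no index to look the labels up in.
print(da.dims)
try:
    da.sel(latitude=60.0, longitude=20.5)
except (KeyError, ValueError) as err:
    # The exception type and message depend on the xarray version
    print(type(err).__name__, err)

# Positional selection along "values" works once the point's index is known.
point = da.isel(values=1)
print(point.values)
```

The remaining problem is therefore to find which position along values is closest to the requested location.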
You have to deal with these unstructured coordinates yourself. One way is to take the value at the nearest neighbor, for instance with code like this:
import numpy as np
import xarray as xr
from sklearn.neighbors import NearestNeighbors

class NearestInterpolator:
    def __init__(self, ds, x='longitude', y='latitude'):
        # Build the search structure once from the (lon, lat) pair of every grid point
        coords = np.c_[ds[x].values, ds[y].values]
        self.nn = NearestNeighbors().fit(coords)

    def interpolate(self, ds, coords, values='values'):
        # Index of the closest grid point for each requested (lon, lat) pair
        index = self.nn.kneighbors(X=np.atleast_2d(coords), n_neighbors=1, return_distance=False).ravel()
        return ds.isel({values: index})

ds = xr.open_zarr("gs://gcp-public-data-arco-era5/co/single-level-reanalysis.zarr",
                  chunks={'time': 48},
                  consolidated=True)
ni = NearestInterpolator(ds=ds)
# Points are given as [longitude, latitude]
ds_interpolated = ni.interpolate(ds, [[2, 47], [5, 50]])
ds_interpolated['d2m']
>>> <xarray.DataArray 'd2m' (time: 374016, values: 2)>
>>> dask.array<getitem, shape=(374016, 2), dtype=float32, chunksize=(48, 2), chunktype=numpy.ndarray>
>>> Coordinates:
>>> depthBelowLandLayer float64 ...
>>> entireAtmosphere float64 ...
>>> latitude (values) float64 dask.array<chunksize=(2,), meta=np.ndarray>
>>> longitude (values) float64 dask.array<chunksize=(2,), meta=np.ndarray>
>>> number int64 ...
>>> step timedelta64[ns] ...
>>> surface float64 ...
>>> * time (time) datetime64[ns] 1979-01-01 ... 2021-08-31T23:0...
>>> valid_time (time) datetime64[ns] dask.array<chunksize=(48,), meta=np.ndarray>
>>> Dimensions without coordinates: values
This code is not perfect: it doesn't account for the fact that points near longitude 0 and points near longitude 360 are spatially close. But it works. If you want to go further and use linear interpolation, you would have to compute a Delaunay triangulation, which can be expensive for these 542,080 coordinates.
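One possible way around the longitude wrap-around, sketched here as an assumption rather than as part of the original answer, is to measure closeness on the sphere instead of in the flat lon/lat plane. Each point is converted to a unit vector, and the nearest grid point is the one with the largest dot product with the query. The helper names lonlat_to_xyz and nearest_on_sphere are hypothetical, and the grid points are synthetic.

```python
import numpy as np

def lonlat_to_xyz(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to unit vectors on the sphere."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
        axis=-1,
    )

def nearest_on_sphere(grid_lon, grid_lat, query_lonlat):
    """Return, for each (lon, lat) query, the index of the closest grid point.

    The closest point by great-circle distance is the one whose unit vector
    has the largest dot product with the query's unit vector.
    """
    grid_xyz = lonlat_to_xyz(grid_lon, grid_lat)          # (n_points, 3)
    query = np.atleast_2d(np.asarray(query_lonlat, dtype=float))
    query_xyz = lonlat_to_xyz(query[:, 0], query[:, 1])   # (n_queries, 3)
    # Brute-force comparison of every query against every grid point
    return np.argmax(query_xyz @ grid_xyz.T, axis=1)

# Synthetic grid points, longitudes in the 0..360 convention
grid_lon = np.array([359.9, 180.0, 2.0, 5.0])
grid_lat = np.array([47.0, 47.0, 47.0, 50.0])

# The first query sits just east of the 0/360 seam, so its true nearest
# neighbor is the point at longitude 359.9, not the one at longitude 2.0.
index = nearest_on_sphere(grid_lon, grid_lat, [[0.05, 47.0], [5.2, 49.9]])
print(index)  # [0 3]
```

With the real dataset, grid_lon and grid_lat would be ds['longitude'].values and ds['latitude'].values, and the result would be passed to ds.isel(values=index) as in the code above. The brute-force comparison is adequate for a handful of query points; for many queries a tree-based search over the same unit vectors would likely scale better.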