Interpolating values in xarray using non-indexed coordinates

Question

I'm trying to fetch time series at single geographical points from Google's public ERA5 reanalysis data. The dataset is the following:

```python
import xarray

data = xarray.open_zarr(
    'gs://gcp-public-data-arco-era5/co/single-level-reanalysis.zarr/',
    chunks={'time': 48},
    consolidated=True,
)
print("Model wind dataset size {:.1f} TiB".format(data.nbytes/(1024**4)))
print(data)
```

which prints:

```
Model wind dataset size 28.0 TiB
<xarray.Dataset>
Dimensions:              (time: 374016, values: 542080)
Coordinates:
    depthBelowLandLayer  float64 ...
    entireAtmosphere     float64 ...
    latitude             (values) float64 dask.array<chunksize=(542080,), meta=np.ndarray>
    longitude            (values) float64 dask.array<chunksize=(542080,), meta=np.ndarray>
    number               int64 ...
    step                 timedelta64[ns] ...
    surface              float64 ...
  * time                 (time) datetime64[ns] 1979-01-01 ... 2021-08-31T23:0...
    valid_time           (time) datetime64[ns] dask.array<chunksize=(48,), meta=np.ndarray>
Dimensions without coordinates: values
Data variables: (12/38)
    cape                 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    d2m                  (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    hcc                  (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    istl1                (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    istl2                (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    istl3                (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    ...                   ...
    tsn                  (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    u10                  (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    u100                 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    v10                  (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    v100                 (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
    z                    (time, values) float32 dask.array<chunksize=(48, 542080), meta=np.ndarray>
Attributes:
    Conventions:               CF-1.7
    GRIB_centre:               ecmf
    GRIB_centreDescription:    European Centre for Medium-Range Weather Forec...
    GRIB_edition:              1
    GRIB_subCentre:            0
    history:                   2022-09-23T18:56 GRIB to CDM+CF via cfgrib-0.9...
    institution:               European Centre for Medium-Range Weather Forec...
    pangeo-forge:inputs_hash:  5f4378143e9f42402424280b63472752da3aa79179b53b...
    pangeo-forge:recipe_hash:  0c3415923e347ce9dac9dc5c6d209525f4d45d799bd25b...
    pangeo-forge:version:      0.9.1
```
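As a rough check of the printed size, the arithmetic below reproduces the 28.0 TiB figure. It assumes every one of the 38 data variables is float32 like the ones shown, because the repr elides 26 of them. It also shows what a single-point query implies with `chunks={'time': 48}`. Each dask chunk spans all 542 080 points, so one point's full time series touches every time chunk of a variable. The number of bytes actually read also depends on the on-disk Zarr chunking, which the repr does not show.

```python
# Sizes read off the printed repr; float32 for every data variable is an assumption.
N_TIME = 374_016
N_VALUES = 542_080
N_VARS = 38          # "Data variables: (12/38)"
BYTES_PER_VALUE = 4  # float32
CHUNK_TIME = 48      # chunks={'time': 48}

TIB = 1024 ** 4
MIB = 1024 ** 2

# Bytes of one (time, values) variable, and of all data variables (coordinates are negligible).
per_var_tib = N_TIME * N_VALUES * BYTES_PER_VALUE / TIB
total_tib = per_var_tib * N_VARS

# One dask chunk covers 48 time steps and all grid points.
chunk_mib = CHUNK_TIME * N_VALUES * BYTES_PER_VALUE / MIB
n_time_chunks = -(-N_TIME // CHUNK_TIME)  # ceiling division

print(f"one variable: {per_var_tib:.2f} TiB, {N_VARS} variables: {total_tib:.1f} TiB")
print(f"one chunk: {chunk_mib:.1f} MiB, time chunks per variable: {n_time_chunks}")
```

That works out to about 99 MiB per chunk and 7792 chunks per variable. If each chunk is read in full, a single point's series therefore costs about as much I/O as the whole ~0.74 TiB variable.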

What is the best way to interpolate a time series at a single geographical point?

Methods like `.sel` and `.interp` don't work:

```python
data['cape'].interp(dict(latitude=60, longitude=20))
```

```
ValueError: Dimensions {'longitude', 'latitude'} do not exist. Expected one or more of Frozen({'values': 542080, 'time': 374016})
```

Answer 1

Score: 1

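`sel` and `interp` won't work because the data are not located on a structured, regular grid. `latitude` and `longitude` are 1-D coordinates along the single `values` dimension, not dimensions of their own, which is what the `ValueError` is saying. If you scatter-plot the lon/lat coordinates, you get:

[Figure: scatter plot of the dataset's longitude/latitude coordinates]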

You have to deal with these unstructured coordinates. One way is to take the nearest-neighbor value, for instance with this kind of code:

```python
import numpy as np
import xarray as xr
from sklearn.neighbors import NearestNeighbors


class NearestInterpolator:
    """Nearest-neighbor lookup on the unstructured (longitude, latitude) points."""

    def __init__(self, ds, x='longitude', y='latitude'):
        # Stack the 1-D coordinate arrays into an (n_points, 2) array of (lon, lat)
        coords = np.c_[ds[x].values, ds[y].values]
        # Plain Euclidean distance in degree space
        self.nn = NearestNeighbors().fit(coords)

    def interpolate(self, ds, coords, values='values'):
        # coords: one (lon, lat) pair or a list of pairs, in the dataset's longitude convention
        index = self.nn.kneighbors(X=np.atleast_2d(coords), n_neighbors=1,
                                   return_distance=False).ravel()
        # Select the nearest grid points along the `values` dimension
        return ds.isel({values: index})


ds = xr.open_zarr("gs://gcp-public-data-arco-era5/co/single-level-reanalysis.zarr",
                  chunks={'time': 48},
                  consolidated=True)
ni = NearestInterpolator(ds=ds)
# Two query points: (lon=2, lat=47) and (lon=5, lat=50)
ds_interpolated = ni.interpolate(ds, [[2, 47], [5, 50]])
ds_interpolated['d2m']
```

```
<xarray.DataArray 'd2m' (time: 374016, values: 2)>
dask.array<getitem, shape=(374016, 2), dtype=float32, chunksize=(48, 2), chunktype=numpy.ndarray>
Coordinates:
    depthBelowLandLayer  float64 ...
    entireAtmosphere     float64 ...
    latitude             (values) float64 dask.array<chunksize=(2,), meta=np.ndarray>
    longitude            (values) float64 dask.array<chunksize=(2,), meta=np.ndarray>
    number               int64 ...
    step                 timedelta64[ns] ...
    surface              float64 ...
  * time                 (time) datetime64[ns] 1979-01-01 ... 2021-08-31T23:0...
    valid_time           (time) datetime64[ns] dask.array<chunksize=(48,), meta=np.ndarray>
Dimensions without coordinates: values
```

This is not perfect code, since it doesn't understand that data near longitude 0 and data near longitude 360 are spatially close, but it works. If you want to go further with linear interpolation, you would have to compute a Delaunay triangulation, which can be expensive for these 542 080 coordinates.
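Below is a minimal sketch that avoids the wrap-around problem. It assumes NumPy only, with no scikit-learn. It computes brute-force great-circle (haversine) distances from the target to every grid point. For one query this is a few vectorized operations over 542 080 elements. Because the haversine formula works on the sine of the longitude difference, longitudes 359.9 and 0.05 come out as neighbors. Unlike Euclidean distance in degrees, it also accounts for longitude degrees shrinking toward the poles.

For something smoother than the nearest point, the sketch also shows inverse-distance weighting over the k nearest points. This is a swapped-in technique, not the Delaunay-based linear interpolation mentioned above. The names `haversine_km`, `nearest_index` and `idw_indices_weights`, and the defaults `k=4` and `power=2.0`, are hypothetical illustrations, not part of any library.

```python
import numpy as np

# Mean Earth radius in km (spherical-Earth assumption, adequate for picking nearby points).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (NumPy-broadcastable)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against tiny floating-point overshoots outside [0, 1] before arcsin.
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_index(grid_lat, grid_lon, lat, lon):
    """Index of the grid point closest to (lat, lon); longitudes may be 0..360 or -180..180."""
    return int(np.argmin(haversine_km(grid_lat, grid_lon, lat, lon)))


def idw_indices_weights(grid_lat, grid_lon, lat, lon, k=4, power=2.0):
    """Indices of the k nearest grid points and inverse-distance weights that sum to 1."""
    d = haversine_km(grid_lat, grid_lon, lat, lon)
    idx = np.argpartition(d, k - 1)[:k]  # the k smallest distances, in no particular order
    dk = d[idx]
    if np.any(dk == 0.0):
        # Target coincides with a grid point: give that point all the weight.
        w = (dk == 0.0).astype(float)
    else:
        w = 1.0 / dk ** power
    return idx, w / w.sum()


# Hypothetical usage with the `ds` opened above (not executed here):
# lat, lon = ds["latitude"].values, ds["longitude"].values
# series = ds["cape"].isel(values=nearest_index(lat, lon, 60.0, 20.0))
# idx, w = idw_indices_weights(lat, lon, 60.0, 20.0)
# series_idw = (ds["cape"].isel(values=idx) * xr.DataArray(w, dims="values")).sum("values")
```

For many query points, a spatial tree avoids repeating the full scan for every point. One option is scikit-learn's `BallTree` with `metric='haversine'`, which expects `[lat, lon]` pairs in radians.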

huangapple
  • Published on May 13, 2023 at 12:10:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76241028.html