Convert xarray dimensions to latitude and longitude

Question

I am looking at satellite data in netCDF format using xarray, but I first need to convert the dimensions from scanline (the y index, corresponding to the satellite scan direction) and ground_pixel (the x index, corresponding to the direction across the scan) to latitude and longitude. The latitude and longitude are currently defined as coordinates of the form latitude(scanline, ground_pixel). How can I convert these into dimension coordinates of the form latitude(latitude) and longitude(longitude)? I'd like to be able to plot and query the dataset using lat/lon coordinates and xarray's plotting/query functions.

Here's a picture of the xarray Dataset. I've not yet been able to reproduce a simple example of this data format, with two dimensions defined for the latitude and longitude coordinates.

[Image: screenshot of the xarray Dataset, showing latitude(scanline, ground_pixel) and longitude(scanline, ground_pixel) coordinates]

The latitudes and longitudes in geographical coordinates can be found with ds.latitude.values and ds.longitude.values, but these are split across the scanline and ground_pixel dimensions. I think I need to collapse them into a single list of latitudes/longitudes.
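
For reference, a minimal synthetic example of this layout (2-D latitude/longitude coordinates indexed by scanline and ground_pixel) can be built as follows; the array sizes are taken from the screenshot and the lat/lon values and variable name are illustrative, not from the real product:

import numpy as np
import xarray as xr

n_scan, n_pixel = 188, 109  # sizes from the screenshot

# every (scanline, ground_pixel) cell carries its own lat/lon value
lat2d = np.linspace(50.0, 55.0, n_scan)[:, None] + 0.01 * np.arange(n_pixel)[None, :]
lon2d = np.linspace(-5.0, 5.0, n_pixel)[None, :] + 0.01 * np.arange(n_scan)[:, None]

ds = xr.Dataset(
    data_vars={
        "methane_mixing_ratio_bias_corrected": (
            ("scanline", "ground_pixel"),
            np.random.rand(n_scan, n_pixel),
        ),
    },
    coords={
        "latitude": (("scanline", "ground_pixel"), lat2d),
        "longitude": (("scanline", "ground_pixel"), lon2d),
    },
)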

Answer 1

Score: 1

Given your description of the data, it sounds like this is observational data from a set of satellite passes (scanlines), reported for all pixels within range of the sensor on a given pass. Perhaps each of the 188 scanlines had as many as 109 pixels within range on that day. The observations are essentially little circles or blobs within the larger grid, with each blob indexed by the scanline ID.

Since this is such a small dataset, the easiest way to convert it to a grid is probably to drop into pandas, group on the pixel coordinates, and then convert back to xarray. The following returns the mean value observed at each latitude/longitude pair:

gridwise_mean = (
    ds_subset.to_dataframe()              # flatten to a (scanline, ground_pixel) indexed frame
    .dropna(how="all")                    # drop rows where every variable is NaN
    .groupby(["latitude", "longitude"])   # group all pixels sharing a lat/lon pair
    .methane_mixing_ratio_bias_corrected
    .mean()                               # average the observations in each group
    .to_xarray()                          # back to xarray, with latitude/longitude as dims
)

Note that this will return an nLats x nLons array, with latitude and longitude as the new dimensions. If you have good coverage of all pixels, and the latitudes/longitudes are truly on a regular grid, then this will likely be a pretty reasonable result to work with, and plotting a colormesh with e.g. gridwise_mean.plot() should give a nice plot of the average observation for each pixel.

> Warning: If your latitudes/longitudes are not on a regular grid, this could explode your memory. At worst, if each data point has a unique lat/lon value attached, the result would be (188 * 109) ^ 2 = 420 million points, or about 3.1 GB, with only one non-NaN data point per lat/lon pair. This gets larger fast if you use this method on a larger number of points.
>
> To diagnose whether you have such an issue, you could first compute the number of unique latitudes and longitudes in the dataset with e.g. np.unique(ds_subset.latitude) and make sure the product of the number of unique lats and lons is a reasonably small number, and is much smaller than the total number of points in the original dataset.
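>
> A quick sketch of that check (assuming the same ds_subset as above; NaN handling omitted for brevity):
>
> n_lats = np.unique(ds_subset.latitude).size
> n_lons = np.unique(ds_subset.longitude).size
> n_cells = n_lats * n_lons
> n_points = ds_subset.latitude.size  # total scanline * ground_pixel points
>
> print(f"{n_lats} unique lats x {n_lons} unique lons = {n_cells} grid cells")
> print(f"original dataset has {n_points} points")
> # if n_cells is much larger than n_points, the lat/lons are not on a
> # regular grid and the groupby result will be huge and mostly NaN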

Other summary stats such as the count, std. dev., min, and max might also be useful to know, so you could compute multiple summary statistics with:

gridwise_summary = (
    ds_subset.to_dataframe()
    .groupby(["latitude", "longitude"])
    .methane_mixing_ratio_bias_corrected
    .agg(["mean", "count", "std", "max", "min"])
    .to_xarray()
)

This will return an xr.Dataset where the variables are the above reductions, and can be accessed with e.g. gridwise_summary["max"].
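
Once the result is gridded, the lat/lon-based plotting and selection the question asked about should work directly; for example (the coordinates here are arbitrary illustrations):

# plot the per-cell maximum
gridwise_summary["max"].plot()

# query the mean at the grid cell nearest to a given point
gridwise_mean.sel(latitude=52.5, longitude=4.3, method="nearest")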

Answer 2

Score: 0

The data was on an irregular grid and I used the following method to regrid it (see the XESMF docs for further info: https://xesmf.readthedocs.io/en/latest/notebooks/Pure_numpy.html):

%%time
import numpy as np
import xarray as xr
import xesmf as xe

lats = ds_subset.latitude.values
lons = ds_subset.longitude.values

# create regular 1-D lat and lon arrays spanning the data's extent.
# 100 points is just for a quick test of the method; a larger value is
# needed to properly represent the resolution of the original data
grid_lats = np.linspace(lats.min(), lats.max(), 100)
grid_lons = np.linspace(lons.min(), lons.max(), 100)

# build the target grid that the data will be regridded onto
new_grid = xr.Dataset({'lat': (['lat'], grid_lats), 'lon': (['lon'], grid_lons)})

# periodic=False since this is a regional grid (periodic=True is only for
# global grids that wrap around in longitude). I found that the nearest
# neighbour method works best here; bilinear interpolation loses data.
regridder = xe.Regridder(ds_subset, new_grid, 'nearest_s2d', periodic=False)

# regrid the data onto the new grid
ds_new = regridder(ds_subset)

ds_new
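
With the data on a regular lat/lon grid, plotting and nearest-point queries then work as usual; a minimal usage sketch (the variable name and coordinates are illustrative):

# plot the regridded field
ds_new.methane_mixing_ratio_bias_corrected.plot()

# value at the grid cell nearest to a given point
ds_new.methane_mixing_ratio_bias_corrected.sel(lat=52.5, lon=4.3, method="nearest")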
