Create xarray Dataset with observations and averages that has combined index

Question

Suppose I have the following DataArray containing observations for different locations over time:

import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(42)

data = xr.DataArray(
    np.random.randint(1,100, (36, 3)), 
    dims=("time", "location"), 
    coords={
        "time": pd.date_range("2022-01-01", periods=36, freq="10D"), 
        "location": ["A", "B", "C"]
    },
    name="observations"
)

I then calculate the monthly average and combine it with the observations into a dataset:

monthly_avg = data.groupby("time.month").mean()
data = data.to_dataset()
data["average"] = monthly_avg

giving me a dataset in which observations has the dimensions (time, location) and average has the dimensions (month, location).

How can I set the indices correctly (if possible) so that when I run:

data.sel(time="2022-01-01")

I get a subset of the dataset for one time, all locations and one monthly average (which corresponds to the selected time)?

At the moment, running this returns all monthly averages for the selected timestep rather than only the matching one.

Conversely, when I run

data.sel(month=1)

I'd like a subset with only the timesteps that are in January.
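
The behaviour described above follows from the dimensions of the two variables, which the short check below makes visible. It is only a sketch that rebuilds the question's data under the names `obs`, `ds` and `subset` (chosen here for illustration, not taken from the original post): `average` lives on its own `month` dimension, so a selection along `time` cannot narrow it down.

```python
import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(42)

# Same test data as in the question (illustrative variable names).
obs = xr.DataArray(
    np.random.randint(1, 100, (36, 3)),
    dims=("time", "location"),
    coords={
        "time": pd.date_range("2022-01-01", periods=36, freq="10D"),
        "location": ["A", "B", "C"],
    },
    name="observations",
)

ds = obs.to_dataset()
ds["average"] = obs.groupby("time.month").mean()

# The observations depend on time, the averages on month.
print(ds["observations"].dims)
print(ds["average"].dims)

# Selecting along time only touches variables that have a time dimension,
# so every monthly average is still present afterwards.
subset = ds.sel(time="2022-01-01")
print(subset["average"].sizes)
```

Because nothing links `time` to `month` in this dataset, any fix has to either put the averages on the `time` dimension or add an index that ties the two together.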

Answer 1

Score: 2

To make the selection return what you want, I would first compute the monthly averages and repeat them so that they match the original time dimension.
Then I would create a multi-index, so that you can select either a specific date or a month.

# set up test data
import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(42)

data = xr.DataArray(
    np.random.randint(1, 100, (36, 3)),
    dims=("time", "location"),
    coords={
        "time": pd.date_range("2022-01-01", periods=36, freq="10D"),
        "location": ["A", "B", "C"]
    },
    name="observations"
)

# compute the monthly averages and repeat them with a list comprehension
data = data.to_dataset()
monthly_avg = data.groupby("time.month").mean()['observations'].values
data['average'] = (('time', 'location'), np.array([monthly_avg[i-1, :] for i in data.time.dt.month]))

# add the month and create the multi-index
data['month'] = data.time.dt.month
data = data.set_index(day_month=['time', 'month'])

You can then run the selection to get what you want.

print(data.sel(time="2022-01-01"))
<xarray.Dataset>
Dimensions:       (location: 3, month: 1)
Coordinates:
  * location      (location) <U1 'A' 'B' 'C'
  * month         (month) int64 1
    time          <U10 '2022-01-01'
Data variables:
    observations  (month, location) int64 52 93 15
    average       (month, location) float64 70.5 82.25 33.75
print(data.sel(month=1))
<xarray.Dataset>
Dimensions:       (location: 3, time: 4)
Coordinates:
  * location      (location) <U1 'A' 'B' 'C'
  * time          (time) datetime64[ns] 2022-01-01 2022-01-11 ... 2022-01-31
    month         int64 1
Data variables:
    observations  (time, location) int64 52 93 15 72 61 21 83 87 75 75 88 24
    average       (time, location) float64 70.5 82.25 33.75 ... 70.5 82.25 33.7

Note that the second command returns the monthly average repeated once for every timestep in the month.

There may be a better way to set up the multi-index.
If you haven't done so before, have a look at the pandas MultiIndex documentation (https://pandas.pydata.org/docs/user_guide/advanced.html)
or at stack/unstack in xarray (https://xarray.pydata.org/en/v0.7.2/reshaping.html#stack-and-unstack).
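
As one possible alternative to the list comprehension and the multi-index, the sketch below broadcasts the monthly means with xarray's vectorized indexing and keeps the month as an ordinary (non-index) coordinate along time. The names `obs`, `ds`, `month_of_step`, `one_step` and `january` are illustrative and not from the original answer. Selecting a month is done here with a boolean mask rather than with data.sel(month=1), so this is a different interface from the one the question asks for.

```python
import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(42)

# Same test data as above (illustrative variable names).
obs = xr.DataArray(
    np.random.randint(1, 100, (36, 3)),
    dims=("time", "location"),
    coords={
        "time": pd.date_range("2022-01-01", periods=36, freq="10D"),
        "location": ["A", "B", "C"],
    },
    name="observations",
)

month_of_step = obs["time"].dt.month            # month number of each timestep
monthly_avg = obs.groupby("time.month").mean()  # one row per month

ds = obs.to_dataset()
# Vectorized selection: for every timestep, pick the average of its month.
# The result is aligned with the time dimension instead of month.
ds["average"] = monthly_avg.sel(month=month_of_step)
# Keep the month number as a plain coordinate along time.
ds = ds.assign_coords(month=("time", month_of_step.data))

# One timestep: all locations, and only the average of that month.
one_step = ds.sel(time="2022-01-01")

# All timesteps in January, selected with a boolean mask along time.
january = ds.sel(time=ds["month"] == 1)

print(one_step)
print(january)
```

As in the multi-index version, the January subset still carries the average once per timestep. The trade-off is that no multi-index has to be maintained, at the cost of writing the month selection as a mask.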

Posted by huangapple on May 10, 2023, 18:10:03
Source: https://go.coder-hub.com/76217189.html