July 13, 2023, 19:27:18

Python - Converting a dataframe with columns x, y and a variable "A" into a netCDF file

Question


My (simplified) data structure is as follows:

> x = [1,1,2,2,3,3,4,4,...n,n]

> y = [1,2,1,2,1,2,1,2,...1,2]

> A = [7,5,6,5,4,6,2,5,...4,3]

"A" is a variable linked to the coordinates x and y. The dataframe consists of three columns, and the values were originally read top-down: starting at x = 1 and y = 1, going down to y = y_max, then x = 2 with y from 1 to y_max, then x = 3, and so on. So this is two-dimensional data: each value of "A" has its x and y coordinates in the same row of the dataframe.

However, when I convert this directly to netCDF with

> Data.to_netcdf("filename.nc")

I get a massive number of x and y values: the dimension ends up being a row index from 1 to n. For example, if my x coordinate goes from 1 to 5 as 1,1,1,2,2,2,3,3,3,4,4,4,5,5,5, the netCDF file has 15 x coordinates, while I would like it to have only 5. The same happens with the y coordinates. I have tried many other approaches, but none produced anything useful.
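The symptom can be reproduced in a few lines. This is only a sketch under an assumption: the question does not show how `Data` was built, so the example supposes it came from `DataFrame.to_xarray()` on a dataframe with the default row index. The column values are made up for illustration.

```python
import pandas as pd

# Illustrative long-format table: 5 x values, each repeated for 3 y values
df = pd.DataFrame({
    "x": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    "y": [1, 2, 3] * 5,
    "A": list(range(15)),
})

# With the default row index, the only dimension is the row index itself,
# and x, y and A all become data variables of length 15.
flat = df.to_xarray()
flat_sizes = dict(flat.sizes)
flat_vars = set(flat.data_vars)
print(flat_sizes, flat_vars)
```

In this form x and y are ordinary variables along the row dimension, which matches the 15 repeated x values described above.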

I would like a netCDF file with "A" as a variable and x and y as dimensions, without the coordinate values being repeated. My real dataset has more than a hundred x values and nearly a hundred y values, so every x value is repeated once per y value and vice versa.

Edit:

Here is the original code, as requested by the answerer @mozway:

import pandas as pd

S_2017 = pd.read_csv("S_2017.csv")

# Sum column 3 over a grid of 0.1-degree cells: 124 longitude steps x 45 latitude steps
EachValue = []
for i in range(124):
    Lon_min = 19.3 + i*0.1
    Lon_max = Lon_min + 0.1
    for j in range(45):
        S_2017_Analyze = S_2017
        Lat_max = 64.2 - j*0.1
        Lat_min = Lat_max - 0.1
        # Keep only rows inside the current cell (column 1 = longitude, column 2 = latitude)
        S_2017_Analyze = S_2017_Analyze[S_2017_Analyze.iloc[:,1] >= Lon_min]
        S_2017_Analyze = S_2017_Analyze[S_2017_Analyze.iloc[:,1] <= Lon_max]
        S_2017_Analyze = S_2017_Analyze[S_2017_Analyze.iloc[:,2] >= Lat_min]
        S_2017_Analyze = S_2017_Analyze[S_2017_Analyze.iloc[:,2] <= Lat_max]
        S_Sum_2017 = S_2017_Analyze.iloc[:,3].sum()
        Pixel_S_2017 = [round(Lat_min,2),round(Lon_min,2),S_Sum_2017]
        EachValue.append(Pixel_S_2017)
DataFrame = pd.DataFrame(EachValue,columns=["Latitude","Longitude","S_Sum_2017"])
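The nested loop above filters the whole table once per cell. As a possible alternative, the same gridding can be sketched with a single `groupby` on integer cell indices. This is not the asker's method. The `points` table, its column names, and the names `lon0`, `lat_top` and `step` are hypothetical stand-ins for `S_2017.csv` and the constants in the loop. One behavioral difference: the loop uses inclusive bounds on both sides, so a point exactly on a cell edge is counted in two cells, while this sketch assigns each point to exactly one cell.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for S_2017.csv (column 1 = longitude, 2 = latitude, 3 = value)
points = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "lon": [19.35, 19.36, 19.45, 31.65],
    "lat": [64.15, 64.12, 59.75, 59.75],
    "value": [1.0, 2.0, 3.0, 4.0],
})

lon0, lat_top, step = 19.3, 64.2, 0.1   # grid origin and cell size from the loop
n_lon, n_lat = 124, 45                  # number of cells in each direction

# Integer cell indices: i counts eastward from lon0, j counts southward from lat_top
indexed = points.assign(
    i=np.floor((points["lon"] - lon0) / step).astype(int),
    j=np.floor((lat_top - points["lat"]) / step).astype(int),
)
inside = indexed["i"].between(0, n_lon - 1) & indexed["j"].between(0, n_lat - 1)

# Sum per cell, then fill cells without any points with 0 (as .sum() does in the loop)
cells = pd.MultiIndex.from_product([range(n_lon), range(n_lat)], names=["i", "j"])
sums = indexed[inside].groupby(["i", "j"])["value"].sum().reindex(cells, fill_value=0.0)

# Rebuild the lower-left corner coordinates of each cell, rounded as in the loop
grid = sums.reset_index()
grid["Longitude"] = (lon0 + grid["i"] * step).round(2)
grid["Latitude"] = (lat_top - (grid["j"] + 1) * step).round(2)
grid = grid[["Latitude", "Longitude", "value"]].rename(columns={"value": "S_Sum_2017"})
print(grid.head())
```

The resulting `grid` has one row per cell in the same order as the loop (longitude outer, latitude inner), with the same three column names as `DataFrame`.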

And here is the solution by @mozway, which I applied:

import xarray as xr

# Name the three columns, then use the two coordinate columns as the index
S_2017 = pd.DataFrame({'Lat': S_2017.iloc[:,0],
                       'Lon': S_2017.iloc[:,1],
                       'Variable': S_2017.iloc[:,2],
                       })
xr.Dataset.from_dataframe(S_2017.set_index(["Lat","Lon"])).to_netcdf("S_2017.nc")

Answer 1

Score: 1


IIUC, you could set x and y as the index, convert to an xarray Dataset, and then write that to netCDF:

import pandas as pd
import xarray as xr

df = pd.DataFrame({'x': [1,1,2,2,3,3,4,4],
                   'y': [1,2,1,2,1,2,1,2],
                   'A': [7,5,6,5,4,6,2,5],
                   })

xr.Dataset.from_dataframe(df.set_index(['x', 'y'])).to_netcdf('filename.nc')

Dataset:

<xarray.Dataset>
Dimensions:  (x: 4, y: 2)
Coordinates:
  * x        (x) int32 1 2 3 4
  * y        (y) int32 1 2
Data variables:
    A        (x, y) int32 ...

Underlying A:

array([[7, 5],
       [6, 5],
       [4, 6],
       [2, 5]])
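One thing to keep in mind with this approach: the (x, y) index is expanded to the full grid, so any combination missing from the dataframe ends up as NaN in the resulting array. The sketch below illustrates this with a made-up three-row table in which the pair (x=2, y=2) is absent. It stays in memory and does not write a file.

```python
import pandas as pd
import xarray as xr

# Illustrative table with one missing combination: there is no row for (x=2, y=2)
df = pd.DataFrame({"x": [1, 1, 2], "y": [1, 2, 1], "A": [7, 5, 6]})

ds = xr.Dataset.from_dataframe(df.set_index(["x", "y"]))
a = ds["A"]

sizes = dict(ds.sizes)                         # full 2 x 2 grid
missing = bool(a.sel(x=2, y=2).isnull().item())  # the absent pair is NaN
present = float(a.sel(x=1, y=2).item())

# Going back to the long format yields one row per grid cell, NaN included
back = ds.to_dataframe().reset_index()
print(sizes, missing, present, len(back))
```

Note also that `from_dataframe` expects each (x, y) pair to appear at most once, so duplicated pairs would need to be aggregated first.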
