Python: 3D grid from JSON data

Question

I have been struggling with a Python-related problem (or challenge) for quite some time now, and I believe I should get some help or hints from you folks.

In the [attached file](https://aarhusvand-my.sharepoint.com/:u:/g/personal/anl_aarhusvand_dk/ESEC8xbP2OJCpLobZTRcGt8B09b_GpHzxcRJU2na1rAaJg?e=CVa0Tt) you will find a JSON file object with the data I'm working with. I'm loading the data using the following lines of code:

```python
import json

with open('json_data.json', 'r') as openfile:
    json_object = json.load(openfile)
```

Looking into json_object, you will notice a lot of relevant information, such as the number of cells in the two grid dimensions, Numx and Numy. The key I am interested in is data, i.e. json_object['data']. It is a list holding Numx by Numy dictionaries, each with X and Y coordinates. Each dictionary also holds the third dimension, time, under the key Data: a list of n timesteps of values. It may be obvious, but I should mention that the timestamps are the same for every coordinate.

So, to summarize, for each coordinate there is a time series of values, which I would like to convert into a 3D grid using numpy. How would I do that?

I've tried to convert the data key into a Pandas DataFrame with this:

df = pd.json_normalize(json_object["data"], record_path=["Data"], meta=["X", "Y"])

This gives me a DataFrame with values corresponding to the given timestep and coordinate.
But then I don't know how to continue - how do I turn this into a 3D grid?

Then I tried to loop over each timestamp, so that I would have n timesteps of 2D grids. But then I struggle with how to add the time dimension to make the result 3-dimensional.

timeStamps = [t['Time'] for t in json_object['data'][0]['Data']]
dfTimestamps = {}
for i, ts in enumerate(timeStamps):
    # Collect the coordinates and the i-th value of every cell
    X = []
    Y = []
    vals = []
    for d in json_object['data']:
        X.append(d['X'])
        Y.append(d['Y'])
        vals.append(d['Data'][i]['Value'])
    # One dictionary of flat X/Y/Value lists per timestamp
    dfTimestamps[ts] = {'X': X, 'Y': Y, 'Value': vals}

EDIT:
I will try to write some example data from the JSON file object below.

{'info': {'Parameters': None,
  'Unit': 'mm/hour',
  'Location': 'Input geojson',
  'Point': {'IdPoints': 0,
   'Name': None,
   'Description': None,
   'X': 571125,
   'Y': 6225625,
   'EPSG': None,
   'Latitude': 0,
   'Longitude': 0,
   'CreatedDatetime': '0001-01-01T00:00:00',
   'OrganizationId': None,
   'ResponsibleUserId': None,
   'Id': 0},
  'PointId': None,
  'ParameterId': 'Rainintensity, id: 204',
  'Timezone': 'UTC',
  'DataSource': 'X-band Sabro, RadarId: {303}',
  'EPSG': '32632',
  'CreatedDateTime': '2023-07-13T13:55:18.9099301Z',
  'AllPoints': False,
  'dxdy': 250,
  'Numx': 7,
  'Numy': 6,
  'X0': 0,
  'Y0': 0,
  'MissingSteps': 0,
  'ProcessQuality': [{'qualityIndex': 0,
    'qualityDescription': 'No problem detected',
    'qualitySteps': 3241,
    'missingSteps': None,
    'fromUTC': '2023-07-10T00:00:00',
    'toUTC': '2023-07-12T06:00:00'}]},
 'data': [{'X': 572125,
   'Y': 6226875,
   'Data': [{'Time': '2023-07-10T00:00:00', 'Value': 0},
    {'Time': '2023-07-10T00:01:00', 'Value': 0},
    {'Time': '2023-07-10T00:02:00', 'Value': 0},
    {'Time': '2023-07-10T00:03:00', 'Value': 0},
    {'Time': '2023-07-10T00:04:00', 'Value': 0},
    {'Time': '2023-07-10T00:05:00', 'Value': 0},
    {'Time': '2023-07-10T00:06:00', 'Value': 0},
    {'Time': '2023-07-10T00:07:00', 'Value': 0},
    {'Time': '2023-07-10T00:08:00', 'Value': 0.399},
    {'Time': '2023-07-10T00:09:00', 'Value': 0},
    {'Time': '2023-07-10T00:10:00', 'Value': 0},
    {'Time': '2023-07-10T00:11:00', 'Value': 0},
    ...
    {'Time': '2023-07-10T16:37:00', 'Value': 0.299},
    {'Time': '2023-07-10T16:38:00', 'Value': 0},
    {'Time': '2023-07-10T16:39:00', 'Value': 0},
    ...]},
  {'X': 572125,
   'Y': 6226875,
   'Data': [{'Time': '2023-07-10T00:00:00', 'Value': 0},
    {'Time': '2023-07-10T00:01:00', 'Value': 0},
    {'Time': '2023-07-10T00:02:00', 'Value': 0},
    {'Time': '2023-07-10T00:03:00', 'Value': 0},
    {'Time': '2023-07-10T00:04:00', 'Value': 0},
    {'Time': '2023-07-10T00:05:00', 'Value': 0},
    {'Time': '2023-07-10T00:06:00', 'Value': 0},
    {'Time': '2023-07-10T00:07:00', 'Value': 0},
    {'Time': '2023-07-10T00:08:00', 'Value': 0.399},
    {'Time': '2023-07-10T00:09:00', 'Value': 0},
    {'Time': '2023-07-10T00:10:00', 'Value': 0},
    ...
    {'Time': '2023-07-10T16:37:00', 'Value': 0.299},
    {'Time': '2023-07-10T16:38:00', 'Value': 0},
    {'Time': '2023-07-10T16:39:00', 'Value': 0},
    ...]},
  {'X': 571125,
   'Y': 6226125,
   'Data': [{'Time': '2023-07-10T00:00:00', 'Value': 0},
    {'Time': '2023-07-10T00:01:00', 'Value': 0},
    {'Time': '2023-07-10T00:02:00', 'Value': 0.259},
    {'Time': '2023-07-10T00:03:00', 'Value': 0},
    {'Time': '2023-07-10T00:04:00', 'Value': 0},
    {'Time': '2023-07-10T00:05:00', 'Value': 0.321},
    {'Time': '2023-07-10T00:06:00', 'Value': 0.279},
    {'Time': '2023-07-10T00:07:00', 'Value': 0},
    {'Time': '2023-07-10T00:08:00', 'Value': 0.371},
    {'Time': '2023-07-10T00:09:00', 'Value': 0.399},
    {'Time': '2023-07-10T00:10:00', 'Value': 0.345},
    ...

So, at the beginning of my file, the info key holds the relevant metadata. The data key holds the information that I want to convert into a 3D grid. For each coordinate there is a time series, as seen above, where I have included examples for 3 coordinates.

Answer 1

Score: 0

Your attempt

df = pd.json_normalize(json_object["data"], record_path=["Data"], meta=["X", "Y"])

is basically correct. You then ask:

> how do I turn this into a 3D grid?

Perhaps you mean a 2D array of values, but this is complicated by many factors, including that you have a lot of duplicates and there is irregular grid coverage in your data:

import json
import pandas as pd

with open('json_data.json') as file:
    data = json.load(file)['data']

df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])
df['Time'] = pd.to_datetime(df.Time)

df = (
    df.set_index(['Y', 'X'])
    .Value.groupby(level=['Y', 'X'])
    .mean()
    .unstack(level='X')
)
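
The duplicates and the irregular coverage mentioned above can be checked before any reshaping. The following is a minimal sketch on a hypothetical three-entry miniature of `json_object['data']` (the values are made up for illustration, not taken from the real file); `n_duplicates`, `n_cells` and `n_rect` are names introduced here.

```python
import pandas as pd

# Hypothetical miniature of json_object['data']: the first two entries share
# the same (X, Y), mimicking the duplicates seen in the sample data.
data = [
    {'X': 572125, 'Y': 6226875, 'Data': [
        {'Time': '2023-07-10T00:00:00', 'Value': 0},
        {'Time': '2023-07-10T00:01:00', 'Value': 0.399}]},
    {'X': 572125, 'Y': 6226875, 'Data': [
        {'Time': '2023-07-10T00:00:00', 'Value': 0},
        {'Time': '2023-07-10T00:01:00', 'Value': 0.399}]},
    {'X': 571125, 'Y': 6226125, 'Data': [
        {'Time': '2023-07-10T00:00:00', 'Value': 0},
        {'Time': '2023-07-10T00:01:00', 'Value': 0.259}]},
]

df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])

# Rows that repeat an (X, Y, Time) combination already seen earlier
n_duplicates = int(df.duplicated(subset=['X', 'Y', 'Time']).sum())

# Distinct cells actually present vs. cells in the bounding rectangle
n_cells = len(df[['X', 'Y']].drop_duplicates())
n_rect = df['X'].nunique() * df['Y'].nunique()

print(n_duplicates, n_cells, n_rect)
```

If `n_duplicates` is non-zero, an aggregation such as `mean()` or `first()` is needed before unstacking; if `n_cells` is smaller than `n_rect`, the unstacked frame will contain NaN for the missing cells.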


If you actually want three indices (X/Y/Time), the same problems persist:

import json
import pandas as pd

with open('json_data.json') as file:
    data = json.load(file)['data']

df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])
df['Time'] = pd.to_datetime(df.Time)

df = (
    df.set_index(['Y', 'X', 'Time'])
    .Value.groupby(level=['Y', 'X', 'Time'])
    .mean()
    .unstack(level='X')
)
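
With the three-index frame, a single 2D grid for one timestep can be pulled out with `xs`. This is a small sketch on a hypothetical, complete 2 x 2 grid with two timesteps (coordinates and values are invented); `frame` and `grid` are names introduced for illustration.

```python
import pandas as pd

# Hypothetical complete 2 x 2 grid, two timesteps per cell
data = [
    {'X': x, 'Y': y, 'Data': [
        {'Time': '2023-07-10T00:00:00', 'Value': 0.0},
        {'Time': '2023-07-10T00:01:00', 'Value': v}]}
    for x, y, v in [(0, 0, 0.1), (250, 0, 0.2), (0, 250, 0.3), (250, 250, 0.4)]
]

df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])
df['Time'] = pd.to_datetime(df['Time'])

# Same reshaping as above: rows indexed by (Y, Time), one column per X
frame = (
    df.set_index(['Y', 'X', 'Time'])
    ['Value'].groupby(level=['Y', 'X', 'Time'])
    .mean()
    .unstack(level='X')
)

# Cross-section at one timestamp: rows are Y, columns are X
grid = frame.xs(pd.Timestamp('2023-07-10T00:01:00'), level='Time')
print(grid)
```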


Based on your comments, what you actually want is

  • a frame where NaN are filled by zero;
  • an outer index of Time, then Y;
  • an inner (column) index of X; and
  • duplicates eliminated by first().

This means

import json
import pandas as pd

with open('json_data.json') as file:
    data = json.load(file)['data']

df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])
df['Time'] = pd.to_datetime(df.Time)

Z = (
    df.set_index(['Time', 'Y', 'X'])
    .Value.groupby(level=['Time', 'Y', 'X'])
    .first()
    .unstack(level='X', fill_value=0)
)

Don't loop.
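
Since the question asks for a numpy array, one possible way to get from `Z` to a 3D array of shape (time, Y, X) is to make sure every (Time, Y) row exists and then reshape. This is only a sketch under stated assumptions: the data is a hypothetical 2 x 2 grid with three timesteps and one missing cell, and `t_axis`, `y_axis`, `full_index` and `grid` are names introduced here.

```python
import pandas as pd

# Hypothetical grid: cell (X=250, Y=250) is missing to mimic irregular coverage
times = ['2023-07-10T00:00:00', '2023-07-10T00:01:00', '2023-07-10T00:02:00']
cells = {(0, 0): [0.0, 0.1, 0.2], (250, 0): [1.0, 1.1, 1.2], (0, 250): [2.0, 2.1, 2.2]}
data = [
    {'X': x, 'Y': y, 'Data': [{'Time': t, 'Value': v} for t, v in zip(times, vals)]}
    for (x, y), vals in cells.items()
]

df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])
df['Time'] = pd.to_datetime(df['Time'])
df = df.astype({'X': int, 'Y': int})  # meta columns arrive as object dtype

Z = (
    df.set_index(['Time', 'Y', 'X'])
    ['Value'].groupby(level=['Time', 'Y', 'X'])
    .first()
    .unstack(level='X', fill_value=0)
)

# Guarantee a row for every (Time, Y) pair, ordered time-major
t_axis = Z.index.get_level_values('Time').unique().sort_values()
y_axis = Z.index.get_level_values('Y').unique().sort_values()
full_index = pd.MultiIndex.from_product([t_axis, y_axis], names=['Time', 'Y'])
Z = Z.reindex(full_index, fill_value=0)

# Rows are (Time, Y) pairs, columns are X, so a reshape yields (time, y, x)
grid = Z.to_numpy().reshape(len(t_axis), len(y_axis), Z.shape[1])
print(grid.shape)
```

Here `grid[t, iy, ix]` is the value at `t_axis[t]`, `y_axis[iy]` and `Z.columns[ix]`; cells with no data are zero because of the `fill_value=0` choices.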

huangapple
  • Posted on July 13, 2023, 22:24:41
  • When republishing, please keep the link to this article: https://go.coder-hub.com/76680498.html